Deformable Convolutional Networks
Deformable Convolution
Standard convolution has fixed sampling location and receptive field. To solve this problem, Deformable convolution use learnable offset.
2D Convolution
The standard 2D convolution consists of two steps: 1) sampling using a regular grid \(\mathcal{R}\) over the input feature map \(\mathbf{x}\); 2) summation of sampled values weighted by \(\mathbf{w}\).
The receptive field size and dilation define the grid \(\mathcal{R}\). For example, when 3 X 3 kernel with dilation 1 the grid is:
\[\mathcal{R}=\{(-1, -1), (-1, 0), \ldots, (0,1), (1, 1)\}\]For each location \(\mathbf{p}_0\) on the output feature map \(\mathbf{y}\), 2D convolution can be denoted as followings:
\[\mathbf{y}(\mathbf{p}_0)=\sum_{\mathbf{p}_n\in\mathcal{R}}\mathbf{w}(\mathbf{p}_n)\cdot \mathbf{x}(\mathbf{p}_0+\mathbf{p}_n),\]2D Deformable Convolution
In deformable convolution, the regular grid \(\mathcal{R}\) is augmented with offsets \(\{\Delta \mathbf{p}_n \lvert n=1,...,N\}\), where \(N= \lvert \mathcal{R} \lvert\). In other words, offsets can be different per grid offset. The offsets are obtained by applying a convolution layer over same input feature map, which means the offsets are learned.
\[\mathbf{y}(\mathbf{p}_0)=\sum_{\mathbf{p}_n\in\mathcal{R}}\mathbf{w}(\mathbf{p}_n)\cdot \mathbf{x}(\mathbf{p}_0+\mathbf{p}_n+\Delta \mathbf{p}_n).\]Now, the sampling is on the irregular because the offset \(\Delta \mathbf{p}_n\) is typically fractional. So \(\mathbf{x}(\mathbf{p}_0+\mathbf{p}_n+\Delta \mathbf{p}_n)\) is implemented via bilinear interpolation as
\[\mathbf{x}(\mathbf{p})=\sum_\mathbf{q} G(\mathbf{q},\mathbf{p})\cdot \mathbf{x}(\mathbf{q}),\]where \(\mathbf{p}=\mathbf{p}_0+\mathbf{p}_n+\Delta \mathbf{p}_n\) and \(\mathbf{q}\) is integral positions within 2 X 2 square which is centered with \(\mathbf{p}\). \(G(\cdot,\cdot)\) is the bilinear interpolation kernel and can be denoted as follows:
\[G(\mathbf{q},\mathbf{p})=(1- \lvert q_x-p_x \lvert ) \cdot (1- \lvert q_y-p_y \lvert )\]Result
As Deformable convolution has offset on its grid, It can have more flexible receptive field.
Standard Convolution | Deformable Convolution |
---|---|