# Deformable Convolutional Networks

## Deformable Convolution

Standard convolution samples its input at fixed locations, so its receptive field is fixed as well. Deformable convolution addresses this limitation by adding learnable offsets to the sampling locations.

### 2D Convolution

The standard 2D convolution consists of two steps: 1) sampling using a regular grid \(\mathcal{R}\) over the input feature map \(\mathbf{x}\); 2) summation of sampled values weighted by \(\mathbf{w}\).
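
The two steps can be sketched in NumPy as below. This is a minimal illustration rather than an efficient implementation. It assumes a single-channel input, stride 1, no padding (outputs are computed only where the whole grid fits) and an odd kernel size. The helper names `make_grid`, `conv2d_at` and `conv2d` are hypothetical. `make_grid` builds the regular grid \(\mathcal{R}\) from the kernel size and dilation.

```python
import numpy as np

def make_grid(k, dilation=1):
    """Regular grid R for an odd k x k kernel, as (dy, dx) offsets from p0."""
    r = k // 2
    return [(dy * dilation, dx * dilation)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)]

def conv2d_at(x, w, p0, dilation=1):
    """y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n)."""
    k = w.shape[0]
    r = k // 2
    total = 0.0
    for dy, dx in make_grid(k, dilation):
        # Step 1: sample x at one point of the regular grid placed at p0.
        sample = x[p0[0] + dy, p0[1] + dx]
        # Step 2: weight the sample; w's centre entry w[r, r] matches p_n = (0, 0).
        total += w[dy // dilation + r, dx // dilation + r] * sample
    return total

def conv2d(x, w, dilation=1):
    """Apply conv2d_at at every p0 where the whole grid fits (no padding)."""
    m = (w.shape[0] // 2) * dilation  # grid half-width in pixels
    H, W = x.shape
    y = np.zeros((H - 2 * m, W - 2 * m))
    for i in range(m, H - m):
        for j in range(m, W - m):
            y[i - m, j - m] = conv2d_at(x, w, (i, j), dilation)
    return y

# Usage: the 3 x 3 grid with dilation 1 matches the example grid R in the text.
R = make_grid(3)
x = np.arange(25, dtype=float).reshape(5, 5)
y = conv2d(x, np.ones((3, 3)))
```

As in the equation that follows, the kernel is not flipped (strictly a cross-correlation), which is the convention deep-learning frameworks use for "convolution".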

The receptive field size and dilation define the grid \(\mathcal{R}\). For example, for a 3 × 3 kernel with dilation 1, the grid is:

\[\mathcal{R}=\{(-1, -1), (-1, 0), \ldots, (0,1), (1, 1)\}\]

For each location \(\mathbf{p}_0\) on the output feature map \(\mathbf{y}\), 2D convolution can be written as follows:

\[\mathbf{y}(\mathbf{p}_0)=\sum_{\mathbf{p}_n\in\mathcal{R}}\mathbf{w}(\mathbf{p}_n)\cdot \mathbf{x}(\mathbf{p}_0+\mathbf{p}_n),\]

where \(\mathbf{p}_n\) enumerates the locations in \(\mathcal{R}\).

### 2D Deformable Convolution

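In deformable convolution, the regular grid \(\mathcal{R}\) is augmented with offsets \(\{\Delta \mathbf{p}_n \mid n=1,\ldots,N\}\), where \(N=\lvert \mathcal{R} \rvert\). In other words, each sampling position \(\mathbf{p}_n\) in the grid gets its own offset. The offsets are obtained by applying an additional convolution layer over the same input feature map, so they are learned along with the rest of the network. This layer outputs \(2N\) channels (a 2D offset for each of the \(N\) grid positions) at every spatial location, so the offsets also vary with \(\mathbf{p}_0\).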
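
The offset branch can be sketched as an ordinary convolution whose output has \(2N\) channels. This is a minimal NumPy sketch under stated assumptions: a single stride-1 \(k \times k\) convolution with zero "same" padding (so the offset field keeps the input's spatial size), and a channel layout of \((\Delta y_1, \Delta x_1, \Delta y_2, \Delta x_2, \ldots)\). Both the layout and the name `offset_field` are hypothetical; real libraries may arrange the channels differently.

```python
import numpy as np

def offset_field(x, w_off, b_off=None):
    """Hypothetical offset branch: a stride-1 k x k convolution with zero
    'same' padding, mapping x of shape (C, H, W) to offsets of shape (2N, H, W).
    w_off has shape (2N, C, k, k); b_off (optional) has shape (2N,)."""
    two_n, _, k, _ = w_off.shape
    r = k // 2
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))  # zero padding keeps H x W
    out = np.zeros((two_n, H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + k, j:j + k]  # (C, k, k) window centred at (i, j)
            out[:, i, j] = np.tensordot(w_off, patch, axes=([1, 2, 3], [0, 1, 2]))
    if b_off is not None:
        out += b_off[:, None, None]
    return out

# Usage: a 3 x 3 deformable kernel has N = 9 grid points, so 2N = 18 channels.
rng = np.random.default_rng(0)
C, H, W, k = 4, 6, 6, 3
N = k * k
x = rng.standard_normal((C, H, W))
w_off = rng.standard_normal((2 * N, C, k, k)) * 0.01
# Assumed channel layout (dy_1, dx_1, dy_2, dx_2, ...) -> reshape to (N, 2, H, W).
offsets = offset_field(x, w_off).reshape(N, 2, H, W)
```

With all offset weights and biases at zero, every \(\Delta \mathbf{p}_n\) is zero and deformable convolution reduces to the standard convolution above.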

\[\mathbf{y}(\mathbf{p}_0)=\sum_{\mathbf{p}_n\in\mathcal{R}}\mathbf{w}(\mathbf{p}_n)\cdot \mathbf{x}(\mathbf{p}_0+\mathbf{p}_n+\Delta \mathbf{p}_n).\]

The sampling now happens at irregular, offset locations. Because the offset \(\Delta \mathbf{p}_n\) is typically fractional, \(\mathbf{x}(\mathbf{p}_0+\mathbf{p}_n+\Delta \mathbf{p}_n)\) is computed via bilinear interpolation as

\[\mathbf{x}(\mathbf{p})=\sum_\mathbf{q} G(\mathbf{q},\mathbf{p})\cdot \mathbf{x}(\mathbf{q}),\]

where \(\mathbf{p}=\mathbf{p}_0+\mathbf{p}_n+\Delta \mathbf{p}_n\) is a (possibly fractional) location and \(\mathbf{q}\) ranges over the four integral positions surrounding \(\mathbf{p}\), i.e. the corners of the 2 × 2 unit square that contains \(\mathbf{p}\). \(G(\cdot,\cdot)\) is the bilinear interpolation kernel, given by

\[G(\mathbf{q},\mathbf{p})=(1- \lvert q_x-p_x \rvert ) \cdot (1- \lvert q_y-p_y \rvert )\]

## Result

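Because deformable convolution adds learned offsets to its sampling grid, its sampling locations, and hence its effective receptive field, can adapt to the input instead of staying a fixed rectangle.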

*[Figure: Standard Convolution vs. Deformable Convolution]*
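
The deformable convolution equation and the bilinear kernel \(G\) defined above can be combined into the NumPy sketch below. It is illustrative only, under these assumptions: a single channel, stride 1, outputs only where the regular grid fits inside the input, neighbours outside the feature map treated as zero, and an offset array of shape \((N, 2, H_{out}, W_{out})\) holding \((\Delta y, \Delta x)\) per grid point and output location. The names `bilinear` and `deform_conv2d` are hypothetical.

```python
import numpy as np

def bilinear(x, py, px):
    """x(p) = sum_q G(q, p) * x(q) over the four integral neighbours q of p,
    with G(q, p) = (1 - |q_x - p_x|) * (1 - |q_y - p_y|).
    Neighbours outside the map are treated as zero (an assumption)."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < H and 0 <= qx < W:
                g = (1 - abs(qx - px)) * (1 - abs(qy - py))
                val += g * x[qy, qx]
    return val

def deform_conv2d(x, w, offsets, dilation=1):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n), single channel, stride 1,
    at every p0 where the regular grid fits inside x (no padding).
    offsets has shape (N, 2, H_out, W_out) with (dy, dx) per grid point
    and output location (assumed layout)."""
    k = w.shape[0]
    r = k // 2
    grid = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    m = r * dilation
    H, W = x.shape
    y = np.zeros((H - 2 * m, W - 2 * m))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            p0y, p0x = i + m, j + m
            for n, (dy, dx) in enumerate(grid):
                # p = p0 + p_n + dp_n, generally fractional
                py = p0y + dy * dilation + offsets[n, 0, i, j]
                px = p0x + dx * dilation + offsets[n, 1, i, j]
                y[i, j] += w[dy + r, dx + r] * bilinear(x, py, px)
    return y

# Usage: zero offsets reduce to standard convolution; fractional offsets
# move every sampling point off the integer grid.
x = np.arange(36, dtype=float).reshape(6, 6)
w = np.ones((3, 3)) / 9.0
y_std = deform_conv2d(x, w, np.zeros((9, 2, 4, 4)))
y_def = deform_conv2d(x, w, np.full((9, 2, 4, 4), 0.5))
```

Here every offset is the same constant, which only shifts the whole grid. In a trained network the offsets differ per grid point and per output location, which is what lets the sampling pattern deform rather than merely move.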