Deformable Convolutional Networks
Standard convolution samples its input at fixed locations, so its receptive field has a fixed shape. Deformable convolution relaxes this constraint by adding learnable offsets to the sampling locations.
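The standard 2D convolution consists of two steps: 1) sampling the input feature map $\mathbf{x}$ on a regular grid $\mathcal{R}$; 2) summing the sampled values weighted by $\mathbf{w}$.
The receptive field size and dilation define the grid $\mathcal{R}$. For example, for a 3 × 3 kernel with dilation 1 the grid is:

$$\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\}$$

For each location $\mathbf{p}_0$ on the output feature map $\mathbf{y}$, 2D convolution can be written as follows:

$$\mathbf{y}(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} \mathbf{w}(\mathbf{p}_n) \cdot \mathbf{x}(\mathbf{p}_0 + \mathbf{p}_n)$$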
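The two steps above (sample on a regular grid, then take a weighted sum) can be sketched in plain Python. This is only an illustrative sketch under a few assumptions that are not part of the original text: the feature map is a 2D list indexed as `(row, col)`, the kernel weights are stored in a dict keyed by grid offset, samples that fall outside the map count as 0, and the helper names `regular_grid` and `conv_at` are hypothetical.

```python
def regular_grid(kernel_size=3, dilation=1):
    """Return the sampling grid as a list of (d_row, d_col) offsets centered at 0."""
    half = kernel_size // 2
    steps = [dilation * k for k in range(-half, half + 1)]
    return [(dr, dc) for dr in steps for dc in steps]


def conv_at(x, w, p0, grid):
    """Weighted sum of the values sampled at p0 + pn for every pn in the grid.

    x: 2D list (input feature map)
    w: dict mapping each grid offset pn to its weight
    p0: (row, col) location on the output feature map
    Assumption: samples outside the feature map contribute 0 (zero padding).
    """
    height, width = len(x), len(x[0])
    total = 0.0
    for pn in grid:
        r, c = p0[0] + pn[0], p0[1] + pn[1]
        if 0 <= r < height and 0 <= c < width:
            total += w[pn] * x[r][c]
    return total


# Toy 4x4 feature map whose value at (r, c) is 4*r + c.
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
grid = regular_grid(kernel_size=3, dilation=1)
w = {pn: 1.0 for pn in grid}  # all-ones kernel, so the output is a plain sum
y_center = conv_at(x, w, (1, 1), grid)
```

With the all-ones kernel, `y_center` is simply the sum of the 3 × 3 window around `(1, 1)`. Passing `dilation=2` to `regular_grid` spreads the same nine sampling points over a 5 × 5 area, which is how dilation changes the grid.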
2D Deformable Convolution
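In deformable convolution, the regular grid $\mathcal{R}$ is augmented with offsets $\{\Delta\mathbf{p}_n \mid n = 1, \ldots, N\}$, where $N = |\mathcal{R}|$. In other words, each grid position $\mathbf{p}_n$ can have a different offset. The offsets are obtained by applying a convolution layer over the same input feature map, which means the offsets are learned. With these offsets, the convolution becomes:

$$\mathbf{y}(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} \mathbf{w}(\mathbf{p}_n) \cdot \mathbf{x}(\mathbf{p}_0 + \mathbf{p}_n + \Delta\mathbf{p}_n)$$

Now the sampling happens at the irregular locations $\mathbf{p}_n + \Delta\mathbf{p}_n$. Because the offset $\Delta\mathbf{p}_n$ is typically fractional, $\mathbf{x}(\mathbf{p})$ is implemented via bilinear interpolation as

$$\mathbf{x}(\mathbf{p}) = \sum_{\mathbf{q}} G(\mathbf{q}, \mathbf{p}) \cdot \mathbf{x}(\mathbf{q})$$

where $\mathbf{p} = \mathbf{p}_0 + \mathbf{p}_n + \Delta\mathbf{p}_n$ and $\mathbf{q}$ ranges over the integral positions of the 2 × 2 square surrounding $\mathbf{p}$. $G$ is the bilinear interpolation kernel and can be written as follows:

$$G(\mathbf{q}, \mathbf{p}) = g(q_x, p_x) \cdot g(q_y, p_y), \qquad g(a, b) = \max(0, 1 - |a - b|)$$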
Because deformable convolution adds learned offsets to its sampling grid, it has a more flexible receptive field than standard convolution.
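The deformable sampling described above can be sketched for a single output location in plain Python. This is a minimal sketch, not a reference implementation. It assumes a 2D list indexed as `(row, col)` and zero contribution from out-of-bounds neighbours, and it passes the offsets in as a hand-written dict instead of predicting them with a convolution layer. The helper names `g`, `bilinear_sample`, and `deform_conv_at` are hypothetical.

```python
import math


def g(a, b):
    """1D bilinear kernel: max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, p):
    """Evaluate the feature map x at a fractional location p = (row, col).

    Only the four integer neighbours of p can have a non-zero kernel value,
    so the sum runs over that 2x2 square.
    Assumption: neighbours outside the feature map contribute 0.
    """
    height, width = len(x), len(x[0])
    r0, c0 = math.floor(p[0]), math.floor(p[1])
    value = 0.0
    for qr in (r0, r0 + 1):
        for qc in (c0, c0 + 1):
            if 0 <= qr < height and 0 <= qc < width:
                value += g(qr, p[0]) * g(qc, p[1]) * x[qr][qc]
    return value


def deform_conv_at(x, w, p0, grid, offsets):
    """Weighted sum of x sampled at p0 + pn + offsets[pn] for every pn in the grid."""
    total = 0.0
    for pn in grid:
        d = offsets[pn]
        p = (p0[0] + pn[0] + d[0], p0[1] + pn[1] + d[1])
        total += w[pn] * bilinear_sample(x, p)
    return total


# 3x3 grid with dilation 1 and a toy 4x4 feature map with value 4*r + c.
grid = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
w = {pn: 1.0 for pn in grid}

zero = {pn: (0.0, 0.0) for pn in grid}   # no deformation
shift = {pn: (0.5, 0.5) for pn in grid}  # hand-picked fractional offsets (illustrative only)

y_zero = deform_conv_at(x, w, (1, 1), grid, zero)
y_shift = deform_conv_at(x, w, (1, 1), grid, shift)
```

With all offsets set to zero, every sampling point lands on an integer position, so `y_zero` equals the standard convolution result at the same location. With the half-pixel shift, each sample becomes the average of its four neighbours, which changes the output without changing the kernel weights.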
Figure: Standard Convolution vs. Deformable Convolution