Paper Review: Decoupled Attention Network for Text Recognition

Paper Link: https://arxiv.org/abs/1912.10205 Decoupled Attention Network (DAN) What is the difference? In the traditional attention mechanism, alignment is coupled with decoding: the alignment operation uses both visual information and historical decoding information. Traditional attention mechanism often ha...

Paper Review: Pix2Pix

Image-to-Image Translation with Conditional Adversarial Networks arxiv paper link. First general-purpose conditional GAN for image-to-image translation tasks. Impressive output on inpainting, future state prediction, image manipulation guided by user constraints, style transfer, and super-resolution. Method: U-net based a...

Basic Deep Learning Concepts good for you

Basic Deep Learning Concepts. Hard concepts are bolded. Supervised learning / unsupervised learning / semi-supervised, weakly-supervised; weight initialization; learning rate decay; dropout; forward propagation (inference) / backward propagation. Activation: what an activation layer is and why to use it; ReLU, Leaky ReLU; softmax; sigmoid...

Paper Review: Efficient Sub-Pixel Convolutional Neural Network

Paper arxiv link: Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. Efficient Sub-Pixel Convolutional Neural Network (ESPCN). Figure 1. The proposed efficient sub-pixel convolutional neural network (ESPCN), with two convolution layers for feature maps extraction, and a sub-pixel convol...
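The rearrangement step of a sub-pixel convolution layer (often called pixel shuffle) can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: the function name `pixel_shuffle` and the nested-list tensor layout are assumptions made for this example, and the channel ordering is assumed to follow the common convention where channel `c*r*r + i*r + j` fills sub-pixel position `(i, j)` of output channel `c`.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list.

    Illustrative sketch only: `x` is a plain nested list, `r` is the
    upscaling factor, and the channel ordering is an assumed convention.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])

    # Allocate the high-resolution output, one plane per output channel.
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]

    for c in range(c_out):
        for i in range(r):          # vertical sub-pixel position
            for j in range(r):      # horizontal sub-pixel position
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 low-resolution channels become one 2x2 high-resolution channel.
low_res = [[[1]], [[2]], [[3]], [[4]]]
high_res = pixel_shuffle(low_res, 2)
```

With `r = 2`, every group of four low-resolution channels is interleaved into one channel that is twice as tall and twice as wide, so the convolutions before this step can all run at low resolution.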

Paper Review: Deformable Convolution

Deformable Convolutional Networks Paper arxiv link. Deformable Convolution: Standard convolution has fixed sampling locations and a fixed receptive field. To relax this constraint, deformable convolution uses learnable offsets. 2D Convolution: The standard 2D convolution consists of two steps: 1) sampling using a regular grid $$\mathcal{R}$$ over the input ...
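To make the idea of offset sampling concrete, here is a minimal sketch of a deformable 3x3 convolution evaluated at a single output location on a single-channel image. It is a sketch under stated assumptions, not the paper's code: the helper names `bilinear` and `deform_conv_at` are hypothetical, the offsets are passed in by hand rather than predicted by a learned layer, and out-of-range samples are assumed to contribute zero. Fractional sampling positions are read with bilinear interpolation.

```python
import math


def bilinear(img, y, x):
    """Bilinearly interpolate a 2D nested list at fractional (y, x).

    Samples that fall outside the image contribute zero (an assumption).
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deform_conv_at(img, weights, offsets, p0):
    """Evaluate a 3x3 deformable convolution at output location p0.

    weights: 9 kernel weights in row-major order over the regular grid.
    offsets: 9 (dy, dx) pairs, one per grid point; all zeros reduces
             this to a standard 3x3 convolution.
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    total = 0.0
    for (dy, dx), wgt, (oy, ox) in zip(grid, weights, offsets):
        total += wgt * bilinear(img, p0[0] + dy + oy, p0[1] + dx + ox)
    return total


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zero_offsets = [(0.0, 0.0)] * 9
plain = deform_conv_at(image, [1.0] * 9, zero_offsets, (1, 1))

# Shift only the centre sample half a pixel to the right.
centre_only = [0.0] * 4 + [1.0] + [0.0] * 4
shifted_offsets = [(0.0, 0.0)] * 4 + [(0.0, 0.5)] + [(0.0, 0.0)] * 4
shifted = deform_conv_at(image, centre_only, shifted_offsets, (1, 1))
```

With all offsets at zero the result is the ordinary 3x3 sum over the regular grid. Moving the centre sample by half a pixel blends the two neighbouring pixel values equally.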

EDVR: Video Restoration with Enhanced Deformable Convolutional Networks Paper arxiv link. Overview: The overall framework of EDVR. Given $$2N+1$$ consecutive frames $$I_{[t-N:t+N]}$$, denote the middle frame $$I_{t}$$ as the reference frame and the other frames as neighboring frames. Inputs with high spatial resolution are first down-sampled to re...