How to choose the size of the convolution filter or Kernel size for CNN?
Convolution is basically a dot product of a kernel (or filter) and a patch of an image (the local receptive field) of the same size. Convolution is quite similar to correlation and exhibits the property of translation equivariance, which means that if we translate the input and then apply the convolution, the result is the same as first applying the convolution and then translating the output.
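To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the original article) of a "valid" 2D convolution implemented as a dot product between the kernel and each image patch. Like most deep learning frameworks, it actually computes cross-correlation, i.e. the kernel is not flipped:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D convolution: each output pixel is the dot product of the
    kernel with the image patch (local receptive field) beneath it."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            out[i, j] = np.sum(patch * kernel)  # dot product
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging kernel
print(conv2d_valid(image, kernel).shape)          # (3, 3)
```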
While learning about CNNs, you notice different kernel sizes at different places in the code, so the question naturally arises: is there a specific way to choose these dimensions? The short answer is no. In the current Deep Learning world, the most popular choice, used by practically every practitioner out there, is the 3x3 kernel. Now another question strikes your mind: why only 3x3, and not 1x1, 2x2, 4x4, etc.? Just keep reading and you will get the most crisp reason behind this in the next few minutes!!
Basically, we split kernel sizes into smaller and larger ones. Smaller kernel sizes are 1x1, 2x2, 3x3 and 4x4, whereas larger ones are 5x5 and beyond, though in practice we rarely go beyond 5x5 for 2D convolution. In 2012, when the AlexNet CNN architecture was introduced, it used larger kernel sizes such as 11x11 and 5x5, and training took two to three weeks. Because of such extremely long and expensive training, we no longer use such large kernel sizes.
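To see why large kernels are expensive, here is a small sketch (the channel counts of 64 are my own assumption, not from the article) comparing the weight count of an 11x11 convolution with a 3x3 one in PyTorch:

```python
import torch.nn as nn

def n_params(layer):
    return sum(p.numel() for p in layer.parameters())

# Hypothetical channel counts, chosen only to compare kernel sizes.
big   = nn.Conv2d(64, 64, kernel_size=11, padding=5)  # AlexNet-style 11x11
small = nn.Conv2d(64, 64, kernel_size=3,  padding=1)  # modern 3x3

print(n_params(big))    # 64*64*11*11 + 64 = 495,680 weights
print(n_params(small))  # 64*64*3*3   + 64 =  36,928 weights
```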
One reason to prefer small kernels over a fully connected network is that they reduce computational cost and exploit weight sharing, which ultimately leaves fewer weights to update during back-propagation. Then came the VGG convolutional neural networks in 2015, which replaced such large convolution layers with stacks of 3x3 convolution layers, but with a lot of filters. Since then, the 3x3 kernel has become the popular pick. But still, why not 1x1, 2x2 or 4x4 as the small kernel?
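The VGG idea can be sketched as follows (my own illustration with assumed channel counts): two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but need fewer weights and add an extra non-linearity in between.

```python
import torch
import torch.nn as nn

# One 5x5 conv vs a stack of two 3x3 convs: both cover a 5x5 receptive
# field, but the stack has fewer weights (channel count 64 is assumed).
conv5x5  = nn.Conv2d(64, 64, kernel_size=5, padding=2)
stack3x3 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

x = torch.randn(1, 64, 32, 32)
print(conv5x5(x).shape, stack3x3(x).shape)  # same spatial size

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv5x5))   # 64*64*5*5 + 64     = 102,464
print(count(stack3x3))  # 2*(64*64*3*3 + 64) =  73,856
```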
- A 1x1 kernel is used only for dimensionality reduction, i.e. to reduce the number of channels. It captures the interaction of the input channels at just one pixel of the feature map. Therefore, 1x1 is ruled out as a general-purpose kernel, because the features it extracts are extremely fine-grained and local, with no information from the neighboring pixels (see the sketch after this list).
- 2x2 and 4x4 are generally not preferred, because odd-sized filters divide the previous layer's pixels symmetrically around the output pixel. If that symmetry is absent, distortions appear across the layers, which is exactly what happens with even-sized kernels such as 2x2 and 4x4. That is why we don't use 2x2 and 4x4 kernel sizes.
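As a minimal sketch of the 1x1 case (the channel counts are my own example): the spatial size is untouched, only the channels are mixed and reduced.

```python
import torch
import torch.nn as nn

# A 1x1 convolution mixes channels at each pixel independently; it is
# typically used to reduce the channel count (numbers here are assumed).
x = torch.randn(1, 256, 28, 28)            # feature map with 256 channels
reduce_channels = nn.Conv2d(256, 64, kernel_size=1)
print(reduce_channels(x).shape)            # torch.Size([1, 64, 28, 28])
```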
Therefore, 3x3 is the optimal choice, and it is what practitioners follow to this day. But it is still the most expensive part!
Bonus: Digging further into it, I found another interesting approach, used in the Inception V3 CNN architecture launched by Google for the ImageNet Recognition Challenge, which replaces a 3x3 convolution layer with a 1x3 convolution layer followed by a 3x1 convolution layer. This effectively splits the 3x3 convolution into a series of one-dimensional convolutions, and it turned out to be quite cost-friendly!!
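A rough sketch of that factorization in PyTorch (channel counts are assumed, not taken from the article):

```python
import torch
import torch.nn as nn

# Inception-V3-style factorization: replace one 3x3 conv with a
# 1x3 conv followed by a 3x1 conv (channel count of 64 is assumed).
conv3x3  = nn.Conv2d(64, 64, kernel_size=3, padding=1)
factored = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)),
    nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)),
)

x = torch.randn(1, 64, 32, 32)
print(conv3x3(x).shape, factored(x).shape)  # same output shape

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv3x3))   # 64*64*3*3 + 64    = 36,928
print(count(factored))  # 2*(64*64*3 + 64)  = 24,704
```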
Thanks for giving it a read. I found this to be one of the most common questions asked by deep learning beginners (including me….;), since a clear, crisp reason for using a specific kernel size is not generally covered in most learning courses. It's my first article on Medium, so if you like it, do not forget to give a clap!! Have a nice day!!
Source: https://medium.com/analytics-vidhya/how-to-choose-the-size-of-the-convolution-filter-or-kernel-size-for-cnn-86a55a1e2d15