In this blog we will delve into Convolutional Neural Networks. We will learn about the various steps involved in building a CNN and get familiar with the modern applications of CNN.
A Convolutional Neural Network (CNN) is a neural network in which at least one layer is a convolutional layer. It uses features extracted from images to categorize or classify them.
For example, we can train a CNN to tell cats from dogs by feeding it many labeled images of cats and dogs. Once trained, if we feed a new image of a cat to our network, it will categorize the image as a cat.
Yann LeCun is considered the grandfather of convolutional neural networks.
Convolutional layers are the layers of the network where filters are applied to the original image.
A Convolutional Neural Network consists of 4 main steps/layers, which are:
1. Convolution operation
1.1. Stride
1.2. ReLU layer
2. Pooling
3. Flattening
4. Full connection
The diagram below shows the different layers in a CNN.
So let’s look at each of the layers in detail.
In the convolution operation, we reduce the image size by passing the input image through a Feature Detector / Filter / Kernel to convert it into a Feature Map / Convolved Feature / Activation Map.
It helps to remove unnecessary details from the image.
We can create many feature maps, each one detecting a particular feature of the image, to obtain our first convolutional layer.
The convolution operation involves multiplying the convolution filter elementwise by a slice of the input matrix and then summing all the values in the resulting matrix.
The number of pixels by which we move the filter across the input matrix is called the stride (the step size).
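To make the multiply-and-sum and the step size (stride) concrete, here is a minimal NumPy sketch. The function name `convolve2d`, the 5x5 input, and the 3x3 filter are illustrative assumptions, not taken from any library. Like most deep learning frameworks, the sketch does not flip the kernel and uses no padding.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            # Elementwise multiply the patch by the filter, then sum everything.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Hypothetical 5x5 binary image and 3x3 filter, chosen only for illustration.
image = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

fm_stride1 = convolve2d(image, kernel, stride=1)  # 3x3 feature map
fm_stride2 = convolve2d(image, kernel, stride=2)  # 2x2 feature map
```

A larger stride skips positions, so the same 5x5 input yields a 3x3 feature map with a stride of 1 but only a 2x2 feature map with a stride of 2.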
1.2. ReLU Activation Function:
ReLU is one of the most widely used activation functions in deep learning.
Convolution is a linear operation, so a stack of convolutions alone would still be linear; we therefore need to break the linearity.
The rectified linear unit can be described by the function f(x) = max(x, 0).
We apply the rectifier to increase the non-linearity in our CNN. The rectifier keeps only the non-negative values of the feature map and sets every negative value to zero.
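As a small sketch of the function f(x) = max(x, 0) applied to a feature map, assuming NumPy and made-up input values:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(x, 0), applied elementwise."""
    return np.maximum(x, 0)

# Hypothetical feature map containing negative values.
feature_map = np.array([[-3.0, 2.0],
                        [0.5, -1.5]])
activated = relu(feature_map)  # negatives become 0, the rest is unchanged
```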
Pooling helps to reduce the spatial size of the convolved feature, which in turn reduces the computing power required to process the data.
It also preserves the dominant features, which helps to train the model efficiently.
Pooling transforms a feature map into a pooled feature map.
Pooling is divided into 2 types:
1. Max pooling: returns the maximum value of the portion of the image covered by the kernel.
2. Average pooling: returns the average of all values from the portion of the image covered by the kernel.
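The two variants above, usually called max pooling and average pooling, can be sketched with NumPy as follows. The helper name `pool2d`, the 2x2 window, the stride of 2, and the input values are assumptions made for illustration.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window to its maximum or its average."""
    if mode == "max":
        reduce_fn = np.max
    elif mode == "average":
        reduce_fn = np.mean
    else:
        raise ValueError("mode must be 'max' or 'average'")
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = feature_map[r:r + size, c:c + size]
            pooled[i, j] = reduce_fn(window)
    return pooled

# Hypothetical 4x4 feature map.
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
], dtype=float)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
```

Both variants shrink the 4x4 input to 2x2; max pooling keeps the strongest response in each window, while average pooling smooths over the window.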
Flattening converts the pooled feature map into a one-dimensional column vector.
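The conversion to a column vector described above amounts to a single reshape. A minimal NumPy sketch, with an assumed 2x2 input:

```python
import numpy as np

# Hypothetical 2x2 feature map coming out of the previous step.
pooled = np.array([[7.0, 8.0],
                   [9.0, 6.0]])

# Read the values row by row and stack them into a single column.
column_vector = pooled.reshape(-1, 1)
```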
The flattened output of the flattening step is fed to a feed-forward neural network, with backpropagation applied at every training iteration.
Over a series of epochs, the model learns to distinguish dominant and low-level features in the images and to classify them using the Softmax classification technique (it maps the output values into the range 0 to 1, so that they sum to 1 and can be read as class probabilities).
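As a rough sketch of the final classification step, the snippet below runs one forward pass through a single fully connected layer followed by Softmax. The two-class setup, the random weights, and the input values are assumptions for illustration; in a real network the weights would be learned through backpropagation rather than drawn at random.

```python
import numpy as np

def softmax(z):
    """Map raw scores to values between 0 and 1 that sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

rng = np.random.default_rng(0)

# Hypothetical flattened input with 4 values and 2 output classes (cat, dog).
flattened = np.array([7.0, 8.0, 9.0, 6.0])
weights = rng.normal(scale=0.1, size=(2, 4))  # untrained, random weights
bias = np.zeros(2)

logits = weights @ flattened + bias  # one raw score per class
probs = softmax(logits)              # class probabilities
predicted_class = int(np.argmax(probs))
```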
Some modern applications of CNNs include:
- OCR applications such as document analysis.
- Surveillance and security.
- Traffic monitoring such as congestion detection.
- Advertising and programmatic buying.
- Face recognition and detection by identifying pose, angle, external features, etc.
In this blog we learned about the Convolutional Neural Network and each of the different layers that make up a CNN. We looked at real-world applications of CNN and tried to understand what a convolutional neural network algorithm does for image classification and detection.