Top 4 Pre-Trained Models for Image Classification with Python Code (2024)

Introduction

The human brain can easily recognize and distinguish the objects in an image. For instance, given an image of a cat and a dog, we can tell the two apart within a fraction of a second, and our brain perceives this difference. If a machine can mimic this behavior, it is about as close to Artificial Intelligence as we can get. This is why the field of Computer Vision aims to mimic the human vision system, and there have been numerous milestones that have broken barriers in this regard.

Nowadays, machines can easily distinguish between different images, detect objects and faces, and even generate images of people who don’t exist! Fascinating, isn’t it? One of my first experiences when starting with Computer Vision was the task of Image Classification. This very ability of a machine to distinguish between objects opens up further avenues of research, like distinguishing between people.

The rapid development of Computer Vision, and by extension image classification, has been further accelerated by the advent of Transfer Learning. Put simply, transfer learning allows us to take a model that has already been trained on a huge dataset and reuse it for our own task. This reduces the cost of training new deep learning models, and since the large datasets these models are trained on have been widely used and vetted, we can be reasonably confident in the quality of what they have learned.

In Image Classification, there are some very popular datasets that are used across research, industry, and hackathons, such as ImageNet, on which the models in this article were pre-trained, and many more.

In this article, I will cover the top 4 pre-trained Image Classification models that are state-of-the-art (SOTA) and are also widely used in industry. Each model could be explained in much more detail, but I have limited this article to an overview of each architecture and its implementation on a dataset.

Table of contents

  • Introduction
  • Setting Up the System
  • Preparing the Dataset
  • Examples of Pre-Trained Models for Image Classification
    • Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG-16)
    • Inception
    • ResNet50
    • EfficientNet
  • Conclusion
  • Frequently Asked Questions

Setting Up the System

Since we started with cats and dogs, let us take up the dataset of Cat and Dog Images. The original training dataset on Kaggle has 25,000 images of cats and dogs, and the test dataset has 12,500 unlabelled images. Since our purpose is only to understand these models, I have taken a much smaller subset. You can run this and the rest of the code on Google Colab as well, so let us get started!

Let us also import the basic libraries. I will cover the model-specific imports in each model's section:

Preparing the Dataset

We will first prepare the dataset and separate out the images:

  1. We first divide the folder contents into the train and validation directories.
  2. Then, in each of these directories, create a separate directory for cats that contains only cat images, and a separate directory for dogs that contains only dog images.
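The split described above can be sketched with the standard library alone. This is a minimal sketch, not the author's original code: it assumes the Kaggle-style naming `cat.<n>.jpg` / `dog.<n>.jpg` in one flat folder, and `split_dataset`, `CLASS_DIRS`, and the 20% validation fraction are hypothetical choices for illustration.

```python
import os
import random
import shutil

# Hypothetical mapping from the file-name prefix to the class folder name
CLASS_DIRS = {"cat": "cats", "dog": "dogs"}


def split_dataset(src_dir, dst_dir, val_fraction=0.2, seed=42):
    """Copy images from a flat folder into dst_dir/{train,validation}/{cats,dogs}.

    Assumes names like 'cat.123.jpg'; files with any other prefix are skipped.
    Returns the number of images copied into each split.
    """
    files = sorted(
        f for f in os.listdir(src_dir)
        if f.split(".")[0].lower() in CLASS_DIRS
    )
    random.Random(seed).shuffle(files)  # reproducible shuffle before splitting
    n_val = int(len(files) * val_fraction)
    splits = {"validation": files[:n_val], "train": files[n_val:]}

    for split, names in splits.items():
        for name in names:
            label = CLASS_DIRS[name.split(".")[0].lower()]
            out_dir = os.path.join(dst_dir, split, label)
            os.makedirs(out_dir, exist_ok=True)
            shutil.copy(os.path.join(src_dir, name), out_dir)  # keeps the file name
    return {split: len(names) for split, names in splits.items()}
```

For example, `split_dataset("train_small", "data")` would create `data/train/cats`, `data/train/dogs`, `data/validation/cats`, and `data/validation/dogs` (both folder names are placeholders). Copying rather than moving leaves the source folder untouched.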

The following code will let us check if the images have been loaded correctly:
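The author's check itself is not shown here. As a stand-in, a minimal sketch that counts image files per split and class is often enough to catch an empty or mislabelled folder; `count_images` and the extension list are hypothetical choices.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed image formats


def count_images(root):
    """Return {split: {class: n_images}} for a root/split/class directory tree."""
    summary = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue  # ignore stray files at the top level
        summary[split] = {
            cls: sum(
                f.lower().endswith(IMAGE_EXTENSIONS)
                for f in os.listdir(os.path.join(split_dir, cls))
            )
            for cls in sorted(os.listdir(split_dir))
            if os.path.isdir(os.path.join(split_dir, cls))
        }
    return summary
```

Calling `count_images("data")` returns one dictionary per split that maps each class folder to its image count. Displaying a few images, for example with matplotlib, is a useful visual complement.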

Now that we have our dataset ready, let us move on to the model building stage. We will be using 4 different pre-trained models on this dataset.

Examples of Pre-Trained Models for Image Classification

In this section, we cover the 4 pre-trained models for image classification as follows:

Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG-16)

VGG-16 is one of the most popular pre-trained models for image classification. Introduced at the famous ILSVRC 2014 challenge, it remains a strong and widely used baseline today. Developed by the Visual Geometry Group at the University of Oxford, VGG-16 beat the then-standard AlexNet and was quickly adopted by researchers and industry for their image classification tasks.

Here is the architecture of VGG-16:

[Figure: VGG-16 architecture]

Here is a more intuitive layout of the VGG-16 model:

[Figure: VGG-16 layer-by-layer layout]

The following are the layers of the model:

  • Convolutional Layers = 13
  • Pooling Layers = 5
  • Dense Layers = 3

Let us explore the layers in detail:

  1. Input: Image of dimensions (224, 224, 3).
  2. Convolution Layer Conv1:
    • Input Image dimensions: (224, 224)
    • Conv1-1: 64 filters
    • Conv1-2: 64 filters and Max Pooling
  3. Convolution layer Conv2: Now, we increase the filters to 128
    • Input Image dimensions: (112,112)
    • Conv2-1: 128 filters
    • Conv2-2: 128 filters and Max Pooling
  4. Convolution Layer Conv3: Again, double the filters to 256, and now add another convolution layer
    • Input Image dimensions: (56,56)
    • Conv3-1: 256 filters
    • Conv3-2: 256 filters
    • Conv3-3: 256 filters and Max Pooling
  5. Convolution Layer Conv4: Similar to Conv3, but now with 512 filters
    • Input Image dimensions: (28, 28)
    • Conv4-1: 512 filters
    • Conv4-2: 512 filters
    • Conv4-3: 512 filters and Max Pooling
  6. Convolution Layer Conv5: Same as Conv4
    • Input Image dimensions: (14, 14)
    • Conv5-1: 512 filters
    • Conv5-2: 512 filters
    • Conv5-3: 512 filters and Max Pooling
    • The output dimensions here are (7, 7). At this point, we flatten the output of this layer to generate a feature vector
  7. Fully Connected/Dense FC1: 4096 nodes, generating a feature vector of size (1, 4096)
  8. Fully Connected/Dense FC2: 4096 nodes, generating a feature vector of size (1, 4096)
  9. Fully Connected/Dense FC3: 1000 nodes, one for each of the 1000 classes. This is then passed on to a Softmax activation function
  10. Output layer

As you can see, the model is sequential in nature and uses lots of filters. At each stage, small 3 * 3 filters are used to reduce the number of parameters, and all the hidden layers use the ReLU activation function. Even so, the model has about 138 million parameters, which makes it slower and much larger to train than the others.
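The 138 million figure can be reproduced with a little arithmetic. This sketch assumes the standard VGG-16 configuration listed above (3 * 3 convolutions with 'same' padding, 2 * 2 max pooling, a 224 * 224 * 3 input, and a bias in every layer); `vgg16_param_count` is a hypothetical helper.

```python
# Filters per convolution in each of VGG-16's 5 blocks; each block ends in 2x2 max pooling
VGG16_BLOCKS = [[64, 64], [128, 128], [256, 256, 256], [512, 512, 512], [512, 512, 512]]
DENSE_UNITS = [4096, 4096, 1000]  # FC1, FC2, FC3


def vgg16_param_count(input_size=224, in_channels=3, kernel=3):
    """Return (final feature-map size, conv parameters, dense parameters)."""
    conv_params, channels, size = 0, in_channels, input_size
    for block in VGG16_BLOCKS:
        for filters in block:
            # kernel*kernel*channels weights per filter, plus one bias per filter;
            # 'same' padding keeps the spatial size unchanged
            conv_params += (kernel * kernel * channels + 1) * filters
            channels = filters
        size //= 2  # max pooling halves height and width
    dense_params, features = 0, size * size * channels  # flattened 7 * 7 * 512 = 25088
    for units in DENSE_UNITS:
        dense_params += (features + 1) * units  # weights plus biases
        features = units
    return size, conv_params, dense_params


size, conv, dense = vgg16_param_count()
print(size, conv, dense, conv + dense)  # 7 14714688 123642856 138357544
```

Roughly 89% of these parameters sit in the three dense layers, not in the 13 convolutional layers.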

Additionally, there are variations of the VGG16 model, such as the deeper VGG19 (19 layers).

Let us now explore how to train a VGG-16 model on our dataset:

Step 1: Image Augmentation

Since we took up a much smaller dataset of images earlier, we can make up for it by augmenting this data, which increases the effective size and variety of our training set. If you are working with the original larger dataset, you can skip this step and move straight on to building the model.
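The original augmentation code is not shown here. As a minimal sketch, assuming TensorFlow 2.x, augmentation can be expressed with Keras preprocessing layers (`RandomFlip`, `RandomRotation`, `RandomZoom`), which the TensorFlow documentation recommends over the older `ImageDataGenerator`. The transformation ranges and the helper names `make_augmenter` and `augment_dataset` are illustrative assumptions.

```python
import tensorflow as tf


def make_augmenter(seed=None):
    """Random flips, rotations, and zooms; they only transform when training=True."""
    return tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal", seed=seed),
            # factor 0.1 = up to +/-10% of a full turn, i.e. about +/-36 degrees
            tf.keras.layers.RandomRotation(0.1, seed=seed),
            # zoom in or out by up to 20%
            tf.keras.layers.RandomZoom(0.2, seed=seed),
        ],
        name="augmentation",
    )


def augment_dataset(ds, augmenter):
    """Apply the augmenter to the images of a tf.data dataset of (image, label) batches."""
    return ds.map(
        lambda x, y: (augmenter(x, training=True), y),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
```

Because the transformations are random and applied on the fly, every epoch sees slightly different versions of the same images. Apply the augmenter to the training set only, not to the validation set.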

Step 2: Training and Validation Sets
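A minimal sketch of this step, assuming TensorFlow 2.x and `train/` and `validation/` folders that each contain `cats/` and `dogs/`. It uses `tf.keras.utils.image_dataset_from_directory` in place of the Keras generators the article refers to; `make_datasets` and its default sizes are hypothetical.

```python
import os

import tensorflow as tf


def make_datasets(data_dir, image_size=(224, 224), batch_size=32):
    """Load data_dir/train and data_dir/validation, each holding cats/ and dogs/.

    label_mode="binary" yields float labels of shape (batch, 1).
    """
    common = dict(image_size=image_size, batch_size=batch_size, label_mode="binary")
    train_ds = tf.keras.utils.image_dataset_from_directory(
        os.path.join(data_dir, "train"), shuffle=True, seed=42, **common
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        os.path.join(data_dir, "validation"), shuffle=False, **common
    )
    # Prefetching overlaps data loading with training
    return train_ds.prefetch(tf.data.AUTOTUNE), val_ds.prefetch(tf.data.AUTOTUNE)
```

Classes are indexed in alphabetical folder order, so `cats` becomes label 0 and `dogs` becomes label 1.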

Step 3: Loading the Base Model

We will be using only the basic models, with changes made only to the final layer. This is because ours is just a binary classification problem, while these models are built to classify images into 1000 classes.

Since we don’t have to train all the layers, we make them non-trainable:
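A minimal sketch of loading and freezing the base, assuming TensorFlow 2.x and `tf.keras.applications.VGG16`; `load_frozen_vgg16` is a hypothetical helper. Setting `trainable = False` on the base model freezes all of its layers at once.

```python
import tensorflow as tf


def load_frozen_vgg16(input_shape=(224, 224, 3), weights="imagenet"):
    """VGG16 convolutional base without its 1000-class top, with every layer frozen."""
    base = tf.keras.applications.VGG16(
        include_top=False,        # drop FC1 to FC3; a new binary head is added later
        weights=weights,          # "imagenet" downloads pre-trained weights on first use
        input_shape=input_shape,
    )
    base.trainable = False        # setting this on the model freezes all of its layers
    return base
```

The frozen base should report 14,714,688 parameters (the convolutional part of VGG-16), none of which are trainable.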

Step 4: Compile and Fit

We will then build the last fully-connected layer. I have just used the basic settings, but feel free to experiment with different values of dropout, and different Optimisers and activation functions.

We will now build the final model based on the training and validation sets we created earlier. If you are working with the original dataset, use the original directories instead of the augmented data I have used below. I have used just 10 epochs, but you can increase the number of epochs to get better results:
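A minimal sketch of the head, compilation, and training, assuming TensorFlow 2.x. The head size (512 units), dropout rate (0.5), RMSprop learning rate (1e-4), and the helper name `build_vgg16_classifier` are illustrative assumptions rather than the author's exact settings. Inputs are assumed to be preprocessed with `tf.keras.applications.vgg16.preprocess_input`.

```python
import tensorflow as tf


def build_vgg16_classifier(input_shape=(224, 224, 3), weights="imagenet", dropout=0.5):
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # only the new head is trained

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)                      # feature maps -> 1D vector
    x = tf.keras.layers.Dense(512, activation="relu")(x)  # new fully connected layer
    x = tf.keras.layers.Dropout(dropout)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # binary output

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Given `train_ds` and `val_ds` as tf.data datasets of (image, label) batches, training for the 10 epochs used here would be `model.fit(train_ds, validation_data=val_ds, epochs=10)`.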

[Figure: VGG-16 training output over 10 epochs]

Awesome! As you can see, we were able to achieve a validation Accuracy of 93% with just 10 epochs and without any major changes to the model. This is where we realize how powerful transfer learning is and how useful pre-trained models for image classification can be. A caveat here though – VGG16 takes up a long time to train compared to other models and this can be a disadvantage when we are dealing with huge datasets.

That being said, I really liked how simple and intuitive this model is. Trained on the ImageNet corpus, VGG-16 also won 1st place in the localization task and 2nd place in the classification task of ILSVRC-2014, which cemented its place in the list of top pre-trained models for image classification.

Inception

While researching for this article, one thing was clear: the year 2014 was iconic in terms of the development of really popular pre-trained models for Image Classification. While VGG-16 secured 2nd place in that year's ILSVRC classification task, 1st place went to none other than Google, with its model GoogLeNet, now better known as Inception.

The original paper proposed the Inceptionv1 model. At only about 7 million parameters, it was much smaller than the then-prevalent models like VGG and AlexNet. Add to that a lower error rate, and you can see why it was a breakthrough model. The paper's major innovation was a breakthrough in itself: the Inception Module.

In simple terms, the Inception Module performs convolutions with different filter sizes on the same input, along with Max Pooling, and concatenates the results for the next Inception module. The 1 * 1 convolutions, placed before the larger 3 * 3 and 5 * 5 convolutions to shrink the number of input channels, reduce the number of parameters drastically.
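To see why the 1 * 1 convolutions matter, compare the weight count of a 5 * 5 convolution applied directly with one preceded by a 1 * 1 reduction. The channel counts below are illustrative assumptions, biases are ignored for simplicity, and `conv_weights` is a hypothetical helper.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


# Illustrative branch: 192 input channels -> 32 output channels with a 5x5 kernel
IN_CH, OUT_CH, REDUCE_CH = 192, 32, 16

direct = conv_weights(5, IN_CH, OUT_CH)            # 5x5 straight on the input
reduced = (conv_weights(1, IN_CH, REDUCE_CH)       # 1x1 shrinks 192 -> 16 channels
           + conv_weights(5, REDUCE_CH, OUT_CH))   # 5x5 on the reduced input

print(direct, reduced, round(direct / reduced, 1))  # 153600 15872 9.7
```

In this example the 1 * 1 reduction cuts the weights of the branch by almost a factor of 10, while the branch still outputs 32 channels.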

[Figure: the Inception module]

Though the number of layers in Inceptionv1 is 22, the massive reduction in the parameters makes it a formidable model to beat.

The Inceptionv2 model was a major improvement on Inceptionv1 that increased accuracy and further reduced the model's complexity. In the same paper as Inceptionv2, the authors introduced the Inceptionv3 model with a few more improvements on v2.

The following are the major improvements included:

  • Batch Normalisation in the auxiliary classifier
  • More factorization
  • RMSProp Optimiser

[Figure: Inceptionv3 architecture]

While it is not possible to provide an in-depth explanation of Inception in this article, you can go through this comprehensive article covering the Inception Model in detail: Deep Learning in the Trenches: Understanding Inception Network from Scratch

As you can see, Inceptionv3 is 42 layers deep, compared to VGG16’s paltry 16 layers. Inceptionv3 also reduced the top-5 error rate to only 4.2% for a single model.

Let’s see how to implement it in Python:

Step 1: Data Augmentation

You will note that I am not performing extensive data augmentation. The code is the same as before. I have just changed the image dimensions for each model.

Step 2: Training and Validation Generators

Step 3: Loading the Base Model

Step 4: Compile and Fit

Just like VGG-16, we will only change the last layer.

We perform the following operations:

  • Flatten the output of our base model to 1 dimension
  • Add a fully connected layer with 1,024 hidden units and ReLU activation
  • This time, we will go with a dropout rate of 0.2
  • Add a final Fully Connected Sigmoid Layer
  • We will again use RMSProp, though you can try out the Adam Optimiser too
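The steps listed above can be sketched as follows, assuming TensorFlow 2.x and `tf.keras.applications.InceptionV3`. The 150 * 150 input size, the learning rate, and the helper name `build_inception_classifier` are assumptions; InceptionV3 without its top accepts inputs of at least 75 * 75.

```python
import tensorflow as tf


def build_inception_classifier(input_shape=(150, 150, 3), weights="imagenet"):
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False                                   # keep pre-trained layers frozen

    x = tf.keras.layers.Flatten()(base.output)               # flatten to 1 dimension
    x = tf.keras.layers.Dense(1024, activation="relu")(x)    # 1,024 hidden units, ReLU
    x = tf.keras.layers.Dropout(0.2)(x)                      # dropout rate of 0.2
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # sigmoid output layer

    model = tf.keras.Model(base.input, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Inputs are expected to be preprocessed with `tf.keras.applications.inception_v3.preprocess_input`, which scales pixel values to the range [-1, 1].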

We will then fit the model:

[Figure: Inceptionv3 training output over 10 epochs]

As a result, we get 96% validation accuracy in 10 epochs. Also note how much faster this model is than VGG16: each epoch takes only around a quarter of the time of a VGG16 epoch. Of course, you can always experiment with different hyperparameter values and see how much better or worse it performs.

I really liked studying the Inception model. While most models at that time were merely sequential and followed the premise that the deeper and larger the model, the better it will perform, Inception and its variants broke this mold. Inceptionv3, presented at CVPR 2016, reported a top-5 error rate of only 3.5% with an ensemble of four models.

Here is a link to the paper: Rethinking the Inception Architecture for Computer Vision

ResNet50

Just like Inceptionv3, ResNet50 is not the first model coming from the ResNet family. The original model was called the Residual net or ResNet and was another milestone in the CV domain back in 2015.

The main motivation behind this model was to avoid the drop in accuracy that occurs as a model gets deeper. Additionally, if you are familiar with Gradient Descent, you will have come across the Vanishing Gradient issue, which the ResNet model aimed to tackle as well. Here is the architecture of ResNet34 (ResNet50 follows a similar technique, just with more layers):

You can see that after starting off with a single Convolutional layer and Max Pooling, there are 4 similar stages that differ only in their number of filters, all of them using 3 * 3 convolutions. Also, after every 2 convolutions, a shortcut connection bypasses (skips) those layers. This is the main concept behind ResNet models. These skip connections are called "identity shortcut connections", and they form what are called residual blocks:

[Figure: a residual block]

In simple terms, the authors of ResNet propose that fitting a residual mapping is much easier than fitting the desired mapping directly, and so they use residual blocks throughout the network. Another interesting point is that the authors of ResNet argue that stacking more layers should not make the model perform worse.

This is contrary to what we saw in Inception and quite similar to VGG16, in the sense that it simply stacks layers on top of each other. ResNet just changes the underlying mapping.
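The residual idea can be illustrated without a deep learning framework. This NumPy sketch is a simplification: small dense weight matrices stand in for the 3 * 3 convolutions of a real residual block, and `residual_block` is a hypothetical helper.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Return relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x).

    Dense weight matrices stand in for the two 3x3 convolutions of a real
    ResNet block; the shortcut adds the input back unchanged (identity shortcut).
    """
    f = w2 @ relu(w1 @ x)   # residual mapping F(x) learned by the stacked layers
    return relu(f + x)      # identity shortcut, then the final ReLU


x = np.array([1.0, 2.0, 3.0])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # [1. 2. 3.]: with F(x) = 0 the block is the identity
```

If the extra layers are not useful, the block only has to push F(x) toward zero to pass its input through unchanged, which is the intuition behind the claim that stacking more layers should not make the model worse.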

The ResNet model has many variants, the deepest of which for ImageNet is ResNet152. The following is the architecture of the ResNet family in terms of the layers used:

[Figure: ResNet family architectures and the layers they use]

Let us now use ResNet50 on our dataset:

Step 1: Data Augmentation and Generators

Step 2: Import the base model

Again, we are using only the basic ResNet model, so we will keep the layers frozen and only modify the last layer:

Step 3: Build and Compile the Model

Here, I would like to show you an even shorter way to use the ResNet50 model. We will use the model as a single layer in a Sequential model and just add a single Fully Connected layer on top of it.

We compile the model and this time let us try the SGD optimizer:
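A minimal sketch of this shorter approach, assuming TensorFlow 2.x: `tf.keras.applications.ResNet50` with `pooling="avg"` acts as a single layer in a `Sequential` model, followed by one sigmoid unit, compiled with SGD. The pooling choice, learning rate, and helper name `build_resnet50_classifier` are assumptions.

```python
import tensorflow as tf


def build_resnet50_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    resnet = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=input_shape,
        pooling="avg",            # global average pooling -> one 2048-d vector per image
    )
    resnet.trainable = False      # keep the pre-trained layers frozen

    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        resnet,                                          # the whole ResNet50 as one layer
        tf.keras.layers.Dense(1, activation="sigmoid"),  # single fully connected output
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Global pooling turns the final feature maps into a single vector, which is why no `Flatten` layer is needed. Inputs are expected to be preprocessed with `tf.keras.applications.resnet50.preprocess_input`.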

Step 4: Fitting the model

The following is the result we get:

[Figure: ResNet50 training output]

You can see how well it performs on our dataset, which is why ResNet50 is one of the most widely used pre-trained models. Just like VGG, it has other variations, as we saw in the table above. Remarkably, ResNet not only has its own variants, but it also spawned a series of architectures based on it, including ResNeXt, ResNet as an Ensemble, and others. ResNet50 itself achieved a top-5 error rate of around 5%.

The following is the link to the paper: Deep Residual Learning for Image Recognition

EfficientNet

We finally come to the latest of these 4 models, one that has caused waves in this domain, and of course, it is from Google. In EfficientNet, the authors propose a new scaling method called Compound Scaling. The long and short of it is this: earlier models like ResNet follow the conventional approach of scaling the network's dimensions arbitrarily, for example by adding more and more layers.

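However, the paper proposes that if we scale all the dimensions (depth, width, and resolution) together, uniformly and by fixed ratios, we achieve much better performance. The fixed ratios are found with a small grid search on the baseline model, and a single compound coefficient, chosen by the user according to the available resources, controls how far the model is scaled.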

Though this scaling technique can be used for any CNN-based model, the authors started off with their own baseline model called EfficientNetB0:

[Figure: EfficientNetB0 baseline architecture]

MBConv stands for mobile inverted bottleneck convolution (similar to MobileNetV2). The authors also propose the Compound Scaling formula with the following scaling coefficients:

  • Depth = 1.20
  • Width = 1.10
  • Resolution = 1.15
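The compound scaling rule multiplies depth, width, and resolution by α^φ, β^φ, and γ^φ for a user-chosen coefficient φ, with α, β, and γ set to the coefficients listed above. A minimal sketch in plain Python, assuming a 224 * 224 baseline resolution for B0; `compound_scale` is a hypothetical helper, and its output may not match the resolutions of the released B1 to B7 models exactly.

```python
ALPHA, BETA, GAMMA = 1.20, 1.10, 1.15  # depth, width, resolution coefficients


def compound_scale(phi, base_resolution=224):
    """Depth and width multipliers, plus the scaled input resolution, for a given phi."""
    depth = ALPHA ** phi          # multiplier on the number of layers
    width = BETA ** phi           # multiplier on the number of channels
    resolution = round(base_resolution * GAMMA ** phi)  # input image side length
    return depth, width, resolution


# FLOPs grow roughly with depth * width**2 * resolution**2, so each unit of phi
# multiplies the cost by about ALPHA * BETA**2 * GAMMA**2
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2
print(round(flops_factor, 2))  # 1.92

for phi in range(4):
    print(phi, compound_scale(phi))
```

The authors constrain α · β² · γ² ≈ 2, so each unit increase of φ roughly doubles the computation.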

This formula is used to again build a family of EfficientNets – EfficientNetB0 to EfficientNetB7. The following is a simple graph showing the comparative performance of this family vis-a-vis other popular models:

[Figure: accuracy vs. number of parameters for the EfficientNet family and other popular models]

As you can see, even the baseline B0 model achieves high accuracy for its size, and accuracy keeps increasing across the family while using fewer parameters than comparable models. For instance, EfficientNetB0 has only 5.3 million parameters!

The simplest way to implement EfficientNet is to install it and the rest of the steps are similar to what we have seen above.

Installing EfficientNet:

!pip install -U efficientnet

Import it

Step 1: Image Augmentation

We will use the same image dimensions that we used for VGG16 and ResNet50. By now, you would be familiar with the Augmentation process:

Step 2: Loading the Base Model

We will be using the B0 version of EfficientNet, since it is the simplest of the 8 (B0 to B7). I urge you to experiment with the other models, but keep in mind that they become progressively more complex, which may not suit a simple binary classification task.

Again, let us freeze the layers:

Step 3: Build the model

Just like Inceptionv3, we will perform these steps at the final layer:
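As an alternative to the `efficientnet` pip package, newer TensorFlow 2.x releases ship `tf.keras.applications.EfficientNetB0`. A minimal sketch of the frozen base plus the same head used for Inceptionv3, under that assumption; `build_efficientnet_classifier` is a hypothetical helper.

```python
import tensorflow as tf


def build_efficientnet_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False                                       # freeze the pre-trained layers

    x = tf.keras.layers.Flatten()(base.output)                   # flatten to 1 dimension
    x = tf.keras.layers.Dense(1024, activation="relu")(x)        # 1,024 hidden units
    x = tf.keras.layers.Dropout(0.2)(x)                          # dropout rate of 0.2
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # binary output
    return tf.keras.Model(base.input, outputs)
```

Unlike VGG16 and InceptionV3, the Keras EfficientNet models rescale their inputs internally, so they expect raw pixel values in the [0, 255] range.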

Step 4: Compile and Fit

Let us again use the RMSProp Optimiser, though here, I have introduced a decay parameter:
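The exact decay value is not shown here. Assuming the author used the `decay` argument of older Keras optimizers, which scaled the learning rate by 1 / (1 + decay * iterations), a minimal equivalent for newer Keras versions, which no longer accept `decay`, is an `InverseTimeDecay` schedule. The values `1e-4` and `1e-6` are illustrative.

```python
import tensorflow as tf

INITIAL_LR = 1e-4  # illustrative starting learning rate
DECAY = 1e-6       # illustrative decay, playing the role of the legacy `decay` argument

# lr(step) = INITIAL_LR / (1 + DECAY * step), matching the legacy per-iteration decay
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=INITIAL_LR,
    decay_steps=1,
    decay_rate=DECAY,
)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr_schedule)
```

The optimizer is then passed to `model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])`. Note that RMSprop's `rho` argument is a different decay: the discount factor for its moving average of squared gradients.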

We finally fit the model on our data:

[Figure: EfficientNetB0 training output]

There we go: we got a whopping 98% accuracy on our validation set in only 10 epochs. I urge you to try training EfficientNetB7 on the larger dataset and share the results with us below.

The following is the link to the paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Conclusion

To summarize, in this article I introduced 4 of the top state-of-the-art pre-trained models for image classification. Here is a handy table you can refer to for these models and their performance:

[Table: summary of the four models and their performance]

I have only provided an overview of the top 4 pre-trained models for image classification and how to implement them. However, this is a continuously growing domain and there is always a new model to look forward to and push the boundaries further. I cannot wait to explore these new models and I also urge you to try out the above models on different datasets with different parameters, and share your results with us in the comments below!

Frequently Asked Questions

Q1. What is the VGG-16 network?

A. VGG-16 is a convolutional neural network with 16 weight layers (13 convolutional and 3 fully connected). Pre-trained on more than a million ImageNet images, it classifies images into 1000 categories.

Q2. What is ResNet50?

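A. ResNet-50 is a convolutional neural network with 50 weight layers: an initial 7 * 7 convolutional layer, 48 convolutional layers arranged in 16 bottleneck residual blocks, and a final fully connected layer. It also uses one max pooling layer and one average pooling layer.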

Q3. What is EfficientNet in CNN?

A. EfficientNet is a family of CNNs built with a scaling method that uniformly scales the network's depth and width and the input image resolution using a compound coefficient.

Purva Huilgol, 09 Aug 2023

FAQs

Which pre-trained model to use for image classification?

Popular choices of models for image classification tasks include YOLOv5, the Vision Transformer, and ResNet34.

Which pretrained model is best for medical image classification?

MedNet is a pre-trained model designed for real-world medical imaging applications, aiming to achieve the level of generalization needed for medical imaging tasks such as classification.

What is the best pretrained model?

VGG16 and VGG19: These models are known for their simplicity and effectiveness. They are widely used for various computer vision tasks. ResNet50: ResNet is a deep CNN architecture that has achieved state-of-the-art performance on various image classification tasks.

How do you train a model for image classification in Python?

Tutorial: Create Your Image Classification Model Using Python and Keras
  1. Step 1: Load the Cats vs. Dogs dataset. ...
  2. Step 2: Create your dataset. ...
  3. Step 3: Visualize your data. ...
  4. Step 4: Augment the image data. ...
  5. Step 5: Standardize the data. ...
  6. Step 6: Preprocess your data. ...
  7. Step 7: Configure your dataset. ...
  8. Step 8: Build the model.

What are pre-trained models?

A pre-trained model is a machine learning (ML) model that has been trained on a large dataset and can be fine-tuned for a specific task. Pre-trained models are often used as a starting point for developing ML models, as they provide a set of initial weights and biases that can be fine-tuned for a specific task.

What are pre-trained models in CNN?

Pre-trained models can be used in a wide range of computer vision tasks, including image classification, object detection, image segmentation, and more. There are pre-trained models available for various architectures, including VGG, ResNet, Inception, and MobileNet, among others.

What is better than CNN for image classification?

In one study comparing the classification accuracy of SVM and CNN on hyperspectral data, SVM outperformed CNN: on the PCA bands, the linear SVM reached 97.44%, the SVM with an RBF kernel 98.84%, and the CNN 94.01%. On all bands, only the linear SVM's accuracy (96.35%) was reported, due to the large size of the hyperspectral data.

Which ML algorithm is best for image classification?

In one comparison, the Random Forest classifier showed the best performance with 47% accuracy, followed by KNN with 34%, NB with 30%, and Decision Tree with 27%.

Is VGG16 a pretrained model?

Yes. VGG16 is a convolutional neural network pre-trained on 1.2 million images to classify 1000 different categories. This is what transfer learning builds on: when your domain and task are similar to VGG16's, you can reuse its pre-trained network to do the job.

Are BERT models pretrained?

Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus.

What is the disadvantage of a pretrained model?

Limited Flexibility: Pretrained models may have a specific architecture or number of layers that might not be suitable for your target task. Adapting the pretrained model to your specific needs can be challenging, especially if you require significant modifications to the architecture.

Is KNN good for image classification?

In the case of image classification, k-NN can be used to classify images based on their pixel values or other image features. In terms of performance, SVMs are generally faster than k-NN for large image datasets, especially if the feature space is high-dimensional.

How do you train image models?

Three Steps To Train Your Image Recognition Models Efficiently
  1. Step 1: Preparation of the training dataset.
  2. Step 2: Preparation and understanding of how Convolutional Neural Network models work.
  3. Step 3: Evaluation and validation of the training results of your system.

Which machine learning model is best for image classification?

Unlike traditional machine learning models, convolutional neural networks are better suited to processing image and time series data, especially for image classification and language recognition.

What machine learning model is used for image classification?

Convolutional Neural Networks or CNNs are widely used in Image Recognition, Detection, and Classification.

Is YOLO a pretrained model?

Yes. YOLOv8 models are pretrained on the COCO dataset, so training one on your own dataset re-trains (fine-tunes) those pretrained weights on your data.

Which optimizer to use for image classification?

In one study, on the Natural dataset, the Nadam optimizer performed best and the Adagrad optimizer had the lowest accuracy, while the SGD algorithm achieved the shortest convergence time and the Adadelta algorithm the longest.