Cheetah or Jaguar: Image Classification | Convolutional Neural Network

Yash Shah
7 min read · Jun 3, 2021

An image classifying project that differentiates between two very similar-looking wild cats: Cheetahs and Jaguars using Python and TensorFlow.

Introduction

Photo by Joshua J. Cotten on Unsplash

I’ve had a fascination with animals since I was a child, especially big cats. As a child, though, I often confused these two felines, the Cheetah and the Jaguar, because of their similar appearance. Surprisingly, it wasn’t only me; many people are unaware that these animals are distinct species. That was the primary motivation for creating this project, which classifies the two animals.

With this blog, I’d like to explain a few of the technical skills used in developing this project, and to share some future implementation ideas, which I discuss in more detail in the later sections.

https://tigertribe.net/differences-between-jaguar-leopard-and-cheetah/

I would recommend reading this article, which describes the major differences between the two cats.


About the Project

We have used a dataset of Cheetah and Jaguar images to train a Deep Learning model that differentiates and classifies the two. Since our model is trained on images, it can only distinguish the animals by their physical attributes, which are:

  • Their coat (Cheetahs are spotted, while Jaguars have rosettes)
  • Physical size and build (Jaguars are more muscular and stocky compared to the cheetah’s leaner, longer frame)

Cheetah (left) vs Jaguar (right)

Theory

We have used a Deep Learning technique known as the Convolutional Neural Network (CNN) for this classification problem. CNNs are Artificial Neural Networks that are especially popular for image processing problems.

A CNN image classifier takes an input image, processes it, and classifies it under one of the given classes (Cheetah or Jaguar in our case). A computer sees an image as an array of pixels whose size depends on the image resolution. CNNs specialize in recognizing and detecting patterns (edges, shapes, corners, etc. in our case), which makes them a perfect fit for this type of project.

what we see vs what the computer sees

Each image passes through a set of layers before being classified. Each layer and its function are explained in the Model Building section, along with its code implementation.

Dataset

I found this dataset on Kaggle: https://www.kaggle.com/iluvchicken/cheetah-jaguar-and-tiger

Very conveniently, 900 training images and 100 validation images are provided for each class. We made 2 folders:

  • Train: 2 folders (jaguar_train and cheetah_train)
  • Validation: 2 folders (jaguar_validation and cheetah_validation)

Preparing the Data

We will be using TensorFlow to build this CNN model. First, we import all the necessary libraries:

#importing libraries 
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image

Preprocessing

train = ImageDataGenerator(rescale=1/255)
test = ImageDataGenerator(rescale=1/255)

ImageDataGenerator helps us load and label image datasets. We create two ImageDataGenerator objects and rescale the images so that their pixel values are normalized between 0 and 1 (by dividing by 255).

We do this so that every input has a similar distribution; it also makes the model train much faster, since convergence takes far less time on normalized inputs.

Why 255?

Each of the RGB (Red, Green, Blue) channels is 8 bits.
The range for each individual colour is therefore 0–255 (as 2⁸ = 256 possibilities). So, by dividing by 255, the 0–255 range can be described with a 0.0–1.0 range, where 0.0 means 0 (0x00) and 1.0 means 255 (0xFF).

train_dataset = train.flow_from_directory(
    r"C:\Users\user\Desktop\Current_Projects\new_dataset\train",
    target_size=(200, 200),
    class_mode='binary')

test_dataset = test.flow_from_directory(
    r"C:\Users\user\Desktop\Current_Projects\new_dataset\val",
    target_size=(200, 200),
    class_mode='binary')

Then we use these objects to call the flow_from_directory method, specifying the paths to our train and test directories.

The target size we used is 200 × 200. Even though our dataset images were already resized, I shrank them further for faster computation.

class_mode is 'binary' since we have only 2 classes: Cheetah and Jaguar (0 or 1).

Model Building

In case you are not interested in the highly technical parts of this project, I recommend you skip this section.

CNN Model Architecture
model = keras.Sequential()
model.add(keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(200,200,3)))
model.add(keras.layers.MaxPool2D(2,2))
model.add(keras.layers.Conv2D(64, (3,3), activation='relu'))
model.add(keras.layers.MaxPool2D(2,2))
model.add(keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(keras.layers.MaxPool2D(2,2))
model.add(keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(keras.layers.MaxPool2D(2,2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(512, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))

Step by Step Explanation

The input_shape is 200 × 200, which is our image size, and the 3 represents the RGB colour channels (since we have coloured images).

Conv2D(): A convolutional layer slides filters over the input image to create feature maps that summarize the presence of detected features or patterns in the input. In our case, there are 32, 64, 128 and 128 filters (also called kernels) in the respective layers.

If you noticed, we increase the number of filters from layer to layer. This is because, as we move deeper into the network, the patterns get more complex, so there are more combinations of patterns to capture. We therefore increase the number of filters in subsequent layers to capture as many combinations as possible.

MaxPool2D(): Max pooling is a pooling operation that selects the largest element from each region of the feature map covered by the pooling window. In simple terms, max pooling keeps the maximum value in each small window, reducing the image size without losing the important information.

Flatten(): Converts the multi-dimensional image data array into a one-dimensional array.

Dense(): A fully connected neural network layer where each input node is connected to each output node. We have used this layer twice: once as a hidden layer with 512 neurons, and then as the output layer with a single neuron and a sigmoid activation to make the final predictions.

model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

Then we specify the optimizer and the loss function for our model, along with the metrics we want to track during training. The loss function measures how far the model’s predicted output is from the true output, and the optimizer’s job is to adjust the model’s weights to minimize that loss.

Here we use adam and binary_crossentropy: binary cross-entropy because this is a binary classification project, and Adam because it is a reliable adaptive optimizer for most cases.

Training

# Store the returned History object so we can plot the learning curves later
history = model.fit_generator(train_dataset,
                              steps_per_epoch=30,
                              epochs=10,
                              validation_data=test_dataset)

We train our model by calling the fit_generator() function, which takes our training images as input and evaluates on our validation images after each epoch. (In recent TensorFlow versions fit_generator() is deprecated; model.fit() accepts generators directly and takes the same arguments.)

Steps per epoch: denotes the number of batches selected for one epoch.

Epoch: consists of one full cycle through the training data (multiple steps).

Output

Training accuracy = 87.5% | Validation accuracy = 85.5%

Saving the Model

tf.keras.models.save_model(model,'best_model.hdf5')

It's good practice to save your models so you don't need to retrain them every time; a saved model is also useful during deployment.

Testing

All fancy preprocessing, numbers, layers, and accuracy aside, the most effective way to test a model is to use it. Here, I found a few images that appear in neither the training nor the validation dataset. Let's see how our model performed:

def predict(filename):
    img1 = image.load_img(filename, target_size=(200, 200))
    plt.imshow(img1)

    # Match the training preprocessing: array conversion plus 0-1 rescaling
    Y = image.img_to_array(img1) / 255
    X = np.expand_dims(Y, axis=0)  # add a batch dimension

    val = model.predict(X)  # sigmoid output, a probability in (0, 1)
    print(val)

    # Threshold the probability rather than comparing to exactly 0 or 1
    if val >= 0.5:
        plt.xlabel("A Jaguar")
    else:
        plt.xlabel("A Cheetah")

Here we created a function that, given an image path, predicts the image's class using our model.

Final Notes

Classifying 2 animals using their images is a very basic yet foundational part of a larger implementation idea that I wish to discuss.

The coat patterns of cheetahs and jaguars differ, and each individual's pattern is one-of-a-kind, just like a human fingerprint. So, with the right datasets and possibly a beefier model, we should be able to identify the precise individual animal merely from its pictures (biometrics for animals).

This could replace the need for an expert to identify every animal spotted on, for example, a night camera in a wildlife sanctuary or national park. Further, we could create a pipeline that collects data from cameras installed in a sanctuary, classifies and labels each individual, and stores the sighting with relevant information like camera location, individual name, date and time, etc.

This way, wildlife experts can spend their time analyzing the collected data to learn more about individual animal behaviour, territories, breeding habits, hunting patterns, etc. It could also help when a new individual is spotted, or with security against poaching.

Tracking jaguars in Iguaçu National Park, Brazil (WWF Species Tracker)

I understand that this type of solution appears to be quite theoretical, but we should never underestimate the power of data.

Cheers :)

Co-creator: Ishita Gupta (ish)

