
Dog Vision

The Dog Vision Project uses a pretrained Convolutional Neural Network (CNN) and transfer learning, implemented in TensorFlow, for efficient dog breed classification. By adapting a powerful CNN to the 20,000+ images of the Stanford Dogs Dataset, we aim to quickly and accurately identify 120 dog breeds, showing how transfer learning lets existing models be applied to new challenges.

dog.jpg

Setting Up a GPU for Efficient Neural Network Training

Neural networks rely on intensive computation, primarily matrix multiplication, which runs far faster on hardware built for massively parallel arithmetic. GPUs significantly outperform CPUs at this and can cut training time from hours to minutes. We'll begin by importing TensorFlow so that our neural network operations can run on the GPU.

​

​

​

 

To determine if TensorFlow can utilize a GPU on your system, you can employ the method tf.config.list_physical_devices(). This function will help in identifying whether a GPU is available for TensorFlow to use.
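A minimal version of that check might look like the following. The helper name available_gpus is hypothetical, and the try/except exists only so the snippet still runs on a machine without TensorFlow installed:

```python
# Sketch: check whether TensorFlow can see a GPU (assumes TensorFlow 2.x).
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run even where TensorFlow isn't installed
    tf = None


def available_gpus():
    """Return the GPU devices visible to TensorFlow (an empty list if none)."""
    if tf is None:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = available_gpus()
print(f"GPUs available to TensorFlow: {len(gpus)}")
for gpu in gpus:
    print(gpu)
```

An empty list means TensorFlow will fall back to the CPU; in Google Colab, changing the runtime type to a GPU runtime is the usual fix.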

​

​

​

​

​

​

​

​

Getting Data

We'll be utilizing the Stanford Dogs dataset for our project. Our first step is to acquire this dataset directly from the Stanford Dogs website. The dataset comprises three primary components:

  • Images - images.tar

  • Annotations - annotation.tar

  • Lists with train/test splits - lists.tar

​

We aim to organize these files into a directory structure as follows:

​

​

​

​

 

The process to achieve this includes several steps:

  • Mounting Google Drive so the notebook can read and write files there.

  • Defining constants such as our base directory for file storage, the specific files we need to download, and the source URL for these downloads.

  • Setting up the target local path for saving these files.

  • Verifying the existence of the target files on Google Drive and copying them to the local machine if present.

  • In case the files aren't available on Google Drive, we'll download them from the source URL using the !wget command.

  • Creating a dedicated folder on Google Drive to store the downloaded files.

  • Transferring the downloaded files to Google Drive for future access or use.
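The steps above can be sketched in plain Python as follows. This is an illustration under assumptions: fetch_dataset_files and its parameters are hypothetical names, urllib.request.urlretrieve stands in for the !wget command, and Drive is assumed to be mounted already (in Colab, via from google.colab import drive followed by drive.mount("/content/drive")). The download function is a parameter so the caching logic can be exercised without network access.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_dataset_files(target_files, local_dir, drive_dir, base_url,
                        download=urllib.request.urlretrieve):
    """Copy each archive from a Drive folder if cached, else download and cache it."""
    local_dir, drive_dir = Path(local_dir), Path(drive_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    for filename in target_files:
        local_path = local_dir / filename
        drive_path = drive_dir / filename
        if drive_path.exists():
            # Already saved on Drive: just copy it to the local machine.
            shutil.copy2(drive_path, local_path)
        else:
            # Not on Drive yet: download it from the source URL ...
            download(f"{base_url.rstrip('/')}/{filename}", str(local_path))
            # ... then keep a copy on Drive for future sessions.
            drive_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(local_path, drive_path)
    return sorted(p.name for p in local_dir.iterdir())
```

A call might look like fetch_dataset_files(["images.tar", "annotation.tar", "lists.tar"], "dog_vision_data", drive_dir, base_url), where drive_dir is a folder under /content/drive/MyDrive. The Stanford Dogs page has hosted its archives under http://vision.stanford.edu/aditya86/ImageNetDogs/, but confirm the current address on the dataset page before relying on it.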

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

To view the contents of local_dir (dog_vision_data), we first confirm that it exists with Path.exists(). Once it does, we list its contents with Path.iterdir(), iterating through each entry and displaying its .name attribute.
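A small sketch of that check (list_dir_names is a hypothetical helper; dog_vision_data is the directory named above):

```python
from pathlib import Path


def list_dir_names(directory):
    """Return the sorted names of the entries in `directory`, or [] if it doesn't exist."""
    directory = Path(directory)
    if not directory.exists():
        print(f"{directory} does not exist")
        return []
    return sorted(item.name for item in directory.iterdir())


for name in list_dir_names("dog_vision_data"):
    print(name)
```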

​

​

​

​

​

Output:

​

​

​

​

​

The .tar format, like .zip, combines multiple files into one; the difference is that .zip also compresses those files, whereas .tar only bundles them together. To extract (or "untar") the contents of images.tar, annotation.tar, and lists.tar, we can use the !tar command, which unpacks every file contained in each .tar archive.

​

For this purpose, we use specific flags with the !tar command:

  • The -x flag instructs tar to extract files from an archive.

  • The -f flag indicates that the next parameter is the name of the archive file.

  • Flags can be merged for simplicity, as in -xf, combining the functionalities of both flags.
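Outside a notebook, where shell escapes like !tar aren't available, Python's built-in tarfile module can perform the same extraction as tar -xf. A minimal sketch (the helper name untar is hypothetical):

```python
import tarfile
from pathlib import Path


def untar(archive_path, dest_dir="."):
    """Extract every member of a .tar archive into dest_dir, like `tar -xf`."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe members such as absolute paths.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return names


# Usage (assumes the archives sit in the working directory):
# for archive in ["images.tar", "annotation.tar", "lists.tar"]:
#     untar(archive)
```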

​

​

​

​

​

  Output:

​

 

 

 

 

 

 

By calling the Python function os.listdir("."), where "." signifies the current directory, we can see the new files produced by our extraction. Here's what we found:

  • train_list.mat - This file contains a compilation of all images in the training set.

  • test_list.mat - Similarly, this one enumerates all the images designated for the testing set.

  • Images/ - This directory houses the entire collection of dog images.

  • Annotation/ - This folder is dedicated to storing annotations corresponding to each image.

  • file_list.mat - A comprehensive list that merges both the training and testing sets.

​

These files and directories are the result of extracting the contents of the previously mentioned .tar archives.

Inspecting the Dataset Contents

Before diving into model building, it's a good practice to first familiarize yourself with the dataset. This involves visualizing the data and examining distributions, such as the number of samples in each class, to gain a better understanding of the data characteristics.

​

  •  Understanding the Target Data Format

As we aim to develop a computer vision model for dog breed classification, it's essential to have a method for indicating to our model the specific breed present in each image. A typical approach for classification tasks is organizing samples into folders, with each folder's name corresponding to the class of its contents.

​

To illustrate:
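In code, the folder-per-class convention means an image's label can be read straight from its path. A small sketch; the paths below are made up for illustration:

```python
from pathlib import PurePosixPath


def label_from_path(image_path):
    """The class label is simply the name of the folder the image sits in."""
    return PurePosixPath(image_path).parent.name


# Hypothetical paths laid out one folder per class.
example_paths = [
    "images/train/chihuahua/n02085620_5927.jpg",
    "images/train/pug/pug_0001.jpg",
]
print([label_from_path(p) for p in example_paths])  # ['chihuahua', 'pug']
```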

​

​

​

​

​

​

​

​

​

​

​

​

​

​

  •  Examining the File List Details

Let's look at the train_list.mat, test_list.mat, and file_list.mat files. To open .mat files in Python, a solution found on Stack Overflow recommends SciPy's scipy.io.loadmat() function; SciPy is a library widely used in scientific computing.
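A sketch of loading one of these files, assuming SciPy is installed. loadmat returns a dict that also contains metadata entries whose names begin with double underscores (such as '__header__'), so the hypothetical data_keys helper filters those out to leave only the data fields:

```python
from pathlib import Path


def load_mat(mat_path):
    """Load a MATLAB .mat file into a dict using SciPy."""
    import scipy.io  # imported lazily so the helper below works without SciPy

    return scipy.io.loadmat(str(Path(mat_path)))


def data_keys(mat_dict):
    """Keys that hold data, skipping loadmat's metadata entries like '__header__'."""
    return [key for key in mat_dict if not key.startswith("__")]


# Usage (assumes the extracted lists.tar contents are in the working directory):
# train_list = load_mat("train_list.mat")
# print(data_keys(train_list))
```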

​

​

​

​

​

​

​

 

Output:

​

​

​

​

​

 

 

 

 

 

 

 

 

 

 

 

 

 

It appears we have obtained a dictionary containing multiple potentially relevant fields. We should examine the keys within this dictionary for further information.

​

​

 

Output:

 

 

 

The file_list key likely contains the information we need, since it holds an array of image filenames (each ending in .jpg). Let's check how many files each split's file_list contains.

​

​

​

​

Output:

​

​

​

​

It appears that these lists represent the splits for training and testing, with the full list encompassing all the files present in the dataset. Exploring train_list['file_list'] will provide deeper insights into the training dataset's structure.

​

​

​

Output:

​

​

​

​

​

​

It seems we're dealing with an array of arrays. Converting them into a Python list would simplify manipulation. This conversion can be accomplished by utilizing indexing and list comprehension to retrieve each element. 
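A sketch of that conversion, assuming each entry is nested two levels deep so that item[0][0] is the filename string (the function name is hypothetical):

```python
def flatten_file_list(nested_file_list):
    """Convert loadmat's array-of-arrays into a plain list of filename strings.

    Assumes each entry looks like [["n02085620-Chihuahua/n02085620_5927.jpg"]],
    so item[0][0] picks out the string.
    """
    return [str(item[0][0]) for item in nested_file_list]


# Usage: train_file_list = flatten_file_list(train_list["file_list"])
```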

​

​

​

 

 

Output:

​

​

​

Let's verify that none of the training image filenames appear in the testing image filenames list. Keeping the test set separate from the training set is a fundamental principle in machine learning: otherwise, test scores would overstate how well the model handles images it has never seen. To check for common elements, we convert train_file_list to a set and call its intersection() method with test_file_list; an empty result confirms there is no overlap.

​

​

​

Output:

​

​

​

There are no overlapping elements between the sets. We can make this check more robust with an assert statement, which raises an error if the condition is not met. The assert statement follows the structure: assert condition, error_message_if_condition_fails. The absence of any output from the assert check indicates that our datasets are properly separated.
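Both checks fit in one small helper. The name assert_disjoint is hypothetical, and the inputs are assumed to be the flattened train_file_list and test_file_list:

```python
def assert_disjoint(train_files, test_files):
    """Raise AssertionError if any filename appears in both splits."""
    overlap = set(train_files).intersection(test_files)
    assert len(overlap) == 0, f"{len(overlap)} file(s) appear in both train and test"
    return overlap


# Usage: assert_disjoint(train_file_list, test_file_list)  # silent when the splits are clean
```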

​

​

​

​

With no overlap between the training and test sets, let's continue exploring the data.

Analyzing the Annotation Folder

Next, let's examine the Annotation folder. We can delve into its contents using Python and utilize os.listdir() to get a glimpse of what's inside.

​

​

​

Output:

​

​

​

​

​

​

​

​

In the Annotation folder, we can see one subfolder per dog breed, each containing multiple numbered files. Each file holds the XML annotation for the image with the same name. An example of such a file path is "Annotation/n02085620-Chihuahua/n02085620_10074":
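Assuming the annotation files follow a PASCAL VOC-style XML layout (an <annotation> root with one <object> per dog, each holding a <name> and a <bndbox>), they can be read with Python's built-in xml.etree.ElementTree. The sample below is made up to show the assumed structure and is not copied from the dataset; parse_annotation is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def parse_annotation(xml_text):
    """Return [(breed, (xmin, ymin, xmax, ymax)), ...] for each <object> element."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text, coords))
    return objects


# Made-up example in the assumed layout.
SAMPLE_ANNOTATION = """<annotation>
  <filename>n02085620_10074</filename>
  <object>
    <name>Chihuahua</name>
    <bndbox><xmin>25</xmin><ymin>10</ymin><xmax>276</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

print(parse_annotation(SAMPLE_ANNOTATION))  # [('Chihuahua', (25, 10, 276, 498))]

# A real file would be read with, e.g.:
# parse_annotation(Path("Annotation/n02085620-Chihuahua/n02085620_10074").read_text())
```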

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

We write a function to confirm that the Annotation directory contains 120 subdirectories, one for each dog breed class we want to classify. It uses Python's pathlib.Path to iterate over the items in the Annotation directory and Path.is_dir() to check whether each item is a directory. This confirms the setup our classification task relies on: matching each image name with the correct class name among the 120 dog breeds in the dataset.
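A minimal version of such a function might look like this (count_subfolders is a hypothetical name):

```python
from pathlib import Path


def count_subfolders(directory):
    """Count the immediate subdirectories of `directory` (files are ignored)."""
    return sum(1 for item in Path(directory).iterdir() if item.is_dir())


# Usage: assert count_subfolders("Annotation") == 120
```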

​

​

​

​

​

​

​

​

​

​

​

​

​

​

 

Output:

​

​

​

 

We have 120 annotation subfolders, each corresponding to a different dog class we aim to recognize. Upon closer examination of our file listings, it appears that the class name is already included in the file path.

​

​

​

Output:

​

​

​

 

Using the provided information, we can deduce that the image named "n02085620_5927.jpg" should depict a Chihuahua. To confirm this, we can utilize "IPython.display.Image()" since Google Colab conveniently incorporates IPython (Interactive Python).

​

​

​

Output:

​

​

chihuahua.jpg

Investigating the Contents of the Images Directory

Let's begin by creating the following two items to simplify our data organization:

  • Generate a dictionary that maps folder names to their respective class names. This mapping will be useful when visualizing data from its original folder structure.

  • Create a list containing all the unique dog class names, presented in a straightforward format.

 

To initiate this process, we can start by retrieving a list of all the folders within the "Images" directory using the "os.listdir()" function.

​

​

​

​

Output:

​

​

​

​

​

​

​

​

​

Next, we will create a dictionary that links each folder name to a more straightforward representation of the class name. 

​

​

​

​

​

​

​

​

​

​

Output:

​

​

​

​

​

​

​

​

​

To obtain a list of unique dog names, we can simply extract the values from the "folder_to_class_name_dict" and convert them into a list.
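Both steps, building the folder-to-class dictionary and extracting the list of names, can be sketched as below. The cleaning rule (drop the WordNet-style ID prefix before the first hyphen, then lowercase) is an assumption chosen to match class names such as 'english_springer' that appear later in this walkthrough; the helper name and the example folder names are illustrative:

```python
def folder_to_class_name(folder_name):
    """'n02085620-Chihuahua' -> 'chihuahua': drop the ID prefix, lowercase the rest."""
    _, _, breed = folder_name.partition("-")  # split on the first hyphen only
    return breed.lower()


# Example folder names following the dataset's "<id>-<Breed_name>" pattern;
# in the notebook this list would come from os.listdir("Images").
image_folders = [
    "n02085620-Chihuahua",
    "n02102040-English_springer",
    "n02089078-black-and-tan_coonhound",
]

folder_to_class_name_dict = {f: folder_to_class_name(f) for f in image_folders}
dog_names = list(folder_to_class_name_dict.values())
print(dog_names)  # ['chihuahua', 'english_springer', 'black-and-tan_coonhound']
```

Splitting only on the first hyphen matters for breeds whose names themselves contain hyphens.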

​

​

​

Output:

​

​

​

​

​

Displaying a Selection of Random Images

To make visualization easy, we'll create a function that takes a list of image paths and displays 10 of them at random. It creates a 2x5 grid of matplotlib subplots and randomly samples 10 image paths. Then, for each subplot in the flattened axes, it reads and shows one image and titles it with its parent folder name and dog breed name, before finally displaying the figure.
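A sketch of that function, split into a sampling step (plain Python, so it can be checked on its own) and a plotting step that assumes matplotlib is installed. The function names, the seed parameter, and the "folder / class name" title format are assumptions:

```python
import random
from pathlib import Path


def sample_with_titles(image_paths, k=10, folder_to_class=None, seed=None):
    """Pick k random paths and title each with its parent folder and class name."""
    rng = random.Random(seed)
    chosen = rng.sample(list(image_paths), k=min(k, len(image_paths)))
    folder_to_class = folder_to_class or {}
    titles = []
    for path in chosen:
        folder = Path(path).parent.name
        titles.append(f"{folder}\n{folder_to_class.get(folder, folder)}")
    return chosen, titles


def plot_random_images(image_paths, folder_to_class=None, seed=None):
    """Show 10 random images in a 2x5 matplotlib grid."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    paths, titles = sample_with_titles(image_paths, 10, folder_to_class, seed)
    fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
    for ax, path, title in zip(axes.flatten(), paths, titles):
        ax.imshow(plt.imread(path))
        ax.set_title(title)
        ax.axis("off")
    plt.show()
```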

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

Output:

​

1.png

Analyzing the Data Distribution

Following the visualization, another informative method for data exploration involves assessing the data distribution, which essentially examines how the data is spread or distributed. In our context, this entails determining the count of dog images available for each breed. An ideal scenario would exhibit a balanced distribution, meaning roughly equal numbers of images for each breed. To accomplish this, we'll create a function to count the number of images within each subfolder within a specified directory.

​

The function, when provided with a target directory, performs the following tasks: it identifies subdirectories within the specified directory, counts the number of images in each subfolder by examining files with the ".jpg" extension, and compiles this information into a list of dictionaries, where each dictionary contains the class name and image count for the respective subfolder, ultimately returning this list as its output.
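A minimal sketch of such a function. The name count_images_per_class and the case-insensitive .jpg check are assumptions, while the 'class_name' and 'image_count' keys match the column names used in the DataFrame steps that follow:

```python
from pathlib import Path


def count_images_per_class(target_dir, extension=".jpg"):
    """Return [{'class_name': ..., 'image_count': ...}] for each subfolder of target_dir."""
    counts = []
    for subdir in sorted(Path(target_dir).iterdir()):
        if subdir.is_dir():
            n_images = sum(1 for f in subdir.iterdir() if f.suffix.lower() == extension)
            counts.append({"class_name": subdir.name, "image_count": n_images})
    return counts


# Usage: image_class_counts = count_images_per_class("Images")
```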

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

Now, let's proceed to run the function on our "Images" directory and take a look at the first few entries in the results.

​

​

​

​

Output:

​

​

​

​

 

Given that our "image_class_counts" variable is structured as a list of dictionaries, we can easily convert it into a pandas DataFrame. To make the classes with the highest number of images appear first, we'll arrange the DataFrame by "image_count" using the "DataFrame.sort_values()" function.

​

​

​

​

Output:

​

​

​

​

​

​

​

Additionally, we will enhance the readability of the "class_name" column by mapping its values to correspond with our "folder_to_class_name_dict."
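The DataFrame steps (build, sort, rename) can be combined into one small helper. This sketch assumes pandas is installed; build_counts_df is a hypothetical name, and the plotting arguments in the comments are only one reasonable choice:

```python
import pandas as pd


def build_counts_df(image_class_counts, folder_to_class_name_dict):
    """Per-class image counts as a DataFrame, largest classes first, readable names."""
    df = pd.DataFrame(image_class_counts)
    df = df.sort_values(by="image_count", ascending=False).reset_index(drop=True)
    # Replace folder names (e.g. 'n02085620-Chihuahua') with cleaned class names.
    df["class_name"] = df["class_name"].map(folder_to_class_name_dict)
    return df


# Then, for example:
# image_counts_df = build_counts_df(image_class_counts, folder_to_class_name_dict)
# image_counts_df.plot(kind="bar", x="class_name", y="image_count", legend=False, figsize=(20, 7))
# image_counts_df.describe()
```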

​

​

​

​

Output:

​

​

​

​

​

​

​

Now that we have a DataFrame containing image counts per class, we can create a more visually appealing representation by converting it into a bar plot. This can be achieved using the "image_counts_df.plot(kind="bar", ...)" function, with the option to customize the plot as desired.

​

​

​

​

​

​

​

​

​

​

 

Output:

​

​

​

​

​

​

​

​

​

​

​

​

​

Our data appears to have a well-balanced distribution, with each dog breed having approximately 150 or more images. We can obtain additional summary statistics about our data using the "DataFrame.describe()" function.

​

​

​

Output:

​

​

​

​

​

​

​

​

The table is consistent with the plot: the minimum number of images per class is 148, while the maximum is 252. If one class had significantly fewer images, perhaps as few as one-tenth of another class, it would be worth collecting additional data to restore the balance.

2.png

Establishing Directories for Training and Testing Data Split

After exploring the data, our next step is to create experimental data splits. Fortunately, our dog dataset already includes predefined training and test sets. However, to allow quicker experiments (fast iteration being central to machine learning work), we will also create a smaller training set by randomly selecting 10% of the training data.

​

As previously mentioned, we aim to structure our directory as follows:

​

​

​

​

​

​

​

​

​

​

​

​

​

 

To achieve this directory structure, we will use the "Path.mkdir()" method to:

  • Create the "images/train/" directory

  • Create the "images/test/" directory.

  • Establish a directory within each of "images/train/" and "images/test/" for every dog breed class.

We will iterate through the list of "dog_names" and generate a folder for each class inside both the "images/train/" and "images/test/" directories.
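The directory creation can be sketched as a nested loop over splits and class names. make_split_dirs is a hypothetical helper, and the root folder is left as a parameter:

```python
from pathlib import Path


def make_split_dirs(root, class_names, splits=("train", "test")):
    """Create root/<split>/<class_name>/ for each split and class; safe to re-run."""
    created = []
    for split in splits:
        for class_name in class_names:
            class_dir = Path(root) / split / class_name
            class_dir.mkdir(parents=True, exist_ok=True)
            created.append(class_dir)
    return created


# Usage: make_split_dirs("images", dog_names)
```

parents=True creates each split folder on the way to its class folders, and exist_ok=True makes re-running the cell harmless.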

​

​

​

​

 

 

 

 

 

 

 

 

 

 

 

 

 

 

To examine the data split directories or folders we've generated, we can obtain the names of each directory by listing the subdirectories contained within them.

​

​

​

Output:

​

​

​

​

​

​

​

​

We will write a function named "copy_files_to_target_dir()" to copy images from the "Images" directory into their corresponding class folders within "images/train" and "images/test". Given a list of source files and a target directory, it loops over the source files (showing a progress bar), builds each file's destination path from its class folder and file name, makes sure the destination folder exists, and copies the file using Python's "shutil.copy2(src, dst)".
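A sketch of such a function is below. The parameters beyond the source file list and target directory (images_dir, folder_to_class, verbose) are assumptions, as is mapping source folders to the cleaned class names created earlier. The progress bar is left out; wrapping file_list in tqdm.auto.tqdm would add one if tqdm is installed.

```python
import shutil
from pathlib import Path


def copy_files_to_target_dir(file_list, target_dir, images_dir="Images",
                             folder_to_class=None, verbose=False):
    """Copy 'folder/file.jpg' entries from images_dir into target_dir/<class_name>/."""
    folder_to_class = folder_to_class or {}
    copied = []
    for file_path in file_list:
        src = Path(images_dir) / file_path
        # Destination class folder, e.g. 'n02085620-Chihuahua' -> 'chihuahua'.
        class_name = folder_to_class.get(src.parent.name, src.parent.name)
        dst_dir = Path(target_dir) / class_name
        dst_dir.mkdir(parents=True, exist_ok=True)  # ensure the class folder exists
        dst = dst_dir / src.name
        if verbose:
            print(f"Copying {src} -> {dst}")
        shutil.copy2(src, dst)  # copy2 also preserves file metadata such as timestamps
        copied.append(dst)
    return copied
```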

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

To verify its functionality, let's initiate a test by using it to copy the files within the "train_file_list" to the "train_dir."

​

​

​

​

​

 

It appears that our copying function has successfully copied 12,000 training images to their respective directories within "images_split/train/." Now, let's replicate this process for the "test_file_list" and "test_dir."

​

​

​

 

 

We will now create code to ensure that the count of files in the "train_file_list" matches the number of image files within the "train_dir," and similarly, to verify the consistency between the test files.
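One way to express that check, assuming the class-folder layout created above (check_split_count is a hypothetical name). It also returns the image paths, which is convenient for the plotting step that follows:

```python
from pathlib import Path


def check_split_count(file_list, split_dir, pattern="*/*.jpg"):
    """Assert split_dir holds exactly len(file_list) images; return their paths."""
    image_paths = sorted(Path(split_dir).glob(pattern))
    assert len(image_paths) == len(file_list), (
        f"{split_dir}: expected {len(file_list)} images, found {len(image_paths)}"
    )
    return image_paths


# Usage:
# train_image_paths = check_split_count(train_file_list, train_dir)
# test_image_paths = check_split_count(test_file_list, test_dir)
```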

​

​

​

​

​

​

​

Output:

​

​

​

​

To visually check the copied data, we plot a selection of random images from the "train_image_paths" list.

​

​

​

Output:

​

​

​

​

​

​

​

​

3.png

Creating a 10% Subset of the Training Dataset

The objective is to copy a random 10% of the training images into a new directory, "train_10_percent_dir", so that we can train models on a smaller subset of the data to quickly assess performance and feasibility.
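A minimal sketch, assuming the training directory uses the class-folder layout created above. make_subset, its default seed of 42, and the use of round() are illustrative choices, not taken from the original:

```python
import random
import shutil
from pathlib import Path


def make_subset(source_dir, target_dir, fraction=0.1, seed=42, pattern="*/*.jpg"):
    """Copy a random `fraction` of source_dir's images into target_dir, keeping class folders."""
    image_paths = sorted(Path(source_dir).glob(pattern))  # sorted for reproducibility
    n_samples = round(len(image_paths) * fraction)
    subset = random.Random(seed).sample(image_paths, k=n_samples)
    for src in subset:
        dst = Path(target_dir) / src.parent.name / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return subset
```

Because the sample is drawn from the whole pool rather than class by class, per-class counts in the subset will vary somewhat, which is why it is worth inspecting the subset's distribution.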

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

Next, we count the images within each dog breed class in "train_10_percent_dir" and store the result in a DataFrame named "train_10_percent_image_class_counts_df", with the classes sorted in ascending order of image count. This gives an overview of the image distribution within the 10% training subset.

​

​

​

​

​

Output:

​

​

​

​

​

​

​

​

​

​

Next, we visualize this distribution with a bar plot of the "train_10_percent_image_class_counts_df" DataFrame, showing the number of images for each dog breed class.

​

​

​

​

​

​

​

​

​

​

 

Output:

​

​

4.png

Converting Datasets into TensorFlow Datasets

Next, we convert our image directories into TensorFlow Datasets for efficient data handling during model training and evaluation. Each of the resulting datasets, "train_10_percent_ds", "train_ds", and "test_ds", is configured with a batch size, image dimensions, a shuffling preference, and a seed value suited to its role, which keeps our experiments consistent and reproducible.
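One common way to build such datasets is tf.keras.utils.image_dataset_from_directory, sketched below. The exact call used originally isn't shown, so treat these settings as assumptions: a batch size of 32, 224x224 images (matching the image size noted later), integer labels, seed 42, and no shuffling for the test set so predictions can later be lined up with labels. make_datasets is a hypothetical name; TensorFlow is imported inside the function.

```python
def make_datasets(train_10_percent_dir, train_dir, test_dir,
                  img_size=224, batch_size=32, seed=42):
    """Load the three splits as batched tf.data datasets of (image, integer label)."""
    import tensorflow as tf  # lazy import; assumes TensorFlow >= 2.9

    common = dict(label_mode="int", batch_size=batch_size,
                  image_size=(img_size, img_size))
    train_10_percent_ds = tf.keras.utils.image_dataset_from_directory(
        train_10_percent_dir, shuffle=True, seed=seed, **common)
    train_ds = tf.keras.utils.image_dataset_from_directory(
        train_dir, shuffle=True, seed=seed, **common)
    # Test data keeps a fixed order so predictions line up with labels later on.
    test_ds = tf.keras.utils.image_dataset_from_directory(
        test_dir, shuffle=False, **common)
    return train_10_percent_ds, train_ds, test_ds
```

Class labels are inferred from the subfolder names, which is exactly the folder-per-class layout prepared earlier.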

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

We then check that the class names are identical across the three TensorFlow Datasets ("train_10_percent_ds", "train_ds", and "test_ds").

​

​

​

​

Upon validation, the script assigns these class names to the "class_names" variable and presents the initial five names for reference.
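The check and assignment can be sketched as one helper that works on any objects exposing a class_names attribute, as datasets from image_dataset_from_directory do. check_class_names is a hypothetical name:

```python
def check_class_names(*datasets):
    """Assert all datasets expose identical .class_names and return that list."""
    reference = list(datasets[0].class_names)
    for ds in datasets[1:]:
        assert list(ds.class_names) == reference, "class names differ between datasets"
    return reference


# Usage:
# class_names = check_class_names(train_10_percent_ds, train_ds, test_ds)
# print(class_names[:5])
```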

​

​

​

Output:

​

​

​

​

​

​

Advancing in our data exploration, we employ a code snippet to visualize images within our dataset. The provided code configures a 3x3 grid for displaying a selection of images and their corresponding class labels from the "train_ds" dataset. We then proceed to iterate through a single batch of images and their associated labels from the "train_ds" dataset using "train_ds.take(1)." Within a nested loop running nine times, we prepare individual subplots for image presentation.

​

​

​

​

​

​

​

​

Output:

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

We're now going to take a closer look at a single batch from the TensorFlow Dataset, which we've grabbed from the "train_ds" dataset.

​

​

​

​

Output:

​

​

​

​

As we continue our exploration, this piece of code below gives us a closer peek into what's inside a single image batch we've got from the "train_ds" TensorFlow Dataset. Beginning with image_batch[0], we access the first image within the batch, enabling a specific examination of its visual content. Moving forward, label_batch[0] grants access to the associated label, denoting the image's assigned class category. Furthermore, by employing class_names[label_batch[0]], we extract the precise class name corresponding to this label, enriching our understanding of the image's categorization.

​

​

​

Output:

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

 

 

Inference: The output shows image_batch[0] as a tensor holding the pixel data of the first image in the batch, with a shape of 224x224 pixels across 3 color channels (RGB). These pixel values don't need to be normalized beforehand, because the base model built later is created with include_preprocessing=True and rescales raw 0-255 values itself. Accompanying this, label_batch[0] is a scalar tensor indicating that this image belongs to class number 41, which, according to the class_names mapping, corresponds to the class 'english_springer'.

5.png

Optimizing Dataset Performance for Smooth Processing

This step improves the performance of our data pipelines so that loading data doesn't slow down training.

​

 

 

 

 

 

To attain this objective, a series of actions are executed on both our training and test datasets:

  •  Training dataset containing 10% of the data (train_10_percent_ds):

We first cache the dataset, storing it in memory after the initial loading to expedite subsequent access. Next, we shuffle the dataset using a buffer size of 100, introducing randomness into the data order to prevent any potential biases during training. Lastly, we prefetch data with a buffer size determined by the AUTOTUNE value, which dynamically adjusts the CPU resources allocated for data loading, optimizing the process in the background while the model is trained.

​

  •  Full training dataset (train_ds):

Similar to the 10% dataset, we cache it for efficient access. We apply a larger shuffle buffer of 1000 to introduce more substantial randomness into the data order. The data prefetching operation, again utilizing the AUTOTUNE buffer size, overlaps data loading and model execution to accelerate the process.

​

  •  Test dataset (test_ds):

Since shuffling is unnecessary for test data, we solely cache and prefetch it. This ensures that the test data remains readily available for evaluation without compromising the order.
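The three pipelines described above can be sketched as follows. The buffer sizes (100 and 1000) come from the text; optimize_datasets and its autotune parameter are hypothetical, with autotune defaulting to tf.data.AUTOTUNE and exposed only so the chain of calls can be checked without TensorFlow:

```python
def optimize_datasets(train_10_percent_ds, train_ds, test_ds, autotune=None):
    """Apply cache -> (shuffle) -> prefetch to each split, as described above."""
    if autotune is None:
        import tensorflow as tf  # lazy import; assumes TensorFlow 2.x

        autotune = tf.data.AUTOTUNE
    # Small training subset: cache, shuffle with a buffer of 100, prefetch.
    train_10_percent_ds = (
        train_10_percent_ds.cache().shuffle(buffer_size=100).prefetch(buffer_size=autotune)
    )
    # Full training set: same steps, with a larger shuffle buffer of 1000.
    train_ds = train_ds.cache().shuffle(buffer_size=1000).prefetch(buffer_size=autotune)
    # Test set: no shuffle, so the evaluation order stays fixed.
    test_ds = test_ds.cache().prefetch(buffer_size=autotune)
    return train_10_percent_ds, train_ds, test_ds
```

One detail worth knowing: when the datasets are already batched (as image_dataset_from_directory's output is), shuffle() reorders whole batches rather than individual images, and placing it after cache() means that order changes every epoch instead of being frozen into the cache.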

Building the Model

In this section, we establish our foundational model by initializing a convolutional neural network (CNN) based on the EfficientNetV2 architecture. The specific variant we employ is EfficientNetV2B0. 

 

  • Firstly, we set include_top to False, indicating our intention to exclude the final classification layer of the model. This step is essential as we plan to append our custom output layer later in the model-building process.

  • Next, we initialize the model with pre-trained weights from the ImageNet dataset, specified by the weights parameter. Leveraging pre-trained weights provides a strong starting point for our model and enhances its performance on our specific classification task. 

  • We define the input shape for our model's images as (img_size, img_size, 3), accommodating images of size img_size pixels by img_size pixels with three color channels (RGB). 

  • Lastly, we ensure that the input images undergo preprocessing akin to that applied during pre-training on ImageNet. This is achieved by setting include_preprocessing to True. 

  • After setting up the base model, we take the crucial step of freezing its layers by assigning False to the trainable property. This ensures that the pre-trained weights remain unaltered during subsequent training.
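The configuration described in these bullets corresponds roughly to the sketch below. It assumes TensorFlow 2.8 or newer, where EfficientNetV2B0 and include_preprocessing are available; create_base_model is a hypothetical name, and its default img_size=224 matches the image size used elsewhere in this walkthrough. TensorFlow is imported inside the function, and the first call downloads the ImageNet weights.

```python
def create_base_model(img_size=224):
    """EfficientNetV2B0 feature extractor with ImageNet weights, frozen."""
    import tensorflow as tf  # lazy import; assumes TensorFlow >= 2.8

    base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
        include_top=False,            # drop the ImageNet classification head
        weights="imagenet",           # start from pre-trained weights
        input_shape=(img_size, img_size, 3),
        include_preprocessing=True,   # the model rescales raw 0-255 pixels itself
    )
    base_model.trainable = False      # freeze all pre-trained layers
    return base_model
```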

​

​

​

​

​

​

​

​

​

 

 

As we dive into building our image classifier, we're gearing up to create a brand-new neural network called "model_0." This model is going to be our go-to tool for image classification. To make it happen, we're putting together different parts and doing some key tasks:

​

  • We define the input layer of our model, specifying an input shape of (224, 224, 3), indicating that it expects images of size 224x224 pixels with three color channels (RGB).

  • We pass the input images through the pre-trained base_model (EfficientNetV2B0) established earlier. Calling it with training=False runs the base model in inference mode, so its batch normalization layers keep using their pre-trained statistics; the weights themselves stay frozen because its trainable property was set to False.

  • Following the base model, we apply a global average pooling layer. This layer computes the average value for each feature map, reducing the spatial dimensions and providing a more compact representation of the image features.

  • To mitigate overfitting, we introduce a dropout layer with a dropout rate of 20%. Dropout randomly deactivates a fraction of neurons during each training iteration, promoting model generalization.

  • The final output layer consists of a dense layer with a number of units equal to num_classes. This layer employs the softmax activation function, which is appropriate for multi-class classification tasks. It produces probability distributions over the classes, determining the likelihood of each class for a given input.

  • We create the complete model by specifying its inputs and outputs. The resulting model, named "model_0," is ready for training and evaluation.

​

​

​

​

​

​

​

​

​

​

​

 

   Output:

​

​

​

​

​

​

​

​

​

​

​

​

​

​

 

Encapsulating the process of creating a neural network model, we introduce a reusable function called "create_model." This function enables the creation of various model variations with different configurations by adjusting its parameters.

  • The function accepts several optional arguments, such as whether to include the top layers of the model, the number of output classes, the input shape for images, whether to include preprocessing layers for image normalization, whether the base model should be trainable, the dropout rate, and the desired name for the model.

  • Inside the function, we initialize the base model using the EfficientNetV2B0 architecture.

  • We specify whether to freeze the base model's layers based on the "trainable" argument.

  • We create an input layer with the specified input shape (height, width, channels) and connect it to the base model.

  • The backbone of the model is formed by passing the input through the base model and adding a Global Average Pooling 2D layer for feature aggregation. A dropout layer with a rate of 0.2 is applied to improve generalization.

  • The final output layer, also known as the "classifier" layer, is added with the number of units equal to the specified "num_classes". The activation function used in this layer is "softmax". 

  • We connect the input and output layers to create the final model and return it.

  • We use this function to create "model_0" with the number of classes equal to the length of "class_names" (in this case, 120 dog breeds). The summary of "model_0" is then displayed, providing insights into its architecture and parameters.
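Putting the pieces together, a create_model function along the lines described might look like this. It is a sketch, not the original code: the argument names mirror the bullets above, while defaults such as model_name="model" and the layer names are hypothetical. Keep include_top=False, since the pooling and classifier layers expect feature maps rather than ImageNet's 1000-class output.

```python
def create_model(include_top=False, num_classes=120, input_shape=(224, 224, 3),
                 include_preprocessing=True, trainable=False, dropout=0.2,
                 model_name="model"):
    """EfficientNetV2B0 backbone + pooling + dropout + softmax classifier."""
    import tensorflow as tf  # lazy import; assumes TensorFlow >= 2.8

    base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
        include_top=include_top, weights="imagenet",
        input_shape=input_shape, include_preprocessing=include_preprocessing)
    base_model.trainable = trainable  # False freezes the pre-trained weights

    inputs = tf.keras.Input(shape=input_shape, name="input_layer")
    # training=False keeps batch normalization in inference mode.
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling")(x)
    x = tf.keras.layers.Dropout(dropout, name="dropout")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax",
                                    name="output_layer")(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs, name=model_name)
```

With this in place, model_0 = create_model(num_classes=len(class_names), model_name="model_0") followed by model_0.summary() reproduces the step in the last bullet.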

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

 

  

 Output:

​

​

​

Training "Model 0" on a 10% Subset of the Training Data

We proceed with the training of "Model 0," which is designed to classify images into one of 120 different dog breeds. We start by creating "Model 0" using the previously defined create_model function. This function initializes the model architecture with the specified number of output classes, which, in this case, is equal to the length of the "class_names" list (corresponding to 120 dog breeds).

​

After creating the model, we compile it with an optimizer, a loss function, and evaluation metrics. We then start training with the fit method, specifying the following parameters:

  •  train_10_percent_ds

This is our 10% subset of the training data, which will be used for model training.

​

  •  validation_data

The validation dataset is provided as "test_ds," allowing us to monitor the model's performance on unseen data.

​

  •  epochs

We set the number of training epochs to 5, meaning the model makes five complete passes over the 10% training subset.
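A sketch of the compile-and-fit step. The text doesn't specify the optimizer or loss, so these are assumptions: Adam with a learning rate of 1e-3, and sparse categorical cross-entropy, which suits integer labels (like the label 41 shown earlier) paired with a softmax output. compile_and_fit is a hypothetical helper name.

```python
def compile_and_fit(model, train_data, validation_data, epochs=5, learning_rate=1e-3):
    """Compile for integer labels and softmax outputs, train, and return the History."""
    import tensorflow as tf  # lazy import; assumes TensorFlow 2.x

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        # Integer class labels + probabilities from softmax, hence from_logits=False.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=["accuracy"],
    )
    return model.fit(train_data, validation_data=validation_data, epochs=epochs)


# Usage: history_0 = compile_and_fit(model_0, train_10_percent_ds, test_ds, epochs=5)
```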

​

​

​

Testing and Evaluating Model 0 on the Test Data

Here, we define a function named plot_model_loss_curves designed to visualize the training and validation loss, along with the training and validation accuracy, based on a given model's training history.

​

We extract the training accuracy, validation accuracy, training loss, and validation loss from the input history object. We define the range of epochs based on the length of the training history.
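A sketch of plot_model_loss_curves, with the extraction step separated out so it can be checked without drawing anything. It assumes the History was produced with metrics=["accuracy"] (so its keys are accuracy, val_accuracy, loss, and val_loss) and that matplotlib is installed; extract_curves is a hypothetical helper:

```python
def extract_curves(history_dict):
    """Pull accuracy/loss curves and an epoch range out of a Keras History.history dict."""
    accuracy = history_dict["accuracy"]
    val_accuracy = history_dict["val_accuracy"]
    loss = history_dict["loss"]
    val_loss = history_dict["val_loss"]
    epochs = range(len(accuracy))
    return epochs, accuracy, val_accuracy, loss, val_loss


def plot_model_loss_curves(history):
    """Plot training vs. validation loss and accuracy side by side."""
    import matplotlib.pyplot as plt  # lazy import: only needed to draw

    epochs, acc, val_acc, loss, val_loss = extract_curves(history.history)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    ax_loss.plot(epochs, loss, label="training_loss")
    ax_loss.plot(epochs, val_loss, label="val_loss")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("Epochs")
    ax_loss.legend()
    ax_acc.plot(epochs, acc, label="training_accuracy")
    ax_acc.plot(epochs, val_acc, label="val_accuracy")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("Epochs")
    ax_acc.legend()
    plt.show()
```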

​

​

​

​

​

​

​

​

​

​

​

​

​

​

 

Output:

​

​

​

​

​

​

​

​

​

​

​

​

Inference: 

  •  Training and Validation Accuracy

The training accuracy is consistently increasing, which indicates that the model is learning and improving at predicting the correct class on the training data. The validation accuracy also increases, but it seems to plateau or increase at a slower rate than the training accuracy. This could suggest that the model may be starting to overfit to the training data, as the gap between training and validation accuracy is widening over time.

​

  •  Training and Validation Loss

The training loss is steadily decreasing, which typically indicates that the model is learning effectively. The validation loss decreases as well but appears to level off as epochs increase. This pattern is similar to the accuracy graph and suggests that while the model is learning, it might not be generalizing as well to new, unseen data.

​

We proceed to assess the performance of "model_0" on the test dataset by utilizing the model_0.evaluate(test_ds) function. This evaluation step provides crucial insights into how well our model generalizes to previously unseen data and helps us gauge its overall effectiveness and reliability.

​

​

​

Output:

​

​

 

Inference:

  • "model_0" achieved an accuracy of around 80.14% on the test dataset, yet it also presents a relatively high loss of 0.9386. The high loss may be attributed to the limited amount of training data used, indicating that the model could potentially improve if trained on the full dataset.

6.png

Training "Model 1" on the Full Training Dataset

We continue by training "Model 1" on the full training dataset. As before, we create "Model 1" with the create_model function, setting the number of output classes to the length of the "class_names" list, which covers the 120 different dog breeds.

​

Subsequent to model creation, we proceed to the compilation stage, applying specific configuration settings. These settings include the optimizer, loss function, and evaluation metrics. Following compilation, we embark on the training process by invoking the fit method.

​

​

Evaluate Model 1 on the Test Data

We utilize the previously defined function to visualize the training and validation loss as well as the training and validation accuracy of Model 1's training history.

​

​

​

Output:

​

​

​

​

​

​

​

​

​

​

​

​

Inference:

  •  Training and Validation Accuracy

The training accuracy is quite high and increases over time, suggesting that the model is effectively learning from the training data. The validation accuracy starts off well but plateaus quickly, indicating that while the model performs well on the training data, its performance on unseen data (validation data) does not improve significantly after the initial epochs.

​

  •  Training and Validation Loss

The training loss decreases sharply and then flattens out, which is typical as the model begins to converge to a solution. The validation loss decreases much less sharply and starts to flatten out as well, albeit at a higher level than the training loss. This indicates that the model has a higher error rate on the validation data compared to the training data.

​

We then evaluate the performance of "model_1" on the test dataset using the model_1.evaluate(test_ds) function.

​

​

​

Output:

​

​

​

Inference:

  • The average loss across all test batches is 0.3617. The accuracy on the test dataset is about 88.29%. This means that approximately 88.29% of the predictions made by the model on the test dataset were correct. This output suggests that the model performed quite well on the test set, with a high accuracy rate.

7.png

Making and Evaluating Predictions with the Optimal Model

To thoroughly assess the best-performing model, we make predictions on the test dataset and evaluate their accuracy. We generate predictions with "model_1" using the model_1.predict(test_ds) function. Because the model's output layer uses softmax, these predictions are prediction probabilities (one per class) rather than logits.

​

​

​

​

Next, we check the shape of the first prediction in the test_preds array; its length should equal the number of classes in the classification task (120). We also use the tf.argmax function to find the index of the highest value in that prediction, which identifies the predicted class for that test image.
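The argmax step is simple enough to show in plain Python. This sketch picks the index of the largest probability and maps it to a class name; predicted_class is a hypothetical helper, and the three-class example is made up:

```python
def predicted_class(probabilities, class_names):
    """Return (class_name, probability) for the highest-probability class, like argmax."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best_index], probabilities[best_index]


# With the real arrays this mirrors: class_names[int(tf.argmax(test_preds[0]))]
print(predicted_class([0.05, 0.90, 0.05], ["chihuahua", "pug", "beagle"]))  # ('pug', 0.9)
```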


Output:


To facilitate additional analysis and evaluation of the test dataset, we employ two NumPy arrays.

  •  test_ds_images

This array is created by iterating through the test dataset with a list comprehension, extracting the images from each batch, and concatenating the batches along axis 0 (the batch axis) into a single continuous array. The result holds every image in the test dataset.

  •  test_ds_labels

Similarly, this array is formed by concatenating the labels associated with each batch in the test dataset. As with the images, we use a list comprehension to extract and concatenate labels, creating a single array containing all the corresponding labels.

 

As a result, these two arrays, test_ds_images and test_ds_labels, allow us to access individual images and their corresponding labels within the test dataset. For instance, by accessing test_ds_labels[0], we can retrieve the label of the first image in the test dataset, while test_ds_images[0] provides access to the pixel values of that same image.
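A minimal sketch of this concatenation, assuming the test dataset yields (images, labels) batches. Here a hypothetical list of NumPy batches stands in for test_ds (a real tf.data.Dataset can be iterated the same way, for example via test_ds.as_numpy_iterator()), and the images are tiny 8x8 arrays to keep the example light.

```python
import numpy as np

# Hypothetical stand-in for test_ds: a list of (images, labels) batches.
batch_size, height, width = 32, 8, 8
batches = []
for start in range(0, 70, batch_size):  # 70 images -> batches of 32, 32 and 6
    n = min(batch_size, 70 - start)
    images = np.zeros((n, height, width, 3), dtype=np.float32)
    labels = np.arange(start, start + n) % 120
    batches.append((images, labels))

# Concatenate every batch along axis 0 (the batch axis).
test_ds_images = np.concatenate([images for images, _ in batches], axis=0)
test_ds_labels = np.concatenate([labels for _, labels in batches], axis=0)

print(test_ds_images.shape)  # (70, 8, 8, 3)
print(test_ds_labels[0])     # label of the first test image
```

Note that np.concatenate joins along an existing axis, whereas np.stack would add a new one; concatenation is what turns a sequence of batches into one flat array of images.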


Output:


Here, we're visually comparing randomly selected images from the test dataset to assess our model's performance on these specific examples. Using random.sample(), we generate a list of 10 unique random indexes within the range of the number of images in the test dataset. These random indexes will allow us to select 10 random images from the test dataset for comparison. We loop through each subplot using enumerate(axes.flatten()), where ax represents an individual subplot in the grid. For each subplot, we do the following:

  • We retrieve the target image from the test_ds_images array based on the random index.

  • We find the true label of the target image using test_ds_labels and map it to its corresponding class name.

  • We access the prediction probabilities for the target image from test_preds generated by our model.

  • We determine the predicted class by using tf.argmax() to find the index with the highest probability and mapping that index to its class name.


We create a title for each subplot that includes the true label, predicted label, and the highest prediction probability. The title also indicates whether the prediction is correct or not, which is reflected in the title color (green for correct and red for incorrect).
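The per-subplot logic can be sketched as a small helper that returns the title text and its colour. The helper name describe_prediction and the toy data are hypothetical; in the plotting loop, the results would be passed to ax.imshow(test_ds_images[i]) and ax.set_title(title, color=color), which are omitted here.

```python
import random
import numpy as np

def describe_prediction(index, test_preds, test_ds_labels, class_names):
    """Build the subplot title and colour for one test image (hypothetical helper)."""
    true_name = class_names[test_ds_labels[index]]
    pred_probs = test_preds[index]
    pred_name = class_names[int(np.argmax(pred_probs))]
    title = f"True: {true_name}\nPred: {pred_name}\nProb: {pred_probs.max():.3f}"
    color = "green" if pred_name == true_name else "red"
    return title, color

# Hypothetical toy data: 3 classes, 5 test images.
class_names = ["beagle", "pug", "whippet"]
test_ds_labels = np.array([0, 1, 2, 0, 1])
test_preds = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],    # wrong: the true class is whippet
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],    # wrong: the true class is pug
])

# random.sample draws unique indexes, so no image is shown twice.
random_indexes = random.sample(range(len(test_ds_labels)), k=3)
for i in random_indexes:
    title, color = describe_prediction(i, test_preds, test_ds_labels, class_names)
    print(color, title.replace("\n", " | "))
```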


Output:


8.png

Calculating Accuracy per Class

In this section, we aim to compare the predictions made by our model (stored in test_preds) with the actual labels (ground truth) from the test dataset (stored in test_ds_labels) on a per-class basis. To facilitate this comparison, we create a new array called test_preds_labels.


We start by using NumPy's argmax function along with the axis parameter set to -1. The axis parameter specifies along which axis we want to perform the operation. In this case, axis=-1 means that we want to find the index of the maximum value along the last axis of the test_preds array. This effectively gives us the predicted class labels for each example in the test dataset.
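A small illustration of argmax along the last axis, using a hypothetical 4-image, 3-class test_preds array. It also shows that ties resolve to the first maximal index.

```python
import numpy as np

# Hypothetical toy test_preds: 4 images x 3 classes.
test_preds = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],   # tie: argmax returns the first maximal index
])

# axis=-1 takes the maximum over the last axis (the class axis) for every row.
test_preds_labels = np.argmax(test_preds, axis=-1)
print(test_preds_labels)  # [0 2 1 0]
```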


Output:


As a continuation of the analysis, we proceed to organize and evaluate the predictions made by our model on the test dataset. We begin by creating a DataFrame, which is a tabular data structure, to store and organize our prediction results. We define several columns for this DataFrame.

  •  test_pred_label

This column contains the predicted class labels for each image.

  •  test_pred_prob

It holds the maximum prediction probability for each image, representing how confident the model is in its prediction.

  •  test_pred_class_name

This column maps the predicted class labels to their corresponding class names using the class_names list.

  •  test_truth_label

This column stores the true class labels from the test dataset.

  •  test_truth_class_name

Similar to the previous column, this one maps true class labels to their class names.


Next, we determine whether each prediction is correct by comparing the predicted class names (test_pred_class_name) with the true class names (test_truth_class_name). The comparison yields a Boolean value for each row, which we convert into an integer (1 for True, 0 for False) with the .astype(int) method and store in a "correct" column.
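A minimal pandas sketch of this table, built from hypothetical toy arrays that reuse the names above (3 classes, 4 test images). The column names follow the text; the data are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs mirroring the names used above.
class_names = ["beagle", "pug", "whippet"]
test_preds = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
])
test_ds_labels = np.array([0, 1, 1, 2])
test_preds_labels = np.argmax(test_preds, axis=-1)

test_results_df = pd.DataFrame({
    "test_pred_label": test_preds_labels,
    "test_pred_prob": test_preds.max(axis=-1),  # confidence of the chosen class
    "test_pred_class_name": [class_names[i] for i in test_preds_labels],
    "test_truth_label": test_ds_labels,
    "test_truth_class_name": [class_names[i] for i in test_ds_labels],
})

# The comparison gives a Boolean Series; .astype(int) turns it into 1/0.
test_results_df["correct"] = (
    test_results_df["test_pred_class_name"] == test_results_df["test_truth_class_name"]
).astype(int)
print(test_results_df)
```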


Output:


As a follow-up to our analysis, we aim to calculate the accuracy of our model on a per-class basis. This means we want to understand how well our model performs for each individual class (dog breed) within the dataset.

We start by grouping the data in the test_results_df DataFrame by the "test_truth_class_name" column, which gathers the predictions and ground truth labels for each class. For each group (class), we calculate the mean of the "correct" column; because that column holds 1 for a correct prediction and 0 otherwise, the mean is the fraction of that class's images predicted correctly, i.e., its accuracy. We then create a new DataFrame called accuracy_per_class_df to store the accuracy values along with the corresponding class names. Finally, we sort this DataFrame in descending order of accuracy, with the best-performing classes at the top.
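The grouping step can be sketched as follows. The toy results table is hypothetical, and so are the output column names "class_name" and "accuracy", which the text does not specify.

```python
import pandas as pd

# Hypothetical toy results table with the columns described above.
test_results_df = pd.DataFrame({
    "test_truth_class_name": ["beagle", "beagle", "pug", "pug", "pug", "whippet"],
    "correct":               [1,        1,        1,     0,     0,     1],
})

# Mean of a 0/1 column per group = fraction of correct predictions = class accuracy.
accuracy_per_class = test_results_df.groupby("test_truth_class_name")["correct"].mean()

accuracy_per_class_df = (
    accuracy_per_class.reset_index()
    .rename(columns={"test_truth_class_name": "class_name", "correct": "accuracy"})
    .sort_values("accuracy", ascending=False)
    .reset_index(drop=True)
)
print(accuracy_per_class_df)
```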


Output:


We proceed to generate a horizontal bar chart that illustrates the model's accuracy for each dog breed class.

We use plt.barh to create a horizontal bar chart. The "y" parameter specifies the class names from the accuracy_per_class_df, and the "width" parameter specifies the corresponding accuracy values. This means each class name is associated with its accuracy score, and the length of the horizontal bars represents the accuracy.
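A minimal plt.barh sketch on a hypothetical three-class accuracy table (the column names are assumptions). The off-screen Agg backend lets it run without a display, and inverting the y-axis keeps the best-performing class at the top, matching the descending sort.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-class accuracy table, already sorted best-first.
accuracy_per_class_df = pd.DataFrame({
    "class_name": ["beagle", "whippet", "pug"],
    "accuracy":   [1.0, 0.9, 0.33],
})

plt.figure(figsize=(6, 3))
bars = plt.barh(y=accuracy_per_class_df["class_name"],
                width=accuracy_per_class_df["accuracy"])
plt.gca().invert_yaxis()  # first row (best class) drawn at the top
plt.xlabel("Accuracy")
plt.xlim(0, 1)
plt.tight_layout()
```

With 120 breeds, a taller figure (for example figsize=(8, 30)) keeps the class labels readable.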


Output:


9.png

Identifying the Most Incorrect Predictions

Now, let's identify the 100 incorrect predictions the model made with the highest confidence. We start by filtering the test_results_df DataFrame to the rows where the "correct" column equals 0, i.e., the predictions the model got wrong. We then sort these rows by the "test_pred_prob" column in descending order and keep the first 100, giving the wrong predictions the model was most confident about.
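A pandas sketch of the filter, sort, and slice steps on a hypothetical six-row results table. With only four wrong rows, .head(100) simply returns all of them. The original index labels are kept, so each row still points back to its position in the test set.

```python
import pandas as pd

# Hypothetical toy results; the real table has one row per test image.
test_results_df = pd.DataFrame({
    "test_pred_prob": [0.95, 0.40, 0.88, 0.99, 0.55, 0.70],
    "correct":        [0,    0,    1,    0,    0,    1],
})

top_100_most_wrong = (
    test_results_df[test_results_df["correct"] == 0]     # wrong predictions only
    .sort_values("test_pred_prob", ascending=False)      # most confident first
    .head(100)                                           # keep at most 100 rows
)
print(top_100_most_wrong)
```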


Output:


We continue to analyze our model's performance by visually comparing a random selection of 10 images from the top 100 most incorrect predictions made by the model. We begin by selecting 10 random indexes from the top 100 most incorrect predictions using the top_100_most_wrong DataFrame. These indexes correspond to the specific predictions we want to examine further.


Using a loop, we iterate through each subplot in the grid. For each subplot, we take the next index from our random selection and use it to extract the target image, the true label of the image, the model's predicted probabilities for each class, and the predicted class itself.
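A sketch of the sampling step, assuming test_results_df uses the default 0-based index so that each index label in top_100_most_wrong equals the image's position in test_ds_images. The arrays are hypothetical toys, and 2 samples are drawn instead of 10 to fit the small example.

```python
import random
import numpy as np
import pandas as pd

# Hypothetical toy inputs: 6 tiny "images" and a table of wrong predictions
# whose index labels point back into test_ds_images.
test_ds_images = np.arange(6 * 2 * 2).reshape(6, 2, 2)
top_100_most_wrong = pd.DataFrame(
    {"test_pred_prob": [0.99, 0.95, 0.55, 0.40]},
    index=[3, 0, 4, 1],
)

k = min(2, len(top_100_most_wrong))  # the article samples 10
random_wrong_indexes = random.sample(list(top_100_most_wrong.index), k=k)

for target_index in random_wrong_indexes:
    target_image = test_ds_images[target_index]  # index label = position in test set
    pred_prob = top_100_most_wrong.loc[target_index, "test_pred_prob"]
    print(target_index, target_image.shape, pred_prob)
```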


Output:


10.png

Generating a Confusion Matrix for Model Evaluation

Next, we create a confusion matrix to evaluate how our "Dog Vision" model performs on the test dataset. A confusion matrix counts, for each true class, how many images were assigned to each predicted class, which shows which breeds the model confuses with one another.


We start by importing the necessary libraries, including confusion_matrix and ConfusionMatrixDisplay from the scikit-learn library. We compute the confusion matrix by comparing the true labels (test_ds_labels) with the predicted labels (test_preds_labels) obtained from our model. The result is stored in the confusion_matrix_dog_preds variable. We use the ConfusionMatrixDisplay class to create a visualization of the confusion matrix. This display allows us to see how well the model's predictions align with the true labels.
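A small scikit-learn sketch on hypothetical labels for three classes. Passing labels explicitly makes sure every class gets a row and a column even if it never appears among the predictions, which matters with 120 breeds. Calling display.plot() would draw the matrix with matplotlib; that step is omitted here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical toy labels: 3 classes, 6 test images.
class_names = ["beagle", "pug", "whippet"]
test_ds_labels = np.array([0, 0, 1, 1, 2, 2])
test_preds_labels = np.array([0, 1, 1, 1, 2, 0])

# Rows are true labels, columns are predicted labels.
confusion_matrix_dog_preds = confusion_matrix(
    y_true=test_ds_labels,
    y_pred=test_preds_labels,
    labels=list(range(len(class_names))),
)
print(confusion_matrix_dog_preds)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

display = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix_dog_preds,
                                 display_labels=class_names)
```

The diagonal holds the correct predictions, so its sum divided by the total count gives the overall accuracy (4 of 6 here).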


Output:


11.png

Saving and Loading the Best Model

Finally, we save and reload our trained machine learning model, "model_1". This stores the model's architecture and weights in a file for future use or sharing with others. We start by calling the save method to write "model_1" to a file named "dog_vision_model.keras", which contains the model's architecture and trained weights.


Next, we load the saved model using the tf.keras.models.load_model function. This action retrieves the model's architecture and weights from the saved file and stores them in the "loaded_model" variable. We then evaluate the performance of the loaded model on our test dataset ("test_ds") using the evaluate method. This step helps us verify that the loaded model performs the same as the original "model_1" before saving.

 

We use an assert statement to check if the results of the evaluation for "model_1" and the loaded model ("loaded_model") are the same. If they are not, the assertion will raise an error, indicating a discrepancy in model behavior.
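Comparing floating-point metrics with exact equality can fail on tiny numerical differences (for example from nondeterministic GPU kernels). A tolerance-based check is one option. In this sketch, the two [loss, accuracy] lists are hypothetical stand-ins for the values returned by model_1.evaluate(test_ds) and loaded_model.evaluate(test_ds).

```python
import numpy as np

# Hypothetical [loss, accuracy] results from the original and reloaded models.
model_1_results = [0.3617, 0.8829]
loaded_model_results = [0.36170002, 0.8829]

# Passes when the values agree to within a relative tolerance of 1e-5,
# and raises an AssertionError otherwise.
np.testing.assert_allclose(model_1_results, loaded_model_results, rtol=1e-5)
print("Loaded model matches the original.")
```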


Output:


Visualizing Model Predictions on Custom Images

In this section, we begin by displaying custom images that were manually uploaded to Google Drive, arranged in a grid. The variable custom_image_paths holds a list of strings, each pointing to where an image is stored on disk.
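One way to build such a list is to scan a folder for image files. The helper find_custom_image_paths and its extension list are hypothetical, and the demo uses a temporary directory rather than the project's Drive folder.

```python
import tempfile
from pathlib import Path

def find_custom_image_paths(image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted string paths of image files in image_dir (hypothetical helper)."""
    return sorted(
        str(path) for path in Path(image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )

# Demo on a temporary folder: only the two image files are returned.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["b.jpg", "a.JPG", "notes.txt"]:
        (Path(demo_dir) / name).touch()
    demo_paths = [Path(p).name for p in find_custom_image_paths(demo_dir)]

print(demo_paths)  # ['a.JPG', 'b.jpg']
```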


Output:


To make sure custom images reach our trained model in the same format as the images it was trained on, we define a function called pred_on_custom_image. Inside the function:

  • The custom image is loaded and prepared with the specified target size.

  • The image is converted into a tensor.

  • A batch dimension is added to the tensor to match the model's input requirements.

  • The model makes predictions on the custom image, resulting in prediction probabilities for each class.

  • The class with the highest prediction probability is identified as the predicted class.
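The steps after loading can be sketched with NumPy. This is not the article's function: it assumes the image has already been loaded and resized (in Keras, for instance, with tf.keras.utils.load_img(path, target_size=...) followed by tf.keras.utils.img_to_array), and it uses a hypothetical DummyModel with a Keras-style predict method so the sketch runs on its own.

```python
import numpy as np

def pred_on_custom_image(image, model, class_names):
    """Sketch of the post-loading steps: batch the image, predict, pick the top class.

    `image` is assumed to be an already-resized array of shape (height, width, 3).
    """
    image = np.asarray(image, dtype=np.float32)   # array form of the image
    batch = np.expand_dims(image, axis=0)         # add batch dim: (1, height, width, 3)
    pred_probs = model.predict(batch)[0]          # one probability per class
    pred_class = class_names[int(np.argmax(pred_probs))]
    return pred_probs, pred_class


class DummyModel:
    """Hypothetical stand-in with a Keras-style predict(batch) method."""
    def predict(self, batch):
        assert batch.ndim == 4, "expects a batch dimension"
        return np.array([[0.1, 0.7, 0.2]] * batch.shape[0])


probs, label = pred_on_custom_image(np.zeros((8, 8, 3)), DummyModel(),
                                    ["beagle", "labrador_retriever", "pug"])
print(label)  # labrador_retriever
```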


We build upon the groundwork laid out earlier by using the pred_on_custom_image function. Our objective is to extend our model's predictions to multiple custom images and present the outcomes in a visually informative grid format.


Output:


Inference:

  •  The first three images

The model correctly identifies the dogs as "labrador_retriever", which matches the visual cues in the images. Labradors are a distinctive breed, and these images are clear and well focused on the dogs, so correct predictions here are consistent with the model's high test accuracy.


  •  The fourth image

The model classifies a human as "boston_bull", which is a misclassification. This is not a random slip: the model can only choose among the dog breeds it was trained on, so it assigns every image, even one with no dog in it, to some breed. Because the test set contains only dog images, the high test accuracy tells us nothing about how often this happens on images without dogs.


Overall, the model appears to be quite accurate at identifying dog breeds, but it may not be robust when no dog is present in the image. The relatively low test loss (0.3617) suggests that the model usually assigns a high probability to the correct breed. The model may benefit from further training on a more diverse set of images, including images without dogs, to improve its reliability in such scenarios.

12.png
13.png