By Randley Morales¶
Problem Statement¶
Business Context¶
Workplace safety in hazardous environments like construction sites and industrial plants is crucial to prevent accidents and injuries. One of the most important safety measures is ensuring workers wear safety helmets, which protect against head injuries from falling objects and machinery. Non-compliance with helmet regulations increases the risk of serious injuries or fatalities, making effective monitoring essential, especially in large-scale operations where manual oversight is prone to errors and inefficiency.
To overcome these challenges, SafeGuard Corp plans to develop an automated image analysis system capable of detecting whether workers are wearing safety helmets. This system will improve safety enforcement, ensuring compliance and reducing the risk of head injuries. By automating helmet monitoring, SafeGuard aims to enhance efficiency, scalability, and accuracy, ultimately fostering a safer work environment while minimizing human error in safety oversight.
Objective¶
As a data scientist at SafeGuard Corp, you are tasked with developing an image classification model that classifies images into one of two categories:
- With Helmet: Workers wearing safety helmets.
- Without Helmet: Workers not wearing safety helmets.
Data Description¶
The dataset consists of 631 images, equally divided into two categories:
- With Helmet: 311 images showing workers wearing helmets.
- Without Helmet: 320 images showing workers not wearing helmets.
Dataset Characteristics:
- Variations in Conditions: Images include diverse environments such as construction sites, factories, and industrial settings, with variations in lighting, angles, and worker postures to simulate real-world conditions.
- Worker Activities: Workers are depicted in different actions such as standing, using tools, or moving, ensuring robust model learning for various scenarios.
Installing and Importing the Necessary Libraries¶
!pip install tensorflow[and-cuda] numpy==1.26.4 -q
import tensorflow as tf
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print(tf.__version__)
Num GPUs Available: 0 2.19.0
Note:
After running the above cell, kindly restart the notebook kernel (for Jupyter Notebook) or runtime (for Google Colab) and run all cells sequentially from the next cell.
On executing the above cell, you might see a warning regarding package dependencies. This warning can be safely ignored, as the code above ensures that all necessary libraries and their dependencies are in place to run the rest of this notebook.
import os
import random
import numpy as np  # Importing NumPy for array and matrix operations
import pandas as pd  # Importing pandas to read CSV files
import seaborn as sns  # Importing seaborn for statistical plots
import matplotlib.image as mpimg  # Importing matplotlib.image to read image files
import matplotlib.pyplot as plt  # Importing matplotlib for plotting and visualizing images
import math  # Importing the math module to perform mathematical operations
import cv2  # Importing OpenCV for image processing
# TensorFlow / Keras modules
import keras
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # ImageDataGenerator for data augmentation
from tensorflow.keras.models import Sequential, Model  # Model classes used to define our networks
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization  # Layers used to build our CNN models
from tensorflow.keras.optimizers import Adam, SGD  # Optimizers that can be used in our models
from tensorflow.keras.applications.vgg16 import VGG16  # Pretrained VGG16 base for transfer learning
# scikit-learn utilities
from sklearn import preprocessing  # Preprocessing module
from sklearn.model_selection import train_test_split  # train_test_split to split the data into train, validation, and test sets
# Functions for evaluating the performance of classification models
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, recall_score, precision_score, classification_report
# Display images using OpenCV in Colab
from google.colab.patches import cv2_imshow
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
# Set the seed using keras.utils.set_random_seed. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
tf.keras.utils.set_random_seed(812)
Data Overview¶
Loading the data¶
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
# Load the image file of the dataset
images = np.load("/content/drive/MyDrive/Computer Vision/images_proj.npy")
# Load the labels file of the dataset
labels = pd.read_csv("/content/drive/MyDrive/Computer Vision/Labels_proj.csv")
print(f"Image data shape: {images.shape}")
print(f"Labels data shape: {labels.shape}")
Image data shape: (631, 200, 200, 3) Labels data shape: (631, 1)
Observations:¶
Dataset Dimensions & Structure:
Sample Size: The dataset is relatively small, containing 631 images. This is confirmed by the shape output (631, ...) for both images and labels.
Image Resolution: The images are 200x200 pixels.
Color Channels: The images have 3 channels, indicating they are color images (RGB/BGR).
Consistency: The number of labels (631) matches the number of images perfectly, so there is no missing data alignment issue at this stage.
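To back these observations, a quick sanity check can confirm the alignment and the raw pixel range (a minimal sketch, assuming the images and labels arrays loaded above):
# Minimal sanity check on the loaded arrays
assert images.shape[0] == labels.shape[0], "Image/label count mismatch"
print(images.dtype, images.min(), images.max())  # expect uint8 values in [0, 255]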
Exploratory Data Analysis¶
Plot random images from each of the classes and print their corresponding labels.¶
# Converting the images from BGR to RGB using cvtColor function of OpenCV
for i in range(len(images)):
images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB)
# Get indices of each class
helmet_indices = np.where(labels == 1)[0]
no_helmet_indices = np.where(labels == 0)[0]
# Choose 10 random images from each class
helmet_samples = np.random.choice(helmet_indices, 10, replace=False)
no_helmet_samples = np.random.choice(no_helmet_indices, 10, replace=False)
# Combine them into one array (20 images total)
selected_indices = np.concatenate([helmet_samples, no_helmet_samples])
# OPTIONAL: Shuffle so they are not grouped
np.random.shuffle(selected_indices)
# Create a 4x5 subplot grid
fig, axes = plt.subplots(4, 5, figsize=(15, 12))
axes = axes.flatten()
# Display each selected image
for i, idx in enumerate(selected_indices):
axes[i].imshow(images[idx])
title = "Worker WITH Helmet" if labels.loc[idx, 'Label'] == 1 else "Worker WITHOUT Helmet"
axes[i].set_title(title)
axes[i].axis('equal')
plt.tight_layout()
plt.show()
Observations:¶
- Task Identification
  - Binary Classification: As the helmet_indices and no_helmet_indices variables suggest, the goal of this project is helmet detection with two classes:
    - Class 1: Helmet (indices where the label is 1)
    - Class 0: No Helmet (indices where the label is 0)
- Data Preprocessing
  - Color Space Conversion: The code explicitly converts images from BGR to RGB (cv2.cvtColor(..., cv2.COLOR_BGR2RGB)). This suggests the original data was saved or processed with OpenCV (which defaults to Blue-Green-Red). The conversion is crucial: without it, libraries like Matplotlib would display the color channels swapped (e.g., red objects would appear blue).
  - File Formats: The images are loaded from a .npy file (NumPy binary), which is faster and more efficient for loading large array data than reading individual image files (JPG/PNG). The labels are loaded from a standard .csv file.
- Significant Scale and Crop Discrepancy (Potential Bias)
  - The most critical observation is a distinct difference in how the two classes are framed:
    - "Worker WITHOUT Helmet" (Class 0): These images appear to be tight crops of faces, zoomed in significantly and often cutting off the forehead or chin.
    - "Worker WITH Helmet" (Class 1): These images are generally zoomed out, showing the upper body, shoulders, and significant background context (construction sites, machinery, etc.).
  - Why this matters: This is a major risk for machine learning models. My model might learn to classify images based on "zoom level" or "amount of background clutter" rather than the actual presence of a helmet, assuming that "face filling the whole frame = No Helmet" and "small face with background = Helmet" (see the probe sketched after this list).
- Visual Diversity
  - Helmet Colors: The "With Helmet" class shows good diversity in helmet colors (yellow, red, white/blue), which will help the model generalize rather than associate "Helmet" with just the color yellow.
  - Lighting: There is significant variation in lighting, from bright outdoor sunlight to darker indoor/tunnel environments.
- Code Logic
  - Balanced Sampling: The code explicitly selects 10 random images from the "helmet" list and 10 from the "no_helmet" list. This is good practice for visualization because it ensures a representative sample of both classes, even if the underlying dataset is imbalanced.
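One way to probe the framing bias flagged above is a crude statistic. The sketch below (a rough heuristic under my own assumptions, not a definitive test) compares per-image pixel standard deviation, a loose proxy for background clutter, across the two classes:
# Crude probe for the framing/zoom bias: per-image pixel std as a clutter proxy
stds = images.reshape(len(images), -1).std(axis=1)
lab = labels.to_numpy().reshape(-1)
print("Mean pixel std, WITH helmet:   ", stds[lab == 1].mean().round(2))
print("Mean pixel std, WITHOUT helmet:", stds[lab == 0].mean().round(2))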
Checking for class imbalance¶
# Create a count plot
plt.figure(figsize=(6, 4))
ax = sns.countplot(x=labels.iloc[:, 0], palette=['red', 'green'])
# Add exact counts on top of bars
for p in ax.patches:
ax.annotate(f'{int(p.get_height())}', (p.get_x() + p.get_width() / 2, p.get_height()),
ha='center', va='bottom', fontsize=10, )
# Add labels
plt.xlabel("Class Labels", fontsize=12)
plt.ylabel("Number of Images", fontsize=12)
plt.title("Count of Images per Class", fontsize=14)
plt.xticks(ticks=[0, 1], labels=["Without Helmet (0)", "With Helmet (1)"]) # Rename x-axis labels
# Show plot
plt.show()
Observations:¶
Class Balance
The dataset exhibits a near-perfect balance between the two classes, with a distribution of approximately 50.7% versus 49.3%. Consequently, no resampling techniques (oversampling or undersampling) are required to address class imbalance.
- Without Helmet (0): 320 images
- With Helmet (1): 311 images
- Total: 631 images (matching the shapes reported earlier)
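The proportions quoted above can be reproduced with a one-liner on the labels DataFrame loaded earlier:
# Class proportions, for reference
print(labels.iloc[:, 0].value_counts(normalize=True).round(3))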
Data Preprocessing¶
# Function to plot the original and processed images side by side
def grid_plot(img1, img2, gray=False):
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(img1)
axes[0].set_title('Original Image')
axes[0].axis('off')
if gray:
axes[1].imshow(img2,cmap='gray')
else:
axes[1].imshow(img2)
axes[1].set_title('Processed Image')
axes[1].axis('off')
plt.show()
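As a usage example (the image index is arbitrary; any processed image works), the function can be called like this:
# Example usage of grid_plot: original vs. a grayscale version of the same image
sample = images[0]
gray_sample = cv2.cvtColor(sample, cv2.COLOR_RGB2GRAY)
grid_plot(sample, gray_sample, gray=True)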
Converting images to grayscale¶
images_gray = []
for i in range(len(images)):
    img_gray = cv2.cvtColor(images[i], cv2.COLOR_RGB2GRAY)  # Convert to grayscale (images are RGB after the earlier conversion)
    images_gray.append(img_gray)
# Display a sample Original vs Grayscale image
n = random.randint(0, len(images_gray) - 1)
orig = images[n]
gray = images_gray[n]
plt.figure(figsize=(8, 4))
# Original image
plt.subplot(1, 2, 1)
plt.imshow(orig)
plt.title("Original")
plt.axis('equal')
# Grayscale (preprocessed) image
plt.subplot(1, 2, 2)
plt.imshow(gray, cmap='gray')
plt.title("Grayscale (Preprocessed)")
plt.axis('equal')
plt.show()
# Choose how many original+gray pairs to include
num_pairs = 4 # change this value as you like
# Build a list that alternates: [orig1, gray1, orig2, gray2, ...]
mixed_images = []
for i in range(num_pairs):
mixed_images.append(images[i]) # original
mixed_images.append(images_gray[i]) # grayscale
cols = 4
rows = math.ceil(len(mixed_images) / cols)
plt.figure(figsize=(12, 3 * rows))
for idx, img in enumerate(mixed_images):
plt.subplot(rows, cols, idx + 1)
if idx % 2 == 1: # grayscale images at odd indices
plt.imshow(img, cmap='gray')
plt.title(f"Grayscale (Preprocessed)")
else:
plt.imshow(img)
plt.title(f"Original")
plt.axis("equal")
plt.tight_layout()
plt.show()
Splitting the dataset¶
# Split the data into train, validation, and test sets with stratification to maintain class balance.
X_train, X_temp, y_train, y_temp = train_test_split(np.array(images),labels , test_size=0.3, random_state=42, stratify=labels)
# Further split the temp set equally into validation and test sets, again with stratification.
X_val, X_test, y_val, y_test = train_test_split(X_temp,y_temp , test_size=0.5, random_state=42,stratify=y_temp)
# Print the shapes of the resulting splits to verify the split sizes.
print(X_train.shape,y_train.shape)
print(X_val.shape,y_val.shape)
print(X_test.shape,y_test.shape)
(441, 200, 200, 3) (441, 1) (95, 200, 200, 3) (95, 1) (95, 200, 200, 3) (95, 1)
Observations:¶
- Robust Splitting Strategy (70-15-15 Split)
The code implements a standard and effective 3-way split for machine learning:
Training Set: 441 images (~70%) — Used to teach the model.
Validation Set: 95 images (~15%) — Used to tune hyperparameters and check for overfitting during training.
Test Set: 95 images (~15%) — Used for the final unbiased evaluation.
- Data Integrity Check
- The shapes printed in the output (441, 95, 95) sum up exactly to 631, which confirms that no data was lost or mishandled during the splitting process.
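To double-check that stratification worked, a short sketch can print the positive-class fraction in each split (it should be roughly 0.49 everywhere):
# Verifying that stratification preserved the class ratio in each split
for name, y in [("train", y_train), ("val", y_val), ("test", y_test)]:
    frac = y.to_numpy().reshape(-1).mean()  # fraction of images with helmet (label 1)
    print(f"{name}: {frac:.3f}")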
Data Normalization¶
# Normalizing the image pixels
X_train_normalized = X_train.astype('float32')/255.0
X_val_normalized = X_val.astype('float32')/255.0
X_test_normalized = X_test.astype('float32')/255.0
Observations:¶
- Data Normalization
Rescaling: The code rescales the pixel intensity values from the range [0, 255] to [0, 1] by dividing by 255.0.
Why this is important: Neural networks converge much faster and more stably when input values are small (between 0 and 1). Large integer inputs (like 255) can lead to unstable gradients, making the model hard to train.
Precision: Converting to float32 is the standard for Deep Learning. It offers sufficient precision for training while using half the memory of the default float64, which is important for GPU performance.
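An optional quick verification that the rescaled pixels now span [0, 1]:
# Confirming the normalized range and dtype
print(X_train_normalized.dtype, X_train_normalized.min(), X_train_normalized.max())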
Model Building¶
Model Evaluation Criterion¶
Utility Functions¶
# Defining a function to compute different metrics to check the performance of a classification model
def model_performance_classification(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # Checking which predicted probabilities are greater than the 0.5 threshold
    pred = model.predict(predictors).reshape(-1) > 0.5
    target = target.to_numpy().reshape(-1)
acc = accuracy_score(target, pred) # to compute Accuracy
recall = recall_score(target, pred, average='weighted') # to compute Recall
precision = precision_score(target, pred, average='weighted') # to compute Precision
f1 = f1_score(target, pred, average='weighted') # to compute F1-score
# creating a dataframe of metrics
df_perf = pd.DataFrame({"Accuracy": acc, "Recall": recall, "Precision": precision, "F1 Score": f1,},index=[0],)
return df_perf
def plot_confusion_matrix(model, predictors, target):
    """
    Function to plot the confusion matrix
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # Checking which predicted probabilities are greater than the 0.5 threshold
    pred = model.predict(predictors).reshape(-1) > 0.5
    target = target.to_numpy().reshape(-1)
    # Computing the confusion matrix with TensorFlow's built-in confusion_matrix() function
    # (stored as `cm` to avoid shadowing sklearn's imported confusion_matrix)
    cm = tf.math.confusion_matrix(target, pred)
    f, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(
        cm,
        annot=True,
        linewidths=.4,
        fmt="d",
        square=True,
        ax=ax
    )
    plt.show()
Model 1: Simple Convolutional Neural Network (CNN)¶
# Initializing Model
model_1 = Sequential()
# Convolutional layers
model_1.add(Conv2D(32, (3, 3), activation='relu', padding="same", input_shape=(200, 200, 3)))  # Input shape matches the 200x200 RGB images
model_1.add(MaxPooling2D((4, 4), padding='same'))
model_1.add(Conv2D(64, (3, 3), activation='relu', padding="same")) # 64 filters, 3x3 kernel, ReLU
model_1.add(MaxPooling2D((2, 2), padding='same')) # 2x2 pooling
model_1.add(Conv2D(128, (3, 3), activation='relu', padding="same")) # 128 filters, 3x3 kernel, ReLU
# Flatten and Dense layers
model_1.add(Flatten())
model_1.add(Dense(4, activation='relu'))
# For binary classification → 1 neuron with sigmoid
model_1.add(Dense(1, activation='sigmoid')) # output layer
# Compile with Adam Optimizer
opt = Adam(learning_rate=0.001) # typical starting learning rate
model_1.compile(optimizer=opt, loss='binary_crossentropy', metrics=["accuracy", "Precision"])
# Summary
model_1.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d (Conv2D) │ (None, 200, 200, 32) │ 896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d (MaxPooling2D) │ (None, 50, 50, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_1 (Conv2D) │ (None, 50, 50, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_1 (MaxPooling2D) │ (None, 25, 25, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_2 (Conv2D) │ (None, 25, 25, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten (Flatten) │ (None, 80000) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense (Dense) │ (None, 4) │ 320,004 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 1) │ 5 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 413,257 (1.58 MB)
Trainable params: 413,257 (1.58 MB)
Non-trainable params: 0 (0.00 B)
history_1 = model_1.fit(
X_train_normalized, y_train,
epochs=20, # number of epochs
validation_data=(X_val_normalized, y_val),
shuffle=True,
batch_size=128, # batch size
verbose=2
)
Epoch 1/20 4/4 - 21s - 5s/step - Precision: 0.4627 - accuracy: 0.4649 - loss: 0.6986 - val_Precision: 0.0000e+00 - val_accuracy: 0.5053 - val_loss: 0.7471 Epoch 2/20 4/4 - 17s - 4s/step - Precision: 0.0000e+00 - accuracy: 0.5079 - loss: 0.6745 - val_Precision: 0.0000e+00 - val_accuracy: 0.5053 - val_loss: 0.6903 Epoch 3/20 4/4 - 18s - 4s/step - Precision: 0.9730 - accuracy: 0.6667 - loss: 0.6274 - val_Precision: 0.6528 - val_accuracy: 0.7368 - val_loss: 0.6177 Epoch 4/20 4/4 - 19s - 5s/step - Precision: 0.8366 - accuracy: 0.9002 - loss: 0.4113 - val_Precision: 1.0000 - val_accuracy: 0.9579 - val_loss: 0.3200 Epoch 5/20 4/4 - 17s - 4s/step - Precision: 0.9851 - accuracy: 0.9501 - loss: 0.2409 - val_Precision: 0.9592 - val_accuracy: 0.9789 - val_loss: 0.1130 Epoch 6/20 4/4 - 21s - 5s/step - Precision: 0.9773 - accuracy: 0.9841 - loss: 0.0642 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0334 Epoch 7/20 4/4 - 18s - 5s/step - Precision: 0.9907 - accuracy: 0.9887 - loss: 0.0428 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0469 Epoch 8/20 4/4 - 22s - 5s/step - Precision: 1.0000 - accuracy: 0.9864 - loss: 0.0372 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0766 Epoch 9/20 4/4 - 17s - 4s/step - Precision: 0.9819 - accuracy: 0.9909 - loss: 0.0147 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0658 Epoch 10/20 4/4 - 17s - 4s/step - Precision: 1.0000 - accuracy: 0.9955 - loss: 0.0087 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0465 Epoch 11/20 4/4 - 21s - 5s/step - Precision: 0.9954 - accuracy: 0.9955 - loss: 0.0127 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.1730 Epoch 12/20 4/4 - 18s - 4s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 0.0019 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0510 Epoch 13/20 4/4 - 17s - 4s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 0.0025 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0348 Epoch 14/20 4/4 - 18s - 4s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 0.0021 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0880 Epoch 15/20 4/4 - 20s - 5s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 0.0018 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.1079 Epoch 16/20 4/4 - 29s - 7s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 0.0012 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0915 Epoch 17/20 4/4 - 31s - 8s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 1.7766e-04 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0768 Epoch 18/20 4/4 - 18s - 4s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 8.6354e-05 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0669 Epoch 19/20 4/4 - 17s - 4s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 1.2064e-04 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0631 Epoch 20/20 4/4 - 22s - 6s/step - Precision: 1.0000 - accuracy: 1.0000 - loss: 1.7029e-04 - val_Precision: 0.9792 - val_accuracy: 0.9895 - val_loss: 0.0658
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
model_1_train_perf = model_performance_classification(model_1, X_train_normalized,y_train)
print("Train performance metrics")
print(model_1_train_perf)
14/14 ━━━━━━━━━━━━━━━━━━━━ 12s 856ms/step Train performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_1,X_train_normalized,y_train)
14/14 ━━━━━━━━━━━━━━━━━━━━ 5s 326ms/step
model_1_valid_perf = model_performance_classification(model_1, X_val_normalized,y_val)
print("Validation performance metrics")
print(model_1_valid_perf)
3/3 ━━━━━━━━━━━━━━━━━━━━ 2s 511ms/step Validation performance metrics Accuracy Recall Precision F1 Score 0 0.989474 0.989474 0.989693 0.989474
plot_confusion_matrix(model_1,X_val_normalized,y_val)
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 328ms/step
Visualizing the predictions¶
# The indices you want to inspect
indices = [12, 21, 33, 41]
plt.figure(figsize=(8, 8))
for i, idx in enumerate(indices):
# Plot image
plt.subplot(2, 2, i + 1)
plt.imshow(X_val[idx])
plt.axis("equal")
# Make prediction for the corresponding index
pred = model_1.predict(X_val_normalized[idx].reshape(1, 200, 200, 3))[0][0]
predicted_label = 1 if pred > 0.5 else 0
# True label
true_label = int(y_val.iloc[idx, 0])  # scalar value from the single-column DataFrame
# Set title with prediction + truth
plt.title(f"Index {idx}\nPred: {predicted_label} | True: {true_label}")
plt.tight_layout()
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Observations:¶
- What I’m seeing in my CNN architecture:
Input: (200, 200, 3) RGB images.
Backbone: 3 conv blocks (32 → 64 → 128 filters, 3×3, ReLU, padding="same").
Pooling choice: my first MaxPool is (4×4), which is a very aggressive downsample:
200×200 → 50×50 immediately. This can work, but it can also throw away fine details early.
Big flatten: after the last conv I have 25×25×128 = 80,000 features → Flatten().
This creates a classic “CNN → huge vector → dense” pattern that often overfits on small datasets.
Dense head is tiny: Dense(4) then Dense(1) sigmoid.
Interestingly, most of my parameters are in that Dense(4) because of the huge flatten:
Dense(4) params ≈ 80,000×4 + 4 = 320,004
Total params ≈ 413,257
So even though the head looks small, the model still has a lot of capacity.
- Training behavior (from my logs/plots)
Very fast convergence: by around epoch 4–6 my validation accuracy is already ~0.98–0.99, and then it stays flat.
Training becomes perfect: my training accuracy reaches 1.0, my training loss goes extremely close to 0, and my training confusion matrix is perfect (224 TN, 217 TP, 0 FP, 0 FN).
Validation is almost perfect: my validation confusion matrix shows 47 TN, 47 TP, 1 FP, 0 FN.
So my only error is one false positive.
This matches my validation precision/recall being ~0.989.
- Important “sanity” observations (potential red flags)
My dataset is small (based on the confusion matrices):
Train size = 224 + 217 = 441
Val size = 47 + 1 + 47 = 95. With image models, results can look amazing on small splits even when real-world generalization is weaker.
Epoch 1–2 behavior: my val_precision = 0 while my val_accuracy ≈ 0.505.
That usually happens when my model predicts almost everything as the negative class early on (or produces no positive predictions at a 0.5 threshold).
Then it “snaps” into a good solution after a couple epochs.
Possible leakage / overly-easy split: getting 1.0 training accuracy with only 1 validation mistake is great, but it’s also a pattern I might see when:
near-duplicates exist across my train/val sets,
my split wasn’t stratified/grouped properly,
some preprocessing leaks label information,
or the task is simply extremely easy (strong visual shortcuts). A quick near-duplicate probe is sketched below.
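Here is a minimal near-duplicate probe under my leakage concern, assuming the X_train/X_val arrays from the split above. It downsamples every image and flags validation images whose nearest training neighbour is suspiciously close (the distance threshold is arbitrary and should be tuned by inspecting the flagged pairs):
# Crude near-duplicate probe between train and validation (L2 distance on 32x32 thumbnails)
small_train = np.array([cv2.resize(img, (32, 32)) for img in X_train]).reshape(len(X_train), -1).astype('float32')
small_val = np.array([cv2.resize(img, (32, 32)) for img in X_val]).reshape(len(X_val), -1).astype('float32')
for i, v in enumerate(small_val):
    d = np.linalg.norm(small_train - v, axis=1).min()  # distance to the closest training image
    if d < 500:  # arbitrary threshold; tune by eyeballing the matches it flags
        print(f"Validation image {i}: nearest training neighbour at distance {d:.0f}")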
Model 2: (VGG-16 (Base))¶
# Standard VGG16 definition
vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(200, 200, 3))
vgg_model.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 58889256/58889256 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Model: "vgg16"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ input_layer_1 (InputLayer) │ (None, 200, 200, 3) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block1_conv1 (Conv2D) │ (None, 200, 200, 64) │ 1,792 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block1_conv2 (Conv2D) │ (None, 200, 200, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block1_pool (MaxPooling2D) │ (None, 100, 100, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block2_conv1 (Conv2D) │ (None, 100, 100, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block2_conv2 (Conv2D) │ (None, 100, 100, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block2_pool (MaxPooling2D) │ (None, 50, 50, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_conv1 (Conv2D) │ (None, 50, 50, 256) │ 295,168 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_conv2 (Conv2D) │ (None, 50, 50, 256) │ 590,080 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_conv3 (Conv2D) │ (None, 50, 50, 256) │ 590,080 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_pool (MaxPooling2D) │ (None, 25, 25, 256) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_conv1 (Conv2D) │ (None, 25, 25, 512) │ 1,180,160 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_conv2 (Conv2D) │ (None, 25, 25, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_conv3 (Conv2D) │ (None, 25, 25, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_pool (MaxPooling2D) │ (None, 12, 12, 512) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_conv1 (Conv2D) │ (None, 12, 12, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_conv2 (Conv2D) │ (None, 12, 12, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_conv3 (Conv2D) │ (None, 12, 12, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_pool (MaxPooling2D) │ (None, 6, 6, 512) │ 0 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 14,714,688 (56.13 MB)
Trainable params: 14,714,688 (56.13 MB)
Non-trainable params: 0 (0.00 B)
# Making all the layers of the VGG model non-trainable. i.e. freezing them
for layer in vgg_model.layers:
layer.trainable = False
model_2 = Sequential()
# Adding the convolutional part of the VGG16 model from above
model_2.add(vgg_model)
# Flattening the output of the VGG16 model
model_2.add(Flatten())
# Adding a dense output layer
model_2.add(Dense(1, activation='sigmoid'))
opt = Adam(learning_rate=0.0001)  # A small learning rate works well for transfer learning
# Compile model
model_2.compile(optimizer=opt, loss=keras.losses.BinaryCrossentropy(), metrics=["accuracy"])
# Generating the summary of the model
model_2.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ vgg16 (Functional) │ (None, 6, 6, 512) │ 14,714,688 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_1 (Flatten) │ (None, 18432) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_2 (Dense) │ (None, 1) │ 18,433 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 14,733,121 (56.20 MB)
Trainable params: 18,433 (72.00 KB)
Non-trainable params: 14,714,688 (56.13 MB)
train_datagen = ImageDataGenerator()
# Epochs
epochs = 20
# Batch size
batch_size = 128
history_2 = model_2.fit(train_datagen.flow(X_train_normalized,y_train,
batch_size=batch_size,
seed=42,
shuffle=False),
epochs=epochs,
steps_per_epoch=X_train_normalized.shape[0] // batch_size,
validation_data=(X_val_normalized,y_val),
verbose=1)
Epoch 1/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 217s 77s/step - accuracy: 0.6094 - loss: 0.6528 - val_accuracy: 0.7895 - val_loss: 0.5805 Epoch 2/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 107s 41s/step - accuracy: 0.8421 - loss: 0.5498 - val_accuracy: 0.8421 - val_loss: 0.5525 Epoch 3/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 230s 83s/step - accuracy: 0.8663 - loss: 0.5282 - val_accuracy: 0.9263 - val_loss: 0.4740 Epoch 4/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 20s/step - accuracy: 0.9453 - loss: 0.4617 - val_accuracy: 0.9474 - val_loss: 0.4495 Epoch 5/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 235s 60s/step - accuracy: 0.9664 - loss: 0.4197 - val_accuracy: 0.9789 - val_loss: 0.3839 Epoch 6/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 139s 41s/step - accuracy: 0.9844 - loss: 0.3694 - val_accuracy: 0.9789 - val_loss: 0.3642 Epoch 7/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 182s 60s/step - accuracy: 0.9929 - loss: 0.3429 - val_accuracy: 1.0000 - val_loss: 0.3126 Epoch 8/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 138s 41s/step - accuracy: 0.9922 - loss: 0.2939 - val_accuracy: 1.0000 - val_loss: 0.2976 Epoch 9/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 176s 76s/step - accuracy: 0.9955 - loss: 0.2666 - val_accuracy: 1.0000 - val_loss: 0.2582 Epoch 10/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 135s 41s/step - accuracy: 1.0000 - loss: 0.2415 - val_accuracy: 1.0000 - val_loss: 0.2466 Epoch 11/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 220s 81s/step - accuracy: 1.0000 - loss: 0.2208 - val_accuracy: 1.0000 - val_loss: 0.2161 Epoch 12/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 20s/step - accuracy: 1.0000 - loss: 0.2062 - val_accuracy: 1.0000 - val_loss: 0.2072 Epoch 13/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 205s 76s/step - accuracy: 1.0000 - loss: 0.1884 - val_accuracy: 1.0000 - val_loss: 0.1832 Epoch 14/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 112s 41s/step - accuracy: 1.0000 - loss: 0.1552 - val_accuracy: 1.0000 - val_loss: 0.1761 Epoch 15/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 178s 61s/step - accuracy: 1.0000 - loss: 0.1537 - val_accuracy: 1.0000 - val_loss: 0.1576 Epoch 16/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 96s 20s/step - accuracy: 1.0000 - loss: 0.1416 - val_accuracy: 1.0000 - val_loss: 0.1520 Epoch 17/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 174s 75s/step - accuracy: 1.0000 - loss: 0.1277 - val_accuracy: 1.0000 - val_loss: 0.1375 Epoch 18/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 106s 24s/step - accuracy: 1.0000 - loss: 0.1275 - val_accuracy: 1.0000 - val_loss: 0.1332 Epoch 19/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 221s 60s/step - accuracy: 1.0000 - loss: 0.1174 - val_accuracy: 1.0000 - val_loss: 0.1217 Epoch 20/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 20s/step - accuracy: 1.0000 - loss: 0.1057 - val_accuracy: 1.0000 - val_loss: 0.1183
plt.plot(history_2.history['accuracy'])
plt.plot(history_2.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
model_2_train_perf = model_performance_classification(model_2,X_train_normalized,y_train)
print("Train performance metrics")
print(model_2_train_perf)
14/14 ━━━━━━━━━━━━━━━━━━━━ 204s 14s/step Train performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_2,X_train_normalized,y_train)
14/14 ━━━━━━━━━━━━━━━━━━━━ 191s 14s/step
model_2_valid_perf = model_performance_classification(model_2, X_val_normalized,y_val)
print("Validation performance metrics")
print(model_2_valid_perf)
3/3 ━━━━━━━━━━━━━━━━━━━━ 44s 15s/step Validation performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_2,X_val_normalized,y_val)
3/3 ━━━━━━━━━━━━━━━━━━━━ 41s 14s/step
Visualizing the predictions¶
# The indices you want to inspect
indices = [2, 15, 24, 36]
plt.figure(figsize=(8, 8))
for i, idx in enumerate(indices):
# Plot image
plt.subplot(2, 2, i + 1)
plt.imshow(X_val[idx])
plt.axis("equal")
# Make prediction for the corresponding index
pred = model_2.predict(X_val_normalized[idx].reshape(1, 200, 200, 3))[0][0]
predicted_label = 1 if pred > 0.5 else 0
# True label
true_label = int(y_val.iloc[idx, 0])  # scalar value from the single-column DataFrame
# Set title with prediction + truth
plt.title(f"Index {idx}\nPred: {predicted_label} | True: {true_label}")
plt.tight_layout()
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 401ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 415ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 404ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 395ms/step
Observations:¶
- What I built (VGG16 transfer learning “base”)
I’m using VGG16 pretrained on ImageNet with include_top=False, with input shape (200, 200, 3).
I froze all VGG layers (layer.trainable = False), so the convolutional backbone acts as a fixed feature extractor.
My head is very simple: Flatten() → Dense(1, sigmoid).
For a 200×200 input, VGG16 outputs (6, 6, 512), so flattening gives me 18,432 features.
- Parameters & capacity
My VGG16 base has 14,714,688 parameters, but they’re all non-trainable because I froze the backbone.
The only trainable part is my final classifier: 18,432 → 1, which has 18,433 trainable parameters.
This is a nice setup for a small dataset: I’m not trying to learn millions of weights from only a few hundred images.
- Training behavior (from my logs/plot)
Starts reasonably: in epoch 1 my training accuracy is ~0.61 and my validation accuracy is ~0.79.
Improves steadily: my val_accuracy reaches 1.0 by around epoch 7, and then stays at 1.0.
Training catches up: my training accuracy reaches 1.0 around epoch 11.
Confusion matrices show perfect classification on both sets:
Train: 224 TN, 217 TP, 0 FP, 0 FN
Val: 48 TN, 47 TP, 0 FP, 0 FN
My sample predictions match the labels: the predictions I showed for the inspected indices align correctly with the true labels.
- Key observations / implications
What I’ve built is basically logistic regression on top of VGG features. Since only my last layer is learning, I’m essentially learning a linear decision boundary in the VGG feature space.
Getting perfect validation with ~95 validation images (48/47 split) is possible, but it’s also a “check for leakage / duplicates” moment—especially if:
frames from the same video/person/session appear in both my train and validation sets.
near-duplicate images exist.
I augmented my training data but accidentally reused originals in validation.
my split wasn’t grouped (e.g., by subject/person or by source folder).
Big picture
Compared to Model 1, this VGG16-frozen approach is more “statistically sensible” for small datasets (tiny trainable head), but the perfect val score still screams: “either the task is extremely easy or the split is too friendly.”
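To make the "logistic regression on VGG features" framing concrete, here is a sketch of an equivalent formulation (not the notebook's actual pipeline): extract the frozen features once, then fit scikit-learn's LogisticRegression on them. Note it feeds [0, 1] inputs, mirroring this notebook's convention rather than VGG's usual preprocess_input:
# Equivalent "linear head on frozen VGG features" via scikit-learn
from sklearn.linear_model import LogisticRegression
feat_extractor = Sequential([vgg_model, Flatten()])  # frozen backbone + flatten
train_feats = feat_extractor.predict(X_train_normalized, verbose=0)
val_feats = feat_extractor.predict(X_val_normalized, verbose=0)
clf = LogisticRegression(max_iter=1000).fit(train_feats, y_train.to_numpy().reshape(-1))
print("Validation accuracy:", clf.score(val_feats, y_val.to_numpy().reshape(-1)))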
Model 3: (VGG-16 (Base + FFNN))¶
model_3 = Sequential()
# The convolutional part of the VGG16 model from above
model_3.add(vgg_model)
# Flattening the output of the VGG16 model
model_3.add(Flatten())
# Adding the Feed Forward neural network
# 256 is a common choice
model_3.add(Dense(256, activation='relu'))
model_3.add(Dropout(rate=0.5))
model_3.add(Dense(128, activation='relu'))
# Adding a dense output layer
model_3.add(Dense(1, activation='sigmoid'))
opt = Adam(learning_rate=0.0001)
# Compile model
model_3.compile(optimizer=opt, loss=keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
# Generating the summary of the model
model_3.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ vgg16 (Functional) │ (None, 6, 6, 512) │ 14,714,688 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_2 (Flatten) │ (None, 18432) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_3 (Dense) │ (None, 256) │ 4,718,848 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout (Dropout) │ (None, 256) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_4 (Dense) │ (None, 128) │ 32,896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_5 (Dense) │ (None, 1) │ 129 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 19,466,561 (74.26 MB)
Trainable params: 4,751,873 (18.13 MB)
Non-trainable params: 14,714,688 (56.13 MB)
# batch size (large for VGG16 at 200x200; reduce if you hit GPU memory limits)
batch_size = 128
history_3 = model_3.fit(train_datagen.flow(X_train_normalized, y_train,
batch_size=batch_size,
seed=42,
shuffle=False),
epochs=20, # 10 to 20 is usually sufficient for Transfer Learning
steps_per_epoch=X_train_normalized.shape[0] // batch_size,
validation_data=(X_val_normalized, y_val),
verbose=1)
Epoch 1/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 194s 83s/step - accuracy: 0.7566 - loss: 0.5448 - val_accuracy: 0.8632 - val_loss: 0.3350 Epoch 2/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 138s 41s/step - accuracy: 0.9531 - loss: 0.3109 - val_accuracy: 1.0000 - val_loss: 0.1914 Epoch 3/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 177s 76s/step - accuracy: 0.9676 - loss: 0.2000 - val_accuracy: 0.9895 - val_loss: 0.0584 Epoch 4/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 20s/step - accuracy: 0.9922 - loss: 0.0924 - val_accuracy: 1.0000 - val_loss: 0.0422 Epoch 5/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 187s 67s/step - accuracy: 0.9951 - loss: 0.0561 - val_accuracy: 1.0000 - val_loss: 0.0207 Epoch 6/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 97s 22s/step - accuracy: 1.0000 - loss: 0.0250 - val_accuracy: 1.0000 - val_loss: 0.0172 Epoch 7/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 176s 60s/step - accuracy: 1.0000 - loss: 0.0215 - val_accuracy: 1.0000 - val_loss: 0.0113 Epoch 8/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 96s 21s/step - accuracy: 1.0000 - loss: 0.0128 - val_accuracy: 1.0000 - val_loss: 0.0100 Epoch 9/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 184s 65s/step - accuracy: 1.0000 - loss: 0.0112 - val_accuracy: 1.0000 - val_loss: 0.0073 Epoch 10/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 101s 20s/step - accuracy: 1.0000 - loss: 0.0075 - val_accuracy: 1.0000 - val_loss: 0.0068 Epoch 11/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 214s 62s/step - accuracy: 1.0000 - loss: 0.0057 - val_accuracy: 1.0000 - val_loss: 0.0060 Epoch 12/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 21s/step - accuracy: 1.0000 - loss: 0.0047 - val_accuracy: 1.0000 - val_loss: 0.0060 Epoch 13/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 215s 81s/step - accuracy: 1.0000 - loss: 0.0030 - val_accuracy: 1.0000 - val_loss: 0.0060 Epoch 14/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 140s 41s/step - accuracy: 0.9922 - loss: 0.0085 - val_accuracy: 1.0000 - val_loss: 0.0058 Epoch 15/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 216s 81s/step - accuracy: 0.9984 - loss: 0.0043 - val_accuracy: 1.0000 - val_loss: 0.0049 Epoch 16/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 135s 41s/step - accuracy: 1.0000 - loss: 0.0024 - val_accuracy: 1.0000 - val_loss: 0.0045 Epoch 17/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 219s 79s/step - accuracy: 1.0000 - loss: 0.0032 - val_accuracy: 1.0000 - val_loss: 0.0039 Epoch 18/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 106s 41s/step - accuracy: 1.0000 - loss: 0.0016 - val_accuracy: 1.0000 - val_loss: 0.0037 Epoch 19/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 216s 95s/step - accuracy: 1.0000 - loss: 0.0014 - val_accuracy: 1.0000 - val_loss: 0.0034 Epoch 20/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 97s 20s/step - accuracy: 1.0000 - loss: 0.0016 - val_accuracy: 1.0000 - val_loss: 0.0033
plt.plot(history_3.history['accuracy'])
plt.plot(history_3.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
model_3_train_perf = model_performance_classification(model_3,X_train_normalized,y_train)
print("Train performance metrics")
print(model_3_train_perf)
14/14 ━━━━━━━━━━━━━━━━━━━━ 192s 14s/step Train performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_3,X_train_normalized,y_train)
14/14 ━━━━━━━━━━━━━━━━━━━━ 191s 14s/step
model_3_valid_perf = model_performance_classification(model_3, X_val_normalized,y_val)
print("Validation performance metrics")
print(model_3_valid_perf)
3/3 ━━━━━━━━━━━━━━━━━━━━ 41s 14s/step Validation performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_3,X_val_normalized,y_val)
3/3 ━━━━━━━━━━━━━━━━━━━━ 42s 14s/step
Visualizing the predictions¶
# The indices you want to inspect
indices = [7, 30, 61, 87]
plt.figure(figsize=(8, 8))
for i, idx in enumerate(indices):
# Plot image
plt.subplot(2, 2, i + 1)
plt.imshow(X_val[idx])
plt.axis("equal")
# Make prediction for the corresponding index
pred = model_3.predict(X_val_normalized[idx].reshape(1, 200, 200, 3))[0][0]
predicted_label = 1 if pred > 0.5 else 0
# True label
true_label = int(y_val.iloc[idx, 0])  # scalar value from the single-column DataFrame
# Set title with prediction + truth
plt.title(f"Index {idx}\nPred: {predicted_label} | True: {true_label}")
plt.tight_layout()
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 419ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 423ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 411ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 474ms/step
Observations:¶
- What I built (VGG16 base + a bigger FFNN head)
I’m using the same VGG16 ImageNet backbone (include_top=False, output (6, 6, 512)), and it looks like I’m still keeping it frozen (non-trainable params: 14,714,688).
My head is now a real MLP:
Flatten() (18,432 features)
Dense(256, relu) (+4,718,848 params) ← this is the big jump
Dropout(0.5)
Dense(128, relu) (+32,896 params)
Dense(1, sigmoid) (+129 params)
- Capacity note
Total params: 19,466,561
Trainable params: 4,751,873 (all in the head)
This is a lot of trainable capacity for my dataset size (~441 training images).
- Training behavior (from my logs/plot)
I learn extremely fast:
Epoch 1: my training accuracy is ~0.76 and my validation accuracy is ~0.86
Epoch 2: my validation accuracy hits 1.0
By around epoch 6 onward, both my training and validation accuracy are basically 1.0, with tiny losses.
My curve shows a small wobble where training accuracy dips slightly in later epochs (very likely dropout noise), but my validation accuracy stays pinned at 1.0.
My confusion matrices are perfect again:
Train: 224 TN, 217 TP, 0 FP, 0 FN
Val: 48 TN, 47 TP, 0 FP, 0 FN
My “visualized predictions” examples are correct.
- Observations / interpretation
This head is probably overkill for my dataset. Because I flattened 18,432 features and then used 256 units, I created a very expressive classifier with millions of weights.
Even with Dropout(0.5), I still get perfect performance—so either:
my classes are highly separable in VGG feature space, or
my validation set is “too easy” (leakage, near-duplicates, same identity/source across splits, etc.).
Compared to Model 2 (VGG16 + single sigmoid):
I didn’t gain anything on validation (Model 2 was already perfect).
I increased training complexity and risk (more parameters, more overfitting potential, and more compute/memory).
Bottom line
Model 3 is a “bigger hammer” than Model 2. Since both are perfect on your current validation, Model 3’s main effect is raising overfitting risk and compute cost—not boosting measured performance. The next step isn’t making the model larger; it’s making the evaluation stricter to confirm the result is real.
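One way to make the evaluation stricter, as suggested above, is repeated stratified splitting. Below is a sketch using the frozen-feature + linear-head shortcut, which is fast enough to run five folds (it assumes the images/labels arrays from earlier and the frozen vgg_model):
# Stricter check: 5-fold stratified CV on frozen VGG16 features with a linear head
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
feat_extractor = Sequential([vgg_model, Flatten()])
X_all = np.array(images).astype('float32') / 255.0
feats = feat_extractor.predict(X_all, verbose=0)
y_all = labels.to_numpy().reshape(-1)
scores = cross_val_score(LogisticRegression(max_iter=1000), feats, y_all,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print("Fold accuracies:", scores.round(3), "| mean:", scores.mean().round(3))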
Model 4: (VGG-16 (Base + FFNN + Data Augmentation))¶
In most real-world case studies, it is challenging to acquire a large number of images and then train CNNs from scratch.
To overcome this problem, one approach we might consider is Data Augmentation.
CNNs have the property of translational invariance, which means they can recognize an object even if its position shifts within the frame. Taking this attribute into account, we can augment the images using the techniques listed below:
- Horizontal Flip (should be set to True/False)
- Vertical Flip (should be set to True/False)
- Height Shift (should be between 0 and 1)
- Width Shift (should be between 0 and 1)
- Rotation (should be between 0 and 180)
- Shear (should be between 0 and 1)
- Zoom (should be between 0 and 1) etc.
Remember, data augmentation should not be used in the validation/test data set.
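Before training, it can help to eyeball what the augmentation pipeline produces. A small preview sketch follows (the parameters mirror the generator defined below; the image index is arbitrary):
# Previewing augmented variants of a single training image
preview_gen = ImageDataGenerator(rotation_range=30, width_shift_range=0.2,
                                 height_shift_range=0.2, shear_range=0.2, zoom_range=0.2)
sample = X_train_normalized[0:1]  # keep the batch dimension
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, batch in zip(axes, preview_gen.flow(sample, batch_size=1, seed=0)):
    ax.imshow(batch[0])  # each batch is one randomly transformed copy
    ax.axis('off')
plt.show()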
model_4 = Sequential()
# Adding the convolutional part of the VGG16 model from above
model_4.add(vgg_model)
# Flattening the output of the VGG16 model
model_4.add(Flatten())
# Adding the Feed Forward neural network
model_4.add(Dense(256, activation='relu'))
model_4.add(Dropout(rate=0.5))
model_4.add(Dense(128, activation='relu'))
# Adding a dense output layer with sigmoid activation for binary classification
model_4.add(Dense(1, activation='sigmoid'))
# A relatively small learning rate works well for transfer learning
opt = Adam(learning_rate=0.0001)
# Compile model
model_4.compile(optimizer=opt, loss=keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
# Generating the summary of the model
model_4.summary()
Model: "sequential_3"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ vgg16 (Functional) │ (None, 6, 6, 512) │ 14,714,688 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_3 (Flatten) │ (None, 18432) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_6 (Dense) │ (None, 256) │ 4,718,848 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_1 (Dropout) │ (None, 256) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_7 (Dense) │ (None, 128) │ 32,896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_8 (Dense) │ (None, 1) │ 129 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 19,466,561 (74.26 MB)
Trainable params: 4,751,873 (18.13 MB)
Non-trainable params: 14,714,688 (56.13 MB)
# Applying data augmentation
train_datagen = ImageDataGenerator(
rotation_range=30, # Rotates the image randomly up to 30 degrees
fill_mode='nearest',
width_shift_range=0.2, # Shifts the image horizontally by 20%
height_shift_range=0.2, # Shifts the image vertically by 20%
shear_range=0.2, # 'Slants' the image by 20%
zoom_range=0.2 # Zooms in or out by 20%
)
batch_size = 128 # Define this variable first so steps_per_epoch can use it
history_4 = model_4.fit(train_datagen.flow(X_train_normalized, y_train,
batch_size=batch_size,
seed=42,
shuffle=False),
epochs=epochs,
steps_per_epoch=X_train_normalized.shape[0] // batch_size,
validation_data=(X_val_normalized, y_val),
verbose=1)
Epoch 1/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 235s 101s/step - accuracy: 0.5643 - loss: 0.6694 - val_accuracy: 0.9158 - val_loss: 0.4046 Epoch 2/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 20s/step - accuracy: 0.8984 - loss: 0.4278 - val_accuracy: 0.9895 - val_loss: 0.3049 Epoch 3/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 273s 83s/step - accuracy: 0.9483 - loss: 0.3297 - val_accuracy: 1.0000 - val_loss: 0.1417 Epoch 4/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 98s 20s/step - accuracy: 0.9922 - loss: 0.1798 - val_accuracy: 1.0000 - val_loss: 0.1193 Epoch 5/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 185s 80s/step - accuracy: 0.9968 - loss: 0.1314 - val_accuracy: 1.0000 - val_loss: 0.0635 Epoch 6/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 95s 21s/step - accuracy: 0.9922 - loss: 0.0954 - val_accuracy: 1.0000 - val_loss: 0.0479 Epoch 7/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 224s 77s/step - accuracy: 1.0000 - loss: 0.0592 - val_accuracy: 1.0000 - val_loss: 0.0249 Epoch 8/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 138s 41s/step - accuracy: 1.0000 - loss: 0.0361 - val_accuracy: 1.0000 - val_loss: 0.0212 Epoch 9/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 182s 62s/step - accuracy: 0.9971 - loss: 0.0384 - val_accuracy: 1.0000 - val_loss: 0.0144 Epoch 10/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 99s 22s/step - accuracy: 0.9922 - loss: 0.0416 - val_accuracy: 1.0000 - val_loss: 0.0130 Epoch 11/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 220s 97s/step - accuracy: 1.0000 - loss: 0.0212 - val_accuracy: 1.0000 - val_loss: 0.0100 Epoch 12/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 96s 21s/step - accuracy: 1.0000 - loss: 0.0236 - val_accuracy: 1.0000 - val_loss: 0.0092 Epoch 13/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 182s 78s/step - accuracy: 0.9927 - loss: 0.0229 - val_accuracy: 1.0000 - val_loss: 0.0073 Epoch 14/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 138s 41s/step - accuracy: 1.0000 - loss: 0.0113 - val_accuracy: 1.0000 - val_loss: 0.0070 Epoch 15/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 222s 97s/step - accuracy: 0.9897 - loss: 0.0281 - val_accuracy: 1.0000 - val_loss: 0.0071 Epoch 16/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 138s 41s/step - accuracy: 0.9922 - loss: 0.0131 - val_accuracy: 1.0000 - val_loss: 0.0072 Epoch 17/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 181s 77s/step - accuracy: 0.9971 - loss: 0.0103 - val_accuracy: 1.0000 - val_loss: 0.0075 Epoch 18/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 139s 41s/step - accuracy: 1.0000 - loss: 0.0069 - val_accuracy: 1.0000 - val_loss: 0.0074 Epoch 19/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 182s 77s/step - accuracy: 0.9971 - loss: 0.0116 - val_accuracy: 1.0000 - val_loss: 0.0062 Epoch 20/20 3/3 ━━━━━━━━━━━━━━━━━━━━ 100s 20s/step - accuracy: 1.0000 - loss: 0.0045 - val_accuracy: 1.0000 - val_loss: 0.0055
plt.plot(history_4.history['accuracy'])
plt.plot(history_4.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
model_4_train_perf = model_performance_classification(model_4,X_train_normalized,y_train)
print("Train performance metrics")
print(model_4_train_perf)
14/14 ━━━━━━━━━━━━━━━━━━━━ 202s 14s/step Train performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_4,X_train_normalized,y_train)
14/14 ━━━━━━━━━━━━━━━━━━━━ 202s 14s/step
model_4_valid_perf = model_performance_classification(model_4, X_val_normalized,y_val)
print("Validation performance metrics")
print(model_4_valid_perf)
3/3 ━━━━━━━━━━━━━━━━━━━━ 42s 14s/step Validation performance metrics Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(model_4,X_val_normalized,y_val)
3/3 ━━━━━━━━━━━━━━━━━━━━ 41s 13s/step
Visualizing the predictions¶
# The indices you want to inspect
indices = [18, 27, 46, 57]
plt.figure(figsize=(8, 8))
for i, idx in enumerate(indices):
# Plot image
plt.subplot(2, 2, i + 1)
plt.imshow(X_val[idx])
plt.axis("equal")
# Make prediction for the corresponding index
pred = model_4.predict(X_val_normalized[idx].reshape(1, 200, 200, 3))[0][0]
predicted_label = 1 if pred > 0.5 else 0
# True label
true_label = int(y_val.iloc[idx, 0])  # scalar value from the single-column DataFrame
# Set title with prediction + truth
plt.title(f"Index {idx}\nPred: {predicted_label} | True: {true_label}")
plt.tight_layout()
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 710ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 576ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 409ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 424ms/step
Observations:¶
Here are the main observations I have for Model 4 (VGG16 base + FFNN + Data Augmentation) from what I’ve shown:
- Architecture / capacity
I’m using VGG16 as a frozen feature extractor (non-trainable params: 14,714,688), plus a small FFNN head: Flatten (18,432) → Dense(256) → Dropout(0.5) → Dense(128) → Dense(1, sigmoid)
My trainable params are ~4.75M, mostly from the Flatten → Dense(256) connection (a very large weight matrix). That’s high capacity for a small dataset.
- Augmentation is doing its job (training becomes “harder”)
With augmentation (rotation=30, width/height shift=0.2, shear=0.2, zoom=0.2), my epoch-1 training accuracy is low ~0.56 while my validation accuracy is already high ~0.92.
That big early gap makes sense: I’m training on harder, distorted images, while validation images are “clean.”
- But the results are too perfect
I end up with train accuracy = 1.0 and validation accuracy/precision/recall/F1 = 1.0, with perfect confusion matrices (0 misclassifications) for both train and validation.
In real-world image classification, this usually means one of the following is true:
my dataset is extremely easy (strong visual cues, controlled environment), or
there’s data leakage / near-duplicates between train and validation (same people/backgrounds/images, or frames from the same source split across sets), or
my validation set is very small (mine looks like ~95 images total: 48 + 47), so it’s easier to hit 100%.
- Training stability
- After convergence, my training accuracy fluctuates slightly around ~0.99–1.0 (normal with augmentation + dropout), while my validation accuracy stays pinned at 1.0 almost the whole time.
Model Performance Comparison and Final Model Selection¶
# training performance comparison
models_train_comp_df = pd.concat(
[
model_1_train_perf.T,
model_2_train_perf.T,
model_3_train_perf.T,
model_4_train_perf.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Simple Convolutional Neural Network (CNN)","VGG-16 (Base)","VGG-16 (Base+FFNN)","VGG-16 (Base+FFNN+Data Aug)"
]
models_valid_comp_df = pd.concat(
[
model_1_valid_perf.T,
model_2_valid_perf.T,
model_3_valid_perf.T,
model_4_valid_perf.T
],
axis=1,
)
models_valid_comp_df.columns = [
"Simple Convolutional Neural Network (CNN)","VGG-16 (Base)","VGG-16 (Base+FFNN)","VGG-16 (Base+FFNN+Data Aug)"
]
models_train_comp_df
| | Simple Convolutional Neural Network (CNN) | VGG-16 (Base) | VGG-16 (Base+FFNN) | VGG-16 (Base+FFNN+Data Aug) |
|---|---|---|---|---|
| Accuracy | 1.0 | 1.0 | 1.0 | 1.0 |
| Recall | 1.0 | 1.0 | 1.0 | 1.0 |
| Precision | 1.0 | 1.0 | 1.0 | 1.0 |
| F1 Score | 1.0 | 1.0 | 1.0 | 1.0 |
models_valid_comp_df
| | Simple Convolutional Neural Network (CNN) | VGG-16 (Base) | VGG-16 (Base+FFNN) | VGG-16 (Base+FFNN+Data Aug) |
|---|---|---|---|---|
| Accuracy | 0.989474 | 1.0 | 1.0 | 1.0 |
| Recall | 0.989474 | 1.0 | 1.0 | 1.0 |
| Precision | 0.989693 | 1.0 | 1.0 | 1.0 |
| F1 Score | 0.989474 | 1.0 | 1.0 | 1.0 |
models_train_comp_df - models_valid_comp_df
| | Simple Convolutional Neural Network (CNN) | VGG-16 (Base) | VGG-16 (Base+FFNN) | VGG-16 (Base+FFNN+Data Aug) |
|---|---|---|---|---|
| Accuracy | 0.010526 | 0.0 | 0.0 | 0.0 |
| Recall | 0.010526 | 0.0 | 0.0 | 0.0 |
| Precision | 0.010307 | 0.0 | 0.0 | 0.0 |
| F1 Score | 0.010526 | 0.0 | 0.0 | 0.0 |
Observations:¶
- Model 1 — Simple Convolutional Neural Network (CNN)
Performance: In my comparison table, the training metrics are all 1.0 (accuracy/precision/recall/F1), but validation is about 0.989 (accuracy/recall/F1 ≈ 0.989474, precision ≈ 0.989693).
What that suggests:
- This is the only model that doesn’t hit perfect validation, so it’s behaving more “normally.”
- The gap between training (1.0) and validation (~0.989) suggests mild overfitting, or simply that a from-scratch CNN is less powerful than the pretrained VGG feature extractor.
Interpretation: Compared to VGG16 transfer learning, a simple CNN usually needs more data to reach the same performance, so it makes sense that it trails slightly on validation.
- Model 2 — VGG-16 (Base) + Flatten + 1 sigmoid
Architecture: I’m using a frozen VGG16 conv base (14.7M non-trainable params) plus a single trainable Dense(1). The trainable parameters are tiny (~18k), so this is basically a linear classifier on top of pretrained features (sketched after these observations).
Training behavior: My accuracy jumps to ~1.0 within a few epochs, and my val_accuracy also becomes 1.0.
Takeaway: This is a strong baseline. Because the head is so small, it is, in theory, less likely to overfit than Models 3/4.
- Model 3 — VGG-16 (Base) + FFNN
Architecture: I’m using frozen VGG16 + Flatten → Dense(256) → Dropout(0.5) → Dense(128) → Dense(1).
Trainable params: I have ~4.75M trainable parameters (a big jump vs Model 2), mostly from Flatten(18,432) → Dense(256), which alone is ~4.7M weights.
Training behavior: I hit val_accuracy = 1.0 extremely fast (around epoch 2–3) and then stay there.
Takeaway: This is much higher capacity than I likely need (given the results). If my dataset is small, this head can memorize easily—even with dropout.
- Model 4 — Model 3 + Data Augmentation
Augmentation: I’m using rotation 30°, width/height shift 0.2, shear 0.2, and zoom 0.2 (train only—which is good).
Training behavior: My training accuracy starts lower (which is expected with augmentation), but my validation still becomes 1.0 quickly and stays perfect.
Takeaway: Augmentation should improve my robustness, but since all my metrics are already perfect, it isn’t proving anything new yet.
- How Model 1 fits vs Models 2–4
Model 1 looks realistic: near-perfect but not perfect on validation.
Models 2–4 being perfect on validation (1.0 across metrics + perfect confusion matrices) is what triggers the “possible leakage / too-easy val set / duplicates” warning.
- Practical takeaway (with all 4 models)
If everything is correct (no leakage), Models 2–4 clearly outperform Model 1.
If I suspect leakage/duplicates, Model 1’s results might actually be the most trustworthy signal until I validate the split with a clean test set.
The best model
Based on the results I showed, the best choice for my project is Model 4 (VGG16 Base + FFNN + Data Augmentation) — with one important condition:
✅ If my train/val split is clean (no leakage/duplicates)
I should pick Model 4 because:
- it matches the top validation performance (1.0, like Models 2 & 3), and
- data augmentation is my best bet for real-world robustness on slightly different images (lighting changes, rotations, shifts, zoom, etc.).
⚠️ But… my results are “too perfect”
Models 2–4 all getting 1.0 accuracy/precision/recall/F1 on validation, plus zero errors in the confusion matrices, is unusual—especially with a relatively small validation set. That often happens when:
- my validation set is too easy, or
- there are near-duplicate images across train and validation, or
- there’s data leakage (same subject/frame, filenames, preprocessing signals, etc.).
Best answer for my submission/report
Selected model: Model 4 (VGG16 + FFNN + Data Aug)
Reason: Best expected generalization due to augmentation while maintaining top validation performance.
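For contrast with the Model 4 sketch above, Model 2’s head as described is just one sigmoid unit on the flattened VGG16 features, i.e. a linear classifier with ~18k trainable weights (18,432 inputs + 1 bias). The same caveats apply: the input size and variable names are assumptions, not the exact ones used earlier.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen base again; only the single Dense(1) layer below is trainable
vgg_base_2 = VGG16(weights="imagenet", include_top=False, input_shape=(200, 200, 3))
vgg_base_2.trainable = False

model_2_sketch = models.Sequential([
    vgg_base_2,
    layers.Flatten(),                       # 6 * 6 * 512 = 18,432 features
    layers.Dense(1, activation="sigmoid"),  # 18,432 weights + 1 bias
])
model_2_sketch.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])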
Test Performance¶
# Evaluate the selected best model (model_4) on the held-out test set
model_test_perf = model_performance_classification(model_4, X_test_normalized, y_test)
3/3 ━━━━━━━━━━━━━━━━━━━━ 42s 14s/step
model_test_perf
| | Accuracy | Recall | Precision | F1 Score |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
plot_confusion_matrix(model_4, X_test_normalized, y_test)
3/3 ━━━━━━━━━━━━━━━━━━━━ 40s 13s/step
Actionable Insights & Recommendations¶
Actionable Insights¶
- Model 1 (CNN) is the only model that’s “imperfect” on validation (~0.989).
This is normal behavior and suggests the model is learning useful patterns, but it is more sensitive to dataset size and variation.
- Models 2–4 (the VGG16 transfer learning models) all hit perfect validation (1.0 across metrics). That means either:
- my problem is highly separable with pretrained features, or
- my validation split is too easy / has leakage / contains near-duplicates.
- Because Model 2 (tiny head) already gets 1.0, Model 3’s large dense head isn’t adding measurable value on my current split.
- Model 4 is the most “deployment-friendly” choice if I confirm my evaluation is clean. Augmentation usually improves robustness to real-world shifts (angles, zoom, position, lighting).
Recommendations¶
- Lock a proper final evaluation (most important)
I should create a true held-out test set (10–20%) that I do not touch until the end.
If my data includes repeated sources, I should use a grouped split so similar images can’t appear in both train and val/test (a grouped-split sketch follows this item).
Then I should run all 4 models on this test set and pick the winner based on that:
- If Model 4 stays best → it becomes my final model.
- If everything is perfect again → my dataset is likely too easy or I still have leakage.
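A minimal sketch of that grouped split, using scikit-learn’s GroupShuffleSplit. The toy arrays here stand in for the real image data; in the notebook, groups would be a per-image source ID (video, site, or worker), which this dataset may or may not record.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: replace with the real image arrays and per-image source IDs
rng = np.random.default_rng(42)
X = rng.random((20, 8))              # placeholder features
y = rng.integers(0, 2, size=20)      # placeholder labels
groups = np.repeat(np.arange(5), 4)  # 5 sources, 4 images each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

# Every image from a given source lands on exactly one side of the split,
# so near-duplicate frames cannot leak between train and test
assert set(groups[train_idx]).isdisjoint(groups[test_idx])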
- Check for duplicates / leakage quickly
I should make sure I didn’t split after augmentation or accidentally reuse the same arrays.
If I have filenames:
- I should verify no identical (or near-identical) filenames or file contents appear in both train and validation (a quick hash-based check is sketched below).
If images come from videos:
- I should make sure frames from the same video aren’t split across train and val.
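Here is one way to run that exact-duplicate check between two image folders. The "train" and "val" paths are illustrative; near-duplicates (re-encoded or slightly cropped copies) would need a perceptual hash, e.g. the imagehash package, rather than MD5.
import hashlib
from pathlib import Path

def file_hashes(folder):
    # Map MD5 digest of the raw bytes -> path, for every file in the folder
    return {hashlib.md5(p.read_bytes()).hexdigest(): p
            for p in Path(folder).rglob("*") if p.is_file()}

train_hashes = file_hashes("train")  # illustrative paths
val_hashes = file_hashes("val")
dupes = set(train_hashes) & set(val_hashes)
print(f"{len(dupes)} exact duplicates across train/val")
for h in sorted(dupes):
    print(train_hashes[h], "<->", val_hashes[h])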
- Choose the simplest model that meets the goal
Given my current validation tie:
- I should prefer Model 2 or Model 4 over Model 3.
- Model 2 = simplest, fastest, least overfitting risk (tiny trainable head).
- Model 4 = best robustness potential (augmentation) if my deployment images vary.
- I should avoid Model 3 unless my real held-out test set shows it consistently beats Model 2/4; right now it’s extra complexity with no gain.
- Make my architecture more robust (easy upgrades; see the sketch after this item)
I should replace Flatten() with GlobalAveragePooling2D() (big win):
- fewer parameters, less overfitting, better generalization.
I should make sure I’m using VGG16 preprocessing correctly:
- tf.keras.applications.vgg16.preprocess_input
I should add callbacks:
- EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
- ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2)
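A sketch of the three upgrades together: GlobalAveragePooling2D in place of Flatten, VGG16 preprocessing, and the two callbacks above. The head sizes here are my own assumptions, not values from the earlier models.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

base = VGG16(weights="imagenet", include_top=False, input_shape=(200, 200, 3))
base.trainable = False

# GAP averages each 512-channel feature map to a single value, so the head
# sees 512 inputs instead of 18,432, cutting ~4.7M weights from the head
model_gap = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model_gap.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2),
]
# Note: preprocess_input expects raw 0-255 pixel values, not 0-1 normalized ones
# model_gap.fit(preprocess_input(X_train), y_train, validation_data=..., callbacks=callbacks)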
- Improve reporting & decision-making metrics
Along with accuracy/F1, I should:
- plot ROC-AUC / PR-AUC,
- show confidence distributions (a histogram of predicted probabilities), and
- evaluate different thresholds (0.3, 0.5, 0.7) if false positives and false negatives have different costs (sketched below).
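A sketch of that extra reporting, assuming model_4, X_test_normalized, and y_test from the cells above, and that y_test is a flat 0/1 array.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

probs = model_4.predict(X_test_normalized).ravel()

print("ROC-AUC:", roc_auc_score(y_test, probs))
print("PR-AUC :", average_precision_score(y_test, probs))

# Confidence distribution: a healthy model pushes most mass toward 0 and 1
plt.hist(probs, bins=20)
plt.xlabel("Predicted probability")
plt.ylabel("Count")
plt.show()

# Trade off false positives vs false negatives by moving the cutoff
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    print(f"threshold={threshold}: F1={f1_score(y_test, preds):.3f}")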
What I’d do as my “final plan”¶
- I’ll build a leakage-safe test split (grouped if needed).
- I’ll evaluate Model 2 vs Model 4 on that test split.
- If they’re tied, I’ll choose Model 2 (simpler), unless I know my deployment images vary a lot; then I’ll choose Model 4.
- I’ll also update the model to use GlobalAveragePooling2D + early stopping.
Power Ahead!