Introduction & Setup

State the notebook's purpose: perform in-depth analysis of model performance and extract actionable insights.
Import libraries: pandas, numpy, matplotlib.pyplot, seaborn, sklearn.metrics, tensorflow.
Load the trained model and its training history.



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
from keras._tf_keras.keras.models import load_model
from keras._tf_keras.keras.preprocessing.image import ImageDataGenerator
import os

# Load necessary data
PROCESSED_DATA_DIR = 'data/processed'
TEST_DIR = os.path.join(PROCESSED_DATA_DIR, 'test')
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# Load the trained model saved by Notebook 2.0 (check this path if loading fails)
model = load_model('trained_models/best_waste_classifier.h5')
training_history_df = pd.read_csv('trained_models/training_history.csv')

# Re-create test generator to ensure consistency with evaluations
val_test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = val_test_datagen.flow_from_directory(
    TEST_DIR,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle=False
)
CLASS_NAMES = list(test_generator.class_indices.keys())

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = 'trained_models/best_waste_classifier.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
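
The FileNotFoundError above means that the relative path 'trained_models/best_waste_classifier.h5' did not resolve from the directory the notebook was run in. One possible guard is sketched below. It assumes the notebook may be launched from a subfolder of the project, which the source does not confirm. The helper name `find_artifact` and the `max_levels` parameter are hypothetical and not part of the project.

```python
from pathlib import Path


def find_artifact(relative_path, start=None, max_levels=3):
    """Return the first existing file matching relative_path.

    Searches `start` (default: current working directory) and then up to
    `max_levels` parent directories. Hypothetical helper for illustration.
    """
    base = (Path(start) if start is not None else Path.cwd()).resolve()
    search_roots = [base, *list(base.parents)[:max_levels]]
    for root in search_roots:
        candidate = root / relative_path
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(root) for root in search_roots)
    raise FileNotFoundError(
        f"{relative_path!r} not found; searched: {searched}. "
        "Run the training notebook first or correct the path."
    )
```

With this in place, the load call could become `load_model(str(find_artifact('trained_models/best_waste_classifier.h5')))`, which fails with a message listing every directory that was searched.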

Review Training History

Re-plot training/validation accuracy and loss curves for quick reference and to confirm stability.
Discuss signs of overfitting/underfitting, if any, and how they were (or could be) addressed.

In [None]:
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(training_history_df['accuracy'], label='Train Accuracy')
plt.plot(training_history_df['val_accuracy'], label='Validation Accuracy')
plt.title('Training & Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(training_history_df['loss'], label='Train Loss')
plt.plot(training_history_df['val_loss'], label='Validation Loss')
plt.title('Training & Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.show()
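
To back the visual check of the curves with numbers, the history table can be summarized in a few values: the epoch with the lowest validation loss, the number of epochs trained past it, and the final gap between training and validation accuracy. The sketch below assumes the CSV has the same four columns plotted above. The function name `summarize_history` and the 0.05 gap threshold are illustrative assumptions, and the demo values are synthetic, not results from this project.

```python
import pandas as pd


def summarize_history(history_df, gap_threshold=0.05):
    """Summarize a Keras-style history table (one row per epoch)."""
    # Positional index of the lowest validation loss (0-based epoch).
    best_epoch = int(history_df["val_loss"].to_numpy().argmin())
    final = history_df.iloc[-1]
    gap = float(final["accuracy"] - final["val_accuracy"])
    return {
        "best_epoch": best_epoch,
        "best_val_loss": float(history_df["val_loss"].min()),
        "epochs_after_best": len(history_df) - 1 - best_epoch,
        "final_accuracy_gap": gap,
        # A large train/validation gap is a hint of overfitting, not proof.
        "possible_overfitting": gap > gap_threshold,
    }


# Synthetic history for illustration only.
demo_history = pd.DataFrame({
    "accuracy":     [0.60, 0.75, 0.85, 0.92, 0.96],
    "val_accuracy": [0.58, 0.72, 0.80, 0.81, 0.80],
    "loss":         [1.10, 0.70, 0.45, 0.25, 0.12],
    "val_loss":     [1.15, 0.80, 0.60, 0.62, 0.70],
})
print(summarize_history(demo_history))
```

On the real data the call would be `summarize_history(training_history_df)`. A validation loss that keeps rising after `best_epoch` while training loss falls is the usual overfitting pattern; early stopping or stronger regularization are the common responses.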

Detailed Performance Metrics

Recalculate and display the classification report and confusion matrix using the test set. These are foundational for deeper analysis.
Discuss precision, recall, and F1-score per class in detail. What do these metrics tell you about the model's behavior for specific waste types?

In [None]:
# Ensure generator is reset and get predictions
test_generator.reset()
y_pred_probs = model.predict(test_generator)
y_pred_classes = np.argmax(y_pred_probs, axis=1)
y_true_classes = test_generator.classes

print("Classification Report:")
report = classification_report(y_true_classes, y_pred_classes, target_names=CLASS_NAMES, output_dict=True)
report_df = pd.DataFrame(report).transpose()
print(report_df)

cm = confusion_matrix(y_true_classes, y_pred_classes)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=CLASS_NAMES, yticklabels=CLASS_NAMES)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
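
The per-class numbers in the classification report can be derived directly from the confusion matrix, which makes it easier to see why a class scores low. The sketch below follows the scikit-learn convention used above (rows are true labels, columns are predicted labels). The function name `per_class_metrics`, the demo matrix, and the three class names are made up for illustration.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # correct predictions per class
    predicted = cm.sum(axis=0)    # how often each class was predicted
    actual = cm.sum(axis=1)       # how many true examples each class has
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


# Synthetic 3-class example, not project results.
demo_names = ["cardboard", "metal", "plastic"]
demo_cm = [[8, 2, 0],
           [1, 6, 3],
           [0, 1, 9]]
precision, recall, f1 = per_class_metrics(demo_cm)
for name, p, r, f in zip(demo_names, precision, recall, f1):
    print(f"{name:10s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("Weakest class by F1:", demo_names[int(np.argmin(f1))])
```

Low recall for a class means its items are often sorted into another bin; low precision means the bin for that class is contaminated with other materials. Which of the two matters more depends on the sorting application.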

Misclassification Analysis

Identify the most confused classes: From the confusion matrix, pinpoint pairs of classes that are frequently misclassified (e.g., metal often predicted as plastic, or vice-versa).
Visualize misclassified examples: Show images where the model made errors. This helps to understand why it made mistakes (e.g., poor image quality, similar appearance, background noise).
Confidence analysis: Look at the prediction probabilities for misclassified items. Did the model make a confident wrong prediction or was it uncertain?
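
Reading the most confused pairs off a heatmap gets harder as the number of classes grows, so it can help to rank the off-diagonal cells programmatically. The sketch below is one way to do this, assuming a matrix with true labels on rows and predicted labels on columns. The name `top_confused_pairs` and the demo data are hypothetical.

```python
import numpy as np


def top_confused_pairs(cm, class_names, k=3):
    """Return up to k largest off-diagonal cells as (true, predicted, count)."""
    off_diag = np.array(cm, dtype=int)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(off_diag, 0)        # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [
        (class_names[r], class_names[c], int(off_diag[r, c]))
        for r, c in zip(rows, cols)
        if off_diag[r, c] > 0
    ]


# Synthetic 3-class example, not project results.
demo_names = ["cardboard", "metal", "plastic"]
demo_cm = [[8, 2, 0],
           [1, 6, 3],
           [0, 1, 9]]
for true_name, pred_name, count in top_confused_pairs(demo_cm, demo_names, k=2):
    print(f"{true_name} predicted as {pred_name}: {count}")
```

The pairs are directional: "metal predicted as plastic" and "plastic predicted as metal" are counted separately, which matches how the confusion matrix is read.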

In [None]:
# Get misclassified indices
misclassified_indices = np.where(y_true_classes != y_pred_classes)[0]

if len(misclassified_indices) > 0:
    print("\nAnalyzing Misclassified Images:")
    plt.figure(figsize=(15, 12))
    for i, idx in enumerate(misclassified_indices[:15]): # Show up to 15 misclassified images
        img_path = test_generator.filepaths[idx]
        true_label = CLASS_NAMES[y_true_classes[idx]]
        predicted_label = CLASS_NAMES[y_pred_classes[idx]]
        confidence = y_pred_probs[idx][y_pred_classes[idx]] * 100 # Confidence of predicted class

        img = plt.imread(img_path)
        plt.subplot(3, 5, i + 1) # Adjust subplot grid based on number of images to show
        plt.imshow(img)
        # Every image shown here is misclassified, so the title is always red
        plt.title(f"True: {true_label}\nPred: {predicted_label} ({confidence:.1f}%)",
                  color='red', fontsize=10)
        plt.axis('off')
    plt.tight_layout()
    plt.show()

    # You can also filter for specific confused pairs
    # For example, to see cardboard misclassified as paper
    cardboard_as_paper_indices = [
        i for i in misclassified_indices
        if CLASS_NAMES[y_true_classes[i]] == 'cardboard' and CLASS_NAMES[y_pred_classes[i]] == 'paper'
    ]
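    # Report how many such cases exist; the plotting loop above can be reused to display them
    print(f"cardboard predicted as paper: {len(cardboard_as_paper_indices)} image(s)")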
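
For the confidence question, a small summary can separate confident wrong predictions from uncertain ones. The sketch below assumes softmax outputs shaped like `y_pred_probs` and integer labels like `y_true_classes`. The function name `confidence_breakdown` and the 0.9 cutoff are illustrative assumptions, and the demo arrays are synthetic.

```python
import numpy as np


def confidence_breakdown(y_true, y_pred_probs, high=0.9):
    """Compare prediction confidence on correct and incorrect samples."""
    probs = np.asarray(y_pred_probs, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = probs.argmax(axis=1)
    confidence = probs.max(axis=1)   # probability of the predicted class
    wrong = y_pred != y_true
    return {
        "n_errors": int(wrong.sum()),
        "mean_conf_correct": float(confidence[~wrong].mean()) if (~wrong).any() else float("nan"),
        "mean_conf_wrong": float(confidence[wrong].mean()) if wrong.any() else float("nan"),
        # Indices of errors the model was very sure about.
        "confident_errors": np.flatnonzero(wrong & (confidence >= high)).tolist(),
    }


# Synthetic example, not project results.
demo_true = [0, 1, 1, 2, 2]
demo_probs = [
    [0.95, 0.03, 0.02],
    [0.10, 0.85, 0.05],
    [0.92, 0.05, 0.03],
    [0.40, 0.35, 0.25],
    [0.05, 0.15, 0.80],
]
print(confidence_breakdown(demo_true, demo_probs))
```

Confident errors are the ones worth inspecting first, since they often point to mislabeled images or systematic look-alikes rather than borderline cases.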

Data Insights & Actionable Recommendations

Impact of class imbalance: If you found an imbalance in Notebook 1.0, discuss how it might affect per-class performance (e.g., lower recall for minority classes). Suggest strategies (data augmentation, weighted loss, over/undersampling).
Dataset Limitations: Based on misclassifications, identify limitations of the current dataset (e.g., lack of diverse lighting, occlusions, too many similar backgrounds).
Potential Improvements:
Data Augmentation: Which augmentations might specifically help with identified issues?
Model Architecture: Could a different pre-trained model or fine-tuning strategy improve things?
Preprocessing: Any further steps (e.g., background removal, stronger contrast enhancement) that could help?
Error Correction: For a real-world system, how would these errors be handled? (e.g., human review for uncertain predictions).
Real-world implications: How do these insights translate to the actual deployment of an automated waste sorting system?
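
As a concrete instance of the weighted-loss strategy mentioned above, class weights can be set inversely proportional to class frequency, using the common "balanced" heuristic: weight = n_samples / (n_classes * class_count). The sketch below is a minimal version with synthetic labels. The function name `balanced_class_weights` is hypothetical, and whether weighting helps on this dataset is an open question rather than a finding.

```python
import numpy as np


def balanced_class_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=n_classes)
    weights = {}
    for cls, count in enumerate(counts):
        if count > 0:   # classes with no samples get no weight entry
            weights[cls] = float(len(labels) / (n_classes * count))
    return weights


# Synthetic imbalanced labels: 6 of class 0, 3 of class 1, 1 of class 2.
demo_labels = [0] * 6 + [1] * 3 + [2] * 1
print(balanced_class_weights(demo_labels, n_classes=3))
```

Keras `model.fit` accepts a dictionary of this shape through its `class_weight` argument, so rarer classes contribute more to the loss. The labels would come from the training set, not from `test_generator`.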

In [None]:
print("\nActionable Insights & Recommendations:")
print("- **Class Imbalance:** While TrashNet is relatively balanced, consider how imbalances in real-world data might impact sorting efficiency. Strategies like data augmentation tailored to underrepresented classes or weighted loss functions during training can mitigate this.")
print("- **Visual Ambiguity:** If the confusion matrix shows frequent swaps between pairs such as 'metal'/'trash' or 'paper'/'cardboard', those classes are likely visually ambiguous. Incorporating multi-modal data (e.g., weight, material properties) or image processing that highlights textures and shapes could help.")
print("- **Robustness to Conditions:** If the misclassified images are dominated by poor lighting or occlusions, the training data needs more diversity that simulates real-world collection environments. Augmenting with varied lighting conditions, or adding synthetic data, could be beneficial.")
print("- **Continuous Learning:** In a deployed system, a feedback loop where misclassified items are reviewed and added to the training data can lead to continuous improvement.")
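
The human-review idea can be sketched as a confidence threshold: predictions at or above it are sorted automatically, and the rest are queued for a person, whose decisions can later feed the training set. The function name `route_predictions`, the 0.8 threshold, and the demo probabilities below are assumptions for illustration; a real threshold would be tuned on validation data.

```python
import numpy as np


def route_predictions(y_pred_probs, threshold=0.8):
    """Split sample indices into automatic sorting and human review."""
    probs = np.asarray(y_pred_probs, dtype=float)
    confidence = probs.max(axis=1)
    auto = confidence >= threshold
    return {
        "auto_sort": np.flatnonzero(auto).tolist(),
        "human_review": np.flatnonzero(~auto).tolist(),
        "coverage": float(auto.mean()),   # fraction handled without a person
    }


# Synthetic softmax outputs, not project results.
demo_probs = [
    [0.97, 0.02, 0.01],
    [0.55, 0.40, 0.05],
    [0.10, 0.88, 0.02],
    [0.34, 0.33, 0.33],
]
print(route_predictions(demo_probs, threshold=0.8))
```

Raising the threshold lowers coverage but should reduce the number of wrong automatic decisions. Confident errors still pass through, so a threshold complements periodic audits and does not replace them.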

Conclusion & Next Steps

Summarize the key insights gained from the analysis.
Outline concrete next steps or future work based on these insights, connecting back to the README.md's "Future Work" section.