# In-Class Exercise: Neural Networks with Scikit-Learn

**Time: 10 minutes**

In this exercise, you'll practice building and evaluating a multi-layer perceptron (MLP) using scikit-learn's `MLPClassifier`.

## Learning Objectives

- Build an MLP classifier for a multi-class problem
- Preprocess data appropriately for neural networks
- Evaluate model performance
- Compare different architectures


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
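import warnings
from sklearn.exceptions import ConvergenceWarning

# Silence only convergence warnings so that other warnings stay visible
warnings.filterwarnings('ignore', category=ConvergenceWarning)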


## Part 1: Wine Classification (5 minutes)

We'll use the Wine dataset, which contains chemical analysis of wines from three different cultivars.

### Task 1.1: Load and split the data


In [None]:
# Load the wine dataset
wine = load_wine()
X = wine.data
y = wine.target

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Feature names: {wine.feature_names[:3]}...")  # First 3 features


In [None]:
# TODO: Split the data into training (70%) and test (30%) sets
# Use random_state=42 for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    # YOUR CODE HERE
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")


### Task 1.2: Scale the features

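Neural networks train more reliably when all features are on a comparable scale. Use `StandardScaler` to standardize each feature to zero mean and unit variance. Fit the scaler on the training data only, then apply the same transformation to the test data so that no test-set information leaks into training.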
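
As a reminder of the fit/transform pattern, here is a minimal sketch on made-up toy data (the array values and the `*_toy` names are hypothetical and not part of the exercise). It only illustrates that the statistics come from the training array and are then reused for the test array.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features on very different scales
X_train_toy = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
X_test_toy = np.array([[2.0, 4000.0]])

toy_scaler = StandardScaler()

# fit_transform learns each feature's mean and standard deviation
# from the training data, then standardizes that same data
X_train_toy_scaled = toy_scaler.fit_transform(X_train_toy)

# transform reuses the training statistics; the test data is never used for fitting
X_test_toy_scaled = toy_scaler.transform(X_test_toy)

print("Train means after scaling:", X_train_toy_scaled.mean(axis=0))
print("Train stds after scaling: ", X_train_toy_scaled.std(axis=0))
print("Scaled test row:          ", X_test_toy_scaled)
```

Note that the scaled test values need not fall inside the range of the scaled training values, because they are measured against the training mean and spread.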


In [None]:
# TODO: Fit the scaler on the training data, then transform both the training and test data
scaler = StandardScaler()
X_train_scaled = # YOUR CODE HERE
X_test_scaled = # YOUR CODE HERE

print(f"Original feature range: [{X_train.min():.2f}, {X_train.max():.2f}]")
print(f"Scaled feature range: [{X_train_scaled.min():.2f}, {X_train_scaled.max():.2f}]")


### Task 1.3: Build and train an MLP

Create an `MLPClassifier` with:
- Two hidden layers with 20 and 10 neurons
- ReLU activation
- Adam solver
- `max_iter=500`
- `random_state=42`
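
To see how a hidden-layer specification turns into a network, the following sketch fits a small `MLPClassifier` on synthetic data and inspects the shapes of its weight matrices. The dataset sizes, the `(8, 4)` layout, and the `demo_*` names are assumptions chosen for illustration, not the values asked for in this task.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data (hypothetical sizes): 200 samples, 6 features, 3 classes
X_demo, y_demo = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

demo_mlp = MLPClassifier(
    hidden_layer_sizes=(8, 4),  # two hidden layers: 8 neurons, then 4
    activation='relu',
    solver='adam',
    max_iter=300,
    random_state=0,
)
demo_mlp.fit(X_demo, y_demo)

# One weight matrix per layer transition: input -> 8, 8 -> 4, 4 -> output
layer_shapes = [w.shape for w in demo_mlp.coefs_]
print("Weight matrix shapes:", layer_shapes)
print("Total layers (input + hidden + output):", demo_mlp.n_layers_)
```

Each entry in the `hidden_layer_sizes` tuple adds one hidden layer, so the tuple's length sets the depth and its values set the width.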


In [None]:
# TODO: Create and train the MLPClassifier
mlp = MLPClassifier(
    # YOUR CODE HERE - specify the parameters
)

# Train the model
mlp.fit(X_train_scaled, y_train)
print("Training complete!")


### Task 1.4: Evaluate the model


In [None]:
# TODO: Make predictions on the test set
y_pred = # YOUR CODE HERE

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.4f}")

# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))
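
The classification report summarizes per-class precision and recall, while a confusion matrix shows which classes get mistaken for which. A minimal sketch with made-up labels (the `*_toy` lists are hypothetical) shows how to read one: rows are true classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for three classes, purely for illustration
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]

# Entry [i, j] counts samples whose true class is i and predicted class is j
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)
```

The diagonal holds the correct predictions. Here, one class-0 sample was predicted as class 1, and one class-2 sample was predicted as class 0.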


### Task 1.5: Visualize the loss curve


In [None]:
# Plot the training loss curve
plt.figure(figsize=(10, 5))
plt.plot(mlp.loss_curve_)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Training Loss Curve')
plt.grid(True)
plt.show()

print(f"Number of iterations: {mlp.n_iter_}")


## Part 2: Architecture Comparison (5 minutes)

Now let's compare different neural network architectures!

### Task 2.1: Compare different architectures

Train MLPs with different hidden layer configurations and compare their test accuracy. The test set is small, so small accuracy differences between architectures may be due to chance.


In [None]:
# Define different architectures to test
architectures = {
    'Single Layer (50)': (50,),
    'Single Layer (100)': (100,),
    'Two Layers (20, 10)': (20, 10),
    'Two Layers (50, 25)': (50, 25),
    'Three Layers (30, 20, 10)': (30, 20, 10)
}

results = {}

# Train each architecture and store its test accuracy
for name, hidden_layers in architectures.items():
    mlp = MLPClassifier(
        hidden_layer_sizes=hidden_layers,
        activation='relu',
        solver='adam',
        max_iter=500,
        random_state=42,
        verbose=False
    )
    
    # Fit the model and calculate test accuracy
    mlp.fit(X_train_scaled, y_train)
    accuracy = mlp.score(X_test_scaled, y_test)
    
    results[name] = accuracy
    print(f"{name:30s}: {accuracy:.4f}")
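
When judging whether an architecture is "too simple" or "too complex", it can help to count its trainable parameters. The sketch below does the arithmetic for the five layouts above, assuming 13 input features and 3 output units (which matches the Wine dataset). The helper `count_parameters` is a hypothetical name introduced here, not a scikit-learn function.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each layer transition has n_in * n_out weights plus n_out biases
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


# Same layouts as in the comparison above
architectures_demo = {
    'Single Layer (50)': (50,),
    'Single Layer (100)': (100,),
    'Two Layers (20, 10)': (20, 10),
    'Two Layers (50, 25)': (50, 25),
    'Three Layers (30, 20, 10)': (30, 20, 10),
}

# Assumption: 13 input features and 3 classes, as in the Wine dataset
param_counts = {
    name: count_parameters(13, layers, 3)
    for name, layers in architectures_demo.items()
}

for name, n_params in param_counts.items():
    print(f"{name:30s}: {n_params} parameters")
```

Every one of these networks has more parameters than the Wine dataset has samples, which is worth keeping in mind when answering the discussion questions.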


### Task 2.2: Visualize the comparison


In [None]:
# Create a bar plot to compare the architectures
plt.figure(figsize=(12, 6))
plt.bar(results.keys(), results.values(), color='steelblue')
plt.xlabel('Architecture')
plt.ylabel('Test Accuracy')
plt.title('Performance Comparison of Different MLP Architectures')
plt.xticks(rotation=45, ha='right')
plt.ylim([0.85, 1.0])  # Adjust as needed
plt.grid(True, axis='y', alpha=0.3)
plt.tight_layout()
plt.show()


## Bonus Challenge (Optional)

If you finish early, try this additional task:

### Bonus: Add early stopping

Modify your best architecture to use early stopping and see if it improves performance or training time.


In [None]:
# Create an MLP with early stopping; 10% of the training data is held out for validation
mlp_early = MLPClassifier(
    hidden_layer_sizes=(50, 25),
    activation='relu',
    solver='adam',
    max_iter=1000,
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=42,
    verbose=False
)

mlp_early.fit(X_train_scaled, y_train)
accuracy_early = mlp_early.score(X_test_scaled, y_test)

print(f"Accuracy with early stopping: {accuracy_early:.4f}")
print(f"Training stopped at iteration: {mlp_early.n_iter_}")
print(f"Best validation score: {mlp_early.best_validation_score_:.4f}")
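
With `early_stopping=True`, the fitted model also records the validation accuracy after each epoch in `validation_scores_`. The sketch below is a self-contained illustration that reloads the Wine data and uses its own settings (the `*_es` names, `validation_fraction=0.2`, and `random_state=0` are assumptions chosen for illustration). It scales the full dataset for brevity because no test set is evaluated here.

```python
from sklearn.datasets import load_wine
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_es, y_es = load_wine(return_X_y=True)
X_es = StandardScaler().fit_transform(X_es)  # no test set in this sketch

es_mlp = MLPClassifier(
    hidden_layer_sizes=(50, 25),
    max_iter=1000,
    early_stopping=True,
    validation_fraction=0.2,  # fraction of the data held out internally
    n_iter_no_change=10,
    random_state=0,
)
es_mlp.fit(X_es, y_es)

# One validation accuracy per epoch
scores = es_mlp.validation_scores_
best_epoch = max(range(len(scores)), key=scores.__getitem__) + 1

print(f"Epochs run: {es_mlp.n_iter_}")
print(f"Best validation accuracy: {max(scores):.4f} (first reached at epoch {best_epoch})")
```

Comparing the epoch of the best score with the total number of epochs run gives a rough sense of how long the model kept training without improving.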


## Discussion Questions

1. **Which architecture performed best on the Wine dataset? Why do you think that is?**
   
   _Your answer here_

2. **Did you notice any architectures that were too simple or too complex?**
   
   _Your answer here_

3. **How did early stopping affect the training? Did it help or hurt performance?**
   
   _Your answer here_

4. **When would you choose scikit-learn's MLP over PyTorch for a real project?**
   
   _Your answer here_

## Key Takeaways

- Always **scale** your features (for example, with `StandardScaler`) before training neural networks
- **Simpler architectures** often work well for smaller datasets
- **Early stopping** can reduce overfitting and save training time, although its validation split is small on a dataset this size, so the stopping signal can be noisy
- Scikit-learn makes it easy to **experiment** with different architectures
- Use **cross-validation** and **grid search** to tune hyperparameters instead of relying on a single train/test split
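
The last takeaway can be sketched with a `Pipeline` and `GridSearchCV`. This is one possible setup, not the only one. The grid values, the split settings, and the variable names are assumptions chosen for illustration. Putting the scaler inside the pipeline means it is refit on each cross-validation training fold, so the held-out fold never influences the scaling.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_wine, y_wine = load_wine(return_X_y=True)

# Illustrative split settings; a final test set is kept out of the search
X_tr, X_te, y_tr, y_te = train_test_split(
    X_wine, y_wine, test_size=0.25, random_state=0, stratify=y_wine
)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('mlp', MLPClassifier(max_iter=500, random_state=0)),
])

# A deliberately small grid (hypothetical values) to keep the run short
param_grid = {
    'mlp__hidden_layer_sizes': [(20,), (20, 10)],
    'mlp__alpha': [1e-4, 1e-2],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)

test_score = search.score(X_te, y_te)
print("Best parameters:", search.best_params_)
print(f"Mean cross-validated accuracy: {search.best_score_:.4f}")
print(f"Held-out test accuracy: {test_score:.4f}")
```

The cross-validated score averages over five validation folds, so it is usually a steadier basis for choosing an architecture than a single test-set accuracy.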
