# Notebook 9: Scikit-Learn Basics - Your First Machine Learning Models

Welcome to your ninth Python notebook! In this one you'll build your first machine learning models with scikit-learn, one of the most widely used ML libraries in Python.

**Learning Objectives:**
- Understand the machine learning workflow
- Load and explore real datasets
- Build classification and regression models
- Evaluate model performance
- Apply ML to solve real-world problems

**Prerequisites:** You should have completed notebooks 1-8, especially NumPy, Pandas, and Matplotlib.

## Essential Imports for Machine Learning

The examples in this notebook rely on the following imports:

In [None]:
# The "Big 4" for machine learning
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scikit-learn imports
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
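from sklearn.datasets import load_iris

# load_boston was removed in scikit-learn 1.2. On newer versions, rebuild an
# equivalent object from the original StatLib file (needs internet access),
# following the recipe given in scikit-learn's deprecation notice.
try:
    from sklearn.datasets import load_boston
except ImportError:
    from sklearn.utils import Bunch

    def load_boston():
        raw = pd.read_csv("http://lib.stat.cmu.edu/datasets/boston",
                          sep=r"\s+", skiprows=22, header=None)
        return Bunch(
            data=np.hstack([raw.values[::2, :], raw.values[1::2, :2]]),
            target=raw.values[1::2, 2],
            feature_names=np.array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                                    'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']),
            DESCR="Boston house prices: 506 samples, 13 features (rebuilt from StatLib)",
        )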

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
%matplotlib inline

print("🚀 All libraries imported successfully!")
print("Ready to build your first ML models!")

## The Machine Learning Workflow

Most supervised ML projects follow these steps:

1. **Load Data** 📊
2. **Explore & Visualize** 🔍
3. **Prepare Data** 🛠️
4. **Split Data** ✂️
5. **Train Model** 🧠
6. **Evaluate Performance** 📈
7. **Make Predictions** 🎯

Let's walk through this process with real examples!
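
Before the full projects, here is a compact sketch of how the seven steps might look in code. It assumes the built-in Iris data and a `LogisticRegression` model chosen only for brevity; steps 2 and 3 appear as a comment because Iris needs no cleaning.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load data
X, y = load_iris(return_X_y=True)

# 2-3. Explore and prepare (skipped here: Iris is already clean and numeric)

# 4. Split data: hold out 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 5. Train model (max_iter raised so the solver converges on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 6. Evaluate performance on data the model has not seen
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.2%}")

# 7. Make a prediction for one new flower (2D input: one row, four features)
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = model.predict(new_flower)[0]
print(f"Predicted class index: {predicted_class}")
```

The two projects below follow the same order, with more exploration and visualization at each step.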

---

## 🌸 Project 1: Iris Flower Classification

Let's start with the famous Iris dataset - the "Hello World" of machine learning!

### Step 1: Load and Explore the Data

In [None]:
# Load the Iris dataset
iris = load_iris()

# Convert to DataFrame for easier exploration
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target
iris_df['species_name'] = iris_df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

print("🌸 Iris Dataset Overview:")
print(f"Shape: {iris_df.shape}")
print(f"Features: {list(iris.feature_names)}")
print(f"Target classes: {list(iris.target_names)}")

# Show first few rows
print("\nFirst 5 rows:")
iris_df.head()

### Step 2: Visualize the Data

In [None]:
# Create visualization to understand the data
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Iris Dataset Exploration', fontsize=16, fontweight='bold')

# Scatter plot: Sepal length vs Sepal width
for species in iris.target_names:
    mask = iris_df['species_name'] == species
    axes[0, 0].scatter(iris_df[mask]['sepal length (cm)'], 
                      iris_df[mask]['sepal width (cm)'], 
                      label=species, alpha=0.7)
axes[0, 0].set_xlabel('Sepal Length (cm)')
axes[0, 0].set_ylabel('Sepal Width (cm)')
axes[0, 0].legend()
axes[0, 0].set_title('Sepal Measurements')

# Scatter plot: Petal length vs Petal width
for species in iris.target_names:
    mask = iris_df['species_name'] == species
    axes[0, 1].scatter(iris_df[mask]['petal length (cm)'], 
                      iris_df[mask]['petal width (cm)'], 
                      label=species, alpha=0.7)
axes[0, 1].set_xlabel('Petal Length (cm)')
axes[0, 1].set_ylabel('Petal Width (cm)')
axes[0, 1].legend()
axes[0, 1].set_title('Petal Measurements')

# Distribution of species
species_counts = iris_df['species_name'].value_counts()
axes[1, 0].bar(species_counts.index, species_counts.values)
axes[1, 0].set_xlabel('Species')
axes[1, 0].set_ylabel('Count')
axes[1, 0].set_title('Species Distribution')

# Feature distributions, overlaid on one axis
# (DataFrame.hist() with several columns would clear the whole figure)
iris_df[iris.feature_names].plot.hist(ax=axes[1, 1], bins=15, alpha=0.5)
axes[1, 1].set_xlabel('Measurement (cm)')
axes[1, 1].set_title('Feature Distributions')

plt.tight_layout()
plt.show()

print("💡 Key Observations:")
print("- Setosa flowers have smaller petals")
print("- Virginica flowers have the largest petals")
print("- Species seem separable based on petal measurements!")

### Step 3: Prepare and Split the Data

In [None]:
# Prepare features (X) and target (y)
X = iris.data  # Features: sepal length, sepal width, petal length, petal width
y = iris.target  # Target: species (0=setosa, 1=versicolor, 2=virginica)

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")

# Split data into training and testing sets
# 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\n📊 Data Split:")
print(f"Training set: {X_train.shape[0]} samples")
print(f"Testing set: {X_test.shape[0]} samples")

# Check class distribution in splits
print(f"\nTraining set class distribution: {np.bincount(y_train)}")
print(f"Testing set class distribution: {np.bincount(y_test)}")
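
To see what `stratify=y` changes, the sketch below splits Iris twice, once with stratification and once without, and counts the classes in each test set. The variable names are illustrative and `random_state=0` is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Stratified split: every class keeps its one-third share in the test set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Plain random split: class counts in the test set depend on the shuffle
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=0)

strat_counts = np.bincount(y_test_strat, minlength=3)
plain_counts = np.bincount(y_test_plain, minlength=3)

print(f"Stratified test set class counts: {strat_counts}")
print(f"Plain test set class counts:      {plain_counts}")
```

With only 30 test samples, an unstratified split can leave one species under-represented, which makes the accuracy estimate noisier.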

### Step 4: Train Your First Machine Learning Model!

In [None]:
# Create and train a Random Forest Classifier
print("🧠 Training Random Forest Classifier...")

# Create the model
rf_classifier = RandomForestClassifier(
    n_estimators=100,  # Number of trees in the forest
    random_state=42    # For reproducible results
)

# Train the model
rf_classifier.fit(X_train, y_train)

print("✅ Model training completed!")

# Make predictions on the test set
y_pred = rf_classifier.predict(X_test)

print(f"\n🎯 Predictions made for {len(y_test)} test samples")
print(f"Actual species: {y_test}")
print(f"Predicted species: {y_pred}")

### Step 5: Evaluate Model Performance

In [None]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"🎯 Model Accuracy: {accuracy:.2%}")

# Detailed classification report
print("\n📊 Detailed Performance Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Feature importance
feature_importance = rf_classifier.feature_importances_
feature_names = iris.feature_names

# Visualize feature importance
plt.figure(figsize=(10, 6))
bars = plt.bar(feature_names, feature_importance)
plt.title('Feature Importance in Iris Classification', fontweight='bold')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.xticks(rotation=45)

# Add value labels on bars
for bar, importance in zip(bars, feature_importance):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{importance:.3f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

print(f"\n💡 Most important feature: {feature_names[np.argmax(feature_importance)]}")
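
The classification report summarizes precision and recall per class; a confusion matrix shows the raw counts behind those numbers. The sketch below retrains the same kind of model so that it runs on its own. The names `clf`, `cm`, and `recall_by_hand` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Recall for each class = correct predictions (diagonal) / actual count (row sum)
recall_by_hand = cm.diagonal() / cm.sum(axis=1)
for name, recall in zip(iris.target_names, recall_by_hand):
    print(f"Recall for {name}: {recall:.2f}")
```

Off-diagonal entries show which species get confused with each other, which the single accuracy number hides.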

### Step 6: Make Predictions on New Data

In [None]:
# Let's predict the species for some new flower measurements
new_flowers = np.array([
    [5.1, 3.5, 1.4, 0.2],  # Looks like setosa
    [6.2, 2.8, 4.8, 1.8],  # Looks like versicolor/virginica
    [7.2, 3.0, 5.8, 1.6]   # Looks like virginica
])

predictions = rf_classifier.predict(new_flowers)
probabilities = rf_classifier.predict_proba(new_flowers)

print("🔮 Predictions for New Flowers:")
print("="*50)

for i, (flower, pred, prob) in enumerate(zip(new_flowers, predictions, probabilities)):
    predicted_species = iris.target_names[pred]
    confidence = prob[pred] * 100
    
    print(f"\nFlower {i+1}: {flower}")
    print(f"Predicted species: {predicted_species} (confidence: {confidence:.1f}%)")
    
    # Show all probabilities
    print("All probabilities:")
    for j, species in enumerate(iris.target_names):
        print(f"  {species}: {prob[j]*100:.1f}%")
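
The "confidence" printed above comes from `predict_proba`. For a random forest, scikit-learn computes it as the mean of the class probabilities predicted by the individual trees, so it is best read as a share of tree votes rather than a calibrated probability. A minimal sketch that checks this by hand (the model is refit on all of Iris purely for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

new_flowers = np.array([[5.1, 3.5, 1.4, 0.2], [6.2, 2.8, 4.8, 1.8]])
proba = forest.predict_proba(new_flowers)

# Average the per-tree class probabilities by hand
tree_average = np.mean(
    [tree.predict_proba(new_flowers) for tree in forest.estimators_], axis=0
)

# predict() returns the class with the highest averaged probability
labels_from_proba = forest.classes_[proba.argmax(axis=1)]

print("predict_proba:\n", proba.round(3))
print("Matches per-tree average:", np.allclose(proba, tree_average))
print("Labels from argmax:", labels_from_proba)
```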

---

## 🏠 Project 2: House Price Prediction (Regression)

Now let's try a regression problem - predicting house prices!

### Load and Explore the Boston Housing Dataset

In [None]:
# Load Boston housing dataset
boston = load_boston()

# Convert to DataFrame
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['price'] = boston.target

print("🏠 Boston Housing Dataset Overview:")
print(f"Shape: {boston_df.shape}")
print(f"Target: House prices (in $1000s)")

print("\nDataset description:")
print(boston.DESCR[:500] + "...")

# Show basic statistics
print("\n📊 Price Statistics:")
print(f"Mean price: ${boston_df['price'].mean():.1f}k")
print(f"Median price: ${boston_df['price'].median():.1f}k")
print(f"Price range: ${boston_df['price'].min():.1f}k - ${boston_df['price'].max():.1f}k")

boston_df.head()

### Visualize Key Relationships

In [None]:
# Visualize relationships between features and house prices
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
fig.suptitle('House Price Relationships', fontsize=16, fontweight='bold')

# Price distribution
axes[0, 0].hist(boston_df['price'], bins=20, edgecolor='black', alpha=0.7)
axes[0, 0].set_xlabel('Price ($1000s)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Price Distribution')

# Number of rooms vs Price
axes[0, 1].scatter(boston_df['RM'], boston_df['price'], alpha=0.6)
axes[0, 1].set_xlabel('Average Number of Rooms')
axes[0, 1].set_ylabel('Price ($1000s)')
axes[0, 1].set_title('Rooms vs Price')

# Crime rate vs Price
axes[0, 2].scatter(boston_df['CRIM'], boston_df['price'], alpha=0.6)
axes[0, 2].set_xlabel('Crime Rate')
axes[0, 2].set_ylabel('Price ($1000s)')
axes[0, 2].set_title('Crime Rate vs Price')

# Distance to employment centers vs Price
axes[1, 0].scatter(boston_df['DIS'], boston_df['price'], alpha=0.6)
axes[1, 0].set_xlabel('Distance to Employment Centers')
axes[1, 0].set_ylabel('Price ($1000s)')
axes[1, 0].set_title('Distance vs Price')

# Pupil-teacher ratio vs Price
axes[1, 1].scatter(boston_df['PTRATIO'], boston_df['price'], alpha=0.6)
axes[1, 1].set_xlabel('Pupil-Teacher Ratio')
axes[1, 1].set_ylabel('Price ($1000s)')
axes[1, 1].set_title('School Quality vs Price')

# Correlation heatmap of top features
correlation_features = ['price', 'RM', 'LSTAT', 'PTRATIO', 'DIS', 'CRIM']
corr_matrix = boston_df[correlation_features].corr()
im = axes[1, 2].imshow(corr_matrix, cmap='RdBu', vmin=-1, vmax=1)
axes[1, 2].set_xticks(range(len(correlation_features)))
axes[1, 2].set_yticks(range(len(correlation_features)))
axes[1, 2].set_xticklabels(correlation_features, rotation=45)
axes[1, 2].set_yticklabels(correlation_features)
axes[1, 2].set_title('Feature Correlations')

# Add correlation values to heatmap
for i in range(len(correlation_features)):
    for j in range(len(correlation_features)):
        axes[1, 2].text(j, i, f'{corr_matrix.iloc[i, j]:.2f}', 
                       ha='center', va='center', fontsize=8)

plt.tight_layout()
plt.show()

print("💡 Key Insights:")
print("- Homes with more rooms tend to have higher prices")
print("- Areas with higher crime rates tend to have lower prices")
print("- Lower pupil-teacher ratios (a rough proxy for school quality) go with higher prices")

### Train Regression Models

In [None]:
# Prepare data
X_boston = boston.data
y_boston = boston.target

# Split the data
X_train_boston, X_test_boston, y_train_boston, y_test_boston = train_test_split(
    X_boston, y_boston, test_size=0.2, random_state=42
)

print(f"🏠 Training set: {X_train_boston.shape[0]} houses")
print(f"🏠 Testing set: {X_test_boston.shape[0]} houses")

# Train two different models
print("\n🧠 Training models...")

# 1. Linear Regression
linear_reg = LinearRegression()
linear_reg.fit(X_train_boston, y_train_boston)

# 2. Random Forest Regressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train_boston, y_train_boston)

print("✅ Both models trained successfully!")

### Compare Model Performance

In [None]:
# Make predictions with both models
y_pred_linear = linear_reg.predict(X_test_boston)
y_pred_rf = rf_regressor.predict(X_test_boston)

# Calculate Mean Squared Error for both models
mse_linear = mean_squared_error(y_test_boston, y_pred_linear)
mse_rf = mean_squared_error(y_test_boston, y_pred_rf)

# Calculate R² score (coefficient of determination)
r2_linear = linear_reg.score(X_test_boston, y_test_boston)
r2_rf = rf_regressor.score(X_test_boston, y_test_boston)

print("📊 Model Performance Comparison:")
print("=" * 50)
print(f"Linear Regression:")
print(f"  Mean Squared Error: {mse_linear:.2f}")
print(f"  R² Score: {r2_linear:.3f} ({r2_linear*100:.1f}% variance explained)")
print(f"  RMSE: ${np.sqrt(mse_linear):.2f}k")

print(f"\nRandom Forest:")
print(f"  Mean Squared Error: {mse_rf:.2f}")
print(f"  R² Score: {r2_rf:.3f} ({r2_rf*100:.1f}% variance explained)")
print(f"  RMSE: ${np.sqrt(mse_rf):.2f}k")

# Visualize predictions vs actual values
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Linear Regression predictions
axes[0].scatter(y_test_boston, y_pred_linear, alpha=0.6)
axes[0].plot([y_test_boston.min(), y_test_boston.max()], 
            [y_test_boston.min(), y_test_boston.max()], 'r--', lw=2)
axes[0].set_xlabel('Actual Price ($1000s)')
axes[0].set_ylabel('Predicted Price ($1000s)')
axes[0].set_title(f'Linear Regression\nR² = {r2_linear:.3f}')
axes[0].grid(True, alpha=0.3)

# Random Forest predictions
axes[1].scatter(y_test_boston, y_pred_rf, alpha=0.6)
axes[1].plot([y_test_boston.min(), y_test_boston.max()], 
            [y_test_boston.min(), y_test_boston.max()], 'r--', lw=2)
axes[1].set_xlabel('Actual Price ($1000s)')
axes[1].set_ylabel('Predicted Price ($1000s)')
axes[1].set_title(f'Random Forest\nR² = {r2_rf:.3f}')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

winner = "Random Forest" if r2_rf > r2_linear else "Linear Regression"
print(f"\n🏆 Winner: {winner} performs better on this dataset!")
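
To make the three regression metrics concrete, the sketch below computes MSE, RMSE, and R² by hand with NumPy and compares them with scikit-learn's functions. The five prices are made up for illustration and are not taken from the housing data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical prices in $1000s, invented for this example
y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])
y_pred = np.array([25.0, 20.6, 33.7, 35.4, 34.2])

residuals = y_true - y_pred

# MSE: average of the squared errors
mse = np.mean(residuals ** 2)

# RMSE: square root of MSE, back in the target's units ($1000s)
rmse = np.sqrt(mse)

# R²: 1 - (error of the model) / (error of always predicting the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²:   {r2:.3f}")
```

An R² of 0 means the model is no better than always predicting the mean price, and it can be negative when the model is worse than that.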

### Feature Importance in House Price Prediction

In [None]:
# Analyze feature importance from Random Forest
feature_importance_boston = rf_regressor.feature_importances_
feature_names_boston = boston.feature_names

# Create a DataFrame for easier handling
importance_df = pd.DataFrame({
    'feature': feature_names_boston,
    'importance': feature_importance_boston
}).sort_values('importance', ascending=False)

# Plot feature importance
plt.figure(figsize=(12, 8))
bars = plt.barh(importance_df['feature'], importance_df['importance'])
plt.xlabel('Feature Importance')
plt.title('Feature Importance in House Price Prediction', fontweight='bold')
plt.gca().invert_yaxis()  # Highest importance at top

# Add value labels
for bar, importance in zip(bars, importance_df['importance']):
    plt.text(bar.get_width() + 0.005, bar.get_y() + bar.get_height()/2, 
             f'{importance:.3f}', ha='left', va='center')

plt.tight_layout()
plt.show()

print("🏠 Most Important Features for House Prices:")
for i, (_, row) in enumerate(importance_df.head().iterrows()):
    print(f"{i+1}. {row['feature']}: {row['importance']:.3f}")

# Feature explanations
feature_explanations = {
    'LSTAT': 'Lower status of population (%)',
    'RM': 'Average number of rooms per dwelling',
    'PTRATIO': 'Pupil-teacher ratio by town',
    'DIS': 'Distances to employment centers',
    'NOX': 'Nitric oxides concentration',
    'CRIM': 'Per capita crime rate',
    'TAX': 'Property tax rate',
    'AGE': 'Proportion of owner-occupied units built prior to 1940'
}

print("\n📖 Feature Explanations:")
for feature in importance_df['feature'].head(5):
    if feature in feature_explanations:
        print(f"• {feature}: {feature_explanations[feature]}")

### Make Predictions for New Houses

In [None]:
# Let's predict prices for some example houses
# Using average values but changing key features
average_features = X_boston.mean(axis=0)

# Create 3 different house profiles
house_profiles = []

# House 1: Luxury house (more rooms, low crime, good schools)
luxury_house = average_features.copy()
luxury_house[5] = 8.0    # RM: 8 rooms (vs avg 6.3)
luxury_house[0] = 0.5    # CRIM: Low crime (vs avg 3.6)
luxury_house[10] = 15.0  # PTRATIO: Good schools (vs avg 18.5)
house_profiles.append(("Luxury House", luxury_house))

# House 2: Average house
house_profiles.append(("Average House", average_features))

# House 3: Budget house (fewer rooms, higher crime, worse schools)
budget_house = average_features.copy()
budget_house[5] = 4.5    # RM: 4.5 rooms
budget_house[0] = 8.0    # CRIM: Higher crime
budget_house[10] = 22.0  # PTRATIO: Worse schools
house_profiles.append(("Budget House", budget_house))

print("🏠 House Price Predictions:")
print("=" * 50)

for name, features in house_profiles:
    # Reshape for prediction (sklearn expects 2D array)
    features_2d = features.reshape(1, -1)
    
    # Predict with both models
    price_linear = linear_reg.predict(features_2d)[0]
    price_rf = rf_regressor.predict(features_2d)[0]
    
    print(f"\n{name}:")
    print(f"  Rooms: {features[5]:.1f}")
    print(f"  Crime Rate: {features[0]:.1f}")
    print(f"  Pupil-Teacher Ratio: {features[10]:.1f}")
    print(f"  Predicted Price (Linear): ${price_linear:.1f}k")
    print(f"  Predicted Price (Random Forest): ${price_rf:.1f}k")
    print(f"  Average Prediction: ${(price_linear + price_rf)/2:.1f}k")

---

## 🎯 Key Machine Learning Concepts You've Learned

### **Classification vs Regression**
- **Classification**: Predicting categories (Iris species)
- **Regression**: Predicting continuous values (house prices)

### **The ML Workflow**
1. **Data Loading**: `load_iris()`, `load_boston()`
2. **Data Splitting**: `train_test_split()`
3. **Model Training**: `.fit()`
4. **Predictions**: `.predict()`
5. **Evaluation**: `accuracy_score()`, `mean_squared_error()`

### **Model Types**
- **Random Forest**: Averages many decision trees, each trained on a random sample of the data (a strong default for tabular data)
- **Linear Regression**: Simple, interpretable, good baseline
- **Logistic Regression**: Linear model for classification
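
To illustrate what "interpretable" means for linear regression, the sketch below fits a model to synthetic data generated from a known formula and reads the formula back from `coef_` and `intercept_`. The data and the coefficients 3, -2, and 5 are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Known relationship: y = 3*x1 - 2*x2 + 5, plus a little noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# Each coefficient is the change in y for a one-unit change in that feature
print(f"Coefficients: {model.coef_.round(2)}")
print(f"Intercept:    {model.intercept_:.2f}")
```

A random forest offers no comparable formula, which is one reason a linear model makes a useful baseline even when it is less accurate.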

### **Evaluation Metrics**
- **Classification**: Accuracy, Precision, Recall, F1-score
- **Regression**: Mean Squared Error (MSE), R² score, RMSE
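
Accuracy alone can mislead when one class is rare. In the sketch below, the labels are invented: 95 negatives and 5 positives, with a "model" that always predicts the majority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A useless model that always predicts the majority class
y_pred = np.zeros(100, dtype=int)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)            # share of positives that were found
f1 = f1_score(y_true, y_pred, zero_division=0)   # precision is undefined here, so treat it as 0

print(f"Accuracy: {accuracy:.2f}")
print(f"Recall:   {recall:.2f}")
print(f"F1-score: {f1:.2f}")
```

The accuracy looks strong even though the model never finds a single positive case, which is why recall and F1 are reported alongside it.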

---

## 🔧 Common ML Errors and How to Fix Them

In [None]:
# Common ML Mistakes and Solutions

print("🚨 Common Machine Learning Errors and Solutions:")
print("=" * 60)

print("\n1. Data Shape Errors:")
print("   Problem: ValueError: Expected 2D array, got 1D array")
print("   Solution: Use .reshape(-1, 1) or .reshape(1, -1)")

# Example of correct reshaping
single_sample = X_test[0]  # This is 1D
print(f"   1D shape: {single_sample.shape}")
single_sample_2d = single_sample.reshape(1, -1)  # Make it 2D
print(f"   2D shape: {single_sample_2d.shape}")

print("\n2. Data Leakage:")
print("   Problem: Information from the test set (or from the target) leaks into training")
print("   Solution: Split first, then fit scalers and encoders on the training set only")

print("\n3. Overfitting:")
print("   Problem: Model performs great on training data, poor on test data")
print("   Solution: Use cross-validation, simpler models, more data")

print("\n4. Underfitting:")
print("   Problem: Model performs poorly on both training and test data")
print("   Solution: Use more complex models, more features, less regularization")

print("\n5. Wrong Metric:")
print("   Problem: Using accuracy for imbalanced datasets")
print("   Solution: Use precision, recall, F1-score, or ROC-AUC")

print("\n💡 Best Practices:")
print("   ✅ Always set random_state for reproducible results")
print("   ✅ Use stratify=y in classification so both splits keep the class proportions")
print("   ✅ Scale/normalize features when needed")
print("   ✅ Visualize your data before modeling")
print("   ✅ Start with simple models, then increase complexity")
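
Overfitting is easiest to spot by comparing training and test scores. The sketch below is one way to do that, assuming a synthetic dataset with 20% label noise and two single decision trees (the building block of a random forest), one unrestricted and one limited to depth 3.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels (flip_y=0.2) so that memorizing is harmful
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "unlimited depth": DecisionTreeClassifier(random_state=0),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    train_acc, test_acc = scores[name]
    print(f"{name}: train={train_acc:.2f}, test={test_acc:.2f}, gap={train_acc - test_acc:.2f}")
```

A large gap between training and test accuracy suggests the model has memorized noise. Low scores on both would point to underfitting instead.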

---

## 🏋️ Practice Challenge: Build Your Own Model

Now it's your turn! Try to solve this challenge:

### Challenge: Wine Classification

Your task: Predict which of three classes (cultivars) a wine belongs to, using scikit-learn's built-in wine dataset!

In [None]:
# Challenge: Wine Classification
from sklearn.datasets import load_wine

# Load wine dataset
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['wine_class'] = wine.target
wine_df['wine_name'] = wine_df['wine_class'].map({0: 'class_0', 1: 'class_1', 2: 'class_2'})

print("🍷 Wine Dataset Challenge:")
print(f"Shape: {wine_df.shape}")
print(f"Classes: {wine.target_names}")
print(f"Features: {len(wine.feature_names)} chemical properties")

print("\n🎯 Your Challenge:")
print("1. Explore the wine dataset")
print("2. Split the data into train/test sets")
print("3. Train a classifier (try different models!)")
print("4. Evaluate performance")
print("5. Find the most important features")

print("\n💪 Stretch Goals:")
print("- Compare multiple models")
print("- Create visualizations")
print("- Try feature selection")

# Show first few rows to get started
wine_df.head()

In [None]:
# YOUR CODE HERE - Try the wine classification challenge!
# Start by exploring the data, then build your model

# Hint: You can copy and modify code from the Iris example above

# Step 1: Explore the data
# print(wine_df.describe())

# Step 2: Prepare and split the data
# X_wine = wine.data
# y_wine = wine.target
# X_train_wine, X_test_wine, y_train_wine, y_test_wine = train_test_split(...)

# Step 3: Train a model
# model = RandomForestClassifier(...)
# model.fit(...)

# Step 4: Evaluate
# predictions = model.predict(...)
# accuracy = accuracy_score(...)

# Write your solution below:


---

## ✅ Self-Assessment Checklist

Before moving forward, make sure you can:

**Machine Learning Fundamentals:**
- [ ] Understand the difference between classification and regression
- [ ] Follow the ML workflow: load → explore → split → train → evaluate → predict
- [ ] Use `train_test_split()` to create training and testing sets
- [ ] Explain why we split data (to estimate performance on unseen data and detect overfitting)

**Model Training:**
- [ ] Import and use scikit-learn models
- [ ] Train models with `.fit(X_train, y_train)`
- [ ] Make predictions with `.predict(X_test)`
- [ ] Get prediction probabilities with `.predict_proba()`

**Model Evaluation:**
- [ ] Calculate accuracy for classification problems
- [ ] Calculate MSE and R² for regression problems
- [ ] Interpret classification reports
- [ ] Understand feature importance

**Practical Skills:**
- [ ] Debug common shape errors (1D vs 2D arrays)
- [ ] Visualize model predictions vs actual values
- [ ] Compare different models
- [ ] Make predictions on new data

**Data Science Connection:**
- [ ] Recognize real-world ML applications
- [ ] Understand when to use different models
- [ ] Know how to interpret results for business decisions

---

## 🚀 What's Next in Your Data Science Journey?

Congratulations! You've built your first machine learning models. Here's what to explore next:

### **Immediate Next Steps:**
- **Cross-Validation**: Learn to validate models properly
- **Feature Engineering**: Create better features from raw data
- **Model Tuning**: Optimize hyperparameters for better performance
- **Pipeline Creation**: Automate your ML workflow
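
As a preview of the first and last items, the sketch below combines a pipeline with 5-fold cross-validation on Iris. The choice of `StandardScaler` plus `LogisticRegression` is an assumption for illustration; any preprocessing step and estimator could be substituted.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline refits the scaler inside each fold, on that fold's training part only,
# so no information from the held-out part leaks into preprocessing
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: five train/evaluate rounds, each holding out a different fifth
scores = cross_val_score(pipeline, X, y, cv=5)

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy:   {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over five folds gives a steadier estimate than a single train/test split, which matters most on small datasets like this one.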

### **Advanced Topics to Explore:**
- **Ensemble Methods**: Combine multiple models
- **Deep Learning**: Neural networks with TensorFlow/PyTorch
- **Natural Language Processing**: Work with text data
- **Computer Vision**: Work with image data
- **Time Series**: Predict future values

### **Real-World Applications:**
- **Business Analytics**: Customer segmentation, churn prediction
- **Finance**: Fraud detection, algorithmic trading
- **Healthcare**: Medical diagnosis, drug discovery
- **Technology**: Recommendation systems, search algorithms

### **Tools to Master:**
- **Advanced Scikit-learn**: More algorithms and techniques
- **XGBoost/LightGBM**: State-of-the-art gradient boosting
- **TensorFlow/PyTorch**: Deep learning frameworks
- **MLflow/Kubeflow**: ML experiment tracking and deployment

---

## 🎉 Congratulations!

You've completed your introduction to machine learning with scikit-learn! You now have the foundation to:

✅ **Build classification models** to predict categories  
✅ **Build regression models** to predict continuous values  
✅ **Evaluate model performance** using appropriate metrics  
✅ **Understand feature importance** in your models  
✅ **Apply ML to real-world problems** with confidence  

**Remember:** Every data scientist started where you are now. The key is to keep practicing with real datasets and gradually tackle more complex problems.

**What will you predict next? 🔮**