## Introduction to Experiment Tracking

Experiment tracking in machine learning is the process of systematically recording and managing all relevant information about your model development lifecycle. This includes storing details about each training run, such as the specific code version used, hyperparameters configured, datasets involved, and the performance metrics achieved.
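To make the definition concrete, the sketch below records one hypothetical training run as a plain JSON file using only the Python standard library. The `record_run` helper, the field names (`code_version`, `params`, `dataset`, `metrics`), and all values are illustrative assumptions, not the schema of MLflow or W&B. Dedicated tools store the same kinds of information and add search, comparison, and visualization on top.

```python
import json
import tempfile
import time
from pathlib import Path


def record_run(log_dir, code_version, params, dataset, metrics):
    """Write one run's details to a JSON file and return the file path."""
    record = {
        "timestamp": time.time(),
        "code_version": code_version,
        "params": params,
        "dataset": dataset,
        "metrics": metrics,
    }
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Number the file after the runs already stored in this directory.
    existing = len(list(log_dir.glob("run_*.json")))
    path = log_dir / f"run_{existing:04d}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


# Hypothetical values, for illustration only.
demo_dir = tempfile.mkdtemp()
run_path = record_run(
    demo_dir,
    code_version="abc1234",
    params={"model": "Ridge", "alpha": 0.1},
    dataset="diabetes-v1",
    metrics={"mse": 1.0},
)
loaded = json.loads(run_path.read_text())
print(loaded["params"])
```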

### Why is experiment tracking important?

1.  **Reproducibility**: Machine learning experiments can be complex, involving many variables. Tracking allows you to precisely recreate past experiments, ensuring that results are verifiable and consistent.
2.  **Comparison and Analysis**: By logging parameters and metrics for different runs, you can easily compare the performance of various models, hyperparameter configurations, or data preprocessing techniques. This facilitates informed decision-making and helps identify what works best.
3.  **Collaboration**: In team environments, experiment tracking provides a centralized record of all experiments, enabling team members to understand, build upon, and reproduce each other's work efficiently.
4.  **Debugging and Optimization**: When a model performs unexpectedly, detailed experiment logs can help trace back the cause, whether it's a code change, a specific hyperparameter setting, or an issue with the data. It's crucial for iterative improvement.
5.  **Auditability and Compliance**: For regulated industries, having a clear audit trail of model development and performance is often a requirement, which experiment tracking naturally provides.

## Install Necessary Libraries

### Subtask:
Install `mlflow`, `wandb`, `scikit-learn`, `pandas`, `numpy`, `optuna`, and `matplotlib`, and explain their purpose.


In [1]:
!pip install mlflow optuna wandb scikit-learn pandas numpy matplotlib seaborn nbformat






In [2]:
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature
import optuna
import joblib  # <--- Added for persistence
import numpy as np
import pandas as pd
import os
import shutil
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_percentage_error
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
import wandb




RESET_ALL_DATA = True  # Set to True to wipe previous runs and start a new study




### Explanation of Installed Libraries

*   `mlflow`: An open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, reproducibility, and model deployment.
*   `wandb` (Weights & Biases): A tool for experiment tracking, model optimization, and collaboration in machine learning. It provides rich visualizations and reporting capabilities.
*   `scikit-learn`: A popular open-source machine learning library for Python, providing simple and efficient tools for data mining and data analysis.
*   `pandas`: A powerful and flexible open-source data analysis and manipulation library for Python, built on top of NumPy.
*   `numpy`: The fundamental package for scientific computing with Python, providing support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
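*   `matplotlib`: A comprehensive library for creating static, animated, and interactive visualizations in Python.
*   `optuna`: An open-source hyperparameter optimization framework. It is used later in this notebook to search for the best `alpha` value.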

### 2. CLEANUP / SETUP
The code below creates the folder structure MLflow needs for the features used in this notebook.

In [3]:
mlruns_dir = "mlruns"
trash_dir = os.path.join(mlruns_dir, ".trash")
models_dir = os.path.join(mlruns_dir, "models") # <--- NEW: Required by UI

# Ensure no active runs hold a lock
try:
    mlflow.end_run()
except Exception:
    pass

if RESET_ALL_DATA:
    print(f"🧹 Wiping '{mlruns_dir}' and old study files to start fresh...")
    if os.path.exists(mlruns_dir):
        try:
            shutil.rmtree(mlruns_dir)
        except Exception as e:
            print(f"   ⚠️ Warning: Could not delete '{mlruns_dir}' (Is 'mlflow ui' running?): {e}")

    # Delete old Joblib study files
    for f in os.listdir("."):
        if f.endswith("_optuna_study.pkl"):
            os.remove(f)
            print(f"   Deleted old study file: {f}")
else:
    print("🔄 RESUME MODE: Keeping existing experiments...")

# FIX: Create ALL required folders to prevent UI crashes
for directory in [mlruns_dir, trash_dir, models_dir]:
    if not os.path.exists(directory):
        os.makedirs(directory)
        print(f"   Created missing directory: {directory}")

# Force MLflow to use this local directory
mlflow.set_tracking_uri(f"file:./{mlruns_dir}")

🧹 Wiping 'mlruns' and old study files to start fresh...
   Deleted old study file: Lasso_optuna_study.pkl
   Deleted old study file: Ridge_optuna_study.pkl
   Created missing directory: mlruns
   Created missing directory: mlruns\.trash
   Created missing directory: mlruns\models


### Diabetes Dataset Explanation

The **Diabetes dataset** is a classic machine learning dataset, often used for regression tasks. It covers 442 patients and 10 baseline variables: age, sex, body mass index (BMI), average blood pressure, and six blood serum measurements. The target variable is a quantitative measure of disease progression one year after baseline. In the scikit-learn version loaded below, each feature is mean-centered and scaled so that its squared values sum to 1, which is why the feature values are small numbers around zero rather than raw units.

In [4]:


# Load the diabetes dataset
diabetes = load_diabetes()

# Create feature DataFrame X
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Create target Series y
y = pd.Series(diabetes.target, name="target")

print("Diabetes dataset loaded and converted to DataFrame (X) and Series (y).")
print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")

Diabetes dataset loaded and converted to DataFrame (X) and Series (y).
X shape: (442, 10)
y shape: (442,)


In [8]:
X.head()

|   | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.02593 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 |


In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Data split into training and testing sets.")
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")


Data split into training and testing sets.
X_train shape: (353, 10)
X_test shape: (89, 10)
y_train shape: (353,)
y_test shape: (89,)



 ------------------------------------------------------------------------------
### 3. HELPER: Advanced Plotting & Logging
 ------------------------------------------------------------------------------

In [10]:

def log_advanced_visualizations(model, X_test, y_test, model_name):
    """Generates and logs rich matplotlib plots to MLflow artifacts"""
    preds = model.predict(X_test)
    residuals = y_test - preds
    
    # A. Residual Plot
    fig_res, ax_res = plt.subplots(figsize=(8, 5))
    sns.scatterplot(x=preds, y=residuals, ax=ax_res, color="blue", alpha=0.6)
    ax_res.axhline(0, color='red', linestyle='--')
    ax_res.set_xlabel("Predicted Values")
    ax_res.set_ylabel("Residuals")
    ax_res.set_title(f"{model_name} - Residual Plot")
    plt.close(fig_res)
    mlflow.log_figure(fig_res, f"plots/{model_name}_residuals.png")
    
    # B. Prediction Error Plot
    fig_pred, ax_pred = plt.subplots(figsize=(8, 5))
    sns.scatterplot(x=y_test, y=preds, ax=ax_pred, color="green", alpha=0.6)
    min_val = min(min(y_test), min(preds))
    max_val = max(max(y_test), max(preds))
    ax_pred.plot([min_val, max_val], [min_val, max_val], 'r--')
    ax_pred.set_xlabel("Actual Values")
    ax_pred.set_ylabel("Predicted Values")
    ax_pred.set_title(f"{model_name} - Actual vs Predicted")
    plt.close(fig_pred)
    mlflow.log_figure(fig_pred, f"plots/{model_name}_prediction_error.png")

    # C. Feature Importance
    if hasattr(model, "coef_"):
        fig_imp, ax_imp = plt.subplots(figsize=(10, 6))
        features = X_test.columns if hasattr(X_test, "columns") else [f"Feat_{i}" for i in range(X_test.shape[1])]
        coefs = pd.Series(model.coef_, index=features).sort_values()
        coefs.plot(kind="barh", ax=ax_imp, color="purple")
        ax_imp.set_title(f"{model_name} - Feature Coefficients")
        plt.tight_layout()
        plt.close(fig_imp)
        mlflow.log_figure(fig_imp, f"plots/{model_name}_feature_importance.png")

##### ENABLE SYSTEM METRICS

In [11]:

print("📊 Enabling System Metrics Logging (CPU/RAM)...")
try:
    mlflow.enable_system_metrics_logging()
except Exception as e:
    print(f"   ⚠️ Could not enable system metrics (install psutil?): {e}")


# --- TRACING HELPER ---
@mlflow.trace(name="Model_Training_Evaluation", span_type="FUNCTION")
def trace_model_execution(model, X_train, y_train, X_test):
    """Wraps the training and prediction in a Trace Span."""
    # 1. Train
    with mlflow.start_span(name="Fit_Model") as span:
        model.fit(X_train, y_train)
        span.set_inputs({"X_shape": str(X_train.shape)})
    
    # 2. Predict
    with mlflow.start_span(name="Predict_Model") as span:
        preds = model.predict(X_test)
        span.set_outputs({"preds_mean": float(np.mean(preds))})
        
    return preds


📊 Enabling System Metrics Logging (CPU/RAM)...



 ------------------------------------------------------------------------------
### 4. GENERIC OPTIMIZATION FUNCTION
 ------------------------------------------------------------------------------

In [12]:
# A. BASELINE RUNNER (For Linear Regression - No Optimization)
def run_baseline(experiment_name):
    mlflow.set_experiment(experiment_name)
    print(f"\n🏁 Running Baseline: Linear Regression in '{experiment_name}'...")
    
    with mlflow.start_run(run_name="Linear_Regression_Baseline"):
        model = LinearRegression()
        
        # Trace execution
        preds = trace_model_execution(model, X_train, y_train, X_test)
        
        # Metrics
        mse = mean_squared_error(y_test, preds)
        rmse = np.sqrt(mse)
        r2 = r2_score(y_test, preds)
        mape = mean_absolute_percentage_error(y_test, preds)
        n = len(y_test)
        p = X_test.shape[1]
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

        # Log everything
        mlflow.log_param("model_type", "LinearRegression")
        mlflow.log_metrics({"mse": mse, "rmse": rmse, "r2": r2, "mape": mape, "adj_r2": adj_r2})
        
        # Log Model & Plots
        signature = infer_signature(X_train, model.predict(X_train))
        input_example = X_train.head(3) if hasattr(X_train, "head") else X_train[:3]
        
        mlflow.sklearn.log_model(model, "baseline_model", signature=signature, input_example=input_example)
        log_advanced_visualizations(model, X_test, y_test, "LinearRegression")
        
        mlflow.set_tag("status", "Baseline")
        print(f"   ✅ Baseline MSE: {mse:.6f}")




In [13]:
# 1. Run Baseline (Single run, no tuning needed)
run_baseline("Linear_Regression_Baseline")

2025/11/20 21:06:57 INFO mlflow.tracking.fluent: Experiment with name 'Linear_Regression_Baseline' does not exist. Creating a new experiment.



🏁 Running Baseline: Linear Regression in 'Linear_Regression_Baseline'...


2025/11/20 21:06:59 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.
2025/11/20 21:06:59 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 160.33it/s]
2025/11/20 21:07:44 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/11/20 21:07:45 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!


   ✅ Baseline MSE: 2900.193628
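
The `adj_r2` metric logged in the baseline run penalizes R² for the number of predictors. A minimal standalone sketch of the same formula, 1 - (1 - R²)(n - 1)/(n - p - 1), follows. The `adjusted_r2` helper name and the example R² value of 0.45 are illustrative assumptions; n = 89 and p = 10 match the test split used in this notebook.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjust R^2 for the number of predictors in the model."""
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        raise ValueError("Need more samples than features + 1.")
    return 1 - (1 - r2) * (n_samples - 1) / denominator


# Hypothetical R^2 value, for illustration only.
example = adjusted_r2(0.45, n_samples=89, n_features=10)
print(round(example, 4))
```

Because the correction grows with `n_features`, the adjusted value is always at or below the plain R² when at least one predictor is used, which makes it a fairer basis for comparing models with different numbers of features.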


### Explanation of Lasso `alpha` parameter

The `alpha` parameter in Lasso (Least Absolute Shrinkage and Selection Operator) regression sets the regularization strength: it controls how much L1 regularization is applied to the model. L1 regularization adds a penalty to the loss function equal to `alpha` times the sum of the absolute values of the coefficients.

*   **Effect**: A higher `alpha` value increases the penalty, forcing more coefficients to become exactly zero. This means Lasso can perform automatic feature selection, effectively removing less important features from the model. A smaller `alpha` value reduces the penalty, allowing more features to contribute to the model. If `alpha` is 0, Lasso effectively becomes equivalent to Ordinary Least Squares (OLS) regression.
*   **Purpose**: It helps prevent overfitting, especially in cases with many features, and yields sparser models that are often easier to interpret.
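
The sparsity effect described above can be checked directly. The sketch below fits scikit-learn's `Lasso` on the full diabetes dataset for a few `alpha` values and counts how many coefficients are exactly zero. The `alpha` grid and the variable names (`X_demo`, `zero_counts`) are illustrative assumptions and are separate from the tuning runs in this notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

# Full dataset, no train/test split: this only illustrates sparsity.
X_demo, y_demo = load_diabetes(return_X_y=True)

alphas = [0.01, 0.1, 1.0, 10.0]
zero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_demo, y_demo)
    n_zero = int(np.sum(model.coef_ == 0))  # coefficients removed by the L1 penalty
    zero_counts.append(n_zero)
    print(f"alpha={alpha:<5} zero coefficients: {n_zero} of {model.coef_.size}")
```

With a large enough `alpha`, every coefficient is driven to zero and the model predicts only the intercept, which is why the search range for `alpha` matters.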

### Explanation of Ridge `alpha` parameter

The `alpha` parameter in Ridge regression sets the regularization strength, similar to Lasso's `alpha`. It controls how much L2 regularization is applied to the model. L2 regularization adds a penalty to the loss function equal to `alpha` times the sum of the squared coefficients.

*   **Effect**: A higher `alpha` value increases the penalty, which shrinks the coefficients towards zero. Unlike Lasso, Ridge regression typically does not force coefficients to become exactly zero but rather reduces their magnitude. This mitigates the effects of multicollinearity and makes the model more robust to noisy data.
*   **Purpose**: It primarily addresses multicollinearity (when independent variables are highly correlated) and reduces model complexity to prevent overfitting by penalizing large coefficients. If `alpha` is 0, Ridge regression becomes equivalent to Ordinary Least Squares (OLS) regression.
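
The shrinkage behaviour can be sketched with the closed-form Ridge solution, w = (XᵀX + αI)⁻¹Xᵀy. The example below is a simplified illustration on synthetic data without an intercept term, not scikit-learn's solver; the helper `ridge_closed_form`, the synthetic weights, and the `alpha` grid are assumptions chosen for the demonstration.

```python
import numpy as np

# Synthetic regression data (assumption, for illustration only).
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y_syn = X_syn @ true_w + rng.normal(scale=0.5, size=100)


def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y, with no intercept term."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_closed_form(X_syn, y_syn, alpha)
    norms.append(float(np.linalg.norm(w)))
    n_zero = int(np.sum(w == 0))
    print(f"alpha={alpha:>6}: ||w|| = {norms[-1]:.4f}, exact zeros = {n_zero}")
```

The coefficient norm falls as `alpha` grows, yet no coefficient lands exactly on zero, which is the practical difference from Lasso. With `alpha = 0` the formula reduces to ordinary least squares.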

In [14]:
# B. OPTIMIZATION RUNNER (For Ridge/Lasso - Uses Optuna)
def run_optimization(model_class, experiment_name, n_trials=10):
    mlflow.set_experiment(experiment_name)
    model_type_str = model_class.__name__
    study_filename = f"{model_type_str}_optuna_study.pkl"
    
    print(f"\n🚀 Optimizing {model_type_str} in '{experiment_name}'...")

    if not RESET_ALL_DATA and os.path.exists(study_filename):
        print(f"   📂 Loading existing study '{study_filename}'...")
        study = joblib.load(study_filename)
    else:
        print("   ✨ Creating NEW study...")
        study = optuna.create_study(direction="minimize")

    def objective(trial):
        if model_type_str == "Lasso":
            alpha = trial.suggest_float("alpha", 1e-5, 10.0, log=True)
        else: # Ridge
            alpha = trial.suggest_float("alpha", 1e-3, 100.0, log=True)
            
        with mlflow.start_run(nested=True, run_name=f"{model_type_str}_Trial_{trial.number}"):
            model = model_class(alpha=alpha, random_state=42)
            preds = trace_model_execution(model, X_train, y_train, X_test)
            
            mse = mean_squared_error(y_test, preds)
            rmse = np.sqrt(mse)
            r2 = r2_score(y_test, preds)
            
            mlflow.log_params({"alpha": alpha})
            mlflow.log_metrics({"mse": mse, "rmse": rmse, "r2": r2})
            return mse

    with mlflow.start_run(run_name=f"{model_type_str}_Optimization_Batch"):
        study.optimize(objective, n_trials=n_trials)
        
        print(f"🏆 Best {model_type_str} Alpha: {study.best_params['alpha']:.6f}")

        # Log Champion Model
        best_model = model_class(alpha=study.best_params['alpha'], random_state=42)
        best_model.fit(X_train, y_train)
        
        signature = infer_signature(X_train, best_model.predict(X_train))
        input_example = X_train.head(3) if hasattr(X_train, "head") else X_train[:3]

        mlflow.sklearn.log_model(best_model, f"best_{model_type_str}_model", signature=signature, input_example=input_example)
        log_advanced_visualizations(best_model, X_test, y_test, model_type_str)
        
        mlflow.log_metric("best_mse", study.best_value)
        mlflow.set_tag("status", "Best_Candidate")

    joblib.dump(study, study_filename)





 ------------------------------------------------------------------------------
### 5. EXECUTE
 ------------------------------------------------------------------------------

In [15]:
# 2. Optimize Complex Models (Search for best Alpha)
run_optimization(Ridge, "Ridge_Rich_Experiment", n_trials=5)


2025/11/20 21:15:59 INFO mlflow.tracking.fluent: Experiment with name 'Ridge_Rich_Experiment' does not exist. Creating a new experiment.
[I 2025-11-20 21:16:00,406] A new study created in memory with name: no-name-6689365e-f912-4e72-944f-77f08aac0765



🚀 Optimizing Ridge in 'Ridge_Rich_Experiment'...
   ✨ Creating NEW study...


2025/11/20 21:16:01 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.
2025/11/20 21:16:01 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
2025/11/20 21:16:01 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.
2025/11/20 21:16:01 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
2025/11/20 21:16:02 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/11/20 21:16:03 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!
[I 2025-11-20 21:16:03,133] Trial 0 finished with value: 3497.911703116246 and parameters: {'alpha': 2.449550265588355}. Best is trial 0 with value: 3497.911703116246.
2025/11/20 21:16:03 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger

🏆 Best Ridge Alpha: 0.042343


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 231.01it/s]
2025/11/20 21:16:29 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/11/20 21:16:29 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!


In [16]:
run_optimization(Lasso, "Lasso_Rich_Experiment", n_trials=5)


2025/11/20 21:26:18 INFO mlflow.tracking.fluent: Experiment with name 'Lasso_Rich_Experiment' does not exist. Creating a new experiment.
[I 2025-11-20 21:26:18,420] A new study created in memory with name: no-name-243a36d1-1248-48f6-8e74-aa436334e16e



🚀 Optimizing Lasso in 'Lasso_Rich_Experiment'...
   ✨ Creating NEW study...


2025/11/20 21:26:18 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.
2025/11/20 21:26:19 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
2025/11/20 21:26:19 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.
2025/11/20 21:26:19 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
2025/11/20 21:26:20 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/11/20 21:26:21 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!
[I 2025-11-20 21:26:21,115] Trial 0 finished with value: 4953.994314438658 and parameters: {'alpha': 1.960076602034882}. Best is trial 0 with value: 4953.994314438658.
2025/11/20 21:26:21 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger

🏆 Best Lasso Alpha: 0.034721


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 227.67it/s]
2025/11/20 21:26:46 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/11/20 21:26:46 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!


In [14]:
print("\n✅ Execution Complete.")
print("👉 View System Metrics: Click on any run -> 'System Metrics' tab")
print("👉 View Traces: Click on Experiment -> 'Traces' tab (on the left sidebar)")


✅ Execution Complete.
👉 View System Metrics: Click on any run -> 'System Metrics' tab
👉 View Traces: Click on Experiment -> 'Traces' tab (on the left sidebar)


## MCQ Quiz 1: MLflow Basics

### Question:
What is the primary purpose of MLflow in a machine learning workflow?

A) To provide advanced data visualization tools for model output.
B) To manage and track machine learning experiments, including parameters, metrics, and models.
C) To automatically deploy machine learning models to production environments.
D) To perform hyperparameter tuning using genetic algorithms.


### Correct Answer:


**B) To manage and track machine learning experiments, including parameters, metrics, and models.**

## Introduction to Weights & Biases

Weights & Biases (W&B) is a machine learning platform that helps developers and teams track, visualize, and collaborate on their machine learning experiments. It provides a centralized dashboard to log hyperparameters, metrics, and models, making it easier to understand, compare, and reproduce results across different runs.

### Key Benefits of Weights & Biases:

1.  **Rich Visualizations**: W&B offers interactive and customizable dashboards that allow for in-depth visualization of training metrics (e.g., loss, accuracy), system metrics (e.g., GPU utilization, memory usage), and custom charts. This helps in quickly identifying trends and anomalies in model performance.
2.  **Easy Comparison of Runs**: The platform facilitates side-by-side comparison of multiple experiment runs. Users can overlay plots, compare parameter configurations, and analyze metric differences to understand the impact of various changes (e.g., hyperparameter tuning, model architecture updates).
3.  **Collaboration Features**: W&B is designed for team collaboration. It allows multiple users to share projects, view each other's experiments, and add notes or comments, streamlining the development process and improving communication within ML teams.
4.  **Detailed Reporting Capabilities**: Users can generate comprehensive reports directly from their tracked experiments. These reports can include visualizations, code snippets, and explanations, making it simple to document findings, share insights with stakeholders, and maintain a historical record of all development efforts.
5.  **Model Versioning and Artifact Management**: W&B allows for tracking and versioning of models, datasets, and other artifacts. This ensures reproducibility and helps manage the lifecycle of machine learning assets.
6.  **Hyperparameter Optimization**: Integrated tools for hyperparameter sweeps (e.g., grid search, random search, Bayesian optimization) help automate the process of finding optimal model configurations.

In [15]:

# ==============================================================================
# ⚙️ CONFIGURATION
# ==============================================================================
WANDB_PROJECT = "Automated_Regression_Pipeline"  # W&B project name
RESET_ONLINE_PROJECT = True   # ⚠️ WARNING: This deletes the project on the Cloud!
RESET_LOCAL_CACHE = True      # Cleans local 'wandb' folder
# ==============================================================================


Key arguments of `wandb.init()`:
*   `project`: This string specifies the name of the project in which the run will be logged. If the project does not exist, W&B will create it. Using a consistent project name helps organize related experiments. For example, this notebook passes `project=WANDB_PROJECT` (`"Automated_Regression_Pipeline"`), so all of its runs and sweeps are grouped under that project in the W&B UI.
*   `name`: (Optional) A human-readable name for the specific run. If not provided, W&B generates a random name.
*   `config`: (Optional) A dictionary of hyperparameters or settings to log for the run.

Before calling `wandb.init()`, it's usually necessary to authenticate with `wandb.login()`, which links your local environment to your W&B account.

### Log In to W&B with an API Key

In [None]:
key_var = ''  # Enter your W&B API key here; avoid committing a real key to version control

In [17]:
# ------------------------------------------------------------------------------
wandb.login(key=key_var)  # Authenticate this environment with your W&B account

if RESET_LOCAL_CACHE and os.path.exists("wandb"):
    print("🧹 Cleaning local buffer...")
    shutil.rmtree("wandb")

if RESET_ONLINE_PROJECT:
    print(f"🔥 DELETING cloud project '{WANDB_PROJECT}' to start fresh...")
    api = wandb.Api()
    try:
        # Get default entity (username) and delete project
        entity = api.default_entity
        api.project(f"{entity}/{WANDB_PROJECT}").delete()
        print("   ✅ Project deleted successfully.")
    except Exception as e:
        print(f"   ⚠️ Project not deleted (maybe it didn't exist): {e}")


### Data Artifact 
A Data Artifact captures a specific version of your dataset so it cannot be changed accidentally. This creates a reliable paper trail, allowing you to see exactly which data was used to build any specific model in the past, ensuring you can always reproduce your results.
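
The idea of pinning a dataset version can be illustrated without W&B at all. The sketch below is conceptual and does not reproduce how W&B Artifacts are implemented: it fingerprints a directory of files with SHA-256, so any change to the data yields a different identifier. The `fingerprint_dir` helper and the file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint_dir(directory):
    """Return a SHA-256 hex digest over file names and contents, in sorted order."""
    directory = Path(directory)
    digest = hashlib.sha256()
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(directory).as_posix().encode("utf-8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()


# Hypothetical data files, for illustration only.
demo_data = Path(tempfile.mkdtemp())
(demo_data / "X_train.csv").write_text("a,b\n1,2\n")
version_1 = fingerprint_dir(demo_data)

(demo_data / "X_train.csv").write_text("a,b\n1,3\n")  # one value changed
version_2 = fingerprint_dir(demo_data)

print(version_1[:12], version_2[:12])
```

Storing such an identifier next to each run is enough to tell whether two models were trained on byte-identical data.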

In [24]:
print("\n📦 preparing data artifact...")

# Save data locally so we can upload it
os.makedirs("data", exist_ok=True)
X_train.to_csv("data/X_train.csv", index=False)
y_train.to_csv("data/y_train.csv", index=False)

# Start a quick run just to upload the data artifact
with wandb.init(project=WANDB_PROJECT, job_type="data_prep", name="Upload_Dataset") as run:
    data_artifact = wandb.Artifact("Training_Data", type="dataset", description="Training split of the scikit-learn diabetes dataset")
    data_artifact.add_dir("data")
    run.log_artifact(data_artifact)
    print("   ✅ Data Artifact uploaded.")
    


📦 preparing data artifact...


wandb: Adding directory to artifact (data)... Done. 0.1s
wandb: ERROR The nbformat package was not found. It is required to save notebook history.


   ✅ Data Artifact uploaded.


### Explanation of `wandb.init()`

`wandb.init()` is the entry point for starting a new Weights & Biases run. Each run is an isolated experiment where you can log metrics, parameters, and other artifacts. It's crucial for organizing and comparing your machine learning experiments.



###  HELPER: Training Function for Sweeps


In [25]:
# This function must take NO arguments. It reads everything from wandb.config.
def train_sweep():
    # Initialize the run (WandB Agent passes parameters automatically here)
    with wandb.init() as run:
        config = wandb.config
        
        # 1. "Pipeline" Step: Mark that we are using the Data Artifact
        # This draws the line from Data -> Run in the UI
        artifact = run.use_artifact("Training_Data:latest")
        artifact_dir = artifact.download()
        
        # (Ideally we load from artifact_dir, but variables are already in memory for speed)
        
        # 2. Train Model based on Config
        if config.model_type == "Ridge":
            model = Ridge(alpha=config.alpha, random_state=42)
        elif config.model_type == "Lasso":
            model = Lasso(alpha=config.alpha, random_state=42)
        else:
            model = LinearRegression()

        model.fit(X_train, y_train)
        preds = model.predict(X_test)

        # 3. Calculate Metrics
        mse = mean_squared_error(y_test, preds)
        rmse = np.sqrt(mse)
        r2 = r2_score(y_test, preds)
        
        # 4. Log Metrics to WandB (This is what the Sweep optimizes)
        wandb.log({"mse": mse, "rmse": rmse, "r2": r2})
        
        # 5. Log Plots (Optional: only for good runs to save space?)
        # We log residuals for every run to see how fit changes
        fig, ax = plt.subplots()
        sns.scatterplot(x=preds, y=y_test-preds, ax=ax)
        ax.axhline(0, color='r', linestyle='--')
        ax.set_title(f"{config.model_type} Residuals (Alpha={config.alpha:.4f})")
        wandb.log({"residuals": wandb.Image(fig)})
        plt.close(fig)




### 3. DEFINE SWEEP CONFIGURATIONS

Define the hyperparameter search space and optimization strategy for each sweep.


In [26]:


# A. RIDGE SWEEP CONFIGURATION
ridge_sweep_config = {
    "method": "bayes",  # Bayesian optimization, comparable to the Optuna search used with MLflow
    "metric": {"name": "mse", "goal": "minimize"},
    "parameters": {
        "model_type": {"value": "Ridge"},
        "alpha": {"min": 0.001, "max": 10.0} # Range to search
    }
}

# B. LASSO SWEEP CONFIGURATION
lasso_sweep_config = {
    "method": "bayes",
    "metric": {"name": "mse", "goal": "minimize"},
    "parameters": {
        "model_type": {"value": "Lasso"},
        "alpha": {"min": 0.0001, "max": 1.0}
    }
}
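
Both configurations above sample `alpha` uniformly between `min` and `max`, whereas the Optuna search in the MLflow part of this notebook used a log scale (`log=True`). If a log-scaled search is wanted in W&B as well, the sweep configuration accepts a `distribution` key. The sketch below assumes the `log_uniform_values` distribution, which takes `min` and `max` as actual values rather than exponents. The name `ridge_log_sweep_config` is hypothetical, and the bounds are copied from the Optuna Ridge search.

```python
# Hypothetical alternative to ridge_sweep_config with log-scaled sampling.
ridge_log_sweep_config = {
    "method": "bayes",
    "metric": {"name": "mse", "goal": "minimize"},
    "parameters": {
        "model_type": {"value": "Ridge"},
        "alpha": {
            "distribution": "log_uniform_values",  # sample alpha on a log scale
            "min": 1e-3,
            "max": 100.0,
        },
    },
}
print(ridge_log_sweep_config["parameters"]["alpha"])
```

A log scale spends as many trials between 0.001 and 0.01 as between 10 and 100, which suits regularization strengths that span several orders of magnitude.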


In [27]:
# A. RUN BASELINE (Single manual run)
print("\n🏁 Running Baseline (Linear Regression)...")
with wandb.init(project=WANDB_PROJECT, job_type="baseline", name="Linear_Regression", config={"model_type": "LinearRegression", "alpha": 0}) as run:
    run.use_artifact("Training_Data:latest")
    model = LinearRegression()
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    wandb.log({"mse": mean_squared_error(y_test, preds)})
    print(f"   Baseline MSE: {mean_squared_error(y_test, preds):.6f}")


# B. RUN RIDGE SWEEP
print("\n🚀 Starting Ridge Sweep (Automated Tuning)...")
# 1. Register the sweep on the server
ridge_sweep_id = wandb.sweep(ridge_sweep_config, project=WANDB_PROJECT)
# 2. Start the Agent (The Robot) - It will run 'train_sweep' 10 times
wandb.agent(ridge_sweep_id, function=train_sweep, count=10)


# C. RUN LASSO SWEEP
print("\n🚀 Starting Lasso Sweep (Automated Tuning)...")
lasso_sweep_id = wandb.sweep(lasso_sweep_config, project=WANDB_PROJECT)
wandb.agent(lasso_sweep_id, function=train_sweep, count=20)


print("\n✅ Automated Pipeline Complete.")
print(f"👉 Go to: https://wandb.ai/home -> Click '{WANDB_PROJECT}'")
print("   1. Click 'Sweeps' on the left to see the tuning graphs.")
print("   2. Click 'Artifacts' on the left to see the Data -> Model pipeline graph.")


🏁 Running Baseline (Linear Regression)...


   Baseline MSE: 2900.193628


| Summary metric | Value |
|---|---|
| mse | 2900.19363 |



üöÄ Starting Ridge Sweep (Automated Tuning)...
Create sweep with ID: 1erhyq66
Sweep URL: https://wandb.ai/varunraste-fractal/Automated_Regression_Pipeline/sweeps/1erhyq66


Each of the 10 agent runs logged `wandb:   2 of 2 files downloaded.` followed by `wandb: ERROR The nbformat package was not found. It is required to save notebook history.` The per-run configuration and summary metrics were:

| Run ID | model_type | alpha | mse | r2 | rmse |
|---|---|---|---|---|---|
| xsm6i7ek | Ridge | 1.859296415048256 | 3342.39896 | 0.36914 | 57.81348 |
| l5zlt85i | Ridge | 5.768975949783191 | 4074.66781 | 0.23093 | 63.83312 |
| 5hxbuqoe | Ridge | 1.857615392911801 | 3341.92473 | 0.36923 | 57.80938 |
| 303wxv5y | Ridge | 0.48008289352349515 | 2911.51572 | 0.45047 | 53.95846 |
| 4l77ibyz | Ridge | 0.03136663395598599 | 2872.25149 | 0.45788 | 53.59339 |
| 0ql5839t | Ridge | 9.99879202858138 | 4443.87713 | 0.16124 | 66.66241 |
| vq33xpbq | Ridge | 0.009463998428762731 | 2882.68313 | 0.45591 | 53.69062 |
| a2b8qpc4 | Ridge | 0.17988683609067846 | 2853.80355 | 0.46136 | 53.421 |
| tkfjzzab | Ridge | 0.12435405708477452 | 2854.23364 | 0.46128 | 53.42503 |
| 2wnt8hws | Ridge | 3.629919459236493 | 3751.42191 | 0.29194 | 61.24885 |



🚀 Starting Lasso Sweep (Automated Tuning)...
Create sweep with ID: as8sw77k
Sweep URL: https://wandb.ai/varunraste-fractal/Automated_Regression_Pipeline/sweeps/as8sw77k


Each of the first 18 agent runs logged `wandb:   2 of 2 files downloaded.` followed by `wandb: ERROR The nbformat package was not found. It is required to save notebook history.` The per-run configuration and summary metrics were:

| Run ID | model_type | alpha | mse | r2 | rmse |
|---|---|---|---|---|---|
| 879s6auf | Lasso | 0.6064662510166131 | 3016.01737 | 0.43074 | 54.91828 |
| gh39x2vg | Lasso | 0.45556093887088334 | 2922.59122 | 0.44838 | 54.061 |
| 3bp1t7up | Lasso | 0.18214409646566432 | 2813.96146 | 0.46888 | 53.04679 |
| 8lukk29c | Lasso | 0.09812840785707208 | 2798.26821 | 0.47184 | 52.89866 |
| 7k5zk75c | Lasso | 0.0008384152660950126 | 2897.0035 | 0.4532 | 53.82382 |
| 6hgfrlbf | Lasso | 0.932758262839146 | 3325.94399 | 0.37224 | 57.671 |
| o8e9i7uo | Lasso | 0.127459804662238 | 2799.31925 | 0.47164 | 52.90859 |
| f8e4rf55 | Lasso | 0.2849131907024542 | 2862.08479 | 0.4598 | 53.49846 |
| c69ic7m9 | Lasso | 0.11009863439587804 | 2798.14045 | 0.47186 | 52.89745 |
| i3mizwtv | Lasso | 0.10413036648492947 | 2798.0958 | 0.47187 | 52.89703 |
| 9756quqf | Lasso | 0.1160906996002944 | 2798.36149 | 0.47182 | 52.89954 |
| u83xs7aj | Lasso | 0.1015548357746502 | 2798.14579 | 0.47186 | 52.8975 |
| mpwc3ao2 | Lasso | 0.1067852311422064 | 2798.10226 | 0.47187 | 52.89709 |
| pkcm7k6g | Lasso | 0.14542907493774232 | 2802.26344 | 0.47109 | 52.93641 |
| g8gdkyj7 | Lasso | 0.10458527399410956 | 2798.09076 | 0.47187 | 52.89698 |
| vzycf48y | Lasso | 0.10580937225096546 | 2798.08302 | 0.47188 | 52.89691 |
| vi0ek0uv | Lasso | 0.08982997677179753 | 2798.83103 | 0.47173 | 52.90398 |
| yh7br7hj | Lasso | 0.10756964001346234 | 2798.10589 | 0.47187 | 52.89713 |


wandb: Agent Starting Run: oy2i6wg1 with config:
wandb: 	alpha: 0.10528298636614704
wandb: 	model_type: Lasso


Traceback (most recent call last):
  File "c:\Users\Satej Raste\AppData\Local\Programs\Python\Python311\Lib\site-packages\wandb\sdk\wandb_init.py", line 1004, in init
    result = wait_with_progress(
             ^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Satej Raste\AppData\Local\Programs\Python\Python311\Lib\site-packages\wandb\sdk\mailbox\wait_with_progress.py", line 23, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Satej Raste\AppData\Local\Programs\Python\Python311\Lib\site-packages\wandb\sdk\mailbox\wait_with_progress.py", line 77, in wait_all_with_progress
    return asyncer.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Satej Raste\AppData\Local\Programs\Python\Python311\Lib\site-packages\wandb\sdk\lib\asyncio_manager.py", line 136, in run
    return future.result()
           ^^^^^^^^^^^^^^^
  File "c:\Users\Satej Raste\AppData\Local\Programs\Python\Python311\Lib\concu

wandb:   2 of 2 files downloaded.


| Run ID | model_type | alpha | mse | r2 | rmse |
|---|---|---|---|---|---|
| oy2i6wg1 | Lasso | 0.10528298636614704 | 2798.18475 | 0.47186 | 52.89787 |



✅ Automated Pipeline Complete.
👉 Go to: https://wandb.ai/home -> Click 'Automated_Regression_Pipeline'
   1. Click 'Sweeps' on the left to see the tuning graphs.
   2. Click 'Artifacts' on the left to see the Data -> Model pipeline graph.
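
The cell above registers each sweep with `wandb.sweep` and hands `train_sweep` to `wandb.agent`. The actual `ridge_sweep_config` and `lasso_sweep_config` are defined earlier in the notebook; purely as an illustration, a configuration of that kind might look like the sketch below. The search method, the `alpha` ranges, and the helper name `make_sweep_config` are assumptions, not values taken from this notebook.

```python
def make_sweep_config(model_type, alpha_min=0.001, alpha_max=10.0):
    """Build a W&B sweep configuration dict.

    Hypothetical helper: the method and alpha bounds are assumed values.
    """
    return {
        "method": "bayes",  # assumed; W&B also accepts "random" and "grid"
        "metric": {"name": "mse", "goal": "minimize"},
        "parameters": {
            # Fixed value, so every run in the sweep trains the same model type.
            "model_type": {"value": model_type},
            # Sample alpha on a log scale between the given bounds.
            "alpha": {
                "distribution": "log_uniform_values",
                "min": alpha_min,
                "max": alpha_max,
            },
        },
    }


ridge_cfg = make_sweep_config("Ridge")
lasso_cfg = make_sweep_config("Lasso", alpha_min=0.0001, alpha_max=1.0)
print(ridge_cfg["parameters"]["alpha"])
```

A dict like this would be passed to `wandb.sweep(..., project=WANDB_PROJECT)`, in the same way the cell above passes `ridge_sweep_config`.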





## Conclusion

This notebook introduced experiment tracking in machine learning and showed its role in model development and iteration. We used two widely adopted platforms, MLflow and Weights & Biases (W&B), to track and manage our experiments.

**Key Concepts Learned:**

*   **Importance of Experiment Tracking**: We began by understanding why systematically recording training runs, parameters, metrics, and models is essential for reproducibility, comparison, collaboration, debugging, and auditability in ML projects.
*   **MLflow for Experiment Management**:
    *   We learned how to start MLflow runs and log parameters (`model_name`, `alpha`), performance metrics (`mse`, `r2_score`), and the trained models themselves using `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.sklearn.log_model`.
    *   We applied this to Linear Regression, Lasso Regression, and Ridge Regression models, noting their individual performance characteristics.
    *   We also covered how to launch and navigate the MLflow UI to compare runs and inspect logged artifacts.
*   **Weights & Biases for Enhanced Visualization and Collaboration**:
    *   We introduced W&B, highlighting its benefits for rich visualizations, easy run comparison, collaboration, and detailed reporting.
    *   We learned to authenticate with `wandb.login()` and initialize runs using `wandb.init()` within a defined project.
    *   As with MLflow, we tracked parameters and metrics for our Linear, Lasso, and Ridge regression models using `wandb.config` and `wandb.log`. We also logged models as artifacts for versioning.
    *   We explored how to leverage W&B's UI to create comparative dashboards and analyze different model runs effectively.
*   **Conceptual Report Generation**: We discussed why reports built from experiment tracking data matter and outlined how both MLflow and W&B support them, including programmatic data retrieval and conceptual summary tables.

By applying both MLflow and W&B to track the performance of Linear, Lasso, and Ridge regression models on the diabetes dataset, we saw firsthand how these tools enable:

*   **Systematic Comparison**: Quickly comparing MSE and R2 scores across different models and hyperparameters.
*   **Reproducibility**: Ensuring that each model's configuration and results are recorded and can be recreated.
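*   **Insight Generation**: Identifying that Lasso Regression with a tuned `alpha` scored slightly better on our test set than Ridge and Linear Regression. In the W&B sweeps above, the best Lasso run (`alpha` ≈ 0.106) reached an R2 of 0.472 (MSE ≈ 2798), versus 0.461 (MSE ≈ 2854) for the best Ridge run (`alpha` ≈ 0.180) and an MSE of about 2900 for the Linear Regression baseline. The gap is small and comes from a single test split, so it suggests, rather than proves, a benefit from L1 regularization on this dataset.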
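
The comparison described above can be reproduced offline from logged values. The sketch below is a minimal, standard-library-only illustration: the records are a handful of rows copied from the W&B sweep output earlier in this notebook (not the full set of runs), the baseline MSE is the value printed by the baseline cell, and the helper name `best_by_model` is hypothetical.

```python
from math import sqrt

# (model_type, alpha, mse, r2) rows copied from the sweep output above.
# This is a subset of the runs, chosen for illustration.
runs = [
    ("Ridge", 0.17988683609067846, 2853.80355, 0.46136),
    ("Ridge", 0.12435405708477452, 2854.23364, 0.46128),
    ("Ridge", 9.99879202858138, 4443.87713, 0.16124),
    ("Lasso", 0.10580937225096546, 2798.08302, 0.47188),
    ("Lasso", 0.10458527399410956, 2798.09076, 0.47187),
    ("Lasso", 0.932758262839146, 3325.94399, 0.37224),
]
baseline_mse = 2900.193628  # Linear Regression baseline printed above


def best_by_model(records):
    """Return {model_type: (alpha, mse, r2)} for the lowest-MSE run of each model."""
    best = {}
    for model_type, alpha, mse, r2 in records:
        if model_type not in best or mse < best[model_type][1]:
            best[model_type] = (alpha, mse, r2)
    return best


best = best_by_model(runs)
for model_type, (alpha, mse, r2) in sorted(best.items()):
    # Relative MSE reduction compared with the unregularized baseline.
    gain = 100 * (baseline_mse - mse) / baseline_mse
    print(
        f"{model_type:<6} alpha={alpha:.4f} mse={mse:.2f} "
        f"rmse={sqrt(mse):.2f} r2={r2:.4f} vs baseline: {gain:+.2f}%"
    )
```

Selecting on MSE keeps the sketch aligned with the metric the baseline run logged; the same helper could rank on `r2` by changing the comparison.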

Ultimately, experiment tracking with tools like MLflow and Weights & Biases helps data scientists and machine learning engineers build robust, reproducible, and explainable models efficiently and collaboratively.

