### Getting Started with an ML Project Using MLflow

This walkthrough covers the following steps:

- Installing MLflow.

- Starting a local MLflow Tracking Server.

- Logging and registering a model with MLflow.

- Loading a logged model for inference using MLflow’s pyfunc flavor.

- Viewing the experiment results in the MLflow UI.

### Install MLflow

In [None]:
%pip install mlflow

### Start a Local MLflow Tracking Server

1. Open a terminal.

2. Run the following command to start the MLflow server on localhost, port `5001`:
   ```bash
   python -m mlflow server --host 127.0.0.1 --port 5001
   ```

### Access the MLflow Server

Open your browser and access the MLflow server at the following link:

   http://127.0.0.1:5001
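
Before logging anything, it can help to confirm from Python that the server is reachable. The sketch below is a minimal check using only the standard library. It assumes the tracking server exposes a `/health` endpoint that answers with HTTP 200; the helper name `tracking_server_is_up` is hypothetical and not part of MLflow.

```python
from urllib.request import urlopen


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers strings that are not valid URLs.
        return False


# Prints True only if the server from the previous step is running.
print(tracking_server_is_up("http://127.0.0.1:5001"))
```

If this prints `False`, check that the `mlflow server` command is still running in the terminal and that the port matches.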

### Import packages

In [11]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_absolute_error, r2_score
import mlflow
import mlflow.sklearn

### Load and preprocess the input

In [12]:
# Load the dataset
file_path = 'Walmart.csv' 
dataset = pd.read_csv(file_path)

# Convert 'transaction_date' to datetime and extract features
dataset['transaction_date'] = pd.to_datetime(dataset['transaction_date'], errors='coerce')
dataset['transaction_day'] = dataset['transaction_date'].dt.day
dataset['transaction_month'] = dataset['transaction_date'].dt.month
dataset['transaction_weekday'] = dataset['transaction_date'].dt.weekday
dataset['transaction_year'] = dataset['transaction_date'].dt.year
dataset = dataset.drop(columns=['transaction_date'])

# Encode categorical variables
categorical_columns = ['category', 'store_location', 'payment_method', 'promotion_applied', 
                       'promotion_type', 'weather_conditions', 'holiday_indicator', 'weekday', 
                       'customer_loyalty_level', 'customer_gender']

encoder = LabelEncoder()
for col in categorical_columns:
    dataset[col] = encoder.fit_transform(dataset[col].astype(str))

# Define features (X) and target (y)
X = dataset.drop(columns=['transaction_id', 'customer_id', 'product_id', 'product_name', 'actual_demand'])
y = dataset['actual_demand']

### Train the Model

In [13]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define hyperparameters for the model
params = {
    "n_estimators": 100,       # Number of trees
    "max_depth": None,         # No maximum depth
    "min_samples_split": 2,    # Minimum samples required to split an internal node
    "min_samples_leaf": 1,     # Minimum samples required to be a leaf node
    "random_state": 42         # Random seed for reproducibility
}

# Initialize and train the Random Forest Regressor
model = RandomForestRegressor(**params)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
print("Walmart Demand Forecasting model is trained")

Walmart Demand Forecasting model is trained


### Evaluate the Model

In [14]:
# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Output the results
print(f"Mean Absolute Error: {mae}")
print(f"R2 Score: {r2}")

Mean Absolute Error: 107.17233
R2 Score: -0.03293575634425849
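
A negative R2 score means the model's squared error is larger than that of a baseline that always predicts the mean of `y_test`. The sketch below recomputes both metrics in plain Python on toy values to make that interpretation concrete. The numbers are illustrative and are not taken from `Walmart.csv`, and the `_manual` helpers are hypothetical stand-ins for the scikit-learn functions.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean target."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (assumption, for illustration only)
y_true = [100.0, 200.0, 300.0]
mean_baseline = [200.0, 200.0, 200.0]   # always predict the mean
reversed_preds = [300.0, 200.0, 100.0]  # systematically wrong

print(r2_score_manual(y_true, mean_baseline))   # 0.0: no better than the mean
print(r2_score_manual(y_true, reversed_preds))  # -3.0: worse than the mean
print(mean_absolute_error_manual(y_true, reversed_preds))
```

Read this way, the score reported above suggests the features as encoded carry little signal for `actual_demand`, so it is worth revisiting the preprocessing before tuning hyperparameters.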


### Track the experiment with MLFlow

In [15]:
# Point MLflow at the local tracking server and select the experiment
mlflow.set_tracking_uri(uri="http://127.0.0.1:5001")
mlflow.set_experiment("Walmart Demand Forecast Model Experiment")

# Start an MLflow run
with mlflow.start_run():

    # Log parameters, metrics, and the model
    mlflow.log_params(params)
    mlflow.log_metric("mean absolute error", mae)
    mlflow.log_metric("r2 Score", r2)
    mlflow.sklearn.log_model(model, "random_forest_regressor_model")

# The run ends automatically when the `with` block exits, so no mlflow.end_run() call is needed.

🏃 View run bedecked-gull-604 at: http://127.0.0.1:5001/#/experiments/304321649578051944/runs/c8ba87d4c0614c7999faebfc85c3de88
🧪 View experiment at: http://127.0.0.1:5001/#/experiments/304321649578051944
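
The tracking cell logs the model but does not show a registration step, while the "Access Registered Model" section later loads a model named `Walmart_Demand_Forecast`. If the model has not already been registered through the UI, one possible way to do it from code is `mlflow.register_model`. The sketch below is hedged: the helper names are hypothetical, and it assumes the tracking URI has already been set as in the cell above.

```python
def run_model_uri(run_id: str, artifact_path: str) -> str:
    """Build the 'runs:/<run_id>/<artifact_path>' URI for a logged model."""
    return f"runs:/{run_id}/{artifact_path}"


def register_run_model(run_id: str, artifact_path: str, registered_name: str):
    """Register a model logged in a run under the given registry name."""
    import mlflow  # imported here so the URI helper works without MLflow installed

    return mlflow.register_model(run_model_uri(run_id, artifact_path), registered_name)


# Run ID and artifact path copied from the run logged above
uri = run_model_uri("c8ba87d4c0614c7999faebfc85c3de88", "random_forest_regressor_model")
print(uri)

# Uncomment with the tracking server running:
# register_run_model(
#     "c8ba87d4c0614c7999faebfc85c3de88",
#     "random_forest_regressor_model",
#     "Walmart_Demand_Forecast",
# )
```

Each call to `mlflow.register_model` with the same name creates a new version of that registered model.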


### Retrain the model with different params

In [None]:
# Define hyperparameters for the model
params = {
    "n_estimators": 150,       # Number of trees
    "max_depth": 5,            # Limit each tree to a depth of 5
    "min_samples_split": 2,    # Minimum samples required to split an internal node
    "min_samples_leaf": 1,     # Minimum samples required to be a leaf node
    "random_state": 42         # Random seed for reproducibility
}

# Initialize and train the Random Forest Regressor
model = RandomForestRegressor(**params)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
print("Walmart Demand Forecasting model is Re-trained")

Walmart Demand Forecasting model is Re-trained


### Model Inference

In [17]:
# Load the dataset
file_path = 'Walmart_validation.csv' 
dataset = pd.read_csv(file_path)

# Convert 'transaction_date' to datetime and extract features
dataset['transaction_date'] = pd.to_datetime(dataset['transaction_date'], errors='coerce')
dataset['transaction_day'] = dataset['transaction_date'].dt.day
dataset['transaction_month'] = dataset['transaction_date'].dt.month
dataset['transaction_weekday'] = dataset['transaction_date'].dt.weekday
dataset['transaction_year'] = dataset['transaction_date'].dt.year
dataset = dataset.drop(columns=['transaction_date'])

# Encode categorical variables
categorical_columns = ['category', 'store_location', 'payment_method', 'promotion_applied', 
                       'promotion_type', 'weather_conditions', 'holiday_indicator', 'weekday', 
                       'customer_loyalty_level', 'customer_gender']

encoder = LabelEncoder()
for col in categorical_columns:
    dataset[col] = encoder.fit_transform(dataset[col].astype(str))

# Define features (X); the target column is dropped because it is not needed for inference
X = dataset.drop(columns=['transaction_id', 'customer_id', 'product_id', 'product_name', 'actual_demand'])
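
One caveat about the cell above: it fits a fresh `LabelEncoder` on the validation data. `LabelEncoder` assigns integer codes from the sorted set of labels it sees, so the same category can receive a different code than it did during training. The pure-Python sketch below illustrates the effect with made-up weather labels; the helper names and the `-1` code for unseen labels are assumptions for illustration, not part of scikit-learn.

```python
def fit_category_mapping(values):
    """Assign integer codes to distinct labels in sorted order, as LabelEncoder does."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


def apply_category_mapping(values, mapping, unknown_code=-1):
    """Encode values with an existing mapping; unseen labels get unknown_code."""
    return [mapping.get(value, unknown_code) for value in values]


# Hypothetical labels, not taken from the dataset
train_weather = ["Sunny", "Rainy", "Cloudy", "Sunny"]
valid_weather = ["Sunny", "Stormy"]

train_mapping = fit_category_mapping(train_weather)

# Reusing the training mapping keeps "Sunny" at its training code.
reused = apply_category_mapping(valid_weather, train_mapping)

# Refitting on validation data silently changes the code for "Sunny".
refit = apply_category_mapping(valid_weather, fit_category_mapping(valid_weather))

print(train_mapping)
print(reused)
print(refit)
```

In practice this usually means keeping one fitted encoder per column from training and reusing it at inference time, or logging the preprocessing together with the model.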

In [20]:
# Load the model
logged_model = 'runs:/c8ba87d4c0614c7999faebfc85c3de88/random_forest_regressor_model'
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on validation data
predictions = loaded_model.predict(X)

# Display predictions
print(predictions)

[227.43 405.39 372.31 378.35 388.77 282.41 267.1  275.21 267.35]


### Access Registered Model

In [21]:
# Load the latest version of the registered model
model_name = "Walmart_Demand_Forecast"
model_version = "latest"
model_uri = f"models:/{model_name}/{model_version}"
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Predict on validation data
predictions = loaded_model.predict(X)

# Display predictions
print(predictions)

[227.43 405.39 372.31 378.35 388.77 282.41 267.1  275.21 267.35]
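
The `latest` keyword resolves to whichever version was registered most recently, so the loaded model can change as new versions are added. A small sketch of building registry URIs follows; the helper name is hypothetical, and version `1` is an assumed example rather than a value confirmed by this notebook.

```python
def registered_model_uri(name: str, version="latest") -> str:
    """Build a 'models:/<name>/<version>' URI for the MLflow Model Registry."""
    return f"models:/{name}/{version}"


latest_uri = registered_model_uri("Walmart_Demand_Forecast")
pinned_uri = registered_model_uri("Walmart_Demand_Forecast", 1)  # assumed version number

print(latest_uri)
print(pinned_uri)
```

Pinning a version number makes inference reproducible, while `latest` is convenient during experimentation.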
