# Paddy Yield Prediction Model - Training Notebook

This notebook trains machine learning models for paddy yield prediction and generates advisory outputs.

## Overview
- **Input**: 5 features (Temperature, Soil pH, Rainfall, Field Area, Humidity)
- **Output 1**: Numeric predictions for yield and agricultural parameters
- **Output 2**: Categorical predictions for methods and advice (label-encoded, multi-output classification)
- **Models**: Random Forest Regressor (multi-output) + `MultiOutputClassifier` wrapping a Random Forest Classifier
- **Artifacts**: Saved to `backend/model/` as `.pkl` files written with joblib

In [None]:
import os
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.multioutput import MultiOutputClassifier
import joblib

# ============================================
# 1. Load Dataset
# ============================================
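# Get the project root (two levels up from backend/model/).
# `__file__` is not defined inside a notebook kernel, so fall back to the
# current working directory, assuming the notebook is run from backend/model/.
try:
    base_dir = os.path.dirname(os.path.abspath(__file__))
except NameError:
    base_dir = os.getcwd()
PROJECT_ROOT = os.path.abspath(os.path.join(base_dir, '..', '..'))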
dataset_path = os.path.join(PROJECT_ROOT, 'data', 'SriLanka_Paddy_ML_Dataset.csv')

print(f"Loading dataset from: {dataset_path}")
df = pd.read_csv(dataset_path)

# Display basic info
print(f"\nDataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"\nFirst few rows:\n{df.head()}")

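Before training, it can help to confirm that the loaded CSV actually contains the feature columns the later cells index into. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns` and the toy frame are hypothetical, and only the column names come from the training cell.

```python
import pandas as pd

# Feature columns that the training cell selects from the dataset.
FEATURE_COLUMNS = ["Temperature_C", "Soil_pH", "Rainfall_mm", "FieldArea_ha", "Humidity_%"]


def missing_columns(frame: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from `frame` (hypothetical helper)."""
    return [name for name in required if name not in frame.columns]


# Toy frame standing in for the real CSV; the values are made up for illustration.
toy = pd.DataFrame({"Temperature_C": [28.5], "Soil_pH": [6.5], "Rainfall_mm": [150.0]})

absent = missing_columns(toy, FEATURE_COLUMNS)
print(absent)  # ['FieldArea_ha', 'Humidity_%']
```

Applied to the real `df`, an empty list would indicate that all five inputs are present; a non-empty list points at a renamed or missing column before `df[[...]]` raises a `KeyError`.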

In [None]:
# ============================================
# 2. Train Models (Numeric + Text Outputs)
# ============================================

# Prepare input features
X = df[["Temperature_C", "Soil_pH", "Rainfall_mm", "FieldArea_ha", "Humidity_%"]]

# Prepare numeric outputs
y_numeric = df[["PredictedYield_kg_ha",
                "PloughDepth_cm",
                "SoilAdjustment_kgLime",
                "SeedAmount_kg",
                "PlantSpacing_cm",
                "Fertilizer_Basal_Urea_kg",
                "Fertilizer_Basal_TSP_kg",
                "Fertilizer_Basal_MOP_kg",
                "Fertilizer_2ndDose_Urea_kg",
                "Fertilizer_2ndDose_TSP_kg",
                "Fertilizer_2ndDose_MOP_kg"]]

# Train numeric model (Multi-output regression)
print("Training numeric model...")
model_numeric = RandomForestRegressor(n_estimators=100, random_state=42)
model_numeric.fit(X, y_numeric)
print("✅ Numeric model trained successfully!")

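# Prepare text outputs.
# .copy() gives an independent DataFrame, so the in-place label encoding below
# does not trigger pandas' SettingWithCopyWarning.
y_text = df[["PloughMethod",
             "IrrigationAdvice",
             "WaterManagementAdvice_Stage4",
             "TillerIncreaseTip",
             "WaterControlAdvice_Stage5",
             "WaterControlAdvice_Stage6",
             "PesticideSuggestion",
             "PostHarvestAdvice"]].copy()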

# Encode text labels
print("\nEncoding text labels...")
label_encoders = {}
for col in y_text.columns:
    le = LabelEncoder()
    y_text[col] = le.fit_transform(y_text[col])
    label_encoders[col] = le
print("✅ Text labels encoded successfully!")

# Train text model (Multi-output classification)
print("\nTraining text model...")
model_text = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=42))
model_text.fit(X, y_text)
print("✅ Text model trained successfully!")

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  y_text[col] = le.fit_transform(y_text[col])

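The training cell fits both models on the full dataset, so it reports no accuracy figure. One possible way to get a rough estimate is a hold-out split. The sketch below is an assumption-laden illustration rather than the notebook's method: it uses synthetic data in place of the real CSV (`X_demo` and `y_demo` are made up), and the 80/20 split and per-target R² metric are choices made here, not taken from the source.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the dataset: 200 rows, 5 features, 2 numeric targets.
X_demo = rng.uniform(0.0, 1.0, size=(200, 5))
y_demo = np.column_stack([
    3.0 * X_demo[:, 0] + X_demo[:, 1],
    X_demo[:, 2] - 2.0 * X_demo[:, 3],
])

# Hold out 20% of the rows so the score is measured on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# multioutput="raw_values" returns one R^2 score per target column.
scores = r2_score(y_test, y_pred, multioutput="raw_values")
print(scores)
```

With the real data, `X` and `y_numeric` would take the place of `X_demo` and `y_demo`, and the final model could still be refit on all rows before saving.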

In [None]:
# ============================================
# 3. Save Models and Artifacts
# ============================================

# Create the model folder under the project root
model_folder = os.path.join(PROJECT_ROOT, 'backend', 'model')
os.makedirs(model_folder, exist_ok=True)

# Save numeric model
numeric_path = os.path.join(model_folder, 'paddy_model_numeric.pkl')
joblib.dump(model_numeric, numeric_path)
print(f"✅ Numeric model saved to: {numeric_path}")

# Save text model
text_path = os.path.join(model_folder, 'paddy_model_text.pkl')
joblib.dump(model_text, text_path)
print(f"✅ Text model saved to: {text_path}")

# Save label encoders
encoders_path = os.path.join(model_folder, 'label_encoders.pkl')
joblib.dump(label_encoders, encoders_path)
print(f"✅ Label encoders saved to: {encoders_path}")

print("\n" + "="*50)
print("✅ ALL MODELS TRAINED AND SAVED SUCCESSFULLY!")
print("="*50)
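A consumer of these artifacts, such as the backend, would presumably read them back with `joblib.load`. The round trip can be sketched as follows, using a temporary directory and a toy encoder so nothing in `backend/model/` is touched; the label strings are hypothetical and not taken from the dataset.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Toy encoder with made-up labels, standing in for one entry of `label_encoders`.
encoder = LabelEncoder().fit(["Dry ploughing", "Wet ploughing"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "label_encoders.pkl")
    joblib.dump({"PloughMethod": encoder}, path)  # same call the save cell uses
    restored = joblib.load(path)                  # what a consumer would call

# The restored encoder carries the same classes, in the same (sorted) order.
restored_classes = restored["PloughMethod"].classes_.tolist()
print(restored_classes)  # ['Dry ploughing', 'Wet ploughing']
```

Pickled scikit-learn objects are generally only safe to load with the same scikit-learn version that wrote them, so pinning the version in the backend's requirements is worth considering.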

In [None]:
# ============================================
# 4. Verify Models Work (Test Predictions)
# ============================================

print("Testing predictions with sample data...\n")

# Create a sample input
sample_input = pd.DataFrame({
    'Temperature_C': [28.5],
    'Soil_pH': [6.5],
    'Rainfall_mm': [150.0],
    'FieldArea_ha': [1.5],
    'Humidity_%': [75.0]
})

# Predict numeric outputs
numeric_pred = model_numeric.predict(sample_input)
print("Numeric Predictions (sample):")
print(f"  - Predicted Yield (kg/ha): {numeric_pred[0][0]:.2f}")
print(f"  - Plough Depth (cm): {numeric_pred[0][1]:.2f}")

# Predict text outputs
text_pred = model_text.predict(sample_input)
print("\nText Predictions (encoded values):")
for i, col in enumerate(label_encoders.keys()):
    print(f"  - {col}: {text_pred[0][i]}")

print("\n✅ Models are working correctly!")

✅ All models trained and saved successfully!
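The verification cell prints the classifier's output as encoded integers. To turn those back into advice strings, each column's fitted `LabelEncoder` can be applied in reverse with `inverse_transform`. The sketch below uses two toy encoders and a hand-written prediction array; the label strings and `encoded_pred` are hypothetical stand-ins for `label_encoders` and `model_text.predict(sample_input)`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy encoders with made-up advice strings; the real ones come from the training cell.
demo_encoders = {
    "PloughMethod": LabelEncoder().fit(["Dry ploughing", "Wet ploughing"]),
    "IrrigationAdvice": LabelEncoder().fit(["Irrigate weekly", "Rely on rainfall"]),
}

# Stand-in for model_text.predict(...): one row, one encoded label per output column.
encoded_pred = np.array([[1, 0]])

# Dict insertion order matches the column order used for training, so position i
# in the prediction row belongs to the i-th encoder.
decoded = {
    col: str(enc.inverse_transform([encoded_pred[0][i]])[0])
    for i, (col, enc) in enumerate(demo_encoders.items())
}
print(decoded)  # {'PloughMethod': 'Wet ploughing', 'IrrigationAdvice': 'Irrigate weekly'}
```

This relies on `label_encoders` having been built by iterating `y_text.columns` in order, which is how the training cell populates it.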
