#### The next cell installs the Python packages listed in `requirements.txt`.
#### This ensures the notebook has its dependencies (such as scikit-learn, pandas, and mlflow) before any model training or evaluation code runs.


In [None]:
%pip install -r requirements.txt

In [1]:
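import sys

# Make the project's src/ directory importable (path is relative to the notebook's working directory).
sys.path.append('../src')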

#### The next cell imports the `main` function from the `train.py` script located in the `src` directory.
#### This allows us to run the entire model training and evaluation pipeline defined in that script directly from the notebook.


In [None]:
from train import main

#### The next cell calls the `main()` function from the `train.py` script.
#### This executes the full model training and evaluation workflow, including:
- Loading and preprocessing the data
- Splitting the data into training and test sets
- Training multiple machine learning models with hyperparameter tuning
- Evaluating model performance using relevant metrics
- Logging results and models to MLflow for experiment tracking and reproducibility
- Registering the best-performing model in the MLflow Model Registry

#### The output will display the best model and its evaluation metrics.


In [17]:
main()

Loading data from: /content/data/processed
One-hot encoding columns: []
Label encoding columns: []
Training LogisticRegression...




LogisticRegression metrics:
  accuracy: 0.9999477342810851
  precision: 0.9999452054794521
  recall: 1.0
  f1: 0.9999726019890955
  roc_auc: 0.9999999999999999
Training RandomForest...


Registered model 'CreditRiskModel_BestModel' already exists. Creating a new version of this model...


RandomForest metrics:
  accuracy: 1.0
  precision: 1.0
  recall: 1.0
  f1: 1.0
  roc_auc: 1.0
Registering best model: RandomForest (ROC-AUC: 1.0000)
Best model: RandomForest
Best metrics:
  accuracy: 1.0
  precision: 1.0
  recall: 1.0
  f1: 1.0
  roc_auc: 1.0


Created version '4' of model 'CreditRiskModel_BestModel'.
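
The line `Registering best model: RandomForest (ROC-AUC: 1.0000)` suggests that the pipeline picks the winner by ROC-AUC. The internals of `train.py` are not shown here, so the following is only a minimal sketch of how such a selection could work. The `results` dictionary and the `select_best` helper are hypothetical names, and the metric values are copied from the output above.

```python
# Metrics copied from the run output above, keyed by model name.
# The dictionary layout is an assumption, not the actual structure used in train.py.
results = {
    "LogisticRegression": {
        "accuracy": 0.9999477342810851,
        "precision": 0.9999452054794521,
        "recall": 1.0,
        "f1": 0.9999726019890955,
        "roc_auc": 0.9999999999999999,
    },
    "RandomForest": {
        "accuracy": 1.0,
        "precision": 1.0,
        "recall": 1.0,
        "f1": 1.0,
        "roc_auc": 1.0,
    },
}


def select_best(results, metric="roc_auc"):
    """Return (name, metrics) for the model with the highest value of `metric`."""
    best_name = max(results, key=lambda name: results[name][metric])
    return best_name, results[best_name]


best_name, best_metrics = select_best(results)
print(f"Registering best model: {best_name} (ROC-AUC: {best_metrics['roc_auc']:.4f})")
# prints: Registering best model: RandomForest (ROC-AUC: 1.0000)
```

The two ROC-AUC values differ only in the sixteenth decimal place, so a selection rule like this one separates the models by a margin far smaller than typical run-to-run variation.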


# Summary of Output:
- Two models were trained, LogisticRegression and RandomForest; RandomForest was selected as the best model with a ROC-AUC of 1.0.
- MLflow issued warnings about a deprecated parameter (`artifact_path`) and about missing model signatures and input examples.
- The model 'CreditRiskModel_BestModel' was already registered in MLflow, so a new version (version '4') was created in the MLflow Model Registry instead of a new model.
- The process completed successfully.


# Analysis of Output:
 
1. **MLflow Warnings:**
   - The output includes warnings from MLflow about a deprecated parameter (`artifact_path`) and about missing model signatures and input examples.
   - These warnings do not stop the process, but they indicate that the logging code may need updating to stay compatible with future MLflow versions.
   - The missing-signature warning means the model was logged without an input/output schema, which could affect model deployment or reproducibility.

2. **Model Registration:**
   - The output shows that the model 'CreditRiskModel_BestModel' was already registered in the MLflow Model Registry.
   - Instead of failing, MLflow created a new version (version '4') of the existing registered model.
   - This is expected behavior when retraining and re-registering a model under the same name.

3. **Process Completion:**
   - Despite the warnings, the process completed successfully.
   - The best model was selected, logged, and registered in MLflow, which provides experiment tracking and model versioning.

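4. **Implications:**
   - The workflow runs end to end and supports iterative model development: each rerun registers a new version rather than overwriting the previous one.
   - The near-perfect scores (ROC-AUC of 1.0 for RandomForest and accuracy above 0.9999 for both models) are unusually high and may indicate target leakage or overlap between the training and test sets; this is worth checking before relying on the model.
   - To improve further, update the MLflow logging code to include model signatures and input examples, and replace the deprecated `artifact_path` parameter.
   - The MLflow Model Registry now contains multiple versions of the best model, which allows model comparison and rollback if needed.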
