# ML Workflow

## A Machine Learning Project's Life Cycle

This document describes the customized Cross-Industry Standard Process for Machine Learning (CRISP-ML) life cycle that I follow in my ML projects.
The figure below shows the steps of the life cycle, including the points where the process can backtrack to an earlier step:

![ML Life Cycle](image-20220613-061022.png "ML Life Cycle")

**Important!**

Each step must pass a quality assurance check so that errors are caught as early as possible, which minimizes costs in the later stages of development.

## __Table of Contents__

1. [Business Understanding]()
    1. Define business objectives
    2. Translate business objectives into ML objectives
    3. Collect and verify the raw/input data
    4. Assess the project feasibility
    5. Agree on the scope and timeline for a proof of concept (POC)
2. [Literature & Best Practices Review]()
    1. Define the project's key criteria to filter out less relevant work
    2. Search for open-source products and SaaS providers that address similar objectives
    3. Consider only mature projects, academic literature, and practical tutorials
    4. Make a pros/cons table that summarizes and compares all methods
    5. Select a subset of methods and document the reasoning behind the choice
3. [Data Understanding]()
    1. Talk to the product's data analyst to understand the data sources, including their collection frequency and limitations
    2. Verify data integrity and quality; record the criteria and the indices of the parts where the data is more or less reliable (in terms of both rows and columns)
    3. Do not aggregate the data in this phase yet; focus on the most granular level available
    4. Create a data catalog that explains each data source and its constraints
    5. Push the data as individual tables to DVC with the `raw` tag
4. [Data Preparation & Preprocessing]()
    1. Feature selection (columns)
    2. Data sample selection (rows)
    3. Class balancing
    4. Data cleaning (noise reduction, missing-data imputation, normalization, outlier detection, etc.)
    5. Feature engineering
    6. Data augmentation
    7. Data standardization
    8. Merge and aggregate the data into the final data set, and push it to DVC with the `dataset` tag
    9. Split the data (train, val, test) and push the indices to DVC
5. [Modelling]()
    1. Set up an (MLflow) environment for experiments and document the trials
    2. Define the model's quality measure
    3. Baseline model selection
    4. Adding domain knowledge to specialize the model
    5. Model training
    6. Optional: transfer learning with a pre-trained model
    7. Model compression
    8. Ensemble learning
6. [Evaluation]()
    1. Validate the model's performance
    2. Determine robustness
    3. Increase the model's explainability
    4. Decide whether to deploy the model
    5. Document the evaluation criteria
7. [Production & Deployment]()
    1. Evaluate the model once more on real-world data under production conditions
    2. Set up a continuous integration pipeline for re-training, versioning, and deploying the model
    3. Specify the deployment strategy to be used (e.g. A/B testing, multi-armed bandits)
    4. Ensure user acceptance and usability
    5. Define model governance procedures for monitoring during maintenance
8. [Monitoring & Maintenance]()
    1. Monitor the efficiency and efficacy of the deployed model's predictions
    2. Create a dashboard that tracks the predefined success criteria (e.g. quality thresholds)
    3. Retrain the model on new data if and when required
    4. Perform labelling of the new data points
    5. Repeat tasks from the _Modelling_ and _Evaluation_ steps
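
As an illustration of step 4.9 (splitting the data and keeping the indices), the sketch below produces a reproducible train/val/test split and saves only the row indices, so the split can be versioned separately from the data. The function names, the split fractions, the seed, and the file name are assumptions made for this example; they are not prescribed by the workflow.

```python
import json
import random


def split_indices(n_rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Return disjoint train/val/test index lists that together cover range(n_rows).

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    indices = list(range(n_rows))
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    return {
        "test": sorted(indices[:n_test]),
        "val": sorted(indices[n_test:n_test + n_val]),
        "train": sorted(indices[n_test + n_val:]),
    }


def save_splits(splits, path="split_indices.json"):
    """Write the index lists to a JSON file (hypothetical file name)."""
    with open(path, "w") as f:
        json.dump(splits, f)


splits = split_indices(n_rows=1000)
print({name: len(idx) for name, idx in splits.items()})
```

A file written by `save_splits` could then be tracked with `dvc add`, in the same way as the `raw` and `dataset` artifacts.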
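
Steps 8.2 and 8.3 (tracking predefined success criteria and retraining when required) can be reduced to a simple threshold check, sketched below. The metric names, threshold values, and the `check_quality` helper are hypothetical; a real project would take them from the success criteria agreed on in the earlier steps.

```python
def check_quality(metrics, thresholds):
    """Return the names of metrics that are missing or below their minimum."""
    violations = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            violations.append(name)
    return violations


# Hypothetical success criteria and a hypothetical set of production metrics.
thresholds = {"precision": 0.90, "recall": 0.80}
metrics = {"precision": 0.93, "recall": 0.74}

violations = check_quality(metrics, thresholds)
needs_retraining = bool(violations)
print(violations)  # → ['recall']
```

Here a non-empty result is treated as the trigger for retraining; a dashboard could display the same list of violated criteria.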

## Sources
1. [Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology](https://arxiv.org/pdf/2003.05155.pdf)
1. [Andrew Ng's course on Coursera](https://www.coursera.org/learn/introduction-to-machine-learning-in-production)
1. https://christophergs.com/machine%20learning/2019/03/17/how-to-deploy-machine-learning-models/
1. https://ml-ops.org/content/crisp-ml
1. https://towardsdatascience.com/machine-learning-in-production-why-you-should-care-about-data-and-concept-drift-d96d0bc907fb