### Machine Learning Architecture
* Machine learning in production requires multiple components in order to work:
  + infrastructure: hardware, network components, and OS
  + applications
  + data
  + documentation
  + configuration
* Architecture
  + ISO/IEC 42010 defines architecture as:
    Fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution
  + in plain English: The way software components are arranged and the interactions between them
* developing and deploying an ML system can be fast and cheap, however
  + maintaining ML systems is difficult
    + all the challenges of traditional software systems, plus new challenges from model and data changes

### Challenges of deploying and maintaining ML systems
* ML systems are complex
  + the ML system is a black box in the center with many components surrounding it
* the need for reproducibility (versioning everywhere)
* data dependencies
  + models may be trained on data from many different sources (in-house, external, or from an API)
* configuration issues
  + model hyperparameters, versions, requirements, and data sources can all be changed via config
    + e.g. a YAML file in your source code. Is this tested?
* Data and feature preparation
  + the steps required to prepare data and transform it into features for the model may be complex
  + A typical pipeline requires us to 
    + transform numerical data
    + transform categorical data
    + handle outliers
    + derive features from raw data
    + many other tasks
  + we need to make sure we deploy code to production in a way that does not cause differences between research and production
* detecting model errors
  + traditional tests, which rely on catching exceptions, often do not detect errors in ML systems
  + when you deploy a model which performs worse, no exceptions are raised
  + your API will not return any 500 status codes
  + standard tests will not catch these sorts of mistakes
  + we need to design alternative ways of capturing and detecting those mistakes in our models
* the makeup of the team matters when it comes to ML pipeline deployment
  + data scientists work in the research environment
  + engineers work more on production apps
  + DevOps engineers are responsible for systems and infrastructure
  + product owners or the business have the best understanding of the system's requirements
  + a breakdown of communication between any of these teams can lead to an underperforming deployment
    + people working in different disciplines need to work together and communicate
* it is worth being strategic about ML system deployments and thinking about the architecture
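
One way to answer the "Is this tested?" question for configuration is to validate the parsed config before training or serving. The sketch below assumes the YAML file has already been loaded into a plain dict; the key names, the allowed loss values, and the `validate_config` helper are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

# Hypothetical schema for a model config, e.g. a YAML file parsed into a dict.
REQUIRED_KEYS = {"model_version", "learning_rate", "loss", "data_source"}
ALLOWED_LOSSES = {"squared_error", "absolute_error"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    present = set(config)
    errors = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - present)]
    # Unknown keys are usually typos of a real key, so report them too.
    errors += [f"unknown key: {key}" for key in sorted(present - REQUIRED_KEYS)]

    rate = config.get("learning_rate")
    if rate is not None and not (isinstance(rate, (int, float)) and 0 < rate <= 1):
        errors.append(f"learning_rate must be in (0, 1], got {rate!r}")

    loss = config.get("loss")
    if loss is not None and loss not in ALLOWED_LOSSES:
        errors.append(f"loss must be one of {sorted(ALLOWED_LOSSES)}, got {loss!r}")
    return errors
```

A unit test can then assert that the checked-in config yields an empty error list, so a mistyped key or an out-of-range hyperparameter fails the build instead of silently changing the model.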
  

### Key principles for ML system
* best practices are still being established and there is a lot of contradicting advice
* really useful to pay attention to what some of the large technology companies are publishing in this field
  + they have been doing ML deployments at scale for a relatively long period of time
* automation of all stages of the ML workflow
  + get rid of manual steps to get a model to production (manual steps create room for error)
  + how to do it?
    + add data processing and feature engineering steps to our production code and use CI/CD to automatically version and deploy models
    + any time you find yourself manually running a script, SSHing into a remote server, or processing data on a local machine, you should automate it
* reproducibility
  + everything to do with the ML system should be under version control, even preliminary iterations of the model
    + you can pinpoint code and parameters when investigating a model or pipeline
  + every model specification undergoes a code review and is checked into a repository
  + models can be quickly and safely rolled back to a previous serving version
    + there should be an easy way to undo a deployment so that the previous model version is restored
  + versioning
    + each model is tagged with a provenance tag that records which data it was trained on and which version of the model it is
    + each dataset is tagged with information about where it originated from and which version of the code was used to extract it (and any related features)
    + semantic versioning: Major.Minor.Patch-prerelease.1+meta
* testing
  + the full ML pipeline is integration tested end to end (a change in one component may affect components further down the pipeline)
    + challenge: training takes a long time
      + use a subset of the data
      + use a simpler model
  + all input feature code is tested
    + unit tests should exist for feature engineering and preprocessing steps
    + this code is often quite complex and poorly understood, so it is a common source of model errors
  + model specification code is unit tested
    + model configuration is tested (expected value ranges and enums in hyperparameters, and protection against mistyped config)
  + model quality is validated before attempting to serve it
    + a couple of quality issues
      + sudden degradation (usually caused by a bug in the new version)
        + can be caught by comparing quality against previous versions
      + slow degradation of model quality
        + can be caught by testing each new release on a fixed, dedicated test dataset against a particular threshold or benchmark
* infrastructure
  + models are tested via a shadow (not deployed to users) or canary process before they enter production serving environments
  + monitor the model's performance
    + you always need a way to track a model's performance once it is deployed
* run through the checklist
  + it is useful to observe the links and dependencies across different best practices:
    + without model specification review and version control, it would be hard to achieve reproducible training
    + without reproducible training, the effectiveness and predictability of canary release are significantly reduced
      + because you are not sure you are testing the thing that you think you have changed
    + without knowing the impact of model staleness, it's hard to implement effective monitoring
      + because you don't know what you are looking for in your monitoring
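
The two model quality checks described above (comparison with the previous version, and a fixed benchmark on a dedicated test set) can be combined into a single gate that runs before serving. This is a minimal sketch: models are represented as plain callables, and `passes_quality_gate`, `min_accuracy`, and `max_drop` are hypothetical names and thresholds, not part of any library.

```python
from __future__ import annotations

from typing import Callable, Sequence


def accuracy(predict: Callable, features: Sequence, labels: Sequence) -> float:
    """Fraction of examples in the fixed test set that the model gets right."""
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


def passes_quality_gate(
    candidate: Callable,
    previous: Callable,
    features: Sequence,
    labels: Sequence,
    min_accuracy: float = 0.8,  # absolute benchmark (assumed value)
    max_drop: float = 0.02,     # tolerated drop versus the serving model (assumed value)
) -> bool:
    """Return True only if the candidate is good enough to be served."""
    candidate_score = accuracy(candidate, features, labels)
    previous_score = accuracy(previous, features, labels)
    # Absolute benchmark: guards against slow degradation across releases.
    meets_benchmark = candidate_score >= min_accuracy
    # Relative check: guards against sudden degradation caused by a bug.
    no_regression = (previous_score - candidate_score) <= max_drop
    return meets_benchmark and no_regression
```

Note that a worse candidate raises no exception at any point; it is only the explicit comparison that turns lower quality into a failed check.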

### Architecture approaches for ML systems
* Model embedded in application
  + model is pre-trained
  + prediction-on-the-fly
  + model artifact packaged within the consuming application
  + variations: embedded on a mobile device, run in the browser, or in a Django app that provides the user interface
  + trades simplicity against flexibility
    + updating the model requires re-deploying the app
* served via a dedicated service
  + a dedicated model API
  + example: a Django app sends a request to a separate, dedicated ML API service and returns the results to users
  + increases complexity, because a separate service has to be maintained
  + gives the flexibility to keep model deployments separate from the main app deployment
  + the model server and the app can be scaled separately to support high traffic
* model published as data (streaming)
  + ingest new version of models at runtime
  + complex implementation, but can seamlessly upgrade models
* batch prediction (offline process)
  + predictions are triggered and run asynchronously, by the app or by a scheduled job
  + after a few hours or even days, the predictions are collected in a database or some other form of storage
  + the app serves the predictions via a dashboard, a report, or any other UI
  + we can check predictions before exposing them to users
  + we can re-run predictions to correct mistakes
  + we gain flexibility and reduce the chance of mistakes, at the cost of being unable to offer predictions on the fly
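
As an illustration of the batch (offline) approach, the sketch below scores a batch of rows, stores the results, and lets the app read them back later. It uses an SQLite database as the storage layer; the `score` function is a hypothetical stand-in for a trained model, and the table and column names are assumptions.

```python
import sqlite3


def score(amount: float) -> float:
    """Hypothetical stand-in for a trained model's prediction."""
    return min(1.0, amount / 1000.0)


def run_batch(conn: sqlite3.Connection, rows) -> None:
    """Scheduled job: score every (item_id, amount) row and store the result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions (item_id TEXT PRIMARY KEY, score REAL)"
    )
    # INSERT OR REPLACE lets a re-run overwrite earlier, mistaken predictions.
    conn.executemany(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?)",
        [(item_id, score(amount)) for item_id, amount in rows],
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, item_id: str):
    """App side: serve a stored prediction, or None if none exists yet."""
    row = conn.execute(
        "SELECT score FROM predictions WHERE item_id = ?", (item_id,)
    ).fetchone()
    return None if row is None else row[0]
```

Because the app only reads from storage, the stored predictions can be inspected, or the batch re-run, before anything is shown to users.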

### Architecture Component Breakdown
* High level architecture
  + data layer
    + functions or even entire apps to load and process training data
    + may be complex, pulling data from multiple databases, Hadoop or another distributed file system, or API calls
    + the purpose is to prepare the data so that the next step can run
  + feature layer
    + feature extraction using apps, scripts, or entire modules to generate features
  + scoring layer
    + model builder
  + evaluation layer
  + pipeline of data layer, feature layer and model builder (offline batch mode)
    + model builder
      + persist our trained model in a supported format
      + version the persisted models and ensure that they are in a format in which they can be deployed
    + data layer (training data), feature layer (feature extractor) and model builder are
      + grouped into a pipeline to ensure the steps are always run in the same order
      + also help us pinpoint failures 
    + for simpler pipelines, we can use the built-in pipeline functionality of scikit-learn or pandas
    + for complex pipelines, we can use Apache Spark or Apache Airflow
    + the pipeline is operated via a CI platform to ensure the process is automated
    + the output of the pipeline is a trained model which can be published and consumed as a dependency
      + either directly by our app
      + or by a dedicated ML microservice
  + pipeline/structure for online live prediction
    + the trained ML model is embedded as a dependency
    + consists of a REST API that accepts inputs from users to predict on
    + users' inputs are cleaned and prepared in real time by a feature extraction module
      + the feature extraction module should mirror the code used in the offline training phase as closely as possible
    + predictions are generated by the pre-trained model and returned to the client on the fly    
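
The offline pipeline and the online prediction path described above can share one feature extraction function, which is the simplest way to keep them mirrored. The sketch below is deliberately tiny and uses only the standard library: the data, the `area` and `price` fields, and the one-parameter linear model are all hypothetical.

```python
from __future__ import annotations


def load_training_data() -> list[dict]:
    """Data layer: hypothetical rows standing in for database or API reads."""
    return [
        {"area": "50", "price": 100.0},
        {"area": "80", "price": 160.0},
        {"area": "100", "price": 200.0},
    ]


def extract_features(row: dict) -> float:
    """Feature layer: the single code path used by both training and serving."""
    return float(row["area"])  # raw inputs may arrive as strings


def build_model(features: list[float], targets: list[float]) -> dict:
    """Model builder: least-squares fit of price = slope * area (no intercept)."""
    slope = sum(f * t for f, t in zip(features, targets)) / sum(f * f for f in features)
    return {"slope": slope}


def train_pipeline() -> dict:
    """Offline batch mode: the steps always run in the same fixed order."""
    rows = load_training_data()
    features = [extract_features(row) for row in rows]
    targets = [row["price"] for row in rows]
    return build_model(features, targets)


def predict(model: dict, raw_input: dict) -> float:
    """Online mode: prepare the user's input with the same extractor, then score."""
    return model["slope"] * extract_features(raw_input)
```

In a real service, `predict` would sit behind the REST endpoint and `train_pipeline` would run on the CI platform; the point of the sketch is only that both call the same `extract_features`.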

### CI/CD Automation of the Deployment
* application code and data are used to train the model in an automated fashion; this training produces artifacts
* main artifacts include
  + trained models
  + Docker images that snapshot the application and its dependencies for quick deployment
* these artifacts are used to deploy the ML app in an automated way to a platform as a service (PaaS) or infrastructure as a service (IaaS)

### Serving ML Models - Formats
* Serializing the model object with pickle
* MLflow provides a common serialization format for exporting/importing Spark, scikit-learn, and TensorFlow models
* language-agnostic exchange formats to share models, such as PMML, PFA and ONNX
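
A minimal sketch of the pickle option, with the version embedded in the artifact file name so that an earlier version can be restored. A plain dict of learned parameters stands in for the model object here; the `save_model` and `load_model` helpers and the file naming scheme are hypothetical.

```python
import pickle
from pathlib import Path


def save_model(model, directory, version: str) -> Path:
    """Persist the model object to a versioned file and return its path."""
    path = Path(directory) / f"model-{version}.pkl"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def load_model(path):
    """Load a persisted model object.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

For example, `save_model({"slope": 2.0}, "artifacts", "1.0.0")` would write `artifacts/model-1.0.0.pkl`, assuming the `artifacts` directory already exists. Pickle files are tied to Python (and often to the library versions used to create them), which is one reason to consider the language-agnostic formats listed above.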


### Production code
* production code is designed to be deployed to end users
* it is not meant for experimentation or proofs of concept, which are usually short-term in nature
* testability and maintainability are huge for production code
* divide code into modules, which is more extensible and easier to test
* separate config from code where possible
* ensure functionality is tested and documented
* code adheres to a standard such as PEP 8 so it is easy for others to read
* scalability and performance are also important
  + code needs to be ready to be deployed to infrastructure that can be scaled
  + in modern web apps, this means containerisation to support vertical or horizontal scaling
* refactor inefficient parts of the code base
* reproducibility
  + the code resides under version control with clear processes for tracking releases and release versions
  + requirements files mark which dependencies and which versions are used by the code

### Python package
* a module is a file which contains various Python functions and global variables; it is just a file with a .py extension containing executable Python code
* a package is a collection of modules
  + in addition, it has certain standardized files which have to be present, and we have to follow certain Python standards and conventions, so that it
    + can be published
    + can be installed in other Python applications
* why use a package
  + a package allows us to wrap our trained model and make it available to other consuming applications as a dependency
  + with the additional benefits of version control, clear metadata and reproducibility
  