# MLOps

- Special Topics in IDS (IDS 594)
- Version: 2021-01-20
- Term: Spring 2021
- CRN: 44420
- Number of Credits: 2
- Instructor: Dr. Theja Tulabandhula

This practice-oriented course surveys modern best practices for getting machine learning (ML) models into production. It picks up where IDS 572 and IDS 575 left off: we will learn multiple ways of operationalizing machine learning workflows and models in the context of larger business goals. The course is complementary to IDS 561. We will gain a better understanding of strategies for model management, monitoring, and deployment. We will also interleave these topics with online experimentation techniques (A/B testing) and software engineering ideas such as version control, containerization, and continuous integration/continuous deployment.

A tentative list of topics is as follows:

- Deploying ML models using web servers (e.g., using Flask)
- Containers for machine learning: the Docker ecosystem and Kubernetes
- Model management: model tracking and logging
- General tools such as Apache Airflow and Apache Kafka
- Case studies using Databricks’ MLflow, Google’s TFX/Kubeflow, Uber’s Michelangelo, and Facebook’s FBLearner Flow
- A/B testing of model performance and data considerations


## Textbook

- In addition to specific notes made available on Blackboard/GitHub, [Data Science in Production](https://leanpub.com/ProductionDataScience/) by Ben Weber (2020, $5 for the ebook/PDF) will be used. A sample of the first three chapters is available on the publisher's page linked above.


## Logistics

 - Time: Fri 6:00-8:30 PM
 - Location: Zoom (see Blackboard)
 - Office Hours: On Zoom right after lectures (same Zoom link as the lectures).
 - Communication: We will use Blackboard. You can also use email (theja@uic.edu).
 - TA: Tengteng Ma (tma24@uic.edu)

## Assignments, Project and Grading

 - Assignments: There are no graded assignments or exams for this course.
 - Project: The objective is to demonstrate a deployment of an existing machine learning model that you have access to. Documentation of this process, along with the complete set of scripts, code, and commands used, must be submitted by the due date (no exceptions). See the [project page](Project.ipynb) for the due date and more detailed instructions.
 - Grade: Grades will be based on the project (80%; see the evaluation details on the project page on Blackboard) and course participation/attendance (20%).



## Instruction Schedule


- Chapter 1: Serving ML Models Using Web Servers
  - Optional Reference: Book Chapter 2
  - Learning Goals:
    - Be able to set up a Python environment
    - Be able to set up a Jupyter session with SSH tunneling
    - Be able to secure a web server
    - Be able to use Flask to serve an ML model

- Chapter 2: Serving ML Models Using Serverless Infrastructure
  - Optional Reference: Book Chapter 3
  - Learning Goals:
    - Be able to differentiate hosted vs. managed solutions
    - Be able to assess the DevOps effort for web server vs. serverless deployments
    - Be able to deploy an ML model using Google Cloud Functions and AWS Lambda functions


- Chapter 3: Serving ML Models Using Docker
  - Optional Reference: Book Chapter 4, up to Section 4.2
  - Learning Goals:
    - Be able to reason about the pros and cons of container technologies
    - Be able to differentiate containers from virtual machines
    - Be able to create a new Docker image using a Dockerfile
    - Be able to upload the image to a remote registry

- Chapter 4: Kubernetes for Orchestrating ML Deployments
  - Optional Reference: Book Chapter 4, Section 4.3 onwards
  - Learning Goals:
    - Understand the uses of Kubernetes
    - Be able to set up a single-node Kubernetes cluster using `kubectl` and `minikube`
    - Be able to serve a prediction model on a container in the Kubernetes cluster
    - Be able to deploy a prediction model on Google Kubernetes Engine (GKE)

- Chapter 5: ML Model Pipelines
  - Optional Reference: Book Chapter 5
  - Learning Goals:
    - Learn how to manage a model building workflow
    - Learn how to set up automated jobs using `cron`
    - Learn the basics of Apache Airflow
    - Learn a managed workflow tool (Google Cloud Composer)

- Chapter 6: PySpark Ecosystem
  - Optional Reference: Book Chapter 6
  - Learning Goals:
    - Understand the components of a Spark cluster
    - Be able to use PySpark and Spark DataFrames
    - Be able to use models from MLlib
    - Be able to work with a managed solution such as Databricks

- Chapter 7: Streaming Model Deployments
  - Optional Reference: Book Chapter 8
  - Learning Goals:
    - Understand the difference between a streaming model deployment workflow and a batch model deployment workflow
    - Learn the basics of streaming with Apache Kafka
    - Be able to differentiate between a batch PySpark workflow and a PySpark streaming workflow


- Chapter 8: Online Experimentation
  - Optional Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/model-ab-testing.html
  - Learning Goals:
    - Know the considerations for A/B testing of models before full rollouts
    - Be acquainted with a few statistical hypothesis tests and how sample sizes are determined
    - Be able to create simple experiments using `planout` and a Flask-based deployment setup
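
To give a flavor of the Chapter 8 goal on sample sizes, here is a minimal sketch of how the number of users per arm might be estimated for an A/B test comparing two conversion rates. It uses the standard normal-approximation formula for a two-sided, two-proportion z-test and only the Python standard library. The function name `sample_size_per_arm` and the example rates are illustrative assumptions, not part of the course material or the textbook.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of a two-sided two-proportion z-test.

    Hypothetical helper for illustration: p_control and p_treatment are the
    conversion rates we assume for the current model and the new model.
    """
    if not (0 < p_control < 1 and 0 < p_treatment < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("rates must differ; a zero effect cannot be detected")

    # Critical values of the standard normal for the chosen error rates.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    # Variance under the null uses the pooled rate; under the alternative,
    # each arm contributes its own binomial variance.
    p_pooled = (p_control + p_treatment) / 2
    null_term = z_alpha * math.sqrt(2 * p_pooled * (1 - p_pooled))
    alt_term = z_power * math.sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )

    effect = p_control - p_treatment
    return math.ceil((null_term + alt_term) ** 2 / effect ** 2)


if __name__ == "__main__":
    # Assumed scenario: baseline model converts at 10%, new model at 12%.
    print(sample_size_per_arm(0.10, 0.12))
```

Halving the detectable difference roughly quadruples the required sample, which is one reason the minimum effect of interest should be fixed before a rollout experiment starts.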
