# Web Servers

We will cover the following topics:

- SSH and Firewall
- Conda Environments
- Jupyter
- Making requests and processing responses
- Serving a model using Flask
- (Optional) Model Persistence using MLflow
- (Optional) Using VS Code


### Python

- We will be predominantly concerned with the Python ecosystem.
- A big advantage is that code developed on a local system can be moved easily to the cloud (which is essentially a set of remote computers) and/or to a scalable on-prem solution.
- Many companies use Python to start data science projects in-house (via fresh recruits, interns, etc.).
- Python has some relatively easy ways to access databases as well as to perform key data science related steps.
- Big data platforms such as Spark have great Python bindings.
  - E.g., pandas DataFrames and Spark DataFrames have some parity with each other.
- Many/most machine learning models (deep learning, pre-trained) are built in the Python ecosystem.
- Many useful libraries: pandas, matplotlib, flask, pytorch, numpy, scipy, ...

### Our Objective

- Learn the general patterns rather than master the specific tools. For many of you this will be a first exposure to operationalizing machine learning; for others, the tools may already look familiar.
- We will only be scratching the surface and acquainting ourselves with some key tools via _minimum working examples_.

### Deployment Targets


- On-prem or self-hosted machines (needs DevOps skills)
  - Local machines (_this lecture_)
  - Vultr
  - Linode
  - Hetzner
  - OVH
  - ...
- Managed cloud:
  - Heroku (PaaS)
  - Azure
  - GCP
  - AWS (IaaS)
  - DigitalOcean
- The decision to deploy on one versus the other depends on:
  - skills
  - business need
    - internal vs external
  - scale, reliability, security
  - costs
  - ease of deployment
  - maintenance

### Local Deployments are Hard

 - Need to learn linux security
 - Need to learn how to manage access
 - Need to learn how to manage backups
 - Need to learn hot switching / reliability (e.g., you may not be able to take your ML deployment machine offline)

### Cloud Deployments are not Easy Either

- Also need to learn a complex ecosystem
- Vendor lock-in (for successful businesses, this is not an issue)

### Aside: Software Tools

Python development can happen in many places:

- In text editors (e.g., Sublime Text)
- In IDEs (e.g., PyCharm or VS Code)
- In Jupyter notebooks and variants (Google Colab, Databricks notebooks)
  - Vanilla Jupyter Notebook currently does not allow collaboration as such

### Part 1: Setting up Jupyter access on a VPS

 - We will use [DigitalOcean](https://www.digitalocean.com/), but all steps are vendor-agnostic. Alternatives include Vultr, AWS EC2, Google Cloud, Azure, etc.
 - First, we set up passwordless SSH access.
 - Next, we set up a basic firewall for security.
 - This is followed by installing `conda`.
 - (Optional) To keep the Jupyter server running uninterrupted, we will run it within a `screen` session.
 - We will access the server and notebooks in our local browser using SSH tunneling.


### Part 2: Preparing an ML Model

 - We will review how data is accessed, and how a model is trained (this should be familiar to you). Each case is typically different.
   - In particular, we will look at a (movie) recommendation problem.

 - Some aspects of saving and loading models become important in production. For instance, we would like the models to be able to live across potentially different dev/staging/prod environments/machines. For this, we need the notion of model persistence.
   - Natively:
     - For example, PyTorch has native [save and load methods](https://pytorch.org/tutorials/recipes/recipes/save_load_across_devices.html).
     - The same is true for scikit-learn and a variety of other packages.

   - Using MLflow:
     - [MLflow](https://www.mlflow.org/docs/latest/python_api/mlflow.pytorch.html) addresses, among many other parts of the ML lifecycle, the problem of moving models across different environments without incompatibility issues (minor version numbers, OS, etc.).
     - See these links for more information: [https://pypi.org/project/mlflow/](https://pypi.org/project/mlflow/) and [mlflow.org](https://www.mlflow.org/docs/latest/models.html)
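
As a rough illustration of native persistence, the sketch below saves and reloads a toy model with Python's standard `pickle` module. The `MeanRatingModel` class and the `save_model`/`load_model` helpers are hypothetical stand-ins for this example, not the recommender used later. Pickle-based persistence assumes the loading environment has the same class definition and compatible library versions, which is exactly the fragility that motivates tools such as MLflow.

```python
import pickle
import tempfile
from pathlib import Path


class MeanRatingModel:
    """Toy recommender (illustrative only): predicts a movie's mean rating."""

    def __init__(self):
        self.means = {}
        self.global_mean = 0.0

    def fit(self, ratings):
        # ratings: iterable of (movie_id, rating) pairs
        ratings = list(ratings)
        totals = {}
        for movie_id, rating in ratings:
            total, count = totals.get(movie_id, (0.0, 0))
            totals[movie_id] = (total + rating, count + 1)
        self.means = {m: total / count for m, (total, count) in totals.items()}
        self.global_mean = sum(r for _, r in ratings) / len(ratings)
        return self

    def predict(self, movie_id):
        # Fall back to the global mean for movies never seen in training.
        return self.means.get(movie_id, self.global_mean)


def save_model(model, path):
    """Serialize the fitted model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model from disk. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    model = MeanRatingModel().fit([("m1", 4.0), ("m1", 2.0), ("m2", 5.0)])
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        save_model(model, path)
        restored = load_model(path)
    print(restored.predict("m1"))
```

The same save-then-load round trip is what a serving process would do at startup: training writes the file once, and the server only ever calls the load step.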

