### Microsoft Azure - Cloud Computing
![Azure](https://www.hyve.com/wp-content/uploads/2017/11/microsoft-azure-cloud-1.png)

Azure is Microsoft's cloud computing platform and competes with AWS (Amazon Web Services) and GCP (Google Cloud Platform).

### Why Cloud?
- **On-Premise/Local vs Cloud**
    - On-premise/local hardware is more expensive because of its upfront costs (e.g. RAM, GPUs), whereas in the cloud you pay only for the resources you use
- **RAM vs CPU vs GPU for VM instances**
    - RAM determines how much data you can read directly into memory (e.g. pandas DataFrames)
    - The CPU matters less, but you generally want a decent one for parsing data. More cores let you make better use of multiprocessing
    - A GPU is important for model training, especially deep learning
    - Azure VM sizes: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes
- **Industry standards**
    - Cloud technology is now the industry norm for ETL, Databricks, web apps, general computing, etc.
    - Often more secure (up to a certain data classification), accessible to your whole team, and scalable
    - Vertical scaling (scaling up): give a single machine more resources (compute power / memory capacity). Think of an apartment building that must accommodate tenants: as more tenants move in, you add floors to the same building.
    - Horizontal scaling (scaling out): add more machines (multiple servers / clusters). Think of a highway connecting cities: more tenants means more cars on the highway, and eventually a two-lane highway might need to expand to eight lanes to keep traffic flowing efficiently.
- **Automation**
    - Cloud platforms can run batch jobs at scheduled times
    - Pipelines can run and deploy code on every commit / pull request
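
Before picking a VM size, it can help to estimate how much RAM a dataset will need once it is read into memory with pandas. The sketch below is a back-of-envelope calculation under stated assumptions, not an Azure sizing tool: it only covers fixed-width numeric columns (string/`object` columns depend on the data), it ignores the index, and the `headroom` factor of 3 is a hypothetical rule of thumb to leave room for the intermediate copies pandas often makes. Once data is loaded, `DataFrame.memory_usage(deep=True)` reports the actual usage.

```python
# Bytes per value for common fixed-width NumPy/pandas dtypes.
BYTES_PER_VALUE = {
    "float64": 8, "int64": 8,
    "float32": 4, "int32": 4,
    "int16": 2, "int8": 1, "bool": 1,
}


def estimate_table_bytes(n_rows: int, column_dtypes: list[str]) -> int:
    """Approximate in-memory size of a table of fixed-width numeric columns.

    Ignores the index and per-object overhead, so treat it as a lower bound.
    """
    bytes_per_row = sum(BYTES_PER_VALUE[dtype] for dtype in column_dtypes)
    return n_rows * bytes_per_row


def suggest_min_ram_gib(n_rows: int, column_dtypes: list[str], headroom: float = 3.0) -> float:
    """Estimated data size in GiB, scaled by a (hypothetical) headroom factor."""
    return estimate_table_bytes(n_rows, column_dtypes) / 2**30 * headroom


# Example: 10 million rows with 20 float64 columns.
dtypes = ["float64"] * 20
print(f"Data size: {estimate_table_bytes(10_000_000, dtypes) / 2**30:.2f} GiB")
print(f"Suggested minimum RAM: {suggest_min_ram_gib(10_000_000, dtypes):.1f} GiB")
```

For this example the raw data is about 1.49 GiB, so with the assumed factor of 3 you would look for a VM with roughly 4.5 GiB of RAM free for your data, plus whatever the OS and Jupyter themselves use.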

### General Compute Instances
Compute Instances were still in preview (in development) at the time of writing: https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup

1. Sign up for a free Azure account (a once-off free credit of $280): https://azure.microsoft.com/en-us/free/  

2. Sign into the Azure Portal: https://portal.azure.com/  

3. Go to "Create a resource" on the left-hand navigation bar and search for "Machine Learning"  
![ml resource](./cloud/ml.PNG)

4. Create this resource with the details below  
![resource details](./cloud/resource.PNG)
    - Here, we use the "free tier" (no cost) plan, which serves as a trial for new Azure users
    - You will need to create a resource group: a container that lets you manage and monitor the usage of related resources together. I have called mine `MAST30034`!
    - Workspace name is simply the name of your workspace
    - Region is set to Australia East, so the servers are hosted locally rather than overseas, which reduces latency
    - Workspace edition is either "Basic" or "Enterprise". Basic should be sufficient, but you can try Enterprise for extra features (at a higher cost)
    
5. Click "Review + Create", confirm, and wait for the resource(s) to be deployed.   

6. When it is done, select "Go to resources" and then "Compute"
![compute](./cloud/compute.PNG)

7. Set up the VM for your compute instance. If you know how to use SSH (from macOS, Linux or WSL), you can add a public key for SSH access.
![VM setup](./cloud/vm.PNG)
    - Note: the compute name must be unique across all instances (as you can see, `test` is already taken).
    - VM pricing tiers are listed here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/
    - **This is where you need to decide if you want more RAM / GPU / compute power**
    
8. Once it is deployed, you have access to Jupyter, JupyterLab and RStudio
![deployed instance](./cloud/deployed.PNG)
    - These work the same way as a local IPython / R installation, but are hosted in the cloud
    - Click on the URI of your preferred tool (I still personally prefer Jupyter) and you're done!
    
9. If you want a more interactive UI to work with, click "Launch the Studio"
![launch studio](./cloud/launch.PNG)

    - Here, you can create pipelines, register versioned models, etc.  
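
If you prefer code to clicking through the portal, steps 3 to 7 can roughly be scripted with the v1 Azure ML Python SDK (`azureml-core`), which has since been superseded by the v2 `azure-ai-ml` package. This is a minimal sketch that assumes the v1 SDK is installed and you are logged in to Azure; the workspace name `mast30034-ws`, the default `vm_size` and the subscription ID you pass in are placeholders. The naming rule in `is_valid_instance_name` is an assumed reading of Azure's compute-name rules (3 to 24 characters, starting with a letter, then letters, digits or hyphens), so check the current docs. The SDK is imported inside the function so the name check works without it installed.

```python
import re

# Values mirroring the portal form in step 4. The workspace name is a
# hypothetical placeholder; "australiaeast" is the Australia East region.
WORKSPACE_SETTINGS = {
    "name": "mast30034-ws",
    "resource_group": "MAST30034",
    "location": "australiaeast",
    "sku": "basic",  # workspace edition: "basic" or "enterprise"
}

# Assumed compute-name rule: starts with a letter, 3-24 characters in total,
# using only letters, digits and hyphens.
_INSTANCE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,23}")


def is_valid_instance_name(name: str) -> bool:
    """Check the name format only; regional uniqueness can't be checked locally."""
    return _INSTANCE_NAME.fullmatch(name) is not None


def create_workspace_and_instance(subscription_id: str, instance_name: str,
                                  vm_size: str = "STANDARD_DS3_V2"):
    """Create (or reuse) the workspace, then provision a compute instance (v1 SDK)."""
    if not is_valid_instance_name(instance_name):
        raise ValueError(f"invalid compute instance name: {instance_name!r}")

    # Imported lazily so the helpers above run without azureml-core installed.
    from azureml.core import Workspace
    from azureml.core.compute import ComputeInstance, ComputeTarget

    ws = Workspace.create(
        subscription_id=subscription_id,
        create_resource_group=True,  # create MAST30034 if it doesn't exist yet
        exist_ok=True,               # reuse the workspace if it already exists
        **WORKSPACE_SETTINGS,
    )
    # vm_size is where you choose RAM / GPU / compute power (see step 7).
    config = ComputeInstance.provisioning_configuration(vm_size=vm_size,
                                                        ssh_public_access=False)
    instance = ComputeTarget.create(ws, instance_name, config)
    instance.wait_for_completion(show_output=True)
    return ws, instance
```

A call such as `create_workspace_and_instance("<your-subscription-id>", "mast-vm-01")` would then wait until provisioning finishes. Note that a name like `test` passes the local format check but will still be rejected by Azure if it is already taken.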
    
    
### Extras
- You also have the option to build compute clusters. These combine resources and configuration to run data science workloads automatically as batch jobs, for example an ETL pipeline that feeds into an ML algorithm  


- The workspace comes with a connected Azure Blob Storage account, which means you can upload files to it and write outputs to it directly (up to a certain limit).  


- If you require additional libraries, you will need to install them from a terminal. For example (using Jupyter):
    1. Create a new terminal
    ![new terminal](./cloud/terminal.PNG)
    2. Open the new terminal and `pip install` packages as required
    ![pip install](./cloud/pip.PNG)
    3. Double-check that you can import them!
    ![import check](./cloud/import.PNG)
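
Rather than checking imports one at a time, a short helper can list which libraries are still missing from the kernel's environment. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; the `required` list is a hypothetical example. Note that the import name can differ from the pip package name (e.g. you `pip install scikit-learn` but `import sklearn`). In recent versions of IPython/Jupyter, the `%pip install` magic installs into the environment of the running kernel, which avoids installing into the wrong Python.

```python
import importlib.util


def missing_packages(module_names):
    """Return the names in module_names that cannot be found in this environment."""
    # find_spec locates a top-level module without importing it; None means not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical list of import names a project might need.
required = ["numpy", "pandas", "sklearn"]
missing = missing_packages(required)
if missing:
    print("Not installed in this kernel:", ", ".join(missing))
else:
    print("All required libraries can be imported.")
```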

### Data Science Virtual Machines (DSVM)
- Tutorial: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#dsvm
- A customised VM image that comes preinstalled with many common data science libraries and Jupyter Notebook