# Setting Up an Environment for Machine Learning with Python


## Agenda

- What are the benefits of scikit-learn?
- How do I install scikit-learn?
- How do I use the Jupyter Notebook?
- What are some good resources for learning Python?

![scikit-learn algorithm map](images/02_sklearn_algorithms.png)

## Benefits of scikit-learn

### Benefits:

- **Consistent interface** to machine learning models
- Provides many **tuning parameters** but with **sensible defaults**
- Exceptional **documentation**
- Rich set of functionality for **companion tasks**
- **Active community** for development and support
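To illustrate the consistent interface and sensible defaults, here is a minimal sketch (assuming scikit-learn is installed) that trains three different classifiers on the Iris dataset with exactly the same `fit`/`score` calls. The choice of `LogisticRegression` and `DecisionTreeClassifier`, and the `max_iter=1000` setting (used only to avoid a convergence warning), are illustrative assumptions, not recommendations from the resources listed here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=12)

# Three different algorithms, one interface: construct, fit, score
models = [
    KNeighborsClassifier(),                  # relies on the default n_neighbors
    LogisticRegression(max_iter=1000),       # max_iter raised to avoid a convergence warning
    DecisionTreeClassifier(random_state=0),  # fixed seed for reproducible splits
]

scores = {}
for model in models:
    model.fit(X_train, y_train)
    scores[type(model).__name__] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Defaults are explicit and inspectable through get_params()
print(KNeighborsClassifier().get_params()["n_neighbors"])  # 5
```

Swapping one algorithm for another only changes the constructor line; the rest of the workflow stays the same.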

### Further reading:

- Ben Lorica: [Six reasons why I recommend scikit-learn](http://radar.oreilly.com/2013/12/six-reasons-why-i-recommend-scikit-learn.html)
- scikit-learn authors: [API design for machine learning software](http://arxiv.org/pdf/1309.0238v1.pdf)
- Data School: [Should you teach Python or R for data science?](http://www.dataschool.io/python-or-r-for-data-science/)

# Environment Setup


## Python version 3.6.x
Install Python 3.6.x on your system.
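A quick way to confirm which interpreter you are running is a short check like the one below. This is a minimal sketch; the `REQUIRED` tuple and the `python_version_ok` helper are hypothetical names chosen for illustration. It deliberately avoids f-strings so that it can still run and report the problem on a Python 3 interpreter older than 3.6.

```python
import sys

# Minimum version targeted by this guide (the later example uses f-strings, new in 3.6)
REQUIRED = (3, 6)


def python_version_ok(required=REQUIRED):
    """Return True if the running interpreter is at least `required` (major, minor)."""
    return sys.version_info[:2] >= required


print("Running Python {}".format(sys.version.split()[0]))
if python_version_ok():
    print("OK")
else:
    print("Please install Python {}.{} or newer".format(*REQUIRED))
```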


## Python Virtual Environment

It is recommended that you create a virtual environment to keep the Python libraries for this project separate from your other Python environments.

`python3 -m venv /path/to/new/virtual/environment`

Then activate the new environment so that `python` and `pip` use the virtual environment:

`source /path/to/new/virtual/environment/bin/activate`
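To check from inside Python that the environment is actually active, you can compare `sys.prefix` with `sys.base_prefix`: a venv created with `python3 -m venv` points `sys.prefix` at the environment directory, while `sys.base_prefix` still points at the base installation. A minimal sketch (the `in_virtualenv` helper is a hypothetical name for illustration):

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv created with `python3 -m venv`."""
    # In a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the Python installation it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If this prints `False` after activation, the shell is most likely still picking up a different `python` on its `PATH`.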

## Python Libraries for Scikit-Learn Machine Learning

With the virtual environment activated, install the following libraries:

- scipy
- scikit-learn
- jupyter
- matplotlib
- pandas
- seaborn

You can install them all with a single command:

`pip install scipy scikit-learn jupyter matplotlib pandas seaborn`
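After installing, a short script can confirm that each library imports and report its version. This is a minimal sketch under two assumptions: import names differ from some pip names (`scikit-learn` is imported as `sklearn`), and the `jupyter` entry is checked by importing the `notebook` package that the `jupyter` metapackage installs. `PACKAGES` and `installed_versions` are hypothetical names for illustration.

```python
import importlib

# pip package name -> import name (they are not always the same)
PACKAGES = {
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "jupyter": "notebook",  # assumption: check the Notebook package installed by `jupyter`
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "seaborn": "seaborn",
}


def installed_versions(packages=PACKAGES):
    """Map each pip package name to its version string, or None if it cannot be imported."""
    versions = {}
    for pip_name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            versions[pip_name] = None
        else:
            versions[pip_name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(f"{name:<13} {version or 'NOT INSTALLED'}")
```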


## Jupyter notebook

To start the Jupyter Notebook after activating your virtual environment, run the following on the command line:

`jupyter notebook`

- Keep the command-line window open while you use the notebook; closing it shuts down the notebook server.



![Jupyter header](images/jupyter.svg)

## Using the Jupyter Notebook

### Components:

- **IPython interpreter:** enhanced version of the standard Python interpreter
- **Browser-based notebook interface:** weave together code, formatted text, and plots

### Keyboard shortcuts:

**Command mode** (gray border)

- Create new cells above (**a**) or below (**b**) the current cell
- Navigate using the **up arrow** and **down arrow**
- Convert the cell type to Markdown (**m**) or code (**y**)
- See keyboard shortcuts using **h**
- Switch to Edit mode using **Enter**

**Edit mode** (green border)

- **Ctrl+Enter** to run a cell
- **Alt+Enter** (**Option+Enter** on macOS) to run the cell AND insert a new cell below
- Switch to Command mode using **Esc**

### Jupyter and Markdown resources:

- [nbviewer](http://nbviewer.jupyter.org/): view notebooks online as static documents
- [IPython documentation](http://ipython.readthedocs.io/en/stable/): focuses on the interpreter
- [Jupyter documentation quickstart](http://jupyter.readthedocs.io/en/latest/content-quickstart.html): in-depth introduction
- [GitHub's Mastering Markdown](https://guides.github.com/features/mastering-markdown/): short guide with lots of examples
- [IBM Jupyter Notebook Markdown Cheatsheet](https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed): quick reference for Markdown syntax in notebook cells

## Resources for learning Python

- [TalkPython.fm](https://talkpython.fm/): podcast plus clearly explained video courses
- [Codecademy's Python course](https://www.codecademy.com/learn/python): browser-based, tons of exercises
- [DataQuest](https://www.dataquest.io/): browser-based, teaches Python in the context of data science
- [Google's Python class](https://developers.google.com/edu/python/): slightly more advanced, includes videos and downloadable exercises (with solutions)
- [Python for Informatics](http://www.pythonlearn.com/): beginner-oriented book, includes slides and videos

# What does machine learning look like from an App Dev perspective?

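Below is an example of applying a machine learning algorithm (a k-nearest neighbors classifier) to a well-known dataset (Iris): the model is trained on one part of the data, scored on held-out test data, and then used to predict the species of a new flower.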

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Load the data
iris_dataset = load_iris()
X = iris_dataset['data']
y = iris_dataset['target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=12)

# Instantiate the model
knn = KNeighborsClassifier(n_neighbors=1)

# Fit the model
knn.fit(X_train, y_train)

# Score the model on the unseen test data set
score = knn.score(X_test, y_test)
print(f"Testing Score: {score}")

# Predict on new data
new_data = np.array([[5, 2.9, 1, 0.2]])
prediction = knn.predict(new_data)
print(f"Prediction: {prediction[0]}")
print(f"Predicted target name: {iris_dataset['target_names'][prediction[0]]}")
```

```
Testing Score: 0.9736842105263158
Prediction: 0
Predicted target name: setosa
```
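To make the `n_neighbors=1` model in the example above less of a black box, here is a minimal from-scratch sketch of one-nearest-neighbor classification with NumPy. It assumes Euclidean distance, which matches `KNeighborsClassifier`'s default (Minkowski metric with `p=2`). The helper `one_nn_predict` is a hypothetical name written only for illustration; it should agree with scikit-learn except when two training points are exactly equidistant, where tie-breaking may differ.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=12)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4): the default test_size is 25%


def one_nn_predict(X_train, y_train, X_new):
    """Give each row of X_new the label of its closest training row (Euclidean distance)."""
    # distances[i, j] = distance from X_new[i] to X_train[j], computed via broadcasting
    distances = np.linalg.norm(X_new[:, np.newaxis, :] - X_train[np.newaxis, :, :], axis=2)
    return y_train[np.argmin(distances, axis=1)]


new_data = np.array([[5, 2.9, 1, 0.2]])
print(iris.target_names[one_nn_predict(X_train, y_train, new_data)[0]])  # setosa

# Compare the hand-written version with scikit-learn on the test set
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
agreement = np.mean(one_nn_predict(X_train, y_train, X_test) == knn.predict(X_test))
print(f"Agreement with KNeighborsClassifier: {agreement:.2f}")
```

"Training" a 1-NN model is just storing the data; all the work happens at prediction time, which is why scikit-learn offers tree-based neighbor searches for larger datasets.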


#### We will look at the details of this process in the next Jupyter Notebooks