<img src="https://drive.google.com/uc?id=1-cL5eOpEsbuIEkvwW2KnpXC12-PAbamr" style="Width:1000px">

# Instructions

Welcome to the <strong style="color:teal">Data Science and Machine Learning for Planet Earth</strong> module. We will be doing machine learning exercises during the next two weeks, and this notebook contains some useful information.

## Installing the necessary dependencies

In a terminal, navigate to the module directory (where this `README.md` file is located). Then, to create the `dsml4pe` environment run:

`conda env create -f environment.yml`

The environment can then be activated via

`conda activate dsml4pe`

##  📑 Canonical imports
Although in theory you could import any library under any name you choose, there are best practices for common data-science libraries that are so ingrained that you should always follow them. Please follow these for this week (and beyond):

* **Numpy**: <code>import numpy as np</code>
* **Pandas**: <code>import pandas as pd</code>
* **Matplotlib** (in a notebook): <code>import matplotlib.pyplot as plt</code>
* **Seaborn**: <code>import seaborn as sns</code>
* **Scikit-learn**: see below
<br>
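
As a quick illustration of why the aliases matter, here is a minimal sketch using two of the canonical imports. The temperature values and the column name `temperature_c` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three made-up temperature readings.
temperatures = np.array([14.2, 15.1, 13.8])
df = pd.DataFrame({"temperature_c": temperatures})

# The aliases keep calls short and instantly recognisable to other readers.
mean_temperature = df["temperature_c"].mean()
print(round(mean_temperature, 2))
```

Anyone reading this code knows at a glance that `np` is NumPy and `pd` is Pandas, which is the whole point of sticking to the convention.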

<img src="https://drive.google.com/uc?id=1te6yagT-qFdszMK3yvp4yb-0IiM56Phm" style="width:500px">

## 🧱 Module and Class imports in Scikit-learn
Scikit-learn is a vast library, so learning its different modules and submodules is a challenge in itself. There are many ways to import modules and classes in notebooks, but there is a best practice, certainly for `sklearn`!

🚫 Import of entire library <br>
<code>import sklearn
model = sklearn.linear_model.LinearRegression()</code> <br>
**Must type library and module prefix every time**


🚫 Import of entire module <br>
<code>import sklearn.linear_model
model = sklearn.linear_model.LinearRegression()</code> <br>
**Must type library and module prefix every time**

🚫 Import of entire module <br>
<code>from sklearn import linear_model
model = linear_model.LinearRegression()</code> <br>
**Must type module prefix every time**

🚫 Import of entire module<br>
<code>from sklearn.linear_model import *
model = LinearRegression()</code><br>
**Implicit import of a lot of classes**

*"Explicit is better than implicit" - The Zen of Python*<br>

✅  <code>from sklearn.linear_model import LinearRegression
model = LinearRegression()</code><br>
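
To see the recommended import style in action, here is a minimal sketch that fits a linear regression on toy data. The data are hypothetical and follow `y = 2x + 1` exactly, so the fitted slope and intercept should come out very close to 2 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

# The class is imported explicitly, so no module prefix is needed.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)
```

The import line tells the reader exactly which class the notebook depends on, and the rest of the code stays free of prefixes.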

# 🧪 How tests are used in this class

By `tests` I mean `PyTest`, not assessment! Here is what is useful to know:

* Exercises will have coding tests within each notebook (see example below).
* Run these tests: the test error or success will tell you if you have succeeded, or if there is a mistake in your code
* This gives you a certain degree of independence in progressing with your work
* You can check the test code in the `tests` folder, in case you want to know what the pass criteria are
* Usually, I will ask you to **name a variable in a certain way** in your code, so the test can pick it up. For instance, I would ask you to "save the value of your `accuracy_score` into a variable named `score`". Make sure to save the variable under exactly that name, and beware of capitalization. Otherwise, the test will fail.
* Tests are not perfect: passing a test could still mean you have a bug, and not passing a test could mean I have a bug in my code. So, use your good judgement.
* The tests are for you, not for me: I don't collect the test scores, and this does not contribute to your module mark. So use them to guide your coding: run the tests!!!
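
To give a feel for why variable names and types matter, here is a rough sketch of the kind of check a test might perform. The function `check_score` and its rules are hypothetical and written only for illustration; the real tests live in the `tests` folder and may look quite different.

```python
# Hypothetical check, for illustration only; not the actual course tests.
def check_score(namespace):
    """Return a list of problems found with the `score` variable."""
    problems = []
    if "score" not in namespace:
        problems.append("no variable named `score` was found")
    elif not isinstance(namespace["score"], float):
        problems.append("`score` should be a float")
    elif not 0.0 <= namespace["score"] <= 1.0:
        problems.append("`score` should lie between 0 and 1")
    return problems


good = check_score({"score": 0.87})          # correct name and type
wrong_name = check_score({"Score": 0.87})    # capital S: name not found
wrong_type = check_score({"score": "0.87"})  # a string instead of a float
print(good, wrong_name, wrong_type)
```

Notice that `Score` with a capital letter is treated as a missing variable: a check like this can only find what it is told to look for.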

### Do the following:

Below, I give an example with 3 variables set at the correct values for the test to pass. I will set 4 tests based on these three variables.

Try the test to see the results, then change the values of the variables and **rerun the test** to see what errors are output. You can change the values of the variables, or change their types (in which case the test will crash, but will also tell you why it crashed: <span style="color:red">TypeError</span>).

Learn to read the test results and understand what they mean, so they can guide you in debugging your code!

In [None]:
accuracy = 0.3
best_model = 'svc'
predictions = [1,2,5,6]

### ☑️ Test your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('first_time',
                         score = accuracy,
                         model = best_model,
                         predictions = predictions
)

result.write()
print(result.check())

# 🎲 Setting the `random_state`

For tests to be reproducible, the random seed must be identical between my code and your code (and your friends' code). This is why, from now on, you should always use the value `42` for the `random_state` argument whenever an algorithm requires a random seed. This includes `train_test_split`, `RandomizedSearchCV`, the estimators and cross-validation splitters you pass to `GridSearchCV`, and many others.
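
Here is a minimal sketch of what a fixed seed buys you, using `train_test_split` on hypothetical toy arrays. Two calls with the same `random_state` should give exactly the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two independent calls with the same seed.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Both calls select the same rows for the test set.
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)
print(same_split)
```

If you drop `random_state`, the rows that end up in the test set can change from run to run, and so can your results.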


# 🏁 Finished!

Well done! <span style="color:teal">**Push your exercise to GitHub**</span>, and move on to the next one.

# 🥳 I hope that you will enjoy these next two weeks!!!