<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 1. Scikit-Learn: Open-source ML library for Python. Built on NumPy, SciPy, and Matplotlib
*in Python*

----
Scikit-learn is a Python library that provides many unsupervised and supervised learning algorithms. It’s built upon some of the technology you might already be familiar with, like NumPy, SciPy, and Matplotlib!

<br/>As you build robust Machine Learning programs, it’s helpful to have all the `sklearn` commands in one place in case you forget.

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 2. Linear Regression
*in Python*

----
Linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent variable and the independent variables, respectively). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. A scalar is an element of a field which is used to define a vector space.

<br/>It fits a linear model with coefficients w = (w<sub>1</sub>, …, w<sub>p</sub>) to minimize the residual sum of squares between the observed targets in the dataset (the dependent variable) and the targets predicted by the linear approximation from the independent variables.
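
As a compact restatement of the objective above, here is the residual sum of squares in matrix form. The symbols $X$ (matrix of explanatory variables, one row $x_i$ per observation), $y$ (vector of observed targets), and $n$ (number of observations) are notation assumed here, and the intercept is left out for brevity:

$$
\min_{w} \; \lVert Xw - y \rVert_2^2,
\qquad
\lVert Xw - y \rVert_2^2 = \sum_{i=1}^{n} \left( y_i - x_i^\top w \right)^2
$$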

<br/>Process:
1. Import and create the model:

In [1]:
from sklearn.linear_model import LinearRegression
 
your_model = LinearRegression()

2. Fit:

In [None]:
your_model.fit(x_training_data, y_training_data)

- `.coef_`: contains the coefficients
- `.intercept_`: contains the intercept

3. Predict:

In [None]:
predictions = your_model.predict(your_x_data)

- `.score()`: returns the coefficient of determination R²
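
The three steps above can be tried end to end on a small example. This is a minimal sketch, assuming made-up, noise-free data that follows y = 3x + 2; the data values are invented for illustration and are not from any real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, noise-free training data following y = 3*x + 2 (illustration only)
x_training_data = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_training_data = 3.0 * x_training_data.ravel() + 2.0

# 1. Create the model, 2. fit it
your_model = LinearRegression()
your_model.fit(x_training_data, y_training_data)

# Learned parameters
slope = your_model.coef_[0]
intercept = your_model.intercept_

# 3. Predict on new inputs
your_x_data = np.array([[5.0], [6.0]])
predictions = your_model.predict(your_x_data)

# Coefficient of determination R² on the training data
r_squared = your_model.score(x_training_data, y_training_data)

print(slope, intercept, predictions, r_squared)
```

Because the data lies exactly on a line, the fitted slope and intercept recover 3 and 2 up to floating-point error, and R² is 1. Real data will be noisier.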

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 3. Naive Bayes
*in Python*

----
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. 

<br/>Multinomial Naive Bayes addresses the same kind of problem as multinomial logistic regression, a different model that is used when the dependent variable is nominal (that is, categorical: it falls into one of a set of categories that cannot be ordered in any meaningful way) and has more than two categories. Some examples would be:
- Which major will a college student choose, given their grades, stated likes and dislikes, etc.?
- Which blood type does a person have, given the results of various diagnostic tests?
- In a hands-free mobile phone dialing application, which person's name was spoken, given various properties of the speech signal?
- Which candidate will a person vote for, given particular demographic characteristics?
- Which country will a firm locate an office in, given the characteristics of the firm and of the various candidate countries?

<br/>These are all statistical classification problems. They all have in common a dependent variable to be predicted that comes from one of a limited set of items that cannot be meaningfully ordered, as well as a set of independent variables (also known as features, explanators, etc.), which are used to predict the dependent variable. Multinomial logistic regression is a particular solution to classification problems that uses a linear combination of the observed features and some problem-specific parameters to estimate the probability of each particular value of the dependent variable. The best values of the parameters for a given problem are usually determined from some training data (e.g. some people for whom both the diagnostic test results and blood types are known, or some examples of known words being spoken). Multinomial Naive Bayes is trained on the same kind of data; it estimates these probabilities with Bayes' theorem, assuming the features are conditionally independent given the class.

<br/>Process:
1. Import and create the model:

In [4]:
from sklearn.naive_bayes import MultinomialNB
 
your_model = MultinomialNB()

2. Fit:

In [None]:
your_model.fit(x_training_data, y_training_data)

3. Predict:

In [None]:
# Returns a list of predicted classes - one prediction for every data point
predictions = your_model.predict(your_x_data)
 
# For every data point, returns a list of probabilities of each class
probabilities = your_model.predict_proba(your_x_data)
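
A minimal sketch of the same steps on word counts. The vocabulary, counts, and class names below are hypothetical and chosen only to make the two classes easy to tell apart.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts per document.
# Columns (assumed vocabulary): "goal", "match", "vote", "election"
x_training_data = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
])
y_training_data = np.array(["sports", "sports", "politics", "politics"])

your_model = MultinomialNB()
your_model.fit(x_training_data, y_training_data)

# Two new documents, described by the same four word counts
your_x_data = np.array([[4, 1, 0, 0], [0, 0, 1, 4]])

# One predicted class per document
predictions = your_model.predict(your_x_data)

# One row per document, one column per class (column order: your_model.classes_)
probabilities = your_model.predict_proba(your_x_data)

print(your_model.classes_)
print(predictions)
print(probabilities)
```

Each row of `probabilities` sums to 1, and the columns follow the order of `your_model.classes_`.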

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 4. K-Nearest Neighbors
*in Python*

----
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. 

<br/>Process:
1. Import and create the model:

In [2]:
from sklearn.neighbors import KNeighborsClassifier
 
your_model = KNeighborsClassifier()

2. Fit:

In [None]:
your_model.fit(x_training_data, y_training_data)

3. Predict:

In [None]:
# Returns a list of predicted classes - one prediction for every data point
predictions = your_model.predict(your_x_data)
 
# For every data point, returns a list of probabilities of each class
probabilities = your_model.predict_proba(your_x_data)
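
A minimal sketch on two well-separated groups of made-up 2-D points. The data and the choice of `n_neighbors=3` are assumptions for illustration; `KNeighborsClassifier()` with no arguments uses 5 neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 2-D points: class 0 near the origin, class 1 near (5, 5)
x_training_data = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_training_data = np.array([0, 0, 0, 1, 1, 1])

# Classify each new point by a vote among its 3 closest training points
your_model = KNeighborsClassifier(n_neighbors=3)
your_model.fit(x_training_data, y_training_data)

your_x_data = np.array([[0.5, 0.5], [5.5, 5.5]])

# One predicted class per point
predictions = your_model.predict(your_x_data)

# Fraction of the 3 neighbors belonging to each class
probabilities = your_model.predict_proba(your_x_data)

print(predictions)
print(probabilities)
```

With the default uniform weights, the probabilities are simply the share of neighbors in each class, so they can only take the values 0, 1/3, 2/3, or 1 when `n_neighbors=3`.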

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 5. K-Means
*in Python*

----
K-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids.

<br/>Process:
1. Import and create the model:

In [3]:
from sklearn.cluster import KMeans
 
your_model = KMeans(n_clusters=4, init='random')

- `n_clusters`: number of clusters to form and number of centroids to generate
- `init`: method for initialization
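  - `k-means++`: K-Means++ seeding, which spreads the initial centroids apart [default]
  - `random`: standard K-Means, with initial centroids picked at random from the data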
- `random_state`: the seed used by the random number generator [optional]

2. Fit:

In [None]:
your_model.fit(x_training_data)

3. Predict:

In [None]:
predictions = your_model.predict(your_x_data)
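
A minimal sketch on made-up 2-D points that form two obvious groups. The data, `n_clusters=2`, `n_init=10`, and `random_state=0` are choices made for this illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two well-separated groups
x_training_data = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)

# n_init=10 reruns the algorithm from 10 different seedings and keeps the best run
your_model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
your_model.fit(x_training_data)  # unsupervised: no labels are passed

labels = your_model.labels_               # cluster index of each training point
centroids = your_model.cluster_centers_   # one row per cluster mean

# Assign new points to the nearest learned centroid
your_x_data = np.array([[0.5, 0.5], [10.5, 10.5]])
predictions = your_model.predict(your_x_data)

print(labels)
print(centroids)
print(predictions)
```

The cluster indices themselves are arbitrary: which group ends up as cluster 0 or 1 can vary, so only compare labels with each other rather than with fixed values.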

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 6. Validating the Model
*in Python*

----
In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the **precision** and **recall** of the test, where the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as **positive predictive value,** and recall is also known as **sensitivity** in diagnostic binary classification.

<br/>The F<sub>1</sub> score is the harmonic mean of the precision and recall. It thus symmetrically represents both precision and recall in one metric. The more generic F<sub>β</sub> score applies additional weights, valuing one of precision or recall more than the other.

<br/>The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, which occurs if either precision or recall is zero.

<img src="Images/F1_score.svg">

<br/>Process:
1. Import and print accuracy, recall, precision, and F1 score:

In [None]:
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
 
print(accuracy_score(true_labels, guesses))
print(recall_score(true_labels, guesses))
print(precision_score(true_labels, guesses))
print(f1_score(true_labels, guesses))

2. Import and print the confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix
 
print(confusion_matrix(true_labels, guesses))
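
A minimal sketch with made-up binary labels, so the numbers can be checked by hand. The label values are invented for illustration; the last line recomputes F1 as the harmonic mean of precision and recall to show that it matches `f1_score`.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up binary labels: 1 = positive, 0 = negative
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
guesses = [1, 1, 0, 0, 1, 0, 0, 0]
# By hand: TP = 2, FN = 2, FP = 1, TN = 3

accuracy = accuracy_score(true_labels, guesses)    # (TP + TN) / total
precision = precision_score(true_labels, guesses)  # TP / (TP + FP)
recall = recall_score(true_labels, guesses)        # TP / (TP + FN)
f1 = f1_score(true_labels, guesses)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
matrix = confusion_matrix(true_labels, guesses)

# F1 recomputed as the harmonic mean of precision and recall
f1_by_hand = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1, f1_by_hand)
print(matrix)
```

Here accuracy is 5/8, precision is 2/3, recall is 1/2, and F1 is 4/7. For more than two classes, `recall_score`, `precision_score`, and `f1_score` need an `average` argument other than the default.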

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 7. Training Sets and Test Sets
*in Python*

----

In [None]:
from sklearn.model_selection import train_test_split
 
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, test_size=0.2)

- `train_size`: the proportion of the dataset to include in the train split
- `test_size`: the proportion of the dataset to include in the test split
- `random_state`: the seed used by the random number generator [optional]
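
A minimal sketch of the split on a made-up dataset of 10 rows, assuming `random_state=42` only to make the shuffle repeatable. Each row of `x` is built from its label so it is easy to confirm that features and labels stay paired after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each; row i is [2*i, 2*i + 1]
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, test_size=0.2, random_state=42
)

# 80% of 10 rows go to training, 20% to testing
print(x_train.shape, x_test.shape)

# Rows are shuffled, but each x row still lines up with its y value
print(np.array_equal(x_train[:, 0], 2 * y_train))
```

Specifying only one of `train_size` or `test_size` is enough; the other is set to the complement.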