# [Scikit-learn](https://scikit-learn.org/stable/)

1. [Clustering]().
2. [Classification]().
3. [Regression]().
4. [Dimensionality reduction]().

## Table of Contents:

1. [Install](#install)
2. [KNN](#KNN)

## 1. Install <a class="anchor" id="install"></a>

In [None]:
!pip3 install scikit-learn
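
To confirm that the install succeeded, the installed version can be queried without importing the package. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of scikit-learn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "scikit-learn" is the distribution name used by pip (the import name is sklearn)
sklearn_version = installed_version("scikit-learn")
print(sklearn_version or "scikit-learn is not installed")
```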

## 2. Supervised classification using KNN (classifying into N classes) <a class="anchor" id="KNN"></a>

In [7]:
import time
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_covtype
x, y = fetch_covtype(return_X_y=True)
# Data Set Information:
# Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from US Geological Survey (USGS) and USFS data. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types).
# This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.

# For the sake of time, use only the first quarter of the data
subset = x.shape[0]//4
x = x[:subset,:]
y = y[:subset]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=72)

params = {
    'n_neighbors': 40,  
    'weights': 'distance',  
    'n_jobs': -1
}

start_time = time.time()
knn = KNeighborsClassifier(**params).fit(x_train, y_train) # Classifier selection + training
predicted = knn.predict(x_test) # Inference
elapsed_time = time.time() - start_time # Covers both training and inference
print("Time to calculate \033[1m knn.fit + knn.predict in scikit-learn {:4.1f}\033[0m seconds".format(elapsed_time))
report = metrics.classification_report(y_test, predicted)
print(f"Classification report for kNN:\n{report}\n")

Time to calculate [1m knn.predict in Patched scikit-learn  3.6[0m seconds
Classification report for kNN:
              precision    recall  f1-score   support

           1       0.93      0.86      0.89      6120
           2       0.94      0.98      0.96     20540
           3       0.84      0.46      0.60       428
           4       0.84      0.95      0.89       441
           5       0.86      0.64      0.73       631
           6       0.80      0.58      0.67       449
           7       0.84      0.87      0.85       442

    accuracy                           0.93     29051
   macro avg       0.86      0.76      0.80     29051
weighted avg       0.93      0.93      0.93     29051
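
The `macro avg` and `weighted avg` rows of the report above can be reproduced by hand from the per-class rows. As a small worked example (the recall and support values are copied from the report, so they are rounded to two decimals; the variable names are illustrative), the macro average is the plain mean over classes, while the weighted average weights each class by its support:

```python
# Per-class recall and support, copied from the classification report above
recall = [0.86, 0.98, 0.46, 0.95, 0.64, 0.58, 0.87]
support = [6120, 20540, 428, 441, 631, 449, 442]

# Macro average: every class counts equally, regardless of its size
macro_recall = sum(recall) / len(recall)

# Weighted average: each class is weighted by its number of test samples
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(f"macro avg recall:    {macro_recall:.2f}")
print(f"weighted avg recall: {weighted_recall:.2f}")
```

Both values agree with the report at two decimals (0.76 and 0.93). The gap between them comes from the small classes 3, 5, and 6: their low recall pulls the macro average down but barely moves the weighted average, which is dominated by classes 1 and 2.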


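
The timing in the cell above spans both `fit` and `predict`. For kNN, fitting is typically cheap compared with prediction, because the neighbor search happens at prediction time. A minimal sketch that times the two steps separately, assuming a small synthetic dataset from `make_classification` as a stand-in for the covertype data (the dataset sizes are arbitrary choices for illustration):

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the covertype data (assumption: sizes chosen for speed)
x, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           n_classes=3, random_state=72)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=72)

# Same hyperparameters as in the cell above
knn = KNeighborsClassifier(n_neighbors=40, weights='distance', n_jobs=-1)

start = time.perf_counter()
knn.fit(x_train, y_train)  # Training: stores/indexes the training samples
fit_time = time.perf_counter() - start

start = time.perf_counter()
predicted = knn.predict(x_test)  # Inference: the neighbor search happens here
predict_time = time.perf_counter() - start

print(f"fit: {fit_time:.3f} s, predict: {predict_time:.3f} s")
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short durations.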
