# Example: Getting started
--------------------------

This example shows how to get started with the atom-ml library.

The data used is a variation on the [Australian weather dataset](https://www.kaggle.com/jsphyg/weather-dataset-rattle-package) from Kaggle. You can download it [here](https://github.com/tvdboom/ATOM/blob/master/examples/datasets/weatherAUS.csv). The goal is to predict whether or not it will rain tomorrow by training a binary classifier on the target column `RainTomorrow`.

In [1]:
import pandas as pd
from atom import ATOMClassifier

# Load the Australian Weather dataset
X = pd.read_csv("https://raw.githubusercontent.com/tvdboom/ATOM/master/examples/datasets/weatherAUS.csv")

In [2]:
atom = ATOMClassifier(X, y="RainTomorrow", n_rows=1000, verbose=2)


Algorithm task: Binary classification.

Shape: (1000, 22)
Train set size: 800
Test set size: 200
-------------------------------------
Memory: 176.13 kB
Scaled: False
Missing values: 2260 (10.3%)
Categorical features: 5 (23.8%)
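
The percentages in this summary can be checked by hand. The sketch below redoes the arithmetic from the printed counts. It assumes (inferred from the numbers, not stated in the output) that the missing-value share is taken over all 1000 x 22 cells and that the categorical share is taken over the 21 non-target columns.

```python
# Counts copied from the summary printed above.
n_rows, n_cols = 1000, 22
n_train, n_test = 800, 200
n_missing = 2260
n_categorical = 5

# Assumption: one of the 22 columns is the target, leaving 21 features.
n_features = n_cols - 1

missing_pct = round(100 * n_missing / (n_rows * n_cols), 1)
categorical_pct = round(100 * n_categorical / n_features, 1)
test_fraction = n_test / (n_train + n_test)

print(f"Missing values: {n_missing} ({missing_pct}%)")
print(f"Categorical features: {n_categorical} ({categorical_pct}%)")
print(f"Test fraction: {test_fraction}")
```

Both percentages match the summary under these assumptions, and the 800/200 split corresponds to a 20% test fraction.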



In [3]:
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="Target", max_onehot=8)

Fitting Imputer...
Imputing missing values...
 --> Imputing 8 missing values with median (11.6) in feature MinTemp.
 --> Imputing 2 missing values with median (22.3) in feature MaxTemp.
 --> Imputing 12 missing values with median (0.0) in feature Rainfall.
 --> Imputing 425 missing values with median (4.8) in feature Evaporation.
 --> Imputing 480 missing values with median (8.55) in feature Sunshine.
 --> Imputing 59 missing values with most_frequent (N) in feature WindGustDir.
 --> Imputing 59 missing values with median (37.0) in feature WindGustSpeed.
 --> Imputing 90 missing values with most_frequent (N) in feature WindDir9am.
 --> Imputing 28 missing values with most_frequent (SW) in feature WindDir3pm.
 --> Imputing 10 missing values with median (13.0) in feature WindSpeed9am.
 --> Imputing 19 missing values with median (17.0) in feature WindSpeed3pm.
 --> Imputing 17 missing values with median (70.0) in feature Humidity9am.
 --> Imputing 31 missing values with median (51.0) in f
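
To make the two preprocessing steps concrete, here is a minimal plain-Python sketch of median and most-frequent imputation followed by a simple mean-target encoding. It illustrates the idea only and is not ATOM's implementation: the helper names `impute` and `target_encode` and the toy columns are hypothetical, and the library's `"Target"` strategy may apply smoothing or other refinements that this sketch leaves out.

```python
from collections import Counter, defaultdict
from statistics import median


def impute(values, strategy):
    """Replace None entries with the median or most frequent observed value."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


def target_encode(categories, target):
    """Map each category to the mean of the binary target within that category."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, target):
        totals[cat] += y
        counts[cat] += 1
    means = {cat: totals[cat] / counts[cat] for cat in counts}
    return [means[cat] for cat in categories]


# Hypothetical toy columns (not rows from the weather dataset).
min_temp = [11.0, None, 12.5, 9.5]
wind_dir = ["N", "SW", None, "N"]
rain_tomorrow = [1, 0, 0, 1]

min_temp_filled = impute(min_temp, "median")
wind_dir_filled = impute(wind_dir, "most_frequent")
wind_dir_encoded = target_encode(wind_dir_filled, rain_tomorrow)

print(min_temp_filled)
print(wind_dir_filled)
print(wind_dir_encoded)
```

In the toy data, the missing temperature is filled with the median of the three observed values and the missing wind direction with the most common one, after which each direction is replaced by the share of rainy days observed for it.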

In [4]:
atom.run(models=["LDA", "AdaB"], metric="auc", n_trials=10)


Models: LDA, AdaB
Metric: auc


Running hyperparameter tuning for LinearDiscriminantAnalysis...
| trial |  solver | shrinkage |     auc | best_auc | time_trial | time_ht |    state |
| ----- | ------- | --------- | ------- | -------- | ---------- | ------- | -------- |
| 0     |   eigen |       0.9 |  0.8807 |   0.8807 |     0.162s |  0.162s | COMPLETE |
| 1     |     svd |       nan |  0.8445 |   0.8807 |     0.147s |  0.309s | COMPLETE |
| 2     |     svd |       nan |  0.8445 |   0.8807 |     0.001s |  0.310s | COMPLETE |
| 3     |     svd |       nan |  0.8445 |   0.8807 |     0.001s |  0.311s | COMPLETE |
| 4     |     svd |       nan |  0.8445 |   0.8807 |     0.001s |  0.312s | COMPLETE |
| 5     |   eigen |       0.9 |  0.8807 |   0.8807 |     0.000s |  0.312s | COMPLETE |
| 6     |     svd |       nan |  0.8445 |   0.8807 |     0.000s |  0.312s | COMPLETE |
| 7     |     svd |       nan |  0.8445 |   0.8807 |     0.001s |  0.313s | COMPLETE |
| 8     |   eigen |       0.5 |  
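
The `best_auc` and `time_ht` columns can be read as running aggregates of the per-trial values. The sketch below recomputes them for trials 0 to 7, assuming `best_auc` is the running maximum of `auc` and `time_ht` is the cumulative sum of `time_trial`. The displayed rows are consistent with that reading, but it is an interpretation of the table rather than a documented definition.

```python
from itertools import accumulate

# (auc, time_trial in seconds) for trials 0-7, copied from the table above.
trials = [
    (0.8807, 0.162), (0.8445, 0.147), (0.8445, 0.001), (0.8445, 0.001),
    (0.8445, 0.001), (0.8807, 0.000), (0.8445, 0.000), (0.8445, 0.001),
]
aucs = [auc for auc, _ in trials]
times = [t for _, t in trials]

# Running maximum of the score and cumulative tuning time.
best_auc = list(accumulate(aucs, max))
time_ht = [round(t, 3) for t in accumulate(times)]

for i, (best, total) in enumerate(zip(best_auc, time_ht)):
    print(f"trial {i}: best_auc={best:.4f} time_ht={total:.3f}s")
```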

In [5]:
atom.evaluate()

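| model | accuracy |     ap |     ba |     f1 | jaccard |    mcc | precision | recall |    auc |
| ----- | -------- | ------ | ------ | ------ | ------- | ------ | --------- | ------ | ------ |
| LDA   |    0.785 | 0.5888 | 0.7533 | 0.5825 |   0.411 | 0.4542 |       0.5 | 0.6977 | 0.8037 |
| AdaB  |     0.82 | 0.5801 | 0.7165 |  0.561 |  0.3898 |  0.449 |    0.5897 | 0.5349 | 0.8353 |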
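
Most of these metrics follow directly from a confusion matrix. As a worked example, the sketch below recomputes the threshold-based metrics of the LDA row. The counts TP=30, FP=30, FN=13, TN=127 are an inference: they are the counts on a 200-row test set that reproduce the reported values, not numbers printed by the library. `ap` and `auc` depend on the predicted scores and cannot be recovered this way.

```python
from math import sqrt

# Assumed confusion-matrix counts for LDA (inferred, see the note above).
tp, fp, fn, tn = 30, 30, 13, 127

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                     # true positive rate
specificity = tn / (tn + fp)                # true negative rate
ba = (recall + specificity) / 2             # balanced accuracy
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

metrics = {
    "accuracy": accuracy, "ba": ba, "f1": f1, "jaccard": jaccard,
    "mcc": mcc, "precision": precision, "recall": recall,
}
for name, value in metrics.items():
    print(f"{name}: {round(value, 4)}")
```

Rounded to four decimals, each value agrees with the LDA row of the table.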
