# Quick Start

First, install the API with <b>pip</b>.
```shell
root@ubuntu:~$ pip3 install vertica_ml_python
```

Next, choose the type of connection you want to use:
 - <b>vertica_python</b> (Native Python Client)
 - <b>pyodbc</b> (ODBC) 
 - <b>jaydebeapi</b> (JDBC)

Each of these modules lets you communicate with your Vertica database through a database cursor.

For example, use the following command to install the <b>vertica_python</b> module.
```shell
root@ubuntu:~$ pip3 install vertica_python
```
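
For orientation, opening a cursor directly with the native client might look like the sketch below. `vertica_python.connect` and `cursor()` are the client's usual entry points; the helper names `make_conn_info` and `open_cursor` and every connection value are placeholders introduced here for illustration.

```python
def make_conn_info(host, database, user, password, port=5433):
    """Collect connection settings in a dict (hypothetical helper)."""
    return {
        "host": host,
        "port": port,
        "database": database,
        "user": user,
        "password": password,
    }


def open_cursor(conn_info):
    """Open a connection and return a cursor; needs a reachable Vertica server."""
    # Imported lazily so this file still loads when the client is not installed.
    import vertica_python

    connection = vertica_python.connect(**conn_info)
    return connection.cursor()


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    info = make_conn_info("localhost", "testdb", "dbadmin", "change-me")
    print({key: value for key, value in info.items() if key != "password"})
    # With a running server you could then try:
    # cur = open_cursor(info); cur.execute("SELECT 1"); print(cur.fetchall())
```

The cursor returned this way is the kind of object the rest of this guide passes around.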

If you already have a database DSN, you can use it to create a cursor.

In [1]:
from vertica_ml_python import *
cur = vertica_cursor("VerticaDSN")

Otherwise, you can set up a permanent auto-connection.

In [2]:
from vertica_ml_python.connections.connect import *
new_auto_connection({"host": "10.211.55.14", 
                     "port": "5433", 
                     "database": "testdb", 
                     "password": "PPpdmzLX", 
                     "user": "dbadmin"}, 
                    method = "vertica_python", 
                    name = "VerticaDSN")
change_auto_connection("VerticaDSN")
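
The example above writes the password directly into the notebook. One possible alternative, sketched here under assumptions, is to read the settings from environment variables and pass the resulting dictionary to `new_auto_connection`. The variable names (`VERTICA_HOST` and so on) and the helper `conn_info_from_env` are hypothetical, not part of the library.

```python
import os

# Hypothetical environment variable names mapped to connection keys.
REQUIRED = {
    "VERTICA_HOST": "host",
    "VERTICA_DB": "database",
    "VERTICA_USER": "user",
    "VERTICA_PASSWORD": "password",
}


def conn_info_from_env(env=None):
    """Build the connection dict from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    info = {key: env[name] for name, key in REQUIRED.items()}
    # The port is optional and kept as a string, as in the example above.
    info["port"] = env.get("VERTICA_PORT", "5433")
    return info


# Demonstration with a fake environment, so nothing real is read or printed.
fake_env = {
    "VERTICA_HOST": "10.0.0.1",
    "VERTICA_DB": "testdb",
    "VERTICA_USER": "dbadmin",
    "VERTICA_PASSWORD": "not-a-real-password",
}
settings = conn_info_from_env(fake_env)
print(sorted(settings))

# In a real session the dict would replace the literal one shown above:
# new_auto_connection(conn_info_from_env(), method = "vertica_python", name = "VerticaDSN")
```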

You can now start experimenting with the data in your Vertica database.

In [3]:
vDataFrame("public.iris")

| | PetalLengthCm | SepalWidthCm | SepalLengthCm | Species | PetalWidthCm |
|---|---|---|---|---|---|
| 0 | 1.10 | 3.00 | 4.30 | Iris-setosa | 0.10 |
| 1 | 1.40 | 2.90 | 4.40 | Iris-setosa | 0.20 |
| 2 | 1.30 | 3.00 | 4.40 | Iris-setosa | 0.20 |
| 3 | 1.30 | 3.20 | 4.40 | Iris-setosa | 0.20 |
| 4 | 1.30 | 2.30 | 4.50 | Iris-setosa | 0.30 |
| | ... | ... | ... | ... | ... |


<object>  Name: iris, Number of rows: 150, Number of columns: 5

If you don't have any data, you can load well-known datasets.

In [4]:
from vertica_ml_python.learn.datasets import load_titanic
vdf = load_titanic()

You can now start exploring the data.

In [5]:
vdf

| | fare | sex | body | pclass | age | name | cabin | parch | survived | boat | ticket | embarked | home.dest | sibsp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 151.55000 | female | | 1 | 2.000 | Allison, Miss. Helen Loraine | C22 C26 | 2 | 0 | | 113781 | S | Montreal, PQ / Chesterville, ON | 1 |
| 1 | 151.55000 | male | 135 | 1 | 30.000 | Allison, Mr. Hudson Joshua Creighton | C22 C26 | 2 | 0 | | 113781 | S | Montreal, PQ / Chesterville, ON | 1 |
| 2 | 151.55000 | female | | 1 | 25.000 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | C22 C26 | 2 | 0 | | 113781 | S | Montreal, PQ / Chesterville, ON | 1 |
| 3 | 0.00000 | male | | 1 | 39.000 | Andrews, Mr. Thomas Jr | A36 | 0 | 0 | | 112050 | S | Belfast, NI | 0 |
| 4 | 49.50420 | male | 22 | 1 | 71.000 | Artagaveytia, Mr. Ramon | | 0 | 0 | | PC 17609 | C | Montevideo, Uruguay | 0 |
| | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |


<object>  Name: titanic, Number of rows: 1234, Number of columns: 14

Apart from aggregation results, nothing is loaded into memory: all computation happens in the database. You can display the generated SQL code by switching it on with the <b>sql_on_off</b> method.
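
To picture what "everything happens in the database" means, here is a toy sketch of an object that only accumulates SQL text and never holds rows. It is not the library's implementation; `LazyTable`, `filter`, `to_sql`, and `show_sql` are hypothetical names chosen for this illustration.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated table: it stores SQL fragments, not rows."""

    def __init__(self, relation, filters=(), show_sql=False):
        self.relation = relation
        self.filters = tuple(filters)
        self.show_sql = show_sql

    def filter(self, condition):
        # Return a new object with one more WHERE condition; no query is run.
        return LazyTable(self.relation, self.filters + (condition,), self.show_sql)

    def to_sql(self, select="*"):
        # Assemble the statement a real client would send to the database.
        sql = f"SELECT {select} FROM {self.relation}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        if self.show_sql:
            print(sql)
        return sql


adults = LazyTable("public.titanic", show_sql=True).filter("age >= 18")
query = adults.to_sql("COUNT(*)")
```

Only the final statement would travel to the server, and only its small aggregated result would come back to Python.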

In [6]:
vdf.sql_on_off()
vdf.describe()


VerticaPy does not recompute an aggregation it has already computed. Each virtual column has its own catalog of results, which is updated as you modify the column.

In [7]:
vdf["age"].catalog

{'25%': 21,
 '50%': 28,
 '75%': 39,
 'avg': 30.1524573721163,
 'biserial': {},
 'count': 997,
 'cov': {},
 'cramer': {},
 'kendall': {},
 'max': 80,
 'min': 0.33,
 'pearson': {},
 'regr_avgx': {},
 'regr_avgy': {},
 'regr_count': {},
 'regr_intercept': {},
 'regr_r2': {},
 'regr_slope': {},
 'regr_sxx': {},
 'regr_sxy': {},
 'regr_syy': {},
 'spearman': {},
 'std': 14.4353046299159,
 'unique': 96}
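
The catalog behaves much like a per-column cache. The sketch below illustrates that idea in plain Python; it is an assumption about the general technique (memoization), not the library's code. `CachedColumn` is a hypothetical class, and for simplicity it clears its cache on modification rather than updating it selectively.

```python
class CachedColumn:
    """Toy column that remembers each aggregation it has already computed."""

    def __init__(self, values):
        self._values = list(values)
        self.catalog = {}       # aggregation name -> cached result
        self.computations = 0   # how many times an aggregation really ran

    def _aggregate(self, name, func):
        if name not in self.catalog:
            self.computations += 1
            present = [v for v in self._values if v is not None]
            self.catalog[name] = func(present)
        return self.catalog[name]

    def max(self):
        return self._aggregate("max", max)

    def count(self):
        # Counts non-missing values, as SQL COUNT(column) does.
        return self._aggregate("count", len)

    def fillna(self, value):
        # Modifying the data invalidates every cached result.
        self._values = [value if v is None else v for v in self._values]
        self.catalog.clear()
        return self


age = CachedColumn([22, None, 71, 2])
age.max()
age.max()                  # served from the catalog, not recomputed
print(age.computations)    # only one real computation so far
age.fillna(30)
print(age.catalog)         # empty again after the modification
```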

Two lines of code are enough to prepare your data and train your ML models.

In [8]:
# Modules
from vertica_ml_python.learn.model_selection import cross_validate
from vertica_ml_python.learn.ensemble import RandomForestClassifier

In [9]:
# Data Preparation
vdf["sex"].label_encode()["boat"].fillna(method = "0ifnull")["name"].str_extract(
    r' ([A-Za-z]+)\.').eval("family_size", expr = "parch + sibsp + 1").drop(
    columns = ["cabin", "body", "ticket", "home.dest"])["fare"].fill_outliers().fillna(
    ).to_db("titanic_clean")

# Model Evaluation
cross_validate(RandomForestClassifier("rf_titanic", max_leaf_nodes = 100, n_estimators = 30), 
               "titanic_clean", 
               ["age", "family_size", "sex", "pclass", "fare", "boat"], 
               "survived", 
               cutoff = 0.35)

The view titanic_clean was successfully dropped.
795 element(s) was/were filled
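
To see what two of the preparation steps target, the sketch below applies the same pattern and the same `parch + sibsp + 1` expression to a few passenger rows in plain Python. It uses Python's `re` module as a stand-in: in the pipeline above the pattern is evaluated by the database, and whether the database returns the whole match or only the captured group is not stated here. The helper names are hypothetical.

```python
import re

# Same pattern as in the data preparation step: a word followed by a period.
TITLE_PATTERN = re.compile(r" ([A-Za-z]+)\.")


def extract_title(name):
    """Return the captured title (Python-side illustration only)."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None


def family_size(parch, sibsp):
    """Parents/children plus siblings/spouses plus the passenger."""
    return parch + sibsp + 1


# (name, parch, sibsp) taken from the first rows of the titanic sample above.
rows = [
    ("Allison, Miss. Helen Loraine", 2, 1),
    ("Allison, Mrs. Hudson J C (Bessie Waldo Daniels)", 2, 1),
    ("Andrews, Mr. Thomas Jr", 0, 0),
]

for name, parch, sibsp in rows:
    print(extract_title(name), family_size(parch, sibsp))
```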


| | auc | prc_auc | accuracy | log_loss | precision | recall | f1-score | mcc | informedness | markedness | csi |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-fold | 0.997543209876543 | 0.9952570413267694 | 0.978571428571429 | 0.0243404652845131 | 0.9933333333333333 | 0.9490445859872612 | 0.9720496507762603 | 0.9544283681126704 | 0.9452423046184397 | 0.9637037037037035 | 0.9430379746835443 |
| 2-fold | 0.9857843805370042 | 0.9649935042748308 | 0.962311557788945 | 0.277996880665958 | 0.9310344827586207 | 0.9642857142857143 | 0.9627606038820993 | 0.9183711750829263 | 0.9255260243632337 | 0.9112716369088183 | 0.9 |
| 3-fold | 0.9928562600420211 | 0.991183792248719 | 0.966346153846154 | 0.0430437379750121 | 0.9483870967741935 | 0.9607843137254902 | 0.9651629847057006 | 0.9278790026890955 | 0.9303660627749197 | 0.9253985910270672 | 0.9130434782608695 |
| avg | 0.9920612834851894 | 0.9838114459501064 | 0.9690763800688427 | 0.11512702797516107 | 0.9575849709553825 | 0.9580382046661552 | 0.9666577464546867 | 0.9335595152948974 | 0.9337114639188644 | 0.9334579772131963 | 0.918693817648138 |
| std | 0.005919586780640461 | 0.016423581709953398 | 0.008466785156484254 | 0.14135909871308325 | 0.03215178630842041 | 0.007983034146037371 | 0.0048215487724541884 | 0.018687735547533897 | 0.010275052719372594 | 0.02712924043110501 | 0.022068338578353334 |
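
The `avg` and `std` rows summarize the three folds. Judging from the `auc` values, they appear to be the arithmetic mean and the sample standard deviation (dividing by n - 1) of the per-fold scores. The sketch below checks that reading with Python's standard library; `summarize` is a hypothetical helper, and the interpretation is an inference from the numbers rather than a documented guarantee.

```python
from statistics import mean, stdev

# auc for folds 1 to 3, copied from the cross-validation output.
auc_per_fold = [0.997543209876543, 0.9857843805370042, 0.9928562600420211]


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return {"avg": mean(scores), "std": stdev(scores)}


summary = summarize(auc_per_fold)
print(round(summary["avg"], 6), round(summary["std"], 6))
```

A small `std` relative to `avg`, as here, suggests the score is stable across folds.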


<object>

VerticaPy helps you bring the logic to the data with just a few lines of code.<br>
Enjoy!