<a href="https://colab.research.google.com/github/taknev83/pywedge/blob/gh-pages/Examples/Pywedge_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Pywedge - Classification Demo**

* Interactive Charts
* Interactive Baseline Models
* Interactive Hyperparameter Tuning
* Track Hyperparameters on-the-go!

[GitHub](https://github.com/taknev83/pywedge) | [Docs](https://taknev83.github.io/pywedge-docs/) | [PyPI](https://pypi.org/project/pywedge/)



In [None]:
# Install Pywedge

!pip install pywedge --quiet

In [None]:
# Import Pywedge

import pywedge as pw

In [None]:
# Import dataset - Credit Risk Dataset

import pandas as pd
train = pd.read_csv('https://raw.githubusercontent.com/taknev83/datasets/master/credit_risk_train.csv')
test = pd.read_csv('https://raw.githubusercontent.com/taknev83/datasets/master/credit_risk_test.csv')

In [None]:
train.head()

Unnamed: 0,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status,other_parties,residence_since,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker,class
0,'no checking',18,'critical/other existing credit',radio/tv,1800,'<100','1<=X<4',4,'male single',none,2,car,24,none,own,2,skilled,1,none,yes,good
1,'<0',24,'existing paid',radio/tv,2439,'<100','<1',4,'female div/dep/mar',none,4,'real estate',35,none,own,1,skilled,1,yes,yes,bad
2,'no checking',36,'no credits/all paid',repairs,2613,'<100','1<=X<4',4,'male single',none,2,car,27,none,own,2,skilled,1,none,yes,good
3,'no checking',15,'existing paid',education,4623,'100<=X<500','1<=X<4',3,'male single',none,2,'life insurance',40,none,own,1,'high qualif/self emp/mgmt',1,yes,yes,bad
4,'<0',21,'existing paid',furniture/equipment,3599,'<100','4<=X<7',1,'female div/dep/mar',none,4,car,26,none,rent,1,'unskilled resident',1,none,yes,good


In [None]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   checking_status         800 non-null    object
 1   duration                800 non-null    int64 
 2   credit_history          800 non-null    object
 3   purpose                 800 non-null    object
 4   credit_amount           800 non-null    int64 
 5   savings_status          800 non-null    object
 6   employment              800 non-null    object
 7   installment_commitment  800 non-null    int64 
 8   personal_status         800 non-null    object
 9   other_parties           800 non-null    object
 10  residence_since         800 non-null    int64 
 11  property_magnitude      800 non-null    object
 12  age                     800 non-null    int64 
 13  other_payment_plans     800 non-null    object
 14  housing                 800 non-null    object
 15  existi

# Instantiate Pywedge_Charts

The Pywedge_Charts class takes the following inputs.

Inputs:

* Dataframe
* c = any redundant column to remove, such as an ID column (currently only a single column can be removed; support for removing multiple columns is planned for a later version)
* y = target column name as a string

Returns:

Charts widget

In [None]:
mc = pw.Pywedge_Charts(train, c=None, y = 'class')

# Call make_charts method from the instantiated class


In [None]:
charts = mc.make_charts()

HTML(value='<h2>Pywedge Make_Charts </h2>')

Tab(children=(Output(), Output(), Output(), Output(), Output(), Output(), Output(), Output()), _titles={'0': '…

# Instantiate baseline_model class

Args:

* train = train dataframe
* test = test dataframe
* c = any redundant column to remove, such as an ID column (currently only a single column can be removed; support for removing multiple columns is planned for a later version)
* y = target column name as a string
* type = Classification(Default) / Regression

In [None]:
blm = pw.baseline_model(train, test, c=None, y='class')

# Call classification_summary method for classification tasks

Returns:

* Interactive pre-processing steps
* User input for test size in train test split
* Top 10 feature importance using AdaBoost classifier
* Baseline models in 10 different algorithms with metrics
* Predict selected baseline models on standout test dataset

In [None]:
blm.classification_summary()

HTML(value='<h2>Pywedge Baseline Models </h2>')

Tab(children=(Output(), Output()), _titles={'0': 'Baseline Models', '1': 'Predict Baseline Model'})
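
The test-size input listed above controls how many training rows are set aside for validation. As a rough illustration of what a given value means for this 800-row dataset, the sketch below computes the resulting row counts. It assumes the test size is entered as a fraction and that the test share is rounded up, as scikit-learn style splits usually do. The helper name `split_sizes` is hypothetical, and Pywedge's own internals may differ.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a fractional test size.

    Assumption: the test share is rounded up and the remainder
    goes to training. This is an illustration, not Pywedge code.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test


# The training dataframe in this demo has 800 rows.
print(split_sizes(800, 0.2))  # (640, 160)
```

A smaller test size leaves more rows for fitting the baseline models but makes the reported metrics noisier, so values around 0.2 to 0.3 are a common starting point.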

# Instantiate Pywedge_HP Class for interactive hyperparameter tuning

Args:

* train = train dataframe
* test = test dataframe
* c = any redundant column to remove, such as an ID column (currently only a single column can be removed; support for removing multiple columns is planned for a later version)
* y = target column name as a string
* tracking = True/False (default) # enables MLflow hyperparameter tracking

In [None]:
pph = pw.Pywedge_HP(train, test, c=None, y='class', tracking=True)

## Install mlflow version 1.12.1

In [None]:
!pip install mlflow==1.12.1 --quiet

# Call HP_Tune_Classification for classification hyperparameter tuning tasks

Returns:

* Interactive widget for entering various hyperparameters
* Output tab with tuned model results
* Predictions on standout test data using tuned model

In [None]:
pph.HP_Tune_Classification()

HTML(value='<h2>Pywedge HP_Tune</h2>')

Tab(children=(Output(), Output(), Output()), _titles={'0': 'Input', '1': 'Output', '2': 'Helper Page'})

# HP_Tune - User Guide:
* Enter multiple numerical values as comma-separated values. For example, in the Decision Tree hyperparameter search space, multiple Max_Depth values can be entered as 5, 10, 15 (numpy notation is not yet supported)
* Use Ctrl + click to select multiple options
* Use n_jobs as -1 for faster hyperparameter search
* The helper page tab provides the relevant estimator's web page for quick reference
* Hyperparameters can be tracked on-the-go by passing tracking=True and invoking the MLflow user interface from the command prompt; more details are in the [docs page](https://taknev83.github.io/pywedge-docs/)
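
Because numpy notation is not accepted in the input fields, it can help to build the comma-separated string in Python and paste it into the widget. The sketch below shows one way to do that, along with a parser that reads such a string back into numbers. Both helper names (`to_field`, `parse_values`) are hypothetical and are not part of Pywedge; how Pywedge itself parses the field is not shown in this notebook.

```python
def to_field(values):
    """Format an iterable of numbers as a comma-separated string."""
    return ", ".join(str(v) for v in values)


def parse_values(text):
    """Parse a string such as '5, 10, 15' into a list of numbers.

    Tokens containing a decimal point or exponent become floats;
    everything else is read as an int. Empty tokens are skipped.
    """
    values = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        is_float = "." in token or "e" in token.lower()
        values.append(float(token) if is_float else int(token))
    return values


# Equivalent of a numpy-style range, written out for the Max_Depth field.
max_depth_field = to_field(range(5, 20, 5))
print(max_depth_field)                # 5, 10, 15
print(parse_values(max_depth_field))  # [5, 10, 15]
```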

## Install pyngrok to create a tunnel to access the MLflow user interface in Google Colab

In [None]:
!pip install pyngrok --quiet

## Call get_tracking_url method from the instantiated class to get the MLflow tracking URL

In [None]:
pph.get_tracking_url()

MLflow Tracking UI: https://e3e585264450.ngrok.io
