## What is PyCaret?
**Author: Moez Ali**

PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within seconds in your choice of notebook environment.

PyCaret aims to reduce the cycle time from hypothesis to insights. It suits both **seasoned data scientists** who want to make their ML experiments more productive by adding PyCaret to their workflows and **citizen data scientists** or newcomers to data science with little or no coding background.

PyCaret is a deployment-ready Python library.

## Getting Started
https://pycaret.org/

## Install PyCaret
**https://pycaret.org/install/**

In [None]:
!pip install pycaret

Collecting pycaret
  Downloading https://files.pythonhosted.org/packages/91/ae/000d825af8f7d9ff86808600f220e7ad57a873987fd6119c87dc4c5b1d91/pycaret-2.0-py3-none-any.whl (255kB)

## Getting data

https://pycaret.org/get-data/

In [None]:
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) | Class variable |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |


In [None]:
type(diabetes)

pandas.core.frame.DataFrame

In [None]:
diabetes.head()

| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) | Class variable |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |


In [None]:
diabetes.dtypes

Number of times pregnant                                                      int64
Plasma glucose concentration a 2 hours in an oral glucose tolerance test      int64
Diastolic blood pressure (mm Hg)                                              int64
Triceps skin fold thickness (mm)                                              int64
2-Hour serum insulin (mu U/ml)                                                int64
Body mass index (weight in kg/(height in m)^2)                              float64
Diabetes pedigree function                                                  float64
Age (years)                                                                   int64
Class variable                                                                int64
dtype: object

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Importing data using pandas
import pandas as pd
loan_df = pd.read_csv('/content/drive/My Drive/2020/Youtube/bankloan.csv')

In [None]:
loan_df.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0.0 | Graduate | No | 5849 | 0.0 | | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1.0 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0.0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0.0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0.0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |
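
The first row above has no `LoanAmount`, so the file contains missing values that will need treatment later. As a minimal sketch, the five displayed rows can be checked for empty fields with the standard library alone. The CSV text below is copied from the output above with the index column omitted, and `count_missing` is a hypothetical helper, not part of PyCaret or pandas.

```python
import csv
import io

# The five rows shown by loan_df.head(), copied as CSV text (index column omitted).
HEAD_CSV = """\
Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0.0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1.0,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0.0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0.0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0.0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
"""

rows = list(csv.DictReader(io.StringIO(HEAD_CSV)))


def count_missing(rows):
    """Return {column: number of empty fields}, listing only columns with gaps."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value == "":
                counts[column] = counts.get(column, 0) + 1
    return counts


missing = count_missing(rows)
print(missing)  # {'LoanAmount': 1}
```

On the DataFrame itself, `loan_df.isnull().sum()` reports the same kind of per-column count for the full file.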


# Setting up Environment

https://pycaret.org/setup/

### Importing Specific Modules

In [None]:
from pycaret.classification import *

# Initializing the setup

`setup()` takes care of the following steps:

- Data type inference
- Missing value treatment
- One-hot encoding
- Train/test split
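
To make two of these steps concrete, here is a minimal standard-library sketch of one-hot encoding and a train/test split. It illustrates the ideas only and is not PyCaret's implementation. The helper names `one_hot_encode` and `split_train_test`, the toy rows, and the 70/30 split ratio are all assumptions made for this example.

```python
import random


def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def split_train_test(rows, train_size=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy rows; the column names mirror the loan data, the values are made up.
areas = ["Urban", "Rural", "Semiurban", "Urban", "Urban",
         "Rural", "Semiurban", "Urban", "Rural", "Urban"]
rows = [{"ApplicantIncome": 3000 + 500 * i, "Property_Area": area}
        for i, area in enumerate(areas)]

encoded = one_hot_encode(rows, "Property_Area")
train, test = split_train_test(encoded, train_size=0.7, seed=0)

print(encoded[0])
print(len(train), len(test))
```

Each encoded row carries exactly one `1` among the `Property_Area_*` columns, and fixing the seed makes the split reproducible.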

In [None]:
# Initialize the experiment with 'Class variable' as the target column
exp_clf = setup(diabetes, target='Class variable')

Following data types have been inferred automatically, if they are correct press enter to continue…

| | Data Type |
|---|---|
| Number of times pregnant | Categorical |
| Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Numeric |
| Diastolic blood pressure (mm Hg) | Numeric |
| Triceps skin fold thickness (mm) | Numeric |
| 2-Hour serum insulin (mu U/ml) | Numeric |
| Body mass index (weight in kg/(height in m)^2) | Numeric |
| Diabetes pedigree function | Numeric |
| Age (years) | Numeric |
| Class variable | Label |


quit


SystemExit: ignored

### Data Type Inference: 
https://pycaret.org/data-types/

In [None]:
# exp_clf = setup(diabetes, target = 'Class variable',categorical_features=['Age (years)'])
# exp_clf = setup(diabetes, target = 'Class variable',numeric_features=['Age (years)'])
# exp_clf = setup(diabetes, target = 'Class variable',ignore_features=['Age (years)'])
# exp_clf = setup(diabetes, target = 'Class variable',date_features=['Age (years)'])
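
In the inferred types shown earlier, `Number of times pregnant` was labeled Categorical even though it is a count. One plausible explanation is that integer columns with few distinct values get treated as categories. The sketch below illustrates that idea together with explicit overrides. The rule, the `max_categories` threshold, and the helpers `infer_type` and `resolve_types` are hypothetical and are not taken from PyCaret's source.

```python
def infer_type(values, max_categories=5):
    """Guess 'Categorical' or 'Numeric' for one column (hypothetical rule)."""
    if not all(isinstance(v, (int, float)) for v in values):
        return "Categorical"
    all_integers = all(float(v).is_integer() for v in values)
    if all_integers and len(set(values)) <= max_categories:
        # Few distinct whole numbers look like category codes.
        return "Categorical"
    return "Numeric"


def resolve_types(columns, numeric_features=(), categorical_features=()):
    """Infer a type per column, then let explicit overrides win."""
    types = {name: infer_type(values) for name, values in columns.items()}
    for name in numeric_features:
        types[name] = "Numeric"
    for name in categorical_features:
        types[name] = "Categorical"
    return types


# Toy columns: the first five values of each come from the diabetes rows
# shown earlier; the remaining values are made up.
columns = {
    "Number of times pregnant": [6, 1, 8, 1, 0, 6, 1, 0],
    "Body mass index": [33.6, 26.6, 23.3, 28.1, 43.1, 30.0, 25.5, 31.2],
}

inferred = resolve_types(columns)
overridden = resolve_types(columns, numeric_features=["Number of times pregnant"])

print(inferred)
print(overridden)
```

The override mirrors what `numeric_features` and `categorical_features` do in the commented `setup()` calls: the user's declaration replaces the automatic guess.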

### Data Cleaning and Preparation: 
https://pycaret.org/missing-values/

In [None]:
# exp_clf = setup(diabetes, target = 'Class variable',numeric_imputation='median')
# exp_clf = setup(diabetes, target = 'Class variable',categorical_imputation='mode')
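
As a rough illustration of what the `'median'` and `'mode'` strategies mean, the sketch below fills gaps in two small columns using only the standard library. It is not PyCaret's implementation. `impute` is a hypothetical helper, the `LoanAmount` values come from the five loan rows shown earlier, and the gap in `property_area` is invented for the example.

```python
from statistics import median, mode


def impute(values, strategy):
    """Fill None entries with the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# LoanAmount from the five rows of loan_df.head(); the first value is missing.
loan_amount = [None, 128.0, 66.0, 120.0, 141.0]

# Property_Area with a made-up gap in the third position.
property_area = ["Urban", "Rural", None, "Urban", "Urban"]

print(impute(loan_amount, "median"))   # first entry becomes 124.0
print(impute(property_area, "mode"))   # third entry becomes 'Urban'
```

The median suits numeric columns because it is less sensitive to extreme values than the mean, while the mode is the natural choice for categorical columns, where averaging has no meaning.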

# Loan Example

In [None]:
lr = create_model('lr')
tuned_lr = tune_model(lr)

| | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.7442 | 0.6718 | 0.8333 | 0.8065 | 0.8197 | 0.3801 | 0.3807 |
| 1 | 0.814 | 0.6538 | 0.9667 | 0.8056 | 0.8788 | 0.4926 | 0.5327 |
| 2 | 0.8837 | 0.7769 | 0.9667 | 0.8788 | 0.9206 | 0.7051 | 0.7164 |
| 3 | 0.814 | 0.8 | 0.9333 | 0.8235 | 0.875 | 0.5169 | 0.5326 |
| 4 | 0.7907 | 0.7026 | 0.9333 | 0.8 | 0.8615 | 0.4432 | 0.466 |
| 5 | 0.7907 | 0.7709 | 0.931 | 0.7941 | 0.8571 | 0.4749 | 0.4965 |
| 6 | 0.7907 | 0.7734 | 1.0 | 0.7632 | 0.8657 | 0.4284 | 0.5221 |
| 7 | 0.7674 | 0.7192 | 0.9655 | 0.7568 | 0.8485 | 0.3786 | 0.4363 |
| 8 | 0.7907 | 0.8374 | 0.931 | 0.7941 | 0.8571 | 0.4749 | 0.4965 |
| 9 | 0.6667 | 0.7745 | 0.8621 | 0.7143 | 0.7813 | 0.1064 | 0.1152 |
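
The table above lists one row of scores per cross-validation fold without a summary row. A minimal sketch for summarizing the Accuracy column follows, with the values copied from the table. The population standard deviation is an assumption here, since the source does not show which variant PyCaret reports.

```python
from statistics import mean, pstdev

# Accuracy per fold, copied from the table above (folds 0 to 9).
accuracy = [0.7442, 0.814, 0.8837, 0.814, 0.7907,
            0.7907, 0.7907, 0.7674, 0.7907, 0.6667]

mean_accuracy = mean(accuracy)
sd_accuracy = pstdev(accuracy)  # population SD (assumed variant)

# Index of the fold with the highest and the lowest accuracy.
best_fold = max(range(len(accuracy)), key=accuracy.__getitem__)
worst_fold = min(range(len(accuracy)), key=accuracy.__getitem__)

print(f"mean accuracy: {mean_accuracy:.4f}")
print(f"SD of accuracy: {sd_accuracy:.4f}")
print(f"best fold: {best_fold}, worst fold: {worst_fold}")
```

A large spread between the best and worst folds, as seen here between folds 2 and 9, suggests the score depends noticeably on which rows land in the validation split.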


In [None]:
# Loan_ID is excluded from the features via ignore_features
exp_clf = setup(loan_df, target='Loan_Status', ignore_features=['Loan_ID'])

Setup Succesfully Completed!


| | Description | Value |
|---|---|---|
| 0 | session_id | 2031 |
| 1 | Target Type | Binary |
| 2 | Label Encoded | N: 0, Y: 1 |
| 3 | Original Data | (614, 13) |
| 4 | Missing Values | True |
| 5 | Numeric Features | 3 |
| 6 | Categorical Features | 9 |
| 7 | Ordinal Features | False |
| 8 | High Cardinality Features | False |
| 9 | High Cardinality Method | |
