# 📊 Logistic Regression in Python

<br><br>
<br><b>Viviana Márquez</b><br>
http://vivianamarquez.com


# ✅ Today's Goals:

• Learn what Logistic Regression is.

• Learn when to use Logistic Regression.

• Build a machine learning model on a real-world application in Python.

# Logistic Regression 

• 📸 Most famous machine learning algorithm after linear regression.

<br><center><img src="meme1.png"  style="height:500px;"></center>

# Logistic Regression 

• 📸 Most famous machine learning algorithm after linear regression.

<br><center><img src="meme2.png"  style="height:500px;"></center>

# 🥊 Linear Regression vs Logistic Regression 

<br><br>

• <b>Linear regression</b> is used to predict/forecast values (<b>continuous values</b>)

• <b>Logistic regression</b> is used for classification tasks (<b>discrete values</b>: yes/no, dead/alive, pass/fail, ham/spam)

# [Recap] Linear Regression 

<br><br>
<center>
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n$,
</center>

where $y$ is the dependent variable and $x_1,x_2,...,x_n$ are the explanatory variables.

<br><br>
• In Linear Regression, the predicted value can be anywhere between $-\infty$ and $\infty$.

• For Logistic Regression, we need the output to lie between 0 and 1, so that it can be read as a probability.

# Our friend: The Sigmoid Function

<br>
<center>
    <img src="https://hackernoon.com/hn-images/0*s6Rhp40yHBtxMIcC.png" style='height:250px;'>
    $y = \dfrac{1}{1+e^{-x}}$
</center>

• Applying the sigmoid function to the linear regression equation, we obtain logistic regression:

<br>
<center>
    $y = \dfrac{1}{1+e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n)}}$
</center>
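
To make the formula concrete, here is a minimal sketch in plain Python. The coefficients in `betas` and the inputs in `xs` are made up purely for illustration (they are not fitted to any data); the point is only that the sigmoid squeezes any real-valued linear score into a value between 0 and 1.

```python
import math

def sigmoid(z):
    """Map a real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression_output(betas, xs):
    """Sigmoid of the linear score beta_0 + beta_1*x_1 + ... + beta_n*x_n.

    betas[0] is the intercept; betas[1:] pair up with xs.
    """
    linear_score = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return sigmoid(linear_score)

# Hypothetical coefficients and inputs, chosen only for illustration
betas = [-1.0, 0.5, 0.25]
xs = [2.0, 4.0]

print(sigmoid(0))                             # 0.5: a score of zero sits exactly in the middle
print(logistic_regression_output(betas, xs))  # sigmoid(1.0), roughly 0.731
```

Large positive scores push the output toward 1 and large negative scores push it toward 0, which is what lets us read the output as a probability.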

# 🛂 Learning check-point:

## What should I use for this problem?

<br><br>
• Determining the price of houses.

• Determining whether a song is rock or jazz.

• Predicting the weight of a person.

• Predicting if a customer is going to make a purchase or not.

# Machine Learning Model time!

<br>
<center><img src="https://media1.tenor.com/images/1f84b096cbe1cc9f3763c803bb17e10e/tenor.gif?itemid=5878976"></center>

# Goal

<br>
<center><img src="http://images4.fanpop.com/image/photos/20800000/Squirtle-Charmander-pokemon-20889696-800-600.jpg" style="height:250px"></center>

<br>
<center><big>Predict: Is it a Water or a Fire Pokémon?</big></center>

In [67]:
import pandas as pd

# Load data
data = pd.read_csv("Pokemon.csv")

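# Clean data: keep only Water and Fire Pokemon
filter_pokemon = ["Water", "Fire"]
data = data[data['Type 1'].isin(filter_pokemon)]
# Renumber the rows from 0 without keeping the old index as a column
data = data.reset_index(drop=True)
# Drop the columns we will not use as features
data = data.drop(['Type 2', 'Total', 'Generation', 'Legendary', '#'], axis=1)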

# Preview data
data.head()

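| | Name | Type 1 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed |
|---|---|---|---|---|---|---|---|---|
| 0 | Charmander | Fire | 39 | 52 | 43 | 60 | 50 | 65 |
| 1 | Charmeleon | Fire | 58 | 64 | 58 | 80 | 65 | 80 |
| 2 | Charizard | Fire | 78 | 84 | 78 | 109 | 85 | 100 |
| 3 | CharizardMega Charizard X | Fire | 78 | 130 | 111 | 130 | 85 | 100 |
| 4 | CharizardMega Charizard Y | Fire | 78 | 104 | 78 | 159 | 115 | 100 |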


In [68]:
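# Features: the six base stats (every column after Name and Type 1)
X = data[data.columns[2:]]
# Target: the Pokemon type we want to predict
y = data['Type 1']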

In [69]:
X.head()

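| | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed |
|---|---|---|---|---|---|---|
| 0 | 39 | 52 | 43 | 60 | 50 | 65 |
| 1 | 58 | 64 | 58 | 80 | 65 | 80 |
| 2 | 78 | 84 | 78 | 109 | 85 | 100 |
| 3 | 78 | 130 | 111 | 130 | 85 | 100 |
| 4 | 78 | 104 | 78 | 159 | 115 | 100 |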


In [70]:
y.head()

0    Fire
1    Fire
2    Fire
3    Fire
4    Fire
Name: Type 1, dtype: object

In [71]:
# Split data

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

🛂 What are the shapes of `X_train`, `X_test`, `y_train`, and `y_test`?

In [72]:
print(X.shape)
print(X_train.shape)
print(X_test.shape)

(164, 6)
(131, 6)
(33, 6)


In [73]:
print(y.shape)
print(y_train.shape)
print(y_test.shape)

(164,)
(131,)
(33,)
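
The shapes above follow directly from `test_size=0.2` applied to 164 rows. As a quick check of the arithmetic, here is a small sketch; it assumes the test count is rounded up and the remaining rows go to training, which matches the sizes printed above.

```python
import math

n_samples = 164   # rows in X, from the shapes printed above
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is rounded up, the remainder goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 131 33
```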


In [74]:
# Model

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()

logreg.fit(X_train,y_train)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

In [75]:
y_pred = logreg.predict(X_test)

# Is our model working?

In [76]:
from sklearn import metrics

metrics.accuracy_score(y_test, y_pred)

0.7878787878787878

<img src="https://static.pokemonpets.com/images/monsters-images-300-300/5-Charmeleon.png" style="height:50px" align="left"><br><b>  Charmeleon</b>

In [77]:
data[data['Name']=="Charmeleon"][data.columns[2:]]

| | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed |
|---|---|---|---|---|---|---|
| 1 | 58 | 64 | 58 | 80 | 65 | 80 |


In [78]:
# Predict Charmeleon
logreg.predict(data[data['Name']=="Charmeleon"][data.columns[2:]])

array(['Water'], dtype=object)

<img src="https://static.pokemonpets.com/images/monsters-images-300-300/8-Wartortle.png" style="height:55px" align="left"><br><b>  Wartortle</b>

In [79]:
# Predict Wartortle
logreg.predict(data[data['Name']=="Wartortle"][data.columns[2:]])

array(['Water'], dtype=object)

# How can we improve our model?

In [80]:
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix

array([[ 3,  5],
       [ 2, 23]])
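
In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, with labels in sorted order, so row and column 0 correspond to Fire and row and column 1 to Water. The sketch below recomputes the accuracy from these four counts and also breaks it down per class (the recall, i.e. the share of each true class that was predicted correctly).

```python
# Counts copied from the confusion matrix above.
# Rows are the true class, columns the predicted class, ordered Fire, Water.
cnf = [[3, 5],
       [2, 23]]

total = sum(sum(row) for row in cnf)
correct = cnf[0][0] + cnf[1][1]   # diagonal entries are the correct predictions

accuracy = correct / total
fire_recall = cnf[0][0] / sum(cnf[0])    # correctly predicted Fire / all true Fire
water_recall = cnf[1][1] / sum(cnf[1])   # correctly predicted Water / all true Water

print(f"accuracy     = {accuracy:.3f}")      # 0.788
print(f"Fire recall  = {fire_recall:.3f}")   # 0.375
print(f"Water recall = {water_recall:.3f}")  # 0.920
```

The overall accuracy hides a large gap: the model recognizes most Water Pokémon but only 3 of the 8 Fire Pokémon in the test set.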

In [81]:
data['Type 1'].value_counts()

Water    112
Fire      52
Name: Type 1, dtype: int64
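
These counts give us a useful baseline: a "model" that ignores the stats and always answers with the most common class. The sketch below computes how accurate that would be on the full data set, using only the counts shown above.

```python
# Class counts copied from value_counts() above
counts = {"Water": 112, "Fire": 52}

majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / sum(counts.values())

print(majority_class)              # Water
print(f"{baseline_accuracy:.3f}")  # 0.683
```

On the test split shown in the confusion matrix, 25 of the 33 Pokémon are Water, so always answering Water would already score about 0.758, not far below our model's 0.788.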

# Next time: Dealing with unbalanced data sets.