
# Logistic Regression Model Example

## Supervised Learning

### Regression
For this project, we will use supervised learning and a regression model to predict poverty level. Poverty level is a continuous variable: it can take any numerical value within a certain range. The regression algorithm attempts to learn patterns among the selected economic factors. Given the data for a state, the model predicts its poverty level from the patterns it previously learned from the dataset.

### Classification
Classification is used to predict discrete outcomes. In binary classification, which this logistic regression example demonstrates, the target variable has only two possible values.

### Dataset for Regression and Classification
The dataset is divided into features and a target.
* Features are the variables used to make a prediction.
* The target is the outcome to be predicted.

### Basic procedures for implementing a supervised learning model:
1. Create a model with LogisticRegression().
2. Train the model with model.fit().
3. Make predictions with model.predict().
4. Validate the model with accuracy_score().
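
The four steps above can be sketched end to end as follows. This is a minimal illustration, not the project's pipeline: the data comes from scikit-learn's `make_classification()`, which is an assumption standing in for the real dataset, and the variable names are chosen for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the project dataset (assumption)
X, y = make_classification(n_samples=200, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

# 1. Create the model
model = LogisticRegression(solver="lbfgs", random_state=1)
# 2. Train the model on the training set
model.fit(X_train, y_train)
# 3. Predict labels for the unseen test set
predictions = model.predict(X_test)
# 4. Validate: fraction of test labels predicted correctly
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.3f}")
```

The accuracy is computed on the held-out test set only, so it reflects how the model handles data it did not see during training.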

In [1]:
# Import database dependencies
from sqlalchemy import inspect, create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
import config as creds

# Import Pandas and matplotlib dependencies
import pandas as pd
import numpy as np
import datetime as dt
from pathlib import Path
import matplotlib.pyplot as plt

# Import scikit packages
import sklearn.preprocessing as preprocessing
from sklearn.linear_model import LinearRegression
import sklearn.datasets as datasets
# For splitting of data into train and test set
from sklearn.model_selection import train_test_split
# Metrics for evaluating model accuracy and F1-score
from sklearn.metrics import f1_score, accuracy_score
import sklearn.metrics as metrics

In [2]:
# Create engine
engine = create_engine(f'postgresql://{creds.PGUSER}:{creds.PGPASSWORD}@{creds.PGHOST}:5432/{creds.PGDATABASE}')

In [3]:
# Create our session (link) from Python to the DB
session = Session(bind=engine.connect())

In [4]:
# Reflect an existing database into a new model
Base = automap_base()
# Reflect the tables (SQLAlchemy 1.4+; older releases used Base.prepare(engine, reflect=True))
Base.prepare(autoload_with=engine)

In [5]:
# List tables in database
inspect(engine).get_table_names()

['ave_wage_indexing',
 'welfare_education',
 'cpi_inflation_rate',
 'crime_rate',
 'economic_features_full',
 'economic_features',
 'divorce_rate',
 'homeownership_rate',
 'min_wage_effective',
 'poverty_rates',
 'unemployment_rate']

In [6]:
# List columns in a specific table ('economic_features')
[column['name'] for column in inspect(engine).get_columns('economic_features')]

['year',
 'state',
 'population_million',
 'education_million',
 'welfare_million',
 'crime_rate',
 'unemployment_rate',
 'divorce_rate_per_1000_people',
 'homeownership_rate',
 'minimum_wage_effective',
 'cpi_average',
 'avg_wage_index',
 'poverty_rate']
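
The later cells refer to a DataFrame named `df`, so the table has to be read into pandas at some point. The sketch below shows one way this could look. It uses an in-memory SQLite database as a stand-in for the PostgreSQL connection, only four of the columns listed above, and invented values; none of the numbers come from the project data.

```python
import sqlite3
import pandas as pd

# Stand-in for the PostgreSQL database: an in-memory SQLite table with
# a few of the economic_features columns and made-up values (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE economic_features "
    "(year INTEGER, state TEXT, unemployment_rate REAL, poverty_rate REAL)"
)
conn.executemany(
    "INSERT INTO economic_features VALUES (?, ?, ?, ?)",
    [(2018, "A", 4.1, 12.5), (2018, "B", 3.2, 9.8), (2019, "A", 3.9, 12.0)],
)

# Read the whole table into a DataFrame
query = "SELECT * FROM economic_features"
df = pd.read_sql_query(query, conn)
conn.close()

print(df.head())
```

With the SQLAlchemy engine created earlier, the same query could presumably be passed to `pd.read_sql_query()` with the engine as the connection argument.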

## Data Preprocessing
* Visualize dataset
* Reshape data format if needed
* Scale the features if necessary with `MinMaxScaler` or `StandardScaler`

In [None]:
# Assumes df holds the dataset and includes a "Poverty Level" target column
y = df["Poverty Level"]
X = df.drop(columns="Poverty Level")

In [None]:
# x-axis: independent variable (feature)
# y-axis: target variable that we want to predict
plt.scatter(df.YearsExperience, df.Salary)
plt.xlabel('YearsExperience')
plt.ylabel('Salary')
plt.show()

### Reshape data format if necessary

In [None]:
# Scikit-learn expects a 2D feature array of shape (n_samples, n_features),
# so reshape() turns the single column into one row per sample
X = df.X_column_name.values.reshape(-1, 1)

In [None]:
# To examine the first 5 entries in X
X[:5]

In [None]:
# To examine the shape of X
X.shape

# 30 rows and 1 column
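
To make the effect of `reshape(-1, 1)` concrete, here is a small sketch on a hand-written array. The values are arbitrary and only the shapes matter.

```python
import numpy as np

# A one-dimensional array of five feature values
values = np.array([1.1, 1.3, 1.5, 2.0, 2.2])
print(values.shape)   # (5,)

# -1 lets NumPy infer the number of rows; 1 fixes a single column
X = values.reshape(-1, 1)
print(X.shape)        # (5, 1)
```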

### Scale standardization - MinMax

In [None]:
# MinMaxScaler rescales each feature to the range [0, 1]
import sklearn.preprocessing as preprocessing

minmax = preprocessing.MinMaxScaler()
# X is a 2D array of floats
minmax.fit(X)
X_minmax = minmax.transform(X)

### Scale standardization - StandardScaler

In [None]:
import sklearn.preprocessing as preprocessing

# StandardScaler rescales each feature to zero mean and unit variance
std = preprocessing.StandardScaler()
# X is a 2D array
std.fit(X)
X_std = std.transform(X)
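
The difference between the two scalers is easiest to see side by side. The sketch below applies both to the same small, hand-written column; the input values are an assumption made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# fit_transform() is shorthand for fit() followed by transform()
X_minmax = MinMaxScaler().fit_transform(X)
X_std = StandardScaler().fit_transform(X)

print(X_minmax.ravel())  # values rescaled into [0, 1]
print(X_std.ravel())     # values centred on 0 with unit variance
```

In practice the scaler would usually be fitted on the training set only and then applied to the test set, so that no information from the test data leaks into training.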

## Assigning target variable

In [None]:
# Assign target variable
y = df.Salary

## Splitting into Train & Test set
A conventional split is 75% for training and 25% for testing, which is also the default of `train_test_split()`.

* The model learns from the training dataset
* The testing dataset is used to assess the model's performance

In [None]:
# Split the dataset into training and testing sets
# stratify=y preserves the class proportions, so y must hold discrete class labels
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,
    y, random_state=1, stratify=y)

# X = input
# y = output or what we wish to predict
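
As a quick check of what the split produces, the sketch below runs `train_test_split()` on a synthetic, balanced two-class dataset (an assumption made for illustration) and prints the resulting sizes and class ratios.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 samples, one feature, two balanced classes (synthetic)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

# test_size defaults to 0.25; stratify=y keeps the class ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```

Both class ratios should come out close to 0.5, because stratification preserves the proportions of the original labels as closely as the subset sizes allow.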

## Creating instance of logistic regression model
* Import `LogisticRegression` from the Scikit-learn library, and then instantiate the model.
* The **solver argument** is set to 'lbfgs', which is the default setting.
* **random_state** is specified so that the results are reproducible each time you run this notebook.

In [None]:
# Instantiate a Logistic Regression model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(solver='lbfgs', random_state=1)
classifier

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
   intercept_scaling=1, max_iter=100, multi_class='warn', penalty='l2',
   random_state=1, solver='lbfgs', tol=0.0001, warm_start=False)

## Training (fitting) the model

In [None]:
# Fit the data into the model
# By convention, X is capitalized and y is lowercase

classifier.fit(X_train, y_train)
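
After fitting, the learned parameters are available as attributes on the model. The sketch below shows this on two synthetic clusters from `make_blobs()`; the dataset is an assumption used only so the example runs on its own.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two synthetic clusters with two features each (assumption)
X, y = make_blobs(centers=2, random_state=42)

classifier = LogisticRegression(solver="lbfgs", random_state=1)
classifier.fit(X, y)

# Attributes ending in an underscore are set during fit()
print(classifier.classes_)     # class labels seen in y
print(classifier.coef_)        # one weight per feature
print(classifier.intercept_)   # bias term
```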

## Predicting and evaluating the model
Generate predictions and evaluate performance of model

In [11]:
# Create the predictions using the predict() method and put the results into a Pandas DataFrame
# The model creates predicted y values based on X values
predictions = classifier.predict(X_test)
pd.DataFrame({"Prediction": predictions, "Actual": y_test})


# Validate the model, or evaluate its performance
from sklearn.metrics import accuracy_score
accuracy_score(y_test, predictions)
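
Accuracy alone can hide which kind of mistakes a classifier makes. The sketch below compares it with the F1-score and the confusion matrix on a short pair of hand-written label lists; these labels are invented for illustration and are not model output.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hand-written labels for illustration
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)   # 6 of 8 labels match
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class

print(acc, f1)
print(cm)
```

Here the matrix has three true negatives, one false positive, one false negative and three true positives, so accuracy, precision, recall and F1 all equal 0.75.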



## Plotting the results

In [None]:
# Create a new data point to show up as a red dot on the new plot
# (assumes X has two feature columns, so each sample is a point in 2D)
import numpy as np
new_data = np.array([[-2, 6]])
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.scatter(new_data[0, 0], new_data[0, 1], c="r", marker="o", s=100)
plt.show()

In [None]:
predictions = classifier.predict(new_data)
print("Classes are either 0 (purple) or 1 (yellow)")
print(f"The new point was classified as: {predictions}")
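
Besides the predicted class, a fitted logistic regression model can also report how confident it is. The sketch below uses `predict_proba()` for the same new point; the training data comes from `make_blobs()`, which is an assumption made so the example is self-contained.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two synthetic clusters with two features each (assumption)
X, y = make_blobs(centers=2, random_state=42)
classifier = LogisticRegression(solver="lbfgs", random_state=1).fit(X, y)

new_data = np.array([[-2, 6]])
# One row per sample, one column per class, in classifier.classes_ order
probabilities = classifier.predict_proba(new_data)
print(probabilities)
```

Each row sums to 1, and `predict()` returns the class with the larger probability.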