## Machine Learning: Logistic Regression Classifier
### Project created by: Víctor Manuel Sánchez Morales
#### October 6th, 2020

Public data from: [KAGGLE: Red Wine Quality](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009?select=winequality-red.csv)

In [None]:
# import required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

import warnings
warnings.filterwarnings("ignore")  # hide library warnings to keep the output clean


# read the data source and load it into a DataFrame
# (pd.read_csv already returns a DataFrame, so no extra conversion is needed)

df = pd.read_csv("winequality-red.csv")

In [None]:
# let's take a look at the data...

print(df.head())

In [None]:
# First we need to pre-classify the data so that we can train the model later on
# We will use the "quality" column to decide whether a wine is good (1) or bad (0)

calidad = np.unique(df["quality"])
print("Qualities:", calidad)
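`np.unique` lists which quality levels occur, but not how often each one appears. The minimal sketch below uses pandas `value_counts` to show the count per level. It runs on a small made-up `quality` Series, and the values are illustrative, not the real dataset. On the real data the equivalent call would be `df["quality"].value_counts().sort_index()`.

```python
import pandas as pd

# illustrative stand-in for df["quality"]; NOT the real wine data
quality = pd.Series([5, 6, 5, 7, 6, 5, 8, 6, 5, 4], name="quality")

# number of wines at each quality level, ordered by level
counts = quality.value_counts().sort_index()
print(counts)
```

Seeing the counts per level makes it clear how many wines will end up in each class once the levels are grouped into good and bad.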

In [None]:
# Let's say a wine is "good" if its quality is 7 or 8
# We'll create a new column in the DataFrame named BuenoMalo ("good/bad")
# If "quality" is 7 or 8, the value in BuenoMalo will be 1; otherwise it will be 0

df["BuenoMalo"] = df["quality"].apply(
    lambda x: 1 if (x == 7 or x == 8) else 0
)

#print(np.unique(df["BuenoMalo"]))
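The row-by-row `apply` with a `lambda` works. Pandas can also build the same 0/1 label in one vectorized step with `Series.isin`. The small sketch below uses an illustrative `quality` Series with made-up values and checks that both approaches produce the same labels.

```python
import pandas as pd

# illustrative quality scores (not the real dataset)
quality = pd.Series([3, 5, 6, 7, 8, 6, 7, 4])

# approach used in the notebook: apply a lambda to every value
label_apply = quality.apply(lambda x: 1 if (x == 7 or x == 8) else 0)

# vectorized alternative: True where quality is 7 or 8, then cast to 0/1
label_isin = quality.isin([7, 8]).astype("int64")

print(label_isin.tolist())
print((label_apply == label_isin).all())
```

The vectorized form avoids calling a Python function once per row, which is usually faster on larger DataFrames.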

* Next, we'll create the model using the **LogisticRegression** class from **sklearn** (scikit-learn)
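For reference, a binary logistic regression models the probability that a wine is good (BuenoMalo = 1). It takes a weighted sum of the standardized features and passes it through the logistic (sigmoid) function:

$$
P(\text{BuenoMalo} = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}
$$

Here $\mathbf{x}$ is the vector of features. In scikit-learn, $\mathbf{w}$ corresponds to `modelo.coef_` and $b$ to `modelo.intercept_`. The model predicts 1 when this probability exceeds 0.5, which is the same as $\mathbf{w}^\top \mathbf{x} + b > 0$. By default, scikit-learn fits $\mathbf{w}$ and $b$ by minimizing the log-loss with an L2 penalty (`C=1.0`).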

In [None]:
# we'll create two variables:
# 'x' holds all the features that will help classify the wine
# 'y' holds the wine-quality label (BuenoMalo)

x = df[["fixed acidity", "volatile acidity", "residual sugar", "chlorides", "total sulfur dioxide", "density", "sulphates", "alcohol"]]
y = df["BuenoMalo"]  # a 1-D Series, the label shape scikit-learn expects

In [None]:
# now let's split the dataset into training and test sets

trainf, testf, trainl, testl = train_test_split(x, y)
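Called with no extra arguments, `train_test_split` holds out 25% of the rows for testing, and the shuffle is different on every run. Only a minority of the wines are labeled good, so passing `stratify=y` keeps the same class proportions in both sets. Setting `random_state` makes the split reproducible. The sketch below runs on synthetic data; the 80/20 label mix and `random_state=0` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-in: 100 rows, 2 features, 20% positive labels
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 20 + [0] * 80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(len(X_train), len(X_test))      # 75 training rows, 25 test rows
print(y_train.mean(), y_test.mean())  # share of positive labels in each split
```

With stratification, both splits keep the 20% positive share of the full dataset instead of depending on the luck of the shuffle.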

In [None]:
# standardize the features: fit the scaler on the training set only,
# then apply that same transformation to the test set (avoids test-set leakage)

scalr = StandardScaler()

trainf = scalr.fit_transform(trainf)
testf = scalr.transform(testf)
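`StandardScaler` subtracts each column's mean and divides by its standard deviation. If it is fitted on the training rows only and `transform` is then called on the test rows, the test data are scaled with statistics learned from the training data. That way no information from the test set leaks into the model. The small sketch below uses made-up numbers and checks that `transform` reproduces this formula with the training mean and the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up single-feature data
train = np.array([[1.0], [2.0], [3.0], [4.0]])
test = np.array([[2.5], [5.0]])

scaler = StandardScaler().fit(train)   # learn mean/std from training data only
test_scaled = scaler.transform(test)   # apply the same statistics to test data

# manual equivalent: StandardScaler uses the population std (ddof=0)
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```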

In [None]:
# now create and train the model

modelo = LogisticRegression()
modelo.fit(trainf, trainl)
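After fitting, `predict_proba` returns the probability of each class, and `predict` returns 1 when the probability of class 1 is above 0.5. The minimal sketch below uses synthetic data from `make_classification`, not the wine data. It checks that this probability equals the sigmoid of `X @ coef_.T + intercept_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# synthetic binary classification data (stand-in for the wine features)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)

# linear score w.x + b for every row, then the logistic (sigmoid) function
z = X @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

p_sklearn = model.predict_proba(X)[:, 1]  # probability of class 1
print(np.allclose(p_manual, p_sklearn))
print(np.array_equal(model.predict(X), (p_sklearn > 0.5).astype(int)))
```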

In [None]:
# score() returns the mean accuracy on the test set
print("Model score:", modelo.score(testf, testl))
print("Model coefficients:", modelo.coef_)
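`score` reports accuracy, the fraction of test rows classified correctly. Good wines are a minority, so a model that always predicts 0 (bad) can already reach a high accuracy. It is therefore worth comparing the model against that baseline. The sketch below uses scikit-learn's `DummyClassifier` on synthetic, imbalanced labels. The 85/15 mix is an arbitrary illustration, not the wine data's actual proportion.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# synthetic imbalanced labels: 85 "bad" (0) and 15 "good" (1)
y = np.array([0] * 85 + [1] * 15)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

# accuracy of always predicting the majority class
print(baseline.score(X, y))  # 0.85
```

On the real data, the same baseline would be fitted on `trainf, trainl` and scored on `testf, testl`. The logistic model's score is only informative relative to that number.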

#### The first score indicates that the model correctly classifies approximately 85% of the test cases
* We'll now write a loop that keeps re-splitting the data and retraining the model until its score (accuracy) on the test set reaches the target value `meta` (above 90%)
* Each iteration draws a new random train/test split and fits a fresh model. A higher score therefore reflects a more favorable split rather than a better-trained model, and the final score is an optimistic estimate of real performance
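A common alternative to re-drawing splits until one scores well is k-fold cross-validation, which is not used in the original notebook. Every row is used for testing exactly once, and the mean score gives a less optimistic estimate. Wrapping the scaler and the model in a `Pipeline` ensures the scaler is refit on the training folds only. The sketch below runs on synthetic data; 5 folds and `make_classification` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic, imbalanced stand-in for the wine data (about 85% class 0)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.85], random_state=0)

# scaling happens inside each fold, fitted on that fold's training part only
pipe = make_pipeline(StandardScaler(), LogisticRegression())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(pipe, X, y, cv=cv)
print(scores.round(3))
print("mean accuracy:", scores.mean().round(3), "+/-", scores.std().round(3))
```

On the notebook's data, the same call would be `cross_val_score(pipe, x, y, cv=cv)`, using the unscaled `x` and `y` defined earlier.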

In [None]:
titls = {"Caracteristica": ["fixed acidity", "volatile acidity", "residual sugar", "chlorides", "total sulfur dioxide", "density", "sulphates", "alcohol"]}

efe = modelo.score(testf, testl)

# try values between 0.90 and 0.95 and see what happens
# (the higher the target, the longer the search usually takes; very high targets may take a very long time)
meta = 0.93

i = 0
start_time = datetime.now()
# keep drawing new random splits and refitting until the test score reaches the target
while efe < meta:
    trainf, testf, trainl, testl = train_test_split(x, y)
    trainf = scalr.fit_transform(trainf)  # fit the scaler on the training data only
    testf = scalr.transform(testf)        # reuse the training statistics on the test data
    modelo.fit(trainf, trainl)
    efe = modelo.score(testf, testl)
    i += 1
total_time = (datetime.now() - start_time).total_seconds()

# pair each feature name with its fitted coefficient
coefs = pd.DataFrame({"Caracteristica": titls["Caracteristica"], "Coeficiente": modelo.coef_[0]})

print()
print("MODEL TRAINED SUCCESSFULLY")
print("Target score:", meta)
print("-----------------------------")
print()
print("It required:", i, "iterations and", total_time, "seconds")
print()
print("It used:", len(trainf), "training observations")
print("It evaluated:", len(testf), "observations to calculate the score")
print()
print("Final score:", efe)
print("Final model coefficients:")
print(coefs)
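The features were standardized, so each coefficient in `coefs` is the change in the log-odds of being a good wine for a one-standard-deviation increase in that feature, with the others held fixed. Exponentiating a coefficient gives an odds ratio. The sketch below uses hypothetical feature names and coefficient values, not results from this notebook. It shows how such a table could be extended with odds ratios and sorted by effect size.

```python
import numpy as np
import pandas as pd

# hypothetical names and coefficients for illustration only (NOT fitted values)
coefs = pd.DataFrame({
    "Caracteristica": ["feature_a", "feature_b", "feature_c"],
    "Coeficiente": [1.0, 0.5, -0.7],
})

# odds ratio: multiplicative change in the odds per 1-std increase in the feature
coefs["OddsRatio"] = np.exp(coefs["Coeficiente"])

# order features by the magnitude of their effect, largest first
coefs = coefs.sort_values("Coeficiente", key=lambda s: s.abs(), ascending=False)
print(coefs)
```

An odds ratio above 1 means the feature raises the odds of a "good" label, and one below 1 lowers them. Because of the default L2 penalty, the fitted coefficients are somewhat shrunk toward zero.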