# Module 2 graded activity: Wine classification

In this module, we return to binary classification problems, this time with a real-world dataset.

For this activity, we have chosen a wine quality dataset (https://archive.ics.uci.edu/ml/datasets/wine+quality). It contains information on more than 6000 bottles of red and white wine. Your task is to develop a single-neuron classifier that can distinguish between the two varieties with reasonable accuracy. Below, we have included code that helps you download the files and prepare the dataset (a good opportunity to learn Python in practice). We have also included the final function calls that we want you to run to train and evaluate your classifier. Feel free to reuse code that you have already seen or written in earlier notebooks.

In [None]:
# Download the .csv files from the data repository
!rm -f winequality-red.csv winequality-white.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv

--2022-02-01 16:44:14--  https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84199 (82K) [application/x-httpd-php]
Saving to: ‘winequality-red.csv’


2022-02-01 16:44:15 (582 KB/s) - ‘winequality-red.csv’ saved [84199/84199]

--2022-02-01 16:44:15--  https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 264426 (258K) [application/x-httpd-php]
Saving to: ‘winequality-white.csv’


2022-02-01 16:44:15 (941 KB/s) - ‘winequality-white.csv’ saved [264426/264426]



In [4]:
# These are the packages needed to complete this activity
import pandas as pd
import numpy as np

# Use pandas to read the CSV file into a DataFrame
# Note that the delimiter in this .csv file is a semicolon, ";", rather than a comma
df_red = pd.read_csv('winequality-red.csv', delimiter=";")

# Since this is a classification task, we assign the label 1 to every red wine
df_red["color"] = 1

# The .head() method is handy for previewing the data
df_red.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,color
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,1
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,1
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,1
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,1
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,1
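
The delimiter matters because `pd.read_csv` splits on commas by default. The following is a minimal sketch of the difference, using a made-up two-row sample held in memory rather than the real file (the `sample`, `wrong`, and `right` names are illustrative only):

```python
import io
import pandas as pd

# Hypothetical two-row sample in a semicolon-separated layout
sample = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;3.20;5\n"

# With the default comma delimiter, each row is parsed as a single column
wrong = pd.read_csv(io.StringIO(sample))

# With delimiter=";", the three columns are split as intended
right = pd.read_csv(io.StringIO(sample), delimiter=";")

print(wrong.shape)
print(right.shape)
```

Checking `.shape` or `.columns` right after loading is a cheap way to catch a wrong delimiter before it causes confusing errors later on.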


In [2]:
df_white = pd.read_csv('winequality-white.csv', delimiter=";")
df_white["color"] = 0  # Assign the label 0 to every white wine
df_white.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,color
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6,0
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6,0
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6,0
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6,0
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6,0


In [7]:
# Now combine the two DataFrames
df = pd.concat([df_red, df_white])

# And shuffle the rows so that white and red wines are mixed
df = df.sample(frac=1).reset_index(drop=True)
df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,color
0,6.5,0.29,0.31,1.7,0.035,24.0,79.0,0.99053,3.27,0.69,11.4,7,0
1,5.6,0.35,0.14,5.0,0.046,48.0,198.0,0.9937,3.3,0.71,10.3,5,0
2,5.3,0.585,0.07,7.1,0.044,34.0,145.0,0.9945,3.34,0.57,9.7,6,0
3,5.8,0.29,0.15,1.1,0.029,12.0,83.0,0.9898,3.3,0.4,11.4,6,0
4,6.3,0.32,0.26,12.0,0.049,63.0,170.0,0.9961,3.14,0.55,9.9,6,0
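
Because `df.sample(frac=1)` draws a new permutation on every run, the rows shown by `df.head()` will differ between executions. The sketch below, built on small hypothetical stand-ins for `df_red` and `df_white`, shows how passing `random_state` makes the shuffle repeatable and how `value_counts()` reveals the class balance:

```python
import pandas as pd

# Tiny hypothetical stand-ins for df_red and df_white
toy_red = pd.DataFrame({"alcohol": [9.4, 9.8], "color": 1})
toy_white = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1], "color": 0})

toy = pd.concat([toy_red, toy_white])

# random_state fixes the permutation, so both shuffles give the same order
shuffled_a = toy.sample(frac=1, random_state=0).reset_index(drop=True)
shuffled_b = toy.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled_a.equals(shuffled_b))

# Count rows per label to see how imbalanced the classes are
print(shuffled_a["color"].value_counts())
```

Knowing the class balance is useful later, since it tells you what accuracy a trivial classifier would already reach.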


In [26]:
# Choose three features to base the prediction on
input_columns = ["citric acid", "residual sugar", "total sulfur dioxide"]
output_columns = ["color"]

# Extract the relevant features into the arrays X and Y
X = df[input_columns].to_numpy()
Y = df[output_columns].to_numpy().ravel()
print("Shape of X:", X.shape)
print("Shape of Y:", Y.shape)
in_features = X.shape[1]


Shape of X: (6497, 3)
Shape of Y: (6497,)
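
The three chosen features live on very different scales: in the rows previewed above, citric acid stays below 1 while total sulfur dioxide ranges from tens to hundreds. Gradient descent on a single neuron usually behaves better when the inputs share a similar scale, so one option is to standardize each column. The following is a minimal sketch, assuming standardization is acceptable for this activity; `X_demo`, `mu`, `sigma`, and `X_scaled` are illustrative names, and the four rows are copied from the `df.head()` preview of the shuffled data:

```python
import numpy as np

# Four rows in the same column order as input_columns
# (citric acid, residual sugar, total sulfur dioxide)
X_demo = np.array([
    [0.31, 1.7, 79.0],
    [0.14, 5.0, 198.0],
    [0.07, 7.1, 145.0],
    [0.15, 1.1, 83.0],
])

# Per-column mean and standard deviation
mu = X_demo.mean(axis=0)
sigma = X_demo.std(axis=0)

# Standardize; the small constant guards against a zero-variance column
X_scaled = (X_demo - mu) / (sigma + 1e-8)

print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

If you apply this to the full `X`, keep `mu` and `sigma` so that any new data can be transformed with the same values.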


In [34]:
# Base class that acts as the parent of the classification and regression models
class SingleNeuronModel():
    def __init__(self, in_features):
        # Initialize the weights with small, normally distributed values
        self.w = 0.01 * np.random.randn(in_features)
        self.w_0 = 0.01 * np.random.randn()
        self.non_zero_tolerance = 1e-8 # add this to divisions to ensure we don't divide by 0


    def forward(self, x):
        # Compute and store the pre-activation z
        self.z = x @ self.w.T + self.w_0

        # Apply the activation function and return the result
        self.a = self.activation(self.z)
        return self.a


    # Update the weights using the gradients and the learning rate
    def update(self, grad_w, grad_w_0, learning_rate):
        self.w -= grad_w * learning_rate
        self.w_0 -= grad_w_0 * learning_rate

# New implementation! Single-neuron classification model
class SingleNeuronClassificationModel(SingleNeuronModel):
    # Sigmoid activation function for classification
    def activation(self, z):
        return 1 / (1 + np.exp(-z) + self.non_zero_tolerance)

    # Gradient of the NLL loss with respect to the weights and the bias,
    # given errors = predictions - labels (sigmoid activation)
    def gradient(self, x, errors):
        grad_w = x.T @ errors / len(x)
        grad_w_0 = np.mean(errors)
        return grad_w, grad_w_0

def train_model_NLL_loss(model, input_data, labels, learning_rate, epochs):
    for epoch in range(epochs):
        # Forward pass
        predictions = model.forward(input_data)
        # Compute the loss
        loss = -np.mean(labels * np.log(predictions) + (1 - labels) * np.log(1 - predictions))
        
        # Backward pass and weight update
        # For a sigmoid output with NLL loss, dLoss/dz is predictions - labels
        errors = predictions - labels
        grad_w, grad_w_0 = model.gradient(input_data, errors)
        model.update(grad_w, grad_w_0, learning_rate)

        if (epoch+1) % 20 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss:.4f}')

# Evaluate the model
def evaluate_classification_accuracy(model, input_data, labels):
    correct = 0
    num_samples = len(input_data)
    for i in range(num_samples):
        x = input_data[i,...]
        y = labels[i]
        y_predicted = model.forward(x)
        label_predicted = 1 if y_predicted > 0.5 else 0
        if label_predicted == y:
            correct += 1
    accuracy = correct / num_samples
    print("Our model predicted", correct, "out of", num_samples,
          "correctly for", accuracy*100, "% accuracy")
    return accuracy



# Training parameters
learning_rate = 0.01
epochs = 1000

# Initialize and train the model
model = SingleNeuronClassificationModel(in_features=X.shape[1])
print("This is the model:", model)
print("This is X:", X)
print("This is Y:", Y)
train_model_NLL_loss(model, X, Y, learning_rate, epochs)

This is the model: <__main__.SingleNeuronClassificationModel object at 0x0000011A287D5490>
This is X: [[3.100e-01 1.700e+00 7.900e+01]
 [1.400e-01 5.000e+00 1.980e+02]
 [7.000e-02 7.100e+00 1.450e+02]
 ...
 [2.200e-01 1.815e+01 1.390e+02]
 [2.800e-01 1.000e+00 1.000e+02]
 [2.300e-01 2.000e+00 1.340e+02]]
This is Y: [0 0 0 ... 0 0 0]
Epoch [20/1000], Loss: 4.7066
Epoch [40/1000], Loss: 4.7016
Epoch [60/1000], Loss: 4.6973
Epoch [80/1000], Loss: 4.6925
Epoch [100/1000], Loss: 4.6870
Epoch [120/1000], Loss: 4.6810
Epoch [140/1000], Loss: 4.6744
Epoch [160/1000], Loss: 4.6673
Epoch [180/1000], Loss: 4.6597
Epoch [200/1000], Loss: 4.6514
Epoch [220/1000], Loss: 4.6427
Epoch [240/1000], Loss: 4.6332
Epoch [260/1000], Loss: 4.6232
Epoch [280/1000], Loss: 4.6127
Epoch [300/1000], Loss: 4.6019
Epoch [320/1000], Loss: 4.5906
Epoch [340/1000], Loss: 4.5789
Epoch [360/1000], Loss: 4.5667
Epoch [380/1000], Loss: 4.5540
Epoch [400/1000], Loss: 4.5408
Epoch [420/1000], Loss: 4.5269
Epoch [440/1000]
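
The loss in `train_model_NLL_loss` combines `np.log(predictions)` with `np.log(1 - predictions)`, so a prediction that saturates at exactly 0 or 1 would produce an infinite or undefined value, and `np.exp(-z)` can overflow for very negative `z`. The following is a minimal sketch of one common safeguard, assuming it is acceptable to clip probabilities to a small interval. The names `stable_sigmoid`, `nll_loss`, and `eps` are illustrative and are not part of the notebook code:

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in np.exp for large |z|."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so this branch cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite as exp(z) / (1 + exp(z)), where exp(z) < 1
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def nll_loss(predictions, labels, eps=1e-12):
    """Mean negative log-likelihood with probabilities clipped to [eps, 1 - eps]."""
    p = np.clip(predictions, eps, 1.0 - eps)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# Extreme pre-activations that would saturate a naive implementation
z = np.array([-1000.0, 0.0, 1000.0])
p = stable_sigmoid(z)
print(p)
print(nll_loss(p, np.array([0, 1, 1])))
```

Clipping changes the reported loss only for predictions that are already extremely close to 0 or 1; it does not fix the underlying saturation, which is more directly addressed by rescaling the inputs.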

In [33]:
# Evaluate the model's accuracy
accuracy = evaluate_classification_accuracy(model, X, Y)
print("Model trained", accuracy)

Our model predicted 4898 out of 6497 correctly for 75.38864091118978 % accuracy
Model trained 0.7538864091118977
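
A useful reference point for this accuracy is the score of a constant predictor. The 4898 correct predictions reported above equal the number of white-wine rows in the UCI dataset (4898 white and 1599 red, 6497 in total), which is the score a model would get by labeling every sample 0. The following is a minimal sketch for computing that baseline, using a hypothetical helper `majority_baseline_accuracy` and a synthetic label array with the same class counts:

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    return counts.max() / len(labels)

# Synthetic labels with the dataset's class counts: 4898 white (0), 1599 red (1)
y_demo = np.concatenate([np.zeros(4898, dtype=int), np.ones(1599, dtype=int)])
print(majority_baseline_accuracy(y_demo))
```

If a trained model only matches this baseline, it is worth counting how many samples it actually assigns to class 1 before trusting the accuracy figure.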
