## 1. Perceptron

<img src="./img/perceptron-6168423.jpg" alt="drawing" width="650"/>


Perceptrons are a type of artificial neural network that are **used primarily for binary classification.** Some key uses and applications of perceptrons include:

**Linear binary classification:** Perceptrons can classify data that is linearly separable into two classes (0 or 1, yes or no, etc.). The output is based on a linear prediction function.

**Pattern recognition:** Perceptrons can be trained to recognize simple patterns in data, for example shapes or characters in small images.

**Function approximation:** Perceptrons can approximate simple functions that map inputs to outputs, provided the relationship can be captured by a linear decision boundary.

**Logical operators:** Perceptrons can be configured to model logical operators such as AND, OR and NOT, which is useful for implementing simple logic circuits. They cannot model XOR, whose two classes are not linearly separable.

**Single layer neural networks**: A single perceptron produces one binary output. Placing several perceptrons side by side in one layer lets the network produce several outputs (for example, one per class), although each decision boundary is still linear; non-linear functions require stacking layers.

**Activation function:** The perceptron algorithm applies an activation function like a step function to the weighted sum of inputs to generate an output.
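The step activation and the weight-update rule can be sketched from scratch in NumPy. This is a minimal illustration of the classic perceptron learning rule (`w <- w + lr * (target - prediction) * x`), not scikit-learn's implementation; the function names, learning rate and epoch count are illustrative choices. It learns the AND operator mentioned above.

```python
import numpy as np

def predict(X, w, b):
    # Weighted sum of the inputs followed by a step activation (1 if >= 0, else 0)
    return np.where(X @ w + b >= 0, 1, 0)

def train_perceptron(X, y, lr=1.0, epochs=10):
    # Start from zero weights and bias
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b >= 0 else 0
            # Weights only change when the prediction is wrong
            update = lr * (target - pred)
            w += update * xi
            b += update
    return w, b

# Truth table for AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y_and)
print(predict(X, w, b))  # -> [0 0 0 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop stops making mistakes after a finite number of updates.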

Perceptrons excel at **basic linear classification and pattern recognition tasks, but are limited in their capabilities compared to multilayer neural networks**. Their simple linear structure makes them easy to understand and implement.
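The XOR function is one way to see this limitation. The sketch below, which assumes scikit-learn is installed, fits a single perceptron to the four XOR points (it can get at most three of them right, since no straight line separates the classes). It then evaluates a two-layer network whose weights are set by hand, not learned, to compute XOR as AND(OR, NAND).

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

# A single perceptron draws one straight line, so it cannot classify all four points
single = Perceptron(max_iter=100, random_state=0).fit(X, y_xor)
print("single perceptron accuracy:", single.score(X, y_xor))

def step(z):
    # Step activation applied element-wise
    return (z >= 0).astype(int)

# Hand-set (illustrative) weights: hidden unit 1 computes OR, hidden unit 2 computes NAND
W_hidden = np.array([[1, -1],
                     [1, -1]])  # each column holds one hidden unit's weights
b_hidden = np.array([-0.5, 1.5])

# Output unit computes AND of the two hidden units
w_out = np.array([1, 1])
b_out = -1.5

hidden = step(X @ W_hidden + b_hidden)
output = step(hidden @ w_out + b_out)
print("two-layer output:", output)  # -> [0 1 1 0]
```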





In [1]:
import numpy as np
import pandas as pd
import seaborn as sns

Load the data. We will use seaborn's penguins dataset.

In [3]:
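import ssl  # work around SSL certificate errors when downloading the dataset
# Note: this disables HTTPS certificate verification for the whole session; use it only if the download fails
ssl._create_default_https_context = ssl._create_unverified_context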

In [4]:
# Load and display the first 5 rows of the penguins dataset.

df = sns.load_dataset("penguins")
df.head()

|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB


In [6]:
df = sns.load_dataset("penguins")  # Reload the penguins dataset from seaborn

# Feature engineering

# Remove any rows that contain missing values.
df.dropna(inplace=True)

# Map each categorical label to a numeric code
# (label encoding for the target 'species' and the binary 'sex' column).
cleanup_nums = {"species": {"Adelie": 0,
                            "Chinstrap": 1,
                            "Gentoo": 2},
                "sex": {"Male": 0,
                        "Female": 1}}
df.replace(cleanup_nums, inplace=True)  # replace each label with its numeric code

# One-hot encode the remaining categorical column ('island'):
# each island becomes its own 0/1 column that can be used as an input.
df = pd.get_dummies(df)

df.head()

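|   | species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | island_Biscoe | island_Dream | island_Torgersen |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 39.1 | 18.7 | 181.0 | 3750.0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 39.5 | 17.4 | 186.0 | 3800.0 | 1 | 0 | 0 | 1 |
| 2 | 0 | 40.3 | 18.0 | 195.0 | 3250.0 | 1 | 0 | 0 | 1 |
| 4 | 0 | 36.7 | 19.3 | 193.0 | 3450.0 | 1 | 0 | 0 | 1 |
| 5 | 0 | 39.3 | 20.6 | 190.0 | 3650.0 | 0 | 0 | 0 | 1 |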
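To make the one-hot step concrete, here is a small sketch on a made-up frame that reuses the island names. Passing `dtype=int` is an assumption added for readability: recent pandas versions return boolean dummy columns by default rather than the `uint8` columns shown in the `df.info()` output below.

```python
import pandas as pd

# Toy rows (illustrative only): island labels as in the penguins data,
# plus a 'sex' column that has already been label-encoded to 0/1
toy = pd.DataFrame({"island": ["Torgersen", "Biscoe", "Dream", "Biscoe"],
                    "sex": [0, 1, 1, 0]})

# Only object/category columns are expanded; numeric columns pass through unchanged
encoded = pd.get_dummies(toy, dtype=int)
print(encoded)
print(list(encoded.columns))
# -> ['sex', 'island_Biscoe', 'island_Dream', 'island_Torgersen']
```

The untouched columns come first and the dummy columns are appended in sorted category order, which matches the column layout of the encoded penguins table.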


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 333 entries, 0 to 343
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            333 non-null    int64  
 1   bill_length_mm     333 non-null    float64
 2   bill_depth_mm      333 non-null    float64
 3   flipper_length_mm  333 non-null    float64
 4   body_mass_g        333 non-null    float64
 5   sex                333 non-null    int64  
 6   island_Biscoe      333 non-null    uint8  
 7   island_Dream       333 non-null    uint8  
 8   island_Torgersen   333 non-null    uint8  
dtypes: float64(4), int64(2), uint8(3)
memory usage: 19.2 KB


In [8]:
df.describe()

|       | species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | island_Biscoe | island_Dream | island_Torgersen |
|---|---|---|---|---|---|---|---|---|---|
| count | 333.0 | 333.0 | 333.0 | 333.0 | 333.0 | 333.0 | 333.0 | 333.0 | 333.0 |
| mean | 0.918919 | 43.992793 | 17.164865 | 200.966967 | 4207.057057 | 0.495495 | 0.489489 | 0.369369 | 0.141141 |
| std | 0.889718 | 5.468668 | 1.969235 | 14.015765 | 805.215802 | 0.500732 | 0.500642 | 0.48336 | 0.348691 |
| min | 0.0 | 32.1 | 13.1 | 172.0 | 2700.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 39.5 | 15.6 | 190.0 | 3550.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 44.5 | 17.3 | 197.0 | 4050.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 2.0 | 48.6 | 18.7 | 213.0 | 4775.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| max | 2.0 | 59.6 | 21.5 | 231.0 | 6300.0 | 1.0 | 1.0 | 1.0 | 1.0 |


* There are 333 samples after dropping rows with missing values, a modest amount of data that is workable for the simple models used here.

* The **'species' column shows the samples are not evenly distributed between the 3 classes** (Adelie, Chinstrap, Gentoo), with Adelie being the most common. This class imbalance is something we'll need to consider when training and evaluating models.

* **'bill_length_mm' (32.1 to 59.6 mm) and 'bill_depth_mm' (13.1 to 21.5 mm) span wide ranges, which suggests, though does not prove, that these features may help distinguish between penguin species.**

* **'flipper_length_mm' (172 to 231 mm) could also be a useful distinguishing feature based on its range of values.**

* 'body_mass_g' has a large range (2700 to 6300 g) and a high standard deviation (about 805 g). **Spread alone does not tell us whether it separates the species, but its scale is far larger than that of the other features, which matters for scale-sensitive models such as the perceptron.**

* The 'sex' column is almost evenly split between 0 and 1 (mean 0.495), so the two values are balanced.

* *About 49% of the samples come from Biscoe, 37% from Dream and only 14% from Torgersen. Torgersen is underrepresented, but how often an island appears does not by itself tell us how useful the island features are.*
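Claims about which features separate the species are easiest to check with per-class statistics. The sketch below uses scikit-learn's bundled iris dataset as an offline stand-in for the penguins data; on the encoded penguins DataFrame the equivalent call would be `df.groupby("species").agg(["mean", "std"])`.

```python
from sklearn.datasets import load_iris

# Stand-in dataset: iris ships with scikit-learn, so no download is needed
iris = load_iris(as_frame=True)
frame = iris.frame  # four feature columns plus a 'target' column

# Mean and standard deviation of every feature, computed separately for each class
per_class = frame.groupby("target").agg(["mean", "std"])
print(per_class.round(2))
```

Features whose class means differ by several within-class standard deviations are the strongest candidates for separating the classes.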



Split into training and test sets.

In [24]:
from sklearn.model_selection import train_test_split

# Train / Test

X = df.iloc[:, 1:]  # all feature columns; equivalent to df.drop('species', axis=1)
y = df.iloc[:, 0]   # target column; equivalent to df['species']


X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

In [25]:
y.value_counts()  # count the number of samples in each class (species)

0    146
2    119
1     68
Name: species, dtype: int64
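Given this imbalance, one option (not used in the split above) is to pass `stratify=y` to `train_test_split` so that both splits keep the same class proportions. The sketch uses placeholder labels with the same counts as above (146 / 68 / 119) and a dummy feature column; it is an illustration, not part of the original pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder labels with the same class counts as the penguins target
y_demo = np.repeat([0, 1, 2], [146, 68, 119])
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # dummy feature, only needed for the API

_, _, y_tr, y_te = train_test_split(X_demo, y_demo,
                                    test_size=0.2,
                                    random_state=42,
                                    stratify=y_demo)

# Class proportions in the full data, the training split and the test split
print(np.bincount(y_demo) / len(y_demo))
print(np.bincount(y_tr) / len(y_tr))
print(np.bincount(y_te) / len(y_te))
```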

In [11]:
# Check the shapes of the training and test splits
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(266, 8)
(67, 8)
(266,)
(67,)


Let's try a Perceptron.

In [26]:
from sklearn.linear_model import Perceptron
"""
Perceptron classifier model from scikit-learn.
"""

"""
Fits a Perceptron classifier model on the training data, 
scores it on the test data, and returns the accuracy score.

Parameters:
X_train (array-like): Training data
y_train (array-like): Training labels
X_test (array-like): Test data
y_test (array-like): Test labels  

Returns:
Accuracy score of the model on the test data
"""

per_clf = Perceptron(random_state=1) # reduce variance =1, considered good practice
per_clf.fit(X_train, y_train)
per_clf.score(X_test, y_test)

0.19402985074626866

**Score 0.19402985074626866**

* Accuracy is the fraction of test samples the model classified correctly. It is computed as:

* **Accuracy** = Number of correct predictions / Total number of predictions

* With an accuracy of 0.19402985074626866, the model made **correct predictions on 19.40% of the test set samples** (13 of 67).

* This is worse than a trivial baseline: 31 of the 67 test samples are Adelie (see the confusion matrix at the end of this section), so always predicting Adelie would score about 0.46. The model has substantial room for improvement.

* Some key things to note about accuracy as an evaluation metric:
    * Higher is better, with 1.0 meaning 100% correct classification.
    * It can be misleading for imbalanced datasets; precision and recall may be more informative.
    * It only measures whether predictions are correct, not how confident the model is.
    * It is useful as a baseline measure of performance, but other metrics are also important.
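The accuracy formula and the caveat about imbalanced data can be illustrated with a few made-up labels. In this sketch a model that always predicts the majority class still reaches 60% accuracy, while balanced accuracy (the mean of the per-class recalls) exposes it.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Toy labels (illustrative only): class 0 dominates
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.zeros_like(y_true)  # a "model" that always predicts class 0

# Accuracy = correct predictions / total predictions
manual = np.mean(y_true == y_pred)
print(manual)                                     # -> 0.6
print(accuracy_score(y_true, y_pred))             # -> 0.6
print(balanced_accuracy_score(y_true, y_pred))    # recall (1 + 0 + 0) / 3
```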





In [13]:
from sklearn.linear_model import LogisticRegression

# Logistic regression model

log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)

#  Score
log_reg.score(X_test, y_test)

0.9850746268656716

* The logistic regression model performs very well, with an accuracy of about 98% on the test data, a large improvement over the previous perceptron model.

* A score of 0.985 means the model **correctly classified 66 of the 67 penguins in the test set across the 3 species classes.**

* Both models are linear classifiers, and scikit-learn's Perceptron handles the three classes with a one-vs-rest scheme, so the gap is not because one model is binary. A likely factor is that the perceptron's updates are very sensitive to the unscaled features (body_mass_g is in the thousands while most other features are in the tens); the standardization experiments later in this notebook support this.

* Logistic regression is **a linear model**: its class scores are weighted sums of the input features, so it is not capturing non-linear relationships. Its success suggests the species are close to linearly separable in these features.

* LogisticRegression applies L2 regularization by default (C=1.0), which may also help it generalize to the test data.
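To check that logistic regression is linear, the sketch below (using the iris dataset as a stand-in) recomputes the class scores as `X @ coef_.T + intercept_` and compares them with `decision_function`; the predicted class is simply the one with the highest score.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in data: 3 classes, 4 numeric features
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One weight vector and one intercept per class: the scores are linear in X
scores_manual = X @ clf.coef_.T + clf.intercept_
print(clf.coef_.shape)                                        # -> (3, 4)
print(np.allclose(scores_manual, clf.decision_function(X)))  # -> True

# Prediction = class with the largest linear score
print(np.array_equal(clf.predict(X), clf.classes_[scores_manual.argmax(axis=1)]))  # -> True
```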



Let's try standardizing.

It seems the perceptron on its own is fairly useless; we will need to try more complex configurations.

## 2. Multi Layer Perceptron

In [27]:
from sklearn.neural_network import MLPClassifier
#from sklearn.neural_network import MLPRegressor

mlp = MLPClassifier(random_state=42)
mlp.fit(X_train, y_train)

mlp.score(X_test, y_test)

0.43283582089552236

**Test accuracy reported for the Multilayer Perceptron model on this dataset is 0.43283582089552236.**



* The MLP outperforms the original single perceptron (43% vs 19% accuracy) but falls far short of logistic regression (98%).

* It does not even match what a linear model achieves on the same features.

* Tuning and optimizing a neural network on this small, relatively simple dataset can be challenging, and with default settings the network may not train well on unscaled inputs.

* Without the training accuracy we cannot tell whether the problem is overfitting or poor optimization; the standardization results later in this notebook suggest feature scaling is a major factor.



Let's try another configuration. The network architecture can be defined directly through the parameters of MLPClassifier() (for example, hidden_layer_sizes).

In [15]:
mlp = MLPClassifier(max_iter=500,
                   activation='tanh',
                   hidden_layer_sizes = (150, 150, 150),
                   random_state=42)

mlp.fit(X_train, y_train)

mlp.score(X_test, y_test)

0.34328358208955223

The changes to the MLP (three hidden layers of 150 units, tanh activation, more iterations) did not improve test performance.

Accuracy actually dropped, from about 43% to about 34%.

The increased depth and non-linearity did not help the model generalize to the test data.

The additional complexity may have made overfitting worse, or made the network harder to optimize on unscaled inputs.

This dataset and problem may be too small and simple for a deep neural network to provide any benefit.
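Comparing training and test accuracy is a simple way to tell overfitting (high training accuracy, much lower test accuracy) from underfitting or poor optimization (both low). The sketch below applies this check to an MLP with the same architecture as above, using iris as a stand-in dataset; the stratified split and random seeds are assumptions, and the printed numbers will depend on the data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in dataset and a stratified 80/20 split
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42, stratify=y)

# Same architecture as the notebook's second MLP
mlp = MLPClassifier(hidden_layer_sizes=(150, 150, 150), activation="tanh",
                    max_iter=500, random_state=42)
mlp.fit(X_tr, y_tr)

train_acc = mlp.score(X_tr, y_tr)
test_acc = mlp.score(X_te, y_te)
# A large positive gap points to overfitting; two low scores point to underfitting
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```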



These models are trained with gradient-based updates, so they are very sensitive to feature scaling. We standardize the features for the next example.

**Standardization** puts all features on the same scale, so the Perceptron can weigh them appropriately.

It likely **helps training converge without a single large-scale feature (such as body_mass_g) dominating the weight updates.**

**The unscaled perceptron's poor score (19%) looks more like a training problem than overfitting**, and standardization addresses exactly that.

*The test accuracy of 1.0, matching the training accuracy, suggests the model generalizes well on this split, although 67 test samples make this a noisy estimate.*



In [16]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)
X_train_s = sc.transform(X_train)
X_test_s = sc.transform(X_test)

# Evaluate perceptron after scaling

per_clf = Perceptron()
per_clf.fit(X_train_s, y_train)
print(per_clf.score(X_train_s, y_train))
print(per_clf.score(X_test_s, y_test))

1.0
1.0


**Standardize the input features before training a Perceptron model.** Standardization can improve model performance in several ways:

* It centers the data to have a mean of 0 and scales it to have a unit standard deviation. This puts all features on the same scale.

* It helps the model converge faster during training. Features on different scales can slow learning.

* It prevents features with larger ranges dominating those with smaller ranges.

* It does not regularize the model, so any effect on overfitting is indirect; its main benefit is better-behaved optimization.
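Standardization computes z = (x - mean) / std for each feature, using the training-set mean and the population standard deviation (ddof=0). The sketch below checks this against `StandardScaler` on the bill_length_mm and body_mass_g values of four rows from the `head()` output earlier in this notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# bill_length_mm and body_mass_g for rows 0, 1, 2 and 4 of the head() output
X_tr = np.array([[39.1, 3750.0],
                 [39.5, 3800.0],
                 [40.3, 3250.0],
                 [36.7, 3450.0]])

scaler = StandardScaler().fit(X_tr)
scaled = scaler.transform(X_tr)

# Manual version: subtract the column mean, divide by the population std (ddof=0)
manual = (X_tr - X_tr.mean(axis=0)) / X_tr.std(axis=0)
print(np.allclose(manual, scaled))  # -> True

# After scaling, both features have mean 0 and standard deviation 1
print(scaled.mean(axis=0), scaled.std(axis=0))
```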


**Fits StandardScaler()** to the training data and uses it to transform both train and test sets.

* Trains a Perceptron model on the scaled training data.

* Evaluates on scaled test data.

* **Right approach** -  scale using statistics calculated only on the training set, then apply the same scaling to test data.

* Fitting the scaler on the training set alone prevents information from the test set leaking into preprocessing, so the test score remains an honest estimate of real-world performance. The scaling step is simple to implement and can often noticeably improve results.
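A scikit-learn `Pipeline` is one way to enforce the train-only scaling rule automatically, including inside cross-validation, where the scaler is refit on each training fold. The sketch uses iris as a stand-in dataset; on the penguins data you would pass `X` and `y` instead.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler is fit only on the training part of each fold, then applied to the held-out part
pipe = make_pipeline(StandardScaler(), Perceptron(random_state=1))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores, scores.mean())
```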





In [28]:
# Evaluate logistic regression on the standardized features
log_reg = LogisticRegression(max_iter=500)
log_reg.fit(X_train_s, y_train)


print(log_reg.score(X_train_s, y_train))
print(log_reg.score(X_test_s, y_test))

1.0
1.0


In [18]:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train_scal = scaler.transform(X_train)
X_test_scal = scaler.transform(X_test)

mlp = MLPClassifier(max_iter=500)
mlp.fit(X_train_scal, y_train)
print(mlp.score(X_train_scal, y_train))
print(mlp.score(X_test_scal, y_test))

1.0
1.0


**Why re-evaluate the Multilayer Perceptron (MLP) model on standardized data:**

**MLPs can be sensitive to feature scales**: Like many machine learning models, MLPs perform best when all features are on a similar scale. Standardization helps address this.

**Avoid large feature dominance**: Scaling prevents features with large original ranges from dominating the MLP model training.

**Faster convergence**: Standardized features can help the MLP converge much faster during training.

**Generalization**: Scaling is not a regularizer, so any reduction in overfitting is indirect; the main gain is more stable training.

**Fair evaluation**: Comparing MLP performance on standardized data to the other models (like logistic regression) is a more fair and direct comparison.

**Best practice**: Preprocessing data and re-evaluating models is considered a good machine learning practice to quantify impact.

**Confirm previous results**: We can verify whether standardization helps improve the MLP model on this dataset, compared to previous results without scaling.



In [19]:
from sklearn.metrics import confusion_matrix

# Visualizing the performance of the classification model

confusion_matrix(y_test, mlp.predict(X_test_scal))

array([[31,  0,  0],
       [ 0, 13,  0],
       [ 0,  0, 23]])

The diagonal entries show the number of correct predictions for each class, while the off-diagonal entries show the errors between classes.


**Adelie (0)** had 31 correct predictions, 0 misclassified as Chinstrap, and 0 misclassified as Gentoo<br>
**Chinstrap (1)** had 0 misclassified as Adelie, 13 correct predictions, and 0 misclassified as Gentoo<br>
**Gentoo (2)** had 0 misclassified as Adelie, 0 misclassified as Chinstrap, and 23 correct predictions<br>

* Every prediction lies on the diagonal, so the model classified all 67 test samples correctly with no confusion between classes. Summing a row gives the number of true samples in that class; summing a column gives how many samples were predicted as that class (the two coincide here because there are no errors).
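Per-class metrics can be read directly off the confusion matrix. This sketch recomputes recall, precision and accuracy with NumPy from the matrix printed above (rows are true classes and columns are predicted classes, as in `sklearn.metrics.confusion_matrix`).

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
cm = np.array([[31, 0, 0],
               [0, 13, 0],
               [0, 0, 23]])

support = cm.sum(axis=1)                     # true samples per class -> [31 13 23]
recall = np.diag(cm) / support               # correct / true count per class
precision = np.diag(cm) / cm.sum(axis=0)     # correct / predicted count per class
accuracy = np.trace(cm) / cm.sum()           # all correct / all samples

print(support, recall, precision, accuracy)  # -> [31 13 23] [1. 1. 1.] [1. 1. 1.] 1.0
```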