Load the MNIST dataset and split it into a training set, a validation set, and a test set (for example, use 50,000 instances for training, 10,000 for validation, and 10,000 for testing).

In [1]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier, BaggingClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

In [2]:
from sklearn.datasets import fetch_openml

X_mnist, y_mnist = fetch_openml('mnist_784', return_X_y=True, as_frame=False, parser='auto')

In [3]:
from sklearn.model_selection import train_test_split

# X_mnist holds the 784 pixel features of each image and y_mnist holds the digit labels.

# First split: 80% training and 20% test (56,000 / 14,000 rows).
X_train, X_test, y_train, y_test = train_test_split(X_mnist, y_mnist, test_size=0.2, random_state=42)

# Second split: the training portion is divided again into 80% training and 20% validation
# (44,800 / 11,200 rows), so the sizes differ from the 50,000 / 10,000 / 10,000 suggested in the prompt.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
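
If the exact 50,000 / 10,000 / 10,000 sizes from the prompt are wanted, one option is to pass an integer `test_size`, which `train_test_split` interprets as a row count. The sketch below uses placeholder arrays with MNIST's row count so that it runs without downloading anything; the `*_demo` names are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with MNIST's row count (70,000); the values are placeholders.
X_demo = np.zeros((70_000, 4), dtype=np.float32)
y_demo = np.arange(70_000) % 10

# An integer test_size requests an absolute number of rows instead of a fraction.
X_rest, X_test_demo, y_rest, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=10_000, random_state=42
)
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_rest, y_rest, test_size=10_000, random_state=42
)

print(len(X_train_demo), len(X_val_demo), len(X_test_demo))
```

With the real data, `X_mnist` and `y_mnist` would take the place of the placeholder arrays.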


Train individual classifiers, for example a RandomForest classifier, an SVM classifier, and an MLP.

In [4]:
# Scale pixel intensities from [0, 255] to [0, 1].
X_train = X_train/255
X_val = X_val/255
X_test = X_test/255

# Decision Tree

In [8]:
from sklearn.tree import DecisionTreeClassifier
tree_clf = DecisionTreeClassifier(max_depth=8, random_state=42)
tree_clf.fit(X_train, y_train)

In [9]:
tree_clf.score(X_val, y_val)

0.7966964285714285

In [10]:
from sklearn.metrics import accuracy_score
y_pred = tree_clf.predict(X_val)
accuracy_score(y_val, y_pred)

0.7966964285714285

# Logistic Regression

In [5]:
from sklearn.linear_model import LogisticRegression
lr_clf = LogisticRegression(random_state=42)
lr_clf.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [7]:
lr_clf.score(X_val, y_val)

0.9198214285714286
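
The warning above means that the solver stopped at the default `max_iter=100` before reaching its own convergence criterion. A minimal sketch of the fix the message suggests, raising `max_iter`, is shown below. It uses scikit-learn's small bundled `load_digits` dataset instead of MNIST so that it runs quickly; the value 1000 and the `*_demo` names are assumptions for illustration, not tuned settings.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small 8x8 digits dataset bundled with scikit-learn (no download needed).
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = X_digits / 16.0  # pixel values in this dataset range from 0 to 16

X_tr, X_va, y_tr, y_va = train_test_split(
    X_digits, y_digits, test_size=0.2, random_state=42
)

# Give the solver a larger iteration budget than the default of 100.
lr_demo = LogisticRegression(max_iter=1000, random_state=42)
lr_demo.fit(X_tr, y_tr)

# n_iter_ reports how many iterations the solver actually used.
print("iterations used:", lr_demo.n_iter_, "of", lr_demo.max_iter)
lr_val_acc = lr_demo.score(X_va, y_va)
print("validation accuracy:", lr_val_acc)
```

On MNIST the same idea would be `LogisticRegression(max_iter=..., random_state=42)` with a budget large enough for `n_iter_` to stay below it, at the cost of a longer fit.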

# MLP

In [12]:
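# tol=0.1 is a very loose tolerance: training stops once the loss fails to improve
# by more than 0.1 for 10 consecutive epochs, which happens well before max_iter=30.
mlp_clf = MLPClassifier(hidden_layer_sizes=(100, 100, 100), max_iter=30, alpha=0.0001, solver='sgd', verbose=10, random_state=42, tol=0.1)
mlp_clf.fit(X_train, y_train)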

Iteration 1, loss = 2.04226076
Iteration 2, loss = 1.09728245
Iteration 3, loss = 0.62259790
Iteration 4, loss = 0.47825839
Iteration 5, loss = 0.41017369
Iteration 6, loss = 0.37101587
Iteration 7, loss = 0.34581103
Iteration 8, loss = 0.32617341
Iteration 9, loss = 0.31078895
Iteration 10, loss = 0.29771795
Iteration 11, loss = 0.28711170
Iteration 12, loss = 0.27674763
Iteration 13, loss = 0.26787670
Iteration 14, loss = 0.25981259
Iteration 15, loss = 0.25148638
Training loss did not improve more than tol=0.100000 for 10 consecutive epochs. Stopping.


In [13]:
mlp_clf.score(X_val, y_val)

0.9233928571428571
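
In the log above the loss last improves by more than 0.1 at iteration 4, so with `tol=0.1` the stopping rule fires at iteration 15 even though the loss is still falling. The sketch below illustrates how `tol` controls the number of epochs, using the small `load_digits` dataset and a much smaller network so that it runs in seconds. The helper `epochs_run` and its settings are hypothetical, chosen only for the illustration.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X_digits, y_digits = load_digits(return_X_y=True)
X_digits = X_digits / 16.0  # pixel values in this dataset range from 0 to 16


def epochs_run(tol):
    """Fit a small SGD-trained MLP and return how many epochs it ran."""
    mlp = MLPClassifier(hidden_layer_sizes=(32,), solver="sgd", max_iter=60,
                        tol=tol, random_state=42)
    with warnings.catch_warnings():
        # Reaching max_iter raises a ConvergenceWarning; it is expected here.
        warnings.simplefilter("ignore")
        mlp.fit(X_digits, y_digits)
    return mlp.n_iter_


epochs_loose = epochs_run(tol=0.1)    # same tolerance as the cell above
epochs_tight = epochs_run(tol=1e-4)   # scikit-learn's default tolerance
print("epochs with tol=0.1:", epochs_loose)
print("epochs with tol=1e-4:", epochs_tight)
```

Because both runs share the same seed, the looser tolerance can only stop at the same epoch or earlier, so a tighter `tol` (or a larger `n_iter_no_change`) is one way to let the network keep training.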

Next, try combining them into an ensemble that outperforms every individual classifier on the validation set, using hard and soft majority voting. Once you have found one, evaluate it on the test set.

# Voting

In [15]:
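# VotingClassifier clones and refits each estimator, so the training output below repeats.
voting_clf = VotingClassifier(
    estimators=[
        ('tree', tree_clf),  # decision tree (previously mislabeled 'rf')
        ('lr', lr_clf),
        ('mlp', mlp_clf)
    ],
    voting='hard'
)
voting_clf.fit(X_train, y_train)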

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Iteration 1, loss = 2.04226076
Iteration 2, loss = 1.09728245
Iteration 3, loss = 0.62259790
Iteration 4, loss = 0.47825839
Iteration 5, loss = 0.41017369
Iteration 6, loss = 0.37101587
Iteration 7, loss = 0.34581103
Iteration 8, loss = 0.32617341
Iteration 9, loss = 0.31078895
Iteration 10, loss = 0.29771795
Iteration 11, loss = 0.28711170
Iteration 12, loss = 0.27674763
Iteration 13, loss = 0.26787670
Iteration 14, loss = 0.25981259
Iteration 15, loss = 0.25148638
Training loss did not improve more than tol=0.100000 for 10 consecutive epochs. Stopping.


In [16]:
voting_clf.score(X_val, y_val)

0.9246428571428571
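
The prompt also asks for soft voting and for a final check on the test set, neither of which appears in the cells above. A minimal sketch of both steps follows, again on the small `load_digits` dataset rather than MNIST. The estimator settings and the `make_estimators` helper are assumptions for illustration. Soft voting averages `predict_proba` outputs, so every estimator in the list must provide that method.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# 60% training, 20% validation, 20% test.
X_trval, X_te, y_trval, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X_trval, y_trval, test_size=0.25, random_state=42)


def make_estimators():
    """Return fresh, unfitted base estimators (hypothetical settings)."""
    return [
        ("tree", DecisionTreeClassifier(max_depth=8, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000, random_state=42)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)),
    ]


fitted, val_scores = {}, {}
for mode in ("hard", "soft"):
    clf = VotingClassifier(estimators=make_estimators(), voting=mode)
    clf.fit(X_tr, y_tr)
    fitted[mode] = clf
    val_scores[mode] = clf.score(X_va, y_va)

# Choose the voting mode on the validation set, then touch the test set once.
best_mode = max(val_scores, key=val_scores.get)
test_score = fitted[best_mode].score(X_te, y_te)
print(val_scores, best_mode, test_score)
```

Selecting on validation accuracy and reporting the test score only for the chosen ensemble keeps the test estimate from being biased by the model choice.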

Try other ensembles using Bagging, Boosting, and Stacking. (Stacking uses cross-validation internally, so it is better to split the data into training and test sets only.)

# Stacking

In [22]:
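from sklearn.ensemble import StackingClassifier

# The base estimators are cloned and refit; their cross-validated predictions (cv=5)
# become the training features of the final estimator.
stacking_clf = StackingClassifier(
    estimators=[
        ('lr', lr_clf),
        ('tree', tree_clf),  # decision tree (previously mislabeled 'rf')
        ('mlp', mlp_clf)
    ],
    final_estimator=RandomForestClassifier(random_state=43),
    cv=5
)
stacking_clf.fit(X_train, y_train)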

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Iteration 1, loss = 2.04226076
Iteration 2, loss = 1.09728245
Iteration 3, loss = 0.62259790
Iteration 4, loss = 0.47825839
Iteration 5, loss = 0.41017369
Iteration 6, loss = 0.37101587
Iteration 7, loss = 0.34581103
Iteration 8, loss = 0.32617341
Iteration 9, loss = 0.31078895
Iteration 10, loss = 0.29771795
Iteration 11, loss = 0.28711170
Iteration 12, loss = 0.27674763
Iteration 13, loss = 0.26787670
Iteration 14, loss = 0.25981259
Iteration 15, loss = 0.25148638
Training loss did not improve more than tol=0.100000 for 10 consecutive epochs. Stopping.



Iteration 1, loss = 2.11969301
Iteration 2, loss = 1.37127929
Iteration 3, loss = 0.76492963
Iteration 4, loss = 0.56082270
Iteration 5, loss = 0.46698608
Iteration 6, loss = 0.41282786
Iteration 7, loss = 0.37898836
Iteration 8, loss = 0.35489831
Iteration 9, loss = 0.33711998
Iteration 10, loss = 0.32152381
Iteration 11, loss = 0.30912225
Iteration 12, loss = 0.29807573
Iteration 13, loss = 0.28880737
Iteration 14, loss = 0.28051976
Iteration 15, loss = 0.27231145
Training loss did not improve more than tol=0.100000 for 10 consecutive epochs. Stopping.
Iteration 1, loss = 2.11922489
Iteration 2, loss = 1.37425291
Iteration 3, loss = 0.76636832
Iteration 4, loss = 0.55849831
Iteration 5, loss = 0.46422937
Iteration 6, loss = 0.41096946
Iteration 7, loss = 0.37752144
Iteration 8, loss = 0.35402320
Iteration 9, loss = 0.33664908
Iteration 10, loss = 0.32136975
Iteration 11, loss = 0.30886900
Iteration 12, loss = 0.29860242
Iteration 13, loss = 0.28990658
Iteration 14, loss = 0.28163181


In [23]:
stacking_clf.score(X_val, y_val)

0.9391071428571428
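
The prompt notes that stacking already cross-validates internally, so a separate validation set is not required. A minimal sketch of that setup is shown below: the model is fit on the combined training and validation rows and scored on the test set. It uses the small `load_digits` dataset, only two base estimators, and a logistic-regression final estimator to stay fast; all of these choices are assumptions and differ from the cell above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Only two parts: everything that is not test data is used for fitting.
X_trval, X_te, y_trval, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

stack_demo = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=8, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000, random_state=42)),
    ],
    # The final estimator is trained on out-of-fold predictions of the base models.
    final_estimator=LogisticRegression(max_iter=1000, random_state=42),
    cv=5,
)
stack_demo.fit(X_trval, y_trval)

stack_test_acc = stack_demo.score(X_te, y_te)
print("test accuracy:", stack_test_acc)
```

In the notebook's terms this would mean fitting on the 56,000 rows left after the first split instead of the 44,800 rows that remain after the validation set is carved out.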

# Boosting

In [26]:
# AdaBoost trains its own sequence of weak learners. With no estimator given,
# scikit-learn uses decision stumps (DecisionTreeClassifier(max_depth=1)).
adaboost_clf = AdaBoostClassifier(
    n_estimators=50,       # maximum number of weak learners trained sequentially
    learning_rate=1.0,     # weight applied to each learner's contribution
    random_state=42
)

# Assigning adaboost_clf.estimators_ before fit() has no effect because fit() rebuilds
# that list, so the previously trained tree, LR, and MLP models cannot be reused this way.
adaboost_clf.fit(X_train, y_train)



In [27]:
adaboost_clf.score(X_val, y_val)

0.7129464285714285
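
The low score is consistent with AdaBoost's default weak learner, a depth-1 decision stump, which is a very limited model for a 10-class problem. One possible variation, sketched below on the small `load_digits` dataset, is to boost slightly deeper trees. The depth of 3 is an assumption for illustration, not a tuned value. The estimator is passed positionally because its keyword name changed between scikit-learn releases (`base_estimator` in older versions, `estimator` in newer ones).

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42)

# Weak learner: a shallow tree, deeper than the default stump (max_depth=1).
weak_learner = DecisionTreeClassifier(max_depth=3, random_state=42)

# First positional argument is the base estimator in both old and new releases.
ada_demo = AdaBoostClassifier(weak_learner, n_estimators=50, random_state=42)
ada_demo.fit(X_tr, y_tr)

ada_val_acc = ada_demo.score(X_va, y_va)
print("fitted weak learners:", len(ada_demo.estimators_))
print("validation accuracy:", ada_val_acc)
```

After fitting, `estimators_` holds the learners that AdaBoost itself trained, which is why it cannot be filled in by hand beforehand.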

# Bagging

In [32]:
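# Bagging trains n_estimators copies of one base model on bootstrap samples. With no
# estimator given, scikit-learn uses a DecisionTreeClassifier.
bag_clf = BaggingClassifier(n_estimators=20,
                            max_samples=100,   # an integer means 100 rows per estimator
                            max_features=1.0,
                            bootstrap=True,
                            n_jobs=-1,
                            random_state=0)

# Assigning bag_clf.estimators_ before fit() has no effect because fit() rebuilds that list.
bag_clf.fit(X_train, y_train)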



In [33]:
bag_clf.score(X_val, y_val)

0.7666964285714286
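
With `max_samples=100` each of the 20 trees sees only 100 of the 44,800 training rows, which probably explains why the bagged score is below that of the single decision tree. The sketch below compares that setting with full-size bootstrap samples on the small `load_digits` dataset. The `bagging_accuracy` helper is hypothetical, and the estimator is passed positionally because its keyword name differs across scikit-learn releases.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42)


def bagging_accuracy(max_samples):
    """Fit 20 bagged decision trees and return their validation accuracy."""
    bag = BaggingClassifier(
        DecisionTreeClassifier(random_state=42),  # base estimator, passed positionally
        n_estimators=20,
        max_samples=max_samples,  # int: row count, float: fraction of the training set
        bootstrap=True,
        random_state=0,
    )
    bag.fit(X_tr, y_tr)
    return bag.score(X_va, y_va)


acc_small = bagging_accuracy(100)  # 100 rows per tree, as in the cell above
acc_full = bagging_accuracy(1.0)   # bootstrap samples as large as the training set
print("max_samples=100:", acc_small)
print("max_samples=1.0:", acc_full)
```

On MNIST, a float such as `max_samples=1.0` or a much larger integer would be the corresponding change to try.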