## Universidad Autónoma de Aguascalientes
## Degree program: Intelligent Computing Engineering
## Course: Intelligent Learning
## Instructor: Dr. Francisco Javier Luna Rosas
## Student: Martin Isai Nunez Villeda
## Semester: January-June 2026

# Practice 7: Neural Networks (scikit-learn MLPClassifier) for Sentiment Analysis

## Step 1: Import the required libraries

In [4]:
import pandas as pd
import numpy as np
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report

## Step 2: Load the dataset and split it into review and sentiment, since both arrive in a single column

In [6]:
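# The file is read with a tab separator, so each row lands in one 'review,sentiment' column.
df = pd.read_csv('movie_data.csv', sep='\t')
# Split each row on its last comma: the text goes to 'review', the label to 'sentiment'.
df[['review', 'sentiment']] = df['review,sentiment'].str.rsplit(',', n=1, expand=True)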

print(df.head())

                                    review,sentiment  \
0  In 1974, the teenager Martha Moxley (Maggie Gr...   
1  OK... so... I really like Kris Kristofferson a...   
2  ***SPOILER*** Do not read this, if you think a...   
3  hi for all the people who have seen this wonde...   
4  I recently bought the DVD, forgetting just how...   

                                              review sentiment  
0  In 1974, the teenager Martha Moxley (Maggie Gr...         1  
1  OK... so... I really like Kris Kristofferson a...         0  
2  ***SPOILER*** Do not read this, if you think a...         0  
3  hi for all the people who have seen this wonde...         1  
4  I recently bought the DVD, forgetting just how...         0  
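
The split on the last comma can be illustrated without pandas. This is a minimal sketch, assuming each raw row is the review text followed by a comma and the label. The sample rows and the helper name `split_row` are hypothetical, and the cast to `int` is an addition (the notebook keeps the labels as strings).

```python
# Hypothetical raw rows: review text (which may contain commas) + ',' + label.
raw_rows = [
    "In 1974, the teenager moves to Greenwich,1",
    "OK... so... I really like this actor, but not this film,0",
]

def split_row(row):
    """Split a row on its LAST comma into (review, sentiment)."""
    review, sentiment = row.rsplit(",", 1)  # maxsplit=1 keeps earlier commas intact
    return review, int(sentiment)           # assumption: cast the label to an integer

pairs = [split_row(row) for row in raw_rows]
for review, sentiment in pairs:
    print(sentiment, "|", review)
```

Splitting from the right matters because review text routinely contains commas; a left-to-right split would cut the review at its first comma.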


## Step 3: Convert the review column to an array (predictor variable)

In [8]:
array_df0 = df['review'].to_numpy()

print("array_df0:", array_df0[:5])   # first 5

array_df0: ['In 1974, the teenager Martha Moxley (Maggie Grace) moves to the high-class area of Belle Haven, Greenwich, Connecticut. On the Mischief Night, eve of Halloween, she was murdered in the backyard of her house and her murder remained unsolved. Twenty-two years later, the writer Mark Fuhrman (Christopher Meloni), who is a former LA detective that has fallen in disgrace for perjury in O.J. Simpson trial and moved to Idaho, decides to investigate the case with his partner Stephen Weeks (Andrew Mitchell) with the purpose of writing a book. The locals squirm and do not welcome them, but with the support of the retired detective Steve Carroll (Robert Forster) that was in charge of the investigation in the 70\'s, they discover the criminal and a net of power and money to cover the murder.<br /><br />"Murder in Greenwich" is a good TV movie, with the true story of a murder of a fifteen years old girl that was committed by a wealthy teenager whose mother was a Kennedy. The powerful an

## Step 4: Convert the sentiment column to an array (target variable)

In [10]:
array_df1 = df['sentiment'].to_numpy()

print("array_df1:", array_df1[:5])   # first 5

array_df1: ['1' '0' '0' '1' '0']
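
The printed labels are strings (`'1'`, `'0'`), not integers. The sketch below shows one way to cast them and count the classes, using only the five labels printed above. The variable names are hypothetical, and the cast is optional rather than something the notebook does.

```python
from collections import Counter

# First five labels as printed above; they are strings, not integers.
labels = ['1', '0', '0', '1', '0']

int_labels = [int(label) for label in labels]  # '1' -> 1, '0' -> 0
counts = Counter(int_labels)                   # number of reviews per class

print(int_labels)
print(counts[0], counts[1])
```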


## Step 5: Preprocess the data by tokenizing the reviews array

In [12]:
documents = [doc.split() for doc in array_df0]

print("Tokenized document example:", documents[0][:20])

Tokenized document example: ['In', '1974,', 'the', 'teenager', 'Martha', 'Moxley', '(Maggie', 'Grace)', 'moves', 'to', 'the', 'high-class', 'area', 'of', 'Belle', 'Haven,', 'Greenwich,', 'Connecticut.', 'On', 'the']
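
Whitespace splitting leaves punctuation attached to tokens (`'1974,'`, `'(Maggie'`) and keeps case differences. The reviews also contain `<br />` tags. A regex-based tokenizer is one possible alternative. The sketch below is an assumption, not the notebook's method; the names `TAG_RE`, `WORD_RE`, and `tokenize` are hypothetical, and adopting it would change the vocabulary size and the reported metrics.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")       # matches HTML tags such as <br />
WORD_RE = re.compile(r"[a-z0-9']+")   # runs of letters, digits, and apostrophes

def tokenize(text):
    """Lowercase the text, drop HTML tags, and keep only word-like tokens."""
    text = TAG_RE.sub(" ", text.lower())
    return WORD_RE.findall(text)

sample = 'In 1974, the teenager Martha Moxley (Maggie Grace) moves.<br /><br />"Murder in Greenwich"'
tokens = tokenize(sample)
print(tokens)
```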


## Step 6: Train Word2Vec

In [14]:
w2v_model = Word2Vec(
    sentences=documents,
    vector_size=100,     # embedding dimension
    window=5,
    min_count=2,
    workers=4
)

print("W2V vocabulary size:", len(w2v_model.wv))

Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'


W2V vocabulary size: 169860
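
The `min_count=2` argument makes Word2Vec ignore tokens whose total frequency in the corpus is below 2. The sketch below mimics only that filtering step with the standard library; it does not train anything, and the toy corpus is hypothetical.

```python
from collections import Counter

# Hypothetical tokenized corpus (three tiny "reviews").
toy_documents = [
    ["good", "movie", "good", "plot"],
    ["bad", "movie"],
    ["good", "acting"],
]

min_count = 2  # same threshold as in the Word2Vec call above

# Count every token across the whole corpus.
frequencies = Counter(token for doc in toy_documents for token in doc)

# Keep only tokens that appear at least min_count times.
vocabulary = sorted(token for token, n in frequencies.items() if n >= min_count)
print(vocabulary)
```

Tokens that occur once (`plot`, `bad`, `acting`) are dropped, so they get no embedding and are skipped when a review is later turned into a vector.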


## Step 7: Build one vector per review

In [16]:
def doc_vector(tokens, model):
    vecs = [model.wv[word] for word in tokens if word in model.wv]
    if len(vecs) == 0:
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

X_w2v = np.array([doc_vector(tokens, w2v_model) for tokens in documents])

print("Shape X_w2v:", X_w2v.shape)   # (50000, 100)

Shape X_w2v: (50000, 100)
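
The averaging performed by `doc_vector` can be traced by hand on a toy case. This sketch uses plain Python lists and hypothetical 2-dimensional embeddings in place of the trained 100-dimensional model; `toy_embeddings` and `mean_vector` are illustrative names.

```python
# Hypothetical 2-dimensional embeddings; the real model uses 100 dimensions.
toy_embeddings = {
    "good": [1.0, 3.0],
    "movie": [3.0, 1.0],
}

def mean_vector(tokens, embeddings, size=2):
    """Average the vectors of known tokens; return zeros if none is known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * size
    # zip(*known) groups the values of each dimension together.
    return [sum(dim) / len(known) for dim in zip(*known)]

known_review = mean_vector(["good", "movie", "unseen"], toy_embeddings)
unknown_review = mean_vector(["unseen"], toy_embeddings)
print(known_review, unknown_review)
```

Averaging discards word order, so two reviews with the same words in a different order map to the same vector.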


## Step 8: Split the data, 70% for training and 30% for testing

In [18]:
X_train, X_test, y_train, y_test = train_test_split(
    X_w2v, array_df1, train_size=0.7, random_state=0
)
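
The sizes produced by a 70/30 split can be checked with a small standard-library sketch. It assumes 50,000 reviews, as in the printed shape of `X_w2v`. The shuffle here uses Python's `random` module, so the selected indices will not match those chosen by `train_test_split`; only the sizes are comparable.

```python
import random

n_samples = 50000   # number of reviews, from the printed shape of X_w2v

indices = list(range(n_samples))
random.Random(0).shuffle(indices)   # fixed seed so the shuffle is repeatable

n_train = n_samples * 70 // 100     # 70% of the samples, in integer arithmetic
train_idx = indices[:n_train]       # first 70% of the shuffled indices
test_idx = indices[n_train:]        # remaining 30%

print(len(train_idx), len(test_idx))
```

`train_test_split` also accepts a `stratify` argument; passing the label array there would keep the class proportions the same in both partitions.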

## Step 9: Train the model with a backpropagation neural network (NNBP).

In [37]:
clf = MLPClassifier(
    hidden_layer_sizes=(100,), # one hidden layer with 100 neurons
    activation='relu',
    solver='adam',
    max_iter=1000, # at most 1000 iterations (epochs); if training stops without converging, raise this limit
    verbose=True
)

clf.fit(X_train, y_train)

Iteration 1, loss = 0.53785050
Iteration 2, loss = 0.42513597
Iteration 3, loss = 0.40941139
Iteration 4, loss = 0.40556910
Iteration 5, loss = 0.40082143
Iteration 6, loss = 0.40064752
Iteration 7, loss = 0.39686716
Iteration 8, loss = 0.39381751
Iteration 9, loss = 0.39255356
Iteration 10, loss = 0.39111039
Iteration 11, loss = 0.38849794
Iteration 12, loss = 0.38870985
Iteration 13, loss = 0.38626970
Iteration 14, loss = 0.38524051
Iteration 15, loss = 0.38297339
Iteration 16, loss = 0.38151113
Iteration 17, loss = 0.38118693
Iteration 18, loss = 0.37963967
Iteration 19, loss = 0.37873512
Iteration 20, loss = 0.37794953
Iteration 21, loss = 0.37696505
Iteration 22, loss = 0.37501456
Iteration 23, loss = 0.37369126
Iteration 24, loss = 0.37233911
Iteration 25, loss = 0.37343394
Iteration 26, loss = 0.37029414
Iteration 27, loss = 0.37140508
Iteration 28, loss = 0.37128338
Iteration 29, loss = 0.36811127
Iteration 30, loss = 0.36733286
Iteration 31, loss = 0.36574213
Iteration 32, los
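
The loss printed at each iteration is a cross-entropy (log-loss) value. A rough sketch of what one forward pass computes for a binary problem is shown below: one hidden ReLU layer followed by a sigmoid output. All weights are hypothetical toy values with 2 inputs and 2 hidden units; the trained network has 100 inputs and 100 hidden units, and scikit-learn's reported loss is averaged over samples and also includes a small L2 penalty.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

def log_loss(y_true, p):
    """Binary cross-entropy for a single example."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Hypothetical weights: 2 inputs, 2 hidden units.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = 0.0

p = forward([2.0, 1.0], W1, b1, w2, b2)   # predicted probability of class 1
loss = log_loss(1, p)                     # loss if the true label is 1
print(round(p, 4), round(loss, 4))
```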

## Step 10: Print the results.

In [39]:
y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.8043333333333333
              precision    recall  f1-score   support

           0       0.77      0.85      0.81      7342
           1       0.84      0.76      0.80      7658

    accuracy                           0.80     15000
   macro avg       0.81      0.81      0.80     15000
weighted avg       0.81      0.80      0.80     15000
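
The figures in the report are mutually consistent, which can be checked with simple arithmetic. The sketch below recomputes the accuracy as the support-weighted average of per-class recall, and the F1 scores from precision and recall. Because the report rounds to two decimals, the results are approximate.

```python
# Values copied from the classification report above (rounded to two decimals).
precision = {"0": 0.77, "1": 0.84}
recall = {"0": 0.85, "1": 0.76}
support = {"0": 7342, "1": 7658}

total = sum(support.values())

# recall * support approximates the number of correctly classified reviews per class.
correct = sum(recall[c] * support[c] for c in recall)
approx_accuracy = correct / total

# F1 is the harmonic mean of precision and recall.
f1 = {c: 2 * precision[c] * recall[c] / (precision[c] + recall[c]) for c in recall}

print(round(approx_accuracy, 3), round(f1["0"], 2), round(f1["1"], 2))
```

The recomputed accuracy lands close to the reported 0.8043, and both F1 values agree with the report after rounding.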



## References
### Russell S.J., Norvig P. (2020). Artificial Intelligence: A Modern Approach, Prentice-Hall, 4th Edition, Englewood Cliffs, NJ, 2020.
### Kevin (2025). NEURONA DEl Modelo de McCulloch Pitts. https://www.youtube.com/watch?v=ROErFR3X4x0 (last accessed February 2025)
### OpenAI. (2025). ChatGPT (GPT-5.2 model) [Large language model]. https://chatgpt.com/