# Lab 3: Contextual Bandit-Based News Article Recommendation

**`Course`:** Reinforcement Learning Fundamentals  
**`Student Name`:** Trusha Maheshwari  
**`Roll Number`:** U20230139  
**`GitHub Branch`:** firstname_U20230xxx  

# Imports and Setup

In [1]:
pip install rlcmab-sampler

Collecting rlcmab-sampler
  Downloading rlcmab_sampler-1.0.1-py3-none-any.whl.metadata (1.7 kB)
Collecting numpy>=2.4.2 (from rlcmab-sampler)
  Downloading numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (6.6 kB)
Collecting scipy>=1.17.0 (from rlcmab-sampler)
  Downloading scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB)
Downloading rlcmab_sampler-1.0.1-py3-none-any.whl (2.8 kB)
Downloading numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
Downloading scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (35.0 MB)

In [1]:
import os
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from dataclasses import dataclass
from typing import Dict, Tuple, List, Optional

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.linear_model import LogisticRegression

In [2]:
# numpy, pandas, matplotlib, train_test_split and accuracy_score are already imported above
from sklearn.preprocessing import LabelEncoder

from rlcmab_sampler import sampler


# Load Datasets

In [8]:
# Load datasets
news_df = pd.read_csv("/news_articles (1).csv")
train_users = pd.read_csv("/train_users (1).csv")
test_users = pd.read_csv("/test_users (1).csv")

print(news_df.head())
print(train_users.head())


                                                link  \
0  https://www.huffpost.com/entry/covid-boosters-...   
1  https://www.huffpost.com/entry/american-airlin...   
2  https://www.huffpost.com/entry/funniest-tweets...   
3  https://www.huffpost.com/entry/funniest-parent...   
4  https://www.huffpost.com/entry/amy-cooper-lose...   

                                            headline   category  \
0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS   
1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   
2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   
3  The Funniest Tweets From Parents This Week (Se...  PARENTING   
4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   

                                   short_description               authors  \
0  Health experts said it is too early to predict...  Carla K. Johnson, AP   
1  He was subdued by passengers and crew when he ...        Mary Papenfuss   
2  "Until you have a dog y

## Data Preprocessing

In this section:
- Handle missing values
- Encode categorical features
- Prepare data for user classification

In [9]:
print("Train users - missing values (top 10):")
print(train_users.isna().sum().sort_values(ascending=False).head(10))
print("\nNews articles - missing values:")
print(news_df.isna().sum())

user_df = train_users.copy()

y = user_df["label"]
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
print("\nLabel mapping (class -> encoded):", dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_))))

X = user_df.drop(columns=["user_id", "label"])

numeric_cols = X.select_dtypes(include=["int64", "float64"]).columns.tolist()
categorical_cols = X.select_dtypes(include=["object", "bool"]).columns.tolist()

print("\nNumeric feature columns (", len(numeric_cols), "):")
print(numeric_cols)
print("\nCategorical feature columns (", len(categorical_cols), "):")
print(categorical_cols)

# preprocessing
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
])

categorical_transformer = Pipeline(steps=[
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_cols),
        ("cat", categorical_transformer, categorical_cols),
    ]
)

news_df = news_df.copy()
news_df["category"] = news_df["category"].astype(str).str.strip()
news_category_encoder = LabelEncoder()
news_df["category_encoded"] = news_category_encoder.fit_transform(news_df["category"])

print("\nUnique news categories:")
print(news_df["category"].value_counts().head(10))

Train users - missing values (top 10):
age                  698
user_id                0
income                 0
clicks                 0
purchase_amount        0
session_duration       0
content_variety        0
engagement_score       0
num_transactions       0
avg_monthly_spend      0
dtype: int64

News articles - missing values:
link                    0
headline                0
category                0
short_description     313
authors              2438
date                    1
dtype: int64

Label mapping (class -> encoded): {'user_1': np.int64(0), 'user_2': np.int64(1), 'user_3': np.int64(2)}

Numeric feature columns ( 28 ):
['age', 'income', 'clicks', 'purchase_amount', 'session_duration', 'content_variety', 'engagement_score', 'num_transactions', 'avg_monthly_spend', 'avg_cart_value', 'browsing_depth', 'revisit_rate', 'scroll_activity', 'time_on_site', 'interaction_count', 'preferred_price_range', 'discount_usage_rate', 'wishlist_size', 'product_views', 'repeat_purchase_gap 

In [11]:
X_train, X_val, y_train, y_val = train_test_split(
    X,
    y_encoded,
    test_size=0.2,
    random_state=42,
    stratify=y_encoded,
)

print("Train size:", X_train.shape, "Validation size:", X_val.shape)

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

candidate_models = {
    "logreg_C0.5": Pipeline(steps=[
        ("preprocess", preprocessor),
        ("clf", LogisticRegression(max_iter=2000, C=0.5)),
    ]),
    "logreg_C1.0": Pipeline(steps=[
        ("preprocess", preprocessor),
        ("clf", LogisticRegression(max_iter=2000, C=1.0)),
    ]),
    "tree_depth4": Pipeline(steps=[
        ("preprocess", preprocessor),
        ("clf", DecisionTreeClassifier(max_depth=4, random_state=42)),
    ]),
    "tree_depth6": Pipeline(steps=[
        ("preprocess", preprocessor),
        ("clf", DecisionTreeClassifier(max_depth=6, random_state=42)),
    ]),
}

results = {}
best_name = None
best_model = None
best_acc = -1.0

for name, model in candidate_models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    acc = accuracy_score(y_val, y_pred)
    results[name] = acc
    if acc > best_acc:
        best_acc = acc
        best_name = name
        best_model = model

print("\nValidation accuracies by model:")
for name, acc in results.items():
    print(f"  {name}: {acc:.4f}")

print(f"\nBest model on validation set: {best_name} (accuracy = {best_acc:.4f})")

# best model used as the final context classifier
y_val_pred = best_model.predict(X_val)

print("\nClassification report (validation set, best model):")
print(classification_report(y_val, y_val_pred, target_names=label_encoder.classes_))

cm = confusion_matrix(y_val, y_val_pred)
print("\nConfusion matrix (rows=true, cols=pred) for best model:")
print(cm)

context_classifier = best_model
context_label_encoder = label_encoder

Train size: (1600, 31) Validation size: (400, 31)


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(



Validation accuracies by model:
  logreg_C0.5: 0.7400
  logreg_C1.0: 0.7425
  tree_depth4: 0.8250
  tree_depth6: 0.8750

Best model on validation set: tree_depth6 (accuracy = 0.8750)

Classification report (validation set, best model):
              precision    recall  f1-score   support

      user_1       0.82      0.89      0.86       142
      user_2       0.98      0.85      0.91       142
      user_3       0.83      0.88      0.85       116

    accuracy                           0.88       400
   macro avg       0.88      0.88      0.87       400
weighted avg       0.88      0.88      0.88       400


Confusion matrix (rows=true, cols=pred) for best model:
[[127   0  15]
 [ 15 121   6]
 [ 12   2 102]]



## User Classification

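A classifier predicts each user's category (`User1`, `User2`, `User3`; stored as `user_1`, `user_2`, `user_3` in the `label` column),
which serves as the **context** for the contextual bandit. Of the four candidates evaluated above, the depth-6 decision tree
scored highest on the validation set (accuracy 0.875) and is kept as `context_classifier`.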


# `Contextual Bandit`

## Reward Sampler Initialization

The reward sampler is initialized with the student's roll number `i`.
The reward for pulling arm `j` is then drawn with `sampler.sample(j)`, where `j` is the arm index from the Arm Mapping table below.
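The call pattern described above can be sketched with a stand-in class. `StubSampler`, its Gaussian rewards, its made-up arm means, and the placeholder value `i = 42` are all assumptions for illustration; they are not the `rlcmab_sampler` implementation, whose constructor signature is not shown in this notebook.

```python
import numpy as np


class StubSampler:
    """Hypothetical stand-in for the course reward sampler (NOT the real package)."""

    def __init__(self, i: int, n_arms: int = 12):
        # Seed from the integer i so that every run with the same i is reproducible.
        self._rng = np.random.default_rng(i)
        # Made-up mean reward per arm; the real sampler's distributions are unknown here.
        self._means = self._rng.uniform(0.0, 1.0, size=n_arms)

    def sample(self, j: int) -> float:
        """Draw one noisy reward for arm j (0 <= j < n_arms)."""
        if not 0 <= j < len(self._means):
            raise ValueError(f"arm index {j} out of range")
        return float(self._rng.normal(self._means[j], 1.0))


i = 42  # placeholder integer, not an actual roll number
reward_sampler = StubSampler(i)
reward = reward_sampler.sample(5)  # reward for arm j = 5
```

When the real package is used, only the construction line should need to change, provided it exposes the same `sample(j)` method.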


## Arm Mapping

| Arm Index (j) | News Category | User Context |
|--------------|---------------|--------------|
| 0–3          | Entertainment, Education, Tech, Crime | User1 |
| 4–7          | Entertainment, Education, Tech, Crime | User2 |
| 8–11         | Entertainment, Education, Tech, Crime | User3 |
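The table can be read as `j = 4 * context_index + category_index`. The helper below is a minimal sketch of that reading; it assumes the four categories appear within each block in the order listed in the table, and the names `arm_index` and `arm_to_pair` are hypothetical.

```python
CONTEXTS = ["User1", "User2", "User3"]
CATEGORIES = ["Entertainment", "Education", "Tech", "Crime"]  # assumed order within each block


def arm_index(context: str, category: str) -> int:
    """Map a (user context, news category) pair to an arm index j in 0..11."""
    return CONTEXTS.index(context) * len(CATEGORIES) + CATEGORIES.index(category)


def arm_to_pair(j: int) -> tuple:
    """Inverse mapping: arm index j back to (user context, news category)."""
    context_idx, category_idx = divmod(j, len(CATEGORIES))
    return CONTEXTS[context_idx], CATEGORIES[category_idx]


print(arm_index("User2", "Tech"))  # arm in the User2 block (4..7)
print(arm_to_pair(11))             # last arm of the User3 block
```

Note that the classifier labels in the data are spelled `user_1`, `user_2`, `user_3`, so a real pipeline would map the encoded label (0, 1, 2) directly to the context index.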

## Epsilon-Greedy Strategy

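This section implements the epsilon-greedy contextual bandit algorithm: for the current user context, the agent picks a random category with probability $\epsilon$ and otherwise picks the category with the highest estimated reward.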
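A minimal sketch of one possible epsilon-greedy agent follows. It keeps a separate table of value estimates per user context and updates them with an incremental mean. The class name `EpsilonGreedy`, the default `epsilon = 0.1`, and the zero initialization are assumptions, not values prescribed by the lab.

```python
import numpy as np


class EpsilonGreedy:
    """Per-context epsilon-greedy agent (illustrative sketch)."""

    def __init__(self, n_contexts=3, n_actions=4, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.q = np.zeros((n_contexts, n_actions))             # estimated mean reward
        self.n = np.zeros((n_contexts, n_actions), dtype=int)  # pull counts
        self.rng = np.random.default_rng(seed)

    def select(self, context: int) -> int:
        # Explore with probability epsilon, otherwise exploit the current best estimate.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[context]))

    def update(self, context: int, action: int, reward: float) -> None:
        # Incremental sample mean: Q <- Q + (r - Q) / N
        self.n[context, action] += 1
        self.q[context, action] += (reward - self.q[context, action]) / self.n[context, action]


agent = EpsilonGreedy(epsilon=0.1)
action = agent.select(context=0)
agent.update(context=0, action=action, reward=1.0)
```

The selected action is a category index within the context; the arm passed to the sampler would then be `4 * context + action`.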


## Upper Confidence Bound (UCB)

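This section implements the UCB strategy for contextual bandits: for the current user context, the agent picks the category that maximizes the estimated reward plus an exploration bonus that shrinks as that category is tried more often.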
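One common form of this rule is UCB1, which scores each action as $Q + c\sqrt{\ln t / N}$. The sketch below applies it separately per context. The class name `UCB`, the per-context step counter, and the default `c = 2.0` are assumptions chosen for illustration.

```python
import numpy as np


class UCB:
    """Per-context UCB1-style agent (illustrative sketch)."""

    def __init__(self, n_contexts=3, n_actions=4, c=2.0):
        self.c = c
        self.q = np.zeros((n_contexts, n_actions))             # estimated mean reward
        self.n = np.zeros((n_contexts, n_actions), dtype=int)  # pull counts
        self.t = np.zeros(n_contexts, dtype=int)               # steps seen per context

    def select(self, context: int) -> int:
        # Try every action once before applying the confidence bound.
        untried = np.flatnonzero(self.n[context] == 0)
        if untried.size > 0:
            return int(untried[0])
        bonus = self.c * np.sqrt(np.log(self.t[context]) / self.n[context])
        return int(np.argmax(self.q[context] + bonus))

    def update(self, context: int, action: int, reward: float) -> None:
        self.t[context] += 1
        self.n[context, action] += 1
        self.q[context, action] += (reward - self.q[context, action]) / self.n[context, action]


agent = UCB(c=2.0)
action = agent.select(context=1)
agent.update(context=1, action=action, reward=0.5)
```

Larger values of `c` widen the bonus and therefore explore more; this is the natural hyperparameter to vary in the comparison.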

## SoftMax Strategy

This section implements the SoftMax strategy with temperature $\tau = 1$.
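A SoftMax (Boltzmann) policy samples action $a$ with probability proportional to $\exp(Q(a)/\tau)$. The sketch below shows that computation for one context's value estimates; the function names `softmax_probs` and `softmax_select` are hypothetical, and subtracting the maximum is a standard numerical-stability step that does not change the probabilities.

```python
import numpy as np


def softmax_probs(q_values, tau=1.0):
    """Return action probabilities proportional to exp(Q / tau)."""
    z = np.asarray(q_values, dtype=float) / tau
    z = z - z.max()  # stabilize exp() without changing the result
    e = np.exp(z)
    return e / e.sum()


def softmax_select(q_values, rng, tau=1.0):
    """Sample one action index from the SoftMax distribution."""
    p = softmax_probs(q_values, tau)
    return int(rng.choice(len(p), p=p))


rng = np.random.default_rng(0)
q_context = [0.2, 0.5, 0.1, 0.4]  # made-up value estimates for one user context
probs = softmax_probs(q_context, tau=1.0)
action = softmax_select(q_context, rng, tau=1.0)
```

With $\tau = 1$ and rewards on a small scale, the probabilities stay fairly close to uniform, so this policy keeps exploring more than a greedy one would.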


## Reinforcement Learning Simulation

We simulate each bandit algorithm for $T = 10{,}000$ steps and record the reward at every step.

Note: adjust $T$ if required.
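The simulation loop can be sketched as follows, here with an inline epsilon-greedy policy. Everything about the environment is an assumption made so the example runs on its own: contexts are drawn uniformly at random (in the lab they would come from `context_classifier` predictions), and rewards come from made-up Gaussian arm means rather than from `sampler.sample(j)`.

```python
import numpy as np


def simulate(T=10_000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy contextual bandit for T steps on a synthetic environment."""
    rng = np.random.default_rng(seed)
    n_contexts, n_actions = 3, 4
    true_means = rng.uniform(0.0, 1.0, size=(n_contexts, n_actions))  # hypothetical rewards

    q = np.zeros((n_contexts, n_actions))
    n = np.zeros((n_contexts, n_actions), dtype=int)
    rewards = np.empty(T)

    for t in range(T):
        context = int(rng.integers(n_contexts))  # stand-in for the predicted user class
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q[context]))

        # The real lab would call the sampler with arm j = 4 * context + action here.
        reward = rng.normal(true_means[context, action], 1.0)

        n[context, action] += 1
        q[context, action] += (reward - q[context, action]) / n[context, action]
        rewards[t] = reward

    return rewards, q, true_means


rewards, q, true_means = simulate(T=10_000)
print("mean reward:", rewards.mean())
```

Swapping the action-selection lines for a UCB or SoftMax rule gives the other two runs while keeping the rest of the loop unchanged.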


## Results and Analysis

This section presents:
- Average Reward vs Time
- Hyperparameter comparisons
- Observations and discussion
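For the average-reward curve, one simple option is the running mean of the recorded rewards. The helper below is a sketch; `running_average` is a hypothetical name and the reward sequence is made up.

```python
import numpy as np


def running_average(rewards):
    """Average reward up to and including each time step."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards) / np.arange(1, len(rewards) + 1)


example_rewards = [1.0, 0.0, 1.0, 0.0]  # made-up rewards for illustration
curve = running_average(example_rewards)
```

The resulting array can be passed to `plt.plot(curve)` once per algorithm or hyperparameter setting to draw the comparison on shared axes.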


## Final Observations

- Comparison of Epsilon-Greedy, UCB, and SoftMax
- Effect of hyperparameters
- Strengths and limitations of each approach
