# Lab 3: Contextual Bandit-Based News Article Recommendation

**`Course`:** Reinforcement Learning Fundamentals  
**`Student Name`:**  
**`Roll Number`:**  
**`GitHub Branch`:** firstname_U20230xxx  

# Imports and Setup

In [2]:
pip install rlcmab-sampler

Collecting rlcmab-sampler
  Downloading rlcmab_sampler-1.0.1-py3-none-any.whl.metadata (1.7 kB)
Collecting numpy>=2.4.2 (from rlcmab-sampler)
  Downloading numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (6.6 kB)
Collecting scipy>=1.17.0 (from rlcmab-sampler)
  Downloading scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB)
Downloading rlcmab_sampler-1.0.1-py3-none-any.whl (2.8 kB)
Downloading numpy-2.4.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
Downloading scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (35.0 MB)

In [7]:
import os
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from dataclasses import dataclass
from typing import Dict, Tuple, List, Optional

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.linear_model import LogisticRegression

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

from rlcmab_sampler import sampler


# Load Datasets

In [5]:
# Load datasets
news_df = pd.read_csv("news_articles.csv")
train_users = pd.read_csv("train_users.csv")
test_users = pd.read_csv("test_users.csv")

print(news_df.head())
print(train_users.head())


                                                link  \
0  https://www.huffpost.com/entry/covid-boosters-...   
1  https://www.huffpost.com/entry/american-airlin...   
2  https://www.huffpost.com/entry/funniest-tweets...   
3  https://www.huffpost.com/entry/funniest-parent...   
4  https://www.huffpost.com/entry/amy-cooper-lose...   

                                            headline   category  \
0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS   
1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   
2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   
3  The Funniest Tweets From Parents This Week (Se...  PARENTING   
4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   

                                   short_description               authors  \
0  Health experts said it is too early to predict...  Carla K. Johnson, AP   
1  He was subdued by passengers and crew when he ...        Mary Papenfuss   
2  "Until you have a dog y

## Data Preprocessing

In this section:
- Handle missing values
- Encode categorical features
- Prepare data for user classification

In [8]:
ROLL_NUMBER = 139

PATH_TRAIN_USERS = "train_users.csv"
PATH_TEST_USERS  = "test_users.csv"
PATH_ARTICLES    = "news_articles.csv"

T = 10_000

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)

In [9]:
USER_CONTEXTS = ["User1", "User2", "User3"]
NEWS_CATEGORIES = ["Entertainment", "Education", "Tech", "Crime"]

CATEGORY_TO_IDX = {c: i for i, c in enumerate(NEWS_CATEGORIES)}
IDX_TO_CATEGORY = {i: c for c, i in CATEGORY_TO_IDX.items()}

CONTEXT_TO_IDX = {u: i for i, u in enumerate(USER_CONTEXTS)}
IDX_TO_CONTEXT = {i: u for u, i in CONTEXT_TO_IDX.items()}

def arm_index(context: str, category: str) -> int:
    """
    Maps (context, category) -> j in [0..11] according to PDF table:
    User1: 0..3, User2: 4..7, User3: 8..11
    in order: Entertainment, Education, Tech, Crime
    """
    ci = CONTEXT_TO_IDX[context]          # 0,1,2
    ai = CATEGORY_TO_IDX[category]        # 0..3
    return ci * 4 + ai

In [11]:
def load_csv_or_raise(path: str) -> pd.DataFrame:
    if not os.path.exists(path):
        raise FileNotFoundError(f"Could not find file: {path}")
    return pd.read_csv(path)

train_users = load_csv_or_raise(PATH_TRAIN_USERS)
test_users  = load_csv_or_raise(PATH_TEST_USERS)
articles    = load_csv_or_raise(PATH_ARTICLES)

POSSIBLE_USER_LABEL_COLS = ["classifying", "class", "label", "user_class", "user_type"]
user_label_col = next((c for c in POSSIBLE_USER_LABEL_COLS if c in train_users.columns), None)
if user_label_col is None:
    raise ValueError(
        f"Could not find user label column. Tried: {POSSIBLE_USER_LABEL_COLS}. "
        f"Columns found: {list(train_users.columns)}"
    )

if "category" not in articles.columns:
    raise ValueError(f"news_articles.csv must have a 'category' column. Columns: {list(articles.columns)}")

In [24]:
from sklearn.tree import DecisionTreeClassifier

def build_user_classifier(train_df: pd.DataFrame, label_col: str) -> Tuple[Pipeline, List[str]]:
    X = train_df.drop(columns=[label_col])
    y = train_df[label_col].astype(str)

    numeric_cols = [c for c in X.columns if pd.api.types.is_numeric_dtype(X[c])]
    categorical_cols = [c for c in X.columns if c not in numeric_cols]

    numeric_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median"))
    ])

    categorical_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore"))
    ])

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_cols),
            ("cat", categorical_transformer, categorical_cols)
        ],
        remainder="drop"
    )

    # Decision tree classifier; the hyperparameters below are starting points, not tuned values
    clf = DecisionTreeClassifier(
        max_depth=None,          # you can tune: 5, 8, 12, None
        min_samples_split=12,    # you can tune: 2, 5, 10
        min_samples_leaf=2,      # you can tune: 1, 2, 5
        class_weight="balanced", # helps if User1/User2/User3 are imbalanced
        random_state=42
    )

    model = Pipeline(steps=[
        ("preprocess", preprocessor),
        ("clf", clf)
    ])

    return model, X.columns.tolist()


# Train classifier
user_model, user_feature_cols = build_user_classifier(train_users, user_label_col)
X_train = train_users.drop(columns=[user_label_col])
y_train = train_users[user_label_col].astype(str)
user_model.fit(X_train, y_train)

# Evaluate on test_users
if user_label_col in test_users.columns:
    X_test = test_users.drop(columns=[user_label_col])
    y_test = test_users[user_label_col].astype(str)
    y_pred = user_model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("=== User Classifier Evaluation (Decision Tree) ===")
    print("Accuracy:", acc)
    print("\nClassification Report:\n", classification_report(y_test, y_pred))
    print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
else:
    print("NOTE: test_users.csv has no label column; classifier accuracy cannot be computed.")
    X_test = test_users.copy()
    y_pred = user_model.predict(X_test)
    print("Predictions generated for test_users (first 10):", y_pred[:10])


=== User Classifier Evaluation (Decision Tree) ===
Accuracy: 0.334

Classification Report:
               precision    recall  f1-score   support

       User1       0.33      0.33      0.33       672
       User2       0.37      0.35      0.36       679
       User3       0.31      0.32      0.31       649

    accuracy                           0.33      2000
   macro avg       0.33      0.33      0.33      2000
weighted avg       0.33      0.33      0.33      2000


Confusion Matrix:
 [[224 216 232]
 [220 238 221]
 [245 198 206]]


## User Classification

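Train a classifier to predict the user category (`User1`, `User2`, `User3`),
which serves as the **context** for the contextual bandit.
The decision tree trained above reaches 0.334 accuracy on `test_users.csv`,
which is roughly chance level for three near-balanced classes, so the predicted context is currently a weak signal.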


# Contextual Bandit

## Reward Sampler Initialization

The sampler is initialized using the student's roll number `i`.
Rewards are obtained using `sampler.sample(j)`.
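
The notebook imports `sampler` from `rlcmab_sampler`, but the constructor call is not shown here. The sketch below therefore uses a hypothetical `StubSampler` class with the same `sample(j)` shape, purely to illustrate the "seed by roll number, then pull arm `j`" pattern. Its reward distribution (Gaussian noise around random means) is an assumption, not the behaviour of the real package.

```python
import random


class StubSampler:
    """Hypothetical stand-in for the course reward sampler (not the real package)."""

    def __init__(self, roll_number: int, n_arms: int = 12) -> None:
        self._rng = random.Random(roll_number)  # seeded by the roll number i
        # Assumed hidden mean reward per arm; the real distributions are unknown here.
        self._means = [self._rng.uniform(0.0, 1.0) for _ in range(n_arms)]

    def sample(self, j: int) -> float:
        """Return one noisy reward for arm j (0..n_arms-1)."""
        if not 0 <= j < len(self._means):
            raise IndexError(f"arm index out of range: {j}")
        return self._rng.gauss(self._means[j], 0.1)


reward_sampler = StubSampler(roll_number=139)
rewards = [reward_sampler.sample(5) for _ in range(3)]  # three pulls of arm 5
```

Seeding by roll number makes each student's reward stream reproducible, which is presumably why the lab ties the sampler to it.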


## Arm Mapping

| Arm Index (j) | News Category | User Context |
|--------------|---------------|--------------|
| 0–3          | Entertainment, Education, Tech, Crime | User1 |
| 4–7          | Entertainment, Education, Tech, Crime | User2 |
| 8–11         | Entertainment, Education, Tech, Crime | User3 |
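
The table amounts to `j = 4 * context_index + category_index`. A minimal sketch of that mapping and its inverse follows; `decode_arm` is a hypothetical helper that does not appear in the notebook, added only to make the layout easy to check.

```python
USER_CONTEXTS = ["User1", "User2", "User3"]
NEWS_CATEGORIES = ["Entertainment", "Education", "Tech", "Crime"]


def arm_index(context: str, category: str) -> int:
    """(context, category) -> j, with four consecutive arms per user context."""
    ci = USER_CONTEXTS.index(context)
    ai = NEWS_CATEGORIES.index(category)
    return ci * len(NEWS_CATEGORIES) + ai


def decode_arm(j: int) -> tuple[str, str]:
    """Hypothetical inverse mapping: j -> (context, category)."""
    n_arms = len(USER_CONTEXTS) * len(NEWS_CATEGORIES)
    if not 0 <= j < n_arms:
        raise ValueError(f"arm index must be in 0..{n_arms - 1}, got {j}")
    ci, ai = divmod(j, len(NEWS_CATEGORIES))
    return USER_CONTEXTS[ci], NEWS_CATEGORIES[ai]
```

For instance, `arm_index("User2", "Tech")` gives 6, and `decode_arm(11)` gives `("User3", "Crime")`, matching the last row of the table.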

## Epsilon-Greedy Strategy

This section implements the epsilon-greedy contextual bandit algorithm.
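
As a rough sketch of what such an agent could look like, the class below keeps one value estimate per (context, action) pair, explores with probability `epsilon`, and otherwise picks the best-known action for the current context. The class name, the default `epsilon=0.1`, and the random tie-breaking are illustrative assumptions, not requirements from the lab handout.

```python
import random

N_CONTEXTS, N_ACTIONS = 3, 4  # three user classes, four news categories


class EpsilonGreedyAgent:
    """Tabular epsilon-greedy with a separate value row per context."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = [[0.0] * N_ACTIONS for _ in range(N_CONTEXTS)]  # mean reward estimates
        self.n = [[0] * N_ACTIONS for _ in range(N_CONTEXTS)]    # pull counts

    def select(self, ctx: int) -> int:
        """Explore uniformly with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(N_ACTIONS)
        best = max(self.q[ctx])
        # Break ties at random so untried actions are not starved.
        return self.rng.choice([a for a, v in enumerate(self.q[ctx]) if v == best])

    def update(self, ctx: int, action: int, reward: float) -> None:
        """Incremental sample-mean update for the pulled (context, action)."""
        self.n[ctx][action] += 1
        self.q[ctx][action] += (reward - self.q[ctx][action]) / self.n[ctx][action]
```

The action returned here is local to the context (0..3); the global arm index passed to the sampler would be `ctx * 4 + action`.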


## Upper Confidence Bound (UCB)

This section implements the UCB strategy for contextual bandits.
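
One common variant is UCB1 applied independently within each context: every action is tried once, after which the agent picks the action maximising the estimated mean plus a bonus `c * sqrt(ln(t) / n)`. The sketch below assumes that variant; the class name and the default `c = 2.0` are illustrative choices rather than values taken from the lab.

```python
import math

N_CONTEXTS, N_ACTIONS = 3, 4  # three user classes, four news categories


class UCBAgent:
    """Per-context UCB1 with exploration coefficient c (assumed variant)."""

    def __init__(self, c: float = 2.0) -> None:
        self.c = c
        self.q = [[0.0] * N_ACTIONS for _ in range(N_CONTEXTS)]  # mean reward estimates
        self.n = [[0] * N_ACTIONS for _ in range(N_CONTEXTS)]    # pull counts

    def select(self, ctx: int) -> int:
        # Pull every action once before the confidence bound is defined.
        for a in range(N_ACTIONS):
            if self.n[ctx][a] == 0:
                return a
        t = sum(self.n[ctx])  # total pulls seen in this context
        scores = [
            self.q[ctx][a] + self.c * math.sqrt(math.log(t) / self.n[ctx][a])
            for a in range(N_ACTIONS)
        ]
        return max(range(N_ACTIONS), key=scores.__getitem__)

    def update(self, ctx: int, action: int, reward: float) -> None:
        """Incremental sample-mean update for the pulled (context, action)."""
        self.n[ctx][action] += 1
        self.q[ctx][action] += (reward - self.q[ctx][action]) / self.n[ctx][action]
```

Larger `c` widens the bonus and favours exploration; comparing a few values of `c` is one natural hyperparameter study for this section.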

## SoftMax Strategy

This section implements the SoftMax strategy with temperature $\tau = 1$.
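
In the usual formulation, action `a` is chosen with probability proportional to `exp(Q(a) / tau)`, where `Q` holds the current value estimates for the active context. The helpers below are a minimal sketch of that rule under this assumption; the function names are hypothetical, and the maximum is subtracted before exponentiating only for numerical stability.

```python
import math
import random


def softmax_probs(q_values: list[float], tau: float = 1.0) -> list[float]:
    """Boltzmann probabilities over actions for one context's value estimates."""
    m = max(q_values)  # shift by the max so exp() cannot overflow
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_select(q_values: list[float], rng: random.Random, tau: float = 1.0) -> int:
    """Sample one action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With equal estimates every action has probability 0.25; as `tau` shrinks the choice approaches greedy, and as it grows the choice approaches uniform.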


## Reinforcement Learning Simulation

We simulate the bandit algorithms for $T = 10{,}000$ steps and record the reward at each step.

Note: adjust $T$ if required.
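
The loop structure could look roughly like the sketch below, shown for epsilon-greedy only. Two things are assumptions made to keep it self-contained: `StubSampler` is a hypothetical reward source standing in for the course sampler, and contexts are drawn uniformly at random, whereas in the lab they would come from the user classifier's predictions.

```python
import random

N_CONTEXTS, N_ACTIONS = 3, 4


class StubSampler:
    """Hypothetical reward source; swap in the course sampler in the notebook."""

    def __init__(self, roll_number: int) -> None:
        self._rng = random.Random(roll_number)
        self._means = [self._rng.uniform(0.0, 1.0) for _ in range(N_CONTEXTS * N_ACTIONS)]

    def sample(self, j: int) -> float:
        return self._rng.gauss(self._means[j], 0.1)


def run_epsilon_greedy(sampler, T: int = 10_000, epsilon: float = 0.1, seed: int = 42):
    """Run T steps; return the reward at each step and the learned value table."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_CONTEXTS)]
    n = [[0] * N_ACTIONS for _ in range(N_CONTEXTS)]
    rewards = []
    for _ in range(T):
        # Assumption: contexts arrive uniformly at random (see lead-in).
        ctx = rng.randrange(N_CONTEXTS)
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[ctx][a])
        reward = sampler.sample(ctx * N_ACTIONS + action)  # global arm index j
        n[ctx][action] += 1
        q[ctx][action] += (reward - q[ctx][action]) / n[ctx][action]
        rewards.append(reward)
    return rewards, q


rewards, q = run_epsilon_greedy(StubSampler(139), T=2_000)
```

Returning the per-step rewards rather than only their sum keeps the data needed for reward-over-time plots.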


## Results and Analysis

This section presents:
- Average Reward vs Time
- Hyperparameter comparisons
- Observations and discussion
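
For the Average Reward vs Time curve, one simple option is the running mean of the per-step rewards. The helper below is a sketch of that computation; `running_average` is a hypothetical name, and it assumes the simulation produced a plain list of rewards in time order.

```python
from itertools import accumulate


def running_average(rewards: list[float]) -> list[float]:
    """Average reward over steps 1..t, for every t."""
    return [total / t for t, total in enumerate(accumulate(rewards), start=1)]


curve = running_average([1, 0, 1, 0])
```

With matplotlib already imported in the notebook, `plt.plot(running_average(rewards))` would draw one such curve per algorithm or hyperparameter setting.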


## Final Observations

- Comparison of Epsilon-Greedy, UCB, and SoftMax
- Effect of hyperparameters
- Strengths and limitations of each approach
