for each concept:
    track a student's latent mastery probability
    update it after each attempt
    predict next-step correctness
    produce interpretable parameters
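In standard BKT notation the four steps above can be written out as follows. Here $p_t$ is the mastery probability before attempt $t$ ($p_0$ = `P_INIT`), $S$ = `P_SLIP`, $G$ = `P_GUESS` and $T$ = `P_LEARN`. The symbols are a notational choice for this sketch; the equations mirror `predict_correct` and `update_mastery` in the cells below.

```latex
\begin{aligned}
P(\text{correct}_t) &= p_t(1-S) + (1-p_t)\,G \\
P(L \mid \text{correct}_t) &= \frac{p_t(1-S)}{p_t(1-S) + (1-p_t)\,G} \\
P(L \mid \text{incorrect}_t) &= \frac{p_t\,S}{p_t\,S + (1-p_t)(1-G)} \\
p_{t+1} &= P(L \mid \text{obs}_t) + \bigl(1 - P(L \mid \text{obs}_t)\bigr)\,T
\end{aligned}
```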

In [None]:
# load data and libraries

import pandas as pd
import numpy as np

data = pd.read_csv("../data/interactions.csv")
data.head()
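The rest of the notebook reads the columns `student_id`, `concept_id`, `time_step` and `correct` (assumed to be 0/1). If `interactions.csv` is not at hand, a tiny synthetic frame with that assumed schema lets the later cells run end to end. The rows below are invented for illustration, and `check_schema` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy interactions. Column names match the ones the notebook reads;
# the rows are invented purely for illustration.
toy_data = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2, 2],
    "concept_id": ["A", "A", "A", "B", "A", "A", "B"],
    "time_step":  [0, 1, 2, 0, 0, 1, 0],
    "correct":    [0, 1, 1, 1, 1, 0, 0],
})

# Columns the BKT loop later in the notebook relies on (inferred from the code).
REQUIRED_COLUMNS = {"student_id", "concept_id", "time_step", "correct"}

def check_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if a required column is missing or `correct` is not 0/1."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if not df["correct"].isin([0, 1]).all():
        raise ValueError("`correct` must contain only 0/1 values")

check_schema(toy_data)  # passes silently
# check_schema(data) runs the same check on the real file
```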

In [None]:
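# define BKT parameters (set by hand here, not fitted to the data)

P_INIT = 0.2     # P(L0): probability the concept is already mastered before the first attempt
P_LEARN = 0.15   # P(T): probability of moving from unmastered to mastered after an attempt
P_SLIP = 0.1     # P(S): probability of answering incorrectly despite mastery
P_GUESS = 0.2    # P(G): probability of answering correctly without mastery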
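Not every combination of these four numbers behaves sensibly. Working through the Bayes update shows that the posterior (before the learning step) rises after a correct answer exactly when $1 - S > G$, and by the same algebra it falls after an incorrect answer under the same condition. The sketch below encodes that condition plus a range check; `check_bkt_params` is a hypothetical name, and the strict (0, 1) bounds are a conservative choice rather than a BKT requirement.

```python
P_INIT = 0.2
P_LEARN = 0.15
P_SLIP = 0.1
P_GUESS = 0.2

def check_bkt_params(p_init, p_learn, p_slip, p_guess):
    """Hypothetical sanity check for hand-set BKT parameters."""
    named = {"p_init": p_init, "p_learn": p_learn, "p_slip": p_slip, "p_guess": p_guess}
    for name, value in named.items():
        # Conservative choice: keep every probability strictly inside (0, 1).
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name}={value} must lie strictly between 0 and 1")
    # A correct answer is evidence for mastery only if
    # P(correct | mastered) = 1 - slip exceeds P(correct | not mastered) = guess.
    if p_guess >= 1.0 - p_slip:
        raise ValueError("need p_guess < 1 - p_slip, otherwise a correct answer "
                         "does not raise the mastery estimate")

check_bkt_params(P_INIT, P_LEARN, P_SLIP, P_GUESS)  # 0.2 < 0.9, so this passes
```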

In [None]:
# BKT update functions

def predict_correct(p_mastery):
    """P(correct) = P(mastered) * (1 - slip) + P(not mastered) * guess."""
    return p_mastery * (1 - P_SLIP) + (1 - p_mastery) * P_GUESS
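A quick worked check of `predict_correct` with the parameters above: at `P_INIT = 0.2`, P(correct) = 0.2 * 0.9 + 0.8 * 0.2 = 0.34. The prediction always lies between `P_GUESS` = 0.2 (no mastery) and 1 - `P_SLIP` = 0.9 (full mastery), which the loop below prints.

```python
P_SLIP = 0.1
P_GUESS = 0.2

def predict_correct(p_mastery):
    # P(correct) = P(mastered) * (1 - slip) + P(not mastered) * guess
    return p_mastery * (1 - P_SLIP) + (1 - p_mastery) * P_GUESS

for p in (0.0, 0.2, 1.0):
    print(p, round(predict_correct(p), 4))
# 0.0 0.2
# 0.2 0.34
# 1.0 0.9
```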

In [None]:
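def update_mastery(p_mastery, correct):
    """Update P(mastered) after observing one answer, then apply the learning step."""
    # Bayes' rule: posterior probability of mastery given the observed answer
    if correct:
        num = p_mastery * (1 - P_SLIP)
        den = num + (1 - p_mastery) * P_GUESS
    else:
        num = p_mastery * P_SLIP
        den = num + (1 - p_mastery) * (1 - P_GUESS)

    p_posterior = num / den
    # Learning step: an unmastered student may acquire the concept after this attempt
    return p_posterior + (1 - p_posterior) * P_LEARN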
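To see how `update_mastery` behaves, the sketch below traces the mastery estimate through a made-up answer sequence. Starting from `P_INIT = 0.2`, one correct answer gives a posterior of 0.18 / 0.34 ≈ 0.529, and the learning step lifts it to 0.529 + 0.471 * 0.15 = 0.6; a following incorrect answer drops it to about 0.284. `trace_mastery` is a hypothetical helper added for illustration.

```python
P_INIT, P_LEARN, P_SLIP, P_GUESS = 0.2, 0.15, 0.1, 0.2

def update_mastery(p_mastery, correct):
    # Bayesian posterior of mastery given the observed answer ...
    if correct:
        num = p_mastery * (1 - P_SLIP)
        den = num + (1 - p_mastery) * P_GUESS
    else:
        num = p_mastery * P_SLIP
        den = num + (1 - p_mastery) * (1 - P_GUESS)
    p_posterior = num / den
    # ... followed by the learning transition (standard BKT has no forgetting)
    return p_posterior + (1 - p_posterior) * P_LEARN

def trace_mastery(answers, p_init=P_INIT):
    """Return the mastery estimate after each answer in `answers` (hypothetical helper)."""
    p = p_init
    history = []
    for correct in answers:
        p = update_mastery(p, correct)
        history.append(p)
    return history

answers = [1, 0, 1, 1, 1]  # invented sequence
for a, p in zip(answers, trace_mastery(answers)):
    print(a, round(p, 3))
```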

In [None]:
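# Generate BKT predictions for every (student, concept) sequence
# (the parameters are fixed above, so no training happens here)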

predictions = []
labels = []

for (student, concept), group in data.groupby(["student_id", "concept_id"]):
    p_mastery = P_INIT
    group = group.sort_values("time_step")

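    for _, row in group.iterrows():
        # predict first, so the prediction only uses earlier attempts
        p_pred = predict_correct(p_mastery)
        predictions.append(p_pred)
        labels.append(row["correct"])

        # then update mastery with the observed answer
        p_mastery = update_mastery(p_mastery, row["correct"])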
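The loop above can be wrapped in a reusable function, which makes it easy to run on a subset or on synthetic data. This is a sketch assuming the same column names; `run_bkt` is a hypothetical name. It iterates over the `correct` column directly instead of using `iterrows`, and keeps the predict-then-update order so each prediction sees only earlier attempts.

```python
import pandas as pd

P_INIT, P_LEARN, P_SLIP, P_GUESS = 0.2, 0.15, 0.1, 0.2

def predict_correct(p):
    return p * (1 - P_SLIP) + (1 - p) * P_GUESS

def update_mastery(p, correct):
    if correct:
        num = p * (1 - P_SLIP)
        den = num + (1 - p) * P_GUESS
    else:
        num = p * P_SLIP
        den = num + (1 - p) * (1 - P_GUESS)
    post = num / den
    return post + (1 - post) * P_LEARN

def run_bkt(df: pd.DataFrame):
    """Return (predictions, labels), one BKT chain per (student, concept) pair."""
    predictions, labels = [], []
    ordered = df.sort_values(["student_id", "concept_id", "time_step"])
    # groupby keeps the sorted row order inside each group
    for _, group in ordered.groupby(["student_id", "concept_id"], sort=False):
        p = P_INIT  # every chain starts from the prior
        for correct in group["correct"].astype(int).tolist():
            predictions.append(predict_correct(p))  # predict before seeing the answer
            labels.append(correct)
            p = update_mastery(p, correct)
    return predictions, labels

# Invented rows, deliberately out of time order to exercise the sort
toy = pd.DataFrame({
    "student_id": [1, 1, 1, 2],
    "concept_id": ["A", "A", "A", "A"],
    "time_step":  [2, 0, 1, 0],
    "correct":    [1, 0, 1, 1],
})
preds, labs = run_bkt(toy)
```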

In [None]:
# Compute evaluation metrics

from sklearn.metrics import accuracy_score, log_loss

binary_preds = [1 if p >= 0.5 else 0 for p in predictions]

accuracy = accuracy_score(labels, binary_preds)
loss = log_loss(labels, predictions)

accuracy, loss
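To judge whether these numbers are informative, a common sanity check (not part of the original notebook) is to compare them with trivial baselines: always predicting the majority class for accuracy, and always predicting the overall correct rate for log loss. `baseline_scores` is a hypothetical helper; comparing its output with `(accuracy, loss)` from the cell above shows how much the BKT updates add.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

def baseline_scores(labels):
    """Accuracy of a constant majority-class guess and log loss of a constant base rate."""
    y = np.asarray(labels, dtype=int)
    base_rate = y.mean()
    majority = int(base_rate >= 0.5)
    acc = accuracy_score(y, np.full_like(y, majority))
    # labels=[0, 1] keeps log_loss working even if only one class is present
    loss = log_loss(y, np.full(len(y), base_rate, dtype=float), labels=[0, 1])
    return acc, loss

# baseline_scores(labels)  # run on the notebook's `labels`
```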



Per-concept modeling
Interpretable parameters
Sequential updates
Standard evaluation metrics
Same data as CBM and DKT

Thresholding the predicted probabilities at 0.5, the model correctly predicted the next response about 64.5% of the time.

For log loss, lower is better: it measures how well the predicted probabilities match the actual outcomes.
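Concretely, log loss is the average negative log-likelihood of the observed outcomes under the predicted probabilities, $-\frac{1}{N}\sum_i [y_i \log p_i + (1-y_i)\log(1-p_i)]$. A minimal NumPy version, which should agree with `sklearn.metrics.log_loss` up to sklearn's own clipping of extreme probabilities (`binary_log_loss` is a hypothetical name):

```python
import numpy as np

def binary_log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted P(correct)."""
    y = np.asarray(labels, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# A confident wrong prediction costs far more than a confident right one:
print(binary_log_loss([1], [0.9]))  # about 0.105
print(binary_log_loss([1], [0.1]))  # about 2.303
```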

The log loss suggests that the model is confident roughly when it should be, i.e., its predicted probabilities are reasonably well calibrated.
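Log loss mixes calibration and discrimination, so a more direct check of the calibration claim is a reliability table: bin the predicted probabilities and compare each bin's mean prediction with the observed correct rate. A sketch assuming equal-width bins (10 is an arbitrary choice); `reliability_table` and `expected_calibration_error` are hypothetical helpers.

```python
import numpy as np

def reliability_table(labels, probs, n_bins=10):
    """(mean predicted, observed rate, count) for each non-empty equal-width bin."""
    y = np.asarray(labels, dtype=float)
    p = np.asarray(probs, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize against the interior edges maps p in [0, 1] to bin indices 0..n_bins-1
    idx = np.digitize(p, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(y[mask].mean()), int(mask.sum())))
    return rows

def expected_calibration_error(labels, probs, n_bins=10):
    """Count-weighted mean |mean predicted - observed rate| over bins (ECE)."""
    rows = reliability_table(labels, probs, n_bins)
    total = sum(n for _, _, n in rows)
    return sum(n * abs(pred - obs) for pred, obs, n in rows) / total

# reliability_table(labels, predictions)  # run on the notebook's outputs
```

Bins whose mean prediction sits close to the observed rate support the calibration claim. If scikit-learn is preferred, `sklearn.calibration.calibration_curve(labels, predictions, n_bins=10)` returns comparable per-bin values (observed rate and mean prediction).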

In this context, calibrated probabilities matter more than ranking (calibration over discrimination), so ROC curves and AUC are not reported here; they are optional.

Predictions are made sequentially: each prediction uses only the attempts that precede it.
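Because each prediction is made before its answer is used to update mastery, the prediction at step t depends only on answers 0 to t-1. The check below reuses the notebook's update rules; `sequential_predictions` is a hypothetical helper. Flipping an answer leaves every prediction up to and including that step unchanged, and only later predictions move.

```python
P_INIT, P_LEARN, P_SLIP, P_GUESS = 0.2, 0.15, 0.1, 0.2

def predict_correct(p):
    return p * (1 - P_SLIP) + (1 - p) * P_GUESS

def update_mastery(p, correct):
    if correct:
        num = p * (1 - P_SLIP)
        den = num + (1 - p) * P_GUESS
    else:
        num = p * P_SLIP
        den = num + (1 - p) * (1 - P_GUESS)
    post = num / den
    return post + (1 - post) * P_LEARN

def sequential_predictions(answers, p_init=P_INIT):
    """P(correct) for each attempt, computed before that attempt's answer is seen."""
    p = p_init
    preds = []
    for correct in answers:
        preds.append(predict_correct(p))
        p = update_mastery(p, correct)
    return preds

answers = [0, 1, 1, 0, 1]
flipped_last = answers[:-1] + [1 - answers[-1]]
# The last answer is never used by any prediction, so nothing changes.
assert sequential_predictions(answers) == sequential_predictions(flipped_last)
```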