## Introduction
This lab demonstrates an active learning technique for recognizing handwritten digits with label propagation. Label propagation is a semi-supervised learning method that builds a similarity graph over the data points and spreads the known labels along it to the unlabeled points; this lab uses scikit-learn's `LabelSpreading` variant. Active learning is an iterative process: we select the data points the model is least certain about, obtain labels for them, and retrain the model with the enlarged labeled set.
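
To make the graph-based idea concrete, here is a minimal NumPy sketch of a label-spreading style update on a toy one-dimensional dataset. It is a simplified illustration, not scikit-learn's implementation: the function name `spread_labels`, the toy data, and the parameter values are assumptions chosen for this example.

```python
import numpy as np


def spread_labels(X, y, gamma=1.0, alpha=0.2, n_iter=50):
    """Toy label spreading (hypothetical helper); y uses -1 for unlabeled points."""
    classes = np.unique(y[y != -1])

    # RBF affinity between every pair of points, with no self-loops
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^(-1/2) W D^(-1/2)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))

    # One-hot encoding of the known labels; unlabeled rows stay all zero
    Y = np.zeros((len(y), len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0

    # Repeatedly mix each point's neighbors' scores with its initial label
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y
    return classes[np.argmax(F, axis=1)]


# Two well-separated clusters, one labeled point in each (made-up data)
X_toy = np.array([[0.0], [0.1], [0.2], [2.0], [2.1], [2.2]])
y_toy = np.array([0, -1, -1, 1, -1, -1])
toy_pred = spread_labels(X_toy, y_toy)
print(toy_pred.tolist())  # [0, 0, 0, 1, 1, 1]
```

Each unlabeled point ends up with the label of the nearby labeled point, because the affinities inside a cluster are far larger than those between clusters.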

## Load the Digits Dataset
We start by loading the digits dataset from the scikit-learn library.

In [24]:
from sklearn import datasets

digits = datasets.load_digits()

## Shuffle and Split Data
Next, we shuffle the dataset, keep the first 330 samples, and split them into labeled and unlabeled parts. We start with only 10 labeled points; every other point is marked as unlabeled with the value -1.

In [25]:
import numpy as np

rng = np.random.RandomState(0)
indices = np.arange(len(digits.data))
rng.shuffle(indices)

X = digits.data[indices[:330]]
y = digits.target[indices[:330]]
images = digits.images[indices[:330]]

n_total_samples = len(y)
n_labeled_points = 10

# Keep the true labels of the first 10 points; -1 marks a point as unlabeled
y_train = np.full(n_total_samples, -1)
y_train[:n_labeled_points] = y[:n_labeled_points]

unlabeled_indices = np.arange(n_total_samples)[n_labeled_points:]
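
The masking convention used in this step can be tried on a tiny array. The helper name `mask_labels` and the demo labels below are made up for illustration; only the use of -1 as the "unlabeled" marker comes from the lab itself.

```python
import numpy as np


def mask_labels(y, n_labeled):
    """Return a copy of y in which only the first n_labeled entries keep their label."""
    y_partial = np.full(len(y), -1)
    y_partial[:n_labeled] = y[:n_labeled]
    return y_partial


y_demo = np.array([3, 8, 1, 7, 4, 9])  # made-up labels for illustration
y_partial = mask_labels(y_demo, n_labeled=2)
demo_unlabeled = np.flatnonzero(y_partial == -1)

print(y_partial.tolist())       # [3, 8, -1, -1, -1, -1]
print(demo_unlabeled.tolist())  # [2, 3, 4, 5]
```

The original array is left untouched, so the true labels remain available to simulate the human annotator later on.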

## Train Label Propagation Model
We now train a `LabelSpreading` model on all 330 points, of which only 10 carry labels. Fitting the model also infers labels for the remaining unlabeled points.

In [26]:
from sklearn.semi_supervised import LabelSpreading

lp_model = LabelSpreading(gamma=0.25, max_iter=20)
lp_model.fit(X, y_train)
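
One way to check how well the inferred labels match the ground truth is to compute the accuracy on the unlabeled points only. The sketch below is an optional addition, not part of the original lab; the helper name `unlabeled_accuracy` and the demo arrays are assumptions made for illustration.

```python
import numpy as np


def unlabeled_accuracy(predicted, y_true, unlabeled_indices):
    """Fraction of unlabeled points whose inferred label matches the ground truth."""
    predicted = np.asarray(predicted)
    y_true = np.asarray(y_true)
    matches = predicted[unlabeled_indices] == y_true[unlabeled_indices]
    return float(np.mean(matches))


# Made-up predictions and labels, for illustration only
demo_pred = np.array([0, 1, 2, 2, 1])
demo_true = np.array([0, 1, 2, 1, 1])
demo_unlabeled = np.array([2, 3, 4])

demo_acc = unlabeled_accuracy(demo_pred, demo_true, demo_unlabeled)
print(round(demo_acc, 3))  # 0.667
```

In this lab the call would be `unlabeled_accuracy(lp_model.transduction_, y, unlabeled_indices)`, since a fitted `LabelSpreading` model stores its inferred labels in the `transduction_` attribute.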

## Select Most Uncertain Points
We rank the unlabeled points by the entropy of their predicted label distributions and select the five most uncertain ones to request human labels for.

In [29]:
from scipy import stats

pred_entropies = stats.entropy(lp_model.label_distributions_.T)
uncertainty_index = np.argsort(pred_entropies)[::-1]
uncertainty_index = uncertainty_index[np.isin(uncertainty_index, unlabeled_indices)][:5]
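
To see why entropy works as an uncertainty score, the sketch below computes it by hand for a few made-up label distributions and applies the same ranking and filtering steps. For rows that sum to 1 this is the quantity `stats.entropy` returns on the transposed matrix (natural logarithm). The helper name `select_most_uncertain` and the demo values are assumptions for illustration.

```python
import numpy as np


def select_most_uncertain(label_distributions, candidate_indices, k):
    """Return the k candidates with the highest predictive entropy, plus all entropies."""
    p = np.asarray(label_distributions, dtype=float)
    # Shannon entropy per row, H = -sum(p * log p), treating 0 * log 0 as 0
    safe_p = np.where(p > 0, p, 1.0)
    entropies = -np.sum(p * np.log(safe_p), axis=1)
    ranked = np.argsort(entropies)[::-1]                 # most uncertain first
    ranked = ranked[np.isin(ranked, candidate_indices)]  # keep candidates only
    return ranked[:k], entropies


demo_dist = np.array([
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.30, 0.30],  # fairly uncertain
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain, entropy log(3)
    [0.50, 0.50, 0.00],  # torn between two classes, entropy log(2)
])

# Row 2 is excluded from the candidates, as an already labeled point would be
picked, demo_entropies = select_most_uncertain(demo_dist, candidate_indices=[0, 1, 3], k=2)
print(picked.tolist())  # [1, 3]
```

Row 2 has the highest entropy but is skipped because it is not a candidate, which mirrors how the lab filters the ranking with `unlabeled_indices`.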


## Label the Most Uncertain Points
We add the human labels for these points to the training labels and retrain the model. Here the known ground truth in `y` stands in for the human annotator.

In [30]:
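y_train[uncertainty_index] = y[uncertainty_index]  # simulate human labeling
lp_model.fit(X, y_train)
unlabeled_indices = np.setdiff1d(unlabeled_indices, uncertainty_index, assume_unique=True)  # do not pick these again
n_labeled_points += len(uncertainty_index)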

## Repeat
We repeat the process of selecting the five most uncertain points, adding their labels, and retraining the model for four more rounds. Together with the 10 initial labels and the five added in the previous step, this gives a total of 35 labeled data points.

In [31]:
max_iterations = 4  # each round adds 5 labels, so 4 rounds add 20 labels

for i in range(max_iterations):
    if len(unlabeled_indices) == 0:
        print("No unlabeled items left to label.")
        break

    pred_entropies = stats.entropy(lp_model.label_distributions_.T)
    uncertainty_index = np.argsort(pred_entropies)[::-1]
    uncertainty_index = uncertainty_index[np.isin(uncertainty_index, unlabeled_indices)][:5]

    print(f"Iteration {i + 1}: labeling indices {uncertainty_index}")

    y_train[uncertainty_index] = y[uncertainty_index]  # simulate human labeling
    lp_model.fit(X, y_train)

    # Remove the newly labeled points from the unlabeled pool
    unlabeled_indices = np.setdiff1d(unlabeled_indices, uncertainty_index, assume_unique=True)
    n_labeled_points += len(uncertainty_index)


Iteration 1: labeling indices [104 251 103 304 189]
Iteration 2: labeling indices [308 184  85  57 169]
Iteration 3: labeling indices [149 227  27  33 109]
Iteration 4: labeling indices [115  91  93  20  64]


## Summary
In summary, this lab demonstrated an active learning technique that uses label propagation (scikit-learn's `LabelSpreading`) to learn handwritten digits. We started by training the model with only 10 labeled points, then repeatedly selected the five most uncertain points and added their labels, ending with 35 labeled data points. Because the labeling effort is spent on the points the model is least sure about, this technique can reduce the number of labeled data points needed to train a useful model.