*This notebook is from https://github.com/neubig/anlp-code* by Graham Neubig

We added additional printing of feature weights in the Error Analysis section.

# Training a Sentiment Classifier

This notebook trains a sentiment classifier from labeled data. Specifically, it uses bag-of-words features and the structured perceptron algorithm to train the classifier.

It takes a text `X` and returns a label of `1` if the sentiment of the text is positive, `-1` if it is negative, and `0` if it is neutral. You can test the accuracy of your classifier on the [Stanford Sentiment Treebank](http://nlp.stanford.edu/sentiment/index.html) by running the notebook all the way to the end.

## Setup

Import the libraries used below.

In [None]:
import random
import tqdm

## Feature Extraction

This is the feature extraction code, which determines the features used in training. By default, every space-separated word is a feature whose value is its count in the text.

In [None]:
def extract_features(x: str) -> dict[str, float]:
    # Bag of words: count each space-separated token in the text
    features = {}
    for word in x.split(' '):
        features[word] = features.get(word, 0) + 1.0
    return features
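
As a quick check of what the bag-of-words representation looks like, the sketch below re-declares the extractor under the hypothetical name `bag_of_words` (so that it runs on its own) and applies it to a made-up sentence. Repeated tokens accumulate counts, and because the split is on single spaces, punctuation that is already space-separated becomes its own feature.

```python
def bag_of_words(text: str) -> dict[str, float]:
    """Count space-separated tokens; mirrors extract_features above."""
    counts: dict[str, float] = {}
    for token in text.split(' '):
        counts[token] = counts.get(token, 0) + 1.0
    return counts

# Made-up sentence, tokenized with spaces like the dataset.
example_features = bag_of_words('a good , good movie .')
print(example_features)
# {'a': 1.0, 'good': 2.0, ',': 1.0, 'movie': 1.0, '.': 1.0}
```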

Also, initialize the feature weights. The dictionary starts empty, and any feature missing from it is treated as having a weight of zero.

In [None]:

feature_weights = {}

## Data Reading

Read in the data from the training and dev (or, finally, test) sets. Each line of a data file has the form `label ||| text`.

In [None]:
def read_xy_data(filename: str) -> tuple[list[str], list[int]]:
    x_data = []
    y_data = []
    with open(filename, 'r') as f:
        for line in f:
            label, text = line.strip().split(' ||| ')
            x_data.append(text)
            y_data.append(int(label))
    return x_data, y_data
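
To make the expected file format concrete, the sketch below parses two made-up lines from an in-memory string instead of a file. `parse_xy_lines` is a hypothetical helper, and passing `maxsplit=1` is an added precaution (not in the original reader) so that a text that itself contains ` ||| ` would not break the unpacking.

```python
import io

def parse_xy_lines(lines) -> tuple[list[str], list[int]]:
    """Parse an iterable of 'label ||| text' lines into texts and labels."""
    texts, labels = [], []
    for line in lines:
        # Split only on the first separator (assumption added for robustness).
        label, text = line.strip().split(' ||| ', maxsplit=1)
        texts.append(text)
        labels.append(int(label))
    return texts, labels

# Two made-up examples in the same format as the data files.
sample_file = io.StringIO('1 ||| a charming film .\n-1 ||| dull and flat .\n')
sample_x, sample_y = parse_xy_lines(sample_file)
print(sample_x)  # ['a charming film .', 'dull and flat .']
print(sample_y)  # [1, -1]
```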

In [None]:
x_train, y_train = read_xy_data('./data/train.txt')
x_dev, y_dev = read_xy_data('./data/dev.txt')

In [None]:
print(x_train[0])
print(y_train[0])

The Rock is destined to be the 21st Century 's new `` Conan '' and that he 's going to make a splash even greater than Arnold Schwarzenegger , Jean-Claud Van Damme or Steven Segal .
1


## Inference Code

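To run the classifier, sum each feature value times its weight and return the sign of the resulting score (0 when the score is exactly zero).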

In [None]:
def run_classifier(features: dict[str, float]) -> int:
    score = 0
    for feat_name, feat_value in features.items():
        score = score + feat_value * feature_weights.get(feat_name, 0)
    if score > 0:
        return 1
    elif score < 0:
        return -1
    else:
        return 0
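
The scoring rule can be traced by hand on a toy example. The sketch below uses made-up weights and a hypothetical `score_and_label` helper that takes the weights as an argument (rather than reading the global `feature_weights`) so that it runs on its own.

```python
toy_weights = {'good': 2.0, 'dull': -3.0}  # made-up weights for illustration

def score_and_label(features: dict[str, float],
                    weights: dict[str, float]) -> tuple[float, int]:
    """Return the weighted sum of the features and its sign as the label."""
    score = sum(value * weights.get(name, 0) for name, value in features.items())
    label = 1 if score > 0 else -1 if score < 0 else 0
    return score, label

print(score_and_label({'good': 2.0, 'movie': 1.0}, toy_weights))  # (4.0, 1)
print(score_and_label({'good': 1.0, 'dull': 1.0}, toy_weights))   # (-1.0, -1)
print(score_and_label({'movie': 1.0}, toy_weights))               # (0.0, 0)
```

Features without a learned weight, such as `movie` here, contribute nothing to the score, which is why an input made up only of unseen words is labeled 0.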

## Training Code

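Learn the weights of the classifier. Whenever a prediction is wrong, the weight of each feature in the example is moved toward the true label.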
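
Before the full loop, a single update can be sketched on made-up data. Assuming a positive example (`y = 1`) that the current weights misclassify, each feature's weight changes by `y` times the feature value, which raises the example's score. The names `demo_weights`, `demo_score`, and `perceptron_update` are hypothetical and used only for this illustration.

```python
def perceptron_update(weights, features, y):
    """Return a copy of the weights after one mistake-driven update."""
    updated = dict(weights)
    for name, value in features.items():
        updated[name] = updated.get(name, 0) + y * value
    return updated

def demo_score(weights, features):
    """Weighted sum of feature values; unseen features count as zero."""
    return sum(value * weights.get(name, 0) for name, value in features.items())

demo_weights = {'not': -2.0, 'bad': -1.0}           # made-up starting weights
demo_features = {'not': 1.0, 'bad': 1.0, '.': 1.0}  # features of "not bad ."
demo_y = 1                                          # true label: positive

before = demo_score(demo_weights, demo_features)
after_weights = perceptron_update(demo_weights, demo_features, demo_y)
after = demo_score(after_weights, demo_features)
print(before, after)  # -3.0 0.0
```

In this toy case the score rises from -3.0 to 0.0, so one update is not enough to make the prediction positive; repeated passes over the data are what eventually fix such examples.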

In [None]:
NUM_EPOCHS = 5
for epoch in range(1, NUM_EPOCHS+1):
    # Shuffle the order of the data
    data_ids = list(range(len(x_train)))
    random.shuffle(data_ids)
    # Run over all data points
    for data_id in tqdm.tqdm(data_ids, desc=f'Epoch {epoch}'):
        x = x_train[data_id]
        y = y_train[data_id]
        # We will skip neutral examples
        if y == 0:    
            continue
        # Make a prediction
        features = extract_features(x)
        predicted_y = run_classifier(features)
        # Update the weights if the prediction is wrong
        if predicted_y != y:
            for feature in features:
                feature_weights[feature] = feature_weights.get(feature, 0) + y * features[feature]

Epoch 1: 100%|██████████| 8544/8544 [00:00<00:00, 81967.74it/s]
Epoch 2: 100%|██████████| 8544/8544 [00:00<00:00, 173084.63it/s]
Epoch 3: 100%|██████████| 8544/8544 [00:00<00:00, 188434.69it/s]
Epoch 4: 100%|██████████| 8544/8544 [00:00<00:00, 206102.85it/s]
Epoch 5: 100%|██████████| 8544/8544 [00:00<00:00, 199105.12it/s]

## Evaluation Code

We evaluate the classifier by its accuracy, the fraction of examples whose predicted label matches the true label:

In [None]:
def calculate_accuracy(x_data: list[str], y_data: list[int]) -> float:
    total_number = 0
    correct_number = 0
    for x, y in zip(x_data, y_data):
        y_pred = run_classifier(extract_features(x))
        total_number += 1
        if y == y_pred:
            correct_number += 1
    return correct_number / float(total_number)

In [None]:
label_count = {}
for y in y_dev:
    if y not in label_count:
        label_count[y] = 0
    label_count[y] += 1
print(label_count)

{1: 444, 0: 229, -1: 428}


In [16]:
train_accuracy = calculate_accuracy(x_train, y_train)
test_accuracy = calculate_accuracy(x_dev, y_dev)
print(f'Train accuracy: {train_accuracy}')
print(f'Dev/test accuracy: {test_accuracy}')

Train accuracy: 0.7868679775280899
Dev/test accuracy: 0.5803814713896458
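
One way to read the dev number: training skips neutral examples, and `run_classifier` returns 0 only when the score is exactly zero, so neutral dev examples are rarely predicted correctly. Under the simplifying assumption that the classifier never outputs 0, the label counts printed above give a rough ceiling on three-way dev accuracy. This is a back-of-the-envelope sketch, not part of the original evaluation.

```python
# Dev label counts copied from the output above.
dev_label_count = {1: 444, 0: 229, -1: 428}

total = sum(dev_label_count.values())
non_neutral = total - dev_label_count[0]

# Assumption: a classifier that never predicts 0 gets every neutral example wrong.
ceiling = non_neutral / total
print(f'{non_neutral}/{total} = {ceiling:.4f}')  # 872/1101 = 0.7920
```

Against that ceiling of roughly 0.79, the measured dev accuracy of about 0.58 still leaves a sizable gap on the positive and negative examples themselves.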


## Error Analysis

An important part of improving any system is figuring out where it goes wrong. The following two functions print five randomly chosen misclassified examples, along with each feature's contribution to the score (feature value times weight), which may help you improve the classifier. Feel free to write more sophisticated methods for error analysis as well.

In [17]:
def get_feature_contributions(features):
    output = {}
    for feat_name, feat_value in features.items():
        output[feat_name] = feat_value * feature_weights.get(feat_name, 0)
    return output

def find_errors(x_data, y_data):
    error_ids = []
    y_preds = []
    id2contributions = {}
    for i, (x, y) in enumerate(zip(x_data, y_data)):
        features = extract_features(x)
        y_preds.append(run_classifier(features))
        if y != y_preds[-1]:
            error_ids.append(i)
            id2contributions[i] = get_feature_contributions(features)
    for _ in range(5):
        my_id = random.choice(error_ids)
        x, y, y_pred = x_data[my_id], y_data[my_id], y_preds[my_id]

        print(f'{x}\ntrue label: {y}\npredicted label: {y_pred}')
        contributions = sorted(id2contributions[my_id].items(), key=lambda x: -x[1])
        for feat_name, contribution in contributions:
            print(f'Feature: {feat_name} ({contribution})')
        
        print()
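
As one example of a slightly more systematic analysis, the sketch below tallies (true label, predicted label) pairs with `collections.Counter`. `confusion_counts` is a hypothetical helper that takes lists of labels directly so that it runs on its own; in this notebook the predictions would come from `run_classifier(extract_features(x))`. The labels shown are made up.

```python
from collections import Counter

def confusion_counts(y_true: list[int], y_pred: list[int]) -> Counter:
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

# Made-up labels for illustration only.
toy_true = [1, 1, -1, 0, -1]
toy_pred = [1, -1, -1, 1, -1]

counts = confusion_counts(toy_true, toy_pred)
for (true_label, pred_label), n in sorted(counts.items()):
    print(f'true={true_label:2d} pred={pred_label:2d} count={n}')
```

Pairs where the two labels differ show which kinds of confusion dominate, for instance whether neutral examples are mostly pushed toward positive or toward negative.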

In [18]:
find_errors(x_dev, y_dev)

Moody , heartbreaking , and filmed in a natural , unforced style that makes its characters seem entirely convincing even when its script is not .
true label: 1
predicted label: -1
Feature: , (6.0)
Feature: heartbreaking (5.0)
Feature: unforced (5.0)
Feature: style (2.0)
Feature: filmed (1.0)
Feature: convincing (1.0)
Feature: Moody (0.0)
Feature: and (0.0)
Feature: a (0.0)
Feature: makes (0.0)
Feature: its (0.0)
Feature: when (0.0)
Feature: in (-1.0)
Feature: that (-1.0)
Feature: is (-1.0)
Feature: not (-1.0)
Feature: . (-1.0)
Feature: entirely (-2.0)
Feature: script (-2.0)
Feature: natural (-3.0)
Feature: seem (-3.0)
Feature: characters (-4.0)
Feature: even (-4.0)

The tale of Tok -LRB- Andy Lau -RRB- , a sleek sociopath on the trail of O -LRB- Takashi Sorimachi -RRB- , the most legendary of Asian hitmen , is too scattershot to take hold .
true label: -1
predicted label: 1
Feature: , (6.0)
Feature: hold (5.0)
Feature: tale (3.0)
Feature: Asian (3.0)
Feature: most (2.0)
Feature: legend

## Visualize feature weights

We can inspect the weights that were learned for various features. Below we show the largest, smallest, and randomly selected feature weights. Inspecting them may give insight into the learned classifier.

In [19]:
import random

k = 25
topk_features = sorted(feature_weights.items(), key=lambda x: -x[1])[:k]
bottomk_features = sorted(feature_weights.items(), key=lambda x: x[1])[:k]
randomk_features = random.sample(list(feature_weights.items()), k)

print("Top-k")
for feature in topk_features:
    print(feature)

print("\nBottom-k")
for feature in bottomk_features:
    print(feature)

print("\nRandom k")
for feature in randomk_features:
    print(feature)

Top-k
('remarkable', 15.0)
('solid', 14.0)
('treat', 13.0)
('powerful', 13.0)
('charming', 12.0)
('enjoyable', 12.0)
('eyes', 12.0)
('sharp', 12.0)
('appealing', 12.0)
('half-bad', 12.0)
('rare', 12.0)
('ability', 12.0)
('human', 11.0)
('summer', 11.0)
('delightful', 11.0)
('terrific', 11.0)
('wonderful', 11.0)
('follow', 11.0)
('unique', 11.0)
('definitely', 11.0)
('hilarious', 11.0)
('buffs', 11.0)
('refreshing', 11.0)
('works', 10.0)
('beautifully', 10.0)

Bottom-k
('stupid', -17.0)
('none', -15.0)
('repetitive', -15.0)
('Lawrence', -14.0)
('worst', -13.0)
('were', -13.0)
('failure', -13.0)
('TV', -13.0)
('lacking', -13.0)
('thinks', -13.0)
('dull', -12.0)
('bore', -12.0)
('depressing', -12.0)
('instead', -12.0)
('uneven', -12.0)
('flat', -12.0)
('suffers', -12.0)
('terrible', -12.0)
('scene', -12.0)
('bland', -11.0)
('pretentious', -11.0)
('mess', -11.0)
('Sheridan', -11.0)
('wannabe', -11.0)
('horrible', -11.0)

Random k
('Chamber', 3.0)
('99-minute', -2.0)
('Baker', 3.0)
('Dark',