### Problem description
In the questions for the **Named Entity Recognition** section, we are given a small dataset of two text sequences and their corresponding labels, defined in the following Python code:

![image](https://firebasestorage.googleapis.com/v0/b/aivn-images.appspot.com/o/public%2F2025%2F3%2F2%2F1740888912022-image.png?alt=media&token=a2f9b858-8eac-451e-83ca-c73a55fdd53e)

### NER
The goal is to build a Named Entity Recognition model with 5 entity classes plus one padding class:  
0: B-Person  
1: I-Person  
2: B-Organization/Location  
3: I-Organization/Location  
4: O  
5: `<pad>` (padding)  
The baseline architecture is shown in the figure below:  

![image](https://firebasestorage.googleapis.com/v0/b/aivn-images.appspot.com/o/public%2F2025%2F3%2F2%2F1740888959365-image.png?alt=media&token=70c3182b-edac-4ec1-a6dc-bb35cc4e7e3f)
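To make the label scheme concrete, here is a minimal sketch of how B-/I- tags group tokens into entities. The names `ID2LABEL` and `extract_entities` are hypothetical helpers introduced only for illustration; they are not part of the notebook.

```python
# Label ids as listed in the class table; <pad> marks padded positions.
ID2LABEL = {
    0: "B-Person",
    1: "I-Person",
    2: "B-Organization/Location",
    3: "I-Organization/Location",
    4: "O",
    5: "<pad>",
}


def extract_entities(tokens, label_ids):
    """Group tokens into (text, type) spans: B- opens a span, a matching I- extends it."""
    entities = []
    current_tokens, current_type = [], None
    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label.startswith("B-"):
            # A new entity starts; close the previous one if it is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O", "<pad>", or an I- tag with no matching open span ends the entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = "Satya Nadella is based in Washington".split()
print(extract_entities(tokens, [0, 1, 4, 4, 4, 2]))
# [('Satya Nadella', 'Person'), ('Washington', 'Organization/Location')]
```

The same function applied to the second sample and its labels `[0, 1, 4, 4, 2]` yields a Person span and an Organization/Location span in the same way.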

All the information needed is in the description above. Read it carefully and answer the following questions:

In [None]:
!pip install -U torchtext==0.17.0


## Data

In [None]:
import torch
import torch.nn as nn
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

corpus = [
    "Satya Nadella is based in Washington",
    "Demis Hassabis works at DeepMind"
]
data_size = len(corpus)

# 0: B-Person - 1: I-Person
# 2: B-Organization/Location - 3: I-Organization/Location
# 4: O - 5: <pad> (added during vectorization)
labels = [[0, 1, 4, 4, 4, 2],
          [0, 1, 4, 4, 2]]

# Define the max vocabulary size and sequence length
vocab_size = 12
sequence_length = 6
num_classes = 5 + 1

In [None]:
# Define tokenizer function
tokenizer = get_tokenizer('basic_english')

# Create a function to yield list of tokens
def yield_tokens(examples):
    for text in examples:
        yield tokenizer(text)

# Create vocabulary
vocab = build_vocab_from_iterator(yield_tokens(corpus),
                                  max_tokens=vocab_size,
                                  specials=["<unk>", "<pad>"])
vocab.set_default_index(vocab["<unk>"])
vocab.get_stoi()

{'satya': 10,
 'nadella': 9,
 'is': 8,
 'in': 7,
 'demis': 5,
 'hassabis': 6,
 'deepmind': 4,
 'based': 3,
 'at': 2,
 '<pad>': 1,
 'washington': 11,
 '<unk>': 0}
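Note that the printed mapping has no entry for `works`. The corpus contains 11 distinct words, but `max_tokens=12` also counts the two special tokens, so only 10 word slots remain. The sketch below reproduces the printed indices in plain Python. It assumes tokens are ranked by frequency with ties broken alphabetically (which is consistent with the mapping above) and uses lowercase whitespace splitting as a stand-in for the `basic_english` tokenizer.

```python
from collections import Counter

corpus = [
    "Satya Nadella is based in Washington",
    "Demis Hassabis works at DeepMind",
]
specials = ["<unk>", "<pad>"]
max_tokens = 12

# Stand-in tokenizer (assumption): lowercase, then split on whitespace.
counts = Counter(token for text in corpus for token in text.lower().split())

# Assumed ordering: highest frequency first, ties broken alphabetically.
ranked = sorted(counts, key=lambda token: (-counts[token], token))

# Special tokens take the first ids; the remaining slots go to the top-ranked words.
kept = ranked[: max_tokens - len(specials)]
dropped = ranked[max_tokens - len(specials):]
stoi = {token: index for index, token in enumerate(specials + kept)}

print(len(counts))  # 11
print(dropped)      # ['works']
print([stoi.get(token, stoi["<unk>"]) for token in corpus[1].lower().split()])
# [5, 6, 0, 2, 4]
```

Because `works` falls outside the vocabulary, it is mapped to the default index of `<unk>`, which is 0.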

In [None]:
# Tokenize and numericalize your samples
def vectorize(text, vocab, sequence_length, sequence_label):
    tokens = tokenizer(text)

    token_ids = [vocab[token] for token in tokens][:sequence_length]
    token_ids = token_ids + [vocab["<pad>"]] * (sequence_length - len(tokens))
    sequence_label = sequence_label + [5] * (sequence_length - len(tokens))
    sequence_label = sequence_label[:sequence_length]

    return torch.tensor(token_ids, dtype=torch.long), torch.tensor(sequence_label, dtype=torch.long)

# Vectorize the samples
sentence_vecs = []
label_vecs = []
# Use a distinct loop variable so the global `labels` list is not overwritten
for sentence, sentence_labels in zip(corpus, labels):
    sentence_vec, labels_vec = vectorize(sentence, vocab, sequence_length, sentence_labels)
    sentence_vecs.append(sentence_vec)
    label_vecs.append(labels_vec)

In [None]:
for v in sentence_vecs:
    print(v)

tensor([10,  9,  8,  3,  7, 11])
tensor([5, 6, 0, 2, 4, 1])


In [None]:
for v in label_vecs:
    print(v)

tensor([0, 1, 4, 4, 4, 2])
tensor([0, 1, 4, 4, 2, 5])


## Model

In [None]:
class POS_Model(nn.Module):
    def __init__(self, vocab_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 2)
        custom_embedding_weight = torch.tensor([
            [ 0.26, -1.31],
            [ 0.72,  0.43],
            [-0.67,  0.61],
            [ 0.50,  0.50],
            [-0.26, -0.10],
            [ 1.29,  1.25],
            [ 1.95,  1.18],
            [-1.44, -1.89],
            [-0.20,  0.88],
            [-0.39,  1.07],
            [ 0.32, -0.05],
            [ 0.59, -0.98]
        ])
        self.embedding.weight = nn.Parameter(custom_embedding_weight)
        print("Embedding weights:")
        print(self.embedding.weight)


        # Custom RNN layer
        self.recurrent = nn.RNN(2, 3, batch_first=True)
        custom_rnn_weight_ih = torch.tensor([
              [-0.07, -0.31],
              [-0.28, -0.19],
              [-0.23, -0.15]
        ])
        custom_rnn_weight_hh = torch.tensor([
              [ 0.04,  0.37,  0.32],
              [ 0.46,  0.54, -0.54],
              [ 0.25, -0.02,  0.05]
        ])

        custom_rnn_bias_ih = torch.tensor([-0.47, -0.47,  0.50])
        custom_rnn_bias_hh = torch.tensor([ 0.42, -0.50,  0.41])

        self.recurrent.weight_ih_l0 = nn.Parameter(custom_rnn_weight_ih)
        self.recurrent.weight_hh_l0 = nn.Parameter(custom_rnn_weight_hh)
        self.recurrent.bias_ih_l0 = nn.Parameter(custom_rnn_bias_ih)
        self.recurrent.bias_hh_l0 = nn.Parameter(custom_rnn_bias_hh)

        print("RNN weights and biases:")
        print(self.recurrent.weight_ih_l0)
        print(self.recurrent.weight_hh_l0)
        print(self.recurrent.bias_ih_l0)
        print(self.recurrent.bias_hh_l0)

        # Custom fully connected layer
        self.fc = nn.Linear(3, num_classes)
        custom_fc_weight = torch.tensor([
            [ 0.10,  0.53,  0.23],
            [ 0.34,  0.32, -0.36],
            [ 0.24, -0.35,  0.29],
            [-0.28,  0.10, -0.18],
            [ 0.39,  0.15,  0.49],
            [-0.57,  0.35,  0.54]
        ])
        self.fc.weight = nn.Parameter(custom_fc_weight)
        # nn.Linear stores its bias as a 1-D tensor of shape (num_classes,)
        custom_fc_bias = torch.tensor([-0.13,  0.20,  0.13,  0.42, -0.22,  0.37])
        self.fc.bias = nn.Parameter(custom_fc_bias)
        print("FC weights:")
        print(self.fc.weight)
        print("FC bias:")
        print(self.fc.bias)

    def forward(self, x):
        print(f"Input shape: {x.shape}")
        x = self.embedding(x)
        print(f"After embedding shape: {x.shape}")
        x, _ = self.recurrent(x)
        print(f"After RNN shape: {x.shape}")
        x = self.fc(x)
        print(f"After FC shape: {x.shape}")
        print(x)

        x = x.permute(0, 2, 1)
        print(f"After permute shape: {x.shape}")
        return x

In [None]:
# create model
model = POS_Model(vocab_size, num_classes)

Embedding weights:
Parameter containing:
tensor([[ 0.2600, -1.3100],
        [ 0.7200,  0.4300],
        [-0.6700,  0.6100],
        [ 0.5000,  0.5000],
        [-0.2600, -0.1000],
        [ 1.2900,  1.2500],
        [ 1.9500,  1.1800],
        [-1.4400, -1.8900],
        [-0.2000,  0.8800],
        [-0.3900,  1.0700],
        [ 0.3200, -0.0500],
        [ 0.5900, -0.9800]], requires_grad=True)
RNN weights and biases:
Parameter containing:
tensor([[-0.0700, -0.3100],
        [-0.2800, -0.1900],
        [-0.2300, -0.1500]], requires_grad=True)
Parameter containing:
tensor([[ 0.0400,  0.3700,  0.3200],
        [ 0.4600,  0.5400, -0.5400],
        [ 0.2500, -0.0200,  0.0500]], requires_grad=True)
Parameter containing:
tensor([-0.4700, -0.4700,  0.5000], requires_grad=True)
Parameter containing:
tensor([ 0.4200, -0.5000,  0.4100], requires_grad=True)
FC weights:
Parameter containing:
tensor([[ 0.1000,  0.5300,  0.2300],
        [ 0.3400,  0.3200, -0.3600],
        [ 0.2400, -0.3500,  0.290

## Test


In [None]:
data = torch.tensor([[10, 9, 8, 3, 7, 11]])
output = model(data)
print(output.shape)

Input shape: torch.Size([1, 6])
After embedding shape: torch.Size([1, 6, 2])
After RNN shape: torch.Size([1, 6, 3])
After FC shape: torch.Size([1, 6, 6])
tensor([[[-0.3919, -0.3171,  0.5895,  0.2339, -0.0224,  0.5002],
         [-0.5143, -0.4956,  0.5719,  0.3103, -0.1750,  0.6450],
         [-0.5387, -0.4904,  0.5579,  0.3242, -0.2123,  0.6228],
         [-0.5538, -0.4547,  0.5486,  0.3258, -0.2327,  0.5626],
         [-0.3275, -0.2517,  0.7864,  0.0566,  0.2581,  0.3234],
         [-0.4223, -0.3168,  0.7369,  0.1264,  0.1089,  0.3569]]],
       grad_fn=<ViewBackward0>)
After permute shape: torch.Size([1, 6, 6])
torch.Size([1, 6, 6])
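To see where these shapes and numbers come from, the forward pass can be traced by hand. The following is a minimal plain-Python sketch (no PyTorch) that copies the weights from the model definition and applies the standard Elman RNN update `h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)` followed by the linear layer. The helper names `matvec` and `forward` are hypothetical, and the results are computed in double precision, so they should agree with the printed tensor only to about three decimals.

```python
import math

# Parameters copied from the model definition.
EMBEDDING = [
    [0.26, -1.31], [0.72, 0.43], [-0.67, 0.61], [0.50, 0.50],
    [-0.26, -0.10], [1.29, 1.25], [1.95, 1.18], [-1.44, -1.89],
    [-0.20, 0.88], [-0.39, 1.07], [0.32, -0.05], [0.59, -0.98],
]
W_IH = [[-0.07, -0.31], [-0.28, -0.19], [-0.23, -0.15]]
W_HH = [[0.04, 0.37, 0.32], [0.46, 0.54, -0.54], [0.25, -0.02, 0.05]]
B_IH = [-0.47, -0.47, 0.50]
B_HH = [0.42, -0.50, 0.41]
W_FC = [
    [0.10, 0.53, 0.23], [0.34, 0.32, -0.36], [0.24, -0.35, 0.29],
    [-0.28, 0.10, -0.18], [0.39, 0.15, 0.49], [-0.57, 0.35, 0.54],
]
B_FC = [-0.13, 0.20, 0.13, 0.42, -0.22, 0.37]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def forward(token_ids):
    """Return (hidden_states, logits) with one row per time step."""
    h = [0.0, 0.0, 0.0]  # the RNN starts from a zero hidden state
    hidden_states, logits = [], []
    for token_id in token_ids:
        x = EMBEDDING[token_id]  # embedding lookup: 2 values per token
        pre = [a + b + c + d
               for a, b, c, d in zip(matvec(W_IH, x), B_IH, matvec(W_HH, h), B_HH)]
        h = [math.tanh(p) for p in pre]  # 3 hidden units
        hidden_states.append(h)
        logits.append([z + b for z, b in zip(matvec(W_FC, h), B_FC)])  # 6 classes
    return hidden_states, logits


hidden_states, logits = forward([10, 9, 8, 3, 7, 11])
print(len(hidden_states), len(hidden_states[0]))  # 6 3
print(len(logits), len(logits[0]))                # 6 6
print([round(v, 4) for v in logits[0]])
```

With a batch dimension of 1 in front, these correspond to the `(1, 6, 3)` RNN output and the `(1, 6, 6)` FC output reported above.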


## M08NER01
### Question
What is the output shape of the RNN layer?  
A.
```
(1, 6, 2)
```
B.
```
(1, 4, 2)
```
C.
```
(1, 6, 3)
```
D.
```
(1, 2, 4)
```
### Answer:
C (batch_size, seq_len, hidden_size)


## M08NER02
### Question
What is the output shape of the FC layer?  
A.
```
(1, 6, 6)
```
B.
```
(1, 6, 3)
```
C.
```
(1, 3, 6)
```
D.
```
(1, 3, 3)
```
### Answer:
A (batch_size, seq_len, num_classes)

In [None]:
a = model(torch.tensor([[10]]))
sum(a[0])

Input shape: torch.Size([1, 1])
After embedding shape: torch.Size([1, 1, 2])
After RNN shape: torch.Size([1, 1, 3])
After FC shape: torch.Size([1, 1, 6])
tensor([[[-0.3919, -0.3171,  0.5895,  0.2339, -0.0224,  0.5002]]],
       grad_fn=<ViewBackward0>)
After permute shape: torch.Size([1, 6, 1])


tensor([0.5922], grad_fn=<AddBackward0>)

## M08NER03
### Question
Run a forward pass on the first token of sample 1 and take the final output vector. Which of the following is the sum of all elements of that output vector?  
A. 0.5624  
B. 0.5922  
C. 0.6473  
D. 0.6850
### Answer:
B
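
As a check on this answer, the computation can be worked by hand. The first token of sample 1 is `satya` (index 10), and with a single token the initial hidden state $h_0$ is zero, so the $W_{hh}$ term vanishes. The intermediate values below are rounded, and the symbols $E$, $W_{ih}$, $W_{hh}$, $W_{fc}$ simply name the embedding, RNN and FC parameters defined in the model.

$$
\begin{aligned}
x &= E[10] = (0.32,\ -0.05) \\
h_1 &= \tanh\big(W_{ih}\,x + b_{ih} + W_{hh}\,h_0 + b_{hh}\big) \\
    &= \tanh\big((-0.0069,\ -0.0801,\ -0.0661) + (-0.05,\ -0.97,\ 0.91)\big) \\
    &= \tanh(-0.0569,\ -1.0501,\ 0.8439) \approx (-0.057,\ -0.782,\ 0.688) \\
y &= W_{fc}\,h_1 + b_{fc} \approx (-0.3919,\ -0.3171,\ 0.5895,\ 0.2339,\ -0.0224,\ 0.5002) \\
\sum_i y_i &\approx 0.5922
\end{aligned}
$$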