# Acknowledgement
This notebook is my translation of the excellent notebook by @Chris Deotte ([RAPIDS SVR](https://www.kaggle.com/code/cdeotte/rapids-svr-cv-0-450-lb-0-44x)), which I worked through to better understand how the approach works. I hope it helps!

# Load Libraries and Data

In [1]:
import numpy as np 
import pandas as pd 
import os, gc, re, warnings
warnings.filterwarnings("ignore")

In [2]:
dftr_pro = pd.read_csv("/content/commonlit-evaluate-student-summaries/prompts_train.csv")
dftr_sum = pd.read_csv("/content/commonlit-evaluate-student-summaries/summaries_train.csv")
dftr = dftr_pro.merge(dftr_sum , on = "prompt_id")
dftr.drop(["prompt_id" , "student_id"] , axis = 1 , inplace = True)
dftr["src"]="train"
dftr.head()

Unnamed: 0,prompt_question,prompt_title,prompt_text,text,content,wording,src
0,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,1 element of an ideal tragedy is that it shoul...,-0.210614,-0.471415,train
1,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,The three elements of an ideal tragedy are: H...,-0.970237,-0.417058,train
2,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,Aristotle states that an ideal tragedy should ...,-0.387791,-0.584181,train
3,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,One element of an Ideal tragedy is having a co...,0.088882,-0.59471,train
4,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,The 3 ideal of tragedy is how complex you need...,-0.687288,-0.460886,train


In [3]:
dfte_pro = pd.read_csv("/content/commonlit-evaluate-student-summaries/prompts_test.csv")
dfte_sum = pd.read_csv("/content/commonlit-evaluate-student-summaries/summaries_test.csv")
dfte = dfte_pro.merge(dfte_sum , on = "prompt_id")
dfte["src"]="test" 
dfte.head()

Unnamed: 0,prompt_id,prompt_question,prompt_title,prompt_text,student_id,text,src
0,abc123,Summarize...,Example Title 1,Heading\nText...,000000ffffff,Example text 1,test
1,abc123,Summarize...,Example Title 1,Heading\nText...,222222cccccc,Example text 3,test
2,def789,Summarize...,Example Title 2,Heading\nText...,111111eeeeee,Example text 2,test
3,def789,Summarize...,Example Title 2,Heading\nText...,333333dddddd,Example text 4,test


In [4]:
target_cols = ['content', 'wording']

In [5]:
import sys
sys.path.append("../input/iterativestratification")
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
FOLDS = 20
skf = MultilabelStratifiedKFold(n_splits=FOLDS, shuffle=True, random_state=42)
for i,(train_index, val_index) in enumerate(skf.split(dftr,dftr[target_cols])):
    dftr.loc[val_index,'FOLD'] = i
print('Train samples per fold:')
dftr.FOLD.value_counts()

Train samples per fold:


13.0    359
15.0    359
19.0    359
14.0    359
3.0     359
2.0     358
1.0     358
16.0    358
17.0    358
12.0    358
18.0    358
6.0     358
11.0    358
7.0     358
10.0    358
0.0     358
8.0     358
9.0     358
5.0     358
4.0     358
Name: FOLD, dtype: int64

# Generate Embeddings

In [6]:
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
from tqdm import tqdm

In [7]:
def mean_pooling(model_output, attention_mask):
    # Create the token embeddings
    token_embeddings = model_output.last_hidden_state.detach().cpu()
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )
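
To make the masked averaging in `mean_pooling` concrete, here is a minimal NumPy sketch on a toy batch. The helper name `masked_mean` and all array values are made up for illustration and are not part of the notebook. Padded positions carry mask 0, so they add nothing to the sum or to the token count.

```python
import numpy as np

def masked_mean(token_embeddings, attention_mask):
    """Average token vectors over real (mask == 1) positions only."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dim
    mask = attention_mask[:, :, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for a row made entirely of padding
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy batch: 2 sequences, 3 token slots, hidden size 2 (illustrative values)
toy_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last slot is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],      # no padding
])
toy_mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean(toy_embeddings, toy_mask)
print(pooled)  # [[2. 3.] [4. 4.]]
```

The padding vector `[100, 100]` in the first sequence has no effect on the result, which is the point of multiplying by the mask before summing.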

In [8]:
BATCH_SIZE = 4

In [9]:
# Create a class for the embedded dataset

class EmbedDataset(torch.utils.data.Dataset):
    def __init__(self,df):
        self.df = df.reset_index(drop=True)
    def __len__(self):
        return len(self.df)
    def __getitem__(self,idx):
        text = self.df.loc[idx, "text"]  # embed only the student summary; the prompt fields are not used
        tokens = tokenizer(
                text,
                None,
                add_special_tokens=True,
                padding='max_length',
                truncation=True,
                max_length=MAX_LEN,return_tensors="pt")
        tokens = {k:v.squeeze(0) for k,v in tokens.items()}
        return tokens

ds_tr = EmbedDataset(dftr)
embed_dataloader_tr = torch.utils.data.DataLoader(ds_tr,\
                        batch_size=BATCH_SIZE,\
                        shuffle=False)
ds_te = EmbedDataset(dfte)
embed_dataloader_te = torch.utils.data.DataLoader(ds_te,\
                        batch_size=BATCH_SIZE,\
                        shuffle=False)

In [10]:
tokenizer = None
MAX_LEN = 512

def get_embeddings(MODEL_NM='', MAX=640, BATCH_SIZE=4, verbose=True, ex_verbose=False):
    global tokenizer, MAX_LEN
    DEVICE="cuda"
    model = AutoModel.from_pretrained( MODEL_NM )
    tokenizer = AutoTokenizer.from_pretrained( MODEL_NM )
    MAX_LEN = MAX
    
    model = model.to(DEVICE)
    model.eval()
    all_train_text_feats = []
    for batch in tqdm(embed_dataloader_tr,total=len(embed_dataloader_tr)):
        input_ids = batch["input_ids"].to(DEVICE)
        attention_mask = batch["attention_mask"].to(DEVICE)
        with torch.no_grad():
            model_output = model(input_ids=input_ids,attention_mask=attention_mask)
        sentence_embeddings = mean_pooling(model_output, attention_mask.detach().cpu())
        # Normalize the embeddings
        sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
        sentence_embeddings = sentence_embeddings.squeeze(0).detach().cpu().numpy()
        if ex_verbose:
            print(sentence_embeddings.shape)
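        if len(sentence_embeddings.shape) == 1: # workaround: a final batch of one sample is squeezed to 1-D above, so skip it (its row is dropped from dftr before training)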
            continue
        all_train_text_feats.extend(sentence_embeddings)
    
    all_train_text_feats = np.array(all_train_text_feats)
        
    if verbose:
        print('Train embeddings shape', all_train_text_feats.shape)
      
    te_text_feats = []
    for batch in tqdm(embed_dataloader_te,total=len(embed_dataloader_te)):
        input_ids = batch["input_ids"].to(DEVICE)
        attention_mask = batch["attention_mask"].to(DEVICE)
        with torch.no_grad():
            model_output = model(input_ids=input_ids,attention_mask=attention_mask)
        sentence_embeddings = mean_pooling(model_output, attention_mask.detach().cpu())
        # Normalize the embeddings
        sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
        # Keep the batch axis: squeezing a single-sample batch to 1-D would make extend() append scalars
        sentence_embeddings = sentence_embeddings.detach().cpu().numpy()
        te_text_feats.extend(sentence_embeddings)
    te_text_feats = np.array(te_text_feats)
    if verbose:
        print('Test embeddings shape',te_text_feats.shape)
      
    return all_train_text_feats, te_text_feats
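
The `squeeze(0)` in the training loop only changes anything when a batch holds a single sample, and that is exactly the case the shape check has to work around. The sketch below reproduces the effect with NumPy as a stand-in for the torch tensors. The hidden size of 8 is illustrative. Note one difference: torch's `squeeze(0)` silently leaves larger batches unchanged, whereas `np.squeeze(..., axis=0)` would raise on them, so it is applied only to the single-sample batch here.

```python
import numpy as np

hidden = 8  # illustrative hidden size, not the real model width

full_batch = np.ones((4, hidden))
last_batch = np.ones((1, hidden))  # a final batch holding a single sample

# Squeezing the batch axis of a single-sample batch yields a 1-D vector
squeezed = np.squeeze(last_batch, axis=0)
print(squeezed.shape)  # (8,)

feats = []
feats.extend(full_batch)  # adds 4 row vectors
feats.extend(squeezed)    # adds 8 scalars instead of 1 row vector
print(len(feats))         # 12

# Keeping the batch axis avoids the problem
safe = []
safe.extend(full_batch)
safe.extend(last_batch)
print(np.array(safe).shape)  # (5, 8)
```

With 7165 training rows and `BATCH_SIZE = 4`, the last batch contains one sample, which is why the train embeddings end up with 7164 rows.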

In [11]:
MODEL_NM = '../input/huggingface-deberta-variants/deberta-base/deberta-base'
all_train_text_feats, te_text_feats = get_embeddings(MODEL_NM, MAX=512)

Some weights of the model checkpoint at ../input/huggingface-deberta-variants/deberta-base/deberta-base were not used when initializing DebertaModel: ['config', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1792/1792 [03:49<00:00,  7.82it/s]


Train embeddings shape (7164, 768)


100%|██████████| 1/1 [00:00<00:00,  7.80it/s]

Test embeddings shape (4, 768)





In [12]:
MODEL_NM = '../input/deberta-v3-large/deberta-v3-large'
all_train_text_feats2, te_text_feats2 = get_embeddings(MODEL_NM, MAX=512)

Some weights of the model checkpoint at ../input/deberta-v3-large/deberta-v3-large were not used when initializing DebertaV2Model: ['mask_predictions.classifier.weight', 'lm_predictions.lm_head.dense.bias', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'mask_predictions.classifier.bias', 'mask_predictions.LayerNorm.weight', 'mask_predictions.dense.bias', 'mask_predictions.dense.weight', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Train embeddings shape (7164, 1024)


100%|██████████| 1/1 [00:00<00:00,  2.76it/s]

Test embeddings shape (4, 1024)





In [13]:
MODEL_NM = '../input/huggingface-deberta-variants/deberta-large/deberta-large'
all_train_text_feats3, te_text_feats3 = get_embeddings(MODEL_NM, MAX=512)

Some weights of the model checkpoint at ../input/huggingface-deberta-variants/deberta-large/deberta-large were not used when initializing DebertaModel: ['config', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1792/1792 [11:20<00:00,  2.63it/s]


Train embeddings shape (7164, 1024)


100%|██████████| 1/1 [00:00<00:00,  2.64it/s]

Test embeddings shape (4, 1024)





In [14]:
MODEL_NM = '../input/huggingface-deberta-variants/deberta-large-mnli/deberta-large-mnli'
all_train_text_feats4, te_text_feats4 = get_embeddings(MODEL_NM, MAX=512)

Some weights of the model checkpoint at ../input/huggingface-deberta-variants/deberta-large-mnli/deberta-large-mnli were not used when initializing DebertaModel: ['pooler.dense.bias', 'pooler.dense.weight', 'config', 'classifier.weight', 'classifier.bias']
- This IS expected if you are initializing DebertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1792/1792 [11:21<00:00,  2.63it/s]


Train embeddings shape (7164, 1024)


100%|██████████| 1/1 [00:00<00:00,  2.64it/s]

Test embeddings shape (4, 1024)





In [15]:
MODEL_NM = '../input/huggingface-deberta-variants/deberta-xlarge/deberta-xlarge'
all_train_text_feats5, te_text_feats5 = get_embeddings(MODEL_NM, MAX=512)

Some weights of the model checkpoint at ../input/huggingface-deberta-variants/deberta-xlarge/deberta-xlarge were not used when initializing DebertaModel: ['lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1792/1792 [22:23<00:00,  1.33it/s]


Train embeddings shape (7164, 1024)


100%|██████████| 1/1 [00:00<00:00,  1.34it/s]

Test embeddings shape (4, 1024)





In [16]:
all_train_text_feats = np.concatenate([all_train_text_feats, all_train_text_feats2,
                                       all_train_text_feats3, all_train_text_feats4,
                                       all_train_text_feats5], axis=1)
te_text_feats = np.concatenate([te_text_feats, te_text_feats2,
                                te_text_feats3, te_text_feats4,
                               te_text_feats5], axis=1)
# delete other embeddings if used
del all_train_text_feats2, te_text_feats2
del all_train_text_feats3, te_text_feats3
del all_train_text_feats4, te_text_feats4
del all_train_text_feats5, te_text_feats5
gc.collect()
print('Our concatenated embeddings have shape', all_train_text_feats.shape )

Our concatenated embeddings have shape (7164, 4864)
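
As a quick check of the shape arithmetic, the sketch below concatenates random stand-in blocks with the widths reported above (768 + 4 × 1024 = 4864). The random data and the 3-row size are assumptions for illustration only. Because each model's embedding is L2-normalized before concatenation, every concatenated row should have norm close to √5, so each of the five models contributes equally to the overall vector length.

```python
import numpy as np

rng = np.random.default_rng(0)
widths = [768, 1024, 1024, 1024, 1024]  # embedding widths reported above

blocks = []
for w in widths:
    block = rng.normal(size=(3, w))  # 3 fake rows stand in for real embeddings
    block /= np.linalg.norm(block, axis=1, keepdims=True)  # L2-normalize each row
    blocks.append(block)

combined = np.concatenate(blocks, axis=1)
row_norms = np.linalg.norm(combined, axis=1)
print(combined.shape)  # (3, 4864)
print(row_norms)       # each value is close to sqrt(5)
```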


# Train SVR

In [17]:
from cuml.svm import SVR
#from sklearn.svm import SVR
#import cuml
import cudf
import cupy as cp
#print("Checking for cuml", cuml.__version__)

In [18]:
# The metric
from sklearn.metrics import mean_squared_error

preds = []
scores = []

def comp_score(y_true, y_pred):
    rmse_scores = []
    for i in range(len(target_cols)):
        rmse_scores.append(np.sqrt(mean_squared_error(y_true[:,i], y_pred[:,i])))
    return np.mean(rmse_scores)
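
`comp_score` is the mean of the per-column RMSE values. The following sketch computes the same quantity with plain NumPy on a tiny hand-checkable example. The function name `mcrmse` and the numbers are illustrative assumptions, not taken from the notebook.

```python
import numpy as np

def mcrmse(y_true, y_pred):
    """Mean of the per-column RMSE values."""
    per_column_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return per_column_rmse.mean()

# Two samples, two targets (illustrative values)
y_true = np.array([[0.0, 0.0], [0.0, 0.0]])
y_pred = np.array([[3.0, 1.0], [4.0, 1.0]])

# Column 0: sqrt((9 + 16) / 2) = sqrt(12.5); column 1: sqrt((1 + 1) / 2) = 1
score = mcrmse(y_true, y_pred)
print(score)
```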

In [19]:
test_preds = np.zeros((len(te_text_feats),2))

In [20]:
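# get_embeddings skipped the final single-sample batch, so the last row has no embedding;
# drop it to keep dftr aligned with all_train_text_feats (7164 rows)
dftr.drop(dftr.index[-1], inplace=True)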

In [21]:
for fold in tqdm(range(FOLDS),total=FOLDS):
#for fold in range(FOLDS):
#    print('#'*25)
#    print('### Fold',fold+1)
#    print('#'*25)
    
    dftr_ = dftr[dftr["FOLD"]!=fold]
    dfev_ = dftr[dftr["FOLD"]==fold]
    
    tr_text_feats = all_train_text_feats[list(dftr_.index),:]
    ev_text_feats = all_train_text_feats[list(dfev_.index),:]
    
    ev_preds = np.zeros((len(ev_text_feats),2))
    test_preds = np.zeros((len(te_text_feats),2))
    for i,t in enumerate(target_cols):
        print(t,', ',end='')
        clf = SVR(C=1)
        clf.fit(tr_text_feats, dftr_[t].values)
        ev_preds[:,i] = clf.predict(ev_text_feats)
        test_preds[:,i] = clf.predict(te_text_feats)
    print()
    score = comp_score(dfev_[target_cols].values,ev_preds)
    scores.append(score)
    print("Fold : {} RMSE score: {}".format(fold,score))
    preds.append(test_preds)
    
#print('#'*25)
print('Overall CV RMSE =',np.mean(scores))

  0%|          | 0/20 [00:00<?, ?it/s]

content , wording , 

  5%|▌         | 1/20 [00:07<02:21,  7.44s/it]

Fold : 0 RMSE score: 0.4857991696412925
content , wording , 

 10%|█         | 2/20 [00:08<01:01,  3.43s/it]

Fold : 1 RMSE score: 0.5135170700748948
content , wording , 

 15%|█▌        | 3/20 [00:08<00:36,  2.14s/it]

Fold : 2 RMSE score: 0.5141208581441621
content , wording , 

 20%|██        | 4/20 [00:09<00:24,  1.54s/it]

Fold : 3 RMSE score: 0.512510632118025
content , wording , 

 25%|██▌       | 5/20 [00:09<00:18,  1.22s/it]

Fold : 4 RMSE score: 0.4995725260801155
content , wording , 

 30%|███       | 6/20 [00:10<00:14,  1.01s/it]

Fold : 5 RMSE score: 0.5275545503212785
content , wording , 

 35%|███▌      | 7/20 [00:11<00:11,  1.14it/s]

Fold : 6 RMSE score: 0.5014564481671738
content , wording , 

 40%|████      | 8/20 [00:11<00:09,  1.25it/s]

Fold : 7 RMSE score: 0.5007608545850175
content , wording , 

 45%|████▌     | 9/20 [00:12<00:08,  1.35it/s]

Fold : 8 RMSE score: 0.5381253705089003
content , wording , 

 50%|█████     | 10/20 [00:13<00:07,  1.42it/s]

Fold : 9 RMSE score: 0.49038674922258235
content , wording , 

 55%|█████▌    | 11/20 [00:13<00:06,  1.47it/s]

Fold : 10 RMSE score: 0.5089160499475687
content , wording , 

 60%|██████    | 12/20 [00:14<00:05,  1.51it/s]

Fold : 11 RMSE score: 0.4351359432763593
content , wording , 

 65%|██████▌   | 13/20 [00:14<00:04,  1.55it/s]

Fold : 12 RMSE score: 0.5196545490533362
content , wording , 

 70%|███████   | 14/20 [00:15<00:03,  1.55it/s]

Fold : 13 RMSE score: 0.4891641207578526
content , wording , 

 75%|███████▌  | 15/20 [00:16<00:03,  1.56it/s]

Fold : 14 RMSE score: 0.48870907099812777
content , wording , 

 80%|████████  | 16/20 [00:16<00:02,  1.57it/s]

Fold : 15 RMSE score: 0.49096830423602417
content , wording , 

 85%|████████▌ | 17/20 [00:17<00:01,  1.56it/s]

Fold : 16 RMSE score: 0.47861781393627245
content , wording , 

 90%|█████████ | 18/20 [00:18<00:01,  1.59it/s]

Fold : 17 RMSE score: 0.5030517609130829
content , wording , 

 95%|█████████▌| 19/20 [00:18<00:00,  1.60it/s]

Fold : 18 RMSE score: 0.4942121705438145
content , wording , 

100%|██████████| 20/20 [00:19<00:00,  1.04it/s]

Fold : 19 RMSE score: 0.5514226348349174
Overall CV RMSE = 0.50218283236804





# Prediction

In [22]:
sub = dfte.copy()
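# Average the test predictions from all folds (test_preds alone holds only the last fold)
sub[target_cols] = np.mean(np.array(preds), axis=0) # or np.average(..., weights=[1/s for s in scores])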
sub_columns = pd.read_csv("/content/commonlit-evaluate-student-summaries/sample_submission.csv").columns
sub = sub[sub_columns]
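
The `preds` list collects one array of test predictions per fold, and the commented-out `weights=[1/s for s in scores]` hints at weighting folds by the inverse of their validation score. The sketch below compares a plain mean with that inverse-score weighting using `np.average`. The three folds, their predictions, and their scores are hypothetical numbers chosen so the result can be checked by hand.

```python
import numpy as np

# Hypothetical per-fold test predictions: 3 folds, 2 test rows, 2 targets
fold_preds = [
    np.array([[0.0, 0.2], [1.0, 1.0]]),
    np.array([[0.2, 0.4], [1.0, 1.0]]),
    np.array([[0.4, 0.6], [1.0, 1.0]]),
]
fold_scores = [0.25, 0.50, 0.50]  # hypothetical validation RMSE per fold (lower is better)

# Plain mean: every fold counts equally
simple = np.mean(fold_preds, axis=0)

# Inverse-score weights: 4, 2, 2, which np.average normalizes to 0.5, 0.25, 0.25
weighted = np.average(fold_preds, axis=0, weights=[1 / s for s in fold_scores])

print(simple)    # first row: [0.2, 0.4]
print(weighted)  # first row: [0.15, 0.35], pulled toward the best-scoring fold
```

Whether the weighted variant helps on the leaderboard is not something this sketch can show. With fold scores as close together as those printed above, the two averages would likely be nearly identical.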