# Exercise 3: Optimizing RPO Training with α Parameter Tuning 🎛️
## 📘 Prerequisites
* Exercise 1: Smart dataset sampling for quality
* Exercise 2: Resource-efficient training strategies
## 🎯 The Challenge
The RPO (Regularized Preference Optimization) α parameter critically affects:

* Training stability
* The balance between preference learning and instruction fine-tuning (the NLL term)
* Base model knowledge preservation and task alignment
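
To make the role of α concrete, here is a minimal sketch of the combined objective for a single preference pair. It assumes the sigmoid DPO loss and the additive form "DPO loss + α · NLL loss" that the training config comment describes; the function names and the example log-probabilities are hypothetical, and the actual trainer may differ in details such as averaging.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair."""
    # Implicit rewards: beta-scaled log-ratio between policy and reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def rpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp,
             chosen_nll, beta=0.1, alpha=0.5):
    """Assumed RPO objective: preference loss plus alpha times the NLL on the chosen answer."""
    preference_term = dpo_loss(policy_chosen_logp, policy_rejected_logp,
                               ref_chosen_logp, ref_rejected_logp, beta)
    return preference_term + alpha * chosen_nll


# Hypothetical numbers: the policy slightly prefers the chosen answer.
for alpha in (0.0, 0.5, 1.0):
    loss = rpo_loss(-28.0, -30.0, -30.0, -30.0, chosen_nll=1.5, alpha=alpha)
    print(f"alpha={alpha}: loss={loss:.4f}")
```

With α = 0 only the preference term remains; larger α values give the supervised (NLL) signal on the chosen answer more weight.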

### Git Clone

In [1]:
! git clone https://github.com/thomsonreuters/labs_AMLD25_Workshop

Cloning into 'labs_AMLD25_Workshop'...
remote: Enumerating objects: 360, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 360 (delta 15), reused 22 (delta 7), pack-reused 318 (from 1)
Receiving objects: 100% (360/360), 4.56 MiB | 16.21 MiB/s, done.
Resolving deltas: 100% (207/207), done.


### Install dependencies

In [2]:
! pip install -r /kaggle/working/labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO/requirements.txt
! pip install flash-attn==2.7.3 --no-build-isolation

Collecting accelerate==1.3.0 (from -r /kaggle/working/labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO/requirements.txt (line 1))
  Downloading accelerate-1.3.0-py3-none-any.whl.metadata (19 kB)
Collecting bitsandbytes==0.45.1 (from -r /kaggle/working/labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO/requirements.txt (line 2))
  Downloading bitsandbytes-0.45.1-py3-none-manylinux_2_24_x86_64.whl.metadata (5.8 kB)
Collecting transformers==4.48.1 (from -r /kaggle/working/labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO/requirements.txt (line 5))
  Downloading transformers-4.48.1-py3-none-any.whl.metadata (44 kB)
Collecting trl==0.13.0 (from -r /kaggle/working/labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO/requirements.txt (line 6))
  Downloading trl-0.13.0-py3-none-any.whl.metadata (11 kB)
Downloading accelerate-1.3.0-py3-none-any.whl (336 kB)

### Testing GPU
Check that Python recognizes the allocated GPU. If it does not, go to `Settings` > `Accelerator` > `GPU T4 x 2`.

In [3]:
import os, sys

# from tensorflow.python.client import device_lib
repo_folder = os.getcwd().split('labs_AMLD25_Workshop')[0]+"/labs_AMLD25_Workshop/src" 
sys.path.append(repo_folder)

# UNCOMMENT TO CHECK GPU HW
# device_lib.list_local_devices()
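
The commented-out check above relies on TensorFlow. Since the rest of the notebook uses PyTorch, a PyTorch-based check may be more convenient. The sketch below is an alternative, not part of the workshop code; `visible_gpus` is a hypothetical helper name, and it returns an empty list when PyTorch or CUDA is unavailable.

```python
def visible_gpus():
    """Return the names of the CUDA devices PyTorch can see (empty list if none)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


gpus = visible_gpus()
print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

On a Kaggle session with the `GPU T4 x 2` accelerator, two devices should be listed; an empty list suggests the accelerator setting needs to be changed.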

If you get two GPUs, you can assign them manually using environment variables. This step is optional, since PyTorch should recognize them automatically.

In [4]:
os.environ["WANDB_DISABLED"] = "true" ## turning off WandB logging
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"

rl_foolder = "labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO"

In [5]:
import torch

from typing import Optional, List, Dict
import datasets
from datasets import (
    load_dataset, 
    load_from_disk, 
    DatasetDict,
    concatenate_datasets
)

from accelerate import Accelerator, PartialState
from transformers import AutoModelForCausalLM, AutoTokenizer

from trl import (
    ModelConfig,
    DPOTrainer,
    DPOConfig,
    TrlParser,
    get_kbit_device_map,
    get_peft_config,
    get_quantization_config,
)

from trlabs.rl.data import (
    get_datasets, 
    DataArguments
)

from trlabs.utils import *

from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


### Model Config

In [6]:
model_config = {
    "model_name_or_path": "Qwen/Qwen2-0.5B-Instruct",
    "torch_dtype": "bfloat16",
    "use_peft": True, 
    "lora_r": 64,        
    "lora_alpha": 32,    # LoRA scaling factor = lora_alpha / lora_r = 0.5
    "lora_dropout": 0.1, # Prevent overfitting
}
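
As a sanity check on these LoRA settings, the sketch below estimates the number of trainable parameters from the model dimensions printed in the training log (`hidden_size` 896, 14 attention heads, 2 key/value heads, 24 layers). It assumes adapters are attached only to `q_proj` and `v_proj`, which the notebook does not state explicitly; under that assumption the count agrees with the `Number of trainable parameters = 4,325,376` line in the log. The helper name is hypothetical.

```python
def lora_trainable_params(hidden_size, num_heads, num_kv_heads, num_layers, r,
                          targets=("q_proj", "v_proj")):
    """Count LoRA parameters: each adapted linear (in -> out) adds r * (in + out)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    # (input features, output features) of the attention projections.
    shapes = {
        "q_proj": (hidden_size, hidden_size),
        "k_proj": (hidden_size, kv_dim),
        "v_proj": (hidden_size, kv_dim),
        "o_proj": (hidden_size, hidden_size),
    }
    per_layer = sum(r * (shapes[t][0] + shapes[t][1]) for t in targets)
    return num_layers * per_layer


lora_r, lora_alpha = 64, 32
n_params = lora_trainable_params(hidden_size=896, num_heads=14, num_kv_heads=2,
                                 num_layers=24, r=lora_r)
print(f"trainable LoRA parameters: {n_params:,}")
print(f"LoRA scaling (lora_alpha / lora_r): {lora_alpha / lora_r}")
```

Doubling `lora_r` doubles the adapter size, while the ratio `lora_alpha / lora_r` controls how strongly the adapter output is scaled before it is added to the frozen weights.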

### Data Config

In [7]:
data_params = {
  "dataset_name": "Mix 1",
  "dataset_mixer": {
    # For time constraints, use only our preference collection 
    # to see the effect of the RPO objective and its HP
    # "trl-lib/ultrafeedback_binarized": 0.02, 
    f"{rl_foolder}/data/AMLD25_reuters_gentitle_1k": 1.,
  },
  "dataset_splits": ["train", "test"],
  "num_eval_samples": 100,
  "seed": 42
}

### Training Config

In [8]:
training_params =  {
    ## RPO loss active 
    ## alpha is the multiplier of NLL loss
    "rpo_alpha": .5,
    ## General
    "output_dir": f"{model_config['model_name_or_path'].split('/')[0].lower()}_ex3_output",
    "num_train_epochs": 1,
    "beta": 0.1,
    "eval_strategy": "steps",
    "eval_steps": 8,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 8,
    ## Context length and max length (max_new_tokens = max_length - max_prompt_length)
    "max_length": 768,
    "max_prompt_length":512,
    ## Optimizer
    "optim": "adamw_torch",
    "learning_rate": 2.0e-4,
    "weight_decay": 0.001,
    "adam_epsilon": 1.0e-8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "max_grad_norm": 1.0,
    ## Scheduler ##
    "warmup_steps": 10,
    "lr_scheduler_type": "cosine",
    ## Logging
    "log_level": "info",
    "logging_first_step": True,
    "logging_steps": 10
}
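
The batch and length settings above determine a few derived quantities that appear later in the training log. The sketch below works them out, assuming two GPUs under DataParallel (as the log reports) and assuming the number of optimizer steps per epoch is the number of batches floor-divided by the accumulation steps; `schedule_summary` is a hypothetical helper.

```python
import math


def schedule_summary(num_examples, per_device_batch, num_devices, grad_accum,
                     max_length, max_prompt_length, epochs=1):
    """Derive effective batch size, optimizer steps and generation budget."""
    batch_per_forward = per_device_batch * num_devices
    batches_per_epoch = math.ceil(num_examples / batch_per_forward)
    # Incomplete accumulation windows at the end of the epoch are dropped.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return {
        "effective_batch_size": batch_per_forward * grad_accum,
        "optimizer_steps": math.ceil(epochs * updates_per_epoch),
        "max_new_tokens": max_length - max_prompt_length,
    }


summary = schedule_summary(num_examples=987, per_device_batch=1, num_devices=2,
                           grad_accum=8, max_length=768, max_prompt_length=512)
print(summary)
```

For the 987-example training set this gives an effective batch size of 16 and 61 optimizer steps, in line with the `Total train batch size` and `Total optimization steps` lines in the log, and leaves 256 tokens for the completion.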

### RPO Training Loop

In [9]:
from trlabs.rl.train import dpo

dpo(data_params, training_params, model_config)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


config.json:   0%|          | 0.00/659 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

Extracting prompt from train dataset:   0%|          | 0/987 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/987 [00:00<?, ? examples/s]

Extracting prompt from eval dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/987 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward. If rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 987
  Num Epochs = 1
  Instantaneous batch size per device = 1
  Training with DataParallel so batch size has been adjusted to: 2
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 8
  Total optimization steps = 61
  Number of trainable parameters = 4,325,376


| Step | Training Loss | Validation Loss | Runtime | Samples Per Second | Steps Per Second | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected | Nll Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 17.8125 | 1.799219 | 50.5746 | 1.977 | 0.989 | 1.40625 | 1.34375 | 0.57 | 0.066406 | -42.25 | -40.75 | -3.078125 | -3.078125 | 2.25 |
| 16 | 16.4149 | 1.521406 | 52.1463 | 1.918 | 0.959 | 2.453125 | 2.359375 | 0.56 | 0.099121 | -31.75 | -30.625 | -3.125 | -3.125 | 1.6875 |
| 24 | 12.3883 | 1.450234 | 52.7255 | 1.897 | 0.948 | 2.609375 | 2.40625 | 0.62 | 0.192383 | -30.25 | -30.125 | -3.09375 | -3.09375 | 1.609375 |
| 32 | 11.6555 | 1.3975 | 52.8289 | 1.893 | 0.946 | 2.71875 | 2.453125 | 0.64 | 0.265625 | -29.125 | -29.625 | -3.046875 | -3.046875 | 1.546875 |
| 40 | 11.4125 | 1.374141 | 52.8984 | 1.89 | 0.945 | 2.75 | 2.4375 | 0.66 | 0.314453 | -28.75 | -29.75 | -3.0 | -3.0 | 1.53125 |
| 48 | 11.4125 | 1.358047 | 53.0304 | 1.886 | 0.943 | 2.78125 | 2.453125 | 0.66 | 0.337891 | -28.375 | -29.625 | -2.96875 | -2.984375 | 1.507812 |
| 56 | 11.2234 | 1.355469 | 52.836 | 1.893 | 0.946 | 2.796875 | 2.453125 | 0.66 | 0.34375 | -28.375 | -29.75 | -2.96875 | -2.96875 | 1.507812 |


The following columns in the evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward. If rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 100
  Batch size = 2

Saving model checkpoint to qwen_ex3_output
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/config.json
Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 24,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}



***** eval metrics *****
  epoch                   =     0.9879
  eval_logits/chosen      =    -2.9688
  eval_logits/rejected    =    -2.9688
  eval_logps/chosen       =     -28.25
  eval_logps/rejected     =    -29.625
  eval_loss               =     1.3527
  eval_nll_loss           =     1.5078
  eval_rewards/accuracies =       0.66
  eval_rewards/chosen     =     2.7969
  eval_rewards/margins    =     0.3477
  eval_rewards/rejected   =     2.4531
  eval_runtime            = 0:00:52.87
  eval_samples_per_second =      1.891
  eval_steps_per_second   =      0.946


tokenizer config file saved in qwen_ex3_output/tokenizer_config.json
Special tokens file saved in qwen_ex3_output/special_tokens_map.json


## Your turn!
Experiment with `rpo_alpha` to find the best balance between the two loss terms (the preference loss and the NLL loss).
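
One way to organize this experiment is to prepare one training config per α value, each with its own output directory so that runs do not overwrite each other. The sketch below is only a suggestion: `make_alpha_sweep` is a hypothetical helper, the α grid is arbitrary, and `base` stands in for the full `training_params` dictionary defined earlier.

```python
import copy


def make_alpha_sweep(base_params, alphas):
    """Return one independent copy of the training params per alpha value."""
    runs = []
    for alpha in alphas:
        params = copy.deepcopy(base_params)
        params["rpo_alpha"] = alpha
        # Separate output directory per run, e.g. qwen_ex3_output_alpha0.1
        params["output_dir"] = f"{base_params['output_dir']}_alpha{alpha}"
        runs.append(params)
    return runs


# Stand-in for the notebook's training_params (only the relevant keys).
base = {"rpo_alpha": 0.5, "output_dir": "qwen_ex3_output", "beta": 0.1}
sweep = make_alpha_sweep(base, [0.1, 0.5, 1.0])

for params in sweep:
    print(params["output_dir"], params["rpo_alpha"])
    # In the notebook, each run would then be launched with:
    # dpo(data_params, params, model_config)
```

Comparing `eval_rewards/accuracies`, `eval_rewards/margins` and `eval_nll_loss` across the runs then shows how each α trades preference learning against the supervised signal.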

## Take a Look at the Model Generation

In [10]:
from trlabs.utils import dataset_creation, not_relevant_data

SYSTEM_PROMPT = 'You are an advanced AI system specialised in providing Reuters News title given a body text of the news.'
INSTRUCTION = "The title should be in capital letters and between 6 and 8 words in length. Please provide only the title as output and no other text or explanation."

dataset = load_dataset("ucirvine/reuters21578", 'ModApte', trust_remote_code=True)
dataset = dataset.filter(not_relevant_data).shuffle(seed=42).map(dataset_creation, fn_kwargs={"system_prompt": SYSTEM_PROMPT, "instruction": INSTRUCTION})

README.md:   0%|          | 0.00/16.0k [00:00<?, ?B/s]

reuters21578.py:   0%|          | 0.00/17.9k [00:00<?, ?B/s]

reuters21578.tar.gz:   0%|          | 0.00/8.15M [00:00<?, ?B/s]

Generating test split:   0%|          | 0/3299 [00:00<?, ? examples/s]

Generating train split:   0%|          | 0/9603 [00:00<?, ? examples/s]

Generating unused split:   0%|          | 0/722 [00:00<?, ? examples/s]

Filter:   0%|          | 0/3299 [00:00<?, ? examples/s]

Filter:   0%|          | 0/9603 [00:00<?, ? examples/s]

Filter:   0%|          | 0/722 [00:00<?, ? examples/s]

Map:   0%|          | 0/3295 [00:00<?, ? examples/s]

Map:   0%|          | 0/9583 [00:00<?, ? examples/s]

Map:   0%|          | 0/722 [00:00<?, ? examples/s]

In [11]:
from trlabs.rl.eval import setup_model_and_lora, generate

index = 10
prompt = dataset["test"][index]["system"]+dataset["test"][index]["messages"]

model, tokenizer = setup_model_and_lora(
    base_model_name = model_config["model_name_or_path"], 
    lora_path = training_params["output_dir"]
)

response = generate(prompt, model, tokenizer)
print(response)

loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/vocab.json
loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/merges.txt
loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/tokenizer_config.json
loading file chat_template.jinja from cache at None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
loading configuration file config.json fro

system
You are an advanced AI system specialised in providing Reuters News title given a body text of the news.
user
Fleet Financial Group said its shareholders approved an increase in shares of the company's authorized common stock to 100,000,000 shares from 75,000,000 shares currently. The company said shareholders approved the move at the annual meeting in Providence today when the company reported that its first quarter earnings rose to 38.5 mln dlrs, or 73 cts a share, from 31.7 mln dlrs, or 60 cts a share, in the first quarter 1986. J. Terence Murray, chairman and president of Fleet Financial, said, "Fleet's mortgage banking activities in particular continued to produce signficant income increases (in the first quarter)." Murray said Fleet's mortgage servicing portfolio reached 22.1 billion dlrs by March 31, including 1.8 billion dlrs purchased in March.

The title should be in capital letters and between 6 and 8 words in length. Please provide only the title as output and no oth

#### Note:
If you do not provide `lora_path`, you can inspect the base model's output instead.
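
Whether you inspect the base model or the fine-tuned one, the instruction in the prompt gives two criteria that can be checked mechanically: the title should be in capital letters and contain between 6 and 8 words. The sketch below is a hypothetical helper for that check. It assumes you have already extracted only the generated title from the response, and the example titles are made up for illustration, not model outputs.

```python
def check_title(title, min_words=6, max_words=8):
    """Check a generated title against the formatting rules stated in the instruction."""
    words = title.split()
    return {
        # str.isupper() is True when all cased characters are uppercase.
        "is_upper": title.isupper(),
        "word_count": len(words),
        "length_ok": min_words <= len(words) <= max_words,
    }


# Made-up examples, one compliant and one not.
print(check_title("FLEET FINANCIAL SHAREHOLDERS APPROVE STOCK INCREASE"))
print(check_title("Fleet profit rises"))
```

Running the same check over several test prompts gives a rough, format-only comparison between models; it says nothing about whether the title is faithful to the article.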

## Solution

In [12]:
from trlabs.utils import *

dataset = load_from_disk("/kaggle/working/labs_AMLD25_Workshop/sessions/4_RLalignment_and_DPO/data/AMLD25_reuters_gentitle_1k").filter(reuters_cleaning_dataset)
dataset.save_to_disk("AMLD25_reuters_gentitle_0.5k_cleaned")

Filter:   0%|          | 0/987 [00:00<?, ? examples/s]

Filter:   0%|          | 0/496 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/446 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/196 [00:00<?, ? examples/s]

In [13]:
data_params = {
  "dataset_name": "Mix 1",
  "dataset_mixer": {
    "AMLD25_reuters_gentitle_0.5k_cleaned": 1.,
  },
  "dataset_splits": ["train", "test"],
  "num_eval_samples": 100,
  "seed": 42
}

In [14]:
from trlabs.rl.train import dpo

dpo(data_params, training_params, model_config)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/config.json
Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-0.5B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_emb

Extracting prompt from train dataset:   0%|          | 0/446 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/446 [00:00<?, ? examples/s]

Extracting prompt from eval dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/446 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward. If rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 446
  Num Epochs = 1
  Instantaneous batch size per device = 1
  Training with DataParallel so batch size has been adjusted to: 2
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 8
  Total optimization steps = 27
  Number of trainable parameters = 4,325,376


| Step | Training Loss | Validation Loss | Runtime | Samples Per Second | Steps Per Second | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected | Nll Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 19.7031 | 1.843125 | 46.77 | 2.138 | 1.069 | 1.484375 | 1.375 | 0.58 | 0.10498 | -43.0 | -45.5 | -3.09375 | -3.0625 | 2.375 |
| 16 | 16.5234 | 1.50375 | 47.1485 | 2.121 | 1.06 | 2.546875 | 2.28125 | 0.62 | 0.267578 | -32.25 | -36.5 | -3.109375 | -3.09375 | 1.789062 |
| 24 | 12.6719 | 1.431719 | 48.0461 | 2.081 | 1.041 | 2.734375 | 2.375 | 0.62 | 0.357422 | -30.375 | -35.5 | -3.078125 | -3.046875 | 1.679688 |


The following columns in the evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward. If rejected, prompt, rejected_reward, origin_response_c_r, chosen, date, chosen_reward are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 100
  Batch size = 2

Saving model checkpoint to qwen_ex3_output
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/config.json
Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 24,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}



***** eval metrics *****
  epoch                   =     0.9686
  eval_logits/chosen      =    -3.0781
  eval_logits/rejected    =    -3.0469
  eval_logps/chosen       =     -30.25
  eval_logps/rejected     =      -35.5
  eval_loss               =      1.429
  eval_nll_loss           =     1.6719
  eval_rewards/accuracies =       0.63
  eval_rewards/chosen     =       2.75
  eval_rewards/margins    =     0.3652
  eval_rewards/rejected   =      2.375
  eval_runtime            = 0:00:48.22
  eval_samples_per_second =      2.073
  eval_steps_per_second   =      1.037


tokenizer config file saved in qwen_ex3_output/tokenizer_config.json
Special tokens file saved in qwen_ex3_output/special_tokens_map.json


## Take a Look at the Model Generation

In [15]:
from trlabs.utils import dataset_creation, not_relevant_data

SYSTEM_PROMPT = 'You are an advanced AI system specialised in providing Reuters News title given a body text of the news.'
INSTRUCTION = "The title should be in capital letters and between 6 and 8 words in length. Please provide only the title as output and no other text or explanation."

dataset = load_dataset("ucirvine/reuters21578", 'ModApte', trust_remote_code=True)
dataset = dataset.filter(not_relevant_data).shuffle(seed=42).map(dataset_creation, fn_kwargs={"system_prompt": SYSTEM_PROMPT, "instruction": INSTRUCTION})

In [16]:
from trlabs.rl.eval import setup_model_and_lora, generate

index = 10
prompt = dataset["test"][index]["system"]+dataset["test"][index]["messages"]

model, tokenizer = setup_model_and_lora(
    base_model_name = model_config["model_name_or_path"], 
    lora_path = training_params["output_dir"]
)

response = generate(prompt, model, tokenizer)
print(response)

loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/vocab.json
loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/merges.txt
loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/tokenizer_config.json
loading file chat_template.jinja from cache at None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
loading configuration file config.json fro

system
You are an advanced AI system specialised in providing Reuters News title given a body text of the news.
user
Fleet Financial Group said its shareholders approved an increase in shares of the company's authorized common stock to 100,000,000 shares from 75,000,000 shares currently. The company said shareholders approved the move at the annual meeting in Providence today when the company reported that its first quarter earnings rose to 38.5 mln dlrs, or 73 cts a share, from 31.7 mln dlrs, or 60 cts a share, in the first quarter 1986. J. Terence Murray, chairman and president of Fleet Financial, said, "Fleet's mortgage banking activities in particular continued to produce signficant income increases (in the first quarter)." Murray said Fleet's mortgage servicing portfolio reached 22.1 billion dlrs by March 31, including 1.8 billion dlrs purchased in March.

The title should be in capital letters and between 6 and 8 words in length. Please provide only the title as output and no oth