<a href="https://colab.research.google.com/github/yf591/Financial-Market-Analysis-Using-Machine-Learning/blob/main/FinGPT_Training_with_LoRA_and_Meta_Llama_3_8B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FinGPT: Training with LoRA and Meta-Llama-3-8B

## Part 0: Preparation

In [None]:
from google.colab import drive, output
drive.mount('/content/drive') #You'll be asked to authorize access to your Google Drive

Mounted at /content/drive


In [None]:
!pip install protobuf
!pip install transformers==4.40.1 # adjust the version as needed
!pip install cpm_kernels
!pip install "torch>=2.0" # quoted so the shell does not treat > as a redirect; pick a CUDA build if needed
!pip install gradio
!pip install mdtex2html
!pip install sentencepiece
!pip install accelerate
!pip install datasets
!pip install bitsandbytes
!pip install loguru
!pip install peft --upgrade  # install the latest version

output.clear()

In [None]:
import os
import sys
import shutil
import logging

import datasets
from datasets import load_dataset
from datasets import load_from_disk

import json
from tqdm.notebook import tqdm
# Used to display a progress bar in Jupyter Notebook to help visualize the progress of data processing

from typing import List, Dict, Optional

import torch
from torch.utils.tensorboard import SummaryWriter
import torch.nn.functional as F

from loguru import logger

from transformers import (
    AutoModel,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoConfig,
    LlamaForCausalLM,
    AutoModelForSequenceClassification
)
from transformers.utils import is_bitsandbytes_available
from transformers.integrations import TensorBoardCallback

from peft import (
    TaskType,
    LoraConfig,
    get_peft_model,
    set_peft_model_state_dict,
    prepare_model_for_kbit_training,
    PeftModel
)
from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

from sklearn.metrics import accuracy_score,f1_score
import pandas as pd
import matplotlib.pyplot as plt

## Part 1: Preparing the Data
This part prepares the dataset used to train the model.

### 1.1 Initialize Directories
Create the required directories and delete any existing files and folders so that the data can be saved cleanly. Also log in to the Hugging Face Hub with an access token.

In [None]:
# Link this session to your Hugging Face account
from huggingface_hub import notebook_login
notebook_login() # Log in to Hugging Face (enter the value of one of your Access Tokens)


In [None]:
if not os.path.exists('./data'):
    os.makedirs('./data')


jsonl_path = "../data/dataset_new.jsonl"
save_path = '../data/dataset_new'


if os.path.exists(jsonl_path):
    os.remove(jsonl_path)

if os.path.exists(save_path):
    shutil.rmtree(save_path)

directory = "../data"
if not os.path.exists(directory):
    os.makedirs(directory)

In [None]:
!ls -l ./data/dataset_new

ls: cannot access './data/dataset_new': No such file or directory


### 1.2 Load and Prepare Dataset
Download the zeroshot/twitter-financial-news-sentiment dataset from Hugging Face Datasets and convert it to a DataFrame.

Convert the numeric labels to text sentiment labels (negative, neutral, positive), and add an instruction to each example for instruction tuning.

In [None]:
dic = {
    0:'negative',
    1:'positive',
    2:'neutral'
}

In [None]:
tfns = load_dataset('zeroshot/twitter-financial-news-sentiment') #tfns = Twitter Financial News Sentiment

README.md:   0%|          | 0.00/1.39k [00:00<?, ?B/s]

sent_train.csv:   0%|          | 0.00/859k [00:00<?, ?B/s]

sent_valid.csv:   0%|          | 0.00/217k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/9543 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2388 [00:00<?, ? examples/s]

In [None]:
tfns = tfns['train']
tfns = tfns.to_pandas()

tfns['label'] = tfns['label'].apply(lambda x : dic[x])  # Map numerical labels to their corresponding sentiments

#Add instruction for each data entry, which is crucial for Instruction Tuning.
tfns['instruction'] = 'What is the sentiment of this tweet? Please choose an answer from {negative/neutral/positive}.'
tfns.columns = ['input','output','instruction']

#Convert the Pandas dataframe back to a Hugging Face Dataset object.
tfns = datasets.Dataset.from_pandas(tfns)
tfns

Dataset({
    features: ['input', 'output', 'instruction'],
    num_rows: 9543
})

### 1.3 Concatenate and Shuffle Dataset
Concatenate the dataset with itself to double the amount of data, then shuffle it to reduce ordering bias in the training data.

In [None]:
tmp_dataset = datasets.concatenate_datasets([tfns]*2) # Create a list containing 2 copies of tfns and concatenate them
train_dataset = tmp_dataset
print(tmp_dataset.num_rows)

19086


In [None]:
all_dataset = train_dataset.shuffle(seed = 42)
all_dataset.shape

(19086, 3)
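
One side effect of the doubling step is worth keeping in mind: every tweet now appears twice, so a later random train/test split can place copies of the same tweet on both sides. The sketch below illustrates this with made-up integer ids instead of real rows; the 80/20 cut is an assumption that mirrors the `train_test_split(0.2)` call used in Part 4.

```python
import random

# Toy stand-in for the tweets: each unique row is represented by an integer id.
unique_ids = list(range(1000))

# Same idea as above: duplicate the data, then shuffle with a fixed seed.
doubled = unique_ids * 2
rng = random.Random(42)
rng.shuffle(doubled)

# Hypothetical 80/20 split of the shuffled, doubled data.
cut = int(len(doubled) * 0.8)
train_ids, test_ids = set(doubled[:cut]), set(doubled[cut:])

shared = train_ids & test_ids
print(f"unique rows in test split: {len(test_ids)}")
print(f"of which also in train split: {len(shared)}")
```

If most test rows also show up in the training split, the validation loss measured during training is likely to look better than it would on unseen tweets.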

The training data is all set

## Part 2: Dataset Formatting and Tokenization
This part formats the data so the model can consume it and then tokenizes it.

### 2.1 Dataset Formatting
The data must be structured in a specific format that matches the training process.

Format the dataset for instruction tuning: prefix the fields with "Instruction:", "Input:", and "Answer:", and save each sample as a dictionary in a JSONL file.

In [None]:
def format_examle(example:dict) -> dict:
  """Turn an instruction/input/output record into a context/target pair."""
  context = f"Instruction:{example['instruction']}\n"   # Start the prompt with the instruction
  if example.get('input'):     # Add the input only if the key exists and is non-empty
    context += f"Input:{example['input']}\n"
  context += 'Answer: '
  target = example['output']
  return {"context": context , "target":target}  # One JSONL record



data_list = []
for item in all_dataset.to_pandas().itertuples():    #Iterates over each row of the dataset all_dataset, which has been converted into a Pandas DataFrame using .to_pandas().
  tmp = {}
  tmp['instruction'] = item.instruction
  tmp['input'] = item.input
  tmp['output'] = item.output
  data_list.append(tmp)

This is what the elements in data_list look like before formatting

---

In [None]:
data_list[0]

{'instruction': 'What is the sentiment of this tweet? Please choose an answer from {negative/neutral/positive}.',
 'input': '$DRIP $LABU $GASX - SOXL, LABU, JO and GUSH among weekly ETF movers https://t.co/FntrWNY9sn',
 'output': 'neutral'}

In [None]:
# Save to a JSONL file (one JSON object per line)
with open("../data/dataset_new.jsonl",'w') as f:
  for example in tqdm(data_list,desc = 'formatting..'):
    f.write(json.dumps(format_examle(example)) + '\n')

formatting..:   0%|          | 0/19086 [00:00<?, ?it/s]

In [None]:
json_data_list = []  # Var to save json data

# Read the JSONL file back and store each record in json_data_list
with open("../data/dataset_new.jsonl", 'r') as f:
    for line in f:
        json_line = json.loads(line.strip())
        json_data_list.append(json_line)

This is what an element looks like after formatting:

In [None]:
json_data_list[0]

{'context': 'Instruction:What is the sentiment of this tweet? Please choose an answer from {negative/neutral/positive}.\nInput:$DRIP $LABU $GASX - SOXL, LABU, JO and GUSH among weekly ETF movers https://t.co/FntrWNY9sn\nAnswer: ',
 'target': 'neutral'}

In [None]:
json_data_list[0]['context']

'Instruction:What is the sentiment of this tweet? Please choose an answer from {negative/neutral/positive}.\nInput:$DRIP $LABU $GASX - SOXL, LABU, JO and GUSH among weekly ETF movers https://t.co/FntrWNY9sn\nAnswer: '

In [None]:
json_data_list[0]['target']

'neutral'

Log in to Hugging Face to use Llama 3 (access to the model has to be granted to your account).

### 2.2 Tokenization
Tokenization is the process of converting input text into tokens that can be fed into the model.

Use the tokenizer of the meta-llama/Meta-Llama-3-8B model to convert the text into numeric data (token IDs) that the model can process.

In [None]:
model_name = 'meta-llama/Meta-Llama-3-8B'   #Specifies the model you're working with
jsonl_path = '../data/dataset_new.jsonl'
save_path = '../data/dataset_new'    #The path where the processed dataset will be saved after tokenization or any other processing
max_seq_length = 512    #Maximum sequence length for the inputs. If an input exceeds this length, it will either be truncated or skipped.
skip_overlength = True    #A flag that determines whether to skip overlength examples that exceed max_seq_length

The preprocess function tokenizes the prompt and the target, truncates each to the maximum sequence length, and concatenates them with an end-of-sequence token into input_ids. It does not pad; padding to a common length happens later, in the data collator.

In [None]:
def preprocess(tokenizer, config, example, max_seq_length):
  prompt = example['context']
  target = example['target']
  # Token IDs are the integers the model actually consumes in place of raw text.
  prompt_ids = tokenizer.encode(
      prompt,
      max_length = max_seq_length,
      truncation = True
      )
  target_ids = tokenizer.encode(
      target,
      max_length = max_seq_length,
      truncation = True,
      add_special_tokens = False   # No second BOS token in front of the answer
      )
  # Append the end-of-sequence token so the model learns where the answer stops.
  input_ids = prompt_ids + target_ids + [config.eos_token_id]
  # seq_len is the prompt length, i.e. the index at which the answer begins.
  return {'input_ids':input_ids,'seq_len':len(prompt_ids)}

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)  # A config holds no weights, so device_map is not needed



tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

In [None]:
example = json_data_list[0]
prompt = example['context']
target = example['target']

In [None]:
example['target']

'neutral'

input_ids is a complete list of token IDs that combines the input sentence (prompt), the target sentence (target), and the end-of-sequence token (eos_token_id). This list is fed into the model for training or inference. The model uses these IDs to understand and process the input and generate the corresponding output.
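
To make the layout of `input_ids` concrete, here is a minimal sketch with a toy whitespace tokenizer. The tokenizer, the vocabulary, and the `BOS_ID`/`EOS_ID` values are all hypothetical and stand in for the real Llama 3 tokenizer; only the way the pieces are concatenated follows `preprocess` above.

```python
# Toy vocabulary and tokenizer, purely illustrative (not the Llama 3 tokenizer).
BOS_ID, EOS_ID = 1, 2
vocab = {}

def toy_encode(text, add_special_tokens=True):
    """Map each whitespace-separated word to an integer id (3, 4, 5, ...)."""
    ids = [vocab.setdefault(word, len(vocab) + 3) for word in text.split()]
    return ([BOS_ID] + ids) if add_special_tokens else ids

prompt_ids = toy_encode("Instruction: classify Input: stocks rally Answer:")
target_ids = toy_encode("positive", add_special_tokens=False)

input_ids = prompt_ids + target_ids + [EOS_ID]
seq_len = len(prompt_ids)  # boundary between prompt and answer

print(input_ids)
print("prompt part:", input_ids[:seq_len])
print("answer part:", input_ids[seq_len:])
```

Storing `seq_len` next to `input_ids` is what later lets the data collator tell prompt tokens apart from answer tokens.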

The read_jsonl function reads each line from the JSONL file, preprocesses it using the preprocess function, and then yields each preprocessed example.

In [None]:
def read_jsonl(path, max_seq_length, skip_overlength=False):
    # Load the tokenizer and the model config for model_name
    # (a config holds no weights, so no device_map is needed).
    tokenizer = AutoTokenizer.from_pretrained(
        model_name, trust_remote_code=True)
    config = AutoConfig.from_pretrained(
        model_name, trust_remote_code=True)
    with open(path, "r") as f:
        for line in tqdm(f.readlines()):
            example = json.loads(line)
            # Tokenize the example into input_ids with preprocess().
            feature = preprocess(tokenizer, config, example, max_seq_length)
            if skip_overlength and len(feature["input_ids"]) > max_seq_length:
                continue  # Drop examples that are too long
            # Otherwise truncate input_ids so they never exceed max_seq_length.
            feature["input_ids"] = feature["input_ids"][:max_seq_length]
            yield feature
# Because of yield, read_jsonl is a generator: it produces one preprocessed
# feature at a time instead of building the full list of features in memory.
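
The difference between skipping and truncating over-length examples can be seen on a toy generator. This is a simplified sketch: `iter_features` is a hypothetical helper that takes ready-made id lists instead of reading a JSONL file and calling a tokenizer.

```python
def iter_features(examples, max_seq_length, skip_overlength=False):
    """Yield features lazily, skipping or truncating over-length ones."""
    for input_ids in examples:
        if skip_overlength and len(input_ids) > max_seq_length:
            continue  # drop the example entirely
        yield {"input_ids": input_ids[:max_seq_length]}

toy_examples = [[1, 2, 3], [1, 2, 3, 4, 5, 6], [1, 2]]

skipped = list(iter_features(toy_examples, max_seq_length=4, skip_overlength=True))
truncated = list(iter_features(toy_examples, max_seq_length=4, skip_overlength=False))

print(len(skipped), len(truncated))
print(truncated[1]["input_ids"])
```

Truncation cuts from the right, so the first tokens to be lost are the end-of-sequence token and the answer. That is one reason to prefer `skip_overlength=True` for this kind of prompt-plus-answer data.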

### 2.3 Save the Dataset
Save the tokenized dataset as a Hugging Face Datasets object.

In [None]:
save_path = './data/dataset_new'

In [None]:
dataset = datasets.Dataset.from_generator(
    lambda: read_jsonl(jsonl_path, max_seq_length, skip_overlength)
    )
dataset.save_to_disk(save_path)

Generating train split: 0 examples [00:00, ? examples/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


  0%|          | 0/19086 [00:00<?, ?it/s]

Saving the dataset (0/1 shards):   0%|          | 0/19086 [00:00<?, ? examples/s]

In [None]:
# Load Dataset
loaded_dataset = load_from_disk('./data/dataset_new')

# Check the structure of Dataset
print(loaded_dataset)

# Print the first sample of the dataset
print(loaded_dataset['input_ids'][0])

Dataset({
    features: ['input_ids', 'seq_len'],
    num_rows: 19086
})
[128000, 17077, 25, 3923, 374, 279, 27065, 315, 420, 12072, 30, 5321, 5268, 459, 4320, 505, 314, 43324, 14, 60668, 14, 31587, 28374, 2566, 22444, 7842, 3378, 400, 20257, 52, 400, 38, 1950, 55, 482, 5745, 37630, 11, 32074, 52, 11, 10458, 323, 480, 20088, 4315, 17496, 54163, 96454, 3788, 1129, 83, 6973, 12598, 77, 376, 54, 23923, 24, 9810, 198, 16533, 25, 220, 60668, 128001]


### 2.4 Save dataset to your own google drive

Save the processed dataset to Google Drive so that it survives a Colab runtime reset.

In [None]:
save_path = '/content/drive/MyDrive/Colab Notebooks/【Dev】Financial Market Analysis Using Machine Learning/FinGPT/FinGPT: Training with LoRA and Llama-3/dataset_new' #Change to your own address
# Write your own Google drive saving address in xxxxxxxx part: '/content/drive/MyDrive/xxxxxxxxxxxxxxxxx/dataset_new'
dataset.save_to_disk(save_path)

Saving the dataset (0/1 shards):   0%|          | 0/19086 [00:00<?, ? examples/s]

### 2.5 Load Dataset from google drive
Run directly from here each time you log in again or reconnect.

Load the saved dataset from Google Drive.

In [None]:
# Load saved dataset
loaded_dataset = load_from_disk(save_path)

In [None]:
# Check the structure of Dataset
print(loaded_dataset)

# Print the first sample of the dataset
print(loaded_dataset['input_ids'][0])

Dataset({
    features: ['input_ids', 'seq_len'],
    num_rows: 19086
})
[128000, 17077, 25, 3923, 374, 279, 27065, 315, 420, 12072, 30, 5321, 5268, 459, 4320, 505, 314, 43324, 14, 60668, 14, 31587, 28374, 2566, 22444, 7842, 3378, 400, 20257, 52, 400, 38, 1950, 55, 482, 5745, 37630, 11, 32074, 52, 11, 10458, 323, 480, 20088, 4315, 17496, 54163, 96454, 3788, 1129, 83, 6973, 12598, 77, 376, 54, 23923, 24, 9810, 198, 16533, 25, 220, 60668, 128001]


## Part 3: Setup FinGPT training with LoRA and Llama-3

This part configures everything needed to train the model.

### 3.1 Training Arguments Setup:
Initialize and set training arguments.

Use TrainingArguments to set the training parameters (number of epochs, batch size, learning rate, and so on).

In [None]:
training_args = TrainingArguments(
    output_dir='/content/drive/MyDrive/Colab Notebooks/【Dev】Financial Market Analysis Using Machine Learning/FinGPT/FinGPT: Training with LoRA and Llama-3/finetuned_model/',    # Path to save the fine-tuned model
    logging_steps = 500,               # Log every 500 steps
    # max_steps=10000,                 # Maximum number of training steps (commented out, can be enabled)
    num_train_epochs = 2,              # Number of training epochs (train for 2 epochs)
    per_device_train_batch_size=4,     # Batch size of 4 for training on each device (GPU/CPU)
    gradient_accumulation_steps=8,     # Accumulate gradients for 8 steps before updating weights
    learning_rate=1e-4,                # Learning rate set to 1e-4
    weight_decay=0.01,                 # Weight decay (L2 regularization) set to 0.01
    warmup_steps=1000,                 # Warm up the learning rate for the first 1000 steps
    save_steps=500,                    # Save the model every 500 steps
    fp16=True,                         # Enable FP16 mixed precision training to save memory and speed up training
    # bf16=True,                       # Enable BF16 mixed precision training (commented out)
    torch_compile = False,             # Whether to enable Torch compile (`False` means not enabled)
    load_best_model_at_end = True,     # Load the best-performing model at the end of training
    evaluation_strategy="steps",       # Evaluation strategy is set to evaluate every few steps
    remove_unused_columns=False,       # Whether to remove unused columns during training (keep all columns)
)
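
It can help to translate these settings into an approximate number of optimizer steps. The arithmetic below is a rough sketch, assuming a single GPU and assuming that 80% of the 19,086 tokenized rows end up in the training split (as in the split performed in Part 4); the exact count reported by Trainer may differ slightly.

```python
import math

n_rows = 19086                  # rows in the tokenized dataset
test_fraction = 0.2             # assumed, matching the split used in Part 4
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_train_epochs = 2
warmup_steps = 1000

n_train = n_rows - math.ceil(n_rows * test_fraction)
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

# One optimizer update happens after every `gradient_accumulation_steps` batches.
batches_per_epoch = math.ceil(n_train / per_device_train_batch_size)
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_updates = updates_per_epoch * num_train_epochs

print(f"training rows: {n_train}")
print(f"effective batch size: {effective_batch}")
print(f"approx. optimizer steps: {total_updates}")
print(f"warmup longer than training: {warmup_steps > total_updates}")
```

If the estimate holds, the run has fewer optimizer steps than `warmup_steps`, so the learning rate would still be ramping up when training ends, and `logging_steps`, `save_steps`, and evaluation at 500 would each fire only once. Lowering `warmup_steps` or enabling `max_steps` are two ways to adjust this.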

### 3.2 Quantization Config Setup:
Set the quantization configuration to reduce the model's memory footprint without losing significant precision.

Use BitsAndBytesConfig to configure 4-bit quantization, which reduces GPU memory usage and makes larger models manageable.

In [None]:
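# Quantization configuration
# Note: load_in_4bit is False here even though the remaining options are 4-bit settings;
# set load_in_4bit=True to request 4-bit loading explicitly.
q_config = BitsAndBytesConfig(load_in_4bit=False,
                                bnb_4bit_quant_type='nf4',
                                bnb_4bit_use_double_quant=True,
                                bnb_4bit_compute_dtype=torch.float16
                                )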
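
To see why quantization matters for an 8B-parameter model, the following back-of-the-envelope sketch estimates the memory needed for the weights alone at different precisions. The parameter count is an assumption derived from the totals printed later in this notebook; activations, gradients, optimizer state, and quantization constants are ignored.

```python
n_params = 8_030_261_248  # approximate base-model parameter count (assumption)

bytes_per_param = {
    "float32": 4.0,
    "float16 / bfloat16": 2.0,
    "int8": 1.0,
    "4-bit (nf4)": 0.5,
}

# Weight-only memory estimate in GiB for each storage format.
sizes_gib = {
    dtype: n_params * nbytes / 1024**3
    for dtype, nbytes in bytes_per_param.items()
}

for dtype, gib in sizes_gib.items():
    print(f"{dtype:>20}: {gib:5.1f} GiB")
```

The gap between roughly 30 GiB in float32 and under 4 GiB in 4-bit is what makes single-GPU fine-tuning of a model this size practical.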

In [None]:
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

### 3.3 Model Loading & Preparation:
Load the meta-llama/Meta-Llama-3-8B base model and its tokenizer, and apply the quantization configuration.

In Colab, select Runtime -> Change runtime type -> A100 GPU.

If loading fails, restart the runtime and run again.

In [None]:
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
is_bitsandbytes_available()

True

In [None]:
model = LlamaForCausalLM.from_pretrained(
        model_name,
        quantization_config = q_config,
        trust_remote_code=True,
        device_map='auto'
    )

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.


model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/177 [00:00<?, ?B/s]

### 3.4 LoRA Config & Setup

Use LoraConfig to configure LoRA (Low-Rank Adaptation).
- LoRA fine-tunes efficiently by training only a small number of additional parameters instead of the whole model.

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

# LoRA for Llama3
target_modules = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING['llama']  # Modules for the Llama model
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=target_modules,
    bias='none',
)

# Loading LoRA for Llama3 models using PEFT (Parameter-Efficient Fine-Tuning)
model = get_peft_model(model, lora_config)

# Print the number of trainable parameters
print_trainable_parameters(model)

trainable params: 3407872 || all params: 4544008192 || trainable%: 0.07499704789264605
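
The trainable-parameter count can be reproduced by hand, which is a useful sanity check on the LoRA settings. The sketch below assumes the publicly documented Llama-3-8B dimensions (hidden size 4096, 32 layers, 1024-dimensional key/value projections) and assumes that the PEFT default target modules for `'llama'` are `q_proj` and `v_proj`.

```python
# Assumed Llama-3-8B dimensions (taken from the public model config).
hidden_size = 4096
num_layers = 32
kv_dim = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
r = 8                  # LoRA rank used above

def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) next to a frozen weight."""
    return rank * in_features + out_features * rank

per_layer = (
    lora_params(hidden_size, hidden_size, r)   # q_proj
    + lora_params(hidden_size, kv_dim, r)      # v_proj
)
total = per_layer * num_layers
print(total)
```

Under these assumptions the result matches the trainable-parameter figure printed above. Doubling `r` doubles the count, since every adapter matrix has one dimension equal to the rank.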


In [None]:
resume_from_checkpoint = None
if resume_from_checkpoint is not None:
    checkpoint_name = os.path.join(resume_from_checkpoint, 'pytorch_model.bin')
    if not os.path.exists(checkpoint_name):
        checkpoint_name = os.path.join(
            resume_from_checkpoint, 'adapter_model.bin'
        )
        resume_from_checkpoint = False
    if os.path.exists(checkpoint_name):
        logger.info(f'Restarting from {checkpoint_name}')
        adapters_weights = torch.load(checkpoint_name)
        set_peft_model_state_dict(model, adapters_weights)
    else:
        logger.info(f'Checkpoint {checkpoint_name} not found')

In [None]:
model.print_trainable_parameters()

trainable params: 3,407,872 || all params: 8,033,669,120 || trainable%: 0.0424
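
The total reported here (8,033,669,120) differs from the total printed by the custom helper earlier (4,544,008,192). One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that the earlier figure was produced while the linear layers were stored in packed 4-bit form: two 4-bit values share one byte, so `numel()` reports half the logical parameter count for those layers, while PEFT's own `print_trainable_parameters` corrects for this. The arithmetic below uses assumed Llama-3-8B shapes from the public model config.

```python
# Assumed Llama-3-8B shapes (from the public model config).
vocab, hidden, inter, kv_dim, layers = 128256, 4096, 14336, 1024, 32

linear = layers * (
    2 * hidden * hidden        # q_proj, o_proj
    + 2 * hidden * kv_dim      # k_proj, v_proj
    + 3 * hidden * inter       # gate_proj, up_proj, down_proj
)
embeddings = 2 * vocab * hidden          # embed_tokens + lm_head (assumed unquantized)
norms = (2 * layers + 1) * hidden        # RMSNorm weights
lora = 3_407_872                         # trainable LoRA parameters reported above

full_count = linear + embeddings + norms + lora
packed_count = linear // 2 + embeddings + norms + lora  # two 4-bit values per byte

print(full_count)
print(packed_count)
```

Both printed totals can be compared directly with the two outputs above; the percentage of trainable parameters changes only because the denominator does.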


## Part 4: Loading Data and Training FinGPT
In this part, we load the pre-processed data and launch the training of the FinGPT model, step by step.

- A paid Google Colab GPU plan is needed: Colab Pro is sufficient, or you can buy 100 compute units for $10.

### 4.1 Loading Your Data:

Load the dataset saved in Part 2 and split it into a training set and a test set.

In [None]:
save_path = '/content/drive/MyDrive/Colab Notebooks/【Dev】Financial Market Analysis Using Machine Learning/FinGPT/FinGPT: Training with LoRA and Llama-3/dataset_new' # Load saved dataset
dataset = load_from_disk(save_path)
dataset = dataset.train_test_split(0.2, shuffle=True, seed = 42)

### 4.2 Training Configuration and Launch:
- Customize the Trainer class for specific loss computation, prediction step, and model-saving methods.
- Define a data collator function to process batches of data during training.
- Set up TensorBoard for logging, instantiate your modified trainer, and begin training.

The data_collator function defines how batches are padded, and Trainer runs the training. TensorBoard is used to visualize the training process. After training, the fine-tuned model is saved to Google Drive.

In [None]:
class ModifiedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        return model(
            input_ids=inputs["input_ids"],
            labels=inputs["labels"],
        ).loss

    def prediction_step(self, model: torch.nn.Module, inputs, prediction_loss_only: bool, ignore_keys = None):
        with torch.no_grad():
            res = model(
                input_ids=inputs["input_ids"].to(model.device),
                labels=inputs["labels"].to(model.device),
            ).loss
        return (res, None, None)

    def save_model(self, output_dir=None, _internal_call=False):
        from transformers.trainer import TRAINING_ARGS_NAME

        # Fall back to the configured output directory when none is passed
        output_dir = output_dir or self.args.output_dir
        os.makedirs(output_dir, exist_ok=True)
        torch.save(self.args, os.path.join(output_dir, TRAINING_ARGS_NAME))
        # Save only the trainable (LoRA) parameters, moved to CPU
        saved_params = {
            k: v.to("cpu") for k, v in self.model.named_parameters() if v.requires_grad
        }
        torch.save(saved_params, os.path.join(output_dir, "adapter_model.bin"))


def data_collator(features: list) -> dict:
    # Check if pad_token_id is None, if it is then use eos_token_id as the padding value
    if tokenizer.pad_token_id is None:
        pad_token_id = tokenizer.eos_token_id  # Use eos_token_id as a fill symbol
    else:
        pad_token_id = tokenizer.pad_token_id

    len_ids = [len(feature["input_ids"]) for feature in features]
    longest = max(len_ids)

    input_ids = []
    labels_list = []

    for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
        ids = feature["input_ids"]
        seq_len = feature["seq_len"]

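        # Labels: mask the prompt and the padding with -100 so the loss ignores them.
        # Using pad_token_id here would train the model to predict that token at
        # those positions, which is EOS whenever pad_token_id falls back to eos_token_id.
        # As before, the last prompt token (index seq_len - 1) stays unmasked.
        labels = (
            [-100] * (seq_len - 1) + ids[(seq_len - 1) :] + [-100] * (longest - ids_l)
        )
        # Pad the inputs themselves with pad_token_id
        ids = ids + [pad_token_id] * (longest - ids_l)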

        _ids = torch.LongTensor(ids)
        labels_list.append(torch.LongTensor(labels))
        input_ids.append(_ids)

    input_ids = torch.stack(input_ids)
    labels = torch.stack(labels_list)

    return {
        "input_ids": input_ids,
        "labels": labels,
    }
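
A common convention with Hugging Face causal language models is to set every label position that should not contribute to the loss to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below shows that masking on plain Python lists. The token ids are made up, `build_labels` is a hypothetical helper, and as a simplification it masks the full prompt (`prompt_len` positions) rather than `seq_len - 1`.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def build_labels(input_ids, prompt_len, longest):
    """Mask prompt and padding positions so only answer tokens are scored."""
    answer = input_ids[prompt_len:]
    padding = longest - len(input_ids)
    return [IGNORE_INDEX] * prompt_len + answer + [IGNORE_INDEX] * padding

# Toy batch: ids are made up; the first 4 tokens of each row are the prompt.
batch = [
    {"input_ids": [1, 10, 11, 12, 50, 2], "seq_len": 4},
    {"input_ids": [1, 10, 11, 13, 51, 52, 53, 2], "seq_len": 4},
]
longest = max(len(f["input_ids"]) for f in batch)
labels = [build_labels(f["input_ids"], f["seq_len"], longest) for f in batch]

for row in labels:
    print(row)
```

With this layout the loss is computed only on the answer tokens and the end-of-sequence token, which is the part of each sample the fine-tuning is meant to teach.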

In [None]:
output_dir = '/content/drive/MyDrive/Colab Notebooks/【Dev】Financial Market Analysis Using Machine Learning/FinGPT/FinGPT: Training with LoRA and Llama-3/Model/' # Use your own address
os.makedirs(output_dir, exist_ok=True)

In [None]:
# Train
# Took about 10 compute units
writer = SummaryWriter()
trainer = ModifiedTrainer(
    model=model,
    args=training_args,             # Trainer args
    train_dataset=dataset["train"], # Training set
    eval_dataset=dataset["test"],   # Testing set
    data_collator=data_collator,    # Data Collator
    callbacks=[TensorBoardCallback(writer)],
)
trainer.train()
writer.close()


# Save model to Google Drive
model.save_pretrained(output_dir)

You are adding a <class 'transformers.integrations.integration_utils.TensorBoardCallback'> to the callbacks of this Trainer, but there is already one. The currentlist of callbacks is
:DefaultFlowCallback
TensorBoardCallback
WandbCallback


Step,Training Loss,Validation Loss


## Part 5: Inference and Benchmarks using FinGPT
Now that the model is trained, this part uses it to run inference and evaluates its performance on benchmark datasets.

- Took about 10 compute units

### 5.1 Load the model

Load the trained model saved in Part 4.

In [None]:
#clone the FinNLP repository
!git clone https://github.com/AI4Finance-Foundation/FinNLP.git

sys.path.append('/content/FinNLP/')

Cloning into 'FinNLP'...
remote: Enumerating objects: 1424, done.
remote: Counting objects: 100% (184/184), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 1424 (delta 168), reused 148 (delta 148), pack-reused 1240 (from 1)
Receiving objects: 100% (1424/1424), 4.53 MiB | 3.84 MiB/s, done.
Resolving deltas: 100% (655/655), done.


In [None]:
# Load benchmark datasets from FinNLP
from finnlp.benchmarks.fpb import test_fpb
from finnlp.benchmarks.fiqa import test_fiqa , add_instructions
from finnlp.benchmarks.tfns import test_tfns
from finnlp.benchmarks.nwgi import test_nwgi

In [None]:
# Fine-tuned PEFT model paths
path_to_check = '/content/drive/MyDrive/Colab Notebooks/【Dev】Financial Market Analysis Using Machine Learning/FinGPT/FinGPT: Training with LoRA and Llama-3/Model/'

# Check if the specified path exists
if os.path.exists(path_to_check):
    print("Path exists.")
else:
    print("Path does not exist.")

Path exists.


In [None]:
def eval_with_PEFT_model(base_model,peft_model):


  # Load tokenizer
  tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
  tokenizer.pad_token = tokenizer.eos_token  # Use eos_token as pad_token
  tokenizer.padding_side = 'left'  # Important: Set as left padding


  model = LlamaForCausalLM.from_pretrained(base_model,
                                          trust_remote_code=True,
                                          load_in_8bit=True,
                                          device_map="cuda:0")  #Set the model to GPU

  # load peft's fine-tuned model weights
  model = PeftModel.from_pretrained(model, peft_model)

  return model.eval(), tokenizer
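
The comment above marks left padding as important. The reason is that a decoder-only model continues each row from its last position, so that position should hold a real prompt token rather than padding. A minimal sketch with made-up token ids and a hypothetical `PAD` id:

```python
PAD = 0  # hypothetical pad id

def pad_batch(sequences, side):
    """Pad every sequence to the longest one, on the given side."""
    longest = max(len(s) for s in sequences)
    if side == "left":
        return [[PAD] * (longest - len(s)) + s for s in sequences]
    return [s + [PAD] * (longest - len(s)) for s in sequences]

prompts = [[5, 6, 7], [5, 6, 7, 8, 9]]

left = pad_batch(prompts, "left")
right = pad_batch(prompts, "right")

# The model appends new tokens after the last column of each row.
print([row[-1] for row in left])
print([row[-1] for row in right])
```

With right padding, the shorter prompt ends in a pad token, so generation for that row would start after padding instead of directly after "Answer: ".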

In [None]:
base_model = "meta-llama/Meta-Llama-3-8B" # Loading the Llama base model and supporting text-generated models
peft_model = path_to_check  # Fine-tuned PEFT model paths

model,tokenizer = eval_with_PEFT_model(base_model,peft_model)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

ValueError: Can't find 'adapter_config.json' at '/content/drive/MyDrive/Colab Notebooks/【Dev】Financial Market Analysis Using Machine Learning/FinGPT/FinGPT: Training with LoRA and Llama-3/Model/'

### 5.2 Run Benchmarks:
Evaluate the model's performance (accuracy and F1 score) on several financial text datasets (TFNS, FPB, FiQA, NWGI).

In [None]:
batch_size = 8

logging.getLogger("transformers").setLevel(logging.ERROR)

In [None]:
# TFNS Test Set, len 2388
# Available: 99.4 compute units
res_tfns = test_tfns(model, tokenizer, batch_size = batch_size)
# Available: 98.9 compute units
# Took about 0.5 compute units for inference

In [None]:
# FPB, len 1212
res_fpb = test_fpb(model, tokenizer, batch_size = batch_size)

In [None]:
# FiQA, len 275
res_fiqa = test_fiqa(model, tokenizer, prompt_fun = add_instructions, batch_size = batch_size)

In [None]:
# NWGI, len 4047
res_nwgi = test_nwgi(model, tokenizer, batch_size = batch_size)

In [None]:
res_nwgi

In [None]:
def get_score(df):

  accuracy = accuracy_score(df['target'], df['new_out'])

  f1_macro = f1_score(df['target'], df['new_out'], average='macro')

  f1_weighted = f1_score(df['target'], df['new_out'], average='weighted')

  return round(accuracy,3), round(f1_macro,3), round(f1_weighted,3)


def form_socre_dic(dataset_name):
  score_list = get_score(dataset_name)

  score_dic = {
        'Accuracy': score_list[0],
        'F1_macro': score_list[1],
        'F1_weighted': score_list[2]
  }

  return score_dic
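
The three scores can diverge sharply on imbalanced data, which is why all of them are reported. The following dependency-free sketch re-implements the metrics on a made-up example; it is meant to illustrate the definitions, not to replace the scikit-learn functions used above.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_per_class(y_true, y_pred, label):
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

def f1_macro(y_true, y_pred):
    # Unweighted mean over classes: rare classes count as much as common ones.
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)

def f1_weighted(y_true, y_pred):
    # Mean over classes weighted by how often each class occurs in y_true.
    support = Counter(y_true)
    return sum(f1_per_class(y_true, y_pred, l) * n for l, n in support.items()) / len(y_true)

# Made-up, imbalanced toy data: 8 neutral, 1 positive, 1 negative.
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 10  # a model that always answers "neutral"

print(round(accuracy(y_true, y_pred), 3))
print(round(f1_macro(y_true, y_pred), 3))
print(round(f1_weighted(y_true, y_pred), 3))
```

A model that always predicts the majority class still gets high accuracy here, while the macro F1 drops because the two minority classes each score zero.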

In [None]:
score_dic = {
      'TFNS': form_socre_dic(res_tfns),
      'FPB': form_socre_dic(res_fpb),
      'FIQA': form_socre_dic(res_fiqa),
      'NWGI': form_socre_dic(res_nwgi)
}

In [None]:
pd.DataFrame(score_dic)

In [None]:
score_dic

### 5.3 Result Comparison
Compare the benchmark results with those of other models and visualize them in tables and charts.

In [None]:
# Results of the other fine-tuned models come from previous training runs.
results = {
    "TFNS": {
        "FinGPT-ChatGlm2-6b": {"Acc": 0.856, "F1_macro": 0.806, "F1_weighted": 0.850},
        "FinGPT-V3.1": {"Acc": 0.876, "F1_macro": 0.841, "F1_weighted":  0.875},
    },
    "FPB": {
        "FinGPT-ChatGlm2-6b": {"Acc": 0.741, "F1_macro": 0.655, "F1_weighted": 0.694},
        "FinGPT-V3.1": {"Acc": 0.856, "F1_macro": 0.841, "F1_weighted": 0.855},
    },
    "FIQA": {
        "FinGPT-ChatGlm2-6b": {"Acc": 0.48, "F1_macro": 0.5,  "F1_weighted": 0.49},
        "FinGPT-V3.1": {"Acc": 0.836, "F1_macro":0.746, "F1_weighted": 0.850},
    },
    "NWGI": {
        "FinGPT-ChatGlm2-6b": {"Acc": 0.521, "F1_macro": 0.500, "F1_weighted":0.490},
        "FinGPT-V3.1": {"Acc": 0.642, "F1_macro": 0.650,"F1_weighted": 0.642},
    },
}

In [None]:
# Update the results dictionary to insert the value of FinGPT-Llama-8b.
for dataset_name, scores in score_dic.items():
    if dataset_name in results:

        if "FinGPT-Llama-8b" not in results[dataset_name]:
            results[dataset_name]["FinGPT-Llama-8b"] = {}

        results[dataset_name]["FinGPT-Llama-8b"].update({
            "Acc": scores['Accuracy'],
            "F1_macro": scores['F1_macro'],
            "F1_weighted": scores['F1_weighted']
        })

In [None]:
data = []
for dataset, models in results.items():
    for model, metrics in models.items():
        data.append([dataset, model, metrics.get("Acc", None), metrics.get("F1_macro", None),
                     metrics.get("F1_micro", None), metrics.get("F1_weighted", None)])

df = pd.DataFrame(data, columns=["Dataset", "Model", "Acc", "F1_macro", "F1_micro", "F1_weighted"])

# visualization
def plot_metric(metric_name):
    plt.figure(figsize=(10, 6))
    for model in df["Model"].unique():
        subset = df[df["Model"] == model]
        plt.plot(subset["Dataset"], subset[metric_name], marker='o', label=model)
    plt.title(f"{metric_name} Comparison Across Datasets")
    plt.xlabel("Dataset")
    plt.ylabel(metric_name)
    plt.legend()
    plt.grid(True)
    plt.show()

# Visualization of Accuracy, F1_macro and F1_weighted comparison
plot_metric("Acc")
plot_metric("F1_macro")
plot_metric("F1_weighted")

In [None]:
# Transpose the data table so that the rows are datasets and the columns are models for Acc, F1_macro, and F1_weighted, respectively.


acc_df = df.pivot(index='Dataset', columns='Model', values='Acc')


f1_macro_df = df.pivot(index='Dataset', columns='Model', values='F1_macro')


f1_weighted_df = df.pivot(index='Dataset', columns='Model', values='F1_weighted')

In [None]:
print("Accuracy DataFrame:")
acc_df

In [None]:
print("\nF1 Macro DataFrame:")
f1_macro_df

In [None]:
print("\nF1 Weighted DataFrame:")
f1_weighted_df

## **<u>Conclusion</u>**

In this project, we trained a FinGPT model using the code from Parts 1 through 5.  We fine-tuned a Meta-Llama-3-8B model using LoRA (Low-Rank Adaptation) and evaluated its performance on four benchmark datasets: TFNS, FPB, FIQA, and NWGI. These benchmarks are widely used for evaluating the performance of natural language processing models in the financial domain, each consisting of different financial text datasets.

* **TFNS (Twitter Financial News Sentiment):** A sentiment analysis dataset of tweets related to financial news.  Accuracy on this benchmark reflects the model's ability to correctly classify the sentiment (positive, negative, or neutral) expressed in financial tweets. High accuracy here suggests the model's potential for social media sentiment analysis, which can be valuable for understanding market sentiment and informing investment strategies.
* **FPB (Financial PhraseBank):** A sentiment analysis dataset of phrases extracted from financial news articles.  Accuracy on this benchmark indicates the model's ability to correctly identify the sentiment of financial phrases within their context.
* **FiQA (Financial Opinion Mining and Question Answering):** The sentiment portion of the FiQA challenge data, made up of short financial texts such as microblog posts and news headlines annotated with sentiment. Accuracy here measures the model's ability to classify the sentiment of these short texts.
* **NWGI (News With GPT Instruction):** A sentiment dataset of financial news whose labels were generated with a GPT model. Accuracy on this benchmark reflects how well the model's sentiment classification of news text agrees with those labels.

Our results showed strong performance on the TFNS benchmark, achieving 87% accuracy.  This result is promising and suggests the model's potential as a tool for analyzing market sentiment from social media.  The FPB benchmark also showed promising results, with accuracy nearing 80%, suggesting practical applicability. However, the FIQA benchmark yielded a lower accuracy of 57%, and the NWGI benchmark also showed a lower accuracy of 61%, indicating areas for improvement compared to other models.  The lower accuracy on FIQA, in particular, may be attributed to the relatively small size of the dataset and the inherent complexities of sentiment analysis on short tweet-like data.

To improve performance on FIQA and NWGI, we are considering several strategies. First, we plan to augment the training data with additional datasets specifically focused on financial terminology, such as financial news articles and corporate financial reports.  This will strengthen the model's understanding of the financial domain.  Second, we will fine-tune the model's hyperparameters, focusing on the learning rate, batch size, and LoRA-specific parameters like rank (`r`) and `lora_alpha`.  Third, we will explore prompt engineering techniques to design more effective prompts and input phrasing. Finally, while we currently use Meta-Llama-3-8B, comparing performance with other large language models, such as BloombergGPT, is a future research direction.

The FinGPT model has the potential for various financial applications, including financial market prediction, portfolio risk management, AI-driven investment advice, and trading strategies based on sentiment analysis of news and social media.  Accurate sentiment analysis plays a crucial role in understanding market sentiment and informing investment decisions. For example, the high accuracy on the TFNS benchmark suggests that this model could be an effective tool for analyzing market sentiment and potentially improving investment strategies.