# Fintech Chatbot

## 1. Overview
VaultAI is a chatbot for financial services built on the BANKING77 dataset. This project demonstrates how to classify BANKING77 intents using two different methods:
1. **Naive Bayes Classifier** (Traditional ML Approach)
2. **Transformer (BERT)** Model (State-of-the-Art NLP), fine-tuned from the smaller DistilBERT checkpoint


**The chatbot can:**
1. Recognise user intent
2. Extract entities


**Goal**: Build and compare models to identify intents from user queries and integrate them into a chatbot.

**Objectives:**
- Train a machine learning model for intent recognition.
- Implement entity extraction with spaCy.
- Deploy the chatbot as a REST API.

## 2. Data Loading and Preprocessing

In [1]:
# Install necessary libraries
!pip install datasets scikit-learn transformers torch
!pip install --upgrade scipy numpy
!pip install --upgrade pyarrow
!pip install --upgrade datasets
!pip uninstall -y tensorflow
!pip install tensorflow-cpu


In [2]:
# Import libraries
import os
import torch
from datasets import load_dataset
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

## 3. Naive Bayes Classifier


### Data Processing

In [3]:
# Disable Weights & Biases logging
os.environ["WANDB_DISABLED"] = "true"

# Load the BANKING77 dataset
ds = load_dataset("legacy-datasets/banking77")

# Extract text and labels
texts = ds["train"]["text"]
labels = ds["train"]["label"]

# Map label indices to intent names
label_names = ds['train'].features['label'].names

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42)

# Vectorize the text data
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)



### Model Training


*   Train a Naive Bayes classifier for intent recognition.
*   Evaluate the model’s accuracy and generate a classification report.
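
To make the training step less of a black box, here is a minimal plain-Python sketch of what a multinomial Naive Bayes classifier computes: a class prior plus smoothed per-word log-likelihoods. It is an illustration under simplifying assumptions, not scikit-learn's implementation: tokens are split on whitespace, and the helper names `train_multinomial_nb` and `predict_multinomial_nb` and the toy sentences are hypothetical. The smoothing constant `alpha=1.0` mirrors the `MultinomialNB` default.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count documents per class and words per class; alpha is the smoothing constant."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_multinomial_nb(model, doc):
    """Return the class with the highest log prior + summed log likelihoods."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            smoothed = (word_counts[label][token] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data in the style of BANKING77 queries
toy_docs = [
    "activate my new card",
    "card activation failed",
    "cancel my transfer",
    "cancel the bank transfer",
]
toy_labels = ["activate_my_card", "activate_my_card", "cancel_transfer", "cancel_transfer"]

model = train_multinomial_nb(toy_docs, toy_labels)
prediction = predict_multinomial_nb(model, "please cancel this transfer")
print(prediction)
```

Because every word contributes independently, the model is cheap to train, but it cannot use word order or context, which is one plausible reason a transformer does better on closely related intents.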



In [4]:
# Train Naive Bayes Model
nb_model = MultinomialNB()
nb_model.fit(X_train_vec, y_train)

# Predict and Evaluate
y_pred = nb_model.predict(X_test_vec)
nb_accuracy = accuracy_score(y_test, y_pred)
print(f"Naive Bayes Model Accuracy: {nb_accuracy:.2f}")

# Classification Report
print("\nClassification Report (Naive Bayes):")
print(classification_report(y_test, y_pred, target_names=label_names))

Naive Bayes Model Accuracy: 0.79

Classification Report (Naive Bayes):
                                                  precision    recall  f1-score   support

                                activate_my_card       0.63      0.84      0.72        31
                                       age_limit       0.96      0.96      0.96        25
                         apple_pay_or_google_pay       0.85      0.96      0.90        23
                                     atm_support       0.80      0.63      0.71        19
                                automatic_top_up       0.93      0.93      0.93        27
         balance_not_updated_after_bank_transfer       0.69      0.77      0.73        31
balance_not_updated_after_cheque_or_cash_deposit       0.79      0.90      0.84        41
                         beneficiary_not_allowed       0.69      0.83      0.75        24
                                 cancel_transfer       0.91      0.91      0.91        35
                            



## 4. Transformer (BERT)

### Preprocessing and Tokenization

In [5]:
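# Tokenizer matching the DistilBERT checkpoint fine-tuned below
# (same WordPiece vocabulary as bert-base-uncased, but it does not emit token_type_ids)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")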

# Tokenize datasets
train_encodings = tokenizer(X_train, padding=True, truncation=True, max_length=32, return_tensors="pt")
test_encodings = tokenizer(X_test, padding=True, truncation=True, max_length=32, return_tensors="pt")

# Create PyTorch datasets
class BankingDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

train_dataset = BankingDataset(train_encodings, y_train)
test_dataset = BankingDataset(test_encodings, y_test)

### Model Training

In [6]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Use a smaller model (DistilBERT) for faster training
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_names))

# Define optimized training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    learning_rate=3e-5,  # Slightly higher learning rate for faster convergence
    per_device_train_batch_size=4,  # Reduced batch size for CPU efficiency
    per_device_eval_batch_size=8,
    num_train_epochs=2,  # Reduced number of epochs for faster training
    weight_decay=0.01,
    save_strategy="no",  # Avoid intermediate checkpoint saves
    logging_steps=500,  # Log less frequently
    report_to="none",
)

# Define metrics for evaluation
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="weighted")
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

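# Initialize Trainer with a padding collator (inputs were already padded during
# tokenization, so this only rounds batch lengths up to a multiple of 8)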
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, pad_to_multiple_of=8)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)

# Train the model
trainer.train()

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | F1       | Precision | Recall   |
|-------|---------------|-----------------|----------|----------|-----------|----------|
| 1     | 0.99          | 0.743025        | 0.83908  | 0.832607 | 0.862095  | 0.83908  |
| 2     | 0.3864        | 0.411039        | 0.889555 | 0.887394 | 0.898966  | 0.889555 |


TrainOutput(global_step=4002, training_loss=1.3270114467240524, metrics={'train_runtime': 7123.031, 'train_samples_per_second': 2.247, 'train_steps_per_second': 0.562, 'total_flos': 132677737400064.0, 'train_loss': 1.3270114467240524, 'epoch': 2.0})
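
The metrics above use `average="weighted"`, which explains two things in the table: the Recall column always equals Accuracy, and precision can be pulled down by classes the model never predicts. The following plain-Python sketch shows the arithmetic under that assumption. The function names and the toy labels are hypothetical, and treating an undefined precision as 0 mirrors scikit-learn's default behaviour, which also emits an `UndefinedMetricWarning` in that case.

```python
from collections import Counter

def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    score = 0.0
    for label, n_true in support.items():
        # A class that is never predicted has undefined precision; count it as 0.
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        score += precision * n_true / len(y_true)
    return score

def weighted_recall(y_true, y_pred):
    """Per-class recall weighted by support; algebraically this is plain accuracy."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum((correct[label] / n_true) * n_true / len(y_true)
               for label, n_true in support.items())

# Toy example: class 2 is never predicted
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 0, 1]

precision = weighted_precision(y_true, y_pred)
recall = weighted_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

If per-class behaviour matters more than overall volume, `average="macro"` would weight all 77 intents equally instead.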

### Evaluation

In [7]:
# Evaluate the fine-tuned DistilBERT model
results = trainer.evaluate(test_dataset)
bert_accuracy = results["eval_accuracy"]
print(f"Transformer Model Accuracy: {bert_accuracy:.2f}")

# Save Model and Tokenizer
model.save_pretrained("./fintech_bert_model")
tokenizer.save_pretrained("./fintech_bert_model")

Transformer Model Accuracy: 0.89


('./fintech_bert_model/tokenizer_config.json',
 './fintech_bert_model/special_tokens_map.json',
 './fintech_bert_model/vocab.txt',
 './fintech_bert_model/added_tokens.json',
 './fintech_bert_model/tokenizer.json')

## 5. Compare Naive Bayes and BERT


In [8]:
print(f"Naive Bayes Accuracy: {nb_accuracy:.2f}")
print(f"BERT Accuracy: {bert_accuracy:.2f}")

Naive Bayes Accuracy: 0.79
BERT Accuracy: 0.89
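
A ten-point accuracy gap is easier to judge as a share of the errors removed. The short calculation below uses the rounded accuracies printed above; the variable names are illustrative only.

```python
# Rounded accuracies reported by the two models above
nb_acc, bert_acc = 0.79, 0.89

absolute_gain = bert_acc - nb_acc
# Share of Naive Bayes errors that the transformer eliminates
relative_error_reduction = (bert_acc - nb_acc) / (1 - nb_acc)

print(f"Absolute gain: {absolute_gain:.2f}")
print(f"Relative error reduction: {relative_error_reduction:.1%}")
```

In other words, the transformer removes a little under half of the mistakes the Naive Bayes model makes on this split.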


## 6. Define Chatbot Logic

In [11]:
# Load BERT model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("./fintech_bert_model")
tokenizer = AutoTokenizer.from_pretrained("./fintech_bert_model")

# Define chatbot response logic
def predict_intent(user_input):
    inputs = tokenizer(user_input, return_tensors="pt", padding=True, truncation=True, max_length=64)
    with torch.no_grad():  # gradients are not needed at inference time
        outputs = model(**inputs)
    predicted_label = torch.argmax(outputs.logits, dim=1).item()
    return predicted_label

# Full label-to-intent mapping, built from the dataset's label names (indices 0 to 76)
label_to_intent = dict(enumerate(label_names))

# Responses for each intent
responses = {
    "activate_my_card": "To activate your card, go to the app's settings and follow the activation instructions.",
    "age_limit": "The age limit for this service is 18 years or older.",
    "apple_pay_or_google_pay": "Yes, we support Apple Pay and Google Pay. You can add your card via their respective apps.",
    # Add responses for all other intents
}

# Default response for unknown intents
default_response = "Sorry, I didn't understand your request. Can you rephrase?"


def chatbot_response(user_input):
    try:
        label = predict_intent(user_input)
        intent = label_to_intent.get(label, "unknown_intent")
        return responses.get(intent, default_response)
    except Exception as e:
        # Report the failure instead of silently discarding it
        print(f"Error while processing request: {e}")
        return "There was an error processing your request. Please try again later."
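
Since `predict_intent` always returns the highest-scoring label, the default response is only reached when an intent has no reply defined, never when the model is unsure. One possible extension, which is an assumption and not part of the notebook, is to turn the logits into probabilities with softmax and fall back when the top probability is low. The helper names `softmax` and `pick_intent`, the 0.5 threshold, and the example logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_intent(logits, id_to_intent, threshold=0.5):
    """Return the top intent, or None when the top probability is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None
    return id_to_intent.get(best)

id_to_intent = {0: "activate_my_card", 1: "age_limit", 2: "apple_pay_or_google_pay"}

confident = pick_intent([4.0, 0.5, 0.2], id_to_intent)  # one clear winner
uncertain = pick_intent([1.0, 0.9, 1.1], id_to_intent)  # scores nearly tied
print(confident, uncertain)
```

In the notebook, the logits for a single query could be obtained with `outputs.logits[0].tolist()`, and a `None` result would map to `default_response`. The threshold would need tuning on held-out queries.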

## 7. Chatbot Integration

The chatbot uses Flask to expose a REST API for interaction.

- **Endpoint**: `/chat`
- **HTTP Method**: POST
- **Request Payload**:
    ```json
    {"text": "How do I activate my card?"}
    ```
- **Response**:
    ```json
    {"response": "To activate your card, go to the app's settings and follow the activation instructions."}
    ```
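
For reference, a client could call this endpoint using only the standard library, roughly as sketched below. The function names `build_chat_request` and `send_chat` are hypothetical, and the base URL assumes Flask's default local address; adjust it to wherever the server actually runs.

```python
import json
import urllib.request

def build_chat_request(text, base_url="http://127.0.0.1:5000"):
    """Build a POST request for the /chat endpoint with a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_chat(text, base_url="http://127.0.0.1:5000"):
    """Send the request and return the chatbot's reply (requires a running server)."""
    req = build_chat_request(text, base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]

# Building the request does not touch the network
req = build_chat_request("How do I activate my card?")
print(req.get_method(), req.full_url)
```

With the server running, `send_chat("How do I activate my card?")` would return the text of the `response` field.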

In [19]:
from flask import Flask, request, jsonify

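app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    # Tolerate a missing or malformed JSON body instead of raising
    payload = request.get_json(silent=True) or {}
    user_input = payload.get("text", "")
    if not user_input:
        return jsonify({"error": "Request body must include a non-empty 'text' field."}), 400
    response = chatbot_response(user_input)
    return jsonify({"response": response})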

if __name__ == "__main__":
    app.run()

 * Serving Flask app '__main__'
 * Debug mode: off
 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit


## 8. Evaluation Results

| **Model**       | **Accuracy** | **Precision** | **Recall** | **F1-Score** |
|------------------|--------------|---------------|------------|--------------|
| Naive Bayes      | 79%          | 78%           | 79%        | 78%          |
| BERT             | 89%          | 90%           | 89%        | 89%          |

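### **Observations**:
- **Naive Bayes**: Lightweight; reaches 79% accuracy on the held-out 20% split, which makes it suitable for resource-constrained environments.
- **BERT**: Reaches 89% accuracy after two epochs of fine-tuning (a `train_runtime` of about 7,100 seconds in this run) and captures context that word counts miss, so it is recommended for production use when that training cost is acceptable.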

## 9. Recommendations
- Use BERT for production applications requiring high accuracy and contextual understanding.
- Naive Bayes is suitable for lightweight tasks or resource-limited scenarios.
- Enhance the chatbot by defining responses for all 77 intents; only three have replies so far, so most queries currently receive the default response.