<a href="https://colab.research.google.com/github/vishakhun/GDPR_Prediction/blob/main/GDPR_Customer_Request_Prediction_Model_(ride_hailing_company).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GDPR Customer Request Prediction Model: Ride-Hailing Company

## 1. Introduction
This project uses machine learning models to predict whether customer service requests are related to GDPR compliance. This capability can help organizations identify privacy-related inquiries and respond to them appropriately, supporting compliance with data protection laws.

## 2. Project Description
Objective: Develop a predictive model that can classify customer service requests based on whether they pertain to GDPR issues.
### Models Utilized

1. Random Forest
2. XGBoost

Data: The dataset contains customer service requests for a ride-hailing company, with fields such as Request ID, Request Type, and Request Description. The records were randomly generated using a large language model (LLM).


## 3. System Architecture

* Data Preprocessing: Cleaning the request text and vectorizing it with TF-IDF.
* Model Training: Training the Random Forest and XGBoost models and comparing them on accuracy, precision, recall, and F1 score to identify the best performer.
* Prediction Module: Applying the trained models to predict whether new requests are GDPR-related.
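The three stages above can be chained into a single scikit-learn `Pipeline`, so the same text cleaning and TF-IDF vocabulary used during training are reused automatically at prediction time. This is a minimal sketch, not the notebook's own code: the hypothetical `clean` and `build_gdpr_pipeline` helpers and the toy training sentences are illustrative assumptions. It relies on `TfidfVectorizer`'s `preprocessor` argument, which replaces the vectorizer's built-in lowercasing step with a custom callable.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def clean(text):
    """Lowercase the text and keep only alphanumeric characters and whitespace."""
    text = str(text).lower()
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def build_gdpr_pipeline(max_features=1000, random_state=42):
    """Hypothetical helper: cleaning -> TF-IDF -> classifier as a single estimator."""
    return Pipeline([
        # `preprocessor` swaps TfidfVectorizer's default lowercasing step for `clean`
        ("tfidf", TfidfVectorizer(preprocessor=clean, max_features=max_features)),
        ("model", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])


# Toy, made-up training examples (not taken from the project's dataset)
texts = [
    "Please delete all my personal data",
    "I want a copy of the data you hold about me",
    "Correct the email address stored in my profile",
    "My driver was late and rude",
    "I was charged twice for one ride",
    "The app crashes when I book a trip",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = GDPR-related

pipeline = build_gdpr_pipeline()
pipeline.fit(texts, labels)
predictions = pipeline.predict(["delete my personal data", "charged twice for my ride"])
print(predictions)
```

Because the fitted vectorizer lives inside the pipeline, one `pipeline.predict([...])` call covers the whole Prediction Module, and the fitted object could be saved and reloaded as a unit (for example with `joblib.dump` and `joblib.load`).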





### Import/Install necessary libraries

In [9]:
%%capture
# Install XGBoost only if it is missing (Colab normally ships with it),
# so the check must run before anything else imports xgboost
try:
    import xgboost as xgb
except ImportError:
    !pip install xgboost
    import xgboost as xgb

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.ensemble import RandomForestClassifier

### Load the dataset

In [10]:
data = pd.read_csv('/content/sample_data/balanced_customer_service_requests_dataset.csv')
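Before preprocessing, it can help to confirm that the expected columns are present and that no description is missing, because the preprocessing step below calls `.lower()` on each description and would fail on a `NaN` value. A minimal sketch, assuming the column names used in this notebook; the `validate_requests` helper and the toy rows (including the placeholder request type `"Other"`) are hypothetical. `value_counts()` on `Request Type` is also a quick way to confirm the class balance that the file name suggests.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Request ID", "Request Type", "Request Description"]


def validate_requests(df):
    """Hypothetical helper: check required columns and drop rows without a description."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Rows with no description cannot be preprocessed or vectorized, so drop them
    return df.dropna(subset=["Request Description"]).reset_index(drop=True)


# Made-up rows with one missing description
toy = pd.DataFrame({
    "Request ID": [1, 2, 3],
    "Request Type": ["GDPR Request", "Other", "GDPR Request"],
    "Request Description": ["Delete my account data", None, "Send me my data"],
})

clean_df = validate_requests(toy)
print(len(clean_df))
print(clean_df["Request Type"].value_counts())
```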

### Data Preprocessing

In [11]:
# Create a binary 'GDPR_Related' label (1 = GDPR request, 0 = other) from 'Request Type'
data['GDPR_Related'] = (data['Request Type'] == 'GDPR Request').astype(int)


def preprocess_text(text):
    """Convert text to lowercase and remove non-alphanumeric characters."""
    text = text.lower()
    text = ''.join([char for char in text if char.isalnum() or char.isspace()])
    return text


# Apply preprocessing to the 'Request Description'
data['Processed Description'] = data['Request Description'].apply(preprocess_text)

# Feature Extraction
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(data['Processed Description'])
y = data['GDPR_Related']

# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
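In this cell the vectorizer is fit on the full dataset before the split, so the test descriptions also contribute to the vocabulary and IDF weights. The sketch below runs the same steps on a few made-up rows (`"Other"` is a placeholder request type) in a leakage-free order: split the raw text first, then fit TF-IDF on the training part only and reuse it on the test part. Building the label as a 0/1 integer column with a vectorized comparison is an assumed but equivalent way to express the `Request Type` check. The last lines show that a description made only of words never seen during fitting, such as `fdsfsf`, becomes an all-zero vector.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split


def preprocess_text(text):
    """Convert text to lowercase and remove non-alphanumeric characters."""
    text = text.lower()
    return "".join(char for char in text if char.isalnum() or char.isspace())


# Made-up rows for illustration only
demo = pd.DataFrame({
    "Request Type": ["GDPR Request"] * 4 + ["Other"] * 4,
    "Request Description": [
        "Please ERASE my data!", "Send me a copy of my data.",
        "Correct my stored e-mail.", "Delete my account information.",
        "Driver was late.", "I was charged twice for a ride!",
        "App crashed during booking.", "I left my phone in the car.",
    ],
})
demo["GDPR_Related"] = (demo["Request Type"] == "GDPR Request").astype(int)
demo["Processed Description"] = demo["Request Description"].apply(preprocess_text)

# Split the raw text first so test rows never influence the vocabulary or IDF weights
train_text, test_text, y_train, y_test = train_test_split(
    demo["Processed Description"], demo["GDPR_Related"],
    test_size=0.25, random_state=42, stratify=demo["GDPR_Related"],
)

vectorizer = TfidfVectorizer(max_features=1000)
X_train = vectorizer.fit_transform(train_text)  # learn vocabulary and IDF from training text only
X_test = vectorizer.transform(test_text)        # reuse them unchanged on the test text

# Input made only of unseen tokens becomes a row with no non-zero entries
unseen = vectorizer.transform([preprocess_text("fdsfsf")])
print(demo["Processed Description"].tolist())
print(X_train.shape, X_test.shape, unseen.nnz)
```

Two side effects are worth knowing. First, removing punctuation without inserting a space merges hyphenated words, so `"e-mail"` becomes `"email"` and `"data-deletion"` would become `"datadeletion"`. Second, for an all-zero input the prediction reflects only how each model treats an empty feature vector, not anything in the text itself.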

### Model Training and Evaluation


In [12]:
# Define the models to compare
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    # use_label_encoder is deprecated in recent XGBoost releases and not needed here;
    # random_state is the scikit-learn-style seed parameter
    'XGBoost': xgb.XGBClassifier(objective='binary:logistic', eval_metric='logloss', random_state=42)
}

# Train and evaluate models
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"Results for {name}:")
    print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
    print(f"Precision: {precision_score(y_test, y_pred)}")
    print(f"Recall: {recall_score(y_test, y_pred)}")
    print(f"F1 Score: {f1_score(y_test, y_pred)}")


Results for Random Forest:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Results for XGBoost:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
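Perfect scores on a single 80/20 split of synthetic, LLM-generated text are worth double-checking, especially since the two models disagree on several short inputs in the interactive session below. One option, sketched here under assumptions (toy data, a hypothetical `cross_validate_text_model` helper, and 5-fold stratified splitting), is to score a TF-IDF plus Random Forest pipeline with cross-validation, which refits the vectorizer inside every fold, and to inspect a confusion matrix built from out-of-fold predictions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline


def cross_validate_text_model(texts, labels, n_splits=5, random_state=42):
    """Hypothetical helper: per-fold F1 scores and a pooled confusion matrix."""
    pipeline = make_pipeline(
        TfidfVectorizer(max_features=1000),
        RandomForestClassifier(n_estimators=100, random_state=random_state),
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # The vectorizer is refit on the training folds of each split
    f1_per_fold = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1")
    # Out-of-fold predictions: each sample is predicted by a model that never saw it
    oof_pred = cross_val_predict(pipeline, texts, labels, cv=cv)
    return f1_per_fold, confusion_matrix(labels, oof_pred)


# Made-up examples: 5 per class, so each of the 5 folds holds one of each
texts = [
    "delete my personal data", "send me a copy of my data",
    "correct my stored email", "erase my account information",
    "what data do you keep about me",
    "driver was late", "charged twice for a ride",
    "app crashed during booking", "lost my bag in the car",
    "car was dirty",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

scores, cm = cross_validate_text_model(texts, labels)
print("F1 per fold:", scores)
print("Confusion matrix (rows = true, cols = predicted):")
print(cm)
```

On the real dataset, per-fold scores that all stay at 1.0 would suggest the generated requests are easy to separate by keyword; any misclassifications appear as off-diagonal counts in the matrix.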


### Prediction Function

In [13]:
def predict_gdpr(description):
    """Generate GDPR-related predictions from different models for a given description."""
    processed_description = preprocess_text(description)
    tfidf_description = vectorizer.transform([processed_description])
    print(f"Prediction results for: '{description}'")
    for name, model in models.items():
        prediction = model.predict(tfidf_description)[0]  # predict returns a one-element array
        print(f"{name}: {'GDPR-related' if prediction else 'Not GDPR-related'}")
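`predict_gdpr` prints a hard label for each model. A variant that returns the results instead of printing them, together with each model's estimated probability for the GDPR class, makes disagreements like the ones in the session below easier to inspect and to test. This is a sketch under assumptions: the `predict_gdpr_scores` name, the `threshold` parameter, and the toy training data are hypothetical, and only a Random Forest is fitted so the example runs on its own. `predict_proba` and `classes_` are available on both `RandomForestClassifier` and XGBoost's `XGBClassifier`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer


def preprocess_text(text):
    """Convert text to lowercase and remove non-alphanumeric characters."""
    text = text.lower()
    return "".join(char for char in text if char.isalnum() or char.isspace())


def predict_gdpr_scores(description, vectorizer, models, threshold=0.5):
    """Hypothetical variant of predict_gdpr that returns P(GDPR) and a label per model."""
    features = vectorizer.transform([preprocess_text(description)])
    results = {}
    for name, model in models.items():
        # Find the column of predict_proba that belongs to class 1 (GDPR-related)
        gdpr_col = list(model.classes_).index(1)
        proba = float(model.predict_proba(features)[0, gdpr_col])
        results[name] = {"gdpr_probability": proba, "gdpr_related": proba > threshold}
    return results


# Tiny made-up training set, just so the example runs on its own
texts = ["delete my personal data", "send me a copy of my data",
         "driver was late", "charged twice for a ride"]
labels = [1, 1, 0, 0]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
models = {"Random Forest": RandomForestClassifier(n_estimators=50, random_state=42).fit(X, labels)}

result = predict_gdpr_scores("Please delete my data", vectorizer, models)
print(result)
```

Probabilities close to the threshold flag requests where a model is uncertain, which may be a more useful signal for routing than a bare yes/no answer.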


### User Input and Prediction

In [14]:
# Loop to continuously accept input
while True:
    user_input = input("Enter a customer service request description to predict (type 'exit' to quit): ")
    if user_input.lower() == 'exit':
        break
    predict_gdpr(user_input)
    print("\n")  # Print a newline for better separation of prediction sessions


Enter a customer service request description to predict (type 'exit' to quit): delete data
Prediction results for: 'delete data'
Random Forest: GDPR-related
XGBoost: Not GDPR-related


Enter a customer service request description to predict (type 'exit' to quit): data access
Prediction results for: 'data access'
Random Forest: GDPR-related
XGBoost: Not GDPR-related


Enter a customer service request description to predict (type 'exit' to quit): data deletion
Prediction results for: 'data deletion'
Random Forest: GDPR-related
XGBoost: Not GDPR-related


Enter a customer service request description to predict (type 'exit' to quit): data correction
Prediction results for: 'data correction'
Random Forest: GDPR-related
XGBoost: Not GDPR-related


Enter a customer service request description to predict (type 'exit' to quit): fdsfsf
Prediction results for: 'fdsfsf'
Random Forest: Not GDPR-related
XGBoost: Not GDPR-related


Enter a customer service request description to predict (type 'exit' 