# Complaint Classification using Neural Networks and NLP

**Objective:**

Build a model that can classify customer complaints into predefined categories using Natural Language Processing (NLP) and Neural Networks.

**Project Tasks:**

**1. Data Understanding & Preprocessing**

- Perform **Exploratory Data Analysis (EDA)**:
    - Class distribution
    - Complaint length analysis
    - Word cloud for each category (optional)
- Text Cleaning:
    - Lowercasing
    - Removing punctuation
    - Removing stopwords
    - Lemmatization or stemming
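
The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual pipeline: the `STOPWORDS` set and the `clean_text` helper are hypothetical names introduced here, and the stopword list is deliberately tiny. Lemmatization or stemming is left out because it needs an external library (for example NLTK or spaCy) and would slot in after the stopword filter.

```python
import string

# Assumption: a tiny illustrative stopword list. A real run would likely use a
# fuller list from an NLP library. "not" is kept on purpose, because dropping
# it can flip the meaning of a complaint ("did not dispense cash").
STOPWORDS = {"a", "an", "the", "of", "on", "in", "is", "was", "but", "my", "to"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, and remove stopwords from one complaint."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("Incorrect charges on my account statement."))
# -> incorrect charges account statement
```

In the notebook this could be applied column-wise, for instance with `hdfc_data['Complaint_Text'].apply(clean_text)`.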
    
**2. Text Vectorization (Feature Extraction)**
- **Mandatory**: Use **Word Embeddings**:
    - Pre-trained: GloVe, Word2Vec, or FastText (recommended)
    - Or train embeddings from scratch using an embedding layer (Keras `Embedding` or PyTorch `nn.Embedding`)
- Optional: Compare with TF-IDF + Dense Neural Networks
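
Whichever embedding source is chosen, the text first has to become fixed-length integer sequences, and pre-trained vectors have to be arranged into a matrix indexed by those integers. The sketch below shows one way to do that with NumPy. All helper names (`build_vocab`, `encode`, `build_embedding_matrix`) are hypothetical, and `toy_pretrained` is a stand-in for vectors that would really be loaded from a GloVe, Word2Vec, or FastText file.

```python
import numpy as np

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens


def build_vocab(texts):
    """Map each whitespace-separated token to an integer id."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert text to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, UNK) for tok in text.split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Rows follow vocab ids; tokens without a pre-trained vector stay random."""
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=0.1, size=(len(vocab), dim))
    matrix[PAD] = 0.0  # padding contributes nothing
    for token, idx in vocab.items():
        if token in pretrained:
            matrix[idx] = pretrained[token]
    return matrix


texts = ["incorrect charges account statement", "mobile banking app crashing"]
vocab = build_vocab(texts)
encoded = [encode(t, vocab, max_len=6) for t in texts]

# Assumption: toy 4-dimensional vectors standing in for a real embedding file.
toy_pretrained = {"account": np.ones(4), "banking": np.full(4, 0.5)}
embedding_matrix = build_embedding_matrix(vocab, toy_pretrained, dim=4)
```

The resulting matrix can be used to initialise the weights of an embedding layer, which can then be frozen or fine-tuned.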

**3. Model Building (Neural Networks)**
- **Baseline**: Build a simple feedforward Neural Network using embeddings.
- **Advanced**: Build a Recurrent Neural Network (RNN) or LSTM model for sequence modelling.
- **Optional**: Try 1D Convolutional Neural Networks (CNNs) for text classification.
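
To make the baseline concrete, here is a NumPy sketch of the forward pass only: embeddings are looked up, averaged over non-padding tokens, passed through one ReLU hidden layer, and turned into class probabilities with softmax. It is an illustration under assumptions, not a trainable model: the weights are random, the layer sizes are arbitrary, and actual training would be done in Keras or PyTorch. The ten output classes match the ten categories in this dataset.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(token_ids, emb, W1, b1, W2, b2, pad_id=0):
    """Mean-pooled embeddings -> ReLU hidden layer -> softmax over classes."""
    mask = (token_ids != pad_id).astype(float)[..., None]  # (batch, seq, 1)
    vectors = emb[token_ids]                               # (batch, seq, dim)
    lengths = np.maximum(mask.sum(axis=1), 1.0)            # (batch, 1)
    pooled = (vectors * mask).sum(axis=1) / lengths        # (batch, dim)
    hidden = np.maximum(0.0, pooled @ W1 + b1)             # ReLU
    return softmax(hidden @ W2 + b2)                       # (batch, classes)


# Assumption: arbitrary sizes and random weights, for shape-checking only.
rng = np.random.default_rng(0)
vocab_size, dim, hidden_units, n_classes = 10, 4, 8, 10
emb = rng.normal(scale=0.1, size=(vocab_size, dim))
W1 = rng.normal(scale=0.1, size=(dim, hidden_units))
b1 = np.zeros(hidden_units)
W2 = rng.normal(scale=0.1, size=(hidden_units, n_classes))
b2 = np.zeros(n_classes)

token_ids = np.array([[2, 3, 4, 5, 0, 0], [6, 7, 8, 9, 0, 0]])
probs = forward(token_ids, emb, W1, b1, W2, b2)
print(probs.shape)
```

An RNN or LSTM replaces the mean-pooling step with a recurrent pass over the sequence, so word order can influence the prediction.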

**4. Model Evaluation**
- Accuracy, Precision, Recall, F1-Score per class
- Confusion Matrix
- Training vs. Validation loss and accuracy plots
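
The per-class metrics can be derived directly from counts of (true, predicted) label pairs. The sketch below does this in plain Python so that the definitions are visible; `per_class_report` is a hypothetical helper and the labels are toy values, not model output. In practice a library routine such as scikit-learn's `classification_report` would normally be used.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return precision, recall, and F1 for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # confusion-matrix cells
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(pairs[(t, label)] for t in labels if t != label)
        fn = sum(pairs[(label, p)] for p in labels if p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy labels for illustration only.
y_true = ["ATM Issue", "ATM Issue", "Loan Issue", "Loan Issue", "Cheque Issue"]
y_pred = ["ATM Issue", "Loan Issue", "Loan Issue", "Loan Issue", "Cheque Issue"]

report = per_class_report(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
for label, scores in report.items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
```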

**5. Model Optimization**
- Try different optimizers, batch sizes, and learning rates.
- Experiment with dropout, regularization, and embedding size.

**Rules:**
- Do not use traditional ML models (like SVM, Random Forest, etc.).
- Must use neural networks (Feedforward, LSTM, GRU, CNN etc.).
- Do not copy code directly from online tutorials; you may consult references, but the implementation must be your own.

**Bonus Ideas:**
- Visualize attention weights (if the model uses an attention mechanism)
- Handle multi-label classification if the dataset allows
- Perform error analysis on misclassified samples

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [12]:
hdfc_data = pd.read_csv('HDFC_Bank_Complaints_Dataset.csv')
hdfc_data

| Unnamed: 0 | Complaint ID | Complaint Text | Category |
|---|---|---|---|
| 0 | 1 | Incorrect deduction of annual credit card fee. | Notification Issue |
| 1 | 2 | Incorrect charges on my account statement. | Credit Card Issue |
| 2 | 3 | Incorrect charges on my account statement. | Credit Card Issue |
| 3 | 4 | Net banking account got locked without reason. | Account Issue |
| 4 | 5 | Online fund transfer failed but money was debi... | Notification Issue |
| ... | ... | ... | ... |
| 5995 | 5996 | ATM did not dispense cash but the amount was d... | Net Banking Issue |
| 5996 | 5997 | Incorrect deduction of annual credit card fee. | Net Banking Issue |
| 5997 | 5998 | Mobile banking app is frequently crashing. | Notification Issue |
| 5998 | 5999 | Delay in issuing new debit card. | Account Issue |


In [17]:
# rename Complaint Text to Complaint_Text
hdfc_data.rename(columns={'Complaint Text': 'Complaint_Text'}, inplace=True)

In [18]:
hdfc_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6000 entries, 0 to 5999
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Complaint ID    6000 non-null   int64 
 1   Complaint_Text  6000 non-null   object
 2   Category        6000 non-null   object
dtypes: int64(1), object(2)
memory usage: 140.8+ KB


In [19]:
hdfc_data.isna().sum()

Complaint ID      0
Complaint_Text    0
Category          0
dtype: int64

In [20]:
hdfc_data['Category'].value_counts(dropna=False)

Category
Net Banking Issue         640
Branch Service Issue      629
Notification Issue        628
Loan Issue                610
Cheque Issue              607
Account Issue             607
Credit Card Issue         598
Customer Service Issue    568
Fraudulent Transaction    558
ATM Issue                 555
Name: count, dtype: int64
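
The counts above add up to 6000 and are fairly even across the ten categories. A quick way to quantify that is to compute each class's share and the ratio between the largest and smallest class. The sketch below rebuilds the counts as a standalone `Series` so it runs on its own; in the notebook the same numbers would come from `hdfc_data['Category'].value_counts()`.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {
        "Net Banking Issue": 640,
        "Branch Service Issue": 629,
        "Notification Issue": 628,
        "Loan Issue": 610,
        "Cheque Issue": 607,
        "Account Issue": 607,
        "Credit Card Issue": 598,
        "Customer Service Issue": 568,
        "Fraudulent Transaction": 558,
        "ATM Issue": 555,
    },
    name="count",
)

share = counts / counts.sum()                   # fraction of rows per class
imbalance_ratio = counts.max() / counts.min()   # largest class vs smallest

print(share.round(3))
print(round(imbalance_ratio, 3))
```

With a ratio of roughly 1.15, the classes are close to balanced, so plain accuracy is a reasonable headline metric alongside the per-class scores.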

In [24]:
hdfc_data[["Complaint_Text", "Category"]].duplicated().sum()

np.int64(5600)
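
A result of 5600 means that only 400 of the 6000 rows are distinct (text, category) pairs. The displayed rows also show the same complaint text under different categories (rows 0 and 5996). Both points matter for modelling: identical rows on both sides of a train/validation split can inflate validation scores, and conflicting labels for the same text put a ceiling on achievable accuracy. The sketch below shows one way to inspect this. It uses a toy frame built from rows visible in the output above; in the notebook, `hdfc_data` would take the place of `toy`.

```python
import pandas as pd

# Toy frame assembled from rows shown earlier (indices 0, 1, 2, 5996).
toy = pd.DataFrame(
    {
        "Complaint_Text": [
            "Incorrect deduction of annual credit card fee.",
            "Incorrect charges on my account statement.",
            "Incorrect charges on my account statement.",
            "Incorrect deduction of annual credit card fee.",
        ],
        "Category": [
            "Notification Issue",
            "Credit Card Issue",
            "Credit Card Issue",
            "Net Banking Issue",
        ],
    }
)

# Rows that repeat an earlier (text, category) pair.
n_dupes = toy[["Complaint_Text", "Category"]].duplicated().sum()

# Keep one row per distinct pair, then count labels per text.
unique_pairs = toy.drop_duplicates(subset=["Complaint_Text", "Category"])
labels_per_text = unique_pairs.groupby("Complaint_Text")["Category"].nunique()

# Texts that appear under more than one category.
conflicting = labels_per_text[labels_per_text > 1]

print(n_dupes, len(unique_pairs))
print(conflicting)
```

Whether to deduplicate before splitting is a judgment call that should be stated explicitly when reporting results.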