#### Customer Call Center Performance Analysis

##### Data Overview
The dataset has 5,000 entries and 10 columns. Here's a summary of the columns:

- Call Id: Identifier for each call (e.g., ID0001)
- Agent: Name of the agent who handled the call
- Date: Date of the call (stored as a string)
- Time: Time of the call (stored as a string)
- Topic: The topic discussed during the call (e.g., Contract related, Technical Support)
- Answered (Y/N): Whether the call was answered or not (Y for answered, N for unanswered)
- Resolved: Whether the issue was resolved (Y for yes, N for no)
- Speed of answer in seconds: The time taken to answer the call
- AvgTalkDuration: Average talk duration of the call, in HH:MM:SS format (e.g., 00:02:23)
- Satisfaction rating: A rating given for customer satisfaction


##### Objective:

The goal of this project is to analyze customer call center data to gain insights into the following:

- Agent Performance: Identify top-performing agents based on factors like call resolution, speed of answer, and customer satisfaction ratings.
- Call Topic Analysis: Determine which topics are most frequently discussed and how they impact call outcomes (answered, resolved, etc.).
- Customer Satisfaction: Analyze the factors that influence customer satisfaction scores, such as speed of answer and resolution status.
- Operational Efficiency: Evaluate how efficiently the call center operates, considering metrics like average talk duration and speed of answering.


##### STEP 1: Data Loading and Inspection



In [2]:
import pandas as pd



In [7]:
data = pd.read_excel("/Users/vaishnavipullakhandam/Desktop/github/excel/Call-Center-Data.xlsx")

# Display the first few rows of the dataset
data.head()

| | Call Id | Agent | Date | Time | Topic | Answered (Y/N) | Resolved | Speed of answer in seconds | AvgTalkDuration | Satisfaction rating |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID0001 | Diane | 2021-01-01 | 09:12:58 | Contract related | Y | Y | 109.0 | 00:02:23 | 3.0 |
| 1 | ID0002 | Becky | 2021-01-01 | 09:12:58 | Technical Support | Y | N | 70.0 | 00:04:02 | 3.0 |
| 2 | ID0003 | Stewart | 2021-01-01 | 09:47:31 | Contract related | Y | Y | 10.0 | 00:02:11 | 3.0 |
| 3 | ID0004 | Greg | 2021-01-01 | 09:47:31 | Contract related | Y | Y | 53.0 | 00:00:37 | 2.0 |
| 4 | ID0005 | Becky | 2021-01-01 | 10:00:29 | Payment related | Y | Y | 95.0 | 00:01:00 | 3.0 |


In [8]:
# Display basic information about the data using the info method

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Call Id                     5000 non-null   object 
 1   Agent                       5000 non-null   object 
 2   Date                        5000 non-null   object 
 3   Time                        5000 non-null   object 
 4   Topic                       5000 non-null   object 
 5   Answered (Y/N)              5000 non-null   object 
 6   Resolved                    5000 non-null   object 
 7   Speed of answer in seconds  4054 non-null   float64
 8   AvgTalkDuration             4054 non-null   object 
 9   Satisfaction rating         4054 non-null   float64
dtypes: float64(2), object(8)
memory usage: 390.8+ KB
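All three columns with gaps share the same non-null count (4,054) in the `info()` output, which suggests the gaps may sit in the same rows. One plausible explanation worth checking is that unanswered calls have no answer time, talk duration, or rating. The sketch below is a minimal way to check this, assuming the column names shown above; `missing_report` and `missing_by_answered` are hypothetical helper names, not part of pandas.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


def missing_by_answered(df: pd.DataFrame, col: str,
                        answered_col: str = "Answered (Y/N)") -> pd.Series:
    """Number of missing values in `col`, split by the Answered (Y/N) flag."""
    return df[col].isna().groupby(df[answered_col]).sum()
```

In the notebook, `missing_report(data)` lists each column's missing count and share, and `missing_by_answered(data, "Speed of answer in seconds")` shows how many of those gaps fall under `Y` and under `N`. If they are concentrated under `N`, the values are missing by design rather than at random, and filling them with an average would invent answer times for calls that were never answered.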


##### STEP 2: Data Cleaning

Data cleaning is a crucial step in the data analysis process, ensuring that the dataset is accurate and ready for analysis. During this step, we handle missing values, fix data types, and address any inconsistencies in the data.

- Handling Missing Values:
The columns Speed of answer in seconds, AvgTalkDuration, and Satisfaction rating contain missing values: each has 4,054 non-null entries out of 5,000, so 946 rows (18.92%) are missing. We'll decide how to handle them (e.g., filling with averages or dropping rows) using the following rules of thumb:

  - If the column is important for analysis and missing values make up less than 5-10% of the rows, replace them with the column mean or median.

  - If the column is not important and missing values make up more than 20-30% of the rows, it is better to drop them rather than fill them, since imputing that many values could distort the data.


- Date and Time Conversion:
The Date and Time columns are currently in string format. We need to convert them to proper datetime objects so they can be used for time-based analysis.


- Categorical Data:
Columns like Answered (Y/N) and Resolved are categorical variables but are stored as strings. We might consider encoding these values as 0/1 or keeping them as categories for better analysis.


- Data Consistency:
Ensure that data entries are consistent (e.g., uniform date formatting and no stray leading or trailing whitespace).
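The cleaning steps above can be sketched as two helpers. `clean_call_data` strips stray whitespace, combines `Date` and `Time` into one datetime, converts `AvgTalkDuration` to seconds, and encodes the Y/N columns as 1/0; `missing_value_plan` encodes the missing-value rule of thumb. Both function names, and the new `Datetime` and `TalkSeconds` columns, are hypothetical. The 10% and 20% defaults are assumptions that pick one end of each stated range, and which columns count as important is left to the analyst. The sketch also assumes `Date`, `Time`, and `AvgTalkDuration` hold values like those shown by `head()` (`2021-01-01`, `09:12:58`, `00:02:23`).

```python
import pandas as pd

# Y/N flag columns named in the Data Overview.
YN_COLUMNS = ["Answered (Y/N)", "Resolved"]


def clean_call_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Step 2 type and consistency fixes to a copy of `df`."""
    out = df.copy()

    # Data consistency: strip leading/trailing whitespace from string cells;
    # non-string cells (numbers, NaN, time objects) pass through unchanged.
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)

    # Date + Time -> one datetime column (assumes "2021-01-01" / "09:12:58").
    out["Datetime"] = pd.to_datetime(out["Date"]) + pd.to_timedelta(out["Time"].astype(str))

    # AvgTalkDuration "HH:MM:SS" -> seconds; missing or unparseable -> NaN.
    out["TalkSeconds"] = pd.to_timedelta(
        out["AvgTalkDuration"].astype(str), errors="coerce"
    ).dt.total_seconds()

    # Categorical Y/N flags -> 1/0; any other value becomes NaN.
    for col in YN_COLUMNS:
        out[col] = out[col].map({"Y": 1, "N": 0})
    return out


def missing_value_plan(df: pd.DataFrame, important: set,
                       impute_max: float = 0.10,
                       drop_min: float = 0.20) -> dict:
    """Apply the Step 2 rule of thumb to every column that has missing values."""
    plan = {}
    for col, frac in df.isna().mean().items():
        if frac == 0:
            continue  # nothing to handle
        if col in important and frac < impute_max:
            plan[col] = "impute (mean/median)"
        elif col not in important and frac > drop_min:
            plan[col] = "drop"
        else:
            plan[col] = "review manually"  # neither rule applies
    return plan
```

In the notebook these would be called as `clean_call_data(data)` and `missing_value_plan(data, important={...})`. With 946 of 5,000 rows (18.92%) missing in each affected column, neither threshold is met, so each of those columns comes back as `"review manually"` whatever set is passed as `important`. That is a gap in the rule of thumb itself, and deciding those columns needs a look at why the values are missing.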


