# Delta Airlines Twitter Customer Support Analysis

In this notebook, we analyze tweets that users sent to Delta Air Lines support representatives, using the OpenAI GPT-3.5 API. The analysis has two steps:

1. Intent identification: determine why the user wrote the tweet. Each tweet falls into one of four categories:
    - Raise a complaint/grievance
    - Ask a question
    - Share a good experience
    - Other reasons

2. Complaint reasons: for tweets that raise a complaint, identify the main cause, which is one of:
    - Flight Cancellation
    - Flight Delays
    - Bad Flight Experience
    - Lost/Damaged Luggage
    - Flight Attendant Complaints
    - Flight Booking Problems
    - Poor Customer Service
    - Other
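
The two steps can be sketched as a small pipeline in which step 2 only runs on tweets that step 1 labels as complaints. This is a minimal sketch under assumptions: `analyze`, `toy_intent`, and `toy_reason` are hypothetical names, and the two classifier arguments stand in for the GPT-3.5 calls made later in the notebook (here they are trivial keyword rules so the control flow runs offline).

```python
from typing import Callable, Dict, List, Optional

# Label sets from the two steps described above
INTENTS = ["complaint", "question", "compliment", "other"]
COMPLAINT_REASONS = [
    "Flight Cancellation", "Flight Delays", "Bad Flight Experience",
    "Lost/Damaged Luggage", "Flight Attendant Complaints",
    "Flight Booking Problems", "Poor Customer Service", "Other",
]


def analyze(tweets: List[str],
            classify_intent: Callable[[str], str],
            classify_complaint: Callable[[str], str]) -> List[Dict[str, Optional[str]]]:
    """Step 1 labels every tweet; step 2 runs only on tweets labelled as complaints."""
    rows = []
    for text in tweets:
        intent = classify_intent(text)
        if intent not in INTENTS:  # unexpected label: fall back to "other"
            intent = "other"
        reason = classify_complaint(text) if intent == "complaint" else None
        if reason is not None and reason not in COMPLAINT_REASONS:
            reason = "Other"  # unexpected reason: fall back to the catch-all
        rows.append({"text": text, "intent": intent, "reason": reason})
    return rows


# Toy keyword rules standing in for the LLM calls (hypothetical, illustration only)
def toy_intent(text: str) -> str:
    return "complaint" if "delayed" in text.lower() else "compliment"


def toy_reason(text: str) -> str:
    return "Flight Delays"


rows = analyze(["@Delta my flight is delayed again", "@Delta great crew today!"],
               toy_intent, toy_reason)
for row in rows:
    print(row["intent"], row["reason"])
```

Passing the classifiers in as functions keeps the control flow independent of how each label is produced, so an LLM call and a rule-based baseline are interchangeable.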

## Data Source
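
The data is the `twcs.csv` file of customer-support conversations on Twitter (2,811,774 tweets, 7 columns), covering the support accounts of many companies, such as `Delta` and `sprintcare`. Each row records whether a tweet is `inbound` (sent by a customer) and which tweet it responds to.
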
## Environment Setup
This notebook was run using Python 3.8 with the following packages installed:
- langchain==0.0.177
- pandas==2.0.1

In [56]:
import sys
# make the parent directory importable, so that `src.utils.general` can be imported further down
sys.path.append("../")

In [1]:
import pandas as pd

In [2]:
data_dir = "/Users/aarshayjain/Data/generative-ai/gpt-tutorials/twitter-customer-support"
df = pd.read_csv(f"{data_dir}/twcs.csv")
df.shape

(2811774, 7)

In [3]:
df.head(2)

|   | tweet_id | author_id | inbound | created_at | text | response_tweet_id | in_response_to_tweet_id |
|---|----------|-----------|---------|------------|------|-------------------|-------------------------|
| 0 | 1 | sprintcare | False | Tue Oct 31 22:10:47 +0000 2017 | @115712 I understand. I would like to assist y... | 2.0 | 3.0 |
| 1 | 2 | 115712 | True | Tue Oct 31 22:11:45 +0000 2017 | @sprintcare and how do you propose we do that | | 1.0 |


In [12]:
df.author_id.value_counts().to_csv(f"{data_dir}/author-stats.csv")

In [19]:
delta_responded_tweets = set(df.loc[df.author_id == "Delta"].in_response_to_tweet_id)
len(delta_responded_tweets)

36271

In [23]:
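# customer tweets that Delta replied to and that start a conversation
delta_inbounds = (
    df.loc[df.tweet_id.isin(delta_responded_tweets)]     # Delta replied to the tweet
    .loc[lambda x: x.inbound]                           # sent by a customer, not by a brand
    .loc[lambda x: x.in_response_to_tweet_id.isnull()]  # not itself a reply
)
delta_inbounds.shape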

(24603, 7)
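
The chained `.loc` filter above keeps a tweet only if Delta replied to it, it was sent by a customer (`inbound`), and it is not itself a reply, so each kept tweet opens a conversation. The same logic in plain Python, on a few made-up rows that reuse the notebook's column names (the IDs and the `conversation_starters` helper are hypothetical):

```python
from typing import Dict, List

# Made-up rows using the notebook's column names (not real data)
rows = [
    {"tweet_id": 1, "author_id": "115900", "inbound": True, "in_response_to_tweet_id": None},
    {"tweet_id": 2, "author_id": "Delta", "inbound": False, "in_response_to_tweet_id": 1},
    {"tweet_id": 3, "author_id": "115900", "inbound": True, "in_response_to_tweet_id": 2},
    {"tweet_id": 4, "author_id": "Delta", "inbound": False, "in_response_to_tweet_id": 3},
    {"tweet_id": 5, "author_id": "115901", "inbound": True, "in_response_to_tweet_id": None},
]


def conversation_starters(rows: List[Dict], brand: str = "Delta") -> List[int]:
    # IDs of tweets the brand account replied to (mirrors delta_responded_tweets)
    replied_to = {r["in_response_to_tweet_id"] for r in rows
                  if r["author_id"] == brand and r["in_response_to_tweet_id"] is not None}
    return [r["tweet_id"] for r in rows
            if r["tweet_id"] in replied_to             # the brand replied to it
            and r["inbound"]                           # it was sent by a customer
            and r["in_response_to_tweet_id"] is None]  # it does not reply to anything


print(conversation_starters(rows))  # [1]
```

Tweet 3 is dropped even though Delta replied to it, because it continues an existing thread; tweet 5 is dropped because Delta never replied.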

In [27]:
delta_inbounds.head(5)

|     | tweet_id | author_id | inbound | created_at | text | response_tweet_id | in_response_to_tweet_id |
|-----|----------|-----------|---------|------------|------|-------------------|-------------------------|
| 317 | 611 | 115818 | True | Sat Aug 06 01:31:50 +0000 2016 | @DELTA i booked my flight using delta amex car... | 609 | |
| 487 | 792 | 115882 | True | Tue Oct 31 21:33:48 +0000 2017 | @Delta why wasn't earlier flight offered when ... | 790 | |
| 494 | 801 | 115883 | True | Sun Oct 29 15:59:35 +0000 2017 | @Delta The "change flight" search option on yo... | 799 | |
| 496 | 803 | 115884 | True | Tue Oct 31 21:33:27 +0000 2017 | .@delta this has been my inflight studio exper... | 802 | |
| 505 | 814 | 115885 | True | Tue Oct 31 16:37:26 +0000 2017 | @Delta I'm flying JFK-MEX-MID tomorrow and you... | 813 | |


In [30]:
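# parse timestamps like "Tue Oct 31 22:10:47 +0000 2017" and keep only the calendar date
# (Timestamp.date is a method, so `.apply(lambda x: x.date)` would store bound methods, not dates)
delta_inbounds["date"] = pd.to_datetime(delta_inbounds.created_at, format="%a %b %d %H:%M:%S %z %Y").dt.date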

In [31]:
delta_inbounds.to_csv(f"{data_dir}/delta-inbounds.csv")

In [32]:
delta_inbounds.sort_values(by="date", ascending=False, inplace=True)

In [39]:
delta_inbounds.date.value_counts().sort_index(ascending=False).head(20)

2017-12-03    290
2017-12-02    324
2017-12-01    452
2017-11-30    391
2017-11-29    400
2017-11-28    424
2017-11-27    541
2017-11-26    403
2017-11-25    335
2017-11-24    302
2017-11-23    351
2017-11-22    466
2017-11-21    395
2017-11-20    365
2017-11-19    346
2017-11-18     17
2017-11-17    485
2017-11-16    469
2017-11-15    394
2017-11-14    445
Name: date, dtype: int64
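
Every `created_at` value has the same layout, for example `Tue Oct 31 22:10:47 +0000 2017`, so it can be parsed with an explicit format string rather than left to format inference. A stdlib-only sketch of the per-day count above, assuming an English (C) locale for the `%a`/`%b` day and month names; `tweets_per_day` is a hypothetical helper:

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, List, Tuple

# Layout of created_at, e.g. "Tue Oct 31 22:10:47 +0000 2017"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(created_at_values: Iterable[str]) -> List[Tuple[date, int]]:
    # Parse each timestamp and keep only its calendar date (in the timestamp's own offset)
    days = (datetime.strptime(s, CREATED_AT_FORMAT).date() for s in created_at_values)
    # Newest day first, like .value_counts().sort_index(ascending=False)
    return sorted(Counter(days).items(), reverse=True)


sample = [
    "Tue Oct 31 22:10:47 +0000 2017",
    "Tue Oct 31 21:33:48 +0000 2017",
    "Sun Oct 29 15:59:35 +0000 2017",
    "Sat Aug 06 01:31:50 +0000 2016",
]
for day, count in tweets_per_day(sample):
    print(day, count)
```

Dates are taken in the timestamp's own offset, which is `+0000` (UTC) in the samples shown above, so a tweet sent late in the evening in the US may be counted under the next day.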

In [41]:
list(delta_inbounds.text.sample(5))

['@Delta flight attendants on flight 5913 from BNA to BOS were 👌. Please do something for them - will be flying Delta again because of them.',
 '@Delta Checking for baggage allowance on https://t.co/xGKZ90duCF could not find the countries NIGER and/or NIGERIA. Help?',
 '@Delta can i use my sky miles to book a ticket for my daughter',
 'In the back of the plane and @Delta must have run out of their yummy cookies! 😭 #ThingsNotToDoToAPregnantLady 💔',
 'Hightailing out of this country to celebrate my birthday #woohoo @Delta https://t.co/IUB77dA3C9']

In [93]:
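# the 100 most recent tweets (delta_inbounds is sorted newest first); .copy() makes an
# independent frame, so adding columns below does not trigger SettingWithCopyWarning
df_delta_tweets = delta_inbounds.iloc[:100].copy()
df_delta_tweets.shape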

(100, 8)

## LangChain access

[API reference](https://platform.openai.com/docs/guides/chat/introduction)

In [94]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate

In [95]:
category_prompt_template = """
you are a customer support agent for one of the largest airlines in the USA who is responding to tweets from users. given a tweet, you are supposed to classify it into one of the following categories:

- complaint
- question
- compliment
- others

classify the following tweet:

{tweet}
"""

In [96]:
system_message_prompt = SystemMessagePromptTemplate.from_template("")
user_message_prompt = HumanMessagePromptTemplate.from_template(category_prompt_template)
chat_prompt = ChatPromptTemplate.from_messages([
    system_message_prompt,
    user_message_prompt])

In [97]:
from src.utils.general import load_env_vars_from_files
load_env_vars_from_files(["openai-creds.sh"])

In [98]:
chat = ChatOpenAI(temperature=0)

In [99]:
tweet = delta_inbounds.text.iloc[0]
print(tweet)

@Delta I do really appreciate the app notifications when bags have been loaded on the plane. Nice to have one less thing to worry about ;)


In [100]:
ai_response = chat(chat_prompt.format_prompt(tweet=tweet).to_messages())

In [101]:
ai_response.content

'compliment'

## Get all responses

In [102]:
batch_messages = [chat_prompt.format_prompt(tweet=tweet).to_messages()
                  for tweet in df_delta_tweets.text]
len(batch_messages)

100

In [103]:
result = chat.generate(batch_messages)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID abd535b20adfbb70f0f34cde2b65c859 in your message.).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID efef213cb87b1955b1329b204292b66f in your message.).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can r
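
The log above shows LangChain retrying requests that failed with `RateLimitError`. The general technique is exponential backoff: wait, retry, and wait longer after every consecutive failure. The sketch below illustrates it with the standard library only; it is not LangChain's implementation, and `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on the given exceptions with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% random jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


# Usage with a fake error and a recorder instead of a real sleep
class FakeRateLimitError(Exception):
    pass


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError("That model is currently overloaded")
    return "ok"


waits = []
outcome = call_with_backoff(flaky, retry_on=(FakeRateLimitError,), sleep=waits.append)
print(outcome, calls["n"], len(waits))  # ok 3 2
```

Injecting `sleep` keeps the example instant; in real use, leave it at `time.sleep` and set `retry_on` to the client's rate-limit exception.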

In [106]:
len(result.generations)

100

In [167]:
result.llm_output

{'token_usage': {'prompt_tokens': 10849,
  'completion_tokens': 271,
  'total_tokens': 11120},
 'model_name': 'gpt-3.5-turbo'}
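
The `token_usage` figures give a quick cost estimate. For this 100-tweet batch they work out to about 108 prompt tokens and under 3 completion tokens per tweet, so the prompt, not the answer, dominates the cost. A small helper for the arithmetic follows; `usage_summary` is hypothetical, and the prices are placeholders to be replaced with OpenAI's current per-1K-token rates for the model.

```python
from typing import Dict


def usage_summary(token_usage: Dict[str, int], n_requests: int,
                  price_per_1k_prompt: float, price_per_1k_completion: float) -> Dict[str, float]:
    """Average tokens per request and an estimated cost for a batch."""
    prompt = token_usage["prompt_tokens"]
    completion = token_usage["completion_tokens"]
    return {
        "avg_prompt_tokens": prompt / n_requests,
        "avg_completion_tokens": completion / n_requests,
        "estimated_cost": (prompt * price_per_1k_prompt + completion * price_per_1k_completion) / 1000,
    }


# Numbers from result.llm_output above; the prices are placeholders, NOT real OpenAI prices
usage = {"prompt_tokens": 10849, "completion_tokens": 271, "total_tokens": 11120}
summary = usage_summary(usage, n_requests=100, price_per_1k_prompt=1.0, price_per_1k_completion=1.0)
print(summary)
```

Multiplying the per-tweet averages by the 24,603 conversation-starting tweets in `delta_inbounds` gives a rough token budget for running step 1 on the whole set.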

In [109]:
df_delta_tweets["category_raw"] = [x[0].text for x in result.generations]

In [110]:
df_delta_tweets.category_raw.value_counts()

compliment                    30
complaint                     30
Category: Question             9
Category: Complaint            8
Category: Question.            8
Category: Complaint.           5
question                       4
others                         2
Category: Others               2
Question                       1
category: suggestion/other     1
Name: category_raw, dtype: int64

In [111]:
import re

In [143]:
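# the model's replies vary ("Category: Question.", "others", ...), so pull out the first label mentioned;
# "others" matches as "other"
CATEGORY_PATTERN = re.compile("compliment|complaint|question|other")

def extract_category(raw):
    match = CATEGORY_PATTERN.search(raw.lower())
    # None means the reply mentioned none of the expected labels
    return match.group() if match else None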

In [144]:
df_delta_tweets["category"] = df_delta_tweets.category_raw.apply(extract_category)

In [147]:
df_delta_tweets.category.value_counts()

complaint     43
compliment    30
question      22
other          5
Name: category, dtype: int64
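
`extract_category` returns the first label found anywhere in the reply. That is enough for the short replies above, but a longer reply such as "This is a complaint, not a question" would be labelled `complaint` without any sign of the ambiguity. A stricter variant, a hypothetical `extract_category_strict` that is not part of the original notebook, matches whole words only, folds plurals like "others" into the base label, and returns `None` whenever the reply mentions more than one distinct label:

```python
import re
from typing import Optional

# Whole-word labels; the optional trailing "s" folds "others"/"complaints" into the base label
LABEL_PATTERN = re.compile(r"\b(compliment|complaint|question|other)s?\b")


def extract_category_strict(raw: str) -> Optional[str]:
    labels = set(LABEL_PATTERN.findall(raw.lower()))
    # Exactly one distinct label: use it. None or several: leave it for manual review.
    return labels.pop() if len(labels) == 1 else None


for raw in ["Category: Question.", "others", "category: suggestion/other",
            "This is a complaint, not a question"]:
    print(repr(raw), "->", extract_category_strict(raw))
```

Returning `None` for ambiguous replies trades a little coverage for labels that can be trusted; the `None` rows can then be reviewed by hand or re-sent with a stricter prompt.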

In [154]:
def sample_category(category, n):
    # print n random tweets that were assigned the given category
    tweets = df_delta_tweets.loc[df_delta_tweets.category == category].text.sample(n)
    print("\n\n".join(tweets))

In [161]:
sample_category("other", 5)

#BloddyMary Time!It Is Sunday &amp; I'm Not Flying The @Delta https://t.co/evyt8hkqP2! https://t.co/bueOliSTmt

@Delta in a couple hours....2 million!! https://t.co/6AQQC1QI5z

And away we go!  #la #theviewfrommylaptop #adventuresinsales #silverjeansco @delta @ New York,… https://t.co/u3BUV3x9Ww

Another @Delta sunrise in #LAS #headinghome https://t.co/MlvaF8d9qC

@Delta how about serving soft pretzels in your Pennsylvania clubs instead of bagels? Treat us to this local delicacy. Please. https://t.co/PLVdB6oXbH


## Exploring reasons for complaints

This is a good start, but for a real application we need to know exactly where things go wrong. Let's classify the complaints further into the following categories:

- Flight Cancellation
- Flight Delays
- Bad Flight Experience
- Lost/Damaged Luggage
- Flight Attendant Complaints
- Flight Booking Problems
- Poor Customer Service
- Other

In [162]:
neg_category_prompt_template = """
you are a customer support agent for one of the largest airlines in the USA who is responding to tweets from users. 
given a tweet where a customer is complaining, you are supposed to identify the reason for the complaint and classify it into one of the following categories:

- Flight Cancellation
- Flight Delays
- Bad Flight Experience
- Lost/Damaged Luggage
- Flight Attendant Complaints
- Flight Booking Problems
- Poor Customer Service
- Other

classify the following tweet:

{tweet}
"""

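Like the first prompt, this one does not tell the model to reply with the category name alone, so its replies may need the same clean-up that `extract_category` did for step 1. A hypothetical helper, `extract_complaint_category`, not part of the original notebook, could map each reply back to a category by a case-insensitive search for the category names, using `Other` only when the reply actually says "other":

```python
import re
from typing import Optional

# Category names from the complaint prompt, minus the catch-all "Other"
COMPLAINT_CATEGORIES = [
    "Flight Cancellation", "Flight Delays", "Bad Flight Experience",
    "Lost/Damaged Luggage", "Flight Attendant Complaints",
    "Flight Booking Problems", "Poor Customer Service",
]


def extract_complaint_category(raw: str) -> Optional[str]:
    text = raw.lower()
    # First category (in prompt order) whose name appears anywhere in the reply
    for category in COMPLAINT_CATEGORIES:
        if category.lower() in text:
            return category
    # Fall back to the catch-all only if the model actually said "other"
    if re.search(r"\bother\b", text):
        return "Other"
    return None


for raw in ["Category: Poor Customer Service.", "Category: Flight Booking Problems",
            "Other", "I am not sure"]:
    print(repr(raw), "->", extract_complaint_category(raw))
```

If a reply names several categories, this returns the first one in prompt order, which is a simplifying assumption worth checking against a sample of replies.
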
In [164]:
system_message_prompt = SystemMessagePromptTemplate.from_template("")
user_message_prompt_complaint = HumanMessagePromptTemplate.from_template(neg_category_prompt_template)
chat_prompt_complaint = ChatPromptTemplate.from_messages([
    system_message_prompt,
    user_message_prompt_complaint])

In [166]:
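# .copy() so that adding the complaint_category column below does not trigger SettingWithCopyWarning
df_complaints = df_delta_tweets.loc[df_delta_tweets.category == "complaint"].copy()
df_complaints.shape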

(43, 10)

In [170]:
batch_messages = [chat_prompt_complaint.format_prompt(tweet=tweet).to_messages()
                  for tweet in df_complaints.text]
result_complaint = chat.generate(batch_messages)
len(result_complaint.generations)

43

In [173]:
df_complaints["complaint_category"] = [x[0].text for x in result_complaint.generations]

In [175]:
df_complaints.complaint_category.value_counts()

Category: Flight Booking Problems       6
Category: Poor Customer Service.        5
Category: Flight Delays                 4
Category: Poor Customer Service                                                                                                                                                                                             