<a href="https://colab.research.google.com/github/talilinda/BA_HW/blob/main/S05_Ocado.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Monitoring Customer Sentiment at Ocado

## B.S. Vanneste and A. Zohrehvand

Copyright © 2023 B.S. Vanneste and A. Zohrehvand. This publication may not be digitized, photocopied, or otherwise reproduced, posted, or transmitted, without the permission of the authors. To order copies or request permission to reproduce materials, email b.vanneste@ucl.ac.uk. This exercise is developed solely as the basis for class discussion.

# Ocado

In 2020, the UK grocery retailer Ocado faced a daunting challenge: to keep supplying its customers while a pandemic raged. The company is keen to monitor customers' opinions to understand how well it is coping. **It seeks your help to build and evaluate a monitoring system.**

One approach is sentiment analysis, in which the opinions of customers are inferred from social media postings. This approach fits well with Ocado's online-only strategy. Sentiment analysis relies on natural language processing (NLP) models. These models are typically trained on large amounts of text and can be reused on other text. Thus, we can benefit from the work of others by directly importing these pre-trained NLP models, saving substantial time and ensuring (near) state-of-the-art performance.

The data are from Twitter. We focus on tweets that include the word Ocado. Because Twitter is publicly accessible, you also have access to tweets that refer to Tesco. Tesco is a key competitor, using both online channels and an extensive network of physical stores.  

**Q1. As a manager at Ocado, why would you be interested in Tesco's tweets?**

*Your answer here.*

# Import packages

Python makes use of software packages for added functionality.

In [None]:
# import packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

# install transformers from Hugging Face (an NLP package)
!pip install transformers
from transformers import pipeline

# Import data

We'll work with a dataset of 2,400 tweets about Ocado and Tesco. From the population of all tweets posted in the UK in 2020 that include "Ocado" or "Tesco", we randomly sample 100 tweets per month for each firm.
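
The sampling scheme described above (a fixed number of tweets per firm and month) can be sketched on a toy population. This is only an illustration: the column names `company`, `month`, and `text`, the 20 made-up tweets, and the sample size of 2 per group are assumptions, not the authors' actual sampling code.

```python
import pandas as pd

# Toy population: 2 companies x 2 months x 5 tweets each (all values made up)
population = pd.DataFrame({
    "company": ["Ocado"] * 10 + ["Tesco"] * 10,
    "month": ([1] * 5 + [2] * 5) * 2,
    "text": [f"tweet {i}" for i in range(20)],
})

# Draw the same number of tweets from every company-month group
sample = population.groupby(["company", "month"]).sample(n=2, random_state=0)

# Check the group sizes of the sample
group_sizes = sample.groupby(["company", "month"]).size()
print(group_sizes)
```

With 2 firms, 12 months, and 100 tweets per group, the same logic yields the 2 × 12 × 100 = 2,400 tweets in the dataset.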


In [None]:
# import data
df = pd.read_csv("https://storage.googleapis.com/ai4business/Ocado/c3j27xdc_tweets_df.csv")

# show data
df.head()

In [None]:
# show some tweets
df.text.sample(10).tolist()

# Set up model

Hugging Face provides the `transformers` NLP package and hosts many pre-trained models. You can see a list of models [here](https://huggingface.co/models). For the purpose of this exercise, we use a sentiment classifier based on RoBERTa-large (Liu et al. 2019). You can read more about it [here](https://huggingface.co/siebert/sentiment-roberta-large-english).


In [None]:
# set up model
sentiment_analysis = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")

# Understand model

The model is trained to classify sentiment as positive or negative. We'll try out the model by feeding it tweets with different sentiment.

**Q2. Write a positive tweet and check if the model predicts positive.**

A tweet is a short text of 1 or 2 sentences.

In [None]:
### your code here ###
positive_tweet="YOUR POSITIVE TWEET HERE"
### /your code here ###

sentiment_analysis(positive_tweet)

The model returns a label (a prediction) and a score (according to the model, the probability that the label is correctly predicted).
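
To make the output format concrete, here is a small sketch that works on a hand-written result of the same shape (a list with one dictionary per input text, each holding a `label` and a `score`). The scores below are made up for illustration and the helper `describe` is hypothetical; nothing here calls the model.

```python
# Illustrative output in the shape the pipeline returns: one dict per input text.
# The scores are invented for this example, not real model output.
example_output = [
    {"label": "POSITIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.971},
]

def describe(prediction):
    """Format one prediction as 'LABEL (xx.x%)'."""
    return f"{prediction['label']} ({prediction['score']:.1%})"

summaries = [describe(p) for p in example_output]
print(summaries)
```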

**Q3. Write a negative tweet and check if the model predicts negative.**


In [None]:
### your code here ###
negative_tweet="YOUR NEGATIVE TWEET HERE"
### /your code here ###

sentiment_analysis(negative_tweet)

Because the model is trained to classify sentiment as positive or negative, it will struggle with neutral text. Try it.

**Q4. Write a neutral tweet and check the prediction.**

In [None]:
### your code here ###
neutral_tweet="YOUR NEUTRAL TWEET HERE"
### /your code here ###

sentiment_analysis(neutral_tweet)
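
Because the model can only answer POSITIVE or NEGATIVE, one possible workaround is to treat predictions with a low score as uncertain. The sketch below is an assumption for illustration, not part of the model: the 0.9 cut-off, the `UNCERTAIN` label, the helper name, and the example scores are all hypothetical. Whether such a rule helps in practice depends on how the model's scores are actually distributed.

```python
THRESHOLD = 0.9  # hypothetical cut-off, chosen only for illustration

def label_with_uncertainty(prediction, threshold=THRESHOLD):
    """Keep the predicted label only if its score reaches the threshold."""
    if prediction["score"] >= threshold:
        return prediction["label"]
    return "UNCERTAIN"

# Hand-written predictions in the pipeline's output shape (scores are made up)
example_output = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.62},
]

adjusted_labels = [label_with_uncertainty(p) for p in example_output]
print(adjusted_labels)
```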

Natural language is full of subtleties. For example, a sarcastic sentence is easily misunderstood by non-native speakers. How good is the model at picking up these subtleties?

**Q5. Write a negative tweet that you think the model will misclassify as positive.**


In [None]:
### your code here ###
misleading_tweet="YOUR MISLEADING TWEET HERE"
### /your code here ###

sentiment_analysis(misleading_tweet)

# Assess model



**Q6: Select 5 tweets and let the model predict their sentiment.**


In [None]:
# select tweets by picking 5 numbers between 0 and 2399
### your code here ###
tweet_numbers = [0, 0, 0, 0, 0]
### /your code here ###

# get text
tweet_text = df["text"][tweet_numbers].tolist()

# conduct sentiment analysis
list(zip(tweet_text, sentiment_analysis(tweet_text)))

**Q7: Do you agree with all the model predictions? (Yes/No)**

*Your answer here.*

We can assess the model by comparing its predictions with the ground truth or actual sentiment. Because the data are unlabeled, we need to provide the ground truth. This process is called labeling or tagging the data.

Let's select some random tweets for labeling.

In [None]:
# select 10 random tweets
test_tweets = df["text"].sample(10).tolist()

# show tweets
test_tweets

**Q8. Label the tweets as either POSITIVE or NEGATIVE in the same order as above.**

In [None]:
### your code here ###
labels = ["POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",
          "POSITIVE OR NEGATIVE",]
### /your code here ###

# define test set
test_df = pd.DataFrame({"text":test_tweets,
                        "label":labels})

# show test set
test_df
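
Since the labels are later compared with the model's predictions by exact string match, a leftover placeholder or a lowercase label would silently count as a wrong prediction. A quick sanity check, sketched here with a hypothetical helper and made-up labels, can catch this before scoring.

```python
VALID_LABELS = {"POSITIVE", "NEGATIVE"}

def find_invalid_labels(labels):
    """Return (position, label) pairs for labels that are not exactly POSITIVE or NEGATIVE."""
    return [(i, label) for i, label in enumerate(labels) if label not in VALID_LABELS]

# Made-up labels: one lowercase entry and one unfilled placeholder
example_labels = ["POSITIVE", "negative", "POSITIVE OR NEGATIVE", "NEGATIVE"]

invalid = find_invalid_labels(example_labels)
print(invalid)
```

An empty list means every label can be compared with the model's output.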

**Q9. Reflect on the objectivity and cost of labeling the data.**

*Your answer here.*

Now that we have the labels, we can compare them to the model predictions.

In [None]:
# conduct sentiment analysis
test_df["predicted_label"]=pd.DataFrame(sentiment_analysis(test_df.text.tolist()))["label"]

# show test set
test_df

A common and intuitive performance metric is accuracy, or the proportion of correct predictions. For example, if the number of correctly predicted tweets is 7 and the total number of predictions is 10, then the accuracy is 7 / 10 = 0.7.
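
The arithmetic can be reproduced by hand on a small example. The two lists below are made up so that 7 of the 10 predictions match the labels; they are not taken from the dataset.

```python
# Made-up ground truth and predictions for 10 tweets
actual = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "NEGATIVE",
          "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
predicted = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE", "NEGATIVE",
             "NEGATIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]

# Count the positions where the prediction equals the label
correct = sum(a == p for a, p in zip(actual, predicted))

# Accuracy is the share of correct predictions
accuracy = correct / len(actual)
print(correct, accuracy)
```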

In [None]:
# calculate accuracy score
accuracy_score(test_df.label, test_df.predicted_label)

**Q10. Reflect on the model's accuracy.**

*Your answer here.*

# Use the model

One way to evaluate a model is to assess its accuracy, as we have done above. Another way is to consider the extent to which it can explain events, as we will do below.

We import the predictions for each tweet about Ocado and Tesco in the dataset. The predictions are made with the same model as specified above. Depending on the model that is used, predicting sentiment can take substantial time for larger datasets. If you want to generate these predictions in your own time, then you can do so with the code that is also provided below (uncomment code before running it).

In [None]:
# import predictions
df = pd.read_csv("https://storage.googleapis.com/ai4business/Ocado/kCbekA2Nd5_tweets_with_predictions.csv")

# # generate predictions
# def make_chunks(lst, n):
#     for i in range(0, len(lst), n):
#         yield lst[i:i + n]

# chunks = list(make_chunks(df["text"].tolist(), 800))
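# # ignore_index=True renumbers the combined rows so that they align with df's index
# df[["label", "prediction"]] = pd.concat([pd.DataFrame(sentiment_analysis(elem)) for elem in chunks], ignore_index=True)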
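
To see what the chunking helper in the commented-out code does, here is a standalone run of the same `make_chunks` generator on a short list. The list of seven numbers and the chunk size of 3 are illustrative only.

```python
def make_chunks(lst, n):
    """Yield successive slices of lst with at most n items each."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Seven items in chunks of three: the last chunk holds the remainder
small_chunks = list(make_chunks(list(range(7)), 3))
print(small_chunks)
```

Processing the tweets in chunks keeps each call to the model at a manageable size.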

For each tweet, the model returns a label (or prediction) and score (or probability). We first plot the score distributions separately for the positively and negatively predicted tweets.

In [None]:
# plot scores
sns.displot(
    df, x="prediction", col="label",
    binwidth=0.01,
    facet_kws=dict(margin_titles=True),
)
plt.show()
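
A numeric summary can complement the plot, for example the share of predictions per label whose score exceeds some cut-off. The sketch below uses a made-up table with the same column names as above (`label` and `prediction`); the 0.99 cut-off and all values are assumptions for illustration.

```python
import pandas as pd

# Made-up scores in the same layout as the predictions file
toy_scores = pd.DataFrame({
    "label": ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"],
    "prediction": [0.999, 0.95, 0.998, 0.997, 0.70],
})

# Share of predictions per label with a score of at least 0.99
share_confident = (
    toy_scores.assign(confident=toy_scores["prediction"] >= 0.99)
    .groupby("label")["confident"]
    .mean()
)
print(share_confident)
```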

**Q11. Overall, is the model confident in its predictions? (Yes/No)**

Reflect on whether this level of confidence is desired or not.


*Your answer here.*

We next analyse the labels or predictions. We compare *average* tweet sentiment per month. Hence, we need to aggregate the data by company and month.

In [None]:
# define function for aggregating data
def calculate_prop_positive(x):
    return sum(x == "POSITIVE") / len(x)

# aggregate data by company and month
gb = df.groupby(by=["company", "month"])
df_aggregate = gb["label"].apply(calculate_prop_positive)
df_aggregate = df_aggregate.reset_index()

# plot average sentiment over time
sns.lineplot(data=df_aggregate, x="month", y="label", hue="company")
plt.show()
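
To check what the aggregation computes, the same idea can be run on a handful of made-up rows. The toy table and the column name `prop_positive` are assumptions for illustration; the sketch takes the mean of a True/False column, which equals the proportion of positive tweets in each group.

```python
import pandas as pd

# Made-up predictions for two companies and two months
toy = pd.DataFrame({
    "company": ["Ocado", "Ocado", "Ocado", "Tesco", "Tesco", "Tesco", "Tesco"],
    "month": [1, 1, 2, 1, 1, 1, 2],
    "label": ["POSITIVE", "NEGATIVE", "NEGATIVE",
              "POSITIVE", "POSITIVE", "NEGATIVE", "POSITIVE"],
})

# The mean of a boolean column is the proportion of True values per group
toy_aggregate = (
    toy.assign(prop_positive=toy["label"] == "POSITIVE")
    .groupby(["company", "month"])["prop_positive"]
    .mean()
    .reset_index()
)
print(toy_aggregate)
```

Ocado in month 1 has one positive tweet out of two, so its proportion is 0.5.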

**Q12. Based on the figure above, can you infer the month of the first lockdown in the UK?**

*Your answer here*

**Q13. Speculate why Ocado has done worse than Tesco in terms of sentiment.**


*Your answer here*

Ocado sells groceries under its own brand, from name brands (e.g. Unilever), and from one other supermarket chain. For the last category, Ocado switched suppliers from Waitrose to M&S in 2020.

**Q14. Based on the figure above, can you guess the month of Ocado switching suppliers?**


*Your answer here.*

**Q15. As a manager at Ocado, how could you use this type of analysis going forward?**

*Your answer here.*