# Sentiment Analysis of Ryanair Customer Reviews

Ryanair is a low-cost airline operating across Europe and North Africa that has recently extended its services to Morocco. This project runs sentiment analysis on Ryanair customer reviews to gauge the overall sentiment towards the airline and to identify areas for improvement.

We start by scraping and cleaning the review text, then generate sentiment scores with VADER (Valence Aware Dictionary and sEntiment Reasoner). Finally, we visualize the results to examine the sentiment distribution and explore the relationship between review length and sentiment score.

Let's get started!

# 1- Scraping Customer Reviews

The aim of this task is to collect customer reviews from the Ryanair page on https://www.airlinequality.com using Python, requests, and BeautifulSoup. The collected reviews feed the sentiment analysis, which can surface recurring issues or trends that the airline needs to address to improve customer satisfaction.
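The scraper requests paginated URLs of the form `<base>/page/<n>/?sortby=...&pagesize=...`. As a minimal sketch of how such a URL can be assembled with the standard library, the snippet below uses a hypothetical helper, `build_page_url`, which is not part of the original notebook. It assumes the `sortby` and `pagesize` query parameters behave as in the scraping loop that follows.

```python
from urllib.parse import urlencode


def build_page_url(base_url: str, page: int, page_size: int = 100) -> str:
    """Return the URL of one page of reviews, sorted newest first."""
    # Normalise the base so exactly one "/" separates it from "page/<n>/".
    base = base_url.rstrip("/") + "/"
    # urlencode percent-encodes the ":" in "post_date:Desc" as %3A.
    query = urlencode({"sortby": "post_date:Desc", "pagesize": page_size})
    return f"{base}page/{page}/?{query}"


print(build_page_url("https://www.airlinequality.com/airline-reviews/ryanair/", 2))
```

Normalising the trailing slash inside the helper means the result is the same whether or not the base URL ends with `/`.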

In [30]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

In [31]:
base_url = "https://www.airlinequality.com/airline-reviews/ryanair/"
pages = 20
page_size = 100

# One list per scraped field
Name = []
reviews = []
SeatType = []
Recommend = []
Date = []

# Loop through each page of reviews
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Build the URL for this page of results (base_url already ends with "/")
    url = f"{base_url}page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page; fail early on timeouts or HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')

    # Extract names
    for para in parsed_content.find_all("span", {"itemprop": "name"}):
        Name.append(para.get_text())

    # Extract reviews
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())

    # Extract seat types
    for para in parsed_content.find_all("tr"):
        if para.find("td", {"class": "review-rating-header cabin_flown"}, string="Seat Type"):
            SeatType.append(para.get_text().replace("Seat Type", ""))

    # Extract recommendation scores
    for para in parsed_content.find_all("table", {"class": "review-ratings"}):
        rec = para.find('td', {'class': ['review-value rating-yes', 'review-value rating-no']})
        if rec is not None:
            Recommend.append(rec.string.strip()[0])
            
    # Extract review dates
    for para in parsed_content.find_all("time", {"itemprop": "datePublished"}):
        Date.append(para.get_text())

print(f"   ---> {len(Name)} total names")
print(f"   ---> {len(reviews)} total reviews")
print(f"   ---> {len(SeatType)} total Seat Type")
print(f"   ---> {len(Recommend)} total Recommendation")
print(f"   ---> {len(Date)} total Date")

Scraping page 1
Scraping page 2
Scraping page 3
Scraping page 4
Scraping page 5
Scraping page 6
Scraping page 7
Scraping page 8
Scraping page 9
Scraping page 10
Scraping page 11
Scraping page 12
Scraping page 13
Scraping page 14
Scraping page 15
Scraping page 16
Scraping page 17
Scraping page 18
Scraping page 19
Scraping page 20
   ---> 2000 total names
   ---> 2000 total reviews
   ---> 2000 total Seat Type
   ---> 2000 total Recommendation
   ---> 2000 total Date
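
Each field is collected in its own loop, so the lists only line up row by row if every review on every page exposes every field. Matching counts, as in the output above, are a necessary condition for that (though not a proof of it), and `pd.DataFrame` raises a `ValueError` when given lists of different lengths. A minimal sketch of an explicit check, using a hypothetical helper `check_aligned` and toy data rather than the scraped lists:

```python
def check_aligned(columns: dict) -> None:
    """Raise ValueError if the scraped field lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Scraped fields have different lengths: {lengths}")


# Toy data standing in for the scraped lists (illustrative values only).
check_aligned({"name": ["A", "B"], "reviews": ["good", "bad"]})  # passes silently

try:
    check_aligned({"name": ["A", "B"], "seat_type": ["Economy Class"]})
except ValueError as err:
    print(err)
```

Reporting the per-field lengths makes it easier to see which selector missed an element.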


In [32]:
# Create a DataFrame with the collected data
df = pd.DataFrame({
    "name": Name,
    "reviews": reviews,
    "seat_type": SeatType,
    "recommended": Recommend,
    "date_review": Date
})

# Check the shape of the DataFrame
print(f"Shape of the DataFrame: {df.shape}")

Shape of the DataFrame: (2000, 5)


In [33]:
df.head()

          name                                            reviews      seat_type  recommended      date_review
0     K Vernon  ✅ Trip Verified | Frankfurt Hahn to Vilnius re...  Economy Class            y  24th April 2023
1  S Hamoliche   ✅ Trip Verified | Very bad experience. The fl...  Economy Class            n  22nd April 2023
2    P Marsele   ✅ Trip Verified | Would like to pass a feedba...  Economy Class            y  22nd April 2023
3        U Ali   ✅ Trip Verified | I had thought Ryanair would...  Economy Class            n  21st April 2023
4     N Hashan   ✅ Trip Verified | Got us late both ways. Also...  Economy Class            n  19th April 2023


In [34]:
df['reviews']

0       ✅ Trip Verified | Frankfurt Hahn to Vilnius re...
1       ✅ Trip Verified |  Very bad experience. The fl...
2       ✅ Trip Verified |  Would like to pass a feedba...
3       ✅ Trip Verified |  I had thought Ryanair would...
4       ✅ Trip Verified |  Got us late both ways. Also...
                              ...                        
1995    First time with Ryanair. Just me and my two yo...
1996    Tenerife to Manchester. Flight left on time an...
1997    London Stansted to Fuerteventura out on 20th N...
1998    The bad press surrounding Ryanair proved to be...
1999    We returned from Gran Canaria to Stansted on t...
Name: reviews, Length: 2000, dtype: object

In [35]:
# Drop the "Trip Verified |" style prefix: keep the text after the first "|" and trim whitespace
df["reviews"] = df["reviews"].apply(lambda x: x.split("|", 1)[-1].strip() if "|" in x else x.strip())
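
Splitting on the first `|` also cuts any review that has no verification prefix but happens to contain a `|` in its body. A slightly more defensive sketch is shown below, assuming that the prefix always contains the word "Verified" (as in the `✅ Trip Verified |` examples above). The helper name `strip_verification_prefix` is hypothetical.

```python
def strip_verification_prefix(review: str) -> str:
    """Remove a leading '... Verified |' marker, if present."""
    head, sep, tail = review.partition("|")
    # Treat the text before the first "|" as a prefix only when it
    # mentions verification; otherwise leave the review untouched.
    if sep and "verified" in head.lower():
        return tail.strip()
    return review.strip()


print(strip_verification_prefix("✅ Trip Verified |  Very bad experience."))
print(strip_verification_prefix("Seats A | B were fine."))
```

It can be applied to the column with `df["reviews"].apply(strip_verification_prefix)`.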

In [36]:
# Remove ordinal indicator from date strings
df["date_review"] = df["date_review"].apply(lambda x: re.sub(r'\b(\d+)(st|nd|rd|th)\b', r'\1', x))

# Convert date strings to datetime format
df["date_review"] = pd.to_datetime(df["date_review"], format="%d %B %Y")
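
To make the two date-cleaning steps concrete, here is a standalone sketch that applies the same regular expression and format string to single strings using only the standard library. The helper name `parse_review_date` is hypothetical, and `%B` assumes English month names (the default C locale).

```python
import re
from datetime import datetime

# Matches a day number followed by an ordinal suffix, e.g. "24th" or "1st".
ORDINAL_SUFFIX = re.compile(r"\b(\d+)(st|nd|rd|th)\b")


def parse_review_date(text: str) -> datetime:
    """Parse strings such as '24th April 2023' into a datetime."""
    # Keep only the digits of the day; "August" is untouched because
    # its "st" is not preceded by a digit.
    cleaned = ORDINAL_SUFFIX.sub(r"\1", text.strip())
    return datetime.strptime(cleaned, "%d %B %Y")


print(parse_review_date("24th April 2023").date())  # 2023-04-24
print(parse_review_date("1st August 2022").date())  # 2022-08-01
```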

In [37]:
df.head()

          name                                           reviews      seat_type  recommended  date_review
0     K Vernon  Frankfurt Hahn to Vilnius return. Incredible ...  Economy Class            y   2023-04-24
1  S Hamoliche   Very bad experience. The flight was delayed ...  Economy Class            n   2023-04-22
2    P Marsele   Would like to pass a feedback to Antonio and...  Economy Class            y   2023-04-22
3        U Ali   I had thought Ryanair would have improved ov...  Economy Class            n   2023-04-21
4     N Hashan   Got us late both ways. Also caused us to mis...  Economy Class            n   2023-04-19
