<a href="https://colab.research.google.com/github/zubergurjar/Airbnb/blob/main/AirBnb_Booking_Analysis_pynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - AirBnb Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Mohd Zuber Alam

# **Project Summary -**


Airbnb is an online marketplace where people rent out their homes to travelers looking for a place to stay. Started in San Francisco in 2008 by Brian Chesky and Joe Gebbia, it has become a popular way to find unique, personal places to stay while traveling. The service is available through its website and its mobile app.

Over the years, Airbnb has gathered a large amount of data about the places people book, the hosts who rent out their spaces, and the experiences of both guests and hosts. This project explores that kind of data, using a dataset of New York City listings from 2019, to learn how people use Airbnb.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Have you ever thought about renting out your place on Airbnb? It's a cool way to earn money and meet new people. But there are some tricky parts. This project wants to figure out those tricky parts and help people who rent out their places (hosts) have a better time.

Here are some of the problems hosts face:

1. Rules and Laws: Different places have different rules about renting homes for a short time. It's confusing for hosts to know what's allowed and what's not.
2. Dealing with guests: Sometimes guests can be noisy or cause problems. Hosts need to know how to handle these situations without getting stressed.
3. Finding the right price: Hosts want to charge a fair price for their place, but it's hard to know what's the right amount.

#### **Define Your Business Objective?**

The objective is to recommend marketing campaign strategies and identify the neighbourhoods that are in high demand.

1. Using exploratory data analysis, find the most in-demand room type and neighbourhood_group.

2. Find the average number of days guests prefer to stay per visit, by room type and neighbourhood_group.

3. Find the price bracket in which the most bookings happen and which attracts the most reviews.

4. Find the neighbourhood_group in which the top hosts have the most listings, and explain the reason behind it.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
file_path = "/content/drive/MyDrive/Airbnb/Airbnb NYC 2019.csv"
airbnb_df = pd.read_csv(file_path)
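
Since the guidelines ask for exception handling, the load step could be wrapped as in the sketch below. The helper name `load_listings` and the path used to exercise the error branch are hypothetical and not part of the original notebook.

```python
import pandas as pd


def load_listings(path):
    """Read the listings CSV, returning None instead of raising if it cannot be read."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}. Check that Drive is mounted and the path is correct.")
    except pd.errors.EmptyDataError:
        print(f"File is empty: {path}")
    return None


# Hypothetical path, used only to show the error branch.
missing = load_listings("/content/drive/MyDrive/Airbnb/does_not_exist.csv")
```

Returning `None` keeps the notebook running, but every later cell then needs a valid DataFrame, so re-raising the error is an equally reasonable choice.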

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Count duplicate rows, then drop them
print("Duplicate rows:", airbnb_df.duplicated().sum())
airbnb_df = airbnb_df.drop_duplicates()
airbnb_df.count()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airbnb_df.isnull().sum()

In [None]:
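# Fill missing values: placeholder text for names, 0 for review frequency
airbnb_df['name'] = airbnb_df['name'].fillna('unknown')
airbnb_df['host_name'] = airbnb_df['host_name'].fillna('noname')

# Use numeric 0 (not the string '0') so the column stays numeric
airbnb_df['reviews_per_month'] = airbnb_df['reviews_per_month'].fillna(0)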

In [None]:
# Confirm that no null values remain in the filled columns
airbnb_df[['host_name','name','reviews_per_month']].isnull().sum()
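
The fill value used for a numeric column affects its dtype. The sketch below uses a made-up toy column standing in for `reviews_per_month` to show the difference between filling with the string `'0'` and the number `0`.

```python
import numpy as np
import pandas as pd

# Toy column standing in for reviews_per_month (values are made up).
reviews = pd.Series([0.21, np.nan, 4.64, np.nan], name="reviews_per_month")

# Filling with a string produces an object column that mixes floats and text.
# The explicit cast keeps this sketch behaving the same across pandas versions.
filled_with_string = reviews.astype(object).fillna("0")

# Filling with a number keeps the column numeric.
filled_with_number = reviews.fillna(0)

print(filled_with_string.dtype)
print(filled_with_number.dtype)
print(filled_with_number.mean())  # numeric aggregation works as expected
```

An object column would make later steps such as `describe()`, correlations, or plotting skip or mishandle the field.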

In [None]:
# Drop "last_review" because it is not needed for this analysis
airbnb_df = airbnb_df.drop(['last_review'], axis=1)

In [None]:
airbnb_df.info()

### What did you know about your dataset?

From this dataset, we can analyze various aspects of Airbnb listings, such as pricing trends, popularity of different room types, neighborhoods with high availability, and much more. It would be interesting to explore patterns in prices, review frequency, and host behavior to gain insights into the Airbnb market in New York City.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description

These variables collectively provide information about the properties listed on Airbnb, including their location, type, price, availability, and host-related details. Analyzing these variables can offer insights into pricing trends, occupancy rates, host behavior, and traveler preferences within the Airbnb ecosystem.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
airbnb_df['id'].nunique()

In [None]:
airbnb_df['name'].nunique()

In [None]:
airbnb_df['host_id'].nunique()

In [None]:
(airbnb_df['neighbourhood_group'].unique())

In [None]:
airbnb_df['neighbourhood'].nunique()

In [None]:
airbnb_df["room_type"].value_counts()

In [None]:
price_value_counts = airbnb_df["price"].value_counts()

price_value_counts.sort_index()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# 'last_review' was already dropped above, so errors='ignore' keeps this cell re-runnable
airbnb_df = airbnb_df.drop(columns=['id', 'last_review'], errors='ignore')

In [None]:
airbnb_df.head()  # Check the result after dropping the columns

### What all manipulations have you done and insights you found?

1. The listings are distributed across 5 neighbourhood groups containing more than 200 neighbourhoods. Prices go up to 10k.
2. A given property name belongs to one host, but a single host_name can have multiple properties within a neighbourhood_group or neighbourhood. The unique ids show that every listing is distinct.
3. The "price" and "availability_365" columns both contain zeros, meaning a zero price and no availability during the year, respectively. Zero availability is plausible because a host may simply not be accepting bookings, but a zero price is not.
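
The zero values described in point 3 can be counted and handled along the lines of the sketch below. The toy frame is made up, and dropping zero-price rows is one possible treatment, not something the original notebook does.

```python
import pandas as pd

# Made-up rows that mimic the two columns discussed above.
toy = pd.DataFrame({
    "price": [150, 0, 89, 0, 10000],
    "availability_365": [365, 0, 0, 120, 0],
})

zero_price = toy["price"].eq(0)
zero_availability = toy["availability_365"].eq(0)

print("zero-price rows:", int(zero_price.sum()))
print("zero-availability rows:", int(zero_availability.sum()))

# Keep never-available listings, but drop rows whose price is zero.
cleaned = toy.loc[~zero_price].reset_index(drop=True)
print(cleaned)
```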

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
top_20_neigbours = airbnb_df['neighbourhood'].value_counts()[:20] #checking top 20 neighbourhoods on the basis of no of listings in entire NYC!
top_20_neigbours.plot(kind='bar',color='chocolate')
plt.xlabel('Neighbourhood')
plt.ylabel('Counts in entire NYC')
plt.title('Top neighbourhoods in entire NYC on the basis of count of listings')

##### 1. Why did you pick the specific chart?

This bar chart shows which neighbourhoods have the most listings.

##### 2. What is/are the insight(s) found from the chart?

Williamsburg, Bedford-Stuyvesant, Harlem, Bushwick and the Upper West Side are the top 5 neighbourhoods by number of listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This data identifies the neighbourhoods where service improvements matter most. It also lets us limit advertising to these areas, which reduces marketing cost and increases profit.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
room_type = airbnb_df['room_type'].value_counts()
plt.figure(figsize=(8, 8))
colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99']  # Custom colors for each slice

room_type.plot(kind='pie', autopct='%1.1f%%', colors=colors, shadow=True, startangle=140, textprops={'fontsize': 12})
plt.title('Room Type Distribution in NYC Airbnb Listings', fontsize=16)
legend = plt.legend(room_type.index, loc='center left', bbox_to_anchor=(1, 0.5), title="Room Type")
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows the share of each room type among the listings.

##### 2. What is/are the insight(s) found from the chart?

"Entire home/apt" and "Private room" account for far more listings than "Shared room".

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


This visualization identifies the room types to prioritise for service improvements. Focusing advertising on those categories can reduce marketing cost and increase profit.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
avg_price = airbnb_df.groupby(["neighbourhood_group"])["price"].mean()
# Use a separate name for the Axes; rebinding `plt` would shadow matplotlib.pyplot in later cells
chart3 = avg_price.plot.bar(figsize = (8,6), fontsize = 10)
chart3.set_xlabel("Neighbourhood Group", fontsize = 14)
chart3.set_ylabel("Average price", fontsize = 11)
chart3.set_title("Average price in different Neighbourhood Groups", fontsize=12)

##### 1. Why did you pick the specific chart?

Because this chart shows the average price in different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

Manhattan has the highest average price among neighborhood groups, while Staten Island has the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can help identify profitable areas like Manhattan for targeted marketing and investment, potentially boosting business.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
new = airbnb_df.groupby(["room_type"])["price"].mean()
chart4 = new.plot.bar(figsize = (4,4), fontsize = 10)
chart4.set_xlabel("Room type", fontsize = 12)
chart4.set_ylabel("Average price", fontsize = 12)
chart4.set_title("Average price in different room type", fontsize = 14)



##### 1. Why did you pick the specific chart?

To compare the average price of each room type against its share of listings shown in the earlier chart.

##### 2. What is/are the insight(s) found from the chart?

The average price of "Entire home/apt" is higher than that of the other two room types. This suggests a profit-oriented approach to new listings that benefits all stakeholders.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Because this room type commands the highest price, its availability and service quality deserve special attention to maintain positive customer sentiment and good reviews.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(8, 6))
sns.boxplot(x='room_type', y='price', data=airbnb_df,  palette='pastel')
plt.xlabel('Room Type')
plt.ylabel('Price')
plt.title('Price Distribution by Room Type')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A box plot is chosen because it effectively shows the distribution of prices for different room types.

##### 2. What is/are the insight(s) found from the chart?


The chart reveals that "Entire home/apt" listings generally have higher prices compared to "Private room" and "Shared room" listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insight that "Entire home/apt" listings generally have higher prices may positively impact revenue generation. However, if there's a significant price disparity between room types, it could negatively affect demand for more expensive listings, potentially leading to reduced occupancy and growth limitations.
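
Box plots mark points beyond 1.5 times the interquartile range as outliers. A minimal sketch of that rule, computed per room type on made-up prices, is shown below. The helper `iqr_upper_fence` is a hypothetical name, and the rule is the common box-plot convention, not something the notebook itself computes.

```python
import pandas as pd

# Made-up prices; one extreme value per room type.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Private room"] * 5,
    "price": [120, 150, 180, 200, 5000, 40, 60, 70, 80, 900],
})


def iqr_upper_fence(prices):
    """Upper whisker limit used by the usual box-plot rule: Q3 + 1.5 * IQR."""
    q1, q3 = prices.quantile([0.25, 0.75])
    return q3 + 1.5 * (q3 - q1)


# One fence per room type, then flag each listing against its own group's fence.
fences = toy.groupby("room_type")["price"].apply(iqr_upper_fence)
toy["is_outlier"] = toy["price"] > toy["room_type"].map(fences)

print(fences)
print(toy[toy["is_outlier"]])
```

Flagging outliers per room type, as opposed to across the whole dataset, avoids treating ordinary entire-home prices as extreme just because shared rooms are cheap.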

#### Chart - 6

In [None]:
# Chart - 6 visualization code
airbnb_df.groupby('room_type')['minimum_nights'].mean().plot(figsize= (4,4), kind='bar', color='green', grid=False)
plt.title('Average Stays in different room types', fontsize = 14)
plt.xlabel('Room types', fontsize = 12)
plt.ylabel('Average Stays', fontsize = 12 )

##### 1. Why did you pick the specific chart?

I chose this chart to show the average stay length for each room type, using minimum_nights as a proxy.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates how long a typical stay lasts for each room type, which helps hosts plan facilities for that duration and estimate when a listing will be available for booking again.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From this chart we can estimate how many times a room type can be booked in a month and set the booking price accordingly, taking expenses into account.
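
The basic calculation mentioned above can be sketched as follows, using the rounded average stays quoted in this notebook's recommendations (about 8, 6 and 5 nights). The 30-day month and the assumption of back-to-back stays with no gaps are simplifications.

```python
# Approximate average stays (nights) as quoted in this notebook's recommendations.
avg_nights = {"Entire home/apt": 8, "Shared room": 6, "Private room": 5}

DAYS_IN_MONTH = 30  # simplifying assumption

# Upper bound on back-to-back stays that fit into one month.
max_stays_per_month = {
    room: DAYS_IN_MONTH / nights for room, nights in avg_nights.items()
}

for room, stays in max_stays_per_month.items():
    print(f"{room}: up to {stays:.2f} stays per month")
```

Multiplying the number of stays by the nightly price and the nights per stay, then subtracting expenses, gives a rough monthly margin for each room type.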

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize = (12,6))
sns.scatterplot(x = airbnb_df["longitude"], y = airbnb_df["latitude"], hue = airbnb_df["neighbourhood_group"])
plt.show()  # visualization of each neighbourhood_group using latitude and longitude

##### 1. Why did you pick the specific chart?

This chart shows the location of listings by longitude and latitude, coloured by neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows that listings are densest in Manhattan and Brooklyn, which sit next to each other and together account for almost 85% of listings. Staten Island is on the outskirts and has far fewer listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Locations closer to the prime hotspots attract more bookings, so listing more of the in-demand room types there should attract customers and increase overall revenue.
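
The share of listings per neighbourhood group can be checked numerically instead of read off the scatter plot. The sketch below uses a made-up toy column, so its percentages are illustrative only. With the real data, the same call would be applied to the `neighbourhood_group` column of `airbnb_df`.

```python
import pandas as pd

# Made-up labels; the real column is airbnb_df["neighbourhood_group"].
groups = pd.Series(
    ["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens", "Bronx"],
    name="neighbourhood_group",
)

# Percentage of listings in each group.
share = groups.value_counts(normalize=True).mul(100).round(1)
top_two = share.loc[["Manhattan", "Brooklyn"]].sum()

print(share)
print(f"Manhattan + Brooklyn: {top_two:.1f}% of listings (toy data)")
```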

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10, 6))
sns.set(style='whitegrid')

sns.barplot(x='neighbourhood_group', y='availability_365', data=airbnb_df, palette='pastel')

plt.xlabel('Neighborhood Group', fontsize=14)
plt.ylabel('Availability (days)', fontsize=14)
plt.title('Availability by Neighborhood Group', fontsize=16)
plt.xticks(rotation=45, fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)  # Add a horizontal grid

plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar plot because it effectively compares the average availability across neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that Manhattan and Brooklyn generally have lower average availability than the other neighbourhood groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The lower average availability in Manhattan and Brooklyn may indicate higher demand, but it could also lead to negative growth if limited availability turns guests away.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
sns.scatterplot(data=airbnb_df, x='price', y='number_of_reviews', hue='room_type')
plt.title('Price vs Number of Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

The scatter plot shows the relationship between price and number of reviews.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows a negative relationship between price and number of reviews. Cheaper rooms usually have higher occupancy and therefore more reviews.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Using review count as a proxy for booking volume, the chart shows the price range in which most bookings happen, and therefore where most revenue is generated.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
fig, ax = plt.subplots(figsize=(6, 6))

sns.countplot(data=airbnb_df[airbnb_df['availability_365']  == 365], x='neighbourhood_group', hue='room_type', palette='GnBu_d')
plt.title('No. of Properties Available 365 days', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

This chart gives an idea of how many listings of each room_type are available all year in each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

The chart shows which room_type is most often available throughout the year. For example, private rooms are the most available room type everywhere except Manhattan, while shared rooms are the least available in every neighbourhood group.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart shows which room type in which neighbourhood group has the most and the least year-round availability. This can guide decisions on listing more of the in-demand room types and delisting the least requested ones.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
top_hosts = airbnb_df.groupby('host_name')['calculated_host_listings_count'].max().nlargest(5).reset_index()
plt.figure(figsize=(10, 6))
sns.set(style='whitegrid')
sns.barplot(x='calculated_host_listings_count', y='host_name', data=top_hosts, palette='pastel')
plt.xlabel('Listing Count', fontsize=14)
plt.ylabel('Host Name', fontsize=14)
plt.title('Top Hosts with the Most Listings', fontsize=16)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

plt.show()

##### 1. Why did you pick the specific chart?


I chose a bar chart because it is suitable for comparing the listing counts of the top hosts and visually identifying the hosts with the most listings.

##### 2. What is/are the insight(s) found from the chart?

Sonder (NYC) and Blueground are the two hosts with the most listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Knowing that "Sonder (NYC)" and "Blueground" have the most listings could have a positive business impact by recognizing successful host partnerships and encouraging similar collaborations.
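
Host names are not guaranteed to be unique, so a ranking keyed on host_id may be safer than one keyed on host_name. The sketch below, on made-up rows, ranks hosts by host_id and then counts where the top hosts list their properties, which relates to the fourth business objective. The choice of the top 2 hosts is arbitrary.

```python
import pandas as pd

# Made-up rows; two different hosts share the name "Alex".
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2, 3, 4],
    "host_name": ["Alex", "Alex", "Alex", "Sam", "Sam", "Alex", "Kim"],
    "neighbourhood_group": [
        "Manhattan", "Manhattan", "Brooklyn",
        "Brooklyn", "Brooklyn", "Queens", "Bronx",
    ],
})

# Count listings per host, keyed on the id so that namesakes stay separate.
listings_per_host = (
    toy.groupby(["host_id", "host_name"]).size().sort_values(ascending=False)
)
top_host_ids = listings_per_host.head(2).index.get_level_values("host_id")

# Where do the top hosts list their properties?
top_host_groups = (
    toy[toy["host_id"].isin(top_host_ids)]
    .groupby("neighbourhood_group")
    .size()
    .sort_values(ascending=False)
)

print(listings_per_host)
print(top_host_groups)
```

Grouping by name alone would merge the two hosts called "Alex" into a single host with four listings.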

#### Chart - 12

In [None]:
# Chart - 12 visualization code
popular_neighborhoods = airbnb_df['neighbourhood'].value_counts().nlargest(10)


plt.figure(figsize=(10, 6))
sns.barplot(x=popular_neighborhoods.values, y=popular_neighborhoods.index)
plt.xlabel('Number of Listings', fontsize=14)
plt.ylabel('Neighborhood', fontsize=14)
plt.title('Top 10 Popular Neighborhoods by Number of Listings', fontsize=16)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart because it effectively compares the popularity of neighborhoods by listing count.

##### 2. What is/are the insight(s) found from the chart?

According to the chart, "Williamsburg" and "Bedford-Stuyvesant" are the two most popular neighborhoods by the number of listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight that "Williamsburg" and "Bedford-Stuyvesant" are the most popular neighborhoods can positively impact business by targeting marketing efforts. However, overconcentration in these areas may lead to negative growth due to potential oversaturation and reduced demand elsewhere.

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
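
This cell was left empty in the notebook. One way it could be filled is sketched below on randomly generated toy data, so the correlations it shows carry no meaning. With the real data, `toy` would be replaced by `airbnb_df`. The column subset is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data with a few of the dataset's numeric column names.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "price": rng.integers(30, 500, n),
    "minimum_nights": rng.integers(1, 30, n),
    "number_of_reviews": rng.integers(0, 300, n),
    "availability_365": rng.integers(0, 366, n),
})

# Pairwise Pearson correlation between the numeric columns.
corr = toy.corr(numeric_only=True)

plt.figure(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap (toy data)")
plt.tight_layout()
plt.show()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colour scale symmetric, so weak correlations are not visually exaggerated.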

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Marketing Focus: Focus marketing campaigns and advertisements on the top 20 neighborhoods with high booking numbers to maximize bookings. This targeted approach can reduce marketing costs and improve campaign effectiveness.

2. Room Type Preferences: Recognize that most guests prefer entire home/apartment and private room listings over shared rooms. Tailor listings and marketing efforts to highlight these preferred room types.

3. High-Demand Neighborhoods: Concentrate marketing efforts on Manhattan and Brooklyn, as they are the most demanded neighborhood groups among guests.

4. Average Stay Duration: Acknowledge that the average stays in entire home/apartment, shared room, and private room listings are approximately 8, 6, and 5 days, respectively. Use this information to set booking expectations and optimize pricing strategies.

5. Average Monthly Booking Rate: Calculate and monitor the average number of bookings for each room type per month to assess booking performance and adjust marketing efforts accordingly.

6. Optimal Price Range: Recognize that the majority of bookings fall within the price range of $10-400, where approximately 95% of bookings occur. This is also where most reviews are generated. Ensure competitive pricing within this range to attract more guests.

7. Reviews and Host Engagement: Given that private room and entire home/apartment listings receive the most reviews, encourage hosts to maintain high-quality listings and hospitality in these categories. Consider incentives or programs to promote quality and garner more reviews.

8. Host Commissions: Recognize that hosts are inclined toward Manhattan and Brooklyn due to higher price realization and more bookings. Leverage this by negotiating higher commissions with hosts in these high-demand areas to increase profits.
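
The price-range recommendation in point 6 can be checked by bucketing prices and summarising each bucket. The sketch below uses made-up rows and arbitrary bracket edges. Review counts stand in for bookings, since the dataset has no booking column.

```python
import pandas as pd

# Made-up listings.
toy = pd.DataFrame({
    "price": [45, 80, 120, 150, 220, 390, 450, 1200],
    "number_of_reviews": [60, 110, 35, 20, 12, 5, 2, 0],
})

# Arbitrary bracket edges; intervals are right-inclusive, e.g. (100, 200].
bins = [0, 100, 200, 400, float("inf")]
labels = ["<=100", "101-200", "201-400", ">400"]
toy["price_bracket"] = pd.cut(toy["price"], bins=bins, labels=labels)

summary = toy.groupby("price_bracket", observed=True).agg(
    listings=("price", "size"),
    total_reviews=("number_of_reviews", "sum"),
)
summary["listing_share_pct"] = (summary["listings"] / len(toy) * 100).round(1)

print(summary)
```

Note that `pd.cut` with a lower edge of 0 leaves zero-price rows unassigned, which is reasonable here because a zero price is treated as invalid.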

# **Conclusion**

From the above Exploratory Data Analysis of Airbnb Dataset we can conclude that:



*   Manhattan and Brooklyn are the two most prominent and expensive areas of NYC. Location has a strong effect on price, but a property in a popular location will not necessarily stay occupied most of the time.
*   Guests who choose an entire home or apartment tend to stay a bit longer, and this is also the most booked room type across the neighbourhood groups.

*   The findings from an exploratory data analysis of Airbnb data can help both hosts and guests make more informed decisions. Hosts can learn more about what amenities guests are looking for and how to price their property competitively. Guests, in turn, can use the same parameters to decide on the location, amenities, and price of the properties they want to book.
*   The dataset is large but lacks some features needed for property valuation, which is not easy to determine. Overall, the analysis provides valuable insights into the dynamics of the short-term rental market and can improve the experience for both hosts and guests.












### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***