<a href="https://colab.research.google.com/github/vinay2102/Airbnb-Analysis/blob/main/Airbnb-Analysis(Almabetter).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Airbnb-Analysis-Capstone-Project



##### **Project Type**    - EDA
##### **Contribution**    - Team
##### **Team Member 1 -** Vinay
##### **Team Member 2 -** Rohan

# **Project Summary -**

Since 2008, guests and hosts have used Airbnb to expand their travel possibilities and to experience the world in a more unique, personalized way. Today, Airbnb has become a one-of-a-kind service that is used and recognized around the world. Analyzing the millions of listings on Airbnb is crucial for the company. These listings generate a lot of data, which can be analyzed and used for security, business decisions, understanding the behavior and performance of customers and providers (hosts) on the platform, guiding marketing initiatives, implementing innovative additional services and much more.

This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values.

Explore and analyze the data to discover key insights (not limited to these), such as:

What can we learn about different hosts and areas?

What can we learn from predictions? (ex: locations, prices, reviews, etc)

Which hosts are the busiest and why?

Is there any noticeable difference in traffic among different areas, and what could be the reason for it?

# **GitHub Link -**

# **Problem Statement**


Let us assume that my dad is a wealthy individual looking to make some investments in real estate. He has heard from his colleagues that the New York City real estate market has been thriving and offers a good ROI (return on investment). Being in India, my dad is quite skeptical about investing in New York and leaving the property empty. He also wants the property to generate some cash flow every now and then. To get out of this dilemma, my dad calls his old friend, Mr. Robert Kiyosaki (author of the book Rich Dad Poor Dad), who advises him to buy the property and list it on a rental service like "Airbnb". After listening to his friend, my dad comes to me and asks, "Son, what is Airbnb?"

Defining Airbnb - Airbnb is an online marketplace connecting travelers with local hosts. On one side, the platform enables people to list their available space and earn extra income in the form of rent. On the other, Airbnb enables travelers to book unique homestays from local hosts, saving them money and giving them a chance to interact with locals. Catering to the on-demand travel industry, Airbnb is present in over 190 countries across the world.

After this professional explanation, my dad is quite happy about the opportunity his friend has opened up for him. Now he wants answers to his questions from a business point of view. He asks me to find out a few unique details about the New York Airbnb market, for example, which part of New York generates the most revenue. Being a data scientist, I decide to tackle this problem through the art of data science and make my Indian parent proud (probably the most hypothetical thing I've mentioned so far).


#### **Define Your Business Objective?**

Our business objective is to leverage data-driven insights from the Airbnb dataset to optimize rental property performance and enhance guest satisfaction. By analyzing key factors such as location, property type, pricing, and customer reviews, we aim to identify trends and patterns that can inform strategic decision-making. Our goal is to maximize occupancy rates, increase rental revenues, and improve the overall guest experience. Through targeted marketing efforts and operational enhancements, we seek to position our properties competitively within the market and achieve sustainable growth.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.rcParams['figure.figsize'] = (10, 7)

### Dataset Loading

In [None]:
# Mount Google Drive (works only in Colab; needed only if the CSV is stored on Drive)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
bnb_df = pd.read_csv('/content/Airbnb NYC 2019.csv')

### Dataset First View

In [None]:
# Checking the first 5 rows of the dataframe

bnb_df.head(5)

In [None]:
# Checking the last 5 rows of the dataframe

bnb_df.tail(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

bnb_df.shape

### Dataset Information

In [None]:
# Dataset Info

bnb_df.info()

In [None]:
#Checking the type of rooms
bnb_df.room_type.value_counts()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

bnb_df.duplicated().sum()

We can see that there are no duplicate observations, so we can move ahead.

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

bnb_df.isnull().sum()

In [None]:
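# Deleting every observation that has at least one null value.
# Note: last_review and reviews_per_month are null for listings that have never been reviewed,
# so this also removes those listings.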

bnb_df.dropna(inplace=True)
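Because `dropna()` without arguments drops a row if *any* column is null, it also discards the unreviewed listings, whose `last_review` and `reviews_per_month` are empty. The sketch below runs on a small hypothetical DataFrame (not `bnb_df`) and contrasts that with `dropna(subset=...)`, which only checks the columns you name. Passing `name` and `host_name` is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mimicking a few columns of the Airbnb NYC 2019 file
toy = pd.DataFrame({
    "name": ["Cozy room", None, "Loft", "Studio"],
    "host_name": ["Host A", "Host B", None, "Host C"],
    "number_of_reviews": [12, 0, 5, 0],
    "reviews_per_month": [0.5, np.nan, 0.2, np.nan],
})

# Share of null values in each column
null_share = toy.isnull().mean().round(2)

# Dropping rows with any null also removes the never-reviewed listings
all_dropped = toy.dropna()

# Checking only the identifying text columns keeps those listings
subset_dropped = toy.dropna(subset=["name", "host_name"])

print(null_share)
print(len(all_dropped), len(subset_dropped))
```

Here only one row survives the plain `dropna()`, while the subset version keeps the unreviewed "Studio" listing as well.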

### What did you know about your dataset?


We can see that we have 16 columns and 48,895 observations. To better understand the dataset let's see what each column means.

id : A unique id given to each Airbnb listing.

name : The Ad title for the listing on Airbnb website.

host_id : A unique id given to an Airbnb host.

host_name : The name with which the host is registered.

neighbourhood_group : A group of areas/neighbourhoods.

neighbourhood : Name of a particular area/ neighbourhood.

latitude : latitudinal coordinate of the listing.

longitude : longitudinal coordinate of listing.

room_type : listing type (1 of 3 types) - 1. Entire home/apartment, 2. Private room, 3. Shared room.

price : price of the listing.

minimum_nights : Minimum number of nights required to stay in a single visit.

number_of_reviews : The total number of reviews given by visitors.

last_review : date of the last recorded review.

reviews_per_month : The number of reviews given per month for a listing.

calculated_host_listings_count : the total number of listings registered under a given host.

availability_365 : the number of days for which a listing is available in a year.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

bnb_df.columns

In [None]:
# Dataset Describe
bnb_df.describe()

Here we can see that some price values are 0. This doesn't make sense, since hosts are unlikely to list a property for free. We will impute values for these incorrect observations.

### Variables Description

id: The unique identifier for each Airbnb listing.

name: The name or title of the listing.

host_id: The unique identifier for the host of the listing.

host_name: The name of the host.

neighbourhood_group: The larger administrative area or region within which the listing is situated.

neighbourhood: The specific neighborhood or district where the listing is located.

latitude: The latitude coordinates of the listing's location.

longitude: The longitude coordinates of the listing's location.

room_type: The type of accommodation offered (e.g., entire home/apartment, private room, shared room).

price: The nightly price for renting the listing.

minimum_nights: The minimum number of nights required for booking the listing.

number_of_reviews: The total number of reviews the listing has received.

last_review: The date of the most recent review.

reviews_per_month: The average number of reviews per month.

calculated_host_listings_count: The number of listings the host has.

availability_365: The number of days the listing is available for booking within the next 365 days.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

bnb_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

Next, I remove a few columns (last_review, reviews_per_month, latitude and longitude) because they don't add value to the questions I want to answer.

In [None]:
#Removing unnecessary columns

bnb_df.drop(['latitude','longitude','last_review','reviews_per_month'],axis=1,inplace=True)

In [None]:
#Getting values where price is 0

bnb_df[bnb_df['price']==0]

In [None]:
bnb_df[bnb_df['price']==0].shape

So there are 10 observations with a price of 0 that need to be treated.
We will do this by imputing the average price of listings with the same minimum number of nights.
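The same idea can also be expressed without a loop, using `groupby(...).transform("mean")` so that every row receives the mean price of its `minimum_nights` group. This is a minimal sketch on hypothetical data, not the notebook's own function; it deliberately treats zero prices as missing so they do not pull the group average down.

```python
import numpy as np
import pandas as pd

# Hypothetical toy listings; a price of 0 marks an invalid value
toy = pd.DataFrame({
    "minimum_nights": [1, 1, 1, 2, 2, 30],
    "price": [100.0, 0.0, 50.0, 80.0, 120.0, 0.0],
})

# Treat zero prices as missing so they are excluded from the averages
valid = toy["price"].where(toy["price"] > 0)

# Mean valid price of each minimum_nights group, aligned to every row
group_mean = valid.groupby(toy["minimum_nights"]).transform("mean")

# Replace only the zero prices; a group with no valid price stays NaN
toy["price_imputed"] = np.where(toy["price"] == 0, group_mean, toy["price"])
print(toy)
```

A group made up only of zero prices (here `minimum_nights == 30`) has no valid average and stays `NaN`, so a real pipeline would need a fallback, such as the overall median price.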

In [None]:
# Average price for each value of minimum_nights

bnb_df.groupby('minimum_nights')['price'].mean().reset_index()

In [None]:
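#function for imputing the average price of listings with the same minimum_nights wherever price is 0
def price_inputer(min_nights_list,bnb_df):
  for i in min_nights_list:
    # Mean of the valid (non-zero) prices for this minimum_nights value
    avg_val = bnb_df.loc[(bnb_df['minimum_nights']==i)&(bnb_df['price']>0),'price'].mean()
    # Replace only the zero prices that share this minimum_nights value
    bnb_df['price']=np.where((bnb_df['price']==0)&(bnb_df['minimum_nights']==i),avg_val,bnb_df['price'])

In [None]:
#Calling function to impute price values
# minimum_nights values that occur among the zero-price listings
min_nights_list = bnb_df.loc[bnb_df['price']==0,'minimum_nights'].unique()
price_inputer(min_nights_list,bnb_df)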

In [None]:
bnb_df[bnb_df['price']==0].shape

We can see that there are no more observations that have the value for price as 0.

In [None]:
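#Selecting the columns to analyse (everything except the id columns); a list keeps the column order

main_cols = [col for col in bnb_df.columns if col not in {'id', 'host_id'}]

In [None]:
#Taking Columns with Numerical Values

num_main_cols = bnb_df[main_cols].select_dtypes(include='number').columns.tolist()
# num_main_cols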

### What all manipulations have you done and insights you found?

1. Understanding the relevance of specific columns: I dropped latitude, longitude, last_review and reviews_per_month, and left id and host_id out of the numerical analysis, to focus on the variables most pertinent to my business objectives, such as property characteristics, location, pricing, and availability.


2. Addressing missing or invalid data: I dropped the rows with null values and replaced zero prices with the average price of listings with the same minimum number of nights. This pragmatic treatment of erroneous data keeps invalid prices from distorting the price analysis.




## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1
Plotting important visualizations to see the skewness and distribution of each variable.

In [None]:
# Chart - 1 visualization code

# One histogram (with a KDE curve) per numerical column, stacked vertically
fig, axes = plt.subplots(nrows=len(num_main_cols), ncols=1, figsize=(18, 10), squeeze=False)
axes = axes.flatten()
for col, ax in zip(num_main_cols, axes):
    sns.histplot(x=col, data=bnb_df, ax=ax, kde=True, element='poly')
    ax.set_title(f'Column {col} skewness : {bnb_df[col].skew():.2f}')

plt.tight_layout(h_pad=1, w_pad=0.8)

##### 1. Why did you pick the specific chart?


I chose subplots because they let me examine the skewness and distribution of every variable side by side in a single, compact layout. This makes it quick to compare the variables and spot patterns across them.

##### 2. What is/are the insight(s) found from the chart?

Here we can see that all our variables are positively skewed except availability_365, which is roughly uniform: listings are spread across the whole range, from almost never available to available all year. (We can also see a mild to high rise in the counts near both ends of the availability axis, i.e., close to 0 days and close to 365 days; note that availability_365 is a number of days, not a date in the year.)
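Positive skew like the one reported for price is often reduced with a log transform before further analysis. The sketch below uses hypothetical prices and `np.log1p` (log of 1 + x, which is safe at 0) to compare the pandas skewness before and after; it is an illustration, not a step the notebook performs.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed prices: most listings cheap, a few very expensive
prices = pd.Series([40, 55, 60, 65, 70, 80, 90, 120, 300, 1500], dtype=float)

raw_skew = prices.skew()
# log1p compresses the long right tail
log_skew = np.log1p(prices).skew()

print(f"raw skew: {raw_skew:.2f}, log1p skew: {log_skew:.2f}")
```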

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.


Yes, the insights gained from these distributions can create a positive business impact by informing strategic decisions. For example, understanding how prices and availability are distributed can help set competitive prices and plan how many days a year to open a listing for booking, leading to increased revenue and customer satisfaction.

#### Chart - 2
Which hosts are the busiest, and why?


In [None]:
#creating a new busiest hosts dataframe with groupby operations on columns that I think cause a host to be busy.

busiest_hosts = bnb_df.groupby(['host_id','host_name','room_type','calculated_host_listings_count'])[['price','number_of_reviews','availability_365']].max().reset_index().sort_values(by=['number_of_reviews','price','availability_365'], ascending=False).head()

In [None]:
#Outlook

busiest_hosts

In [None]:
check_list=['price','number_of_reviews','availability_365']

In [None]:
#Plotting price, number of reviews and availability for the busiest hosts
fig,axes=plt.subplots(nrows=1,ncols=3,figsize=(15,6))
axes = axes.flatten()
for col,ax in zip(check_list,axes):
  splot= sns.barplot(data=busiest_hosts,x='host_name',y=col,ax=ax,hue="host_name")
  for p in splot.patches:
    splot.annotate(format(p.get_height(), '.1f'),
                   (p.get_x() + p.get_width() / 2., p.get_height()),
                   ha = 'center', va = 'center',
                   xytext = (0, 9),
                   textcoords = 'offset points')
  ax.set_title(f'host name vs {col}')

##### 1. Why did you pick the specific chart?

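I used three side-by-side bar plots to compare the top five hosts on price, number of reviews and availability at once. They suggest that the number of reviews is the clearest indicator of how busy a host is; price and availability also affect popularity, but less consistently. Since the hosts were ranked primarily by number of reviews, this pattern is partly built into the selection.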

##### 2. What is/are the insight(s) found from the chart?

This analysis shows us that the busiest hosts are (in order):
1. Dona
2. JJ
3. Maya
4. Carol
5. Danielle

All of them have a good mix of price (around 60 dollars), availability (around half a year) and number of reviews (around 550), which makes them popular.

Another important thing to note is that all of these hosts list private rooms, which could be very useful to know while investing my dad's money.
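A simpler per-host view aggregates all of a host's listings at once, with named aggregation in `groupby().agg()`. This sketch uses hypothetical hosts and ranks them by total reviews, which is a different definition from the per-group maximum used in the cell above, so the two rankings need not agree on real data.

```python
import pandas as pd

# Hypothetical listings: one row per listing, several per host
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Host A", "Host A", "Host B", "Host C", "Host C", "Host C"],
    "number_of_reviews": [120, 80, 150, 40, 60, 30],
    "availability_365": [300, 200, 170, 90, 10, 365],
})

# Per host: total reviews, number of listings and mean availability
per_host = (
    toy.groupby(["host_id", "host_name"])
    .agg(total_reviews=("number_of_reviews", "sum"),
         listings=("number_of_reviews", "count"),
         mean_availability=("availability_365", "mean"))
    .reset_index()
)

# Rank hosts by total reviews, a rough proxy for how often they are booked
busiest = per_host.nlargest(3, "total_reviews")
print(busiest)
```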

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

This knowledge can inform business strategies such as pricing optimization, marketing targeting, and investment decisions, ultimately leading to improved performance and profitability in the rental market.

#### Chart - 3
For which location do customers pay the highest rent?

In [None]:
#Obtaining a dataframe that shows us the maximum price for each location.(Top 30 with highest price)
max_price_per_location = bnb_df.groupby('neighbourhood')['price'].max().reset_index().sort_values(['price'],ascending=False).rename(columns={'price':'max_price'}).head(30)
plt.figure(figsize=(10,7))
ax = sns.barplot(data=max_price_per_location,x='neighbourhood',y='max_price',hue="neighbourhood")
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.title("Maximum price for each location")
plt.tight_layout()
plt.show()

In [None]:
#Printing the top 5 neighbourhoods with highest rent
print("The neighbourhoods with the highest rents are:")
for i in range(5):
  print(i+1,'.', max_price_per_location.iloc[i]['neighbourhood'],'with a price of:',max_price_per_location.iloc[i]['max_price'])

##### 1. Why did you pick the specific chart?

A bar plot allows for easy identification of the top 30 locations with the highest prices, making it suitable for presenting this specific type of data in a visually appealing and interpretable format.

##### 2. What is/are the insight(s) found from the chart?

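Here we have the neighbourhoods with the highest rents.
Because this chart uses the maximum price, a single expensive listing is enough to put a neighbourhood near the top; the rent most likely depends on the type of property and the part of the neighbourhood it is in.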
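To check whether a neighbourhood is expensive overall or only has one outlier listing, the maximum can be shown next to the median and the listing count. A minimal sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical listings in two neighbourhoods
toy = pd.DataFrame({
    "neighbourhood": ["N1", "N1", "N1", "N1", "N2", "N2", "N2"],
    "price": [80, 90, 100, 10000, 400, 450, 500],
})

# Maximum next to robust summaries for each neighbourhood
summary = (
    toy.groupby("neighbourhood")["price"]
    .agg(["max", "median", "count"])
    .sort_values("max", ascending=False)
)
print(summary)
```

N1 tops the ranking by maximum because of one 10,000 listing, while N2 has the higher median price.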

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

If rental prices are artificially inflated or exclusionary, it may lead to negative growth by fostering market instability and socio-economic disparities.

#### Chart - 4
For which location do customers pay the lowest rent?

In [None]:
# Chart - 4 visualization code

#Obtaining a dataframe that shows us the minimum price for each location.(Top 30 with lowest price)
min_price_per_location = bnb_df.groupby('neighbourhood')['price'].min().reset_index().sort_values(['price'],ascending=True).rename(columns={'price':'min_price'}).head(30)
plt.figure(figsize=(10,7))
ax = sns.barplot(data=min_price_per_location,x='neighbourhood',y='min_price',hue="neighbourhood")
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.title("Minimum price for each location")
plt.tight_layout()
plt.show()

In [None]:
#Printing the top 5 neighbourhoods with lowest rent
print("The neighbourhoods with the lowest rents are:")
for i in range(5):
  print(i+1,'.', min_price_per_location.iloc[i]['neighbourhood'],'with a price of:',min_price_per_location.iloc[i]['min_price'])

##### 1. Why did you pick the specific chart?

A bar plot allows for easy identification of the 30 locations with the lowest prices, making it suitable for presenting this specific type of data in a visually appealing and interpretable format.

##### 2. What is/are the insight(s) found from the chart?

Here we have the neighbourhoods with the lowest rents. One distinct feature stands out: Upper West Side and Greenpoint appear in both lists, which shows that prices vary widely within these areas (possibly depending on the type of property or the part of the neighbourhood it is in).
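Appearing in both the maximum and minimum lists can be quantified as a price spread per neighbourhood. This sketch, on hypothetical data, reports the full range (max minus min) together with the interquartile range, which describes the middle 50% of prices and is less sensitive to a single extreme listing.

```python
import pandas as pd

# Hypothetical listings: A has one extreme price, B is tightly clustered
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "price": [20, 100, 150, 2000, 90, 100, 110, 120],
})

grouped = toy.groupby("neighbourhood")["price"]
spread = pd.DataFrame({
    "min": grouped.min(),
    "max": grouped.max(),
    # Interquartile range: spread of the middle 50% of prices
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
})
spread["range"] = spread["max"] - spread["min"]
print(spread.sort_values("range", ascending=False))
```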

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

They provide valuable information for renters seeking affordable options and for investors looking for areas with potential for high rental yields. However, it's essential to consider the overall market dynamics, as excessively low rents may indicate underlying issues such as low demand or poor property conditions, which could hinder positive business outcomes in the long term.

#### Chart - 5
Areas/locations with the highest number of listings (top 10 plotted, top 5 printed).

In [None]:
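# Chart - 5 visualization code
# Number of listings per neighbourhood: each row is one listing, so count the rows.
# (Summing calculated_host_listings_count would repeat a host's overall total on every one of their listings.)
highest_listing_areas = bnb_df.groupby('neighbourhood')['calculated_host_listings_count'].count().reset_index().sort_values(by='calculated_host_listings_count',ascending=False).head(10)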


In [None]:
highest_listing_areas

In [None]:
#Creating a visualization for locations with the highest number of listings
ax = sns.barplot(data=highest_listing_areas,x='neighbourhood',y='calculated_host_listings_count',hue="neighbourhood")
for a in ax.patches:
    ax.annotate(format(a.get_height(), '.1f'),
                   (a.get_x() + a.get_width() / 2., a.get_height()),
                   ha = 'center', va = 'center',
                   xytext = (0, 9),
                   textcoords = 'offset points')
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()

In [None]:
#Printing the top 5 highest listing areas
print("The top 5 highest listed areas are:")
for i in range(5):
  print(i+1,'.', highest_listing_areas.iloc[i]['neighbourhood'],'with ',highest_listing_areas.iloc[i]['calculated_host_listings_count'],' number of listings')

##### 1. Why did you pick the specific chart?

A bar plot allows for easy identification of the top locations with the highest listing counts, making it suitable for presenting this specific type of data in a visually intuitive format.

##### 2. What is/are the insight(s) found from the chart?

Therefore, with some straightforward groupby operations, we have the top 5 areas with the highest number of listings.
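Because each row of the dataset is one listing, the number of listings per neighbourhood is simply the row count. `calculated_host_listings_count`, by contrast, repeats a host's overall total on every one of that host's rows, so summing it per neighbourhood gives an inflated figure wherever multi-listing hosts operate. A small hypothetical example shows the difference:

```python
import pandas as pd

# Hypothetical listings: host 7 owns 3 listings in total, all in "X"
toy = pd.DataFrame({
    "neighbourhood": ["X", "X", "X", "Y", "Y"],
    "host_id": [7, 7, 7, 8, 9],
    "calculated_host_listings_count": [3, 3, 3, 1, 1],
})

# Number of listings located in each neighbourhood
listing_counts = toy["neighbourhood"].value_counts()

# Summing per-host totals counts host 7's portfolio once per listing
summed_host_counts = toy.groupby("neighbourhood")["calculated_host_listings_count"].sum()

print(listing_counts.to_dict(), summed_host_counts.to_dict())
```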

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

This data can inform various business decisions such as property investment, marketing strategies, and target audience analysis. For example, investors can identify areas with high demand for rental properties, while property managers can tailor their marketing efforts to capitalize on the popularity of these areas. Additionally, understanding which neighborhoods have the highest listing counts can help businesses allocate resources effectively and optimize their operational strategies for maximum profitability.

#### Chart - 6  
Areas/locations with the lowest number of listings (bottom 10 plotted, bottom 5 printed).

In [None]:
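# Chart - 6 visualization code
# Number of listings per neighbourhood: each row is one listing, so count the rows.
# (Summing calculated_host_listings_count would repeat a host's overall total on every one of their listings.)
lowest_listing_areas = bnb_df.groupby('neighbourhood')['calculated_host_listings_count'].count().reset_index().sort_values(by='calculated_host_listings_count',ascending=True).head(10)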


In [None]:
lowest_listing_areas

In [None]:
ax = sns.barplot(data=lowest_listing_areas,x='neighbourhood',y='calculated_host_listings_count',hue="neighbourhood")
for a in ax.patches:
    ax.annotate(format(a.get_height(), '.1f'),
                   (a.get_x() + a.get_width() / 2., a.get_height()),
                   ha = 'center', va = 'center',
                   xytext = (0, 9),
                   textcoords = 'offset points')
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()

In [None]:
#Printing the 5 lowest listing areas
print("The top 5 lowest listed areas are:")
for i in range(5):
  print(i+1,'.', lowest_listing_areas.iloc[i]['neighbourhood'],'with ',lowest_listing_areas.iloc[i]['calculated_host_listings_count'],' number of listings')

##### 1. Why did you pick the specific chart?

A bar plot allows for easy identification of the locations with the lowest listing counts, making it suitable for presenting this specific type of data in a visually intuitive format.

##### 2. What is/are the insight(s) found from the chart?

Therefore, with some straightforward groupby operations, we have the 5 areas with the lowest number of listings.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Understanding these areas with low listing counts allows businesses to explore niche markets, develop targeted marketing strategies, and potentially capture market share in less saturated areas. However, it's essential to conduct further research to assess the demand, feasibility, and potential challenges associated with operating in these areas before making strategic decisions.

#### Chart - 7  
What is the average price customers pay in each neighbourhood group, for each room type?

In [None]:
#Creating a dataframe which shows the average price for different neighbourhood groups
avg_pref_price = bnb_df.groupby(['neighbourhood_group','room_type'])['price'].mean().reset_index()

In [None]:
#Getting a glimpse of what the dataframe with average prices for different neighbourhood groups looks like
avg_pref_price

In [None]:
#creating a visualisation for prices of each neighbourhood group with different room types
splot=sns.barplot(data=avg_pref_price,x='neighbourhood_group',y='price',hue='room_type',palette='rainbow')
for p in splot.patches:
    splot.annotate(format(p.get_height(), '.1f'),
                   (p.get_x() + p.get_width() / 2., p.get_height()),
                   ha = 'center', va = 'center',
                   xytext = (0, 9),
                   textcoords = 'offset points')
plt.xlabel("neighbourhood_group", size=14)
plt.ylabel("price", size=14)

In [None]:
#Creating a dataframe for printing values
print_df = bnb_df.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack().reset_index()
#Printing the average prices for each neighbourhood group and for different room types
print('The average prices are:')
for i in range(len(print_df)):
  print(print_df.iloc[i]['neighbourhood_group'],':\n','1. Entire home/apt-',print_df.iloc[i]['Entire home/apt'],
        '\n','2. Private room-',print_df.iloc[i]['Private room'],'\n','3. Shared room-',print_df.iloc[i]['Shared room'])

##### 1. Why did you pick the specific chart?

I chose a grouped bar plot to visualize the prices of each neighbourhood group for the different room types because it displays the comparison of prices across multiple categories clearly and concisely.

##### 2. What is/are the insight(s) found from the chart?

The visualization shows the average price paid by customers in each neighbourhood group, for each room type. For a written summary, we can use the unstack function with a few print statements, as done in the previous cell.
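`pivot_table` builds the same neighbourhood group × room type table of mean prices as `groupby(...).mean().unstack()` in a single call, and the result can be displayed directly instead of looping over print statements. A sketch with hypothetical groups and made-up prices:

```python
import pandas as pd

# Hypothetical listings with made-up prices
toy = pd.DataFrame({
    "neighbourhood_group": ["Group A", "Group A", "Group B", "Group B", "Group B"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Private room"],
    "price": [200.0, 90.0, 150.0, 60.0, 80.0],
})

# Rows: neighbourhood groups, columns: room types, values: mean price
avg_table = toy.pivot_table(index="neighbourhood_group", columns="room_type",
                            values="price", aggfunc="mean").round(1)
print(avg_table)
```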

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

This insight can inform decisions related to property investment, marketing targeting, and overall business planning, ultimately leading to improved customer satisfaction and business performance.

#### Chart - 8
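What percentage of the total number of nights does each neighbourhood group account for? Since the dataset does not record actual stays, the sum of minimum_nights is used as a rough proxy.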

In [None]:
# Chart - 8 visualization code

#Creating a dataframe with the sum of minimum nights for each neighbourhood group (a proxy for nights spent)
total_nights_per_neighbourhood = bnb_df.groupby('neighbourhood_group')['minimum_nights'].sum().reset_index().sort_values(by='minimum_nights',ascending=False)

In [None]:
total_nights_per_neighbourhood

In [None]:
#Initial preparations for plotting pie chart with percentages
neighbourhood_groups_list = list(total_nights_per_neighbourhood['neighbourhood_group'])
total_minimum_nights_list = list(total_nights_per_neighbourhood['minimum_nights'])
palette_color = sns.color_palette('Set2')
explode = (0.025,0.025,0.025,0.025,0.025)

In [None]:
#Creating the pie chart visualisation
plt.figure(figsize=(9,7))
plt.pie(total_minimum_nights_list,labels=neighbourhood_groups_list,colors=palette_color,explode=explode,autopct='%0.01f%%')
plt.title("Share of total minimum nights for each location")
plt.show()

##### 1. Why did you pick the specific chart?

 It effectively illustrates the distribution of nights spent across different neighborhood groups in a visually appealing and easy-to-understand format. A pie chart allows for quick comparison of proportions and can highlight any disparities or trends in the distribution of nights spent among neighborhood groups.

##### 2. What is/are the insight(s) found from the chart?

<b>From the visualization we can see that the share of the total (minimum) nights for each location is:</b>

Manhattan-53.9%

Brooklyn-35.5%

Queens-8.6%

Bronx-1.5%

Staten Island-0.5%
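The percentages in the pie chart are each group's sum of `minimum_nights` divided by the overall sum. Computing them directly gives exact numbers instead of reading them off the chart; the data below is hypothetical.

```python
import pandas as pd

# Hypothetical listings in three neighbourhood groups
toy = pd.DataFrame({
    "neighbourhood_group": ["G1", "G1", "G2", "G3"],
    "minimum_nights": [3, 5, 2, 10],
})

totals = toy.groupby("neighbourhood_group")["minimum_nights"].sum()
# Each group's share of the overall sum, in percent
shares = (totals / totals.sum() * 100).round(1)
print(shares.sort_values(ascending=False))
```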


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

 By understanding the popularity of different neighborhoods among guests, businesses can make informed decisions regarding property investment, marketing strategies, and service offerings. This knowledge enables businesses to better cater to customer preferences, optimize resource allocation, and ultimately enhance customer satisfaction and profitability.

#### Chart - 9
What is the maximum number of reviews per neighbourhood group?

In [None]:
# Chart - 9 visualization code

# Highest number_of_reviews of a single listing in each neighbourhood group
max_review_per_location = bnb_df.groupby('neighbourhood_group')['number_of_reviews'].max().reset_index().sort_values(['number_of_reviews'],ascending=False)
plt.figure(figsize=(10,7))
ax = sns.barplot(data=max_review_per_location,x='neighbourhood_group',y='number_of_reviews',hue="neighbourhood_group")
plt.title("Maximum number of reviews for each neighbourhood group")
plt.tight_layout()
plt.show()

In [None]:
#Printing the neighbourhood groups ordered by their maximum number of reviews
print("The neighbourhood groups with the maximum reviews are:")
for i in range(5):
  print(i+1,'.', max_review_per_location.iloc[i]['neighbourhood_group'],'with a maximum of',max_review_per_location.iloc[i]['number_of_reviews'],'reviews')

##### 1. Why did you pick the specific chart?

It effectively displays the comparison of review counts across different neighborhood groups in a clear and concise manner.

##### 2. What is/are the insight(s) found from the chart?

The insights reveal the highest number of reviews received by a single listing in each neighbourhood group:

Queens with 629 reviews.

Manhattan with 607 reviews.

Brooklyn with 488 reviews.

Staten Island with 333 reviews.

Bronx with 321 reviews.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Knowing which neighbourhood groups contain the most-reviewed listings points to areas of high demand and customer engagement, although a high review count does not by itself show that guests were satisfied.

#### Chart - 10
Is there any relationship between the availability of listings and their prices?

In [None]:
# Chart - 10 visualization code

plt.figure(figsize=(10, 6))
plt.scatter(bnb_df['availability_365'], bnb_df['price'], alpha=0.5, color='blue')
plt.title('Relationship between Availability and Price')
plt.xlabel('Availability (in days)')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a scatter plot because it's an effective way to visualize the relationship between two continuous variables, in this case, the availability of listings and their prices.

##### 2. What is/are the insight(s) found from the chart?


The scatter plot reveals no discernible relationship between Airbnb listing availability and prices. Price fluctuations occur across various availability levels, suggesting diverse factors influence pricing. Outliers hint at unique pricing strategies, while clusters may denote market segments. Understanding these patterns aids in optimizing pricing strategies and meeting diverse customer demands.
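A correlation coefficient puts a number on what the scatter plot suggests. Pearson measures a linear relationship and is easily pulled around by extreme prices; Spearman is the Pearson correlation of the ranks, so it only asks whether price tends to rise or fall with availability. The sketch computes Spearman by ranking first (which avoids needing SciPy) and uses hypothetical data.

```python
import pandas as pd

# Hypothetical listings: price rises with availability, with one extreme price
toy = pd.DataFrame({
    "availability_365": [10, 20, 30, 40, 50],
    "price": [50, 60, 70, 80, 1000],
})

# Pearson: strength of a linear relationship, sensitive to outliers
pearson = toy["availability_365"].corr(toy["price"])

# Spearman: Pearson correlation of the ranks
spearman = toy["availability_365"].rank().corr(toy["price"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

Here Pearson is about 0.73 while Spearman is exactly 1, because the relationship is perfectly monotonic but not linear. On the real data, values of both close to 0 would back up the "no discernible relationship" reading.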

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

By recognizing the lack of a clear relationship between Airbnb listing availability and prices, businesses can adopt more nuanced pricing strategies that consider various factors beyond just availability. Understanding the factors influencing pricing fluctuations, identifying outliers with unique pricing strategies, and recognizing market segments based on price clusters are crucial steps in optimizing pricing strategies to meet diverse customer demands.

#### Chart - 11
Which 10 hosts have the highest average price?


In [None]:
# Chart - 11 visualization code

# Average price per host name (different hosts who share a name are grouped together)
average_price_by_host = bnb_df.groupby('host_name')['price'].mean()

sorted_hosts = average_price_by_host.sort_values(ascending=False)

# Taking the top 10 hosts with the highest average price
top_10_hosts = sorted_hosts.head(10)

# Plotting the bar plot
plt.figure(figsize=(10,6))
top_10_hosts.plot(kind='bar', color='skyblue')
plt.title('Top 10 Hosts with Highest Average Price')
plt.xlabel('Host ID')
plt.ylabel('Average Price')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot effectively highlights the differences in average price among the top hosts.

##### 2. What is/are the insight(s) found from the chart?

The top 10 hosts, in descending order of the average price of their listings:

1. Olson
2. Rum
3. Jay and Liz
4. Sarah
5. Sarah (2 hosts)
6. Katherine
7. Luxury Property
8. Viberlyn
9. Ilo and Richard
10. Shah
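Grouping by `host_name` alone merges different hosts who share a first name, as the "Sarah (2 hosts)" entry suggests, and a host with a single very expensive listing can top the ranking on one price. The sketch below is one possible refinement, not the method used for the chart: it groups by both `host_id` and `host_name` and reports how many listings each average rests on. The function name, the `min_listings` parameter and the toy data are illustrative assumptions.

```python
import pandas as pd


def top_hosts_by_mean_price(df: pd.DataFrame, n: int = 10, min_listings: int = 1) -> pd.DataFrame:
    """Rank hosts by mean listing price, keeping same-named hosts apart."""
    stats = (
        df.groupby(["host_id", "host_name"])["price"]
        # Named aggregation: mean price and number of listings per host
        .agg(mean_price="mean", listings="count")
        .reset_index()
    )
    # Optionally drop hosts whose average rests on very few listings
    stats = stats[stats["listings"] >= min_listings]
    return stats.sort_values("mean_price", ascending=False).head(n)


# Hypothetical toy data: two different hosts are both called "Sarah"
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_name": ["Sarah", "Sarah", "Sarah", "Olson", "Rum"],
    "price": [100, 300, 900, 500, 50],
})
print(top_hosts_by_mean_price(toy, n=3))
```

Note that `groupby` drops rows whose `host_name` is missing by default, so hosts without a recorded name would not appear in this ranking.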

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Knowing which hosts command the highest average prices helps businesses study pricing strategies, market positioning, and customer preferences at the premium end of the market. This can guide marketing efforts and pricing decisions, and point to opportunities to improve profitability and competitiveness.

#### Chart - 12
Correlation between Average Availability and Average Number of Reviews

```python
# Chart - 12 visualization code

# Average availability and average number of reviews for each host
average_availability_reviews = bnb_df.groupby('host_id').agg({'availability_365': 'mean', 'number_of_reviews': 'mean'})

# Plotting the scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(average_availability_reviews['availability_365'], average_availability_reviews['number_of_reviews'], alpha=0.5)
plt.title('Correlation between Average Availability and Average Number of Reviews')
plt.xlabel('Average Availability (in days)')
plt.ylabel('Average Number of Reviews')
plt.tight_layout()
plt.show()
```

##### 1. Why did you pick the specific chart?

A scatter plot shows each host as a single point, which makes it easy to see whether hosts with higher average availability also tend to collect more reviews.

##### 2. What is/are the insight(s) found from the chart?

Each point represents a host: the x-coordinate is the host's average availability (in days) and the y-coordinate is the host's average number of reviews. The overall spread and any clustering of the points indicate whether, and how strongly, the two measures move together.
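To summarize the scatter plot in one number, the per-host averages can be correlated directly. The sketch below repeats the same `groupby('host_id')` aggregation as the chart code (using named aggregation) and returns the Pearson coefficient. The function name and the toy data are hypothetical; in the notebook one would pass `bnb_df`.

```python
from typing import Tuple

import pandas as pd


def host_availability_reviews(df: pd.DataFrame) -> Tuple[pd.DataFrame, float]:
    """Per-host averages and the Pearson correlation between them."""
    per_host = df.groupby("host_id").agg(
        avg_availability=("availability_365", "mean"),
        avg_reviews=("number_of_reviews", "mean"),
    )
    # A single number summarizing the trend in the scatter plot
    r = per_host["avg_availability"].corr(per_host["avg_reviews"])
    return per_host, r


# Hypothetical toy data standing in for bnb_df
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3],
    "availability_365": [0, 10, 200, 365, 300],
    "number_of_reviews": [50, 40, 5, 0, 2],
})
per_host, r = host_availability_reviews(toy)
print(per_host)
print(round(r, 3))
```

A value near 0 would mean the plot shows no linear trend; a clearly positive or negative value would indicate that availability and reviews tend to rise together or move in opposite directions.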

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

By analyzing this relationship through the scatter plot, businesses can identify patterns that may influence pricing strategies and customer satisfaction levels. This insight can aid in optimizing listing availability to maximize occupancy rates while maintaining positive review ratings, ultimately contributing to a positive business impact.

#### Chart - 13
What is the distribution of room types among Airbnb listings?

```python
# Chart - 13 visualization code

# Number of listings for each room type
room_type_distribution = bnb_df['room_type'].value_counts()

# Plotting the distribution as a pie chart, with percentages to one decimal place
plt.figure(figsize=(8, 6))
plt.pie(room_type_distribution, labels=room_type_distribution.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Room Types Among Airbnb Listings')
plt.show()
```

##### 1. Why did you pick the specific chart?

A pie chart shows the share of each room type in the dataset in a visually intuitive way.

##### 2. What is/are the insight(s) found from the chart?

- Entire home/apt: 52.3%
- Private room: 45.5%
- Shared room: 2.2%
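The percentages shown in the pie chart can also be obtained as a table, which is easier to quote and compare across neighbourhood groups. A minimal sketch, assuming the dataset's `room_type` column; the toy labels follow the category names above and the data itself is made up.

```python
import pandas as pd


def room_type_shares(df: pd.DataFrame) -> pd.Series:
    """Percentage of listings per room type, rounded to one decimal place."""
    # normalize=True returns each category's fraction of all non-missing values
    return (df["room_type"].value_counts(normalize=True) * 100).round(1)


# Hypothetical toy data; the labels assume the dataset's room_type values
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"],
})
print(room_type_shares(toy))
```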

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Understanding the proportion of each room type allows businesses to tailor their marketing strategies, pricing models, and property management approaches to better meet customer preferences and market demand.

#### Chart - 14
Correlation Heatmap

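```python
# Correlation Heatmap visualization code

# numeric_only=True skips text columns such as host names and neighbourhoods;
# since pandas 2.0, .corr() raises an error on them instead of dropping them
corr_matrix = bnb_df.corr(numeric_only=True)

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap')
plt.show()
```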

##### 1. Why did you pick the specific chart?

I chose a heatmap to visualize the correlation matrix because it gives a clear, concise view of the relationships among many variables at once. Its color gradient encodes the strength and direction of each correlation, which makes patterns and associations easy to spot.

##### 2. What is/are the insight(s) found from the chart?

Strong positive correlations (values close to 1) indicate that as one variable increases, the other tends to increase as well.

Strong negative correlations (values close to -1) indicate that as one variable increases, the other tends to decrease.

Weak correlations (values close to 0) suggest little to no linear relationship between variables.
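To turn the heatmap into concrete findings, it helps to list the variable pairs with the strongest correlations. The sketch below is illustrative: the function name, `top_n` parameter and toy data are assumptions, and the coefficients are Pearson (linear) correlations, the same ones shown in the heatmap. In the notebook one would pass `bnb_df`.

```python
from itertools import combinations

import pandas as pd


def strongest_correlations(df: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Return the top_n variable pairs ranked by absolute Pearson correlation."""
    # numeric_only=True (pandas >= 1.5) ignores text columns such as names
    corr = df.corr(numeric_only=True)
    # Each unordered pair once, skipping the diagonal of 1.0 self-correlations
    pairs = pd.Series(
        {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)},
        dtype=float,
    ).dropna()
    # Rank by strength but keep the sign so the direction stays visible
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)


# Hypothetical toy data standing in for bnb_df
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "name": ["a", "b", "c", "d", "e"],
})
print(strongest_correlations(toy, top_n=2))
```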

#### Chart - 15
Pair Plot

```python
# Pair Plot visualization code

# Pairwise scatter plots (and per-variable histograms) for the numeric columns
sns.pairplot(data=bnb_df)
plt.show()
```

##### 1. Why did you pick the specific chart?

A pair plot allows for a quick examination of pairwise relationships, making it easier to identify patterns, correlations, and potential outliers.

##### 2. What is/are the insight(s) found from the chart?

Correlation: Identifying the strength and direction of correlations between variables. Positive correlations suggest that as one variable increases, the other tends to increase as well, while negative correlations suggest the opposite.

Distribution: Observing the distribution of individual variables and identifying any deviations from normality or potential outliers.

Trends: Detecting any linear or nonlinear trends between variables, which can provide insights into potential relationships or patterns in the data.

Clusters: Identifying clusters or groups of observations that may indicate distinct subpopulations within the dataset.

Interactions: Examining interactions between variables, particularly in the context of predictive modeling or understanding causal relationships.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.


**Based on the insights obtained, here are some recommendations for the client to achieve their business objectives:**

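Pricing Optimization: Price and most other numeric variables are positively skewed, so a small number of very expensive listings pulls the average up. Benchmark prices against comparable listings (similar location, room type, and availability) rather than against the overall average to set competitive prices and maximize revenue.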

Host Engagement: Encourage hosts like Dona, JJ, Maya, Carol, and Danielle, who have a high number of reviews and a good mixture of price and availability, to maintain their quality standards and potentially offer incentives for continued excellence.

Investment Decisions: Explore opportunities in neighborhoods with high rental prices like Astoria, Greenpoint, and Tribeca, while considering areas with lower rents for cost-effective investments. Focus on neighborhoods with high demand for listings, such as those with the highest number of reviews.

Market Segmentation: Analyze the distribution of room types among listings and tailor marketing strategies to target customers based on their preferences. Adjust offerings to meet the demand for different accommodation types in each neighborhood group.

Customer Satisfaction: Continue to prioritize customer satisfaction by maintaining high-quality listings and responsive host services. Monitor trends in the total number of nights spent per location to gauge demand and adapt offerings accordingly.

Continuous Improvement: Regularly review and analyze data, including correlations between variables and feedback from guests, to identify areas for improvement and innovation. Implement strategies to address any observed patterns or trends in the data.

Overall, by leveraging these insights and recommendations, the client can optimize operations, enhance customer satisfaction, and maximize profitability in the Airbnb rental market.
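The pricing recommendation above relies on price being positively skewed. A quick way to see what that means in practice is to compare the mean with the median and compute the sample skewness. The sketch below uses made-up nightly prices with one luxury outlier; on the real data one would call it on `bnb_df['price']`. The function name is hypothetical.

```python
import pandas as pd


def price_skew_summary(prices: pd.Series) -> dict:
    """Compare mean and median and report sample skewness for a price column."""
    prices = prices.dropna()
    return {
        "mean": prices.mean(),
        "median": prices.median(),
        # Positive skewness means a long right tail of expensive listings
        "skewness": prices.skew(),
    }


# Hypothetical nightly prices with one luxury outlier
toy_prices = pd.Series([60, 75, 80, 90, 100, 110, 120, 2500])
summary = price_skew_summary(toy_prices)
print(summary)
```

Here the median (95) sits far below the mean (about 392), so a typical listing is much cheaper than the average suggests. This is why a median-based benchmark is usually the safer reference point for skewed prices.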

# **Conclusion**

In summary, the analysis of the Airbnb dataset offers practical guidance for meeting the business objectives. The positive skewness of the price variables points to room for optimizing pricing to maximize revenue. Knowing which neighborhoods have the highest and lowest rents, and which hosts are the busiest, also informs marketing and investment decisions.


Understanding how listings are split across room types supports targeted marketing that caters to a wide range of customer preferences. Data on market demand and customer satisfaction also helps sustain responsive host services and high-quality listings.

By monitoring trends consistently and acting on data-driven insights, businesses can adapt to changing market conditions and keep improving. Putting these insights into practice can improve customer satisfaction, increase profitability, and maintain a competitive edge in the Airbnb rental market.

