# **Project Name**    - AirBnb Bookings Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Team
##### **Team Member 1 -** Yamini Mehendwariya
##### **Team Member 2 -** Priyabrata Biswal


# **Project Summary -**


AirBnb is an online marketplace where people can rent out their homes or apartments to others who are looking for a place to stay. Analyzing AirBnb bookings can provide valuable insights for property owners, renters, and tourists alike. In this analysis, Python is used to explore and visualize AirBnb booking data.

The analysis begins with importing the necessary Python libraries, such as Pandas and Matplotlib, and loading the AirBnb dataset. The dataset is then cleaned, with missing values and duplicates removed, and the data is preprocessed by converting date-time data into a more usable format.

Next, the data is explored through various visualizations, such as scatterplots and heatmaps, to identify trends and patterns in the data. The analysis also includes the use of regression models to predict the price of a booking based on features such as location, property type, and number of guests.

Finally, the results of the analysis are summarized, highlighting key findings and insights gained from the data. These insights can be used by property owners to optimize their listings and pricing strategies, by renters to find the best deals, and by tourists to plan their trips more effectively.

Overall, the AirBnb bookings analysis using Python is a powerful tool for understanding the dynamics of the online marketplace and for making data-driven decisions that can benefit all parties involved.

# **GitHub Link -**


https://github.com/yamini1998m

https://github.com/prbrtbiswal

# **Problem Statement**


**This analysis addresses the following questions:**

1. What is the distribution of minimum nights required for bookings?

2. How does the price of listings vary across neighbourhoods?

3. What are the minimum and maximum prices for each room type?

4. What is the distribution of the number of reviews per listing?

5. How does the distribution of prices vary across neighbourhoods and room types?

6. In which quarter, and in which locations, were bookings higher than usual?

7. How many hosts are present in each neighbourhood?

#### **Define Your Business Objective?**

Since 2008, guests and hosts have used AirBnb to expand travel possibilities and present a more unique, personalised way of experiencing the world. Today, AirBnb has become a one-of-a-kind service that is used and recognized around the world. Data analysis on the millions of listings provided through AirBnb is a crucial factor for the company. These listings generate a lot of data that can be analysed and used for security, business decisions, understanding of customers' (guests) and providers' (hosts) behaviour and performance on the platform, guiding marketing initiatives, implementation of innovative additional services, and much more. This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. Explore and analyse the data to discover key understandings.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# These libraries are used to load, analyse, and visualize the dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns



### Dataset Loading

In [None]:
# Load Dataset
# In this cell we mounted our drive.
from google.colab import drive
drive.mount('/content/drive')


In [None]:
# Here we read our given dataset csv file in our google colab from drive.
df=pd.read_csv('/content/drive/MyDrive/Airbnb NYC 2019.csv')

### Dataset First View

In [None]:
# Dataset First Look
# head() shows the first 5 rows of the dataset.
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# duplicated() flags repeated rows; summing the flags gives their count.
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

a=df.isnull().sum()
a


In [None]:
'''Split 'a', the Series holding the null-value count of each column,
into two parts with iloc[] so that each part is easier to read and plot.
'b' holds the columns at index positions 0 to 7 and
'c' holds the columns at index positions 8 to 15.'''

a=df.isnull().sum()
b=a.iloc[0:8]
c=a.iloc[8:]
print(b,c)


In [None]:
# Visualizing the missing values
# Plot the null-value counts for the first slice of columns ('b').

plt.rcParams['figure.figsize'] = (17, 7)
plt.plot(b)


In [None]:
# Plot the null-value counts for the second slice of columns ('c').

# Set the figure size before plotting so that it applies to this figure.
plt.rcParams['figure.figsize'] = (17, 7)
plt.plot(c)

### What did you know about your dataset?

**The dataset has 48895 observations and 16 variables. The `name` and `host_name` columns contain 16 and 21 null values respectively, while `reviews_per_month` and `last_review` each contain 10052 null values. The null counts are reported as int64, and the dataset itself uses three data types (float64, int64, and object).**
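The null counts can also be expressed as a share of all rows, which makes it easier to judge how much each gap matters. The sketch below is a minimal illustration on a small made-up DataFrame; the values are not taken from the Airbnb file, and `missing_summary` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return null count and null percentage per column, largest first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({'null_count': counts, 'null_percent': percent})
    # Keep only the columns that actually contain nulls.
    summary = summary[summary['null_count'] > 0]
    return summary.sort_values('null_count', ascending=False)


# Toy data for illustration only; not taken from the Airbnb dataset.
toy = pd.DataFrame({
    'name': ['Loft', None, 'Studio', 'Room'],
    'price': [120, 80, 95, 60],
    'reviews_per_month': [0.5, np.nan, np.nan, 1.2],
})
print(missing_summary(toy))
```

The same helper could be called on `df` in this notebook to rank the columns by how incomplete they are.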

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description 

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns:
  print(f'Number of unique values in {i}: {len(df[i].unique())}')
  

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
copy_of_df=df.copy()
copy_of_df.head()


In [None]:
len(df['room_type'].unique())

In [None]:
# 1. What is the distribution of minimum nights required for bookings?
# Summary statistics (count, mean, quartiles, min, max) of minimum_nights.
df['minimum_nights'].describe()

In [None]:
# 2. How does the price of listings vary across neighbourhoods?

maxm_rate_in_neighbourhood=df.groupby('neighbourhood')['price'].max()
maxm_rate_in_neighbourhood.sort_values()

In [None]:
min_rate_in_neighbourhood=df.groupby('neighbourhood')['price'].min()
min_rate_in_neighbourhood

In [None]:
average_rate_in_neighbourhood=df.groupby('neighbourhood')['price'].mean()
average_rate_in_neighbourhood

In [None]:
# 3. What are the minimum and maximum prices for each room type?
max_rate_in_each_type_list=df.groupby('room_type')['price'].max()
max_rate_in_each_type_list.sort_values()

In [None]:
# Average price for each room type.
avg_rate_in_each_type_list=df.groupby('room_type')['price'].mean()
avg_rate_in_each_type_list

In [None]:
# 4. What is the distribution of the number of reviews for each listing?
# Total number of reviews for each room type.
reviews_in_each_type_list=df.groupby('room_type')['number_of_reviews'].sum()
reviews_in_each_type_list.sort_values()
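Summing reviews by room type gives totals rather than a per-listing distribution. One way to look at the distribution itself is to combine summary statistics with bucketed counts. This is a minimal sketch on made-up review counts; the bin edges and labels are arbitrary assumptions chosen for illustration.

```python
import pandas as pd

# Toy review counts for illustration only; not taken from the Airbnb dataset.
reviews = pd.Series([0, 0, 1, 3, 8, 12, 45, 120], name='number_of_reviews')

# Summary statistics: count, mean, standard deviation, quartiles, min and max.
print(reviews.describe())

# Bucket listings by review count; the bin edges are arbitrary assumptions.
bins = [-1, 0, 10, 50, float('inf')]
labels = ['0', '1-10', '11-50', '51+']
buckets = pd.cut(reviews, bins=bins, labels=labels)

# Count listings per bucket, ordered from the lowest bucket to the highest.
bucket_counts = buckets.value_counts().sort_index()
print(bucket_counts)
```

In the notebook, `reviews` would be replaced by `df['number_of_reviews']`.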

In [None]:
# 5. How does the distribution of prices vary across different neighbourhoods and room types?
df.groupby(['neighbourhood','room_type'])['price'].max()

In [None]:
df.groupby(['neighbourhood','room_type'])['price'].min()
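The separate `max()` and `min()` calls can be combined into a single grouped aggregation. The following is a minimal sketch on made-up rows that reuse the column names `neighbourhood`, `room_type`, and `price`; the values are illustrative only.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the Airbnb dataset.
toy = pd.DataFrame({
    'neighbourhood': ['Astoria', 'Astoria', 'Astoria', 'Harlem', 'Harlem'],
    'room_type': ['Private room', 'Private room', 'Entire home/apt',
                  'Private room', 'Entire home/apt'],
    'price': [50, 70, 150, 60, 200],
})

# One pass computes the minimum, maximum, and mean price per group.
price_stats = (
    toy.groupby(['neighbourhood', 'room_type'])['price']
       .agg(['min', 'max', 'mean'])
)
print(price_stats)
```

Keeping the three statistics side by side makes it easier to spot groups where the maximum is far above the mean, which usually points to a few extreme listings.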

In [None]:
# 6. In which quarter, and in which locations, were bookings higher than usual?
# Define a function that maps a 'YYYY-MM-DD' date string to its quarter.
def quarter(string):
    month = string[5:7]
    if month in ('01', '02', '03'):
        return 'Q1'
    elif month in ('04', '05', '06'):
        return 'Q2'
    elif month in ('07', '08', '09'):
        return 'Q3'
    elif month in ('10', '11', '12'):
        return 'Q4'
    else:
        return 'no quarter'

# Keep only the rows that have a last_review date.
# copy() avoids a SettingWithCopyWarning when the new column is added.
no_na = df.loc[~df['last_review'].isna()].copy()
# Create a new column named quarter.
no_na['quarter'] = no_na['last_review'].apply(quarter)
no_na




In [None]:
no_na.groupby(['neighbourhood','quarter'])['number_of_reviews'].sum()
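The string-slicing approach used for the `quarter` column assumes that `last_review` is always formatted as `YYYY-MM-DD`. An alternative sketch parses the date first and derives the quarter arithmetically, so a malformed value raises an error instead of being silently mislabelled. `quarter_from_date` is a hypothetical helper name and the sample dates are made up.

```python
from datetime import date


def quarter_from_date(text: str) -> str:
    """Return 'Q1'..'Q4' for an ISO date string such as '2019-05-21'."""
    month = date.fromisoformat(text).month
    # Months 1-3 map to 1, 4-6 to 2, 7-9 to 3, and 10-12 to 4.
    return f'Q{(month - 1) // 3 + 1}'


# Made-up sample dates for illustration only.
samples = ['2019-01-15', '2019-05-21', '2018-09-30', '2018-12-01']
print([quarter_from_date(s) for s in samples])
```

If the column is first converted with `pd.to_datetime`, the pandas `.dt.quarter` accessor returns the same quarter number directly.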

In [None]:
# 7. How many hosts are present in each neighbourhood?
# count() counts the listings that have a non-null host_name in each neighbourhood.
hosts_in_each_neighbourhood=df.groupby('neighbourhood')['host_name'].count()
hosts_in_each_neighbourhood.sort_values(ascending=False)
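Because `count()` on `host_name` counts rows, a host with several listings is counted once per listing. If distinct hosts are wanted, `nunique()` on a host identifier is one option. The sketch below uses made-up rows and assumes the dataset has a `host_id` column that identifies each host.

```python
import pandas as pd

# Toy rows for illustration only; host 1 owns two listings in Astoria.
toy = pd.DataFrame({
    'neighbourhood': ['Astoria', 'Astoria', 'Astoria', 'Harlem'],
    'host_id': [1, 1, 2, 3],  # assumed identifier column
    'host_name': ['Ana', 'Ana', 'Ben', 'Cy'],
})

# Listings per neighbourhood (rows with a non-null host_name).
listing_counts = toy.groupby('neighbourhood')['host_name'].count()
# Distinct hosts per neighbourhood.
host_counts = toy.groupby('neighbourhood')['host_id'].nunique()

print(listing_counts)
print(host_counts)
```

The two results differ whenever a host has more than one listing in the same neighbourhood, so it is worth stating which of the two a chart is showing.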

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.rcParams['figure.figsize'] = (25, 5)
df.groupby('neighbourhood')['price'].max().plot(kind='bar')
plt.title('Maximum price in each neighbourhood')
plt.xlabel('Neighbourhood',{'fontsize':20,'fontweight':15})
plt.ylabel('Maximum price',{'fontsize':20,'fontweight':15})

##### 1. Why did you pick the specific chart?

A bar chart makes it easy to compare the maximum listing price across neighbourhoods.

##### 2. What is/are the insight(s) found from the chart?

In the neighbourhoods of Astoria, Greenpoint, and Upper West Side, the maximum price is 10000, and so on.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The chart shows the highest price reached in each location. Based on this analysis, a client can decide where to invest.


#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Set the figure size before plotting so that it applies to this figure.
plt.rcParams['figure.figsize'] = (25, 5)
df.groupby('neighbourhood')['host_name'].count().plot(kind='bar')
plt.title('Number of hosts in each neighbourhood')
plt.xlabel('Neighbourhood',{'fontsize':20,'fontweight':15})
plt.ylabel('Host Count',{'fontsize':20,'fontweight':15})

##### 1. Why did you pick the specific chart?

A bar chart shows which neighbourhoods have the most hosts.

##### 2. What is/are the insight(s) found from the chart?

The Williamsburg neighbourhood has the most hosts.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

This analysis helps in choosing the neighbourhood in which to start a business, based on the number of hosts.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.rcParams['figure.figsize'] = (25, 5)
df.groupby('room_type')['number_of_reviews'].sum().plot(kind='bar')
plt.title('Total number of reviews in each room type')
plt.xlabel('Room type',{'fontsize':20,'fontweight':15})
plt.ylabel('Number of reviews',{'fontsize':20,'fontweight':15})


##### 1. Why did you pick the specific chart?

A bar chart shows which room type has received the most reviews.

##### 2. What is/are the insight(s) found from the chart?

Entire home/apt has the highest number of reviews.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

This room type has the most reviews, which suggests that it is in high demand.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Set the figure size before plotting so that it applies to this figure.
plt.rcParams['figure.figsize'] = (25,15)
no_na.groupby(['neighbourhood','quarter'])['number_of_reviews'].sum().unstack().plot(kind='bar')
plt.title('Relation between neighbourhood, quarter, and number of reviews')
plt.xlabel('Neighbourhood',{'fontsize':20,'fontweight':15})
plt.ylabel('Number of reviews',{'fontsize':20,'fontweight':15})

##### 1. Why did you pick the specific chart?

A grouped bar chart shows the number of reviews in each neighbourhood for each quarter.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Anyone planning to add more properties should consult this chart, because it gives useful information about both quarter and neighbourhood.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Set the figure size before plotting so that it applies to this figure.
plt.rcParams['figure.figsize'] = (10,5)
df.groupby('room_type')['price'].max().plot(kind='bar')
plt.title('Maximum price of each room type')
plt.xlabel('Room type',{'fontsize':20,'fontweight':5})
plt.ylabel('Maximum price',{'fontsize':20,'fontweight':5})

##### 1. Why did you pick the specific chart?

A bar chart shows the maximum price of each room type.

##### 2. What is/are the insight(s) found from the chart?

The maximum prices of Entire home/apt and Private room are the same: 10000.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Knowing the maximum price of each room type helps a host avoid pricing a listing above the ceiling for that segment.


#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Compute the average price here so the chart does not depend on a reused variable.
plt.rcParams['figure.figsize'] = (10,5)
df.groupby('room_type')['price'].mean().plot(kind='bar')
plt.title('Average price of each room type')
plt.xlabel('Room type',{'fontsize':20,'fontweight':5})
plt.ylabel('Average price',{'fontsize':20,'fontweight':5})

##### 1. Why did you pick the specific chart?

A bar chart shows the average price for each room type.

##### 2. What is/are the insight(s) found from the chart?

The average price of Entire home/apt is the highest.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code
df.groupby('neighbourhood_group')['host_name'].count().plot(kind='pie')
plt.title('Number of hosts in each neighbourhood group')

##### 1. Why did you pick the specific chart?

A pie chart shows which neighbourhood group has the largest share of hosts.

##### 2. What is/are the insight(s) found from the chart?

Manhattan has the most hosts.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. A newcomer should consider the Manhattan neighbourhood group for more business.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
df.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack().plot(kind='bar')
plt.title('Average price of each type of property in each neighbourhood group')
plt.xlabel('Neighbourhood group')
plt.ylabel('Average price')

##### 1. Why did you pick the specific chart?

A grouped bar chart shows the average price of each room type in each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

The Manhattan neighbourhood group has the highest average price for all room types.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Judged by average price, Manhattan is the best neighbourhood group for doing business.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
no_na.groupby(['neighbourhood_group','quarter'])['number_of_reviews'].sum().unstack().plot(kind='bar')
plt.title('Review count per quarter in each neighbourhood group')
plt.ylabel('Number of reviews')

##### 1. Why did you pick the specific chart?

A grouped bar chart shows the review count per quarter in each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

Brooklyn has the most reviews in the 2nd quarter.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

This suggests that more bookings happen in the 2nd quarter of each year in the Brooklyn area.

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***