<a href="https://colab.research.google.com/github/vibha-sanghani/EDA-AirBnb-Bookings-Analysis/blob/main/Vibha_EDA_Airbnb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AirBnb Bookings Analysis


##### **Project Type**    - Exploratory Data Analysis
##### **Contribution**    - Individual


# **Project Summary -**

This project focuses on analyzing a dataset comprising approximately 49,000 observations. The primary objective is to uncover valuable insights that can guide management and stakeholders in making informed decisions to drive business growth. By identifying trends and patterns within the data, actionable recommendations can be formulated to improve and expand the business. This analysis benefits both guests and hosts by providing insights to help guests make informed choices and guiding hosts on necessary improvements to enhance their offerings and achieve sustainable growth. Key information provided includes:

*   Listing Counts
*   Data Distributed as per specific neighborhood groups
*   Prices
*   Reviews data
*   Room Type Preference

Using this information, we can look for insights such as:

*   Guest preferences among hosts
*   Room-type preference
*   Preferred price range
*   Most preferred neighborhood

This project involves diving deep into a rich dataset of nearly 49,000 observations to uncover actionable insights that drive growth and innovation. The ultimate goal is to empower stakeholders with a clear understanding of the trends and patterns shaping the business landscape. By analyzing this data, we aim to create a personalized experience for our guests and provide meaningful guidance to our hosts, enabling them to cater to specific needs more effectively.

One key outcome will be the development of a dynamic filtering system that allows guests to discover listings tailored to their budget and preferences. Simultaneously, hosts will be ranked based on their ability to meet guest requirements, fostering healthy competition and encouraging improvements. This dual-sided approach ensures an enhanced guest experience and actionable insights for hosts to refine their offerings, ultimately driving satisfaction and loyalty.

From a technical perspective, data wrangling will play a critical role, with Pandas as the backbone for cleaning, organizing, and structuring the dataset. For numerical computations and ranking algorithms, NumPy will enable efficient array operations and seamless handling of numerical data. To communicate our findings effectively, we will leverage the power of Matplotlib and Seaborn to craft visually compelling stories from the data, ensuring stakeholders grasp insights effortlessly.

Beyond the technical scope, this project is a valuable learning experience. By simulating a real-world business scenario, I will gain insights into the operational models of this field, from problem identification to crafting practical solutions. Developing a strategic mindset, mastering the utilization of Python libraries, and applying complex concepts creatively will be essential milestones.

Ultimately, this project is not just about data analysis; it is about transforming raw numbers into meaningful narratives that guide decision-making and fuel growth. It's an opportunity to polish my critical thinking, enhance my storytelling with data, and prepare for solving real-world business challenges effectively.





# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The task of this project is to derive insights from the given dataset that stakeholders can use to improve the business.

#### **Define Your Business Objective?**

The business objective of this project is to identify opportunities for improvement and to derive patterns and insights about customer preferences, which will ultimately help us achieve customer satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import math
import seaborn as sns
import openpyxl
pd.set_option('display.max_columns', 200)

### Dataset Loading

In [None]:
# Load the dataset from Google Drive (requires Google Colab).
# Local alternative: df_airbnb = pd.read_csv("Airbnb NYC 2019.csv")
from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/My Drive/AlmaBetter Project/M2-Project/Airbnb NYC 2019.csv'

df_airbnb = pd.read_csv(file_path)

### Dataset First View

In [None]:
# Dataset First Look
df_airbnb.head(5)

Above is an initial glimpse of the dataset we will be working with. Let us now delve deeper into the data to explore its structure and uncover meaningful insights.



In [None]:
#check for all the columns we have in our dataset.
df_airbnb.columns

Here is a comprehensive overview of all the columns present in our dataset.

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df_airbnb.shape

The shape shows that the dataset has far more rows than columns:

* Rows: 48,895
* Columns: 16

### Dataset Information

In [None]:
# Dataset Info
df_airbnb.info()


The dataset demonstrates a clear division between numerical and categorical variables, summarized as follows:
* **Numerical Data:**
  *   3 columns with float64 data type
  *   7 columns with int64 data type
* **Categorical Data:**
  * 6 columns with object data type.
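
The dtype breakdown above can also be computed rather than read off the `info()` output. The following is a minimal sketch on a small stand-in frame (`toy` is hypothetical, not the Airbnb data); in the notebook the same calls would be applied to `df_airbnb`.

```python
import pandas as pd

# Hypothetical stand-in frame; replace with df_airbnb in the notebook.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [100, 250, 80],
    "latitude": [40.7, 40.6, 40.8],
    "room_type": ["Private room", "Entire home/apt", "Shared room"],
})

# Number of columns per dtype.
dtype_counts = toy.dtypes.astype(str).value_counts()

# Split column names into numerical and non-numerical groups.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(dtype_counts)
print("Numerical:", numeric_cols)
print("Categorical:", categorical_cols)
```

Keeping the two column lists around is convenient later, for example when restricting a correlation matrix to numerical features.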



#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df_airbnb.duplicated().sum()

From this analysis, we observe that there are no exact duplicate values present in the dataset.

In [None]:
# Converting the 'last_review' column in a datetime format.
df_airbnb['last_review'] = pd.to_datetime(df_airbnb['last_review'],errors='coerce')
df_airbnb.info()

In [None]:
# Now let us sort the data by the last_review column
df_airbnb = df_airbnb.sort_values(by='last_review', ascending=False).reset_index(drop=True)
df_airbnb

As we can see, there are some missing (NaT) values in the date column. Let's fill them with the latest date in our dataframe.

In [None]:
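# Replacing NA values with the most recent review date.
# Assign the result back instead of calling replace(..., inplace=True) on a column,
# which may not update the DataFrame in recent pandas versions.
df_airbnb['last_review'] = df_airbnb['last_review'].fillna(df_airbnb['last_review'].max())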

# Let's sort the values again
df_airbnb = df_airbnb.sort_values(by='last_review', ascending=False).reset_index(drop=True)
df_airbnb

In [None]:
# Dropping duplicated values
df_airbnb = df_airbnb.drop_duplicates(subset=['name', 'host_name', 'neighbourhood_group', 'neighbourhood', 'room_type'], keep='first').reset_index(drop=True)

df_airbnb.shape # (48662, 16)

# Let's check if there are any duplicated values now
df_airbnb[df_airbnb.duplicated(subset=['name', 'host_name', 'neighbourhood_group', 'neighbourhood', 'room_type'])] # No values
df_airbnb.duplicated().sum() # 0 duplicate

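We have dropped listings that repeat the same name, host, neighbourhood group, neighbourhood, and room type. Because the data was sorted by last_review in descending order, keep='first' retains the most recently reviewed row of each group.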
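
To illustrate why sorting before dropping duplicates keeps the latest row, here is a small sketch on made-up data (the `toy` frame and its values are hypothetical and use only a subset of the key columns).

```python
import pandas as pd

# Hypothetical toy listings: the first two rows describe the same listing.
toy = pd.DataFrame({
    "name": ["Cozy loft", "Cozy loft", "Sunny room"],
    "host_name": ["Ana", "Ana", "Ben"],
    "last_review": pd.to_datetime(["2018-05-01", "2019-06-15", "2019-01-10"]),
})

# Sort newest first, then keep the first row of each duplicate group.
latest = (
    toy.sort_values("last_review", ascending=False)
       .drop_duplicates(subset=["name", "host_name"], keep="first")
       .reset_index(drop=True)
)
print(latest)
```

If the frame were not sorted first, `keep='first'` would simply keep whichever duplicate happened to appear earlier in the file.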

In [None]:
# Missing Values/Null Values Count
null_values_count = df_airbnb.isnull().sum()

# Visualizing the missing values
null_values_count


As we can see, there are missing values in the name, host_name, and reviews_per_month columns.
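
Raw counts are easier to judge next to the share of rows they represent. The sketch below builds such a summary on a hypothetical `toy` frame; the column names mirror the dataset, but the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame; replace with df_airbnb in the notebook.
toy = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "host_name": ["x", "y", None, "z"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
    "price": [100, 80, 120, 60],
})

missing = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(toy) * 100).round(2),
})

# Keep only the columns that actually have gaps, largest first.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```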

In [None]:
# To check the outliers we need to check the columns which are having numerical data.
df_airbnb.describe()

### What did you know about your dataset?


The Airbnb NYC 2019 dataset contains 48,895 rows and 16 columns. It includes both numerical and categorical data, with 3 columns of float64, 7 columns of int64, and 6 columns of object data types. The primary key is the "id" column, which uniquely identifies each listing.

The columns in the dataset are as follows:
['id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude', 'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']

This dataset had no exact duplicate rows. We also used the 'last_review' column as the timestamp for our dataset.

The dataset had missing values primarily in last_review and reviews_per_month (9989 missing entries). For missing last_review, we replaced the NA values with the most recent date available. Additionally, there were a few missing values in the name (16) and host_name (21) columns.

After addressing these issues, the dataframe is now cleaned and ready for data wrangling.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df_airbnb.head()

In [None]:
# Dataset Describe
df_describe = df_airbnb.describe()
df_describe # These are all the numerical variables in our dataset.

### Variables Description

The summary statistics above help derive key insights. For instance, while the price range is broad, the mean listing price is around 150, which gives a rough indication of typical guest budgets.
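
Because the price range is so broad, the mean can be pulled upward by a few very expensive listings. A quick way to check this is to compare it with the median and flag values above the usual 1.5 x IQR fence. The sketch below uses invented prices, not the dataset's.

```python
import pandas as pd

# Hypothetical prices with one extreme value.
prices = pd.Series([50, 80, 100, 120, 150, 200, 10000], name="price")

mean_price = prices.mean()
median_price = prices.median()

# Interquartile range and the conventional upper outlier fence.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
outliers = prices[prices > upper_fence]

print(f"mean={mean_price:.2f}, median={median_price:.2f}, upper fence={upper_fence:.2f}")
print(outliers.tolist())
```

When the mean sits far above the median, as it does here, the median is usually the safer description of a "typical" price.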

### Check Unique Values for each variable.

In [None]:
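# Check Unique Values for each variable.
def unique_value(df):
  """Print the number of distinct values and the distinct values of every column."""
  for column in df.columns:
    unique_values = df[column].unique()
    print(f"Unique values for '{column}' ({len(unique_values)}): {unique_values}")

unique_value(df_airbnb)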

## 3. ***Data Wrangling***

### Data Wrangling Code

With data cleaning, data transformation, and an initial check for outliers complete, we now move on to feature engineering.

In [None]:
# Let's create a new column with the price range distribution.
# We will use it while working with the price column.
start = 0
end = 10000
# 101 edges give 100 equal-width ranges: 0, 100, 200, ..., 10000
breakpoints = np.linspace(start, end, num=101).astype(int)

def price_range(amt):
    """Return the first 100-wide range that contains amt, e.g. 150 -> '100 - 200'.

    A price on a boundary (e.g. 100) falls in the lower range ('0 - 100').
    Prices outside 0-10000 return None.
    """
    for lower, upper in zip(breakpoints[:-1], breakpoints[1:]):
        if lower <= amt <= upper:
            return f"{lower} - {upper}"
    return None


df_airbnb['price_range'] = df_airbnb['price'].apply(price_range)
df_airbnb['price_range']

Creating a price range column offers several advantages that enhance data analysis and decision-making:

* Data Summarization: By grouping prices into ranges, we gain a clearer and more concise summary of the dataset's price distribution.

* Improved Visualization: Representing data through price ranges makes visualizations more effective and easier to interpret compared to analyzing individual price points.

* Segmentation and Analysis: Dividing prices into ranges enables straightforward comparisons between different segments, making it easier to identify patterns and trends.

* Informed Decision-Making: Price ranges support better decision-making by providing insights that can guide recommendations based on customer budgets and requirements.

* Insight Communication: Using price ranges simplifies the process of communicating customer preferences and purchasing power. It also highlights where the majority of customers fall within the price spectrum, aiding in targeted strategies.
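
As an alternative to a row-by-row function, the same kind of price buckets can be produced in one vectorized call with `pd.cut`. This is a sketch on invented prices, not the notebook's method; the bin edges and label format are chosen to mirror the 100-wide ranges described above.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, including values that sit exactly on bin edges.
prices = pd.Series([0, 99, 100, 150, 10000], name="price")

# Edges 0, 100, 200, ..., 10000 give 100 equal-width bins.
edges = np.arange(0, 10001, 100)
labels = [f"{lo} - {hi}" for lo, hi in zip(edges[:-1], edges[1:])]

# Bins are closed on the right; include_lowest=True also puts a price of 0
# into the first bin instead of leaving it unassigned.
price_range = pd.cut(prices, bins=edges, labels=labels, include_lowest=True)
print(price_range.astype(str).tolist())
```

The result is a categorical column whose categories keep their numeric order, which is handy when plotting counts per range.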

In [None]:
# Let's take a glance at our dataset once.
df_airbnb.head()

# As we can see our dataset is sorted according to the last review, let us sort our dataset according to the price.
df_airbnb.sort_values(by='price', ascending=False, inplace=True)

# Let's take a look at our price sorted dataset once.
df_airbnb.head()

In [None]:
# Check our columns and filter those columns that will work as our features.
df_airbnb.columns

In [None]:
# Let's filter the dataframe with only the required columns.
df_feature = df_airbnb[['id', 'name', 'host_id', 'neighbourhood_group',
       'neighbourhood', 'room_type', 'price','last_review', 'price_range','minimum_nights',
       'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', ]]
df_feature=df_feature.reset_index(drop=True)
df_feature.head()

This dataset contains several distinct groups. Let's explore these groups further to uncover meaningful insights by analyzing the information specific to each group.

In [None]:
# Let's check the top performing hosts as per the total listing count.
# Group by 'host_id': the 'name' column holds the listing title, not the host.
host_groups = df_feature.groupby('host_id')
hosts = []
no_of_listings = []
host_prices = []
for host, data in host_groups:
  hosts.append(host)
  no_of_listings.append(len(data))  # one row per listing
  host_prices.append(data['price'].mean())

host_df = pd.DataFrame({
    'Host ID': hosts,
    'Total Listings': no_of_listings,
    'Price': host_prices
})
host_df = host_df.sort_values(by='Total Listings', ascending=False).reset_index(drop=True)
top_10_hosts = host_df.head(10)

# Rough revenue proxy: listing count x average nightly price (not actual bookings).
host_df['Revenue'] = (host_df['Total Listings'])*(host_df['Price'])
top_host_revenue = host_df.sort_values(by='Revenue', ascending=False).reset_index(drop=True)
top_host_revenue.head(10)

In [None]:
# First we check how many neighbourhood_groups are there.
df_feature['neighbourhood_group'].unique() # There are 5 different neighbourhood groups.
# ['Queens', 'Manhattan', 'Brooklyn', 'Staten Island', 'Bronx']

n_groups=df_feature.groupby('neighbourhood_group')
n_groups

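# Let's check which group is most preferred as per the listings and no. of reviews.
# Note: 'Listing Count' below sums calculated_host_listings_count, so a host with
# several listings is counted once per listing; it is not the number of rows.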
groups = [] # To save the groups
listings = []
reviews = []
max_price = []
min_price = []
for group, data in n_groups:
  groups.append(group)
  listings.append(data['calculated_host_listings_count'].sum())
  reviews.append(data['number_of_reviews'].sum())
  max_price.append(data['price'].max())
  min_price.append(data['price'].min())

df_neighbour_group = pd.DataFrame({
    'Group': groups,
    'Listing Count': listings,
    'No._of_reviews': reviews,
    'Min Price': min_price,
    'Max Price': max_price
})

df_neighbour_group

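Here we can see that the Manhattan group ranks highest on Listing Count, while the Brooklyn group has received the most reviews. Note that Listing Count here is the sum of calculated_host_listings_count, which counts a multi-listing host's portfolio once per listing, so it overstates the actual number of listings.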
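
Since summing `calculated_host_listings_count` is not the same as counting listings, it can help to put both measures side by side. The sketch below does this with named aggregation on a hypothetical `toy` frame; the output column names (`listings`, `summed_host_counts`, and so on) are illustrative choices.

```python
import pandas as pd

# Hypothetical stand-in for df_feature.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "calculated_host_listings_count": [3, 3, 3, 1, 1],
    "number_of_reviews": [10, 0, 25, 40, 5],
    "price": [200, 150, 90, 120, 60],
})

group_summary = (
    toy.groupby("neighbourhood_group")
       .agg(
           listings=("id", "count"),  # actual number of rows per group
           summed_host_counts=("calculated_host_listings_count", "sum"),
           total_reviews=("number_of_reviews", "sum"),
           min_price=("price", "min"),
           max_price=("price", "max"),
       )
       .sort_values("listings", ascending=False)
)
print(group_summary)
```

In this toy example Manhattan has 2 listings but a summed host count of 6, which shows how far the two measures can drift apart.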


In [None]:
# Now let's divide our dataset as per the room types, and create their groups.
# Check how many room types there are.

df_feature['room_type'].unique() # There are 3 room types.
# ['Entire home/apt', 'Private room', 'Shared room']

# Let's group our dataset according to the room types.
room_groups = df_feature.groupby('room_type')
rooms = []
room_listings = []
room_reviews = []
max_room_price = []
min_room_price = []
for room_type, room_data in room_groups:
    rooms.append(room_type)
    room_listings.append(room_data['calculated_host_listings_count'].sum())
    room_reviews.append(room_data['number_of_reviews'].sum())  # total reviews per room type
    max_room_price.append(room_data['price'].max())
    min_room_price.append(room_data['price'].min())

df_room_group = pd.DataFrame({
    'Group': rooms,
    'Listing Count': room_listings,
    'No._of_reviews': room_reviews,
    'Min Price': min_room_price,
    'Max Price': max_room_price
})

df_room_group


It is evident that the most preferred room type is the "Entire home/apt," which also receives the highest number of reviews. On the other hand, "Shared rooms" are the least preferred option.
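
A direct way to check room-type preference is to look at each type's share of the rows. This is a minimal sketch on invented values; in the notebook the series would be `df_feature['room_type']`.

```python
import pandas as pd

# Hypothetical room types for six listings.
room_type = pd.Series(
    ["Entire home/apt", "Private room", "Entire home/apt",
     "Shared room", "Private room", "Entire home/apt"],
    name="room_type",
)

# Share of listings per room type, in percent.
share = room_type.value_counts(normalize=True).mul(100).round(1)
print(share)
```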

In [None]:
# Now let us check how many different neighbourhoods are there in total.
# nunique() counts distinct values; count() would only return the number of non-null rows.
df_feature['neighbourhood'].nunique()

In [None]:
# Now let us check the top 10 most preferred neighbourhoods.
# Also, let's check their average price.
area_groups = df_feature.groupby('neighbourhood')
areas = []
listing_count = []
avg_price = []
for area, n_data in area_groups:
  areas.append(area)
  listing_count.append(n_data['calculated_host_listings_count'].sum())
  avg_price.append(round(n_data['price'].mean(), 2))

df_area = pd.DataFrame({
    'Area': areas,
    'Listing Count': listing_count,
    'Average Price': avg_price,
})

df_area = df_area.sort_values(by='Listing Count', ascending=False).reset_index(drop=True).head(10)
df_area

In [None]:
# To analyze the relationship between price and room type,
# we'll examine the price distribution for each room type.
# Using the grouped DataFrame room_groups,
# we calculate the average price for each room type as follows:

# Calculate the average price for each room type
avg_prices = [data['price'].mean() for room, data in room_groups]

# Create a new DataFrame to store the results
room_vs_price = pd.DataFrame({
    'Room Type': rooms,
    'Avg Price': avg_prices
})

room_vs_price

As observed, the "Entire home/apt" has the highest average pricing compared to other room types.

In [None]:
# Relationship: Reviews per Month vs. Room Type
total_room =[data['reviews_per_month'].sum() for room, data in room_groups]
room_vs_reviews = pd.DataFrame({
    'Room Type': rooms,
    'Total Reviews': total_room
})

room_vs_reviews

By analyzing the room_vs_reviews DataFrame, we can see the total number of reviews per month for each room type. As observed, the "Entire home/apt" has the highest number of reviews.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code : Line Chart
# Total number_of_reviews grouped by the year of each listing's last review (last 10 years)


# Filter reviews from the last 10 years
current_date = pd.Timestamp.now()
cutoff_date = current_date - pd.DateOffset(years=10)
# .copy() so that adding a column below does not trigger SettingWithCopyWarning
df_last_10_years = df_feature[df_feature['last_review'] >= cutoff_date].copy()

# Extract the year from the 'last_review' column
df_last_10_years['review_year'] = df_last_10_years['last_review'].dt.year

# Count the number of reviews per year
# reviews_per_year = df_last_10_years['review_year'].value_counts().sort_index()

# Group by year and sum 'number_of_reviews'
reviews_per_year=df_last_10_years.groupby('review_year')['number_of_reviews'].sum()

# Plot the results in line graph
plt.figure(figsize=(10, 6))
# reviews_per_year.plot(kind='bar', color='skyblue', edgecolor='black')
sns.lineplot(x=reviews_per_year.index, y=reviews_per_year.values, marker='o', color='skyblue')
plt.title('Number of Reviews in the Last 10 Years', fontsize=12, fontweight='bold')
plt.xlabel('Year', fontsize=10)
plt.ylabel('Number of Reviews', fontsize=10)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()


##### 1. Why did you pick the specific chart?

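A line chart is a good choice for visualizing trends over time. Here it shows the total number_of_reviews of listings grouped by the year of their last review over the past 10 years. Because each listing's full review count is attributed to the year of its most recent review, this is a proxy for review activity rather than a true yearly count.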

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the number of reviews has fluctuated over the past 10 years, but there is no clear upward or downward trend.
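
One caveat with a "last 10 years" window is that it moves with the date the notebook is run. A possible variation, sketched below on invented data, anchors the window to the newest review in the data instead. The `toy` frame is hypothetical and this anchoring is an assumption, not what the chart above uses.

```python
import pandas as pd

# Hypothetical stand-in for df_feature.
toy = pd.DataFrame({
    "last_review": pd.to_datetime(["2008-03-01", "2017-07-04", "2019-01-15", "2019-07-08"]),
    "number_of_reviews": [2, 30, 12, 8],
})

# Anchor the window to the newest review rather than today's date,
# so the result does not depend on when the code is executed.
latest = toy["last_review"].max()
cutoff = latest - pd.DateOffset(years=10)

# .copy() gives an independent frame, so adding a column is safe.
recent = toy[toy["last_review"] >= cutoff].copy()
recent["review_year"] = recent["last_review"].dt.year

# Sum of number_of_reviews per year of last review.
reviews_per_year = recent.groupby("review_year")["number_of_reviews"].sum()
print(reviews_per_year)
```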

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from this chart can help create a positive business impact. For example, if the business can identify what is driving the fluctuations in the number of reviews, they can take steps to increase the number of positive reviews. However, there is no evidence to suggest that the current number of reviews is leading to negative growth.

#### Chart - 2

In [None]:
# Chart - 2 visualization code : Bar Chart
# Now that we have the price ranges, let us check the neighbourhoods with the highest Listing Count.

# Sort the data by 'Listing Count' in descending order and take the top 10 rows
df_top_10 = df_area.sort_values(by='Listing Count', ascending=False).head(10)
# Set up the figure
plt.figure(figsize=(17, 5))
# Define the color palette
# colors = ['red', 'blue', 'green', 'orange', 'purple', 'pink', 'cyan', 'brown', 'yellow', 'gray']
sns.barplot(data=df_top_10,x='Area',y='Listing Count',hue='Area',palette='viridis',legend=False)
plt.xlabel('Area')
plt.ylabel('Listing Count')
plt.title('Top 10 preferred neighbourhood areas',fontdict={'fontweight': 'bold', 'fontsize': 12})
# Add data labels
for x,y in zip(df_top_10['Area'],df_top_10['Listing Count']):
    plt.text(x,y,y,ha='center',va='bottom',fontsize=10)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a good choice to compare categorical data. In this case, the chart is used to compare the number of listings in different neighborhoods.



##### 2. What is/are the insight(s) found from the chart?

The chart ranks the top 10 neighborhoods by Listing Count, which here is the sum of calculated_host_listings_count rather than a count of rows. On this measure the Financial District leads with nearly 85,000, while Tribeca, tenth in the list, has 7,519.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from this chart can help create a positive business impact. For example, businesses can focus their marketing efforts on the neighborhoods that are most popular with renters. Additionally, businesses can use this information to decide where to open new locations.

#### Chart - 3

In [None]:
# Chart - 3 visualization code : Pie Chart
# Visualize the most preferred room type using a pie chart
# Set up the figure
plt.figure(figsize=(8, 4))
# colors = ['#b5651d', '#8b4513', '#deb887']
colors = ['#2e8b57', '#87ceeb', '#f0e68c']
plt.pie(df_room_group['Listing Count'], labels=df_room_group['Group'], autopct='%1.1f%%', explode=(0.1, 0, 0.1), shadow=True, colors=colors,
    startangle=360)
plt.title('Most Preferred Room Type', fontsize=12, fontweight='bold')
plt.axis('equal')
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart was used because it shows each room type's share of the total listing count. The percentage breakdown makes it easy to see at a glance which room types are most and least popular.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that entire homes/apartments are the most preferred room type, accounting for 77.6% of the total Listing Count (the sum of calculated_host_listings_count). Private rooms are second at 20.8%, and shared rooms are last at 1.6%.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive Impact: Knowing which room types guests prefer helps decide where to focus marketing and how to shift inventory toward popular choices such as 'Entire home/apt'.

* Negative Growth: Offering too many 'Shared room' listings, or too few rooms at the right price, could hurt the business because many rooms would likely remain empty.

* Justification: When business goals are misaligned with customers' needs, resources are distributed inefficiently, which leads to losses.


#### Chart - 4

In [None]:
# Chart - 4 visualization code : Bar Chart
# Compare the neighbourhood groups by Listing Count

# Sort the data by 'Listing Count' in descending order
df_sort = df_neighbour_group.sort_values(by='Listing Count', ascending=False)
# Set up the figure
plt.figure(figsize=(10, 5))
# Define the color palette
colors = ['blue', 'green', 'purple', 'yellow', 'brown']

sns.barplot(data=df_sort,x='Group',y='Listing Count',hue='Group',palette=colors,legend=False)
plt.xlabel('Neighbourhood Group')
plt.ylabel('Listing Count')
plt.title('Most Preferred Neighbourhood Groups',fontdict={'fontweight': 'bold', 'fontsize': 12})
for x,y in zip(df_sort['Group'],df_sort['Listing Count']):
    plt.text(x,y,y,ha='center',va='bottom')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are well suited to comparing categorical data. This chart compares the Listing Count across the five neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

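Insights from the chart:
* Manhattan has the highest Listing Count (around 450,000).
* Staten Island has the lowest (around 850).

Listing Count here is the sum of calculated_host_listings_count, so these figures exceed the number of rows in the dataset.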

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact: By understanding which neighborhoods have the most listings, businesses can target their marketing efforts to those areas. This can help them to reach a wider audience and increase their sales.

Negative business impact: If a business focuses only on the neighborhoods with the most listings, they may miss out on potential customers in other neighborhoods. It is important to consider other factors, such as demographics and psychographics, when targeting customers.

#### Chart - 5

In [None]:
# Chart - 5 visualization code : Scatter Plot # number_of_reviews vs reviews_per_month
sns.scatterplot(data=df_feature, x='number_of_reviews',y='reviews_per_month',color='dodgerblue',alpha=0.8, )

# Add labels, title, and legend
plt.title('Number of Reviews vs Reviews Per Month', fontsize=12, fontweight='bold')
plt.xlabel('Number of Reviews', fontsize=10)
plt.ylabel('Reviews Per Month', fontsize=10)
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot is the best choice to visualize the relationship between two continuous variables (number of reviews and reviews per month).

##### 2. What is/are the insight(s) found from the chart?

There is a positive relationship between the number of reviews and reviews per month: listings with more total reviews tend to receive more reviews per month.
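
The strength of such a relationship can be quantified rather than judged by eye. The sketch below computes a Pearson coefficient and a rank-based (Spearman-style) coefficient on invented values; computing Spearman as Pearson on ranks is a choice made here to keep the example dependent on pandas only.

```python
import pandas as pd

# Hypothetical values for five listings.
toy = pd.DataFrame({
    "number_of_reviews": [0, 5, 20, 60, 150],
    "reviews_per_month": [0.0, 0.4, 1.1, 2.5, 4.0],
})

# Pearson measures linear association.
pearson = toy["number_of_reviews"].corr(toy["reviews_per_month"])

# Spearman is Pearson applied to the ranks; it measures monotonic association.
spearman = toy["number_of_reviews"].rank().corr(toy["reviews_per_month"].rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

A rank-based coefficient is less sensitive to a handful of listings with very high review counts, which a scatter plot of this data is likely to contain.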

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. By comparing review activity across listings, we can make specific recommendations to hosts about changes that would increase their reach to customers. This could be offered as a premium service for hosts.

#### Chart - 6 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(8, 4))
df_heatmap = df_feature[['price', 'minimum_nights', 'number_of_reviews',
             'reviews_per_month', 'calculated_host_listings_count', 'availability_365']].corr()
# define the mask to set the values in the upper triangle to True
# mask = np.triu(np.ones_like(df_heatmap))
heatmap1 = sns.heatmap(data=df_heatmap, annot=True)
heatmap1.set_title('Correlation Heatmap', fontdict={'fontsize':12,'fontweight': 'bold'}, pad=16)
plt.show()


##### 1. Why did you pick the specific chart?

The correlation heatmap was chosen because it makes the pairwise relationships among the numerical features easy to see. It shows how strongly each pair of variables is related, which helps in understanding the data and choosing features for further analysis.

##### 2. What is/are the insight(s) found from the chart?

The following conclusions can be drawn from the chart:

* Features with strong positive or negative correlations are candidates for further study (for example, features closely related to price may be useful for prediction).
* Very weak correlations indicate features that are unlikely to move together, which also helps with feature selection.
* For instance, the correlations of availability_365 and number_of_reviews with price can be checked for trends that inform business strategy.
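
A correlation matrix is symmetric, so half of a heatmap repeats the other half. One common refinement is to mask the upper triangle. The sketch below builds such a mask on a hypothetical `toy` frame; the plotting call is left as a comment so the example runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the numerical columns of df_feature.
toy = pd.DataFrame({
    "price": [100, 250, 80, 300, 120],
    "minimum_nights": [1, 3, 2, 30, 5],
    "number_of_reviews": [50, 10, 80, 0, 25],
})
corr = toy.corr()

# True strictly above the diagonal: those cells mirror the lower triangle.
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

# Blank out the masked cells to inspect the remaining values as a table.
lower = corr.mask(mask)
print(lower.round(2))

# In the notebook the same mask could be passed to seaborn, for example:
# sns.heatmap(corr, mask=mask, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colour scale comparable between different heatmaps.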

#### Chart - 7 - Pair Plot

In [None]:
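# Pair Plot visualization code
# Restrict to the numerical features of interest; 'palette' is dropped because it has no effect without 'hue'.
sns.pairplot(df_feature, vars=['price', 'minimum_nights', 'number_of_reviews',
             'reviews_per_month', 'calculated_host_listings_count', 'availability_365'])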


##### 1. Why did you pick the specific chart?

The pair plot shows the distribution of each numerical variable and gives an overview of the relationships between pairs of variables.

##### 2. What is/are the insight(s) found from the chart?

Yes, we can see that the vast majority of listings fall within the price range of 0-2500.
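
A visual impression like this can be backed by a number. The following sketch computes the share of listings at or below a price threshold; the prices are invented, and the 2500 threshold is simply taken from the observation above.

```python
import pandas as pd

# Hypothetical prices; in the notebook this would be df_feature['price'].
prices = pd.Series([60, 90, 150, 220, 400, 2600, 9999], name="price")

threshold = 2500
# The mean of a boolean series is the fraction of True values.
share_below = (prices <= threshold).mean()

print(f"{share_below:.1%} of listings are priced at or below {threshold}")
```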

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

The exploratory data analysis provides clear insights into customer preferences and demand patterns. Approximately 90% of our business activity is concentrated on specific areas, whether in terms of regional preferences or pricing. This saturation highlights what resonates with our customers. By leveraging these findings, we can strategically replicate successful elements across other areas to align with customer expectations and drive growth.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***

In [None]:
# df_airbnb