<a href="https://colab.research.google.com/github/vishwapv/hotel-booking-analysis/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel booking analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**  
The dataset contains booking data for two hotels: a resort hotel and a city hotel.
Both hotels share the same structure, with 31 variables describing the 40,060
observations of the resort hotel and the 79,330 observations of the city hotel. Each
observation represents a hotel booking. The dataset covers bookings due to
arrive between 1 July 2015 and 31 August 2017, including bookings
that effectively arrived and bookings that were canceled. Since this is real hotel data,
all data elements pertaining to hotel or customer identification were deleted.

The problem statement was to identify what impacts booking cancellation, which
countries most guests come from, who made the bookings, and whether customers
repeat their bookings.

The first step in the analysis was an initial look at the data, checking for
missing (null) values and handling them.

The second step involved analyzing the numerical features with visualization
techniques such as heatmaps, distplots, bar graphs, boxplots and pie charts,
finding the correlation between variables, and identifying the
features that had an impact on booking cancellations.

The third step involved analyzing categorical variables such as hotel,
arrival_date_month, country, reserved_room_type, reservation_status, deposit_type,
distribution_channel and market_segment, and looking for any underlying pattern that
affects the rate of cancellations.

The final step was to note down the insights developed during the analysis of the
data. Some of the observations drawn were: the cancellation rate rises with lead time,
the cancellation rate also rises with ADR, bookings under the non-refund
deposit policy show a higher cancellation rate, the majority of guests are from Western
Europe, most bookings were made by couples, and the majority of customers are not repeat guests.
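
The cancellation-rate comparisons mentioned in the summary can be sketched as a grouped mean of the `is_canceled` flag. The example below is a minimal illustration on a small made-up frame, not the project's actual code or results. The toy values and the lead-time bin edges are assumptions chosen only for demonstration; the column names follow the dataset.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "lead_time":    [2, 5, 40, 60, 200, 300, 10, 250],
    "deposit_type": ["No Deposit", "No Deposit", "No Deposit", "Non Refund",
                     "Non Refund", "Non Refund", "Refundable", "No Deposit"],
    "is_canceled":  [0, 0, 0, 1, 1, 1, 0, 1],
})

# The mean of a 0/1 flag is the share of canceled bookings in each group
rate_by_deposit = toy.groupby("deposit_type")["is_canceled"].mean()

# Bucket lead time into hypothetical bins (in days) before grouping
bins = [0, 30, 90, 365]
labels = ["0-30", "31-90", "91-365"]
toy["lead_time_bin"] = pd.cut(toy["lead_time"], bins=bins, labels=labels,
                              include_lowest=True)
rate_by_lead_time = toy.groupby("lead_time_bin", observed=True)["is_canceled"].mean()

print(rate_by_deposit)
print(rate_by_lead_time)
```

A grouped rate like this describes an association in the data; on its own it does not show that a longer lead time or a given deposit policy causes cancellations.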



# **GitHub Link -**  https://github.com/vishwapv


# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**


Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions!
This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data.
Explore and analyze the data to discover important factors that govern the bookings.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/module/data visualization/Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head(10)


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])
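
As a small illustration of what the duplicate count measures, the sketch below uses a made-up frame (the values are assumptions, not taken from the dataset). It shows that `duplicated()` flags only the repeats of an earlier row, and that `drop_duplicates()` keeps the first occurrence by default.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "hotel":     ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 3, 25],
    "adr":       [80.0, 80.0, 120.0, 95.0],
})

# duplicated() marks each row that repeats an earlier row;
# the first occurrence itself is not flagged
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each distinct row
deduplicated = toy.drop_duplicates()

print(n_duplicates, deduplicated.shape)
```

Whether duplicates should actually be dropped is a judgment call here: because identifying fields were removed from the data, two identical rows may be two separate bookings rather than a recording error.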

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

In [None]:
df.head(10)

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
len(df.columns)

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description 

- **hotel** - Type of hotel (resort hotel or city hotel).
- **is_canceled** - 1 if the booking was canceled, 0 if it was not.
- **lead_time** - Time between booking and check-in.
- **arrival_date_year** - Year in which the customer arrived at the hotel.
- **arrival_date_month** - Month in which the customer arrived at the hotel.
- **arrival_date_week_number** - Week of the year in which the customer arrived at the hotel.
- **arrival_date_day_of_month** - Day of the month on which the customer arrived at the hotel.
- **stays_in_weekend_nights** - Number of weekend nights the customer stayed.
- **stays_in_week_nights** - Number of week nights the customer stayed.
- **adults** - Number of adults.
- **deposit_type** - Indicates whether the customer made a deposit to guarantee the booking. Three categories: No Deposit, Non Refund, Refundable.
- **adr** - Average daily rate, defined as the average rental revenue earned for an occupied room per day.
- **required_car_parking_spaces** - Number of car parking spaces required by the customer.
- **previous_cancellations** - Number of previous bookings that were canceled by the customer prior to the current booking.
- **reserved_room_type** - Code of the room type reserved.
- **reservation_status_date** - Date at which the last status was set. This variable can be used together with reservation_status to understand when the booking was canceled or when the customer checked out of the hotel.



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
  print(f"No. of unique values in {col} is {df[col].nunique()}.")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Checking for total null values in each column 
df.isnull().sum().sort_values(ascending=False)

In [None]:
# Percentage of null values in each column

print(100*(df.isnull().sum()/len(df.index)).sort_values(ascending=False))

In [None]:
# Plotting heatmap of null values

fig,axes = plt.subplots(1,1,figsize=(20,10))
sns.heatmap(df.isna())
plt.show()

**Note:**

- The columns that contain null values are agent, company, children and country.

- Approximately 94% of the company column and 13% of the agent column are null.
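
The counts and percentages in the note can be read side by side from a single table. The sketch below shows one way to build such a table, using a made-up frame (the values are assumptions for illustration, so the percentages differ from those of the real dataset).

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "agent":   [9.0, np.nan, 240.0, np.nan],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "country": ["PRT", "GBR", None, "PRT"],
    "adults":  [2, 2, 1, 2],
})

# isnull().mean() is the fraction of nulls per column, so * 100 gives a percentage
null_summary = pd.DataFrame({
    "null_count":   toy.isnull().sum(),
    "null_percent": (toy.isnull().mean() * 100).round(2),
}).sort_values("null_percent", ascending=False)

# Keep only the columns that actually contain nulls
null_summary = null_summary[null_summary["null_count"] > 0]

print(null_summary)
```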

In [None]:
# creating a copy of data set
hotel_df=df.copy()

In [None]:
# Replacing null values of column Agent and Company with 0
hotel_df[['agent', 'company']] = hotel_df[['agent', 'company']].fillna(0.0)

**Let's check hotel_df by plotting a heatmap**


In [None]:
# checking through heat map
fig,axes = plt.subplots(1,1,figsize=(20,10))
sns.heatmap(hotel_df.isna())
plt.show()

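**The heatmap shows that agent and company no longer contain null values. The remaining nulls in children and country may be too sparse to be visible at this scale and still need to be handled.**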

In [None]:
# Replace missing values in 'children' with the rounded mean, since the column holds a count of children

hotel_df['children'] = hotel_df['children'].fillna(round(hotel_df['children'].mean()))

In [None]:
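# Replace missing values in 'country' with the mode (the most frequent country).
# mode() returns a Series, so take its first element to get the scalar value.
hotel_df['country'] = hotel_df['country'].fillna(hotel_df['country'].mode()[0])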
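
When filling a categorical column with its mode, it helps to remember that `Series.mode()` returns a Series rather than a scalar, because several values can tie for most frequent. The sketch below, on a made-up Series (the values are assumptions for illustration), contrasts rendering that Series with `to_string()` against selecting its first element.

```python
import pandas as pd

# Toy column, made up for illustration only
country = pd.Series(["PRT", "GBR", "PRT", None], name="country")

# mode() returns a Series of the most frequent value(s); nulls are ignored by default
mode_series = country.mode()

# to_string() renders the whole Series, index label included,
# so the result is not the bare country code
rendered = mode_series.to_string()

# Selecting the first element gives the scalar most frequent value
most_common = mode_series.iloc[0]

# Fill the missing entries with that scalar
filled = country.fillna(most_common)

print(repr(rendered), most_common, filled.tolist())
```

Filling with the rendered string would put a value into the column that matches no real country code, which would then show up as a spurious extra category in any country-level chart.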

In [None]:
# Drop rows where adults, babies and children sum to 0 (bookings with no guests)

hotel_df = hotel_df.drop(hotel_df[(hotel_df.adults + hotel_df.babies + hotel_df.children) == 0].index)

In [None]:
# lets check the shape of the dataframe
hotel_df.shape
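
The zero-guest filter can also be written as a boolean mask, which avoids building an index of rows to drop. The sketch below is an equivalent formulation on a made-up frame (the values are assumptions for illustration).

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "adults":   [2, 0, 1, 0],
    "children": [0.0, 0.0, 2.0, 1.0],
    "babies":   [0, 0, 0, 0],
})

# Total number of guests per booking
total_guests = toy["adults"] + toy["children"] + toy["babies"]

# Keep every row whose guest total is not 0; this is the complement of
# dropping the rows where the total equals 0
cleaned = toy[total_guests != 0].reset_index(drop=True)

print(toy.shape, cleaned.shape)
```

Comparing the shape before and after the filter gives the number of removed rows, which is a quick sanity check that only the intended bookings were dropped.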

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
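
One possible starting point for this chart is the correlation matrix itself, computed on the numeric columns only. The sketch below is a minimal illustration on a made-up frame, not the project's actual chart or results. The toy values are assumptions, and only the column names follow the dataset.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "lead_time":   [2, 5, 40, 60, 200, 300],
    "adr":         [60.0, 75.0, 90.0, 110.0, 95.0, 130.0],
    "is_canceled": [0, 0, 0, 1, 1, 1],
    "hotel":       ["City Hotel"] * 3 + ["Resort Hotel"] * 3,
})

# Pearson correlation is only defined for numeric columns, so select those first
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()

# Correlation of each numeric feature with the cancellation flag, strongest first
corr_with_target = (
    corr["is_canceled"].drop("is_canceled").sort_values(ascending=False)
)

print(corr.round(2))
print(corr_with_target.round(2))
```

In the notebook, a matrix like `corr` could then be passed to `sns.heatmap(corr, annot=True, cmap="coolwarm")` to draw the heatmap, with `annot=True` printing the coefficient in each cell.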

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective? 
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***