<a href="https://colab.research.google.com/github/tuhanren/Airbnb-Data-Analysis/blob/main/Final_Project_Group15.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Project Objective:

The goal of this project is to help new Airbnb hosts estimate a reasonable nightly **rental price** for their property. Using historical data from Airbnb listings, we will build predictive models that consider factors such as **location (latitude and longitude)**, **availability**, and possibly **house rules** (about 50k values are missing, so this column may not be usable) to forecast the most suitable rental price for new listings.

Dataset: https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata/data

Data dictionary: https://docs.google.com/spreadsheets/d/1b_dvmyhb_kAJhUmv81rAxl4KcXn0Pymz


Key Analysis Steps:

1. Data Cleaning: Clean the dataset by handling missing values, fixing inconsistencies, and addressing any outliers.
2. Descriptive Analytics: Explore the overall distribution of rental prices, relationships between location and price, popular room types and other descriptive insights.
3. Diagnostic Analytics: Perform correlation analysis and regression to find which factors (location, room type, availability, etc.) most influence rental prices.
4. Predictive Analytics: Build models that predict rental prices based on property characteristics and availability.

Practical Use Case:
New hosts will be able to input their property’s details, such as location, room type, availability, etc. Our model will then predict a reasonable nightly rental price based on similar listings from the historical data, helping the host price their property competitively.

Feasibility:
This project is feasible using Python libraries like pandas for data handling, scikit-learn and keras for modeling, and matplotlib for visualization. The final result will not only provide price recommendations for new Airbnb hosts but also reveal market trends and insights.

In [None]:
# Import the 'drive' module from the 'google.colab' package to enable Google Drive integration.
# Then, mount Google Drive to the '/drive' directory within the Colab environment.
# The 'force_remount=True' parameter ensures that the Drive is remounted even if it was previously mounted.

from google.colab import drive
drive.mount('/drive', force_remount=True)

# Change the current working directory to the specified folder within Google Drive,
# where you can save and load your Colab notebooks or files.
%cd '/drive/MyDrive/Colab Notebooks/INF1340/group project/'

Mounted at /drive
[Errno 2] No such file or directory: '/drive/MyDrive/Colab Notebooks/INF1340/group project/'
/content


In [231]:
import pandas as pd

def read_csv(uri: str) -> pd.DataFrame:
  """Read a CSV file from the given URI and return a pandas DataFrame.

  Args:
    uri: The URI of the CSV file to read.

  Returns:
    A pandas DataFrame containing the data from the CSV file.
  """

  try:
    return pd.read_csv(uri)
  except FileNotFoundError as ex:
    print(f'Error! File Not Found! uri={uri}')
    raise ex

def columns_snakecase(dataFrame: pd.DataFrame) -> None:
  """
  Convert column names in a pandas DataFrame to lowercase and
  replace all spaces with underscores e.g. 'My Column Name' becomes 'my_column_name'.

  Args:
    dataFrame: The pandas DataFrame whose column names need to be converted.
  """

  dataFrame.columns = dataFrame.columns.str.lower().str.replace(' ', '_')

def columns_drop(dataFrame: pd.DataFrame, columns: list) -> None:
  """Drop the specified columns from a pandas DataFrame.

  Args:
    dataFrame: The pandas DataFrame from which columns need to be dropped.
    columns: A list of column names to be dropped from the DataFrame.
  """
  dataFrame.drop(columns, axis=1, inplace=True)

def columns_drop_by_null_percentage(
    dataFrame: pd.DataFrame,
    percentage_threshold: float
) -> None:
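  """
  Drop every column whose share of missing values exceeds the given threshold.

  Args:
    dataFrame: The pandas DataFrame from which columns need to be dropped.
    percentage_threshold: The maximum allowed fraction of missing values,
      between 0 and 1 (e.g. 0.15 for 15%).
  """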

  columns = dataFrame.columns[dataFrame.isnull().mean() > percentage_threshold]
  print(f"Droping: {columns}")
  columns_drop(dataFrame, columns)

def columns_fill_null(dataFrame: pd.DataFrame, columns: list, value: any) -> None:
  """
  Fill missing values in the specified columns of a pandas DataFrame with a given value.

  Args:
    dataFrame: The pandas DataFrame in which missing values need to be filled.
    columns: A list of column names whose missing values need to be filled.
    value: The value to fill missing values with.
  """

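  for column in columns:
    # Assign the result back; calling fillna(..., inplace=True) on a column
    # selection is chained assignment and will stop working in pandas 3.0.
    dataFrame[column] = dataFrame[column].fillna(value)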

def columns_dollarize(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Convert values in the specified columns of a pandas DataFrame from string to float
  by removing dollar signs and commas.

  Args:
    dataFrame: The pandas DataFrame in which values need to be converted.
    columns: A list of column names whose values need to be converted.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].replace(r'[\$,]', '', regex=True).astype(float)

def rows_drop_by_condition(dataFrame: pd.DataFrame, condition: any) -> None:
  """
  Drop rows from a pandas DataFrame based on a given condition.

  Args:
    dataFrame: The pandas DataFrame from which rows need to be dropped.
    condition: A pandas DataFrame condition to filter rows.
  """

  dataFrame.drop(dataFrame[condition].index, inplace=True)

def rows_drop_by_null(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Drop rows from a pandas DataFrame that contain missing values in the specified columns.

  Args:
    dataFrame: The pandas DataFrame from which rows need to be dropped.
    columns: A list of column names whose rows need to be dropped.
  """
  dataFrame.dropna(subset=columns, inplace=True)

def columns_lowercase(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Convert the string values in the specified columns of a pandas DataFrame to lowercase.

  Args:
    dataFrame: The pandas DataFrame whose column values need to be converted.
    columns: A list of column names whose string values are to be lowercased.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].str.lower()

def columns_categorize(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Convert the specified columns of a pandas DataFrame to categorical data type.

  Args:
    dataFrame: The pandas DataFrame whose columns need to be converted.
    columns: A list of column names to be converted to categorical data type.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].astype('category')

def columns_boolize(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Convert the specified columns of a pandas DataFrame to boolean data type.

  Args:
    dataFrame: The pandas DataFrame whose columns need to be converted.
    columns: A list of column names to be converted to boolean data type.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].astype(bool)

def columns_intize(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Convert the specified columns of a pandas DataFrame to integer data type.

  Args:
    dataFrame: The pandas DataFrame whose columns need to be converted.
    columns: A list of column names to be converted to integer data type.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].astype(int)

def columns_floatize(dataFrame: pd.DataFrame, columns: list) -> None:
  """
  Convert the specified columns of a pandas DataFrame to float data type.
  Args:
    dataFrame: The pandas DataFrame whose columns need to be converted.
    columns: A list of column names to be converted to float data type.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].astype(float)

def apply_lambda(dataFrame: pd.DataFrame, columns: list, fn: any) -> None:
  """
  Apply a lambda function to the specified columns of a pandas DataFrame.

  Args:
    dataFrame: The pandas DataFrame on which the lambda function needs to be applied.
    columns: A list of column names whose values need to be transformed.
    fn: The function (typically a lambda) to apply to each value in the specified columns.
  """

  for column in columns:
    dataFrame[column] = dataFrame[column].apply(fn)

In [232]:
# Call read_csv() to import csv file.
df = read_csv('Airbnb_Open_Data.csv')
df.head(5)



Unnamed: 0,id,NAME,host id,host_identity_verified,host name,neighbourhood group,neighbourhood,lat,long,country,...,service fee,minimum nights,number of reviews,last review,reviews per month,review rate number,calculated host listings count,availability 365,house_rules,license
0,1001254,Clean & quiet apt home by the park,80014485718,unconfirmed,Madaline,Brooklyn,Kensington,40.64749,-73.97237,United States,...,$193,10.0,9.0,10/19/2021,0.21,4.0,6.0,286.0,Clean up and treat the home the way you'd like...,
1,1002102,Skylit Midtown Castle,52335172823,verified,Jenna,Manhattan,Midtown,40.75362,-73.98377,United States,...,$28,30.0,45.0,5/21/2022,0.38,4.0,2.0,228.0,Pet friendly but please confirm with me if the...,
2,1002403,THE VILLAGE OF HARLEM....NEW YORK !,78829239556,,Elise,Manhattan,Harlem,40.80902,-73.9419,United States,...,$124,3.0,0.0,,,5.0,1.0,352.0,"I encourage you to use my kitchen, cooking and...",
3,1002755,,85098326012,unconfirmed,Garry,Brooklyn,Clinton Hill,40.68514,-73.95976,United States,...,$74,30.0,270.0,7/5/2019,4.64,4.0,1.0,322.0,,
4,1003689,Entire Apt: Spacious Studio/Loft by central park,92037596077,verified,Lyndon,Manhattan,East Harlem,40.79851,-73.94399,United States,...,$41,10.0,9.0,11/19/2018,0.1,3.0,1.0,289.0,"Please no smoking in the house, porch or on th...",


Rename all columns to lowercase snake_case.

In [233]:
columns_snakecase(df)
df.columns

Index(['id', 'name', 'host_id', 'host_identity_verified', 'host_name',
       'neighbourhood_group', 'neighbourhood', 'lat', 'long', 'country',
       'country_code', 'instant_bookable', 'cancellation_policy', 'room_type',
       'construction_year', 'price', 'service_fee', 'minimum_nights',
       'number_of_reviews', 'last_review', 'reviews_per_month',
       'review_rate_number', 'calculated_host_listings_count',
       'availability_365', 'house_rules', 'license'],
      dtype='object')
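As a small illustration of what the renaming step does, here is a minimal sketch on a toy DataFrame. The frame `toy` and its three columns are made up for this example; only the lowercase-then-underscore transformation mirrors `columns_snakecase()`.

```python
import pandas as pd

# Toy frame with column names in the same style as the raw dataset (hypothetical data).
toy = pd.DataFrame({"host id": [1], "NAME": ["a"], "availability 365": [10]})

# Same transformation as columns_snakecase(): lowercase, then spaces -> underscores.
toy.columns = toy.columns.str.lower().str.replace(" ", "_")

snake_columns = list(toy.columns)
print(snake_columns)  # ['host_id', 'name', 'availability_365']
```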

Drop columns whose missing-value rate exceeds 15%.

In [234]:
columns_drop_by_null_percentage(df, 0.15)

Droping: Index(['last_review', 'reviews_per_month', 'house_rules', 'license'], dtype='object')
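The threshold rule can be seen on a toy example. This is a sketch with invented data (the frame `toy` is hypothetical); it uses the same `isnull().mean()` test and the same 0.15 threshold as the cell above.

```python
import numpy as np
import pandas as pd

# Ten rows of made-up data with different missing rates per column.
toy = pd.DataFrame({
    "price": [float(p) for p in range(100, 110)],  # 0% missing
    "lat": [40.7] * 9 + [np.nan],                  # 10% missing -> kept
    "license": [np.nan] * 9 + ["x"],               # 90% missing -> dropped
})

# Fraction of missing values per column.
null_share = toy.isnull().mean()

# Columns above the 15% threshold are removed.
to_drop = list(null_share[null_share > 0.15].index)
trimmed = toy.drop(columns=to_drop)

print(to_drop)                # ['license']
print(list(trimmed.columns))  # ['price', 'lat']
```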


Drop irrelevant columns

In [235]:
irrelevant_to_drop = ['name', 'host_id', 'country', 'country_code',
                   'host_name', 'calculated_host_listings_count']
columns_drop(df, irrelevant_to_drop)

In [203]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 102599 entries, 0 to 102598
Data columns (total 16 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   id                      102599 non-null  int64  
 1   host_identity_verified  102310 non-null  object 
 2   neighbourhood_group     102570 non-null  object 
 3   neighbourhood           102583 non-null  object 
 4   lat                     102591 non-null  float64
 5   long                    102591 non-null  float64
 6   instant_bookable        102494 non-null  object 
 7   cancellation_policy     102523 non-null  object 
 8   room_type               102599 non-null  object 
 9   construction_year       102385 non-null  float64
 10  price                   102352 non-null  object 
 11  service_fee             102326 non-null  object 
 12  minimum_nights          102190 non-null  float64
 13  number_of_reviews       102416 non-null  float64
 14  review_rate_number  

In [236]:
columns_dollarize(df, ['price', 'service_fee'])
columns_fill_null(df, ['price'], df.groupby(['neighbourhood', 'room_type'])['price'].transform('mean'))
columns_fill_null(df, ['service_fee', 'price', 'minimum_nights', 'number_of_reviews',
                       'review_rate_number', 'availability_365'], 0)
columns_fill_null(df, ['host_identity_verified'], 'unconfirmed')
columns_fill_null(df, ['cancellation_policy'], 'strict')
columns_fill_null(df, ['instant_bookable'], False)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  dataFrame[column].fillna(value, inplace=True)
  dataFrame[column].fillna(value, inplace=True)
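The warning above comes from calling `fillna(..., inplace=True)` on a column selection. A minimal sketch of the assignment form that pandas suggests, combined with the dollar-sign cleanup and the group-mean fill used in this step, might look as follows. The frame `toy` and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical listings: two known prices and two missing ones.
toy = pd.DataFrame({
    "neighbourhood": ["harlem", "harlem", "harlem", "midtown"],
    "room_type": ["private room"] * 4,
    "price": ["$1,000", "$500", np.nan, np.nan],
})

# Strip "$" and "," and cast to float; missing values stay NaN.
toy["price"] = toy["price"].replace(r"[\$,]", "", regex=True).astype(float)

# Mean price of each (neighbourhood, room_type) group, aligned to the rows.
group_mean = toy.groupby(["neighbourhood", "room_type"])["price"].transform("mean")

# Assign the result back instead of using inplace=True on a column selection.
toy["price"] = toy["price"].fillna(group_mean)

# A group with no known price at all still has NaN; the notebook fills those with 0.
toy["price"] = toy["price"].fillna(0)

prices = toy["price"].tolist()
print(prices)  # [1000.0, 500.0, 750.0, 0.0]
```

The third row receives the Harlem private-room mean (750), while the Midtown row has no group mean to borrow and falls back to 0.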


Drop rows that have null values in the location, `construction_year`, or `room_type` columns. This reduces the dataset from 102,599 to 102,338 rows.

In [237]:
rows_drop_by_null(df, ['lat', 'long', 'neighbourhood_group', 'neighbourhood', 'construction_year', 'room_type'])
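A short sketch of how `dropna(subset=...)` behaves, using a made-up frame `toy`: only nulls in the listed columns cause a row to be dropped, and the surviving rows keep their original index labels, which is why `df.info()` later reports an index running up to 102598.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one missing latitude, one missing service fee.
toy = pd.DataFrame({
    "lat": [40.6, np.nan, 40.8],
    "long": [-73.9, -73.9, -74.0],
    "service_fee": [np.nan, 20.0, 30.0],
})

rows_before = len(toy)
toy = toy.dropna(subset=["lat", "long"])  # service_fee is not checked here
rows_dropped = rows_before - len(toy)

print(rows_dropped)       # 1
print(list(toy.index))    # [0, 2]
```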

In [238]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 102338 entries, 0 to 102598
Data columns (total 16 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   id                      102338 non-null  int64  
 1   host_identity_verified  102338 non-null  object 
 2   neighbourhood_group     102338 non-null  object 
 3   neighbourhood           102338 non-null  object 
 4   lat                     102338 non-null  float64
 5   long                    102338 non-null  float64
 6   instant_bookable        102338 non-null  bool   
 7   cancellation_policy     102338 non-null  object 
 8   room_type               102338 non-null  object 
 9   construction_year       102338 non-null  float64
 10  price                   102338 non-null  float64
 11  service_fee             102338 non-null  float64
 12  minimum_nights          102338 non-null  float64
 13  number_of_reviews       102338 non-null  float64
 14  review_rate_number      1

Lowercase the values of the categorical columns.

In [239]:
columns_lowercase(df, ['host_identity_verified', 'neighbourhood_group',
                       'neighbourhood', 'cancellation_policy', 'room_type'])

In [240]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 102338 entries, 0 to 102598
Data columns (total 16 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   id                      102338 non-null  int64  
 1   host_identity_verified  102338 non-null  object 
 2   neighbourhood_group     102338 non-null  object 
 3   neighbourhood           102338 non-null  object 
 4   lat                     102338 non-null  float64
 5   long                    102338 non-null  float64
 6   instant_bookable        102338 non-null  bool   
 7   cancellation_policy     102338 non-null  object 
 8   room_type               102338 non-null  object 
 9   construction_year       102338 non-null  float64
 10  price                   102338 non-null  float64
 11  service_fee             102338 non-null  float64
 12  minimum_nights          102338 non-null  float64
 13  number_of_reviews       102338 non-null  float64
 14  review_rate_number      1

Fix inconsistent spellings in `neighbourhood_group`.

In [241]:
# print all category
df['neighbourhood_group'].value_counts().sort_index()

neighbourhood_group,count
bronx,2709
brookln,1
brooklyn,41735
manhatan,1
manhattan,43690
queens,13248
staten island,954


In [242]:
df.loc[df['neighbourhood_group'] == 'manhatan', "neighbourhood_group"] = 'manhattan'
df.loc[df['neighbourhood_group'] == 'brookln', "neighbourhood_group"] = 'brooklyn'
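The two `.loc` assignments above could also be written as a single mapping. This is an alternative sketch, not the notebook's method; the Series `groups` and the dictionary name `corrections` are made up for the example.

```python
import pandas as pd

# Hypothetical borough values, including the two misspellings seen above.
groups = pd.Series(["brooklyn", "brookln", "manhatan", "manhattan", "queens"])

# Map each misspelling to its corrected form; other values are left untouched.
corrections = {"manhatan": "manhattan", "brookln": "brooklyn"}
fixed = groups.replace(corrections)

counts = fixed.value_counts().sort_index()
print(counts.to_dict())  # {'brooklyn': 2, 'manhattan': 2, 'queens': 1}
```

Keeping the corrections in one dictionary makes it easy to add further misspellings if more turn up.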

In [243]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 102338 entries, 0 to 102598
Data columns (total 16 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   id                      102338 non-null  int64  
 1   host_identity_verified  102338 non-null  object 
 2   neighbourhood_group     102338 non-null  object 
 3   neighbourhood           102338 non-null  object 
 4   lat                     102338 non-null  float64
 5   long                    102338 non-null  float64
 6   instant_bookable        102338 non-null  bool   
 7   cancellation_policy     102338 non-null  object 
 8   room_type               102338 non-null  object 
 9   construction_year       102338 non-null  float64
 10  price                   102338 non-null  float64
 11  service_fee             102338 non-null  float64
 12  minimum_nights          102338 non-null  float64
 13  number_of_reviews       102338 non-null  float64
 14  review_rate_number      1

Several neighbourhood names are very similar, so we checked each group on Google Maps. Listing counts are given in parentheses.

*   bay terrace (8) vs. bay terrace, staten island (4): not the same area
*   chelsea (2281) vs. chelsea, staten island (1): not the same area
*   clifton (39) vs. clinton hill (1137): not the same area
*   concourse (122) vs. concourse village (76): not the same area
*   hollis (44) vs. holliswood (10): not the same area
*   jamaica (618) vs. jamaica estates (50) vs. jamaica hills (21): not the same area
*   kew gardens (83) vs. kew gardens hills (66): not the same area
*   new dorp (4) vs. new dorp beach (12): new dorp includes new dorp beach

Apart from the last pair, we confirmed that these are different areas.
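The candidates for this manual check can be shortlisted automatically. The sketch below is one possible approach, not the procedure used in this notebook: it flags two names when one is a prefix of the other or when their `difflib` similarity ratio reaches a cutoff. The function `similar_pairs`, the cutoff of 0.6, and the short `names` list are assumptions made for illustration.

```python
import difflib

def similar_pairs(names, cutoff=0.6):
    """Return pairs of names that look alike (hypothetical helper)."""
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            # One name extending the other, e.g. 'hollis' / 'holliswood'.
            is_prefix = second.startswith(first) or first.startswith(second)
            # Character-level similarity between 0 and 1.
            ratio = difflib.SequenceMatcher(None, first, second).ratio()
            if is_prefix or ratio >= cutoff:
                pairs.append((first, second))
    return pairs

# A handful of neighbourhood names taken from the list above.
names = ["bay terrace", "bay terrace, staten island", "clifton",
         "clinton hill", "hollis", "holliswood", "astoria"]

pairs = similar_pairs(names)
for first, second in pairs:
    print(f"{first!r} ~ {second!r}")
```

Every flagged pair still needs a manual look, since similar spelling says nothing about whether two names refer to the same place.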

In [244]:
df['neighbourhood'].value_counts().sort_index().to_string()

"neighbourhood\nallerton                        96\narden heights                    9\narrochar                        52\narverne                        223\nastoria                       1872\nbath beach                      48\nbattery park city              118\nbay ridge                      304\nbay terrace                      8\nbay terrace, staten island       4\nbaychester                      29\nbayside                        124\nbayswater                       40\nbedford-stuyvesant            7918\nbelle harbor                    31\nbellerose                       26\nbelmont                         45\nbensonhurst                    157\nbergen beach                    30\nboerum hill                    357\nborough park                   268\nbreezy point                     9\nbriarwood                      121\nbrighton beach                 167\nbronxdale                       48\nbrooklyn heights               308\nbrownsville                    153\nbull's head 

In [245]:
df['host_identity_verified'].value_counts().sort_index()

host_identity_verified,count
unconfirmed,51333
verified,51005


In [246]:
df['cancellation_policy'].value_counts().sort_index()

cancellation_policy,count
flexible,33975
moderate,34265
strict,34098


In [247]:
df['room_type'].value_counts().sort_index()

room_type,count
entire home/apt,53558
hotel room,116
private room,46439
shared room,2225


Cast columns to the appropriate types.

In [248]:
int_columns = ['minimum_nights', 'number_of_reviews', 'review_rate_number', 'availability_365', 'construction_year']
cat_columns = ['host_identity_verified', 'neighbourhood_group', 'neighbourhood', 'instant_bookable', 'cancellation_policy', 'room_type']
float_columns = ['lat','long', 'price', 'service_fee']
columns_intize(df, int_columns)
columns_categorize(df, cat_columns)
columns_floatize(df, float_columns)
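The casts only succeed because the nulls were filled or dropped beforehand. A minimal sketch on made-up data (the frame `toy` is hypothetical) shows the category and integer casts, and the error pandas raises when a float column that still contains NaN is cast to `int`.

```python
import pandas as pd

toy = pd.DataFrame({
    "room_type": ["private room", "entire home/apt", "private room"],
    "minimum_nights": [1.0, 30.0, 3.0],
})

# Text column with few distinct values -> category dtype.
toy["room_type"] = toy["room_type"].astype("category")
# Whole-number floats without NaN -> integer dtype.
toy["minimum_nights"] = toy["minimum_nights"].astype(int)

categories = list(toy["room_type"].cat.categories)
nights_kind = toy["minimum_nights"].dtype.kind  # 'i' means signed integer

# Casting a column that still contains NaN to int raises a ValueError.
try:
    pd.Series([1.0, float("nan")]).astype(int)
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

print(categories, nights_kind, nan_cast_failed)
```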

In [249]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 102338 entries, 0 to 102598
Data columns (total 16 columns):
 #   Column                  Non-Null Count   Dtype   
---  ------                  --------------   -----   
 0   id                      102338 non-null  int64   
 1   host_identity_verified  102338 non-null  category
 2   neighbourhood_group     102338 non-null  category
 3   neighbourhood           102338 non-null  category
 4   lat                     102338 non-null  float64 
 5   long                    102338 non-null  float64 
 6   instant_bookable        102338 non-null  category
 7   cancellation_policy     102338 non-null  category
 8   room_type               102338 non-null  category
 9   construction_year       102338 non-null  int64   
 10  price                   102338 non-null  float64 
 11  service_fee             102338 non-null  float64 
 12  minimum_nights          102338 non-null  int64   
 13  number_of_reviews       102338 non-null  int64   
 14  review_ra

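Fix out-of-range values: drop rows with a price of 0, clamp `availability_365` to the range 0 to 365, and clamp `minimum_nights` to at least 0.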

In [250]:
rows_drop_by_condition(df, df['price'] == 0)
apply_lambda(df, ['availability_365', 'minimum_nights'], lambda x: max(0, x))
apply_lambda(df, ['availability_365'], lambda x: min(365, x))
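The same clamping can be expressed with `Series.clip`, which works on the whole column at once instead of calling a lambda per value. This is an alternative sketch rather than the notebook's code, and the values in `toy` are invented.

```python
import pandas as pd

# Hypothetical out-of-range values of the kind being corrected above.
toy = pd.DataFrame({
    "availability_365": [-10, 120, 3677],
    "minimum_nights": [-5, 3, 400],
})

# Clamp availability to [0, 365]; clamp minimum nights to be non-negative.
toy["availability_365"] = toy["availability_365"].clip(lower=0, upper=365)
toy["minimum_nights"] = toy["minimum_nights"].clip(lower=0)

print(toy["availability_365"].tolist())  # [0, 120, 365]
print(toy["minimum_nights"].tolist())    # [0, 3, 400]
```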

In [251]:
df['availability_365'].describe()

Unnamed: 0,availability_365
count,102338.0
mean,139.651947
std,133.477069
min,0.0
25%,2.0
50%,95.0
75%,268.0
max,365.0


In [252]:
df['minimum_nights'].describe()

Unnamed: 0,minimum_nights
count,102338.0
mean,8.120112
std,30.227049
min,0.0
25%,1.0
50%,3.0
75%,5.0
max,5645.0
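The summary above shows a maximum of 5645 minimum nights against a 75th percentile of 5, which suggests outliers. One common way to flag such values is the 1.5 × IQR rule. The sketch below is an assumption about how that check could be done, not a step this notebook performs, and the Series `nights` holds made-up values.

```python
import pandas as pd

# Hypothetical minimum-night values with one extreme entry.
nights = pd.Series([1, 1, 3, 3, 5, 5, 30, 5645])

# Interquartile range and the upper fence at Q3 + 1.5 * IQR.
q1 = nights.quantile(0.25)
q3 = nights.quantile(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Values above the fence are candidates for review, not automatic removal.
flagged = nights[nights > upper_fence].tolist()
print(upper_fence, flagged)  # 24.375 [30, 5645]
```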


In [254]:
df.describe()

Unnamed: 0,id,lat,long,construction_year,price,service_fee,minimum_nights,number_of_reviews,review_rate_number,availability_365
count,102338.0,102338.0,102338.0,102338.0,102338.0,102338.0,102338.0,102338.0,102338.0,102338.0
mean,29194740.0,40.728091,-73.949631,2012.487287,625.324514,124.707352,8.120112,27.35031,3.26948,139.651947
std,16229790.0,0.055863,0.049544,5.765455,331.28948,66.547533,30.227049,49.351284,1.295229,133.477069
min,1001254.0,40.49979,-74.24984,2003.0,50.0,0.0,0.0,0.0,0.0,0.0
25%,15155270.0,40.68875,-73.98258,2007.0,340.25,67.0,1.0,1.0,2.0,2.0
50%,29192110.0,40.7223,-73.95444,2012.0,625.0,124.5,3.0,7.0,3.0,95.0
75%,43223980.0,40.762757,-73.93234,2017.0,912.0,182.0,5.0,30.0,4.0,268.0
max,57367420.0,40.91697,-73.70522,2022.0,1200.0,240.0,5645.0,1024.0,5.0,365.0


Fill missing values.

In [None]:
# # Handle `price` column
# df_cleaned['price'] = df_cleaned['price'].replace('[\$,]', '', regex=True).astype(float)
# df_cleaned['price'] = df_cleaned['price'].fillna(df_cleaned.groupby(['neighbourhood', 'room_type'])['price'].transform('mean'))
# df_cleaned['price'] = df_cleaned['price'].fillna(df_cleaned['price'].mean())

For the columns below, fill missing values with 0 and cast them to int.

In [None]:
# int_columns = ['minimum_nights', 'number_of_reviews',
#                'review_rate_number', 'availability_365']
# df_cleaned[int_columns] = df_cleaned[int_columns].fillna(0).astype(int)

Fill missing values in some non-numerical columns with 'unknown' or 'unconfirmed'.

In [None]:
# df_cleaned['host_identity_verified'] = df_cleaned['host_identity_verified'].fillna('unconfirmed')
# df_cleaned['cancellation_policy'] = df_cleaned['cancellation_policy'].fillna('unknown')
# df_cleaned['instant_bookable'] = df_cleaned['instant_bookable'].fillna('unknown')

Drop the rows with null values in the columns `lat`, `long`, `neighbourhood_group`, `neighbourhood`, and `construction_year`.


In [None]:
# df_cleaned = df_cleaned.dropna(subset=['lat', 'long', 'neighbourhood_group', 'neighbourhood', 'construction_year'])

Handle the `minimum_nights` and `availability_365` negative values.

In [None]:
# # Handle the `minimum_nights` and `availability_365` columns
# df_cleaned = df_cleaned[df_cleaned['minimum_nights'] >= 0].copy()  # Explicitly create a copy
# df_cleaned.loc[:, 'availability_365'] = df_cleaned['availability_365'].apply(lambda x: 0 if x < 0 else x)

Drop columns that have high missing rates or are not needed for the analysis.

In [None]:
# columns_to_drop = ['house_rules', 'license', 'name', 'host_id', 'country',
#                    'country_code', 'host_name', 'calculated_host_listings_count',
#                    'last_review', 'reviews_per_month']
# df_cleaned = df_cleaned.drop(columns_to_drop, axis=1)

Drop duplicated records, since each row in the dataset should be a distinct rental listing (541 duplicated rows).

In [None]:
# Drop fully duplicated rows from the cleaned DataFrame.
df = df.drop_duplicates()
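Before dropping duplicates it can help to count them first. A minimal sketch on a made-up frame `toy`: `duplicated()` marks every repeat of an earlier row, and `drop_duplicates()` keeps the first occurrence.

```python
import pandas as pd

# Hypothetical listings where the row with id 2 appears twice.
toy = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": [100.0, 250.0, 250.0, 80.0],
})

# Number of rows that repeat an earlier row in every column.
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each duplicated row.
deduped = toy.drop_duplicates()

print(n_duplicates)         # 1
print(list(deduped.index))  # [0, 1, 3]
```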

Convert relevant columns to category type

In [None]:
# categorical_columns = ['host_identity_verified', 'neighbourhood_group', 'neighbourhood',
#                        'instant_bookable', 'cancellation_policy', 'room_type']
# df_cleaned[categorical_columns] = df_cleaned[categorical_columns].astype('category')
# df_cleaned[categorical_columns].info() # check

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_cleaned[categorical_columns] = df_cleaned[categorical_columns].astype('category')


Convert relevant columns to int type

In [None]:
# int_columns = ['construction_year', 'minimum_nights', 'number_of_reviews',
#                'review_rate_number', 'availability_365']
# df_cleaned[int_columns] = df_cleaned[int_columns].astype(int)
# df_cleaned[int_columns].info() # check

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_cleaned[int_columns] = df_cleaned[int_columns].astype(int)


**Dealing with inconsistent data**:<br/>


*   `neighbourhood_group`: Spelling and case differences such as 'manhatan' and 'Manhattan'.
*   `availability_365`: By definition, this is the number of days in the coming 365 days on which the listing is available, as determined by the calendar. It should therefore never exceed 365.



In [None]:
# # Handle the inconsistent values and style in the column
# df_cleaned['neighbourhood_group'] = df_cleaned['neighbourhood_group'].replace({'manhatan': 'Manhattan'}).str.lower().astype('category')
# # Find inconsistent values in `neighbourhood` column
# df_cleaned['neighbourhood'] = df_cleaned['neighbourhood'].str.lower()
# # `availability_365` should not have value over 365
# df_cleaned['availability_365'] = df_cleaned['availability_365'].apply(lambda x: min(x, 365))

