# Airbnb Project

Airbnb is an online marketplace and hospitality service that lets people lease or rent short-term lodging, including vacation rentals, apartment rentals, homestays, hostel beds, and hotel rooms. The company does not own any of the lodging; it acts only as a broker, earning percentage-based service fees (commissions) from both guests & hosts on every booking. This project has 2 sections:

1. [Exploratory Data Analysis](#1)

I want to use a series of questions to guide my exploration of the dataset, picking up trends and insights about the different neighborhoods, their pricing, and their guest satisfaction levels based on reviews.

2. [Application](#2)

I want to create an application that lets customers visualise matching listings on a map based on their conditions for **price**, **overall satisfaction (number of stars)**, & **neighborhood**. Prospective hosts can also use the application to gauge the **price to set** per night for their listing, calculated as the average price of well-reviewed listings (at least 1 review & an overall satisfaction of 4 stars or more) in the same neighborhood.

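```python
# Load the raw listings from the CSV file into a list of dictionaries
import csv

with open("airbnb_data.csv", newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    # each row is a dict keyed by the CSV header, with every value read as a string
    airbnb_data = [dict(row) for row in reader]

# preview the first three listings
print(airbnb_data[:3])
```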

## Data Cleaning

```python
# Convert the numerical columns from strings into appropriate data types
# (counts become int; scores, bedrooms, prices & coordinates become float)
int_columns = ["reviews", "accommodates"]
float_columns = ["overall_satisfaction", "bedrooms", "price", "latitude", "longitude"]

for row in airbnb_data:
    for column in int_columns:
        row[column] = int(row[column])
    for column in float_columns:
        row[column] = float(row[column])

print(airbnb_data[:3])
```
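The cleaning step assumes every numeric field is filled in: `int("")` or `float("")` raises a `ValueError` on a blank cell. If the CSV turns out to contain blanks (an assumption, not something checked in this notebook), a tolerant converter could be used instead. A minimal sketch; `to_number` is a hypothetical helper, not part of the original notebook:

```python
def to_number(value, cast=float, default=None):
    """Convert a CSV value with `cast`, returning `default` for blank or malformed values.

    Hypothetical helper: the original cleaning step calls int()/float() directly.
    """
    if isinstance(value, str):
        value = value.strip()
    try:
        return cast(value)
    except (TypeError, ValueError):
        # e.g. float("") raises ValueError, float(None) raises TypeError
        return default


# made-up row showing a blank bedrooms field
row = {"reviews": "12", "bedrooms": "", "price": "85.0"}
row["reviews"] = to_number(row["reviews"], int, default=0)
row["bedrooms"] = to_number(row["bedrooms"])  # blank becomes None
row["price"] = to_number(row["price"])
print(row)  # {'reviews': 12, 'bedrooms': None, 'price': 85.0}
```

Returning `None` rather than `0` for a blank keeps missing values distinguishable from genuine zeros, at the cost of having to skip `None` in later sums.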

<a id='1'><h2><font color="salmon">&nbsp;1.</font><font color="salmon"> Exploratory Data Analysis </font> </h2></a>

#### Q1. List each neighborhood & its number of listings

```python
# Count the number of listings in each neighborhood
results1 = {}

for row in airbnb_data:
    neighborhood = row["neighborhood"]
    results1[neighborhood] = results1.get(neighborhood, 0) + 1

print(results1)
```
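The same tally can be written with the standard library's `collections.Counter`, which also offers `most_common()` for ranking neighborhoods by listing count. A sketch on a few made-up rows (the neighborhood names are placeholders, not taken from the dataset):

```python
from collections import Counter

# toy rows standing in for airbnb_data (placeholder neighborhood names)
sample_data = [
    {"neighborhood": "A"},
    {"neighborhood": "B"},
    {"neighborhood": "A"},
    {"neighborhood": "C"},
    {"neighborhood": "A"},
    {"neighborhood": "B"},
]

# Counter tallies each neighborhood in a single pass
listing_counts = Counter(row["neighborhood"] for row in sample_data)

print(dict(listing_counts))           # {'A': 3, 'B': 2, 'C': 1}
print(listing_counts.most_common(2))  # [('A', 3), ('B', 2)]
```

With the real data, `Counter(row["neighborhood"] for row in airbnb_data)` should hold the same counts as `results1`.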

#### Q2. List each neighborhood & its average overall_satisfaction (listings with at least 1 review)

```python
# Average overall_satisfaction per neighborhood, counting only listings with
# at least one review (a listing with no reviews has no guest ratings behind its score)
total_overall_satisfaction = {}
total_count = {}

for row in airbnb_data:
    if row["reviews"] > 0:
        neighborhood = row["neighborhood"]
        total_overall_satisfaction[neighborhood] = (
            total_overall_satisfaction.get(neighborhood, 0) + row["overall_satisfaction"]
        )
        total_count[neighborhood] = total_count.get(neighborhood, 0) + 1

results2 = {}

for neighborhood, count in total_count.items():
    results2[neighborhood] = round(total_overall_satisfaction[neighborhood] / count, 2)

print(results2)
```
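To pick out trends, it helps to rank the averages rather than scan an unordered dictionary. A sketch using `sorted()` with a key function; the dictionary below holds placeholder names and values, not figures from the dataset:

```python
# placeholder averages standing in for results2 (not real results)
sample_results2 = {"A": 4.61, "B": 4.25, "C": 4.83, "D": 4.40}

# sort neighborhoods by average score, highest first
ranked = sorted(sample_results2.items(), key=lambda item: item[1], reverse=True)

for neighborhood, score in ranked:
    print(f"{neighborhood}: {score:.2f}")
# C: 4.83
# A: 4.61
# D: 4.40
# B: 4.25
```

The same call on `results3` would rank neighborhoods by average price.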

#### Q3. List each neighborhood & its average price

```python
# Average nightly price per neighborhood, across all listings
total_price = {}
total_count = {}

for row in airbnb_data:
    neighborhood = row["neighborhood"]
    total_price[neighborhood] = total_price.get(neighborhood, 0) + row["price"]
    total_count[neighborhood] = total_count.get(neighborhood, 0) + 1

results3 = {}

for neighborhood, count in total_count.items():
    results3[neighborhood] = round(total_price[neighborhood] / count, 2)

print(results3)
```
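Q2 and Q3 follow the same sum-and-count pattern, differing only in the column being averaged and which rows are included. A minimal sketch of a reusable helper; `group_mean` is a hypothetical name, not part of the original notebook, and the rows are made up:

```python
from collections import defaultdict


def group_mean(rows, group_key, value_key, include=lambda row: True, ndigits=2):
    """Average `value_key` per `group_key`, over the rows for which `include(row)` is true."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if include(row):
            totals[row[group_key]] += row[value_key]
            counts[row[group_key]] += 1
    return {group: round(totals[group] / counts[group], ndigits) for group in counts}


# toy rows (made-up values) to show the two uses
rows = [
    {"neighborhood": "A", "price": 100.0, "reviews": 3, "overall_satisfaction": 4.5},
    {"neighborhood": "A", "price": 60.0, "reviews": 0, "overall_satisfaction": 0.0},
    {"neighborhood": "B", "price": 80.0, "reviews": 1, "overall_satisfaction": 5.0},
]

# Q3-style: average price over all listings
print(group_mean(rows, "neighborhood", "price"))  # {'A': 80.0, 'B': 80.0}

# Q2-style: average satisfaction over reviewed listings only
print(group_mean(rows, "neighborhood", "overall_satisfaction",
                 include=lambda row: row["reviews"] > 0))  # {'A': 4.5, 'B': 5.0}
```

With the real data, `group_mean(airbnb_data, "neighborhood", "price")` should reproduce `results3`.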

#### Q4. Plot the distribution of overall_satisfaction scores

```python
import matplotlib.pyplot as plt

# Count how many reviewed listings received each overall_satisfaction score
score_count = {}

for row in airbnb_data:
    if row["reviews"] > 0:
        score = row["overall_satisfaction"]
        score_count[score] = score_count.get(score, 0) + 1

# sort the scores so the bars appear in ascending order
scores = sorted(score_count)
counts = [score_count[score] for score in scores]

# scores may be only 0.5 apart, so use bars narrower than the default width of 0.8
plt.bar(scores, counts, width=0.4)
plt.title("Distribution of Overall Satisfaction Scores")
plt.xlabel("Overall Satisfaction Score")
plt.ylabel("Count")
plt.show()
```

#### Q5. Plot the locations of all listings in Singapore

```python
import matplotlib.pyplot as plt

# Collect the coordinates of every listing
long_list = [row["longitude"] for row in airbnb_data]
lat_list = [row["latitude"] for row in airbnb_data]

# longitude on the x-axis & latitude on the y-axis, so the plot reads like a map
# (small markers keep dense areas readable)
plt.scatter(long_list, lat_list, s=5)
plt.title("Geographical Representation of all Airbnb Listings in Singapore")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
```

<a id='2'><h2><font color="salmon">&nbsp;2.</font><font color="salmon"> Creating an Airbnb Application for Customers & Hosts </font> </h2></a>

```python
# creating a function to collect the latitudes of the given listing IDs

def get_all_latitude(data, r_listing_id):
    # a set makes each membership check O(1) instead of scanning a list
    wanted_ids = set(r_listing_id)
    return [row["latitude"] for row in data if row["listing_id"] in wanted_ids]
```

```python
# creating a function to collect the longitudes of the given listing IDs

def get_all_longitude(data, r_listing_id):
    # a set makes each membership check O(1) instead of scanning a list
    wanted_ids = set(r_listing_id)
    return [row["longitude"] for row in data if row["listing_id"] in wanted_ids]
```
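The two functions above each walk the data separately, so collecting both coordinates scans `data` twice. A hypothetical combined helper (`get_coordinates` is an illustrative name, not from the original notebook) returns (latitude, longitude) pairs in one pass, which also keeps each latitude paired with its own longitude. The rows below use made-up coordinates:

```python
def get_coordinates(data, listing_ids):
    """Return (latitude, longitude) pairs for the rows whose listing_id is in listing_ids."""
    wanted_ids = set(listing_ids)
    return [
        (row["latitude"], row["longitude"])
        for row in data
        if row["listing_id"] in wanted_ids
    ]


# toy rows with made-up coordinates
sample_data = [
    {"listing_id": "1", "latitude": 1.30, "longitude": 103.80},
    {"listing_id": "2", "latitude": 1.35, "longitude": 103.90},
    {"listing_id": "3", "latitude": 1.28, "longitude": 103.85},
]

pairs = get_coordinates(sample_data, ["1", "3"])
latitudes, longitudes = zip(*pairs)  # split back into two sequences for plotting
print(latitudes, longitudes)  # (1.3, 1.28) (103.8, 103.85)
```

If no listings match, `zip(*pairs)` yields nothing and the two-name unpacking raises `ValueError`, so it is worth checking `pairs` before splitting it.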

```python
# creating a function that recommends listings based on a maximum price,
# a minimum overall satisfaction score & a neighborhood

def listings_recommender(data, r_price, r_overall_satisfaction, r_neighborhood_id):
    results = []

    for row in data:
        # keep listings within budget, rated at or above the threshold,
        # and located in the requested neighborhood
        if (
            row["price"] <= r_price
            and row["overall_satisfaction"] >= r_overall_satisfaction
            and row["neighborhood"] == r_neighborhood_id
        ):
            results.append(row["listing_id"])

    return results
```
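The filter's boundaries are inclusive: a listing priced exactly at the budget, or rated exactly at the threshold, is kept. The sketch below restates the same three conditions as a list comprehension and checks those boundaries on made-up rows; the IDs and neighborhood names are placeholders:

```python
def listings_recommender(data, r_price, r_overall_satisfaction, r_neighborhood_id):
    # same three conditions as the notebook's version, written as a comprehension
    return [
        row["listing_id"]
        for row in data
        if row["price"] <= r_price
        and row["overall_satisfaction"] >= r_overall_satisfaction
        and row["neighborhood"] == r_neighborhood_id
    ]


sample_data = [
    {"listing_id": "1", "price": 100.0, "overall_satisfaction": 4.5, "neighborhood": "A"},  # exactly at both limits
    {"listing_id": "2", "price": 120.0, "overall_satisfaction": 5.0, "neighborhood": "A"},  # over budget
    {"listing_id": "3", "price": 80.0, "overall_satisfaction": 4.0, "neighborhood": "A"},   # rated too low
    {"listing_id": "4", "price": 90.0, "overall_satisfaction": 5.0, "neighborhood": "B"},   # wrong neighborhood
]

print(listings_recommender(sample_data, 100, 4.5, "A"))  # ['1']
```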

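```python
# creating a function to geographically visualise a given list of listings
# note: mplleaflet has not been updated in years and may fail with recent
# matplotlib releases; an older matplotlib may be needed for it to work

import mplleaflet
import matplotlib.pyplot as plt

def visualise_listings(data, list_of_listing_ids):
    wanted_ids = set(list_of_listing_ids)
    list_latitude = []
    list_longitude = []

    for row in data:
        if row["listing_id"] in wanted_ids:
            list_latitude.append(row["latitude"])
            list_longitude.append(row["longitude"])

    # start a fresh figure so earlier plots are not drawn onto the map
    plt.figure()
    # mplleaflet expects longitude on the x-axis & latitude on the y-axis
    plt.scatter(list_longitude, list_latitude, marker="*", s=500, c="red")

    # render the current figure on a Leaflet map (opened in the web browser)
    return mplleaflet.show()
```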

```python
# creating a function that suggests a nightly price to hosts: the average price of
# well-reviewed listings (at least 1 review & 4+ stars) in the given neighborhood

def price_recommender(data, r_neighborhood_id):
    total_price = 0
    count = 0

    for row in data:
        if (
            row["reviews"] >= 1
            and row["overall_satisfaction"] >= 4
            and row["neighborhood"] == r_neighborhood_id
        ):
            total_price += row["price"]
            count += 1

    # no matching listings (e.g. a misspelt neighborhood): nothing to average
    if count == 0:
        return None

    # compute the average once, after all matching listings have been summed
    return round(total_price / count, 2)
```
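An average can be pulled upward by a handful of very expensive listings. If that turns out to matter in this data (not something examined in this notebook), the median is a common, more outlier-resistant alternative. A sketch using `statistics.median` with the same filter; `median_price_recommender` is a hypothetical name and the rows are made up:

```python
import statistics


def median_price_recommender(data, r_neighborhood_id):
    # same filter as price_recommender: reviewed, 4+ stars, same neighborhood
    prices = [
        row["price"]
        for row in data
        if row["reviews"] >= 1
        and row["overall_satisfaction"] >= 4
        and row["neighborhood"] == r_neighborhood_id
    ]
    if not prices:
        return None
    return round(statistics.median(prices), 2)


sample_data = [
    {"neighborhood": "A", "reviews": 5, "overall_satisfaction": 4.5, "price": 80.0},
    {"neighborhood": "A", "reviews": 2, "overall_satisfaction": 4.0, "price": 90.0},
    {"neighborhood": "A", "reviews": 8, "overall_satisfaction": 5.0, "price": 400.0},
    {"neighborhood": "A", "reviews": 0, "overall_satisfaction": 0.0, "price": 50.0},
]

print(median_price_recommender(sample_data, "A"))  # 90.0
print(median_price_recommender(sample_data, "Z"))  # None
```

On these rows the mean of the three qualifying prices is 190.0, while the median is 90.0, which shows how strongly one expensive listing can move the average.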

## Using the Application

#### For customers

```python
neighborhood_to_test = input("Which neighborhood would you like to look into? ").strip().upper()
price_to_test = float(input("What is your budget per night? ($) "))
overall_satisfaction_to_test = float(input("Minimum overall satisfaction score? (out of 5 stars) "))

# run the filter once & reuse the result for both the map and the printed list
recommended_ids = listings_recommender(
    airbnb_data, price_to_test, overall_satisfaction_to_test, neighborhood_to_test
)

visualise_listings(airbnb_data, recommended_ids)
print("-" * 125)
print("Here are the Listing IDs of the listings that you have filtered:", recommended_ids)
```
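The `.upper()` call assumes the neighborhood names in the CSV are stored in upper case; if they are not, every lookup silently returns no listings. A hypothetical helper (`match_neighborhood` is an illustrative name, not part of the original notebook) that matches the user's input case-insensitively against the names actually present in the data would guard against that:

```python
def match_neighborhood(user_text, known_neighborhoods):
    """Return the neighborhood name as stored in the data, or None if nothing matches."""
    # casefold() is a stronger form of lower() intended for case-insensitive comparison
    lookup = {name.casefold(): name for name in known_neighborhoods}
    return lookup.get(user_text.strip().casefold())


# placeholder names; with the real data, pass results1.keys()
known = ["Bukit Timah", "Orchard", "Queenstown"]

print(match_neighborhood("  orchard ", known))   # Orchard
print(match_neighborhood("BUKIT TIMAH", known))  # Bukit Timah
print(match_neighborhood("Atlantis", known))     # None
```

A `None` result can then be reported back to the user instead of drawing an empty map.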

#### For hosts

```python
print("Here is the list of neighborhoods & the number of listings within each of them:", results1)
hosting_neighborhood = input("Which neighborhood is your listing going to be hosted in? ").strip().upper()
print("-" * 125)

recommended_price = price_recommender(airbnb_data, hosting_neighborhood)

if recommended_price is None:
    print("No well-reviewed listings were found in this neighborhood, so no price can be suggested.")
else:
    print(f"The average nightly price of well-reviewed listings (4+ stars) in this neighborhood is: ${recommended_price:.2f}")
```

# END