`Exploring Airbnb Market Trends`

`August 2025`

This project investigates market trends in New York City's Airbnb listings by combining data from three file types: CSV, XLSX, and TSV. The analysis determines the earliest and most recent review dates, counts the private room listings, and calculates the average listing price.

`Any questions, please reach out!`

Chiawei Wang, PhD\
Data & Product Analyst\
<chiawei.w@outlook.com>

`*` Note that the table of contents and other links may not work directly on GitHub.

[Table of Contents](#table-of-contents)
1. [Executive Summary](#executive-summary)
   - [Challenge](#challenge)
   - [Research Questions](#research-questions)
   - [Data Overview](#data-overview)
   - [Approach](#approach)
   - [Results](#results)
   - [Conclusion](#conclusion)
2. [Exploratory Data Analysis](#exploratory-data-analysis)

# Executive Summary

## Challenge

We want to explore market trends in New York City's Airbnb listings by combining data from multiple file types. Specifically, we aim to determine the earliest and most recent review dates, count how many listings are private rooms, and calculate the average listing price.

## Research Questions

- What are the dates of the earliest and most recent reviews?
- How many of the listings are private rooms?
- What is the average listing price?

## Data Overview

### airbnb_price

| Index | Column      | Type   | Description                                    |
| ----- | ----------- | ------ | ---------------------------------------------- |
| 0     | listing_id  | int64  | Unique identifier of listing                   |
| 1     | price       | object | Nightly listing price in USD                   |
| 2     | nbhood_full | object | Borough and neighborhood of the listing        |

### airbnb_room

| Index | Column       | Type   | Description                                                                               |
| ----- | ------------ | ------ | ----------------------------------------------------------------------------------------- |
| 0     | listing_id   | int64  | Unique identifier of listing                                                              |
| 1     | description  | object | Listing description                                                                       |
| 2     | room_type    | object | Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments |

### airbnb_review

| Index | Column       | Type   | Description                             |
| ----- | ------------ | ------ | --------------------------------------- |
| 0     | listing_id   | int64  | Unique identifier of listing            |
| 1     | host_name    | object | Name of listing host                    |
| 2     | last_review  | object | Date when the listing was last reviewed |
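
Two of the `object` columns above hold values that must be parsed before they can be aggregated: `price` stores strings such as `225 dollars`, and `last_review` stores dates such as `May 21 2019`. The following is a minimal sketch of both conversions, using a few rows copied from the previews in this notebook rather than the full files. The names `sample`, `price_num`, and `review_date` are illustrative and do not appear in the analysis itself.

```python
import pandas as pd

# A few rows copied from the notebook previews (not the full dataset)
sample = pd.DataFrame({
    'listing_id': [2595, 3831, 5099],
    'price': ['225 dollars', '89 dollars', '200 dollars'],
    'last_review': ['May 21 2019', 'July 05 2019', 'June 22 2019'],
})

# Strip the literal " dollars" suffix, then cast to float
sample['price_num'] = (
    sample['price'].str.replace(' dollars', '', regex=False).astype(float)
)

# Parse "Month DD YYYY" strings into datetime values
sample['review_date'] = pd.to_datetime(sample['last_review'], format='%B %d %Y')

print(sample[['listing_id', 'price_num', 'review_date']])
```

Once converted, `price_num` supports `mean()` and `review_date` supports chronological `min()` and `max()`.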

## Approach

1. Loading the data
2. Merging the three DataFrames
3. Determining the earliest and most recent review dates
4. Finding how many listings are private rooms
5. Finding the average price of listings
6. Creating a DataFrame with the four solution values
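
Step 2 assumes that `listing_id` identifies exactly one row in each file. One way to make that assumption explicit is the `validate` argument of pandas' `merge`, which raises `pandas.errors.MergeError` when the keys are not unique. The sketch below uses tiny hand-made frames with values from the notebook previews; it does not check whether the real files satisfy the one-to-one assumption, and the frame names are illustrative.

```python
import pandas as pd

# Tiny stand-ins for the three files (values taken from the notebook previews)
price = pd.DataFrame({'listing_id': [2595, 3831],
                      'price': ['225 dollars', '89 dollars']})
room = pd.DataFrame({'listing_id': [2595, 3831],
                     'room_type': ['Entire home/apt', 'Entire home/apt']})
review = pd.DataFrame({'listing_id': [2595, 3831],
                       'last_review': ['May 21 2019', 'July 05 2019']})

# validate='one_to_one' raises pandas.errors.MergeError on duplicate keys
merged = price.merge(room, on='listing_id', validate='one_to_one')
merged = merged.merge(review, on='listing_id', validate='one_to_one')

print(merged.shape)  # (2, 4)
```

Without `validate`, duplicate keys would silently multiply rows, which would inflate counts and skew the average price.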

## Results

- Earliest review date: 2019-01-01
- Most recent review date: 2019-07-09
- Number of private room listings: 11356
- Average listing price: 141.78

## Conclusion

By combining data from three files (CSV, XLSX, and TSV), we determined the earliest and most recent review dates, counted 11,356 private room listings, and calculated an average nightly price of 141.78 USD. These figures give a baseline summary of the New York City Airbnb listings in this dataset.

# Exploratory Data Analysis

In [1]:
# Import necessary libraries (pandas is the only one this notebook uses)
import pandas as pd

In [2]:
# Read in the CSV as a DataFrame
airbnb_price = pd.read_csv('airbnb_price.csv')

# Preview the data
print(airbnb_price.shape)
airbnb_price.head()

(25209, 3)


|     | listing_id | price       | nbhood_full               |
| --- | ---------- | ----------- | ------------------------- |
| 0   | 2595       | 225 dollars | Manhattan, Midtown        |
| 1   | 3831       | 89 dollars  | Brooklyn, Clinton Hill    |
| 2   | 5099       | 200 dollars | Manhattan, Murray Hill    |
| 3   | 5178       | 79 dollars  | Manhattan, Hell's Kitchen |
| 4   | 5238       | 150 dollars | Manhattan, Chinatown      |


In [3]:
# Read in the XLSX as a DataFrame
airbnb_room = pd.read_excel('airbnb_room.xlsx')

# Preview the data
print(airbnb_room.shape)
airbnb_room.head()

(25209, 3)


|     | listing_id | description                               | room_type       |
| --- | ---------- | ----------------------------------------- | --------------- |
| 0   | 2595       | Skylit Midtown Castle                     | Entire home/apt |
| 1   | 3831       | Cozy Entire Floor of Brownstone           | Entire home/apt |
| 2   | 5099       | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt |
| 3   | 5178       | Large Furnished Room Near B'way           | private room    |
| 4   | 5238       | Cute & Cozy Lower East Side 1 bdrm        | Entire home/apt |


In [4]:
# Read in the TSV as a DataFrame
airbnb_review = pd.read_csv('airbnb_review.tsv', sep='\t')

# Preview the data
print(airbnb_review.shape)
airbnb_review.head()

(25209, 3)


|     | listing_id | host_name   | last_review  |
| --- | ---------- | ----------- | ------------ |
| 0   | 2595       | Jennifer    | May 21 2019  |
| 1   | 3831       | LisaRoxanne | July 05 2019 |
| 2   | 5099       | Chris       | June 22 2019 |
| 3   | 5178       | Shunichi    | June 24 2019 |
| 4   | 5238       | Ben         | June 09 2019 |


In [5]:
# Join the three data frames together into one
listings = pd.merge(airbnb_price, airbnb_room, on='listing_id')
listings = pd.merge(listings, airbnb_review, on='listing_id')

# What are the dates of the earliest and most recent reviews?
# Convert last_review to datetime so that min() and max() compare dates chronologically, not as strings
listings['last_review_date'] = pd.to_datetime(listings['last_review'], format='%B %d %Y')
first_reviewed = listings['last_review_date'].min()
last_reviewed = listings['last_review_date'].max()

# How many of the listings are private rooms?
# Since there are differences in capitalization, make capitalization consistent
listings['room_type'] = listings['room_type'].str.lower()
private_room_count = listings[listings['room_type'] == 'private room'].shape[0]

# What is the average listing price?
# To convert price to numeric, remove " dollars" from each value
listings['price_clean'] = listings['price'].str.replace(' dollars', '', regex=False).astype(float)
avg_price = listings['price_clean'].mean()

# Create a DataFrame with the review dates and other statistics
review_dates = pd.DataFrame({
    'first_reviewed': [first_reviewed],
    'last_reviewed': [last_reviewed],
    'nb_private_rooms': [private_room_count],
    'avg_price': [round(avg_price, 2)]
})

# Print the review_dates DataFrame
print(review_dates)

  first_reviewed last_reviewed  nb_private_rooms  avg_price
0     2019-01-01    2019-07-09             11356     141.78
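
The private room count depends on normalizing capitalization before comparing. The sketch below illustrates why, using a hypothetical handful of values: the variants `Private room` and `PRIVATE ROOM` are made up for illustration, and only `private room` and `Entire home/apt` appear in the preview above. It counts by summing a boolean mask, which gives the same number as the `.shape[0]` filter used in the notebook.

```python
import pandas as pd

# Hypothetical values; the mixed-case variants are assumptions for illustration
room_type = pd.Series([
    'Entire home/apt', 'private room', 'Private room', 'PRIVATE ROOM', 'Shared room',
])

# Without normalization, only the exact lowercase spelling matches
raw_count = (room_type == 'private room').sum()

# Lowercasing first makes every capitalization variant match
clean_count = (room_type.str.lower() == 'private room').sum()

print(raw_count, clean_count)  # 1 3
```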
