vaaridhi/webscraping-sf-house-prices

webscraping-sf-house-prices

The San Francisco housing market is both unpredictable and highly sought after, given the prime location of several of its neighborhoods. As a tech hub, the city attracts people for jobs (high employment), for founding start-ups (talented workforce, VCs), and for its favorable climate and culture. It is also widely regarded as one of the most expensive housing markets in the US. Prices in metropolitan areas vary widely with numerous factors, which makes it hard for market participants (realtors, homeowners, buyers, tenants, banks, government, etc.) to track this information and quote optimal prices.

The goal of this project is to build a web scraping tool with the Beautiful Soup library in Python that extracts data on houses sold in San Francisco in the last ten months from Trulia.com and stores it in MongoDB. The data includes features such as number of bedrooms, house size, price, year built, apartment features, and more. Market stakeholders can use the tool to track this information and gain insights such as which properties in the city command a premium and the price premium paid per zip code. The data can also feed SaaS tools for customers: a dashboard to track housing metrics over time, recommender systems that match customers with houses based on their requirements, or machine learning models that predict prices.
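
The scrape-and-store flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the HTML snippet, the class names (`listing`, `price`, `beds`, `sqft`, `address`), and the helper names `to_int`, `parse_listings`, and `save_listings` are all hypothetical, since the real Trulia markup is not described here. Only the Beautiful Soup calls (`BeautifulSoup`, `find_all`, `find`, `get_text`) and the pymongo `update_one(..., upsert=True)` call are standard library APIs.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: placeholder structure for illustration only,
# not the real layout of a Trulia results page.
SAMPLE_HTML = """
<ul>
  <li class="listing">
    <span class="price">$1,250,000</span>
    <span class="beds">3 bd</span>
    <span class="sqft">1,480 sqft</span>
    <span class="address">123 Example St, San Francisco, CA 94110</span>
  </li>
  <li class="listing">
    <span class="price">$980,000</span>
    <span class="beds">2 bd</span>
    <span class="sqft">950 sqft</span>
    <span class="address">456 Sample Ave, San Francisco, CA 94122</span>
  </li>
</ul>
"""


def to_int(text):
    """Return the first number in text as an int, ignoring '$' and commas."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def parse_listings(html):
    """Turn each listing card in the HTML into a dict ready for MongoDB."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for card in soup.find_all("li", class_="listing"):
        address = card.find(class_="address").get_text(strip=True)
        zip_match = re.search(r"(\d{5})\s*$", address)
        listings.append({
            "address": address,
            "zip_code": zip_match.group(1) if zip_match else None,
            "price": to_int(card.find(class_="price").get_text()),
            "bedrooms": to_int(card.find(class_="beds").get_text()),
            "sqft": to_int(card.find(class_="sqft").get_text()),
        })
    return listings


def save_listings(collection, listings):
    """Upsert by address so re-running the scraper does not duplicate records."""
    for doc in listings:
        collection.update_one({"address": doc["address"]}, {"$set": doc}, upsert=True)


listings = parse_listings(SAMPLE_HTML)

# Storing requires pymongo and a running MongoDB server (assumed local here;
# the database and collection names are made up):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["housing"]["sf_sold"]
#   save_listings(collection, listings)
```

Keeping parsing separate from storage means the extraction logic can be checked against saved HTML without a database, and upserting on a stable key (address is used here as an assumption) keeps repeated runs from inserting the same house twice.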