The code should run without issues on any Python 3.x version.
The data set, containing information about Airbnb descriptions and ratings in New York City, was imported from Inside Airbnb on 2021-05-02.
- one notebook, containing all the code and explanatory markdown cells,
- one CSV file, containing all the data needed.
I tried to answer the most important question for a potential Airbnb host: what should a host do to obtain high ratings from customers?
More specifically, I tried to answer the following three questions:
- Which neighborhood has a high rating in 'location'?
- Are the ratings higher if the host lives in the same neighborhood as the listing?
- Overall, what is most important if a host wants to achieve a high overall rating?
To answer the above questions, I perform data cleansing on the original data set. Several assumptions are made, including the following:
- For time-related information, such as the date when the host started, I convert each date into the number of days elapsed up to the data scraping day. In other words, I only care about the length of time, not the specific dates.
- For descriptive information, such as the description of the Airbnb and the "About" of hosts, I avoid text analysis and only count the lengths of the descriptions. This could be an oversimplification and could be improved in future work.
- For location-related information, such as the neighborhoods of Airbnbs and the host locations, I apply some simplifications. For example, for the host locations, I only extract whether the host is in New York, rather than keeping the specific location.
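The three simplifications above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`host_since`, `description`, `host_about`, `host_location`), the helper `simplify_listings`, and the use of 2021-05-02 as the scraping day are all assumptions made for the example.

```python
import pandas as pd

# Assumed reference date: the day the data set was imported.
SCRAPE_DATE = pd.Timestamp("2021-05-02")


def simplify_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that derives simplified features from raw columns."""
    out = pd.DataFrame(index=df.index)

    # Time-related: keep only the number of days up to the scraping day.
    host_since = pd.to_datetime(df["host_since"], errors="coerce")
    out["host_days"] = (SCRAPE_DATE - host_since).dt.days

    # Descriptive: replace free text with its character count (missing -> 0).
    out["description_len"] = df["description"].fillna("").str.len()
    out["host_about_len"] = df["host_about"].fillna("").str.len()

    # Location-related: keep only whether the host location mentions New York.
    location = df["host_location"].fillna("").str.lower()
    out["host_in_ny"] = location.str.contains("new york", regex=False)

    return out
```

A simple substring match on the host location is a crude choice; it would, for instance, miss hosts who list only a borough name.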
There are other assumptions in my data cleansing process. They can be found in the notebook, where every assumption is stated next to the corresponding code cell.
The main findings of the analysis are summarized in this post.
As mentioned above, the data is from Inside Airbnb, which makes it available under a Creative Commons CC0 1.0 Universal (CC0 1.0) "Public Domain Dedication" license. Beyond that, feel free to use the code here as you like!