# KNN for Airbnb Listing Pricing
The code below uses Airbnb Seattle listing features and prices to train a KNN model. By tuning the feature set and the n_neighbors value, the model can be used to predict the price of a new Airbnb listing.
For example, for a place with 2 bedrooms and 3 beds that accommodates 4 people, the model estimates a rental price of around $158 in the Seattle Airbnb market.

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

In [2]:
#read data
df = pd.read_csv('listings.csv')
print('Here is the info of Airbnb Seattle Listing file')
df.info()

Here is the info of Airbnb Seattle Listing file
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7505 entries, 0 to 7504
Columns: 106 entries, id to reviews_per_month
dtypes: float64(19), int64(23), object(64)
memory usage: 6.1+ MB


In [3]:
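#define the columns used by the model (price is the prediction target) and clean the price column
feature_columns = ['accommodates','bathrooms','bedrooms','beds','price']
#price is stored as text; strip the '$' and ',' characters before converting to float
df['price'] = df['price'].str.replace(r'[$,]', '', regex=True).astype(float)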
#print(df.head())
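
As a minimal sketch of what the price cleaning step does, the snippet below applies the same replacement to a few made-up price strings. The sample values are hypothetical and only illustrate the format the cleaning code expects.

```python
import pandas as pd

# Hypothetical price strings, not taken from listings.csv
raw_prices = pd.Series(['$85.00', '$1,250.00', '$150.00'])

# Remove the dollar sign and the thousands separator, then convert to float
clean_prices = raw_prices.str.replace(r'[$,]', '', regex=True).astype(float)
print(clean_prices.tolist())  # [85.0, 1250.0, 150.0]
```

Passing `regex=True` explicitly keeps the behavior the same across pandas versions, since the default for this argument has changed over time.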

In [4]:
#standardize the data (rows with missing values are dropped first; price is scaled along with the features)
df_valueset = df[feature_columns].dropna()
scaler = StandardScaler().fit(df_valueset)
df_valueset_normed = scaler.transform(df_valueset)
df_valueset_normed_df = pd.DataFrame(df_valueset_normed, index=df_valueset.index, columns=df_valueset.columns)
#df_valueset.info()
#print(df_valueset)
#print(df_valueset_normed_df)
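
To make the standardization step concrete, here is a small sketch on made-up numbers showing that `StandardScaler` subtracts each column's mean and divides by its population standard deviation (`ddof=0`). The array `X` and the name `scaler_demo` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows of [accommodates, price], for illustration only
X = np.array([[2.0, 100.0],
              [4.0, 150.0],
              [6.0, 200.0]])

scaler_demo = StandardScaler().fit(X)

# Manual z-score: (value - column mean) / column standard deviation (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The scaler's output should match the manual computation
print(np.allclose(scaler_demo.transform(X), manual))
```

Putting every column on the same scale matters for KNN because the distance between listings would otherwise be dominated by whichever column has the largest raw values.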

In [5]:
#define training set (first 6000 rows) and test set (remaining rows)
df_train = df_valueset_normed_df.iloc[:6000]
df_test = df_valueset_normed_df.iloc[6000:]
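
The split above is positional. If the rows of the file happen to be ordered in some way (an assumption, nothing in the data shown here confirms it), a shuffled split may give a more representative test set. A possible sketch using scikit-learn's `train_test_split` on a small made-up frame:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up frame standing in for the standardized listing data
demo = pd.DataFrame({'accommodates': np.arange(10),
                     'price': np.arange(10) * 10.0})

# Shuffle the rows, keep 20% for testing; random_state makes the split repeatable
train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=0)

print(len(train_demo), len(test_demo))  # 8 2
```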

In [6]:
#fit the KNN model then predict the test set
cols = ['accommodates','bedrooms','beds'] #cols can be used to adjust the model

neigh = KNeighborsRegressor(n_neighbors=50) # n_neighbors can be used to adjust the model
neigh.fit(df_train[cols],df_train['price'])

prediction = neigh.predict(df_test[cols])
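
Since n_neighbors is one of the knobs used to adjust the model, one way to tune it is to try several values and compare the error on held-out data. The sketch below does this on synthetic data; the data, the candidate values, and the names `rmse_by_k` and `best_k` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data: one feature with a noisy linear relation to the target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = 20 * X[:, 0] + rng.normal(0, 5, size=300)

# Hold out the last 60 rows for validation
X_train, X_val = X[:240], X[240:]
y_train, y_val = y[:240], y[240:]

# Fit one model per candidate k and record its validation RMSE
rmse_by_k = {}
for k in (1, 5, 20, 50):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    rmse_by_k[k] = mean_squared_error(y_val, model.predict(X_val)) ** 0.5

best_k = min(rmse_by_k, key=rmse_by_k.get)
print(rmse_by_k, best_k)
```

The same loop could be run over different `cols` lists to compare feature sets.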

In [7]:
#evaluate the model on the test set with the root mean squared error (RMSE)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(df_test['price'], prediction)
rmse = mse**(1/2)
print(rmse) # RMSE in standardized price units (price was scaled above); lower is better

1.9066139649836509
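
Because the price column was standardized before training, the error above is expressed in standard deviations of price rather than in dollars. Multiplying by the standard deviation used for scaling converts it back. The sketch below checks this identity on made-up numbers; the arrays are hypothetical.

```python
import numpy as np

# Made-up true prices and predictions in dollars
prices = np.array([80.0, 120.0, 200.0, 100.0])
preds = np.array([90.0, 110.0, 180.0, 120.0])

mean, std = prices.mean(), prices.std()  # population std, as StandardScaler uses

# RMSE computed on standardized values
rmse_std = np.sqrt(np.mean(((preds - mean) / std - (prices - mean) / std) ** 2))

# Scaling back by std gives the RMSE in dollars
rmse_dollars = rmse_std * std
rmse_direct = np.sqrt(np.mean((preds - prices) ** 2))
print(np.isclose(rmse_dollars, rmse_direct))  # True
```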


In [8]:
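#describe my place in feature_columns order: accommodates, bathrooms, bedrooms, beds, price
#bathrooms and price are placeholders (0) because the model does not use them as inputs
myplace = [[4,0,2,3,0]]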
myplace_normed = scaler.transform(myplace)
print(myplace_normed)

[[ 0.12284015 -1.97831425  0.57871191  0.73650024 -0.59873115]]


In [9]:
#use the KNN model trained above: for a place with 2 bedrooms and 3 beds that accommodates 4 people, the market data suggests the price below
#the inputs are the standardized accommodates, bedrooms and beds values printed above
prediction_myplace_normed = neigh.predict([[0.12284015,0.57871191,0.73650024]])
price_mean = df_valueset['price'].mean()
price_std = df_valueset['price'].std()
prediction_myplace = prediction_myplace_normed*price_std+price_mean
print('The predicted market price is: $'+ str(prediction_myplace))

The predicted market price is: $[157.99951023]
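
One detail worth noting: pandas `std()` defaults to the sample standard deviation (`ddof=1`), while `StandardScaler` scales with the population standard deviation (`ddof=0`). With thousands of rows the difference is tiny, but the fitted scaler's own `mean_` and `scale_` attributes give an exact inverse. A sketch on made-up data, where `demo`, `demo_scaler`, and `normed_prediction` are hypothetical names and values:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data standing in for the cleaned listing columns
demo = pd.DataFrame({'beds': [1.0, 2.0, 3.0, 4.0],
                     'price': [80.0, 120.0, 150.0, 250.0]})
demo_scaler = StandardScaler().fit(demo)

# Locate the price column inside the scaler's per-column statistics
price_idx = list(demo.columns).index('price')

# Hypothetical standardized prediction from a model
normed_prediction = np.array([0.5])

# Undo the scaling with the exact statistics the scaler used
dollars = normed_prediction * demo_scaler.scale_[price_idx] + demo_scaler.mean_[price_idx]
print(dollars)
```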


Additional Note: KNN is quick and easy to use, but a brute-force search scans the whole training set, computes the distance from the query to every point, and then returns a value based on the chosen number of neighbors (n_neighbors), so prediction can be time-consuming on very large datasets.
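
To illustrate why the cost grows with the dataset, here is a minimal brute-force KNN regression written with NumPy. It is a sketch of the idea, not the implementation scikit-learn uses, and the function name `knn_predict` and the sample data are made up.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    # Euclidean distance from the query to every training row
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest training rows
    nearest = np.argsort(distances)[:k]
    # The prediction is the average target of those neighbors
    return y_train[nearest].mean()

# Made-up training data: one feature, one target
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([100.0, 110.0, 120.0, 300.0])

print(knn_predict(X_train, y_train, np.array([2.0]), k=3))  # 110.0
```

Every call computes one distance per training row, so the work for a single prediction grows linearly with the size of the training set.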