# Airbnb Rio Project - Property Price Prediction Tool for Ordinary People

### Context
On Airbnb, anyone who owns a room or a property of any type (apartment, house, cabin, guesthouse, etc.) can list their property to be rented on a per-night basis.

To do so, you create a host profile (a host is a person who offers a property for nightly rental) and then a listing for your property.

In this listing, the host should describe the property's features as completely as possible, in order to help renters/travelers choose the best option for them (and also to make their listing more appealing).

There are dozens of customization options for a listing, from minimum nights, price, and number of rooms to cancellation policies, extra fees for additional guests, ID verification requirements, and more.

### Our Goal
The goal is to build a price prediction model that lets an ordinary property owner know how much to charge per night.

From a renter's perspective, the same model can indicate whether the price of a property they are considering is attractive (i.e., below average for properties with similar characteristics) or not.

### What We Have, Inspirations, and Credits
The datasets were taken from Kaggle: https://www.kaggle.com/allanbruno/airbnb-rio-de-janeiro

They are available for download below the lesson (if you pull the data directly from Kaggle, your results might differ from mine, since the datasets could have been updated).

If you'd like to see another approach, the solution by Kaggle user Allan Bruno in this notebook is a useful reference: https://www.kaggle.com/allanbruno/helping-regular-people-price-listings-on-airbnb

You'll notice some similarities between the solution we will build here and his, but also some significant differences in the project's construction process.

- The datasets contain the property prices and their respective features for each month.

- Prices are given in Brazilian Reais (R$).

- We have data from April 2018 to May 2020, except for June 2018, which is missing.
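
As a sanity check on the coverage described above, the expected monthly snapshots can be enumerated. This is a minimal sketch using only the dates stated in the list; the name `expected_months` is hypothetical and not part of the original notebook.

```python
# Enumerate the (year, month) pairs the text says are covered:
# April 2018 through May 2020, excluding June 2018.
start_year, start_month = 2018, 4
end_year, end_month = 2020, 5
missing = {(2018, 6)}

expected_months = []
year, month = start_year, start_month
while (year, month) <= (end_year, end_month):
    if (year, month) not in missing:
        expected_months.append((year, month))
    # Advance one month, rolling over into the next year after December.
    month += 1
    if month > 12:
        year, month = year + 1, 1

print(len(expected_months))  # 25
```

Under that description, 25 monthly snapshots would be expected, which gives a quick number to compare against the files actually found in the data folder.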

### Initial Expectations
- I believe seasonality might be an important factor, since months like December are typically more expensive in Rio.

- The property's location should heavily influence the price, since neighborhoods in Rio de Janeiro differ drastically in safety, natural beauty, and tourist attractions.

- Amenities may have a significant impact, since many buildings and houses in Rio are old.

Let’s find out how much these factors influence the price, and whether there are other less intuitive ones that are extremely important.



#### Import Libraries and Datasets

In [8]:
import pandas as pd
import pathlib

#### Creating the Database: Merging All Files into a Single DataFrame

In [None]:
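# Map the first three letters of each file name (Portuguese month names) to a month number
months = {'jan': 1, 'fev': 2, 'mar': 3, 'abr': 4, 'mai': 5, 'jun': 6,
          'jul': 7, 'ago': 8, 'set': 9, 'out': 10, 'nov': 11, 'dez': 12}

database_path = pathlib.Path('dataset')

aux = []

# Only look at CSV files, so stray files in the folder do not break the month lookup
for file in database_path.glob('*.csv'):
    # The file name starts with the month name and ends with the four-digit year
    month = months[file.name[:3].lower()]
    year = int(file.stem[-4:])

    # low_memory=False processes each file as a whole, avoiding mixed-dtype warnings
    df = pd.read_csv(file, low_memory=False)
    df['month'] = month
    df['year'] = year

    aux.append(df)

# ignore_index=True gives the combined DataFrame a single, non-repeating index
database_airbnb = pd.concat(aux, ignore_index=True)

display(database_airbnb)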
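
The file name parsing inside the loop can be isolated and checked on its own. The sketch below assumes file names follow a pattern such as `abril2018.csv` (a Portuguese month name followed by a four-digit year); that pattern is inferred from the slicing in the loop, and `parse_month_year` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path

MONTHS = {'jan': 1, 'fev': 2, 'mar': 3, 'abr': 4, 'mai': 5, 'jun': 6,
          'jul': 7, 'ago': 8, 'set': 9, 'out': 10, 'nov': 11, 'dez': 12}


def parse_month_year(filename):
    """Return (month, year) from a name like 'abril2018.csv' (assumed pattern)."""
    stem = Path(filename).stem          # e.g. 'abril2018'
    month = MONTHS[stem[:3].lower()]    # first three letters identify the month
    year = int(stem[-4:])               # last four characters are the year
    return month, year


print(parse_month_year('abril2018.csv'))     # (4, 2018)
print(parse_month_year('dezembro2019.csv'))  # (12, 2019)
```

Keeping the parsing in one small function makes it easy to test against a few file names before reading the whole folder.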
                              


### Consolidate the Dataset

In [14]:
print(list(database_airbnb.columns))

['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary', 'space', 'description', 'experiences_offered', 'neighborhood_overview', 'notes', 'transit', 'access', 'interaction', 'house_rules', 'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url', 'host_id', 'host_url', 'host_name', 'host_since', 'host_location', 'host_about', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url', 'host_picture_url', 'host_neighbourhood', 'host_listings_count', 'host_total_listings_count', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'street', 'neighbourhood', 'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market', 'smart_location', 'country_code', 'country', 'latitude', 'longitude', 'is_location_exact', 'property_type', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'bed_type', 'amenities', 'square_feet', 'price', 'weekly_price', 'monthly_price', '

### If There Are Many Columns, Identify Up Front Which Ones Can Be Dropped
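One possible way to approach this step is sketched below on a toy DataFrame. The data, the 50% threshold, and the choice of `listing_url` as an identifier-like column are all assumptions made for illustration, not decisions taken from the original notebook.

```python
import pandas as pd

# Toy frame standing in for the full dataset; values are made up for illustration.
toy = pd.DataFrame({
    'price': ['$100.00', '$250.00', '$80.00', '$120.00'],
    'bedrooms': [1, 3, 1, 2],
    'square_feet': [None, None, None, 500.0],   # mostly empty
    'listing_url': ['u1', 'u2', 'u3', 'u4'],    # identifier, not a feature
})

# Hypothetical rule: drop identifier-like columns by name, then any column
# with more than half of its values missing.
id_like = ['listing_url']
missing_share = toy.isna().mean()
too_empty = missing_share[missing_share > 0.5].index.tolist()

reduced = toy.drop(columns=id_like + too_empty)
print(list(reduced.columns))   # ['price', 'bedrooms']
```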

### Handle Missing Values
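A minimal sketch of this step, assuming only a few rows are affected so that dropping them is acceptable. The toy values are invented, and dropping rows is just one option among several (filling with a default value is another).

```python
import pandas as pd

# Toy frame with a couple of gaps; values are made up for illustration.
toy = pd.DataFrame({
    'bedrooms': [1.0, None, 2.0, 3.0],
    'beds': [1.0, 2.0, None, 4.0],
    'price': [100.0, 150.0, 200.0, 300.0],
})

# Count missing values per column to decide how to treat each one.
print(toy.isna().sum())

# One simple option when few rows are affected: drop rows with any missing value.
cleaned = toy.dropna()
print(cleaned.shape)   # (2, 3)
```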

### Check the Data Types in Each Column
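The sketch below illustrates the kind of problem this check is meant to catch. It assumes the `price` column arrives as text such as `'$1,250.00'`; that format is an assumption about the data, so confirm it against the actual files before reusing the conversion.

```python
import pandas as pd

# Toy column in the assumed text format; values are made up for illustration.
toy = pd.DataFrame({'price': ['$133.00', '$1,250.00', '$80.00']})
print(toy.dtypes)   # price shows up as a text column, not a numeric one

# Strip the currency symbol and thousands separator, then convert to float.
toy['price'] = (
    toy['price']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(float)
)
print(toy['price'].tolist())   # [133.0, 1250.0, 80.0]
```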

### Exploratory Analysis and Outlier Handling
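One common way to flag outliers is the interquartile range (IQR) rule, sketched below. Using this rule, the multiplier of 1.5, and the helper name `iqr_limits` are assumptions for illustration; the notebook does not state which method it uses.

```python
import pandas as pd


def iqr_limits(series, k=1.5):
    """Return (lower, upper) bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy prices with one extreme value; values are made up for illustration.
prices = pd.Series([100.0, 120.0, 130.0, 150.0, 5000.0])
lower, upper = iqr_limits(prices)
print(lower, upper)   # 75.0 195.0

# Keep only the values inside the bounds.
kept = prices[(prices >= lower) & (prices <= upper)]
print(kept.tolist())  # [100.0, 120.0, 130.0, 150.0]
```

Whether to remove the flagged rows is a separate decision: for a price model, very expensive listings may be legitimate but outside the target use case.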

### Encoding
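A minimal sketch of encoding on a toy DataFrame, using two columns that appear in the dataset's column list (`room_type` and `host_is_superhost`). The sample values, including the `'t'`/`'f'` flags, are assumptions about how the data is stored.

```python
import pandas as pd

# Toy frame; values are made up for illustration.
toy = pd.DataFrame({
    'room_type': ['Entire home/apt', 'Private room', 'Entire home/apt'],
    'host_is_superhost': ['t', 'f', 't'],
})

# Binary text flags can be mapped to 1/0 directly.
toy['host_is_superhost'] = toy['host_is_superhost'].map({'t': 1, 'f': 0})

# Multi-category columns get one indicator column per category (one-hot encoding).
encoded = pd.get_dummies(toy, columns=['room_type'], dtype=int)
print(list(encoded.columns))
# ['host_is_superhost', 'room_type_Entire home/apt', 'room_type_Private room']
```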

### Prediction Model

### Analysis of the Best Model

### Adjustments and Improvements to the Best Model