<h1>Flat price analysis in St. Petersburg with geodata</h1>

<div style="background:#abd5f5; border:1px solid #b3deff; padding: 20px">
    <h2 style="color:#002b63">Table of contents</h2>
<ul>
    <li>Introduction</li>
    <li>Getting geo data</li>
    <li>Map with prices</li>
    <li>Choropleth map</li>
</ul>
    </div>

<h2>Introduction</h2>

This analysis focuses on geodata. I geocode the address of each flat to get its coordinates and build a map whose markers are colored by price range. I then create a choropleth map in which each area of the city is colored according to the mean flat price in that area.

<h2>Getting geo data</h2>

I am going to use my own CSV file, which I created from the listings at https://spb.cian.ru/kupit-kvartiru/. The file contains a list of St. Petersburg flats for sale.

In [2]:
import pandas as pd
import numpy as np

flats=pd.read_csv('data/flats_all.csv',sep=';')
flats.head()

Unnamed: 0,link,price,total_area,living_area,kitchen_area,floor,type,height,bathrooms,balconies,...,renovation,view,rooms,city,area,neighborhood,street,metro_name,metro_km,number_of_floors
0,https://spb.cian.ru/sale/flat/234775065/,13143249.0,74.13,26.3,22.0,3,Новостройка,3.3,2.0,1.0,...,,,2,Санкт-Петербург,р-н Приморский,Юнтолово,"Планерная ул., 94",Комендантский проспект,1.98,12
1,https://spb.cian.ru/sale/flat/239273301/,12430600.0,47.81,,,2,Новостройка,,0.0,0.0,...,,,1,Санкт-Петербург,р-н Петроградский,Посадский,"ул. Рентгена, 25",Петроградская,1.04,8
2,https://spb.cian.ru/sale/flat/250966190/,13800000.0,44.1,14.1,19.4,2,Новостройка,,0.0,0.0,...,,,1,Санкт-Петербург,р-н Курортный,мкр. Сестрорецк,"ул. Максима Горького, 2Ас2",Беговая,9.57,5
3,https://spb.cian.ru/sale/flat/249950664/,8730851.0,56.29,29.3,10.9,5,Новостройка,,2.0,2.0,...,,,2,Санкт-Петербург,р-н Приморский,Юнтолово,Нью Тайм жилой комплекс,Комендантский проспект,1.98,13
4,https://spb.cian.ru/sale/flat/250766812/,13650000.0,70.8,,10.0,13,Вторичка,2.8,1.0,1.0,...,Евроремонт,На улицу и двор,2,Санкт-Петербург,р-н Приморский,Комендантский аэродром,"аллея Поликарпова, 6к1",Пионерская,1.04,19


First, I remove all rows with a missing address.

In [21]:
flats.drop(flats[flats['street'].isna()].index,axis='rows',inplace=True)
flats.reset_index(drop=True, inplace=True)
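The same filtering step can also be written with `dropna`. The snippet below is a minimal sketch on a made-up three-row table (the `toy` frame and its values are illustrative, not taken from the real file):

```python
import pandas as pd

# Toy data standing in for the real listing table (prices are made up).
toy = pd.DataFrame({
    'street': ['ул. Рентгена, 25', None, 'аллея Поликарпова, 6к1'],
    'price': [12430600.0, 9000000.0, 13650000.0],
})

# Keep only rows that have an address, then renumber the index from 0
# so that positional loops such as range(len(df)) work with .loc.
cleaned = toy.dropna(subset=['street']).reset_index(drop=True)
print(cleaned)
```

Resetting the index matters here because the geocoding loop further down addresses rows by `flats.loc[i, ...]` with `i` running from 0.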

Connection to Google Maps API

In [23]:
import googlemaps
# read the API key; strip the trailing newline that key files often end with
with open('keys/api_google_maps_key') as f:
    mykey=f.read().strip()
gmaps = googlemaps.Client(key=mykey)

Geocoding: getting coordinates from an address

In [24]:
# libraries for progress bar
from ipywidgets import IntProgress 
from IPython.display import display

# adding columns with latitude and longitude
flats['lat']=0.0
flats['lng']=0.0

# init progress bar
progress = IntProgress(min=0, max=len(flats), value=0)
display(progress)

# geocoding: look up each address; rows without a match keep the 0.0 placeholder set above
for i in range(len(flats)):
    geocode_result = gmaps.geocode(flats.loc[i,'city']+', '+flats.loc[i,'street'])
    if geocode_result:
        flats.loc[i,'lat']=geocode_result[0]['geometry']['location']['lat']
        flats.loc[i,'lng']=geocode_result[0]['geometry']['location']['lng']
    progress.value = i+1

IntProgress(value=0, max=1401)
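Several flats can share one address (for example, flats in the same building), so it may be worth caching lookups to save API calls. The sketch below is one possible way to do that, not the method used in this notebook. `geocode_addresses` and `fake_geocode` are hypothetical helpers, the response shape is assumed to match what the loop above reads, and the coordinates in the fake geocoder are made up.

```python
def geocode_addresses(addresses, geocode_fn):
    """Return a list of (lat, lng) tuples, one per address.

    geocode_fn is any callable that takes an address string and returns a
    list of results shaped like the response read in the loop above
    (an assumption). Unresolved addresses get (None, None).
    """
    cache = {}
    coords = []
    for address in addresses:
        if address not in cache:
            result = geocode_fn(address)
            if result:
                location = result[0]['geometry']['location']
                cache[address] = (location['lat'], location['lng'])
            else:
                cache[address] = (None, None)
        coords.append(cache[address])
    return coords


# A fake geocoder for demonstration; it records its calls so the caching is visible.
calls = []

def fake_geocode(address):
    calls.append(address)
    known = {'Санкт-Петербург, ул. Рентгена, 25': (59.96, 30.32)}  # made-up coordinates
    if address in known:
        lat, lng = known[address]
        return [{'geometry': {'location': {'lat': lat, 'lng': lng}}}]
    return []


demo = geocode_addresses(
    ['Санкт-Петербург, ул. Рентгена, 25',
     'Санкт-Петербург, ул. Рентгена, 25',
     'unknown place'],
    fake_geocode,
)
print(demo, len(calls))
```

With the real client, `gmaps.geocode` could be passed in place of `fake_geocode`.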

Let's save the results to a CSV file.

In [25]:
flats.to_csv('data/flats_cian_2.csv',sep=';',index=False)

<h2>Map with prices</h2>

In [3]:
flats=pd.read_csv('data/flats_cian_2.csv',sep=';')

Next, I will create a map of Saint Petersburg using the Folium library and add a marker for every flat. The marker color will reflect the price of the flat.

In [31]:
# library for building maps
import folium

# map with coordinates of St. Petersburg
map_piter = folium.Map(location=[59.9810199, 30.3540484], zoom_start=9)

# loop for adding markers to map
for lat, lng, price, name in zip(flats['lat'],flats['lng'],flats['price'],flats['link']):
    # label name
    label = '{}'.format(round(price))
    label = folium.Popup(label, parse_html=True)
    # color of the marker
    if price<5000000:
        clr='#00ffff'
    elif price<10000000:
        clr='#91e2da'
    elif price<15000000:
        clr='#d9a694'
    elif price<20000000:
        clr='#eb8473'
    elif price<25000000:
        clr='#f75b53'
    else:
        clr='#ff0035'
    # adding markers
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=clr,
        fill=True,
        fill_color=clr,
        fill_opacity=0.8,
        parse_html=False).add_to(map_piter)
    
# labels and colors for a legend on the map
legend_colors = ['#00ffff','#91e2da','#d9a694','#eb8473','#f75b53','#ff0035']
legend_labels = ['<5','5-10','10-15','15-20','20-25','>25']
legend_categories = ""
for label, color in zip(legend_labels,legend_colors):
    legend_categories += f'<li><span style="background-color:{color}">{label}</span></li>'
    
# html for the legend
legend_html = """
    <div style="
         position: fixed; 
         bottom: 50px; left: 50px; width: 200px; height: 160px; 
         border:2px solid grey; z-index:9999; 
         background-color:white;
         opacity: .85;
         font-size:14px;
         font-weight: bold;
        ">
        Prices in million rubles
        {categories}
    </div> """.format(categories=legend_categories)
map_piter.get_root().html.add_child(folium.Element(legend_html))

# displaying the map
map_piter
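The `if`/`elif` chain that picks a marker color can also be expressed as a lookup over sorted price boundaries. This is a minimal sketch using the same thresholds and colors as the cell above. `PRICE_BOUNDS`, `PRICE_COLORS` and `price_to_color` are names introduced here for illustration.

```python
from bisect import bisect_right

# Upper bounds (exclusive) of the first five price bands, in rubles.
PRICE_BOUNDS = [5_000_000, 10_000_000, 15_000_000, 20_000_000, 25_000_000]
# One color per band; the last color covers everything from 25 million up.
PRICE_COLORS = ['#00ffff', '#91e2da', '#d9a694', '#eb8473', '#f75b53', '#ff0035']


def price_to_color(price):
    """Return the marker color for a price, matching the if/elif thresholds."""
    return PRICE_COLORS[bisect_right(PRICE_BOUNDS, price)]


print(price_to_color(13_143_249))
```

Keeping the boundaries and colors in two lists means the legend and the markers can be driven from the same data.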

<h2>Choropleth map</h2>

First, I create a DataFrame with two columns: the area and the mean price of flats in that area.

In [76]:
# group by area
mean_area_prices=flats[['area','price']].groupby(['area']).mean()
# reset index
mean_area_prices=mean_area_prices.reset_index()
# reformat area names so that they match the names in the geojson file
mean_area_prices['area']=mean_area_prices['area'].str[5:]+' район'
# convert prices to millions of rubles
mean_area_prices['price']=mean_area_prices['price']/1000000
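A fixed slice such as `.str[5:]` depends on the exact number of leading characters in the `area` values. A prefix-based conversion is a little more forgiving. The sketch below assumes the raw values look like `'р-н Приморский'` (possibly with surrounding whitespace) and that the GeoJSON names follow the `'<name> район'` pattern used in the cell above. `to_geojson_name` is a hypothetical helper.

```python
def to_geojson_name(area, prefix='р-н'):
    """Turn 'р-н Приморский' into 'Приморский район'.

    The prefix and the target pattern are assumptions based on the
    sample rows and on the renaming step in the notebook.
    """
    name = area.strip()
    if name.startswith(prefix):
        name = name[len(prefix):].strip()
    return name + ' район'


print(to_geojson_name('р-н Приморский'))
```

It could be applied to the column with `mean_area_prices['area'].map(to_geojson_name)` on the raw area values.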

Let's create the map with areas. Each area of the city will have a color corresponding to the mean price in that area.

In [98]:
# map with coordinates of St. Petersburg
map_piter_clasters = folium.Map(location=[59.9810199, 30.3540484], zoom_start=9)

# add the choropleth layer
folium.Choropleth(
    #geojson file with areas borders
    geo_data='data/spb.geojson',
    data=mean_area_prices,
    columns=['area','price'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    legend_name='Mean price for flats in the area, in millions of rubles').add_to(map_piter_clasters)

# displaying the map
map_piter_clasters
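A choropleth only colors an area when its name in the data matches the property referenced by `key_on`, so a quick check for mismatches can help before plotting. This is a minimal sketch on an inline toy GeoJSON. `unmatched_areas` is a hypothetical helper, and the toy features are illustrative rather than taken from `data/spb.geojson`.

```python
def unmatched_areas(geojson, area_names, property_name='name'):
    """Return the area names that have no matching feature in the GeoJSON."""
    geo_names = {
        feature['properties'][property_name]
        for feature in geojson['features']
    }
    return sorted(set(area_names) - geo_names)


# Toy GeoJSON with two features; geometries are omitted for brevity.
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'name': 'Приморский район'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'name': 'Курортный район'}, 'geometry': None},
    ],
}

missing = unmatched_areas(
    toy_geojson,
    ['Приморский район', 'риморский район'],
)
print(missing)
```

With the real file, the GeoJSON could be loaded with `json.load` and checked against `mean_area_prices['area']`. An empty result would suggest that every area has a matching polygon.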