## Introduction

This project is intended for people who plan to move from one city (Moscow, Russia) to another (New York, USA). It helps them judge how similar two selected boroughs are, one in Moscow and one in New York. Similarity is based on clustering the venues found in each borough.

## Data
Venue data comes from Foursquare, which we query for every neighborhood in the selected boroughs. Data about the boroughs and neighborhoods themselves is collected from the internet, and some of it is reused from previous Coursera projects.
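
As a rough illustration of the Foursquare step, the sketch below only builds a request URL for venues around one point and sends nothing. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual query parameters; the helper name `build_explore_url`, the default values, and the placeholder credentials are hypothetical and not taken from this notebook.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "venues/explore" endpoint.
# Check the current Foursquare documentation before relying on it.
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a venue-search URL around one point (no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                      # API version date, YYYYMMDD
        "ll": "{},{}".format(lat, lng),    # latitude,longitude
        "radius": radius,                  # search radius in meters
        "limit": limit,                    # maximum number of venues
    }
    return "{}?{}".format(FOURSQUARE_EXPLORE_URL, urlencode(params))


# Placeholder credentials; replace with real ones before calling the API.
url = build_explore_url(55.75, 37.617, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get` once per neighborhood.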

Loading libraries

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import re

In [2]:
url = 'https://en.wikipedia.org/wiki/Administrative_divisions_of_Moscow'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

In [3]:
# Getting Okrug names
# divs = soup.find_all("th", {"class": "navbox-group"})
# for el in divs:
#     string = el.text.strip()
#     if string[-5:] == 'Okrug':
#         print(string)

In [4]:
okrug_names = []
okrug_urls = []
divs = soup.find_all("th", {"class": "navbox-group"})
for el in divs[:12]:
    name = el.text.strip()
    link = el.find('a')
    # Append the name and URL together so both lists stay the same length
    if name.endswith('Okrug') and link is not None:
        okrug_names.append(name)
        okrug_urls.append('https://en.wikipedia.org' + link.get('href'))
df = pd.DataFrame({'Okrug_names': okrug_names, 'Okrug_urls': okrug_urls})
df

Unnamed: 0,Okrug_names,Okrug_urls
0,Central Administrative Okrug,https://en.wikipedia.org/wiki/Central_Administ...
1,Northern Administrative Okrug,https://en.wikipedia.org/wiki/Northern_Adminis...
2,North-Eastern Administrative Okrug,https://en.wikipedia.org/wiki/North-Eastern_Ad...
3,Eastern Administrative Okrug,https://en.wikipedia.org/wiki/Eastern_Administ...
4,South-Eastern Administrative Okrug,https://en.wikipedia.org/wiki/South-Eastern_Ad...
5,Southern Administrative Okrug,https://en.wikipedia.org/wiki/Southern_Adminis...
6,South-Western Administrative Okrug,https://en.wikipedia.org/wiki/South-Western_Ad...
7,Western Administrative Okrug,https://en.wikipedia.org/wiki/Western_Administ...
8,North-Western Administrative Okrug,https://en.wikipedia.org/wiki/North-Western_Ad...
9,Zelenogradsky Administrative Okrug,https://en.wikipedia.org/wiki/Zelenograd


In [5]:
lat_arr = []
lng_arr = []
for el in df.Okrug_urls:
    response = requests.get(el)
    soup = BeautifulSoup(response.text, 'html.parser')
    # The "geo" span holds the okrug coordinates as "lat; lng" text
    lat, lng = soup.find("span", {"class": "geo"}).text.split('; ')
    lat_arr.append(float(lat))
    lng_arr.append(float(lng))
df['Latitude'] = lat_arr
df['Longitude'] = lng_arr
df

Unnamed: 0,Okrug_names,Okrug_urls,Latitude,Longitude
0,Central Administrative Okrug,https://en.wikipedia.org/wiki/Central_Administ...,55.75,37.617
1,Northern Administrative Okrug,https://en.wikipedia.org/wiki/Northern_Adminis...,55.833,37.517
2,North-Eastern Administrative Okrug,https://en.wikipedia.org/wiki/North-Eastern_Ad...,55.833,37.617
3,Eastern Administrative Okrug,https://en.wikipedia.org/wiki/Eastern_Administ...,55.783,37.767
4,South-Eastern Administrative Okrug,https://en.wikipedia.org/wiki/South-Eastern_Ad...,55.667,37.617
5,Southern Administrative Okrug,https://en.wikipedia.org/wiki/Southern_Adminis...,55.633,37.667
6,South-Western Administrative Okrug,https://en.wikipedia.org/wiki/South-Western_Ad...,55.65,37.533
7,Western Administrative Okrug,https://en.wikipedia.org/wiki/Western_Administ...,55.717,37.483
8,North-Western Administrative Okrug,https://en.wikipedia.org/wiki/North-Western_Ad...,55.817,37.433
9,Zelenogradsky Administrative Okrug,https://en.wikipedia.org/wiki/Zelenograd,55.99778,37.19028
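
The `geo` span on each Wikipedia page holds both coordinates in one string such as `55.75; 37.617`. A minimal sketch of a parser for that string follows; the helper name `parse_geo` is hypothetical, and it assumes the separator is always a semicolon.

```python
def parse_geo(text):
    """Split a 'lat; lng' string into a (latitude, longitude) float tuple."""
    lat, lng = (part.strip() for part in text.split(";"))
    return float(lat), float(lng)


print(parse_geo("55.75; 37.617"))
print(parse_geo("55.99778; 37.19028"))
```

Stripping each part makes the helper tolerant of a missing or extra space after the semicolon.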


In [15]:
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Moscow, RU'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Moscow are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Moscow are 55.7504461, 37.6174943.


In [16]:
map_Moscow = folium.Map(location=[latitude, longitude], zoom_start=9)
for okrug_name, lat, lng in zip(df['Okrug_names'], df['Latitude'], df['Longitude']):
    # Show the okrug name when a marker is clicked
    label = folium.Popup(okrug_name, parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#863100',
        fill_opacity=0.7).add_to(map_Moscow)
map_Moscow

In [17]:
def shortener(s):
    # Build an abbreviation from the first letter of each word, e.g. 'CAO'
    return ''.join(re.findall(r'\b\w', s))

df['Short_Okrugs'] = df.Okrug_names.apply(shortener)
df = df[['Okrug_names', 'Short_Okrugs', 'Latitude', 'Longitude', 'Okrug_urls']]
df

Unnamed: 0,Okrug_names,Short_Okrugs,Latitude,Longitude,Okrug_urls
0,Central Administrative Okrug,CAO,55.75,37.617,https://en.wikipedia.org/wiki/Central_Administ...
1,Northern Administrative Okrug,NAO,55.833,37.517,https://en.wikipedia.org/wiki/Northern_Adminis...
2,North-Eastern Administrative Okrug,NEAO,55.833,37.617,https://en.wikipedia.org/wiki/North-Eastern_Ad...
3,Eastern Administrative Okrug,EAO,55.783,37.767,https://en.wikipedia.org/wiki/Eastern_Administ...
4,South-Eastern Administrative Okrug,SEAO,55.667,37.617,https://en.wikipedia.org/wiki/South-Eastern_Ad...
5,Southern Administrative Okrug,SAO,55.633,37.667,https://en.wikipedia.org/wiki/Southern_Adminis...
6,South-Western Administrative Okrug,SWAO,55.65,37.533,https://en.wikipedia.org/wiki/South-Western_Ad...
7,Western Administrative Okrug,WAO,55.717,37.483,https://en.wikipedia.org/wiki/Western_Administ...
8,North-Western Administrative Okrug,NWAO,55.817,37.433,https://en.wikipedia.org/wiki/North-Western_Ad...
9,Zelenogradsky Administrative Okrug,ZAO,55.99778,37.19028,https://en.wikipedia.org/wiki/Zelenograd


In [18]:
for url in df.Okrug_urls:
    response = requests.get(url)
    # Keep only the markup between a '</sup>...</p>' paragraph end and the last 'References'
    soup = BeautifulSoup(str(re.findall(r'</sup>..?</p>(.*)References', str(response.content))), 'html.parser')
    links = soup.select('ul li a')
    names = []
    urls = []
    for el in links:
        urls.append(''.join(['https://en.wikipedia.org', el.get('href')]))
        names.append(el.text)
    print(names)

['', 'Daugavpils', '', 'Riga', '', 'Ingolstadt', 'Bavaria']
['1 Territorial divisions', '2 Economy', '3 Education', '4 Coat of arms', '5 References', '5.1 Notes', '5.2 Sources', 'Aeroport', 'Begovoy', 'Beskudnikovsky', 'Dmitrovsky', 'Golovinsky', 'Khovrino', 'Khoroshyovsky', 'Koptevo', 'Levoberezhny', 'Molzhaninovsky', 'Savyolovsky', 'Sokol', 'Timiryazevsky', 'Vostochnoye Degunino', 'Voykovsky', 'Zapadnoye Degunino']
['Alexeyevsky', 'Altufyevsky', 'Babushkinsky', 'Bibirevo', 'Butyrsky', 'Lianozovo', 'Losinoostrovsky', 'Marfino', 'Maryina roshcha', 'Ostankinsky', 'Otradnoye', 'Rostokino', 'Severnoye Medvedkovo', 'Severny', 'Sviblovo', 'Yaroslavsky', 'Yuzhnoye Medvedkovo']
['1 Territorial divisions', '2 References', '2.1 Notes', '2.2 Sources', 'Bogorodskoye', 'Veshnyaki', 'Vostochnoye Izmaylovo', 'Vostochny', 'Golyanovo', 'Ivanovskoye', 'Izmaylovo', 'Kosino-Ukhtomsky', 'Metrogorodok', 'Novogireyevo', 'Novokosino', 'Perovo', 'Preobrazhenskoye', 'Severnoye Izmaylovo', 'Sokolinaya gora', 'S

In [19]:
url = 'https://en.wikipedia.org/wiki/Administrative_divisions_of_Moscow'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
for table in soup.find_all('table', {'class' : 'wikitable'})[1:]:
    table_rows = table.find_all('tr')
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text for i in td]
        print(row)

[]
['Arbat', 'Арбат\n', '25,699\n']
['Basmanny', 'Басманный\n', '100,899\n']
['Khamovniki', 'Хамовники\n', '97,110\n']
['Krasnoselsky', 'Красносельский\n', '45,229\n']
['Meshchansky', 'Мещанский\n', '56,077\n']
['Presnensky', 'Пресненский\n', '116,979\n']
['Tagansky', 'Таганский\n', '109,993\n']
['Tverskoy', 'Тверской\n', '75,955\n']
['Yakimanka', 'Якиманка\n', '22,822\n']
['Zamoskvorechye', 'Замоскворечье\n', '50,590\n']
[]
[]
['Northern Administrative Okrug (Северный административный округ, Severny administrativny okrug)\n', '1,112,846\n', ' Leningradsky Prospect runs through the Northern Administrative District.  Dmitrovsky  Northern Okrug districts\n']
['Districts under the administrative okrug jurisdiction:\n']
['\nAeroport (Аэропорт)\n', '74,775\n']
['\nBegovoy (Беговой)\n', '44,385\n']
['\nBeskudnikovsky (Бескудниковский)\n', '74,790\n']
['\nDmitrovsky (Дмитровский)\n', '88,931\n']
['\nGolovinsky (Головинский)\n', '102,160\n']
['\nKhoroshyovsky (Хорошёвский)\n', '55,949\n']
['\n
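
The printed rows mix district rows with empty rows, okrug summary rows, and captions. One possible way to keep only the district rows is sketched below. The helper name `parse_district_row` is hypothetical, and it assumes that a district row always ends with a population cell such as `'25,699\n'`.

```python
import re


def parse_district_row(row):
    """Return (name, population) for a district row, or None for other rows."""
    cells = [cell.strip() for cell in row]
    if len(cells) < 2:
        return None  # empty rows and single-cell captions
    population = cells[-1].replace(",", "")
    if not population.isdigit():
        return None  # e.g. okrug summary rows that end with a description
    # Drop a trailing parenthesised Russian name such as " (Аэропорт)"
    name = re.sub(r"\s*\(.*\)$", "", cells[0])
    return name, int(population)


sample_rows = [
    [],
    ['Arbat', 'Арбат\n', '25,699\n'],
    ['Districts under the administrative okrug jurisdiction:\n'],
    ['\nAeroport (Аэропорт)\n', '74,775\n'],
]
districts = [parsed for parsed in map(parse_district_row, sample_rows) if parsed]
print(districts)
```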

In [22]:
# url = 'https://en.wikipedia.org/wiki/Administrative_divisions_of_Moscow'
# b=0
# response = requests.get(url)
# soup = BeautifulSoup(response.text, 'html.parser')
# for table in soup.find_all('table', {'class' : 'wikitable'})[1:]:
#     table_rows = table.find_all('tr')
#     for tr in table_rows:
#         td = tr.find_all('td')
#         for el in td:
#             a = el.find_all('a', {'class' : ''})
#             if a:
#                 print(a[0].text, a[0].get('href'))
#                 b +=1
# print(b)

In [34]:
url = 'https://en.wikipedia.org/wiki/Administrative_divisions_of_Moscow'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# table = soup.select('table', {'class' : 'nowraplinks mw-collapsible mw-collapsed navbox-inner'})
table = soup.find('table', {'class' : 'nowraplinks mw-collapsible mw-collapsed navbox-inner'})
for el in table.find_all('a')[4:]:
    print(el.text)

Central Administrative Okrug
Arbat
Basmanny
Khamovniki
Krasnoselsky
Meshchansky
Presnensky
Tagansky
Tverskoy
Yakimanka
Zamoskvorechye

Northern Administrative Okrug
Aeroport
Begovoy
Beskudnikovsky
Dmitrovsky
Golovinsky
Khoroshyovsky
Khovrino
Koptevo
Levoberezhny
Molzhaninovsky
Savyolovsky
Sokol
Timiryazevsky
Vostochnoye Degunino
Voykovsky
Zapadnoye Degunino
North-Eastern Administrative Okrug
Alexeyevsky
Altufyevsky
Babushkinsky
Bibirevo
Butyrsky
Lianozovo
Losinoostrovsky
Marfino
Maryina roshcha
Ostankinsky
Otradnoye
Rostokino
Severnoye Medvedkovo
Severny
Sviblovo
Yaroslavsky
Yuzhnoye Medvedkovo
Eastern Administrative Okrug
Bogorodskoye
Golyanovo
Ivanovskoye
Izmaylovo
Kosino-Ukhtomsky
Metrogorodok
Novogireyevo
Novokosino
Perovo
Preobrazhenskoye
Severnoye Izmaylovo
Sokolinaya gora
Sokolniki
Veshnyaki
Vostochnoye Izmaylovo
Vostochny
South-Eastern Administrative Okrug
Kapotnya
Kuzminki
Lefortovo
Lyublino
Maryino
Nekrasovka
Nizhegorodsky
Pechatniki
Ryazansky
Tekstilshchiki
Vykhino-Zhulebino
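
The navbox links come out as one flat list in which each okrug name is followed by its districts. A minimal sketch of how that list could be grouped is shown below. The helper name `group_by_okrug` is hypothetical, and it assumes that only okrug names end with the word `Okrug`.

```python
def group_by_okrug(link_texts):
    """Map each okrug name to the district names listed after it."""
    groups = {}
    current = None
    for text in link_texts:
        text = text.strip()
        if not text:
            continue  # skip empty link texts
        if text.endswith("Okrug"):
            current = text
            groups[current] = []
        elif current is not None:
            groups[current].append(text)
    return groups


sample = [
    "Central Administrative Okrug", "Arbat", "Basmanny", "",
    "Northern Administrative Okrug", "Aeroport", "Begovoy",
]
for okrug, districts in group_by_okrug(sample).items():
    print(okrug, districts)
```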

In [77]:
import requests
import json
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["int_ref"="RU-MOW"];
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data = response.json()
data

{'version': 0.6,
 'generator': 'Overpass API 0.7.55.7 8b86ff77',
 'osm3s': {'timestamp_osm_base': '2019-10-25T12:16:02Z',
  'timestamp_areas_base': '2019-10-25T11:34:02Z',
  'copyright': 'The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.'},
 'elements': [{'type': 'area',
   'id': 3600102269,
   'tags': {'ISO3166-2': 'RU-MOW',
    'addr:country': 'RU',
    'addr:region': 'Москва',
    'admin_level': '4',
    'border_type': 'region',
    'boundary': 'administrative',
    'gost_7.67-2003': 'РОФ-МОС',
    'int_ref': 'RU-MOW',
    'name': 'Москва',
    'name:be': 'Масква',
    'name:ca': 'Moscou',
    'name:cs': 'Moskva',
    'name:de': 'Moskau',
    'name:en': 'Moscow',
    'name:eo': 'Moskvo',
    'name:fi': 'Moskova',
    'name:fr': 'Moscou',
    'name:fy': 'Moskou',
    'name:hr': 'Moskva',
    'name:ja': 'モスクワ',
    'name:lt': 'Maskva',
    'name:nl': 'Moskou',
    'name:no': 'Moskva',
    'name:pl': 'Moskwa',
    'name:ru': 'Москв

In [95]:
import requests
import json
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["addr:country"="RU"]["addr:region"="Москва"][admin_level=5];
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data = response.json()
data['elements']

[{'type': 'area',
  'id': 3600162903,
  'tags': {'addr:country': 'RU',
   'addr:region': 'Москва',
   'admin_level': '5',
   'boundary': 'administrative',
   'name': 'Северный административный округ',
   'name:be': 'Паўночная адміністрацыйная акруга',
   'name:de': 'Nördlicher Verwaltungsbezirk',
   'name:en': 'Northern Administrative Okrug',
   'name:ru': 'Северный административный округ',
   'ref': 'САО',
   'type': 'boundary',
   'website': 'https://www.sao.mos.ru/',
   'wikidata': 'Q462016',
   'wikipedia': 'ru:Северный административный округ'}},
 {'type': 'area',
  'id': 3600226149,
  'tags': {'addr:country': 'RU',
   'addr:region': 'Москва',
   'admin_level': '5',
   'boundary': 'administrative',
   'name': 'Западный административный округ',
   'name:be': 'Заходняя адміністрацыйная акруга',
   'name:de': 'Westlicher Verwaltungsbezirk',
   'name:en': 'Western Administrative Okrug',
   'name:ru': 'Западный административный округ',
   'ref': 'ЗАО',
   'type': 'boundary',
   'website
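
Each Overpass element carries many tags, of which only a few are needed here. The sketch below flattens the elements into small records; the helper name `okrug_records` and the choice of fields are assumptions, and the sample element is abbreviated from the output above.

```python
def okrug_records(elements):
    """Reduce Overpass 'area' elements to id, English/Russian name, and ref."""
    records = []
    for element in elements:
        tags = element.get("tags", {})
        records.append({
            "id": element.get("id"),
            "name_en": tags.get("name:en"),
            # Fall back to the plain 'name' tag when 'name:ru' is absent
            "name_ru": tags.get("name:ru", tags.get("name")),
            "ref": tags.get("ref"),
        })
    return records


sample_elements = [
    {"type": "area",
     "id": 3600162903,
     "tags": {"admin_level": "5",
              "name": "Северный административный округ",
              "name:en": "Northern Administrative Okrug",
              "ref": "САО"}},
]
print(okrug_records(sample_elements))
```

A list of such records can be passed directly to `pd.DataFrame` to get one row per okrug.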