# City Dataset Generator

You can find the dataset at the following URL: [click me 😁](https://simplemaps.com/data/world-cities).
The dataset's license is also included in this repo.

We start by defining the globals of our program. `N_CITIES` is the number of cities to keep.

In [24]:
N_CITIES = 10

Now we import the libraries and load the dataset.

In [25]:
import pandas as pd
from haversine import haversine as hs
from IPython.display import display
import plotly.express as px
import plotly.io as pio
pio.renderers.default='notebook'

cities = pd.read_csv("worldcities.csv")
display(cities)

,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Tokyo,Tokyo,35.6897,139.6922,Japan,JP,JPN,Tōkyō,primary,37977000.0,1392685764
1,Jakarta,Jakarta,-6.2146,106.8451,Indonesia,ID,IDN,Jakarta,primary,34540000.0,1360771077
2,Delhi,Delhi,28.6600,77.2300,India,IN,IND,Delhi,admin,29617000.0,1356872604
3,Mumbai,Mumbai,18.9667,72.8333,India,IN,IND,Mahārāshtra,admin,23355000.0,1356226629
4,Manila,Manila,14.6000,120.9833,Philippines,PH,PHL,Manila,primary,23088000.0,1608618140
...,...,...,...,...,...,...,...,...,...,...,...
40996,Tukchi,Tukchi,57.3670,139.5000,Russia,RU,RUS,Khabarovskiy Kray,,10.0,1643472801
40997,Numto,Numto,63.6667,71.3333,Russia,RU,RUS,Khanty-Mansiyskiy Avtonomnyy Okrug-Yugra,,10.0,1643985006
40998,Nord,Nord,81.7166,-17.8000,Greenland,GL,GRL,Sermersooq,,10.0,1304217709
40999,Timmiarmiut,Timmiarmiut,62.5333,-42.2167,Greenland,GL,GRL,Kujalleq,,10.0,1304206491


Now we select the `N_CITIES` most populous cities.

In [26]:
cities_sample = cities.sort_values("population", ascending=False)[:N_CITIES]
# Alternative (random selection; requires defining RANDOM_SEED first):
# .sample(n=N_CITIES, random_state=RANDOM_SEED)[["city","lat","lng"]]
display(cities_sample)

,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Tokyo,Tokyo,35.6897,139.6922,Japan,JP,JPN,Tōkyō,primary,37977000.0,1392685764
1,Jakarta,Jakarta,-6.2146,106.8451,Indonesia,ID,IDN,Jakarta,primary,34540000.0,1360771077
2,Delhi,Delhi,28.66,77.23,India,IN,IND,Delhi,admin,29617000.0,1356872604
3,Mumbai,Mumbai,18.9667,72.8333,India,IN,IND,Mahārāshtra,admin,23355000.0,1356226629
4,Manila,Manila,14.6,120.9833,Philippines,PH,PHL,Manila,primary,23088000.0,1608618140
5,Shanghai,Shanghai,31.1667,121.4667,China,CN,CHN,Shanghai,admin,22120000.0,1156073548
6,São Paulo,Sao Paulo,-23.5504,-46.6339,Brazil,BR,BRA,São Paulo,admin,22046000.0,1076532519
7,Seoul,Seoul,37.56,126.99,"Korea, South",KR,KOR,Seoul,primary,21794000.0,1410836482
8,Mexico City,Mexico City,19.4333,-99.1333,Mexico,MX,MEX,Ciudad de México,primary,20996000.0,1484247881
9,Guangzhou,Guangzhou,23.1288,113.259,China,CN,CHN,Guangdong,admin,20902000.0,1156237133


We compute the haversine (great-circle) distance between every pair of cities, in kilometres, rounded to the nearest integer.

In [27]:
distances = []

for _, city_1 in cities_sample.iterrows():
  # Coordinates of the origin city, built once per row
  city_1_gps = (city_1["lat"], city_1["lng"])
  row = []
  for _, city_2 in cities_sample.iterrows():
    city_2_gps = (city_2["lat"], city_2["lng"])
    # haversine returns kilometres by default; round to the nearest km
    distance = round(hs(city_1_gps, city_2_gps))
    row.append(distance)
  distances.append(row)

cities_names = list(cities_sample["city_ascii"])
distances = pd.DataFrame(distances, columns=cities_names, index=cities_names)
display(distances)

,Tokyo,Jakarta,Delhi,Mumbai,Manila,Shanghai,Sao Paulo,Seoul,Mexico City,Guangzhou
Tokyo,0,5786,5833,6734,2995,1761,18534,1152,11306,2903
Jakarta,5786,0,5010,4661,2791,4438,15628,5297,16846,3336
Delhi,5833,5010,0,1167,4754,4245,14431,4684,14653,3642
Mumbai,6734,4661,1167,0,5134,5041,13766,5607,15656,4208
Manila,2995,2791,4754,5134,0,1843,18379,2621,14220,1248
Shanghai,1761,4438,4245,5041,1843,0,18568,873,12914,1207
Sao Paulo,18534,15628,14431,13766,18379,18568,0,18343,7432,17963
Seoul,1152,5297,4684,5607,2621,873,18343,0,12051,2071
Mexico City,11306,16846,14653,15656,14220,12914,7432,12051,0,14120
Guangzhou,2903,3336,3642,4208,1248,1207,17963,2071,14120,0
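
For readers curious about what the `haversine` call computes, here is a minimal sketch of the haversine formula using only the standard library. The function name `haversine_km` and the constant `EARTH_RADIUS_KM` are illustrative assumptions, not part of the notebook, and the radius used by the library may differ slightly from the one assumed here.

```python
import math

# Assumed mean Earth radius in kilometres; a library may use a slightly different value.
EARTH_RADIUS_KM = 6371.0088

def haversine_km(point_1, point_2):
    """Great-circle distance in km between two (lat, lng) pairs given in degrees."""
    lat_1, lng_1 = map(math.radians, point_1)
    lat_2, lng_2 = map(math.radians, point_2)
    d_lat = lat_2 - lat_1
    d_lng = lng_2 - lng_1
    # Haversine term: squared half-chord length between the two points
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_1) * math.cos(lat_2) * math.sin(d_lng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates taken from the sample table above
tokyo = (35.6897, 139.6922)
seoul = (37.56, 126.99)
print(round(haversine_km(tokyo, seoul)))
```

The printed value should land close to the Tokyo to Seoul entry in the table above. Small differences can come from the assumed radius or from rounding.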


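We save the dataframe to a CSV file. Because `index=False` is passed, the row labels are not written: the file contains a header of city names, and the rows follow the same order as the columns.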

In [28]:
distances.to_csv("cities_distances.csv", index=False)
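
Since the file has no row labels, a consumer has to rebuild them from the header. The following is a minimal sketch of one way to do that with the standard `csv` module. The helper `load_distances` is a hypothetical name, and the inline text is a three-city excerpt of the matrix above standing in for the real file.

```python
import csv
import io

# Three-city excerpt of the matrix above, in the shape written with index=False:
# a header of city names, then one unlabeled row per city in the same order.
CSV_TEXT = """Tokyo,Jakarta,Delhi
0,5786,5833
5786,0,5010
5833,5010,0
"""

def load_distances(handle):
    """Return {origin: {destination: km}}, assuming row order matches the header."""
    reader = csv.reader(handle)
    names = next(reader)
    rows = [[int(value) for value in row] for row in reader if row]
    if len(rows) != len(names):
        raise ValueError("expected a square matrix")
    return {name: dict(zip(names, row)) for name, row in zip(names, rows)}

lookup = load_distances(io.StringIO(CSV_TEXT))
print(lookup["Tokyo"]["Delhi"])  # 5833
```

To read the actual file, the same helper could be called on `open("cities_distances.csv", newline="")` instead of the in-memory string.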

We plot the cities on a map.

In [30]:
globe = px.scatter_geo(data_frame=cities_sample, lat="lat", lon="lng", hover_name="city", size="population", projection="orthographic")
globe.show()

We plot the connections on a map (this cell is currently commented out).

In [32]:
# cities_sample_2 = cities_sample.reindex([0, 2, 3, 1, 4])
# globe = px.line_geo(data_frame=cities_sample_2, lat="lat", lon="lng", hover_name="city", projection="orthographic")
# globe.show()
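
The commented-out cell reorders the first five cities as `[0, 2, 3, 1, 4]`, and a line plot would presumably join them in that order. As a rough sketch of what that path amounts to in kilometres, the snippet below sums the relevant entries of the distance matrix above. `DISTANCES_KM` and `path_length` are hypothetical names introduced for illustration only.

```python
# Entries copied from the distance matrix above (km), stored once per pair.
DISTANCES_KM = {
    ("Tokyo", "Delhi"): 5833,
    ("Delhi", "Mumbai"): 1167,
    ("Mumbai", "Jakarta"): 4661,
    ("Jakarta", "Manila"): 2791,
}

def path_length(order, distances):
    """Sum the distances between consecutive cities in the given visiting order."""
    total = 0
    for origin, destination in zip(order, order[1:]):
        # The matrix is symmetric, so accept either orientation of the pair
        if (origin, destination) in distances:
            total += distances[(origin, destination)]
        else:
            total += distances[(destination, origin)]
    return total

# Row indices 0, 2, 3, 1, 4 of the sample correspond to these cities
order = ["Tokyo", "Delhi", "Mumbai", "Jakarta", "Manila"]
print(path_length(order, DISTANCES_KM))  # 14452
```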