
Building a minimum spanning tree that spans Russian cities


thxi/dsba_MST


Visualization

Data visualization can be found HERE

WARNING: the Yandex map may take some time to load.

The buttons and tabs should be fairly self-explanatory:

  • boruvka/kruskal/prim - tabs showing the MST obtained with each algorithm. The MST is unique here (presumably because all pairwise distances are distinct), so all three tabs show the same tree
  • comparison - a tab showing how the computed distances compare with the existing roads. The lower the opacity of a road, the larger the difference in length between the existing road and the blue one. Lines are clickable
  • Yandex map - a map built in place with the Yandex JS API, showing the real and 'imaginary' roads on one map
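The opacity rule in the comparison tab (larger difference, fainter line) can be illustrated with a simple linear mapping. This is a minimal sketch assuming a linear scale and a hypothetical floor `min_opacity`; the web app's actual formula is not documented here.

```python
def opacity_for_differences(diffs_km, min_opacity=0.2, max_opacity=1.0):
    """Map each road-minus-direct difference (km) to a line opacity.

    Hypothetical linear scheme: the smallest difference gets max_opacity and
    the largest gets min_opacity, so bigger differences are drawn fainter.
    Expects a non-empty list.
    """
    lo, hi = min(diffs_km), max(diffs_km)
    if hi == lo:
        # All differences are equal: draw every line fully opaque.
        return [max_opacity] * len(diffs_km)
    span = max_opacity - min_opacity
    return [max_opacity - span * (d - lo) / (hi - lo) for d in diffs_km]


print(opacity_for_differences([10.0, 60.0, 110.0]))
```

A linear scale keeps the encoding easy to read; a log scale would be an alternative if a few very large differences squash everything else toward full opacity.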

To run locally, you need Docker.

Then:

make build-front
make run-front # runs on localhost:1337

make kill-front # to stop the container

Miscellaneous:

The obtained city coordinates can be found in data/ru_lat_lng.csv or jupyter/cities->edges.ipynb.

That dataset contained some outliers, which had to be removed.

The corresponding jupyter notebook: jupyter/city_outliers.ipynb

Cleaned dataset: ru_lat_lng_clear.csv


Distances between cities were calculated with Vincenty's formulae (see great-circle distance).

The corresponding jupyter notebook: jupyter/city_distances.ipynb
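For a quick check of individual distances, the great-circle distance can be computed with the Vincenty special case for a sphere (the numerically stable atan2 form). This is a sketch assuming a spherical Earth with mean radius 6371 km; the notebook may instead use the full ellipsoidal Vincenty formulae, which give slightly different values.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km. The ellipsoidal
# (WGS-84) Vincenty formulae can differ from this by up to about 0.5%.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lng1, lat2, lng2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km via the Vincenty formula for a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lng2 - lng1)
    # atan2(num, den) stays accurate for both tiny and near-antipodal
    # separations, unlike the plain spherical law of cosines.
    num = math.hypot(
        math.cos(phi2) * math.sin(dlam),
        math.cos(phi1) * math.sin(phi2)
        - math.sin(phi1) * math.cos(phi2) * math.cos(dlam),
    )
    den = (math.sin(phi1) * math.sin(phi2)
           + math.cos(phi1) * math.cos(phi2) * math.cos(dlam))
    return radius * math.atan2(num, den)


# Moscow -> Saint Petersburg (approximate city-centre coordinates)
print(round(great_circle_km(55.7558, 37.6173, 59.9343, 30.3351)))
```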


At this point, there were two datasets: data/ru_dist_mat_v1.csv and data/ru_dist_mat_v2.csv

Suffixes:

  • v1 - all Russian cities
  • v2 - only Russian cities with a population over 200k

To see how the second one was obtained, see jupyter/Big cities.ipynb.
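Restricting the full distance matrix to big cities amounts to filtering both its rows and its columns by population. In this minimal sketch the nested-dict matrix, the population lookup, and the function name are hypothetical stand-ins for the CSV files and for whatever the notebook actually does.

```python
POPULATION_THRESHOLD = 200_000


def restrict_to_big_cities(dist, population, threshold=POPULATION_THRESHOLD):
    """Keep only cities whose population exceeds the threshold.

    dist: dict mapping city -> dict mapping city -> distance in km
    population: dict mapping city -> population (missing cities count as 0)
    """
    big = {city for city in dist if population.get(city, 0) > threshold}
    # Filter rows and columns so the result is still a square matrix.
    return {a: {b: d for b, d in row.items() if b in big}
            for a, row in dist.items() if a in big}
```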


The algorithms can be found in the algs directory.

To get the MSTs, run:

python3 algs/mst.py v* # where * is either 1 or 2
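For reference, the three MST algorithms can be sketched on a complete graph given as a distance matrix. This is a self-contained illustration, not the code in algs/mst.py; the function names and the (weight, u, v) edge format are assumptions. When all weights are distinct, the MST is unique, so the three functions return the same tree.

```python
import heapq


def _find(parent, x):
    # Union-find lookup with path halving to keep trees shallow.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal(n, edges):
    """edges: list of (weight, u, v) over vertices 0..n-1."""
    parent, mst = list(range(n)), []
    for w, u, v in sorted(edges):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            mst.append((w, u, v))
    return mst


def prim(dist):
    """dist: full n x n distance matrix; grows the tree from vertex 0."""
    n = len(dist)
    in_tree, mst = [False] * n, []
    heap = [(0, 0, -1)]  # (weight, vertex, parent in tree)
    while heap:
        w, v, p = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry, vertex already attached
        in_tree[v] = True
        if p >= 0:
            mst.append((w, p, v))
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(heap, (dist[v][u], u, v))
    return mst


def boruvka(n, edges):
    """Each round, every component adds its cheapest outgoing edge."""
    parent, mst, components = list(range(n)), [], n
    while components > 1:
        cheapest = {}
        for e in edges:
            _, u, v = e
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:
                continue
            for r in (ru, rv):
                # Comparing whole tuples breaks weight ties consistently.
                if r not in cheapest or e < cheapest[r]:
                    cheapest[r] = e
        if not cheapest:
            break  # the graph is disconnected
        for w, u, v in cheapest.values():
            ru, rv = _find(parent, u), _find(parent, v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                components -= 1
    return mst


dist = [[0, 2, 9, 4], [2, 0, 6, 3], [9, 6, 0, 5], [4, 3, 5, 0]]
edges = [(dist[i][j], i, j) for i in range(4) for j in range(i + 1, 4)]
for tree in (kruskal(4, edges), prim(dist), boruvka(4, edges)):
    print(sum(w for w, _, _ in tree))
```

On a complete graph, Prim's algorithm with a plain O(n^2) array scan is usually faster than the heap version shown here; the heap keeps the sketch short.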

Road distances between cities were obtained with the Google Distance Matrix API.

Corresponding jupyter notebook

Corresponding datasets: v1, v2
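A request to the Distance Matrix web service passes pipe-separated origins and destinations, and each response element reports the road distance in metres under `distance.value`. This sketch only builds the URL and parses a response; the HTTP call, batching, and quota handling used in the notebook are not shown, the API key is your own, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# JSON endpoint of the Google Distance Matrix web service.
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def _points(pts):
    # Coordinates are passed as "lat,lng" pairs separated by "|".
    return "|".join(f"{lat},{lng}" for lat, lng in pts)


def build_request_url(origins, destinations, api_key):
    """origins/destinations: lists of (lat, lng) tuples."""
    query = urlencode({"origins": _points(origins),
                       "destinations": _points(destinations),
                       "key": api_key})
    return f"{BASE_URL}?{query}"


def parse_distances_km(response_text):
    """Convert a JSON response into a matrix of road distances in km.

    Elements whose status is not "OK" (e.g. no route found) become None.
    """
    data = json.loads(response_text)
    if data["status"] != "OK":
        raise RuntimeError(f"request failed: {data['status']}")
    return [[el["distance"]["value"] / 1000 if el["status"] == "OK" else None
             for el in row["elements"]]
            for row in data["rows"]]
```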


The obtained MSTs were visualized with the Folium library (see algs/html_maps.py):

python3 algs/html_maps.py v* # where * is either 1 or 2

Alternatively, refer to jupyter/Comparing distances.ipynb, where we made the interactive maps (maps/comparison_map_markers_v* or maps/comparison_map_*), or, better still, use the web app.

Comparison

See the bottom of jupyter/Comparing distances.ipynb for how the distances were compared.

Connecting the cities directly, instead of via the existing roads, saves around 12,000 km in total.

It is also worth looking at the boxplot and the distribution of the differences (figures: Box, Dist). As they show, most of the individual savings did not exceed approximately 200 km.
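The total saving and the boxplot statistics can be computed from per-edge pairs of road and direct distance. This sketch assumes the saving for an MST edge is defined as road distance minus direct distance; the notebook's exact definition may differ, and the function name is hypothetical. It uses the "inclusive" quantile method, which matches the linear interpolation NumPy and Matplotlib use by default.

```python
import statistics


def savings_summary(pairs):
    """pairs: list of (road_km, direct_km), one per MST edge (at least two).

    Returns the total saving plus the quartiles and maximum of the
    per-edge savings, i.e. the numbers a boxplot is drawn from.
    """
    savings = [road - direct for road, direct in pairs]
    q1, median, q3 = statistics.quantiles(savings, n=4, method="inclusive")
    return {"total": sum(savings), "q1": q1, "median": median,
            "q3": q3, "max": max(savings)}
```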

Resources used:

  • cities dataset
  • Yandex Geocoder
  • Google Distance Matrix
  • Heroku, to deploy the visualized data