### Data overview

This project uses data from the [Ukrainian Open Data Portal](https://data.gov.ua).

The dataset describes international bus routes and bus stops in Ukraine. Two files are available under [this link](https://data.gov.ua/dataset/d5d79cf2-ccc7-40a9-9b7f-4c5cd0d061f9): `international_routes_stops3.csv` (stops and schedules of international routes) and `international_routes.csv` (starting and destination points of the routes, along with schedules). Copies of both files (current as of Monday, June 10, 2019) are stored in the `raw_data` directory of this repository.

In [12]:
import pandas as pd

In [22]:
stops = pd.read_csv('./raw_data/international_routes_stops3.csv', sep=';', encoding='windows-1251')
stops.head()

| | id | rejs_number | stop_city_name | stop_address | stop_code | route_frequency | distance_direct | arrival_direct | departure_direct | time_diff_direct | arrival_return | departure_return | time_diff_return | stop_number | Unnamed: 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CZ1357 | 1 | Київ | вулиця Набережно-Печерська дорога, 10-а | 3200000000 | Щоденно | 0 | 20:00 | 20:10 | 0 | 15:30 | | 0 | 0.0 | |
| 1 | CZ1357 | 1 | Київ | вул. С.Петлюри, 32 | 3200000000 | Щоденно | 12 | 20:30 | 21:00 | 0 | 14:40 | 15:00 | 0 | 1.0 | |
| 2 | CZ1357 | 1 | Житомир | вул. Київська, 93 | 1810100000 | Щоденно | 140 | 23:20 | 23:40 | 0 | 12:50 | 13:00 | 0 | 2.0 | |
| 3 | CZ1357 | 1 | Рівне | вул. Київська, 40 | 5610100000 | Щоденно | 328 | 2:50 | 3:10 | 0 | 10:30 | 10:40 | 0 | 3.0 | |
| 4 | CZ1357 | 1 | Львів | вул. Стрийська, 109 | 4610100000 | Щоденно | 539 | 6:10 | 6:30 | 0 | 8:35 | 8:50 | 0 | 4.0 | |


In [14]:
routes = pd.read_csv('./raw_data/international_routes.csv')
routes.head()

| | permision_delivery_date | n/o | id | march_route_name | company_name | company_name_external | border_crossing | border_crossing_alternative | march_route_time_direct | march_route_time_return | march_route_regularity | license_terms | destionation_country_name | destionation_country_code | march_route_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 3 | MD3 | АК АЕРОПОРТ БОРИСПІЛЬ (вул. Бориспільська, 7) ... | ФОП Гергі Є.М. | S.R.L. "SARVALTEH AUTO" | Табаки | 16:00 | 16:00 | | Щоденно | 2020-02-08 00:00:00 | Молдова | MD | True |
| 1 | | 8 | MD8 | Балта АС (вул. Уварова, 38а) - Дубосари | ПП Фаворіт 65005, м. Одеса, вул. Бугаївська, 2... | ОАО "Дубоссарское автотранспортное предприятие... | Платоново | | | | | 2020-06-07 00:00:00 | Молдова | MD | True |
| 2 | 05.06.2018 | 11 | RU11 | Бахмут-Москва | ТОВ"Форум Авто"\|ФОП"Мелкумян Ю.О." | ООО"Газавторемонт" | Гоптівка | Плетенівка | 16:50 | 16:50 | Щоденно | 2019-07-20 00:00:00 | Росія | RU | True |
| 3 | | 12 | RU12 | Бахмут (АС вул. Космонавтів,5)-Москва (Новоясе... | ТОВ "СЕВЕР-АВТО" | без паритету | Плетенівка | | | | | 2019-02-28 00:00:00 | Росія | RU | True |
| 4 | | 13 | RU13 | Бахму - Москва | ФОП Базілєвський В.О. | ООО "Горизон-Тур" | Танюшівка | | 19:30 | 15:30 | Щоденно | 2019-01-07 00:00:00 | Росія | RU | True |


This project uses the `international_routes_stops3.csv` file. It needs to be filtered, because this research covers only routes heading to Poland.

**Explanation of useful columns:**

1. *id* identifies the bus line. Looking up this ID in the `international_routes.csv` dataset gives the line operator's company name, information about its license, the border crossing point, etc.

2. *stop_city_name* together with *stop_address* will be used to determine the geolocation of each stop, which is needed for visualisation on maps.

3. *stop_number* gives the position of the stop within the route.
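The three columns above could be used roughly as in the following sketch. It runs on tiny in-memory tables rather than the real files: the stop rows reuse values from the `stops.head()` output, while the `routes` row (operator and border crossing names) is a made-up placeholder, and the `geocode_query` column name is hypothetical. The exact query format a geocoding service expects is also an assumption.

```python
import pandas as pd

# Toy stand-in for the stops table (values taken from the head() output,
# deliberately shuffled so that the ordering step has something to do).
stops = pd.DataFrame({
    "id": ["CZ1357", "CZ1357", "CZ1357"],
    "stop_city_name": ["Житомир", "Київ", "Львів"],
    "stop_address": ["вул. Київська, 93", "вул. С.Петлюри, 32", "вул. Стрийська, 109"],
    "stop_number": [2.0, 1.0, 4.0],
})

# Toy stand-in for the routes table; these values are placeholders,
# not real data from international_routes.csv.
routes = pd.DataFrame({
    "id": ["CZ1357"],
    "company_name": ["Example operator"],
    "border_crossing": ["Example crossing"],
})

# 1. id: attach operator details to every stop of the line.
enriched = stops.merge(
    routes[["id", "company_name", "border_crossing"]],
    on="id",
    how="left",  # keep stops even if the route is missing from `routes`
)

# 2. stop_city_name + stop_address: build a text query for a geocoder
#    (hypothetical column name and format).
enriched["geocode_query"] = enriched["stop_city_name"] + ", " + enriched["stop_address"]

# 3. stop_number: put the stops of each line in travel order.
ordered = enriched.sort_values(["id", "stop_number"]).reset_index(drop=True)

print(ordered[["id", "stop_number", "geocode_query", "company_name"]])
```

A left join is used so that a stop is never dropped just because its line has no matching row in the routes table.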

In [36]:
# Keep only the stops of routes heading to Poland (id starts with 'PL')
pol = stops[stops.id.str[:2].str.upper() == 'PL']

pol.shape

(7737, 15)

In [37]:
import numpy as np

In [41]:
np.unique(pol.id).shape

(484,)

The filtered data contains 7737 stop records belonging to 484 bus lines heading to Poland.
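These two numbers imply roughly 16 stops per line on average (7737 / 484 ≈ 15.99). As a minimal sketch of how the same counts, plus a per-line breakdown, could be obtained with pandas alone, the example below repeats the `PL` prefix filter on a toy table. The IDs `PL1` and `pl2` are hypothetical and only illustrate the case-insensitive prefix match.

```python
import pandas as pd

# Toy stops table; the ids are invented for illustration.
stops = pd.DataFrame({
    "id": ["PL1", "PL1", "pl2", "CZ1357", "PL1", "pl2"],
    "stop_number": [0.0, 1.0, 0.0, 0.0, 2.0, 1.0],
})

# Same filter as in the notebook: ids whose first two letters are "PL".
pol = stops[stops["id"].str[:2].str.upper() == "PL"]

n_stops = len(pol)                        # number of stop records
n_lines = pol["id"].nunique()             # number of distinct bus lines
stops_per_line = pol.groupby("id").size() # stop count for each line

print(n_stops, n_lines)
print(stops_per_line)
print(stops_per_line.mean())
```

`nunique()` returns the same count as `np.unique(pol.id).shape[0]` without needing NumPy, and `groupby("id").size()` shows how the stops are distributed across lines.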