# This work investigates:
1. How the number of road traffic accidents depends on potholes and other road-surface defects
2. ...


## Installing dependencies, importing some of them and setting global variables

In [2]:
%pip install -qU setuptools pandas numpy plotly rich tqdm geopandas geojson bs4 requests ipywidgets jupyterlab_widgets pydantic

Note: you may need to restart the kernel to use updated packages.


In [3]:
import pickle
from rich import print
from tqdm.notebook import tqdm
from pathlib import Path
# from google.colab import output
# output.enable_custom_widget_manager()

data_accidents_source_url = 'https://dtp-stat.ru/opendata'

## Downloading data
If you already have the data in the `./data` folder, skip this section.

### Download the HTML page and collect the `.geojson` links into a list
The source link is defined in the import section:
```python
data_accidents_source_url = 'https://dtp-stat.ru/opendata'
```

In [4]:
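from bs4 import BeautifulSoup
import urllib.request

# Fetch the open-data page and collect every link that points to a .geojson file
with urllib.request.urlopen(data_accidents_source_url) as html_page:
    soup = BeautifulSoup(html_page, "html.parser")

download_links = []
for link in soup.find_all('a'):
    link_href = link.get('href')
    # Anchors without an href return None, so guard before the substring check
    if link_href and '.geojson' in link_href:
        download_links.append(link_href)
print(download_links)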
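If you would rather not depend on BeautifulSoup for this step, the same link filter can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not what the cell above uses, and `GeoJSONLinkCollector` / `extract_geojson_links` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from html.parser import HTMLParser


class GeoJSONLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point to .geojson files (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this method
        if tag != "a":
            return
        for name, value in attrs:
            # Skip bare `href` attributes (value is None) and non-GeoJSON links
            if name == "href" and value and ".geojson" in value:
                self.links.append(value)


def extract_geojson_links(html: str) -> list[str]:
    parser = GeoJSONLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage would look like `extract_geojson_links(html_text)`, where `html_text` is the page body already decoded to a string.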

### Download every file and store it as `./data/*.geojson`
NOTE: `geojson_files` is overwritten in the next cell.

In [5]:
import requests
from pathlib import Path

geojson_files = []
Path('data').mkdir(exist_ok=True)

for file_link in tqdm(download_links):
    response = requests.get(file_link)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    # Save under data/ with the same file name as on the server
    file_path = file_link.replace('https://cms.dtp-stat.ru/media/opendata/', 'data/')
    with open(file_path, 'wb') as file:
        file.write(response.content)
    geojson_files.append(file_path)
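The cell above maps each URL to a local path by replacing a hard-coded prefix, which fails if the links ever point to a different host or directory. A minimal sketch that instead takes the last segment of the URL path is shown below; `local_path_for` is a hypothetical helper, and it assumes file names are unique across all download links.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, data_dir: Path = Path("data")) -> Path:
    """Map a download URL to a file inside `data_dir` (hypothetical helper)."""
    # Keep only the last path segment; the query string and fragment are dropped
    filename = PurePosixPath(urlsplit(url).path).name
    if not filename:
        raise ValueError(f"URL has no file name: {url!r}")
    return data_dir / filename
```

Inside the download loop this would replace the `file_link.replace(...)` call, e.g. `file_path = local_path_for(file_link)`.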



## Loading data from files and converting it to `CarAccident` objects

### Load data from files and store it in the `geojson_objects` list

In [12]:
import json
import geojson

# Rebuild the file list from disk, so this cell also works if you skipped the download section
geojson_files = [str(file) for file in Path.cwd().glob('data/*.geojson')]
geojson_objects = []

for file in tqdm(geojson_files):
    with open(file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    geojson_objects.append(geojson.FeatureCollection(data['features']))

# Show the properties of the first accident as a sanity check
print(geojson_objects[0].features[0]['properties'])
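The cell above keeps every parsed FeatureCollection in memory before any conversion happens. If memory becomes a concern with all the downloaded files, a generator can hold at most one parsed file at a time and hand out feature properties one by one. This is a sketch, not the notebook's approach; `iter_feature_properties` is a hypothetical name, and it assumes each file is a standard GeoJSON FeatureCollection.

```python
import json
from collections.abc import Iterable, Iterator
from pathlib import Path


def iter_feature_properties(paths: Iterable[Path]) -> Iterator[dict]:
    """Yield the `properties` dict of every feature in the given GeoJSON files."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # A FeatureCollection stores its features in a top-level "features" list;
        # a feature whose "properties" is null yields an empty dict instead
        for feature in data.get("features", []):
            yield feature.get("properties") or {}
```

With the models defined below, the conversion could then read `car_accidents = [CarAccident(**p) for p in iter_feature_properties(Path('data').glob('*.geojson'))]`.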

### Create data classes for easier data analysis
NOTE: these classes inherit from `pydantic.BaseModel` (they are not standard-library `dataclasses`), so field types are validated when an object is created.

In [9]:
from pydantic import BaseModel

# Every field defaults to None: in pydantic v2 an `X | None` annotation without
# a default is still required, so a feature missing a key would fail validation.

class Participant(BaseModel):
    role: str | None = None
    gender: str | None = None
    violations: list | None = None
    health_status: str | None = None
    years_of_driving_experience: int | None = None

class Vehicle(BaseModel):
    year: int | None = None
    brand: str | None = None
    color: str | None = None
    category: str | None = None
    participants: list[Participant] | None = None

class CarAccident(BaseModel):
    id: int | None = None
    tags: list | None = None
    light: str | None = None
    point: dict | None = None
    nearby: list | None = None
    region: str | None = None
    scheme: str | None = None
    address: str | None = None
    weather: list | None = None
    category: str | None = None
    datetime: str | None = None
    severity: str | None = None
    vehicles: list[Vehicle] | None = None
    dead_count: int | None = None
    participants: list[Participant] | None = None
    injured_count: int | None = None
    parent_region: str | None = None
    road_conditions: list | None = None
    participants_count: int | None = None
    participant_categories: list | None = None

In [10]:
# load properties from geojson_objects to CarAccident objects
car_accidents = []
for geojson_object in geojson_objects:
    for feature in geojson_object['features']:
        car_accident = CarAccident(**feature['properties'])
        car_accidents.append(car_accident)

print(car_accidents[0])
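As a first step toward the research question about road-surface defects, the `road_conditions` field can be tallied across all accidents. The sketch below assumes each entry of `road_conditions` is a hashable value such as a string; the actual wording of the conditions in the dataset has not been checked here, and `count_by_road_condition` is a hypothetical helper.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Any


def count_by_road_condition(accidents: Iterable[Any]) -> Counter:
    """Count accidents per road-condition entry (hypothetical helper).

    Assumes each item has a `road_conditions` attribute holding a list of
    hashable values (e.g. strings) or None, as in the CarAccident model.
    """
    counts: Counter = Counter()
    for accident in accidents:
        # set() so an accident listing the same condition twice is counted once for it
        for condition in set(accident.road_conditions or []):
            counts[condition] += 1
    return counts
```

For example, `count_by_road_condition(car_accidents).most_common(10)` would list the ten most frequent road conditions, from which the pothole-related entries can be picked out.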

In [20]:
# Cache the parsed objects so the download and parsing steps can be skipped next time
with open('car_accidents.pkl', 'wb') as f:
    pickle.dump(car_accidents, f)

In [None]:
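# Run the class-definition cell first: the pickle refers to CarAccident, Vehicle and Participant
with open('car_accidents.pkl', 'rb') as f:
    car_accidents = pickle.load(f)
print(car_accidents[0])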
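The two cells above save and load the cache by hand. The same idea can be folded into one "load if cached, otherwise build and cache" helper. This is a sketch under the assumption that the cache file is trusted and created locally; `load_or_build` is a hypothetical name.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(path: Path, build: Callable[[], T]) -> T:
    """Return the object pickled at `path`, or build it and pickle it there."""
    if path.exists():
        # Only unpickle files you created yourself: loading a pickle can run arbitrary code
        with open(path, "rb") as f:
            return pickle.load(f)
    result = build()
    with open(path, "wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return result
```

Usage might be `car_accidents = load_or_build(Path('car_accidents.pkl'), build_car_accidents)`, where `build_car_accidents` would be a hypothetical function wrapping the conversion loop. The cache does not notice when the `.geojson` files change, so delete the `.pkl` file after re-downloading the data.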