# Data assignment

This example aggregates data over pre-prepared urban blocks of St. Petersburg:
- Functional purpose (land use) of the blocks.
- Building parameters.
- Urban service parameters.

## Urban blocks

1. Read the urban blocks layer:
    - `geometry : Polygon`
2. Estimate a projected coordinate reference system (CRS) and reproject the blocks to it
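`estimate_utm_crs()` (geopandas) picks a UTM zone for the layer. The standard zone arithmetic behind such a choice can be sketched as below. This is a simplified illustration, not geopandas' implementation (which queries the pyproj CRS database for the area of use), and the St. Petersburg coordinates in it are approximate.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the standard WGS 84 / UTM zone containing (lon, lat).

    Simplified rule: 6-degree zones numbered eastward from 180°W,
    EPSG 326xx north of the equator and 327xx south of it.
    The special zones around Norway and Svalbard are ignored.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    zone = math.floor((lon + 180.0) / 6.0) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise yield zone 61
    return (32600 if lat >= 0 else 32700) + zone


# Approximate coordinates of central St. Petersburg (illustrative)
print(utm_epsg(30.3, 59.94))  # 32636, i.e. WGS 84 / UTM zone 36N
```

For St. Petersburg (around 30°E) this gives zone 36N, EPSG:32636, which agrees with the CRS printed by the notebook.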

In [37]:
import geopandas as gpd

blocks = gpd.read_file('./input/blocks.geojson')
crs = blocks.estimate_utm_crs()
blocks = blocks.to_crs(crs)
blocks.head()

|   | geometry |
|---|---|
| 0 | POLYGON ((349424.859 6631180.891, 349424.751 6... |
| 1 | POLYGON ((352083.617 6633950.146, 352240.448 6... |
| 2 | POLYGON ((346700.642 6618453.176, 346681.107 6... |
| 3 | POLYGON ((347043.363 6618261.219, 347042.608 6... |
| 4 | POLYGON ((354879.039 6618859.116, 354845.405 6... |
4,"POLYGON ((354879.039 6618859.116, 354845.405 6..."


Let's check which CRS was estimated:

In [38]:
crs

<Projected CRS: EPSG:32636>
Name: WGS 84 / UTM zone 36N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 30°E and 36°E, northern hemisphere between equator and 84°N, onshore and offshore. Belarus. Cyprus. Egypt. Ethiopia. Finland. Israel. Jordan. Kenya. Lebanon. Moldova. Norway. Russian Federation. Saudi Arabia. Sudan. Syria. Türkiye (Turkey). Uganda. Ukraine.
- bounds: (30.0, 0.0, 36.0, 84.0)
Coordinate Operation:
- name: UTM zone 36N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

## Functional zoning

Read the functional zones layer and reproject it to the projected CRS:
- `geometry : Polygon | MultiPolygon`
- `functional_zone : str`

In [39]:
functional_zones = gpd.read_file('./input/functional_zones.geojson').to_crs(crs)
functional_zones.head()

|   | functional_zone | geometry |
|---|---|---|
| 0 | Т1Ж2-2 | MULTIPOLYGON (((349202.695 6660862.796, 349199... |
| 1 | Т1Ж2-2 | MULTIPOLYGON (((345558.116 6666406.372, 345528... |
| 2 | Т1Ж2-2 | MULTIPOLYGON (((347805.242 6663237.649, 347790... |
| 3 | Т1Ж2-2 | MULTIPOLYGON (((346292.257 6667294.593, 346266... |
| 4 | Т1Ж2-2 | MULTIPOLYGON (((350166.384 6660332.141, 350186... |


Define the rules for mapping `functional_zone` values to `blocksnet.enums.LandUse`.

P.S.: any zone whose code is not defined in `rules` is filtered out during land-use assignment.
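A minimal plain-Python sketch of the filtering described above. `LandUse` here is a hypothetical stand-in enum rather than the blocksnet class, the zone list is made up, and only three of the rules are reused; the point is that a code absent from `rules` gets no land use at all instead of a default.

```python
from enum import Enum


class LandUse(Enum):
    """Hypothetical stand-in for blocksnet.enums.LandUse (only a few members)."""
    RESIDENTIAL = "residential"
    RECREATION = "recreation"
    TRANSPORT = "transport"


# Excerpt of the rules dictionary
rules = {
    "Т1Ж2-2": LandUse.RESIDENTIAL,
    "ТР0-2": LandUse.RECREATION,
    "ТИ4": LandUse.TRANSPORT,
}

# Hypothetical (zone_id, functional_zone) pairs; "XX-UNMAPPED" has no rule
zones = [("z1", "Т1Ж2-2"), ("z2", "ТР0-2"), ("z3", "XX-UNMAPPED"), ("z4", "ТИ4")]


def map_zones(zones, rules):
    """Attach a LandUse to each zone, dropping zones whose code has no rule."""
    return [(zone_id, rules[code]) for zone_id, code in zones if code in rules]


mapped = map_zones(zones, rules)
# Codes present in the data but absent from rules: worth reviewing before assignment
unmapped = {code for _, code in zones} - set(rules)
print([zone_id for zone_id, _ in mapped])  # ['z1', 'z2', 'z4']
print(unmapped)                            # {'XX-UNMAPPED'}
```

In the notebook, `set(functional_zones.functional_zone) - set(rules)` lists the codes that will be dropped, which helps catch typos in `rules` before assignment.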

In [40]:
from blocksnet.enums import LandUse

rules = {
    "Т3Ж1": LandUse.RESIDENTIAL,
    "ТР0-2": LandUse.RECREATION,
    "Т3Ж2": LandUse.RESIDENTIAL,
    "Т1Ж2-1": LandUse.RESIDENTIAL,
    "Т2ЖД2": LandUse.RESIDENTIAL,
    "ТД1-3": LandUse.BUSINESS,
    "ТД2": LandUse.BUSINESS,
    "ТД3": LandUse.BUSINESS,
    "ТУ": LandUse.TRANSPORT,
    "ТИ4": LandUse.TRANSPORT,
    "ТД1-1": LandUse.RESIDENTIAL,
    "ТД1-2": LandUse.RESIDENTIAL,
    "ТПД1": LandUse.INDUSTRIAL,
    "ТПД2": LandUse.INDUSTRIAL,
    "ТИ1-1": LandUse.TRANSPORT,
    "Т3ЖД3": LandUse.RESIDENTIAL,
    "ТК1": LandUse.SPECIAL,
    "ТР2": LandUse.RECREATION,
    "ТИ2": LandUse.TRANSPORT,
    "ТР5-2": LandUse.RECREATION,
    "Т1Ж2-2": LandUse.RESIDENTIAL,
    "ТР4": LandUse.RECREATION,
    "ТР5-1": LandUse.RECREATION,
    "Т2Ж1": LandUse.RESIDENTIAL,
    "ТИ3": LandUse.TRANSPORT,
    "Т1Ж1": LandUse.RESIDENTIAL,
    "ТИ1-2": LandUse.TRANSPORT,
    "ТР3-2": LandUse.RECREATION,
    "ТР0-1": LandUse.RECREATION,
    "ТП2": LandUse.INDUSTRIAL,
    "ТК3": LandUse.SPECIAL,
    "ТР1": LandUse.RECREATION,
    "ТР3-1": LandUse.RECREATION,
    "ТС1": LandUse.AGRICULTURE,
    "ТК2": LandUse.SPECIAL,
    "ТП1": LandUse.INDUSTRIAL,
    "ТП3": LandUse.INDUSTRIAL,
    "ТП4": LandUse.INDUSTRIAL,
    "ТС2": LandUse.SPECIAL,
}

Assign `LandUse` to the blocks based on the functional zones layer.

In [41]:
from blocksnet.blocks.assignment import assign_land_use

blocks_lu = assign_land_use(blocks, functional_zones, rules)

2025-06-18 14:29:12.600 | INFO     | blocksnet.blocks.assignment.core:assign_land_use:44 - Overlaying geometries
2025-06-18 14:29:16.824 | SUCCESS  | blocksnet.blocks.assignment.core:assign_land_use:55 - Shares calculated


Result:
- `residential : float >= 0 <= 1`
- `business : float >= 0 <= 1`
- `recreation : float >= 0 <= 1`
- `industrial : float >= 0 <= 1`
- `transport : float >= 0 <= 1`
- `special : float >= 0 <= 1`
- `agriculture : float >= 0 <= 1`
- `land_use : blocksnet.enums.LandUse | None` -- the dominant `LandUse`
- `share : float >= 0 <= 1` -- the share of the dominant `LandUse`
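One plausible way to derive these columns from an overlay of blocks with the mapped zones is sketched below in plain Python. It assumes a share is the intersection area divided by the block area (consistent with the shares of block 1 in `blocks_lu.head()` summing to about 0.997, but not confirmed from blocksnet's source) and that `land_use` is `None` only when no zone intersects the block; `land_use_shares` is a hypothetical helper.

```python
from collections import defaultdict


def land_use_shares(block_areas, overlaps):
    """Per-block land-use shares and the dominant land use.

    block_areas: {block_id: block area, m²}
    overlaps: iterable of (block_id, land_use, intersection_area) tuples,
              i.e. the pieces of an overlay of blocks with mapped zones.
    Assumption: share = intersection area / block area, so the shares
    can sum to slightly less than 1 where a block is not fully zoned.
    """
    areas = defaultdict(lambda: defaultdict(float))
    for block_id, land_use, area in overlaps:
        areas[block_id][land_use] += area

    result = {}
    for block_id, block_area in block_areas.items():
        shares = {lu: a / block_area for lu, a in areas[block_id].items()}
        if shares:
            dominant = max(shares, key=shares.get)
            result[block_id] = (shares, dominant, shares[dominant])
        else:
            # No zone intersects the block, so there is no dominant land use
            result[block_id] = (shares, None, 0.0)
    return result


# Areas chosen so that block 1 reproduces row 1 of blocks_lu.head()
result = land_use_shares(
    {1: 1000.0, 99: 500.0},
    [
        (1, "residential", 99.0),
        (1, "recreation", 79.912),
        (1, "transport", 401.072),
        (1, "agriculture", 417.018),
    ],
)
shares, land_use, share = result[1]
print(land_use, round(share, 6))  # agriculture 0.417018
print(result[99][1])              # None: block 99 has no zones
```

Note how agriculture (0.417018) narrowly beats transport (0.401072): the dominant land use is a plain maximum, so `share` is worth inspecting alongside `land_use` for mixed blocks.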

In [42]:
blocks_lu.head()

|   | geometry | residential | business | recreation | industrial | transport | special | agriculture | land_use | share |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((349424.859 6631180.891, 349424.751 6... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | LandUse.TRANSPORT | 1.0 |
| 1 | POLYGON ((352083.617 6633950.146, 352240.448 6... | 0.099 | 0.0 | 0.079912 | 0.0 | 0.401072 | 0.0 | 0.417018 | LandUse.AGRICULTURE | 0.417018 |
| 2 | POLYGON ((346700.642 6618453.176, 346681.107 6... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | LandUse.RESIDENTIAL | 1.0 |
| 3 | POLYGON ((347043.363 6618261.219, 347042.608 6... | 0.729125 | 0.0 | 0.270875 | 0.0 | 0.0 | 0.0 | 0.0 | LandUse.RESIDENTIAL | 0.729125 |
| 4 | POLYGON ((354879.039 6618859.116, 354845.405 6... | 0.454375 | 0.0 | 0.0 | 0.0 | 0.144935 | 0.0 | 0.399984 | LandUse.RESIDENTIAL | 0.454375 |


## Building parameters and population

Read the buildings layer and reproject it to the projected CRS.

Here the file is expected to have the following columns, which are then renamed to match the library's specification:
- `geometry : BaseGeometry`
- `storeys_count : float -> number_of_floors` -- number of floors
- `is_living : bool` -- whether the building is residential
- `building_area : float -> footprint_area` -- building footprint area (taken from `geometry` if missing)
- `living_area : float` -- total living floor area of the building (imputed if missing)
- `population_balanced : int -> population` -- number of residents (imputed from `living_area` if missing)

Additional columns that would be useful to have:
- `build_floor_area` -- total floor area of the building
- `non_living_area` -- non-residential part of the total floor area
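The relations between these attributes can be illustrated with a small sketch. Only the last rule, non-living area = total floor area minus living area, is visible in this notebook's output (block 2: 4197.188633 + 1909.571011 = 6106.759644). The footprint-from-geometry step (shoelace formula, valid in a projected CRS such as the UTM one estimated earlier), the footprint × floors rule and `LIVING_SHARE = 0.7` are assumptions for illustration, not the logic of blocksnet's `impute_buildings`.

```python
LIVING_SHARE = 0.7  # hypothetical share of floor area in living buildings that is living space


def shoelace_area(ring):
    """Planar area of a simple polygon ring [(x, y), ...]; m² in a projected CRS."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def complete_building(b):
    """Fill missing building attributes using illustrative rules (not blocksnet's)."""
    b = dict(b)
    if b.get("footprint_area") is None:
        # Footprint taken from the geometry when not given
        b["footprint_area"] = shoelace_area(b["ring"])
    # Assumption: total floor area = footprint × number of floors
    b["build_floor_area"] = b["footprint_area"] * b["number_of_floors"]
    if b.get("living_area") is None:
        b["living_area"] = b["build_floor_area"] * LIVING_SHARE if b["is_living"] else 0.0
    # Non-living area is the part of the floor area that is not living area
    b["non_living_area"] = b["build_floor_area"] - b["living_area"]
    return b


house = complete_building(
    {"ring": [(0, 0), (10, 0), (10, 20), (0, 20)], "number_of_floors": 3, "is_living": True}
)
print(house["footprint_area"], house["build_floor_area"])  # 200.0 600.0
```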

In [43]:
buildings_columns = {
    'geometry':'geometry',
    'storeys_count':'number_of_floors',
    'is_living':'is_living',
    'building_area':'footprint_area',
    'living_area':'living_area',
    'population_balanced':'population',
}

buildings = gpd.read_file('./input/buildings.geojson').to_crs(crs)
# Keep only the expected columns and rename them to the blocksnet names
buildings = buildings[list(buildings_columns)].rename(columns=buildings_columns)

Impute missing values with `impute_buildings`:

In [44]:
from blocksnet.preprocessing.imputing import impute_buildings

# The columns were already selected and renamed in the previous cell
buildings = impute_buildings(buildings)



Aggregate the values by block:

In [51]:
from blocksnet.blocks.aggregation import aggregate_objects

blocks_buildings, _ = aggregate_objects(blocks, buildings)

2025-06-18 14:31:06.903 | INFO     | blocksnet.blocks.aggregation.core:_preprocess_input:12 - Preprocessing input
2025-06-18 14:31:06.970 | INFO     | blocksnet.blocks.aggregation.core:aggregate_objects:41 - Aggregating objects


Drop `is_living` and `number_of_floors`, and rename `count` to `count_buildings`:

In [52]:
blocks_buildings = blocks_buildings.drop(columns=['is_living', 'number_of_floors']).rename(columns={'count':'count_buildings'})

Result:
- `footprint_area` -- building footprint area
- `build_floor_area` -- total floor area of the buildings
- `living_area` -- total living floor area of the buildings
- `non_living_area` -- total non-residential floor area of the buildings
- `population` -- population of the block
- `count_buildings` -- number of buildings in the block

In [54]:
blocks_buildings.head()

|   | geometry | footprint_area | build_floor_area | living_area | non_living_area | population | count_buildings |
|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((349424.859 6631180.891, 349424.751 6... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | POLYGON ((352083.617 6633950.146, 352240.448 6... | 69.018103 | 69.018103 | 0.0 | 69.018103 | 0.0 | 2.0 |
| 2 | POLYGON ((346700.642 6618453.176, 346681.107 6... | 5853.863274 | 6106.759644 | 4197.188633 | 1909.571011 | 109.0 | 54.0 |
| 3 | POLYGON ((347043.363 6618261.219, 347042.608 6... | 4214.828165 | 4375.483259 | 3033.795607 | 1341.687653 | 77.0 | 36.0 |
| 4 | POLYGON ((354879.039 6618859.116, 354845.405 6... | 13392.846325 | 31242.324144 | 20860.965881 | 10381.358263 | 431.0 | 123.0 |
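`aggregate_objects` produces per-block sums plus a `count` column. Its internals are not shown in this notebook, so the sketch below only illustrates the general idea in plain Python: each object is assigned to the block containing one representative point of it (tested with the even-odd ray-casting rule), then numeric fields are summed and objects counted. The representative-point rule and the helpers `point_in_ring` and `aggregate` are assumptions for illustration; blocksnet may match objects to blocks differently, for example with a spatial join on the full geometry.

```python
def point_in_ring(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the polygon ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate(blocks, objects, fields):
    """Sum `fields` and count objects per block.

    blocks: {block_id: ring}; objects: dicts with 'point' (a representative
    point of the object) plus numeric fields. Each object goes to the first
    block containing its point; objects outside every block are skipped.
    """
    totals = {bid: {**{f: 0.0 for f in fields}, "count": 0} for bid in blocks}
    for obj in objects:
        x, y = obj["point"]
        for bid, ring in blocks.items():
            if point_in_ring(x, y, ring):
                for f in fields:
                    totals[bid][f] += obj[f]
                totals[bid]["count"] += 1
                break
    return totals


blocks = {
    0: [(0, 0), (10, 0), (10, 10), (0, 10)],
    1: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
objects = [
    {"point": (2, 3), "population": 10},
    {"point": (5, 5), "population": 5},
    {"point": (15, 5), "population": 7},
    {"point": (50, 50), "population": 100},  # outside both blocks
]
totals = aggregate(blocks, objects, ["population"])
print(totals)
```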


## Service parameters

Check that every service type in `./input/services` is present in `service_types_config`.

P.S.: each file is expected to be named `{service_type}.geojson`.

In [60]:
import os
from blocksnet.config import service_types_config

for file_name in os.listdir('./input/services'):
    service_type = file_name.split('.')[0]
    if service_type not in service_types_config:
        print(f'{service_type} is in input/services but missing from service_types_config')

Now the reverse check: make sure a file exists for each service type in the config.

In [63]:
for service_type in service_types_config:
    file_name = f'./input/services/{service_type}.geojson'
    if not os.path.exists(file_name):
        print(f'{service_type} is in service_types_config but missing from input/services')

stadium is in service_types_config but missing from input/services
embankment is in service_types_config but missing from input/services
oil_refinery is in service_types_config but missing from input/services
plant_of_building_materials is in service_types_config but missing from input/services
farmland is in service_types_config but missing from input/services
livestock is in service_types_config but missing from input/services
nursing_home is in service_types_config but missing from input/services
library is in service_types_config but missing from input/services
gallery is in service_types_config but missing from input/services
monastery is in service_types_config but missing from input/services
diplomatic is in service_types_config but missing from input/services
court_house is in service_types_config but missing from input/services
veterinary is in service_types_config but missing from input/services
notary is in service_types_config but missing from in…
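The two loops above are set differences taken in opposite directions. A compact variant could look like the sketch below. `compare_services` is a hypothetical helper; it only looks at `*.geojson` files (unlike `os.listdir`, which lists every file), and in the notebook `configured_types` would be `service_types_config`, assuming that iterating over it yields service type names as the loop above does. The demo uses a temporary directory and a made-up config.

```python
import tempfile
from pathlib import Path


def compare_services(services_dir, configured_types):
    """Return (types with a file but no config entry, configured types without a file)."""
    file_types = {p.stem for p in Path(services_dir).glob("*.geojson")}
    configured = set(configured_types)
    return sorted(file_types - configured), sorted(configured - file_types)


# Self-contained demo with a temporary directory and a hypothetical config
with tempfile.TemporaryDirectory() as tmp:
    for name in ("school", "hospital", "bakery"):
        (Path(tmp) / f"{name}.geojson").write_text("{}")
    extra, missing = compare_services(tmp, ["school", "hospital", "stadium"])

print(extra)    # ['bakery']
print(missing)  # ['stadium']
```

Returning sorted lists instead of printing inside the loops makes the result easy to assert on or to log in one line.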

Read the files that exist. The expected format is:
- `geometry : BaseGeometry`
- `capacity : float`

Objects with missing geometry (`geometry == None`) are filtered out as well.

In [80]:
services_gdfs = {}

for service_type in service_types_config:
    file_name = f'./input/services/{service_type}.geojson'
    if os.path.exists(file_name):
        gdf = gpd.read_file(file_name).to_crs(crs)
        gdf = gdf[~gdf.geometry.isna()].copy()
        services_gdfs[service_type] = gdf

print(f'{len(services_gdfs)} of {len(service_types_config.service_types)} service types will be represented')

60 of 81 service types will be represented


Impute the missing parameters:

In [81]:
from blocksnet.preprocessing.imputing import impute_services

services_gdfs = {st: impute_services(gdf, st) for st, gdf in services_gdfs.items()}

Aggregate the parameters and rename the columns:
- `count` -> `count_{service_type}`
- `capacity` -> `capacity_{service_type}`
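Prefixing keeps the columns unique once the per-type tables are joined into one. A plain-Python sketch, with dicts standing in for the GeoDataFrames and `prefixed_columns` a hypothetical helper; the block 4 values (one school with capacity 942, one plant nursery with capacity 30) are taken from the outputs of this notebook.

```python
def prefixed_columns(service_type, columns=("count", "capacity")):
    """Map generic aggregate column names to per-service-type names."""
    return {col: f"{col}_{service_type}" for col in columns}


# Per-type aggregates for block 4, values as printed in this notebook's outputs
per_type = {
    "school": {4: {"count": 1.0, "capacity": 942.0}},
    "plant_nursery": {4: {"count": 1.0, "capacity": 30.0}},
}

wide = {}
for service_type, rows in per_type.items():
    mapping = prefixed_columns(service_type)
    for block_id, row in rows.items():
        # Renamed columns never collide, so merging rows by block id is safe
        wide.setdefault(block_id, {}).update({mapping[k]: v for k, v in row.items()})

print(sorted(wide[4]))
# ['capacity_plant_nursery', 'capacity_school', 'count_plant_nursery', 'count_school']
```

Without the prefix, every per-type table would contribute its own `count` and `capacity`, and joining them would fail on overlapping column names.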

In [86]:
blocks_services = {}

for service_type, services_gdf in services_gdfs.items():
    gdf, _ = aggregate_objects(blocks, services_gdf)
    gdf = gdf.rename(columns={
        'capacity':f'capacity_{service_type}',
        'count':f'count_{service_type}',
    })
    blocks_services[service_type] = gdf

2025-06-18 14:49:44.873 | INFO     | blocksnet.blocks.aggregation.core:_preprocess_input:12 - Preprocessing input
2025-06-18 14:49:44.875 | INFO     | blocksnet.blocks.aggregation.core:aggregate_objects:41 - Aggregating objects
2025-06-18 14:49:44.899 | INFO     | blocksnet.blocks.aggregation.core:_preprocess_input:12 - Preprocessing input
2025-06-18 14:49:44.901 | INFO     | blocksnet.blocks.aggregation.core:aggregate_objects:41 - Aggregating objects
2025-06-18 14:49:44.926 | INFO     | blocksnet.blocks.aggregation.core:_preprocess_input:12 - Preprocessing input
2025-06-18 14:49:44.927 | INFO     | blocksnet.blocks.aggregation.core:aggregate_objects:41 - Aggregating objects
2025-06-18 14:49:44.948 | INFO     | blocksnet.blocks.aggregation.core:_preprocess_input:12 - Preprocessing input
2025-06-18 14:49:44.949 | INFO     | blocksnet.blocks.aggregation.core:aggregate_objects:41 - Aggregating objects

Result:
- `count_{service_type}` -- number of services of this type in the block
- `capacity_{service_type}` -- total capacity of services of this type in the block

In [88]:
blocks_services['school'].head()

|   | geometry | capacity_school | count_school |
|---|---|---|---|
| 0 | POLYGON ((349424.859 6631180.891, 349424.751 6... | 0.0 | 0.0 |
| 1 | POLYGON ((352083.617 6633950.146, 352240.448 6... | 0.0 | 0.0 |
| 2 | POLYGON ((346700.642 6618453.176, 346681.107 6... | 0.0 | 0.0 |
| 3 | POLYGON ((347043.363 6618261.219, 347042.608 6... | 0.0 | 0.0 |
| 4 | POLYGON ((354879.039 6618859.116, 354845.405 6... | 942.0 | 1.0 |


## Combining the results into the final dataframe

In [94]:
blocks = blocks.join(blocks_lu.drop(columns=['geometry']))
blocks = blocks.join(blocks_buildings.drop(columns=['geometry']))
for gdf in blocks_services.values():
    blocks = blocks.join(gdf.drop(columns=['geometry']))

Geometry:
- `geometry : Polygon`

Land-use parameters:
- `residential : float >= 0 <= 1`
- `business : float >= 0 <= 1`
- `recreation : float >= 0 <= 1`
- `industrial : float >= 0 <= 1`
- `transport : float >= 0 <= 1`
- `special : float >= 0 <= 1`
- `agriculture : float >= 0 <= 1`
- `land_use : blocksnet.enums.LandUse | None` -- the dominant `LandUse`
- `share : float >= 0 <= 1` -- the share of the dominant `LandUse`

Building parameters and population:
- `footprint_area` -- building footprint area
- `build_floor_area` -- total floor area of the buildings
- `living_area` -- total living floor area of the buildings
- `non_living_area` -- total non-residential floor area of the buildings
- `population` -- population of the block
- `count_buildings` -- number of buildings in the block

Service parameters:
- `count_{service_type}` -- number of services of this type in the block
- `capacity_{service_type}` -- total capacity of services of this type in the block

In [95]:
blocks.head()

|   | geometry | residential | business | recreation | industrial | transport | special | agriculture | land_use | share | ... | capacity_prison | count_prison | capacity_landfill | count_landfill | capacity_plant_nursery | count_plant_nursery | capacity_greenhouse_complex | count_greenhouse_complex | capacity_warehouse | count_warehouse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((349424.859 6631180.891, 349424.751 6... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | LandUse.TRANSPORT | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | POLYGON ((352083.617 6633950.146, 352240.448 6... | 0.099 | 0.0 | 0.079912 | 0.0 | 0.401072 | 0.0 | 0.417018 | LandUse.AGRICULTURE | 0.417018 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | POLYGON ((346700.642 6618453.176, 346681.107 6... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | LandUse.RESIDENTIAL | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | POLYGON ((347043.363 6618261.219, 347042.608 6... | 0.729125 | 0.0 | 0.270875 | 0.0 | 0.0 | 0.0 | 0.0 | LandUse.RESIDENTIAL | 0.729125 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | POLYGON ((354879.039 6618859.116, 354845.405 6... | 0.454375 | 0.0 | 0.0 | 0.0 | 0.144935 | 0.0 | 0.399984 | LandUse.RESIDENTIAL | 0.454375 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
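A few consistency checks can be run on the assembled table before using it. The sketch below encodes relations that hold in the rows printed in this notebook: land-use shares within [0, 1] summing to at most about 1, `share` equal to the largest land-use share, and `living_area + non_living_area == build_floor_area` up to rounding. These are observations from the outputs, not guarantees documented by blocksnet. It uses plain Python on the first five blocks, with values copied from the `blocks_lu.head()` and `blocks_buildings.head()` outputs; `check_block` is a hypothetical helper.

```python
import math

SHARE_COLUMNS = ("residential", "business", "recreation", "industrial",
                 "transport", "special", "agriculture")

# First five blocks, copied from blocks_lu.head() and blocks_buildings.head()
rows = {
    0: dict(zip(SHARE_COLUMNS, (0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0)),
            share=1.0, build_floor_area=0.0, living_area=0.0, non_living_area=0.0),
    1: dict(zip(SHARE_COLUMNS, (0.099, 0.0, 0.079912, 0.0, 0.401072, 0.0, 0.417018)),
            share=0.417018, build_floor_area=69.018103, living_area=0.0,
            non_living_area=69.018103),
    2: dict(zip(SHARE_COLUMNS, (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)),
            share=1.0, build_floor_area=6106.759644, living_area=4197.188633,
            non_living_area=1909.571011),
    3: dict(zip(SHARE_COLUMNS, (0.729125, 0.0, 0.270875, 0.0, 0.0, 0.0, 0.0)),
            share=0.729125, build_floor_area=4375.483259, living_area=3033.795607,
            non_living_area=1341.687653),
    4: dict(zip(SHARE_COLUMNS, (0.454375, 0.0, 0.0, 0.0, 0.144935, 0.0, 0.399984)),
            share=0.454375, build_floor_area=31242.324144, living_area=20860.965881,
            non_living_area=10381.358263),
}


def check_block(row, tol=1e-5):
    """Return a list of violated consistency rules for one block (empty if none)."""
    problems = []
    shares = [row[c] for c in SHARE_COLUMNS]
    if any(not 0.0 <= s <= 1.0 for s in shares):
        problems.append("land-use share outside [0, 1]")
    if sum(shares) > 1.0 + tol:
        problems.append("land-use shares sum above 1")
    if not math.isclose(max(shares), row["share"], abs_tol=tol):
        problems.append("share is not the largest land-use share")
    if not math.isclose(row["living_area"] + row["non_living_area"],
                        row["build_floor_area"], abs_tol=tol):
        problems.append("living + non-living area != build floor area")
    return problems


report = {block_id: check_block(row) for block_id, row in rows.items()}
print(report)  # {0: [], 1: [], 2: [], 3: [], 4: []}
```

On the full GeoDataFrame the same rules can be written column-wise; for example, `(blocks.living_area + blocks.non_living_area - blocks.build_floor_area).abs().max()` should be close to zero.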
