This script takes land use data for the municipality of Belo Horizonte and regroups the original categories into simpler, more aggregate ones. The level of disaggregation in the original data is not only unnecessary for the purposes of the Thesis, but would also add complications and noise, not to mention higher computational costs.

Then, land use geometries are parsed into a grid of Uber's H3 hexagons whose resolution is chosen to approximate the area of the median parcel in the municipality of Belo Horizonte. These hexagons also contain population counts and income as per the 2010 Census.

The script was first written for the 2017 land use map, which was readily available online. Afterwards, the 2011, 2018, and 2020 maps were parsed into hexagons using the same functions, with some minor adjustments.

# Land Use Maps for 2017

In [None]:
import os
import pathlib
import re

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

from geobr import read_municipality
from h3census.assemble import get_hexagons_with_census_data
from matplotlib.patches import Patch

%matplotlib inline
%config InlineBackend.figure_format='retina'

## Preliminaries

### Parent Folders

These should, of course, be adjusted to reflect the appropriate locations on your disk, or wherever the data is stored.

In [None]:
# os.environ[...] raises a KeyError naming the missing variable,
# instead of handing None over to pathlib.Path
out_folder = pathlib.Path(os.environ['OUT_FOLDER'])
out_folder = out_folder / 'A'

db_folder = pathlib.Path(os.environ['DB_FOLDER'])

### General Purpose

In [None]:
def _get_zipped_path_for_gpd(path):
    """Takes the full path of a zipped shapefile and turns it into
    a string that gpd.read_file() understands.
    """
    prefix = 'zip://'
    
    try:
        path = prefix + path.as_posix()
    except AttributeError:
        # path was given as a raw string rather than a pathlib object
        path = pathlib.PureWindowsPath(path)
        path = prefix + path.as_posix()
        
    return path

## Getting Land Use Data

data source: http://bhmap.pbh.gov.br

In [None]:
def get_geodata(path, is_zipped=True):
    """Takes either a full raw string path or a pathlib pure Windows
    path and uses it to read a shapefile into a GeoDataFrame. It also
    makes the necessary adjustments to read shapefiles compressed
    into a .zip file.
    """
    if is_zipped:
        path = _get_zipped_path_for_gpd(path)
            
    return gpd.read_file(path)
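
To illustrate what `_get_zipped_path_for_gpd` produces, here is a minimal standalone sketch that uses only `pathlib`. The helper name `to_zip_uri` and the file locations are hypothetical, made up for the example. The sketch only mirrors the prefixing step and does not read any file.

```python
import pathlib

ZIP_PREFIX = 'zip://'


def to_zip_uri(path):
    """Prefix a path (str or pathlib object) with 'zip://'.

    Hypothetical helper: plain strings are parsed as Windows paths,
    as the notebook's own helper does.
    """
    if isinstance(path, str):
        path = pathlib.PureWindowsPath(path)
    # as_posix() converts any backslashes into forward slashes
    return ZIP_PREFIX + path.as_posix()


# Hypothetical locations, for illustration only
from_string = to_zip_uri(r'C:\data\beaga\uso_ocup_2017.zip')
from_posix = to_zip_uri(pathlib.PurePosixPath('/data/beaga/uso_ocup_2017.zip'))

print(from_string)
print(from_posix)
```

Both inputs end up as a single forward-slash string, which is why the helper goes through `as_posix()` before adding the prefix.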

## Initial Exploration and Preliminary Wrangling

In [None]:
path = db_folder / 'beaga/tipologia_uso_ocupacao/uso_ocup_2017.zip'

land_uses = get_geodata(path)

# some later operations don't handle a mix of Polygons and MultiPolygons well
land_uses = land_uses.explode()
land_uses.reset_index(drop=True, inplace=True)

In [None]:
land_uses.head(2)

In [None]:
print(land_uses.crs)

In [None]:
new_names = {'TIPOLOGIA_': 'type', 'TIPOLOGIA0': 'category'}
land_uses.rename(columns=new_names, inplace=True)

to_keep = ['type', 'category', 'geometry']
land_uses = land_uses.reindex(columns=to_keep)

del new_names, to_keep

In [None]:
number_of_parcels = len(land_uses)
land_uses['type'].value_counts() / number_of_parcels * 100

In [None]:
land_uses.category.value_counts() / number_of_parcels * 100

**The regrouping of land uses aims at reducing the number of categories by bringing together like things that are apart for some reason. The rationale below refers to the 2017 land use map and it might (and probably will) be at least slightly different for the other years.**

- Most instances are straightforward, such as 'casa unifamiliar', for example, which clearly is residential; or such as 'edificio de uso comercial e/ou servicos' which is obviously retail/services.

- In some cases, discordant classifications were residual, so the residuals were merged into the majority class. For example, there are 6 mixed-use 'edificio' parcels amidst a sea of 16009 residential ones.

- Mixed uses presented a difficulty as well, because the classical model deals with discrete classifications only. There are models that allow for some fuzzy classification, which means that a cell can go from fully residential to completely commercial, while being able to assume any percentage balance in between those extremes. Unfortunately, to the best of my knowledge, such models still seem incipient and display some quirks that I don't think are for me to solve (at least not now). Hence, I chose to create a 'mixed' category and leave it at that. After some future experimentation, I might choose to place it either under retail/services or under residential.

- All parcels containing any kind of (a) club or public goods, and of (b) public facilities, amenities or infrastructures have been considered static and are all labeled as ***public***

- 'edificação sem tipologia especificada' was assumed to be residential, because:
    - For any given lot, the highest probability is that it is residential;
    - In the Brazilian context, there is plenty of illegal land occupation due to favelas and whatnot;
    - I assumed it is harder for commercial activity to function without registry in the tax records.
    
**A minor number of instances require a slightly more detailed analysis, which here means that a proper reclassification required a simultaneous analysis of 'type' and 'category' attributes. Specifically:**

- There are two land use categories that are unclear in and of themselves ('ocupação diversificada' and 'vaga residencial ou comercial'). These have been classified with the aid of the reclassify_with_scrutiny() function.

- 'galpao' at first seemed obviously related to the industrial sector, but a closer look at its spatial distribution, along with a couple of checks in Google Maps, suggested otherwise. For the most part, it did not seem related to heavy industry, especially the parcels of this category located within the city's core and along its major roads. Hence, for the time being, it was placed under either retail/services or mixed, until some later analysis reveals otherwise, if such a thing is revealed at all. The choice between mixed and retail/services is made within reclassify_with_scrutiny().

**Finally, the following category required a method that was a bit more thorough:**

- 'sem informacao' refers to all parcels without registry at the municipal treasury. To gain better insight, I performed a quick and coarse visual inspection with the aid of (a) OpenStreetMap, (b) Google Maps, and (c) other auxiliary data. That seemed to support the reasoning that this category could be divided thus:
    - First, parcels are evaluated against the 2018 urban footprint: those that are not completely contained within the footprint are considered vacant land. The year 2018 was chosen because it is the closest year for which a footprint could be found at the time of writing.
    - Second, there are parcels that mostly coincide with the footprints of the subnormal agglomerates (see note below) in the municipality, as per data retrieved both from IBGE and from Belo Horizonte City Hall. Those are to be labeled subnormal, but will probably be joined together with residential in a later step of the analysis.
    - Third, the two procedures above still leave the data with a sizeable amount of unknowns, and if even BH city hall could not gather enough information on what happens at those places, it seems reasonable to assume, for the time being, that they simply are functional voids.

**NOTE:**

A subnormal agglomerate is a form of irregular occupation of land (either public or private) owned by a third party, for housing purposes in urban areas, usually characterized by an irregular urban pattern, with scarce essential public services and located in areas not proper or allowed for housing use. In Brazil, those irregular settlements are known by the names of favelas, invaded areas, slums in deep valleys, slums in low-lands, communities, villages, slums in backwaters, irregular lots, shacks and stilt houses.
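
The three-step treatment of 'sem informacao' described above can be condensed into a small decision function. This is only a sketch of my reading of the procedure: the two boolean arguments are hypothetical stand-ins for the spatial tests, and no geometry is involved. It assumes the subnormal step is applied after the vacant step, as in the functions defined further down, so a parcel that is both outside the footprint and inside a subnormal polygon ends up labeled subnormal.

```python
def classify_unknown(within_footprint, within_subnormal):
    """Return (type, category) for a parcel lacking treasury records.

    Both arguments are hypothetical booleans standing in for the
    spatial joins against the footprint and the subnormal polygons.
    """
    label = None

    # Step 1: not fully inside the urban footprint -> vacant land
    if not within_footprint:
        label = ('passive', 'vacant')

    # Step 2: inside a subnormal agglomerate -> subnormal
    # (applied second, so it takes precedence over step 1)
    if within_subnormal:
        label = ('active', 'subnormal')

    # Step 3: whatever is left is treated as a functional void
    if label is None:
        label = ('passive', 'vacant')

    return label


for inside_footprint in (True, False):
    for inside_subnormal in (True, False):
        print(inside_footprint,
              inside_subnormal,
              classify_unknown(inside_footprint, inside_subnormal))
```

Under these assumptions, every parcel without information ends up either as subnormal or as vacant, regardless of the footprint test.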

In [None]:
regrouping_2017 = {
    'active': {
        'residential': ['casa unifamiliar',
                        'edificio', 
                        'conjunto multifamiliar vertical',
                        'conjunto multifamiliar horizontal',
                        'edificação sem tipologia especificada',],
        
        'retail/services': ['loja ou conjunto de lojas',
                            'edificio de uso comercial e/ou servicos',
                            'loja em edificio/galeria',
                            'shopping center',
                            'galeria/mini shopping de bairro',
                            'apart hotel',],
        # TO DO: settle approach towards mixed uses
        'mixed': ['casa/sobrado',
                  'edificio residencial e comercio e/ou servicos',]
                },

     'passive': { 
         'vacant': ['lote vago',]
                 },

     'static': {
         'industry': ['industria',],
         
         'public': ['instituicao de ensino',
                    'equipamento de saude',
                    'instituicao religiosa',
                    'cemiterio',
                    'parque',
                    'clubes esportivos e sociais',
                    'estadio/ginasio',
                    'aterro sanitario',
                    'estacao de transporte coletivo',
                    'aeroporto',],
               }
}
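
The nested dictionary above reads as type → umbrella category → original categories. A minimal sketch of how such a structure can be flattened into a direct lookup follows. The function name `invert_regrouping` and the toy dictionary are mine and only borrow a few entries from the 2017 regrouping; the actual reclassification is done by the functions defined in the next cells.

```python
def invert_regrouping(regrouping):
    """Map each original category to its (type, umbrella category) pair."""
    lookup = {}
    for land_type, umbrellas in regrouping.items():
        for umbrella, originals in umbrellas.items():
            for original in originals:
                lookup[original] = (land_type, umbrella)
    return lookup


# Toy subset of the regrouping, for illustration only
toy_regrouping = {
    'active': {
        'residential': ['casa unifamiliar', 'edificio'],
        'retail/services': ['shopping center'],
    },
    'passive': {
        'vacant': ['lote vago'],
    },
}

lookup = invert_regrouping(toy_regrouping)
for original, (land_type, umbrella) in lookup.items():
    print(f'{original!r} -> type={land_type!r}, category={umbrella!r}')
```

A flat lookup like this also makes it easy to spot an original category that was accidentally listed under two umbrellas, since the later entry would silently overwrite the earlier one.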

In [None]:
def reclassify_with_scrutiny(gdf):
    """Deals with some quirks that seem too specific for generalization.
    Handles instances in which TIPOLOGIA0 may be more accurately placed
    when analyzed together with TIPOLOGIA_
    
    NOTE: function's name is an exaggeration
    """
    # 'ocupação diversificada'
    # ----------------------
    mask_a = gdf.category=='ocupação diversificada'
    mask_b = gdf['type'].isin(['RESIDENCIAL', 'RESIDENCIAL+LOTE VAGO'])
    mask_c = gdf['type'].isin(['MISTO', 'MISTO + LOTE VAGO'])
    
    full_mask = mask_a & mask_b
    gdf.loc[full_mask,'category'] = 'residential'
    
    full_mask = mask_a & mask_c
    gdf.loc[full_mask,'category'] = 'mixed'
    
    full_mask = mask_a & (~mask_b) & (~mask_c)
    gdf.loc[full_mask,'category'] = 'retail/services'
    
    # 'vaga residencial ou comercial'
    # -----------------------------
    mask_d = gdf.category=='vaga residencial ou comercial'
    mask_e = gdf['type']=='NAO RESIDENCIAL'
    mask_f = gdf['type']=='RESIDENCIAL'
    
    full_mask = mask_d & mask_e
    gdf.loc[full_mask,'category'] = 'retail/services'
    
    full_mask = mask_d & mask_f
    gdf.loc[full_mask,'category'] = 'residential'
    
    full_mask = mask_d & (~mask_e) & (~mask_f)
    gdf.loc[full_mask,'category'] = 'mixed'
    
    # 'galpao'
    # ------
    # TO DO: assert if this decision is appropriate
    mask_g = gdf.category=='galpao'
    
    full_mask = mask_g & mask_c
    gdf.loc[full_mask,'category'] = 'mixed'
    
    full_mask = mask_g & (~mask_c)
    gdf.loc[full_mask,'category'] = 'retail/services'
    
    # 'Final adjustment'
    # ----------------
    cat_list = ['residential', 'retail/services', 'mixed', 'galpao']
    gdf.loc[gdf.category.isin(cat_list), 'type'] = 'active'
    

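def reclassify_land_uses(gdf, dictionary):
    """Replaces, in place, each original category with its umbrella
    category and sets 'type' to the corresponding top-level key
    of the regrouping dictionary.
    """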
    for key in dictionary.keys():
        for umbrella_category,old_categories in dictionary[key].items():
            mask = gdf.category.isin(old_categories)
            gdf.loc[mask, 'type'] = key
            
            replacement_dict = {
                cat: umbrella_category
                for cat
                in old_categories
                                }
            view = gdf.loc[mask,'category']
            gdf.update(view.replace(replacement_dict))

In [None]:
reclassify_with_scrutiny(land_uses)


reclassify_land_uses(land_uses,
                     regrouping_2017,)

In [None]:
path_to_footprint = (db_folder
                     / 'beaga'
                     / 'footprints'
                     / '2018_footprint.zip')
footprint_2018 = get_geodata(path_to_footprint)

In [None]:
# BH city hall uses the term favelas for the subnormal agglomerates
path_to_favelas = (db_folder
                   / 'beaga'
                   / 'vila_favela.zip')
favelas = get_geodata(path_to_favelas)

footprint and favela data:
   - BH Maps: http://bhmap.pbh.gov.br

In [None]:
path_to_subnormal = (db_folder
                     / 'census'
                     / '2010_subnormal_agglomerates'
                     / 'SetoresXAreaDivAGSN_shp.zip') 
subnormal_agg = get_geodata(path_to_subnormal)

# IBGE data provides all census tracts of Brazil and
# specifies which are subnormal, so that we need to
# slice the dataset.
#
# First, we select only the census tracts of BH city. To
# that end, we take advantage of the fact that tract IDs
# start with the ID of the city they belong to.
#
# Second, we use the appropriate column to select the
# subnormal places.
subnormal_agg = subnormal_agg.loc[subnormal_agg
                                  .CD_GEOCODI
                                  .str
                                  .match('3106200')] # ibgeID for BH

subnormal_agg = subnormal_agg.loc[subnormal_agg
                                  .Subnormal == 'Sim']

# IBGE data comes in a geographic coordinate system
# that needs to be projected to the same CRS as
# Belo Horizonte's data
subnormal_agg.to_crs(epsg=31983, inplace=True)
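
A short note on the municipality filter used in the cell above: pandas' `str.match` relies on `re.match`, so the pattern is anchored at the beginning of each string and works as a prefix test. The sketch below shows the same behaviour with the standard library alone. The tract IDs are hypothetical, made up for illustration, and are not taken from the IBGE files.

```python
import re

BH_CODE = '3106200'  # municipality code used in the notebook

# Hypothetical 15-digit tract IDs, made up for illustration
tract_ids = [
    '310620005000001',  # starts with the BH code
    '310620060000123',  # starts with the BH code
    '355030831062001',  # contains '3106200', but not at the start
]

# re.match only succeeds when the pattern matches at position 0
bh_tracts = [tract for tract in tract_ids if re.match(BH_CODE, tract)]

print(bh_tracts)
```

Had the filter used a "contains" test instead, the third ID would have been selected by mistake.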

In [None]:
subnormal = gpd.overlay(subnormal_agg,
                        favelas,
                        how='union')

data on subnormal agglomerates:

- IBGE: ftp://geoftp.ibge.gov.br/recortes_para_fins_estatisticos/malha_de_aglomerados_subnormais/censo_2010/areas_de_divulgacao_da_amostra/

In [None]:
def _find_vacant(parcels, footprint):
    inner_parcels = gpd.sjoin(parcels,
                              footprint,
                              how='inner',
                              op='within',)
    inner_parcel_ids = inner_parcels.index
    
    vacant_land = parcels.loc[~parcels
                              .index
                              .isin(inner_parcel_ids)]
    
    
    return vacant_land.index


def _reclassify_vacant(parcels, vacant_land):    
    parcels.loc[vacant_land, 'type'] = 'passive'
    
    mask = parcels.index.isin(vacant_land)
    parcels.loc[mask,'category'] = 'vacant'


def _find_subnormal(parcels, subnormal):
    subnormal_land = gpd.sjoin(parcels,
                               subnormal,
                               op='within',)
    
    
    return subnormal_land.index


def _reclassify_subnormal(parcels, subnormal_land):    
    parcels.loc[subnormal_land, 'type'] = 'active'
    
    mask = parcels.index.isin(subnormal_land)
    parcels.loc[mask, 'category'] = 'subnormal'
    
    
def _reclassify_leftovers(parcels, mask):
    parcels.loc[mask, 'type'] = 'passive'
    parcels.loc[mask, 'category'] = 'vacant'


def reclassify_unknown_parcels(parcels, footprint,
                               subnormal, untaxed_cat):
    submask_a = (parcels['category'] == untaxed_cat)
    
    vacant_land = _find_vacant(parcels.loc[submask_a],
                               footprint)
    _reclassify_vacant(parcels, vacant_land)
    
    subnormal_land = _find_subnormal(parcels.loc[submask_a],
                                     subnormal)
    _reclassify_subnormal(parcels, subnormal_land)
    
    submask_b = (~parcels.index.isin(vacant_land))
    submask_c = (~parcels.index.isin(subnormal_land))
    mask = (submask_a & submask_b & submask_c)
    _reclassify_leftovers(parcels, mask)
    

In [None]:
reclassify_unknown_parcels(land_uses,
                           footprint_2018,
                           subnormal,
                           'sem informacao',)

In [None]:
number_of_parcels = len(land_uses)
land_uses['type'].value_counts() / number_of_parcels * 100

In [None]:
land_uses.category.value_counts() / number_of_parcels * 100

In [None]:
def get_hexagon_edge(land_uses):
    """Computes the edge of a regular hexagon with the same area as
    that of the median parcel. Requires projected CRS.
    """
    land_parcel_areas = land_uses.geometry.map(lambda x: x.area)
    median_area = np.median(land_parcel_areas)
    
    
    return np.sqrt(median_area*2/(3*np.sqrt(3)))


mask = land_uses['type'] != 'static'
hex_edge = get_hexagon_edge(land_uses.loc[mask])
print(f'Hex edge should be of approximately {hex_edge:.2f} meters')

That leaves either H3 resolution 11 (edge length of approx. 25 m) or resolution 12 (~9.5 m); _refer to https://h3geo.org/docs/core-library/restable/ for H3 resolutions_

_**Resolution 11**_ seems close enough and it's most certainly better for processing purposes. This resolution will be the basis for all subsequent analysis in the other scripts.
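
The edge computed by `get_hexagon_edge` follows from the area of a regular hexagon, A = (3√3/2)·a², solved for the edge a. As a rough cross-check, the sketch below inverts that relation to show the area implied by each of the two candidate edge lengths quoted above. It assumes regular, planar hexagons, which H3 cells only approximate, and the function names are mine.

```python
import math


def hexagon_area_from_edge(edge):
    """Area of a regular hexagon with the given edge length."""
    return 3 * math.sqrt(3) / 2 * edge ** 2


def hexagon_edge_from_area(area):
    """Edge of a regular hexagon with the given area
    (same formula as in get_hexagon_edge)."""
    return math.sqrt(2 * area / (3 * math.sqrt(3)))


# Approximate edge lengths (in meters) as quoted in the text above
approx_edge_by_resolution = {11: 25.0, 12: 9.5}

for resolution, edge in approx_edge_by_resolution.items():
    area = hexagon_area_from_edge(edge)
    print(f'resolution {resolution}: edge ~{edge} m, area ~{area:.0f} m2')
```

Comparing the median parcel area with these two areas is equivalent to comparing `hex_edge` with the two edge lengths.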



## Retrieving Hexagons

These are to be the main spatial unit of analysis throughout the Thesis.

In [None]:
state = 31 # MG ID as int
ibgeID = '3106200' # Belo Horizonte ID as str
                   # TO DO: allow ibgeID as int as well
hexagon_size = 11

path_to_census = (db_folder
                  / 'census'
                  / '2010 Universo'
                  / '2010_aggregates_by_enumeration_area.csv')

usecols = {
    'v002': 'pop',
    # Average income of all those who are 10 years
    # or older and earn wages of some sort
    'v011': 'income',
         }

# TO DO: allow for 'area_weighted_vars' and
# 'pop_weighted_vars' not to be lists
hexagons = get_hexagons_with_census_data(state,
                                         ibgeID,
                                         hexagon_size,
                                         usecols,
                                         area_weighted_vars=['pop'],
                                         query_data=False,
                                         save_query=False,
                                         path=path_to_census,
                                         pop_col='pop',
                                         pop_weighted_vars=['income'],
                                         output_epsg=31983,)

In [None]:
def input_uses_into_hex(land_uses, hexagons):
    """This goes hexagon by hexagon, checks which land
    uses are contained within, and assigns to the entire
    hexagon the land use that occupies most of its area.
    
    TO DO: this smells of inefficiency. Think of another way later
    """
    # I used reset index method in hexagons as a way to
    # retain hexagon labels after overlay
    overlay = gpd.overlay(land_uses,
                          hexagons.reset_index(),
                          how='intersection',)
    hex_land_uses = {}
    for hex_label,group in overlay.groupby(f'{hexagons.index.name}'):
        gdf = group.dissolve(by='category', aggfunc='first')
        gdf['area'] = gdf.geometry.map(lambda x: x.area)
        # Mind that gdf is indexed by 'category' because of dissolve
        idxmax = gdf.area.idxmax()
        hex_land_uses[hex_label] = {'type': gdf.loc[idxmax, 'type'],
                                    'category': idxmax,}
    
    
    return pd.DataFrame.from_dict(hex_land_uses, orient='index')
    
    
hex_land_uses = input_uses_into_hex(land_uses, hexagons)
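
The rule applied by `input_uses_into_hex` (sum the area of each category inside a hexagon, then keep the category with the largest total) can be sketched without any geometry. The example below is only an illustration of that rule: the function name `dominant_category`, the hexagon labels, and the areas are all hypothetical, and the pieces are assumed to have been intersected with the hexagons beforehand.

```python
from collections import defaultdict


def dominant_category(pieces):
    """Return {hex_label: category with the largest summed area}.

    pieces: iterable of (hex_label, category, area) tuples, one per
    fragment of parcel that falls inside a hexagon.
    """
    area_by_hex = defaultdict(lambda: defaultdict(float))
    for hex_label, category, area in pieces:
        # plays the role of the dissolve by category
        area_by_hex[hex_label][category] += area

    return {hex_label: max(areas, key=areas.get)
            for hex_label, areas in area_by_hex.items()}


# Hypothetical fragments, areas in square meters
pieces = [
    ('hex_a', 'residential', 300.0),
    ('hex_a', 'retail/services', 250.0),
    ('hex_a', 'residential', 100.0),
    ('hex_b', 'vacant', 500.0),
    ('hex_b', 'public', 900.0),
]

dominant = dominant_category(pieces)
print(dominant)
```

The same aggregation could probably be done in a single grouped sum over the overlay instead of one dissolve per hexagon, which might address the TO DO in the docstring above, but I have not tested that here.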

In [None]:
hex_ = hexagons.merge(hex_land_uses,
                      how='inner',
                      left_index=True,
                      right_index=True,)

In [None]:
path_to_hexes = out_folder / f'BH_hex_{hexagon_size}_with_land_uses.gpkg'

hex_.to_file(path_to_hexes, layer='2017', driver='GPKG')

In [None]:
def plot_land_uses(parcels, attribute, palette, ax):
    legend_elements = []
    for cat, group in parcels.groupby(attribute):
        color = palette[cat]

        group.plot(ax=ax,
                   color=color,)

        patch_element = Patch(facecolor=color,
                              edgecolor=color,
                              label=cat,)

        legend_elements.append(patch_element)

    ax.legend(handles=legend_elements,
              bbox_to_anchor=(1, 0.01),
              loc='lower right',
              prop={'size': 4},)
    
    ax.axis('off')

In [None]:
fig,axes = plt.subplots(ncols=2, dpi=300, figsize=(5, 8))
ax1,ax2 = axes


type_palette = {'active': '#C97064',
                'static': '#39487F',
                'passive': '#527048',}
plot_land_uses(hex_,
               'type',
               type_palette,
               ax1,)


category_palette = {'residential': '#FECEF1',
                    'subnormal': '#B05F66',
                    'retail/services': '#3B727C',
                    'mixed': '#60E1E0',
                    'public': '#B9A37E',
                    'industry': '#64513B',
                    'vacant': '#82A775',}
plot_land_uses(hex_,
               'category',
               category_palette,
               ax2,)

# Land Use Maps for 2011

data source: provided by Prodabel directly

These have been made with a methodology similar to that of 2017 data

In [None]:
path = (db_folder
        / 'beaga'
        / 'tipologia_uso_ocupacao'
        / 'uso_ocup_2011.zip')
land_uses = get_geodata(path)
land_uses = land_uses.explode()
land_uses.reset_index(drop=True, inplace=True)

new_names = {'sigla_uso': 'type', 'descr_ocup': 'category'}
land_uses.rename(columns=new_names, inplace=True)

to_keep = ['id_lotectm', 'type', 'category', 'geometry']
land_uses = land_uses.reindex(columns=to_keep)

del new_names, to_keep

In [None]:
land_uses.head(3)

In [None]:
print(land_uses.crs)

In [None]:
number_of_parcels = len(land_uses)

land_uses['type'].value_counts() / number_of_parcels * 100

In [None]:
land_uses.category.value_counts() / number_of_parcels * 100

Again, it is self-evident where to put some of the land use categories. The following ones, on the other hand, require a somewhat more in-depth analysis:

- ***'nulo'*** seems to be a category akin to 2017's ***'sem informacao'***: they are close in number of parcels and are not given any specific category because there's no detailed information about them in the treasury's registry. Hence, 'nulo' is going to be classified in the same way as 'sem informacao' was previously, the difference being that I'll now use the 2010 urban footprint. (TO DO: is it necessary to elaborate on the different methodologies for footprint creation?)

- ***edificações em LV*** are land parcels that do not yet have a consolidated land use class, because that only happens when the real estate receives a certificate of occupancy. Also obviously uncertain is the category ***indeterminado***. Hence, given the lack of information that could enable a proper classification, the two categories are assumed to be residential. The rationale here is the same as the one used for ***'edificação sem tipologia especificada'***, which is present in the 2017 data.

- Similar to the approach with 2017 data, ***galpao*** is placed under retail/services.

- ***ZEIS-1*** are those parcels that explicitly belong to subnormal agglomerates.

This time the function *reclassify_with_scrutiny()* is not needed, because the type attribute provides no useful information that would enable a more precise category disaggregation.

In [None]:
regrouping_2011 = {
    'active': {
        'residential': ['casa unifamiliar',
                        'edificações em LV', 
                        'edifício',
                        'indeterminado',
                        'conjunto multifamiliar vertical',
                        'conjunto multifamiliar horizontal',],
        
        'subnormal': ['ZEIS-1',],
        
        'retail/services': ['loja ou conjunto de lojas',
                            'galpão',
                            'edifício de uso comercial e/ou de serviços',
                            'loja em edifício / galeria',
                            'galeria / mini-shopping de bairro',
                            'shopping',
                            'apart hotel',],
        # TO DO: settle approach towards mixed uses
        'mixed': ['casa / sobrado',
                  'edifício residencial e comércio e/ou serviços',]
                },

     'passive': { 
         'vacant': ['lote vago',]
                 },

     'static': {
         'industry': ['indústria',],
         
         'public': ['instituição de ensino',
                    'instituição religiosa',
                    'equipamento de saúde',
                    'clubes esportivos e sociais',
                    'parques',
                    'cemitério',
                    'estação de transporte coletivo',],
               }
}

In [None]:
reclassify_land_uses(land_uses,
                     regrouping_2011,)

In [None]:
# 2010 data that I found is for the whole metropolitan region
path_to_footprint = (db_folder
                     / 'beaga'
                     / 'footprints'
                     / '2010_footprint.zip')
footprint_2010 = get_geodata(path_to_footprint)

bh_contours = read_municipality(int(ibgeID))
bh_contours.to_crs(epsg=31983, inplace=True)

footprint_2010 = gpd.clip(footprint_2010,
                          bh_contours,)

footprint source: http://www.rmbh.org.br/central-cartog.php

In [None]:
reclassify_unknown_parcels(land_uses,
                           footprint_2010,
                           subnormal,
                           'nulo',)

In [None]:
# The airport was classified as a warehouse in the source data,
# hence, I'll have to fix this manually.
mask = (land_uses.id_lotectm == 255797)
land_uses.loc[mask, 'type'] = 'static'
land_uses.loc[mask, 'category'] = 'public'

land_uses.drop(columns='id_lotectm', inplace=True)

In [None]:
land_uses['type'].value_counts() / number_of_parcels * 100

In [None]:
land_uses.category.value_counts() / number_of_parcels * 100

In [None]:
hex_land_uses = input_uses_into_hex(land_uses, hexagons)

hex_ = hexagons.merge(hex_land_uses,
                      how='inner',
                      left_index=True,
                      right_index=True,)

hex_.to_file(path_to_hexes, layer='2011', driver='GPKG')

In [None]:
fig,axes = plt.subplots(ncols=2, dpi=300, figsize=(5, 8))
ax1,ax2 = axes

plot_land_uses(hex_,
               'type',
               type_palette,
               ax1,)

plot_land_uses(hex_,
               'category',
               category_palette,
               ax2,)

# Land Use Maps for 2018

data source: provided by Prodabel directly

In [None]:
path = (db_folder
        / 'beaga'
        / 'tipologia_uso_ocupacao'
        / 'uso_ocup_2018.zip')
land_uses = get_geodata(path)

# Some land uses are not georeferenced and, even though they
# have an existing parcel ID, that ID is not present at the
# parcel geodata provided by BH city hall. Those features
# amount to about 0.4% and it seems reasonable to discard them.
land_uses = land_uses.loc[land_uses
                      .geometry
                      .notnull()]

land_uses = land_uses.explode()
land_uses.reset_index(drop=True, inplace=True)

land_uses = land_uses.reindex(columns=['D_USO_REV', # Revised land use (broad) categories
                                       'USO_AT_REV',  # Revised spatial transcription of land use
                                       'ATIV', # Activity list per parcel 
                                       'geometry'])

land_uses.rename(columns={'D_USO_REV': 'type',
                          'USO_AT_REV': 'category',
                          'ATIV': 'activities'},
                 inplace=True,)

In [None]:
regrouping_2018 = {
    'active': {
        'residential': ['CASA/SOBRADO RESIDENCIAL',
                        'EDIFICIO RESIDENCIAL',],
        
        'retail/services': ['EDIFICIO NAO RESIDENCIAL',
                            'CASA DE SHOW',
                            'INSTITUICAO FINANCEIRA',
                            'HOTEIS, MOTEIS E SIMILARES',
                            'ESTACIONAMENTO',
                            'CASA/SOBRADO NAO RESIDENCIAL',
                            'SUPERMERCADO/HIPERMERCADO',
                            'SHOPPING CENTER',
                            'GALPAO COMERCIAL',],
        # TO DO: settle approach towards mixed uses
        'mixed': ['CASA/SOBRADO DE USO MISTO',
                  'EDIFICIO DE USO MISTO',],
                },

     'passive': { 
         'vacant': ['LOTE VAGO',]
                 },

     'static': {
         'industry': ['GALPAO INDUSTRIAL',],
         
         'public': ['INSTITUICAO DE ENSINO',
                    'HOSPITAL/SERVICO DE SAUDE',
                    'SERVICOS PUBLICOS',
                    'TERMINAL RODOVIARIO/FERROVIARIO',
                    'CLUBE ESPORTIVO/SOCIAL',
                    'CENTRO DE CONVENCOES/EXPOSICOES',
                    'INSTITUICAO CULTURAL',
                    'ESTADIO/GINASIO',
                    'INSTITUICAO RELIGIOSA',
                    'PARQUE',
                    'AEROPORTO',
                    'CEMITERIO',
                    'ATERRO SANITARIO',],
               }
}

In [None]:
def reclassify_with_scrutiny(gdf):
    """Deals with some quirks that seem too specific for generalization.
    Handles instances in which 'category' may be more accurately placed
    when analyzed together with 'type'
    
    NOTE: function's name is an exaggeration
    """
    # 'SEM INFORMACAO'
    # ----------------------
    mask_a = (
        (gdf.category=='SEM INFORMACAO') & (gdf['type']=='RESIDENCIAL'))
    
    mask_b = (
        (gdf.category=='SEM INFORMACAO') & (gdf['type']=='NAO RESIDENCIAL'))
    
    gdf.loc[mask_a,'category'] = 'residential'
        
    gdf.loc[mask_b,'category'] = 'retail/services'
    
    
    # Final adjustments
    # ----------------
    cat_list = ['residential', 'retail/services']
    gdf.loc[gdf.category.isin(cat_list), 'type'] = 'active'

In [None]:
reclassify_with_scrutiny(land_uses)

In [None]:
reclassify_land_uses(land_uses,
                     regrouping_2018,)

In [None]:
# Some parcels without information had a number of activities 
# listed as taking place within them. Of those activities,
# a quick inspection suggested they were mostly associated with
# retail/services. Hence, I simply categorized all of them as such.
# These are weak assumptions, as in reality there's not really enough
# information for a proper classification. So much so that staff at city
# hall categorized the field as "no info".
mask = (land_uses.activities.notnull()) & (land_uses.category=='SEM INFORMACAO')
land_uses.loc[mask, 'type'] = 'active'
land_uses.loc[mask, 'category'] = 'retail/services'
land_uses.drop(columns='activities', inplace=True)

reclassify_unknown_parcels(land_uses,
                           footprint_2018,
                           subnormal,
                           'SEM INFORMACAO',)

In [None]:
number_of_parcels = len(land_uses)  # refresh: the previous value refers to an earlier year
land_uses['type'].value_counts() / number_of_parcels * 100

In [None]:
land_uses.category.value_counts() / number_of_parcels * 100

In [None]:
hex_land_uses = input_uses_into_hex(land_uses, hexagons)

hex_ = hexagons.merge(hex_land_uses,
                      how='inner',
                      left_index=True,
                      right_index=True,)

hex_.to_file(path_to_hexes, layer='2018', driver='GPKG')

In [None]:
fig,axes = plt.subplots(ncols=2, dpi=300, figsize=(5, 8))
ax1,ax2 = axes

plot_land_uses(hex_,
               'type',
               type_palette,
               ax1,)

plot_land_uses(hex_,
               'category',
               category_palette,
               ax2,)

# Land Use Maps for 2020

data source: provided by Prodabel directly

In [None]:
path = (db_folder
        / 'beaga'
        / 'tipologia_uso_ocupacao'
        / 'uso_ocup_2020.zip')
land_uses = get_geodata(path)

land_uses = land_uses.explode()
land_uses.reset_index(drop=True, inplace=True)

land_uses = land_uses.rename(columns={'uso_do_s_1': 'type',
                                      'agreg_ativ': 'activities',
                                      'uso_ativ_1': 'category'})

land_uses = land_uses.reindex(columns=['type', 'category', 'activities', 'geometry'])

land_uses.head(2)

In [None]:
# Land use categories and types come wrapped in curly brackets,
# for whatever reason... Apart from that, those are the same
# categories as in the 2018 data. Hence, if I simply remove
# the curly brackets, the same script used for 2018 should work
regex = r'(\{?)(.+\w)(\})?'

for each in ['type', 'category']:
    land_uses[each] = land_uses[each].str.extract(regex)[1]

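import copy

# I know this now seems unnecessary, but it might avoid
# confusion when revisiting the code. A deep copy keeps
# regrouping_2018 untouched by the adjustment below.
regrouping_2020 = copy.deepcopy(regrouping_2018)

# There's just one category named slightly differently. That is adjusted now.
retail_2020 = regrouping_2020['active']['retail/services']
retail_2020[retail_2020.index('HOTEIS, MOTEIS E SIMILARES')] = 'HOTEIS/MOTEIS E SIMILARES'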
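
A quick check of what the brace-stripping regex in the cell above keeps, using the standard `re` module on a few sample strings. The second capture group must end in a word character, so any trailing non-word character would be dropped along with the closing bracket. The last sample is hypothetical and is there only to show that edge case. `str.strip('{}')` is shown as a possible simpler alternative, not as what the notebook does.

```python
import re

BRACE_REGEX = r'(\{?)(.+\w)(\})?'  # same pattern as in the cell above


def strip_with_regex(value):
    """Keep the second capture group, as .str.extract(regex)[1] does."""
    return re.match(BRACE_REGEX, value).group(2)


def strip_with_str(value):
    """Alternative: remove leading and trailing curly brackets only."""
    return value.strip('{}')


samples = [
    '{RESIDENCIAL}',
    '{HOTEIS/MOTEIS E SIMILARES}',
    'LOTE VAGO',
    '{ABC (D)}',  # hypothetical label ending in a non-word character
]

for sample in samples:
    print(repr(sample),
          repr(strip_with_regex(sample)),
          repr(strip_with_str(sample)))
```

For labels that end in a letter or digit, both approaches give the same result; they only differ for labels such as the hypothetical last sample.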

In [None]:
reclassify_with_scrutiny(land_uses)

In [None]:
reclassify_land_uses(land_uses,
                     regrouping_2020,)

In [None]:
# The following block exists because of the same reason
# it figured in the 2018 data.
mask = (land_uses.activities.notnull()) & (land_uses.category=='SEM INFORMACAO')
land_uses.loc[mask, 'type'] = 'active'
land_uses.loc[mask, 'category'] = 'retail/services'
land_uses.drop(columns='activities', inplace=True)

reclassify_unknown_parcels(land_uses,
                           footprint_2018, # Couldn't find a closer one
                           subnormal,
                           'SEM INFORMACAO',)

In [None]:
number_of_parcels = len(land_uses)  # refresh: the previous value refers to an earlier year
land_uses['type'].value_counts() / number_of_parcels * 100

In [None]:
land_uses.category.value_counts() / number_of_parcels * 100

In [None]:
hex_land_uses = input_uses_into_hex(land_uses, hexagons)

hex_ = hexagons.merge(hex_land_uses,
                      how='inner',
                      left_index=True,
                      right_index=True,)

hex_.to_file(path_to_hexes, layer='2020', driver='GPKG')

In [None]:
fig,axes = plt.subplots(ncols=2, dpi=300, figsize=(5, 8))
ax1,ax2 = axes

plot_land_uses(hex_,
               'type',
               type_palette,
               ax1,)

plot_land_uses(hex_,
               'category',
               category_palette,
               ax2,)