Notebook by [@zodiacfireworks](https://github.com/zodiacfireworks)

<h1 style="text-align:center">Completing and correcting Peru TopoJson data from Peru GeoJson data</h1>

<div style="text-align:center">
<strong>Martín Josemaría Vuelta Rojas</strong><br/>
<code>martin.vuelta@gmail.com</code><br/><br/>
<em>Escuela Académico Profesional de Física</em><br/>
<em>Facultad de Ciencias Físicas</em><br/>
<em>Universidad Nacional Mayor de San Marcos</em><br/>
</div>

## 1. Introduction

This notebook completes the geographical information about Peru in the file `peru_all_topo.json`, taken from line 853 of the file [`datamaps.per.js`](https://raw.githubusercontent.com/markmarkoh/datamaps/master/dist/datamaps.per.js) provided by [DataMaps](http://datamaps.github.io/). It uses the information in the file `peru_all_geo.json`, taken from the file [`pe-all.geo.json`](http://code.highcharts.com/mapdata/countries/pe/pe-all.geo.json) provided by [HighCharts](http://www.highcharts.com/) and based on data from [Natural Earth](http://www.naturalearthdata.com).

Some manual tweaks were necessary before processing and merging the data. In the file `peru_all_topo.json`, the geometry id `PE.LR` (Lima) was found twice. One geometry corresponds to Lima Province (FIPS PE26, HASC PE.LP) and the other to the Department of Lima (FIPS PE15, HASC PE.LR). Callao was also misidentified and appeared with the ids `PE.` and `PE.CL`: one part of the data corresponds to the Constitutional Province of Callao and the other to the Department of Puno. The same mistakes were found in `peru_all_geo.json`, which refers to PE.LP and PE.CL as PE.145 and PE.3341. Some errors in the FIPS keys were also found. All these mistakes were corrected using the statoids information provided by http://www.statoids.com/upe.html.
Keys that do not appear in http://www.statoids.com/upe.html were ignored.
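The id corrections for the GeoJson file, described above, can be summarised as a lookup table. The names `GEO_ID_FIXES` and `fix_ids` in the minimal sketch below are hypothetical. The sketch renames every id in a single pass. This matters because the renames chain: `PE.3341` becomes `PE.CL`, while the original `PE.CL` becomes `PE.PU`.

```python
# Hypothetical lookup table: original GeoJson feature id -> corrected id
GEO_ID_FIXES = {
    "PE.145": "PE.LP",   # Lima Province, mislabelled with HASC PE.LR
    "PE.CL": "PE.PU",    # geometry labelled Callao that is actually Puno
    "PE.3341": "PE.CL",  # the actual Constitutional Province of Callao
}


def fix_ids(ids, fixes):
    """Return the corrected ids, looking each one up in the ORIGINAL table.

    A single lookup per id avoids chaining the renames: PE.3341 must end
    up as PE.CL and must not be renamed a second time to PE.PU.
    """
    return [fixes.get(id_, id_) for id_ in ids]


print(fix_ids(["PE.145", "PE.3341", "PE.AM", "PE.CL"], GEO_ID_FIXES))
# ['PE.LP', 'PE.CL', 'PE.AM', 'PE.PU']
```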

## 2. Processing

All subsequent processing uses Python's `json` library to load the data from the JSON files.

In [1]:
import json

### 2.1 Statoids Data

In [2]:
statoids = {
    "PE.AM": {
        "name": "Amazonas",
        "hasc": "PE.AM",
        "iso": "AMA",
        "fips": "PE01",
        "nute": 40201
    },
    "PE.AN": {
        "name": "Ancash",
        "hasc": "PE.AN",
        "iso": "ANC",
        "fips": "PE02",
        "nute": 40502
    },
    "PE.AP": {
        "name": "Apurímac",
        "hasc": "PE.AP",
        "iso": "APU",
        "fips": "PE03",
        "nute": 40903
    },
    "PE.AR": {
        "name": "Arequipa",
        "hasc": "PE.AR",
        "iso": "ARE",
        "fips": "PE04",
        "nute": 41004
    },
    "PE.AY": {
        "name": "Ayacucho",
        "hasc": "PE.AY",
        "iso": "AYA",
        "fips": "PE05",
        "nute": 40905
    },
    "PE.CJ": {
        "name": "Cajamarca",
        "hasc": "PE.CJ",
        "iso": "CAJ",
        "fips": "PE06",
        "nute": 40106
    },
    "PE.CL": {
        "name": "Callao",
        "hasc": "PE.CL",
        "iso": "CAL",
        "fips": "PE07",
        "nute": 40607
    },
    "PE.CS": {
        "name": "Cusco",
        "hasc": "PE.CS",
        "iso": "CUS",
        "fips": "PE08",
        "nute": 40808
    },
    "PE.HV": {
        "name": "Huancavelica",
        "hasc": "PE.HV",
        "iso": "HUV",
        "fips": "PE09",
        "nute": 40909
    },
    "PE.HC": {
        "name": "Huánuco",
        "hasc": "PE.HC",
        "iso": "HUC",
        "fips": "PE10",
        "nute": 40410
    },
    "PE.IC": {
        "name": "Ica",
        "hasc": "PE.IC",
        "iso": "ICA",
        "fips": "PE11",
        "nute": 41011
    },
    "PE.JU": {
        "name": "Junín",
        "hasc": "PE.JU",
        "iso": "JUN",
        "fips": "PE12",
        "nute": 40712
    },
    "PE.LL": {
        "name": "La Libertad",
        "hasc": "PE.LL",
        "iso": "LAL",
        "fips": "PE13",
        "nute": 40513
    },
    "PE.LB": {
        "name": "Lambayeque",
        "hasc": "PE.LB",
        "iso": "LAM",
        "fips": "PE14",
        "nute": 40114
    },
    "PE.LP": {
        "name": "Lima Provincia",
        "hasc": "PE.LP",
        "iso": "LMA",
        "fips": "PE26",
        "nute": 40615
    },
    "PE.LR": {
        "name": "Lima",
        "hasc": "PE.LR",
        "iso": "LIM",
        "fips": "PE15",
        "nute": 40615
    },
    "PE.LO": {
        "name": "Loreto",
        "hasc": "PE.LO",
        "iso": "LOR",
        "fips": "PE16",
        "nute": 40316
    },
    "PE.MD": {
        "name": "Madre de Dios",
        "hasc": "PE.MD",
        "iso": "MDD",
        "fips": "PE17",
        "nute": 40817
    },
    "PE.MQ": {
        "name": "Moquegua",
        "hasc": "PE.MQ",
        "iso": "MOQ",
        "fips": "PE18",
        "nute": 41118
    },
    "PE.PA": {
        "name": "Pasco",
        "hasc": "PE.PA",
        "iso": "PAS",
        "fips": "PE19",
        "nute": 40719
    },
    "PE.PI": {
        "name": "Piura",
        "hasc": "PE.PI",
        "iso": "PIU",
        "fips": "PE20",
        "nute": 40120
    },
    "PE.PU": {
        "name": "Puno",
        "hasc": "PE.PU",
        "iso": "PUN",
        "fips": "PE21",
        "nute": 41121
    },
    "PE.SM": {
        "name": "San Martín",
        "hasc": "PE.SM",
        "iso": "SAM",
        "fips": "PE22",
        "nute": 40222
    },
    "PE.TA": {
        "name": "Tacna",
        "hasc": "PE.TA",
        "iso": "TAC",
        "fips": "PE23",
        "nute": 41123
    },
    "PE.TU": {
        "name": "Tumbes",
        "hasc": "PE.TU",
        "iso": "TUM",
        "fips": "PE24",
        "nute": 40124
    },
    "PE.UC": {
        "name": "Ucayali",
        "hasc": "PE.UC",
        "iso": "UCA",
        "fips": "PE25",
        "nute": 40425
    }
}

In [3]:
properties_list = ["#", "Name", "HASC", "ISO", "FIPS", "NUTE"]
str_template = "{0:2s}  {1:20s}{2:10s}{3:10s}{4:10s}{5:10s}"
print(str_template.format(*properties_list))

statoids_list = list(statoids.items())
statoids_list.sort()

str_template = "{0:0>2d}  {1:20s}{2:10s}{3:10s}{4:10s}{5:10s}"

for i, (id_, properties) in enumerate(statoids_list):
    properties = list(str(properties[property_name.lower()]) for property_name in properties_list[1:])
    print(str_template.format(i+1, *properties))

#   Name                HASC      ISO       FIPS      NUTE      
01  Amazonas            PE.AM     AMA       PE01      40201     
02  Ancash              PE.AN     ANC       PE02      40502     
03  Apurímac            PE.AP     APU       PE03      40903     
04  Arequipa            PE.AR     ARE       PE04      41004     
05  Ayacucho            PE.AY     AYA       PE05      40905     
06  Cajamarca           PE.CJ     CAJ       PE06      40106     
07  Callao              PE.CL     CAL       PE07      40607     
08  Cusco               PE.CS     CUS       PE08      40808     
09  Huánuco             PE.HC     HUC       PE10      40410     
10  Huancavelica        PE.HV     HUV       PE09      40909     
11  Ica                 PE.IC     ICA       PE11      41011     
12  Junín               PE.JU     JUN       PE12      40712     
13  Lambayeque          PE.LB     LAM       PE14      40114     
14  La Libertad         PE.LL     LAL       PE13      40513     
15  Loreto              P

### 2.2 Corrections in GeoJson Data

In [4]:
# Loading GeoJson data (explicit UTF-8 so that names such as "Apurímac" decode correctly)
geo_file = "./peru_all_geo.json"
with open(geo_file, "r", encoding="utf-8") as geo_handler:
    geo_json = json.load(geo_handler)

Every department and province in the `geo_json` variable is represented as a feature object. Each feature object has a member called `properties` that contains the feature's metadata. The property names are:

In [5]:
# Get the property names of the first feature (all features are assumed to share them)
geo_properties_list = list(geo_json["features"][0]["properties"].keys())
geo_properties_list.sort()

for property_ in geo_properties_list:
    print("*", property_)

* alt-name
* country
* fips
* hasc
* hc-a2
* hc-group
* hc-key
* hc-middle-x
* hc-middle-y
* labelrank
* latitude
* longitude
* name
* postal-code
* region
* subregion
* type
* type-en
* woe-id
* woe-label
* woe-name
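The cell above reads the property names of the first feature only, which assumes that all features share the same keys. The minimal sketch below collects the union over all features instead. The two-feature collection is a trimmed, hand-written stand-in for `geo_json`, not the real file, and `property_names` is a hypothetical helper.

```python
# Trimmed, hand-written FeatureCollection with the same nesting as peru_all_geo.json
sample_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "PE.AM",
         "properties": {"name": "Amazonas", "hasc": "PE.AM", "fips": "PE01"},
         "geometry": None},
        {"type": "Feature", "id": "PE.3341",
         "properties": {"name": "Callao", "hasc": "PE.", "alt-name": None},
         "geometry": None},
    ],
}


def property_names(feature_collection):
    """Sorted union of the property names used by any feature."""
    names = set()
    for feature in feature_collection["features"]:
        names.update(feature["properties"].keys())
    return sorted(names)


print(property_names(sample_geo))
# ['alt-name', 'fips', 'hasc', 'name']
```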


The feature properties are tabulated in the same way as the statoids.
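The tabulation logic in the next cell is repeated, with small variations, in several cells of this notebook. The minimal sketch below factors it into a reusable helper. The name `tabulate` and the uniform column width are assumptions. As in the original cells, missing properties are shown as `---`. The sample rows are copied from the GeoJson table.

```python
def tabulate(rows, columns, width=15):
    """Hypothetical helper: render (id, properties) pairs as a text table.

    Rows are sorted by id, properties are looked up by the lower-cased
    column name, and missing properties are shown as '---'.
    """
    def cell(value):
        return str(value).ljust(width)

    lines = ["#   " + cell("ID") + "".join(cell(c) for c in columns)]
    sorted_rows = sorted(rows, key=lambda row: row[0])  # sort by id only
    for i, (id_, props) in enumerate(sorted_rows, start=1):
        values = (props.get(c.lower(), "---") for c in columns)
        lines.append("{:02d}  ".format(i) + cell(id_) + "".join(cell(v) for v in values))
    return "\n".join(line.rstrip() for line in lines)


sample = [
    ("PE.CL", {"name": "Callao", "hasc": "PE.CL", "fips": "PE21"}),
    ("PE.145", {"name": "Lima Province", "hasc": "PE.LR", "fips": None}),
]
print(tabulate(sample, ["Name", "HASC", "FIPS", "ISO"]))
```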

In [6]:
# Tabulating data of features in geojson
properties_list = ["#", "ID", "Name", "HASC", "ISO", "FIPS", "NUTE"]
str_template = "{0:2s}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"
print(str_template.format(*properties_list))

geo_features = dict((feature["id"], feature["properties"]) for feature in geo_json["features"])

features_list = list(geo_features.items())
features_list.sort()

str_template = "{0:0>2d}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"

for i, (id_, property_) in enumerate(features_list):
    
    property_ = list(
        str(property_[pn.lower()]) if pn.lower() in property_.keys() else "---" for pn in properties_list[2:]
    )
    
    print(
        str_template.format(
            i + 1,
            id_,
            *property_
        )
    )

#   ID        Name                HASC      ISO       FIPS      NUTE      
01  PE.145    Lima Province       PE.LR     ---       None      ---       
02  PE.3341   Callao              PE.       ---       None      ---       
03  PE.AM     Amazonas            PE.AM     ---       PE01      ---       
04  PE.AN     Ancash              PE.AN     ---       PE02      ---       
05  PE.AP     Apurímac            PE.AP     ---       PE03      ---       
06  PE.AR     Arequipa            PE.AR     ---       PE04      ---       
07  PE.AY     Ayacucho            PE.AY     ---       PE09      ---       
08  PE.CJ     Cajamarca           PE.CJ     ---       PE06      ---       
09  PE.CL     Callao              PE.CL     ---       PE21      ---       
10  PE.CS     Cusco               PE.CS     ---       PE08      ---       
11  PE.HC     Huánuco             PE.HC     ---       PE10      ---       
12  PE.HV     Huancavelica        PE.HV     ---       PE09      ---       
13  PE.IC     Ica        

As the list above shows, the HASC codes do not match the feature ids for PE.145 and PE.3341. In addition, the HASC code PE.LR is repeated in PE.145 and PE.LR, and the name Callao appears twice, in PE.3341 and PE.CL. The FIPS codes also disagree with the statoids for PE.AY (PE09 instead of PE05) and PE.CL (PE21, which is the code of Puno). These facts justify a more in-depth analysis of the features with ids PE.145, PE.3341, PE.CL and PE.LR.
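Some of these checks can be automated: comparing each id with its HASC code, and finding repeated HASC codes and names. The minimal sketch below uses `collections.Counter`. `find_inconsistencies` is a hypothetical name. The sample rows are copied from the tables in this section, and only the relevant properties are kept.

```python
from collections import Counter


def find_inconsistencies(features):
    """Report ids whose HASC differs from the id, plus repeated HASC codes and names.

    features -- dict mapping feature id -> properties dict
    """
    id_mismatch = sorted(id_ for id_, p in features.items() if p.get("hasc") != id_)
    hasc_counts = Counter(p.get("hasc") for p in features.values())
    name_counts = Counter(p.get("name") for p in features.values())
    return {
        "id_mismatch": id_mismatch,
        "repeated_hasc": sorted(h for h, n in hasc_counts.items() if n > 1),
        "repeated_name": sorted(m for m, n in name_counts.items() if n > 1),
    }


# Rows copied from the GeoJson tables (only name and HASC kept)
features = {
    "PE.145": {"name": "Lima Province", "hasc": "PE.LR"},
    "PE.3341": {"name": "Callao", "hasc": "PE."},
    "PE.AM": {"name": "Amazonas", "hasc": "PE.AM"},
    "PE.CL": {"name": "Callao", "hasc": "PE.CL"},
    "PE.LR": {"name": "Lima", "hasc": "PE.LR"},
}
print(find_inconsistencies(features))
```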

The following code cells analyze and correct the data.

In [7]:
features_list = list(geo_features.items())
features_list.sort()

unused_features_id_list = list(geo_features.keys())
unused_statoids_id_list = list(statoids.keys())

for id_, properties in features_list:
    print("+ ID: {0:} ... ".format(id_), end="")
    
    if id_ in unused_statoids_id_list:
        statoids_properties = statoids[id_]
        
        # `properties` is the same dict object as geo_features[id_], so one update suffices
        geo_features[id_].update(statoids_properties)
        
        unused_features_id_list.remove(id_)
        unused_statoids_id_list.remove(id_)
        
        print("Updated")
        
    else:
        print("Not registered")
        
        
    properties = list(properties.items())
    properties.sort()
    
    for name, value in properties:
        print("  - {0:15s}: {1:20s}".format(name, str(value)))
    
    print("\n")

+ ID: PE.145 ... Not registered
  - alt-name       : Federal capital     
  - country        : Peru                
  - fips           : None                
  - hasc           : PE.LR               
  - hc-a2          : LP                  
  - hc-group       : admin1              
  - hc-key         : pe-145              
  - hc-middle-x    : 0.33                
  - hc-middle-y    : 0.33                
  - labelrank      : 9                   
  - latitude       : -12.1124            
  - longitude      : -76.92359999999999  
  - name           : Lima Province       
  - postal-code    : None                
  - region         : None                
  - subregion      : None                
  - type           : Distrito Capital    
  - type-en        : Captial District    
  - woe-id         : 28358302            
  - woe-label      : Lima Metropolitan Area, PE, Peru
  - woe-name       : Lima Province       


+ ID: PE.3341 ... Not registered
  - alt-name       : Region            

The previous cell shows that some IDs were not matched. They remain in the `unused_features_id_list` and `unused_statoids_id_list` lists. These IDs are potential errors and may reveal others.

In [8]:
print("Unused features IDs:", unused_features_id_list)
print("Unused statoids IDs:", unused_statoids_id_list)

Unused features IDs: ['PE.3341', 'PE.145']
Unused statoids IDs: ['PE.LP', 'PE.PU']


As shown above, the statoids contain neither 'PE.3341' nor 'PE.145', and neither 'PE.PU' nor 'PE.LP' is registered among the GeoJson features, so these entries need to be compared.
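The comparison performed by the matching loop amounts to two set differences between the GeoJson feature ids and the statoid ids. The minimal sketch below uses hand-copied id sets, abbreviated to a few entries.

```python
# Abbreviated, hand-copied id sets
feature_ids = {"PE.145", "PE.3341", "PE.AM", "PE.CL", "PE.LR"}
statoid_ids = {"PE.AM", "PE.CL", "PE.LP", "PE.LR", "PE.PU"}

# Ids present in the GeoJson data but unknown to statoids
print(sorted(feature_ids - statoid_ids))   # ['PE.145', 'PE.3341']
# Statoid ids that no GeoJson feature uses
print(sorted(statoid_ids - feature_ids))   # ['PE.LP', 'PE.PU']
```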

In [9]:
properties_list = ["type","ID", "Name", "HASC", "FIPS"]
str_template = "{0:10s}{1:10s}{2:20s}{3:10s}{4:10s}"

print(str_template.format(*properties_list))

for id_ in unused_features_id_list:
    print(
        str_template.format(
            "feature",
            id_, 
            *list(str(geo_features[id_][property_name.lower()]) for property_name in properties_list[2:])
        )
    )

for id_ in unused_statoids_id_list:
    print(
        str_template.format(
            "statoid",
            id_, 
            *list(str(statoids[id_][property_name.lower()]) for property_name in properties_list[2:])
        )
    )


type      ID        Name                HASC      FIPS      
feature   PE.3341   Callao              PE.       None      
feature   PE.145    Lima Province       PE.LR     None      
statoid   PE.LP     Lima Provincia      PE.LP     PE26      
statoid   PE.PU     Puno                PE.PU     PE21      


The output above shows that Callao (PE.CL) needs to be analyzed in both the features and the statoids. To complete the list of keys deduced earlier, PE.LR must also be added to this analysis.

In [10]:
# Copy the lists (instead of aliasing them) so that the unused_* lists
# keep only the unmatched ids when they are used again below
warn_features_id_list = list(unused_features_id_list)
warn_features_id_list.extend(['PE.CL', 'PE.LR'])
warn_features_id_list.sort()

warn_statoids_id_list = list(unused_statoids_id_list)
warn_statoids_id_list.extend(['PE.CL', 'PE.LR'])
warn_statoids_id_list.sort()

properties_list = ["type","ID", "Name", "HASC", "FIPS"]
str_template = "{0:10s}{1:10s}{2:20s}{3:10s}{4:10s}"

print(str_template.format(*properties_list))

for id_ in warn_features_id_list:
    print(
        str_template.format(
            "feature",
            id_, 
            *list(str(geo_features[id_][property_name.lower()]) for property_name in properties_list[2:])
        )
    )

for id_ in warn_statoids_id_list:
    print(
        str_template.format(
            "statoid",
            id_, 
            *list(str(statoids[id_][property_name.lower()]) for property_name in properties_list[2:])
        )
    )

type      ID        Name                HASC      FIPS      
feature   PE.145    Lima Province       PE.LR     None      
feature   PE.3341   Callao              PE.       None      
feature   PE.CL     Callao              PE.CL     PE07      
feature   PE.LR     Lima                PE.LR     PE15      
statoid   PE.CL     Callao              PE.CL     PE07      
statoid   PE.LP     Lima Provincia      PE.LP     PE26      
statoid   PE.LR     Lima                PE.LR     PE15      
statoid   PE.PU     Puno                PE.PU     PE21      


The HASC code of Lima Province is wrong, Callao is referenced twice, and Puno is not referenced at all.

In [11]:
features_list = list((id_, geo_features[id_]) for id_ in unused_features_id_list)
features_list.sort()

for id_, properties in features_list:
    print("+ ID: {0:}".format(id_))
        
    properties = list(properties.items())
    properties.sort()
    
    for name, value in properties:
        print("  - {0:15s}: {1:20s}".format(name, str(value)))
    
    print("\n")

+ ID: PE.145
  - alt-name       : Federal capital     
  - country        : Peru                
  - fips           : None                
  - hasc           : PE.LR               
  - hc-a2          : LP                  
  - hc-group       : admin1              
  - hc-key         : pe-145              
  - hc-middle-x    : 0.33                
  - hc-middle-y    : 0.33                
  - labelrank      : 9                   
  - latitude       : -12.1124            
  - longitude      : -76.92359999999999  
  - name           : Lima Province       
  - postal-code    : None                
  - region         : None                
  - subregion      : None                
  - type           : Distrito Capital    
  - type-en        : Captial District    
  - woe-id         : 28358302            
  - woe-label      : Lima Metropolitan Area, PE, Peru
  - woe-name       : Lima Province       


+ ID: PE.3341
  - alt-name       : Region              
  - country        : Peru          

#### 2.2.1 Correction for PE.145

In [12]:
geo_features['PE.145']["alt-name"] = "Lima Metropolitana"
geo_features['PE.145']["country"] = "Peru"
geo_features['PE.145']["fips"] = None
geo_features['PE.145']["hasc"] = "PE.LP"
geo_features['PE.145']["hc-a2"] = "LP"
geo_features['PE.145']["hc-group"] = "admin1"
geo_features['PE.145']["hc-key"] = "pe-lp"
geo_features['PE.145']["hc-middle-x"] = 0.33
geo_features['PE.145']["hc-middle-y"] = 0.33
geo_features['PE.145']["labelrank"] = 9
geo_features['PE.145']["latitude"] = -12.1124
geo_features['PE.145']["longitude"] = -76.92359999999999
geo_features['PE.145']["name"] = "Lima Province"
geo_features['PE.145']["postal-code"] = None
geo_features['PE.145']["region"] = None
geo_features['PE.145']["subregion"] = None
geo_features['PE.145']["type"] = "Distrito Capital"
geo_features['PE.145']["type-en"] = "Capital District"
geo_features['PE.145']["woe-id"] = 28358302
geo_features['PE.145']["woe-label"] = "Área Metropolitana de Lima, PE, Peru"
geo_features['PE.145']["woe-name"] = "Lima Provincia"
# Register the same dict under its correct HASC key and complete it with statoids data;
# 'PE.LP' and 'PE.145' now refer to one shared object, so a single update suffices
geo_features['PE.LP'] = geo_features['PE.145']
geo_features['PE.LP'].update(statoids['PE.LP'])

#### 2.2.2 Adding PE.PU as a replacement for PE.CL

In [13]:
geo_features["PE.CL"]["alt-name"] = "Puno"
geo_features["PE.CL"]["country"] = "Peru"
geo_features["PE.CL"]["fips"] = "PE21"
geo_features["PE.CL"]["hasc"] = "PE.PU"
geo_features["PE.CL"]["hc-a2"] = "PU"
geo_features["PE.CL"]["hc-group"] = "admin1"
geo_features["PE.CL"]["hc-key"] = "pe-pu"
geo_features["PE.CL"]["hc-middle-x"] = 0.46
geo_features["PE.CL"]["hc-middle-y"] = 0.41
geo_features["PE.CL"]["iso"] = "PUN"
geo_features["PE.CL"]["labelrank"] = 4
geo_features["PE.CL"]["latitude"] = -15.1677
geo_features["PE.CL"]["longitude"] = -69.9802
geo_features["PE.CL"]["name"] = "Puno"
geo_features["PE.CL"]["nute"] = 41121
geo_features["PE.CL"]["postal-code"] = "PU"
geo_features["PE.CL"]["region"] = None
geo_features["PE.CL"]["subregion"] = None
geo_features["PE.CL"]["type"] = "Departamento"
geo_features["PE.CL"]["type-en"] = "Department"
geo_features["PE.CL"]["woe-id"] = 2346488
geo_features["PE.CL"]["woe-label"] = "Puno, PE, Peru"
geo_features["PE.CL"]["woe-name"] = "Puno"
# Register the same dict under its correct HASC key and complete it with statoids data
geo_features['PE.PU'] = geo_features['PE.CL']
geo_features['PE.PU'].update(statoids['PE.PU'])

#### 2.2.3 Filling PE.CL with the data of PE.3341

In [14]:
# Rebind the PE.CL key to the Callao data stored under PE.3341
# (PE.PU keeps referring to the Puno dict built in the previous cell)
geo_features["PE.CL"] = geo_features["PE.3341"]
geo_features["PE.CL"].update(statoids['PE.CL'])
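Cells 13 and 14 depend on Python's reference semantics. Cell 13 stores the `PE.CL` dictionary under the key `PE.PU` as well, which adds a second reference to the same object rather than a copy. Cell 14 then rebinds the `PE.CL` key to another object, so `PE.PU` keeps the Puno data. The minimal sketch below shows this behaviour with made-up, trimmed dictionaries.

```python
# Made-up, trimmed stand-ins for the dictionaries handled in cells 13 and 14
features = {
    "PE.CL": {"name": "Puno"},      # geometry relabelled as Puno in cell 13
    "PE.3341": {"name": "Callao"},  # the actual Callao data
}

# Cell 13: a second key pointing at the SAME dict object
features["PE.PU"] = features["PE.CL"]
print(features["PE.PU"] is features["PE.CL"])  # True

# Cell 14: rebinding the key does not touch the object PE.PU refers to
features["PE.CL"] = features["PE.3341"]
print(features["PE.PU"]["name"], features["PE.CL"]["name"])  # Puno Callao

# Mutation, however, is visible through every reference to the object
features["PE.CL"]["hasc"] = "PE.CL"
print(features["PE.3341"]["hasc"])  # PE.CL
```

This shared-object behaviour also means the order of cells 13 and 14 matters: running cell 14 first would leave `PE.PU` pointing at the Callao data.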

#### 2.2.4 Applying the corrections to the `geo_json` data

In [15]:
for i, feature in enumerate(geo_json["features"]):
    if feature['id'] == "PE.145":
        geo_json["features"][i]["properties"] = geo_features["PE.145"]
        geo_json["features"][i]["id"] = "PE.LP"
        
    elif feature['id'] == "PE.CL":
        geo_json["features"][i]["properties"] = geo_features["PE.PU"]
        geo_json["features"][i]["id"] = "PE.PU"
        
    elif feature['id'] == "PE.3341":
        geo_json["features"][i]["properties"] = geo_features["PE.CL"]
        geo_json["features"][i]["id"] = "PE.CL"
    
    else:
        geo_json["features"][i]["properties"] = geo_features[feature["id"]]
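After the ids are rewritten, a quick consistency check can confirm that every feature id now equals its HASC code and that no id is used twice. In the notebook it could be called as `check_features(geo_json["features"])` right after the cell above. The minimal sketch below runs it on a small made-up collection. `check_features` is a hypothetical helper.

```python
def check_features(features):
    """Return a list of problems found in a list of GeoJson-like features."""
    problems = []
    seen = set()
    for feature in features:
        id_ = feature["id"]
        if id_ in seen:
            problems.append("duplicate id " + id_)
        seen.add(id_)
        hasc = feature["properties"].get("hasc")
        if hasc != id_:
            problems.append("id {0} has HASC {1}".format(id_, hasc))
    return problems


corrected = [
    {"id": "PE.LP", "properties": {"hasc": "PE.LP"}},
    {"id": "PE.CL", "properties": {"hasc": "PE.CL"}},
    {"id": "PE.PU", "properties": {"hasc": "PE.PU"}},
]
print(check_features(corrected))  # []

broken = corrected + [{"id": "PE.CL", "properties": {"hasc": "PE."}}]
print(check_features(broken))  # ['duplicate id PE.CL', 'id PE.CL has HASC PE.']
```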

In [16]:
properties_list = ["#", "ID", "Name", "HASC", "ISO", "FIPS", "NUTE"]
str_template = "{0:2s}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"
print(str_template.format(*properties_list))

corrected_geo_features = dict((feature["id"], feature["properties"]) for feature in geo_json["features"])

features_list = list(corrected_geo_features.items())
features_list.sort()

str_template = "{0:0>2d}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"

for i, (id_, property_) in enumerate(features_list):
    
    property_ = list(
        str(property_[pn.lower()]) if pn.lower() in property_.keys() else "---" for pn in properties_list[2:]
    )
    
    print(
        str_template.format(
            i + 1,
            id_,
            *property_
        )
    )

#   ID        Name                HASC      ISO       FIPS      NUTE      
01  PE.AM     Amazonas            PE.AM     AMA       PE01      40201     
02  PE.AN     Ancash              PE.AN     ANC       PE02      40502     
03  PE.AP     Apurímac            PE.AP     APU       PE03      40903     
04  PE.AR     Arequipa            PE.AR     ARE       PE04      41004     
05  PE.AY     Ayacucho            PE.AY     AYA       PE05      40905     
06  PE.CJ     Cajamarca           PE.CJ     CAJ       PE06      40106     
07  PE.CL     Callao              PE.CL     CAL       PE07      40607     
08  PE.CS     Cusco               PE.CS     CUS       PE08      40808     
09  PE.HC     Huánuco             PE.HC     HUC       PE10      40410     
10  PE.HV     Huancavelica        PE.HV     HUV       PE09      40909     
11  PE.IC     Ica                 PE.IC     ICA       PE11      41011     
12  PE.JU     Junín               PE.JU     JUN       PE12      40712     
13  PE.LB     Lambayeque 

#### 2.2.5 Saving the `geo_json` data
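Writing the file with `json.dump` and reading it back is a simple way to confirm that the output is valid JSON and that no data was altered. With the default `ensure_ascii=True`, non-ASCII characters are written as escapes (for example `\u00ed` for "í"). Passing `ensure_ascii=False` keeps them readable. The minimal sketch below uses a temporary directory instead of `./peru_geo.json` and a made-up one-feature document.

```python
import json
import os
import tempfile

document = {"title": "Peru", "features": [{"id": "PE.AP", "properties": {"name": "Apurímac"}}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "peru_geo.json")

    # Write pretty-printed UTF-8 JSON followed by a trailing newline
    with open(path, "w", encoding="utf-8") as handler:
        json.dump(document, handler, indent=4, ensure_ascii=False)
        handler.write("\n")

    # Read the file back and parse it again
    with open(path, encoding="utf-8") as handler:
        text = handler.read()
    reloaded = json.loads(text)

print(reloaded == document)  # True
print("Apurímac" in text)    # True
```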

In [17]:
# Serialize the corrected GeoJson with 4-space indentation and a trailing newline
# (no string patching needed: json.dumps already produces valid JSON text)
geo_json_text = json.dumps(geo_json, indent=4) + "\n"
print(geo_json_text)

geo_file = "./peru_geo.json"
with open(geo_file, "w", encoding="utf-8") as geo_handler:
    geo_handler.write(geo_json_text)

{
    "title": "Peru",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "woe-name": "Ica",
                "hasc": "PE.IC",
                "iso": "ICA",
                "hc-middle-y": 0.52,
                "alt-name": null,
                "hc-middle-x": 0.49,
                "woe-id": "2346478",
                "longitude": "-75.6773",
                "subregion": null,
                "hc-group": "admin1",
                "nute": 41011,
                "type": "Departamento",
                "region": null,
                "hc-key": "pe-ic",
                "labelrank": "7",
                "hc-a2": "IC",
                "fips": "PE11",
                "type-en": "Department",
                "name": "Ica",
                "woe-label": "Ica, PE, Peru",
                "postal-code": "IC",
                "country": "Peru",
                "latitude": "-14.2257"
            },
            "id": "PE.IC",
            "geometry": {
 

### 2.3 Corrections in TopoJson Data

Following the same steps as with the GeoJson data, the errors in the TopoJson data can be corrected.

In [18]:
# Loading Peru TopoJson (explicit UTF-8, as for the GeoJson file)
topo_file = "./peru_all_topo.json"
with open(topo_file, "r", encoding="utf-8") as topo_handler:
    topo_json = json.load(topo_handler)

In [19]:
# Get the property names of the first geometry (all geometries are assumed to share them)
topo_properties_list = list(topo_json["objects"]["peru"]["geometries"][0]["properties"].keys())
topo_properties_list.sort()

for property_ in topo_properties_list:
    print("*", property_)

* name


In [20]:
# Tabulating data of geometries in topojson, in file order
properties_list = ["#", "ID", "Name", "HASC", "ISO", "FIPS", "NUTE"]
str_template = "{0:2s}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"
print(str_template.format(*properties_list))

topo_geometries_list = list(
    (geometry["id"], geometry["properties"]) for geometry in topo_json["objects"]["peru"]["geometries"]
)

geometries_list = topo_geometries_list

str_template = "{0:0>2d}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"

for i, (id_, property_) in enumerate(geometries_list):
    
    property_ = list(
        str(property_[pn.lower()]) if pn.lower() in property_.keys() else "---" for pn in properties_list[2:]
    )
    
    print(
        str_template.format(
            i + 1,
            id_,
            *property_
        )
    )

#   ID        Name                HASC      ISO       FIPS      NUTE      
01  PE.       Callao              ---       ---       ---       ---       
02  PE.LB     Lambayeque          ---       ---       ---       ---       
03  PE.PI     Piura               ---       ---       ---       ---       
04  PE.TU     Tumbes              ---       ---       ---       ---       
05  PE.AP     Apurímac            ---       ---       ---       ---       
06  PE.AR     Arequipa            ---       ---       ---       ---       
07  PE.CS     Cusco               ---       ---       ---       ---       
08  PE.MD     Madre de Dios       ---       ---       ---       ---       
09  PE.CL     Callao              ---       ---       ---       ---       
10  PE.MQ     Moquegua            ---       ---       ---       ---       
11  PE.TA     Tacna               ---       ---       ---       ---       
12  PE.AN     Ancash              ---       ---       ---       ---       
13  PE.CJ     Cajamarca  

There is not much information available to correct the data. However, since the amount of data is small, it is possible to look for additional information in the file itself, specifically the `arcs` property of every geometry and the coordinates corresponding to every arc. This is necessary only for the ids `PE.` and `PE.CL` (both named Callao) and `PE.LR` (which appears twice).
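To inspect those coordinates, the arcs have to be decoded. According to the TopoJSON specification, when a topology has a `transform`, each arc stores quantized, delta-encoded positions. The first position is absolute, and each following one is an offset from the previous position. A quantized point (x, y) maps to longitude and latitude as `x * scale[0] + translate[0]` and `y * scale[1] + translate[1]`. Geometries reference arcs by index, and a negative index `~i` means arc `i` traversed in reverse.

The minimal decoder sketch below uses the hypothetical helpers `decode_arc` and `arc_index`. It is applied to the first three positions of the first arc and to the `transform` printed by the last cell of this notebook. Comparing decoded coordinates with the known locations of Callao and Puno is one way to tell the two geometries apart.

```python
def arc_index(i):
    """Geometries reference arcs by index; a negative index ~i means arc i reversed."""
    return i if i >= 0 else ~i


def decode_arc(arc, transform):
    """Turn one quantized, delta-encoded TopoJSON arc into (lon, lat) pairs."""
    (sx, sy), (tx, ty) = transform["scale"], transform["translate"]
    x = y = 0
    positions = []
    for dx, dy in arc:
        x += dx  # accumulate deltas to recover quantized coordinates
        y += dy
        positions.append((x * sx + tx, y * sy + ty))
    return positions


transform = {
    "scale": [0.001265457050305029, 0.0018310484543392366],
    "translate": [-81.33755752899992, -18.337746206937936],
}
first_arc = [[3281, 3402], [3, -4], [-8, 2]]  # first positions of arc 0
for lon, lat in decode_arc(first_arc, transform):
    print("{0:.4f} {1:.4f}".format(lon, lat))
```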

In [21]:
# Corrected metadata from the GeoJson step, keyed by feature id
topo_metadata = geo_features

for geometry in topo_json["objects"]["peru"]["geometries"]:
    if geometry["id"] == "PE.":
        # The geometry with the incomplete id "PE." is Callao
        geometry["id"] = "PE.CL"

    elif geometry["id"] == "PE.CL":
        # The geometry labelled PE.CL is actually Puno
        geometry["id"] = "PE.PU"

    elif geometry["properties"]["name"] == "Lima Province":
        # One of the two PE.LR geometries is Lima Province
        geometry["id"] = "PE.LP"

    # `geometry` is the list element itself, so the change applies to topo_json directly
    geometry["properties"] = topo_metadata[geometry["id"]]

In [22]:
# Tabulating data of geometries in the corrected topojson
properties_list = ["#", "ID", "Name", "HASC", "ISO", "FIPS", "NUTE"]
str_template = "{0:2s}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"
print(str_template.format(*properties_list))

topo_geometries = dict(
    (geometry["id"], geometry["properties"]) for geometry in topo_json["objects"]["peru"]["geometries"]
)

geometries_list = list(topo_geometries.items())
geometries_list.sort()

str_template = "{0:0>2d}  {1:10s}{2:20s}{3:10s}{4:10s}{5:10s}{6:10s}"

for i, (id_, property_) in enumerate(geometries_list):
    
    property_ = list(
        str(property_[pn.lower()]) if pn.lower() in property_.keys() else "---" for pn in properties_list[2:]
    )
    
    print(
        str_template.format(
            i + 1,
            id_,
            *property_
        )
    )

#   ID        Name                HASC      ISO       FIPS      NUTE      
01  PE.AM     Amazonas            PE.AM     AMA       PE01      40201     
02  PE.AN     Ancash              PE.AN     ANC       PE02      40502     
03  PE.AP     Apurímac            PE.AP     APU       PE03      40903     
04  PE.AR     Arequipa            PE.AR     ARE       PE04      41004     
05  PE.AY     Ayacucho            PE.AY     AYA       PE05      40905     
06  PE.CJ     Cajamarca           PE.CJ     CAJ       PE06      40106     
07  PE.CL     Callao              PE.CL     CAL       PE07      40607     
08  PE.CS     Cusco               PE.CS     CUS       PE08      40808     
09  PE.HC     Huánuco             PE.HC     HUC       PE10      40410     
10  PE.HV     Huancavelica        PE.HV     HUV       PE09      40909     
11  PE.IC     Ica                 PE.IC     ICA       PE11      41011     
12  PE.JU     Junín               PE.JU     JUN       PE12      40712     
13  PE.LB     Lambayeque 

In [23]:
# Serialize the corrected TopoJson with 4-space indentation and a trailing newline
# (no string patching needed: json.dumps already produces valid JSON text)
topo_json_text = json.dumps(topo_json, indent=4) + "\n"
print(topo_json_text)

topo_file = "./peru_topo.json"
with open(topo_file, "w", encoding="utf-8") as topo_handler:
    topo_handler.write(topo_json_text)

{
    "transform": {
        "scale": [
            0.001265457050305029,
            0.0018310484543392366
        ],
        "translate": [
            -81.33755752899992,
            -18.337746206937936
        ]
    },
    "type": "Topology",
    "arcs": [
        [
            [
                3281,
                3402
            ],
            [
                3,
                -4
            ],
            [
                -8,
                2
            ],
            [
                -10,
                -1
            ],
            [
                -10,
                5
            ],
            [
                -15,
                6
            ],
            [
                -9,
                9
            ],
            [
                -5,
                8
            ],
            [
                8,
                2
            ],
            [
                9,
                -3
            ],
            [
                30,
                -