# Exploring EA editing

Editing begins once the EAs have been successfully digitized and coding has been completed on paper. It reconciles the paper coding with the digital data, combining manual data entry with some geospatial comparisons.

In [1]:
import os, sys

import geopandas as gpd

In [2]:
# Two shapefiles chosen at random: one from an edited district, one from an unedited district
editedShape = r"C:\Users\WB411133\OneDrive - WBG\AAA_BPS\GOST\Projects\Ghana_Census_Support\Data\GSS_Data\EDITED_DISTRICTS\2020 GUSHIEGU\TYPE 2\gushiegu final1.shp"
uneditedShape = r"C:\Users\WB411133\OneDrive - WBG\AAA_BPS\GOST\Projects\Ghana_Census_Support\Data\GSS_Data\UNEDITED_DSITRICTS\2020 KORLE KLOTTEY\2020_KORLE_KLOTTEY_FINAL.shp"

inE = gpd.read_file(editedShape)
inU = gpd.read_file(uneditedShape)

### Investigating columns
The edited dataset has more columns than the unedited dataset, as expected (34 versus 27). The goal of this exercise is to identify which parts of this column creation can be handled in a model.

In [3]:
inE.columns

Index(['SP_ID', 'GEOMETRY_S', 'ID1', 'EST_POP', 'Z00_EA_COD', 'REG_NAME',
       'DIST_CODE', 'FID_1', 'DIST_NAME', 'TYP', 'LOC_NAME', 'BASE_NAM',
       'Z010_EA_CO', 'SA', 'EA_NUM', 'AGRIC_ZONE', 'LOC_NO', '20_PROV_CO',
       '20_REG_NAM', '20_DIS_NAM', '20_EA_TYP', '20_BASE_N', '20_LOCAL',
       '20_EST_POP', 'SPECIAL', '20_DIS_COD', '20_DIS_TYP', '20_LOC_TYP',
       '20_LOC_NO', '20_EA_NO', '20_EA_CODE', '20_SA_NO', 'ORIENT',
       'geometry'],
      dtype='object')

In [4]:
inU.columns

Index(['SP_ID', 'GEOMETRY_S', 'ID1', 'EST_POP', 'REG_NAME', 'DIST_CODE',
       'DIST_NAME', 'Z00_EA_COD', 'TYP', 'LOC_NAME', 'Z010_EA_CO', 'BASE_NAM',
       'FID_1', 'SA', 'AGRIC_ZONE', 'LOC_NUM', 'EA_NO_', 'Z020_PROV',
       'Z020_LOC', 'Z020_BASE', 'Z020_TYPE', 'Z020_EST_P', 'Z020_EA',
       'SPECIAL', 'Z020_SA_', 'Z020_REGIO', 'geometry'],
      dtype='object')

In [5]:
# These are the columns in the unedited dataset that are not present in edited
new_unedited_columns = [x for x in list(inU.columns) if x not in list(inE.columns)]
new_unedited_columns

['LOC_NUM',
 'EA_NO_',
 'Z020_PROV',
 'Z020_LOC',
 'Z020_BASE',
 'Z020_TYPE',
 'Z020_EST_P',
 'Z020_EA',
 'Z020_SA_',
 'Z020_REGIO']

In [6]:
# These are the columns in the edited dataset that are not present in unedited
new_edited_columns = [x for x in list(inE.columns) if x not in list(inU.columns)]
new_edited_columns

['EA_NUM',
 'LOC_NO',
 '20_PROV_CO',
 '20_REG_NAM',
 '20_DIS_NAM',
 '20_EA_TYP',
 '20_BASE_N',
 '20_LOCAL',
 '20_EST_POP',
 '20_DIS_COD',
 '20_DIS_TYP',
 '20_LOC_TYP',
 '20_LOC_NO',
 '20_EA_NO',
 '20_EA_CODE',
 '20_SA_NO',
 'ORIENT']

In [7]:
['EA_NUM',     # RENAME
 'LOC_NO',     # RENAME
 '20_EST_POP', # RENAME       # Estimated Population
 '20_PROV_CO', # RENAME       # Provisional EA Code from the field
 '20_BASE_N',  # RENAME       # EA Name
 '20_EA_TYP',  # RENAME       # EA TYPE
 '20_LOC_TYP', # RENAME       # Locality type: 1 = urban, 2 = rural
 
 '20_REG_NAM', # MANUAL       # REGION NAME  
 '20_LOC_NO',  # MANUAL       # Locality Number - number of localities in area
 '20_LOCAL',   # MANUAL       # Same as 20_BASE_N
 '20_EA_NO',   # MANUAL       # EA Number
 '20_SA_NO',   # MANUAL       # SA number
 '20_DIS_NAM', # MANUAL       # DISTRICT NAME
 '20_DIS_COD', # JOIN         # District Code
 '20_DIS_TYP', # JOIN         # District Type
 '20_EA_CODE', # CONCATENATED # Complete EA CODE - REGION + DISTRICT + DISTRICTTYPE + EA
 'ORIENT']     # WB           # My estimate of portrait v landscape for the EA

['EA_NUM',
 'LOC_NO',
 '20_EST_POP',
 '20_PROV_CO',
 '20_BASE_N',
 '20_EA_TYP',
 '20_LOC_TYP',
 '20_REG_NAM',
 '20_LOC_NO',
 '20_LOCAL',
 '20_EA_NO',
 '20_SA_NO',
 '20_DIS_NAM',
 '20_DIS_COD',
 '20_DIS_TYP',
 '20_EA_CODE',
 'ORIENT']
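
The annotations in the list above can also be kept as data, which makes it easier to see how much of the column creation might be handled in a model. The following is a minimal sketch: the tags are transcribed from the annotated list, but the names `COLUMN_METHODS` and `group_by_method` are hypothetical, and treating every non-`MANUAL` column as a candidate for automation is an assumption rather than a conclusion of this notebook.

```python
# How each edited-only column was produced, transcribed from the annotated list.
COLUMN_METHODS = {
    "EA_NUM": "RENAME",
    "LOC_NO": "RENAME",
    "20_EST_POP": "RENAME",
    "20_PROV_CO": "RENAME",
    "20_BASE_N": "RENAME",
    "20_EA_TYP": "RENAME",
    "20_LOC_TYP": "RENAME",
    "20_REG_NAM": "MANUAL",
    "20_LOC_NO": "MANUAL",
    "20_LOCAL": "MANUAL",
    "20_EA_NO": "MANUAL",
    "20_SA_NO": "MANUAL",
    "20_DIS_NAM": "MANUAL",
    "20_DIS_COD": "JOIN",
    "20_DIS_TYP": "JOIN",
    "20_EA_CODE": "CONCATENATED",
    "ORIENT": "WB",
}


def group_by_method(methods):
    """Invert the column -> method mapping into method -> list of columns."""
    grouped = {}
    for column, method in methods.items():
        grouped.setdefault(method, []).append(column)
    return grouped


grouped = group_by_method(COLUMN_METHODS)

# Assumption: anything not tagged MANUAL is a candidate for automation.
automatable = [c for c, m in COLUMN_METHODS.items() if m != "MANUAL"]

for method, columns in grouped.items():
    print(method, len(columns), columns)
```

Under that assumption, the manual entry work is confined to the columns tagged `MANUAL`, and the rest could in principle be scripted.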

# Steps to perform editing

1. Add final fields
2. Populate rename fields
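
The two steps above might be sketched as follows. This is an illustration only: the pairs in `RENAME_MAP` are guessed from name similarity between the unedited and edited column lists and are not confirmed anywhere in this notebook, and `add_final_fields` and `EMPTY_FIELDS` are hypothetical names. The toy table stands in for the unedited GeoDataFrame; the same column operations work on a GeoDataFrame because it is a pandas DataFrame subclass.

```python
import pandas as pd

# Hypothetical pairing of unedited -> final column names, guessed from name
# similarity only; the source does not confirm these pairs.
RENAME_MAP = {
    "EA_NO_": "EA_NUM",
    "LOC_NUM": "LOC_NO",
    "Z020_EST_P": "20_EST_POP",
    "Z020_PROV": "20_PROV_CO",
    "Z020_BASE": "20_BASE_N",
    "Z020_TYPE": "20_EA_TYP",
}

# A few final fields with no source column; left empty for manual entry.
EMPTY_FIELDS = ["20_REG_NAM", "20_DIS_NAM", "20_EA_NO", "20_SA_NO"]


def add_final_fields(df, rename_map=RENAME_MAP, empty_fields=EMPTY_FIELDS):
    """Return a copy of df with final fields added and rename fields populated."""
    out = df.copy()
    # Populate each final field from its source column when the source exists.
    for src, dst in rename_map.items():
        if src in out.columns:
            out[dst] = out[src]
    # Add the remaining final fields as empty columns.
    for field in empty_fields:
        if field not in out.columns:
            out[field] = pd.NA
    return out


# Toy stand-in for the unedited data; all values are made up.
toy = pd.DataFrame(
    {"EA_NO_": [1, 2], "Z020_EST_P": [750, 810], "Z020_BASE": ["A", "B"]}
)
final = add_final_fields(toy)
print(final.columns.tolist())
```

Copying rather than renaming in place keeps the original columns available for checking the result against the paper coding.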

### Manual data entry

### Locality verification

### Google Earth comparison

1. Concatenate final EA code
2. Perform SA dissolve - this is actually a locality dissolve!
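
A rough sketch of these last two steps on toy data follows. The EA code is built as REGION + DISTRICT + DISTRICTTYPE + EA, as described in the column notes earlier. Several things here are assumptions: `REG_CODE` is a hypothetical column (the datasets shown above only carry a region name), the parts are joined as plain strings with no zero-padding because the real field widths are not given, and `LOC_NAME` is assumed to identify a locality for the dissolve.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy EAs: the first two share a locality and touch; the third stands alone.
# All attribute values are made up for illustration.
eas = gpd.GeoDataFrame(
    {
        "REG_CODE": ["08", "08", "08"],      # hypothetical region code column
        "20_DIS_COD": ["01", "01", "01"],
        "20_DIS_TYP": ["1", "1", "1"],
        "20_EA_NO": ["001", "002", "003"],
        "LOC_NAME": ["Alpha", "Alpha", "Beta"],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)


def build_ea_code(df, parts=("REG_CODE", "20_DIS_COD", "20_DIS_TYP", "20_EA_NO")):
    """Join the code parts as strings, in order, without padding (assumed)."""
    code = df[parts[0]].astype(str)
    for col in parts[1:]:
        code = code + df[col].astype(str)
    return code


eas["20_EA_CODE"] = build_ea_code(eas)

# Locality dissolve: merge EA polygons that share a locality identifier.
# Other columns take the first value in each group (geopandas default).
localities = eas.dissolve(by="LOC_NAME")

print(eas["20_EA_CODE"].tolist())
print(localities.geometry.area)
```

Dissolving on a locality field rather than an SA field reflects the note in step 2; which column actually identifies a locality in the real data would need to be confirmed.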
