### Prep A - Warfront Layer Generation

This script sits outside the main process. It takes the raw data supplied by the WHO in Excel format and joins it to a shapefile. The raw document comprised a list of homogenous governorates for the North, a list of homogenous governorates for the South, and additional tabs for governorates that are split (see Taizz, Al Jawf, etc.).

We aim to build this information into a shapefile which we can bring in to the main process when relevant.

We commence by importing the usual suspects.

In [1]:
import pandas as pd
import os, sys
sys.path.append(r'C:\Users\charl\Documents\GitHub\GOST_PublicGoods\GOSTNets\GOSTNets')
sys.path.append(r'C:\Users\charl\Documents\GitHub\GOST')
import GOSTnet as gn
import importlib
importlib.reload(gn)
import geopandas as gpd
import rasterio as rt
from rasterio import features
from shapely.wkt import loads
import numpy as np
import networkx as nx
from shapely.geometry import box, Point

peartree version: 0.6.0 
networkx version: 2.2 
matplotlib version: 2.2.2 
osmnx version: 0.8.2 

We set the path to our utility working folder and import the shapefile we want to join the information onto. Note that we use GADM boundaries and NOT World Bank boundaries here, because the spelling of placenames is notoriously unreliable in the Yemen region. As the Excel document used the placenames from the GADM layer, we go with the GADM layer rather than trying to manually match the placenames to the equivalent World Bank admin boundary shapefile (I tried this first; it is VERY painful).

In [2]:
util_pth = r'C:\Users\charl\Documents\GOST\Yemen\util_files\gadm36_YEM_shp'
util_fil = r'gadm36_YEM_2.shp'
util_shp = gpd.read_file(os.path.join(util_pth, util_fil))

We use a simple binary code - 1 for South, 0 for North. This will enable easier manipulation later. We build a dictionary for ease of use.

In [3]:
homogenous_status = {'Abyan':1,
                    '`Adan':1, 
                    'Al Dali\'':1,
                    'Al Mahrah':1,
                    'Hadramawt':1, 
                    'Lahij':1,
                    'Ma\'rib':1,
                    'Shabwah':1, 
                     
                    'Al Mahwit':0,
                    'Amran':0, 
                    'Dhamar':0,
                    'Hajjah':0,
                    'Ibb':0,
                    'Sa`dah':0,
                    'San`a\'':0,
                    'Raymah':0,
                    'Amanat Al Asimah':0
                    }

We pick out the unique values in the NAME_1 field of the administrative boundary shapefile:

In [4]:
unique_regions = list(util_shp.NAME_1.unique())

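...and compare them with the keys of the above dictionary. Any key missing from the shapefile points to a spelling mismatch, and any governorate missing from the dictionary is a non-homogenous one. We will then take each of those in turn and build similar dictionaries for them.

In [5]:
# Dictionary keys that do not appear in the shapefile (spelling mismatches)
for i in homogenous_status.keys():
    if i not in unique_regions:
        print(i)
# Governorates with no dictionary entry are the non-homogenous (split) ones
print([r for r in unique_regions if r not in homogenous_status])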

Taizz is one such governorate, where control is split between the north and the south. So, we pick out from the shapefile all districts where NAME_1 is Ta`izz, and make a list of the unique values in NAME_2 (the sub-governorate districts).

In [6]:
Taizz_shp_districts = list(util_shp.loc[util_shp.NAME_1 == 'Ta`izz'].NAME_2.unique())

N.B. This information is hard-coded from the Excel file I was given.

In [7]:
Taizz = {'Al  Mukha':0,
 "Al Ma'afer":1,
 'Al Mawasit':0,
 'Al Misrakh':1,
 'Al Mudhaffar':0,
 'Al Qahirah':1,
 "Al Wazi'iyah":0,
 'As Silw':1,
 'Ash Shamayatayn':0,
 "At Ta'iziyah":0,
 'Dhubab':0,
 'Dimnat Khadir':0,
 'Hayfan':1,
 'Jabal Habashy':1,
 'Maqbanah':0,
 "Mashra'a Wa Hadnan":0,
 'Mawiyah':0,
 'Mawza':0,
 'Sabir Al Mawadim':1,
 'Salh':1,
 'Sama':0,
 "Shara'b Ar Rawnah":0,
 "Shara'b As Salam":0}

We check that every key in the Taizz dictionary matches a district name in the shapefile. Nothing is printed if all keys match.

In [8]:
for i in Taizz.keys():
    if i not in Taizz_shp_districts:
        print(i)  
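
The loop above iterates over the dictionary keys, so it reports keys that are absent from the shapefile but would not flag a shapefile district that was left out of the dictionary. A two-way comparison using set differences covers both cases. The following is a minimal sketch on toy data; `compare_names` is a hypothetical helper and is not part of the original notebook.

```python
def compare_names(mapping, shapefile_names):
    """Return (keys not found in the shapefile, shapefile names with no key)."""
    keys = set(mapping)
    names = set(shapefile_names)
    unmatched_keys = sorted(keys - names)      # likely spelling mismatches
    unassigned_names = sorted(names - keys)    # districts with no value yet
    return unmatched_keys, unassigned_names

# Toy data for illustration only; note the trailing space in 'Salh '
toy_mapping = {'Hayfan': 1, 'Mawza': 0, 'Salh ': 1}
toy_districts = ['Hayfan', 'Mawza', 'Salh', 'Sama']

unmatched, unassigned = compare_names(toy_mapping, toy_districts)
print(unmatched)
print(unassigned)
```

In the notebook this could be called as `compare_names(Taizz, Taizz_shp_districts)`; two empty lists would mean the dictionary and the shapefile agree exactly.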

We repeat this process for Al Jawf

In [9]:
al_jawf_shp_districts = list(util_shp.loc[util_shp.NAME_1 == 'Al Jawf'].NAME_2.unique())

In [10]:
al_jawf = {
 'Al Ghayl':1,
 'Al Hazm':1,
 'Al Humaydat':0,
 'Al Khalq':1,
 'Al Maslub':1,
 'Al Matammah':0,
 'Al Maton':0,
 'Az Zahir':0,
 'Bart Al Anan':0,
 "Khabb wa ash Sha'af":0,
 'Kharab Al Marashi':0,
 'Rajuzah':0
}

In [11]:
for i in al_jawf.keys():
    if i not in al_jawf_shp_districts:
        print(i)      

... and Al Hudaydah

In [12]:
al_hudaydah_shp_districts = list(util_shp.loc[util_shp.NAME_1 == 'Al Hudaydah'].NAME_2.unique())

In [13]:
Al_Hudaydah = {'Ad Dahi':0,
 'Ad Durayhimi':1,
 'Al Garrahi':0,
 'Al Hajjaylah':0,
 'Al Hali':0,
 'Al Hawak':1,
 'Al Khawkhah':1,
 'Al Mansuriyah':0,
 "Al Marawi'ah":0,
 'Al Mighlaf':0,
 'Al Mina':0,
 'Al Munirah':0,
 'Al Qanawis':0,
 'Alluheyah':0,
 'As Salif':0,
 'As Sukhnah':0,
 'At Tuhayat':1,
 'Az Zaydiyah':0,
 'Az Zuhrah':0,
 'Bajil':0,
 'Bayt Al Faqiah':1,
 'Bura':0,
 'Hays':1,
 "Jabal Ra's":0,
 'Kamaran':0,
 'Zabid':0}

In [14]:
for i in Al_Hudaydah.keys():
    if i not in al_hudaydah_shp_districts:
        print(i)     

... and Al Bayda

In [15]:
al_bayda_shp_districts = list(util_shp.loc[util_shp.NAME_1 == 'Al Bayda\''].NAME_2.unique())

In [16]:
al_bayda = {"Al A'rsh":0,
 'Al Bayda':0,
 'Al Bayda City':0,
 'Al Malagim':0,
 'Al Quraishyah':0,
 'Ar Ryashyyah':0,
 'As Sawadiyah':0,
 "As Sawma'ah":1,
 'Ash Sharyah':0,
 'At Taffah':0,
 'Az Zahir':1,
 "Dhi Na'im":0,
 'Maswarah':1,
 'Mukayras':0,
 "Na'man":1,
 "Nati'":1,
 "Rada'":0,
 'Radman Al Awad':0,
 'Sabah':0,
 "Wald Rabi'":0}

Now that we have assembled our binary dictionaries, we can map this 1/0 value onto ALL of the districts in the shapefile. We know by construction above that we have assigned values to every governorate or every district, so we need not worry about completeness at this stage. 

In [17]:
# Firstly, we use the map method on the NAME_1 column to fill in our homogenous states, filling blanks with the word 'split'
util_shp['allegiance'] = util_shp['NAME_1'].map(homogenous_status).fillna('split')

# Now, we apply the governorate-specific dictionaries against the NAME_2 column
# where NAME_1 is a known split-loyalty state. Indexing with .loc[rows, column] on the
# GeoDataFrame itself avoids chained assignment, which may not write back to util_shp.
for gov, districts in [('Ta`izz', Taizz), ('Al Jawf', al_jawf), ('Al Bayda\'', al_bayda), ('Al Hudaydah', Al_Hudaydah)]:
    rows = util_shp.NAME_1 == gov
    util_shp.loc[rows, 'allegiance'] = util_shp.loc[rows, 'NAME_2'].map(districts)

# Finally, we count up the total number of 1s and 0s. 
util_shp['allegiance'].value_counts()

0    197
1    136
Name: allegiance, dtype: int64
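
The two-stage assignment can be tried out on a toy table before running it on the real shapefile. The sketch below is illustrative only: the DataFrame, the governorate and district names, and the codes are all made up. It also shows one way to confirm that no row is left marked 'split' or empty, which is the completeness property the counts above rely on.

```python
import pandas as pd

# Toy stand-in for the district table; names and codes are made up
df = pd.DataFrame({
    'NAME_1': ['Uniform A', 'Uniform A', 'Split B', 'Split B'],
    'NAME_2': ['a1', 'a2', 'b1', 'b2'],
})
governorate_codes = {'Uniform A': 0}   # homogenous governorates only
split_b_codes = {'b1': 1, 'b2': 0}     # district-level codes for the split one

# Stage 1: governorate-level codes; unmatched rows are marked 'split'
df['allegiance'] = df['NAME_1'].map(governorate_codes).astype(object).fillna('split')

# Stage 2: overwrite rows of the split governorate with district-level codes
mask = df['NAME_1'] == 'Split B'
df.loc[mask, 'allegiance'] = df.loc[mask, 'NAME_2'].map(split_b_codes)

# Any row still marked 'split' (or NaN) was missed by the dictionaries
leftover = df[df['allegiance'].isin(['split']) | df['allegiance'].isna()]
codes = [int(v) for v in df['allegiance']]
print(codes)
print(len(leftover))
```

An empty `leftover` table means every district received a 0 or a 1.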

We can now send this to file

In [42]:
util_shp.to_file(os.path.join(util_pth, 'conflictbounds.shp'), driver = 'ESRI Shapefile')

# Although this is useful, what's even more useful is to have files that 'join up' regions of homogenous control - 
# as the boundaries between these regions are the Warfronts. We do that by using Unary Union, below. 

from shapely.ops import unary_union
north = util_shp.loc[util_shp.allegiance == 0]
south = util_shp.loc[util_shp.allegiance == 1]
north_shp = unary_union(north.geometry)
south_shp = unary_union(south.geometry)

# We can now send to file a 'south districts' file and a 'north districts' file.
south_file = gpd.GeoDataFrame({'geometry':south_shp},geometry = 'geometry', crs = {'init':'epsg:4326'})
south_file.to_file(os.path.join(util_pth,'districts_south.shp'))

north_file = gpd.GeoDataFrame({'geometry':north_shp},geometry = 'geometry', crs = {'init':'epsg:4326'})
north_file.to_file(os.path.join(util_pth,'districts_north.shp'))

# finally, we can join them together into a merged districts file composed only of regions of homogenous control. 
combo = pd.concat([north_file, south_file])
combo.to_file((os.path.join(util_pth,'merged_dists_unedited.shp')))
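
The dissolve step can be illustrated on toy geometry. The sketch below uses four unit squares as stand-in districts and merges each group with `unary_union`. Intersecting the two dissolved boundaries to get the shared line is an assumption about how a front line could be extracted; the notebook itself stops at the dissolved polygons.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Toy 'districts': two northern squares stacked above two southern squares
north_parts = [box(0, 1, 1, 2), box(1, 1, 2, 2)]
south_parts = [box(0, 0, 1, 1), box(1, 0, 2, 1)]

# Dissolve each group of districts into a single geometry
north_region = unary_union(north_parts)
south_region = unary_union(south_parts)

# The edge shared by the two dissolved regions is the line between them
front = north_region.boundary.intersection(south_region.boundary)

print(north_region.geom_type, north_region.area)
print(front.length)
```

Internal district edges disappear in the union, so only the line between the two regions of control remains in `front`.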