### Prep C 

In this script, we take our c. 100 m x 100 m resolution WorldPop grid and resample it to a 1000 m x 1000 m grid. This will enable us to use the cells of the coarser grid as the origin points in our analysis. 

Credit to Ben Stewart for the assistance 

Import the usual suspects

In [1]:
import os, sys, time, pandas as pd, geopandas as gpd, rasterio as rt, numpy as np

Import some special funcs for handling raster reprojection

In [2]:
import affine
from rasterio.warp import reproject
from rasterio.warp import Resampling

Bring in our raster as src

In [52]:
infil = r'pop15.tif'
src = rt.open(os.path.join(r'C:\Users\charl\Documents\GOST\Yemen\worldpop', infil))

Read the array into arr, check the first value

In [53]:
arr = src.read(masked=True)
arr[0].data[0,0]

-3.402823e+38
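
The value printed above appears to be the lowest finite float32, which is a common nodata sentinel in float rasters. As a minimal sketch of why such cells need masking before any summation, the toy grid below (made-up values, not the Yemen raster) mixes two populated cells with two sentinel cells; `toy`, `sentinel` and `total` are names introduced here for illustration only.

```python
import numpy as np

# Lowest finite float32 (approximately -3.4028235e+38)
sentinel = np.finfo(np.float32).min

# Toy grid with made-up values: two populated cells, two nodata cells
toy = np.array([[12.0, sentinel],
                [sentinel, 30.0]], dtype=np.float32)

# Mask every negative cell so it is excluded from arithmetic
toy_masked = np.ma.masked_where(toy < 0, toy)

# The sum only covers the unmasked (populated) cells
total = toy_masked.sum()
```

Summing the unmasked array instead would be dominated by the sentinel values, which is why the next cell masks them before checking the population total.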

Remove insane values. After the removal of these big boys, we can sum the raster to check the population of Yemen

In [38]:
max_int32 = 2147483647
small = -0.000001
arr[arr > 1E6] = max_int32
arr[arr < 0] = small
arr = np.ma.masked_where(arr <= small, arr)
arr = np.ma.masked_where(arr >= max_int32, arr)
arr.sum()

28956732.0

...or about 29 million people. Next, we set values to unsigned int 32, and resave the file

In [39]:
profile = src.profile
d_type = rt.uint32
profile.update(nodata = 0,dtype = d_type)

with rt.open(os.path.join(r'C:\Users\charl\Documents\GOST\Yemen\worldpop',infil.replace('.tif','_norm.tif')), 'w', **profile) as dst:
    dst.write(arr.astype(d_type))

In preparation for the fact that the new raster will have 10x fewer values on each axis, we make a new array that is 10x smaller, but the same shape ratio as the original arr

In [40]:
factah = 0.1

newarr = np.empty(shape=(arr.shape[0],  # same number of bands
                         round(arr.shape[1] * factah), 
                         round(arr.shape[2] * factah)))

Here, we define the existing Affine transform as aff, then divide its two cell-size terms by the same factor to generate a new transform, newaff. Dividing by 0.1 makes each cell 10x larger on each axis, while the upper left corner stays where it was. For more info on what each number does, read here: https://buildmedia.readthedocs.org/media/pdf/rasterio/latest/rasterio.pdf

In [None]:
# adjust the new affine transform to the larger cell size
aff = src.transform

newaff = affine.Affine(aff[0] / factah,  # x cell width
                aff[1],
                aff[2],  # upper left coordinate: x axis value
                aff[3], 
                aff[4] / factah,  # y cell height (negative for north-up rasters)
                aff[5]) # upper left coordinate: y axis value
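
To make the role of the six coefficients concrete, here is a plain-Python sketch that applies them by hand rather than through the `affine` library. The helper `pixel_to_xy`, the coefficient values and the grid sizes are all made up for illustration. It checks that a grid with 10x larger cells and 10x fewer cells per axis covers the same extent.

```python
import math

def pixel_to_xy(coeffs, col, row):
    """Map a (col, row) pixel corner to (x, y) using six coefficients
    given in the same a, b, c, d, e, f order as the cell above."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)

factor = 0.1

# Made-up transform: 0.001-degree cells, upper left corner at (42.0, 19.0)
old = (0.001, 0.0, 42.0, 0.0, -0.001, 19.0)

# Same scaling as above: only the two cell-size terms change
new = (old[0] / factor, old[1], old[2], old[3], old[4] / factor, old[5])

# Lower right corner of a 1000 x 800 grid and of its 100 x 80 replacement
old_corner = pixel_to_xy(old, 1000, 800)
new_corner = pixel_to_xy(new, 100, 80)
```

Both corners land on the same coordinates (up to floating-point error), so only the cell size has changed, not the footprint.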

Here we call the reproject function on our array, using the new affine transform, keeping the CRS the same, and using the average resampling method:

In [41]:
reproject(
    arr, newarr,
    src_transform = aff,
    dst_transform = newaff,
    src_crs = src.crs,
    dst_crs = src.crs,
    resampling = Resampling.average
)

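We also need to correct the values. Average resampling gives each new cell the mean of the source cells it covers, not their sum, so we multiply by the number of source cells per new cell. Here, that means multiplying by 100, as factah = 0.1 on each axis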

In [None]:
newarr = newarr * ((1/factah) ** 2)
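
The reasoning behind this correction can be checked on a toy grid. The sketch below uses plain NumPy block averaging as a stand-in for `Resampling.average` (an assumption made for illustration; rasterio is not called here) with a hypothetical factor of 2, and shows that mean x cells-per-block recovers the block sum.

```python
import numpy as np

factor = 2  # hypothetical: merge 2 x 2 source cells into one new cell
fine = np.arange(16, dtype=float).reshape(4, 4)  # made-up counts

# Group the grid into 2 x 2 blocks and average each block
blocks = fine.reshape(4 // factor, factor, 4 // factor, factor)
block_mean = blocks.mean(axis=(1, 3))

# Mean x number of cells per block gives the block sum
coarse = block_mean * factor ** 2
```

The coarse grid then holds the same total as the fine grid, which is the property we want for population counts.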

Next, we remove resampled insane values. These appear around the coastal areas, where the average takes in some high nodata values

In [42]:
newarr[newarr >= max_int32] = max_int32
newarr[newarr <= small] = 0

We check to make sure this process hasn't knocked too many people out. We have ended up with a total of around 28.7 million people, vs an original 29.0 million, so a very small percentage change. 

In [43]:
newarr = np.ma.masked_where(newarr <= small, newarr)
newarr = np.ma.masked_where(newarr >= max_int32, newarr)
newarr.sum()

28711247.37282433

...equivalent to less than 1% of the total

In [44]:
((arr.sum() - newarr.sum()) / arr.sum())

0.008477635776567263

Satisfied, we now proceed to write this out to file

In [45]:
# Write the resampled array to a new raster file. For
# the new file's profile, we start with the profile of the source
profile = src.profile

d_type = rt.int32

# Then set the dtype to int32, use max_int32 as nodata, and
# adjust the width, height and transform to match the new grid
profile.update(width = newarr.shape[2],
              height = newarr.shape[1],
              transform = newaff, 
               dtype = d_type, 
               nodata = max_int32)

outfil = infil.replace('.tif','_resampled.tif')

with rt.open(os.path.join(r'C:\Users\charl\Documents\GOST\Yemen\worldpop',outfil), 'w', **profile) as dst:
    dst.write(newarr.astype(d_type))

### Final Step - Important - Convert to .csv
In order to use this resampled file as our origins, we need a .csv with points. 
I haven't found a neat and efficient way of doing this in Python. 

As such, we use the toolbox 'vector creation --> raster pixels to points' function in QGIS to turn the final raster into a point layer, as a shapefile. 

Once this has been achieved, run the step below to move from shapefile to .csv (the shapefile can then be deleted to save disk space as we will not refer to it again)

In [55]:
pth = r'C:\Users\charl\Documents\GOST\Yemen\origins'
fil = r'origins_1km_2018.shp'
loc = os.path.join(pth, fil)

points_shp = gpd.read_file(loc)
points_shp['Longitude'] = points_shp.geometry.x  # point x coordinate
points_shp['Latitude'] = points_shp.geometry.y   # point y coordinate
points_shp.to_csv(os.path.join(pth, fil.replace('.shp','.csv')))
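
For reference, the QGIS pixels-to-points step could in principle be approximated with NumPy alone. The sketch below is not the method used in this notebook and has not been run against the Yemen raster. The function `pixels_to_points`, its parameters and the toy values are all hypothetical, and it assumes a north-up grid with no rotation and a nodata value of 0.

```python
import numpy as np

def pixels_to_points(band, a, c, e, f, nodata=0):
    """Return (lon, lat, value) arrays for every cell of a 2-D band
    that is not nodata.

    a, e: cell width and (negative) cell height
    c, f: x and y of the upper left corner
    """
    rows, cols = np.nonzero(band != nodata)
    lon = c + (cols + 0.5) * a   # x of each cell centre
    lat = f + (rows + 0.5) * e   # y of each cell centre
    return lon, lat, band[rows, cols]

# Toy 2 x 2 band with made-up values; 0 marks nodata
band = np.array([[5, 0],
                 [0, 7]])
lon, lat, val = pixels_to_points(band, a=0.01, c=42.0, e=-0.01, f=19.0)
```

The three arrays could then be placed in a DataFrame and written out with `to_csv`, in the same way as the shapefile-derived table above.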