# South Asia GDP Prediction

This Jupyter notebook aims to predict the Gross Domestic Product (GDP) of several South Asian countries from nighttime light satellite data. GDP is an important indicator of a country's economic development, so accurate predictions can provide valuable insight into the economic growth of a region.

Nighttime light data offers a unique view of human activity and infrastructure, and can be used to estimate electricity consumption, economic activity, and urbanization trends. In this project, I will explore the relationship between nighttime light data and GDP to predict GDP for several countries in South Asia.

In [1]:
import ee
import json
import requests
import os
import shapefile
import geemap
import geopandas as gpd
import pandas as ps
import numpy as np
from pprint import pprint 

# Initialize the Earth Engine API
# ee.Authenticate()
ee.Initialize()

`SAsia_Merged.shp` is a vector file created by merging the boundaries of Sri Lanka, India, Pakistan, Nepal, Bhutan, and Bangladesh, the countries whose GDP I want to predict. To filter data from Google Earth Engine (GEE), I need to define an area of interest (AOI) that covers these countries. The geometry of `SAsia_Merged.shp` contains a large number of vertices, which makes it difficult for GEE to filter data efficiently, so instead of using it directly I create a rectangle that covers the AOI.

To create this rectangle, I first calculate the minimum and maximum x (longitude) and y (latitude) coordinates of `SAsia_Merged.shp`, then use these values to build a bounding box: a rectangle with sides aligned to the x and y axes, defined by the minimum and maximum values in both dimensions. The code pads the box by one degree on each side. Filtering by a bounding box is cheaper for GEE than filtering by the detailed geometry, which shortens processing time.

In [13]:
base_dir = os.getcwd()
vector_file_path = os.path.join("AdminBound", "SAsia_Merged.shp")
countries_interested = ["Pakistan", "Nepal", "Sri Lanka", "India", "Bhutan", "Bangladesh"]

vector_data = gpd.read_file(vector_file_path)
# total_bounds gives the overall (minx, miny, maxx, maxy) across all features
minX, minY, maxX, maxY = (float(v) for v in vector_data.total_bounds)

# Pad the bounding box by one degree on every side
area_of_interest = ee.Geometry.Rectangle([[minX-1, minY-1],[maxX+1, maxY+1]])


To check that the rectangle covers my AOI, I create an interactive map with geemap. The map shows the rectangle in blue and the country boundaries in red; the boundaries come from the GEE feature collection `USDOS/LSIB_SIMPLE/2017`.

The red polygons give a frame of reference: if Sri Lanka, India, Pakistan, Nepal, Bhutan, and Bangladesh all lie inside the blue rectangle, the data filtered from GEE covers every targeted country. The map also shows how much surrounding area the rectangle takes in beyond those countries.
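The visual check can be complemented by a numeric one. The sketch below is a minimal illustration, not part of the original workflow: `covers` is a hypothetical helper, and the per-country bounds are placeholder values rather than real boundaries. It tests whether a rectangle given as `(minx, miny, maxx, maxy)` fully contains each feature's bounds.

```python
def covers(outer, inner):
    """Return True if box `outer` fully contains box `inner`.

    Both boxes are (minx, miny, maxx, maxy) tuples in degrees.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


# Placeholder bounds for two hypothetical features (not real country extents)
feature_bounds = {
    "feature_a": (79.5, 5.9, 81.9, 9.8),
    "feature_b": (88.0, 20.7, 92.7, 26.6),
}

# The hard-coded rectangle from the map cell, reordered as (minx, miny, maxx, maxy)
rectangle = (60, 5, 98, 37)

# Names of features that stick out of the rectangle (empty if all are covered)
missing = [name for name, box in feature_bounds.items() if not covers(rectangle, box)]
print(missing)
```

With real data, each row of `vector_data.bounds` already has the `(minx, miny, maxx, maxy)` layout and could be passed as `inner`.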

In [3]:
Map = geemap.Map(zoom=1, lite_mode=False)

# Hard-coded rectangle given as two [lon, lat] corners; it replaces the one computed above
area_of_interest = ee.Geometry.Rectangle([[60, 37],[98, 5]])

Map.addLayer(area_of_interest, {"color":"blue"}, "Rect 1")
Map.center_object(area_of_interest)

for country in countries_interested:

    ca_geom = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(ee.Filter.eq('country_na', country)).geometry()
    Map.addLayer(ca_geom, {"color":"red"}, country)

Map


This notebook uses two night light emission image collections from GEE, collected by sensors mounted on different satellites. The first is the `CCNL` (Consistent and Corrected Nighttime Light) collection, derived from DMSP-OLS, with data available from January 1st, 1992 to January 1st, 2014. The second is the `VIIRS` (Visible Infrared Imaging Radiometer Suite) monthly collection, used here for January 1st, 2014 to June 1st, 2022. The two collections have different minimum and maximum values, so their night light data cannot be compared directly without first normalizing them to a consistent range.

In [3]:
VIIRS = ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG') 
CCNL = ee.ImageCollection('BNU/FGS/CCNL/v1')
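One simple way to bring two value ranges onto a common scale is min-max normalization. The sketch below is only an illustration of that idea under my own assumptions: `min_max_normalize` is a hypothetical helper, the numbers are placeholders, and rescaling alone does not account for physical differences between the two sensors.

```python
def min_max_normalize(values, lower=None, upper=None):
    """Rescale `values` linearly so that `lower` maps to 0 and `upper` maps to 1.

    If `lower` or `upper` is omitted, the minimum or maximum of `values` is used.
    """
    lower = min(values) if lower is None else lower
    upper = max(values) if upper is None else upper
    if upper == lower:
        # A constant series carries no range information; map everything to 0
        return [0.0 for _ in values]
    return [(v - lower) / (upper - lower) for v in values]


# Placeholder series on two different scales (not real measurements)
series_a = [0, 21, 42, 63]
series_b = [0.0, 0.5, 1.0, 1.5]

norm_a = min_max_normalize(series_a)
norm_b = min_max_normalize(series_b)
print(norm_a)
print(norm_b)
```

After rescaling, both placeholder series span the same 0 to 1 range, so their values can be placed side by side.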

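To refine the dataset further, I filter the GEE night light collections by year and by month range, and extract the `cf_cvg` (cloud-free coverage) band from VIIRS and band `b1` from CCNL. Note that `cf_cvg` counts the cloud-free observations behind each pixel; the VIIRS radiance itself is stored in the `avg_rad` band.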

In [8]:
def filterCollection(collection, band=None, geometry=None, timeYear=[1992,2014], timeMonths=[(1,6), (6,12)]):
    """Return {year: [collection per month range, ...]} for the given collection."""
    # Band selection and spatial filtering do not depend on the date, so apply them once
    if band:
        collection = collection.select(band)
    if geometry:
        collection = collection.filterBounds(geometry)

    FILTERED_DATASET = {}
    print("Filtering...")
    for YEAR in range(timeYear[0], timeYear[1]):  # the end year is exclusive
        print(YEAR, end='')
        FILTERED_DATASET[YEAR] = []

        for MONTH in timeMonths:
            print(".", end="")

            # filter() returns a new collection, so the result has to be kept;
            # calendarRange is inclusive at both ends
            subset = (collection
                      .filter(ee.Filter.calendarRange(YEAR, YEAR, 'year'))
                      .filter(ee.Filter.calendarRange(MONTH[0], MONTH[1], 'month')))
            FILTERED_DATASET[YEAR].append(subset)
    return FILTERED_DATASET

In [6]:
# For testing
url = 'https://tiles.stadiamaps.com/tiles/alidade_smooth_dark/{z}/{x}/{y}{r}.png'

Map = geemap.Map(zoom=8, lite_mode=False)
Map.add_tile_layer(url, name='AlidadeSmoothDark', attribution='&copy; <a href="https://stadiamaps.com/">Stadia Maps</a>, &copy; <a href="https://openmaptiles.org/">OpenMapTiles</a> &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors')
attributes = {'palette': ['000004', '160b39', '400a67', '69166e', '902568', 'd94d3d', 'f1711f', 'fb9d07', 'f8cd37', 'fcffa4']}

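# Build the filtered datasets that the loops below display
dataset_ccnl_filtered = filterCollection(CCNL, 'b1', area_of_interest)
dataset_viirs_filtered = filterCollection(VIIRS, 'cf_cvg', area_of_interest, timeYear=[2014, 2022])

# Each value is a list of collections (one per month range); show the mean of the first one
for year, data in dataset_ccnl_filtered.items():
    Map.addLayer(data[0].mean(), attributes, str(year) + " CCNL")
    break

for year, data in dataset_viirs_filtered.items():
    Map.addLayer(data[0].mean(), attributes, str(year) + " VIIRS")
    break

Map.center_object(area_of_interest)
Map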

One way to export the filtered datasets to Google Drive is as GeoTIFF files:
```python
# `dataset` is a dictionary returned by filterCollection: {year: [collection, ...]}
for year, collections in dataset.items():
    for i, col in enumerate(collections):
        geemap.ee_export_image_to_drive(col.mean(), fileNamePrefix=f"nle_{year}_{i}", region=area_of_interest, folder='export_nle')
```

However, exporting large GeoTiff files can be time-consuming. An alternative approach to this is to export the data sets as numpy arrays, which is a faster process.

In [4]:
def export_npy(collectionDict, featureCol=None, outfile="file", outdir="output", band='b1', timeMonths=[(1,6), (6,12)]):
    """Download the mean image of every year and month range as a .npy file in `outdir`."""
    if not featureCol:
        print("feature collection is required")
        return

    os.makedirs(outdir, exist_ok=True)
    geometry = featureCol.geometry()

    for year, collections in collectionDict.items():
        for idx, imgCol in enumerate(collections):
            img = imgCol.mean()

            dl_para = {
                'bands': [band],
                'scale': 2000,
                'format': 'NPY',
                'region': geometry,
            }

            months_str = str(timeMonths[idx][0]) + "-" + str(timeMonths[idx][1])
            url = img.getDownloadUrl(dl_para)  # request the URL once and reuse it
            print(f"{year} [{months_str}]", url)

            response = requests.get(url)
            response.raise_for_status()
            _output_ = os.path.join(outdir, outfile + "_" + str(year) + "_" + months_str + ".npy")
            with open(_output_, "wb") as f:
                f.write(response.content)
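Once a `.npy` file has been downloaded, it can be read back with NumPy. The sketch below is a minimal example under one assumption that should be checked against a real download: that the array is structured, with one named field per band (here `b1`). `npy_bytes_to_array` is a hypothetical helper, and the tiny in-memory array only stands in for `response.content`.

```python
import io

import numpy as np


def npy_bytes_to_array(content, band="b1"):
    """Decode NPY bytes and return one band as a 2-D float array."""
    data = np.load(io.BytesIO(content))
    # Assumption: a structured array with one field per band; plain arrays pass through
    if data.dtype.names is not None:
        data = data[band]
    return np.asarray(data, dtype=float)


# Stand-in for response.content: a small structured array saved to memory
fake = np.zeros((2, 3), dtype=[("b1", "f4")])
fake["b1"] = [[0, 1, 2], [3, 4, 5]]
buffer = io.BytesIO()
np.save(buffer, fake)

arr = npy_bytes_to_array(buffer.getvalue())
print(arr.shape, arr.mean())
```

The same helper could be applied to the bytes of a file on disk, for example `open(path, "rb").read()`.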


In [None]:
def export_stat(collectionDict, featureCol=None, outfile="file", outdir="output", timeMonths=[(1,6), (6,12)]):
    """Collect the min, max and mean of band b1 for every year and month range."""
    print("\nExporting Stat...")
    result = {"country":[], "year":[], "month":[], "min":[], "max":[], "mean":[]}
    if not os.path.isdir(outdir):
        os.mkdir(outdir)

    if not featureCol:
        print("feature collection is required")
        return

    geometry = featureCol.geometry()
    # Read the country name from the collection passed in, not from a global variable
    ca_name = featureCol.first().get("country_na").getInfo()

    # Combined reducer: min, max and mean computed together
    reducers = ee.Reducer.min().combine(
        reducer2=ee.Reducer.max(),
        sharedInputs=True
    ).combine(
        reducer2=ee.Reducer.mean(),
        sharedInputs=True
    )

    for key, val in collectionDict.items():
        print(key, end="")
        for month in range(len(val)):
            imgCol = val[month]
            img = imgCol.mean()

            month_ = f"{timeMonths[month][0]}-{timeMonths[month][1]}"

            # Fetch all three statistics in one request; missing keys become None
            stats = img.reduceRegion(
                reducer=reducers,
                bestEffort=True,
                geometry=geometry,
                scale=1000
            ).getInfo()

            result["country"].append(ca_name)
            result["year"].append(key)
            result["month"].append(month_)
            result["min"].append(stats.get("b1_min"))
            result["max"].append(stats.get("b1_max"))
            result["mean"].append(stats.get("b1_mean"))

            print(".", end="")

    return result

In [6]:
def export(caName, imgCol, band):
    """Filter the collection for one country, compute its statistics and save them as CSV."""
    try:
        ca_geom = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(ee.Filter.eq('country_na', caName))
        data = filterCollection(imgCol, band, ca_geom)
        stat = export_stat(data, ca_geom, caName)
        df = ps.DataFrame(stat)
        # Build the path from the function argument and write to that path
        file = os.path.join(base_dir, "output", caName + ".csv")
        df.to_csv(file, index=True)
    except Exception as e:
        print(e)

In [12]:
for ca in countries_interested:
    print(ca)
    ca_geom = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(ee.Filter.eq('country_na', ca))            
    
    data = filterCollection(CCNL, 'b1', ca_geom)
    # data = filterCollection(VIIRS, 'cf_cvg', ca_geom)
    
    stat = export_stat(data, ca_geom, ca)
    
    df = ps.DataFrame(stat)
    file = os.path.join(base_dir, "output", ca+".csv")
    df.to_csv(file, index=True)

    # print(ne_stat)
    # break

Bangladesh
Filtering...
1992..1993..1994..1995..1996..1997..1998..1999..2000..2001..2002..2003..2004..2005..2006..2007..2008..2009..2010..2011..2012..2013..Exporting Stat...
1992..1993..1994..1995..1996..1997..1998..1999..2000..2001..2002..2003..2004..2005..2006..2007..2008..2009..2010..2011..2012..2013..
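GDP is reported per year, while the statistics above are collected per month range, so the values have to be combined into one figure per year before they can be compared with GDP. The sketch below shows one possible way to do that with plain Python. `annual_mean` is a hypothetical helper, the input only mimics the parallel-list layout built by `export_stat`, and the numbers are placeholders rather than real measurements.

```python
from collections import defaultdict


def annual_mean(stat):
    """Average the per-period 'mean' values of each year.

    `stat` is a dictionary of parallel lists with at least 'year' and 'mean'.
    Periods without a value (None) are skipped.
    """
    grouped = defaultdict(list)
    for year, value in zip(stat["year"], stat["mean"]):
        if value is not None:
            grouped[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in grouped.items()}


# Placeholder statistics for one hypothetical country
stat = {
    "year": [1992, 1992, 1993, 1993],
    "month": ["1-6", "6-12", "1-6", "6-12"],
    "mean": [2.0, 4.0, 3.0, None],
}

yearly = annual_mean(stat)
print(yearly)
```

A plain average treats every period equally; a weighted average would be a reasonable alternative if the periods differ in how many images they contain.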

In [None]:
from multiprocessing import Pool, cpu_count
from functools import partial

print("There are {} CPUs on this machine ".format(cpu_count()))

# Fix the collection and band so that each worker only receives a country name
download_func = partial(export, imgCol=CCNL, band='b1')

# One worker per CPU; the pool is shut down when the block exits
with Pool(cpu_count()) as pool:
    results = pool.map(download_func, countries_interested)


There are 8 CPUs on this machine 
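Each export spends most of its time waiting for responses from Earth Engine rather than computing locally, so a thread pool is a possible alternative to a process pool; threads also avoid having to pickle the function and its arguments. The sketch below is only an illustration: `run_parallel` and `fake_export` are hypothetical names, `fake_export` stands in for `export`, and whether the Earth Engine client behaves well under concurrent requests is not verified here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(task, items, max_workers=4):
    """Run task(item) for every item in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(task, items))


def fake_export(name):
    """Hypothetical stand-in for export(): return the file name it would write."""
    return name + ".csv"


files = run_parallel(fake_export, ["Nepal", "Bhutan"])
print(files)
```

In the notebook, `fake_export` would be replaced by the `partial` built from `export`, and `max_workers` tuned to stay within Earth Engine's request limits.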
