### Giga Spatial - Hackathon Sample Notebook Document
 
 **Objective**: Load and process data from multiple sources with different resolutions for a CCRI-DRM country. This notebook demonstrates how to:

 - Load administrative boundaries from UNICEF GeoRepo
 - Fetch school location data from Giga API 
 - Process GHSL datasets for population and built-up area analysis
 - Access MODIS land surface temperature data
 - Generate zonal statistics by combining multiple data sources
 - Visualize results and geodataframes

 **Countries**: Tajikistan (`TJK`) and Kenya (`KEN`)

# Imports

In [1]:
import sys, os

In [2]:
!{sys.executable} -m pip install giga-spatial --upgrade


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:

# Only needed if you cloned giga-spatial in order to extend it (not needed after pip install):
# point this at your local clone.
sys.path.append("path_to_giga_spatial/")
from dotenv import load_dotenv
load_dotenv()

False
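`load_dotenv()` returned `False` above, which python-dotenv does when no variables were loaded (for example, when no `.env` file is found). Since later cells depend on API keys, a small check like the following sketch can surface a missing key early. The variable names are placeholders chosen for illustration, not names confirmed by giga-spatial; replace them with the ones your configuration actually reads.

```python
import os

# Hypothetical variable names, used only for illustration; replace them with
# the names your giga-spatial configuration actually reads.
REQUIRED_KEYS = ["EXAMPLE_GEOREPO_API_KEY", "EXAMPLE_GIGA_SCHOOL_API_KEY"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```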

In [4]:
from gigaspatial.config import config
from gigaspatial.handlers import AdminBoundaries, GigaSchoolLocationFetcher, GHSLDataHandler
from gigaspatial.generators import GeometryBasedZonalViewGenerator
from gigaspatial.processing import TifProcessor

# Example 1 - Map GHSL built_s and smod

In [5]:
country_code = "TJK"

## Our zones will be admin level 2 boundaries

In [6]:
# Uses GeoRepo when an API key is configured; otherwise falls back to GADM
admin2_data = AdminBoundaries.create(
        country_code=country_code, admin_level=2
    ).to_geodataframe()

INFO        AdminBoundaries 2025-06-16 14:50:39,339       : Creating AdminBoundaries instance. Country: TJK, admin level: 2, data_store provided: False, path provided: False
INFO        AdminBoundaries 2025-06-16 14:50:39,340       : Loading GADM data for country: TJK, admin level: 2 from URL: https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_TJK_2.json


URLError: <urlopen error [Errno 54] Connection reset by peer>
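The GADM download above failed with a connection reset, which is often transient. One option is to wrap the call in a small retry helper with exponential backoff. This is a generic sketch, not part of giga-spatial; `with_retries` and its parameters are names introduced here for illustration.

```python
import time
from urllib.error import URLError


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on URLError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except URLError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Wait 1x, 2x, 4x, ... the base delay between attempts
            sleep(base_delay * 2 ** (attempt - 1))


# Possible use in this notebook (commented out because it needs network access):
# admin2_data = with_retries(
#     lambda: AdminBoundaries.create(
#         country_code=country_code, admin_level=2
#     ).to_geodataframe()
# )
```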

In [None]:
admin2_data.head()

## Our view generator will be zonal

In [None]:
view_gen = GeometryBasedZonalViewGenerator(
    zone_data = admin2_data,
    zone_id_column = "id",
    zone_data_crs = admin2_data.crs)

## We can now map GHSL built surface and SMOD

In [None]:
# Sum is the default aggregation
view_gen.map_built_s()

In [None]:
# Median is the default aggregation
view_gen.map_smod(year=2020, resolution=1000)
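The two defaults differ for a reason: built surface is an additive quantity, so a per-zone total is meaningful, while SMOD values are settlement class codes, for which a sum has no interpretation. The toy sketch below illustrates the difference with made-up pixel values grouped by zone; it does not call giga-spatial and the zone names are hypothetical.

```python
from statistics import median

# Toy pixel values per zone (made-up numbers, not real GHSL data).
built_surface_m2 = {"zone_a": [0.0, 120.5, 300.0], "zone_b": [50.0, 50.0, 0.0]}
smod_class = {"zone_a": [11, 11, 30], "zone_b": [21, 23, 23]}

# Built surface is additive, so per-zone totals are meaningful.
built_sum = {zone: sum(vals) for zone, vals in built_surface_m2.items()}

# SMOD values are class codes; the median picks a representative class,
# whereas a sum of codes would have no interpretation.
smod_median = {zone: median(vals) for zone, vals in smod_class.items()}

print(built_sum)
print(smod_median)
```

Note that with an even number of pixels, a median can fall between two class codes, so the result may need rounding or a mode-based statistic instead.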

## We can now check our mapped data

In [None]:
view_gen.zone_gdf.head()

In [None]:
!{sys.executable} -m pip install folium matplotlib mapclassify

In [None]:
view_gen.zone_gdf.explore('built_surface_m2_sum')

# Example 2 - get schools and map them

## Create a Giga school location fetcher and fetch the schools

In [None]:
gslf = GigaSchoolLocationFetcher(country=country_code)  # Requires a Giga API key
df_schools = gslf.fetch_locations()

## Map using base class function

In [None]:
result = view_gen.map_points(df_schools)
view_gen._zone_gdf["school_count"] = view_gen.zone_gdf.index.map(result)
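Conceptually, counting schools per zone means assigning each point to the zone polygon that contains it and tallying the matches. The pure-Python sketch below shows that idea with a ray-casting point-in-polygon test. It is an illustration only and not how `map_points` is implemented; the zone ids and coordinates are made up.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_zone(points, zones):
    """Count points falling inside each zone; zones maps id -> vertex list."""
    counts = Counter({zone_id: 0 for zone_id in zones})
    for x, y in points:
        for zone_id, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[zone_id] += 1
                break  # zones are assumed not to overlap
    return dict(counts)


zones = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
schools = [(0.2, 0.5), (0.8, 0.1), (1.5, 0.5), (3.0, 3.0)]
print(count_points_per_zone(schools, zones))
```

The last point lies outside both zones and is therefore not counted anywhere.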

### Or Extend ZonalViewGenerator

In [None]:
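# Sketch of a method added to the library's class in a local clone of
# giga-spatial; ZonalViewGenerator and T come from the library source.
class GeometryBasedZonalViewGenerator(ZonalViewGenerator[T]):
    ...

    def map_school_counts(self, fetcher):
        # TODO: add logging and error handling
        df_schools = fetcher.fetch_locations()
        # Use self rather than the notebook-level view_gen variable
        result = self.map_points(df_schools)
        self._zone_gdf["school_count"] = self._zone_gdf.index.map(result)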

## Explore the result

In [None]:
view_gen.zone_gdf.head()

In [None]:
view_gen.zone_gdf.explore('school_count')

# Example 3 - Map population

## Initialize a Handler

In [None]:
ghs_pop_handler = GHSLDataHandler(product='GHS_POP')

## Map using geometry class function

In [None]:
view_gen.map_ghsl(
            handler=ghs_pop_handler, stat="sum", name_prefix="ghsl_pop_"
        )

### Or extend the geometry class with a new map_ghs_pop() function

## Explore

In [None]:
view_gen.zone_gdf.head()

In [None]:
view_gen.zone_gdf.explore('ghsl_pop_sum')

In [None]:
view_gen.zone_gdf['ccri_index'] = (
    view_gen.zone_gdf["built_surface_m2_sum"]
    + view_gen.zone_gdf["smod_class_median"]
    + view_gen.zone_gdf["school_count"]
    + view_gen.zone_gdf["ghsl_pop_sum"]
)
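The index above adds quantities on very different scales (square metres, class codes, counts, people), so the column with the largest magnitude will tend to dominate the sum. One common alternative, which the notebook itself does not use, is to rescale each column to the range 0 to 1 before summing. The sketch below shows min-max scaling on made-up values for three zones; `min_max_scale` is a helper introduced here for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]; constant columns map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for three zones, mirroring the four columns used above.
columns = {
    "built_surface_m2_sum": [1.2e6, 3.4e6, 0.4e6],
    "smod_class_median": [11, 30, 13],
    "school_count": [12, 40, 5],
    "ghsl_pop_sum": [25_000, 180_000, 9_000],
}

scaled = {name: min_max_scale(vals) for name, vals in columns.items()}
# Row-wise sum of the scaled columns gives an unweighted composite score.
composite = [sum(vals) for vals in zip(*scaled.values())]
print(composite)
```

With pandas, the same rescaling can be written as `(df - df.min()) / (df.max() - df.min())` on the selected columns.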

In [None]:
view_gen.zone_gdf.explore('ccri_index')

# Example 4 - Time Series

### Example using the MODIS dataset for Kenya
#### Ken_MODIS_LST_Day.tif
##### https://opensource.unicc.org/open-source-united-initiative/un-tech-over/challenge-2/ge-puzzle-challenge2-datasets/-/tree/main/Kenya?ref_type=heads


### This is a stacked TIFF file with one band for each day of the year

In [None]:
tf = TifProcessor(dataset_path="./Ken_MODIS_LST_Day.tif",mode='multi')

In [None]:
df_tf = tf.to_dataframe()

In [None]:
df_tf.head()

#### Although the raster has its own resolution, we treat its pixels as points here and use map_points() for simplicity,
#### but you could use or extend map_rasters(), or create a zoned GeoDataFrame and use map_polygons()

### Let's get the admin boundaries and the view generator for Kenya now

In [None]:
admin2_data_KEN = AdminBoundaries.create(
        country_code="KEN", admin_level=2
    ).to_geodataframe()

In [None]:
view_gen_KEN = GeometryBasedZonalViewGenerator(
    zone_data = admin2_data_KEN,
    zone_id_column = "id",
    zone_data_crs = admin2_data_KEN.crs)

### Now we can average each daily value over the admin level 2 zones

In [None]:
# Use only three days to illustrate
lst_cols = [
    "2015_01_01_LST_Day_1km",
    "2015_01_02_LST_Day_1km",
    "2015_01_03_LST_Day_1km",
]
mean_result = view_gen_KEN.map_points(
    points=df_tf,
    value_columns=lst_cols,
    aggregation="mean",
    predicate="within",
)
# Attach one "<column>_mean" column per day to the zone GeoDataFrame
for col in lst_cols:
    view_gen_KEN._zone_gdf[f"{col}_mean"] = view_gen_KEN.zone_gdf.index.map(mean_result[col])

In [None]:
view_gen_KEN.zone_gdf.head()

### There is no built-in time series support currently, but you can compute statistics from your data source and map the result, or extend the library!
### For instance, imagine we want the average skewness per zone:

In [None]:
from scipy.stats import skew

# Use only the first nine days of January for this example
time_cols = [f"2015_01_{day:02d}_LST_Day_1km" for day in range(1, 10)]
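To cover date ranges that cross month boundaries, the column names can be generated from actual dates rather than from a day counter. The sketch below assumes every band follows the `YYYY_MM_DD_LST_Day_1km` pattern seen in this notebook; `lst_day_columns` is a helper name introduced here.

```python
from datetime import date, timedelta


def lst_day_columns(start, n_days, suffix="LST_Day_1km"):
    """Build band column names such as '2015_01_01_LST_Day_1km'."""
    return [
        f"{start + timedelta(days=i):%Y_%m_%d}_{suffix}" for i in range(n_days)
    ]


# Same nine columns as in the cell above, built from dates
time_cols = lst_day_columns(date(2015, 1, 1), 9)
print(time_cols[0], time_cols[-1])
```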



In [None]:
df_tf['skewness'] = skew(df_tf[time_cols].values, axis=1, bias=False)
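With `bias=False`, `scipy.stats.skew` applies the sample-size correction known as the adjusted Fisher-Pearson coefficient, G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^1.5, where m2 and m3 are the biased second and third central moments. The pure-Python sketch below spells out that formula for a single series, purely to show what the number means. Its handling of constant series (returning NaN) is a choice made for this sketch.

```python
from math import sqrt


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (bias-corrected); needs n >= 3."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # biased 2nd central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # biased 3rd central moment
    if m2 == 0:
        return float("nan")  # constant series: skewness is undefined
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric series
print(sample_skewness([1, 1, 1, 2, 10]))  # long right tail
```

A symmetric series has zero skewness, while a series with a few unusually hot days (a long right tail) has positive skewness.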

#### Now we can map it

In [None]:
skew_result = view_gen_KEN.map_points(points=df_tf, 
                                 value_columns=["skewness"],
                                 aggregation="mean",
                                 predicate="within",)


In [None]:
view_gen_KEN._zone_gdf["skewness_mean"] = view_gen_KEN._zone_gdf.index.map(skew_result["skewness"])

### Explore

In [None]:
view_gen_KEN.zone_gdf.head()

In [None]:
view_gen_KEN.zone_gdf.explore('skewness_mean')

### Core Architecture
Giga Spatial is a geospatial data processing framework with three main components:

- Handlers
    - Purpose: Manage data access and processing for specific datasets (incl. coordinate system management)
    - Key handlers include:
        - GHSLDataHandler: For Global Human Settlement Layer data
        - GoogleOpenBuildingsHandler: For Google's building footprint data
        - MSBuildingsHandler: For Microsoft's building data
        - GigaSchoolLocationFetcher: For school location data
        - AdminBoundaries: For administrative boundary data
- Generators
    - Purpose: Create enriched views of geospatial data (incl. spatial aggregation, data enrichment, statistical calculations)
    - Two main types:
        - ZonalViewGenerator: For area-based analysis
            - map_points(): Maps point data to zones with spatial aggregation
            - map_polygons(): Maps polygon data to zones with spatial aggregation
            - map_rasters(): Maps raster data to zones using zonal statistics
            - Dataset-specific mapping methods are also available (map_ghsl(), map_google_buildings(), map_ms_buildings())
        - PoiViewGenerator: For point-of-interest analysis (maps geospatial data to POIs)
- Data Store
    - Purpose: Manage data storage and retrieval
    - Supports both local and remote data sources

### Common Issues and Solutions
- Authentication Issues
    - Ensure API keys are properly set
    - Check environment variables
    - Verify credentials
- Data Availability
    - Use ensure_available=True to check data
    - Verify data paths
    - Check data permissions
- Performance Issues
    - Use appropriate data types
    - Consider data resolution
    - Leverage caching