# Irrigation time-series attributes extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the area equipped for irrigation (AEI) between 1900 and 2005 from the Historical Irrigation Dataset (HID).

* Note that this code enables not only the replication of the current database but also its extension to new catchment areas.
* Additionally, the user should download the original raw data and place it in the folder of the same name before running this code.
* The original third-party data are not included in this repository due to redistribution and storage-space constraints.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* glob
* numpy
* os
* pandas
* rasterio
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/shapefiles/estreams_catchments.shp
* data/irrigation/AEI_EARTHSTAT_IR_{1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1985, 1990, 1995, 2000, 2005}.asc https://mygeohub.org/publications/8 (Last access: 05 December 2023)

**Directory:**

* Clone the GitHub directory locally
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to the EStreams directory.
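
Before running the notebook, it can help to confirm that the input files listed above are in place. The following is a minimal sketch and not part of the original workflow: the helper name `find_missing_inputs` is hypothetical, and the year list is assumed from the AEI files loaded later in this notebook.

```python
import os

# Years assumed from the AEI file list loaded later in this notebook.
AEI_YEARS = [1900, 1910, 1920, 1930, 1940, 1950, 1960,
             1970, 1980, 1985, 1990, 1995, 2000, 2005]


def find_missing_inputs(base_dir="."):
    """Return the expected input files that do not exist under base_dir."""
    expected = ["data/shapefiles/estreams_catchments.shp"]
    expected += ["data/irrigation/AEI_EARTHSTAT_IR_{}.asc".format(year)
                 for year in AEI_YEARS]
    return [path for path in expected
            if not os.path.isfile(os.path.join(base_dir, path))]


# Hypothetical usage: "." stands for the local EStreams directory.
missing = find_missing_inputs(".")
if missing:
    print("Missing input files:")
    for path in missing:
        print("  ", path)
else:
    print("All input files found.")
```

Running the check from the EStreams directory (the one that "PATH" points to) lists any file that still needs to be downloaded.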

## References

* Siebert, S., Kummu, M., Porkka, M., Döll, P., Ramankutty, N., and Scanlon, B. R.: A global data set of the extent of irrigated land from 1900 to 2005, Hydrol. Earth Syst. Sci., 19, 1521–1545, https://doi.org/10.5194/hess-19-1521-2015, 2015.

## Licenses
* CC0 - Creative Commons: https://mygeohub.org/publications/8 (Last access: 06 December 2023)

## Observations

* HID provides the AEI in 8 different products. Here we use the AEI_EARTHSTAT_IR_{} version, which was also used in HydroAtlas (though only for the year 2005) and in other studies.

# Import modules

In [1]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
import glob
import rasterio
from rasterio.mask import geometry_mask
from rasterio.warp import calculate_default_transform

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### Users should NOT change anything in the code below this point.


In [3]:
# Non-editable variables:
PATH_OUTPUT = "results/timeseries/irrigation/"

# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [4]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.head()

,id,area_km2,outlet_lat,outlet_lng,name,area_offic,layer,path,Code,basin_id,area_calc,geometry
0,FR003159,37,47.488,7.393,A100003001,38.6,FR003159,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003159,FR003159,37.183,"POLYGON ((7.30374 47.49375, 7.30708 47.49375, ..."
1,FR003160,227,47.626,7.239,A105003001,233.0,FR003160,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003160,FR003160,226.962,"POLYGON ((7.22291 47.63458, 7.22374 47.63458, ..."
2,FR003161,14,47.586,7.384,A106000101,15.0,FR003161,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003161,FR003161,13.595,"POLYGON ((7.38791 47.59041, 7.39874 47.59041, ..."
3,FR003162,70,47.622,7.275,A107020001,70.0,FR003162,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003162,FR003162,70.152,"POLYGON ((7.28375 47.60958, 7.28291 47.60958, ..."
4,FR003163,330,47.653,7.265,A108003001,325.0,FR003163,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003163,FR003163,330.158,"POLYGON ((7.22958 47.65291, 7.23208 47.65291, ..."


In [5]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 1972


## AEI files

In [6]:
filenames =['data/irrigation/AEI_EARTHSTAT_IR_1900.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1910.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1920.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1930.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1940.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1950.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1960.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1970.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1980.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1985.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1990.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1995.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_2000.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_2005.asc']
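
The same list can also be generated from the years alone, which keeps the file names and the column labels used in the computation in sync. This is a sketch rather than the notebook's original code; `AEI_YEARS` and `AEI_TEMPLATE` are names introduced here for illustration.

```python
# Years covered by the AEI_EARTHSTAT_IR files used in this notebook.
AEI_YEARS = [1900, 1910, 1920, 1930, 1940, 1950, 1960,
             1970, 1980, 1985, 1990, 1995, 2000, 2005]

# Hypothetical template name; the path pattern matches the list above.
AEI_TEMPLATE = "data/irrigation/AEI_EARTHSTAT_IR_{year}.asc"

# One file name and one column label per year, in the same order.
filenames = [AEI_TEMPLATE.format(year=year) for year in AEI_YEARS]
prefix_values = [str(year) for year in AEI_YEARS]

print(len(filenames), filenames[0], filenames[-1])
```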

## Computation processes

In [7]:
# Initialize an empty DataFrame to store the results
irrigation_attributes_df = pd.DataFrame()

# Column labels (years), in the same order as `filenames`
prefix_values = ["1900", "1910", "1920", "1930", "1940",
                 "1950", "1960", "1970", "1980", "1985",
                 "1990", "1995", "2000", "2005"]

# Define the CRS for EPSG:4326 (WGS 84) (same as the boundaries shapefile)
crs = 'EPSG:4326'

for col_name, filename in zip(prefix_values, filenames):

    # Open the ASC file
    with rasterio.open(filename) as src:
        # Compute the transform and grid size of the raster in WGS 84.
        # Note: only the grid definition is computed; the data are not resampled.
        transform, width, height = calculate_default_transform(
            src.crs, crs, src.width, src.height, *src.bounds)

        # Read the raster band once per file (no-data cells are masked)
        data = src.read(1, masked=True)

    # Create an empty list to store the AEI sum of each catchment
    sum_values = []

    for idx, geom in tqdm.tqdm(catchment_boundaries.iterrows()):
        geometry = geom['geometry']

        # Store NaN if the geometry is missing, empty or invalid
        if geometry is None or geometry.is_empty or not geometry.is_valid:
            sum_values.append(np.nan)
            continue

        # Create a mask that is True inside the catchment
        mask = geometry_mask([geometry], out_shape=(height, width), transform=transform, invert=True)

        # Select the raster values within the catchment
        values = data[mask]

        # Sum the values only if at least one cell falls inside the catchment
        if len(values) > 0:
            sum_value = np.sum(values)
        else:
            sum_value = np.nan

        # Append the sum to the list
        sum_values.append(sum_value)

    # Create a DataFrame to store the results for this file
    results_df = pd.DataFrame({
        'basin_id': catchment_boundaries['basin_id'],
        col_name: sum_values,
    })
    results_df.set_index("basin_id", inplace=True)

    # Concatenate the results with the final DataFrame
    irrigation_attributes_df = pd.concat([irrigation_attributes_df, results_df], axis=1)

# Transpose (years as rows) and scale by 0.01
# (1 ha = 0.01 km2, assuming the AEI cells are given in hectares)
irrigation_attributes_df = irrigation_attributes_df.T*0.01

1972it [00:44, 43.86it/s]
1972it [00:44, 44.36it/s]
1972it [00:45, 43.82it/s]
1972it [00:44, 44.22it/s]
1972it [00:45, 43.65it/s]
1972it [00:45, 43.60it/s]
1972it [00:48, 40.89it/s]
1972it [00:46, 42.70it/s]
1972it [00:45, 43.71it/s]
1972it [00:44, 44.18it/s]
1972it [00:44, 44.03it/s]
1972it [00:45, 43.56it/s]
1972it [00:44, 44.15it/s]
1972it [00:44, 44.04it/s]
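
To make the aggregation step concrete, the sketch below reproduces it on a tiny synthetic grid using only NumPy. It is an illustration under stated assumptions, not the notebook's code: the 3×3 grid, the no-data value of -9999, the hand-written catchment mask and the helper name `catchment_aei_km2` are all invented here. The 0.01 factor is read as a hectare-to-km2 conversion (1 ha = 0.01 km2), which presumes that the AEI cells are given in hectares.

```python
import numpy as np

# Synthetic AEI grid (assumed to be in hectares per cell); -9999 marks no data.
NODATA = -9999.0
aei_ha = np.array([[10.0, 20.0, NODATA],
                   [0.0, 30.0, 40.0],
                   [5.0, NODATA, 50.0]])

# Mask the no-data cells, as src.read(1, masked=True) does for a raster band.
aei_masked = np.ma.masked_equal(aei_ha, NODATA)

# Boolean catchment mask: True for cells inside the catchment, which is what
# geometry_mask(..., invert=True) returns for a polygon.
inside = np.array([[True, True, True],
                   [False, True, False],
                   [False, False, False]])


def catchment_aei_km2(band, inside_mask):
    """Sum the valid cells inside the mask and convert hectares to km2."""
    values = band[inside_mask].compressed()  # drop masked (no-data) cells
    if values.size == 0:
        return np.nan  # no valid cell falls inside the catchment
    return float(values.sum()) * 0.01


print(catchment_aei_km2(aei_masked, inside))
```

Unlike the loop above, this helper drops masked cells explicitly with `compressed()`, so a catchment that covers only no-data cells returns NaN directly.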


In [8]:
# We set the index's name to date
irrigation_attributes_df.index.name = "date"
irrigation_attributes_df

date,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317,HR000298
1900,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1910,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1920,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1930,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1940,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1950,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1960,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1970,0.0,0.0,,0.0,0.0,,0.0,0.0,0.32556,0.32556,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1980,0.0,0.0,,0.0,0.0,,0.0,0.0,0.24303,0.24303,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
1985,0.0,0.0,,0.11293,0.11293,,0.0,0.0,0.658332,0.771262,...,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,


In [9]:
# Here we sort the columns:
irrigation_attributes_df = irrigation_attributes_df.sort_index(axis=1)
irrigation_attributes_df

date,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
1900,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247219,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1910,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247219,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1920,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247219,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1930,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247219,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1940,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247219,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1950,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247219,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1960,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,0.993804,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1970,0.0,0.0,,0.0,0.0,,0.0,0.0,0.32556,0.32556,...,0.955787,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1980,0.0,0.0,,0.0,0.0,,0.0,0.0,0.24303,0.24303,...,1.081448,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1985,0.0,0.0,,0.11293,0.11293,,0.0,0.0,0.658332,0.771262,...,1.234254,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [10]:
# Round the data to 3 decimals
irrigation_attributes_df = irrigation_attributes_df.astype(float).round(3)
irrigation_attributes_df

date,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
1900,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1910,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1920,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1930,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1940,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1950,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,1.247,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1960,0.0,0.0,,0.0,0.0,,0.0,0.0,0.0,0.0,...,0.994,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1970,0.0,0.0,,0.0,0.0,,0.0,0.0,0.326,0.326,...,0.956,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1980,0.0,0.0,,0.0,0.0,,0.0,0.0,0.243,0.243,...,1.081,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1985,0.0,0.0,,0.113,0.113,,0.0,0.0,0.658,0.771,...,1.234,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Data export

In [11]:
# Export the final dataset (create the output folder if it does not exist):
os.makedirs(PATH_OUTPUT, exist_ok=True)
irrigation_attributes_df.to_csv(PATH_OUTPUT+"estreams_irrigation_yearly.csv")

# End