# Irrigation time-series attributes extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication. It was used to extract the area equipped for irrigation (AEI) from 1900 to 2005 from the Historical Irrigation Dataset (HID) and to aggregate it over each catchment.

* This code supports both the reproducibility of the current database and its extension to new catchment areas.
* Before running this code, download the original raw data and place it in the folder of the same name.
* The original third-party data are not included in this repository because of redistribution and storage-space constraints.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* glob
* numpy
* os
* pandas
* rasterio
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/shapefiles/estreams_catchments.shp
* data/irrigation/AEI_EARTHSTAT_IR_{1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1985, 1990, 1995, 2000, 2005}.asc, available from https://mygeohub.org/publications/8 (Last access: 05 December 2023)

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in its respective directory.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to the EStreams directory.

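Before running the notebook, checking that the shapefile and all AEI grids listed above are in place can save a failed run. The snippet below is a minimal sketch: the helper `missing_inputs` is hypothetical, and the relative paths are assumed to be resolved from the EStreams root directory (the directory the notebook later switches to with `os.chdir(PATH)`). It checks only the `.shp` file; the shapefile's companion files (for example `.shx` and `.dbf`) must also be present for reading to succeed.

```python
import os

# Years covered by the HID AEI_EARTHSTAT_IR grids used in this notebook
YEARS = [1900, 1910, 1920, 1930, 1940, 1950, 1960,
         1970, 1980, 1985, 1990, 1995, 2000, 2005]

# Input files expected by the notebook, relative to the EStreams root directory
REQUIRED_FILES = ["data/shapefiles/estreams_catchments.shp"] + [
    f"data/irrigation/AEI_EARTHSTAT_IR_{year}.asc" for year in YEARS
]


def missing_inputs(root):
    """Return the required input files that do not exist under `root` (hypothetical helper)."""
    return [path for path in REQUIRED_FILES
            if not os.path.isfile(os.path.join(root, path))]


if __name__ == "__main__":
    missing = missing_inputs("../../..")  # same value as PATH in "Configurations"
    if missing:
        print("Missing input files:")
        for path in missing:
            print("  " + path)
    else:
        print("All input files found.")
```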
## References

* Siebert, S., Kummu, M., Porkka, M., Döll, P., Ramankutty, N., and Scanlon, B. R.: A global data set of the extent of irrigated land from 1900 to 2005, Hydrol. Earth Syst. Sci., 19, 1521–1545, https://doi.org/10.5194/hess-19-1521-2015, 2015.

## Licenses
* CC0 - Creative Commons: https://mygeohub.org/publications/8 (Last access: 06 December 2023)

## Observations

* HID provides the AEI in eight different products. Here we use the AEI_EARTHSTAT_IR_{year} version, which is the version used in HydroATLAS (though only for the year 2005) and in other studies.

# Import modules

In [1]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
import glob
import rasterio
from rasterio.mask import geometry_mask
from rasterio.warp import calculate_default_transform



# Configurations

In [5]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### Users should NOT change anything in the code below.


In [6]:
# Non-editable variables:
PATH_OUTPUT = "results/timeseries/irrigation/"

# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [7]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.head()

|  | id | area_km2 | outlet_lat | outlet_lng | name | area_offic | layer | path | area_diff | area_calc | basin_id | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HUGR020 | 9600 | 46.785 | 21.142 | 6444410 | 9011 | HUGR020 | C:/Users/nascimth/Documents/Thiago/Eawag/Pytho... | 6.536 | 9595.794 | HUGR020 | POLYGON ((21.13208 46.77291, 21.13208 46.77375... |
| 1 | HUGR021 | 189000 | 46.423 | 18.896 | 6442080 | 189538 | HUGR021 | C:/Users/nascimth/Documents/Thiago/Eawag/Pytho... | -0.284 | 188597.11 | HUGR021 | POLYGON ((18.91708 46.41791, 18.91708 46.41625... |
| 2 | HUGR022 | 28500 | 48.126 | 22.34 | 6444304 | 29057 | HUGR022 | C:/Users/nascimth/Documents/Thiago/Eawag/Pytho... | -1.917 | 28507.473 | HUGR022 | POLYGON ((22.32875 48.10875, 22.32791 48.10875... |
| 3 | HUGR023 | 188000 | 46.627 | 18.869 | 6442060 | 189092 | HUGR023 | C:/Users/nascimth/Documents/Thiago/Eawag/Pytho... | -0.577 | 188286.167 | HUGR023 | POLYGON ((18.89041 46.62875, 18.88875 46.62708... |
| 4 | HUGR025 | 1210 | 47.662 | 19.683 | 6444240 | 1222 | HUGR025 | C:/Users/nascimth/Documents/Thiago/Eawag/Pytho... | -0.982 | 1206.441 | HUGR025 | POLYGON ((19.68124 47.66875, 19.68291 47.66875... |


In [8]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 33

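Later, `basin_id` becomes the index of every per-year result, so each catchment should appear exactly once for every column of the final table to map to a single catchment. A minimal sanity check is sketched below on a toy table rather than the real shapefile; the helper name `check_unique_ids` is hypothetical.

```python
import pandas as pd


def check_unique_ids(df, column="basin_id"):
    """Raise a ValueError listing any IDs that occur more than once (hypothetical helper)."""
    duplicated = df.loc[df[column].duplicated(keep=False), column]
    if not duplicated.empty:
        raise ValueError(f"Duplicated {column} values: {sorted(duplicated.unique())}")
    return True


# Toy example; in the notebook this would be check_unique_ids(catchment_boundaries)
toy = pd.DataFrame({"basin_id": ["HUGR019", "HUGR020", "HUGR021"]})
print(check_unique_ids(toy))
```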

## AEI files

In [9]:
filenames =['data/irrigation/AEI_EARTHSTAT_IR_1900.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1910.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1920.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1930.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1940.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1950.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1960.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1970.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1980.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1985.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1990.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_1995.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_2000.asc',
            'data/irrigation/AEI_EARTHSTAT_IR_2005.asc']
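The hard-coded list above must stay in the same order as the year labels (`prefix_values`) used in the next cell. As an alternative, here is a sketch that derives both from a single list of years, assuming the `AEI_EARTHSTAT_IR_{year}.asc` naming shown above; it produces the same two lists as the notebook.

```python
# Years available for the AEI_EARTHSTAT_IR product (as listed above)
years = [1900, 1910, 1920, 1930, 1940, 1950, 1960,
         1970, 1980, 1985, 1990, 1995, 2000, 2005]

# Column labels and file paths built from the same list, so they cannot drift apart
prefix_values = [str(year) for year in years]
filenames = [f"data/irrigation/AEI_EARTHSTAT_IR_{year}.asc" for year in years]

print(len(filenames), filenames[5])
```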

## Computation processes

In [10]:
# Collect one DataFrame of results per year
results = []

# Year labels, in the same order as `filenames`
prefix_values = ["1900", "1910", "1920", "1930", "1940",
                 "1950", "1960", "1970", "1980", "1985",
                 "1990", "1995", "2000", "2005"]

# The AEI grids are assumed to share the CRS of the boundaries shapefile
# (EPSG:4326, WGS 84), so the catchments are masked on the native raster grid
# and no reprojection is performed.

for col_name, filename in zip(prefix_values, filenames):

    with rasterio.open(filename) as src:
        # Read the first band once per file; nodata cells are returned masked
        data = src.read(1, masked=True)

        # AEI sum of each catchment
        sum_values = []

        for idx, row in tqdm.tqdm(catchment_boundaries.iterrows(),
                                  total=len(catchment_boundaries)):
            geometry = row['geometry']

            # Empty or invalid geometries get NaN
            if geometry is None or geometry.is_empty or not geometry.is_valid:
                sum_value = np.nan
            else:
                # Boolean mask, True for cells whose centre lies inside the catchment
                mask = geometry_mask([geometry], out_shape=src.shape,
                                     transform=src.transform, invert=True)

                # Valid (non-nodata) AEI values inside the catchment
                values = data[mask].compressed()

                # Sum only if there are valid values; otherwise NaN
                sum_value = np.sum(values) if len(values) > 0 else np.nan

            sum_values.append(sum_value)

    # One column (the year) indexed by basin_id
    results_df = pd.DataFrame({
        'basin_id': catchment_boundaries['basin_id'],
        col_name: sum_values,
    }).set_index("basin_id")
    results.append(results_df)

# Combine all years side by side, then transpose so that rows are years and
# columns are catchments. The factor 0.01 converts hectares to km²
# (1 ha = 0.01 km²), assuming the HID grids store AEI in ha per cell.
irrigation_attributes_df = pd.concat(results, axis=1).T * 0.01


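For each catchment, the cell above takes the raster cells inside the catchment mask, drops nodata cells, sums what is left, and finally scales by 0.01. The numpy sketch below reproduces that logic on a hypothetical 3×3 grid, so it runs without rasterio or the HID files. In the notebook, the boolean mask comes from `geometry_mask(..., invert=True)` and the masked array from `src.read(1, masked=True)`; the hectare unit is an assumption about the HID grids.

```python
import numpy as np

# Hypothetical 3x3 AEI grid in hectares per cell; -9999 marks nodata
grid = np.array([[10.0, 20.0, -9999.0],
                 [30.0, 40.0, 50.0],
                 [0.0, -9999.0, 60.0]])
data = np.ma.masked_equal(grid, -9999.0)  # plays the role of src.read(1, masked=True)

# Hypothetical catchment mask: True for cells inside the catchment
inside = np.array([[True, True, True],
                   [False, True, False],
                   [False, True, False]])


def catchment_sum_km2(data, inside):
    """Sum the valid cells inside the mask and convert ha to km² (1 ha = 0.01 km²)."""
    values = data[inside].compressed()  # drop nodata cells
    if values.size == 0:
        return np.nan
    return float(values.sum()) * 0.01


# Valid cells inside the mask are 10 + 20 + 40 ha; the two nodata cells are ignored
print(catchment_sum_km2(data, inside))
```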

In [11]:
# We set the index's name to date
irrigation_attributes_df.index.name = "date"
irrigation_attributes_df

| date / basin_id | HUGR020 | HUGR021 | HUGR022 | HUGR023 | HUGR025 | HUGR026 | HUGR027 | HUGR028 | HUGR029 | HUGR030 | ... | HUGR044 | HUGR045 | HUGR046 | HUGR047 | HUGR048 | HUGR049 | HUGR050 | HUGR051 | HUGR019 | HUGR024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1900 | 2.392213 | 1728.132568 | 2.57518 | 1728.099609 | 0.0 | 0.62784 | 0.0 | 7.91669 | 1717.278564 | 1.95484 | ... | 14.04947 | 0.0 | 11.931682 | 0.0 | 2.57518 | 0.0 | 0.0 | 9.316152 | 3.10337 | 0.0 |
| 1910 | 2.960787 | 1737.588867 | 2.97714 | 1737.481201 | 0.0 | 0.72306 | 0.0 | 8.410409 | 1726.066162 | 1.95484 | ... | 15.217677 | 0.0 | 13.270647 | 0.0 | 2.97714 | 0.0 | 0.0 | 9.316152 | 3.673378 | 0.0 |
| 1920 | 3.613333 | 1546.223999 | 3.36921 | 1546.043823 | 0.0 | 0.81548 | 0.0 | 8.88902 | 1534.053345 | 1.95484 | ... | 16.419102 | 0.0 | 14.670332 | 0.0 | 3.36921 | 0.0 | 0.0 | 9.316152 | 4.295202 | 0.0 |
| 1930 | 4.25789 | 1398.677124 | 3.75128 | 1398.426514 | 0.0 | 0.90518 | 0.0 | 9.35333 | 1385.877808 | 1.95484 | ... | 17.586609 | 0.0 | 16.067944 | 0.0 | 3.75128 | 0.0 | 0.0 | 9.316152 | 4.90031 | 0.0 |
| 1940 | 4.8898 | 905.030151 | 4.12347 | 904.711853 | 0.0 | 0.99142 | 0.0 | 9.803329 | 891.623596 | 1.95484 | ... | 18.716351 | 0.0 | 17.483263 | 0.0 | 4.12347 | 0.0 | 0.0 | 9.316152 | 5.48645 | 0.0 |
| 1950 | 13.385427 | 604.084412 | 8.5832 | 603.240479 | 0.0 | 1.6434 | 0.0 | 13.50728 | 586.297302 | 4.09996 | ... | 31.713921 | 0.0 | 33.295105 | 0.0 | 9.26666 | 0.077307 | 0.0 | 13.148263 | 14.602719 | 0.0 |
| 1960 | 50.61565 | 861.20105 | 24.038 | 858.846313 | 0.0 | 3.4294 | 0.0 | 24.60124 | 831.520142 | 3.804174 | ... | 86.341537 | 0.17395 | 97.602264 | 0.72155 | 29.264441 | 1.421711 | 0.17844 | 11.200605 | 55.979099 | 1.1464 |
| 1970 | 201.944168 | 1759.988892 | 86.955467 | 1750.747803 | 11.684799 | 10.956999 | 0.0 | 72.165596 | 1669.459229 | 4.845314 | ... | 459.478699 | 23.912758 | 436.718781 | 38.080433 | 111.121773 | 13.931211 | 12.157249 | 12.41484 | 347.410309 | 16.066698 |
| 1980 | 392.168396 | 2416.093262 | 210.632156 | 2403.501465 | 43.318298 | 14.601999 | 0.0 | 95.1408 | 2292.379395 | 6.802728 | ... | 818.743713 | 52.670593 | 742.171875 | 87.001434 | 262.199127 | 20.731501 | 71.972427 | 15.824328 | 664.500366 | 21.271399 |
| 1985 | 456.457184 | 2450.127197 | 347.276306 | 2438.843262 | 32.081497 | 13.173 | 0.0 | 88.862793 | 2339.353027 | 8.057494 | ... | 958.709595 | 56.285702 | 779.322937 | 94.228966 | 420.51947 | 18.669899 | 110.834282 | 18.352186 | 820.889648 | 19.329201 |


In [12]:
# Here we sort the columns:
irrigation_attributes_df = irrigation_attributes_df.sort_index(axis=1)
irrigation_attributes_df

| date / basin_id | HUGR019 | HUGR020 | HUGR021 | HUGR022 | HUGR023 | HUGR024 | HUGR025 | HUGR026 | HUGR027 | HUGR028 | ... | HUGR042 | HUGR043 | HUGR044 | HUGR045 | HUGR046 | HUGR047 | HUGR048 | HUGR049 | HUGR050 | HUGR051 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1900 | 3.10337 | 2.392213 | 1728.132568 | 2.57518 | 1728.099609 | 0.0 | 0.0 | 0.62784 | 0.0 | 7.91669 | ... | 0.938803 | 0.0 | 14.04947 | 0.0 | 11.931682 | 0.0 | 2.57518 | 0.0 | 0.0 | 9.316152 |
| 1910 | 3.673378 | 2.960787 | 1737.588867 | 2.97714 | 1737.481201 | 0.0 | 0.0 | 0.72306 | 0.0 | 8.410409 | ... | 1.295255 | 0.0 | 15.217677 | 0.0 | 13.270647 | 0.0 | 2.97714 | 0.0 | 0.0 | 9.316152 |
| 1920 | 4.295202 | 3.613333 | 1546.223999 | 3.36921 | 1546.043823 | 0.0 | 0.0 | 0.81548 | 0.0 | 8.88902 | ... | 1.686784 | 0.0 | 16.419102 | 0.0 | 14.670332 | 0.0 | 3.36921 | 0.0 | 0.0 | 9.316152 |
| 1930 | 4.90031 | 4.25789 | 1398.677124 | 3.75128 | 1398.426514 | 0.0 | 0.0 | 0.90518 | 0.0 | 9.35333 | ... | 2.078 | 0.0 | 17.586609 | 0.0 | 16.067944 | 0.0 | 3.75128 | 0.0 | 0.0 | 9.316152 |
| 1940 | 5.48645 | 4.8898 | 905.030151 | 4.12347 | 904.711853 | 0.0 | 0.0 | 0.99142 | 0.0 | 9.803329 | ... | 2.46649 | 0.0 | 18.716351 | 0.0 | 17.483263 | 0.0 | 4.12347 | 0.0 | 0.0 | 9.316152 |
| 1950 | 14.602719 | 13.385427 | 604.084412 | 8.5832 | 603.240479 | 0.0 | 0.0 | 1.6434 | 0.0 | 13.50728 | ... | 9.0639 | 0.0 | 31.713921 | 0.0 | 33.295105 | 0.0 | 9.26666 | 0.077307 | 0.0 | 13.148263 |
| 1960 | 55.979099 | 50.61565 | 861.20105 | 24.038 | 858.846313 | 1.1464 | 0.0 | 3.4294 | 0.0 | 24.60124 | ... | 38.145897 | 0.0 | 86.341537 | 0.17395 | 97.602264 | 0.72155 | 29.264441 | 1.421711 | 0.17844 | 11.200605 |
| 1970 | 347.410309 | 201.944168 | 1759.988892 | 86.955467 | 1750.747803 | 16.066698 | 11.684799 | 10.956999 | 0.0 | 72.165596 | ... | 102.821121 | 26.535446 | 459.478699 | 23.912758 | 436.718781 | 38.080433 | 111.121773 | 13.931211 | 12.157249 | 12.41484 |
| 1980 | 664.500366 | 392.168396 | 2416.093262 | 210.632156 | 2403.501465 | 21.271399 | 43.318298 | 14.601999 | 0.0 | 95.1408 | ... | 217.767044 | 65.645134 | 818.743713 | 52.670593 | 742.171875 | 87.001434 | 262.199127 | 20.731501 | 71.972427 | 15.824328 |
| 1985 | 820.889648 | 456.457184 | 2450.127197 | 347.276306 | 2438.843262 | 19.329201 | 32.081497 | 13.173 | 0.0 | 88.862793 | ... | 285.021332 | 77.785606 | 958.709595 | 56.285702 | 779.322937 | 94.228966 | 420.51947 | 18.669899 | 110.834282 | 18.352186 |


In [24]:
# Round the data to 3 decimals
irrigation_attributes_df = irrigation_attributes_df.astype(float).round(3)
irrigation_attributes_df

| date / basin_id | HUGR019 | HUGR020 | HUGR021 | HUGR022 | HUGR023 | HUGR024 | HUGR025 | HUGR026 | HUGR027 | HUGR028 | ... | HUGR042 | HUGR043 | HUGR044 | HUGR045 | HUGR046 | HUGR047 | HUGR048 | HUGR049 | HUGR050 | HUGR051 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1900 | 3.103 | 2.392 | 1728.133 | 2.575 | 1728.1 | 0.0 | 0.0 | 0.628 | 0.0 | 7.917 | ... | 0.939 | 0.0 | 14.049 | 0.0 | 11.932 | 0.0 | 2.575 | 0.0 | 0.0 | 9.316 |
| 1910 | 3.673 | 2.961 | 1737.589 | 2.977 | 1737.481 | 0.0 | 0.0 | 0.723 | 0.0 | 8.41 | ... | 1.295 | 0.0 | 15.218 | 0.0 | 13.271 | 0.0 | 2.977 | 0.0 | 0.0 | 9.316 |
| 1920 | 4.295 | 3.613 | 1546.224 | 3.369 | 1546.044 | 0.0 | 0.0 | 0.815 | 0.0 | 8.889 | ... | 1.687 | 0.0 | 16.419 | 0.0 | 14.67 | 0.0 | 3.369 | 0.0 | 0.0 | 9.316 |
| 1930 | 4.9 | 4.258 | 1398.677 | 3.751 | 1398.426 | 0.0 | 0.0 | 0.905 | 0.0 | 9.353 | ... | 2.078 | 0.0 | 17.587 | 0.0 | 16.068 | 0.0 | 3.751 | 0.0 | 0.0 | 9.316 |
| 1940 | 5.486 | 4.89 | 905.03 | 4.123 | 904.712 | 0.0 | 0.0 | 0.991 | 0.0 | 9.803 | ... | 2.466 | 0.0 | 18.716 | 0.0 | 17.483 | 0.0 | 4.123 | 0.0 | 0.0 | 9.316 |
| 1950 | 14.603 | 13.385 | 604.084 | 8.583 | 603.24 | 0.0 | 0.0 | 1.643 | 0.0 | 13.507 | ... | 9.064 | 0.0 | 31.714 | 0.0 | 33.295 | 0.0 | 9.267 | 0.077 | 0.0 | 13.148 |
| 1960 | 55.979 | 50.616 | 861.201 | 24.038 | 858.846 | 1.146 | 0.0 | 3.429 | 0.0 | 24.601 | ... | 38.146 | 0.0 | 86.342 | 0.174 | 97.602 | 0.722 | 29.264 | 1.422 | 0.178 | 11.201 |
| 1970 | 347.41 | 201.944 | 1759.989 | 86.955 | 1750.748 | 16.067 | 11.685 | 10.957 | 0.0 | 72.166 | ... | 102.821 | 26.535 | 459.479 | 23.913 | 436.719 | 38.08 | 111.122 | 13.931 | 12.157 | 12.415 |
| 1980 | 664.5 | 392.168 | 2416.093 | 210.632 | 2403.502 | 21.271 | 43.318 | 14.602 | 0.0 | 95.141 | ... | 217.767 | 65.645 | 818.744 | 52.671 | 742.172 | 87.001 | 262.199 | 20.732 | 71.972 | 15.824 |
| 1985 | 820.89 | 456.457 | 2450.127 | 347.276 | 2438.843 | 19.329 | 32.081 | 13.173 | 0.0 | 88.863 | ... | 285.021 | 77.786 | 958.71 | 56.286 | 779.323 | 94.229 | 420.519 | 18.67 | 110.834 | 18.352 |


# Data export

In [23]:
# Create the output directory if needed and export the final dataset:
os.makedirs(PATH_OUTPUT, exist_ok=True)
irrigation_attributes_df.to_csv(os.path.join(PATH_OUTPUT, "estreams_irrigation_yearly.csv"))

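To reuse the exported table, read it back with the years as the index. Below is a minimal sketch using a small hypothetical frame written to a temporary directory, so it runs without the real results. With the real output, the path would be `results/timeseries/irrigation/estreams_irrigation_yearly.csv`, relative to the EStreams root. The values are in km² only if the hectare assumption behind the 0.01 factor in the computation cell holds.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-catchment, two-year table with the same layout as the export
df = pd.DataFrame({"HUGR019": [3.103, 3.673], "HUGR020": [2.392, 2.961]},
                  index=pd.Index([1900, 1910], name="date"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "estreams_irrigation_yearly.csv")
    df.to_csv(path)
    # The "date" column becomes the index again; each column holds one catchment
    loaded = pd.read_csv(path, index_col="date")

print(loaded.loc[1910, "HUGR020"])
```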
# End