### Code for reading PCRaster files into arrays and computing monthly streamflow metrics

#### This code is divided into two parts:

#### (a) The first part automatically reads PCRaster files into arrays and then organizes the data into a time series for further analysis.

The code reads a total of Z .map files, reshapes each one from (X, Y) to (1, X * Y), and assigns each reshaped file to a different row. The result is a time-series matrix of shape (Z, X * Y).

Since the full raster matrix is too big, a clone map is used as a mask so that only the cells belonging to rivers and streams are kept, which reduces the number of columns.
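The reshaping and masking described above can be sketched on toy data. The array sizes, the values, and the names `rasters`, `river_mask` and `series` are hypothetical and only illustrate the idea; they are not taken from the real maps.

```python
import numpy as np

# Hypothetical toy data: Z = 3 monthly rasters of shape (X, Y) = (2, 4).
Z, X, Y = 3, 2, 4
rasters = [np.arange(X * Y, dtype=np.float32).reshape(X, Y) + 100 * z
           for z in range(Z)]

# Hypothetical river mask: 1 marks a river cell, 0 marks everything else.
river_mask = np.array([[0, 1, 0, 0],
                       [1, 1, 0, 0]], dtype=np.uint8)
is_river = river_mask.reshape(-1) == 1   # flattened boolean mask of length X * Y

# One row per month, one column per river cell.
series = np.zeros((Z, int(is_river.sum())), dtype=np.float32)
for z, raster in enumerate(rasters):
    series[z, :] = raster.reshape(-1)[is_river]

print(series.shape)
```

With this layout each column is the complete time series of one river cell, which is the form the indicator computations need.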

#### (b) Additionally, this code automatically computes the Monthly Hydrological Indicators for river streamflow. It follows the methodology proposed by Pumo et al. (2018), which is an adaptation of the methodology proposed for daily streamflow by Richter et al. (1996).

In total there are 22 individual indicators, 5 group indices (MI-HRA), and one global index (GMI-HRA).

References: Pumo, D., Francipane, A., Cannarozzo, M., Antinoro, C., Noto, L.V., 2018. Monthly hydrological indicators to assess possible alterations on rivers' flow regime. Water Resour. Manag. 32, 3687–3706. https://doi.org/10.1007/s11269-018-2013-6.

Richter, B.D., Baumgartner, J.V., Powell, J., Braun, D.P., 1996. A method for assessing hydrologic alteration within ecosystems. Conserv. Biol. 10, 1163–1174. https://doi.org/10.1046/j.1523-1739.1996.10041163.x.

Developed by: Thiago Victor Medeiros do Nascimento

In [1]:
from pcraster import *
import numpy as np
from osgeo import gdal, gdalconst
from osgeo import gdal_array
from osgeo import osr
import matplotlib.pylab as plt
import subprocess
import glob,os
import time
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

### (a) Reading and converting PCRaster files to arrays:

In [2]:
path =r'D:\pythonDATA\AquadaptTemez\rastermapaccum'
filenames = glob.glob(path + "/*.map")
len(filenames)

1032

In [3]:
# time series matrix 
mapfile = filenames[0]

RasterLayer = gdal.Open(mapfile)

ncols = RasterLayer.RasterXSize
nrows = RasterLayer.RasterYSize

# Total number of cells in one raster (close to 50 million):
numtotal = nrows*ncols
numtotal

48694800

We cannot process all the data efficiently and, besides, we are mainly interested in the river cells, so we use a clone map as a filter. This map has 1 in river cells and 0 in non-river cells, so the processing is restricted to the river cells.

In [4]:
pathfilter =r'C:\Users\User\OneDrive\IST\RESEARCH\python\flowindicatorsmap\rivernetworkabove5000mmclipped.map'
mapfilter = readmap(pathfilter)
Rastermapfilter = gdal.Open(pathfilter)
mapfilterarray = pcr_as_numpy(mapfilter)
mapfilterarray
mapfilterarray[mapfilterarray < 1 ] = np.nan

newnumtotal = np.count_nonzero(~np.isnan(mapfilterarray))

# We need additionally to reshape our filter:
mapfilterarrayres = np.reshape(mapfilterarray, (1, numtotal))
mapfilterarrayres

array([[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)

In [5]:
print("We initially had a total of:", numtotal, "cells, however only", newnumtotal, "are actually river cells")

We initially had a total of: 48694800 cells, however only 2301110 are actually river cells


In [6]:
# We create an array with the total number of non NaNs as columns and the total months as rows:
runofftsarray = np.zeros((len(filenames),newnumtotal),dtype=np.float32)

This loop reads each month (.map file), transforms it into an array, and extracts only the river cells into runofftsarray:

In [7]:
start = time.time()
for mapfile in filenames:

    
    namewithmap = os.path.basename(mapfile)
    namemap = namewithmap.replace("accum.map", "")
    namemap = namemap.replace("T", "")
    
    # glob does not return the files in chronological order, so the number in each file name is used to write the data into the correct row of the matrix
    namemapint = int(namemap) - 1 
    
     
    mapreadarray = pcr_as_numpy(readmap(mapfile))
    mapreadarrayres = np.reshape(mapreadarray, (1, numtotal))
    
    
    
    runofftsarray[namemapint,:] = mapreadarrayres[~np.isnan(mapfilterarrayres)]
end = time.time()
print(end - start)

KeyboardInterrupt: 
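The row index used in the loop comes from the number embedded in each file name. A minimal sketch of that parsing is given below, assuming names of the form `T<number>accum.map`; this pattern is only inferred from the `replace` calls, and the helper `month_index` and the example paths are hypothetical.

```python
import os
import re

def month_index(path):
    """Return the zero-based row index encoded in a name such as 'T12accum.map'."""
    name = os.path.basename(path)
    match = re.fullmatch(r"T(\d+)accum\.map", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match.group(1)) - 1

# Hypothetical file list in the arbitrary order that glob may return.
files = ["maps/T10accum.map", "maps/T2accum.map", "maps/T1accum.map"]
ordered = sorted(files, key=month_index)
print(ordered)
```

Raising an error on unexpected names makes a stray file in the folder fail early instead of silently landing in the wrong row.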

Finally, the time-series matrix can be saved as a CSV file for further analysis (the call is commented out here):

In [145]:
#np.savetxt(r'D:\pythonDATA\AquadaptTemez\runofftsarray.csv', runofftsarray, delimiter=',')

Because the matrix is too big, it is more practical to keep working with it in memory than to save it and read it back.
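If the matrix does have to be stored between sessions, NumPy's binary `.npy` format is one possible alternative to CSV. The sketch below uses a small stand-in array and a temporary folder; the file name and the array `demo` are hypothetical, and this was not part of the original workflow.

```python
import os
import tempfile
import numpy as np

# Hypothetical small stand-in for the (months, river cells) matrix.
demo = np.random.default_rng(0).random((12, 5)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "runoff_demo.npy")  # hypothetical file name
    np.save(target, demo)        # binary dump that keeps dtype and shape
    restored = np.load(target)   # loaded fully into memory

print(restored.dtype, restored.shape)
```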

### (b) Individual indicators computation for each statistical group:

In this part we proceed with the computation of the monthly indicators.

In [9]:
runoffdataarray = runofftsarray
#runoffdataarray = runofftsarray[:,0:100000]

In [10]:
runofftotal = pd.DataFrame(index = pd.date_range('10-01-1930','09-30-2016', freq='M'), data = runoffdataarray, dtype=np.float32)
runofftotal

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
1930-10-31,186.280884,724.408447,522.880615,442.437469,361.994293,454.240265,532.276794,386.615234,845.388062,402.320862,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1930-11-30,277.891907,941.554016,680.941528,576.181274,471.421051,572.416748,645.340027,487.380188,1097.728271,523.479187,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1930-12-31,1004.007446,2365.947510,1703.937866,1441.793579,1179.649292,1346.724121,1400.728027,1087.877075,2764.150635,1312.383667,...,34462.714844,75.389893,34462.714844,75.389893,34462.714844,75.389893,34462.714844,34462.714844,34462.714844,34462.714844
1931-01-31,782.301697,1517.583496,1095.134766,926.652527,758.170227,854.408875,872.405029,711.804626,1771.239136,842.722168,...,41169.398438,94.856750,41169.398438,94.856750,41169.398438,94.856750,41169.398438,41169.398438,41169.398438,41169.398438
1931-02-28,873.085144,1704.448364,1234.111450,1044.248169,854.384888,953.682129,960.230469,751.570557,1986.002319,948.236145,...,298798.875000,304.492462,298798.875000,304.492462,298798.875000,304.492462,298798.875000,298798.875000,298798.875000,298798.875000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016-05-31,1153.784302,1914.100098,1376.102539,1164.394531,952.686462,1054.996704,1049.680908,839.829163,2238.207764,1060.722412,...,47012.652344,982.653320,47163.933594,985.426697,47189.765625,1085.521484,47197.156250,47223.027344,47230.261719,47237.339844
2016-06-30,599.007019,686.304871,485.986938,411.219727,336.452515,389.023315,411.804138,331.952667,808.505615,377.186066,...,41475.972656,1127.956177,41649.402344,1130.938110,41678.976562,1240.630737,41687.437500,41717.054688,41725.347656,41733.472656
2016-07-31,388.932617,449.226196,318.050781,269.119873,220.188995,254.934128,270.353638,218.098129,529.258667,246.866470,...,20596.419922,560.126465,20682.544922,561.607239,20697.232422,616.078979,20701.435547,20716.144531,20720.263672,20724.298828
2016-08-31,247.994385,286.439270,202.798126,171.598419,140.398712,162.772308,172.932938,139.298447,337.470215,157.409012,...,10227.871094,278.150543,10270.637695,278.885864,10277.930664,305.935730,10280.016602,10287.320312,10289.365234,10291.369141


In [11]:
runofftotal["month"] = pd.date_range('10-01-1930','09-30-2016', freq='M').month

In [12]:
datanatural = runofftotal.loc['10-01-1934':'10-01-1990']
datamodified = runofftotal.loc['10-01-1990':'10-01-2016']

In [13]:
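# First, a function is defined for the computation of the generic k-th indicator of hydrological alteration:
def pik(Xn25ik, Xn75ik, Xpik):
    # Xn25ik, Xn75ik: 25th and 75th percentiles under natural conditions
    # Xpik: median under modified conditions
    if (Xpik >= Xn25ik) and (Xpik <= Xn75ik):
        # The median lies inside the natural interquartile range: no alteration
        result = 0
    else:
        if Xn75ik == Xn25ik:
            # Degenerate interquartile range: avoid a division by zero
            result = 0
        else:
            # Distance to the nearest quartile, relative to the interquartile range
            result = min(abs((Xpik - Xn25ik)/(Xn75ik - Xn25ik)), abs((Xpik - Xn75ik)/(Xn75ik - Xn25ik)))   
    return np.float16(result)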
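Since the indicator has to be evaluated for millions of cells, an element-wise NumPy version of the same rule may be useful. This is only a sketch under the assumption that the three inputs are arrays of equal shape; the name `pik_vectorized` is hypothetical, and it returns float64 instead of float16.

```python
import numpy as np

def pik_vectorized(q25, q75, median_mod):
    """Element-wise version of the scalar indicator for arrays of equal shape."""
    q25 = np.asarray(q25, dtype=np.float64)
    q75 = np.asarray(q75, dtype=np.float64)
    median_mod = np.asarray(median_mod, dtype=np.float64)

    width = q75 - q25
    inside = (median_mod >= q25) & (median_mod <= q75)
    # Avoid dividing by zero; those entries are set to 0 in the final step.
    safe_width = np.where(width == 0, 1.0, width)
    distance = np.minimum(np.abs(median_mod - q25),
                          np.abs(median_mod - q75)) / np.abs(safe_width)
    return np.where(inside | (width == 0), 0.0, distance)

# Four hypothetical cases: inside the range, above it, below it, zero width.
result = pik_vectorized([10, 10, 10, 5], [20, 20, 20, 5], [15, 25, 4, 9])
print(result)
```

The four cases give 0, 0.5, 0.6 and 0, which is what the scalar function returns for the same inputs before its conversion to float16.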

In [27]:
# The number of river cells is computed (all columns except "month"):
numstationsused = datanatural.shape[1]-1
numstationsused

2301110

#### Magnitude timing (Group 1):

In [482]:
# A table to be filled with the p1s is created and filled with NaNs:

p1s = pd.DataFrame(index = range(1,13), data=np.zeros((12,numstationsused)))
p1s.iloc[:,:] = np.nan
p1s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
6,,,,,,,,,,,...,,,,,,,,,,
7,,,,,,,,,,,...,,,,,,,,,,
8,,,,,,,,,,,...,,,,,,,,,,
9,,,,,,,,,,,...,,,,,,,,,,
10,,,,,,,,,,,...,,,,,,,,,,


A loop is made to compute each indicator. Some metrics are computed inside the loop to prevent a memory usage error and to save time.

In [483]:
start = time.time()
#for numstations in range(datanatural.shape[1]-1):
for j in range(1000):
    # The monthly statistics depend only on cell j, so they are computed once per cell:
    MedianMonthlyStreamflow = datamodified.iloc[:,[j,-1]].groupby('month').median()
    Quantile25MonthlyStreamflow = datanatural.iloc[:,[j,-1]].groupby('month').quantile(q=0.25)
    Quantile75MonthlyStreamflow = datanatural.iloc[:,[j,-1]].groupby('month').quantile(q=0.75)

    for k in range(12):
        p1s.iloc[k,j] = pik(Quantile25MonthlyStreamflow.iloc[k,0], Quantile75MonthlyStreamflow.iloc[k,0], MedianMonthlyStreamflow.iloc[k,0])

end = time.time()
print(end - start)

42.55244827270508
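As a possible alternative to looping over cells, the monthly statistics of Group 1 can be computed for a block of cells at once with a single `groupby` per statistic. The sketch below uses hypothetical random data (4 years, 3 cells), and the names `natural`, `modified` and `p1` are illustrative; for the full matrix it would probably need to be applied in column chunks to limit memory use.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 years of monthly values for 3 cells.
rng = np.random.default_rng(1)
months = np.tile(np.arange(1, 13), 4)   # month label of each of the 48 rows
natural = pd.DataFrame(rng.random((48, 3)) * 100)
modified = pd.DataFrame(rng.random((48, 3)) * 100)

# One groupby per statistic covers every cell: each result has shape (12, cells).
q25 = natural.groupby(months).quantile(0.25).to_numpy()
q75 = natural.groupby(months).quantile(0.75).to_numpy()
med = modified.groupby(months).median().to_numpy()

# Same rule as the scalar indicator, applied element-wise.
width = q75 - q25
inside = (med >= q25) & (med <= q75)
safe_width = np.where(width == 0, 1.0, width)
distance = np.minimum(np.abs(med - q25), np.abs(med - q75)) / safe_width
p1 = pd.DataFrame(np.where(inside | (width == 0), 0.0, distance),
                  index=range(1, 13))
print(p1.shape)
```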


In [484]:
p1s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
8,0.0,0.207275,0.185181,0.185181,0.185181,0.105713,0.024521,0.0,0.211426,0.195557,...,,,,,,,,,,
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
10,0.153809,0.161255,0.15332,0.15332,0.15332,0.180176,0.213257,0.156616,0.165283,0.15686,...,,,,,,,,,,


In [485]:
MIhra1 = p1s.mean()
MIhra1

0          0.012817
1          0.030711
2          0.028208
3          0.028208
4          0.028208
             ...   
2301105         NaN
2301106         NaN
2301107         NaN
2301108         NaN
2301109         NaN
Length: 2301110, dtype: float64

In [486]:
np.nanmax(MIhra1)
#(((end - start)*(numstationsused/1000))/60)/60

0.04920768737792969

#### Magnitude duration (Group 2):

In [487]:
# This is an empty table to be filled with the indicators for group 2:
p2s = pd.DataFrame(index = range(4), columns = range(0,numstationsused), data=np.nan)
p2s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,


In [488]:
#for numstations in range(numstationsused):
start = time.time()

for j in range(1000):
    
    
    Months3flownat = pd.DataFrame(data = datanatural.iloc[:,j].resample('3M',closed='left').sum())
    Months3flowmod = pd.DataFrame(data = datamodified.iloc[:,j].resample('3M',closed='left').sum())
    
    Months6flownat = pd.DataFrame(data = datanatural.iloc[:,j].resample('6M',closed='left').sum())
    Months6flowmod = pd.DataFrame(data = datamodified.iloc[:,j].resample('6M',closed='left').sum())
    
    
    # Statistics for 3-months:
    # Computation of the water years for natural conditions:
    Months3flownat["datetime"] = Months3flownat.index
    Months3flownat['water_year'] = Months3flownat.datetime.dt.year.where(Months3flownat.datetime.dt.month < 10, Months3flownat.datetime.dt.year + 1)
    # Correction of a small bug in the resample:
    Months3flownat['water_year'] = Months3flownat.water_year.where(Months3flownat.datetime.dt.month != 10, Months3flownat.water_year - 1)
    Months3flownat.drop(columns=['datetime'], inplace = True)


    # Computation of the water years for modified conditions:
    Months3flowmod["datetime"] = Months3flowmod.index
    Months3flowmod['water_year'] = Months3flowmod.datetime.dt.year.where(Months3flowmod.datetime.dt.month < 10, Months3flowmod.datetime.dt.year + 1)
    # Correction of a small bug in the resample:
    Months3flowmod['water_year'] = Months3flowmod.water_year.where(Months3flowmod.datetime.dt.month != 10, Months3flowmod.water_year - 1)
    Months3flowmod.drop(columns=['datetime'], inplace = True)

    # Minimum and maximum for 3-months:
    AnnualMin3MonthsFlownat = Months3flownat.groupby('water_year',dropna=False).min()
    AnnualMax3MonthsFlownat = Months3flownat.groupby('water_year',dropna=False).max()

    AnnualMin3MonthsFlowmod = Months3flowmod.groupby('water_year',dropna=False).min()
    AnnualMax3MonthsFlowmod = Months3flowmod.groupby('water_year',dropna=False).max()
    
    
    # Statistics for 6-months:
    # Computation of the water years for natural conditions:
    Months6flownat["datetime"] = Months6flownat.index
    Months6flownat['water_year'] = Months6flownat.datetime.dt.year.where(Months6flownat.datetime.dt.month < 10, Months6flownat.datetime.dt.year + 1)
    # Correction of a small bug in the resample:
    Months6flownat['water_year'] = Months6flownat.water_year.where(Months6flownat.datetime.dt.month != 10, Months6flownat.water_year - 1)
    Months6flownat.drop(columns=['datetime'], inplace = True)

    # Computation of the water years for modified conditions:
    Months6flowmod["datetime"] = Months6flowmod.index
    Months6flowmod['water_year'] = Months6flowmod.datetime.dt.year.where(Months6flowmod.datetime.dt.month < 10, Months6flowmod.datetime.dt.year + 1)
    # Correction of a small bug in the resample:
    Months6flowmod['water_year'] = Months6flowmod.water_year.where(Months6flowmod.datetime.dt.month != 10, Months6flowmod.water_year - 1)
    Months6flowmod.drop(columns=['datetime'], inplace = True)

    # Minimum and maximum for 6-months:
    AnnualMin6MonthsFlownat = Months6flownat.groupby('water_year',dropna=False).min()
    AnnualMax6MonthsFlownat = Months6flownat.groupby('water_year',dropna=False).max()

    AnnualMin6MonthsFlowmod = Months6flowmod.groupby('water_year',dropna=False).min()
    AnnualMax6MonthsFlowmod = Months6flowmod.groupby('water_year',dropna=False).max()
    
    # Computation of the median and quantiles:
    ## First empty data frames for each case are made:
    MedianAnnualMinMax2 = pd.DataFrame(index = ["Min3months","Max3months","Min6months","Max6months"], columns = AnnualMax6MonthsFlownat.columns)
    Quantile25AnnualMinMax2 = pd.DataFrame(index = ["Min3months","Max3months","Min6months","Max6months"], columns = AnnualMax6MonthsFlownat.columns)
    Quantile75AnnualMinMax2 = pd.DataFrame(index = ["Min3months","Max3months","Min6months","Max6months"], columns = AnnualMax6MonthsFlownat.columns)

    # The median is computed only for the modified streamflow:
    MedianAnnualMinMax2.iloc[0,:] = AnnualMin3MonthsFlowmod.median()
    MedianAnnualMinMax2.iloc[1,:] = AnnualMax3MonthsFlowmod.median()
    MedianAnnualMinMax2.iloc[2,:] = AnnualMin6MonthsFlowmod.median()
    MedianAnnualMinMax2.iloc[3,:] = AnnualMax6MonthsFlowmod.median()

    # The quantiles are computed only for the natural streamflow:
    Quantile25AnnualMinMax2.iloc[0,:] = AnnualMin3MonthsFlownat.quantile(q=0.25)
    Quantile25AnnualMinMax2.iloc[1,:] = AnnualMax3MonthsFlownat.quantile(q=0.25)
    Quantile25AnnualMinMax2.iloc[2,:] = AnnualMin6MonthsFlownat.quantile(q=0.25)
    Quantile25AnnualMinMax2.iloc[3,:] = AnnualMax6MonthsFlownat.quantile(q=0.25)

    Quantile75AnnualMinMax2.iloc[0,:] = AnnualMin3MonthsFlownat.quantile(q=0.75)
    Quantile75AnnualMinMax2.iloc[1,:] = AnnualMax3MonthsFlownat.quantile(q=0.75)
    Quantile75AnnualMinMax2.iloc[2,:] = AnnualMin6MonthsFlownat.quantile(q=0.75)
    Quantile75AnnualMinMax2.iloc[3,:] = AnnualMax6MonthsFlownat.quantile(q=0.75)
    
    
    
    
    
    
    for k in range(4):
        p2s.iloc[k,j] = pik(Quantile25AnnualMinMax2.iloc[k,0], Quantile75AnnualMinMax2.iloc[k,0], MedianAnnualMinMax2.iloc[k,0])

end = time.time()
print(end - start)

40.84799885749817
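The idea behind Group 2 (annual minimum and maximum of 3-month and 6-month flow totals) can also be sketched with plain NumPy reshapes. This assumes complete water years starting in October and non-overlapping blocks aligned with the water year, which is an interpretation of the intent and may not match the bins produced by `resample` above exactly; the series `flow` is hypothetical.

```python
import numpy as np

# Hypothetical toy series: 2 complete water years of monthly flow, October first.
flow = np.array([5, 6, 9, 12, 14, 11, 8, 6, 4, 3, 2, 3,
                 4, 7, 10, 15, 13, 9, 7, 5, 4, 2, 2, 2], dtype=np.float64)

years = flow.reshape(-1, 12)                         # (water years, 12 months)
sum3 = years.reshape(len(years), 4, 3).sum(axis=2)   # four 3-month blocks per year
sum6 = years.reshape(len(years), 2, 6).sum(axis=2)   # two 6-month blocks per year

annual = {
    "Min3months": sum3.min(axis=1),
    "Max3months": sum3.max(axis=1),
    "Min6months": sum6.min(axis=1),
    "Max6months": sum6.max(axis=1),
}
for name, values in annual.items():
    print(name, values)
```

The median of each annual series (modified period) and its 25th and 75th percentiles (natural period) would then feed the indicator function.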


In [489]:
# Computation of the Group index (Group 2):
MIhra2 = p2s.mean()
MIhra2

0          0.0
1          0.0
2          0.0
3          0.0
4          0.0
          ... 
2301105    NaN
2301106    NaN
2301107    NaN
2301108    NaN
2301109    NaN
Length: 2301110, dtype: float64

In [490]:
np.nanmax(MIhra2)

0.0804443359375

#### Timing (Group 3):

In [491]:
# This is an empty table to be filled with the indicators for group 3:
p3s = pd.DataFrame(index = range(2), columns = range(0,numstationsused), data=np.nan)
p3s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,


In [492]:
# Computation of the water years for natural condition:
datanaturalaux = pd.DataFrame(index = datanatural.index)
datanaturalaux["datetime"] = datanatural.index
datanaturalaux['water_year'] = datanaturalaux.datetime.dt.year.where(datanaturalaux.datetime.dt.month < 10, datanaturalaux.datetime.dt.year + 1)

# Computation of the water years for modified condition:
datamodifiedaux = pd.DataFrame(index = datamodified.index)
datamodifiedaux["datetime"] = datamodified.index
datamodifiedaux['water_year'] = datamodifiedaux.datetime.dt.year.where(datamodifiedaux.datetime.dt.month < 10, datamodifiedaux.datetime.dt.year + 1)
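The water-year rule used in the `where` expressions above assigns the months from October to December to the following year. A minimal standalone sketch of that rule, with a hypothetical helper name `water_year`:

```python
from datetime import date

def water_year(d):
    """Water year of a date: October to December belong to the next year."""
    return d.year + 1 if d.month >= 10 else d.year

# The first month of the series starts water year 1931.
print(water_year(date(1930, 10, 31)))
# The following September still belongs to the same water year.
print(water_year(date(1931, 9, 30)))
```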

In [493]:
#for numstations in range(numstationsused):
start = time.time()

for j in range(1000):
    
    # Selecting the data to be used:
    datanatural3 = pd.DataFrame(data= datanatural.iloc[:,j])
    datamodified3 = pd.DataFrame(data= datamodified.iloc[:,j])
    
    # Assigning the water year of each row:
    datanatural3['water_year'] = datanaturalaux['water_year']
    datamodified3['water_year'] = datamodifiedaux['water_year']
    
    # The ID (location) of each minimum or maximum is computed:
    Mininumslocationnatural = datanatural3.groupby('water_year').idxmin()
    Maximumslocationnatural = datanatural3.groupby('water_year').idxmax()
    Mininumslocationmodified = datamodified3.groupby('water_year').idxmin()
    Maximumslocationmodified= datamodified3.groupby('water_year').idxmax()
    
    # Empty tables to be filled with the actual month of each extreme are built:
    # The month of each specific event (maximum and minimum) is computed:

    # Natural:
    MininumslocationMonthnatural = pd.DataFrame(index = Mininumslocationnatural.index, data = Mininumslocationnatural.iloc[:,0].dt.month)
    MaximumlocationMonthnatural = pd.DataFrame(index = Mininumslocationnatural.index, data = Maximumslocationnatural.iloc[:,0].dt.month )

    # Modified:
    MininumslocationMonthmodified = pd.DataFrame(index = Mininumslocationmodified.index, data = Mininumslocationmodified.iloc[:,0].dt.month)
    MaximumlocationMonthmodified = pd.DataFrame(index = Mininumslocationmodified.index, data = Maximumslocationmodified.iloc[:,0].dt.month)
    
    # Replace the months according to the classification of the paper used: (May = 1 until April = 12)
    MininumslocationMonthnatural.replace([5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], inplace = True)
    MaximumlocationMonthnatural.replace([5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], inplace = True)
    MininumslocationMonthmodified.replace([5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], inplace = True)
    MaximumlocationMonthmodified.replace([5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], inplace = True)

    # Computation of the median and quantiles:
    ## First empty data frames for each case are made:
    MedianMonths = pd.DataFrame(index = range(2), columns =[0])
    Quantile25Months = pd.DataFrame(index =  range(2), columns =[0])
    Quantile75Months = pd.DataFrame(index = range(2), columns =[0])
    
    # The median(mode) is computed only for the modified streamflow:
    MedianMonths.iloc[0,:] = MininumslocationMonthmodified.mode(dropna=False).max()
    MedianMonths.iloc[1,:] = MaximumlocationMonthmodified.mode(dropna=False).max()


    # The quantiles are computed only for the natural streamflow:
    Quantile25Months.iloc[0,:] = MininumslocationMonthnatural.quantile(q=0.25)
    Quantile25Months.iloc[1,:] = MaximumlocationMonthnatural.quantile(q=0.25)

    Quantile75Months.iloc[0,:] = MininumslocationMonthnatural.quantile(q=0.75)
    Quantile75Months.iloc[1,:] = MaximumlocationMonthnatural.quantile(q=0.75)
    
    
    for k in range(2):
        p3s.iloc[k,j] = pik(Quantile25Months.iloc[k,0], Quantile75Months.iloc[k,0], MedianMonths.iloc[k,0])  
    
end = time.time()
print(end - start)  

105.17007040977478
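The month renumbering used for Group 3 (May = 1 up to April = 12) can be written as a single arithmetic expression, and the month of each annual extreme can be found with `argmin` and `argmax`. The sketch assumes complete water years starting in October; the series `flow` and the helper `to_hydrological_month` are hypothetical.

```python
import numpy as np

def to_hydrological_month(calendar_month):
    """Map calendar months so that May -> 1, June -> 2, ..., April -> 12."""
    return (np.asarray(calendar_month) - 5) % 12 + 1

# Hypothetical toy series: 2 complete water years of monthly flow, October first.
flow = np.array([5, 6, 9, 12, 14, 11, 8, 6, 4, 3, 2, 3,
                 4, 7, 10, 15, 13, 9, 7, 5, 4, 2, 2, 2], dtype=np.float64)
years = flow.reshape(-1, 12)
calendar = np.array([10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# argmin/argmax return the first occurrence when there are ties.
month_of_min = to_hydrological_month(calendar[years.argmin(axis=1)])
month_of_max = to_hydrological_month(calendar[years.argmax(axis=1)])
print(month_of_min, month_of_max)
```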


In [494]:
p3s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,


In [495]:
# Computation of the Group index (Group 3):
MIhra3 = p3s.mean()
MIhra3

0          0.0
1          0.0
2          0.0
3          0.0
4          0.0
          ... 
2301105    NaN
2301106    NaN
2301107    NaN
2301108    NaN
2301109    NaN
Length: 2301110, dtype: float64

#### Magnitude frequency (Group 4):

In [496]:
# This is an empty table to be filled with the indicators for group 4:
p4s = pd.DataFrame(index = range(2), columns = range(0,numstationsused), data=np.nan)
p4s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,


In [497]:
# Creating empty tables for data filling:
condlowpulsesnat = pd.DataFrame(index = datanatural.index, columns = range(1), data=np.nan)
condhighpulsesnat = pd.DataFrame(index = datanatural.index, columns = range(1), data=np.nan)
condlowpulsesmod = pd.DataFrame(index = datamodified.index, columns = range(1), data=np.nan)
condhighpulsesmod = pd.DataFrame(index = datamodified.index, columns = range(1), data=np.nan)
condlowpulsesnat
# Auxiliary datetime column used to compute the water year:
condlowpulsesnat["datetime"] = condlowpulsesnat.index
condlowpulsesmod["datetime"] = condlowpulsesmod.index

# Computation of water-years:
condlowpulsesnat['year'] = condlowpulsesnat.datetime.dt.year.where(condlowpulsesnat.datetime.dt.month < 10, condlowpulsesnat.datetime.dt.year + 1)
condhighpulsesnat['year'] = condlowpulsesnat['year']

condlowpulsesmod['year'] = condlowpulsesmod.datetime.dt.year.where(condlowpulsesmod.datetime.dt.month < 10, condlowpulsesmod.datetime.dt.year + 1)
condhighpulsesmod['year'] = condlowpulsesmod['year']


condlowpulsesnat.drop(columns=['datetime'], inplace = True) 
condlowpulsesmod.drop(columns=['datetime'], inplace = True) 

In [498]:
# Loop for computing for each station:
start = time.time()

#for numstations in range(numstationsused):
for numstations in range(1000):
    
    # The quantiles 10% and 90% are computed:
    Quantile10Streamflow = datanatural.iloc[:,numstations].quantile(q=0.10)
    Quantile90Streamflow = datanatural.iloc[:,numstations].quantile(q=0.90)
    
    condlowpulsesnat.iloc[:,0] = np.where((datanatural.iloc[:,numstations] < Quantile10Streamflow),1,0)
    condhighpulsesnat.iloc[:,0] = np.where((datanatural.iloc[:,numstations] > Quantile90Streamflow),1,0)
    condlowpulsesmod.iloc[:,0] = np.where((datamodified.iloc[:,numstations] < Quantile10Streamflow),1,0)
    condhighpulsesmod.iloc[:,0] = np.where((datamodified.iloc[:,numstations] > Quantile90Streamflow),1,0)
    
    # The total number of low and high pulses are computed for each situation:
    lowpulsesnat = condlowpulsesnat.groupby('year',dropna=False).sum()
    highpulsesnat = condhighpulsesnat.groupby('year',dropna=False).sum()
    lowpulsesmod = condlowpulsesmod.groupby('year',dropna=False).sum()
    highpulsesmod = condhighpulsesmod.groupby('year',dropna=False).sum()

    # Computation of the median and quantiles:
    ## First empty data frames for each case are made:
    MedianLowAndHighPulses = pd.DataFrame(index = ["lowpulses","highpulses"], columns = lowpulsesmod.columns)
    Quantile25LowAndHighPulses = pd.DataFrame(index = ["lowpulses","highpulses"], columns = lowpulsesnat.columns)
    Quantile75LowAndHighPulses = pd.DataFrame(index = ["lowpulses","highpulses"], columns = lowpulsesnat.columns)

    # The median is computed only for the modified streamflow:
    MedianLowAndHighPulses.iloc[0,:] = lowpulsesmod.median()
    MedianLowAndHighPulses.iloc[1,:] = highpulsesmod.median()


    # The quantiles are computed only for the natural streamflow:
    Quantile25LowAndHighPulses.iloc[0,:] = lowpulsesnat.quantile(q=0.25)
    Quantile25LowAndHighPulses.iloc[1,:] = highpulsesnat.quantile(q=0.25)

    Quantile75LowAndHighPulses.iloc[0,:] = lowpulsesnat.quantile(q=0.75)
    Quantile75LowAndHighPulses.iloc[1,:] = highpulsesnat.quantile(q=0.75)
    for k in range(2):
        p4s.iloc[k,numstations] = pik(Quantile25LowAndHighPulses.iloc[k,0], Quantile75LowAndHighPulses.iloc[k,0], MedianLowAndHighPulses.iloc[k,0])
        
end = time.time()
print(end - start) 

12.555983781814575
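For a single cell, the pulse counting of Group 4 reduces to comparing the monthly values with two thresholds taken from the natural period and summing per water year. A sketch on hypothetical random data already arranged as (water years, months); the arrays `natural` and `modified` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
natural = rng.gamma(2.0, 50.0, size=(10, 12))   # hypothetical natural period
modified = natural[:5] * 0.6                    # hypothetical altered regime

# Thresholds come from the natural period only.
low_thr, high_thr = np.quantile(natural, [0.10, 0.90])

# Number of months per water year below / above the thresholds.
low_nat = (natural < low_thr).sum(axis=1)
high_nat = (natural > high_thr).sum(axis=1)
low_mod = (modified < low_thr).sum(axis=1)
high_mod = (modified > high_thr).sum(axis=1)

print(np.median(low_mod), np.quantile(low_nat, [0.25, 0.75]))
print(np.median(high_mod), np.quantile(high_nat, [0.25, 0.75]))
```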


In [499]:
p4s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.600098,0.0,...,,,,,,,,,,
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,


In [500]:
# Computation of the Group index (Group 4):
MIhra4 = p4s.mean()
MIhra4

0          0.0
1          0.0
2          0.0
3          0.0
4          0.0
          ... 
2301105    NaN
2301106    NaN
2301107    NaN
2301108    NaN
2301109    NaN
Length: 2301110, dtype: float64

#### Frequency rate of change (Group 5):

In [501]:
# This is an empty table to be filled with the indicators for group 5:
p5s = pd.DataFrame(index = range(2), columns = range(0,numstationsused), data=np.nan)
p5s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,


In [502]:
# Statistics:
# Obtain the data without the month within:
#datanatural5 = pd.DataFrame(data = datanatural.drop(columns = "month")
#datamodified5 = datamodified.drop(columns = "month")

In [503]:
# Loop for computing for each station:
start = time.time()

for numstations in range(1000):
    # Differences between consecutive months:
    diffnatural = datanatural.iloc[:,numstations].diff(1)
    diffmodififed = datamodified.iloc[:,numstations].diff(1)
    
    # Compute separately the positive and the negative differences:
    diffnaturalpositives = diffnatural[diffnatural>=0]
    diffnaturalnegatives = diffnatural[diffnatural<0]
    diffmodifiedpositives = diffmodififed[diffmodififed>=0]
    diffmodifiednegatives = diffmodififed[diffmodififed<0]
    
    # Computation of the median and quantiles:
    ## First empty data frames for each case are made:
    MedianDifferences = pd.DataFrame(index = ["positives","negatives"], columns = [0])
    Quantile25Differences = pd.DataFrame(index = ["positives","negatives"], columns = [0])
    Quantile75Differences = pd.DataFrame(index = ["positives","negatives"], columns = [0])

    # The median is computed only for the modified streamflow:
    MedianDifferences.iloc[0,:] = diffmodifiedpositives.median(skipna=True)
    MedianDifferences.iloc[1,:] = diffmodifiednegatives.median(skipna=True)

    # The quantiles are computed only for the natural streamflow:
    Quantile25Differences.iloc[0,:] = diffnaturalpositives.quantile(q=0.25)
    Quantile25Differences.iloc[1,:] = diffnaturalnegatives.quantile(q=0.25)

    Quantile75Differences.iloc[0,:] = diffnaturalpositives.quantile(q=0.75)
    Quantile75Differences.iloc[1,:] = diffnaturalnegatives.quantile(q=0.75)
    
    for k in range(2):
        p5s.iloc[k,numstations] = pik(Quantile25Differences.iloc[k,0], Quantile75Differences.iloc[k,0], MedianDifferences.iloc[k,0])

end = time.time()
print(end - start)  

6.481004238128662
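The split into rises and falls used for Group 5 can be illustrated on a short hypothetical series. As in the loop above, zero differences are counted with the positive ones.

```python
import numpy as np

flow = np.array([5, 8, 12, 9, 4, 4, 10], dtype=np.float64)  # hypothetical monthly flows

diff = np.diff(flow)       # month-to-month changes: [3, 4, -3, -5, 0, 6]
rises = diff[diff >= 0]    # non-negative changes
falls = diff[diff < 0]     # negative changes

print(np.median(rises), np.median(falls))
```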


In [504]:
#print("The total time for computation is:", (((numstationsused * (end - start))/60)/60), "hours")

In [505]:
# Computation of the Group index (Group 5):
MIhra5 = p5s.mean()
MIhra5

0          0.0
1          0.0
2          0.0
3          0.0
4          0.0
          ... 
2301105    NaN
2301106    NaN
2301107    NaN
2301108    NaN
2301109    NaN
Length: 2301110, dtype: float64

#### Concatenation of all the single indicators in one single table:

In [506]:
# Concatenation of all the indicators:
ps = pd.DataFrame(columns = MIhra5.index, index =["p11","p12", "p13", "p14", "p15","p16","p17", "p18", "p19", "p110","p111","p112", "p213", "p214", "p215","p216","p317", "p318", "p419", "p420","p521", "p522"], data = pd.concat([p1s, p2s, p3s, p4s, p5s], axis = 0).values )
ps

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2301100,2301101,2301102,2301103,2301104,2301105,2301106,2301107,2301108,2301109
p11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p18,0.0,0.207275,0.185181,0.185181,0.185181,0.105713,0.024521,0.0,0.211426,0.195557,...,,,,,,,,,,
p19,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
p110,0.153809,0.161255,0.15332,0.15332,0.15332,0.180176,0.213257,0.156616,0.165283,0.15686,...,,,,,,,,,,


#### Concatenation of all the group indices in one single table:

In [507]:
MIhra = pd.DataFrame(index = MIhra5.index, columns =["MI-HRA1","MI-HRA2", "MI-HRA3", "MI-HRA4", "MI-HRA5"], data = pd.concat([MIhra1,MIhra2,MIhra3,MIhra4,MIhra5], axis = 1) .values )
MIhra.head(10)

Unnamed: 0,MI-HRA1,MI-HRA2,MI-HRA3,MI-HRA4,MI-HRA5
0,0.012817,0.0,0.0,0.0,0.0
1,0.030711,0.0,0.0,0.0,0.0
2,0.028208,0.0,0.0,0.0,0.0
3,0.028208,0.0,0.0,0.0,0.0
4,0.028208,0.0,0.0,0.0,0.0
5,0.023824,0.0,0.0,0.0,0.0
6,0.019815,0.0,0.0,0.0,0.0
7,0.013051,0.0,0.0,0.0,0.0
8,0.031392,0.00202,0.0,0.300049,0.0
9,0.029368,0.0,0.0,0.0,0.0


### Computation of the Global index (GMI-HRA):

In [508]:
# Weights (wis) of each index. They represent the number of indicators per group:
wistable = [12,4,2,2,2]

# Now we proceed with a matrix multiplication:
wisarraey = np.array(wistable)
mihrasarray = np.array(MIhra.values)
# Finally, one DataFrame with the global index (weighted mean of the group indices) is computed:
GMIs = pd.DataFrame(index =MIhra5.index, columns = ["GMI-HRA"], data = (np.dot(mihrasarray,np.transpose(wisarraey)))/(22))
GMIs

Unnamed: 0,GMI-HRA
0,0.006991
1,0.016751
2,0.015386
3,0.015386
4,0.015386
...,...
2301105,
2301106,
2301107,
2301108,


In [509]:
np.nanmax(GMIs)

0.17803538929332385
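The global index is a weighted mean of the five group indices, with the number of indicators per group as weights. A small check using two rows of the MI-HRA table shown earlier (cells 0 and 8); `np.average` is used here as an equivalent of the dot product divided by 22.

```python
import numpy as np

weights = np.array([12, 4, 2, 2, 2])   # indicators per group, 22 in total

# MI-HRA1..MI-HRA5 for cells 0 and 8, copied from the table above.
mi = np.array([[0.012817, 0.0, 0.0, 0.0, 0.0],
               [0.031392, 0.00202, 0.0, 0.300049, 0.0]])

gmi = np.average(mi, axis=1, weights=weights)
print(gmi)
```

For cell 0 this gives 12 × 0.012817 / 22 ≈ 0.006991, which agrees with the first row of the GMI-HRA table.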