### Download PcrGlobWB data to instance

* Purpose of script: This notebook will download the data from S3 to the EC2 instance 
* Author: Rutger Hofste
* Kernel used: python36
* Date created: 20170731

In this notebook we copy the data for the first couple of steps from WRI's Amazon S3 bucket. The data is large (about **40GB**), so this is a good excuse to drink a coffee. Jupyter suppresses the output per file, so you will only see a result after a file has been downloaded. You can also run this command in your terminal to see the progress per file.

Before you run this script, make sure you configure aws by running (in your terminal): `aws configure`

Create folder to store the data

In [6]:
S3_INPUT_PATH = "s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/"

In [7]:
EC2_PATH = "/volumes/data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/"

In [8]:
!mkdir -p {EC2_PATH}
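
Because the download is large, it can help to check the free disk space while creating the folder. The sketch below is a pure-Python stand-in for the `mkdir -p` cell; the helper name `ensure_dir_with_space` and the `required_gb` parameter are assumptions made for illustration and are not part of the original notebook.

```python
import os
import shutil


def ensure_dir_with_space(path, required_gb):
    """Create `path` (like `mkdir -p`) and check the free space on its disk.

    Hypothetical helper for illustration. Returns (free_gb, has_room).
    """
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    free_gb = shutil.disk_usage(path).free / 1024 ** 3  # bytes -> GiB
    return free_gb, free_gb >= required_gb
```

A call such as `ensure_dir_with_space(EC2_PATH, 70)` would flag a volume that is too small before the copy starts; the value 70 is only a guess based on the `total 68G` listing further down in this notebook.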

Grab a coffee before you run the following command. This will copy the files from S3 to your EC2 instance. 

In [9]:
!aws s3 cp {S3_INPUT_PATH} {EC2_PATH} --recursive

download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/global_environmentalflows_5min_1960-2014.asc to ../../../../data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/global_environmentalflows_5min_1960-2014.asc
download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/global_droughtseveritystandardisedsoilmoisture_5min_1960-2014.asc to ../../../../data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/global_droughtseveritystandardisedsoilmoisture_5min_1960-2014.asc
download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/global_droughtseveritystandardisedstreamflow_5min_1960-2014.asc to ../../../../data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/global_droughtseveritystandardisedstreamflow_5min_1960-2014.asc
download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/global_historical_PDomWW_year_millionm3_5min_196

download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/global_historical_riverdischarge_month_m3second_5min_1960_2014.nc4 to ../../../../data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/global_historical_riverdischarge_month_m3second_5min_1960_2014.nc4
download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/global_historical_soilmoisture_month_meter_5min_1958-2014.nc4 to ../../../../data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/global_historical_soilmoisture_month_meter_5min_1958-2014.nc4
download: s3://wri-projects/Aqueduct30/processData/Y2017M07D31_RH_copy_S3raw_s3process_V01/output/totalRunoff_monthTot_output.zip to ../../../../data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/totalRunoff_monthTot_output.zip
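
If you prefer to start the copy from Python rather than with the `!` shell magic, a minimal sketch could look like this. It only wraps the same `aws s3 cp ... --recursive` command shown above; the function names `build_s3_copy_command` and `run_s3_copy` are illustrative assumptions.

```python
import subprocess


def build_s3_copy_command(s3_path, local_path):
    """Return the argument list for `aws s3 cp <s3_path> <local_path> --recursive`."""
    return ["aws", "s3", "cp", s3_path, local_path, "--recursive"]


def run_s3_copy(s3_path, local_path):
    """Run the copy with the AWS CLI (requires `aws configure` to have been run)."""
    # check=True raises CalledProcessError if the CLI exits with a non-zero status
    subprocess.run(build_s3_copy_command(s3_path, local_path), check=True)
```

Passing the arguments as a list avoids shell quoting issues; `run_s3_copy(S3_INPUT_PATH, EC2_PATH)` would then mirror the cell above.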


List files downloaded (32 in my case)

In [12]:
!find {EC2_PATH} -type f | wc -l

32
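
The same count can be obtained without the shell. The following sketch walks the folder recursively and counts regular files, which is roughly what `find -type f | wc -l` does; `count_files` is a hypothetical helper name.

```python
import os


def count_files(root):
    """Count regular files under `root`, recursively, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.isfile(full_path) and not os.path.islink(full_path):
                total += 1
    return total
```

Unlike `os.listdir`, this also counts files in subfolders and ignores the subfolders themselves.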


As you can see, some of the files are zipped, so we unzip them.

Unzipping the file results in a 24GB file, which is significant, so this step will take quite some time.

In [13]:
!unzip {EC2_PATH}totalRunoff_monthTot_output.zip -d {EC2_PATH}

Archive:  /volumes/data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output//totalRunoff_monthTot_output.zip
  inflating: /volumes/data/Y2017M07D31_RH_download_PCRGlobWB_data_V01/output/totalRunoff_monthTot_output.nc  
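
As an alternative to the `unzip` command, the standard-library `zipfile` module can extract the archive. This is a minimal sketch; `unzip_to` is a hypothetical helper name, and it has not been timed on the 24GB runoff file.

```python
import zipfile


def unzip_to(zip_path, target_dir):
    """Extract every member of `zip_path` into `target_dir` and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

Returning the member names makes it easy to confirm which `.nc` file was created, similar to the `inflating:` line in the output above.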


The total number of files should be around 25 but can change if the raw data changes.

In [14]:
!ls -lah {EC2_PATH}

total 64G
drwxr-xr-x 2 root root 4.0K Aug  8 20:16 .
drwxr-xr-x 3 root root 4.0K Aug  8 19:58 ..
-rw-r--r-- 1 root root  57M Aug  8 19:49 global_droughtseveritystandardisedsoilmoisture_5min_1960-2014.asc
-rw-r--r-- 1 root root  55M Aug  8 19:49 global_droughtseveritystandardisedstreamflow_5min_1960-2014.asc
-rw-r--r-- 1 root root  56M Aug  8 19:49 global_environmentalflows_5min_1960-2014.asc
-rw-r--r-- 1 root root 3.2G Aug  8 19:44 global_historical_PDomUse_month_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 270M Aug  8 19:44 global_historical_PDomUse_year_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 3.2G Aug  8 19:44 global_historical_PDomWW_month_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 271M Aug  8 19:44 global_historical_PDomWW_year_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 1.7G Aug  8 19:44 global_historical_PIndUse_month_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 156M Aug  8 19:44 global_historical_PIndUse_year_millionm3_5min_1

In the data that Yoshi provided there is only livestock data for consumption (WN). However, in an email he specified that for livestock the withdrawal (WW) equals the consumption (100% consumption). Therefore we copy the WN livestock files to WW to make looping over WN and WW easier.

In [15]:
!cp {EC2_PATH}/global_historical_PLivWN_month_millionm3_5min_1960_2014.nc4 {EC2_PATH}/global_historical_PLivWW_month_millionm3_5min_1960_2014.nc4

In [16]:
!cp {EC2_PATH}/global_historical_PLivWN_year_millionm3_5min_1960_2014.nc4 {EC2_PATH}/global_historical_PLivWW_year_millionm3_5min_1960_2014.nc4
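
The two `cp` cells differ only in `month` versus `year`, so they can be expressed as a loop. The sketch below assumes the file name pattern shown in the cells above; `copy_livestock_wn_to_ww` is a hypothetical helper name.

```python
import os
import shutil


def copy_livestock_wn_to_ww(folder, temporal_resolutions=("month", "year")):
    """Copy each PLivWN file to a PLivWW file with an otherwise identical name."""
    copied = []
    for resolution in temporal_resolutions:
        src_name = "global_historical_PLivWN_{}_millionm3_5min_1960_2014.nc4".format(resolution)
        dst_name = src_name.replace("PLivWN", "PLivWW")
        # copyfile copies the contents only; the WN source file stays in place
        shutil.copyfile(os.path.join(folder, src_name), os.path.join(folder, dst_name))
        copied.append(dst_name)
    return copied
```

Calling `copy_livestock_wn_to_ww(EC2_PATH)` would then produce the same two WW files as cells 15 and 16.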

In [17]:
!ls -lah {EC2_PATH}

total 68G
drwxr-xr-x 2 root root 4.0K Aug  8 20:30 .
drwxr-xr-x 3 root root 4.0K Aug  8 19:58 ..
-rw-r--r-- 1 root root  57M Aug  8 19:49 global_droughtseveritystandardisedsoilmoisture_5min_1960-2014.asc
-rw-r--r-- 1 root root  55M Aug  8 19:49 global_droughtseveritystandardisedstreamflow_5min_1960-2014.asc
-rw-r--r-- 1 root root  56M Aug  8 19:49 global_environmentalflows_5min_1960-2014.asc
-rw-r--r-- 1 root root 3.2G Aug  8 19:44 global_historical_PDomUse_month_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 270M Aug  8 19:44 global_historical_PDomUse_year_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 3.2G Aug  8 19:44 global_historical_PDomWW_month_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 271M Aug  8 19:44 global_historical_PDomWW_year_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 1.7G Aug  8 19:44 global_historical_PIndUse_month_millionm3_5min_1960_2014.nc4
-rw-r--r-- 1 root root 156M Aug  8 19:44 global_historical_PIndUse_year_millionm3_5min_1

In [18]:
import os
files = os.listdir(EC2_PATH)
print("Number of files: " + str(len(files)))

Number of files: 35


Copy PLivWN to PLivWW because livestock withdrawal equals livestock consumption (see Yoshi's email). This avoids some looping issues later on. The copy is about 4GB of data, so it takes a while.

Some files that WRI received from Utrecht refer to water "Use" instead of WN (net). We rename the relevant files.

In [19]:
!mv {EC2_PATH}/global_historical_PDomUse_month_millionm3_5min_1960_2014.nc4 {EC2_PATH}/global_historical_PDomWN_month_millionm3_5min_1960_2014.nc4
!mv {EC2_PATH}/global_historical_PDomUse_year_millionm3_5min_1960_2014.nc4 {EC2_PATH}/global_historical_PDomWN_year_millionm3_5min_1960_2014.nc4

!mv {EC2_PATH}/global_historical_PIndUse_month_millionm3_5min_1960_2014.nc4 {EC2_PATH}/global_historical_PIndWN_month_millionm3_5min_1960_2014.nc4
!mv {EC2_PATH}/global_historical_PIndUse_year_millionm3_5min_1960_2014.nc4 {EC2_PATH}/global_historical_PIndWN_year_millionm3_5min_1960_2014.nc4
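
The four `mv` commands follow one rule: replace `Use` with `WN` in the sector part of the file name. A minimal Python sketch of that rule is shown below; `rename_use_to_wn` is a hypothetical helper name, and the `global_historical_P` prefix check is an assumption based on the file names listed in this notebook.

```python
import os


def rename_use_to_wn(folder):
    """Rename files such as PDomUse/PIndUse to the WN convention and return the new names."""
    renamed = []
    for name in sorted(os.listdir(folder)):
        if name.startswith("global_historical_P") and "Use_" in name:
            new_name = name.replace("Use_", "WN_", 1)  # only the first occurrence
            os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
            renamed.append(new_name)
    return renamed
```

Files that already use WN or WW are left untouched, so the function can be run again without side effects.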


As you can see, the filename structure of the runoff files is different. Using Panoply to inspect the units, we rename the files accordingly. 

new name for annual:  

global_historical_runoff_year_myear_5min_1958_2014.nc

new name for monthly:  

global_historical_runoff_month_mmonth_5min_1958_2014.nc


In [20]:
!mv {EC2_PATH}/totalRunoff_annuaTot_output.nc {EC2_PATH}/global_historical_runoff_year_myear_5min_1958_2014.nc

In [21]:
!mv {EC2_PATH}/totalRunoff_monthTot_output.nc {EC2_PATH}/global_historical_runoff_month_mmonth_5min_1958_2014.nc
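
The runoff renames can also be kept in one mapping, which makes the old and new names easy to review side by side. This is a sketch only: the dictionary copies the names from the two `mv` cells above, and `rename_runoff_files` is a hypothetical helper name.

```python
import os

# Old name -> new name, copied from the two `mv` cells above.
RUNOFF_RENAMES = {
    "totalRunoff_annuaTot_output.nc": "global_historical_runoff_year_myear_5min_1958_2014.nc",
    "totalRunoff_monthTot_output.nc": "global_historical_runoff_month_mmonth_5min_1958_2014.nc",
}


def rename_runoff_files(folder, renames=RUNOFF_RENAMES):
    """Apply the renames, skip sources that are missing, and return the new names."""
    done = []
    for old_name, new_name in renames.items():
        source = os.path.join(folder, old_name)
        if os.path.exists(source):
            os.rename(source, os.path.join(folder, new_name))
            done.append(new_name)
    return done
```

Skipping missing sources means a rerun after a partial rename does not raise an error.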

Final folder structure

In [22]:
!ls {EC2_PATH}

global_droughtseveritystandardisedsoilmoisture_5min_1960-2014.asc
global_droughtseveritystandardisedstreamflow_5min_1960-2014.asc
global_environmentalflows_5min_1960-2014.asc
global_historical_PDomWN_month_millionm3_5min_1960_2014.nc4
global_historical_PDomWN_year_millionm3_5min_1960_2014.nc4
global_historical_PDomWW_month_millionm3_5min_1960_2014.nc4
global_historical_PDomWW_year_millionm3_5min_1960_2014.nc4
global_historical_PIndWN_month_millionm3_5min_1960_2014.nc4
global_historical_PIndWN_year_millionm3_5min_1960_2014.nc4
global_historical_PIndWW_month_millionm3_5min_1960_2014.nc4
global_historical_PIndWW_year_millionm3_5min_1960_2014.nc4
global_historical_PIrrWN_month_millionm3_5min_1960_2014.nc4
global_historical_PIrrWN_year_millionm3_5min_1960_2014.nc4
global_historical_PIrrWW_month_millionm3_5min_1960_2014.nc4
global_historical_PIrrWW_year_millionm3_5min_1960_2014.nc4
global_historical_PLivWN_month_millionm3_5min_1960_2014.nc4
global_historical_PLivWN_year_milli