

# Lab 6 - download and analyze climate projections
_____________


In [None]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 600
import pandas as pd

## _Climate change - analyze data_

Streamflow projections for the Colorado River at the Grand Canyon from the USBR (U.S. Bureau of Reclamation) database.

Load the data and combine the first three columns into a datetime index.

In [None]:
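url = 'https://raw.githubusercontent.com/m-zaniolo/CEE690-ESAA/main/data/'

# Step 1: Download the zip file (IPython expands {url} inside ! shell commands)
!wget -O streamflow_data.zip {url}streamflow_cmip5_ncar_day_GRAND.csv.zip

# Step 2: Unzip the file (-o overwrites existing files without prompting if the cell is re-run)
!unzip -o streamflow_data.zip -d content/

# Step 3: Read the CSV and merge the first three columns into a single 'datetime' index.
# Note: combining columns through a dict passed to parse_dates is deprecated in recent pandas versions.
df = pd.read_csv('content/streamflow_cmip5_ncar_day_GRAND-2.csv',
                 index_col='datetime',
                 parse_dates={'datetime': [0, 1, 2]})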

Example: select columns by name with `DataFrame.filter`

In [None]:
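# Filter by RCP: keep the columns whose name contains 'rcp85'
df_rcp85 = df.filter(like='rcp85')

# Filter by climate model: keep the MIROC runs among the RCP8.5 columns
df_rcp85_miroc = df_rcp85.filter(like='miroc')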
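`filter(like=...)` is a plain substring match, so `like='miroc'` keeps every column that contains "miroc" anywhere in its name. The sketch below, using made-up column labels (the real labels in the USBR file may be formatted differently), contrasts it with `filter(regex=...)`, which allows stricter patterns.

```python
import pandas as pd

# Hypothetical "<model>_<rcp>" column labels, for illustration only
cols = ['miroc5_rcp26', 'miroc5_rcp85', 'miroc-esm_rcp85', 'ccsm4_rcp85']
demo = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# `like` keeps every column whose label contains the substring
rcp85_cols = demo.filter(like='rcp85').columns.tolist()

# `regex` is matched with re.search, so anchors work:
# keep only labels that start with 'miroc5_'
miroc5_cols = demo.filter(regex=r'^miroc5_').columns.tolist()

print(rcp85_cols)
print(miroc5_cols)
```

With these labels, `like='miroc'` would return both the `miroc5` and `miroc-esm` runs, which may or may not be what you want.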


Let's see what the daily projections look like:

In [None]:
df_rcp85_miroc.plot()

### Annual data ###

Trends are hard to spot in daily time series. Pandas is convenient for time-series processing because it has built-in methods to aggregate and resample data.

We will use `resample` (to aggregate daily values into annual totals) and `rolling` (to smooth the annual values with a moving window).

In [None]:
annual_df = df.resample('YE').sum()  # sum daily values to annual totals ('YE' = calendar year end, pandas >= 2.2)

# Annual totals are smoother than daily values, but trends are still hard to interpret.
# For example, plot the three MIROC RCP8.5 projections from before:
df_rcp85_miroc.resample('YE').sum().plot()


Trends are easier to see with a rolling-window mean:

In [None]:
annual_df = df.resample('YE').sum()  # sum daily values to annual totals
annual_df_rol20 = annual_df.rolling(window=20).mean()  # 20-year rolling mean; the first 19 years are NaN
mean_projection = annual_df_rol20.mean(axis=1)  # ensemble mean across all projections

annual_df_rol20.plot(legend=None)
mean_projection.plot(color='k')  # Series.plot draws on the current axes, over the ensemble
plt.title('Colorado River @ Grand Canyon - NCAR CMIP5 Projections')
plt.ylabel('Annual Streamflow (cfs-days)')
plt.show()
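The ensemble mean hides how far the projections spread around it. One option, sketched here on a made-up five-member ensemble rather than the USBR data, is to compute year-by-year percentiles across projections with `quantile(..., axis=1)`.

```python
import pandas as pd

# Hypothetical smoothed ensemble: 5 projections (columns) x 4 years (rows)
ens = pd.DataFrame(
    {'proj0': [10.0, 11.0, 12.0, 13.0],
     'proj1': [20.0, 21.0, 22.0, 23.0],
     'proj2': [30.0, 31.0, 32.0, 33.0],
     'proj3': [40.0, 41.0, 42.0, 43.0],
     'proj4': [50.0, 51.0, 52.0, 53.0]},
    index=[2090, 2091, 2092, 2093])

# Ensemble mean across projections (what mean_projection computes above)
ens_mean = ens.mean(axis=1)

# 10th and 90th percentiles across projections, year by year;
# .T puts years on the index and the two quantiles in the columns
band = ens.quantile([0.1, 0.9], axis=1).T
print(band)
```

On the real data, a shaded band could then be drawn with `plt.fill_between(band.index, band[0.1], band[0.9], alpha=0.3)` after computing `band` from `annual_df_rol20`.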


Example: filter by RCP (2.6, 4.5, 6.0, 8.5) and plot each RCP group in a different color.

In [None]:
# Split the 20-year rolling means by RCP scenario
annual_df_rcp26 = annual_df_rol20.filter(like='rcp26')
annual_df_rcp45 = annual_df_rol20.filter(like='rcp45')
annual_df_rcp60 = annual_df_rol20.filter(like='rcp60')
annual_df_rcp85 = annual_df_rol20.filter(like='rcp85')

# Create a new figure and axis for the plot
fig, ax = plt.subplots(figsize=(10, 6))

# Plot each DataFrame on the same axis with different colors
annual_df_rcp26.plot(ax=ax, legend=None, color='green')
annual_df_rcp45.plot(ax=ax, legend=None, color='yellow')
annual_df_rcp60.plot(ax=ax, legend=None, color='blue')
annual_df_rcp85.plot(ax=ax, legend=None, color='red')

# Add plot details
ax.set_xlabel('Year')
ax.set_ylabel('Annual Streamflow (cfs-days)')
plt.title('Colorado River @ Grand Canyon - NCAR CMIP5 Projections by RCP')
ax.grid(True)  # Optional: add a grid for readability

plt.show()
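Instead of one `filter` call per RCP, the columns can also be grouped by the RCP tag embedded in their names. A minimal sketch, assuming each column label contains a tag such as `rcp26` (the labels below are hypothetical), that computes a separate ensemble mean per RCP:

```python
import pandas as pd

# Hypothetical smoothed ensemble whose column labels embed the RCP tag
ens = pd.DataFrame(
    {'modelA_rcp26': [110.0, 98.0],
     'modelB_rcp26': [100.0, 104.0],
     'modelA_rcp85': [100.0, 90.0],
     'modelB_rcp85': [90.0, 80.0]},
    index=[2098, 2099])

# Extract the 'rcpXX' tag from each column label (NaN if a label has none)
rcp_tag = ens.columns.str.extract(r'(rcp\d+)', expand=False)

# Transpose so projections become rows, group the rows by tag, and
# average each group; rows with a NaN tag are dropped by groupby
rcp_mean = ens.T.groupby(rcp_tag.values).mean().T
print(rcp_mean)
```

Transposing avoids `groupby(axis=1)`, which is deprecated in recent pandas versions.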




---

## Indicators ##

Annual peak flow

In [None]:
# Annual max: resample over the year, aggregate with the max,
# then show the trend with a 20-year rolling mean
annual_max_df = df.resample('YE').max()
annual_max_df_roll20 = annual_max_df.rolling(window=20).mean()

# Let's focus on the two extreme RCPs to identify any differences
annual_max_df_roll20_rcp26 = annual_max_df_roll20.filter(like='rcp26')
annual_max_df_roll20_rcp85 = annual_max_df_roll20.filter(like='rcp85')

fig, ax = plt.subplots(figsize=(10, 6))

# Plot both groups on the same axis with different colors
annual_max_df_roll20_rcp26.plot(ax=ax, legend=None, color='green')
annual_max_df_roll20_rcp85.plot(ax=ax, legend=None, color='red')
ax.set_xlabel('Year')
ax.set_ylabel('Annual Peak Daily Streamflow, 20-yr rolling mean (cfs)')
plt.show()


Generate a boxplot for these values:

In [None]:
# Extract the values from each DataFrame and flatten them into 1-D arrays.
# Keep only the last 20 rows of the rolling means (end of century).
data_rcp26 = annual_max_df_roll20_rcp26.values[-20:].flatten()
data_rcp85 = annual_max_df_roll20_rcp85.values[-20:].flatten()

# Plot box plots for each dataset side by side
plt.figure(figsize=(8, 5))
plt.boxplot([data_rcp26, data_rcp85], widths=0.5)
# Set tick labels explicitly (boxplot's `labels` argument is deprecated since Matplotlib 3.9)
plt.xticks([1, 2], ['RCP26', 'RCP85'])

# Customize plot appearance
plt.ylabel('Annual Peak Daily Streamflow (cfs)')
plt.title('Distribution of Annual Maximum Values (end of century)')

plt.show()
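The box plot shows the difference visually; a single number can make the comparison explicit. A sketch on made-up peak-flow samples (not USBR data) standing in for `data_rcp26` and `data_rcp85`: compare the medians, which are the center lines of the two boxes. `np.nanmedian` is used in case the arrays contain NaN from the rolling window.

```python
import numpy as np

# Hypothetical end-of-century peak-flow samples (made-up numbers)
data_rcp26 = np.array([50_000, 52_000, 48_000, 51_000, 49_000], dtype=float)
data_rcp85 = np.array([40_000, 45_000, 42_000, 47_000, 43_000], dtype=float)

# Medians of the two distributions, ignoring any NaN values
med26 = np.nanmedian(data_rcp26)
med85 = np.nanmedian(data_rcp85)

# Relative change of the RCP8.5 median with respect to RCP2.6, in percent
pct_change = 100 * (med85 - med26) / med26
print(f'RCP8.5 median vs RCP2.6 median: {pct_change:+.1f}%')
```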

---

### Trend

Identify drying trends:

Identify the projections that fall in the driest 10th percentile of the ensemble at the end of the time horizon.

In [None]:
# Step 1: Select the last row (end of the time horizon)
# Step 2: Calculate the 10th percentile for the last row
# Step 3: Identify the time series that fall into the lowest 10th percentile

last_row = annual_df_rol20.iloc[-1]
percentile_10 = last_row.quantile(0.1)

driest_projections = last_row[last_row <= percentile_10].index

# Step 4: Plot all time series, with the lowest 10th percentile in red
fig, ax = plt.subplots(figsize=(10, 6))
# Plot all projections in blue
annual_df_rol20.plot(ax = ax, legend = None, color='blue', alpha=0.5)
# Highlight lowest 10th percentile time series in red
annual_df_rol20[driest_projections].plot(ax = ax, color='red')

plt.title('Climate Projections with Highlighted Lowest 10th Percentile at End of Time Horizon')
plt.xlabel('Year')
plt.ylabel('Annual Streamflow, 20-yr rolling mean (cfs-days)')
plt.show()
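The same selection can be wrapped in a small helper that also covers the wettest tail, as suggested in the discussion below. `select_tail` is a hypothetical name introduced here, and the end-of-horizon values are made up for illustration.

```python
import pandas as pd

def select_tail(values: pd.Series, q: float, lower: bool = True) -> pd.Index:
    """Hypothetical helper: labels of entries at or below the q-quantile
    (lower=True, driest) or at or above the (1 - q)-quantile (lower=False, wettest)."""
    if lower:
        return values[values <= values.quantile(q)].index
    return values[values >= values.quantile(1 - q)].index

# Made-up end-of-horizon values for ten projections
last_row = pd.Series(range(1, 11),
                     index=[f'proj{i}' for i in range(1, 11)],
                     dtype=float)

driest = select_tail(last_row, 0.1, lower=True)
wettest = select_tail(last_row, 0.1, lower=False)
print(list(driest), list(wettest))
```

On the lab data, `select_tail(annual_df_rol20.iloc[-1], 0.1)` would be expected to reproduce `driest_projections`.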

### Variance

Identify scenarios with highest interannual variability.
For this analysis, let's use annual values rather than 20-year rolling means.

In [None]:
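# Step 1: Calculate the standard deviation of the annual values for each projection
std_dev = annual_df.std()
# Step 2: Determine the threshold for the top 5 percent (the 95th percentile)
percentile_95 = std_dev.quantile(0.95)
# Step 3: Identify the projections in the top 5 percent of variability
most_variable_projections = std_dev[std_dev >= percentile_95].index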

# Step 4: Plot all projections in blue, with top variability projections in red
# Note: we plot the rolling mean projections for readability

fig, ax = plt.subplots(figsize=(10, 6))
# Plot all projections in blue
annual_df_rol20.plot(ax=ax, legend=None, color='blue', alpha=0.5)
# Highlight the most variable projections in red
annual_df_rol20[most_variable_projections].plot(ax=ax, color='red')

plt.title('Climate Projections with Highlighted Most Variable Projections')
plt.xlabel('Year')
plt.ylabel('Annual Streamflow, 20-yr rolling mean (cfs-days)')
plt.show()
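The standard deviation of raw annual values mixes two effects: year-to-year fluctuations and any long-term trend. The synthetic example below (an assumption for illustration, not lab data) shows that a smoothly declining series with no fluctuations can score higher than a flat but noisy one. Removing a least-squares linear trend with `np.polyfit` before computing the standard deviation is one possible alternative; `detrended_std` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2100)
# 'trend_only': steady decline, no year-to-year noise
# 'noisy_flat': no trend, alternates between 110 and 90
series = pd.DataFrame({
    'trend_only': 200.0 - 1.0 * (years - 2000),
    'noisy_flat': 100.0 + 10.0 * (-1.0) ** (years - 2000),
}, index=years)

# Raw standard deviation (what the variance cell above uses)
raw_std = series.std()

def detrended_std(col: pd.Series) -> float:
    """Sample std of the residuals after removing a linear trend."""
    x = col.index.to_numpy(dtype=float)
    y = col.to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return float(np.std(residuals, ddof=1))

# Apply the helper column by column
detr_std = series.apply(detrended_std)
print(pd.DataFrame({'raw': raw_std, 'detrended': detr_std}))
```

Raw std ranks `trend_only` as the more variable series; after detrending, its variability drops to nearly zero.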

### Discussion:
We developed code to identify:

1) changes in indicators (annual 1-day peak flow)

2) most pronounced trends (driest 10th percentile; the same approach works for the wettest 10th percentile)

3) most pronounced interannual variability

Why is such an analysis useful?

----------------