# Introduction to Python Loops

This notebook follows on from the *introduction_to_data_analysis_pandas* notebook, which focused on analyzing a single text file (.csv). In this notebook, we will perform the same analysis on multiple text files using for loops. By the end of this exercise you should be able to:

1. Write your own simple loops for specific tasks
2. Apply some of the concepts learnt from the *introduction_to_python* notebook
3. Better understand how to analyze data stored in text files in Python
4. Index a list
5. Concatenate strings

The datasets for this exercise are stored in a Google Drive folder. We will use Pandas to read and analyze the data and Matplotlib to plot the final results. You can access the folder at https://drive.google.com/drive/u/0/folders/1_mN7ic2mq7MZmvF5J7Osdxlb5J4mZ-Lv

## Reading Multiple Files

In [None]:
## Load the modules

import pandas as pd                    ## Import Pandas under the alias pd
import matplotlib.pyplot as plt        ## Import Matplotlib's pyplot module under the alias plt 

We will start by analyzing projections of precipitation from different regional climate models for a specific location. We will examine the URLs of all the files and look for patterns that we can loop over. Note that looping requires a consistent file structure and naming convention.
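To see why a consistent naming convention matters, here is a minimal sketch using hypothetical file names (`pr_model1.csv` to `pr_model4.csv`, not the actual files in this notebook). Because every name follows one pattern, a loop can generate them instead of typing each one by hand.

```python
## Hypothetical naming pattern: pr_model<N>.csv for N = 1..4 (illustration only)
file_names = []
for n in range(1, 5):                              ## range(1, 5) yields 1, 2, 3, 4
    file_names.append('pr_model' + str(n) + '.csv')

print(file_names)
## ['pr_model1.csv', 'pr_model2.csv', 'pr_model3.csv', 'pr_model4.csv']
```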

In [None]:
## URLs of all the precipitation files in the Google Drive folder

url1 = 'https://drive.google.com/file/d/1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs/view?usp=share_link'
url2 = 'https://drive.google.com/file/d/1b3d0guTpDEP_bYxHHITz0fW4ynK93qE6/view?usp=share_link'
url3 = 'https://drive.google.com/file/d/1qutwNwKGF7Wm8L_WbZ7xni7O6l9tGo25/view?usp=share_link'
url4 = 'https://drive.google.com/file/d/1MOdEE66_sPJ49tAFI6qtdkFsh8oc3jfU/view?usp=share_link'

On inspection, the only part that differs between the URLs (url1 to url4) is the file ID:

* 1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs
* 1b3d0guTpDEP_bYxHHITz0fW4ynK93qE6
* 1qutwNwKGF7Wm8L_WbZ7xni7O6l9tGo25
* 1MOdEE66_sPJ49tAFI6qtdkFsh8oc3jfU

Let's extract these parts from the urls and assign them to unique variables.

In [None]:
## We will use the built-in string method split(). The '/' argument tells it where the string should be split.

url1.split('/')        ## Split url1 at any point where '/' occurs (result is a list). 

In [None]:
url1.split('/')[-2]    ## Then pick the second last item in that list
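Printing each piece of the split result next to its position shows why `[-2]` picks out the file ID: the ID is the second-to-last piece, just before `'view?usp=share_link'`. The sketch uses `url1` from above; `enumerate()` simply pairs each item with its index.

```python
url1 = 'https://drive.google.com/file/d/1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs/view?usp=share_link'

parts = url1.split('/')              ## list of the substrings between the '/' characters
for position, part in enumerate(parts):
    print(position, repr(part))
## 0 'https:'
## 1 ''
## 2 'drive.google.com'
## 3 'file'
## 4 'd'
## 5 '1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs'
## 6 'view?usp=share_link'

file_id = parts[-2]                  ## second-to-last item, the same as parts[5] here
```

The empty string at position 1 comes from the two consecutive slashes in `https://`.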

Let's do this for all the URLs and assign the results to variables called file1, file2, file3, and file4.

In [None]:
## Split each URL and assign its file ID to its own variable

file1 = url1.split('/')[-2]
file2 = url2.split('/')[-2]
file3 = url3.split('/')[-2]
file4 = url4.split('/')[-2]
# print(file1, file2, file3, file4)
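The four assignments above repeat exactly the same operation, which is the situation a loop is designed for. As a sketch, the same IDs can be extracted by looping over a list of the URLs (the list `urls` is introduced here for illustration); the result matches the list `fl_nms` assembled by hand below.

```python
urls = [
    'https://drive.google.com/file/d/1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs/view?usp=share_link',
    'https://drive.google.com/file/d/1b3d0guTpDEP_bYxHHITz0fW4ynK93qE6/view?usp=share_link',
    'https://drive.google.com/file/d/1qutwNwKGF7Wm8L_WbZ7xni7O6l9tGo25/view?usp=share_link',
    'https://drive.google.com/file/d/1MOdEE66_sPJ49tAFI6qtdkFsh8oc3jfU/view?usp=share_link',
]

fl_nms = []                              ## empty list to collect the IDs
for url in urls:
    fl_nms.append(url.split('/')[-2])    ## keep only the file ID of each URL

print(fl_nms)
```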

In [None]:
## Let's put all the results into a list so that we can iterate over them easily

fl_nms = [file1, file2, file3, file4]; fl_nms             ## Combine all strings into a single list
# fl_nms[0]                                                 ## Indexing a list 
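Indexing a list (one of the goals of this notebook) works by position. A short sketch on the same four IDs: positive indices count from 0 at the start, negative indices count back from the end, and a slice `[start:stop]` excludes the `stop` position.

```python
fl_nms = ['1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs',
          '1b3d0guTpDEP_bYxHHITz0fW4ynK93qE6',
          '1qutwNwKGF7Wm8L_WbZ7xni7O6l9tGo25',
          '1MOdEE66_sPJ49tAFI6qtdkFsh8oc3jfU']

first = fl_nms[0]       ## indexing starts at 0
last = fl_nms[-1]       ## negative indices count from the end
middle = fl_nms[1:3]    ## slice: positions 1 and 2 (the stop index 3 is excluded)
print(len(fl_nms))      ## 4
```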

We need to join the string *'https://drive.google.com/uc?export=download&id='* with each of the extracted IDs to build the full download URL of each file in the Google Drive. Since 'https://drive.google.com/uc?export=download&id=' is identical for all files, we can define it once as a string variable and reuse it wherever it is needed.

In [None]:
## Let's define a variable path and assign 'https://drive.google.com/uc?export=download&id=' to it

path = 'https://drive.google.com/uc?export=download&id='

Remember, you need the *read_csv* function from the Pandas module to read .csv files. We can do that for file1 as shown below. Note that *'+'* concatenates (joins) two strings.

In [None]:
## Example of how to concatenate strings

# path + file1                  ## Concatenate two strings
# path + '/' + file1            ## Concatenate three strings (path, '/', and file1)
# path + ' / ' + file1          ## As above, but the middle string ' / ' includes spaces
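A small sketch of string concatenation using the same `path` and `file1` values. Only the plain `path + file1` form gives a valid download URL; the f-string is an equivalent way to build it. Note that `+` only joins strings, so a number must be converted with `str()` first.

```python
path = 'https://drive.google.com/uc?export=download&id='
file1 = '1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs'

full_url = path + file1            ## '+' joins the two strings end to end
same_url = f'{path}{file1}'        ## an f-string builds the same result
print(full_url == same_url)        ## True
print(full_url)
## https://drive.google.com/uc?export=download&id=1N4UpZnGdKzXkIW1yCoXAeOVYsdxjVYZs

label = 'Model' + str(1)           ## 'Model' + 1 would raise a TypeError
```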

In [None]:
## Read .csv file

df = pd.read_csv(path + file1, sep=',') #;df    ## Reading one csv file

Let's write a loop to read all the precipitation files. We need a container to hold the data as it is read, and a **list** is a convenient choice. We will initialize a list with one placeholder (None) per file and then fill it with the data from each of the four files.
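Two common ways to fill a list inside a loop are sketched below, with plain numbers standing in for the DataFrames (illustration only). The notebook uses the first, pre-allocating placeholders and filling them by index; the second, appending to an empty list, does not require knowing the length in advance.

```python
n_files = 4

## Option 1: pre-allocate placeholders, then fill each position by index
container = [None] * n_files         ## [None, None, None, None]
for i in range(n_files):
    container[i] = i * 10            ## stand-in for the DataFrame read at step i

## Option 2: start from an empty list and append as you go
container2 = []
for i in range(n_files):
    container2.append(i * 10)

print(container == container2)       ## True
```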

In [None]:
## Loop for reading multiple files

all_df = [None] * len(fl_nms)           ## Initialize a list of len(fl_nms) placeholders (None) that will act as our container

## We will loop over the positions of the items in fl_nms. Note that Python indexing starts from 0, not 1

# len(fl_nms)                ## Tells you the length of the list
# range(len(fl_nms))

for i in [0, 1, 2, 3]:
    
    df = pd.read_csv(path + fl_nms[i], sep=',')       ## For each item in fl_nms, concatenate it with path and read the csv file. 
                                                      ## The result is assigned to df, which is overwritten on every iteration.
    
    all_df[i] = df                                    ## After reading, place it in the list container at a specific position defined by i.

## Check the contents of the list

all_df
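Hard-coding `[0, 1, 2, 3]` only works while there are exactly four files. The sketch below shows two more general forms: `range(len(...))`, as hinted in the commented lines above, and `enumerate()`, which hands over the position and the item together. So that it runs without network access, it reads small in-memory CSV strings (via `io.StringIO`) with made-up values instead of the Google Drive files, and it stores them in `demo_dfs` so that `all_df` is not overwritten.

```python
import io
import pandas as pd

## Made-up CSV text standing in for the Google Drive files
csv_texts = [
    'Dates,Precipitation(mm/day)\n2001-03-01,1.0\n2001-03-02,2.0\n',
    'Dates,Precipitation(mm/day)\n2001-03-01,0.5\n2001-03-02,0.0\n',
]

## Same loop as above, but the indices come from range(len(...))
demo_dfs = [None] * len(csv_texts)
for i in range(len(csv_texts)):
    demo_dfs[i] = pd.read_csv(io.StringIO(csv_texts[i]), sep=',')

## enumerate() yields (position, item) pairs
demo_dfs2 = [None] * len(csv_texts)
for i, text in enumerate(csv_texts):
    demo_dfs2[i] = pd.read_csv(io.StringIO(text), sep=',')

print(len(demo_dfs), demo_dfs[0].shape)   ## 2 (2, 2)
```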

### Exercise

Write a loop to read the temperature files stored in the same Google Drive folder.

## Analyzing Multiple Files

We will repeat the analysis from the *introduction_to_data_analysis_pandas* notebook, this time assessing the inter-annual variability across multiple models (four in this case). The season of interest is the March to May (MAM) rainfall season. The data have already been read into the list *all_df*.

The first column of each dataset contains the date of each value, but it is not set as the index. Making the Dates column the index is beneficial for DateTime operations in Pandas. We do this by first converting the column to a DateTime object with the *'to_datetime()'* function and then making it the index with the *'set_index()'* function. The following are the steps we took in the *introduction_to_data_analysis_pandas* notebook:

* Convert the Dates column to Pandas Datetime object and make that column an index
* Select the months in March to May season using a condition
* Group by year and get the annual MAM rainfall totals. Use mean for temperature
* Add a column name to the new column
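The four steps above can be sketched on a small made-up dataset (a constant 1 mm/day for 2001 and 2002, not the model data), which makes the result easy to check: each MAM season has 31 + 30 + 31 = 92 days, so every annual total should be 92 mm. The column name `MAM_total(mm)` is an illustrative choice.

```python
import pandas as pd

## Made-up daily precipitation for two years, with dates stored as text as in a CSV
dates = pd.date_range('2001-01-01', '2002-12-31', freq='D')
demo_df = pd.DataFrame({'Dates': dates.astype(str),
                        'Precipitation(mm/day)': 1.0})

## 1. Convert the Dates column to datetime and make it the index
demo_df['Dates'] = pd.to_datetime(demo_df['Dates'])
demo_df = demo_df.set_index('Dates')

## 2. Keep only March, April and May with a boolean condition
demo_mam = demo_df.loc[(demo_df.index.month >= 3) & (demo_df.index.month <= 5)]

## 3. Total per year (use .mean() instead of .sum() for temperature)
demo_ann = demo_mam.groupby(demo_mam.index.year).sum()

## 4. Give the result a descriptive column name
demo_ann.columns = ['MAM_total(mm)']

print(demo_ann)
```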

Let's write a loop to do the above using the data contained in the list *all_df*.

In [None]:
## Our list is of length...

print('The list is of length:', len(all_df))

In [None]:
## Loop for analyzing multiple elements in a list

out_dat = [None] * len(all_df)      ## Initialize a list of None placeholders, one per model

for i in [0, 1, 2, 3]:
    ## Convert the Dates column to Pandas Datetime object and make that column an index
    
    all_df[i]['Dates'] = pd.to_datetime(all_df[i]['Dates'])            ## Convert to a DateTime object
    df = all_df[i].set_index('Dates')                                   ## Setting the 'Dates' column as an index
    
    ## Select the months in March to May season using a condition
    
    df_mam = df.loc[(df.index.month >= 3) & (df.index.month <= 5)]    ## Boolean condition keeps March, April and May
    
    ## Group by year and get the annual MAM rainfall totals. Use mean for temperature
    
    ann_mam = df_mam.groupby(df_mam.index.year).sum()
    
    out_dat[i] = ann_mam

## Check the contents of the final output

# out_dat
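Because the loop body is the same for every model, it can also be wrapped in a function that is called inside the loop. In the sketch below, `seasonal_totals` is a hypothetical helper (not part of pandas or the original notebook): its `how` argument selects `'sum'` for rainfall or `'mean'` for temperature, and `df.copy()` leaves the input DataFrames unchanged, unlike the loop above, which converts the Dates column of each element of `all_df` in place. The inputs are made-up constant values.

```python
import pandas as pd

def seasonal_totals(df, months=(3, 4, 5), how='sum'):
    """Hypothetical helper: aggregate a DataFrame with a 'Dates' column to one
    value per year over the chosen months ('sum' for rainfall, 'mean' for temperature)."""
    out = df.copy()                                  ## work on a copy, leave the input untouched
    out['Dates'] = pd.to_datetime(out['Dates'])
    out = out.set_index('Dates')
    season = out.loc[out.index.month.isin(months)]   ## keep only the requested months
    return season.groupby(season.index.year).agg(how)

## Made-up inputs: two "models" with constant daily values for 2001
dates = pd.date_range('2001-01-01', '2001-12-31', freq='D').astype(str)
demo_in = [pd.DataFrame({'Dates': dates, 'Precipitation(mm/day)': v}) for v in (1.0, 2.0)]

demo_out = [None] * len(demo_in)
for i in range(len(demo_in)):
    demo_out[i] = seasonal_totals(demo_in[i], how='sum')
```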

### Exercise

Write a loop to analyze the temperature files you read in the previous exercise. Remember to use the mean rather than the sum.

## Plotting the Results

The approach to plotting will be similar to that taken in the *introduction_to_data_analysis_pandas* notebook, with one change: this time we want to add multiple lines (one per model) to the same plot so that the models can be compared. The idea is to identify which part of the code changes from one model to the next and which stays the same. Here, only the *'ax.plot()'* call changes; all other details are shared.
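As an alternative to indexing with `i`, `zip()` can walk the data, line styles, colors and model names together, and passing `label=` to `ax.plot()` lets `ax.legend()` pick up the names without a separate list of strings. The sketch below uses made-up annual totals (`demo_out`) so that it runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

## Made-up annual MAM totals for four models (illustration only)
years = [2001, 2002, 2003]
demo_out = [pd.DataFrame({'Precipitation(mm/day)': [v, v + 5, v - 5]}, index=years)
            for v in (100, 120, 140, 160)]

lines = ['solid', 'dashed', 'dotted', 'dashdot']
col = ['red', 'green', 'blue', 'black']
modls = ['Model1', 'Model2', 'Model3', 'Model4']

fig, ax = plt.subplots(figsize=(10, 6))
## zip() walks the four lists in step, so no index variable is needed
for dat, ls, c, name in zip(demo_out, lines, col, modls):
    ax.plot(dat.index.values, dat['Precipitation(mm/day)'],
            linestyle=ls, color=c, label=name)   ## label feeds the legend

ax.set_xlabel("Years", fontsize=14)
ax.set_ylabel("Precipitation (mm)", fontsize=14)
ax.legend()
plt.show()
```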

In [None]:
# Create figure and plot space
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the data, here is where we will insert the loop.

lines = ['solid', 'dashed', 'dotted', 'dashdot']   ## Define the line styles to differentiate the line plots

col = ['red', 'green', 'blue', 'black']       ## Define the line colors to differentiate the line plots

for i in [0, 1, 2, 3]:
    
    ax.plot(out_dat[i].index.values,
            out_dat[i]['Precipitation(mm/day)'],
            linestyle=lines[i], color = col[i])     

## Set title and labels for axes

ax.set_xlabel("Years", fontsize=14)
ax.set_ylabel("Precipitation (mm)", fontsize=14)
plt.title('Inter-Annual Variability', fontsize=15, fontweight="bold")

## Legends

ax.legend(['Model1', 'Model2', 'Model3', 'Model4'])

# Rotate tick marks on x-axis
plt.setp(ax.get_xticklabels(), rotation=45)

plt.show()

### Multipanel Plots Using a Loop

In [None]:
# Create the figure; the four panels are added inside the loop
fig = plt.figure(figsize=(10, 10))

# Define some key variables to help with plotting.

modls = ['Model1', 'Model2', 'Model3', 'Model4']  ## Define title labels for each plot
col = ['red', 'green', 'blue', 'black']           ## Define the line colors to differentiate the line plots
indx = [1, 2, 3, 4]                               ## Define index positions for plotting

# Plot the data, here is where we will insert the loop.

for i in [0, 1, 2, 3]:
    
    ax = fig.add_subplot(2, 2, indx[i])   ## Divide the figure into a 2x2 grid and plot in the position given by indx[i]
    ax.plot(out_dat[i].index.values,
            out_dat[i]['Precipitation(mm/day)'],
            color = col[i])     

    ## Set title and labels for axes

    ax.set_xlabel("Years", fontsize=14)
    ax.set_ylabel("Precipitation (mm)", fontsize=14)
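    ax.set_title(modls[i], fontsize=15, fontweight="bold", color=col[i])
    ##ax.label_outer()                 ## Uncomment to show tick labels and axis labels only on the outer panels

fig.tight_layout()                     ## Adjust the spacing so titles and labels of adjacent panels do not overlap
plt.show()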
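Another option is to let `plt.subplots(2, 2)` create all four panels at once and loop over `axes.flat` together with the data. In this sketch `sharex=True, sharey=True` (a design choice, not part of the original) puts every model on the same scale, which makes the panels directly comparable, and `ax.label_outer()` then hides the redundant inner labels. The data are the same kind of made-up annual totals as before.

```python
import matplotlib.pyplot as plt
import pandas as pd

## Made-up annual MAM totals for four models (illustration only)
years = [2001, 2002, 2003]
demo_out = [pd.DataFrame({'Precipitation(mm/day)': [v, v + 5, v - 5]}, index=years)
            for v in (100, 120, 140, 160)]
modls = ['Model1', 'Model2', 'Model3', 'Model4']
col = ['red', 'green', 'blue', 'black']

fig, axes = plt.subplots(2, 2, figsize=(10, 10), sharex=True, sharey=True)
## axes is a 2x2 array; axes.flat visits the panels row by row
for ax, dat, name, c in zip(axes.flat, demo_out, modls, col):
    ax.plot(dat.index.values, dat['Precipitation(mm/day)'], color=c)
    ax.set_title(name, fontsize=15, fontweight="bold", color=c)
    ax.set_xlabel("Years", fontsize=14)
    ax.set_ylabel("Precipitation (mm)", fontsize=14)
    ax.label_outer()                     ## keep labels only on the outer panels

fig.tight_layout()
plt.show()
```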