# Exercise 12: Looping Over Data Sets

## Aim: Use `glob` to find sets of files, and write loops to perform operations on files. 

### Issues covered:
- Using a `for` loop to loop over a given list of files
- Using `glob` to match patterns of files
- Looping over sets of files using `glob`
- Using `glob` and `for` to process data from multiple files

## 1. Looping over files from a simple list

Use a `for` loop to loop over the files in the data folder (`americas_gdp.csv`, `europe_gdp.csv` and `oceania_gdp.csv`) and print the maximum of each column in each dataset.

In [1]:
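import pandas as pd
import pathlib
# Loop over an explicit list of file paths.
for file in ['../data/americas_gdp.csv', '../data/europe_gdp.csv', '../data/oceania_gdp.csv']:
    # resolve() turns the relative path into an absolute one before reading.
    data = pd.read_csv(pathlib.Path(file).resolve(), index_col='country')
    # max() returns the maximum of each column.
    print(file, data.max())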

data/americas_gdp.csv continent            Americas
gdpPercap_1952    13990.48208
gdpPercap_1957    14847.12712
gdpPercap_1962    16173.14586
gdpPercap_1967    19530.36557
gdpPercap_1972    21806.03594
gdpPercap_1977    24072.63213
gdpPercap_1982    25009.55914
gdpPercap_1987    29884.35041
gdpPercap_1992    32003.93224
gdpPercap_1997    35767.43303
gdpPercap_2002    39097.09955
gdpPercap_2007    42951.65309
dtype: object
data/europe_gdp.csv gdpPercap_1952    14734.23275
gdpPercap_1957    17909.48973
gdpPercap_1962    20431.09270
gdpPercap_1967    22966.14432
gdpPercap_1972    27195.11304
gdpPercap_1977    26982.29052
gdpPercap_1982    28397.71512
gdpPercap_1987    31540.97480
gdpPercap_1992    33965.66115
gdpPercap_1997    41283.16433
gdpPercap_2002    44683.97525
gdpPercap_2007    49357.19017
dtype: float64
data/oceania_gdp.csv gdpPercap_1952    10556.57566
gdpPercap_1957    12247.39532
gdpPercap_1962    13175.67800
gdpPercap_1967    14526.12465
gdpPercap_1972    16788.62948
gdpPerca
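
The Americas output above is reported with `dtype: object` because that file also carries a text `continent` column, so `max()` mixes strings and numbers. A minimal sketch of one way to restrict the summary to numeric columns, using the `numeric_only` argument of `DataFrame.max`. The small table below is invented for illustration and is not taken from the course data files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a file such as americas_gdp.csv (values are made up).
csv_text = """continent,country,gdpPercap_1952,gdpPercap_1957
Americas,Aland,100.0,150.0
Americas,Borduria,250.0,120.0
"""

data = pd.read_csv(io.StringIO(csv_text), index_col='country')

# numeric_only=True skips text columns such as 'continent'.
numeric_max = data.max(numeric_only=True)
print(numeric_max)
```

With the text column excluded, the result holds only the yearly GDP columns and keeps a numeric dtype.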

## 2. Using `glob` to loop through files

What do you think `glob.glob('*.ipynb')` will return? Try it and see.

In [2]:
import glob
glob.glob('*.ipynb')

['ex01_running_notebooks.ipynb',
 'ex02_variables_assignment.ipynb',
 'ex03_data_types.ipynb',
 'ex04_built_in_functions.ipynb',
 'ex05_coffee.ipynb',
 'ex06_libraries.ipynb',
 'ex07_dataframes.ipynb',
 'ex08_pandas_dataframes.ipynb',
 'ex09_plotting.ipynb',
 'ex10_lunch.ipynb',
 'ex11_lists.ipynb',
 'ex12_for_loops.ipynb',
 'ex13_conditionals.ipynb',
 'ex14_looping_data_sets.ipynb',
 'ex15_coffee.ipynb',
 'ex16_writing_functions.ipynb',
 'ex17_variable_scope.ipynb',
 'ex18_programming_style.ipynb',
 'ex19_wrap_up.ipynb',
 'ex20_feedback.ipynb',
 'example.ipynb']
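
`glob.glob` returns a list of path strings matching the pattern, but the order of that list is not guaranteed and can differ between systems. A small sketch of sorting the result for repeatable output, assuming a throwaway folder with hypothetical file names so the example runs anywhere:

```python
import glob
import os
import tempfile

# Build a temporary folder with hypothetical file names.
with tempfile.TemporaryDirectory() as folder:
    for name in ['ex02_b.ipynb', 'ex01_a.ipynb', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # Sort the matches so the order does not depend on the file system.
    matches = sorted(glob.glob(os.path.join(folder, '*.ipynb')))
    names = [os.path.basename(path) for path in matches]

print(names)
```

Only the two `.ipynb` files are matched; `notes.txt` is ignored because it does not fit the pattern.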

How can we return only the files for exercises 1-9?

In [3]:
import glob
# The exercises between 1 and 9 start with a 0.
glob.glob('ex0?*.ipynb')

['ex01_running_notebooks.ipynb',
 'ex02_variables_assignment.ipynb',
 'ex03_data_types.ipynb',
 'ex04_built_in_functions.ipynb',
 'ex05_coffee.ipynb',
 'ex06_libraries.ipynb',
 'ex07_dataframes.ipynb',
 'ex08_pandas_dataframes.ipynb',
 'ex09_plotting.ipynb']
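
The pattern works because `?` matches exactly one character and `*` matches any run of characters. A short sketch comparing it with a bracket range, using the standard-library `fnmatch` module (which applies the same wildcard rules to plain strings, so no files are needed). The list below is a subset of the notebook names shown above.

```python
import fnmatch

names = ['ex01_running_notebooks.ipynb', 'ex09_plotting.ipynb',
         'ex10_lunch.ipynb', 'example.ipynb']

# '?' matches exactly one character; '*' matches any run of characters (including none).
single_digit = fnmatch.filter(names, 'ex0?*.ipynb')

# '[1-9]' matches one character from the given range.
bracket = fnmatch.filter(names, 'ex0[1-9]*.ipynb')

print(single_digit)
print(bracket)
```

Both patterns select the same two names here; the bracket form is stricter because it would reject a name such as `ex0x_...`.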

Which exercises have `coffee` in their title?

In [4]:
import glob
# Remember there could be content before or after the word 'coffee'
glob.glob('*coffee*')

['ex05_coffee.ipynb', 'ex15_coffee.ipynb']
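
The same kind of pattern can also be used through `pathlib`, which section 1 already imports. A minimal sketch, assuming a temporary folder populated with hypothetical copies of three notebook names:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as folder:
    root = pathlib.Path(folder)
    for name in ['ex05_coffee.ipynb', 'ex15_coffee.ipynb', 'ex10_lunch.ipynb']:
        (root / name).touch()

    # Path.glob yields Path objects; .name is the file name without the folder.
    coffee = sorted(path.name for path in root.glob('*coffee*'))

print(coffee)
```

`Path.glob` yields `Path` objects rather than strings, which can be convenient when the next step needs attributes such as `.name` or `.suffix`.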

## 3. Using `glob` and `for` to process files

Write a `for` loop that loops through all of the files containing `gdp` in their name, reads each one into a dataframe, and prints the minimum of every column (using `.min()`).

In [5]:
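import glob
import pandas as pd
# Match every file in the data folder whose name contains 'gdp'.
for filename in glob.glob('../data/*gdp*'):
    data = pd.read_csv(filename)
    # min() returns the minimum of each column.
    print(filename, data.min())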

data/americas_gdp.csv continent            Americas
country             Argentina
gdpPercap_1952    1397.717137
gdpPercap_1957    1544.402995
gdpPercap_1962    1662.137359
gdpPercap_1967    1452.057666
gdpPercap_1972    1654.456946
gdpPercap_1977    1874.298931
gdpPercap_1982    2011.159549
gdpPercap_1987    1823.015995
gdpPercap_1992    1456.309517
gdpPercap_1997    1341.726931
gdpPercap_2002    1270.364932
gdpPercap_2007    1201.637154
dtype: object
data/europe_gdp.csv country               Albania
gdpPercap_1952     973.533195
gdpPercap_1957    1353.989176
gdpPercap_1962    1709.683679
gdpPercap_1967    2172.352423
gdpPercap_1972     2860.16975
gdpPercap_1977    3528.481305
gdpPercap_1982    3630.880722
gdpPercap_1987    3738.932735
gdpPercap_1992    2497.437901
gdpPercap_1997    3193.054604
gdpPercap_2002    4604.211737
gdpPercap_2007    5937.029526
dtype: object
data/oceania_gdp.csv country             Australia
gdpPercap_1952    10039.59564
gdpPercap_1957    10949.64959
gdpPercap
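
Printing one result per file works for a quick look, but the minima can also be collected into a single table for comparison. A sketch of that idea under stated assumptions: the two miniature data sets and their file names are invented for illustration, and `index_col='country'` is used so that country names are not included in the minima.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical miniature data sets (values are made up, not the course data).
samples = {
    'north_gdp.csv': 'country,gdpPercap_1952,gdpPercap_1957\nA,10.0,30.0\nB,20.0,5.0\n',
    'south_gdp.csv': 'country,gdpPercap_1952,gdpPercap_1957\nC,7.0,8.0\nD,9.0,6.0\n',
}

with tempfile.TemporaryDirectory() as folder:
    for name, text in samples.items():
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write(text)

    minima = {}
    for filename in sorted(glob.glob(os.path.join(folder, '*gdp*'))):
        data = pd.read_csv(filename, index_col='country')
        # Store one Series of column minima per file, keyed by file name.
        minima[os.path.basename(filename)] = data.min()

# One row per file, one column per year.
summary = pd.DataFrame(minima).T
print(summary)
```

Keeping the results in a dictionary keyed by file name makes it easy to build the summary after the loop finishes.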