# Exercise 12: Looping Over Data Sets

## Aim: Use `glob` to find sets of files, and write loops to perform operations on files. 

### Issues covered:
- Using a `for` loop to loop over a given list of files
- Using `glob` to match patterns of files
- Looping over sets of files using `glob`
- Using `glob` and `for` to process data from multiple files

## 1. Looping over files from a simple list

Use a `for` loop to loop over the files in the data folder (`americas_gdp.csv`, `europe_gdp.csv` and `oceania_gdp.csv`) and print the maximum of each column in each dataset.

In [3]:
import pandas as pd
import glob
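# glob finds every CSV file in data/, so the names do not need to be typed out.
# Each pass through the loop handles one file, hence the singular name.
for filename in glob.glob('data/*.csv'):
    df = pd.read_csv(filename)
    print(df.max())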

country           United Kingdom
gdpPercap_1952       14734.23275
gdpPercap_1957       17909.48973
gdpPercap_1962        20431.0927
gdpPercap_1967       22966.14432
gdpPercap_1972       27195.11304
gdpPercap_1977       26982.29052
gdpPercap_1982       28397.71512
gdpPercap_1987        31540.9748
gdpPercap_1992       33965.66115
gdpPercap_1997       41283.16433
gdpPercap_2002       44683.97525
gdpPercap_2007       49357.19017
dtype: object
country           New Zealand
gdpPercap_1952    10556.57566
gdpPercap_1957    12247.39532
gdpPercap_1962      13175.678
gdpPercap_1967    14526.12465
gdpPercap_1972    16788.62948
gdpPercap_1977    18334.19751
gdpPercap_1982    19477.00928
gdpPercap_1987    21888.88903
gdpPercap_1992    23424.76683
gdpPercap_1997    26997.93657
gdpPercap_2002    30687.75473
gdpPercap_2007    34435.36744
dtype: object
continent            Americas
country             Venezuela
gdpPercap_1952    13990.48208
gdpPercap_1957    14847.12712
gdpPercap_1962    16173.14586
gdp

In [6]:
import pandas as pd
import pathlib
# Use relative paths here: a leading slash ('/data/...') makes the path absolute,
# which raises FileNotFoundError unless a top-level /data directory exists.
for file in ['data/americas_gdp.csv', 'data/europe_gdp.csv', 'data/oceania_gdp.csv']:
    data = pd.read_csv(pathlib.Path(file).resolve(), index_col='country')
    print(file, data.max())
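
A leading slash makes a path absolute, so `/data/americas_gdp.csv` is looked up from the filesystem root rather than relative to the notebook's working directory. The following is a minimal sketch of building relative paths from a plain list of names. The variable names `names`, `data_dir` and `relative_paths` are illustrative, and `PurePosixPath` is used so that nothing touches the disk.

```python
from pathlib import PurePosixPath

# File names taken from the exercise text; the folder name 'data' is relative.
names = ['americas_gdp.csv', 'europe_gdp.csv', 'oceania_gdp.csv']
data_dir = PurePosixPath('data')

# Join the folder and each file name without hand-writing any slashes.
relative_paths = [data_dir / name for name in names]

for path in relative_paths:
    # is_absolute() is False for every path built this way.
    print(path, path.is_absolute())

# The same name with a leading slash is an absolute path.
absolute_path = PurePosixPath('/data/americas_gdp.csv')
print(absolute_path, absolute_path.is_absolute())
```

Passing `str(path)` for each entry of `relative_paths` to `pd.read_csv` would then read the files relative to wherever the notebook is running.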

## 2. Using `glob` to loop through files

What do you think `glob.glob('*.ipynb')` will return? Try it and see.

In [7]:
glob.glob('*.ipynb')

['ex09_lists.ipynb',
 'ex11_conditionals.ipynb',
 'ex01_running_notebooks.ipynb',
 'ex15_programming_style.ipynb',
 'ex02_variables_assignment.ipynb',
 'ex16_wrap_up.ipynb',
 'ex12_looping_data_sets.ipynb',
 'ex08_plotting.ipynb',
 'ex13_writing_functions.ipynb',
 'ex10_for_loops.ipynb',
 'ex14_variable_scope.ipynb',
 'ex07_pandas_dataframes.ipynb',
 'ex03_data_types.ipynb',
 'ex04_built_in_functions.ipynb',
 'ex05_libraries.ipynb',
 'ex06_dataframes.ipynb']
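
The list above is not in exercise order because `glob.glob` returns matches in whatever order the file system provides. Wrapping the call in `sorted()` gives a predictable order. The sketch below shows this on empty placeholder files created in a temporary directory; the file `notes.txt` is a hypothetical extra, added only to show that it is not matched.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; only the names matter here.
    for name in ['ex03_data_types.ipynb',
                 'ex01_running_notebooks.ipynb',
                 'ex02_variables_assignment.ipynb',
                 'notes.txt']:
        open(os.path.join(tmp, name), 'w').close()

    # sorted() puts the matches in alphabetical order.
    matches = sorted(glob.glob(os.path.join(tmp, '*.ipynb')))
    names = [os.path.basename(match) for match in matches]

print(names)
```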

How can we return only the files for exercises 1-9?

In [8]:
glob.glob('ex0?*.ipynb')

['ex09_lists.ipynb',
 'ex01_running_notebooks.ipynb',
 'ex02_variables_assignment.ipynb',
 'ex08_plotting.ipynb',
 'ex07_pandas_dataframes.ipynb',
 'ex03_data_types.ipynb',
 'ex04_built_in_functions.ipynb',
 'ex05_libraries.ipynb',
 'ex06_dataframes.ipynb']
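
In a pattern, `?` matches exactly one character and `*` matches any run of characters, including none. `glob` applies the same matching rules as the standard `fnmatch` module, so patterns can be tried out on a plain list of names without any files on disk. The sketch below uses a few of the notebook names listed above. The stricter pattern `ex0[1-9]_*.ipynb` is an alternative suggestion, not part of the original answer; unlike `ex0?*.ipynb`, it would not match a hypothetical `ex00_...` file.

```python
import fnmatch

# A sample of the notebook names from the listing above.
names = ['ex09_lists.ipynb',
         'ex11_conditionals.ipynb',
         'ex01_running_notebooks.ipynb',
         'ex10_for_loops.ipynb',
         'ex12_looping_data_sets.ipynb']

# '?' stands for any single character, '*' for any run of characters.
single_digit = fnmatch.filter(names, 'ex0?*.ipynb')

# '[1-9]' stands for one character in the range 1 to 9.
strict_single_digit = fnmatch.filter(names, 'ex0[1-9]_*.ipynb')

print(single_digit)
print(strict_single_digit)
```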

Which exercises have `loop` in their title?

In [11]:
glob.glob('*loop*.ipynb')

['ex12_looping_data_sets.ipynb', 'ex10_for_loops.ipynb']

In [12]:
glob.glob('*loop*')

['ex12_looping_data_sets.ipynb', 'ex10_for_loops.ipynb']
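
Both patterns give the same result here only because every file with `loop` in its name happens to be a notebook. The sketch below uses `fnmatch.filter`, which follows the same wildcard rules as `glob`, on a list of names that includes a hypothetical `loop_notes.txt` to show where the two patterns differ.

```python
import fnmatch

# 'loop_notes.txt' is an invented name, added to show the difference.
names = ['ex10_for_loops.ipynb',
         'ex12_looping_data_sets.ipynb',
         'ex09_lists.ipynb',
         'loop_notes.txt']

# Restricting the extension keeps only notebooks.
notebooks_only = fnmatch.filter(names, '*loop*.ipynb')

# Without the extension, any file with 'loop' in its name matches.
any_extension = fnmatch.filter(names, '*loop*')

print(notebooks_only)
print(any_extension)
```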

## 3. Using `glob` and `for` to process files

Write a `for` loop that loops through all of the files with `gdp` in their name, reads each one into a dataframe, then prints the minimum of every column (using `.min()`).

In [5]:
for filename in glob.glob('data/*gdp*.csv'):
    df = pd.read_csv(filename)
    print(filename, df.min())
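
One possible approach to this task is sketched below on small made-up CSV files written to a temporary directory, so that it runs without the course data. The file names, country labels and values are all invented for illustration; with the real data, the pattern would point at the `data` folder instead. The dictionary `minima` is an optional extra that keeps each result for later use.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Invented sample data standing in for the real GDP files.
    samples = {
        'north_gdp.csv': {'country': ['A', 'B'], 'gdpPercap_1952': [100.0, 250.0]},
        'south_gdp.csv': {'country': ['C', 'D'], 'gdpPercap_1952': [80.0, 40.0]},
        'notes.csv': {'country': ['E'], 'gdpPercap_1952': [1.0]},
    }
    for name, columns in samples.items():
        pd.DataFrame(columns).to_csv(os.path.join(tmp, name), index=False)

    minima = {}
    # Only files with 'gdp' in their name match; notes.csv is skipped.
    for filename in sorted(glob.glob(os.path.join(tmp, '*gdp*.csv'))):
        # Using 'country' as the index keeps the names out of the minimum.
        df = pd.read_csv(filename, index_col='country')
        minima[os.path.basename(filename)] = df.min()
        print(os.path.basename(filename))
        print(df.min())
```

Reading with `index_col='country'` is a design choice: without it, `.min()` also reports the alphabetically first country name alongside the numeric minima.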