# Analyzing Data from Multiple Files

We now have almost everything we need to process all our data files.
The only thing that's missing is a library with a rather unpleasant name:

In [1]:
import glob

The `glob` library contains a function, also called `glob`,
that finds files and directories whose names match a pattern.
We provide those patterns as strings:
the character `*` matches zero or more characters,
while `?` matches any one character.
We can use this to get the names of all the CSV files in the current directory:

In [7]:
files = glob.glob("inflammation-*.csv")
files.sort(reverse=True)
files

['inflammation-12.csv',
 'inflammation-11.csv',
 'inflammation-10.csv',
 'inflammation-09.csv',
 'inflammation-08.csv',
 'inflammation-07.csv',
 'inflammation-06.csv',
 'inflammation-05.csv',
 'inflammation-04.csv',
 'inflammation-03.csv',
 'inflammation-02.csv',
 'inflammation-01.csv']
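
To see how `*` and `?` differ without needing any files on disk, here is a minimal sketch using the standard library's `fnmatch` module, which applies the same shell-style wildcard rules to plain strings. The list of names is made up for illustration; only the first two resemble the real dataset.

```python
import fnmatch

# Hypothetical names, chosen only to illustrate the two wildcards
names = [
    "inflammation-01.csv",
    "inflammation-12.csv",
    "inflammation-1.csv",
    "small-01.csv",
]

# `*` matches zero or more characters
star_matches = [n for n in names if fnmatch.fnmatchcase(n, "inflammation-*.csv")]

# each `?` matches exactly one character, so `??` needs exactly two
two_char_matches = [n for n in names if fnmatch.fnmatchcase(n, "inflammation-??.csv")]

print(star_matches)
print(two_char_matches)
```

The `*` pattern accepts `inflammation-1.csv`, while the `??` pattern rejects it because only one character sits between the hyphen and `.csv`.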

As this example shows,
`glob.glob`'s result is a list of file and directory paths.
That list comes back in arbitrary order,
which is why we sorted it above (here in reverse, with `files.sort(reverse=True)`).
Because the result is a list, we can loop over it
to do something with each filename in turn.
In our case,
the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
If we wanted to analyze just the first three files in alphabetical order, we could use the
`sorted` built-in function to generate a new sorted list from the `glob.glob` output and slice it.
Here, we process every file in `files`.
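
A minimal sketch of the `sorted`-and-slice approach, assuming a hard-coded list of names in place of real `glob.glob` output so that it runs without the data files (`unsorted_names` and `first_three` are illustrative names, not part of the lesson):

```python
# Hypothetical stand-in for glob.glob output, deliberately out of order
unsorted_names = [
    "inflammation-03.csv",
    "inflammation-11.csv",
    "inflammation-01.csv",
    "inflammation-02.csv",
]

# sorted() returns a new list and leaves the original untouched;
# the slice keeps only the first three names
first_three = sorted(unsorted_names)[0:3]

print(first_three)
```

Unlike `list.sort`, which reorders the list in place, `sorted` leaves `unsorted_names` as it was.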

In [8]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

In [10]:
for f in files:
    print(f)

    # Load the CSV file into a 2D NumPy array
    data = np.loadtxt(f, delimiter=',')

    # One figure per file, with three side-by-side panels
    fig = plt.figure(figsize=(9, 3))

    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)

    # Each statistic is computed down the rows (axis=0),
    # giving one value per column
    ax1.set_ylabel('average')
    ax1.plot(np.mean(data, axis=0))

    ax2.set_ylabel('max')
    ax2.plot(np.max(data, axis=0))

    ax3.set_ylabel('min')
    ax3.plot(np.min(data, axis=0))

    fig.tight_layout()

plt.show()

inflammation-12.csv
[figure: average, max, and min plots]

inflammation-11.csv
[figure: average, max, and min plots]

inflammation-10.csv
[figure: average, max, and min plots]

inflammation-09.csv
[figure: average, max, and min plots]

inflammation-08.csv
[figure: average, max, and min plots]

inflammation-07.csv
[figure: average, max, and min plots]

inflammation-06.csv
[figure: average, max, and min plots]

inflammation-05.csv
[figure: average, max, and min plots]

inflammation-04.csv
[figure: average, max, and min plots]

inflammation-03.csv
[figure: average, max, and min plots]

inflammation-02.csv
[figure: average, max, and min plots]

inflammation-01.csv
[figure: average, max, and min plots]

Sure enough,
the maxima of the first two datasets in alphabetical order (`inflammation-01.csv` and `inflammation-02.csv`) show exactly the same ramp,
and their minima show the same staircase structure.
Because `files` was sorted in reverse, these are the last two figures produced by the loop.
The third dataset (`inflammation-03.csv`) is different:
its maxima are a bit less regular, but its minima are consistently zero.


<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="fa fa-pencil"></span> Challenge: Plotting Differences</h2>
</div>


<div class="panel-body">

<p>Plot the difference between the average of the first dataset
and the average of the second dataset,
i.e., the difference between the leftmost plots of the first two figures.</p>

</div>

</section>



<section class="solution panel panel-primary">
<div class="panel-heading">
<h2><span class="fa fa-eye"></span> Solution</h2>
</div>

</section>
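
One possible approach, sketched under assumptions: the helper name `average_difference` and the two tiny arrays are made up for illustration, standing in for two files loaded with `np.loadtxt`. The idea is to take the column-wise mean of each dataset, subtract one from the other, and plot the result.

```python
import numpy as np
import matplotlib.pyplot as plt


def average_difference(data0, data1):
    """Return the column-wise mean of data0 minus the column-wise mean of data1."""
    return np.mean(data0, axis=0) - np.mean(data1, axis=0)


# Made-up arrays standing in for two loaded files (not real inflammation data)
data0 = np.array([[0.0, 2.0, 4.0],
                  [2.0, 4.0, 6.0]])
data1 = np.array([[1.0, 1.0, 1.0],
                  [1.0, 3.0, 5.0]])

diff = average_difference(data0, data1)

# Single-panel figure showing the difference between the two averages
fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel('difference in average')
ax.plot(diff)
fig.tight_layout()
```

With the real data, `data0` and `data1` would come from calling `np.loadtxt(..., delimiter=',')` on the two files of interest.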



<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="fa fa-pencil"></span> Challenge: Generate Composite Statistics</h2>
</div>


<div class="panel-body">

<p>Use each of the files once to generate a dataset containing values averaged over all patients.</p>

</div>

</section>


Then use pyplot to generate average, max, and min for all patients.


<section class="solution panel panel-primary">
<div class="panel-heading">
<h2><span class="fa fa-eye"></span> Solution</h2>
</div>

</section>
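
A minimal sketch of one way to build the composite, assuming every file holds an array of the same shape. The helper name `composite_average` and the small arrays are hypothetical; with the real data, the list would be built by calling `np.loadtxt` on each name in `files`.

```python
import numpy as np


def composite_average(datasets):
    """Average a list of equally shaped 2D arrays element by element."""
    composite = np.zeros_like(datasets[0], dtype=float)
    for data in datasets:
        # Sum each dataset into the running total
        composite += data
    # Divide by the number of datasets to get the average
    return composite / len(datasets)


# Made-up arrays standing in for the loaded files (not real inflammation data)
datasets = [
    np.array([[0.0, 2.0], [4.0, 6.0]]),
    np.array([[2.0, 4.0], [6.0, 8.0]]),
]

composite = composite_average(datasets)

# The same three statistics plotted in the loop earlier
composite_mean = np.mean(composite, axis=0)
composite_max = np.max(composite, axis=0)
composite_min = np.min(composite, axis=0)
```

The three resulting arrays can be passed to `ax1.plot`, `ax2.plot`, and `ax3.plot` in the same way as the per-file statistics in the loop.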


---
The material in this notebook is derived from the Software Carpentry lessons
&copy; [Software Carpentry](http://software-carpentry.org/) under the terms
of the [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.