## Creating Functions

If we only had one data set to analyze,
it would probably be faster to load the file into a spreadsheet
and use that to plot some simple statistics.
But we have twelve files to check,
and may have more in future.
In this lesson,
we'll learn how to write a function
so that we can repeat several operations with a single command.

#### Objectives

*   Define a function that takes parameters.
*   Return a value from a function.
*   Test and debug a function.
*   Explain what a call stack is, and trace changes to the call stack as functions are called.
*   Set default values for function parameters.
*   Explain why we should divide programs into small, single-purpose functions.

### Defining a Function

Let's start by defining a function `fahr_to_kelvin` that converts temperatures from Fahrenheit to Kelvin.

The definition opens with the word `def`,
which is followed by the name of the function
and a parenthesized list of parameter names.
The [body](./gloss.html#function-body) of the function&mdash;the
statements that are executed when it runs&mdash;is indented below the definition line,
typically by four spaces.

When we call the function,
the values we pass to it are assigned to those variables
so that we can use them inside the function.
Inside the function,
we use a [return statement](./gloss.html#return-statement) to send a result back to whoever asked for it.


By the way, in case you forgot the formula, here it is: $T_{(K)} = (T_{(°F)}-32) \times (5/9) + 273.15$
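A minimal sketch of such a definition follows. The parameter name `temp` matches the name this lesson refers to later; the body is just one way to write the formula above.

```python
def fahr_to_kelvin(temp):
    # Shift to the freezing point, rescale the degree size, then add the Kelvin offset.
    return ((temp - 32) * (5 / 9)) + 273.15
```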

Let's try running our function.
Calling our own function is no different from calling any other function:
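For example, with the definition repeated so the snippet runs on its own (the printed labels and the variable names `freezing` and `boiling` are only illustrative):

```python
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5 / 9)) + 273.15

# Call it like any built-in function: the name, then the arguments in parentheses.
freezing = fahr_to_kelvin(32)
boiling = fahr_to_kelvin(212)
print('freezing point of water:', freezing)
print('boiling point of water:', boiling)
```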

It works! (By the way, this simple program would have had a bug in Python 2, but we're using Python 3, so no worries. If you keep programming you'll have lots more opportunities to encounter bugs.)
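The Python 2 problem most likely comes from how `/` treated two integers there: it threw away the fractional part. Assuming the function writes the ratio as `5/9`, Python 3's floor-division operator `//` lets us see the old result side by side with the new one:

```python
true_division = 5 / 9     # Python 3: a float between 0.55 and 0.56
floor_division = 5 // 9   # 0, which is what Python 2 produced for 5/9

print(true_division, floor_division)
```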

### Composing Functions

Now that we've seen how to turn Fahrenheit into Kelvin,
it's easy to turn Kelvin into Celsius:
just subtract 273.15 from the temperature in Kelvin.
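One way to write it, assuming the name `kelvin_to_celsius` and the parameter name `temp`:

```python
def kelvin_to_celsius(temp):
    # Celsius and Kelvin degrees are the same size; only the zero point differs.
    return temp - 273.15

print('absolute zero in Celsius:', kelvin_to_celsius(0.0))
```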

What about converting Fahrenheit to Celsius?
We could write out the formula,
but we don't need to.
Instead,
we can [compose](./gloss.html#function-composition) the two functions we have already created:
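A sketch of that composition, with both building blocks repeated so the snippet is self-contained (the intermediate name `temp_k` is an assumption):

```python
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5 / 9)) + 273.15

def kelvin_to_celsius(temp):
    return temp - 273.15

def fahr_to_celsius(temp):
    # Reuse the two existing conversions instead of writing a third formula.
    temp_k = fahr_to_kelvin(temp)
    return kelvin_to_celsius(temp_k)

print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
```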

This is our first taste of how larger programs are built:
we define basic operations,
then combine them in ever-larger chunks to get the effect we want.
Real-life functions will usually be larger than the ones shown here&mdash;typically half a dozen to a few dozen lines&mdash;but
they shouldn't ever be much longer than that,
or the next person who reads them won't be able to understand what's going on.

#### Challenges

1.  "Adding" two strings produces their concatenation:
    `'a' + 'b'` is `'ab'`.
    Write a function called `fence` that takes two parameters called `original` and `wrapper`
    and returns a new string that has the wrapper character at the beginning and end of the original:

    ~~~python
    print(fence('name', '*'))
    *name*
    ~~~

2.  If the variable `s` refers to a string,
    then `s[0]` is the string's first character
    and `s[-1]` is its last.
    Write a function called `outer`
    that returns a string made up of just the first and last characters of its input:

    ~~~python
    print(outer('helium'))
    hm
    ~~~

### Some Thoughts on Encapsulation

[Encapsulation](./gloss.html#encapsulation)
is the key to writing correct, comprehensible programs.
A function's job is to turn several operations into one
so that we can think about a single function call
instead of a dozen or a hundred statements
each time we want to do something.
That only works if functions don't interfere with each other;
if they do,
we have to pay attention to the details once again,
which quickly overloads our short-term memory.

Let's take a closer look at what happens when we call `fahr_to_celsius(32.0)`.
To make things clearer,
we'll start by putting the initial value 32.0 in a variable
and store the final result in one as well:
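The setup might look like the following. The variable names `original`, `final`, `temp_k`, and `result` are assumptions made for this sketch:

```python
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5 / 9)) + 273.15

def kelvin_to_celsius(temp):
    return temp - 273.15

def fahr_to_celsius(temp):
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celsius(temp_k)
    return result

original = 32.0                     # top level: no `temp` exists out here
final = fahr_to_celsius(original)   # each call gets its own private `temp`
print('final value:', final)
```

While `fahr_to_celsius` is running, its `temp` and `temp_k` live in that call's stack frame. The calls to `fahr_to_kelvin` and `kelvin_to_celsius` each get a separate frame with a `temp` of their own.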

If we try to get the value of `temp` after our functions have finished running,
Python tells us that there's no such thing.  Go ahead, try it.
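Here is one way to try it without stopping the program. It uses a one-line stand-in for `fahr_to_celsius` and catches the error so we can print it:

```python
def fahr_to_celsius(temp):
    # temp is a parameter, so it exists only inside a call to this function.
    return (temp - 32) * (5 / 9)

final = fahr_to_celsius(32.0)

try:
    print('temp after the call:', temp)
    temp_is_defined = True
except NameError as error:
    temp_is_defined = False
    print('NameError:', error)
```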

Why go to all this trouble?
Well, now let's consider a function called `span` that calculates the difference between
the minimum and maximum values in an array.
You can write this using the functions we learned about in the previous notebook.
Rather than returning the difference between the `max` and `min` directly,
for the sake of argument, assign it to a variable called `diff` and return that.
And don't forget to import the relevant library for working with arrays!
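If you get stuck, here is one possible version. The parameter name `a` and the tiny `sample` array are assumptions made for illustration:

```python
import numpy

def span(a):
    # diff is a local variable: it exists only while span is running.
    diff = numpy.max(a) - numpy.min(a)
    return diff

sample = numpy.array([[1.0, 5.0], [3.0, 9.0]])   # made-up stand-in data
print('span of sample:', span(sample))
```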

Once you've defined the function, load up our good old `inflammation-01.csv` data set and test it out.

But what would have happened if we decided, just to be annoying, to load the data file into a variable called `diff`?  Try it that way.

No problem!  Did you think you'd see an error here?

This `diff` doesn't refer to the same thing as the one inside `span`.
Of course, we didn't *need* to use the name `diff` here, but the point is that we could get away with it.  The moral of the story is that functions shield their contents from the outside world.
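The same experiment can be sketched with a small made-up array standing in for the contents of the CSV file:

```python
import numpy

def span(a):
    diff = numpy.max(a) - numpy.min(a)   # local diff, private to this call
    return diff

# Top-level diff: an array, standing in for the loaded data set.
diff = numpy.array([[0.0, 2.0], [4.0, 20.0]])
result = span(diff)

print('span of data:', result)
print('top-level diff is still an array:', isinstance(diff, numpy.ndarray))
```

The assignment to `diff` inside `span` creates a new name in the function's own stack frame, so the array we stored at the top level is left alone.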

#### Challenges

1.  We previously wrote functions called `fence` and `outer`.
    Draw a diagram showing how the call stack changes when we run the following:
    
    ~~~python
    print(outer(fence('carbon', '+')))
    ~~~

### Testing and Documenting

Once we start putting things in functions so that we can re-use them,
we need to start testing that those functions are working correctly.
To see how to do this,
let's write a function to center a dataset around a particular value.
Thanks to operator overloading, you can do arithmetic that combines arrays and numbers and it will just work.  Hint: subtract the mean of the data to center around zero, then add the desired value to center around that.
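Following the hint, a minimal version could look like this (the parameter names `data` and `desired` are the ones used later in this lesson):

```python
import numpy

def center(data, desired):
    # Subtracting the mean centers the data on 0; adding desired shifts it.
    return (data - numpy.mean(data)) + desired
```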

We could test this on some of our actual data,
but since we don't know what the values ought to be,
it will be hard to tell if the result was correct.
Instead,
let's use NumPy to create a matrix of 0's
and then center that around 3.
NumPy has a function called `zeros` that will work for this.
Here we pass it a two-tuple that specifies the dimensions of the matrix to fill with zeros.
Did you know you can also spell "zeros" as "zeroes"?
I can't promise you Python would like the alternative spelling, however.
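As a sketch, with `center` repeated so the snippet runs by itself (the names `z` and `centered` are illustrative):

```python
import numpy

def center(data, desired):
    return (data - numpy.mean(data)) + desired

z = numpy.zeros((2, 2))    # 2x2 matrix filled with 0.0
centered = center(z, 3)    # every element should now equal 3.0
print(centered)
```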

That looks right,
so let's try `center` to center our data around `0`:

It's hard to tell from the default output whether the result is correct,
but there are a few simple tests that could reassure us.  Let's compare the `min`, `mean`, and `max` of the original data and the new centered data.
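With the real data replaced by a small made-up array (so the numbers will not match the inflammation values), the comparison might look like this:

```python
import numpy

def center(data, desired):
    return (data - numpy.mean(data)) + desired

data = numpy.array([[0.0, 3.0, 6.0], [1.0, 8.0, 18.0]])   # stand-in data
centered = center(data, 0)

print('original min, mean, and max are:', data.min(), data.mean(), data.max())
print('centered min, mean, and max are:', centered.min(), centered.mean(), centered.max())
```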

It looks like the original `mean` was about `6.1`, while the original `min` was `0`.
So it makes sense that the centered `min` is about `-6.1`, the negative of the original `mean`.
The mean of the centered data isn't *quite* zero&mdash;we'll explore why not in the challenges&mdash;but it is really close.
We can take the check one step further and see that the standard deviation hasn't changed:

Those values look the same,
but we probably wouldn't notice if they were different in the sixth decimal place.
Let's print their difference instead:
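Again using made-up stand-in data rather than the real file, a sketch of that check:

```python
import numpy

def center(data, desired):
    return (data - numpy.mean(data)) + desired

data = numpy.array([[0.0, 3.0, 6.0], [1.0, 8.0, 18.0]])   # stand-in data
centered = center(data, 0)

# Shifting every value by the same amount should not change the spread.
std_difference = data.std() - centered.std()
print('difference in standard deviations before and after:', std_difference)
```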

Again,
the difference is very small, if not exactly zero :-)

Perhaps our function is wrong,
but it seems unlikely, so we should probably get back to doing our analysis.

We have one more task first, though:
we should write some [documentation](./gloss.html#documentation) for our function
to remind ourselves later what it's for and how to use it.

One way to put documentation in software is to add [comments](./gloss.html#comment).  In Python, comments begin with a `#` symbol (sometimes referred to as an "octothorpe", commonly known as a "hash").  Try redefining the function with some commented documentation now.
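One possible commented version (the wording of the comment is only a suggestion):

```python
import numpy

# center(data, desired): return a new array containing the original data
# centered around the desired value.
def center(data, desired):
    return (data - numpy.mean(data)) + desired
```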

There's a better way, though.
If the first "thing" in a function is a string that isn't assigned to a variable,
that string is attached to the function as its documentation.  Try defining the function that way.

This is better because we can now ask Python's built-in help system to show us the documentation for the function:
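A sketch of a version with a one-line documentation string, followed by the call to `help` (the wording of the string is an assumption):

```python
import numpy

def center(data, desired):
    '''Return a new array containing the original data centered around the desired value.'''
    return (data - numpy.mean(data)) + desired

help(center)
```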

A string like the one we've created is called a [docstring](./gloss.html#docstring).
If we write the string with **triple quotes**, then we're allowed to have it span multiple lines.  Try that now, and include an example in the docstring.  Call the `help` function again to inspect the results of your efforts.
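For instance, a multi-line docstring with an example might look like this (the example line is illustrative, not a required format):

```python
import numpy

def center(data, desired):
    '''Return a new array containing the original data centered around the desired value.

    Example: center([1, 2, 3], 0) => [-1, 0, 1]
    '''
    return (data - numpy.mean(data)) + desired

help(center)
```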

#### Challenges

1.  Write a function called `analyze` that takes a filename as a parameter
    and displays the three graphs produced in the [previous lesson](01-numpy.ipynb),
    i.e.,
    `analyze('inflammation-01.csv')` should produce the graphs already shown,
    while `analyze('inflammation-02.csv')` should produce corresponding graphs for the second data set.
    Be sure to give your function a docstring.

2.  Write a function `rescale` that takes an array as input
    and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0.
    (If $L$ and $H$ are the lowest and highest values in the original array,
    then the replacement for a value $v$ should be $(v-L) / (H-L)$.)
    Be sure to give the function a docstring.

3.  Run the commands `help(numpy.arange)` and `help(numpy.linspace)`
    to see how to use these functions to generate regularly-spaced values,
    then use those values to test your `rescale` function.

### Defining Defaults

We have passed parameters to functions in two ways:
directly, as in `span(data)`,
and by name, as in `numpy.loadtxt(fname='something.csv', delimiter=',')`.
In fact, if you haven't noticed this already, you'll be interested to know that
we can pass the filename to `loadtxt` without the `fname=`:

but we still need to specify the `delimiter=`, or we get an error.  Go ahead and try that.
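Since the data file may not be at hand, this sketch feeds `loadtxt` a tiny made-up CSV through `io.StringIO`, which it accepts in place of a filename. It shows the positional form working, then one way to provoke an error: leaving the delimiter out altogether.

```python
import io
import numpy

csv_text = '0,1,2\n3,4,5\n'   # made-up CSV content standing in for the real file

# File passed by position, delimiter passed by name.
data = numpy.loadtxt(io.StringIO(csv_text), delimiter=',')
print(data)

# Without a delimiter, loadtxt splits on whitespace, so '0,1,2' is treated
# as a single value that cannot be converted to a number.
try:
    numpy.loadtxt(io.StringIO(csv_text))
    failed = False
except ValueError as error:
    failed = True
    print('ValueError:', error)
```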

To understand what's going on,
and make our own functions easier to use,
let's re-define our `center` function so that the second parameter, `desired`, has a default value of 0.0.
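A sketch of the re-definition, keeping the docstring wording assumed earlier:

```python
import numpy

def center(data, desired=0.0):
    '''Return a new array containing the original data centered around the desired value (0 by default).

    Example: center([1, 2, 3], 0) => [-1, 0, 1]
    '''
    return (data - numpy.mean(data)) + desired
```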

Now, if we call the function with two arguments,
it works as it did before:

But we can also now call it with just one argument,
in which case `desired` is automatically assigned the [default value](./gloss.html#default-parameter-value) of 0.0:
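Both kinds of call can be sketched on made-up test data (the names `test_data` and `more_data` are illustrative):

```python
import numpy

def center(data, desired=0.0):
    return (data - numpy.mean(data)) + desired

test_data = numpy.zeros((2, 2))
with_value = center(test_data, 3)       # two arguments: desired is 3
print(with_value)

more_data = 5 + numpy.zeros((2, 2))     # every element is 5.0
with_default = center(more_data)        # one argument: desired falls back to 0.0
print(with_default)
```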

This is handy:
if we usually want a function to work one way,
but occasionally need it to do something else,
we can allow people to pass a parameter when they need to
but provide a default to make the normal case easier.
Can you create some test data that will show how the default behavior works?

The parameters are matched up from left to right,
and any that haven't been given a value explicitly get their default value.
Now override the default behavior by specifying a particular value when you call the function.
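To make the left-to-right matching visible, here is a hypothetical helper called `display` (not part of the lesson's analysis code) that simply echoes its parameters:

```python
def display(a=1, b=2, c=3):
    # Hypothetical helper: prints and returns its parameters unchanged.
    print('a:', a, 'b:', b, 'c:', c)
    return (a, b, c)

no_args = display()          # every parameter takes its default
one_arg = display(55)        # 55 goes to a, the leftmost parameter
two_args = display(55, 66)   # 55 goes to a, 66 goes to b
only_c = display(c=77)       # naming c leaves a and b at their defaults
```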

With these concepts in hand,
let's look at the help for `numpy.loadtxt`:

There's a lot of information here,
but the most important part is the first couple of lines:

~~~python
loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None,
        unpack=False, ndmin=0)
~~~

This tells us that `loadtxt` has one parameter called `fname` that doesn't have a default value,
and eight others that do.
If we call the function like this:

~~~python
numpy.loadtxt('inflammation-01.csv', ',')
~~~

then the filename is assigned to `fname` (which is what we want),
but the delimiter string `','` is assigned to `dtype` rather than `delimiter`,
because `dtype` is the second parameter in the list.
That's why we don't have to provide `fname=` for the filename,
but *do* have to provide `delimiter=` for the second parameter.
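The same matching can be seen without touching NumPy. The function `fake_loadtxt` below is a hypothetical stand-in that copies only the first few parameters from the signature above and reports which one received which argument:

```python
def fake_loadtxt(fname, dtype=float, comments='#', delimiter=None):
    # Hypothetical stand-in for illustration only: it loads nothing.
    return {'fname': fname, 'dtype': dtype, 'delimiter': delimiter}

by_position = fake_loadtxt('inflammation-01.csv', ',')
by_name = fake_loadtxt('inflammation-01.csv', delimiter=',')

print('positional:', by_position)   # ',' lands in dtype
print('by name:   ', by_name)       # ',' lands in delimiter
```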

### Debugging a Program

Perhaps you're sad that we didn't get to see the Python 2 bug that was mentioned way back at the beginning of this file.  Well, to make up for that, here's another program with a bug in it.

```python
def foo(bar=[]):           # bar is optional and defaults to [] if not specified
    bar.append("baz")
    return bar
```

How do you expect the function to work?  Do you want to give it a try?  Go ahead and enter the definition here, and then test it out a bit.

Did that go as expected?  Will it work the same way every time?  Go on, give it another try.
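If you would like to check your observations, this sketch calls the function twice and copies each result so that later calls cannot alter what we print (the names `first` and `second` are illustrative):

```python
def foo(bar=[]):
    bar.append("baz")
    return bar

first = list(foo())    # snapshot of the result of the first call
second = list(foo())   # snapshot of the result of the second call

print(first)
print(second)
```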

Well, it sort of works the same way every time, but this probably isn't what you were expecting.  Can you fix it?  Hint: the default value for a function argument is only evaluated once, at the time that the function is defined.  Another hint: try using the default setting `bar=None` this time.  You will need an `if` statement.  But if you're not familiar with `if`, don't worry, we will have more practice with `if` statements in another notebook later on, and you can come back and finish this exercise later.
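One possible fix along the lines of the hints, offered as a sketch rather than the only answer:

```python
def foo(bar=None):
    # None acts as a marker for "no list given"; a fresh list is
    # created on every call that omits bar.
    if bar is None:
        bar = []
    bar.append("baz")
    return bar

print(foo())
print(foo())
```

Because the empty list is now created inside the body, each call that relies on the default starts from a new list.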

#### Challenges

1.  Rewrite the `rescale` function so that it scales data to lie between 0.0 and 1.0 by default,
    but will allow the caller to specify lower and upper bounds if they want.
    Compare your implementation to your neighbor's:
    do the two functions always behave the same way?

#### Key Points

*   Define a function using `def name(...params...)`.
*   The body of a function must be indented.
*   Call a function using `name(...values...)`.
*   Functions can be composed.
*   Here's the technical idea behind encapsulation: each time a function is called, a new stack frame is created on the [call stack](./gloss.html#call-stack) to hold its parameters and local variables.
*   Python then looks for variables in the current stack frame before looking for them at the top level.
*   Use `help(thing)` to view help for something.
*   Put docstrings in functions to provide help for that function.
*   Specify default values for parameters when defining a function using `name=value` in the parameter list.
*   Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
*   Be careful with defaults, keeping in mind that the default value for a function argument is only evaluated once, at the time that the function is defined.

#### Next Steps

We now have a function called `analyze` to visualize a single data set.
We could use it to explore all 12 of our current data sets like this:

~~~python
analyze('inflammation-01.csv')
analyze('inflammation-02.csv')
...
analyze('inflammation-12.csv')
~~~

but the chances of us typing all 12 filenames correctly aren't great,
and we'll be even worse off if we get another hundred files.
What we need is a way to tell Python to do something once for each file,
and that will be the subject of the next lesson.