# Slack

https://openriskgroup.slack.com

# Online tutorials

https://www.codecademy.com/learn/python

http://pythontutor.ru/lessons/inout_and_arithmetic_operations/

## Functions

Functions are the primary and most important method of code organization and reuse
in Python. There may not be such a thing as having too many functions. In fact, I would
argue that most programmers doing data analysis don’t write enough functions! As you
have likely inferred from prior examples, functions are declared using the def keyword
and returned from using the return keyword:

In [None]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

There is no issue with having multiple return statements. If the end of a function is
reached without encountering a return statement, None is returned.

Each function can have some number of positional arguments and some number of
keyword arguments. Keyword arguments are most commonly used to specify default
values or optional arguments. In the above function, x and y are positional arguments
while z is a keyword argument. This means that the function can be called in either of
these ways:

In [None]:
my_function(5, 6, z=0.7)
my_function(3.14, 7, 3.5)

The main restriction on function arguments is that the keyword arguments must follow
the positional arguments (if any). You can specify keyword arguments in any order;
this frees you from having to remember the order in which the function arguments were
specified. You only need to remember their names.
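
As a quick check of the two points above, here is a minimal sketch that calls my_function with its arguments named in different orders and shows the implicit None return. The helper no_return is a made-up name used only for illustration.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# Keyword arguments may appear in any order after the positional ones.
first = my_function(5, 6, z=0.7)
second = my_function(x=5, y=6, z=0.7)
third = my_function(z=0.7, y=6, x=5)
print(first == second == third)

def no_return():
    # hypothetical helper: reaches the end without a return statement
    pass

print(no_return() is None)
```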

### Namespaces, scope, and local functions

Functions can access variables in two different scopes: global and local. An alternate
and more descriptive name describing a variable scope in Python is a namespace. Any
variables that are assigned within a function by default are assigned to the local
namespace. The local namespace is created when the function is called and immediately
populated by the function’s arguments. After the function is finished, the local
namespace is destroyed (with some exceptions, see section on closures below). Consider the
following function:

In [None]:
def func():
    a = []
    for i in range(5):
        a.append(i)

Upon calling **func()**, the empty list **a** is created, 5 elements are appended, then **a** is
destroyed when the function exits. Suppose instead we had declared **a** as follows:

In [None]:
a = []
def func():
    for i in range(5):
        a.append(i)
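
To see what changes, here is a small sketch that actually calls this second version of func. Because a is only mutated inside the function and never assigned, Python looks it up in the global namespace and no global declaration is needed.

```python
a = []

def func():
    # a is not assigned here, only mutated, so the global list is used
    for i in range(5):
        a.append(i)

func()
print(a)  # [0, 1, 2, 3, 4]
```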


Assigning global variables within a function is possible, but those variables must be
declared as global using the global keyword:

In [None]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

I generally discourage people from using the global keyword frequently.
Typically global variables are used to store some kind of state in a system.
If you find yourself using a lot of them, it’s probably a sign that
some object-oriented programming (using classes) is in order.

Functions can be declared anywhere, and there is no problem with having local
functions that are dynamically created when a function is called:

In [None]:
def outer_function(x, y, z):
    def inner_function(a, b, c):
        pass
    pass

In the above code, the **inner_function** will not exist until **outer_function** is called. As
soon as **outer_function** is done executing, the **inner_function** is destroyed.

Nested inner functions can access the local namespace of the enclosing function, but
they cannot bind new variables in it. I’ll talk a bit more about this in the section on
closures.

In a strict sense, all functions are local to some scope; that scope may just be the
module-level scope.
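
A minimal sketch of that namespace access, reusing the outer_function and inner_function names from above with bodies I have filled in purely for illustration:

```python
def outer_function(x, y, z):
    def inner_function(a, b, c):
        # x, y and z are read from the enclosing function's namespace
        return (a + x, b + y, c + z)
    return inner_function(1, 2, 3)

print(outer_function(10, 20, 30))  # (11, 22, 33)
```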

### Returning multiple values

When I first programmed in Python after having programmed in Java and C++, one of
my favorite features was the ability to return multiple values from a function. Here’s a
simple example:

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

In data analysis and other scientific applications, you will likely find yourself doing this
very often as many functions may have multiple outputs, whether those are data
structures or other auxiliary data computed inside the function. If you think about tuple
packing and unpacking from earlier in this chapter, you may realize that what’s
happening here is that the function is actually just returning one object, namely a tuple,
which is then being unpacked into the result variables. In the above example, we could
have done instead:

In [None]:
return_value = f()

In this case, return_value would be, as you may guess, a 3-tuple with the three returned
variables. A potentially attractive alternative to returning multiple values like above
might be to return a dict instead:

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}
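
One possible attraction of the dict version is that callers pick results by name rather than by position. A small sketch, where the variable name result is my own choice:

```python
def f():
    a = 5
    b = 6
    c = 7
    return {'a': a, 'b': b, 'c': c}

result = f()
# values are looked up by key, so the order of the outputs no longer matters
print(result['b'])  # 6
```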

### Functions are objects

Since Python functions are objects, many constructs can be easily expressed that are
difficult to do in other languages. Suppose we were doing some data cleaning and
needed to apply a bunch of transformations to the following list of strings:

In [None]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

Anyone who has ever worked with user-submitted survey data can expect messy results
like these. Lots of things need to happen to make this list of strings uniform and ready
for analysis: whitespace stripping, removing punctuation symbols, and proper
capitalization. As a first pass, we might write some code like:

In [None]:
import re  # Regular expression module

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value) # remove punctuation
        value = value.title()
        result.append(value)
    return result

The result looks like this:

In [None]:
In [15]: clean_strings(states)
Out[15]:
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

An alternate approach that you may find useful is to make a list of the operations you
want to apply to a particular set of strings:

In [None]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

Then we have

In [None]:
In [22]: clean_strings(states, clean_ops)
Out[22]:
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

A more *functional* pattern like this enables you to easily modify how the strings are
transformed at a very high level. The **clean_strings** function is also now more reusable!
You can naturally use functions as arguments to other functions like the built-in map
function, which applies a function to a collection of some kind:

In [None]:
In [23]: list(map(remove_punctuation, states))
Out[23]:
['   Alabama ',
 'Georgia',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south   carolina',
 'West virginia']

### Anonymous (lambda) functions

Python has support for so-called anonymous or lambda functions, which are really just
simple functions consisting of a single expression, the result of which is the return value.
They are defined using the lambda keyword, which has no meaning other than “we are
declaring an anonymous function.”

In [None]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

I usually refer to these as lambda functions in the rest of the book. They are especially
convenient in data analysis because, as you’ll see, there are many cases where data
transformation functions will take functions as arguments. It’s often less typing (and
clearer) to pass a lambda function as opposed to writing a full-out function declaration
or even assigning the lambda function to a local variable. Consider this
silly example:

In [None]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

You could also have written __[x * 2 for x in ints]__, but here we were able to succinctly
pass a custom operator to the __apply_to_list__ function.

As another example, suppose you wanted to sort a collection of strings by the number
of distinct letters in each string:

In [None]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

Here we could pass a lambda function to the list’s **sort** method:

In [None]:
strings.sort(key=lambda x: len(set(x)))
strings
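
To make the sort key visible, this sketch first computes the number of distinct letters for each string and then sorts. The name distinct_counts is introduced here only for illustration.

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

# set(x) keeps only the distinct letters of each string
distinct_counts = [len(set(x)) for x in strings]
print(distinct_counts)  # [2, 4, 3, 1, 2]

# the sort is stable, so 'foo' stays ahead of 'abab' (both have 2)
strings.sort(key=lambda x: len(set(x)))
print(strings)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
```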

One reason lambda functions are called anonymous functions is that,
unlike functions declared with def, the function object itself is never
given an explicit name; its __name__ attribute is just the placeholder '<lambda>'.

### Closures: functions that return functions

Closures are nothing to fear. They can actually be a very useful and powerful tool in
the right circumstance! In a nutshell, a closure is any dynamically-generated function
returned by another function. The key property is that the returned function has access
to the variables in the local namespace where it was created. Here is a very simple
example:

In [None]:
def make_closure(a):
    def closure():
        print('I know the secret: %d' % a)
    return closure

closure = make_closure(5)

In [None]:
closure()

The difference between a closure and a regular Python function is that the closure
continues to have access to the namespace (the function) where it was created, even
though that function is done executing. So in the above case, the returned closure will
always print **I know the secret: 5** whenever you call it. While it’s common to create
closures whose internal state (in this example, only the value of a) is static, you can just
as easily have a mutable object like a dict, set, or list that can be modified. For example,
here’s a function that returns a function that keeps track of arguments it has been called
with:

In [None]:
def make_watcher():
    have_seen = {}

    def has_been_seen(x):
        if x in have_seen:
            return True
        else:
            have_seen[x] = True
            return False

    return has_been_seen

Using this on a sequence of integers I obtain:

In [None]:
watcher = make_watcher()
vals = [5, 6, 1, 5, 1, 6, 3, 5]
[watcher(x) for x in vals]

However, one technical limitation to keep in mind is that while you can mutate any
internal state objects (like adding key-value pairs to a dict), a plain assignment inside
the inner function cannot rebind variables in the enclosing function scope; it creates a
new local variable instead. One way to work around this is to modify a dict or list
rather than binding variables:

In [None]:
def make_counter():
    count = [0]
    def counter():
        # increment and return the current count
        count[0] += 1
        return count[0]
    return counter

counter = make_counter()
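
If you are on Python 3, the nonlocal keyword offers another route. The sketch below is my own variant of make_counter, not the version above, and it will not run on Python 2.

```python
def make_counter():
    count = 0
    def counter():
        # nonlocal (Python 3 only) makes the assignment rebind the
        # enclosing count instead of creating a new local variable
        nonlocal count
        count += 1
        return count
    return counter

counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```

Each call to make_counter creates a fresh count, so separate counters do not interfere with one another.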

You might be wondering why this is useful. In practice, you can write very general
functions with lots of options, then fabricate simpler, more specialized functions.
Here’s an example of creating a string formatting function:

In [None]:
def format_and_pad(template, space):
    def formatter(x):
        return (template % x).rjust(space)

    return formatter

You could then create a floating point formatter that always returns a length-15 string
like so:

In [None]:
fmt = format_and_pad('%.4f', 15)
fmt(1.756)

If you learn more about object-oriented programming in Python, you might observe
that these patterns also could be implemented (albeit more verbosely) using classes.
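
For comparison, here is a rough class-based counterpart of format_and_pad. The class name Formatter is hypothetical, and this is only one of several ways it could be written.

```python
class Formatter(object):
    """Hypothetical class-based counterpart of format_and_pad."""

    def __init__(self, template, space):
        # the closure's captured variables become instance attributes
        self.template = template
        self.space = space

    def __call__(self, x):
        # same work as the inner formatter function in the closure version
        return (self.template % x).rjust(self.space)

fmt = Formatter('%.4f', 15)
print(repr(fmt(1.756)))  # '         1.7560'
```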

### Extended call syntax with *args, **kwargs

The way that function arguments work under the hood in Python is actually very
simple. When you write **func(a, b, c, d=some, e=value)**, the positional and keyword
arguments are actually packed up into a tuple and dict, respectively. So the internal
function receives a tuple args and dict kwargs and internally does the equivalent of:

In [None]:
a, b, c = args
d = kwargs.get('d', d_default_value)
e = kwargs.get('e', e_default_value)

This all happens nicely behind the scenes. Of course, it also does some error checking
and allows you to specify some of the positional arguments as keywords also (even if
they aren’t keyword in the function declaration!).

In [None]:
def say_hello_then_call_f(f, *args, **kwargs):
    print('args is', args)
    print('kwargs is', kwargs)
    print("Hello! Now I'm going to call %s" % f)
    return f(*args, **kwargs)

def g(x, y, z=1):
    return (x + y) / z

Then if we call __g__ with __say_hello_then_call_f__ we get:

In [None]:
say_hello_then_call_f(g, 1, 2, z=5.)

In [None]:
In [8]:  say_hello_then_call_f(g, 1, 2, z=5.)
args is (1, 2)
kwargs is {'z': 5.0}
Hello! Now I'm going to call <function g at 0x2dd5cf8>
Out[8]: 0.6
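
The same star syntax works at the call site without any wrapper function. In this sketch the names args and kwargs are ordinary variables I picked to mirror the output above.

```python
def g(x, y, z=1):
    return (x + y) / z

args = (1, 2)
kwargs = {'z': 5.0}

# * unpacks the tuple into positional arguments, ** the dict into keywords
unpacked = g(*args, **kwargs)
explicit = g(1, 2, z=5.0)

# positional parameters can also be passed by name
by_name = g(y=2, x=1, z=5.0)
print(unpacked == explicit == by_name)
```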

### Currying: partial argument application

Currying is a fun computer science term which means deriving new functions from
existing ones by partial argument application. For example, suppose we had a trivial
function that adds two numbers together:

In [None]:
def add_numbers(x, y):
    return x + y

Using this function, we could derive a new function of one variable, add_five, that adds
5 to its argument:

In [None]:
add_five = lambda y: add_numbers(5, y)

The second argument to add_numbers is said to be curried. There’s nothing very fancy
here as we really only have defined a new function that calls an existing function. The
built-in functools module can simplify this process using the partial function:

In [None]:
from functools import partial
add_five = partial(add_numbers, 5)
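
A short usage sketch follows. The second example fixes a keyword argument of the built-in int; the name to_binary is hypothetical.

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five = partial(add_numbers, 5)
print(add_five(10))  # 15

# partial can also fix keyword arguments
to_binary = partial(int, base=2)
print(to_binary('101'))  # 5
```

Unlike the lambda version, a partial object keeps a reference to the wrapped function and the stored arguments in its func and args attributes, which can help when debugging.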

When discussing pandas and time series data, we’ll use this technique to create
specialized functions for transforming data series:

In [None]:
# compute 60-day moving average of time series x
# (pandas.rolling_mean has been removed from pandas; rolling(...).mean() replaces it)
ma60 = lambda x: x.rolling(60).mean()

# Take the 60-day moving average of all time series in data
data.apply(ma60)

### Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file,
is an important Python feature. This is accomplished by means of the iterator
protocol, a generic way to make objects iterable. For example, iterating over a dict yields the
dict keys:

In [None]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key, end=' ')

When you write **for key in some_dict**, the Python interpreter first attempts to create
an iterator out of **some_dict**:

In [None]:
dict_iterator = iter(some_dict)
dict_iterator

An iterator is any object that will yield objects to the Python interpreter when used in
a context like a for loop. Most methods expecting a list or list-like object will also accept
any iterable object. This includes built-in methods such as min, max, and sum, and type
constructors like list and tuple:

In [None]:
list(dict_iterator)
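
Roughly speaking, a for loop drives the iterator with the built-in next function until StopIteration is raised. The sketch below does that by hand; the variable names first_key, remaining and exhausted are mine.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

dict_iterator = iter(some_dict)
first_key = next(dict_iterator)   # roughly what a for loop does on each pass

remaining = list(dict_iterator)   # consumes whatever is left

try:
    next(dict_iterator)
    exhausted = False
except StopIteration:
    # an exhausted iterator signals the end by raising StopIteration
    exhausted = True

print(first_key, remaining, exhausted)
```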

A generator is a simple way to construct a new iterable object. Whereas normal
functions execute and return a single value, generators return a sequence of values lazily,
pausing after each one until the next one is requested. To create a generator, use the
yield keyword instead of return in a function:

In [None]:
def squares(n=10):
    print('Generating squares from 1 to %d' % (n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [None]:
In [2]: gen = squares()

In [3]: gen
Out[3]: <generator object squares at 0x34c8280>

It is not until you request elements from the generator that it begins executing its code:

In [None]:
In [4]: for x in gen:
   ...:     print(x, end=' ')
   ...:
Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100

As a less trivial example, suppose we wished to find all unique ways to make change
for $1 (100 cents) using an arbitrary set of coins. You can probably think of various
ways to implement this and how to store the unique combinations as you come up with
them. One way is to write a generator that yields lists of coins (represented as integers):

In [None]:
def make_change(amount, coins=[1, 5, 10, 25], hand=None):
    hand = [] if hand is None else hand
    if amount == 0:
        yield hand
    for coin in coins:
        # ensures we don't give too much change, and combinations are unique
        if coin > amount or (len(hand) > 0 and hand[-1] < coin):
            continue

        for result in make_change(amount - coin, coins=coins,
                                  hand=hand + [coin]):
            yield result

The details of the algorithm are not that important (can you think of a shorter way?).
Then we can write:

In [None]:
for way in make_change(100, coins=[10, 25, 50]):
    print(way)
len(list(make_change(100)))
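
One possible answer to the "shorter way" question, assuming Python 3.3 or later: yield from can replace the inner for loop. This is a sketch of my own, with tuples substituted for the list defaults, not the original implementation.

```python
def make_change(amount, coins=(1, 5, 10, 25), hand=()):
    if amount == 0:
        yield list(hand)
        return
    for coin in coins:
        # skip coins that overshoot or would break the non-increasing order
        if coin > amount or (hand and hand[-1] < coin):
            continue
        # yield from re-yields every result of the recursive generator
        yield from make_change(amount - coin, coins, hand + (coin,))

ways = list(make_change(100, coins=[10, 25, 50]))
print(len(ways))                             # 6
print(all(sum(way) == 100 for way in ways))  # True
```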

#### Generator expressions

A simple way to make a generator is by using a generator expression. This is a generator
analogue to list, dict and set comprehensions; to create one, enclose what would
otherwise be a list comprehension with parentheses instead of brackets:

In [None]:
gen = (x ** 2 for x in range(100))
gen

This is completely equivalent to the following more verbose generator:

In [None]:
def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()

Generator expressions can be used inside any Python function that will accept a
generator:

In [None]:
sum(x ** 2 for x in range(100))
dict((i, i ** 2) for i in range(5))
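
Here is a sketch of what those two calls evaluate to, plus one property worth knowing: a generator expression can be consumed only once. The names total and squares_by_int are illustrative.

```python
total = sum(x ** 2 for x in range(100))
squares_by_int = dict((i, i ** 2) for i in range(5))
print(total)           # 328350
print(squares_by_int)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# once exhausted, a generator yields nothing more
gen = (x ** 2 for x in range(3))
print(list(gen))  # [0, 1, 4]
print(list(gen))  # []
```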

#### itertools module

The standard library itertools module has a collection of generators for many common
data algorithms. For example, groupby takes any sequence and a function; this groups
consecutive elements in the sequence by return value of the function. Here’s an
example:

In [None]:
import itertools
first_letter = lambda x: x[0]

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is a generator
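
Because groupby only groups consecutive elements, unsorted input can produce the same key more than once. A sketch of the difference, using names of my own for the intermediate results:

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Unsorted input: 'Albert' starts a second, separate 'A' group
unsorted_keys = [key for key, _ in itertools.groupby(names, first_letter)]
print(unsorted_keys)  # ['A', 'W', 'A', 'S']

# Sorting by the same key first gathers each letter into a single group
by_letter = sorted(names, key=first_letter)
grouped = {key: list(group)
           for key, group in itertools.groupby(by_letter, first_letter)}
print(grouped)
```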

See Table A-4 for a list of a few other itertools functions I’ve frequently found useful.

Table A-4. Some useful itertools functions

Function | Description
--- | --- 
imap(func, *iterables) | Generator version of the built-in map; applies func to each zipped tuple of the passed sequences.
ifilter(func, iterable) | Generator version of the built-in filter; yields elements x for which func(x) is True.
combinations(iterable, k) | Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order.
permutations(iterable, k) | Generates a sequence of all possible k-tuples of elements in the iterable, respecting order.
groupby(iterable[, keyfunc]) | Generates (key, sub-iterator) for each unique key.
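
A brief sketch of how combinations and permutations differ on a three-element list. The variable names are illustrative.

```python
import itertools

letters = ['a', 'b', 'c']

# order is ignored: ('a', 'b') appears, ('b', 'a') does not
pairs = list(itertools.combinations(letters, 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# order is respected: both ('a', 'b') and ('b', 'a') appear
ordered_pairs = list(itertools.permutations(letters, 2))
print(len(ordered_pairs))  # 6
```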

In Python 3, the built-in functions zip, map, and filter return lazy
iterators rather than lists; they replace the Python 2 itertools functions
izip, imap, and ifilter, which no longer exist in Python 3.

## Files and the operating system

Most of this book uses high-level tools like pandas.read_csv to read data files from disk
into Python data structures. However, it’s important to understand the basics of how
to work with files in Python. Fortunately, it’s very simple, which is part of why Python
is so popular for text and file munging.

To open a file for reading or writing, use the built-in open function with either a relative
or absolute file path:

In [None]:
path = 'ch13/segismundo.txt'
f = open(path)

By default, the file is opened in read-only mode **'r'**. We can then treat the file handle
f like a list and iterate over the lines like so:

In [None]:
for line in f:
    pass

The lines come out of the file with the end-of-line (EOL) markers intact, so you’ll often
see code to get an EOL-free list of lines in a file like:

In [None]:
lines = [x.rstrip() for x in open(path)]
lines
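
The handle opened above is never closed explicitly. A with block is a common way to guarantee that it is. In this sketch a throwaway temporary file stands in for ch13/segismundo.txt, which I am assuming may not exist on your machine.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as handle:
    handle.write('first line\n\nthird line\n')

# The with block closes the file automatically, even if an error occurs
with open(path) as f:
    lines = [x.rstrip() for x in f]

print(lines)     # ['first line', '', 'third line']
print(f.closed)  # True

os.remove(path)
```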

If we had typed **f = open(path, 'w')**, a *new file* at **ch13/segismundo.txt** would have
been created, overwriting any one in its place. See Table A-5 for a list of all valid file read/
write modes.

Table A-5. Python file modes

Mode | Description
--- | ---
r | Read-only mode
w | Write-only mode. Creates a new file (deleting any file with the same name)
a | Append to existing file (create it if it does not exist)
r+ | Read and write
b | Add to mode for binary files, that is 'rb' or 'wb'
U | Use universal newline mode. Pass by itself 'U' or appended to one of the read modes like 'rU'

To write text to a file, you can use either the file’s write or writelines methods. For
example, we could create a version of the file at path with no blank lines like so:

In [None]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)

open('tmp.txt').readlines()

See Table A-6 for many of the most commonly-used file methods.

Table A-6. Important Python file methods or attributes

Method | Description
--- | ---
read([size]) | Return data from file as a string, with optional size argument indicating the number of bytes to read
readlines([size]) | Return list of lines (as strings) in the file, with optional size argument
write(str) | Write passed string to file.
writelines(strings) | Write passed sequence of strings to the file.
close() | Close the handle
flush() | Flush the internal I/O buffer to disk
seek(pos) | Move to indicated file position (integer).
tell() | Return current file position as integer.
closed | True if the file is closed.
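
A short sketch exercising several of these methods together, again on a temporary file so that nothing in your working directory is touched. The file contents are made up.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as handle:
    handle.write('abcdefghij\n')                # write a single string
    handle.writelines(['second\n', 'third\n'])  # write a sequence of strings

with open(path) as f:
    head = f.read(5)           # first five characters
    position = f.tell()        # current position in the file
    f.seek(0)                  # move back to the start
    all_lines = f.readlines()  # every line, EOL markers included

print(head, position)
print(all_lines)
print(f.closed)

os.remove(path)
```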

In [None]:
import os
os.remove('tmp.txt')