[(previous)](02%20-%20-%20-%20Python%20-%20basics.ipynb) | [(index)](00%20-%20-%20-%20Introduction%20-%20to%20-%20Python.ipynb) | [(next)](04%20-%20-%20-%20Python%20-%20tools%20-%20for%20-%20data%20-%20analysis.ipynb)

# Python intermediate

<div class="alert alert-block alert-warning">
    <b>Learning outcomes:</b>
    <br>
    <ul>
        <li>Develop and use reusable code by encapsulating tasks in functions.</li>
        <li>Package functions into flexible and extensible classes.</li>
        <li>Apply closures and decorators to functions to modify function behaviour.</li>
    </ul>
</div>

The code from the [last module](02 - Python basics.ipynb) was limited to short snippets. Solving more complex problems means writing more complex code, stretching over hundreds or even thousands of lines, and - if you want to reuse that code - it's not convenient to copy and paste it multiple times. Worse, any error is copied along with it, and any changes become tedious to manage.

It would be far better to write a discrete module to contain that task, get it absolutely perfect, and call it whenever you want the same problem solved or task executed.

In software, this process of packaging up discrete functions into their own modular code is called "abstraction". A complete software system consists of a number of discrete modules all interacting to produce an integrated experience.

In Python, these modules are called `functions` and a complete suite of functions grouped around a set of related tasks is called a `library` or `module`. Libraries permit you to inherit a wide variety of powerful software solutions developed and maintained by other people.

Python is [open source](https://en.wikipedia.org/wiki/Open-source_software), which means that its source code is released under a licence which permits anyone to study, change, and distribute the software to anyone and for any purpose. Many of the most popular Python libraries are also open source. There are thousands of shared libraries for you to use, and - maybe, when you feel confident enough - to contribute to with your own code.

## Functions

Functions are callable modules of code, some with parameters or arguments (variables you can pass to the function), which perform a task and may return a value. They're a convenient way to package code into discrete blocks, making your overall program more readable and reusable, and saving you time.

You can also easily share your functions with others, saving them time as well.
<br>
<div class="alert alert-block alert-info">
    <b>Syntax</b>
    <br>
    <ul>
        <li>You structure a function using `def`, like so:</li>
        `def function_name(parameters):
         code
         return response`
        <li>`return` is optional, but allows you to return the results of any task performed by the function to the point where the function was called</li>
        <li>To test whether an object is a function (i.e. callable), use `callable`, e.g. `callable(function)` will return `True`</li>
    </ul>
</div>

In [1]:
# A simple function with no arguments
def say_hello():
    print("Hello, World!")

# Calling it is as simple as this
say_hello()

# And you can test that it's a callable
print(callable(say_hello))

Hello, World!
True


An argument can be any variable, such as integers, strings, lists, dictionaries or even other functions. This is where you start realising the importance of leaving comments and explanations in your code because you need to ensure that anyone using a function knows what variables the function expects, and in what order.

Functions can also perform calculations and return these to whatever called them.

In [2]:
# A function with two string arguments
def say_hello_to_user(username, greeting):
    # Prints a greeting to a username (nothing is returned)
    print("Hello, {}! I hope you have a great {}.".format(username, greeting))

# Call it
say_hello_to_user("Jill", "day")

# Perform a calculation and return it
def sum_two_numbers(x, y):
    # Returns the sum of x + y
    return x + y

sum_two_numbers(5, 10)

Hello, Jill! I hope you have a great day.


15

You can see that swapping `username` and `greeting` in the `say_hello_to_user` function would be confusing, but swapping the numbers in `sum_two_numbers` wouldn't cause a problem.

Not only can you call functions from functions, but you can also create variables that are functions, or the result of functions.

In [3]:
def number_powered(number, exponent):
    # Returns number to the power of exponent
    return number ** exponent

# Jupyter keeps functions available that were defined in other cells
# This means `sum_two_numbers` is still available
def sum_and_power(number1, number2, exponent):
    # Returns two numbers summed, and then to an exponent
    summed = sum_two_numbers(number1, number2)
    return number_powered(summed, exponent)

# Call `sum_and_power`
print(sum_and_power(2, 3, 4))

625
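The cell above shows functions calling other functions. As a minimal sketch of the other point made earlier, a variable can hold either the function itself or the result of calling it. The name `add` is introduced here purely for illustration.

```python
def sum_two_numbers(x, y):
    """Return the sum of x and y."""
    return x + y

# Assign the function itself (no brackets) to a new name
add = sum_two_numbers

# Assign the *result* of calling the function (with brackets) to a variable
total = sum_two_numbers(5, 10)

print(add(2, 3))   # calls sum_two_numbers through its new name
print(total)
```

Leaving off the brackets is what makes the difference: `add` is callable, while `total` is just the number `15`.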


With careful naming, and plenty of commentary, you can see how you can make your code extremely readable and self-explanatory.

A better way of documenting functions is to use `docstrings`.
<div class="alert alert-block alert-info">
    <b>Syntax</b>
    <br>
    <ul>
        <li>Docstrings are written as structured text between triple quotes, placed as the first statement in the function, e.g. `""" This is a docstring """`</li>
        <li>You can access a function's docstring by calling `function.__doc__`</li>
    </ul>
</div>

In [4]:
def docstring_example():
    """
    An example function which returns `True`.
    """
    return True

# Printing the docstring
print(docstring_example.__doc__)

# Calling it
print(docstring_example())


    An example function which returns `True`.
    
True


## Classes and Objects

A complete Python object is an encapsulation of both variables and functions into a single entity. Objects get their variables and functions from `classes`.

Classes are where most of the action happens in Python and coding consists, largely, of producing and using classes to perform tasks. 

A very basic class would look like this:

In [5]:
class myClass:
    """
    A demonstration class.
    """
    my_variable = "Look, a variable!"
    
    def my_function(self):
        """
        A demonstration class function.
        """
        return "I'm a class function!"

# You use a class by creating a new object (an instance) from it
new_class = myClass()

# You can access class variables or functions with a dotted call, as follows
print(new_class.my_variable)
print(new_class.my_function())

# Access the class docstrings
print(myClass.__doc__)
print(myClass.my_function.__doc__)

Look, a variable!
I'm a class function!

    A demonstration class.
    

        A demonstration class function.
        


Let's unpack the new syntax.
<div class="alert alert-block alert-info">
    <b>Syntax</b>
    <br>
    <ul>
        <li>You instantiate a class by calling it as `class()`. If you only used `class`, without the brackets, you'd gain access to the class object itself rather than a new instance. This is useful as well, and means you can pass classes around as you would variables.</li>
        <li>All variables and functions of a class are reached via the dotted call, `.function()` or `.variable`. You can even add new functions and variables to an instance you created. Remember, though, these will not exist in other instances you create since you haven't changed the underlying class code.</li>
        <li>Functions within a class require a base argument that, by convention, is called `self`. There's a complex explanation as to why `self` is needed, but - briefly - think of it as the instance of the object itself. So, inside the class, `self.function` is the way the class calls its component functions.</li>
        <li>You can also access the docstrings as you would before.</li>
    </ul>
</div>

In [6]:
# Add a new variable to a class instance
new_class1 = myClass()
new_class1.my_variable2 = "Hi, Bob!"
print(new_class1.my_variable2, new_class1.my_variable)

# But, trying to access my_variable2 in new_class causes an error
print(new_class.my_variable2)

Hi, Bob! Look, a variable!


AttributeError: 'myClass' object has no attribute 'my_variable2'
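To make the role of `self` a little more concrete, here is a minimal sketch in which one class function reads a class variable and calls another class function through `self`. The `Greeter` class and its function names are hypothetical and used only for illustration.

```python
class Greeter:
    """A hypothetical class used only to illustrate `self`."""
    greeting = "Hello"

    def build_message(self, name):
        # `self.greeting` reads the variable from this instance (or its class)
        return "{}, {}!".format(self.greeting, name)

    def greet_twice(self, name):
        # `self.build_message` calls another function on the same instance
        message = self.build_message(name)
        return message + " " + message

greeter = Greeter()
print(greeter.greet_twice("Jill"))
```

Note that you never pass `self` yourself: Python supplies the instance automatically when you write `greeter.greet_twice("Jill")`.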

Classes can initialise themselves with a set of available variables. This makes the `self` referencing more explicit, and also permits you to pass arguments to your class to set the initial values.
<div class="alert alert-block alert-info">
    <b>Syntax</b>
    <br>
    <ul>
        <li>Initialise a class with the special function `def __init__(self)`</li>
        <li>Pass arguments to your class when you create it with `__init__(self, arguments)`</li>
        <li>We can also differentiate between arguments, and keyword arguments:</li>
        <ul>
            <li>**arguments**: these are passed in the usual way, as a single term, e.g. `my_function(argument)`.</li>
            <li>**keyword arguments**: these are passed the way you would think of a dictionary, e.g. `my_function(keyword_argument = value)`. This is also a way to initialise an argument with a default. If you leave out the argument when it has a default it will apply without the function failing.</li>
            <li>Functions often need to have numerous arguments and keyword arguments passed to them, and this can get messy. A tidier way to deal with this is to think of the positional arguments as a sequence and the keyword arguments as a dictionary, and to define your function as `my_function(*args, **kwargs)`, where `args` will be available to the function as an ordered tuple, and `kwargs` as a dictionary.</li>
        </ul>
    </ul>
</div>

In [7]:
# A demonstration of all these new concepts

class demoClass:
    """
    A demonstration class with an __init__ function, and a function that takes args and kwargs.
    """
    
    def __init__(self, argument = None):
        """
        A function that is called automatically when the demoClass is initialised.
        """
        self.demo_variable = "Hello, World!"
        self.initial_variable = argument
        
    def demo_class(self, *args, **kwargs):
        """
        A demo function that loops through any args and kwargs provided and prints them.
        """
        for i, a in enumerate(args):
            print("Arg {}: {}".format(i+1, a))
        for k, v in kwargs.items():
            print("{} - {}".format(k, v))
        if kwargs.get(self.initial_variable):
            print(self.demo_variable)
        return True

demo1 = demoClass()
demo2 = demoClass("Bob")

# What was initialised in each demo object?
print(demo1.demo_variable, demo1.initial_variable)
print(demo2.demo_variable, demo2.initial_variable)

# A demo of passing arguments and keyword arguments
args = ["Alice", "Bob", "Carol", "Dave"]
kwargs = {"Alice": "Engineer",
          "Bob": "Consultant",
          "Carol": "Lawyer",
          "Dave": "Doctor"
         }
demo2.demo_class(*args, **kwargs)

Hello, World! None
Hello, World! Bob
Arg 1: Alice
Arg 2: Bob
Arg 3: Carol
Arg 4: Dave
Alice - Engineer
Bob - Consultant
Carol - Lawyer
Dave - Doctor
Hello, World!


True
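The same ideas apply to ordinary functions outside a class. The following is a small sketch, assuming a hypothetical `greet` function with a default keyword argument and a hypothetical `inspect_arguments` helper, showing how defaults behave and which container types Python uses for `*args` and `**kwargs`.

```python
def greet(username, greeting="day"):
    """Return a greeting; `greeting` falls back to its default if omitted."""
    return "Hello, {}! I hope you have a great {}.".format(username, greeting)

def inspect_arguments(*args, **kwargs):
    """Return the names of the container types used for args and kwargs."""
    return type(args).__name__, type(kwargs).__name__

print(greet("Jill"))                        # the default is used
print(greet("Jill", greeting="evening"))    # the default is overridden
print(inspect_arguments(1, 2, key="value"))
```

The last line prints `('tuple', 'dict')`: positional arguments are collected into a tuple, and keyword arguments into a dictionary.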

Using `*args` and `**kwargs` in your function calls while you're developing makes it easier to change your code without having to go back through every line of code that calls your function and bug-fix when you change the order or number of arguments you're calling. 

This reduces errors, improves readability, and makes for a more enjoyable and efficient coding experience.

At this stage, you've learned the fundamental syntax, as well as how to create modular code. Now we need to make our code reusable and shareable.

## Modules and Packages

A module in Python is a set of classes or functions that encapsulates a single set of related tasks. Packages are a set of modules collected together into a single focused unit. This can also be called a library.

Creating a module is as simple as saving your class code in a file with the `.py` extension (much as a text file ends `.txt`).

### Writing modules

A set of modules in a library needs to follow a specific structure. Imagine we wish to develop a ping pong game. We can place the game logic in one module, and the functionality for drawing the game in another. That leads to a folder with the following file structure:

    pingpong/
    pingpong/game.py
    pingpong/draw.py

Within each file will be a set of functions. Assume that, within `draw.py` there is a function called `draw_game`. If you wanted to import the `draw_game` function into the `game.py` file, the convention is as follows:

    import draw
    
This will import everything in the `draw.py` file. After that, you access functions from the file by making calls to, for example, `draw.draw_game`.

Or, you can access each function directly and only import what you need (since some files can be extremely large and you don't necessarily wish to import everything):

    from draw import draw_game
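Since `draw.py` is only imagined here, the two import styles can be tried out with a module that ships with Python. This sketch uses the standard library `math` module as a stand-in for `draw`.

```python
# Style 1: import the whole module, then use dotted access
import math

# Style 2: import only the name you need
from math import sqrt

print(math.sqrt(16))
print(sqrt(16))

# Both names point at the very same function object
print(sqrt is math.sqrt)
```

Both calls print `4.0`, and the final line prints `True`, since the two styles only differ in the name you use to reach the function.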
    
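You're not always going to want to run programs from an interactive environment (like Jupyter Notebook). When you run a program directly from the command-line, the convention is to place its starting point in a function called `main`. Python sets the special variable `__name__` to `'__main__'` only when a file is run directly (not when it is imported), so `main` is executed as follows: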

    if __name__ == '__main__':
        main()
        
Putting that together, the syntax for calling `game.py` from the command-line would be:

    # game.py
    # Import the draw_game function from draw.py
    from draw import draw_game

    def play_game():
        ...

    def main():
        result = play_game()
        draw_game(result)

    # If this script is executed, then main() will be executed
    if __name__ == '__main__':
        main()

<div class="alert alert-block alert-info">
    <b>Syntax</b>
    <br>
    <ul>
        <li>Python functions and classes can be saved for reuse into files with the extension `.py`</li>
        <li>You can import the functions from those files using either `import filename` (without the `.py` extension), or specific functions or classes from that file with `from filename import class, function1, function2`</li>
        <li>You may notice that, after your module has been imported, Python automatically creates a compiled version of the file with `.pyc` as an extension (in Python 3 this is stored in a `__pycache__` folder). This happens automatically.</li>
        <li>If you intend to run a file from the command-line, the convention is to define a `main` function and call it as follows</li>
        `if __name__ == '__main__':
            main()`
        <li>If a module has a large number of functions you intend to use throughout your own code, then you can specify a custom name for use. For example, a module we'll learn about in the next section is called `pandas`. Convention is to import it as `import pandas as pd`. Now you'd access the functions in `pandas` using the dot notation of `pd.function`</li>
        <li>You can also import modules based on logical conditions. If you then import these options under the same name, the rest of your code isn't affected by which option was chosen</li>        
    </ul>
</div>

Putting all of this together in a pseudocode example (i.e. this code doesn't work, so don't try executing it):

In [8]:
# game.py
# Import the draw module
if visual_mode:
    # in visual mode, we draw using graphics
    import draw_visual as draw
else:
    # In textual mode, we print out text
    import draw_textual as draw

def main():
    result = play_game()
    # this can either be visual or textual depending on visual_mode
    draw.draw_game(result)

NameError: name 'visual_mode' is not defined

See, pseudocode, it will break ... Note, though, that this shows how "safe" it is to experiment with code snippets in Jupyter Notebook. No harm done.
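For a version of the same pattern that does run, here is a minimal sketch using two standard library modules in place of the imagined `draw_visual` and `draw_textual`. The flag `use_text_format` and the alias `serializer` are assumptions made for this illustration; both `json` and `pickle` provide a `dumps` function, so the code after the import does not need to know which one was chosen.

```python
# A hypothetical switch, standing in for `visual_mode`
use_text_format = True

if use_text_format:
    # json.dumps produces a human-readable string
    import json as serializer
else:
    # pickle.dumps produces compact bytes
    import pickle as serializer

# The rest of the code only ever refers to `serializer`
encoded = serializer.dumps({"score": 21})
print(type(encoded).__name__)
print(encoded)
```

With the flag set to `True`, this prints `str` followed by `{"score": 21}`.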

## Built-in modules

There is a vast range of built-in modules. Depending on how you installed Jupyter Notebook (for example, as part of a distribution such as Anaconda), you may also have an even larger list of third-party modules you can explore.

<div class="alert alert-block alert-info">
    <b>Syntax</b>
    <br>
    <ul>
        <li>After you've imported a module, `dir(module)` lets you see a list of all the names (functions, classes and variables) defined in that library.</li>
        <li>You can also read the help from the module docstrings with `help(module)`</li>
    </ul>
</div>

Let's explore a module you'll be using and learning about in future sessions of this course, `pandas`.

In [9]:
import pandas as pd

help(pd)

Help on package pandas:

NAME
    pandas

DESCRIPTION
    pandas - a powerful data analysis and manipulation library for Python
    
    **pandas** is a Python package providing fast, flexible, and expressive data
    structures designed to make working with "relational" or "labeled" data both
    easy and intuitive. It aims to be the fundamental high-level building block for
    doing practical, **real world** data analysis in Python. Additionally, it has
    the broader goal of becoming **the most powerful and flexible open source data
    analysis / manipulation tool available in any language**. It is already well on
    its way toward this goal.
    
    Main Features
    -------------
    Here are just a few of the things that pandas does well:
    
      - Easy handling of missing data in floating point as well as non-floating
        point data
      - Size mutability: columns can be inserted and deleted from DataFrame and
        higher dimensional objects
      - Automatic and

In [10]:
dir(pd)

['Categorical',
 'CategoricalIndex',
 'DataFrame',
 'DateOffset',
 'DatetimeIndex',
 'ExcelFile',
 'ExcelWriter',
 'Expr',
 'Float64Index',
 'Grouper',
 'HDFStore',
 'Index',
 'IndexSlice',
 'Int64Index',
 'Interval',
 'IntervalIndex',
 'MultiIndex',
 'NaT',
 'Panel',
 'Panel4D',
 'Period',
 'PeriodIndex',
 'RangeIndex',
 'Series',
 'SparseArray',
 'SparseDataFrame',
 'SparseList',
 'SparseSeries',
 'Term',
 'TimeGrouper',
 'Timedelta',
 'TimedeltaIndex',
 'Timestamp',
 'UInt64Index',
 'WidePanel',
 '_DeprecatedModule',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__docformat__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_hashtable',
 '_lib',
 '_libs',
 '_np_version_under1p10',
 '_np_version_under1p11',
 '_np_version_under1p12',
 '_np_version_under1p13',
 '_np_version_under1p14',
 '_np_version_under1p15',
 '_tslib',
 '_version',
 'api',
 'bdate_range',
 'compat',
 'concat',
 'core',
 'crosstab',
 'cut',
 'date_range',
 'dateti
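The output of `dir` can be very long, as the listing above shows. A minimal sketch for trimming it down, using the standard library `math` module so that it runs without any third-party installs, is to keep only the names that do not start with an underscore.

```python
import math

# Keep only the public names (those not starting with an underscore)
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names) > 0)
print("sqrt" in public_names)
print("__doc__" in public_names)
```

This prints `True`, `True` and `False`: the public function `sqrt` is kept, while internal names such as `__doc__` are filtered out.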

## Writing packages

Packages are libraries containing multiple modules and files. They are stored in directories and have one important requirement: a regular package is a directory which contains an initialisation file called (unsurprisingly) `__init__.py`. (Since Python 3.3 a directory without this file can still be imported as a "namespace package", but including the file remains the usual practice.)

The file can be entirely empty, but it is executed when the package is imported with the `import` statement. This permits you to set some rules, or initial steps to be performed the first time the package is imported.

You may be concerned that - with the modular nature of Python files and code - you may import a single library multiple times. Python keeps track and will only import (and initialise) the package once.
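You can observe this bookkeeping yourself. The sketch below imports the standard library `json` module twice under different names and checks the `sys.modules` dictionary, which is where Python records modules that have already been loaded. The alias `json_again` is just an illustrative name.

```python
import sys

import json                  # first import: the module is loaded and initialised
import json as json_again    # second import: Python reuses the cached module

print("json" in sys.modules)
print(json_again is json)
```

Both lines print `True`: the second import hands back the module object that is already loaded, rather than running the module's code again.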

One useful part of the `__init__.py` file is that you can limit what is imported with the command `from package import *`.

In [11]:
#__init__.py

__all__ = ["class1", "class2"]

This means that `from package import *` actually only imports `class1` and `class2`.
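To see `__all__` in action without setting up a project by hand, the following sketch writes a throwaway package to a temporary folder and then performs the star import. The package name `demo_pkg` is hypothetical, and here `__all__` deliberately lists only `class1` so that the filtering is visible. The star import is run through `exec` in a fresh dictionary purely so the imported names can be inspected.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (the name `demo_pkg` is hypothetical)
package_root = Path(tempfile.mkdtemp())
package_dir = package_root / "demo_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(
    '__all__ = ["class1"]\n'
    "class class1: pass\n"
    "class class2: pass\n"
)

# Make the temporary folder importable
sys.path.insert(0, str(package_root))

# Run the star import in a fresh namespace so we can inspect what arrived
namespace = {}
exec("from demo_pkg import *", namespace)

print("class1" in namespace)
print("class2" in namespace)
```

This prints `True` and then `False`: `class2` exists in the package, but the star import skips it because it is not listed in `__all__`. It can still be reached explicitly with `from demo_pkg import class2`.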

The next two sections are optional since, at this stage of your development practice, you're far less likely to need to produce code of this nature, but it can be useful to see how Python can be used in a slightly more advanced way.

## Closures

Python has the concept of `scopes`. The variables created within a class or function are only available within that class or function. In other words, variables are available within the `scope` of the place where they are defined. If you want variables to be available within a function, you pass them as arguments (as you've seen previously).

Sometimes you want to have a global variable available to all functions, and sometimes you want a variable to be available to specific functions without being available more generally. Functions that can do the latter are called `closures`, and closures start with `nested functions`.

A `nested function` is a function defined inside another function. These nested functions gain access to the variables created in the enclosing scope.

In [12]:
def transmit_to_space(message):
    """
    This is the enclosing function
    """
    def data_transmitter():
        """
        The nested function
        """
        print(message)
    # Now the enclosing function calls the nested function
    data_transmitter()

transmit_to_space("Test message")

Test message


It's useful to remember that functions are also objects, so we can simply return the nested function as a response.

In [13]:
def transmit_to_space(message):
    """
    This is the enclosing function
    """
    def data_transmitter():
        """
        The nested function
        """
        print(message)
    # Return an object of the nested function (i.e. without brackets)
    return data_transmitter

msg = transmit_to_space("Into the sun!")
msg()

Into the sun!
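Each call to the enclosing function creates a separate closure with its own remembered `message`. As a small sketch, using a variant of `transmit_to_space` that returns the message rather than printing it (a change made here only so the result is easy to check):

```python
def transmit_to_space(message):
    """The enclosing function: remembers `message` for the nested function."""
    def data_transmitter():
        # `message` comes from the enclosing scope, not from an argument
        return message
    return data_transmitter

first = transmit_to_space("Into the sun!")
second = transmit_to_space("Back to Earth!")

# Each closure keeps its own copy of the enclosing variable
print(first())
print(second())
```

The two closures do not interfere with each other, even though they were produced by the same enclosing function.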


## Decorators

Closures may seem a little esoteric. Why would you use them?

Think in terms of the modularity of Python code. Sometimes you want to pre-process arguments before a function acts on them. You may have multiple different functions, but you want to validate your data in the same way each time. Instead of modifying each function, it would be better to enclose your function and only return data once your closure has completed its task.

One example of this is in websites. Some functions should only be executed if the user has the rights to do so. Testing for that in every function is tedious.

Python has syntax for enclosing a function in a closure. This is called the `decorator`, which has the following form:

    @decorator
    def function(arg):
        return True

This is equivalent to `function = decorator(function)`, which is similar to the way the closures are structured in the previous section.

As a silly example:

In [14]:
def repeater(old_function):
    """
    A decorator which takes any function, passed as `old_function`,
    and returns `new_function`, which calls it twice.
    """
    def new_function(*args, **kwds):
        """
        A demo function which repeats any function in the outer scope.
        """
        old_function(*args, **kwds)
        old_function(*args, **kwds)
    return new_function

# We use `repeater` as a decorator like this
@repeater
def multiply(num1, num2):
    print(num1 * num2)

# And execute
multiply(6,7)

42
42
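To check the claim that the `@` syntax is only shorthand, this sketch applies the same kind of decorator by hand, using the `function = decorator(function)` form. The `record` function and the `calls` list are hypothetical helpers that make the double call visible.

```python
def repeater(old_function):
    """Return a new function that calls `old_function` twice."""
    def new_function(*args, **kwds):
        old_function(*args, **kwds)
        old_function(*args, **kwds)
    return new_function

calls = []

def record(value):
    """Append the value to `calls` so each invocation leaves a trace."""
    calls.append(value)

# Manual wrapping: this is what writing `@repeater` above `record` would do
record = repeater(record)

record("ping")
print(calls)
```

This prints `['ping', 'ping']`, the same doubling effect as in the decorated `multiply` example.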


You can modify the output as well as the input.

In [15]:
def exponent_out(old_function):
    """
    This modification works on any combination of args and kwargs.
    """
    def new_function(*args, **kwargs):
        return old_function(*args, **kwargs) ** 2
    return new_function

def exponent_in(old_function):
    """
    This modification only works if we know we have one argument.
    """
    def new_function(arg):
        return old_function(arg ** 2)
    return new_function

@exponent_out
def multiply(num1, num2):
    return num1 * num2

print(multiply(6,7))

@exponent_in
def digit(num):
    return num

print(digit(6))

# And, let's trigger an error
@exponent_in
def multiply(num1, num2):
    return num1 * num2

print(multiply(6,7))

1764
36


TypeError: new_function() takes 1 positional argument but 2 were given

You can use decorators to check that an argument meets certain conditions before running the function.

In [16]:
def check_zero(old_function):
    """
    Check the argument passed to a function to ensure it is not zero.
    """
    def new_function(arg):
        if arg == 0:
            # Python 3 syntax: raise an exception instance
            raise ValueError("Zero Argument")
        return old_function(arg)
    return new_function

@check_zero
def print_num(num):
    print(num)

print_num(0)

ValueError: Zero Argument
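The website scenario mentioned earlier, where some functions should only run for users with the right permissions, follows the same shape. This is a minimal sketch under stated assumptions: users are plain dictionaries with hypothetical `name` and `is_admin` keys, and `requires_admin` and `delete_page` are illustrative names rather than part of any web framework.

```python
def requires_admin(old_function):
    """Only run `old_function` when the user dictionary is flagged as admin."""
    def new_function(user, *args, **kwargs):
        if not user.get("is_admin"):
            raise PermissionError("{} may not do this".format(user.get("name")))
        return old_function(user, *args, **kwargs)
    return new_function

@requires_admin
def delete_page(user, page):
    return "{} deleted {}".format(user["name"], page)

admin = {"name": "Alice", "is_admin": True}
guest = {"name": "Bob", "is_admin": False}

print(delete_page(admin, "home"))

try:
    delete_page(guest, "home")
except PermissionError as error:
    print("Refused:", error)
```

The permission test is written once in the decorator, and any function that needs it only has to add the `@requires_admin` line.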

Sometimes, though, you want to pass new arguments to a decorator so that you can do something before executing your function. That rests on doubly-nested functions.

In [17]:
def multiply(multiplier):
    """
    Using the multiplier argument, modify the old function to return
    multiplier * old_function
    """
    def multiply_generator(old_function):
        def new_function(*args, **kwds):
            return multiplier * old_function(*args, **kwds)
        return new_function
    return multiply_generator

@multiply(3)
def return_num(num):
    return num

return_num(5)

15
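It can help to unpack the doubly-nested decorator into its two steps. In this sketch the same `multiply` function is applied by hand rather than with `@multiply(3)`; the names `triple` and `tripled_num` are introduced only for illustration.

```python
def multiply(multiplier):
    """Build a decorator that scales a function's result by `multiplier`."""
    def multiply_generator(old_function):
        def new_function(*args, **kwds):
            return multiplier * old_function(*args, **kwds)
        return new_function
    return multiply_generator

def return_num(num):
    return num

# Step 1: calling multiply(3) builds a decorator that remembers multiplier = 3
triple = multiply(3)

# Step 2: applying that decorator wraps the original function
tripled_num = triple(return_num)

print(tripled_num(5))

# Both steps in one expression, which is what `@multiply(3)` amounts to
print(multiply(3)(return_num)(5))
```

Both lines print `15`. The outer function exists only to capture the extra argument; the middle function is the actual decorator.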

And that marks the end of this section of the tutorial.
