# October 31

Let's debrief on Tuesday's business. How did it go?

Now that we have some orientation to LONI, we'll be starting to build a Python package.

Let's have a look at my package, [WrightLabUtils](https://github.com/wrightaprilm/wrightlabutils). 

## Anatomy of a Python package

### Required: 
- Code - this is typically in a directory of the same name as the package
- `setup.py` - this file tells Python how to install your package


### Nice to have:
- Data - For testing that the package works
- Notebooks and/or documents - containing explanations of how the code works.
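
As a rough illustration of this anatomy, the sketch below builds an empty skeleton inside a temporary directory. The names `example_package` and `make_skeleton` are placeholders invented for this example and are not taken from WrightLabUtils. The `__init__.py` file is an assumption based on the usual Python convention for importable directories, and `README.txt` is included because the `setup.py` template later in this lesson reads it.

```python
import tempfile
from pathlib import Path


def make_skeleton(root, package_name):
    """Create the minimal files for a package (hypothetical helper)."""
    project = Path(root) / package_name
    code_dir = project / package_name      # code lives in a same-named directory
    code_dir.mkdir(parents=True)
    (code_dir / "__init__.py").touch()     # marks the directory as importable
    (project / "setup.py").touch()         # installation instructions go here
    (project / "README.txt").touch()       # read by long_description in setup.py
    (project / "data").mkdir()             # nice to have: small test data
    return project


with tempfile.TemporaryDirectory() as tmp:
    project = make_skeleton(tmp, "example_package")
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
    print(created)
```

The files are empty on purpose; the point is only the shape of the directory tree.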


## Today's work

Today, we will get started on our very own Python packages. We are going to get the required structure of a Python package set up, along with one function. Next week, we will look at testing in Python packages.

Let's get started. Create a new folder in the root of your JupyterHub and call it `YourLastName_Package`.

Next, create a file called `setup.py`. The basic structure of this file follows:




In [None]:
from setuptools import setup

setup(name='Your Package Name',
      version='0.1',
      description='A description of what it does',
      url='TBD',
      author='You',
      author_email='Yours',
      license='MIT',
      packages=['name of package with no caps or spaces'],
      install_requires=[
          'required',
          'packages'
      ],
      long_description=open('README.txt').read(),
      zip_safe=True)

## What does my code depend on?

Hard to say - what will your package do? For this part, I would like all of us to make a package that does the same thing. I'm going to give us three options:

1. Parse Open Tree data (Sept 27)
2. Subset a matrix and do a BLAST search (Oct 2)
3. Read in locality data and make a map (Oct 18)


## Task One

Look at the lesson exercise we chose. What libraries did we import? Fill them in in the `install_requires` field.

## Task Two

What code do we need to carry out our tasks? How can you put this code into functions? What will the arguments be? 

## Interlude: Errors 

Every programmer encounters errors, both those who are just beginning, and those who have been programming for years. Encountering errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, understanding what the different types of errors are and when you are likely to encounter them can help a lot. Once you know why you get certain types of errors, they become much easier to fix.

Errors in Python have a very specific form, called a traceback. Let's examine one:

In [None]:
# This code has an intentional error. You can type it directly or
# use it for reference to understand the error message below.
def favorite_ice_cream():
    ice_creams = [
        "chocolate",
        "vanilla",
        "strawberry"
    ]
    print(ice_creams[3])

favorite_ice_cream()

This particular traceback has two levels. You can determine the number of levels by looking for the number of arrows on the left hand side. In this case:

- The first shows code from the cell above, with an arrow pointing to line 11 (which is `favorite_ice_cream()`).
- The second shows some code in the function `favorite_ice_cream`, with an arrow pointing to line 9 (which is `print(ice_creams[3])`).

The last level is the actual place where the error occurred. The other level(s) show what function the program executed to get to the next level down. So, in this case, the program first performed a function call to the function `favorite_ice_cream`. Inside this function, the program encountered an error on line 9, when it tried to run the code `print(ice_creams[3])`.

So what error did the program actually encounter? In the last line of the traceback, Python helpfully tells us the category or type of error (in this case, it is an `IndexError`) and a more detailed error message (in this case, it says `list index out of range`).
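
The two pieces of information on the last line of a traceback can also be inspected from code. The following is a minimal sketch, assuming we wrap the call in `try`/`except` (which this lesson has not covered yet) purely to look at the error object; the variable names `error_type` and `error_message` are made up for the example.

```python
def favorite_ice_cream():
    ice_creams = ["chocolate", "vanilla", "strawberry"]
    return ice_creams[3]  # valid indices are 0, 1, and 2


try:
    favorite_ice_cream()
except IndexError as err:
    error_type = type(err).__name__   # the category of error
    error_message = str(err)          # the more detailed message
    print(error_type + ":", error_message)
```

This prints `IndexError: list index out of range`, the same text that appears at the bottom of the traceback.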

If you encounter an error and don’t know what it means, it is still important to read the traceback closely. That way, if you fix the error, but encounter a new one, you can tell that the error changed. Additionally, sometimes just knowing where the error occurred is enough to fix it, even if you don’t entirely understand the message.

If you do encounter an error you don’t recognize, try looking at the official documentation on errors. However, note that you may not always be able to find the error there, as it is possible to create custom errors. In that case, hopefully the custom error message is informative enough to help you figure out what went wrong.
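
To make the idea of a custom error concrete, here is a small sketch. `EmptyMatrixError` and `check_matrix` are hypothetical names invented for illustration; they do not come from any library used in this course.

```python
class EmptyMatrixError(Exception):
    """Hypothetical custom error, raised when a matrix has no rows."""


def check_matrix(rows):
    """Return the number of rows, or raise a custom error if there are none."""
    if len(rows) == 0:
        raise EmptyMatrixError("matrix has no rows; check the input file")
    return len(rows)


try:
    check_matrix([])
except EmptyMatrixError as err:
    custom_type = type(err).__name__
    custom_message = str(err)
    print(custom_type + ":", custom_message)
```

An error like this will not appear in the official documentation, so the message written by the package author is all the reader has to go on.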


## Syntax Errors

When you forget a colon at the end of a line, accidentally add one space too many when indenting under an if statement, or forget a parenthesis, you will encounter a syntax error. This means that Python couldn’t figure out how to read your program. This is similar to forgetting punctuation in English: for example, this text is difficult to read there is no punctuation there is also no capitalization why is this hard because you have to figure out where each sentence ends you also have to figure out where each sentence begins to some extent it might be ambiguous if there should be a sentence break or not

People can typically figure out what is meant by text with no punctuation, but people are much smarter than computers. If Python doesn’t know how to read the program, it will just give up and inform you with an error. For example:

In [None]:
def some_function()
    msg = "hello, world!"
    print(msg)
     return msg


Here, Python tells us that there is a `SyntaxError` on line 1, and even puts a little arrow in the place where there is an issue. In this case the problem is that the function definition is missing a colon at the end.

Actually, the function above has two issues with syntax. If we fix the problem with the colon, we see that there is also an `IndentationError`, which means that the lines in the function definition do not all have the same indentation:

In [None]:
def some_function():
    msg = "hello, world!"
    print(msg)
     return msg


Both `SyntaxError` and `IndentationError` indicate a problem with the syntax of your program, but an `IndentationError` is more specific: it always means that there is a problem with how your code is indented.
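
The relationship between the two error types can be checked directly. This sketch assumes we hand small snippets of source text to the built-in `compile` function, which parses code without running it; `error_name` is a helper invented for this example.

```python
missing_colon = 'def some_function()\n    msg = "hello, world!"\n    return msg\n'
extra_space = 'def some_function():\n    msg = "hello, world!"\n     return msg\n'


def error_name(source):
    """Return the name of the syntax-related error raised by source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:   # also catches IndentationError, a subclass
        return type(err).__name__
    return None


print(error_name(missing_colon))
print(error_name(extra_space))
print(issubclass(IndentationError, SyntaxError))
```

The three lines printed are `SyntaxError`, `IndentationError`, and `True`: every `IndentationError` is a `SyntaxError`, but not the other way around.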

Some indentation errors are harder to spot than others. In particular, mixing spaces and tabs can be difficult to spot because they are both whitespace. In the example below, the first two lines in the body of the function `some_function` are indented with tabs, while the third line is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy and paste this example rather than trying to type it in manually, because Jupyter automatically replaces tabs with spaces.

In [None]:
def some_function():
	msg = "hello, world!"
	print(msg)
        return msg
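
Because tabs and spaces look identical on screen, it can help to let Python find them. The sketch below is one possible approach, not the only one: it writes the tabs as `\t` so they survive copying, reports which lines start with a tab, and confirms that Python raises a `TabError` (a more specific kind of `IndentationError`) for the mixture. The variable names are made up for the example.

```python
mixed = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"     # indented with a tab
    "\tprint(msg)\n"                # indented with a tab
    "        return msg\n"          # indented with eight spaces
)

# Report the (1-based) line numbers whose indentation starts with a tab.
tab_lines = [
    number
    for number, line in enumerate(mixed.splitlines(), start=1)
    if line.startswith("\t")
]
print("Lines starting with a tab:", tab_lines)

try:
    compile(mixed, "<example>", "exec")
    raised_name = None
except TabError as err:
    raised_name = type(err).__name__
    print(raised_name + ":", err.msg)
```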



## Variable Name Errors

Another very common type of error is called a `NameError`, and occurs when you try to use a variable that does not exist. For example:

In [None]:
print(a)

Variable name errors come with some of the most informative error messages, which are usually of the form "name `the_variable_name` is not defined".

Why does this error message occur? That’s a harder question to answer, because it depends on what your code is supposed to do. However, there are a few very common reasons why you might have an undefined variable. The first is that you meant to use a string, but forgot to put quotes around it:

In [None]:
print(hello)

The second is that you just forgot to create the variable before using it. In the following example, `count` should have been defined (e.g., with `count = 0`) before the for loop:

In [None]:
for number in range(10):
    count = count + number
print("The count is:", count)


Finally, the third possibility is that you made a typo when you were writing your code. Let’s say we fixed the error above by adding the line `Count = 0` before the for loop. Frustratingly, this actually does not fix the error. Remember that variables are case-sensitive, so the variable count is different from Count. We still get the same error, because we still have not defined count:

In [None]:
Count = 0
for number in range(10):
    count = count + number
print("The count is:", count)
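
For comparison, here is a version that runs. The only change is that the variable is created before the loop, spelled with the same lowercase `count` that the loop uses.

```python
count = 0                      # define the variable before the loop uses it
for number in range(10):
    count = count + number     # same spelling and case as the definition
print("The count is:", count)
```

This prints `The count is: 45`, the sum of the numbers 0 through 9.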


## Index Errors

Next up are errors having to do with containers (like lists and strings) and the items within them. If you try to access an item in a list or a string that does not exist, then you will get an error. This makes sense: if you asked someone what day they would like to get coffee, and they answered `caturday`, you might be a bit annoyed. Python gets similarly annoyed if you try to ask it for an item that doesn't exist:

In [None]:
letters = ['a', 'b', 'c']
print("Letter #1 is", letters[0])
print("Letter #2 is", letters[1])
print("Letter #3 is", letters[2])
print("Letter #4 is", letters[3])


Here, Python is telling us that there is an `IndexError` in our code, meaning we tried to access a list index that did not exist.
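
Two common ways to avoid asking for an item that is not there are sketched below, under the assumption that we only care about non-negative indices. The variable names `index` and `result` are invented for the example.

```python
letters = ['a', 'b', 'c']

# Looping over the list itself never asks for an item that does not exist.
for position, letter in enumerate(letters, start=1):
    print("Letter #" + str(position), "is", letter)

# When a specific index is needed, compare it with len() first.
index = 3
if 0 <= index < len(letters):
    result = letters[index]
else:
    result = None
    print("No item at index", index, "- the list has only", len(letters), "items")
```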

## File Errors

The last type of error we’ll cover today is the kind associated with reading and writing files. If you try to read a file that does not exist, you will receive a `FileNotFoundError` telling you so. If you attempt to write to a file that was opened read-only, Python 3 raises an `io.UnsupportedOperation` error. More generally, problems with input and output manifest as `IOError` or `OSError`, depending on the version of Python you use.

In [None]:
file_handle = open('myfile.txt', 'r')

One reason for receiving this error is that you specified an incorrect path to the file.
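
Both file errors can be reproduced safely in a temporary directory. This is a minimal sketch, assuming `try`/`except` is used only to capture the error for inspection; the read-only write error is spelled `io.UnsupportedOperation` in the standard library, and the variable names are made up for the example.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.txt")

    # 1. Reading a file that does not exist yet.
    try:
        open(path, "r")
    except FileNotFoundError as err:
        missing_error = type(err).__name__
        print(missing_error + ":", err.filename)

    # Create the file so the second case has something to open.
    with open(path, "w") as handle:
        handle.write("some text\n")

    # 2. Writing to a file that was opened read-only.
    with open(path, "r") as handle:
        try:
            handle.write("more text\n")
        except io.UnsupportedOperation as err:
            write_error = str(err)
            print("Write failed:", write_error)
```

Printing the path that Python actually tried to open, as in the first case, is often the quickest way to notice a wrong directory or a misspelled file name.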

## Exercise: Read the Python code and the resulting traceback below, and answer the following questions:

- How many levels does the traceback have?
- What is the function name where the error occurred?
- On which line number in this function did the error occur?
- What is the type of error?
- What is the error message?


In [None]:
# This code has an intentional error. Do not type it directly;
# use it for reference to understand the error message below.
def print_message(day):
    messages = {
        "monday": "Hello, world!",
        "tuesday": "Today is tuesday!",
        "wednesday": "It is the middle of the week.",
        "thursday": "Today is Donnerstag in German!",
        "friday": "Last day of the week!",
        "saturday": "Hooray for the weekend!",
        "sunday": "Aw, the weekend is almost over."
    }
    print(messages[day])

def print_friday_message():
    print_message("Friday")

print_friday_message()



## Exercise: 
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.


In [None]:
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")



## How do we know what we've done is right?

Not like, cosmically.

Our previous lessons have introduced the basic tools of programming: variables and lists, file I/O, loops, conditionals, and functions. What they haven't done is show us how to tell whether a program is getting the right answer, and how to tell if it's still getting the right answer as we make changes to it.

To achieve that, we need to:

- Write programs that check their own operation.
- Write and run tests for widely-used functions.
- Make sure we know what "correct" actually means.

The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry --- the kind done with lumber --- the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes.

## Assertions

The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion's condition. If it's true, Python does nothing, but if it's false, Python halts the program immediately and prints the error message if one is provided. For example, this piece of code halts as soon as the loop encounters a value that isn't positive:



In [None]:
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for n in numbers:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
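
To see exactly where the loop stops and what message is reported, the same code can be wrapped so that the failed assertion is captured instead of ending the program. This is only a sketch for inspection; `assertion_message` is a name invented for the example.

```python
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
try:
    for n in numbers:
        assert n > 0.0, 'Data should only contain positive values'
        total += n
except AssertionError as err:
    assertion_message = str(err)
    print('Stopped early:', assertion_message)
print('total so far:', total)
```

The loop adds the first three values and stops at `-0.001`, so `4.4` is never reached. Note that running Python with the `-O` flag removes `assert` statements, so assertions should not be the only safeguard for anything critical.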


Programs like the Firefox browser are full of assertions: 10-20% of the code they contain is there to check that the other 80-90% is working correctly. Broadly speaking, assertions fall into three categories:

- A precondition is something that must be true at the start of a function in order for it to work correctly.
- A postcondition is something that the function guarantees is true when it finishes.
- An invariant is something that is always true at a particular point inside a piece of code.
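
The three categories can sit side by side in one short function. The function `largest` below is a hypothetical example written for this purpose, not code from an earlier lesson.

```python
def largest(values):
    """Return the largest item in a non-empty list (hypothetical example)."""
    # Precondition: must be true before the function can do its job.
    assert len(values) > 0, 'Need at least one value'

    best = values[0]
    for v in values:
        if v > best:
            best = v
        # Invariant: at this point in the loop, best is never smaller
        # than the item that was just examined.
        assert best >= v, 'best fell behind the current item'

    # Postconditions: what the function guarantees when it finishes.
    assert best in values, 'Result must come from the input'
    assert all(best >= v for v in values), 'Result must be the maximum'
    return best


print(largest([3, 9, 4]))
```

This prints `9`. Calling `largest([])` would trip the precondition before any other work is done.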

For example, suppose we are representing rectangles using a tuple of four coordinates (x0, y0, x1, y1), representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense:



In [None]:
def normalize_rectangle(rect):
    '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1).
    (x0, y0) and (x1, y1) define the lower left and upper right corners
    of the rectangle, respectively.'''
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dx) / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

The preconditions on lines 6, 8, and 9 catch invalid inputs:

In [None]:
print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate

In [None]:
print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted

The postconditions on lines 20 and 21 help us catch bugs by telling us when our calculations cannot have been correct. For example, if we normalize a rectangle that is taller than it is wide everything seems OK:

In [None]:
print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))

but if we normalize one that's wider than it is tall, the assertion is triggered:

In [None]:
print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))
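
The failed assertion points at the `dx > dy` branch, which computes the same ratio as the `else` branch. One possible repair, assuming the intent is for the shorter side to be scaled relative to the longer one, is to divide `dy` by `dx` in that branch. The name `normalize_rectangle_fixed` is used here only to keep this sketch distinct from the version above.

```python
def normalize_rectangle_fixed(rect):
    """Variant of normalize_rectangle with the wide-rectangle branch repaired."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dy) / dx   # wide: the height shrinks relative to the width
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy   # tall or square: the width shrinks instead
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)


wide = normalize_rectangle_fixed((0.0, 0.0, 5.0, 1.0))
tall = normalize_rectangle_fixed((0.0, 0.0, 1.0, 5.0))
print(wide)  # (0, 0, 1.0, 0.2)
print(tall)  # (0, 0, 0.2, 1.0)
```

Without the postconditions, the original function would have quietly returned a rectangle with a side longer than 1.0, and the mistake might have surfaced much later, far from its cause.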

## What could we test about function one? 

- What must be true for the function to work?
- What must be true for it to have worked? 

## What could we test about function two? 

- What must be true for the function to work?
- What must be true for it to have worked? 