# Assignment 3 Feedback

## Syntax & Style

### Put Written Answers in Markdown Cells

Please put your written answers in Markdown cells! They're much easier to read than code comments or `print` statements.

### Run Your Code

Please make sure to run all the cells in your notebook before submitting. This will help you make sure there are no errors and that all of your plots are visible.

### Strings Are Not Comments

The Python comment marker is `#`, not `"""`.

The `"""` marker starts a multi-line string. Don't use strings as comments: a string is a value, not a comment, so editors, linters, and other readers won't treat it as one.

The only time you should put documentation in `"""` is for the _docstring_, which must be the first statement in the function body:

In [13]:
def add(x, y):
    """Add two numbers.
    
    THIS IS THE DOCSTRING
    
    Args:
        x: The first number.
        y: The second number.
    """
    return x + y

A docstring is a special string (not a comment) that Python displays when you use `help()`.
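As a small sketch of the difference, the cell below repeats `add()` with both a docstring and an ordinary `#` comment. Python stores the docstring on the function as `__doc__` (which is what `help()` displays), while the comment is discarded. The variable name `first_line` is only for illustration.

```python
def add(x, y):
    """Add two numbers.

    Args:
        x: The first number.
        y: The second number.
    """
    # This is a comment: Python ignores it, and help() does not show it.
    return x + y


# The docstring is kept on the function object as __doc__.
first_line = add.__doc__.splitlines()[0]
print(first_line)
```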

### Don't Repeat Calls

It takes time for a function to compute a result. If you need to use the return value of a function more than once, save it in a variable rather than calling the function multiple times:

In [40]:
# Compute (4 + 5) * (4 + 5)

# BAD: Time is wasted on the second call to sum()!
val = sum([4, 5]) * sum([4, 5])

# GOOD: Call sum() once and reuse the result.
total = sum([4, 5])
val = total * total

This is especially important when you're reading a file. Reading files is one of the slowest operations you can do, so avoid reading the same file twice!
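Here is a minimal sketch of reading a file once and reusing the result. The file `fruit.txt` and its contents are made up, and the file is written to a temporary directory only so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical data file, created here only to make the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "fruit.txt")
with open(path, "w") as f:
    f.write("apple\nbanana\nkiwi\n")

# Read the file ONCE and keep the contents in a variable.
with open(path) as f:
    lines = f.read().splitlines()

# Reuse the variable as often as needed; the file is not read again.
n_lines = len(lines)
longest = max(lines, key=len)
print("{} lines, longest is {}".format(n_lines, longest))
```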

---
## File Paths

### Don't Hardcode Paths in Functions

Functions that have a hardcoded path are not reusable, which defeats the purpose of writing a function. It also makes your code hard to use on any other computer, because the paths will typically be different.

Parameterize paths in your functions so that they can be used and reused anywhere.

In [8]:
import os
import os.path

# BAD: This function has a hardcoded path, so it isn't reusable.
def get_basenames():
    return [os.path.basename(x) for x in os.listdir("/home/nick/")]


# GOOD: This function takes the path as an argument, so it is reusable.
# Notice that it's okay to hardcode a path in the default argument.
def get_basenames(path="/home/nick"):
    return [os.path.basename(x) for x in os.listdir(path)]

### Use Relative Paths

If you need to use other files that are associated with a notebook (such as data files), use relative paths. Relative paths make it easier for other people to use the notebook, since they can drop the notebook and associated files anywhere on their computer.

A _relative path_ is a path that starts from the current working directory. Usually, the working directory will be the same directory that your Jupyter notebook or Python script is in.

For example, on my computer the relative path from this notebook to the Assignment 3 fruit files is just `fruit/`. In other words, this notebook and the `fruit/` directory are in the same directory. The absolute path to the fruit directory is
```
/home/nick/university/teach/sta141b/2017-sta141b/assignments/assignment3/fruit
```
It's easy to make sure this notebook and the `fruit/` directory are in the same directory. On the other hand, you probably don't have
```
/home/nick/university/teach/sta141b/2017-sta141b/assignments/assignment3
```
directories on your computer, and creating all of them just to use this notebook would be tedious.
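To see how a relative path is resolved, you can ask Python for the working directory and for the absolute path it would actually use. This is only a sketch: the directory `fruit` does not need to exist for `os.path.abspath()` to work, and the output depends on where you run it.

```python
import os

# A relative path is interpreted starting from the current working directory.
relative = "fruit"
absolute = os.path.abspath(relative)

print(os.getcwd())  # the working directory (different on every computer)
print(absolute)     # the working directory with "fruit" appended
```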

### Join Paths with `os.path.join`

Some operating systems don't use `/` to separate files and directories in a path. For instance, Windows uses `\`. Getting the separator wrong can cause errors in your programs.

Python's standard library has a function for joining paths: `os.path.join()`. This function uses the correct separator for the operating system the code is running on.

In [10]:
import os.path

os.path.join("hello", "world")

'hello/world'
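One place this comes up often is `os.listdir()`, which returns bare file names rather than full paths. The sketch below joins each name back onto its directory. The temporary directory stands in for a data directory such as `fruit/`, and the file names are made up.

```python
import os
import tempfile

# Temporary directory standing in for a data directory such as fruit/.
data_dir = tempfile.mkdtemp()
for name in ["apples.csv", "pears.csv"]:  # hypothetical file names
    open(os.path.join(data_dir, name), "w").close()

# os.listdir() returns names only, so join each one to its directory
# before trying to open it.
paths = [os.path.join(data_dir, name) for name in sorted(os.listdir(data_dir))]
for p in paths:
    print(p)
```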

---
## Loops

### Avoid Explicit Indexes in Loops

You'll frequently need to iterate over a list of things: the names of files in a directory, the columns in a data frame, the tags in an HTML document, and so on.

In a low-level programming language like C, you'd do this by assigning an index (number) to each file, then writing a loop that increments the number at each iteration. The equivalent Python code is:

In [15]:
import os

files = os.listdir(".")

# BAD
for i in range(len(files)):
    # Do something.
    # For example, check if the file ends with ".ipynb".
    if files[i].endswith(".ipynb"):
        print(files[i])

assignment3-Copy1.ipynb
assignment3.ipynb
feedback.ipynb
assignment3-solutions.ipynb


The variable `i` is only used to access the file name for the current iteration. It doesn't add any important information to the code, and distracts from what the code really means.

In Python you should just iterate over the file names directly:

In [16]:
# GOOD
for f in files:
    if f.endswith(".ipynb"):
        print(f)

assignment3-Copy1.ipynb
assignment3.ipynb
feedback.ipynb
assignment3-solutions.ipynb
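
If a loop really does need the position as well as the item, `enumerate()` provides both without manual indexing. A small sketch, using made-up file names instead of a real directory listing:

```python
# Hypothetical file names, used instead of os.listdir() so the result is fixed.
files = ["assignment3.ipynb", "notes.txt", "feedback.ipynb"]

numbered = []
for i, f in enumerate(files):
    # i is the position in the list, f is the file name itself.
    if f.endswith(".ipynb"):
        numbered.append("{}: {}".format(i, f))

print(numbered)
```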


### Iterate in Parallel with `zip()`

When you want to iterate over two (or more) lists in parallel, you can zip them together rather than using indexes:

In [21]:
xlist = [0, 1, 2]
ylist = ['a', 'b', 'c']

for x, y in zip(xlist, ylist):
    print("{} and {}".format(x, y))

0 and a
1 and b
2 and c


### Combine Back-to-back Loops

When you need to apply several different steps to each element of a list, you might be tempted to write back-to-back for-loops:

In [24]:
import os

files = os.listdir(".")

# BAD
ipynb = []
for f in files:
    # Get file names that end with ".ipynb".
    if f.endswith(".ipynb"):
        ipynb.append(f)
        
threes = []
for nb in ipynb:
    # Get file names that contain "3".
    if "3" in nb:
        threes.append(nb)
        
for three in threes:
    # Print each file name.
    print(three)

assignment3-Copy1.ipynb
assignment3.ipynb
assignment3-solutions.ipynb


Using a single loop is shorter, clearer, and more efficient:

In [36]:
# GOOD
threes = []
for f in files:
    if f.endswith(".ipynb") and "3" in f:
        threes.append(f)
        print(f)

assignment3-Copy1.ipynb
assignment3.ipynb
assignment3-solutions.ipynb
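
When the loop only builds a list, the same filter can also be written as a list comprehension. This is an optional alternative, sketched with made-up file names:

```python
# Hypothetical file names, used instead of os.listdir() so the result is fixed.
files = ["assignment3.ipynb", "assignment2.ipynb", "fruit3.csv"]

# Keep names that end with ".ipynb" AND contain "3", in one pass.
threes = [f for f in files if f.endswith(".ipynb") and "3" in f]
print(threes)
```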


---
## Pandas

### Index with Booleans

You can get all rows in a Pandas data frame that match a condition with Boolean indexing (using `.loc`):

In [44]:
import pandas as pd

df = pd.DataFrame({
    "form": ["Dried", "Frozen", "Fresh", "Fresh, peeled", "Fresh", "Frozen"],
    "price": [10.10, 3.43, 5.43, 6.51, 4.32, 3.21]
})

# Get all rows whose form starts with "Fresh".
# The .str accessor applies the string method to every value in the column.
df.loc[df.form.str.startswith("Fresh"), :]

|   | form | price |
|---|---|---|
| 2 | Fresh | 5.43 |
| 3 | Fresh, peeled | 6.51 |
| 4 | Fresh | 4.32 |


Boolean indexing is much faster and clearer than looping over all the rows.
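The sketch below shows what happens inside the brackets, using the same data frame. A column is a Series rather than a single string, so it has no `startswith()` method of its own. String methods are reached through `.str`, which applies them to each value and returns one `True`/`False` per row. The names `mask`, `fresh_prices`, and `loop_prices` are only for illustration, and the loop is included for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "form": ["Dried", "Frozen", "Fresh", "Fresh, peeled", "Fresh", "Frozen"],
    "price": [10.10, 3.43, 5.43, 6.51, 4.32, 3.21]
})

# .str applies startswith() to each string in the column and returns
# a Boolean Series with one True/False value per row.
mask = df["form"].str.startswith("Fresh")

# Boolean indexing: keep only the rows where the mask is True.
fresh_prices = df.loc[mask, "price"].tolist()

# Row-by-row loop that computes the same thing, for comparison.
loop_prices = []
for form, price in zip(df["form"], df["price"]):
    if form.startswith("Fresh"):
        loop_prices.append(price)

print(fresh_prices)
```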