# Files and Exceptions

## Exceptions

### Analyzing Text

Eventually we'd like to read in files so we can analyze data. For now, let's use a few classic books as our dataset. The texts in this section come from [Project Gutenberg](https://gutenberg.org), which maintains a collection of literary works that are available in the public domain.

We'll start by "reading" *Alice in Wonderland* and counting the number of words in the text:

In [None]:
# Count the approximate number of words in a file.
from pathlib import Path

path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError:
    print(f'Sorry, the file {path} does not exist.')
else:
    # Count the approximate number of words in the file
    words = contents.split()
    num_words = len(words)
    print(f'The file {path} has about {num_words} words.')

Above, we read the contents of `alice.txt` into a single string, `contents`. Since the file exists, the `try` block succeeds and the program moves to the `else` block. Next, we call `split()` on the contents, which by default breaks a string into a list of strings wherever whitespace occurs. Finally, we use `len()` on the list to count the approximate number of words and print the result.
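
To see what `split()` does on its own, here is a small sketch using a made-up sample string (not taken from the book file) that mixes spaces, a tab, and a newline:

```python
# A short sample string, invented for illustration, with mixed whitespace.
sample = "Alice was  beginning\tto get\nvery tired"

# With no arguments, split() breaks on any run of whitespace
# (spaces, tabs, newlines) and never produces empty strings.
words = sample.split()

print(words)
print(len(words))
```

The double space, the tab, and the newline are all treated as separators, so the list holds seven words.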

Note: the count is a bit high, since the file has extra words at the front that aren't part of the story, and headings like "Chapter IV" count as two words. It's not a bad estimate, though.

### Working with Multiple Files

Let's take a look at more books. (Why analyze one dataset when you can analyze many?!?) Before we do, let's move the bulk of the code to a function, since we'll use the same analysis on each book.

In [None]:
from pathlib import Path

def count_words(filename):
    """
    Count the approximate number of words in a file.

    Args:
        filename (str): path to the file to count words in
    
    Returns:
        Nothing. The number of words in the file is printed.
    """
    try:
        path = Path(filename)
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f'Sorry, the file {path} does not exist.')
    else:
        # Count the approximate number of words in the file
        words = contents.split()
        num_words = len(words)
        print(f'The file {path} has about {num_words} words.')

filenames = ['alice.txt', 'jackson.txt', 'principia.txt']

for filename in filenames:
    count_words(filename)

We can now run the same analysis on each book. Even though `jackson.txt` doesn't exist (without figures, why bother?), the program moves on to the next one after printing a useful error.

### Failing Silently

In the previous example we let the user know one of the books wasn't available, but you don't need to report every exception you catch. If it's fine for your program to fail silently and continue, you can use the `pass` statement in the `except` block.

In [None]:
from pathlib import Path

def count_words(filename):
    """
    Count the approximate number of words in a file.

    Args:
        filename (str): path to the file to count words in
    
    Returns:
        Nothing. The number of words in the file is printed.
    """
    try:
        path = Path(filename)
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        pass
    else:
        # Count the approximate number of words in the file
        words = contents.split()
        num_words = len(words)
        print(f'The file {path} has about {num_words} words.')

filenames = ['alice.txt', 'jackson.txt', 'principia.txt']

for filename in filenames:
    count_words(filename)

### Deciding Which Errors to Report

Generally, you should think about whether the user is expecting the information or not. For example, when calculating the roots of a quadratic equation, if you silently fail when dividing by zero or when the argument of the root is negative, the user will be confused when no answer is returned. (What are the roots? Why didn't it print any result?) On the other hand, if a user is expecting to see the word counts of some books, but doesn't know which are supposed to be checked, they may not care if a few aren't printed. (If all files fail to open though, they will certainly get confused!)
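
As a rough sketch of the quadratic example, the code below reports both failure cases instead of passing over them. The function names `quadratic_roots()` and `report_roots()` are hypothetical, chosen only for this illustration:

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 as a tuple."""
    discriminant = b**2 - 4*a*c
    # math.sqrt() raises ValueError when the discriminant is negative.
    root = math.sqrt(discriminant)
    # Dividing by 2*a raises ZeroDivisionError when a is zero.
    return ((-b + root) / (2*a), (-b - root) / (2*a))

def report_roots(a, b, c):
    """Print the roots, or explain why none could be calculated."""
    try:
        roots = quadratic_roots(a, b, c)
    except ValueError:
        print(f'No real roots: the argument of the root is negative for a={a}, b={b}, c={c}.')
    except ZeroDivisionError:
        print('Not a quadratic equation: a must be nonzero.')
    else:
        print(f'The roots are {roots[0]} and {roots[1]}.')

report_roots(1, -3, 2)   # two real roots
report_roots(1, 0, 1)    # negative argument of the root
report_roots(0, 1, 1)    # division by zero
```

If either `except` block used `pass` instead, the second and third calls would print nothing, which is exactly the confusing situation described above.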

### Refactoring

In almost every program I've written, I come to a point where my code works, but it could clearly be improved by breaking it up into a series of functions. This (very common) process is called *refactoring*. We did this above by making a `count_words()` function and converting the comment into a docstring.

Oftentimes the overall logic of a program isn't clear until you start writing it. The more practice you get, the more you will be able to outline your code ahead of time to determine the structure (which functions you will need) and include them from the start. Thinking ahead is an important skill, but you will likely always find yourself refactoring code, even if just a little.
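
As one possible next step, and only a sketch rather than the definitive way to structure this program, the counting could be separated from the file handling and printing. The names `count_words_in_text()` and `report_word_count()` are hypothetical:

```python
from pathlib import Path

def count_words_in_text(contents):
    """Return the approximate number of words in a string."""
    return len(contents.split())

def report_word_count(filename):
    """Print the approximate word count of a file, or a message if it is missing."""
    path = Path(filename)
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f'Sorry, the file {path} does not exist.')
    else:
        num_words = count_words_in_text(contents)
        print(f'The file {path} has about {num_words} words.')

filenames = ['alice.txt', 'jackson.txt', 'principia.txt']

for filename in filenames:
    report_word_count(filename)
```

With this split, `count_words_in_text()` returns a value instead of printing it, so it can be reused or checked on any string without touching a file.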

## Practice

Open the book *Alice in Wonderland* and pick a few words you think might commonly appear. Count how many times each word appears in the book and print the result.