# Files and Exceptions

In this section, you will learn how to work with files so that you can write programs to deal with large amounts of data. You will also learn to handle errors so your program doesn't crash when it encounters non-standard situations.

## Reading from a File

When working with large datasets, the information will often be stored in text files. Later on, we'll talk about other common ways to store data, such as with the `json` or `pickle` modules.

When you want to work with information in a text file, the first step is to read the file into memory. You can then work through all of the data at once or line by line.

### Reading the Contents of a File

To start, we need a file to read in! Inside this `08-files-and-exceptions` directory, create a file called `pi_digits.txt` and include the following three lines:
```txt
3.1415926535
  8979323846
  2643383279
```
Here is a program that opens this file, reads the contents, and prints them to the screen:

In [None]:
from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()
print(contents)

3.1415926535
  8979323846
  2643383279


To work with the contents of a file, we need to tell Python the path to the file. The *path* is the exact location of a file or folder on a computer. Python provides a module called `pathlib` that makes it easier to work with files and paths on a computer, for any operating system. A module that provides specific functionality like this is often called a *library*.

Here, we built a `Path` object representing the file `pi_digits.txt`, which we assign to the variable `path`. (Notice again that Python syntax is case sensitive.) Since the file is saved in the same directory as this Jupyter notebook, the filename is all `Path` needs to open the file.

Once we have a `Path` object representing `pi_digits.txt`, we use the `read_text()` method to read the entire contents of the file and store them in a single (multi-line) string. Finally, we print that string.
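To see that `read_text()` really returns one string, here is a minimal sketch. It assumes a throwaway copy of the file in a temporary directory (created with `write_text()`, which is only used here to keep the example self-contained), so it runs without touching your real files.

```python
import tempfile
from pathlib import Path

# Assumption: a temporary directory stands in for the notebook's folder,
# so the sketch runs anywhere without touching your real files.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'pi_digits.txt'
    path.write_text('3.1415926535\n  8979323846\n  2643383279\n')

    contents = path.read_text()

print(type(contents))        # the whole file is one str object
print(repr(contents))        # repr() makes the newline characters visible
print(contents.count('\n'))  # one newline per line in this sample file
```

Printing with `repr()` shows the newline characters that separate the lines, including the one at the very end of this sample file.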

While it's not needed here, you will often want to clean up whitespace on either side of the text you read from a file using a command like:

In [None]:
contents = path.read_text().strip()

This line tells Python to call the `read_text()` method on the file we've opened. Then it applies the `strip()` method to the string that `read_text()` returns. The cleaned-up string is then assigned to the variable `contents`. This approach is called *method chaining* and is a common way to make code more concise and readable.
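As a rough illustration, the sketch below compares the chained call with the equivalent two-step version. The temporary file is a hypothetical stand-in for `pi_digits.txt`, created only so the example is self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'pi_digits.txt'   # hypothetical throwaway copy
    path.write_text('3.1415926535\n  8979323846\n  2643383279\n')

    # Two separate steps, with an intermediate variable...
    raw = path.read_text()
    two_step = raw.strip()

    # ...and the same work as a single chained expression.
    chained = path.read_text().strip()

print(two_step == chained)            # both approaches give the same string
print(raw.endswith('\n'))             # the unstripped text ends with a newline
print(chained.endswith('\n'))         # strip() removed it
print('\n  8979323846' in chained)    # indentation inside the text is untouched
```

Note that `strip()` only trims the two ends of the whole string. Whitespace at the start of the interior lines is left alone.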

### Relative and Absolute File Paths

When you pass a simple filename like `pi_digits.txt`, Python looks in the directory you're executing the program from. In the case of this notebook, that is the directory `08-files-and-exceptions`.

Depending on how you organize your code, the file you want to open may not be in the same directory you're running your program from. For example, we have a directory called `08-files-and-exceptions/` that contains our programs. Inside that directory, we might have another directory called `text_files/` to separate our program files from our data. Just passing `Path` the name of a file inside `text_files/` won't work, because Python will only look in the directory it's being run from (here: `08-files-and-exceptions/`).

There are two main ways to specify paths in programming. A *relative file path* tells Python to look for a given location relative to the directory the program is running from. Since this notebook is running from `08-files-and-exceptions/`, we can build a path that starts with `text_files/` and ends with the file name.

For example, if we make a directory `text_files/` inside `08-files-and-exceptions/` and copy `pi_digits.txt` into it, we would write:

In [None]:
from pathlib import Path

path = Path('text_files/pi_digits.txt')
contents = path.read_text()

You can also tell Python exactly where the file is on your computer, regardless of where the program is being executed from. This is called an *absolute file path*.

For example, on my laptop, the file is stored in `/Users/cjmey/Desktop/p325-sp26-lectures/08-files-and-exceptions/text_files/`. You can check the path on your computer by opening a terminal and typing:
```bash
cd 08-files-and-exceptions     # change directory to 08-files-and-exceptions/
cd text_files                  # change directory to text_files/
pwd                            # print the working directory
```
Specifying the absolute file path would look like:

In [9]:
from pathlib import Path

path = Path('/Users/cjmey/Desktop/p325-sp26-lectures/08-files-and-exceptions/text_files/pi_digits.txt')
contents = path.read_text()

Absolute paths are usually longer than relative paths, because they start with your system's top-level folder. This can be useful for reading files from anywhere on your computer. For now, it's easier to keep your text files in the same folder you're running your program from and simply specify the file name.

Note: Windows systems use a backslash (`\`) instead of a forward slash (`/`) when displaying file paths. **You should always use forward slashes in your code, even on Windows**. The `pathlib` library will take care of the conversion correctly for whatever operating system you are running on.
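To illustrate the conversion, the sketch below uses `PurePosixPath` and `PureWindowsPath` from `pathlib`. These "pure" classes only manipulate the text of a path and never touch the disk, so both can be tried on any operating system. In everyday code you would normally just use `Path`.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same forward-slash string is given to both flavors of path.
posix_style = PurePosixPath('text_files/pi_digits.txt')
windows_style = PureWindowsPath('text_files/pi_digits.txt')

print(posix_style)          # displayed with forward slashes
print(windows_style)        # displayed with backslashes
print(windows_style.parts)  # the individual pieces are the same either way
```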

### Accessing a File's Lines

When you're working with data in a file, you'll often want to look at each line separately. For example, each line could represent a different sample that you need to convert to another unit before combining it with the rest of the data.

You can use the `splitlines()` method to turn a long string into a list of lines, and then use a `for` loop to examine each line individually:

In [None]:
from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
for line in lines:
    print(line)

3.1415926535
  8979323846
  2643383279


Since we haven't modified any lines, the output matches the original text file exactly.
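When each line does need processing, such as the unit conversion mentioned earlier, the body of the loop is where that work happens. Here is a minimal sketch using made-up data: the temperature readings are hypothetical, and the string literal stands in for what `read_text()` would return.

```python
# Hypothetical data: one temperature reading in degrees Celsius per line,
# standing in for the string that read_text() would return.
contents = '21.5\n22.0\n19.8\n'

kelvin_readings = []
for line in contents.splitlines():
    celsius = float(line)                     # convert the text to a number
    kelvin_readings.append(celsius + 273.15)  # convert Celsius to Kelvin

print(kelvin_readings)
```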

### Working with a File's Content

After reading the contents of a file into memory, you can do anything you want with the data. Let's start by building a single string with all the digits of $\pi$ in it.

In [12]:
from pathlib import Path

path = Path('pi_digits.txt')
lines = path.read_text().splitlines() # method chain

pi_string = ''
for line in lines:
    pi_string += line

print(pi_string)
print(len(pi_string))

3.1415926535  8979323846  2643383279
36


Uh oh! That's not quite right, since it includes the extra whitespace at the start of the second and third lines. Let's update the code in the `for` loop to fix this.

In [None]:
from pathlib import Path

path = Path('pi_digits.txt')
lines = path.read_text().splitlines() # method chain

pi_string = ''
for line in lines:
    pi_string += line.strip() # remove whitespace

print(pi_string)
print(len(pi_string))

3.141592653589793238462643383279
32
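The same result can also be built without a `+=` loop by using the string method `join()`. This is an alternative sketch, not a replacement for the loop above. The list literal stands in for what `splitlines()` returns for `pi_digits.txt`.

```python
# Stand-in for the list that splitlines() returns for pi_digits.txt.
lines = ['3.1415926535', '  8979323846', '  2643383279']

# Strip each line, then glue the pieces together with nothing in between.
pi_string = ''.join(line.strip() for line in lines)

print(pi_string)
print(len(pi_string))
```

Both versions produce the same 32-character string. Which one to use is mostly a matter of readability.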


Finally, if we want to use this representation of `pi` in an equation, we'll need to convert it to a number using `int()` or `float()`. Since the string represents a floating point number (it includes a period) we must use `float()` here.

In [15]:
# Convert string to float
pi_float = float(pi_string)
print(pi_float)

3.141592653589793
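Notice that the printed float is shorter than the 32-character string. A Python `float` only holds roughly 15 to 17 significant digits, so the rest are rounded away. The sketch below shows this, and also shows the standard library's `Decimal` type as one option if every digit needs to be kept. Using `Decimal` here is an illustration, not something the rest of this section relies on.

```python
import math
from decimal import Decimal

pi_string = '3.141592653589793238462643383279'

pi_float = float(pi_string)      # rounded to the nearest representable float
pi_decimal = Decimal(pi_string)  # built exactly from the digits in the string

print(pi_float)
print(pi_float == math.pi)  # matches the value stored in the math module
print(pi_decimal)           # keeps all of the digits from the string
```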


### Large Files: One Million Digits

The same code we've used above works equally well on very large files. If we start with a text file that contains $\pi$ to 1,000,000 decimal places, we can create a single string containing all those digits.

The example below shows how to do this, although it only prints the first 100 digits so we don't have to watch the screen scroll for the next hour!

In [17]:
from pathlib import Path

path = Path('pi_million_digits.txt')
lines = path.read_text().splitlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

print(pi_string[:100])
print(len(pi_string))

3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706
1000002
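For files that are too large to hold comfortably in memory all at once, one common alternative is to open the file and loop over it, which hands back one line at a time. The sketch below is an illustration only. It builds a small hypothetical file (`many_digits.txt`) in a temporary directory rather than using `pi_million_digits.txt`, so that it runs anywhere.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a large digits file: 1,000 lines of 50 digits.
    path = Path(tmp) / 'many_digits.txt'
    path.write_text(('0123456789' * 5 + '\n') * 1000)

    pieces = []
    with path.open() as f:      # the file is closed when this block ends
        for line in f:          # the file object yields one line at a time
            pieces.append(line.strip())

digits = ''.join(pieces)
print(digits[:20])
print(len(digits))
```

For a file with a million digits, `read_text()` is still perfectly fine. The line-by-line approach matters more as files grow much larger than that.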


## Practice

Copy the code for reading `pi_digits.txt` into a `.py` program in the same directory, but don't include the `strip()` method. Run this program and look at the output. Now update the program to include the `strip()` method and run it again. Is the behavior different than what we saw in the Jupyter notebook?

Next, let's check if your birthday is contained in $\pi$. Write a program that opens `pi_million_digits.txt` and turns the contents into a single string. Next, add an `input()` prompt that asks for the user's birthday in the form mmddyy. Finally, check if the birthday string is `in` the first million digits of $\pi$.