# Chapter 10. Files and Exceptions

In this chapter, we will tackle three important tasks. First, we will learn how to read and write files. Next, we will learn how to handle unexpected situations with *exceptions*. Lastly, we will experiment with the `json` module, which allows you to save user data so it is not lost when your program stops running.

## Reading from a File

Most data analysis will require you to read data from external files. Even if you don't deal with data, you may still need to access the contents of files, make changes, and then save your changes back to files.

### Reading the Contents of a File

We will learn a few things about how to read the contents of a file:

- We need to import the `Path` class from `pathlib`, a standard-library module that provides tools for working with files and folders.
- We will use the `read_text()` method to read the contents of a file.
- The text returned by `read_text()` often ends with a newline or other trailing whitespace, and `print()` adds a newline of its own. As a result, your output includes one extra blank line, which you can strip off with the `rstrip()` method.
- You can use *method chaining* by connecting two methods together, e.g., `read_text().rstrip()`.

### Relative and Absolute File Paths

If your data file is not stored in the same folder as your Python code file, you need to tell Python where to find it. There are two ways to do this:

- *Relative file path.* If your data file is in a sub-folder (e.g., `data`) of the folder that holds your Python code, you can give the location relative to that folder, such as `path = Path('data/filename.txt')`.
- *Absolute file path.* Otherwise, it's easier to use an absolute file path, e.g., `path = Path('c:/my document/filename.txt')`.

Regardless of your OS, we recommend that you always use forward slashes (`/`) in file paths, even though Windows users are generally used to backslashes (`\`).
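
The difference between the two kinds of path can be checked directly in code. The following is a minimal sketch; the folder and file names are placeholders, and the file does not need to exist for these calls to work.

```python
from pathlib import Path

# Relative path: interpreted relative to the current working directory.
# 'data' and 'filename.txt' are placeholder names used only for illustration.
relative_path = Path('data/filename.txt')
print(relative_path.is_absolute())

# The / operator joins path components, so you rarely type separators yourself.
same_path = Path('data') / 'filename.txt'
print(same_path == relative_path)

# Prefixing the current working directory gives an absolute path.
absolute_path = Path.cwd() / relative_path
print(absolute_path.is_absolute())
```

Printing a `Path` on Windows shows backslashes even when it was written with forward slashes, which is why the cell output in these notes displays `Data\chapter_10\...`.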

### Accessing a File's Lines

Sometimes, you may want to examine each line of a file separately. In this case, you can call the `splitlines()` method on the contents, which returns a list of lines, and then process that list with a `for` loop.

### Working with a File's Contents

In our `pi_digits.txt` example, we see that the data is split into multiple lines. If we want to combine all lines into one long string, we can do this in a `for` loop. To drop the spaces at the start of each line, we call the `lstrip()` method on each line before adding it to the string.
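
The three stripping methods differ only in which side of the string they clean. Here is a small sketch using in-memory strings that mimic the lines of `pi_digits.txt`, so it runs without the data file:

```python
# One line as it might appear in the file, with spaces on both sides.
line = "  8979323846  "

print(repr(line.lstrip()))  # '8979323846  '  (leading whitespace removed)
print(repr(line.rstrip()))  # '  8979323846'  (trailing whitespace removed)
print(repr(line.strip()))   # '8979323846'    (both sides removed)

# Joining the stripped lines gives one continuous string of digits.
lines = ["3.1415926535", "  8979323846", "  2643383279"]
pi_string = "".join(line.strip() for line in lines)
print(pi_string)            # 3.141592653589793238462643383279
print(len(pi_string))       # 32
```

Using `strip()` rather than `lstrip()` is a design choice here: it also protects against trailing spaces, which `lstrip()` would leave in place.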

### Large Files: One Million Digits

### Is your Birthday Contained in Pi?



In [14]:
# Reading the pi_digits.txt file and printing the contents
from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()
print(contents)
contents = contents.rstrip()
print(contents)

print("Method Chaining:")
path = Path('pi_digits.txt')
contents = path.read_text().rstrip()
print(contents)

# Separating lines and printing each line
path = Path('pi_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
for count, line in enumerate(lines):
	print(f"{count}: {line}")
	
# First attempt to read the file and combine into a single string
path = Path('pi_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ""
for line in lines:
	pi_string += line

print(pi_string)
print(f"The length of the string is {len(pi_string)} characters.")

# Second attempt: strip leading whitespace from each line before combining
path = Path('pi_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ""
for line in lines:
	pi_string += line.lstrip()  # Remove leading whitespace from each line

print(pi_string)
print(f"The length of the string is {len(pi_string)} characters.")

# Read a million digits of pi from a file
print("Reading a million digits of pi from the file:")
path = Path('Data/chapter_10/reading_from_a_file/pi_million_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ""
for line in lines:
	pi_string += line.lstrip()

print(f"{pi_string[:52]}")
print(f"The length of the string is {len(pi_string)} characters.")

# Test whether your birthday is in pi
birthday = input("Enter your birthday in the form mmddyy: ")
if birthday in pi_string:
	print("Your birthday appears in the first million digits of pi!")
else:
	print("Your birthday is unique, so it doesn't appear in the first million digits of pi!")

3.1415926535
  8979323846
  2643383279
  
3.1415926535
  8979323846
  2643383279
Method Chaining:
3.1415926535
  8979323846
  2643383279
0: 3.1415926535
1:   8979323846
2:   2643383279
3:   
3.1415926535  8979323846  2643383279  
The length of the string is 38 characters.
3.141592653589793238462643383279
The length of the string is 32 characters.
Reading a million digits of pi from the file:
3.14159265358979323846264338327950288419716939937510
The length of the string is 1000002 characters.
Your birthday appears in the first million digits of pi!


In [15]:
# Exercise 10.1
path = Path('Data/chapter_10/learning.txt')
contents = path.read_text()
print(contents)

print("\n\nAfter line splitting:")
lines = contents.splitlines()
for line in lines:
	print(line)
	
# Exercise 10.2 Replacing 'file' with 'document'
path = Path('Data/chapter_10/learning.txt')
contents = path.read_text()
print("\n\nAfter replacing files:")
lines = contents.splitlines()

# Note that the replace() method does not modify the original string; it returns a new string.
# So we rebind `line` to the returned string before printing it.
for line in lines:
	line = line.replace('file', 'document')
	line = line.replace('File', 'Document')
	print(line)

# Chapter 10. Files and Exceptions

In this chapter, we will tackle two important tasks. First, we will learn how to read and write files. Next, we will learn how to handle unexpected situations with *exceptions*. Lastly, we will also experiment with the `json` module, which allows you to save user data so it is not lost when your program stops running.

## Reading from a File

Most data analysis will require you to read data from external files. Even if you don't deal with data, you may still need to access the contents of files, make changes, and then save your changes back to files.


After line splitting:
# Chapter 10. Files and Exceptions

In this chapter, we will tackle two important tasks. First, we will learn how to read and write files. Next, we will learn how to handle unexpected situations with *exceptions*. Lastly, we will also experiment with the `json` module, which allows you to save user data so it is not lost when your program stops running.

## Reading from a File

Mo

## Writing to a File

When you work with a large volume of data, it is often better to save your data in a separate file so that you can continue to work on it later. You can also share the file with other people.

### Writing a Single Line

To write a single line of text to a file, follow these two steps:

- Create a `Path` instance: `path = Path('file_name.txt')`
- Write your line to the file: `path.write_text("your text here")`

### Writing Multiple Lines

To write multiple lines of text, you may want to first create a variable that combines these lines of text. Make sure you include `\n` at the end of each line.
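
Two details of `write_text()` are easy to miss: it returns the number of characters written, and each call replaces whatever the file held before. The sketch below illustrates both; it writes into a temporary folder (an assumption made so the example does not touch your real files).

```python
import tempfile
from pathlib import Path

# Work in a temporary folder that is deleted when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'programming.txt'

    # write_text() returns the number of characters written.
    first_count = path.write_text("I love programming in Python!\n")

    # A second call overwrites the earlier contents rather than appending.
    contents = "Python is a versatile language.\n"
    contents += "I enjoy solving problems with code.\n"
    second_count = path.write_text(contents)

    saved = path.read_text()

print(first_count)   # 30
print(second_count)  # 68
print(saved)
```

This return value is why a notebook cell that ends with `path.write_text(...)` displays a bare number as its output.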



In [16]:
from pathlib import Path
path = Path('Data/chapter_10/programming.txt')
path.write_text("I love programming in Python!\n")

# multiple lines of text
contents = "I love programming in Python!\n"
contents += "Python is a versatile language.\n"
contents += "I enjoy solving problems with code.\n"
contents += "Let's keep learning and growing!\n"

path.write_text(contents)


131

In [17]:
# Exercise 10.4
from pathlib import Path
path = Path('Data/chapter_10/guest.txt')

condition = True
users = ""
while condition:
	user = input("Tell me your name (type 'q' to exit): ").strip().lower()
	if user != "q":
		users += user.title() + "\n"
	else:
		condition = False

path.write_text(users)

5

## Exceptions

When an error occurs while your code runs, Python creates an *exception* object. If you don't handle the exception, your program halts and shows a *traceback*, which is confusing to users. More importantly, it may unintentionally reveal too much information to potential attackers.

Exceptions in Python are handled with `try-except` blocks. If the code in the `try` block runs without error, Python skips the `except` block and your program continues. Otherwise, the matching `except` block executes, where you can show a friendly error message.

### Handling the ZeroDivisionError Exception

- *Using `try-except` Blocks.*
- *Using Exceptions to Prevent Crashes.*
- *The `else` Block.*
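
The three ideas above fit into one small function. This is a minimal sketch; `safe_divide` is a hypothetical helper name, not something defined elsewhere in these notes.

```python
def safe_divide(numerator, denominator):
    """Return the quotient, or None if the denominator is zero."""
    try:
        # Only the statement that might fail goes inside the try block.
        answer = numerator / denominator
    except ZeroDivisionError:
        # Runs only when the division fails; the program keeps going.
        print("You can't divide by zero!")
        return None
    else:
        # Runs only when the try block succeeded.
        return answer

print(safe_divide(10, 4))  # 2.5
print(safe_divide(5, 0))   # prints the message, then None
```

Keeping the `try` block short makes it clear which statement the `except` clause is guarding, and the `else` block holds the code that depends on that statement having succeeded.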

### Handling the FileNotFoundError Exception

Another common error is `FileNotFoundError`.

### Analyzing Text

One of the simplest ways to tokenize a document (i.e., break it into individual words so that they can be counted) is Python's `split()` method, which by default splits a string on any run of whitespace.
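
A short in-memory example shows what `split()` does and why the resulting count is only approximate. The sample sentence is made up for illustration.

```python
# A made-up sample with a newline and a run of several spaces in it.
text = "Alice was beginning to get very tired\nof sitting by her sister   on the bank."

# With no arguments, split() treats any run of whitespace as one separator.
words = text.split()
num_words = len(words)

print(words[:3])   # ['Alice', 'was', 'beginning']
print(words[-1])   # bank.
print(num_words)   # 15
```

Punctuation stays attached to the neighboring word (`bank.`), which is one reason the word counts in this chapter are described as approximate.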

### Working with Multiple Files

You should write a function to count the number of words in a file.
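
The `word_count` module imported in the cell below is not reproduced in these notes, so the following is only a sketch of what such a function might look like. It assumes the function prints the same messages seen in the cell output; returning the count is an addition made here so the result can be checked.

```python
import tempfile
from pathlib import Path

def count_words(path):
    """Print and return the approximate number of words in a file."""
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f"File {path} not found.")
        return None
    else:
        num_words = len(contents.split())
        print(f"There are approximately {num_words} words in {path} file.")
        return num_words

# Try the function on one existing file and one missing file,
# both placed in a temporary folder (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / 'sample.txt'
    existing.write_text("one two three four", encoding='utf-8')
    missing = Path(tmp_dir) / 'missing.txt'
    results = [count_words(p) for p in (existing, missing)]

print(results)  # [4, None]
```

Because the exception is handled inside the function, a loop over many files can continue past a missing one instead of stopping at the first failure.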

### Failing Silently

Sometimes, you may want to hide the details of exceptions from your users. In this case, you can use the `pass` statement in the `except` block, which tells Python to do nothing and move on. It is often better, though, to write all errors and exceptions to a separate log file, so you can examine them together later.
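
One way to stay silent for the user while still keeping a record is Python's standard `logging` module. The sketch below is one possible setup, not the only one; the helper name `read_or_log`, the logger name, and the log file name are all assumptions made for illustration.

```python
import logging
import tempfile
from pathlib import Path

def read_or_log(path, logger):
    """Return the file's text, or None after logging that it is missing."""
    try:
        return path.read_text(encoding='utf-8')
    except FileNotFoundError:
        # Nothing is printed for the user; the problem goes to the log instead.
        logger.warning("File %s not found.", path)
        return None

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / 'errors.log'

    # Send this logger's messages to a file only (not to the screen).
    logger = logging.getLogger('chapter_10_demo')
    logger.propagate = False
    handler = logging.FileHandler(log_path, encoding='utf-8')
    logger.addHandler(handler)

    result = read_or_log(Path(tmp_dir) / 'siddhartha.txt', logger)

    # Close the handler so the log file is flushed and released.
    handler.close()
    logger.removeHandler(handler)
    log_text = log_path.read_text(encoding='utf-8')

print(result)
print(log_text)
```

In a real program the log file would live in a permanent location; the temporary folder is used here only to keep the example self-contained.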

### Deciding which Errors to Report

In [18]:
# Using try-except blocks
try: 
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")

# FileNotFoundError example
from pathlib import Path

path = Path('Data/chapter_10/exceptions/alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError:
    print(f"File {path} not found.")
else:
    # Count the approximate number of words in the file
    words = contents.split() # tokenization
    num_words = len(words)
    print(f"There are approximately {num_words} words in {path} file.")

# Using word_count function
print("Using word_count function:")
from word_count import count_words
path = Path('Data/chapter_10/exceptions/alice.txt')
count_words(path)

# Reading multiple files
filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 'little_women.txt']

for filename in filenames:
    full_filename = 'Data/chapter_10/exceptions/' + filename
    path = Path(full_filename)
    count_words(path)


You can't divide by zero!
There are approximately 29594 words in Data\chapter_10\exceptions\alice.txt file.
Using word_count function:
There are approximately 29594 words in Data\chapter_10\exceptions\alice.txt file.
There are approximately 29594 words in Data\chapter_10\exceptions\alice.txt file.
File Data\chapter_10\exceptions\siddhartha.txt not found.
There are approximately 215864 words in Data\chapter_10\exceptions\moby_dick.txt file.
There are approximately 189142 words in Data\chapter_10\exceptions\little_women.txt file.


In [4]:
# Exercises 10.6/10.7
# See add.py file

# Exercise 10.8
from pathlib import Path

files = ['cat.txt', 'cats.txt', 'dog.txt']
for file in files:
    filename = "Data/chapter_10/" + file
    path = Path(filename)
    try:
        names = path.read_text()
    except FileNotFoundError:
        print(f"File {path} not found.")
    else:
        print(names)

# Exercise 10.9
print("Silent exception: ")
files = ['cat.txt', 'cats.txt', 'dog.txt']
for file in files:
    filename = "Data/chapter_10/" + file
    path = Path(filename)
    try:
        names = path.read_text()
    except FileNotFoundError:
        pass
    else:
        print(names)

# Exercise 10.10
from word_count import count_specific_word

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 'little_women.txt']

for filename in filenames:
    full_filename = 'Data/chapter_10/exceptions/' + filename
    path = Path(full_filename)

    count_specific_word(path, 'the')
    count_specific_word(path, 'the ')

'Adam'
'Lucy'
'Chris'
File Data\chapter_10\cats.txt not found.
'Luke'
'Eric'
'Lisa'
Silent exception: 
'Adam'
'Lucy'
'Chris'
'Luke'
'Eric'
'Lisa'
The word 'the' occurs 2312 times in Data\chapter_10\exceptions\alice.txt file.
The word 'the ' occurs 1562 times in Data\chapter_10\exceptions\alice.txt file.
File Data\chapter_10\exceptions\siddhartha.txt not found.
File Data\chapter_10\exceptions\siddhartha.txt not found.
The word 'the' occurs 19060 times in Data\chapter_10\exceptions\moby_dick.txt file.
The word 'the ' occurs 12607 times in Data\chapter_10\exceptions\moby_dick.txt file.
The word 'the' occurs 11106 times in Data\chapter_10\exceptions\little_women.txt file.
The word 'the ' occurs 6651 times in Data\chapter_10\exceptions\little_women.txt file.


## Storing Data

You may want to store and retrieve your data in JSON (JavaScript Object Notation) format using the `json` module, which you first need to import.

### Using `json.dumps()` and `json.loads()`

### Saving and Reading User-Generated Data

See `remember_me.py` and `greet_user.py` files saved in 'Data/chapter_10' folder.

### Refactoring

You often need to go back to your previous code to break it up into a series of functions that handle specific jobs. This process is called *refactoring*. It makes your code cleaner, easier to understand, and easier to extend.
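
As an illustration of refactoring, here is a sketch of a "remember me" program split into small functions, each with one job. The `remember_me.py` and `greet_user.py` files are not reproduced in these notes, so the function names, messages, and file name below are assumptions. To keep the sketch runnable without keyboard input, the new username is passed in as an argument instead of being read with `input()`.

```python
import json
import tempfile
from pathlib import Path

def get_stored_username(path):
    """Return the stored username, or None if nothing has been saved yet."""
    if path.exists():
        return json.loads(path.read_text(encoding='utf-8'))
    return None

def store_username(path, username):
    """Save the username to the file as JSON."""
    path.write_text(json.dumps(username), encoding='utf-8')

def greet_user(path, new_username):
    """Greet a returning user, or remember a new one."""
    username = get_stored_username(path)
    if username:
        return f"Welcome back, {username}!"
    store_username(path, new_username)
    return f"We'll remember you when you come back, {new_username}!"

# Simulate two visits using a JSON file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'username.json'
    first_visit = greet_user(path, "Eric")
    second_visit = greet_user(path, "Eric")

print(first_visit)   # We'll remember you when you come back, Eric!
print(second_visit)  # Welcome back, Eric!
```

Each function can now be tested or changed on its own. For example, switching to a different storage format would only touch `get_stored_username()` and `store_username()`.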

In [8]:
from pathlib import Path
import json

numbers = [1, 2, 3, 4, 5, 12]

# Save data to JSON file
path = Path('numbers.json') # JSON file name, with extension json
contents = json.dumps(numbers) # create a string containing JSON representation
print(contents)
path.write_text(contents) # write JSON string to the file

# Read JSON file
path = Path('numbers.json')
contents = path.read_text()
data = json.loads(contents) # convert the JSON string back into a Python object
print("Loaded JSON Data:")
print(data) # output the data read from the file

# Testing greet_user_module
import sys
sys.path.append('Data/chapter_10')
import greet_user_module as gt
gt.greet_user()

[1, 2, 3, 4, 5, 12]
Loaded JSON Data:
[1, 2, 3, 4, 5, 12]
Welcome back, Eric


In [None]:
# Exercise 10.11 - see fav_number.py file
# Exercise 10.12 - see fav_number2.py file
# Exercise 10.13 - see remember_me2.py file