# Chapter 9: Reading and Writing Files
Variables are a fine way to store data while your program is running, but if you want your data to persist even after your program has finished, you need to save it to a file. You can think of a file’s contents as a single string value, potentially gigabytes in size. In this chapter, you will learn how to use Python to create, read, and save files on the hard drive.

## Files and File Paths

A file has two key properties: a filename (usually written as one word) and a path. The path specifies the location of a file on the computer. For example, there is a file on my Windows laptop with the filename project.docx in the path C:\Users\Al\Documents. The part of the filename after the last period is called the file’s extension and tells you a file’s type. The filename project.docx is a Word document, and Users, Al, and Documents all refer to folders (also called directories). Folders can contain files and other folders. For example, project.docx is in the Documents folder, which is inside the Al folder, which is inside the Users folder. Figure 9-1 shows this folder organization.

The C:\ part of the path is the root folder, which contains all other folders. On Windows, the root folder is named C:\ and is also called the C: drive. On macOS and Linux, the root folder is /. In this book, I’ll use the Windows-style root folder, C:\. If you are entering the interactive shell examples on macOS or Linux, enter / instead.

Additional volumes, such as a DVD drive or USB flash drive, will appear differently on different operating systems. On Windows, they appear as new, lettered root drives, such as D:\ or E:\. On macOS, they appear as new folders under the /Volumes folder. On Linux, they appear as new folders under the /mnt (“mount”) folder. Also note that while folder names and filenames are not case-sensitive on Windows and macOS, they are case-sensitive on Linux.

### Backslash on Windows and Forward Slash on macOS and Linux

On Windows, paths are written using backslashes (\) as the separator between folder names. The macOS and Linux operating systems, however, use the forward slash (/) as their path separator. If you want your programs to work on all operating systems, you will have to write your Python scripts to handle both cases.

Fortunately, this is simple to do with the Path() function in the pathlib module. If you pass it the string values of individual file and folder names in your path, Path() will return a Path object that joins them using the correct path separators for your operating system.

In [3]:
from pathlib import Path
Path('spam', 'bacon', 'eggs')

WindowsPath('spam/bacon/eggs')

In [4]:
str(Path('spam', 'bacon', 'eggs'))

'spam\\bacon\\eggs'

Note that the convention for importing pathlib is to run from pathlib import Path, since otherwise we’d have to enter pathlib.Path everywhere Path shows up in our code. Not only is this extra typing redundant, but it’s also redundant.

I’m running this chapter’s interactive shell examples on Windows, so Path('spam', 'bacon', 'eggs') returned a WindowsPath object for the joined path, represented as WindowsPath('spam/bacon/eggs'). Even though Windows uses backslashes, the WindowsPath representation in the interactive shell displays them using forward slashes, since open source software developers have historically favored the Linux operating system.

If you want to get a simple text string of this path, you can pass it to the str() function, which in our example returns 'spam\\bacon\\eggs'. (Notice that the backslashes are doubled because each backslash needs to be escaped by another backslash character.) If I had called this function on, say, Linux, Path() would have returned a PosixPath object that, when passed to str(), would have returned 'spam/bacon/eggs'. (POSIX is a set of standards for Unix-like operating systems such as Linux.)
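To compare both renderings without switching computers, here is a minimal sketch that uses pathlib’s PureWindowsPath and PurePosixPath classes, which this chapter doesn’t otherwise cover. They only manipulate the text of a path and never touch the filesystem, so either one can be created on any operating system. The variable names are my own, chosen for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths handle path text only, so both flavors work on any OS.
windows_style = PureWindowsPath('spam', 'bacon', 'eggs')
posix_style = PurePosixPath('spam', 'bacon', 'eggs')

print(str(windows_style))  # spam\bacon\eggs
print(str(posix_style))    # spam/bacon/eggs
```

Calling print() shows the single backslashes; the doubled backslashes appear only when the interactive shell displays the string’s escaped representation.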

These Path objects (really, WindowsPath or PosixPath objects, depending on your operating system) will be passed to several of the file-related functions introduced in this chapter. For example, the following code joins names from a list of filenames to the end of a folder’s name:

In [5]:
from pathlib import Path
myFiles = ['account.txt', 'details.csv', 'invite.docx'] # create a list of files
for filename in myFiles:
    print(Path(r'C:\Users\Zac', filename))

C:\Users\Zac\account.txt
C:\Users\Zac\details.csv
C:\Users\Zac\invite.docx


On Windows, the backslash separates directories, so you can’t use it in filenames. However, you can use backslashes in filenames on macOS and Linux. So while Path(r'spam\eggs') refers to two separate folders (or a file eggs in a folder spam) on Windows, the same command would refer to a single folder (or file) named spam\eggs on macOS and Linux. For this reason, it’s usually a good idea to always use forward slashes in your Python code (and I’ll be doing so for the rest of this chapter). The pathlib module will ensure that it always works on all operating systems.
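The difference can be checked on a single machine with a short sketch. It assumes pathlib’s PureWindowsPath and PurePosixPath classes, which apply each operating system’s rules to the path text without touching the filesystem, and the parts attribute, which lists the pieces a path was split into. The variable names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows treats the backslash as a separator: two parts.
windows_backslash = PureWindowsPath(r'spam\eggs').parts
# macOS and Linux treat it as an ordinary character: one part.
posix_backslash = PurePosixPath(r'spam\eggs').parts

# A forward slash is a separator under both sets of rules.
windows_slash = PureWindowsPath('spam/eggs').parts
posix_slash = PurePosixPath('spam/eggs').parts

print(windows_backslash)  # ('spam', 'eggs')
print(posix_backslash)    # ('spam\\eggs',)
print(windows_slash)      # ('spam', 'eggs')
print(posix_slash)        # ('spam', 'eggs')
```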

Note that pathlib was introduced in Python 3.4 to replace older os.path functions. The Python Standard Library modules support it as of Python 3.6, but if you are working with legacy Python 2 versions, I recommend using pathlib2, which gives you pathlib’s features on Python 2.7. Appendix A has instructions for installing pathlib2 using pip. Whenever I’ve replaced an older os.path function with pathlib, I’ve made a short note. You can look up the older functions at https://docs.python.org/3/library/os.path.html.

### Using the / Operator to Join Paths

We normally use the + operator to add two integer or floating-point numbers, such as in the expression 2 + 2, which evaluates to the integer value 4. But we can also use the + operator to concatenate two string values, like the expression 'Hello' + 'World', which evaluates to the string value 'HelloWorld'. Similarly, the / operator that we normally use for division can also combine Path objects and strings. This is helpful for modifying a Path object after you’ve already created it with the Path() function.

In [1]:
from pathlib import Path
Path('spam') / 'bacon' / 'eggs'

WindowsPath('spam/bacon/eggs')

In [2]:
Path('spam') / Path('bacon/eggs')

WindowsPath('spam/bacon/eggs')

In [3]:
Path('spam') / Path('bacon', 'eggs')

WindowsPath('spam/bacon/eggs')

Using the / operator with Path objects makes joining paths just as easy as string concatenation. It’s also safer than using string concatenation or the join() method, like we do in this example:

In [None]:
# Not a recommended way to join paths (the hardcoded backslash works only on Windows)
homeFolder = r'C:\Users\Al'
subFolder = 'spam'
homeFolder + '\\' + subFolder

'C:\\Users\\Al\\spam'

In [None]:
# Not a recommended way to join paths (the hardcoded backslash works only on Windows)
'\\'.join([homeFolder, subFolder])

'C:\\Users\\Al\\spam'

A script that uses this code isn’t safe, because its backslashes would only work on Windows. You could add an if statement that checks sys.platform (which contains a string describing the computer’s operating system) to decide what kind of slash to use, but applying this custom code everywhere it’s needed can be inconsistent and bug-prone.
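For illustration only, here is roughly what that hand-rolled check might look like. This is a sketch of the approach being discouraged, not a recommendation; the folder names and variable names are hypothetical, and it relies on sys.platform being 'win32' on Windows.

```python
import sys

# Pick the separator by hand: sys.platform is 'win32' on Windows.
if sys.platform.startswith('win'):
    separator = '\\'
else:
    separator = '/'

# Every join in the program would have to remember to use this variable.
home_folder = 'Users' + separator + 'Al'
joined = home_folder + separator + 'spam'
print(joined)
```

Any place in the program that forgets the separator variable and hardcodes a slash quietly breaks on the other operating systems.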

The pathlib module solves these problems by reusing the / math division operator to join paths correctly, no matter what operating system your code is running on. The following example uses this strategy to join the same paths as in the previous example:

In [6]:
homeFolder = Path('C:/Users/Al')
subFolder = Path('spam')
homeFolder / subFolder

WindowsPath('C:/Users/Al/spam')

In [7]:
str(homeFolder / subFolder)

'C:\\Users\\Al\\spam'

The only thing you need to keep in mind when using the / operator for joining paths is that one of the first two values must be a Path object. Python will give you an error if you try entering the following into the interactive shell:

In [9]:
# gives an error: 'spam' / 'bacon' / 'eggs'

Python evaluates the / operator from left to right and evaluates to a Path object, so either the first or second leftmost value must be a Path object for the entire expression to evaluate to a Path object.

If you see the error message TypeError: unsupported operand type(s) for /: 'str' and 'str', you need to put a Path object on the left side of the expression.
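A small sketch can trigger the error safely and show both working arrangements. The variable names are illustrative.

```python
from pathlib import Path

try:
    'spam' / 'bacon' / 'eggs'  # str / str fails before any Path is involved
except TypeError as error:
    message = str(error)
    print(message)

# A Path object in either of the first two positions makes the expression work.
path_first = Path('spam') / 'bacon' / 'eggs'
path_second = 'spam' / Path('bacon') / 'eggs'
print(path_first == path_second)  # True
```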

The / operator replaces the older os.path.join() function, which you can learn more about from https://docs.python.org/3/library/os.path.html#os.path.join

### The Current Working Directory

Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory.

You can get the current working directory as a Path object with the Path.cwd() function and change it using os.chdir().

In [16]:
from pathlib import Path
import os
Path.cwd()

WindowsPath('C:/Users/Zac/OneDrive/Python/Practice/ATBS')

In [18]:
os.chdir('C:\\Users\\Zac\\OneDrive\\Python\\Practice\\ATBS')
Path.cwd()

WindowsPath('C:/Users/Zac/OneDrive/Python/Practice/ATBS')

Python will display an error if you try to change to a directory that does not exist.

In [19]:
os.chdir('C:/ThisFolderDoesNotExist')

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:/ThisFolderDoesNotExist'

There is no pathlib function for changing the working directory, because changing the current working directory while a program is running can often lead to subtle bugs.
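If a program really does need to change directories for a moment, one way to limit the risk is to always switch back afterward. The sketch below is one possible approach, not something the chapter prescribes: working_directory() is a hypothetical helper, and the temporary folder stands in for whatever folder you would actually use.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(folder):
    """Hypothetical helper: switch to folder, then always switch back."""
    original = Path.cwd()
    os.chdir(folder)
    try:
        yield
    finally:
        os.chdir(original)  # runs even if the code inside raised an error

before = Path.cwd()
with tempfile.TemporaryDirectory() as temp_folder:
    with working_directory(temp_folder):
        inside = Path.cwd()  # relative paths now resolve against temp_folder
after = Path.cwd()

print(before == after)  # True
```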

### The Home Directory

All users have a folder for their own files on the computer called the home directory or home folder. You can get a Path object of the home folder by calling Path.home():

In [21]:
Path.home()

WindowsPath('C:/Users/Zac')

The home directories are located in a set place depending on your operating system:

- On Windows, home directories are under C:\Users.
- On macOS, home directories are under /Users.
- On Linux, home directories are often under /home.

Your scripts will almost certainly have permissions to read and write the files under your home directory, so it’s an ideal place to put the files that your Python programs will work with.

### Absolute vs. Relative Paths

There are two ways to specify a file path:

- An absolute path, which always begins with the root folder
- A relative path, which is relative to the program’s current working directory

There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this directory.” Two periods (“dot-dot”) means “the parent folder.”

In [None]:
# Relative path: .\fizz
# Absolute path: C:\bacon\fizz
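To see how the dot and dot-dot names collapse, the sketch below uses the normpath() function, which this chapter doesn’t cover. I import it from the standard library’s ntpath and posixpath modules (the Windows and macOS/Linux implementations behind os.path) so the results are the same on any operating system; in everyday code you would normally call os.path.normpath() instead. The example paths are made up.

```python
import ntpath     # Windows implementation of os.path
import posixpath  # macOS/Linux implementation of os.path

# A single dot means "this directory", so it simply drops out.
same_folder = ntpath.normpath(r'C:\bacon\.\fizz')
# Dot-dot means "the parent folder", so it cancels the folder before it.
sibling_windows = ntpath.normpath(r'C:\bacon\fizz\..\spam')
sibling_posix = posixpath.normpath('/bacon/fizz/../spam')

print(same_folder)      # C:\bacon\fizz
print(sibling_windows)  # C:\bacon\spam
print(sibling_posix)    # /bacon/spam
```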

### Creating New Folders Using the os.makedirs() Function

Your programs can create new folders (directories) with the os.makedirs() function. Enter the following into the interactive shell:

In [22]:
import os
os.makedirs('C:\\Users\\Zac\\OneDrive\\Python\\Practice\\ATBS\\Test')

os.makedirs() will create any necessary intermediate folders in order to ensure that the full path exists.

To make a directory from a Path object, call the mkdir() method. For example, this code will create a spam folder under the home folder on my computer:

In [None]:
# make a new directory from a Path object
from pathlib import Path
Path(r'C:\users\Zac\spam').mkdir()

Note that, by default, mkdir() makes only one directory at a time; it won’t create missing parent folders the way os.makedirs() does unless you pass parents=True.
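The following sketch shows the difference inside a throwaway temporary folder, so nothing is left behind on your computer. The parents and exist_ok keyword arguments are part of Path.mkdir(); the folder and variable names are only illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_folder:
    nested = Path(temp_folder) / 'spam' / 'bacon' / 'eggs'

    try:
        nested.mkdir()  # fails: the spam and bacon folders don't exist yet
    except FileNotFoundError:
        default_failed = True
    else:
        default_failed = False

    # parents=True creates the missing folders; exist_ok=True skips the
    # error that mkdir() would otherwise raise if eggs already existed.
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

print(default_failed, created)  # True True
```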

### Handling Absolute and Relative Paths

The pathlib module provides methods for checking whether a given path is an absolute path and returning the absolute path of a relative path.

Calling the is_absolute() method on a Path object will return True if it represents an absolute path or False if it represents a relative path. For example, enter the following into the interactive shell, using your own files and folders instead of the exact ones listed here:

In [25]:
Path.cwd()

WindowsPath('C:/Users/Zac/OneDrive/Python/Practice/ATBS')

In [27]:
Path.cwd().is_absolute()

True

In [30]:
Path(r'OneDrive\Python\Practice').is_absolute()

False

To get an absolute path from a relative path, you can put Path.cwd() / in front of the relative Path object. After all, when we say “relative path,” we almost always mean a path that is relative to the current working directory. Enter the following into the interactive shell:

In [31]:
Path(r'OneDrive\Python\Practice')

WindowsPath('OneDrive/Python/Practice')

In [32]:
# Get an absolute path from the relative path
Path.cwd() / Path(r'OneDrive\Python\Practice')

WindowsPath('C:/Users/Zac/OneDrive/Python/Practice/ATBS/OneDrive/Python/Practice')

If your relative path is relative to another path besides the current working directory, just replace Path.cwd() with that other path instead. The following example gets an absolute path using the home directory instead of the current working directory:

In [33]:
Path(r'OneDrive\Python\Practice')

WindowsPath('OneDrive/Python/Practice')

In [35]:
# get an absolute path from the home directory instead of the current working directory
Path.home() / Path(r'OneDrive\Python\Practice')

WindowsPath('C:/Users/Zac/OneDrive/Python/Practice')
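The same idea can be wrapped in a small function. This is a sketch under my own assumptions: to_absolute() is a hypothetical helper, not part of pathlib, and it simply applies the Path.cwd() / and is_absolute() techniques described above.

```python
from pathlib import Path

def to_absolute(path, base=None):
    """Hypothetical helper: anchor a relative path at base (default: the cwd)."""
    path = Path(path)
    if path.is_absolute():
        return path  # already absolute, so leave it alone
    if base is None:
        base = Path.cwd()
    return Path(base) / path

relative = Path('Python', 'Practice')
from_cwd = to_absolute(relative)
from_home = to_absolute(relative, Path.home())

print(from_cwd)
print(from_home)
```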

The os.path module also has some useful functions related to absolute and relative paths:

- Calling os.path.abspath(path) will return a string of the absolute path of the argument. This is an easy way to convert a relative path into an absolute one.
- Calling os.path.isabs(path) will return True if the argument is an absolute path and False if it is a relative path.
- Calling os.path.relpath(path, start) will return a string of a relative path from the start path to path. If start is not provided, the current working directory is used as the start path.

Try these functions in the interactive shell:

In [36]:
os.path.abspath('.')

'C:\\Users\\Zac\\OneDrive\\Python\\Practice\\ATBS'

In [37]:
os.path.abspath('.\\Scripts')

'C:\\Users\\Zac\\OneDrive\\Python\\Practice\\ATBS\\Scripts'

In [38]:
os.path.isabs('.')

False

In [39]:
os.path.isabs(os.path.abspath('.'))

True

Since C:\Users\Zac\OneDrive\Python\Practice\ATBS was the working directory when os.path.abspath() was called, the “single-dot” folder represents the absolute path 'C:\\Users\\Zac\\OneDrive\\Python\\Practice\\ATBS'.

Enter the following calls to os.path.relpath() into the interactive shell:

In [40]:
os.path.relpath('C:\\Windows', 'C:\\')

'Windows'

In [41]:
os.path.relpath('C:\\Windows', 'C:\\spam\\eggs')

'..\\..\\Windows'

When the path and the start path share a parent folder but the path is not inside the start path, as with 'C:\\Windows' and 'C:\\spam\\eggs', the result uses the “dot-dot” notation to climb back up to the shared parent folder before descending to the path.
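For comparison, pathlib has a relative_to() method, which this chapter doesn’t cover. The sketch below assumes the PureWindowsPath class so that it behaves the same on any operating system. Unlike os.path.relpath(), relative_to() by default refuses to produce “dot-dot” results and raises a ValueError instead.

```python
from pathlib import PureWindowsPath

windows_folder = PureWindowsPath('C:/Windows')

# C:\Windows is inside C:\, so a relative path exists.
inside_root = windows_folder.relative_to('C:/')
print(inside_root)  # Windows

# C:\Windows is not inside C:\spam\eggs, so relative_to() gives up
# rather than returning '..\\..\\Windows' the way os.path.relpath() does.
try:
    windows_folder.relative_to('C:/spam/eggs')
except ValueError:
    needs_dot_dot = True
else:
    needs_dot_dot = False

print(needs_dot_dot)  # True
```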
