To open a file for reading or writing, use the built-in open function with either a relative or absolute file path and an optional file encoding. Here, I pass `encoding="utf-8"` as a best practice because the default Unicode encoding for reading files varies from platform to platform.

In [None]:
path = "examples/segismundo.txt"
f = open(path, encoding="utf-8")

By default, the file is opened in read-only mode "r". We can then iterate over the file object `f` line by line, much as we would over a list:

In [None]:
for line in f:
    print(line)

Sueña el rico en su riqueza,

que más cuidados le ofrece;



sueña el pobre que padece

su miseria y su pobreza;



sueña el que a medrar empieza,

sueña el que afana y pretende,

sueña el que agravia y ofende,



y en el mundo, en conclusión,

todos sueñan lo que son,

aunque ninguno lo entiende.
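
Each verse is followed by an extra blank line. This happens because each line read from the file keeps its trailing `"\n"`, and `print` then adds a newline of its own. The minimal sketch below shows how `end=""` avoids the doubling. It uses `io.StringIO` as a stand-in for an open text file, an assumption that lets the example run without `examples/segismundo.txt`:

```python
import io

# io.StringIO stands in for a real text file; it supports the same
# line-by-line iteration as a file object returned by open().
fake_file = io.StringIO("Sueña el rico en su riqueza,\nque más cuidados le ofrece;\n")

collected = []
for line in fake_file:
    collected.append(line)  # each line still ends with "\n"
    print(line, end="")     # end="" stops print from adding a second newline
```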





In [None]:
f.mode  # the mode the file was opened in: 'r' (read-only text) here

In [None]:
f.seek(0)  # rewind: the for loop above already consumed every line
list(f)

['Sueña el rico en su riqueza,\n',
 'que más cuidados le ofrece;\n',
 '\n',
 'sueña el pobre que padece\n',
 'su miseria y su pobreza;\n',
 '\n',
 'sueña el que a medrar empieza,\n',
 'sueña el que afana y pretende,\n',
 'sueña el que agravia y ofende,\n',
 '\n',
 'y en el mundo, en conclusión,\n',
 'todos sueñan lo que son,\n',
 'aunque ninguno lo entiende.\n',
 '\n']
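
A file object is an iterator, not a list. Once a loop or `list(f)` has consumed it, further iteration yields nothing until the position is reset with `seek(0)`. The sketch below demonstrates this with a small throwaway file created through `tempfile`, using hypothetical contents:

```python
import os
import tempfile

# Write a throwaway file (hypothetical contents) so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\n")
    tmp_path = tmp.name

with open(tmp_path, encoding="utf-8") as f:
    first_pass = list(f)    # consumes every line
    second_pass = list(f)   # iterator is exhausted, so this is []
    f.seek(0)               # rewind to the start of the file
    after_rewind = list(f)  # the full list of lines again

os.remove(tmp_path)
print(first_pass, second_pass, after_rewind)
```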

The lines come out of the file with the end-of-line (EOL) markers intact, so you’ll often see code like the following to get an EOL-free list of lines. It relies on `str.rstrip()`, which removes trailing (rightmost) whitespace characters from a string. These include spaces, tabs, and newlines:

In [None]:
lines = [x.rstrip() for x in open(path, encoding="utf-8")]
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.',
 '']
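
With no arguments, `rstrip()` removes *all* trailing whitespace, so it also drops trailing spaces or tabs that might be meaningful. You can pass the characters to strip instead. For example, `rstrip("\n")` removes only the newline. A small sketch with a made-up line:

```python
raw = "value\t  \n"  # hypothetical line with a trailing tab, spaces, and newline

all_ws = raw.rstrip()            # strips spaces, tabs, and the newline
newline_only = raw.rstrip("\n")  # strips only trailing "\n" characters

print(repr(all_ws))        # 'value'
print(repr(newline_only))  # 'value\t  '
```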

When you use open to create file objects, it is recommended to close the file when you are finished with it. Closing the file releases its resources back to the operating system:

In [None]:
f.close()

One of the ways to make it easier to clean up open files is to use the with statement:

In [None]:
with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]

This will automatically close the file f when exiting the with block. Failing to ensure that files are closed will not cause problems in many small programs or scripts, but it can be an issue in programs that need to interact with a large number of files.
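
The `closed` attribute makes this easy to check. The sketch below again uses a throwaway file created with `tempfile`. The file is open inside the `with` block and closed after it, even when the block exits because of an exception:

```python
import os
import tempfile

# Create a throwaway file so the sketch does not depend on examples/segismundo.txt.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("some text\n")
    tmp_path = tmp.name

with open(tmp_path, encoding="utf-8") as f:
    inside = f.closed   # False: still open inside the block
outside = f.closed      # True: closed on leaving the block

try:
    with open(tmp_path, encoding="utf-8") as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed  # True: closed even though the block raised

os.remove(tmp_path)
print(inside, outside, closed_after_error)
```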

If we had typed `f = open(path, "w")`, a new file at examples/segismundo.txt would have been created (be careful!), overwriting any file in its place. There is also the "x" file mode, which creates a writable file but fails if the file path already exists. See Table 3.3 for a list of all valid file read/write modes.
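
The following sketch compares "x" and "w". It runs inside a temporary directory with a hypothetical file name, `demo.txt`, so no real file is overwritten:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "demo.txt")  # hypothetical file name

    with open(target, "x", encoding="utf-8") as f:  # succeeds: file does not exist yet
        f.write("first version\n")

    try:
        with open(target, "x", encoding="utf-8"):   # fails: the file now exists
            pass
    except FileExistsError:
        x_failed = True
    else:
        x_failed = False

    with open(target, "w", encoding="utf-8") as f:  # "w" silently truncates the file
        f.write("second version\n")

    with open(target, encoding="utf-8") as f:
        contents = f.read()

print(x_failed, repr(contents))
```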

For readable files, some of the most commonly used methods are `read`, `seek`, and `tell`. `read` returns a certain number of characters from the file. What constitutes a "character" is determined by the file encoding, or is simply a raw byte if the file is opened in binary mode.

`file.read(size)`

- `file`: the file object returned by `open()` (or by another function that returns a file object).
- `size` (optional): the number of characters to read in text mode, or the number of bytes in binary mode. If you omit it or pass a negative value, `read` returns everything from the current position to the end of the file.

In [None]:
f1 = open(path)
f1.read(10) # 10 characters

'Sueña el r'

In [None]:

f2 = open(path, mode="rb")  # Binary mode
f2.read(10) # read 10 bytes, 9 chars

b'Sue\xc3\xb1a el '
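
If you omit `size`, `read` continues from the *current* position to the end of the file. Calling `read` again at the end of the file returns an empty string. A short sketch with `io.StringIO` standing in for a text-mode file:

```python
import io

buf = io.StringIO("Sueña el rico")  # stand-in for a file opened in text mode

first = buf.read(5)  # 'Sueña': size counts characters in text mode
rest = buf.read()    # ' el rico': no size means "read to the end"
at_eof = buf.read()  # '': nothing left to read
```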

The `read` method advances the file object position by the number of bytes read. `tell` gives you the current position:

In [None]:
f1.tell()

11

In [None]:
f2.tell()

10

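Even though we read 10 characters from the file `f1` opened in text mode, the position is `11`. It took that many bytes to decode 10 characters using the default encoding, because "ñ" takes two bytes in UTF-8. The sys module reports Python's default encoding for converting between `str` and `bytes`. Note, however, that the default `open` uses for text files comes from the locale instead (see `locale.getpreferredencoding(False)`), so the two can differ on some platforms: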

In [None]:
import sys
sys.getdefaultencoding()

'utf-8'
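
`sys.getdefaultencoding()` describes `str`/`bytes` conversions, not the encoding `open` falls back on in text mode, which is locale-derived. The sketch below compares the two values. The second one depends on your platform and settings (it may be `'cp1252'` on some Windows systems, for example), so no output is shown:

```python
import locale
import sys

# Encoding used by str.encode()/bytes.decode() when none is given: always "utf-8" in Python 3.
string_default = sys.getdefaultencoding()

# Encoding open() uses in text mode when no encoding= argument is passed
# (locale-dependent; "utf-8" when Python's UTF-8 Mode is enabled).
file_default = locale.getpreferredencoding(False)

print(string_default, file_default)
```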

To get consistent behavior across platforms, it is best to pass an encoding (such as encoding="utf-8", which is widely used) when opening files.

`seek` changes the file position to the indicated byte in the file:

In [None]:
f1.seek(3)

3

In [None]:
f1.read(1)

'ñ'

In [None]:

f1.tell()

5
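
`seek` also accepts an optional `whence` argument: `os.SEEK_SET`, `os.SEEK_CUR`, or `os.SEEK_END`. Text-mode files reject nonzero offsets relative to the current position or the end, so byte-level navigation like the following is best done in binary mode. The sketch uses `io.BytesIO` as an in-memory stand-in for a file opened with `"rb"`:

```python
import io
import os

buf = io.BytesIO("Sueña".encode("utf-8"))  # b'Sue\xc3\xb1a', 6 bytes

buf.seek(3)                # absolute byte offset (whence defaults to os.SEEK_SET)
enye = buf.read(2)         # b'\xc3\xb1': the two bytes that encode 'ñ'
buf.seek(-1, os.SEEK_END)  # one byte before the end of the data
last = buf.read()          # b'a'
end_pos = buf.tell()       # 6: the position is now at the end
```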

In [None]:
f1.close()
f2.close()

To write text to a file, you can use the file’s `write` or `writelines` methods. For example, we could create a version of examples/segismundo.txt with no blank lines like so:

In [None]:
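with open(path, encoding="utf-8") as source, open("tmp.txt", mode="w", encoding="utf-8") as handle:
    # keep only lines longer than a bare "\n", i.e., drop the blank lines
    handle.writelines(x for x in source if len(x) > 1)

with open("tmp.txt", encoding="utf-8") as f:
    lines = f.readlines()

lines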

['Sueña el rico en su riqueza,\n',
 'que más cuidados le ofrece;\n',
 'sueña el pobre que padece\n',
 'su miseria y su pobreza;\n',
 'sueña el que a medrar empieza,\n',
 'sueña el que afana y pretende,\n',
 'sueña el que agravia y ofende,\n',
 'y en el mundo, en conclusión,\n',
 'todos sueñan lo que son,\n',
 'aunque ninguno lo entiende.\n']
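
`writelines` does not add line separators. The lines written above already ended in `"\n"`. `write`, by contrast, returns the number of characters written. A small sketch with `io.StringIO` as a stand-in for a writable text file:

```python
import io

out = io.StringIO()  # in-memory stand-in for a file opened with mode="w"

out.writelines(["alpha", "beta"])  # no newline inserted between the items
count = out.write("\ngamma\n")     # returns the number of characters written

contents = out.getvalue()
print(repr(contents))  # 'alphabeta\ngamma\n'
```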

In [None]:
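with open("tmp.txt", encoding="utf-8") as f:
    # readlines(hint) always returns whole lines; it stops once the lines read
    # so far exceed `hint` characters, so each call here returns just one line
    stuff = f.readlines(5)
    print(stuff)
    stuff = f.readlines(5)
    print(stuff)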

In [None]:
# compare with readline(): it reads one line at a time and advances the file position to the start of the next line
with open("tmp.txt", encoding="utf-8") as f:
    line = f.readline()
    print(line)
    line = f.readline()
    print(line)
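
Both methods return whole lines, never partial ones. The sketch below uses `io.StringIO` with made-up contents. It shows how the `hint` argument to `readlines` behaves. It also shows that `readline` returns an empty string at the end of the file, which is how a manual read loop knows to stop:

```python
import io

buf = io.StringIO("first line\nsecond\nthird\n")

batch1 = buf.readlines(5)  # ['first line\n']: one line already exceeds the 5-char hint
batch2 = buf.readlines(5)  # ['second\n']
line = buf.readline()      # 'third\n'
eof = buf.readline()       # '': end of file reached
```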

In [None]:
import os
os.remove("tmp.txt")

## Bytes and Unicode with Files
The default behavior for Python files (whether readable or writable) is text mode, which means that you intend to work with Python strings (i.e., Unicode). This contrasts with binary mode, which you can obtain by appending `b` to the file mode (for example, `"rb"` or `"wb"`). Revisiting the file (which contains non-ASCII characters with UTF-8 encoding) from the previous section, we have:

In [None]:
with open(path) as f:
    chars = f.read(10)

chars


'Sueña el r'

In [None]:
len(chars)

10

UTF-8 is a variable-length Unicode encoding, so when I request some number of characters from the file, Python reads enough bytes (which could be as few as 10 or as many as 40 bytes) from the file to decode that many characters. If I open the file in "rb" mode instead, read requests that exact number of bytes:

In [None]:
with open(path, mode="rb") as f:
    data = f.read(10)

data

b'Sue\xc3\xb1a el '
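
The byte count varies because UTF-8 uses different lengths for different characters. The sketch below encodes a few characters and counts the bytes each one needs, from 1 byte for ASCII up to 4 for characters outside the Basic Multilingual Plane:

```python
samples = ["a", "ñ", "€", "𝄞"]  # ASCII letter, Latin-1 letter, euro sign, musical G clef

byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}
print(list(byte_counts.values()))  # [1, 2, 3, 4]
```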

Depending on the text encoding, you may be able to decode the bytes to a str object yourself, but only if each of the encoded Unicode characters is fully formed:

In [None]:
data.decode("utf-8")

'Sueña el '

In [None]:

data[:4].decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data
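
Bytes sometimes arrive in chunks, for example when a binary file is read piece by piece. A character can then be split across a chunk boundary, as in the failed decode above. One option from the standard library `codecs` module is an incremental decoder, which buffers an incomplete sequence until the rest arrives. Alternatively, `errors="replace"` substitutes U+FFFD for the broken sequence instead of raising:

```python
import codecs

data = "Sueña el ".encode("utf-8")  # b'Sue\xc3\xb1a el '

decoder = codecs.getincrementaldecoder("utf-8")()
part1 = decoder.decode(data[:4])              # 'Sue': the lone 0xc3 byte is held back
part2 = decoder.decode(data[4:], final=True)  # 'ña el ': 0xb1 completes the 'ñ'

lossy = data[:4].decode("utf-8", errors="replace")  # 'Sue\ufffd'
```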

Text mode, combined with the encoding option of open, provides a convenient way to convert from one Unicode encoding to another:

In [None]:
sink_path = "sink.txt"
with open(path, encoding="utf-8") as source:
    # "x" mode: create a new file for writing, failing if the path already exists
    with open(sink_path, "x", encoding="iso-8859-1") as sink:
        sink.write(source.read())  # source.read() returns the entire source file as a str

with open(sink_path, encoding="iso-8859-1") as f:
    print(f.read(10))

Sueña el r
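
The conversion changes the bytes on disk: ISO-8859-1 (Latin-1) stores "ñ" in a single byte, whereas UTF-8 needs two. The conversion only works because every character in the poem exists in Latin-1. A character outside it, such as the euro sign, cannot be encoded. A sketch of both points:

```python
text = "Sueña"

utf8_bytes = text.encode("utf-8")         # b'Sue\xc3\xb1a' (6 bytes)
latin1_bytes = text.encode("iso-8859-1")  # b'Sue\xf1a' (5 bytes)

try:
    "€".encode("iso-8859-1")  # the euro sign is not representable in ISO-8859-1
except UnicodeEncodeError:
    euro_ok = False
else:
    euro_ok = True
```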


In [None]:
os.remove(sink_path)

Beware using seek when opening files in any mode other than binary. If the file position falls in the middle of the bytes defining a Unicode character, then subsequent reads will result in an error:

In [None]:
f = open(path, encoding='utf-8')
f.read(5)

'Sueña'

In [None]:
f.seek(4)

4

In [None]:
f.read(1)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 0: invalid start byte

In [None]:

f.close()
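
In text mode, the only offsets that are safe to pass to `seek` are `0` and values previously returned by `tell`, which Python documents as opaque. A minimal sketch of that pattern, using a throwaway file written with `tempfile`:

```python
import os
import tempfile

# Throwaway UTF-8 file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("Sueña el rico\n")
    tmp_path = tmp.name

with open(tmp_path, encoding="utf-8") as f:
    start = f.read(4)   # 'Sueñ'
    cookie = f.tell()   # opaque position marker, safe to hand back to seek()
    after = f.read(3)   # 'a e'
    f.seek(cookie)      # jump back to the saved position
    again = f.read(3)   # 'a e' again, with no decoding error

os.remove(tmp_path)
```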