# 3.3 Files and the Operating System

1. [General Info](#general)
2. [Bytes and Unicode with Files](#bytes)

<a name="general"></a>
# General Info

In practice, `pandas.read_csv` and similar functions will replace most of this, but it is good to be familiar with the basics.  

The built-in function `open` opens a file for reading or writing. The default mode is read-only (`r`). The returned file object is not a list, but it can be iterated over like one, yielding one line of the file per iteration.

In [5]:
myPath = "../examples/segismundo.txt"
myFile = open(myPath, encoding="utf-8")

print("Class is TextIOWrapper and printing it just gives details about the file, not the contents:")
print(type(myFile))
print(myFile)
print("\nWe can see that each line in the file is a string:")
for line in myFile:
    print(f"Line: {line}Class: {type(line)}\n")

Class is TextIOWrapper and printing it just gives details about the file, not the contents:
<class '_io.TextIOWrapper'>
<_io.TextIOWrapper name='../examples/segismundo.txt' mode='r' encoding='utf-8'>

We can see that each line in the file is a string:
Line: Sueña el rico en su riqueza,
Class: <class 'str'>

Line: que más cuidados le ofrece;
Class: <class 'str'>

Line: 
Class: <class 'str'>

Line: sueña el pobre que padece
Class: <class 'str'>

Line: su miseria y su pobreza;
Class: <class 'str'>

Line: 
Class: <class 'str'>

Line: sueña el que a medrar empieza,
Class: <class 'str'>

Line: sueña el que afana y pretende,
Class: <class 'str'>

Line: sueña el que agravia y ofende,
Class: <class 'str'>

Line: 
Class: <class 'str'>

Line: y en el mundo, en conclusión,
Class: <class 'str'>

Line: todos sueñan lo que son,
Class: <class 'str'>

Line: aunque ninguno lo entiende.
Class: <class 'str'>

Line: 
Class: <class 'str'>



Three things to notice:

1. You can only iterate over a file once. You have to reopen it (or rewind it with `seek(0)`) if you want to iterate over it again.
2. By default, the end-of-line (EOL) markers are read in. See below.
3. A list comprehension can read the lines in and process them (for example, strip the EOL markers) in one step.
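
The first point can be checked with a small sketch. It assumes a hypothetical scratch file (`demo_path`, created here in a temporary directory) rather than the example file above, so it runs anywhere. A second pass over the same handle yields nothing until the position is rewound with `seek(0)`.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not depend on ../examples
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, mode="w", encoding="utf-8") as f:
    f.write("uno\ndos\ntres\n")

with open(demo_path, encoding="utf-8") as f:
    first_pass = [line for line in f]           # consumes the whole file
    second_pass = [line for line in f]          # nothing left to read
    f.seek(0)                                   # rewind to the start
    third_pass = [line.rstrip() for line in f]  # read again, stripping EOL

print(first_pass)
print(second_pass)
print(third_pass)
```

Rewinding avoids reopening the file, but for small files it is often simpler to read the lines into a list once and reuse the list.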

In [6]:
myFile = open(myPath, encoding="utf-8")
for line in myFile:
    #print(line)
    print(f"Last: {line[len(line)-1]}Second to last: {line[len(line)-2]}")

Last: 
Second to last: ,
Last: 
Second to last: ;
Last: 
Second to last: 

Last: 
Second to last: e
Last: 
Second to last: ;
Last: 
Second to last: 

Last: 
Second to last: ,
Last: 
Second to last: ,
Last: 
Second to last: ,
Last: 
Second to last: 

Last: 
Second to last: ,
Last: 
Second to last: ,
Last: 
Second to last: .
Last: 
Second to last: 



In [7]:
myLines = [x.rstrip() for x in open(myPath, encoding="utf-8")]
print(type(myLines))
print(myLines)

<class 'list'>
['Sueña el rico en su riqueza,', 'que más cuidados le ofrece;', '', 'sueña el pobre que padece', 'su miseria y su pobreza;', '', 'sueña el que a medrar empieza,', 'sueña el que afana y pretende,', 'sueña el que agravia y ofende,', '', 'y en el mundo, en conclusión,', 'todos sueñan lo que son,', 'aunque ninguno lo entiende.', '']


Make sure to `close` open files. A `with` block does this automatically when the block exits:

In [8]:
myFile.close()
with open(myPath, encoding='utf-8') as f:
    lines = [x.rstrip() for x in f]
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.',
 '']
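
To confirm that `with` really closes the file, the handle's `closed` attribute can be inspected before and after the block. This is a minimal sketch that uses a hypothetical scratch file (`demo_path`) instead of the example file above.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not depend on ../examples
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, mode="w", encoding="utf-8") as out:
    out.write("hola\n")

with open(demo_path, encoding="utf-8") as f:
    open_inside = not f.closed   # the file is open inside the block
    text = f.read()

closed_after = f.closed          # the block closed it on exit

# Reading from a closed file raises ValueError
try:
    f.read()
    read_error = None
except ValueError as exc:
    read_error = str(exc)

print(open_inside, closed_after, read_error)
```

The file is also closed if an exception is raised inside the block, which is the main reason to prefer `with` over calling `close` by hand.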

Various file modes:  
<img src="./myImages/table3.3_fileModes.png" width = 600>
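
As a rough illustration of how the most common text modes differ, the sketch below exercises `x`, `a`, `w`, and the default `r` on a hypothetical scratch file (`mode_path`). It covers only these four standard modes, not necessarily every entry in the table.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
mode_path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# "x": create the file, failing if it already exists
with open(mode_path, mode="x", encoding="utf-8") as f:
    f.write("first\n")

try:
    open(mode_path, mode="x", encoding="utf-8")
    x_failed = False
except FileExistsError:
    x_failed = True

# "a": append to the end of the existing file
with open(mode_path, mode="a", encoding="utf-8") as f:
    f.write("second\n")

with open(mode_path, encoding="utf-8") as f:   # default mode is "r"
    after_append = f.read()

# "w": truncate the file, then write
with open(mode_path, mode="w", encoding="utf-8") as f:
    f.write("replaced\n")

with open(mode_path, encoding="utf-8") as f:
    after_write = f.read()

print(x_failed, repr(after_append), repr(after_write))
```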

## Common readable file methods

1. `read` - return a certain number of characters (text mode) or bytes (binary mode) from the file and advance the file position by that amount.  
1. `tell` - return the current position.
1. `seek` - change the file position to the byte specified.

In [10]:
# Open file in text mode and read first 10 characters:
f1 = open(myPath, encoding="utf-8")
f1.read(10)

'Sueña el r'

In [11]:
# Open file in binary mode and read first 10 bytes:
f2 = open(myPath, mode="rb")
f2.read(10)

b'Sue\xc3\xb1a el '

In [12]:
# Tell the position
print(f1.tell())
print(f2.tell())

11
10


In [13]:
# Change the position
f1.seek(30)

30

In [14]:
# Read in 1 character
f1.read(1)

'q'

In [15]:
# Look at final position
f1.tell()

31

In [16]:
f1.close()
f2.close()
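
The two `tell` results above differ (11 versus 10) because text mode counts characters when reading while the position is tracked in bytes, and `ñ` takes two bytes in UTF-8. The sketch below reproduces the arithmetic on a plain string, without touching the file. Note that the Python documentation describes the text-mode `tell` value as an opaque number, so treating it as a byte offset is an observation about this UTF-8 example rather than a guarantee.

```python
text = "Sueña el rico"

first_ten_chars = text[:10]                 # what f1.read(10) returned
encoded = first_ten_chars.encode("utf-8")   # the bytes behind those characters

print(len(first_ten_chars))   # number of characters
print(len(encoded))           # number of bytes
print("ñ".encode("utf-8"))    # the two bytes that encode ñ
```

The same reasoning explains why `seek(30)` lands on the `q` that starts the second line: the first line is 29 characters long including its EOL marker, but 30 bytes long.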

## Writing text

Use `write` or `writelines` to write to a file. Remember, if you use the `w` file mode, it will overwrite the existing file (if present). Use `x` mode to fail instead of overwriting.

In [17]:
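# Write all non-empty lines to a new file called tmp.txt
with open(myPath, encoding="utf-8") as source:
    with open("../examples/tmp.txt", mode="w", encoding="utf-8") as handle:
        # A blank line holds only its EOL marker, so len(x) == 1
        handle.writelines(x for x in source if len(x) > 1)

# Read that in
with open("../examples/tmp.txt", encoding="utf-8") as newFile:
    newLines = newFile.readlines()

print(newLines)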

['Sueña el rico en su riqueza,\n', 'que más cuidados le ofrece;\n', 'sueña el pobre que padece\n', 'su miseria y su pobreza;\n', 'sueña el que a medrar empieza,\n', 'sueña el que afana y pretende,\n', 'sueña el que agravia y ofende,\n', 'y en el mundo, en conclusión,\n', 'todos sueñan lo que son,\n', 'aunque ninguno lo entiende.\n']
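
One detail worth knowing is that `writelines` does not add EOL markers. The example above works only because each line read from the source file still carries its `\n`. A minimal sketch, using a hypothetical scratch file (`out_path`) and made-up strings:

```python
import os
import tempfile

# Hypothetical scratch file and sample data
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
words = ["uno", "dos", "tres"]

with open(out_path, mode="w", encoding="utf-8") as f:
    f.writelines(words)                     # no separators are added
    f.write("\n")
    f.writelines(w + "\n" for w in words)   # add the EOL markers ourselves
    n_written = f.write("fin\n")            # write returns the character count

with open(out_path, encoding="utf-8") as f:
    contents = f.read()

print(repr(contents))
print(n_written)
```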


Common Python file methods and attributes:  

<img src="./myImages/table3.4_fileMethods.png" width=600/>  

<a name="bytes"></a>
# Bytes and Unicode with Files

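By default, Python reads and writes files in text mode, working with `str` objects that are decoded and encoded using the given (or platform default) encoding.  

Append `b` to the standard file mode (for example `rb` or `wb`) to read/write raw bytes in binary mode.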
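
The practical difference is sketched below, assuming a hypothetical scratch file (`bin_path`) that holds a UTF-8 string. Binary mode returns `bytes`, which must be decoded to get a `str`, and decoding fails if a slice cuts a multi-byte character in half.

```python
import os
import tempfile

# Hypothetical scratch file containing a UTF-8 encoded string
bin_path = os.path.join(tempfile.mkdtemp(), "bytes.txt")
with open(bin_path, mode="w", encoding="utf-8") as f:
    f.write("Sueña el rico")

with open(bin_path, mode="rb") as f:
    data = f.read(10)             # 10 bytes, not 10 characters

decoded = data.decode("utf-8")    # bytes -> str

# The first 4 bytes end in the middle of the two-byte ñ
try:
    data[:4].decode("utf-8")
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False

print(data)
print(repr(decoded))
print(partial_ok)
```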