# Working with files

 - There are two main categories of files: text files and binary files.
   - **Text files** contain sequences of characters and are human-readable. They can be opened and edited with a text editor. For example, .txt, .py and .html files are text files.
   - **Binary files** contain sequences of bytes that are not meant to be read as text. They have to be opened with a program that understands their format. For example, .pdf files are opened with a PDF reader, .jpg files with an image viewer, and .mp3 files with a music player.
 - We will focus on opening and reading **text files**.
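
To make the distinction concrete, here is a minimal sketch that writes a short text file and reads it back twice: once in text mode and once in binary mode (`'rb'`, a mode that the rest of this section does not use). The filename `demo_text.txt` is made up for illustration.

```python
# Create a small text file (hypothetical name, written to the current directory)
file = open('demo_text.txt', 'w', encoding='utf-8')
file.write('café\n')
file.close()

# Text mode: the bytes in the file are decoded into a string of characters
file = open('demo_text.txt', 'r', encoding='utf-8')
as_text = file.read()
file.close()

# Binary mode: we get the raw bytes, with no decoding
file = open('demo_text.txt', 'rb')
as_bytes = file.read()
file.close()

print(type(as_text))    # <class 'str'>
print(type(as_bytes))   # <class 'bytes'>
print(as_bytes)         # the letter é shows up as two bytes, \xc3\xa9
```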

## Opening and closing files

To open a file in Python we use the `open()` function.
 - We will use three of its parameters:
   - `file`: The name (or path) of the file to open.
   - `mode`: The mode in which to open the file.
     - `r`: Read mode. The file must exist. This is the default mode.
     - `w`: Write mode. If the file exists, it will be **overwritten**. If it doesn't exist, it will be created.
   - `encoding`: The encoding of the file. If it is omitted, Python picks a platform-dependent default, which on Windows is often not UTF-8. To avoid issues, use `encoding="utf-8"`.


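After using the file, we must close it using the `close()` method. Closing makes sure that everything we wrote is actually saved to disk and that the file is released.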
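
The behaviour of the two modes and of `close()` can be checked with a small sketch. The filenames `demo_modes.txt` and `no_such_file_for_this_demo.txt` are made up for illustration, and the sketch assumes that no file with the second name exists in the current directory.

```python
# Write mode creates the file if it does not exist
# (demo_modes.txt is a made-up filename in the current directory)
file = open('demo_modes.txt', 'w', encoding='utf-8')
file.write('first version\n')
file.close()

# Opening the same file in write mode again discards the old content
file = open('demo_modes.txt', 'w', encoding='utf-8')
file.write('second version\n')
file.close()

# Read mode returns whatever the file currently contains
file = open('demo_modes.txt', 'r', encoding='utf-8')
content = file.read()
print(file.closed)       # False: the file is still open
file.close()
print(file.closed)       # True: the file has been closed
is_closed = file.closed

print(content, end='')   # second version

# Read mode requires the file to exist
try:
    open('no_such_file_for_this_demo.txt', 'r', encoding='utf-8')
    missing_file_error = False
except FileNotFoundError:
    missing_file_error = True
print(missing_file_error)   # True, assuming no file with that name exists
```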

## Reading from files

Once you have opened a file, there are several ways to read from it:
 - `read()`: Reads the whole file as a single string.
 - `readline()`: Reads a single line from the file.
 - `readlines()`: Reads the whole file as a list of strings, one per line.
 - `for line in file`: Iterates over the file, reading one line at a time.

Let's run some examples. First, let's create a simple text file to use as input.
Just run the following cell; it will create a file called `input.txt` in the current directory.

In [19]:
%%writefile input.txt
This is a simple file. This is the first line
And this is the second
The file only has three lines so this is the last one


Overwriting input.txt


In [20]:
# Example 1. Read the whole file using read()

# Open the file in read mode
file = open('input.txt', 'r', encoding='utf-8')

# Read the whole file
text = file.read()

# Print the content of the file
print(text)

# Close the file
file.close()


This is a simple file. This is the first line
And this is the second
The file only has three lines so this is the last one



In [21]:
# Example 2. Read the file line by line using readline()

# Open the file in read mode
file = open('input.txt', 'r', encoding='utf-8')

# initialize the line variable to non-empty string
line = 'not empty'

# Read the file line by line. We stop when readline() returns an empty string
while line != '':
    line = file.readline()

    # Note we use end='' to avoid adding an extra newline.
    # The line variable already contains the newline character.
    print(line, end='')

# Close the file
file.close()


This is a simple file. This is the first line
And this is the second
The file only has three lines so this is the last one


In [22]:
# Example 3. Read the file line by line using readlines()

# Open the file in read mode
file = open('input.txt', 'r', encoding='utf-8')

# Read the file line by line and store the lines in a list
lines = file.readlines()

# Print the content of the file

for line in lines:
    print(line, end='')

# Close the file
file.close()


This is a simple file. This is the first line
And this is the second
The file only has three lines so this is the last one


In [27]:
# Example 4. Read the file line by line using for loop

# Open the file in read mode
file = open('input.txt', 'r', encoding='utf-8')

# Iterate over the file one line at a time and print each line
for line in file:
    print(line, end='')

# Close the file
file.close()


This is a simple file. This is the first line
And this is the second
The file only has three lines so this is the last one
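
All four examples print the same text, but the methods return different things. The following sketch compares them side by side; it creates its own small file, `demo_read.txt`, a made-up filename used only for illustration.

```python
# Create a three-line file to read from (hypothetical filename)
file = open('demo_read.txt', 'w', encoding='utf-8')
file.write('one\ntwo\nthree\n')
file.close()

# read() returns one string with the entire content
file = open('demo_read.txt', 'r', encoding='utf-8')
whole = file.read()
file.close()

# readline() returns one line per call, including its '\n'
file = open('demo_read.txt', 'r', encoding='utf-8')
first = file.readline()
second = file.readline()   # continues where the previous call stopped
file.close()

# readlines() returns a list with one string per line
file = open('demo_read.txt', 'r', encoding='utf-8')
lines = file.readlines()
file.close()

print(repr(whole))    # 'one\ntwo\nthree\n'
print(repr(first))    # 'one\n'
print(repr(second))   # 'two\n'
print(lines)          # ['one\n', 'two\n', 'three\n']
```

We use `repr()` here so that the newline characters are visible in the output.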


## Writing to a new file

To write to a file, we open it in write mode (`w`) and use the `write()` method. It takes a single string as a parameter, and writes it to the file.
 - If the file doesn't exist, it will be created when it is opened.
 - If the file exists, its content will be **overwritten** when it is opened.

Unlike `print()`, `write()` does not add a newline for us, so we have to explicitly add the newline character `\n` at the end of each line.
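
A minimal sketch of what happens with and without the explicit `\n`. The list `names` and the two filenames are made up for illustration.

```python
names = ['Ada', 'Grace', 'Alan']

# Without '\n', everything ends up on a single line
outfile = open('demo_no_newlines.txt', 'w', encoding='utf-8')
for name in names:
    outfile.write(name)
outfile.close()

# With '\n' appended, each name gets its own line
outfile = open('demo_newlines.txt', 'w', encoding='utf-8')
for name in names:
    outfile.write(name + '\n')
outfile.close()

# Read both files back to compare the results
infile = open('demo_no_newlines.txt', 'r', encoding='utf-8')
no_newlines = infile.read()
infile.close()

infile = open('demo_newlines.txt', 'r', encoding='utf-8')
with_newlines = infile.read()
infile.close()

print(repr(no_newlines))     # 'AdaGraceAlan'
print(repr(with_newlines))   # 'Ada\nGrace\nAlan\n'
```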

In [26]:
# Example 5. Read from input.txt and write to output.txt

# Open the file in read mode
infile = open('input.txt', 'r', encoding='utf-8')

# Open the file in write mode
outfile = open('output.txt', 'w', encoding='utf-8')

for line in infile:
    outfile.write(line)

# Close the files
infile.close()
outfile.close()

# Check that output.txt has been created and contains the same text


In [25]:
# Example 6. Read from input.txt and write to output_upper.txt manipulating the line first

# Open the file in read mode
infile = open('input.txt', 'r', encoding='utf-8')

# Open the file in write mode
outfile = open('output_upper.txt', 'w', encoding='utf-8')

for line in infile:
    # Convert the line to uppercase
    line = line.upper()

    # Write the line to the output file
    outfile.write(line)

# Close the files
infile.close()
outfile.close()

# Check that output_upper.txt has been created and contains the same text in uppercase


## Stripping

 - When reading from a file, we often want to remove the newline character `\n` at the end of each line.
 - This is useful when we need to perform some operation on the line that we read, and we don't want the newline character to interfere.
 - We can do this using the `strip()` method, which removes all whitespace at both ends of the string, or using `rstrip('\n')`, which removes only the trailing newline.
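
The examples in this section call `rstrip('\n')` rather than plain `strip()`. As a small sketch of how the variants differ, here they are applied to the same made-up string:

```python
# A line with spaces at both ends and a trailing newline
line = '  hello world  \n'

both_ends = line.strip()          # removes whitespace at both ends
right_only = line.rstrip()        # removes whitespace at the right end only
newline_only = line.rstrip('\n')  # removes only the trailing newline

print(repr(both_ends))      # 'hello world'
print(repr(right_only))     # '  hello world'
print(repr(newline_only))   # '  hello world  '

# Strings cannot be changed in place: the original line is untouched
print(repr(line))           # '  hello world  \n'
```

Using `rstrip('\n')` is the safer choice when spaces at the start or end of a line are meaningful and should be kept.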

In [24]:
# Example 7. Read from input.txt and write to output_last_char.txt the last character of each line

# Open the file in read mode
infile = open('input.txt', 'r', encoding='utf-8')

# Open the file in write mode
outfile = open('output_last_char.txt', 'w', encoding='utf-8')

for line in infile:
    # Strip the line by removing the trailing newline character
    line = line.rstrip('\n')

    # Skip empty lines, which have no last character
    if line == '':
        continue

    # Get the last character of the line
    last_char = line[-1]

    # Write the last character to the output file
    # Note that we add the newline character, so that each character is written on a separate line
    outfile.write(last_char + '\n')

# Close the files
infile.close()
outfile.close()


## Splitting

- When reading from a file, we often want to split each line into a list of strings.
- This is useful when we need to perform some operation on each word of the line that we read.
- We can do this using the `split()` method. Called without arguments, it splits on whitespace and ignores repeated spaces.
- We can also split on a specific character, for example `split(",")` will split on commas.
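
A short sketch of both forms of `split()`, applied to made-up strings rather than to a file:

```python
# split() without arguments splits on any run of whitespace
sentence = 'This is   a simple   file\n'
words = sentence.split()
print(words)    # ['This', 'is', 'a', 'simple', 'file']

# split(',') splits on every comma; strip the newline first,
# otherwise it stays attached to the last field
record = 'apple,banana,cherry\n'
fields = record.rstrip('\n').split(',')
print(fields)   # ['apple', 'banana', 'cherry']

# With an explicit separator, two separators in a row produce an empty string
gaps = 'a,,b'.split(',')
print(gaps)     # ['a', '', 'b']
```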

In [28]:
# Example 8. Read from input.txt and write to output_second_word.txt the second word of each line

# Open the file in read mode
infile = open('input.txt', 'r', encoding='utf-8')

# Open the file in write mode
outfile = open('output_second_word.txt', 'w', encoding='utf-8')

for line in infile:
    # Strip the line by removing the trailing newline character
    line = line.rstrip('\n')

    # Split the line into words
    words = line.split()

    # Skip lines with fewer than two words, which have no second word
    if len(words) < 2:
        continue

    # Get the second word of the line
    second_word = words[1]

    # Write the second word to the output file
    # Note that we add the newline character, so that each word is written on a separate line
    outfile.write(second_word + '\n')

# Close the files
infile.close()
outfile.close()
