# Module 10 - File I/O
---
This module teaches you how to work with files in Python: reading data into Python, processing it, and writing the output back to files. We won't deal with CSV and JSON files in this module, because we haven't yet explored the data structures that are ideal for processing such data (Pandas DataFrames).


## *1. Opening and closing files*:
---
A file is a named location on the disk where we store information. A file has a name, path and content. Whenever we deal with files the 3 step process we always follow is `Open --> Perform Operations (read/write/etc) --> Close`. Python provides built-in functions for file I/O.

### A) `Opening a file`:

When we open a file, remember to specify:
- `Where it is located?` (File name + path)
- `What do we intend on doing with it?` (read/write/update/append)
- `How is the data represented and encoded?` (Encodings for windows/linux, text/binary)

Refer to the table below to understand the different modes in which the file can be opened:
<table style="float:left;width:100%;font-size:100%;">
<tr>
<th style="text-align:center;width:20%">Mode</th>
<th style="text-align:left">Meaning</th>
</tr>

<tr>
<td style="text-align:center;">r</td>
<td style="text-align:left">read-only (default)</td>
</tr>

<tr>
<td style="text-align:center;">w</td>
<td style="text-align:left">write (creates the file if it doesn't exist. Overwrites the contents if it does exist)</td>
</tr>

<tr>
<td style="text-align:center;">a</td>
<td style="text-align:left">append (creates the file if it doesn't exist. Adds to the end of the file if it does exist)</td>
</tr>

<tr>
<td style="text-align:center;">+</td>
<td style="text-align:left">reading and writing</td>
</tr>
    
<tr>
<td style="text-align:center;">t</td>
<td style="text-align:left">text mode (default)</td>
</tr>

<tr>
<td style="text-align:center;">b</td>
<td style="text-align:left">binary mode</td>
</tr>

</table>  
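
The modes in the table can be tried out on a scratch file. The snippet below is a minimal sketch, assuming a hypothetical file name (`modes_demo.txt`) in a temporary directory so that no real data is touched. It shows `w` overwriting, `a` appending, and `r+` (the `+` combined with `r`) reading and then writing.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not part of the course files)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file because it doesn't exist yet
with open(path, mode="w", encoding="utf-8") as f:
    f.write("first\n")

# 'w' again: the earlier contents are overwritten
with open(path, mode="w", encoding="utf-8") as f:
    f.write("second\n")

# 'a' adds to the end and keeps what is already there
with open(path, mode="a", encoding="utf-8") as f:
    f.write("third\n")

# 'r+' opens an existing file for both reading and writing
with open(path, mode="r+", encoding="utf-8") as f:
    before = f.read()      # reading moves the pointer to the end of the file
    f.write("fourth\n")    # so this write lands after the existing text

with open(path, mode="r", encoding="utf-8") as f:
    after = f.read()

print(repr(before))  # 'second\nthird\n'
print(repr(after))   # 'second\nthird\nfourth\n'
```

Note that `r+` does not create the file; it expects the file to exist already.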

In `text mode`, we get strings when we read from the file (like in .txt, .csv, .json files). These formats enable files to be read by text-readers and similar programs. In `binary mode`, we get bytes when we read from the file (like in .pdf, .jpg, .exe files). There are specific applications to open each of these types of file. Sometimes proprietary file formats are converted into binary files to make them portable. In this module, we will deal with only text mode.

The `encoding` specifies how the characters in a text file are stored as bytes. Windows and Linux often have different default encodings. So don't rely on the default encoding, and always specify `utf-8` for files in text mode, unless there is some other encoding that you specifically need, like `ascii` or a Windows code page such as `cp1252` (often loosely called "ANSI").
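
To see why the encoding matters, here is a minimal sketch that writes a short string containing one non-ASCII character and reads it back in three ways. The file name `encoding_demo.txt` and the sample text are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")
text = "caf\u00e9"  # "café": 4 characters, the last one is non-ASCII

with open(path, mode="w", encoding="utf-8") as f:
    f.write(text)

# Binary mode shows the bytes that are actually stored on disk
with open(path, mode="rb") as f:
    raw = f.read()

# Reading with the same encoding gives the original text back
with open(path, mode="r", encoding="utf-8") as f:
    same = f.read()

# Reading with a different encoding silently produces wrong characters
with open(path, mode="r", encoding="latin-1") as f:
    garbled = f.read()

print(len(text), len(raw))  # 4 5 (4 characters are stored as 5 bytes in utf-8)
print(same == text)         # True
print(garbled == text)      # False
```

The mismatch in the last case raises no error, which is why stating the encoding explicitly is safer than relying on a platform default.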

In the examples below, the defaults for mode are 'r' (read mode) and 't' (text mode).
```Python
f = open("module_10_input_text_1.txt") # file in current working directory
f = open("C:\\Users\\vikram\\Desktop\\Py Programming\\mission-ai-courses-live\\Course 1 - Core Python Programming\\module_10_input_text_1.txt") # full file path
print(type(f))
```
Let's specify the open mode and encoding:
```Python
f = open("C:\\Users\\vikram\\Desktop\\Py Programming\\mission-ai-courses-live\\Course 1 - Core Python Programming\\module_10_input_text_1.txt", mode="r", encoding="utf-8")
```

### B) `Closing a file`:

When we are done using the file, it needs to be closed so that `system resources which are tied to the file are freed`.
```Python
f.close()
```
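
One way to make sure `close()` always runs is to place it in a `finally` clause. The sketch below assumes a hypothetical scratch file (`close_demo.txt`) and uses the file object's `closed` attribute to check its state before and after closing.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")

f = open(path, mode="w", encoding="utf-8")
try:
    f.write("some data\n")
    open_during = f.closed    # False: the file is still open here
finally:
    f.close()                 # runs even if the write above fails

closed_after = f.closed       # True: the file has been closed

# Any further operation on a closed file raises a ValueError
try:
    f.write("more data\n")
except ValueError as err:
    message = str(err)
    print("Write failed:", message)
```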

In [32]:
# Run the above code:


In [31]:
# Exercise:

# 1. Open the file 'module_10_input_text_1.txt' with and without the full file path, in 'read' mode with 'utf-8' encoding.
# Print the statement "File opened successfully" after opening the file
# Close the file
# Print the statement "File closed successfully" after closing the file


### C) `Using Context Manager - 'with'`:

Instead of explicitly closing the file each time, Python offers a safer and more concise way to open and close files: the `with` statement. `The file object acts as a context manager`, i.e., the file stays open inside the `with` block and is closed automatically when the block exits, even if an exception is raised inside it. There is no need to explicitly close the file object.

```Python
with open("module_10_input_text_1.txt", mode="r", encoding="utf-8") as f:
    print("Processing the file contents here")

print("Outside the 'with' block")
```
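
The automatic closing can be verified with the `closed` attribute. The following is a minimal sketch, assuming a hypothetical scratch file (`with_demo.txt`) and a deliberately raised `RuntimeError` to simulate a failure while processing.

```python
import os
import tempfile

# Hypothetical scratch file, created so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("line 1\n")

try:
    with open(path, mode="r", encoding="utf-8") as f:
        inside = f.closed    # False: the file is open inside the block
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

after_error = f.closed       # True: the file was closed despite the exception

print(inside, after_error)   # False True
```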

You can also open multiple files in the same `with` statement. 'with' statements can also be nested, but this form is more concise when several files need to be open simultaneously:
```Python
with open("module_10_input_text_1.txt", "r", encoding="utf-8") as f1, open("module_10_input_text_2.txt", "r", encoding="utf-8") as f2:
    print("Opened 2 files in the same 'with' block")

print("Outside the 'with' block")
```

In [30]:
# Run the above code:


In [35]:
# Exercises:

# 1. Using the context manager 'with', Open the file 'module_10_input_text_1.txt' in 'read' mode with 'utf-8' encoding.
# Print the statement "File opened successfully" after opening the file
# Print the statement "File closed successfully" after exiting the context manager


In [None]:
# 2. Open the files 'module_10_input_text_1.txt' and 'module_10_input_text_2.txt', using only 1 'with' statement
# Print the statement "File opened successfully" after opening the files
# Print the statement "File closed successfully" outside the 'with' block


### D) `Catching IOError exceptions`:

Whenever a file operation such as opening, reading or writing fails, an `IOError exception is raised`. In Python 3, `IOError` is an alias of `OSError`, and more specific errors such as `FileNotFoundError` are subclasses of it. It is good practice to enclose file operations in a try/except block that defines how the error should be handled.

```Python
try:
    with open("module_10_input_text_1.txt", mode="r", encoding="utf-8") as f:
        print("Processing the file contents here")
    print("Outside the 'with' block")
except IOError:
    print("An IOError exception was caught")
```
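
It can help to catch the more specific exception first. The sketch below assumes a path that does not exist (the name `no_such_file.txt` is hypothetical) and shows `FileNotFoundError` being handled before the general `OSError`.

```python
import os
import tempfile

# A path that is assumed not to exist (a fresh, empty temporary directory)
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

print(IOError is OSError)                      # True
print(issubclass(FileNotFoundError, OSError))  # True

try:
    with open(missing, mode="r", encoding="utf-8") as f:
        contents = f.read()
except FileNotFoundError as err:
    # Most specific case first: the file is simply not there
    outcome = "missing"
    print("File not found:", err.filename)
except OSError as err:
    # Any other I/O problem (permissions, disk errors, ...)
    outcome = "other"
    print("Another I/O error occurred:", err)
else:
    # Runs only when no exception was raised
    outcome = "ok"
```

Ordering matters here: if `except OSError` came first, it would also catch `FileNotFoundError`, and the specific handler would never run.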

In [38]:
# Run the above code:


In [None]:
# Exercise:

# 1. Open the files 'module_10_input_text_1.txt' and 'module_10_input_text_2.txt', using only 1 'with' statement
# Print the statement "File opened successfully" after opening the files
# Print the statement "File closed successfully" outside the 'with' block
# Catch and handle any IOError exceptions that might get generated in this process



## *2. Reading from a file*:
---

Refer to the table above to see the different modes in which a file can be opened. `Use the 'r' option to open a file in read (text) mode, and 'rb' to read from a binary file`. The file object contains a pointer to the beginning of the file. You can then use the following functions to read from the file:
- `read(n)` - reads the next 'n' characters (bytes in binary mode) from where the pointer is positioned. If 'n' isn't specified, it reads the rest of the file at once
- `readline()` - reads the next line from the pointer position (up to and including the newline character). It returns an empty string when it reaches EOF (end of file)
- `readlines()` - reads all the lines from the pointer position till the end of the file into a list

```Python
with open("C:\\Users\\vikram\\Desktop\\Py Programming\\mission-ai-courses-live\\Course 1 - Core Python Programming\\module_10_input_text_1.txt", mode="r", encoding="utf-8") as f: # text mode
    print(f.read(4)) # reads the 1st 4 characters
    print(f.readline()) # reads the 1st line with newline at the end, minus the 1st 4 characters
    print(f.readline(),end="") # reads the 2nd line
    print(f.readlines()) # reads the remaining lines into a list, from the 3rd line onwards

with open("module_10_input_text_1.txt", mode="rb") as f: # binary mode
    print(f.read(4)) # reads the 1st 4 bytes (printed as b'...')
    print(f.readline()) # reads the 1st line with newline at the end, minus the 1st 4 characters
    print(f.readline()) # reads the 2nd line
    print(f.readlines()) # reads the remaining lines into a list, from the 3rd line onwards
```
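Contrast the outputs that you get above. `In binary mode, you get bytes objects (printed as b'...') with the line endings exactly as they are stored in the file`. A carriage return (`\r`) appears before the line feed (`\n`) only if the file uses Windows-style line endings. In text mode, line endings are translated to `\n` when reading.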
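
The difference is easier to see on a file whose line endings are known. This minimal sketch writes raw bytes with Windows-style endings to a hypothetical scratch file (`newline_demo.txt`), then reads it back in both modes.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# Write raw bytes so the file has \r\n line endings on every platform
with open(path, mode="wb") as f:
    f.write(b"alpha\r\nbeta\r\n")

# Text mode: returns str, and \r\n is translated to \n
with open(path, mode="r", encoding="utf-8") as f:
    text_lines = f.readlines()

# Binary mode: returns bytes, exactly as stored on disk
with open(path, mode="rb") as f:
    byte_lines = f.readlines()

print(text_lines)  # ['alpha\n', 'beta\n']
print(byte_lines)  # [b'alpha\r\n', b'beta\r\n']
```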

In [44]:
# Run the above code:


You can `process the contents of the file as it is being read, or once the contents have been read entirely`.
```Python
# processing as it is being read line by line
with open("module_10_input_text_1.txt", mode="r", encoding="utf-8") as f:
    for line in f: # equivalent of calling readline() successively for each line
        print(line, end='')

# processing once all lines have been read
with open("module_10_input_text_1.txt", mode="r", encoding="utf-8") as f:
    for line in f.readlines(): # readlines() first reads the entire file into a list of lines
        print(line, end='')
```
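
Processing line by line means only one line needs to be held in memory at a time. As a small illustration, the sketch below counts lines and words while iterating. The file name `count_demo.txt` and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file with three short lines
path = os.path.join(tempfile.mkdtemp(), "count_demo.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("the quick brown fox\njumps over\nthe lazy dog\n")

line_count = 0
word_count = 0
with open(path, mode="r", encoding="utf-8") as f:
    for line in f:                       # one line at a time
        line_count += 1
        word_count += len(line.split())  # split() also drops the trailing newline

print(line_count, word_count)  # 3 9
```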

In [5]:
# Run the above code:


In [47]:
# Exercises:

# 1. Using the context manager, open the file 'module_10_input_text_2.txt' in 'read' 'text' mode, with 'utf-8' encoding. 
# Print the 1st 6 characters of the first line
# Print the remainder of the 1st line
# Print the 2nd line
# Print a list containing the remaining lines from the 3rd line onwards
# For each of the print statements above, change the default line ending character of the print function to  space ' '


In [None]:
# 2. Print out each line of the file 'module_10_input_text_2.txt' using a for loop, both with and without readlines()


## *3. Writing to a file*:
---

To open a file with the intention of writing some information in it: 
- `Use the 'w' option to open a text file in write mode`. This overwrites the existing file contents if the file exists, and creates the file if it doesn't exist. The file object points to the beginning of the file. 
- `Use the 'a' option to append to a file`. It creates the file if it doesn't exist. It only adds to the end of the file, and doesn't overwrite the existing contents. The file object contains a pointer to the end of the file. 
- `Append the 'b' option to write and append to binary files` - 'wb' and 'ab'

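You can then use the following functions to write to the file:
- `write(string)` - writes the string to the file and returns the number of characters written
- `writelines(list_of_lines)` - writes each string in the list to the file. It does not add newline characters, so each string must already end with one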
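
A common surprise is that `writelines()` does not insert line breaks. The sketch below, which assumes a hypothetical scratch file (`writelines_demo.txt`), writes the same list with and without explicit newline characters.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
lines = ["one", "two", "three"]

# Without newlines, the strings are written back to back
with open(path, mode="w", encoding="utf-8") as f:
    f.writelines(lines)
with open(path, mode="r", encoding="utf-8") as f:
    joined = f.read()

# Adding "\n" to each string puts every item on its own line
with open(path, mode="w", encoding="utf-8") as f:
    f.writelines(line + "\n" for line in lines)
with open(path, mode="r", encoding="utf-8") as f:
    separated = f.read()

print(repr(joined))     # 'onetwothree'
print(repr(separated))  # 'one\ntwo\nthree\n'
```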

```Python
# Lets create a file and write/append some information to it

# creating new file & writing to it
with open("first_written_file.txt","w") as f:
    for i in range(1,11):
        f.write(str(i)+"\n")

# printing out the new file
with open("first_written_file.txt","r") as f:
    for line in f:
        print(line, end = '')

# appending to existing file
with open("first_written_file.txt","a") as f:
    for i in range(11,21):
        f.write(str(i)+"\n")

# printing out the file with appended data
with open("first_written_file.txt","r") as f:
    for line in f:
        print(line, end = '')

# over-writing the contents of the file
with open("first_written_file.txt","w") as f:
    for i in range(21,31):
        f.write(str(i)+"\n")

# printing out the overwritten file
with open("first_written_file.txt","r") as f:
    for line in f:
        print(line, end = '')
```

In [13]:
# Run the above code:


```Python
# We can also read the contents of 1 file and write it into another file

# method 1:
with open("module_10_input_text_1.txt", "r") as f_in, open("second_written_file.txt", "w") as f_out:
    for line in f_in:
        f_out.write(line) # each line already ends with "\n"; adding another would insert blank lines

# method 2:
with open("module_10_input_text_2.txt", "r") as f_in, open("second_written_file.txt", "a") as f_out:
    f_out.writelines(f_in.readlines())

# printing out the new file
with open("second_written_file.txt","r") as f:
    for line in f:
        print(line, end = '')

```

In [None]:
# Run the above code:


In [4]:
# Exercise

# 1. Write the first 10 negative integers into a new text file 'file_writing_exercise_1.txt' (1 integer per line)
# To this file, append the individual squares of the first 10 negative integers (1 square per line)
# print out the contents of the file 'file_writing_exercise_1.txt'


In [3]:
# 2. Create a new file 'file_writing_exercise_2.txt' containing every line from 'file_writing_exercise_1.txt' - use 'writelines()'
# Print out the contents of the file 'file_writing_exercise_2.txt'


In [2]:
# 3. Create a new file 'file_writing_exercise_3.txt' containing every 2nd line from 'file_writing_exercise_2.txt'
# Print out the contents of the file 'file_writing_exercise_3.txt'


## *4. Using generator functions to read large files efficiently*:
---

Let's recap some key concepts about generators from Module 6:
- Generators are a special kind of object that employ `lazy evaluation`, i.e., they don't store all their values in memory, but compute each one as required. This approach is especially useful for big data. `You can treat generators like you would treat any other iterable`, with the caveat that a generator can be iterated over only once.
- One way to create a generator is by using a generator function. To produce values from a generator function, use `yield`. The function hands back its values one at a time with `yield`, rather than building a complete collection and passing it back with `return`. `Think of 'yield' as the lazy equivalent of list.append()`: each `yield` supplies the next item, but only when it is asked for.
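
As a quick refresher before applying this to files, here is a minimal sketch of a generator function. The name `squares` is hypothetical and chosen only for illustration.

```python
def squares(n):
    """Yield the squares of 0, 1, ..., n-1 one at a time."""
    for i in range(n):
        yield i * i   # execution pauses here until the next value is requested

gen = squares(4)

first = next(gen)     # 0
second = next(gen)    # 1
rest = list(gen)      # [4, 9]: the values that had not been consumed yet
again = list(gen)     # []: the generator is now exhausted

print(first, second, rest, again)  # 0 1 [4, 9] []
```

No square is computed until it is requested, which is the property that makes generators suitable for files too large to load at once.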

```Python
# this is our generator function
def read_large_file(file_object):    
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:
        # Read a line from the file: data        
        data = file_object.readline()

        # Break if this is the end of the file (readline() returns an empty string when it reaches EOF)
        if data == "":          
            break
        # Yield the line of data        
        yield data
        
# Open a connection to the file
with open("module_10_input_text_2.txt","r") as f:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(f)

    # Print the first three lines of the file
    # the generator object is an iterator - use next() 
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))
    
    # print all other lines
    for l in gen_file:
        print(l)
```
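
The same idea works when a file has no convenient line breaks. The variation below is a sketch, not part of the original example: it yields fixed-size chunks using `read(n)`. The function name `read_in_chunks`, the chunk size and the scratch file are all assumptions.

```python
import os
import tempfile

def read_in_chunks(file_object, chunk_size=1024):
    """A generator function that yields a file's contents in fixed-size chunks."""
    while True:
        chunk = file_object.read(chunk_size)
        # read() returns an empty string at EOF in text mode
        if chunk == "":
            break
        yield chunk

# Hypothetical scratch file holding 10 characters
path = os.path.join(tempfile.mkdtemp(), "chunk_demo.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("abcdefghij")

with open(path, mode="r", encoding="utf-8") as f:
    # A tiny chunk size is used here only to make the output easy to follow
    chunks = list(read_in_chunks(f, chunk_size=4))

print(chunks)  # ['abcd', 'efgh', 'ij']
```

In practice you would loop over the generator instead of wrapping it in `list()`, so that only one chunk is in memory at a time.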

In [19]:
# Run the above code:


In [47]:
# Exercise

# 1. Write the contents of 'module_10_input_text_1.txt' to a file called 'generator_input.txt'.
# Append the contents of 'module_10_input_text_2.txt' to 'generator_input.txt'.
# Define a generator function to read an input file lazily, and return a generator object of the file
# Call this generator function with 'generator_input.txt' as the input. 
# Print the 1st 2 lines of the generator
# Print the remaining contents of the file by wrapping the generator object in an appropriate function



## *Congratulations! You now know how to handle files in Python - open, close, read and write. You also know how to define generator functions to enable work on large files. Keep going!*