# Manipulating text files

Python uses file objects to interact with external files on your computer. These files can be of any type, such as audio, text, email, or Excel files.

<strong> Note: </strong> You will probably need to install particular libraries or modules to interact with those various file types, but they are easily available.

Python has a built-in open function that allows us to open and edit basic file types. 


## Creating a File with IPython
The setup below is specific to Google Colab notebooks, where Google Drive must be mounted before files can be saved to it. I will create a basic text file and add some text to it.

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

file1 = open("/content/gdrive/My Drive/mytextfile.txt", "w")

contents = "This is the first line of my new text file.\nThis is the second line of the text file."

file1.write(contents)

file1.close()

Mounted at /content/gdrive


A text file should now be created in your Google Drive.

Knowing the Google Drive path you are working in is important when opening a file that is saved in the same location as your notebook. Of course, we can open a file from any location, not just the working directory of the Jupyter notebook.


I'm going to work on the text file called <strong> mytextfile.txt </strong> that I created earlier.

In [2]:
# Open the mytextfile.txt file I created earlier
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt")

Let's examine some details about this text file.

In [3]:
my_text_file

<_io.TextIOWrapper name='/content/gdrive/My Drive/mytextfile.txt' mode='r' encoding='UTF-8'>

This output from the interpreter shows that the file object is a text wrapper (`TextIOWrapper`) and that the file has been opened in <strong> read-only </strong> mode (`mode='r'`). It is now an open file object held in memory.

We'll perform some reading and writing exercises, and then we have to close the file to free up memory.

## Reading and seeking

Let's first read the file.

In [4]:
my_text_file.read()

'This is the first line of my new text file.\nThis is the second line of the text file.'

If I try to read the file again, something unexpected happens.

In [5]:
# What happens if we try to read it again?
my_text_file.read()

''

This happens because the reading <strong> cursor position </strong> is at the end of the file after having read it. So there is nothing left to read. 

We can reset the <strong> cursor position </strong> like this, to index position 0 (start of the file).

In [6]:
# Seek to the start of file (index 0)
my_text_file.seek(0)

0

This command resets the cursor position back to the beginning point of the file.

Now if we try to read the file again, we should get all of its contents back.

In [7]:
my_text_file.read()

'This is the first line of my new text file.\nThis is the second line of the text file.'

I can read the contents of the file into a string with this command. Make sure you reset the cursor position first with the <strong> `.seek` </strong> command, otherwise there will be nothing read into the string.

In [8]:
my_text_file.seek(0)
file_contents = my_text_file.read()

And I can show its contents using the `print` command.

In [9]:
print(file_contents)

This is the first line of my new text file.
This is the second line of the text file.


We no longer need to re-read the file, and can instead work directly with the contents of the string.

It is important to close any files you open. We do this using the <strong> `.close()` </strong> command.

In [10]:
my_text_file.close()
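
The cursor behaviour described above can be observed with the `.tell()` method, which reports the current position. The following is only a minimal sketch: it assumes a hypothetical scratch file in a temporary folder rather than the Google Drive path used in this notebook.

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
f = open(path, "w")
f.write("first line\nsecond line")
f.close()

f = open(path)
start = f.tell()        # position of the cursor before anything is read
first_read = f.read()   # reads everything and leaves the cursor at the end
end = f.tell()          # position of the cursor after reading
second_read = f.read()  # nothing is left to read, so this is an empty string
f.seek(0)               # move the cursor back to the start of the file
third_read = f.read()   # the full contents are available again
f.close()

print(start, end)
print(repr(second_read))
print(third_read == first_read)
```

In text mode, the number returned by `.tell()` is best treated as a marker to hand back to `.seek()` rather than as a character count.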

## .readlines()

We can use the <strong> `.readlines()` </strong> command to read all the lines of a file into a list, with one list item per line.

Use this command with caution with large files, since everything will be held in memory. We will learn how to iterate over large files later in this course.

I'll open the file again, and then use the .readlines() command.

In [11]:
# Read the text file again.
# This time the readlines() command will put each line into a list.
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt")
all_my_lines = my_text_file.readlines()

In [12]:
my_text_file.close()

Now that I have the contents of the text file in individual lines, I can perform various functions on it. For example, I can use a loop to iterate through each line and print out the fourth word of each line.

In [13]:
for line in all_my_lines:
    print(line.split()[3])

first
second
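
Each item returned by `.readlines()` keeps its trailing newline character, which matters when comparing or joining lines. Here is a small sketch of one way to strip it, assuming a hypothetical scratch file in a temporary folder (the names `raw_lines` and `clean_lines` are illustrative only).

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
f = open(path, "w")
f.write("This is the first line.\nThis is the second line.")
f.close()

f = open(path)
raw_lines = f.readlines()  # a list of strings, one per line
f.close()

# Remove the trailing newline from each line, if there is one
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)
print(clean_lines)
```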


## Writing to a File

By default, the <strong> `open()` </strong> function will only allow us to read the file. We need to pass the argument `'w'` to write over the file, or `'w+'` to write over it and also read from it. For example:

In [14]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt", "w+")

Let's check what is now in the file.

In [15]:
my_text_file.read()

''

This indicates that the contents of the text file have been overwritten.

<div class="alert alert-danger" style="margin: 20px">Use the <strong> `w+` </strong> option with caution!<br>
Opening a file with 'w' or 'w+' *truncates the original*, meaning that anything that was in the original file is deleted!</div>
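
To make the warning concrete, the sketch below shows that the truncation happens as soon as the file is opened, even if nothing is written afterwards. It assumes a hypothetical scratch file in a temporary folder so that no real file is harmed.

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
f = open(path, "w")
f.write("original contents")
f.close()
size_before = os.path.getsize(path)  # size in bytes after the first write

f = open(path, "w")  # opening in "w" mode empties the file straight away
f.close()            # note that nothing was written this time
size_after = os.path.getsize(path)

print(size_before, size_after)
```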

Let's add some new text to the file and see what happens to its contents.

In [16]:
my_text_file.write("This is new contents I'm adding to the text file.")

49

In [17]:
# Return the cursor to the start of the file
my_text_file.seek(0)
my_text_file.read()

"This is new contents I'm adding to the text file."

The text file no longer contains the original text I entered into it earlier. It now contains the new text only. That is because I used the <strong> w+ </strong> mode argument when I opened the file. Remember that <strong> w+ </strong> allows us to read and write to the file, but empties the file first.

If we want to add text to a file, we need to append text to it.

In [18]:
# Close the file before we continue
my_text_file.close()

## Appending to a File
Passing the argument `'a'` to the `open` function opens the file and puts the cursor at the end, so anything written is appended. Like `w+`, `a+` lets us read and write to a file. If the file does not exist, it will be created.

In [19]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt", "a+")

In [20]:
my_text_file.write("This is the first line of my text using the a+ option.")

54

In [21]:
my_text_file.close()

Let's look at the contents of the file.

In [22]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt")

In [23]:
my_text_file.read()

"This is new contents I'm adding to the text file.This is the first line of my text using the a+ option."

The `a+` option lets us write contents to the end of the file.
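
One consequence of the cursor starting at the end is that reading straight after opening in `a+` mode returns nothing. The following sketch illustrates this, assuming a hypothetical scratch file in a temporary folder.

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
f = open(path, "w")
f.write("first.")
f.close()

f = open(path, "a+")
read_at_open = f.read()  # the cursor starts at the end, so this is empty
f.write("second.")       # the new text is added after the existing text
f.seek(0)                # go back to the start before reading
contents = f.read()
f.close()

print(repr(read_at_open))
print(contents)
```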

Note that we can also press `SHIFT` + `TAB` to view more detail on the command we are using at any time. This option gives us more information on the various options available for a command. This works for all commands.

What happens if I try to open a file that doesn't exist?

In [25]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt")

FileNotFoundError: ignored

The file is not automatically created because the default mode when opening a file is `r` (read-only).
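
If a missing file is a possibility, the error can be caught rather than left to stop the notebook. This is only a minimal sketch, assuming a hypothetical path inside a fresh temporary folder where no file exists yet.

```python
import os
import tempfile

# Hypothetical path to a file that has not been created
missing_path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

error_name = None
try:
    f = open(missing_path)  # default "r" mode does not create the file
except FileNotFoundError as err:
    error_name = type(err).__name__
    print("Could not open the file:", error_name)
else:
    f.close()
```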

We can easily resolve this issue by changing the mode to `a+`. That will then create the new file if it does not currently exist.

In [26]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt", "a+")

Now I'll add some text to this new file.

In [27]:
my_text_file.write("This is the first line of text in my new file.")

46

Now I'll close the file.

In [28]:
my_text_file.close()

Next I'll reopen the file, but I'll only open it with read permissions. Remember that this is the default option when opening a file.

In [29]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt")

Now I'll try to write some contents to the file. **This will not work because I did not specify a mode when opening the file, so the default `r` (read-only) mode is used.**

In [30]:
my_text_file.write("This is a test to see if I can write to my text file.")

UnsupportedOperation: ignored
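
A file object can be asked whether it accepts writes before we try, using its `.writable()` method. The sketch below shows this check alongside the error itself, again assuming a hypothetical scratch file in a temporary folder.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
f = open(path, "w")
f.write("some text")
f.close()

f = open(path)            # no mode given, so the default "r" is used
can_write = f.writable()  # False for a file opened read-only
try:
    f.write("more text")
    write_failed = False
except io.UnsupportedOperation:
    write_failed = True   # writing to a read-only file raises this error
f.close()

print(can_write, write_failed)
```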

Now I'll fix this error. I'll close the file first, and then change the mode to `a+` to allow reading and writing to the file.

In [31]:
my_text_file.close()

In [32]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt", "a+")

In [33]:
my_text_file.write("I'm adding a new line to my test text file using the a+ option.")

63

Now I'll seek to the start of the file and then read the contents of it into a string.

In [34]:
my_text_file.seek(0)
my_text_file.read()

"This is the first line of text in my new file.I'm adding a new line to my test text file using the a+ option."

All of the text is shown on one line. If I want each piece of text to start on its own line, I need to add the `\n` special character when I am writing text to the file. Here's an example. Note that I include the special character inside the quote marks along with the text that I'm inserting at the end of the text file.

In [35]:
my_text_file.write("\nThis is another new line in my text file.")

42

Now I'll seek back to the start of the file and read all of the file's contents again.

In [36]:
my_text_file.seek(0)
my_text_file.read()

"This is the first line of text in my new file.I'm adding a new line to my test text file using the a+ option.\nThis is another new line in my text file."

For the special character `\n` to be shown as an actual line break, we need to use the `print` command to show our text on the screen.

In [37]:
my_text_file.seek(0)
for line in my_text_file:
    print(line)

This is the first line of text in my new file.I'm adding a new line to my test text file using the a+ option.

This is another new line in my text file.


We could also show the contents of the `.read()` command directly within the `print` statement.

In [38]:
my_text_file.seek(0)
print(my_text_file.read())

This is the first line of text in my new file.I'm adding a new line to my test text file using the a+ option.
This is another new line in my text file.
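
The difference between the two displays comes from how a string is shown: a bare expression in a cell shows the string's representation, with `\n` written out, while `print` renders it as a line break. A short sketch with an illustrative string (not taken from the file above):

```python
text = "line one\nline two"

as_repr = repr(text)  # what a notebook cell shows for a bare expression
print(as_repr)        # the newline appears as the two characters \ and n
print(text)           # the newline is rendered as a line break

lines = text.splitlines()  # split into separate lines without the "\n"
print(lines)
```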


## Aliases and Context Managers
We can assign temporary variable names as aliases, and manage the opening and closing of files automatically using a <strong> context manager </strong>. 

We can use the `with` command to control access to the text file. It will automatically control access to the file, and close it when we're done with the file. This is commonly used when interacting with text files in Python.

Here's an example of how to use the `with` command.

In [39]:
with open("/content/gdrive/My Drive/testfile.txt", "r") as my_text_file:
    file_contents = my_text_file.readlines()

Then we can show the contents of the text file. We don't need to issue the `.close()` command as all that is taken care of through the `with` context manager.

In [40]:
file_contents

["This is the first line of text in my new file.I'm adding a new line to my test text file using the a+ option.\n",
 'This is another new line in my text file.']

Note that the `with ... as ...:` context manager automatically closed `testfile.txt` after assigning the lines of text to `file_contents`.
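
The automatic closing can be checked through the file object's `.closed` attribute. Here is a minimal sketch, assuming a hypothetical scratch file in a temporary folder; the `ValueError` is raised on purpose to suggest that the file is closed even when the block fails.

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w") as f:
    f.write("a\nb")

with open(path) as f:
    inside = f.closed            # False: the file is open inside the block
    file_contents = f.readlines()
outside = f.closed               # True: the file was closed on leaving the block

# The file is also closed if an error occurs inside the block
try:
    with open(path) as g:
        raise ValueError("simulated failure")
except ValueError:
    closed_after_error = g.closed

print(inside, outside, closed_after_error)
```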

## Iterating through a File

We can also loop over the file object directly, which reads one line at a time instead of loading the whole file into memory.

In [41]:
with open("/content/gdrive/My Drive/testfile.txt", "r") as my_text_file:
    for line in my_text_file:
        print(line, end="")  # the end="" argument removes extra linebreaks

This is the first line of text in my new file.I'm adding a new line to my test text file using the a+ option.
This is another new line in my text file.
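
Because the loop handles one line at a time, it suits files that are too large to hold in memory comfortably. The sketch below counts lines and tracks the longest one without building a list; the scratch file and the variable names are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file, used here instead of the Google Drive path
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

line_count = 0
longest = ""
with open(path) as f:
    # enumerate numbers the lines as they are read, starting from 1
    for number, line in enumerate(f, start=1):
        text = line.rstrip("\n")
        if len(text) > len(longest):
            longest = text  # keep the first line found with the greatest length
        line_count = number

print(line_count, longest)
```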