# Working with Text Files Using Python

#### Goals 

- Opening .txt and .pdf files with basic Python libraries
- Working with f-strings (formatted string literals) to format printed text
- Working with Files - opening, reading, writing and appending text files


## Formatted String Literals (f-strings)

Introduced in Python 3.6, f-strings offer several benefits over the older <b>.format()</b> string method.

You can bring outside variables directly into the string rather than passing them through as keyword arguments.

In [4]:
name = "Prabhat"

# using the old formatting method

print('My name is {}.'.format(name))

My name is Prabhat.


In [6]:
# using f-strings

print(f'My name is {name}.')

My name is Prabhat.


In [7]:
# We can pass !r to get the repr() string representation:

print(f'My name is {name!r}.')

My name is 'Prabhat'.


In [10]:
d = {'a':221,'b':456}

print(f"Address is {d['a']} Baker Street ")

Address is 221 Baker Street 

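As a small sketch of what the braces accept: any Python expression can go inside them, `!r` applies `repr()` while `!s` applies `str()`, and literal braces are written by doubling them. The variables `price` and `qty` are made up for illustration.

```python
# Hypothetical values, used only for illustration
name = "Prabhat"
price = 4.5
qty = 3

# Any expression can appear inside the braces
total_line = f"Total: {price * qty}"

# !r applies repr() to the value, !s applies str()
with_repr = f"{name!r}"
with_str = f"{name!s}"

# Literal braces are written by doubling them
braces = f"{{name}} is replaced by {name}"

print(total_line)  # Total: 13.5
print(with_repr)   # 'Prabhat'
print(with_str)    # Prabhat
print(braces)      # {name} is replaced by Prabhat
```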

### Minimum Widths, Alignment and Padding
You can pass arguments inside a nested set of curly braces to set a minimum width for the field, the alignment and even padding characters.

In [17]:
Subjects = [('Name', 'Book', 'Pages'), ('Chemistry', 'Chemistry', 655), ('Math', 'Algebra', 555), ('Biology', 'Botany', 144)]

for book in Subjects:
    print(f'{book[0]:{10}} {book[1]:{8}} {book[2]:{7}}')

Name       Book     Pages  
Chemistry  Chemistry     655
Math       Algebra      555
Biology    Botany       144


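Here the columns mostly line up, but two things stand out. `Pages` is left-aligned because strings are left-aligned by default, while the numbers are right-aligned. Also, the page count in the `Chemistry` row is pushed one character to the right because the book title `Chemistry` (9 characters) exceeds the minimum field width of 8. When setting minimum field widths make sure to take the longest item into account.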

To set the alignment, use the character `<` for left-align, `^` for center, and `>` for right-align.
To set padding, precede the alignment character with the padding character (`-` and `.` are common choices).

Let's make some adjustments:

In [18]:
for book in Subjects:
    print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:.>{7}}') # here .> was added and the second column was widened to 10

Name       Book       ..Pages
Chemistry  Chemistry  ....655
Math       Algebra    ....555
Biology    Botany     ....144

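To compare the three alignment characters side by side, here is a minimal sketch applied to a single short string. The names `word` and `width` are placeholders chosen for this illustration.

```python
word = "NLP"
width = 7  # hypothetical minimum field width

left = f"{word:<{width}}"     # pad on the right
center = f"{word:^{width}}"   # pad on both sides
right = f"{word:>{width}}"    # pad on the left
dashed = f"{word:-^{width}}"  # '-' as the padding character
dotted = f"{42:.>{width}}"    # padding works for numbers too

# Print each result between bars so the padding is visible
for value in (left, center, right, dashed, dotted):
    print(f"|{value}|")
```

The bars make the trailing spaces visible: `left` ends with four spaces, `center` has two on each side, and `dotted` shows five dots before `42`.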

### Date Formatting

In [21]:
from datetime import datetime

today = datetime(year=2019, month=2, day=25)

print(f'{today:%B %d, %Y}')

February 25, 2019

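Everything after the colon is handed to the date's `strftime` formatting, so any `strftime` code can be used inside an f-string. The sketch below tries a few numeric codes. The `hour` and `minute` values are additions for illustration and are not part of the example above. Names produced by codes such as `%B` depend on the locale, so only numeric codes are shown here.

```python
from datetime import datetime

# hour and minute are hypothetical additions for this sketch
today = datetime(year=2019, month=2, day=25, hour=14, minute=5)

iso_date = f"{today:%Y-%m-%d}"   # four-digit year, month, day
clock = f"{today:%H:%M}"         # 24-hour clock
short = f"{today:%d/%m/%y}"      # two-digit year

# The f-string form gives the same result as calling strftime directly
same = today.strftime("%Y-%m-%d")

print(iso_date)  # 2019-02-25
print(clock)     # 14:05
print(short)     # 25/02/19
```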

# Files 

Python uses file objects to interact with external files on your computer. A file object can wrap any sort of file you have on your computer, whether it is an audio file, a text file, an email, or an Excel document. Note: you will probably need to install certain libraries or modules to work with some of these file types, but they are easily available.

Python has a built-in `open()` function that lets us open and work with basic file types. First, though, we need a file.

### Creating a File with IPython

In [26]:
%%writefile hello.txt
Hello, this is hello world.
This is the second line of the hello world.

Writing hello.txt


## Opening a File with Python

### Know Your File's Location



In [27]:
pwd

'C:\\Users\\Prabhat\\Desktop\\NLP WIth Python'

In [28]:
# Open the hello.txt file we created earlier
my_file = open('hello.txt')

In [29]:
my_file

<_io.TextIOWrapper name='hello.txt' mode='r' encoding='cp1252'>

`my_file` is now an open file object held in memory. We'll perform some reading and writing exercises, and then we have to close the file to free up system resources.

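The `encoding='cp1252'` shown in the output above is the default on that particular Windows machine, not something fixed by Python. If you want the same behaviour on every system, one option is to pass `encoding` explicitly. The sketch below assumes UTF-8 is the encoding you want. The temporary directory and the file name `demo.txt` are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # Write a line containing a non-ASCII character as UTF-8
    f = open(path, "w", encoding="utf-8")
    f.write("café\n")
    f.close()

    # Read it back with the same explicit encoding
    f = open(path, encoding="utf-8")
    reported_encoding = f.encoding
    text = f.read()
    f.close()

print(reported_encoding)  # utf-8
print(repr(text))
```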
### .read() and .seek()

In [30]:
my_file.read()

'Hello, this is hello world.\nThis is the second line of the hello world.'

In [31]:
# But what happens if we try to read it again?
my_file.read()

''

This happens because the reading "cursor" is at the end of the file after the first read, so there is nothing left to read. We can reset the "cursor" like this:

In [32]:
# Seek to the start of file (index 0)
my_file.seek(0)

0

In [33]:
# Now read again
my_file.read()

'Hello, this is hello world.\nThis is the second line of the hello world.'

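The cursor does not have to go back to the very start. As a sketch, `.tell()` reports the current position, and that value can be handed back to `.seek()` later to return to the same spot. The temporary file used here is a stand-in for `hello.txt`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    f = open(path, "w", encoding="utf-8")
    f.write("Hello, this is hello world.")
    f.close()

    f = open(path, encoding="utf-8")
    first_word = f.read(5)    # read only the first 5 characters
    bookmark = f.tell()       # remember where the cursor is now
    rest = f.read()           # read from the cursor to the end
    nothing_left = f.read()   # cursor is at the end, so this is ''
    f.seek(bookmark)          # jump back to the remembered position
    rest_again = f.read()
    f.close()

print(repr(first_word))    # 'Hello'
print(repr(rest))          # ', this is hello world.'
print(repr(nothing_left))  # ''
print(rest == rest_again)  # True
```

For text files, treat the value returned by `.tell()` as a bookmark to pass back to `.seek()` rather than as a character count.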
### .readlines()
You can read a file into a list of lines using the `.readlines()` method. Use caution with large files, since everything will be held in memory.

In [34]:
# Readlines returns a list of the lines in the file
my_file.seek(0)
my_file.readlines()

['Hello, this is hello world.\n',
 'This is the second line of the hello world.']

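For large files, a possible alternative is to loop over the file object itself, which yields one line at a time instead of building the whole list. The sketch below counts lines both ways. The file name and its contents are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    f = open(path, "w")
    f.write("one\ntwo\nthree")
    f.close()

    # readlines() builds the complete list of lines in memory
    f = open(path)
    all_lines = f.readlines()
    f.close()

    # Looping over the file object handles one line at a time
    line_count = 0
    f = open(path)
    for line in f:
        line_count += 1
    f.close()

print(all_lines)   # ['one\n', 'two\n', 'three']
print(line_count)  # 3
```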
When you have finished using a file, it is always good practice to close it.

In [35]:
my_file.close()

## Writing to a File

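By default, the `open()` function will only allow us to read the file. We need to pass the argument `'w'` to write over the file. Use caution: opening an existing file with `'w'` or `'w+'` truncates it, meaning its previous contents are deleted. For example: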

In [36]:
# Add a second argument to the function, 'w' which stands for write.
# Passing 'w+' lets us read and write to the file

my_file = open('hello.txt','w+')

In [37]:
# Write to the file
my_file.write('This is a new first line')

24

In [38]:
# Read the file
my_file.seek(0)
my_file.read()

'This is a new first line'

In [39]:
my_file.close()  # always do this when you're done with a file

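One thing worth seeing for yourself is that opening an existing file with `'w'` empties it immediately, before anything is written. Here is a minimal sketch using a throwaway file in a temporary directory. The name `demo.txt` is a placeholder, so `hello.txt` is left alone.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    f = open(path, "w")
    f.write("original contents")
    f.close()
    size_before = os.path.getsize(path)

    # Re-opening with 'w' truncates the file right away,
    # even though nothing is written this time
    f = open(path, "w")
    f.close()
    size_after = os.path.getsize(path)

print(size_before)  # 17
print(size_after)   # 0
```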
## Appending to a File
Passing the argument `'a'` opens the file and puts the pointer at the end, so anything written is appended. Like `'w+'`, `'a+'` lets us read and write to a file. If the file does not exist, one will be created.

In [40]:
my_file = open('hello.txt','a+')
my_file.write('\nThis line is being appended to hello.txt')
my_file.write('\nAnd another line here.')

23

In [41]:
my_file.seek(0)
print(my_file.read())

This is a new first line
This line is being appended to hello.txt
And another line here.


In [42]:
my_file.close()

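The sketch below shows two details of `'a+'` mentioned above: the file is created if it is missing, and the cursor starts at the end, which is why a read returns an empty string until we call `.seek(0)`. The file name `log.txt` is a placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.txt")  # hypothetical file name

    existed_before = os.path.exists(path)

    # 'a+' creates the file because it does not exist yet
    f = open(path, "a+")
    f.write("first entry")
    f.close()

    # A second append adds to the end instead of overwriting
    f = open(path, "a+")
    f.write("\nsecond entry")
    f.close()

    # On opening with 'a+', the cursor sits at the end of the file
    f = open(path, "a+")
    read_without_seek = f.read()
    f.seek(0)
    read_after_seek = f.read()
    f.close()

print(existed_before)           # False
print(repr(read_without_seek))  # ''
print(read_after_seek)
```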
### Appending with `%%writefile`
Jupyter notebook users can do the same thing using IPython cell magic:

In [43]:
%%writefile -a hello.txt

This is more text being appended to test.txt
And another line here.

Appending to hello.txt


Add a blank line at the top of the cell if you want the appended text to begin on its own line, since `%%writefile` writes the cell body literally and won't recognize escape sequences like `\n`.

## Aliases and Context Managers
You can assign temporary variable names as aliases, and manage the opening and closing of files automatically using a context manager:

In [44]:
with open('hello.txt','r') as txt:
    first_line = txt.readlines()[0]
    
print(first_line)

This is a new first line



Note that the `with ... as ...:` context manager automatically closed `hello.txt` after assigning the first line of text to `first_line`:

In [45]:
txt.read()

ValueError: I/O operation on closed file.

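The same pattern works for writing, and the `closed` attribute offers a way to check a file's state without triggering the `ValueError` shown above. This is a minimal sketch with a throwaway file. The name `notes.txt` is a placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    # Writing inside a context manager: no explicit close() needed
    with open(path, "w") as out:
        out.write("line one\nline two\n")
        open_inside = out.closed   # still open inside the block

    closed_after = out.closed      # closed once the block ends

    with open(path) as txt:
        lines = txt.readlines()

    # Reading after the block has ended raises ValueError
    try:
        txt.read()
    except ValueError as err:
        error_message = str(err)

print(open_inside)    # False
print(closed_after)   # True
print(lines)          # ['line one\n', 'line two\n']
print(error_message)
```

The file is closed even if an exception is raised inside the `with` block, which is the main reason to prefer it over calling `.close()` by hand.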
### Iterating through a File

In [46]:
with open('hello.txt','r') as txt:
    for line in txt:
        print(line, end='')  # the end='' argument removes extra linebreaks

This is a new first line
This line is being appended to hello.txt
And another line here.
This is more text being appended to test.txt
And another line here.