
<a href="https://colab.research.google.com/github/kokchun/Programmering-med-Python-21/blob/main/Lectures/L8-file-handling.ipynb" target="_parent"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> &nbsp; for interacting with the code

---
# Lecture notes - file handling

---
These are the lecture notes for **file handling**. They build on content from previous lectures, such as:
- input-output
- variables
- if-statement
- for loop
- while loop
- lists
- random
- strings
- functions
- error handling

<p class = "alert alert-info" role="alert"><b>Note</b> that these lecture notes give only a brief introduction to file handling. I encourage you to read further on the topic.</p>

Read more: [w3schools - file handling](https://www.w3schools.com/python/python_file_handling.asp). This resource covers the different file handling functions, but don't copy its examples exactly as written. Combine them with the **with** statement so that files are always closed safely.

Read more [real python - with statement](https://realpython.com/python-with-statement/)

The files used are found here: [Files](https://github.com/kokchun/Programmering-med-Python-21/tree/main/Files)

---

## with statement
- the **with** statement together with **open** opens a file safely and cleans up the resource afterwards, even if an error occurs inside the block
- another way is to use try..except..finally and remember to close the file yourself, but that is more verbose
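To make the comparison concrete, here is a minimal sketch of both approaches. The file name `demo.txt` and the temporary folder are assumptions made up for this example, they are not part of the course files.

```python
import os
import tempfile

# hypothetical file in a temporary folder, so no real files are touched
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# manual version: we must remember to close the file ourselves
f = open(path, "w")
try:
    f.write("hello\n")
finally:
    f.close() # runs even if write() raises an error

# with version: the file is closed automatically when the block ends
with open(path, "r") as f_read:
    text = f_read.read()

print(f.closed, f_read.closed) # True True
```

Both files end up closed, but the **with** version needs no explicit `close()` call.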

syntax: 
```python
with open(path, option) as file_name: 
    statements 
    ...
```
Options for the `option` argument (the file mode):
- "r" - read, error if the file doesn't exist
- "a" - append, creates the file if it doesn't exist
- "w" - write, creates the file if it doesn't exist and overwrites its content if it does
- "x" - create, error if the file already exists
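The differences between the modes are easiest to see by trying them on a throwaway file. This is a small sketch; the file name `modes_demo.txt` and the temporary folder are assumptions for the example only.

```python
import os
import tempfile

# hypothetical file in a temporary folder, not part of the course files
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "x") as f: # "x" creates the file, it must not exist yet
    f.write("first\n")

with open(path, "a") as f: # "a" adds to the end of the file
    f.write("second\n")

with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f: # "w" starts from an empty file, the old content is gone
    f.write("third\n")

with open(path, "r") as f:
    after_write = f.read()

try:
    with open(path, "x") as f: # the file exists now, so "x" fails
        f.write("never written\n")
except FileExistsError:
    x_failed = True
else:
    x_failed = False

print(repr(after_append), repr(after_write), x_failed)
```

Note in particular that "w" silently replaces existing content, so "a" or "x" is the safer choice when a file must not be overwritten.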

```python
path = "../Files/quotes.txt" # relative path, '..' goes up one folder

with open(path, "r") as f:
    text = f.read() # reads the whole file

print(repr(text)) # repr shows the raw string, including whitespace and \n
```

```
'  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein\n\nTime is a drug. Too       much of it kills you.  -  Terry Pratchett\n\n\n An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr\n\n   Everything must be made as simple as possible. But not simpler. - Albert Einstein     \n\n\n  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  \n\nIf I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton'
```
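A relative path is interpreted from the current working folder, which is a common source of "file not found" errors. The sketch below uses `pathlib` from the standard library (not covered in this lecture, so treat it as an optional aside) to show where the relative path actually points.

```python
from pathlib import Path

path = Path("../Files/quotes.txt") # same relative path as used for reading above

# resolve() turns it into an absolute path, based on the current working folder
absolute_path = path.resolve()

print(Path.cwd()) # the folder the relative path starts from
print(absolute_path) # '..' has been replaced by the parent folder
print(absolute_path.exists()) # False means the path does not point at a file from here
```

If the last line prints `False`, the script is probably being run from a different folder than expected.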


---
## Clean up quotes.txt

Strategy
- inspect the txt file (and notice that some prankster has added random noise in the form of whitespace)
- remove all leading and trailing whitespace
- collapse excessive whitespace between words
- add quote numbers
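The strategy can be tried on a few lines in memory before touching any file. This is a minimal sketch: the function name `clean_lines` is made up for illustration, and the sample holds two noisy lines copied from quotes.txt.

```python
import re

def clean_lines(lines):
    """Return numbered, cleaned lines; empty lines are skipped."""
    cleaned = []
    for line in lines:
        line = line.strip(" \n") # remove leading and trailing spaces and newlines
        line = re.sub(" +", " ", line) # collapse runs of spaces into one space
        if line != "":
            cleaned.append(f"{len(cleaned) + 1}. {line}")
    return cleaned

# two noisy lines from quotes.txt with an empty line in between
sample = [
    "Time is a drug. Too       much of it kills you.  -  Terry Pratchett\n",
    "\n",
    "   Everything must be made as simple as possible. But not simpler. - Albert Einstein     \n",
]

result = clean_lines(sample)
for line in result:
    print(line)
```

Testing the cleaning logic on a small list like this makes it easier to spot mistakes than inspecting the written file afterwards.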

```python
import re

i = 1 # i is used for the quote number

# opens two files, one for reading and one for writing
with open("../Files/quotes.txt", "r") as f_read, open("../Files/quotes_cleaned.txt", "w") as f_write:

    f_write.write("Famous quotes\n\n")
    for quote in f_read:
        quote = quote.strip(" \n") # removes leading and trailing spaces and newlines
        quote = re.sub(" +", " ", quote) # regular expression that replaces one or more spaces with a single space

        # some lines are empty because of the blank lines between the quotes
        if quote != "":
            f_write.write(f"{i}. {quote}\n")
            i += 1
```

---
## Extract authors

Strategy
- find the quote lines by checking whether they start with a digit
- extract first and last names
- join them into full name 
- extract unique values

```python
path = "../Files/quotes_cleaned.txt"

# read first and append afterwards, so the same file is not open in two modes at once
with open(path, "r") as f_read:
    quotes = [line.strip("\n") for line in f_read if line[0].isdigit()] # a quote line starts with its number

authors = [quote.split()[-2:] for quote in quotes] # last two words are first name and last name
print(authors)
authors = {" ".join(author) for author in authors} # a set contains only the unique values
print(authors)

with open(path, "a") as f_append:
    f_append.write("\nAuthors: ")
    f_append.write(", ".join(sorted(authors))) # sorted for a stable order, join avoids a trailing comma
```

[['Albert', 'Einstein'], ['Terry', 'Pratchett'], ['Niels', 'Bohr'], ['Albert', 'Einstein'], ['Marie', 'Curie'], ['Isaac', 'Newton']]
{'Niels Bohr', 'Terry Pratchett', 'Isaac Newton', 'Albert Einstein', 'Marie Curie'}
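Taking the last two words works here because every author has exactly two names. As a hedged alternative (not the approach used in this lecture), the sketch below splits on the last ` - ` instead and keeps the authors in their original order. It assumes that every quote ends with ` - ` followed by the author, which holds for the lines produced by the cleaning step.

```python
# three lines in the format that the cleaning step produces
quotes = [
    "1. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein",
    "2. Time is a drug. Too much of it kills you. - Terry Pratchett",
    "4. Everything must be made as simple as possible. But not simpler. - Albert Einstein",
]

# rpartition splits at the last " - ", so the third part is the author
authors = [quote.rpartition(" - ")[2] for quote in quotes]

# dict.fromkeys keeps the first occurrence of each name, in the original order
unique_authors = list(dict.fromkeys(authors))

print(", ".join(unique_authors)) # Albert Einstein, Terry Pratchett
```

This version would also handle an author with one name or three names, as long as the ` - ` separator is present.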


---

Kokchun Giang

[LinkedIn][linkedIn_kokchun]

[GitHub portfolio][github_portfolio]

[linkedIn_kokchun]: https://www.linkedin.com/in/kokchungiang/
[github_portfolio]: https://github.com/kokchun/Portfolio-Kokchun-Giang

---