# Files

Reading from and writing to files is one of the most fundamental and common ways to persist or retrieve information.
There are many specialised software libraries that we can use within Python to work with common file types.

Here, we give a generic overview of how to access files and how to read and write information. \
We will use a basic text file for this.


## Reading

To read from a file, we first need to check if it exists, then open it.
Python provides a nice way of doing this with the ```with``` keyword. \
Under the hood, the ```with``` statement uses a context manager to handle external resources (such as files, database connections, URLs, etc.).
We can use it to avoid a try-finally construction: the statement cleans up afterwards and closes the file, even if an error occurs inside the block.

The general syntax is:
```
with open(file_name, 'r') as file:
    # do something with the file
```

We note:
* The call to ```open()``` opens the file and returns an object that we can then use. 
* We specify the path and name of the file as the first argument.
* We open the file in read-only mode (```'r'```), either to prevent accidental write access or because we do not have write privileges.
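
As a rough sketch of what the ```with``` statement saves us, the following compares it with a manual try-finally construction and shows what happens if the file does not exist. The file names and the temporary directory are assumptions, introduced only to make the example self-contained.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, created only for this sketch.
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, 'with_demo.txt')
with open(file_name, 'w') as file:   # write mode, needed here only for the setup
    file.write('hello\n')

# Manual version: we have to close the file ourselves, even if reading fails.
file = open(file_name, 'r')
try:
    manual_contents = file.read()
finally:
    file.close()

# With the context manager, the file is closed when we leave the block.
with open(file_name, 'r') as file:
    with_contents = file.read()

print(file.closed)                       # True
print(manual_contents == with_contents)  # True

# A file that does not exist cannot be opened for reading.
missing_name = os.path.join(tmp_dir, 'does_not_exist.txt')
print(os.path.exists(missing_name))      # False
try:
    with open(missing_name, 'r') as file:
        file.read()
    opened = True
except FileNotFoundError:
    opened = False
print(opened)                            # False
```

Whether we check with ```os.path.exists()``` beforehand or catch the ```FileNotFoundError``` is a design choice; catching the exception also covers the case where the file disappears between the check and the call to ```open()```.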


The following example reads the whole content of the file. \
*Note*: Remember the comment about generators: reading the whole file at once works well here, but not for large files.

In [3]:
file_name = 'example_file.txt'

with open(file_name, 'r') as file:
    contents = file.read()

print(contents)

This is the first line.
This is the second line.
This is the third line.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc semper, massa efficitur efficitur dictum, augue massa varius felis, in laoreet dui felis non ante.



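We can also read the file line by line.

Each line we read still ends with its newline character. We can use the method ```rstrip()``` to remove trailing whitespace such as spaces and the newline (stripping whitespace is the default behaviour if no argument is given).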

In [5]:
file_name = 'example_file.txt'

with open(file_name, 'r') as file:
    for line in file:
        print(line.rstrip())

This is the first line.
This is the second line.
This is the third line.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc semper, massa efficitur efficitur dictum, augue massa varius felis, in laoreet dui felis non ante.
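
To make the effect of ```rstrip()``` visible, here is a minimal sketch on a single string. The variable ```line``` is a made-up example, not a line from the file above, and ```repr()``` is used so that the whitespace characters show up in the output.

```python
# Made-up line with trailing spaces, a tab and a newline.
line = 'This is the first line.  \t\n'

stripped_all = line.rstrip()          # removes all trailing whitespace
stripped_newline = line.rstrip('\n')  # removes only trailing newlines

print(repr(stripped_all))      # 'This is the first line.'
print(repr(stripped_newline))  # 'This is the first line.  \t'
```

If trailing spaces or tabs are meaningful in our data, passing ```'\n'``` explicitly is the safer choice.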


Using the mode ```w```, we can write (and overwrite) files; using ```a```, we append to files rather than overwrite them.
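
The difference between the two modes can be sketched as follows. The file name ```write_demo.txt``` and the temporary directory are assumptions, chosen only so that the example does not touch any existing file.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this illustration.
file_name = os.path.join(tempfile.mkdtemp(), 'write_demo.txt')

# 'w' creates the file, or empties it first if it already exists.
with open(file_name, 'w') as file:
    file.write('first version\n')

with open(file_name, 'w') as file:
    file.write('second version\n')   # replaces the previous content

# 'a' keeps the existing content and adds to the end of the file.
with open(file_name, 'a') as file:
    file.write('appended line\n')

with open(file_name, 'r') as file:
    contents = file.read()

print(contents)
```

After running this, the file contains the second version followed by the appended line; the first version is gone.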

**Exercise** \
Write the first 10 numbers of the Fibonacci sequence to a file.

**Hints**
* Writing to a file works with strings, so you need to convert each number to a string.
* Improve readability by adding a new line after each number. You can do this by "adding" another string ```'\n'``` to the converted number. This is an "escape sequence" that stands for the newline character, which shows up as a line break when the string is written to a file or printed. [Other escape characters](https://www.w3schools.com/python/gloss_python_escape_characters.asp) exist as well.
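
As a small illustration of both hints (not a solution to the exercise), the sketch below converts a number to a string and appends a newline. The in-memory ```io.StringIO``` object is an assumption used as a stand-in for a real text file; it also accepts only strings in ```write()```.

```python
import io

value = 42
text = str(value) + '\n'   # convert the number, then add a line break

print(repr(text))   # '42\n'
print(len(text))    # 3: the escape sequence '\n' is a single character

buffer = io.StringIO()     # in-memory stand-in for a text file
try:
    buffer.write(value)    # writing a number directly does not work
    wrote_number = True
except TypeError:
    wrote_number = False

buffer.write(text)
result = buffer.getvalue()
print(repr(result))        # '42\n'
```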

In [None]:
# ... your code here ....

## Reading files in chunks

If we want to make sure that we do not run into memory problems, we can define a generator using the ```yield``` statement.
Here we use the fact that the method ```read()``` takes an optional argument that defines how much we read at a time (for text files: the number of characters).


In [10]:
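def read_chunk(file, chunk_size=64):
    """Yield successive chunks of at most chunk_size characters from an open file."""
    while True:
        content = file.read(chunk_size)
        # read() returns an empty string once the end of the file is reached
        if not content:
            break
        yield content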
    
file_name = 'example_file.txt'
with open(file_name) as f:
    for chunk in read_chunk(f):
        print(chunk)
        print('---')

This is the first line.
This is the second line.
This is the thi
---
rd line.

Lorem ipsum dolor sit amet, consectetur adipiscing eli
---
t.
Nunc semper, massa efficitur efficitur dictum, augue massa va
---
rius felis, in laoreet dui felis non ante.

---


Note that we now read neither the entire file nor single lines (the file might not even contain line breaks), but chunks of size ```chunk_size```, whether or not they coincide with lines.
To recover lines, we would have to post-process what we read. However, when we process large files, we do not run the risk of exhausting the memory of our system.
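
One possible way to do this post-processing is sketched below: a second generator collects chunks in a buffer and yields a line whenever a newline has been seen. The helper ```lines_from_chunks``` and the sample text are hypothetical, and ```io.StringIO``` serves as an in-memory stand-in for a file so that the example is self-contained.

```python
import io


def read_chunk(file, chunk_size=64):
    """Yield successive chunks of at most chunk_size characters."""
    while True:
        content = file.read(chunk_size)
        if not content:
            break
        yield content


def lines_from_chunks(chunks):
    """Yield complete lines (without the newline) from an iterable of text chunks."""
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete;
        # the remainder stays in the buffer until the next chunk arrives.
        *complete, buffer = buffer.split('\n')
        yield from complete
    if buffer:
        yield buffer   # last line, if the text does not end with a newline


text = 'first line\nsecond line\n\nlast line without newline'
with io.StringIO(text) as f:
    lines = list(lines_from_chunks(read_chunk(f, chunk_size=8)))

print(lines)
# ['first line', 'second line', '', 'last line without newline']
```

Note that the buffer holds at most one incomplete line plus one chunk, so memory use stays small as long as the individual lines are not very long.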