# Files

This section introduces the file concepts and operations that read, write and manipulate files.

## 1 File and Database

Variables hold data in memory, and that data is lost when the computer turns off. Memory is considered `primary storage` because CPU instructions (programs) manipulate it directly. To store data permanently, you use `secondary storage`, such as a hard drive or a solid-state drive (SSD).

There are two common ways to store data in secondary storage: files and databases. You use a file to store data in any structure or format you like. You use a database to store data that follows a well-defined structure. The most popular kind is the [relational database](https://en.wikipedia.org/wiki/Relational_database), which stores data in tables. The boundary between the two is not sharp, because a database uses files to store its data and enforces a structure on top of them. More recently, the [document-oriented database](https://en.wikipedia.org/wiki/Document-oriented_database) allows more flexibility in data structure.


This section only covers file operations.

## 2 Types of Files

Files can be categorized by the access mode (`random` or `sequential`) or data encoding (text or binary). 

The access mode is the order in which a program reads or writes the data in a file.

- Random Access: you can go to any position inside a file and access the data there.
- Sequential Access: you read or write a file sequentially from the beginning to the end. This section covers the sequential access method because most large secondary storage devices support it.
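The difference between the two access modes can be sketched with a small scratch file. This is a minimal illustration, not a recommendation: the file name `letters.txt` and the temporary folder are assumptions made for the example, and `seek` is the standard file method for jumping to a position.

```python
import os
import tempfile

# Work inside a temporary folder so no real file is touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'letters.txt')  # hypothetical file name

    with open(path, 'w') as letters_file:
        letters_file.write('abcdefghij')

    # Sequential access: each read continues where the previous one stopped.
    with open(path, 'r') as letters_file:
        first = letters_file.read(3)
        second = letters_file.read(3)

    # Random access: seek() jumps to byte offset 7 before reading.
    with open(path, 'rb') as letters_file:
        letters_file.seek(7)
        tail = letters_file.read()

print(first, second, tail)  # abc def b'hij'
```

The rest of this section only uses the sequential style.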

Data encoding is the method Python uses to interpret the file data.

- Text: a text file contains data encoded as a sequence of characters. The characters can be represented with different encodings such as `ASCII` or `Unicode`. Some text files use the first two or three bytes to mark the encoding. Python decodes a text file with a default encoding unless you pass the `encoding` argument to `open`, and this works correctly most of the time.
- Binary: a binary file is treated as a sequence of bytes. It is up to the program to interpret the meaning of the data. Multimedia data such as video, audio and pictures is often stored in binary files.
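To make the text/binary distinction concrete, the sketch below writes one short string and reads it back both ways. The file name `greeting.txt` is an assumption for illustration, and UTF-8 is chosen explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'greeting.txt')  # hypothetical file name

    # Text mode: Python encodes the characters into bytes.
    with open(path, 'w', encoding='utf-8') as text_file:
        text_file.write('café')

    # Text mode: the bytes are decoded back into characters.
    with open(path, 'r', encoding='utf-8') as text_file:
        as_text = text_file.read()

    # Binary mode ('rb'): the same file comes back as raw bytes.
    with open(path, 'rb') as binary_file:
        as_bytes = binary_file.read()

# 4 characters but 5 bytes, because 'é' takes two bytes in UTF-8.
print(len(as_text), len(as_bytes), as_bytes)
```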




## 3 Files are Resources

Files and databases are stored in secondary storage. To access files or databases, Python creates data objects in memory to represent a file or a database connection. These objects use system resources such as memory, processes (a running program is a process) and file descriptors. Therefore files and databases are often called resources in programming languages.

The resource concept is important because each computer has only a limited amount of any resource. A typical computer has 1 GB to 1 TB of memory, a maximum of 32_767 (x86) or 4_194_303 (x86_64) processes in Linux and a limit of 65_535 file descriptors in Linux. A program should release a resource after using it so that other programs can use it. For file operations, this means you should `close` a file after you `open` and use it. It is easy to forget to `close` a file, which causes a resource leak. To prevent this, don't rely on calling `close` manually. Open the file in a Python `with` statement, and Python closes the file automatically when the block ends.
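A small sketch of this behavior, assuming a scratch file named `demo.txt` in a temporary folder (both chosen only for the example): the `closed` attribute of a file object reports whether the file has been released, and it is `True` after a `with` block even when the block ends with an error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name

    with open(path, 'w') as demo_file:
        demo_file.write('hello\n')
    closed_after_block = demo_file.closed

    try:
        with open(path, 'r') as demo_file:
            # Simulate a failure while the file is in use.
            raise ValueError('simulated failure')
    except ValueError:
        pass
    closed_after_error = demo_file.closed

print(closed_after_block, closed_after_error)  # True True
```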

## 4 Opening a File

### 4.1 Syntax

To open a file, use the following syntax:

```python
file_variable = open(filename, mode)
with file_variable: 
    # read/write file operations
```

Or combine the `with` and `open` using:

```python
with open(filename, mode) as file_variable:
    # read/write file operations
```

Both forms work, and the combined form also accepts several `open` calls separated by commas. The first form is often used when you open multiple files:

```python
file_variable1 = open(filename1, mode)
file_variable2 = open(filename2, mode)
with file_variable1, file_variable2: 
    # read/write the two files
```
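As an alternative sketch, a single `with` statement can open several files itself. The file names `source.txt` and `copy.txt` are assumptions for the example. Opening the files inside the `with` statement means that if the second `open` fails, the first file is still closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, 'source.txt')  # hypothetical names
    copy_path = os.path.join(folder, 'copy.txt')

    with open(source_path, 'w') as source_file:
        source_file.write('Alice\nBob\n')

    # Both files are opened here and both are closed when the block ends.
    with open(source_path, 'r') as source_file, open(copy_path, 'w') as copy_file:
        for line in source_file:
            copy_file.write(line.upper())

    with open(copy_path, 'r') as copy_file:
        copied = copy_file.read()

print(copied, end='')
```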

### 4.2 `filename`

The `filename` can be a filename with or without a path. If the filename doesn't have a path prefix, Python looks for the file in the current working directory, which is normally the folder from which you run the Python program.

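The path prefix differs between Windows and Linux/macOS. In Windows, a path uses backslashes to separate folders, as in `C:\Users\Alice\tmp\data.txt`. Because a backslash starts an escape sequence in a Python string, write such a path as a raw string: `r'C:\Users\Alice\tmp\data.txt'`. In Linux/macOS, a path uses slashes to separate folders, as in `'/users/alice/tmp/data.txt'`.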
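The sketch below illustrates both points with the standard `os` and `pathlib` modules. The variable names are chosen for the example, and the two paths are the ones from the text. `PureWindowsPath` and `PurePosixPath` only manipulate path strings, so the code runs on any operating system and never touches the disk.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# A bare filename is resolved against the current working directory.
absolute = os.path.abspath('data.txt')
resolved_matches = absolute == os.path.join(os.getcwd(), 'data.txt')
print(absolute, resolved_matches)

# The raw string (r'...') keeps every backslash as a literal character.
windows_path = PureWindowsPath(r'C:\Users\Alice\tmp\data.txt')
posix_path = PurePosixPath('/users/alice/tmp/data.txt')

print(windows_path.name, posix_path.name)  # data.txt data.txt
print(windows_path.parent)                 # C:\Users\Alice\tmp
print(posix_path.parent)                   # /users/alice/tmp
```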

### 4.3 `mode`

There are several `mode` values in Python but the three most used modes are:

- `'r'`: read-only. You can only read data from the file.
- `'w'`: writing. If the file already exists, its content is erased. Otherwise, a new file is created with the specified filename.
- `'a'`: appending. All data written to the file is appended to the end. If the file doesn't exist, it is created.

**Warning**: be careful when you use the `'w'` mode because it always starts with empty content. If the file already exists, the existing content will be erased.
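The difference between `'w'` and `'a'` can be seen in a short experiment. This is a sketch that uses a scratch file named `log.txt` in a temporary folder, both assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')  # hypothetical file name

    with open(path, 'w') as log_file:   # creates the file
        log_file.write('first\n')

    with open(path, 'a') as log_file:   # keeps 'first' and adds to the end
        log_file.write('second\n')

    with open(path, 'r') as log_file:
        after_append = log_file.read()

    with open(path, 'w') as log_file:   # erases the existing content
        log_file.write('third\n')

    with open(path, 'r') as log_file:
        after_rewrite = log_file.read()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```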

## 5 Writing Data

Use the `write(data)` method of a file object to write `data` to a file. You write strings to a text file as follows.

```python
FILENAME = 'names.txt'
WRITE_MODE = 'w'

with open(FILENAME, WRITE_MODE) as names_file:
    names_file.write('Alice\n')
    names_file.write('Bob\n')
    names_file.write('Cindy\n')
```

If you run the above code, it creates a file `names.txt` in the current folder that has the following content:

```text
Alice
Bob
Cindy
```
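The `write` method does not add a line break on its own, which is why each string ends with `\n`. As a minimal sketch, the same content can be produced from a list with a loop. The list `names` and the temporary folder are assumptions for the example, so the real `names.txt` is left alone.

```python
import os
import tempfile

names = ['Alice', 'Bob', 'Cindy']  # illustrative data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'names.txt')

    with open(path, 'w') as names_file:
        for name in names:
            # Append the line break explicitly; write() will not do it.
            names_file.write(name + '\n')

    with open(path, 'r') as names_file:
        content = names_file.read()

print(content, end='')
```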

The `with` statement closes the file automatically after the three write operations in its code block. If you don't use a `with` statement, the code looks like this:

```python
FILENAME = 'names.txt'
WRITE_MODE = 'w'

# not recommended
names_file = open(FILENAME, WRITE_MODE)

names_file.write('Alice\n')
names_file.write('Bob\n')
names_file.write('Cindy\n')

names_file.close()  # to prevent leaking the resource
```

The above code is not recommended because a file operation may raise an error (exception), and when that happens `close` is never called, so the resource is leaked (not freed). Use a `with` statement for file operations.
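For comparison, here is a sketch of what a manual version has to do to be safe: wrap the writes in `try`/`finally` so that `close` runs even if a write fails. This is roughly the work the `with` statement does for you. The temporary folder is an assumption for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'names.txt')

    names_file = open(path, 'w')
    try:
        names_file.write('Alice\n')
        names_file.write('Bob\n')
    finally:
        # Runs whether or not an exception was raised above.
        names_file.close()

    closed_manually = names_file.closed

print(closed_manually)  # True
```

The `with` form is shorter and harder to get wrong, which is why it is preferred.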


## 6 Reading Data

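Iterate over a file object opened in `'r'` mode to read a text file line by line.

```python
FILENAME = 'names.txt'
READ_MODE = 'r'

with open(FILENAME, READ_MODE) as names_file:
    for line in names_file:
        # Each line already ends with '\n', so don't let print add another.
        print(line, end='')
```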
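Besides looping over the lines, a file object can return the whole content at once with `read()`. The sketch below, which uses a temporary copy of `names.txt` as an assumption for the example, also shows that every line read from a file keeps its trailing `'\n'` until you strip it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'names.txt')

    with open(path, 'w') as names_file:
        names_file.write('Alice\nBob\nCindy\n')

    # read() returns the entire file as one string.
    with open(path, 'r') as names_file:
        whole_text = names_file.read()

    # Looping yields one line at a time; rstrip removes the line break.
    with open(path, 'r') as names_file:
        names = [line.rstrip('\n') for line in names_file]

print(repr(whole_text))  # 'Alice\nBob\nCindy\n'
print(names)             # ['Alice', 'Bob', 'Cindy']
```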