# Record and Operations

This section gives an example of writing and reading a text file that has multiple data elements in each line. It also covers the file operations of the `os` module and ends with updating a file.

## 1 Records and Fields

By convention, a text file uses `records` and `fields` to organize data. A `record` is a complete set of data that describes an entity. A record consists of one or more pieces of data, and each piece is called a `field`. For example:

- an employee entity has `id`, `name` and `department` to describe an employee. The `id`, `name` and `department` are the fields of a record, and each record holds the data for one employee.
- a student may have `name` and `score` to describe the test score of a student. A course has multiple records and each record represents one student. The `name` and `score` are the fields of a student record.
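
As a rough illustration of the two terms, the sketch below models each record as a Python dictionary whose keys are the field names. The sample values and the variable names `employee` and `course` are made up for this example.

```python
# One record: the keys are the field names, the values are the field data.
# All values here are made-up sample data.
employee = {'id': 101, 'name': 'Alice', 'department': 'Sales'}

# A course holds multiple records, one per student.
course = [
    {'name': 'Alice', 'score': 97},
    {'name': 'Bob', 'score': 93},
]

for student in course:
    print(student['name'], student['score'])
```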


## 2 Processing Records

There are many ways to process record data in a text file. Two popular options are:

- a line as a record: each line represents a record. Use a field delimiter to separate the different fields in a record.
- a line as a field: each line represents a field. Use the number of fields to determine the record boundary. For example, for the above employee record that has three fields of `id`, `name` and `department`, three lines form a record.
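
The two layouts can be compared with a short sketch. It parses the same made-up employee data from two strings (standing in for file content) and is only one possible way to group the lines; the names `LINE_AS_RECORD`, `LINE_AS_FIELD` and `FIELDS_PER_RECORD` are assumptions for this example.

```python
# Made-up sample data: the same two employee records in both layouts.
LINE_AS_RECORD = '101 Alice Sales\n102 Bob Finance\n'
LINE_AS_FIELD = '101\nAlice\nSales\n102\nBob\nFinance\n'
FIELDS_PER_RECORD = 3  # id, name, department

# Layout 1: every line is a record; split it into its fields.
records_a = [line.split() for line in LINE_AS_RECORD.splitlines()]

# Layout 2: every line is a field; every three lines form one record.
lines = LINE_AS_FIELD.splitlines()
records_b = [
    lines[i:i + FIELDS_PER_RECORD]
    for i in range(0, len(lines), FIELDS_PER_RECORD)
]

print(records_a)
print(records_b)
```

Both layouts produce the same list of records; the first needs half as many lines of text for this data.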

Most applications use the `a line as a record` method because it is more compact and easier to process. With this method, there are many ways to delimit the fields in a line. The two most popular are:

- white space delimiter: it uses one or more white space characters to separate the fields. The white space characters include ` ` (space), `\t` (horizontal tab), `\v` (vertical tab), `\f` (form feed) and `\r` (carriage return). The string method [`split()`](https://docs.python.org/3/library/stdtypes.html#str.split), called without arguments, splits a line into its fields. `\n` (new line) is also a white space character, but it is used as the record delimiter.
- comma separated values (CSV): this is the most common import and export format for spreadsheets and databases. Python has a built-in [`csv` module](https://docs.python.org/3/library/csv.html) to process CSV files.
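
A minimal sketch of both delimiters follows. The sample strings are made up, and `io.StringIO` is used only so the CSV part runs without a real file.

```python
import csv
import io

# White space delimiter: split() without arguments treats any run of
# white space as one separator and drops the trailing new line.
space_fields = 'Alice   97\n'.split()
tab_fields = 'Bob\t93\n'.split()
print(space_fields)
print(tab_fields)

# CSV: csv.reader splits on commas and keeps a quoted field that
# contains a comma as a single field.
sample = io.StringIO('name,department\n"Smith, Ann",Sales\n')
rows = list(csv.reader(sample))
print(rows)
```

One consequence of the white space delimiter is that a field cannot itself contain a space: a name such as `Mary Ann` would be split into two fields.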

For simplicity, this document only covers reading and writing text files that use a white space delimiter.

## 3 Writing Record

Writing is simple: use the `write` method to write a string that contains all the fields and ends with a new line. You can separate the fields with any white space character, but one or more spaces are common.

```python
FILENAME = 'scores.txt'
WRITE_MODE = 'w'

with open(FILENAME, WRITE_MODE) as names_file:
    names_file.write('Alice 97\n')

    # demo of writing variable data
    name = 'Bob'
    score = 93
    names_file.write(f'{name} {score}\n')

    names_file.write('Cindy 95\n')
```
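
When the records already sit in a list, the same idea can be written as a loop. This is a small sketch; the file name `scores_demo.txt` and the `records` list are assumptions chosen so that the example does not overwrite `scores.txt`.

```python
DEMO_FILE = 'scores_demo.txt'  # hypothetical file name for this sketch

# Each tuple is one record: (name, score).
records = [('Alice', 97), ('Bob', 93), ('Cindy', 95)]

with open(DEMO_FILE, 'w') as scores_file:
    for name, score in records:
        # fields separated by one space, record ended by a new line
        scores_file.write(f'{name} {score}\n')
```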

## 4 Reading Record

Reading a record means extracting its fields from a line. The `str.split()` method makes the task simple.

```python
FILENAME = 'scores.txt'
READ_MODE = 'r'

with open(FILENAME, READ_MODE) as names_file:
    for line in names_file:
        name, score = line.split()
        print(f'Name: {name}, Score: {score}')
```

```text
Name: Alice, Score: 97
Name: Bob, Score: 93
Name: Cindy, Score: 95
```


`line.split()` creates a list of fields from a text line. You can use multiple variable names on the left-hand side to receive the field values. You can also process the list directly, as follows:

```python
FILENAME = 'scores.txt'
READ_MODE = 'r'

with open(FILENAME, READ_MODE) as names_file:
    for line in names_file:
        fields = line.split()
        print(f'Field1: {fields[0]}, Field2: {fields[1]}')
```

```text
Field1: Alice, Field2: 97
Field1: Bob, Field2: 93
Field1: Cindy, Field2: 95
```


`fields[0]` and `fields[1]` get the first and second elements of the list.
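
Every field returned by `split()` is a string, so a numeric field has to be converted before it can be used in arithmetic. The sketch below computes an average score and skips lines that do not have exactly two fields. The file name `scores_demo.txt` and the skip rule are assumptions made for this example.

```python
DEMO_FILE = 'scores_demo.txt'  # hypothetical file name for this sketch

# Create the sample file so the sketch runs on its own.
with open(DEMO_FILE, 'w') as scores_file:
    scores_file.write('Alice 97\nBob 93\nCindy 95\n')

total = 0
count = 0
with open(DEMO_FILE, 'r') as scores_file:
    for line in scores_file:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, score_text = fields
        total += int(score_text)  # convert the string field to a number
        count += 1

average = total / count
print(f'Average score: {average}')
```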

## 5 Removing and Renaming Files

The built-in [`os` module](https://docs.python.org/3/library/os.html) provides operating system functions such as removing a file, renaming a file and checking whether a file exists. The following is an example.

```python
import os  # first import the module

FILENAME1 = 'test1.txt'
FILENAME2 = 'test2.txt'
WRITE_MODE = 'w'

def check_existence(location):
    """check the existence of the two files"""
    exists1 = os.path.exists(FILENAME1)
    exists2 = os.path.exists(FILENAME2)
    print(f'[{location}] {FILENAME1} exists: {exists1}, {FILENAME2} exists: {exists2}')

# create a file first
with open(FILENAME1, WRITE_MODE) as names_file:
    names_file.write('test message\n')
check_existence('After create')

os.rename(FILENAME1, FILENAME2)
check_existence('after rename')

os.remove(FILENAME2)
check_existence('after remove')
```

```text
[After create] test1.txt exists: True, test2.txt exists: False
[after rename] test1.txt exists: False, test2.txt exists: True
[after remove] test1.txt exists: False, test2.txt exists: False
```
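
`os.remove` raises `FileNotFoundError` when the file does not exist. One possible way to handle that is sketched below; the helper `remove_if_present` and the file name `remove_demo.txt` are hypothetical names introduced for this example.

```python
import os

DEMO_FILE = 'remove_demo.txt'  # hypothetical file name for this sketch

def remove_if_present(path):
    """Remove path and report whether a file was actually removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

with open(DEMO_FILE, 'w') as demo_file:
    demo_file.write('test message\n')

first = remove_if_present(DEMO_FILE)   # the file exists
second = remove_if_present(DEMO_FILE)  # the file is already gone
print(first, second)
```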


## 6 Updating Text Files

It is simple to append data to a file: open it in append mode with `open(filename, 'a')` and write the data. However, updating existing file content is tricky because the content is stored on disk as one continuous sequence of characters. The following three records:

```text
Alice 97
Bob 93
Cindy 95
```

are stored on disk as `Alice 97\nBob 93\nCindy 95\n`. It is easy to replace a field with a shorter one and pad the rest with spaces. But changing a field to a longer one would move all later content to a different place. For this reason, updating a file is a multi-step task:

- open a source file for reading
- open a temp file for writing
- read data from the source file, make the required changes and save the result to the temp file
- remove the source file
- rename the temp file to the name of the source file
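
Before the update example, here is a small sketch of the two points made earlier in this section: append mode, and the way a longer field shifts all later content. The file name `layout_demo.txt` and the record `Dave 88` are made up for this illustration.

```python
DEMO_FILE = 'layout_demo.txt'  # hypothetical file name for this sketch

with open(DEMO_FILE, 'w') as demo_file:
    demo_file.write('Alice 97\nBob 93\nCindy 95\n')

# Append mode adds new data after the existing content.
with open(DEMO_FILE, 'a') as demo_file:
    demo_file.write('Dave 88\n')

# read() returns the whole file as one string, new lines included.
with open(DEMO_FILE, 'r') as demo_file:
    content = demo_file.read()
print(repr(content))

# Position of Cindy's record before and after giving Bob a longer name.
position_before = content.find('Cindy')
position_after = content.replace('Bob', 'Robert').find('Cindy')
print(position_before, position_after)
```

The second position is larger by three characters, the difference in length between `Robert` and `Bob`. Every character after the changed field has to move, which is why the file is rewritten rather than edited in place.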

The following is an example that changes Bob's score to `90` and changes the name `Cindy` to `Cynthia`.


```python
import os

FILENAME = 'scores.txt'
TEMP_FILE = 'temp'
READ_MODE = 'r'
WRITE_MODE = 'w'

# open both files in one with statement so both are closed on exit
with open(FILENAME, READ_MODE) as source_file, \
        open(TEMP_FILE, WRITE_MODE) as temp_file:
    for line in source_file:
        name, score = line.split()

        if name == 'Bob':
            score = 90

        if name == 'Cindy':
            name = 'Cynthia'

        temp_file.write(f'{name} {score}\n')

# replace the source file with the updated copy
os.remove(FILENAME)
os.rename(TEMP_FILE, FILENAME)
```
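
The same pattern can be wrapped in a function. The sketch below is one possible variation, not the only way to do it: it uses `os.replace`, which overwrites the destination in a single call, instead of the separate remove and rename steps. The function `update_scores`, its parameters and the file name `update_demo.txt` are hypothetical names for this example.

```python
import os

def update_scores(filename, new_scores):
    """Rewrite filename, replacing the score of every name in new_scores."""
    temp_name = filename + '.tmp'  # assumed temp file name
    with open(filename, 'r') as source_file, \
            open(temp_name, 'w') as temp_file:
        for line in source_file:
            name, score = line.split()
            # keep the old score unless a new one is given for this name
            score = new_scores.get(name, score)
            temp_file.write(f'{name} {score}\n')
    # os.replace overwrites the destination file if it already exists
    os.replace(temp_name, filename)

DEMO_FILE = 'update_demo.txt'  # hypothetical file name for this sketch
with open(DEMO_FILE, 'w') as demo_file:
    demo_file.write('Alice 97\nBob 93\nCindy 95\n')

update_scores(DEMO_FILE, {'Bob': 90})

with open(DEMO_FILE, 'r') as demo_file:
    updated = demo_file.read()
print(updated)
```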