# Text Files

### Read and Write Text Files

- A text file usually contains more than one line of text, so its contents form a multi-line string.
- Python uses the newline character (`\n`) to mark line breaks.

<img src="images/42.png" alt="The Answer is 42" style="width: 200px;"/>

In [1]:
print("The\n\nanswer\nis 42.")

The

answer
is 42.


<img src="images/solar_panels.jpg" alt="Solar Panels" style="width: 250px;"/>

In [2]:
f = open('solar.txt', 'r')  # Open the file
data = f.read()  # Read all the contents as a single, large string
f.close()  # Close the file
data

'6AM\t0\n7AM\t0.1\n8AM\t7.1\n9AM\t15.3\n10AM\t53.4\n11AM\t78.0\n12PM\t93.8\n1PM\t103.5\n2PM\t106.1\n3PM\t100.9\n4PM\t90.2\n5PM\t74.1\n6PM\t52.1\n7PM\t23.6\n8PM\t3.3\n9PM\t0'

<img src="images/solar_txt.png" alt="solar.txt" style="width: 225px;"/>

- Use the `with` statement to work with file objects.
- The file is closed automatically when the `with` block ends, even if an exception is raised inside it.

In [3]:
with open('solar.txt','r') as f:  # Open solar.txt in read mode
    data = f.read()
f.closed  # Check if the file is properly closed

True

In [4]:
data

'6AM\t0\n7AM\t0.1\n8AM\t7.1\n9AM\t15.3\n10AM\t53.4\n11AM\t78.0\n12PM\t93.8\n1PM\t103.5\n2PM\t106.1\n3PM\t100.9\n4PM\t90.2\n5PM\t74.1\n6PM\t52.1\n7PM\t23.6\n8PM\t3.3\n9PM\t0'
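The claim that a `with` block closes the file even when an exception is raised can be checked directly. The following is a minimal sketch: the file name `demo.txt`, the temporary directory, and the simulated `RuntimeError` are assumptions introduced only so the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample file in a temporary directory (not part of the course files)
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('6AM\t0\n7AM\t0.1\n')

try:
    with open(path, 'r') as f:
        first = f.readline()
        raise RuntimeError('simulated failure while processing')  # Leave the block abnormally
except RuntimeError:
    pass

closed_after_error = f.closed  # The with block closed the file before the exception propagated
print(closed_after_error)
```

Without `with`, the same guarantee would require a `try`/`finally` block that calls `f.close()` explicitly.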

- `f.read(size)` reads `size` characters from the current position.
- If the end of the file has been reached, `f.read()` will return an empty string (`''`).

In [5]:
f = open('solar.txt', 'r')
f.read(8)  # Read the first 8 characters of the file

'6AM\t0\n7A'

In [6]:
f.read(1)  # Read one more character

'M'

In [7]:
f.read()  # Read the rest

'\t0.1\n8AM\t7.1\n9AM\t15.3\n10AM\t53.4\n11AM\t78.0\n12PM\t93.8\n1PM\t103.5\n2PM\t106.1\n3PM\t100.9\n4PM\t90.2\n5PM\t74.1\n6PM\t52.1\n7PM\t23.6\n8PM\t3.3\n9PM\t0'

In [8]:
f.read()  # Reach the end

''

In [9]:
f.close()
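Because `f.read(size)` continues from the current position and returns `''` at the end of the file, it can drive a loop that processes a file in fixed-size pieces. This is a sketch under stated assumptions: the file `sample.txt` and the chunk size of 8 are chosen for illustration, and the sample text is only the first three lines shown above.

```python
import os
import tempfile

# Hypothetical sample file holding the first three lines of the data shown above
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
text = '6AM\t0\n7AM\t0.1\n8AM\t7.1\n'
with open(path, 'w') as f:
    f.write(text)

chunks = []
with open(path, 'r') as f:
    while True:
        chunk = f.read(8)  # Read up to 8 characters from the current position
        if chunk == '':    # An empty string signals the end of the file
            break
        chunks.append(chunk)

print(chunks)
```

The last chunk may be shorter than `size`, since `read` returns only the characters that remain.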

- `f.readline()` reads the next line from the file.
- If the end of the file has been reached, `f.readline()` will return an empty string (`''`).

In [10]:
with open('solar.txt','r') as f:
    for i in range(5):  # Read the first 5 lines
        line = f.readline()
        print(line, end='')  # line already ends with '\n', so suppress the newline that print would add

6AM	0
7AM	0.1
8AM	7.1
9AM	15.3
10AM	53.4
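Since `f.readline()` returns `''` only at the end of the file, a blank line inside the file (which is read as `'\n'`) can be told apart from the end. The sketch below assumes a temporary file named `answer.txt` containing the four-line string used earlier.

```python
import os
import tempfile

# Hypothetical sample file; its second line is blank
path = os.path.join(tempfile.mkdtemp(), 'answer.txt')
with open(path, 'w') as f:
    f.write('The\n\nanswer\nis 42.')

lines = []
with open(path, 'r') as f:
    while True:
        line = f.readline()
        if line == '':  # End of file; a blank line would be '\n' instead
            break
        lines.append(line)

print(lines)
```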


- Iterate over the file object to read the entire file line by line. Only one line is held in memory at a time, so this is memory efficient and fast:

In [11]:
with open('solar.txt','r') as f:
    for line in f:
        print(line, end='')

6AM	0
7AM	0.1
8AM	7.1
9AM	15.3
10AM	53.4
11AM	78.0
12PM	93.8
1PM	103.5
2PM	106.1
3PM	100.9
4PM	90.2
5PM	74.1
6PM	52.1
7PM	23.6
8PM	3.3
9PM	0
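Line-by-line iteration combines naturally with `str.split` to turn each tab-separated line into usable values. The following is a rough sketch: it assumes every line has exactly two tab-separated fields, uses four lines copied from the data above, and the names `readings`, `peak_hour`, and `peak_value` are illustrative.

```python
import os
import tempfile

# Hypothetical sample file with four lines copied from the data shown above
path = os.path.join(tempfile.mkdtemp(), 'solar_sample.txt')
with open(path, 'w') as f:
    f.write('12PM\t93.8\n1PM\t103.5\n2PM\t106.1\n3PM\t100.9')

readings = []
with open(path, 'r') as f:
    for line in f:
        hour, value = line.split('\t')         # Split on the tab character
        readings.append((hour, float(value)))  # float() ignores the trailing '\n'

# Find the entry with the largest reading
peak_hour, peak_value = max(readings, key=lambda r: r[1])
print(peak_hour, peak_value)
```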

- `list(f)` and `f.readlines()` read all the lines of a file into a list.

In [12]:
with open('solar.txt','r') as f:
    for line in list(f):
        print(line, end='')

6AM	0
7AM	0.1
8AM	7.1
9AM	15.3
10AM	53.4
11AM	78.0
12PM	93.8
1PM	103.5
2PM	106.1
3PM	100.9
4PM	90.2
5PM	74.1
6PM	52.1
7PM	23.6
8PM	3.3
9PM	0

In [13]:
with open('solar.txt','r') as f:
    for line in f.readlines():
        print(line, end='')

6AM	0
7AM	0.1
8AM	7.1
9AM	15.3
10AM	53.4
11AM	78.0
12PM	93.8
1PM	103.5
2PM	106.1
3PM	100.9
4PM	90.2
5PM	74.1
6PM	52.1
7PM	23.6
8PM	3.3
9PM	0
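Both `list(f)` and `f.readlines()` keep the trailing `'\n'` of each line. When the newline is not wanted, one option is to strip it while building the list, or to use `str.splitlines()` on the full contents. A short sketch, assuming a temporary three-line file:

```python
import os
import tempfile

# Hypothetical sample file; the last line has no trailing newline
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('6AM\t0\n7AM\t0.1\n8AM\t7.1')

with open(path, 'r') as f:
    raw_lines = f.readlines()                        # Newlines are kept

with open(path, 'r') as f:
    clean_lines = [line.rstrip('\n') for line in f]  # Newlines are stripped

with open(path, 'r') as f:
    split_lines = f.read().splitlines()              # Same result via splitlines()

print(raw_lines)
print(clean_lines)
```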

- `f.write(string)` writes the contents of `string` to the file, returning the number of characters written. 
- Opening an existing file in write mode (`'w'`) erases its previous contents.
- If the named file does not exist, a new one is created.

In [14]:
f = open('my_out.txt','w')
print('The\n\nanswer\nis 42.', file=f)
f.close()

<img src="images/my_out.png" alt="myout.txt" style="width: 225px;"/>

In [15]:
with open('my_out.txt','w') as f:  # Open the file in write mode
    f.write('The\n\nanswer\nis 42.')
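The two cells above are not quite equivalent: `print(..., file=f)` appends a newline after the text, while `f.write` writes exactly the string it is given and returns the number of characters written. The sketch below makes both points visible; the file names `out_write.txt` and `out_print.txt` are assumptions for illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # Temporary directory so no real files are overwritten
path_write = os.path.join(folder, 'out_write.txt')
path_print = os.path.join(folder, 'out_print.txt')
text = 'The\n\nanswer\nis 42.'

with open(path_write, 'w') as f:
    n = f.write(text)     # Returns the number of characters written

with open(path_print, 'w') as f:
    print(text, file=f)   # print adds a final '\n' after the text

with open(path_write, 'r') as f:
    written = f.read()
with open(path_print, 'r') as f:
    printed = f.read()

# Opening in 'w' mode again discards the previous contents
with open(path_write, 'w') as f:
    f.write('new')
with open(path_write, 'r') as f:
    overwritten = f.read()

print(n, repr(written), repr(printed), repr(overwritten))
```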

- After a file object is closed, either by a `with` statement or by calling `f.close()`, any attempt to use it raises a `ValueError`.

In [16]:
f.close()
f.read()

ValueError: I/O operation on closed file.
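If a program needs to recover from this situation rather than stop, the error can be caught with `try`/`except`. A minimal sketch, assuming a temporary file named `demo.txt`; the variable names `caught` and `message` are illustrative:

```python
import os
import tempfile

# Hypothetical sample file
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('The\n\nanswer\nis 42.')

with open(path, 'r') as f:
    pass  # The file is closed as soon as the with block ends

caught = False
message = ''
try:
    f.read()  # Using the closed file object
except ValueError as err:
    caught = True
    message = str(err)

print(caught, message)
```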

### Example: Generating Usernames

- In batch-mode processing, a program reads its input from files and writes its output to files (i.e., the program is not interactive).

In [17]:
with open('realnames.txt', 'r') as f_in, \
     open('usernames.txt', 'w') as f_out:
    for line in f_in:  # Process each line
        first, last = line.split()
        username = (first[0] + last[:7]).lower()
        print('{:>30} ==> {:<10}'.format(line[:-1], username))
        f_out.write(username + '\n')
    print("Usernames are stored in usernames.txt.")

             George Washington ==> gwashing  
                    John Adams ==> jadams    
              Thomas Jefferson ==> tjeffers  
                 James Madison ==> jmadison  
                  James Monroe ==> jmonroe   
                    John Adams ==> jadams    
                Andrew Jackson ==> ajackson  
               Martin VanBuren ==> mvanbure  
              William Harrison ==> wharriso  
                    John Tyler ==> jtyler    
                    James Polk ==> jpolk     
                Zachary Taylor ==> ztaylor   
              Millard Fillmore ==> mfillmor  
               Franklin Pierce ==> fpierce   
                James Buchanan ==> jbuchana  
               Abraham Lincoln ==> alincoln  
                Andrew Johnson ==> ajohnson  
                 Ulysses Grant ==> ugrant    
              Rutherford Hayes ==> rhayes    
                James Garfield ==> jgarfiel  
                Chester Arthur ==> carthur   
              Grover Cleveland ==>

<img src="images/realnames_usernames.png" alt="realnames.txt and usernames.txt" style="width: 500px;"/>
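The username rule used in the loop (first initial plus up to seven letters of the last name, in lowercase) can be isolated in a small function, which makes it easy to check without any files. This is a sketch, not the course's implementation: the function name `make_username` is hypothetical, and using the first and last words for names with more than two parts is an assumption that the original code does not make (it expects exactly two words per line).

```python
def make_username(full_name):
    """Build a username from the first initial and up to 7 letters of the last name."""
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError('expected at least a first and a last name: ' + repr(full_name))
    first, last = parts[0], parts[-1]  # Assumption: ignore any middle names
    return (first[0] + last[:7]).lower()


for name in ['George Washington', 'John Adams', 'Martin VanBuren']:
    print('{:>30} ==> {:<10}'.format(name, make_username(name)))
```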

### Example: Student Cumulative GPA

- Assume the information of all the courses taken by a student is stored in a data file, `courses.txt`. <img src="images/courses.png" alt="Courses.txt" style="width: 400px;"/>
- Step 1: The grade points of each course = the credit hours $\times$ the grade-point value of the letter grade.
    - A: 4.0 &nbsp;&nbsp; A-: 3.7 &nbsp;&nbsp; B+: 3.3 &nbsp;&nbsp; B: 3.0 &nbsp;&nbsp; B-: 2.7 &nbsp;&nbsp; C+: 2.3 &nbsp;&nbsp; C: 2.0 &nbsp;&nbsp; C-: 1.7 &nbsp;&nbsp; D: 1.0 &nbsp;&nbsp; F: 0

|Course ID | Course Name            | Credit Hours| Letter Grade | Grade Points (Each Course) |
|:---------|:-----------------------|:------------|:-------------|:---------------------------|
|ISE211    | Engineering Economics  | 4           | B+           | $4\times3.3=13.2$          |
|ISE295    | Undergraduate Seminars | 1           | C            | $1\times2.0=2.0$           |
|ISE314    | Computer Programming   | 4           | A            | $4\times4.0=16.0$          |
|ISE364    | Project Management     | 3           | A-           | $3\times3.7=11.1$          |
|ISE420    | Operations Research    | 4           | A            | $4\times4.0=16.0$          |

- Step 2: Total credit hours $=4+1+4+3+4=16$, Total grade points $=13.2+2.0+16.0+11.1+16.0=58.3$.
- Step 3: Cumulative GPA = (Total grade points)$/$(Total credit hours) $=58.3/16\approx3.64$.
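The three steps can be collapsed into one credit-weighted average. In the formula below, $c_i$ denotes the credit hours and $g_i$ the grade-point value of course $i$; these symbols are introduced here for illustration and do not appear elsewhere in the notes.

$$\text{GPA}=\frac{\sum_i c_i\,g_i}{\sum_i c_i}=\frac{4(3.3)+1(2.0)+4(4.0)+3(3.7)+4(4.0)}{4+1+4+3+4}=\frac{58.3}{16}\approx 3.64$$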

In [18]:
letter_equivalent = {'A': 4.0, 'A-': 3.7, 'B+': 3.3, 'B': 3.0, 'B-': 2.7, 'C+': 2.3, 'C': 2.0, 'C-': 1.7, 'D': 1.0, 'F': 0}
t_credits = 0  # Total credit hours
t_gps = 0  # Total grade points

with open('courses.txt','r') as file:
    header = file.readline()  # Read the header (first line) of the table
    print(header)
    data = file.readlines()  # Read all the remaining lines, which contain the course data
    print(data)
    for line in data:  # Process each course
        course = line.split(',')
        print(course)
        credit = float(course[2])
        letter = course[3].strip()  # Remove the leading space and the ending '\n'
        gps = credit * letter_equivalent[letter]
        t_credits += credit
        t_gps += gps

GPA = t_gps / t_credits
print('GPA =', GPA)

Course ID, Course Name, Credit Hours, Letter Grade

['ISE211, Engineering Economics, 4, B+\n', 'ISE295, Undergraduate Seminars, 1, C\n', 'ISE314, Computer Programming for Engineers, 4, A\n', 'ISE364, Engineering Project Managemnt, 3, A-\n', 'ISE420, Operations Research, 4, A']
['ISE211', ' Engineering Economics', ' 4', ' B+\n']
['ISE295', ' Undergraduate Seminars', ' 1', ' C\n']
['ISE314', ' Computer Programming for Engineers', ' 4', ' A\n']
['ISE364', ' Engineering Project Managemnt', ' 3', ' A-\n']
['ISE420', ' Operations Research', ' 4', ' A']
GPA = 3.64375
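As an alternative to splitting each line by hand, the standard-library `csv` module can parse comma-separated rows; `skipinitialspace=True` drops the space that follows each comma. The following is a sketch under stated assumptions: the function name `cumulative_gpa` is hypothetical, and an `io.StringIO` object holding the rows copied verbatim from the output above stands in for `courses.txt` so the example runs on its own.

```python
import csv
import io

LETTER_EQUIVALENT = {'A': 4.0, 'A-': 3.7, 'B+': 3.3, 'B': 3.0, 'B-': 2.7,
                     'C+': 2.3, 'C': 2.0, 'C-': 1.7, 'D': 1.0, 'F': 0}


def cumulative_gpa(lines):
    """Compute the credit-weighted GPA from an iterable of CSV lines with a header row."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # Skip the header row
    total_credits = 0.0
    total_points = 0.0
    for row in reader:
        if not row:  # Ignore blank lines
            continue
        credit = float(row[2])
        letter = row[3].strip()
        total_credits += credit
        total_points += credit * LETTER_EQUIVALENT[letter]
    return total_points / total_credits


# Stand-in for courses.txt, copied from the output shown above
sample = io.StringIO(
    'Course ID, Course Name, Credit Hours, Letter Grade\n'
    'ISE211, Engineering Economics, 4, B+\n'
    'ISE295, Undergraduate Seminars, 1, C\n'
    'ISE314, Computer Programming for Engineers, 4, A\n'
    'ISE364, Engineering Project Managemnt, 3, A-\n'
    'ISE420, Operations Research, 4, A'
)
gpa = cumulative_gpa(sample)
print('GPA = {:.2f}'.format(gpa))
```

To use a real file instead, pass an open file object, for example `cumulative_gpa(open_file)` inside a `with open('courses.txt', 'r', newline='') as open_file:` block.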


### Course Materials on YouTube and GitHub

- Course videos are hosted on YouTube ( http://youtube.com/yongtwang ).
- Course documents (Jupyter Notebooks and Python source code) are hosted on GitHub ( http://github.com/yongtwang ).