# Assignment 17: File I/O #

### Goals for this Assignment ###

By the time you have completed this assignment, you should be able to:

- Use `with` and `open` to open files for reading or writing
- Use `for...in` with a file to read files line-by-line
- Use `str`'s `rstrip` method to remove whitespace from the right-hand side of a string
- Use the `write` method to write something to a file

## Step 1: Compute the Average of Numbers in a File ##

### Background: Opening Files and Reading Lines from a File ###

Programs very commonly work with files.
For example, Jupyter notebook itself works with `.ipynb` files.
Specifically, Jupyter will open a `.ipynb` file, read its contents, and use this information to display cells to the programmer.
These cells will possibly be pre-populated with text or code, like the text you're reading right now.
The contents of the `.ipynb` file itself were created by Jupyter notebook; specifically, I created the cells, wrote contents in them, and then told Jupyter notebook to save all this information as a `.ipynb` file.
The contents of the `.ipynb` file are in a _format_ that Jupyter notebook recognizes. A format is simply an agreement that certain information is represented in a certain way, so that everyone who follows the format interprets the information the same way.

In this assignment, you'll write programs that work directly with files, separate from the `.ipynb` files used by Jupyter notebook.
The format will be simple: each file represents a list of integers, where each integer is written out as text digits (with a leading `-` for negative numbers).
The numbers are separated by newlines; i.e., each number is on its own line.

Before we can read from or write to a file, we must first open the file.
In Python, this is accomplished with `with open`.
(A later assignment will go into the `with` part in more detail; right now you don't need to worry about all the details behind `with`.)
When opening the file, we need to tell Python the name of the file and whether we are opening it for reading or for writing.
For this first part, we will focus only on reading from files.
Opening a file can fail.
For example, if we try to open a file for reading that does not exist, there is nothing to read from, so Python raises an error (specifically, a `FileNotFoundError`).

If, however, the file does exist and we can successfully open it, this creates a _file handle_.
A file handle is a representation of an open file, and allows for subsequent read or write operations.
This is where the name comes from: you can think of it as a handle on a file, much like the handle on a cooking pot.
In Python, a file handle is specifically represented with an object, and there are a variety of methods available on the file handle.
[This link to the official Python documentation](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) provides more details about what file handle objects can do and what methods they provide, but here we will focus only on the bare essentials (which also happen to be the most commonly used operations).
Once we have a file handle open for reading, we can use a `for...in` loop to read the file line-by-line.

Putting all of this together, the following code snippet reads the file `numbers.txt`, and prints out the contents of `numbers.txt`, line-by-line.
`numbers.txt` is included with this assignment on Canvas, and will need to be downloaded from there and put into the same directory as the corresponding `.ipynb` file.

```python
with open("numbers.txt", "r") as file_handle:
    for line in file_handle:
        print(line)
```

```
-35

66

-50

-47

87

80

71

0

-69

45

37

11

-89

-67

76

-59

6

-65

75

33

54

-68

-6

7

91

66

-57

40

60

51

9

29

20

90

-16

24

-65

-16

-50

5

64

41

-32

-78

-58

74

13

21

-23

-88

65

-9

50

10

-11

87

79

-56

16

-17

5

-58

21

44

-66

-6

71

-100

84

-56

-90

-29

-66

-25

-47

77

-67

14

-32

55

16

73

64

48

-83

45

39

67

-96

-13

-7

48

98

81

98

-23

80

-6

-21

45
```

Let's break down what this code is doing.

- `open("numbers.txt", "r")` attempts to create a file handle. It tries to open the file named `"numbers.txt"`, specifically for reading (`"r"`).
- The `with...as` part will be explained more fully later in the course. For now, note that the file handle is bound to the new variable `file_handle`, which is in scope in the body of the `with...as`.
- `for line in file_handle` reads the file one line at a time. On each iteration, `line` is bound to the line just read, as a string (`str`).
- `print(line)` prints the contents of the current line (`line`).

If you run this code, you'll see that the output looks a little odd: after each number there is a blank line.
The reason is that `for...in` _includes_ the newline character (`"\n"`) at the end of each line it reads.
The numbers in the file are separated by newline characters (that is how each number ends up on its own line in the first place), and `for...in` keeps each newline as the last character of `line`.
`print` itself also puts a newline character at the end of whatever it prints, so each number in the output is followed by _two_ newlines: one from `numbers.txt` itself, and another from `print`.
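
To see those newline characters directly, the sketch below writes a small throwaway file and prints each line with `repr`, which shows the string exactly as `for...in` produced it. The file name `sample.txt`, its three numbers, and the helper `read_raw_lines` are made up for illustration. The sketch also uses the standard-library `tempfile` and `os` modules (not otherwise needed in this assignment) so that nothing is written next to your notebook. Writing files is covered properly in Step 2.

```python
import os
import tempfile

def read_raw_lines(file_name):
    """Return every line of file_name exactly as for...in produces it."""
    lines = []
    with open(file_name, "r") as file_handle:
        for line in file_handle:
            lines.append(line)
    return lines

# Create a temporary file holding three numbers, one per line.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "sample.txt")
    with open(path, "w") as file_handle:
        file_handle.write("-35\n66\n-50\n")
    raw_lines = read_raw_lines(path)

# repr makes the trailing "\n" visible.
for line in raw_lines:
    print(repr(line))
```

This prints `'-35\n'`, `'66\n'`, and `'-50\n'`. If a file does not end with a newline, its last line comes back without one.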

If this behavior is undesired, a common way to fix this problem is by calling the `rstrip` method on each individual line.
This is shown in the cell below:

```python
with open("numbers.txt", "r") as file_handle:
    for line in file_handle:
        print(line.rstrip())
```

```
-35
66
-50
-47
87
80
71
0
-69
45
37
11
-89
-67
76
-59
6
-65
75
33
54
-68
-6
7
91
66
-57
40
60
51
9
29
20
90
-16
24
-65
-16
-50
5
64
41
-32
-78
-58
74
13
21
-23
-88
65
-9
50
10
-11
87
79
-56
16
-17
5
-58
21
44
-66
-6
71
-100
84
-56
-90
-29
-66
-25
-47
77
-67
14
-32
55
16
73
64
48
-83
45
39
67
-96
-13
-7
48
98
81
98
-23
80
-6
-21
45
```


When run, the code above no longer prints the extra blank lines.
This is because `rstrip` returns a new string with all whitespace (spaces, tabs, and newlines) removed from the right-hand side of the original string.
In other words, while `line` may end with a newline character, `line.rstrip()` will not.
As such, the only newline characters in the output come from `print`.
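
A few quick experiments help pin down what `rstrip` does and does not remove. The sample strings below are made up for illustration. `strip` and `rstrip("\n")` are standard `str` methods but are not required by this assignment.

```python
# rstrip removes every kind of trailing whitespace, not just "\n".
samples = ["45\n", "45  \t\n", "  45\n", "45"]
stripped = [sample.rstrip() for sample in samples]
print(stripped)  # ['45', '45', '  45', '45']

# Leading whitespace is left alone; strip() removes it from both ends.
print(repr("  45\n".strip()))  # '45'

# Passing "\n" makes rstrip remove only trailing newline characters.
print(repr("45  \n".rstrip("\n")))  # '45  '

# int() already ignores surrounding whitespace, including the newline.
print(int(" 45\n"))  # 45
```

The last line is why `int(line)` works on a line read from a file even without calling `rstrip` first.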

### Try this Yourself ###

For this step, you'll need to write a `file_average` function, which takes the name of a file to open.
Make sure you have downloaded both `numbers.txt` and `other_numbers.txt` from Canvas and placed them in the same directory as the `.ipynb` file.
Given a file name, `file_average` should:

- Read the file line-by-line
- For each line, get the integer representation of the line
- Sum all the integers in the file
- Count the number of integers which were in the file
- Using the sum and the count of the integers, compute the average of the file, and return this average

The next cell has calls to `file_average`, and shows the expected output in the comments based on the contents of `numbers.txt` and `other_numbers.txt`.
Leave these calls in place in order to test your code.
Define `file_average` in the next cell.

```python
# Define your file_average function here.  Leave the calls below for testing
def file_average(file_name):
    numbers = []
    with open(file_name, "r") as file_handle:
        for line in file_handle:
            # int() ignores the trailing newline, so no rstrip is needed here
            numbers.append(int(line))
    # Guard against dividing by zero when the file has no numbers
    if not numbers:
        return None
    return sum(numbers) / len(numbers)

print(file_average("numbers.txt")) # should print 8.04
print(file_average("other_numbers.txt")) # should print 3.38
```

```
8.04
3.38
```
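
The solution above stores every number in a list before averaging. Another approach follows the step list more literally: keep a running sum and a running count, so memory use stays the same no matter how large the file is. The sketch below does this. The name `file_average_running` is hypothetical, and skipping blank lines is an added assumption (the assignment's files have none). The usage example writes a tiny made-up file into a temporary directory using the standard `tempfile` and `os` modules.

```python
import os
import tempfile

def file_average_running(file_name):
    """Average the integers in file_name (one per line) with a running
    total and count. Returns None if the file contains no numbers."""
    total = 0
    count = 0
    with open(file_name, "r") as file_handle:
        for line in file_handle:
            stripped = line.rstrip()
            if stripped == "":
                continue  # assumption: skip blank lines instead of failing
            total += int(stripped)
            count += 1
    if count == 0:
        return None  # avoid dividing by zero on an empty file
    return total / count

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "tiny.txt")
    with open(path, "w") as file_handle:
        file_handle.write("10\n-4\n\n6\n")
    result = file_average_running(path)

    empty_path = os.path.join(directory, "empty.txt")
    with open(empty_path, "w"):
        pass  # opening with "w" and writing nothing leaves an empty file
    empty_result = file_average_running(empty_path)

print(result)        # 4.0
print(empty_result)  # None
```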


## Step 2: Write First `n` Values of Factorial to a File ##

### Background: Writing to Files ###

You can also use `open` to open files for writing.
To do so, pass `"w"` as the second argument instead of `"r"`.
You can then call the `write` method on the file handle to write to the file.
This is shown in the following cell:

```python
with open("example_file.txt", "w") as file_handle:
    file_handle.write("foo")
    file_handle.write("bar")
```

Note that the `write` method can only write strings (passing anything else raises a `TypeError`), so you'll need to convert any non-string values first, using `str`, f-strings, or otherwise.
When run, the above cell creates a file named `"example_file.txt"` in the same directory as the `.ipynb` file.
The contents of the file will be `"foobar"`.
This may strike you as odd, since the behavior is inconsistent with `print`.
That is, `print` puts a newline at the end of whatever you print, whereas `write` does not.
The end result is that even though we made two separate calls to `write`, there is no gap between `"foo"` and `"bar"`.
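
One way to confirm this is to read the file back. The sketch below repeats the two writes in a temporary directory (so your real `example_file.txt` is left alone) and then uses the file handle's `read` method, which returns the whole file as one string. It also keeps the return value of `write`, which is the number of characters written. The `tempfile`/`os` usage is an addition for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example_file.txt")
    with open(path, "w") as file_handle:
        # write returns how many characters it wrote
        written_foo = file_handle.write("foo")
        written_bar = file_handle.write("bar")
    # read() returns the entire file as a single string
    with open(path, "r") as file_handle:
        contents = file_handle.read()

print(written_foo, written_bar)  # 3 3
print(repr(contents))            # 'foobar'
```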

The fix is easy: write out the newline yourself.
It's common to do this in each call to `write`, as in:

```python
with open("example_file.txt", "w") as file_handle:
    file_handle.write("foo\n")
    file_handle.write("bar\n")
```
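
As an aside (not required for this assignment), `print` accepts a `file` keyword argument that sends its output to an open file handle instead of the screen. Because it is still `print`, it adds the newline and converts non-strings with `str` for you. The sketch below writes to a temporary directory so nothing in your notebook folder is clobbered. The rest of this assignment sticks with `write`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example_file.txt")
    with open(path, "w") as file_handle:
        # print adds the trailing newline and converts 17 to "17"
        print("foo", file=file_handle)
        print("bar", file=file_handle)
        print(17, file=file_handle)
    with open(path, "r") as file_handle:
        contents = file_handle.read()

print(repr(contents))  # 'foo\nbar\n17\n'
```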

F-strings work well here too, especially if you have data that isn't originally a string.
For example:

```python
some_integer = 17
some_float = 3.14
some_boolean = False

with open("other_example_file.txt", "w") as file_handle:
    file_handle.write(f"{some_integer}\n")
    file_handle.write(f"{some_float}\n")
    file_handle.write(f"{some_boolean}\n")
```
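
Reading such a file back gives you strings again, so each value has to be converted back to its original type. The sketch below repeats the writes in a temporary directory and then converts each line. It also shows a trap that is easy to fall into: `bool("False")` is `True`, because any non-empty string is truthy. Comparing the text against `"True"` is one simple workaround. The `tempfile`/`os` usage and the `read_*` variable names are illustrative additions.

```python
import os
import tempfile

some_integer = 17
some_float = 3.14
some_boolean = False

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "other_example_file.txt")
    with open(path, "w") as file_handle:
        file_handle.write(f"{some_integer}\n")
        file_handle.write(f"{some_float}\n")
        file_handle.write(f"{some_boolean}\n")
    with open(path, "r") as file_handle:
        lines = [line.rstrip() for line in file_handle]

print(lines)  # ['17', '3.14', 'False']

# Convert each line back to its original type.
read_integer = int(lines[0])
read_float = float(lines[1])
# bool(lines[2]) would be True (non-empty string), so compare the text instead.
read_boolean = lines[2] == "True"
```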

One thing to note about opening a file for writing: if a file with the same name already exists, its existing contents are erased as soon as it is opened with `"w"`, before anything new is written.
The technical term for this is that the file is _clobbered_ (no, I'm not kidding).
With this in mind, when writing to a file, it's good practice to double-check beforehand that you're writing to the right place.
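
If you would rather fail than clobber an existing file, `open` also accepts mode `"x"` (exclusive creation). This mode raises `FileExistsError` when the file is already there. The sketch below wraps this in a hypothetical helper, `write_if_new`, using `try`/`except`. It runs in a temporary directory with a made-up file name.

```python
import os
import tempfile

def write_if_new(file_name, text):
    """Write text to file_name only if no such file exists yet.
    Returns True if the file was written, False if it already existed."""
    try:
        # "x" means exclusive creation: raise instead of clobbering
        with open(file_name, "x") as file_handle:
            file_handle.write(text)
        return True
    except FileExistsError:
        return False

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "precious.txt")
    first = write_if_new(path, "original\n")
    second = write_if_new(path, "replacement\n")
    with open(path, "r") as file_handle:
        contents = file_handle.read()

print(first, second)   # True False
print(repr(contents))  # 'original\n'
```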

### Try this Yourself ###

For this step, you need to define a function named `write_factorial`, which will take:

- The name of a file to write to
- A non-negative integer `n`

Given these parameters, `write_factorial` will write the factorials from `0!` up to and including `n!` (that is, `n + 1` values, one per line) to a new file with the given name.
For example, if we were to call:

```python
write_factorial("f1.txt", 4)
```

...this would create a file named `"f1.txt"`.
The contents of the file would be the following integers, one per line:

```
1
1
2
6
24
```

These values follow from the fact that:

```
0! = 1
1! = 1
2! = 2 * 1 = 2
3! = 3 * 2 * 1 = 6
4! = 4 * 3 * 2 * 1 = 24
```
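
Each row in the table reuses the row above it, since `k! = k * (k - 1)!`. The short sketch below builds `0!` through `4!` that way and checks the result against the standard library's `math.factorial`. It only illustrates the arithmetic. It does not write a file.

```python
import math

# Start from 0! = 1; each next value is the previous one times k.
values = [1]
for k in range(1, 5):
    values.append(values[-1] * k)

print(values)                                 # [1, 1, 2, 6, 24]
print([math.factorial(k) for k in range(5)])  # [1, 1, 2, 6, 24]
```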

Define your `write_factorial` function in the next cell.
Leave the example calls in place to test your code; the comments after each call show the expected contents of the corresponding file.
As a hint, you'll likely want to use `range` and `for...in` to iterate through the positive integers up to (and including) `n`, computing each factorial from the previous one.

```python
# Define your write_factorial function here.  Leave the calls in place below for testing.
def write_factorial(file_name, n):
    with open(file_name, "w") as file_handle:
        factorial = 1  # 0! = 1
        file_handle.write(f"{factorial}\n")
        # Each k! is the previous factorial multiplied by k
        for k in range(1, n + 1):
            factorial *= k
            file_handle.write(f"{factorial}\n")

write_factorial("f1.txt", 4)
# expected contents of f1.txt after the above line is executed:
# 1
# 1
# 2
# 6
# 24

write_factorial("f2.txt", 0)
# expected contents of f2.txt after the above line is executed:
# 1

write_factorial("f3.txt", 10)
# expected contents of f3.txt after the above line is executed:
# 1
# 1
# 2
# 6
# 24
# 120
# 720
# 5040
# 40320
# 362880
# 3628800
```
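
Instead of opening each output file by hand, you could check it with code. The sketch below defines a hypothetical helper, `check_factorial_file`, that reads the integers back and compares them with `math.factorial`. To keep the example self-contained, it checks two hand-written temporary files, one correct and one missing `0!`. Once the cell above has run, `check_factorial_file("f1.txt", 4)` should likewise return `True`.

```python
import math
import os
import tempfile

def check_factorial_file(file_name, n):
    """Return True if file_name holds 0! through n!, one integer per line."""
    with open(file_name, "r") as file_handle:
        actual = [int(line) for line in file_handle]
    expected = [math.factorial(k) for k in range(n + 1)]
    return actual == expected

with tempfile.TemporaryDirectory() as directory:
    good_path = os.path.join(directory, "good.txt")
    with open(good_path, "w") as file_handle:
        file_handle.write("1\n1\n2\n6\n24\n")
    bad_path = os.path.join(directory, "bad.txt")
    with open(bad_path, "w") as file_handle:
        file_handle.write("1\n2\n6\n24\n")  # 0! is missing
    good_result = check_factorial_file(good_path, 4)
    bad_result = check_factorial_file(bad_path, 4)

print(good_result, bad_result)  # True False
```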

## Step 3: Submit via Canvas ##

Be sure to **save your work**, then log into [Canvas](https://canvas.csun.edu/).  Go to the COMP 502 course and click "Assignments" in the left pane.  Then click "Assignment 17" and upload the `17_file_io.ipynb` file.

You can turn in the assignment multiple times, but only the last version you submitted will be graded.