# UNCLASSIFIED

Transcribed from FOIA Doc ID: 6689693

https://archive.org/details/comp3321

## (U) Introduction: Getting Dangerous 

(U) As you probably already know, input and output are core tools in algorithm development, and reading from and writing to files is one of the most common forms of both. Let's jump right in to see how easy it is to write a file.

In [1]:
myfile = open('data.txt', 'w') 
myfile.write("I am writing data to my file") 
myfile.close()

(U) And there you have it! You can write data to files in Python. By the way, the arguments you pass to `open` are the filename (as a string; do not forget the path) and the file _mode_. Here we are writing the file, as indicated by the `'w'` given as the second argument to the `open` function.

(U) Let's tear apart what we actually did.

In [2]:
open('data.txt', 'w') 

<_io.TextIOWrapper name='data.txt' mode='w' encoding='UTF-8'>

(U) This actually returns something called a _file object_. Let's name it! 

(U) **Danger:** Opening a file that already exists for writing **will erase the original file.**

In [6]:
myfile = open('data.txt', 'w') 

(U) Now we have a variable that refers to this file object, which was opened in write mode. Let's try to write to the file:

In [7]:
myfile.write("I am writing data to my file object") 

35

In [8]:
myfile.read() # Oops... notice the error

UnsupportedOperation: not readable

In [9]:
myfile.close() # Guess what that did... 

(U) There are only a few file modes that we need to use. You have seen `'w'` (writing). The others are `'r'` (reading, which is the default), `'a'` (appending), and `'r+'` (reading and writing). Adding `'b'` to any of these, as in `'rb'` or `'wb'`, opens the file in binary mode.

In [10]:
myfile = open('data.txt', 'r') 

In [11]:
myfile.read()

'I am writing data to my file object'

In [12]:
myfile.write("I am writing more data to my file") # Oops again...check our mode 

UnsupportedOperation: not writable

In [13]:
mydata = myfile.read()

In [14]:
mydata # HEY! Where did the data go.... 

''

In [15]:
myfile.close() # don't be a piggy 
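
(U) The cells above only exercise `'w'` and `'r'`. As a rough sketch of how the remaining modes behave, the following writes to a scratch file in a temporary directory; the file name `modes_demo.txt` and the variable names are made up for illustration and are not part of the lesson files.

```python
import os
import tempfile

# Work in a throwaway directory so no real file gets overwritten.
# The name modes_demo.txt is purely illustrative.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "modes_demo.txt")

with open(path, "w") as f:       # 'w' creates the file (or erases an existing one)
    f.write("first line\n")

with open(path, "a") as f:       # 'a' keeps existing content and writes at the end
    f.write("second line\n")

with open(path, "r+") as f:      # 'r+' allows both reading and writing
    original = f.read()          # reading everything leaves the cursor at the end...
    f.write("third line\n")      # ...so this write lands after the existing text

with open(path, "rb") as f:      # adding 'b' gives back bytes instead of str
    raw = f.read()

with open(path, "w") as f:       # reopening with 'w' erases everything
    pass

with open(path) as f:            # no mode given, so the default 'r' is used
    after_truncate = f.read()

print(repr(original))
print(type(raw).__name__)
print(repr(after_truncate))
```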

(U) A cool way to use the contents of a file in a block is the `with` statement. Formally, the file object acts as a _context manager_. Informally, this ensures that the file is closed when the block ends.

In [18]:
with open('data.txt') as f:
    print(f.read())

print("Hello")

I am writing data to my file object
Hello


(U) Using `with` is a good idea but is usually not absolutely necessary. Python tries to close files once they are no longer needed. Having files open is not usually a problem, unless you try to open a large number all at once (e.g. inside a loop). 
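
(U) One way to convince yourself that `with` really closes the file is to check the file object's `closed` attribute. The sketch below does that, including the case where an error is raised inside the block. The scratch file name `closing_demo.txt` and the variable names are illustrative assumptions.

```python
import os
import tempfile

# Scratch file in a temporary directory; the name is illustrative only.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "closing_demo.txt")

with open(path, "w") as f:
    f.write("some data")
    inside = f.closed        # still open while we are inside the block

after = f.closed             # closed as soon as the block ends

try:
    with open(path) as g:
        raise ValueError("something went wrong mid-block")
except ValueError:
    pass

after_error = g.closed       # closed even though the block raised an error

print(inside, after, after_error)
```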

## (U) Reading Lines From Files 

(U) Here are some of the other useful methods for file objects: 

In [19]:
lines_file = open('fewlines.txt', 'w')

In [20]:
lines_file.writelines(["first\n"]) # writelines expects an iterable of strings

In [21]:
lines_file.writelines(["second\n", "third\n"]) 

In [22]:
lines_file.close() 

(U) Similarly: 

In [23]:
lines_file = open('fewlines.txt', 'r') 

In [24]:
lines_file.readline()

'first\n'

In [25]:
lines_file.readline() 

'second\n'

In [26]:
lines_file.readline() 

'third\n'

In [27]:
lines_file.readline()

''

(U) And make sure the file is closed before opening it up again in the next cell.

In [28]:
lines_file.close()

(U) Alternately: 

In [29]:
lines = open('fewlines.txt', 'r').readlines() # Note the plurality 

In [30]:
lines

['first\n', 'second\n', 'third\n']

(U) **Note:** Both `read` and `readline(s)` have optional size arguments that limit how much is read. For `readline`, this may return an incomplete line. For `readlines`, the argument is only a hint: whole lines are returned, and reading stops once their total size exceeds the hint.

(U) But what if the file is very long and you don't need or want to read all of the lines at once? File objects behave as their own iterators.
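
(U) A small sketch of the size arguments in action, assuming standard CPython behavior, where `readlines` treats its argument as a hint and still returns whole lines. It rebuilds the same three lines in a scratch file; the name `sizes_demo.txt` and the variable names are made up for illustration.

```python
import os
import tempfile

# Scratch copy of the three-line file; the name is illustrative only.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sizes_demo.txt")
with open(path, "w") as f:
    f.writelines(["first\n", "second\n", "third\n"])

with open(path) as f:
    chunk = f.read(3)             # at most 3 characters
    rest_of_line = f.readline()   # whatever is left of the first line
    partial = f.readline(3)       # at most 3 characters of the second line

with open(path) as f:
    some_lines = f.readlines(8)   # whole lines only; stops once past the hint

print(repr(chunk), repr(rest_of_line), repr(partial))
print(some_lines)
```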

In [33]:
lines_file = open('fewlines.txt', 'r') 

for line in lines_file: 
    print(line, end="") 

first
second
third


(U) The below syntax is a very common formula for reading through files. Use the `with` keyword to make sure everything goes smoothly. Loop through the file one line at a time, because often our files have one record to a line. And do something with each line. 

In [35]:
with open('fewlines.txt') as my_file:
    for line in my_file: 
        print(line.strip()) # The strip method removes whitespace, including newlines, from both ends

first
second
third


(U) The file was closed upon exiting the `with` block. 
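
(U) To make the "one record to a line" idea concrete, here is a minimal sketch that parses a made-up record format of `name count` per line and skips blank lines. The file name `records_demo.txt`, its contents, and the variable names are all invented for this example.

```python
import os
import tempfile

# Build a tiny made-up data file: one "name count" record per line,
# with a blank line thrown in to show how to skip it.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "records_demo.txt")
with open(path, "w") as f:
    f.write("alpha 3\nbeta 5\n\ngamma 11\n")

totals = {}
with open(path) as records:
    # enumerate gives us a line number alongside each line
    for line_number, line in enumerate(records, start=1):
        line = line.strip()
        if not line:              # skip blank lines
            continue
        name, count = line.split()
        totals[name] = int(count)
        print(line_number, name, count)

print(totals)
```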

## (U) Moving Around With `tell` and `seek` 

(U) The `tell` method returns the current position of the cursor within the file. The `seek` method sets the current position of the cursor within the file.

In [36]:
inputfile = open('data.txt', 'r')

In [37]:
inputfile.tell()

0

In [38]:
inputfile.read(4)

'I am'

In [40]:
inputfile.seek(0)

0

In [41]:
inputfile.tell()

0

In [42]:
inputfile.read()

'I am writing data to my file object'
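
(U) As a further sketch, the following shows `tell` changing as data is read and `seek` jumping to the middle of the file. It opens a scratch copy of the same sentence in binary mode, where positions are plain byte offsets (in text mode the numbers returned by `tell` are best treated as opaque). The file name `seek_demo.txt` and the variable names are illustrative.

```python
import os
import tempfile

# Scratch file holding the same sentence used above; the name is illustrative.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "seek_demo.txt")
with open(path, "w") as f:
    f.write("I am writing data to my file object")

with open(path, "rb") as f:      # binary mode: positions are byte offsets
    start = f.tell()             # cursor starts at the beginning
    first = f.read(4)            # read four bytes...
    after_read = f.tell()        # ...and the cursor has moved by four
    f.seek(5)                    # jump straight to offset 5
    word = f.read(7)
    f.seek(0)                    # rewind to the beginning
    everything = f.read()
    end = f.tell()               # cursor is now at the end of the file

print(start, first, after_read, word, end)
```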

## (U) File-Like Objects 

(U) There are other times when you really need to have data in a file (perhaps because another function requires that it be read from a file). But why waste time and disk space if you already have the data in memory? 

(U) A very useful class for making a string into a file-like object is `StringIO`, found in the `io` module. It takes a string and gives it file methods like `read` and `write`.

In [44]:
import io 

In [45]:
mystringfile = io.StringIO() # For handling bytes, use io.BytesIO

In [46]:
mystringfile.write("This is my data!") # We just wrote to the object, not a filehandle 

16

In [48]:
mystringfile.read() # Cursor is at the end! 
mystringfile.tell()

16

In [49]:
mystringfile.seek(0)

0

In [50]:
mystringfile.read()

'This is my data!'

In [51]:
newstringfile = io.StringIO("My data") # The cursor will automatically be set to 0 

In [52]:
newstringfile.read()

'My data'

(U) Now let's pretend we have a function that expects to read data from a file before it operates on it. This sometimes happens when using library functions. 

In [53]:
def iprintdata(f): 
    print(f.read()) 

In [54]:
iprintdata('mydata') # Grrr! 

AttributeError: 'str' object has no attribute 'read'

In [55]:
my_io = io.StringIO('mydata') 

In [56]:
iprintdata(my_io) # YAY!

mydata
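
(U) For a library example of the same idea, the standard `json.load` function reads from a file-like object rather than from a string. The sketch below feeds it a `StringIO`, then shows `BytesIO` doing the same job for bytes. The JSON text and the byte values are made up for illustration. In everyday code `json.loads` would handle a string directly; `json.load` is used here only because it insists on something with a `read` method.

```python
import io
import json

# Made-up JSON text that we already hold in memory as a string.
text = '{"name": "example", "count": 3}'

# json.load calls .read() on whatever it is given, so a StringIO works.
parsed = json.load(io.StringIO(text))

# BytesIO is the bytes counterpart of StringIO.
raw = io.BytesIO(b"\x00\x01\x02\x03")
header = raw.read(2)         # first two bytes
position = raw.tell()        # the cursor moved, just like with a real file

print(parsed)
print(header, position)
```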


## (U) Lesson Exercises 

### (U) Get the data 

(U) Use the file `sonnet.txt` from the git repository.

### (U) Exercise 1 

(U) Write a function called `file_capitalize()` that takes an input file name and an output file name, then writes each word from the input file with only the first letter capitalized to the output file. Remove all punctuation except apostrophe. 

```python
file_capitalize('sonnet.txt', 'sonnet_caps.txt') # capitalized words written to sonnet_caps.txt
```

"Summer'S"

### (U) Exercise 2 

(U) Write a function called `file_word_count()` that takes a file name and returns a dictionary containing the counts for each word. Remove all punctuation except apostrophe. Lowercase all words. 

```python
file_word_count('sonnet.txt') # {'it': 4, 'me': 2, ... }
```

### (U) Extra Credit 

(U) Write the counts dictionary to a file, one key:value per line. 
