## Input -- Output

**Our goal for today**: to learn how to write open-ended programs, programs that can
respond to the data entered by the user or the data that comes from a file.

+ **Keyboard input** -- the user can enter data when prompted by the program

+ **File input–output** -- the program can read data from or write data to files. 

#### Keyboard input

One way to input data is to request it from the user. That is, you can write
programs that pause at some point and wait for the user to enter data. The code
for this is quite simple: there is a function `input()` that takes a single string
argument and returns what the user types as a string. Here’s an extremely simple
example:

In [3]:
theInput = input('Type something: ')
print('You typed "', theInput, '"',sep = '')

Type something: 35
You typed "35"


The `input()` function prints its string argument to the screen. The program
then waits for the user to type something. Once the user hits the return key, the
program prints the entered text back, enclosed in double quotes.

Notice that the prompt that `input()` prints does not end with a
return or final space by default. The prompt string above ends with an explicit space. If we
wanted to, we could end it with a return instead by putting `\n` in the prompt
string:

In [10]:
theInput = input('Type something:\n')

Type something:
hi


Notice too that whatever the user types is returned as a string. Thus, if the user
types `3`, the result is `'3'`. Hence, if you want the user to enter numbers
or any datatype other than strings, you must include code to convert the input.

In [11]:
#collect two numbers
n1 = input('Enter a number: ')
n2 = input('Enter another number: ')
#convert to integers and add
n3 = int(n1) + int(n2)
#print result
print('The sum is:',n3)

Enter a number: 5
Enter another number: 3
The sum is: 8
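
If the user types something that is not a whole number, `int()` raises a `ValueError` and the program stops. A minimal sketch of one way to guard against that follows. The helper name `to_int` is hypothetical, and the entries are simulated in a list so that the sketch runs without prompting.

```python
def to_int(text):
    """Return text converted to an int, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as 'five' or '3.5'
        return None

# simulated keyboard entries (made up for this illustration)
for entry in ['5', ' 3 ', 'five', '3.5']:
    print(repr(entry), '->', to_int(entry))
```

In a real program, a result of `None` could be the signal to prompt the user again.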


A potential *downside*: the data have to be typed in by hand.

There is a potentially *desirable* aspect of entering data from the keyboard like this: the number and content of each input item can respond to the program's behavior with respect to earlier items.

Let's write a program that is a guessing game for letters of the alphabet. The program randomly
selects a letter and then the user can guess letters. The interesting part of the program is that it gives the user feedback on whether their guess is before or
after the selected letter. Thus the number of and content of each keyboard input
is dependent on the program's response to earlier inputs.

In [13]:
import random
letters = 'abcdefghijklmnopqrstuvwxyz'
#get a random letter
letter = letters[random.randint(0,25)]
#loop until the user guesses correctly
while True:
    #prompt them to type a letter
    guess = input('Type a lower-case letter: ')
    #check that it’s actually a single letter
    if len(guess) != 1 or guess not in letters:
        print("That's not a lower-case letter.")
        continue
    #if they’re right
    if guess == letter:
        print("That's right!")
        break
    #give them a hint if they’re wrong
    if guess > letter:
        print("It's earlier in the alphabet.")
    else:
        print("It's later in the alphabet.")

Type a lower-case letter: 6
That's not a lower-case letter.
Type a lower-case letter: 
That's not a lower-case letter.
Type a lower-case letter: r
It's later in the alphabet.
Type a lower-case letter: t
It's later in the alphabet.
Type a lower-case letter: w
It's later in the alphabet.
Type a lower-case letter: z
It's earlier in the alphabet.
Type a lower-case letter: y
It's earlier in the alphabet.
Type a lower-case letter: x
That's right!


There's a lot of code here, but the structure is fairly simple. First, we import
the `random` module to have access to a random number generator: `randint()`.
This function generates a random integer between the two integer arguments we
give it, including both endpoints. Here the range 0 to 25 covers every index of the
26-character `letters` string, so we can use the output to index into that string,
selecting a single random letter. Notice
that the random number generated by `random.randint()` is immediately fed
as an index to `letters`.
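
As an aside, the upper bound can be tied to the length of the string rather than written out as 25, and the `random` module also provides `random.choice()`, which picks one element of a sequence directly. The sketch below shows both. The variable names `letter_by_index` and `letter_by_choice` are introduced only for this illustration.

```python
import random

letters = 'abcdefghijklmnopqrstuvwxyz'

# index-based selection: randint() includes both endpoints,
# so the upper bound is the last valid index
letter_by_index = letters[random.randint(0, len(letters) - 1)]

# direct selection: choice() returns one element of the sequence
letter_by_choice = random.choice(letters)

print(letter_by_index, letter_by_choice)
```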

We then have an infinite `while` loop with a number of `if` tests. On each pass, we prompt
the user to enter a letter and then test the input. First, we test whether the user
actually entered a letter. If not, we skip to the next iteration and prompt again. We then test whether the user’s letter
matches the selected letter. If so, we let the user know and exit the loop. If it’s a
legal guess and doesn’t match, we tell the user whether the selected letter comes earlier
or later in the alphabet than their guess and continue to the next iteration.
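
The three tests can also be tried out separately from the keyboard loop. The sketch below moves them into a function, hypothetically named `respond()`, so that each branch can be checked with fixed guesses. It also checks that the guess is exactly one character long, which is an assumption of this sketch: without it, an empty string would count as being "in" `letters`.

```python
letters = 'abcdefghijklmnopqrstuvwxyz'

def respond(guess, letter):
    """Return the message the game would print for one guess."""
    # reject anything that is not exactly one lower-case letter
    if len(guess) != 1 or guess not in letters:
        return "That's not a lower-case letter."
    if guess == letter:
        return "That's right!"
    # for lower-case letters, string comparison follows alphabetical order
    if guess > letter:
        return "It's earlier in the alphabet."
    return "It's later in the alphabet."

# fixed guesses against a fixed target letter, for illustration
for g in ['6', 'r', 'z', 'x']:
    print(g, '->', respond(g, 'x'))
```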

There is one context in which `input()` can be awkward. We’ve seen that the
function returns a string which we can then convert to a number if appropriate.
What if we want the user to enter actual Python variables or functions? For example,
imagine we have three variables `x`, `y`, and `z` and we want the user to select
one so that the contents of the variable can be printed. Here’s the incorrect code:

In [14]:
#not what we want!
x = 'Tom'
y = 'Dick'
z = 'Harry'
result = input('Type x, y, or z: ')
print(result)

Type x, y, or z: x
x


To get the right result, we must *evaluate* what the user enters as a Python expression.
This can be done with the function `eval()`. Here is the revised code:

In [15]:
#set up three variables
x = 'Tom'
y = 'Dick'
z = 'Harry'
#collect user input
result = input('Type x, y, or z: ')
#evaluate and print result
print(eval(result))

Type x, y, or z: x
Tom
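
Because `eval()` runs whatever expression the user types, not just the names `x`, `y`, or `z`, it is usually reserved for input you trust. One alternative, sketched below, is to keep the values in a dictionary and look up the typed name. The dictionary `names` and the function `look_up()` are hypothetical names introduced for this illustration, and the entries are simulated so the sketch runs without prompting.

```python
# map each name the user may type to its value
names = {'x': 'Tom', 'y': 'Dick', 'z': 'Harry'}

def look_up(choice):
    """Return the value stored under choice, or None if there is no such name."""
    # strip() removes stray spaces around what was typed
    return names.get(choice.strip())

# simulated user entries
for typed in ['x', ' z ', 'w']:
    print(repr(typed), '->', look_up(typed))
```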


#### File input–output

The usual way to input or output large amounts of data is from or to files. The basic idea is that your program is written to respond to any amount of data. The file contains data of the appropriate sort and your program reads in that data and processes it either all at once or chunk by chunk.

Writing to files is a dangerous operation. If you are not careful, you can accidentally overwrite important data. Work with **copies** of the files.

Let’s begin with writing to a file. The basic logic is that you create a stream or pathway to a file, print to that stream, and then close the stream. Here’s a very simple example of this:

In [7]:
#open the file stream
outFile = open('tfile.txt','w')
#write to it
outFile.write('some text!\n')
outFile.write('...and some more text!\n')
#close the stream
outFile.close()

First, we create a stream called `outFile` with the `open()` function. The first
argument is the name of the file and the second argument, `'w'`, indicates that we are
writing to this file. We then write to that stream twice using the stream method
`write()`. Notice that we’ve explicitly added returns (`\n`) at the end of each
`write()` call so that each bit is on its own line in the file. Notice too
that each successive `write()` call adds to what has already been written through the
open stream. Once we are done
writing to the file, we close the stream with the `close()` method.
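
Python also offers the `with` statement, which closes the stream automatically when the indented block ends, even if an error occurs partway through. The sketch below writes the same two lines that way. It works inside a temporary directory, a choice made here so that running it cannot touch your own files.

```python
import os
import tempfile

# work inside a throwaway directory so no existing file is touched
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tfile.txt')
    # the stream is closed automatically at the end of the with block
    with open(path, 'w') as outFile:
        outFile.write('some text!\n')
        outFile.write('...and some more text!\n')
    # read the file back to confirm what was written
    with open(path, 'r') as inFile:
        written = inFile.read()

print(written, end='')
```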

Again, be careful here. The program above creates a file. If you were
to give this file the same name as an existing file in the same directory, it would
overwrite the existing file, destroying its contents. Make sure to name your new test files in a way
that is unlikely to clash with your existing files.
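
One safeguard is the `'x'` mode of `open()`, which creates a new file and raises `FileExistsError` if a file with that name already exists. The sketch below demonstrates this inside a temporary directory. The variable `overwritten` is introduced here only to record what happened.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tfile.txt')
    # first attempt: the file does not exist yet, so 'x' creates it
    with open(path, 'x') as outFile:
        outFile.write('original contents\n')
    # second attempt: the file now exists, so 'x' refuses to open it
    try:
        with open(path, 'x') as outFile:
            outFile.write('replacement contents\n')
        overwritten = True
    except FileExistsError:
        overwritten = False
    # confirm that the original contents survived
    with open(path, 'r') as inFile:
        contents = inFile.read()

print('overwritten:', overwritten)
print(contents, end='')
```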

Let’s now look at file input. The system is basically the same. You create a file
input stream, read from it, and then close the stream. In the following example,
we read from the file we created in the previous example and print the result to the
screen.

In [9]:
#open file stream
inFile = open('tfile.txt','r')
#read from it
stuff = inFile.read()
#close stream
inFile.close()
#print contents
print(stuff)

some text!
...and some more text!



Notice that the `read()` method reads in the entire contents of the file. If you want
to process the contents of the file in chunks, say lines, this is not optimal. You have
two choices here. One possibility is to break the text into lines after you’ve read
it all in as above. The following program shows how to do this:

In [1]:
#open file
inFile = open('tfile.txt','r')
#read file contents
stuff = inFile.read()
#close file
inFile.close()
#split file contents into lines
lines = stuff.split('\n')
#print lines and their lengths
for line in lines:
    print(len(line),': ',line,sep='')

10: some text!
22: ...and some more text!
0: 


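In this program we read the entire contents of the file in with the `read()` method.
We then use the string method `split()` to break the file contents into lines. We
then go through those lines one by one, calculating their length and printing the
length and line. Because the file ends with a return, `split('\n')` produces an empty
string as the last item, which is why the output ends with a line of length 0.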
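
One way to avoid the empty last entry (the `0:` line in the output above) is the string method `splitlines()`, which does not produce a final empty string when the text ends with a return. The sketch below compares the two on the same text as the file contents. The names `with_split` and `with_splitlines` are introduced for this illustration.

```python
# the same text that tfile.txt contains
stuff = 'some text!\n...and some more text!\n'

# split('\n') leaves an empty string after the final return
with_split = stuff.split('\n')
# splitlines() does not
with_splitlines = stuff.splitlines()

print(with_split)
print(with_splitlines)
```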

The other possibility is to read lines from the stream and process them one by one.

In [2]:
#open file
inFile = open('tfile.txt','r')
#read from stream line by line
for line in inFile:
    #print length of line and the line
    print(len(line),': ',line,sep='',end='')
#close file stream
inFile.close()

11: some text!
23: ...and some more text!


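This second program produces very similar output. The lengths are one greater because
each line read from the stream keeps its final return, and there is no empty final line.
For very large files, this second
approach can be more efficient, since the whole file need not be held in memory at once.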
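
If the return at the end of each line should not be counted, it can be removed with the string method `rstrip()` before measuring. In the sketch below, `io.StringIO` stands in for an open file so that the example runs on its own, and the list `lengths` is introduced only to collect the results.

```python
import io

# io.StringIO behaves like an open text file holding this content
inFile = io.StringIO('some text!\n...and some more text!\n')

lengths = []
for line in inFile:
    # remove the final return before measuring the line
    text = line.rstrip('\n')
    lengths.append(len(text))
    print(len(text), ': ', text, sep='')
inFile.close()
```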

#### Alice in Wonderland

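We now apply these tools to a longer text: a plain-text copy of *Alice in Wonderland*
from Project Gutenberg, saved as `alice.txt`. Our first step is to make sure we can
read in the file. Let’s just do that and count
the lines in the file. Here’s one way to do that: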

In [1]:
#counter for lines
count = 0
#open the file
f = open('alice.txt','r')
#read the file line by line
for line in f:
    count += 1
#close the file
f.close()
#print the number of lines
print('lines:',count)

lines: 1702


Let’s now save all the lines in a list:

In [2]:
#counter for lines
count = 0
#list for contents of lines
lines = []
#open the file
f = open('alice.txt','r')
#read it line by line
for line in f:
    #add 1 to the counter
    count += 1
    #add the current line to the list
    lines.append(line)
#close the file
f.close()
#print the number of lines read
print('lines:', count)
#print the number of lines saved
print('saved lines:', len(lines))

lines: 1702
saved lines: 1702


In this latter example, we have created an empty list and then added the lines one
by one to the end of that list. At the end of the program we print out the number of
lines read and the number of lines in the list. If we’ve done things correctly, those
two numbers should be the same. This is good programming practice generally.
Print out the values of things as you proceed so that you can make sure the program
is behaving as you intend.

Let’s now print out the first few lines:

In [3]:
#list to save the lines
lines = []
#open the file
f = open('alice.txt','r')
#read it line by line
for line in f:
    #save each line in the list
    lines.append(line)
#close the file
f.close()
#print the first 100 lines
i = 0
while i < 100:
    print(lines[i])
    i += 1

The Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll



This eBook is for the use of anyone anywhere at no cost and with

almost no restrictions whatsoever.  You may copy it, give it away or

re-use it under the terms of the Project Gutenberg License included

with this eBook or online at www.gutenberg.org





Title: Alice in Wonderland



Author: Lewis Carroll



Illustrator: Gordon Robinson



Release Date: August 12, 2006 [EBook #19033]



Language: English





*** START OF THIS PROJECT GUTENBERG EBOOK ALICE IN WONDERLAND ***









Produced by Jason Isbell, Irma Spehar, and the Online

Distributed Proofreading Team at http://www.pgdp.net



















          [Illustration: Alice in the Room of the Duchess.]





                       _THE "STORYLAND" SERIES_







                   ALICE'S ADVENTURES IN WONDERLAND















                     SAM'L GABRIEL SONS & COMPANY



                               NEW YORK







                        

The result here is not quite right; each line is printed out with an extra line in
between. The problem is that when lines are read in, they’re read in with their final return character. The `print()` function supplies another return and we get
each line terminated by two return characters. We can get the behavior we want by
telling `print()` not to append a return.

In [4]:
#list to save lines
lines = []
#open file
f = open('alice.txt','r')
#read line by line
for line in f:
    #save lines to list
    lines.append(line)
#close file
f.close()
#print first 100 lines
i = 0
while i < 100:
    #don’t add a return to the line!
    print(lines[i],end='')
    i += 1

The Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org


Title: Alice in Wonderland

Author: Lewis Carroll

Illustrator: Gordon Robinson

Release Date: August 12, 2006 [EBook #19033]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK ALICE IN WONDERLAND ***




Produced by Jason Isbell, Irma Spehar, and the Online
Distributed Proofreading Team at http://www.pgdp.net









          [Illustration: Alice in the Room of the Duchess.]


                       _THE "STORYLAND" SERIES_



                   ALICE'S ADVENTURES IN WONDERLAND







                     SAM'L GABRIEL SONS & COMPANY

                               NEW YORK



                           Copyright, 1916,

                   by SAM'L GABRIEL

Notice now that the lines we’re printing out are not part of the Alice story, but are
part of a header that Project Gutenberg has added to the file. By playing around
with the number of lines we print out, we can see that the header is 64 lines
long. Our next version of the program removes this header and then prints out the
beginning of the story:

In [5]:
#list for lines
lines = []
#open file
f = open('alice.txt','r')
#read lines one by one
for line in f:
    #add lines to list
    lines.append(line)
#close file
f.close()
#strip off first 64 lines
lines = lines[64:]
#now print the first 50 lines
i = 0
while i < 50:
    #still don’t add a return!
    print(lines[i],end='')
    i += 1


ALICE'S ADVENTURES IN WONDERLAND

[Illustration]




I--DOWN THE RABBIT-HOLE


Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do. Once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, "and what is the use of a book," thought Alice, "without pictures or
conversations?"

So she was considering in her own mind (as well as she could, for the
day made her feel very sleepy and stupid), whether the pleasure of
making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.

There was nothing so very remarkable in that, nor did Alice think it so
very much out of the way to hear the Rabbit say to itself, "Oh dear! Oh
dear! I shall be too late!" But when the Rabbit actually took a watch
out of its waistcoat-pocket and looked at it and then hurried on, Alice
started to her feet, for it flashed across

Let’s now do some analysis of the lexical content of the file. As a very simple example,
let’s imagine that we are interested in whether there is a correlation between
word length and word frequency. To do this, we must break each line into words
and then compute the length of each word. We’ll keep track of the number of
words we see of each length.

The next version of the program breaks each line into words and then stores all
the words in a list.

In [6]:
#list of all words
words = []
#list of all lines
lines = []
#open the file
f = open('alice.txt','r')
#save the lines one by one
for line in f:
    lines.append(line)
#close the file
f.close()
#remove Gutenberg header
lines = lines[64:]
#go through the lines one by one
for line in lines:
    #break each line into words
    wds = line.split()
    #add the words to the list
    words = words + wds
#print the first 100 words
i = 0
while i < 100:
    print(i,words[i])
    i += 1

0 ALICE'S
1 ADVENTURES
2 IN
3 WONDERLAND
4 [Illustration]
5 I--DOWN
6 THE
7 RABBIT-HOLE
8 Alice
9 was
10 beginning
11 to
12 get
13 very
14 tired
15 of
16 sitting
17 by
18 her
19 sister
20 on
21 the
22 bank,
23 and
24 of
25 having
26 nothing
27 to
28 do.
29 Once
30 or
31 twice
32 she
33 had
34 peeped
35 into
36 the
37 book
38 her
39 sister
40 was
41 reading,
42 but
43 it
44 had
45 no
46 pictures
47 or
48 conversations
49 in
50 it,
51 "and
52 what
53 is
54 the
55 use
56 of
57 a
58 book,"
59 thought
60 Alice,
61 "without
62 pictures
63 or
64 conversations?"
65 So
66 she
67 was
68 considering
69 in
70 her
71 own
72 mind
73 (as
74 well
75 as
76 she
77 could,
78 for
79 the
80 day
81 made
82 her
83 feel
84 very
85 sleepy
86 and
87 stupid),
88 whether
89 the
90 pleasure
91 of
92 making
93 a
94 daisy-chain
95 would
96 be
97 worth
98 the
99 trouble


This program does indeed get all the words, but it doesn’t strip out irrelevant punctuation.
Words are returned with adjacent punctuation such as periods, commas, and question marks.
We need to strip these away before doing our counts if we want an accurate
picture of the relationship between word length and frequency.

There are better ways to do this that we’ll learn in the next few weeks. Our approach here
will be to go through each word character by character, counting only alphabetic
characters and not counting anything else. To make this easier, we first convert
words to lowercase.
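
The counting step can be sketched on its own before it is built into the full program. The function name `count_letters()` and the variable `alphabet` are hypothetical names introduced here, and the sample words are taken from the word list printed above.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def count_letters(word):
    """Count the alphabetic characters in word, ignoring everything else."""
    count = 0
    # lowercase first so that one alphabet string covers both cases
    for ch in word.lower():
        if ch in alphabet:
            count += 1
    return count

# a few words with attached punctuation
for w in ["ALICE'S", 'bank,', 'I--DOWN', 'conversations?"']:
    print(w, count_letters(w))
```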

To test this idea and make sure it’s doing the right thing for us, we’ll first write
some code that does this for the first 100 words and displays the output for us. If
this works, we then scale it up for all the words in the book. Here’s a program that
shows how to do this:

In [7]:
#list of all words
words = []
#list of all lines
lines = []
#open the file
f = open('alice.txt','r')
#save the lines one by one
for line in f:
    lines.append(line)
#close the file
f.close()
#remove Gutenberg header
lines = lines[64:]
#go through the lines one by one
for line in lines:
    #break each line into words
    wds = line.split()
    #add the words to the list
    words = words + wds
#print the first 100 words
#and their letter counts
i = 0
while i < 100:
    #store the count for the current word
    count = 0
    #convert the current word to lowercase
    word = words[i].lower()
    #go through the word letter by letter
    #if letter is lowercase, add 1 to count
    for l in word:
        if l in 'abcdefghijklmnopqrstuvwxyz':
            count += 1
    #print it all out
    print(i,words[i],count)
    i += 1

0 ALICE'S 6
1 ADVENTURES 10
2 IN 2
3 WONDERLAND 10
4 [Illustration] 12
5 I--DOWN 5
6 THE 3
7 RABBIT-HOLE 10
8 Alice 5
9 was 3
10 beginning 9
11 to 2
12 get 3
13 very 4
14 tired 5
15 of 2
16 sitting 7
17 by 2
18 her 3
19 sister 6
20 on 2
21 the 3
22 bank, 4
23 and 3
24 of 2
25 having 6
26 nothing 7
27 to 2
28 do. 2
29 Once 4
30 or 2
31 twice 5
32 she 3
33 had 3
34 peeped 6
35 into 4
36 the 3
37 book 4
38 her 3
39 sister 6
40 was 3
41 reading, 7
42 but 3
43 it 2
44 had 3
45 no 2
46 pictures 8
47 or 2
48 conversations 13
49 in 2
50 it, 2
51 "and 3
52 what 4
53 is 2
54 the 3
55 use 3
56 of 2
57 a 1
58 book," 4
59 thought 7
60 Alice, 5
61 "without 7
62 pictures 8
63 or 2
64 conversations?" 13
65 So 2
66 she 3
67 was 3
68 considering 11
69 in 2
70 her 3
71 own 3
72 mind 4
73 (as 2
74 well 4
75 as 2
76 she 3
77 could, 5
78 for 3
79 the 3
80 day 3
81 made 4
82 her 3
83 feel 4
84 very 4
85 sleepy 6
86 and 3
87 stupid), 6
88 whether 7
89 the 3
90 pleasure 8
91 of 2
92 making 6
93 a 1
94 daisy-ch

If you inspect the output of this program, you’ll see that it does get the correct
letter count for the first 100 words. Given that that part is doing the right thing, we
can now scale up to doing this for all words and saving the results. What we want
is to know how many words there are of each length. To do this, we construct a
dictionary which we’ll use to store the number of words we’ve seen for each word
length. If, for example, we were to call this dictionary `wordlengths`, we would
have the number of words that are two letters long in `wordlengths[2]`, etc.

The following program implements this idea:

In [8]:
#list of all words
words = []
#list of all lines
lines = []
#dictionary of all word lengths
wordlengths = {}
#open the file
f = open('alice.txt','r')
#save the lines one by one
for line in f:
    lines.append(line)
#close the file
f.close()
#remove Gutenberg header
lines = lines[64:]
#go through the lines one by one
for line in lines:
    #break each line into words
    wds = line.split()
    #add the words to the list
    words = words + wds
for wd in words:
    #store the count for the current word
    count = 0
    #convert the current word to lowercase
    word = wd.lower()
    #go through the word letter by letter
    #if letter is lowercase, add 1 to count
    for l in word:
        if l in 'abcdefghijklmnopqrstuvwxyz':
            count += 1
    #check if we’ve seen this length already
    if count in wordlengths:
        #if so add 1
        wordlengths[count] += 1
    else:
        #if not, set to 1
        wordlengths[count] = 1
#print out counts for each word length
for c in wordlengths:
    print(c,wordlengths[c])

6 1002
10 243
2 1989
12 73
5 1491
3 3093
9 358
4 2436
7 845
8 367
13 16
1 505
11 140
15 9
14 13
0 44
19 2
23 2
18 2
16 2


Finally, let’s have the program save the results in a file:

In [9]:
#list of all words
words = []
#list of all lines
lines = []
#dictionary of all word lengths
wordlengths = {}
#open the file
f = open('alice.txt','r')
#save the lines one by one
for line in f:
    lines.append(line)
#close the file
f.close()
#remove Gutenberg header
lines = lines[64:]
#go through the lines one by one
for line in lines:
    #break each line into words
    wds = line.split()
    #add the words to the list
    words = words + wds
for wd in words:
    #store the count for the current word
    count = 0
    #convert the current word to lowercase
    word = wd.lower()
    #go through the word letter by letter
    #if letter is lowercase, add 1 to count
    for l in word:
        if l in 'abcdefghijklmnopqrstuvwxyz':
            count += 1
    #check if we’ve seen this length already
    if count in wordlengths:
        #if so add 1
        wordlengths[count] += 1
    else:
        #if not, set to 1
        wordlengths[count] = 1
#open output file
g = open('resalice.txt','w')
#print out counts for each word length
for c in wordlengths:
    clen = str(wordlengths[c])
    res = str(c) + ': ' + clen + '\n'
    g.write(res)
#close output file
g.close()
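
A quick way to check the saved results is to read the file back in. The sketch below does this for three of the counts from the output shown earlier, writing them in sorted order. It uses a temporary directory so that no existing file is overwritten, and the variable `saved` is introduced only for this check.

```python
import os
import tempfile

# three of the counts from the earlier output
wordlengths = {6: 1002, 10: 243, 2: 1989}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'resalice.txt')
    # write one 'length: count' line per word length, shortest first
    with open(path, 'w') as g:
        for c in sorted(wordlengths):
            g.write(str(c) + ': ' + str(wordlengths[c]) + '\n')
    # read the file back to check what was saved
    with open(path, 'r') as f:
        saved = f.read()

print(saved, end='')
```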