# Chapter 7 Lecture Notes

Please read chapter 7 of the textbook.

These notes take 1 - 3 lecture hours to cover.

## While Loops

A while-loop is a loop that continues to execute as long as a given condition is
met. The simplest kind of while-loop is an **infinite loop**:

```python
while True:
    print('hello')
    print('world')
```

This will print "hello" and "world" forever, never stopping.

Usually we want the while-loop condition to eventually become `False`. For example, this prints "hello" 5 times:

```python
count = 0
while count < 5:  # while loop header
    print('hello')
    count += 1
```

The **while-loop header** consists of the keyword `while`, a boolean condition,
and a colon. The code body under the while-loop should be consistently indented
(just as for if-statements, for-loops, and function bodies).

You don't need to know ahead of time how many times a while-loop will execute.
For instance, this keeps asking the user to enter a password until they type
`swordfish`:

```python
password = ''
while password != 'swordfish':
    password = input('Enter password: ')

# at this point outside the loop
# password == 'swordfish' is true
```

When this code is run, it could go like this:

```
Enter password: hello
Enter password: sword fish
Enter password: Swordfish
Enter password: swordfish
```

Depending on what the user types, the program could loop any number of times.

In this related example, the computer keeps asking the user to enter a *quit*
command. The user must type either "quit" or "done", and a separate message is
printed when the command is not recognized:

```python
command = input('Enter command: ')
while not (command == 'quit' or command == 'done'):
    print(f'Command "{command}" not recognized.')
    command = input('Enter command: ')

# at this point outside the loop
# command == 'quit' or command == 'done'
```
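
The loop condition above can be written in a few equivalent ways. The sketch below compares three of them; the function names `keep_asking_v1`, `keep_asking_v2`, and `keep_asking_v3` are made up for this illustration, and the third form uses a tuple, which these notes have not introduced:

```python
def keep_asking_v1(command):
    # the form used in the example above
    return not (command == 'quit' or command == 'done')

def keep_asking_v2(command):
    # De Morgan's law: not (A or B) is the same as (not A) and (not B)
    return command != 'quit' and command != 'done'

def keep_asking_v3(command):
    # membership test on a tuple of the accepted commands
    return command not in ('quit', 'done')

for command in ['quit', 'done', 'exit', '']:
    print(repr(command), keep_asking_v1(command),
          keep_asking_v2(command), keep_asking_v3(command))
```

All three functions return the same value for every command, so which one to use is mostly a matter of readability.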

## Converting a for-loop to a while-loop

This for-loop prints the numbers 1 to 500:

```python
for i in range(500):
    print(i + 1)
```

This can be re-written as a while-loop:

```python
i = 0             # initialize i to 0
while i < 500:    # only run the body if i < 500
    print(i + 1)  # print the number
    i += 1        # increment i by 1
```

We call `i` the **loop variable**, or the **loop index**. We first set it to 0
outside the loop, and then explicitly increment at the end of the loop using the
statement `i += 1`.

More generally, a for-loop of this form:

```python
for i in range(start, end, step):
    # ... do stuff ...
```

can be rewritten as a while-loop (assuming `step` is positive):

```python
i = start
while i < end:
    # ... do stuff ...
    i += step
```
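
One way to convince yourself that the two forms match is to collect the values of `i` that each loop produces and compare them. This is only a sketch: the helper names `for_values` and `while_values` are invented here, and it uses a list with `append`, which may not have been covered yet. It also shows that the `i < end` test only works for a positive step:

```python
def for_values(start, end, step):
    """Collect the values of i produced by the for-loop."""
    values = []
    for i in range(start, end, step):
        values.append(i)
    return values

def while_values(start, end, step):
    """Collect the values of i produced by the while-loop translation."""
    values = []
    i = start
    while i < end:
        values.append(i)
        i += step
    return values

print(for_values(2, 11, 3))     # [2, 5, 8]
print(while_values(2, 11, 3))   # [2, 5, 8]

# With a negative step the two disagree: the while-loop condition
# would have to be flipped to i > end.
print(for_values(10, 0, -3))    # [10, 7, 4, 1]
print(while_values(10, 0, -3))  # [] because 10 < 0 is False
```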

## Choosing between for-loops and while-loops

In Python, it's usually simpler to use a for-loop. In a for-loop, the index
variable is automatically initialized and incremented. In a while-loop, you have
to do this yourself, and it is easy to make mistakes.

However, a ranged for-loop like the kind we've been using needs to know ahead of time 
how many times it will execute. If you don't know how many times the loop will execute, 
then you use a while-loop. 

For example, we can't re-write this code using a ranged for-loop because we
don't know ahead of time how many times the user will enter a password:

```python
password = ''
while password != 'swordfish':  # must be a while loop
    password = input('Enter password: ')
```

So, in Python, the rule is: use a for-loop when you can, and a while-loop if you
must (because you don't know how many times the loop will execute).

## Loops and Strings

Strings appear in many programming problems, and so it is useful to learn how to
process them.

You can use a for-loop to process the characters in a string one at a time:

In [3]:
for letter in "hello!":
    print(letter)

h
e
l
l
o
!


Or you could put a `-`  after each letter:

In [2]:
for letter in "hello!":
    print(letter, end='-')

h-e-l-l-o-!-

The name of the variable in the for-loop is `letter`. While you can use any name
you like, you should try to use a name that conveys some meaning about the code.
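
The `end='-'` loop above leaves a trailing `-` after the last letter. If that is not wanted, one option is the string method `join`, which puts the separator only *between* the characters. The sketch below also shows a loop that builds the same string by hand:

```python
word = "hello!"

# str.join puts the separator between the characters, not after each one
print('-'.join(word))   # h-e-l-l-o-!

# the same result with a for-loop, building up the string piece by piece
result = ''
for letter in word:
    if result != '':    # add a separator before every letter except the first
        result += '-'
    result += letter
print(result)           # h-e-l-l-o-!
```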

## Searching Strings

The Python `in` operator can be used to search for a substring in a string:

In [2]:
s = 'hello!'
if 'lo' in s:
    print('contains "lo"')
if 'eel' in s:
    print('contains "eel"')
if '?' not in s:
    print('does not contain "?"')


contains "lo"
does not contain "?"


We can use a for-loop and if-statement to check if a string contains a letter.
For example:

In [4]:
for letter in 'mumble':
    if letter == 'm' or letter == 'M':
        print('this word has an "m"')

this word has an "m"
this word has an "m"


The message is printed once for each occurrence of `m` in the string.

We can write it as a function and pass in any word we like:

In [5]:
def has_m_print(word):
    for letter in word:
        if letter == 'm' or letter == 'M':
            print('this word has an "m"')

has_m_print('mumble')  # two m's
has_m_print('apple')   # no m's
has_m_print('')        # no m's, empty string

this word has an "m"
this word has an "m"


A better way to implement this is as a boolean function that returns `True` if
there's an `m` in `word`, and `False` otherwise:

In [6]:
def has_m(word):
    for letter in word:
        if letter == 'm' or letter == 'M':
            return True
    
    return False

print(has_m('mumble'))  # True
print(has_m('apple'))   # False
print(has_m(''))        # False, empty string

True
False
False


We can also simplify the if-statement by using `letter.lower()`. `lower()`
returns the lowercase version of `letter`.

In [7]:
def has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
    
    return False

print(has_m('mumble'))  # True
print(has_m('apple'))   # False
print(has_m(''))        # False, empty string

True
False
False


`has_m` is an example of a **pure function**. A pure function is a function
whose output depends only on its input and that has no side effects, such as
printing to the screen, opening a file, or modifying a global variable.

In general, pure functions are easier to understand and debug than impure ones,
and you should strive to write pure functions whenever possible.
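
To make the difference concrete, here is a rough sketch that puts an impure function next to a pure one. The names `impure_has_m`, `pure_has_m`, and `call_count` are invented for this illustration, and the `global` statement is used only to demonstrate a side effect:

```python
call_count = 0  # global variable, here only to illustrate a side effect

def impure_has_m(word):
    """Impure: modifies a global variable and prints to the screen."""
    global call_count
    call_count += 1
    found = 'm' in word.lower()
    print(f'checked {word!r}: {found}')
    return found

def pure_has_m(word):
    """Pure: the result depends only on word, and nothing else changes."""
    return 'm' in word.lower()

impure_has_m('mumble')
impure_has_m('apple')
print(call_count)            # 2
print(pure_has_m('mumble'))  # True
```

Calling `pure_has_m` with the same word always gives the same result and leaves the rest of the program untouched, which is what makes it easier to test.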

### Some Incorrect Functions

Note that this version of the function is incorrect:

In [10]:
def bad_has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
        else:               # wrong!
            return False

print(bad_has_m('bumble'))  # False, incorrect!
print(bad_has_m('apple'))   # False
print(bad_has_m(''))        # None, incorrect!

False
False
None


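The problem with `bad_has_m` is that it checks if the first letter is `m`, and
then immediately returns either `True` or `False`. When a `return` statement
runs, the function stops executing, and so no letters after the first one are
checked. For the empty string the loop body never runs at all, so the function
reaches its end without a `return` and Python returns `None`.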

This version is also incorrect:

In [12]:
def another_bad_has_m(word):
    for letter in word:
        return letter.lower() == 'm'  # wrong!

print(another_bad_has_m('bumble'))  # False, incorrect!
print(another_bad_has_m('apple'))   # False
print(another_bad_has_m(''))        # None, incorrect!

False
False
None


Again, the problem with `another_bad_has_m` is that as soon as the `return`
statement runs the function stops executing. So it only checks the first letter
of the word, and it returns `None` for the empty string because the loop body
never runs.
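
Bugs like these are easy to miss if you only test words that start with `m`. One possible way to catch them is to run each version against a few words whose correct answer is known. This is just a sketch: the list `tests` and the helper `count_failures` are made up for this example, and it passes a function as an argument, which may be new:

```python
def has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
    return False

def bad_has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
        else:               # wrong!
            return False

# (word, expected result) pairs; 'bumble' and '' are the interesting cases
tests = [('mumble', True), ('bumble', True), ('apple', False), ('', False)]

def count_failures(f):
    """Returns how many of the test words f gets wrong."""
    failures = 0
    for word, expected in tests:
        if f(word) != expected:
            failures += 1
    return failures

print(count_failures(has_m))      # 0
print(count_failures(bad_has_m))  # 2
```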

### The `in` Operator

The Python `in` operator can be used to search for a substring in a string.
`'m' in word` evaluates to `True` if `word` contains an `m`, and `False`
otherwise:

In [1]:
s = "hello!"
if 'lo' in s:
    print('contains "lo"')
if 'eel' in s:
    print('contains "eel"')
if '?' not in s:
    print('does not contain "?"')

contains "lo"
does not contain "?"


So we could shorten `has_m` to:

In [8]:
def has_m(word):
    return 'm' in word.lower()

print(has_m('mumble'))  # True
print(has_m('apple'))   # False
print(has_m(''))        # False, empty string

True
False
False


## Reading the Lines of a Text File

Reading a text file in Python follows these steps:
- *Open* the text file for reading.
- *Read* the lines of the text file one at a time, and do whatever you want with
  the lines.
- *Close* the text file.

The file [words.txt](words.txt) contains a list of words, one per line. To read
the contents we must first open it with the `open` function:

In [None]:
file_object = open('words.txt')

The `file_object` has a method called `readline` that reads the next line of the
file. So we can get and print the first three lines of the file like this:

In [6]:
file_object = open('words.txt')

print(file_object.readline())
print(file_object.readline())
print(file_object.readline())

aa

aah

aahed


Each word is followed by a blank line because `readline()` includes the newline
character at the end of each line, and `print` then adds a newline of its own.
We can remove the extra blank lines by using the `strip` method:

In [17]:
file_object = open('words.txt')

print(file_object.readline().strip())
print(file_object.readline().strip())
print(file_object.readline().strip())

aa
aah
aahed


Note that we call `open` again so that the file will be read from the beginning.
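
To see exactly what `readline` returns, it can help to print the line with `repr`, which shows the newline character as `\n`. In this sketch, `io.StringIO` is used as a stand-in for `words.txt` so that the code runs without the file; it supports `readline` in the same way as an open text file:

```python
import io

# io.StringIO behaves like an open text file, so no real file is needed here
file_object = io.StringIO('aa\naah\naahed\n')

line = file_object.readline()
print(repr(line))          # 'aa\n'
print(repr(line.strip()))  # 'aa'

# strip() removes whitespace from both ends of the string;
# rstrip('\n') removes only newline characters at the end
print(repr('  aa \n'.strip()))        # 'aa'
print(repr('  aa \n'.rstrip('\n')))   # '  aa '
```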

Another way to read the lines of a file is to use a for-loop. The file
[small_story.txt](small_story.txt) contains the following text:

```
Once upon a time,
the world was full of magic.
Everyone was constantly 
doing card tricks.
```

In [22]:
file_object = open('small_story.txt')

for line in file_object:
    print(line.strip())

Once upon a time,
the world was full of magic.
Everyone was constantly
doing card tricks.


### Closing Files

Notice that we have *not* explicitly closed the file in these examples. Python
automatically closes `file_object` when the program ends. If you want to close
the file sooner, use the `close` method:

```python
file_object.close()
```
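
Another common option, not used elsewhere in these notes, is the `with` statement, which closes the file automatically when its block ends. The sketch below first writes a small temporary stand-in file (the path and its contents are assumptions made so the example runs anywhere) and then reads it back:

```python
import os
import tempfile

# Create a small stand-in file so the example does not need small_story.txt
path = os.path.join(tempfile.mkdtemp(), 'small_story.txt')
with open(path, 'w') as out_file:
    out_file.write('Once upon a time,\nthe world was full of magic.\n')

# The file is closed automatically when the with-block ends
with open(path) as file_object:
    for line in file_object:
        print(line.strip())

print(file_object.closed)  # True
```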

## Looping Over the Lines of a File

The for-loop way of reading a text file goes through the lines of the file one
at a time, which has many useful applications.

For example, we can count the number of lines in a file:

In [23]:
file_object = open('words.txt')

word_count = 0
for line in file_object:
    word_count += 1
print(word_count)

113783


The variable `word_count` is sometimes called an **accumulator variable**. It's
initialized to 0, and then every time a line of the file is read it's
incremented by 1.

This line adds 1 to `word_count`:

```python
word_count += 1
```

Another way of doing the same thing is:

```python
word_count = word_count + 1
```

Both statements do the same thing, but generally `+=` is preferred because it's
shorter.

You can add, subtract, multiply, or divide using `+=`, `-=`, `*=`, or `/=`:

In [24]:
n = 3

n += 7  # add 7 to n, n is now 10
n -= 2  # subtract 2 from n, n is now 8
n *= 3  # multiply n by 3, n is now 24
n /= 4  # divide n by 4, n is now 6.0

print(n)  # 6.0

6.0


### Counting Words with an 'm'

Now let's count how many words in [words.txt](words.txt) contain one or more
`m` characters:

In [26]:
file_object = open('words.txt')
total_m_words = 0
for line in file_object:
    if has_m(line):
        total_m_words += 1

print(f'{total_m_words} words have an "m"')

22472 words have an "m"


If we also count the number of words then we can calculate the percentage of
words that have an `m`:

In [9]:
file_object = open('words.txt')
total_words = 0
total_m_words = 0
for line in file_object:
    total_words += 1
    if has_m(line):
        total_m_words += 1

pct = 100 * total_m_words / total_words
print(f'{pct:.1f}% of the words have an "m"')

19.7% of the words have an "m"
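
The same counting pattern can be wrapped in a function. The following is a sketch, not part of the original notes: the name `percent_with_letter` is hypothetical, and a short list of strings stands in for the file so the code runs without `words.txt`. It also guards against dividing by zero when there are no lines:

```python
def percent_with_letter(lines, letter):
    """Returns the percentage of lines that contain letter (ignoring case).
    lines can be a list of strings or an open file object.
    """
    total = 0
    matches = 0
    for line in lines:
        total += 1
        if letter.lower() in line.lower():
            matches += 1
    if total == 0:
        return 0.0   # avoid dividing by zero when there are no lines
    return 100 * matches / total

sample = ['mumble\n', 'apple\n', 'Moon\n', 'kiwi\n']
print(f"{percent_with_letter(sample, 'm'):.1f}%")  # 50.0%
```

Passing `open('words.txt')` instead of `sample` should work the same way, since a for-loop can iterate over the lines of a file object.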


## Generalizing has_m

`has_m` is not a very useful function since it only searches for the letter `m`.
Here is a more useful function:

In [11]:
def uses_any(word, letters):
    """Returns True just when any of the letters in letters are in word.
    Otherwise it returns False. Converts all letters to lowercase.
    """
    for c in word.lower():
        if c in letters.lower():
            return True
    return False

print(uses_any('mumble', 'm'))     # True
print(uses_any('apple', 'aeiou'))  # True
print(uses_any('8675309', '1234')) # True
print(uses_any('apple', ' \n\t'))  # False

True
True
True
False


We can use it to write other useful functions, for instance:

In [12]:
def has_vowel(word):
    return uses_any(word, 'aeiou')

print(has_vowel('mumble'))  # True
print(has_vowel('!!!'))     # False

True
False


Let's use `has_vowel` to find all the words in [words.txt](words.txt) that don't
have any vowels:

In [14]:
file_object = open('words.txt')
total = 0
for line in file_object:
    # if not uses_any(line, 'aeiouy'):
    if not has_vowel(line):
        total += 1
        print(line.strip())

print(f'{total} words have no vowels')

by
byrl
byrls
bys
crwth
crwths
cry
crypt
crypts
cwm
cwms
cyst
cysts
dry
dryly
drys
fly
flyby
flybys
flysch
fry
ghyll
ghylls
glycyl
glycyls
glyph
glyphs
gym
gyms
gyp
gyps
gypsy
hymn
hymns
hyp
hyps
lymph
lymphs
lynch
lynx
my
myrrh
myrrhs
myth
myths
nth
nymph
nymphs
phpht
pht
ply
pry
psst
psych
psychs
pygmy
pyx
rhythm
rhythms
rynd
rynds
sh
shh
shy
shyly
sky
sly
slyly
spry
spryly
spy
sty
stymy
sylph
sylphs
sylphy
syn
sync
synch
synchs
syncs
syzygy
thy
thymy
try
tryst
trysts
tsk
tsks
tsktsk
tsktsks
typp
typps
typy
why
whys
wry
wryly
wych
wynd
wynds
wynn
wynns
xylyl
xylyls
xyst
xysts
107 words have no vowels


`has_vowel` doesn't count `y` as a vowel, but let's suppose we want to count `y`
as a vowel. Then we can modify the code like this:

In [36]:
file_object = open('words.txt')
total = 0
for line in file_object:
    if 'y' not in line and not has_vowel(line):
        total += 1
        print(line.strip())

print(f'{total} words have no vowels')

crwth
crwths
cwm
cwms
nth
phpht
pht
psst
sh
shh
tsk
tsks
tsktsk
tsktsks
14 words have no vowels
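
An alternative is to call `uses_any` directly with `y` added to the list of letters. The sketch below repeats `uses_any` so that it runs on its own; `has_vowel_or_y` is a hypothetical name, not a function defined elsewhere in these notes:

```python
def uses_any(word, letters):
    """Returns True when any of the letters in letters are in word."""
    for c in word.lower():
        if c in letters.lower():
            return True
    return False

def has_vowel_or_y(word):
    """Hypothetical variant of has_vowel that treats y as a vowel."""
    return uses_any(word, 'aeiouy')

for w in ['crwth', 'rhythm', 'Yes', 'tsk']:
    print(w, has_vowel_or_y(w))
```

Because `uses_any` converts everything to lowercase, this version also treats an uppercase `Y` as a vowel, which the test `'y' not in line` does not.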


## Case Study: Spelling Bee

[Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee) is a word puzzle
where you are given 7 different letters and must find as many words of 4 or
more letters as possible. You are allowed to repeat letters.

One of the letters is the "center" letter, and it must appear in every word you
make. It's guaranteed that there is at least one word that uses all seven
letters.

Words are scored as follows:

- Words that *don't* contain the required letter, or contain any letters not in
  the puzzle, score 0.
- Words with 3 or fewer letters score 0.
- Words with 4 or more letters score 1 point per letter.
- A word that uses all seven letters is called a *pangram* and is worth 7 bonus
  points (it still gets 1 point per letter).

The goal is to score as many points as possible.

For example, suppose the seven letters are SIGLENP, and the required letter is
G. Some words are:

- PIGS, 4 points
- GILLS, 5 points
- SINGLE, 6 points
- SLEEPING, 8 + 7 = 15 points

Some words don't score any points:

- I, IN, LIP, and PIG don't score points because they have fewer than 4 letters.
- LINE, PINES, and SLIPS don't score points because they don't use the G.

If you are a human, then you probably solve this problem by trying different
combinations of letters to see if they form words. How many words can you find?

But with a computer, we can use a brute-force approach: we can check *every*
word in the English language, or at least those listed in [words.txt](words.txt),
to see if it can be formed with the given letters.

Since [words.txt](words.txt) has over 113,000 words, no human could do this. But
a computer can!

So let's write a program that solves the Spelling Bee problem by scanning through
all the words in [words.txt](words.txt) and calculating their score.

### The Word Score Function

First, let's write `word_score`, which calculates the score of a word according
to the rules given above:

In [23]:
def pangram_bonus(word, puzzle_letters):
    """Returns 7 if every character in puzzle_letters is in word.
    Otherwise, no bonus and it returns 0.
    """
    for c in puzzle_letters:
        if c not in word:
            return 0
    return 7


def word_score(word, required, puzzle_letters):
    """Returns the score for word according to these rules:
    - 3-letters or fewer: 0 points
    - 4-letters or more: 1 point per letter
    - A word that uses all seven letters is called a *pangram* and is worth 7
      bonus points (in addition to the 1 point per letter).
    """
    #
    # first check if the word is too short or doesn't contain the required
    # letter
    #
    if required not in word or len(word) < 4:
        return 0
    
    #
    # check that word only has letters that appear in puzzle_letters
    #
    for c in word:
        if c not in puzzle_letters:
            return 0
    
    #
    # if we get here, the word is long enough and contains the required letters
    #
    return len(word) + pangram_bonus(word, puzzle_letters)


print(word_score('TAPE', 'P', 'CIHPETA'))      # 4
print(word_score('PATH', 'P', 'CIHPETA'))      # 4
print(word_score('CHEAP', 'P', 'CIHPETA'))     # 5
print(word_score('PATCH', 'P', 'CIHPETA'))     # 5
print(word_score('APPETITE', 'P', 'CIHPETA'))  # 8
print(word_score('PATHETIC', 'P', 'CIHPETA'))  # 15


4
4
5
5
8
15
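
For comparison, the same scoring rules could also be expressed with Python sets. This is only a sketch of an alternative: `word_score_sets` is a hypothetical name, and sets and conditional expressions are not covered in these notes. It is applied to the SIGLENP example words from the rules section, using one point per letter as `word_score` does:

```python
def word_score_sets(word, required, puzzle_letters):
    """Hypothetical set-based variant of word_score."""
    letters_used = set(word)        # the distinct letters in the word
    allowed = set(puzzle_letters)   # the distinct letters in the puzzle

    # too short, or missing the required letter
    if len(word) < 4 or required not in letters_used:
        return 0

    # <= on sets means "is a subset of": every letter used must be allowed
    if not letters_used <= allowed:
        return 0

    # a pangram uses every puzzle letter at least once
    bonus = 7 if letters_used == allowed else 0
    return len(word) + bonus

for w in ['PIG', 'LINE', 'PIGS', 'GILLS', 'SINGLE', 'SLEEPING']:
    print(w, word_score_sets(w, 'G', 'SIGLENP'))
```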


### The Spelling Bee Solver

Now we can use `word_score` to score all the words in [words.txt](words.txt):

In [24]:
# example puzzle
# puzzle_letters = 'SIGLENP'
# required_letter = 'G'

# June 20, 2024
puzzle_letters = 'CHIPATE'
required_letter = 'P'

# June 20, 2024
# puzzle_letters = 'LGXNCEI'
# required_letter = 'I'

total_score = 0    # sum of all the scores
scoring_words = 0  # number of words that score points
file_object = open('words.txt')
for w in file_object:
    # remove the newline character at the end of the word
    # also convert it to upper case
    w = w.strip().upper()
    
    score = word_score(w, required_letter, puzzle_letters)
    total_score += score
    if score > 0:
        scoring_words += 1
        print(f'{w}, {score} points')

print()
print(f'Total score: {total_score}')
print(f'Total scoring words: {scoring_words}')

ACCEPT, 6 points
ACCEPTEE, 8 points
APACE, 5 points
APACHE, 6 points
APATETIC, 8 points
APATHETIC, 16 points
APATITE, 7 points
APHETIC, 14 points
APHTHA, 6 points
APHTHAE, 7 points
APIECE, 6 points
APPETITE, 8 points
CAPE, 4 points
CAPH, 4 points
CAPITA, 6 points
CAPITATE, 8 points
CHAP, 4 points
CHAPE, 5 points
CHAPT, 5 points
CHEAP, 5 points
CHEAPIE, 7 points
CHEEP, 5 points
CHIP, 4 points
CHIPPIE, 7 points
EPACT, 5 points
EPEE, 4 points
EPHA, 4 points
EPHAH, 5 points
EPIC, 4 points
EPITAPH, 7 points
EPITHET, 7 points
ETAPE, 5 points
HAPTIC, 6 points
HEAP, 4 points
HEPATIC, 14 points
HEPATICA, 15 points
HEPATICAE, 16 points
HEPCAT, 6 points
HIPPIE, 6 points
ICECAP, 6 points
IPECAC, 6 points
PACA, 4 points
PACE, 4 points
PACHA, 5 points
PACT, 4 points
PAPA, 4 points
PAPPI, 5 points
PATACA, 6 points
PATCH, 5 points
PATE, 4 points
PATH, 4 points
PATHETIC, 15 points
PATTEE, 6 points
PATTIE, 6 points
PEACE, 5 points
PEACH, 5 points
PEAT, 4 points
PECH, 4 points
PECTATE, 7 points
PECTIC, 6 points
...

The code finds 202 words for a total score of 1324, and runs almost instantly on
my computer.

## Questions

1. Write an infinite while-loop that prints 1, 2, 3, 4, 5, ... forever.

2. Write a while-loop that prints the numbers from 100 down to 1, and then
   prints "Blastoff!".

3. Re-write this for-loop using a while-loop:

   ```python
   for d in range(2, n):
       if n % d == 0:
           print(f'{n} is divisible by {d}')
   ```

4. Is this a correct implementation of the `has_m` function (i.e. a function
   that returns `True` if `word` has an `m`, and `False` otherwise)?

   ```python
   def has_m(word):
       for letter in word:
           if letter.lower() == 'm':
               return True
       
           return False
   ```

5. Is this a correct implementation of the `has_m` function (i.e. a function
   that returns `True` if `word` has an `m`, and `False` otherwise)?

   ```python
   def has_m(word):
       for letter in word:
           if letter.lower() != 'm':  # != instead of ==
               return False           # False instead of True
       
           return True                # True instead of False
   ```

6. What does each `print` statement print?

   ```python
   a = 5
   a = a + 1
   print(a)

   b = 5
   b = 1 + b
   print(b)

   c = 5
   c += 1
   print(c)

   d = 5
   d += d
   print(d)

   e = 5
   e *= e
   print(e)

   f = 5
   f -= f
   print(f)

   g = 5
   g /= g
   print(g)

   h = 5
   h = h += 1
   print(h)
   ```

7. In the `uses_any` function, does the *order* of the characters in `letters`
   matter?

8. Why does this code print a blank line between the words?

   ```python
   file_object = open('words.txt')

   print(file_object.readline())
   print(file_object.readline())
   print(file_object.readline())
   ```

   How can you prevent the extra lines from being printed?