# Chapter 7 Lecture Notes

Please read chapter 7 of the textbook.

These notes take 1 - 3 lecture hours to cover.

## Loops and Strings

Strings appear in many programming problems, and so it is useful to learn how to
process them.

You can use a for-loop to access the characters in a string one at a time:

In [3]:
for letter in "hello!":
    print(letter)

h
e
l
l
o
!


Or:

In [2]:
for letter in "hello!":
    print(letter, end='-')

h-e-l-l-o-!-

The name of the variable in the for-loop is `letter`. While you can use any name
you like, you should try to use a name that conveys some meaning about the code.

## Searching Strings

The Python `in` operator tests if a given substring is in a string:

In [1]:
s = "hello!"
if 'lo' in s:
    print('contains "lo"')         # printed
if 'eel' in s:
    print('contains "eel"')        # not printed
if '?' not in s:
    print('does not contain "?"')  # printed

contains "lo"
does not contain "?"


We can use a for-loop and if-statement to manually check if a string contains a
letter. For example:

In [4]:
for letter in 'mumble':
    if letter == 'm' or letter == 'M':
        print('this word has an "m"')

this word has an "m"
this word has an "m"


The message is printed once for each occurrence of `m` in the string.

We can write it as a function and pass in any word we like:

In [2]:
def has_m_print(word):
    for letter in word:
        if letter == 'm' or letter == 'M':
            print('this word has an "m"')

has_m_print('mumble')  # two m's
has_m_print('apple')   # no m's
has_m_print('')        # no m's, empty string

this word has an "m"
this word has an "m"


A better way to implement this is as a boolean function that returns `True` if
there's an `m` in `word`, and `False` otherwise:

In [3]:
def has_m(word):
    for letter in word:
        if letter == 'm' or letter == 'M':
            return True
    
    return False

print(has_m('mumble'))  # True
print(has_m('apple'))   # False
print(has_m(''))        # False, empty string

True
False
False


We can also simplify the if-statement by using `letter.lower()`. `lower()`
returns the lowercase version of `letter`.

In [7]:
def has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
    
    return False

print(has_m('mumble'))  # True
print(has_m('apple'))   # False
print(has_m(''))        # False, empty string

True
False
False


`has_m` is an example of a **pure function**: a function whose output depends
only on its input, and does not have any side effects, such as printing to the
screen, opening a file, or modifying a global variable.

In general, pure functions are easier to understand and debug than impure ones,
and you should strive to write pure functions whenever possible.
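
As a rough illustration of the difference, here is a small sketch that contrasts a pure function with an impure one. The names `count_m`, `count_m_impure`, and `total` are made up for this example and are not used elsewhere in these notes.

```python
def count_m(word):
    """Pure: the result depends only on word, and nothing else is changed."""
    count = 0
    for letter in word:
        if letter.lower() == 'm':
            count += 1
    return count


total = 0  # global variable that the impure function below modifies

def count_m_impure(word):
    """Impure: modifies the global variable total and prints to the screen."""
    global total
    for letter in word:
        if letter.lower() == 'm':
            total += 1
    print(total)


print(count_m('mumble'))  # 2
print(count_m('mumble'))  # 2, the same input always gives the same output
count_m_impure('mumble')  # prints 2
count_m_impure('mumble')  # prints 4, the result depends on earlier calls
```

Calling `count_m` twice with the same word gives the same answer both times, while the result of `count_m_impure` depends on what happened before the call.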

### Some Incorrect Functions

Note that this version of the function is incorrect:

In [10]:
def bad_has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
        else:               # wrong!
            return False

print(bad_has_m('bumble'))  # False, incorrect!
print(bad_has_m('apple'))   # False
print(bad_has_m(''))        # None, incorrect!

False
False
None


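The problem with `bad_has_m` is that it checks if the first letter is `m`, and
then immediately returns either `True` or `False`. When a `return` statement
runs, the function stops executing, and so no letters after the first one are
checked. For the empty string the loop body never runs, so the function reaches
its end without executing a `return`, and Python returns `None`.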

This version is also incorrect:

In [12]:
def another_bad_has_m(word):
    for letter in word:
        return letter.lower() == 'm'  # wrong!

print(another_bad_has_m('bumble'))  # False, incorrect!
print(another_bad_has_m('apple'))   # False
print(another_bad_has_m(''))        # None, incorrect!

False
False
None


Again, the problem with `another_bad_has_m` is that as soon as a `return`
statement runs the function stops executing. So it only checks the first letter
of the word, and for the empty string it returns `None`.
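
One way to catch mistakes like these is to compare a function's results against answers you already know. The sketch below does that with a made-up helper called `check`. It is only an illustration, not a full testing approach.

```python
def has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
    return False


def bad_has_m(word):
    for letter in word:
        if letter.lower() == 'm':
            return True
        else:               # wrong!
            return False


def check(f, word, expected):
    """Returns True if f(word) equals expected, and False otherwise."""
    return f(word) == expected


print(check(has_m, 'bumble', True))      # True
print(check(has_m, '', False))           # True
print(check(bad_has_m, 'bumble', True))  # False, bad_has_m returned False
print(check(bad_has_m, '', False))       # False, bad_has_m returned None
```

Words where the `m` is not the first letter, and the empty string, are the cases that expose the bug.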

### Using the `in` Operator

We could shorten `has_m` to:

In [13]:
def has_m(word):
    return 'm' in word.lower()

print(has_m('mumble'))  # True
print(has_m('apple'))   # False
print(has_m(''))        # False, empty string

True
False
False
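
The call to `lower()` matters because `in` is case-sensitive. A short sketch, using `'MUMBLE'` as an arbitrary example word:

```python
word = 'MUMBLE'
print('m' in word)          # False, 'm' and 'M' are different characters
print('M' in word)          # True
print('m' in word.lower())  # True, word.lower() is 'mumble'
print(word)                 # MUMBLE, lower() does not change word itself
```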


## Reading the Lines of a Text File

To read the lines of a text file in Python we do the following:
- *Open* the text file for reading.
- *Read* the lines of the text file one at a time, and do whatever you want with
  the lines.
- *Close* the text file.

Let's try this out with the file [words.txt](words.txt), a list of words with one
per line. To read the contents we must first open it with the `open` function:

In [None]:
file_object = open('words.txt')

`open` returns a special file object that lets us access the opened file. In
particular, `file_object` has a method called `readline` that reads the next line
of the file. So we can print the first three lines of the file like this:

In [4]:
file_object = open('words.txt')

print(file_object.readline())  # prints first line of file
print(file_object.readline())  # prints second line of file
print(file_object.readline())  # prints third line of file

aa

aah

aahed



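There is a blank line after each word because `readline()` includes the newline
character, `\n`, at the end of each line, and `print` then adds a newline of its
own. We can remove the extra lines by using the `strip` method, which returns a
copy of the string with the whitespace removed from both ends: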

In [None]:
file_object = open('words.txt')

print(file_object.readline().strip())  # prints first line of file
print(file_object.readline().strip())  # prints second line of file
print(file_object.readline().strip())  # prints third line of file

aa
aah
aahed
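
To see the newline character itself, you can print the `repr` of a line. The sketch below uses `io.StringIO` as a stand-in for an opened `words.txt`, which is an assumption made only so that the example runs without the file.

```python
import io

# io.StringIO makes an in-memory object that behaves like an open text file;
# here it stands in for words.txt
file_object = io.StringIO('aa\naah\naahed\n')

line = file_object.readline()
print(repr(line))                    # 'aa\n'
print(repr(line.strip()))            # 'aa'
print(len(line), len(line.strip()))  # 3 2
```

The newline counts as one character, so the stripped line is one character shorter.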


Another way to read the lines of a file is to use a for-loop. The file
[small_story.txt](small_story.txt) contains the following text:

```
Once upon a time,
the world was full of magic.
Everyone was constantly 
doing card tricks.
```

In [22]:
file_object = open('small_story.txt')

for line in file_object:
    print(line.strip())

Once upon a time,
the world was full of magic.
Everyone was constantly
doing card tricks.


### Closing Files

When you are done with a file, you should close it to free up computer
resources. Python automatically closes `file_object` when the program ends. But
if you want to close the file sooner, use the `close` method:

```python
file_object.close()
```
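
Python also has a `with` statement that closes the file for you when its block ends. The sketch below first writes a small temporary file so that it runs on its own. The file name `example.txt` and the variables `folder`, `path`, and `out_file` are made up for this example.

```python
import os
import tempfile

# create a small temporary file so this sketch runs on its own
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')
with open(path, 'w') as out_file:
    out_file.write('aa\naah\naahed\n')

# the file is closed automatically when the with-block ends
with open(path) as file_object:
    for line in file_object:
        print(line.strip())

print(file_object.closed)  # True
```

The rest of these notes keep using plain `open`, which is fine for short programs.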

## Looping Over the Lines of a File

The for-loop way of reading a text file goes through the lines of the file one
at a time, which has many useful applications.

For example, we can count the number of lines in a file:

In [23]:
file_object = open('words.txt')

word_count = 0
for line in file_object:
    word_count += 1
print(word_count)

113783


The variable `word_count` is sometimes called an **accumulator variable**. It's
initialized to 0, and then every time a line of the file is read it's
incremented by 1.

This line adds 1 to `word_count`:

```python
word_count += 1
```

Another way of doing the same thing is:

```python
word_count = word_count + 1
```

Both statements do the same thing, but generally `+=` is preferred because it's
shorter.

You can add, subtract, multiply, or divide using `+=`, `-=`, `*=`, or `/=`:

In [24]:
n = 3

n += 7  # add 7 to n, n is now 10
n -= 2  # subtract 2 from n, n is now 8
n *= 3  # multiply n by 3, n is now 24
n /= 4  # divide n by 4, n is now 6.0

print(n)  # 6.0

6.0
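
The same pattern works with some other operators, and with strings as well as numbers. This is a brief sketch of a few of them, not a complete list:

```python
n = 17
n //= 5   # integer division, n is now 3
n **= 2   # exponentiation, n is now 9
n %= 4    # remainder, n is now 1
print(n)  # 1

s = 'ab'
s += 'c'  # append 'c', s is now 'abc'
s *= 2    # repeat twice, s is now 'abcabc'
print(s)  # abcabc
```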


### Counting Words with an 'm'

Now let's count how many words in [words.txt](words.txt) contain one or more
`m` characters:

In [26]:
file_object = open('words.txt')
total_m_words = 0
for line in file_object:
    if has_m(line):
        total_m_words += 1

print(f'{total_m_words} words have an "m"')

22472 words have an "m"


If we also count the total number of words then we can calculate the percentage
of words that have an `m`:

In [30]:
file_object = open('words.txt')
total_words = 0
total_m_words = 0
for line in file_object:
    total_words += 1
    if has_m(line):
        total_m_words += 1

pct = 100 * total_m_words / total_words
print(f'{pct:.1f}% of the words have an "m"')

19.7% of the words have an "m"
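
The same calculation can be wrapped in a function. The sketch below uses a made-up function name, `percent_with_m`, and uses `io.StringIO` in place of an opened file so that it runs without `words.txt`. It also handles an empty file, which would otherwise cause a division by zero.

```python
import io

def has_m(word):
    return 'm' in word.lower()


def percent_with_m(lines):
    """Returns the percentage of lines that contain an m.
    Returns 0.0 if there are no lines, to avoid dividing by zero.
    """
    total_words = 0
    total_m_words = 0
    for line in lines:
        total_words += 1
        if has_m(line):
            total_m_words += 1

    if total_words == 0:
        return 0.0
    return 100 * total_m_words / total_words


sample = io.StringIO('mumble\napple\nhymn\ncat\n')
print(f'{percent_with_m(sample):.1f}%')  # 50.0%
print(percent_with_m(io.StringIO('')))   # 0.0
```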


## Generalizing has_m

`has_m` is not a very useful function since it only searches for the letter `m`.
Here is a more useful function:

In [5]:
def uses_any(word, letters):
    """Returns True just when one, or more, of the characters in 
    letters are in word. Otherwise it returns False. 
    Converts all letters to lowercase.
    """
    for letter in word.lower():
        if letter in letters.lower():
            return True
    return False

print(uses_any('mumble', 'm'))     # True
print(uses_any('apple', 'aeiou'))  # True
print(uses_any('8675309', '1234')) # True
print(uses_any('apple', ' \n'))    # False

True
True
True
False


We can use it to write other useful functions, for instance:

In [33]:
def has_vowel(word):
    return uses_any(word, 'aeiou')

print(has_vowel('mumble'))  # True
print(has_vowel('!!!'))     # False

True
False


Let's use `has_vowel` to find all the words in [words.txt](words.txt) that don't
have any vowels:

In [35]:
file_object = open('words.txt')
total = 0
for line in file_object:
    if not has_vowel(line):
        total += 1
        print(line.strip())

print(f'{total} words have no vowels')

by
byrl
byrls
bys
crwth
crwths
cry
crypt
crypts
cwm
cwms
cyst
cysts
dry
dryly
drys
fly
flyby
flybys
flysch
fry
ghyll
ghylls
glycyl
glycyls
glyph
glyphs
gym
gyms
gyp
gyps
gypsy
hymn
hymns
hyp
hyps
lymph
lymphs
lynch
lynx
my
myrrh
myrrhs
myth
myths
nth
nymph
nymphs
phpht
pht
ply
pry
psst
psych
psychs
pygmy
pyx
rhythm
rhythms
rynd
rynds
sh
shh
shy
shyly
sky
sly
slyly
spry
spryly
spy
sty
stymy
sylph
sylphs
sylphy
syn
sync
synch
synchs
syncs
syzygy
thy
thymy
try
tryst
trysts
tsk
tsks
tsktsk
tsktsks
typp
typps
typy
why
whys
wry
wryly
wych
wynd
wynds
wynn
wynns
xylyl
xylyls
xyst
xysts
107 words have no vowels


`has_vowel` doesn't count `y` as a vowel, but let's suppose we want to count `y`
as a vowel. Then we can modify the code like this:

In [36]:
file_object = open('words.txt')
total = 0
for line in file_object:
    if 'y' not in line and not has_vowel(line):
        total += 1
        print(line.strip())

print(f'{total} words have no vowels')

crwth
crwths
cwm
cwms
nth
phpht
pht
psst
sh
shh
tsk
tsks
tsktsk
tsktsks
14 words have no vowels
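
Another option is to put the `y` check into its own function built on `uses_any`. The name `has_vowel_or_y` is made up for this sketch:

```python
def uses_any(word, letters):
    for letter in word.lower():
        if letter in letters.lower():
            return True
    return False


def has_vowel_or_y(word):
    """Returns True if word contains a, e, i, o, u, or y."""
    return uses_any(word, 'aeiouy')


print(has_vowel_or_y('rhythm'))  # True
print(has_vowel_or_y('crwth'))   # False
```

Because `uses_any` converts everything to lowercase, this version also treats an uppercase `Y` as a vowel, which the `'y' not in line` test above does not.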


## Case Study: Spelling Bee

[Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee) is a word puzzle
where you are given 7 different letters and must find as many 4-letter or longer
words as possible. You are allowed to repeat letters.

One of the letters is the "center" letter, and it must appear in any word you
make. It's guaranteed that there is at least one word that uses all seven
letters.

Words are scored as follows:

- Words that *don't* contain the required letter don't score any points.
- Words with 3 or fewer letters don't score any points.
- A 4-letter word is worth 1 point.
- A word with 5 or more letters is worth 1 point per letter.
- A word that uses all seven letters is called a *pangram* and is worth 7 extra
  points.

The goal is to score as many points as possible.

For example, suppose the seven letters are SIGLENP, and the required letter is
G. Some words are:

- PIGS, 1 point
- GILLS, 5 points
- SINGLE, 6 points
- SLEEPING, 8 + 7 = 15 points

Some words don't score any points:

- I, IN, LIP, and PIG don't score points because they have fewer than 4 letters.
- LINE, PINES, and SLIPS don't score points because they don't use the G.

If you are a human, then you probably solve this problem by trying different
combinations of letters to see if they form words. How many words can you find?

But with a computer, we can use a **brute-force** approach: we can check *every*
word in the English language, or at least those listed in
[words.txt](words.txt), to see if it can be formed with the given letters.

Since [words.txt](words.txt) has over 113,000 words, no human could do this. But
a computer can!

So let's write a program that solves the Spelling Bee problem by scanning through
all the words in [words.txt](words.txt) and calculating their scores.

### The Word Score Function

First, let's write a function `word_score` that calculates the score of a word
according to the rules given above:

In [6]:
def pangram_bonus(word, letters):
    """Returns 7 if every character in letters is in word.
    Otherwise it returns 0.
    """
    for c in letters:
        if c not in word:
            return 0
    return 7


def word_score(word, required, letters):
    """Returns the score for word according to these rules:
    - Words that don't contain the required letter, or that use a character
      not in letters, don't score any points.
    - Words with 3 or fewer letters don't score any points.
    - A 4-letter word is worth 1 point.
    - A word with 5 or more letters is worth 1 point per letter.
    - A word that uses all seven letters is called a *pangram* and is worth 7
      extra points.
    """
    #
    # first check if the word is too short or doesn't contain the required
    # letter
    #
    if required not in word or len(word) < 4:
        return 0
    
    #
    # check that all the letters in word are in letters
    #
    for c in word:
        if c not in letters:
            return 0
    
    #
    # if we get here, the word is long enough, contains the required letter,
    # and uses only the given letters
    #
    if len(word) == 4:
        return 1
    else:
        return len(word) + pangram_bonus(word, letters)

print(word_score('PIG', 'G', 'SIGLENP'))      # 0
print(word_score('SLIPS', 'G', 'SIGLENP'))    # 0
print(word_score('PIGS', 'G', 'SIGLENP'))     # 1
print(word_score('GILLS', 'G', 'SIGLENP'))    # 5
print(word_score('SINGLE', 'G', 'SIGLENP'))   # 6
print(word_score('SLEEPING', 'G', 'SIGLENP')) # 15

0
0
1
5
6
15


### The Spelling Bee Solver

Now we can use `word_score` to score all the words from [words.txt](words.txt):

In [33]:
# sample
letters = 'SIGLENP'
required_letter = 'G'

# June 20, 2024
# letters = 'LGXNCEI'
# required_letter = 'I'

total_score = 0    # sum of all the scores
scoring_words = 0  # number of words that score points
file_object = open('words.txt')
for w in file_object:
    # remove the newline character at the end of the word
    # and convert it to upper case
    w = w.strip().upper()
    
    score = word_score(w, required_letter, letters)
    total_score += score
    if score > 0:
        scoring_words += 1
        print(f'{w}, {score} points')

print()
print(f'Total score: {total_score}')
print(f'Total scoring words: {scoring_words}')

EGGING, 6 points
EGGS, 1 points
EGIS, 1 points
EGISES, 6 points
ELEGIES, 7 points
ELEGISE, 7 points
ELEGISES, 8 points
ELEGISING, 9 points
ENGINE, 6 points
ENGINES, 7 points
ENGINING, 8 points
ENGS, 1 points
ENISLING, 8 points
ENSIGN, 6 points
ENSIGNS, 7 points
ENSILING, 8 points
EPIGENE, 7 points
ESPIEGLE, 8 points
GEEING, 6 points
GEES, 1 points
GEESE, 5 points
GELEE, 5 points
GELEES, 6 points
GELLING, 7 points
GELS, 1 points
GENE, 1 points
GENES, 5 points
GENESES, 7 points
GENESIS, 7 points
GENIE, 5 points
GENIES, 6 points
GENII, 5 points
GENIP, 5 points
GENIPS, 6 points
GENS, 1 points
GENSENG, 7 points
GENSENGS, 8 points
GIEING, 6 points
GIEN, 1 points
GIES, 1 points
GIGGING, 7 points
GIGGLE, 6 points
GIGGLES, 7 points
GIGGLING, 8 points
GIGS, 1 points
GILL, 1 points
GILLIE, 6 points
GILLIES, 7 points
GILLING, 7 points
GILLS, 5 points
GINGELI, 7 points
GINGELIES, 9 points
GINGELIS, 8 points
GINGELLIES, 10 points
GINGILI, 7 points
GINGILIS, 8 points
GINNING, 7 points
GINNINGS, 8 points
...

The code finds 202 words for a total score of 1324, and runs almost instantly on
my computer.
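
If you want to reuse the solver with different letters, one option is to wrap the loop in a function. The sketch below is one possible arrangement, not the only one. The name `solve` and the short `sample` list are made up, `word_score` is a compact restatement of the function above, and the list of lines stands in for an opened `words.txt`.

```python
def word_score(word, required, letters):
    """Compact restatement of word_score from above."""
    if len(word) < 4 or required not in word:
        return 0
    for c in word:
        if c not in letters:
            return 0
    if len(word) == 4:
        return 1
    bonus = 7
    for c in letters:
        if c not in word:
            bonus = 0  # not a pangram
    return len(word) + bonus


def solve(lines, required, letters):
    """Returns the total score, the number of scoring words, and the
    highest-scoring word found in lines.
    """
    total_score = 0
    scoring_words = 0
    best_word = ''
    best_score = 0
    for line in lines:
        w = line.strip().upper()
        score = word_score(w, required, letters)
        total_score += score
        if score > 0:
            scoring_words += 1
        if score > best_score:
            best_word = w
            best_score = score
    return total_score, scoring_words, best_word


sample = ['pig\n', 'pigs\n', 'gills\n', 'line\n', 'single\n', 'sleeping\n']
print(solve(sample, 'G', 'SIGLENP'))  # (27, 4, 'SLEEPING')
```

Passing `open('words.txt')` instead of `sample` would score the whole word list.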

## Questions

1. Is this a correct implementation of the `has_m` function (i.e. a function
   that returns `True` if `word` has an `m`, and `False` otherwise)?

   ```python
   def has_m(word):
       for letter in word:
           if letter.lower() == 'm':
               return True
       
           return False
   ```

2. Is this a correct implementation of the `has_m` function (i.e. a function
   that returns `True` if `word` has an `m`, and `False` otherwise)?

   ```python
   def has_m(word):
       for letter in word:
           if letter.lower() != 'm':  # != instead of ==
               return False           # False instead of True
       
           return True                # True instead of False
   ```

3. What does each `print` statement print?

   ```python
   a = 5
   a = a + 1
   print(a)

   b = 5
   b = 1 + b
   print(b)

   c = 5
   c += 1
   print(c)

   d = 5
   d += d
   print(d)

   e = 5
   e *= e
   print(e)

   f = 5
   f -= f
   print(f)

   g = 5
   g /= g
   print(g)

   h = 5
   h = h += 1
   print(h)
   ```

4. In the `uses_any` function, does the *order* of the characters in `letters`
   matter?

5. Why does this code print a blank line between the words?

   ```python
   file_object = open('words.txt')

   print(file_object.readline())
   print(file_object.readline())
   print(file_object.readline())
   ```

   How can you prevent the extra lines from being printed?