# Strings

## Readings

- Chapter 8 ( + 9) of Think Python
- Chapter 7 of Python for Everybody

## Review

### Review 1: Build a string using `+=`

In [2]:
dog = "Lucy"
cat = "DOT"
chicken = "KFC"

sentence = ""
sentence += "I have a dog named " + dog
sentence += " a cat named " + cat
sentence += " and a chicken named " + chicken
print(sentence)

# TODO: print the length of sentence using len
print(len(sentence))

I have a dog named Lucy a cat named DOT and a chicken named KFC
63


### Learning Objectives:
- Compare two strings by hand using < or > 
- Recognize common string methods, explain what they do, and use them in Python code
- Define the term sequence, name common sequence operations, and explain how a string is a sequence
- Index and slice strings using correct syntax, including positive and negative indices
- Read and write code that uses a for loop to iterate over a string

### Compare two strings by hand using `<`, `>`, `==`, or `!=` 

<div>
<img src="attachment:string%20comparison.png" width="600"/>
</div>

In [3]:
print("cat" != "dog") # use !=
print("cat" == "dog") # TODO: use ==
print("cat" < "dog") # TODO: use <
print("11" < "2") # strings of digits are compared character by character, so "1" < "2" decides

True
False
True
True


### String comparison:

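Strings are compared one character at a time, using each character's numeric code from the ASCII table (Python uses Unicode code points, which match ASCII for these characters): https://simple.wikipedia.org/wiki/ASCII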
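The built-in `ord` function returns the numeric code of a single character, so it can be used to check by hand why one character comes before another. A small sketch:

```python
# ord() returns the numeric code of a one-character string
print(ord(" "))   # 32
print(ord("1"))   # 49
print(ord("2"))   # 50
print(ord("H"))   # 72
print(ord("h"))   # 104

# comparing two one-character strings gives the same answer as comparing their codes
print(("H" < "h") == (ord("H") < ord("h")))   # True
```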

#### Special cases

1. Upper case comes before lower case.
2. Strings of digits are compared one character at a time, not as numbers.
3. A prefix comes before any longer string that starts with it (when one string runs out of characters first, the shorter string is the smaller one).

In [4]:
print("H" < "h")                 # upper case comes before lower case
print("dorm room" < "dormroom")  # space comes before 'r' in the ASCII table
print("base" < "baseball")       # strings that end come before strings that continue, 
                                 # that is no character comes before some character
print("11" < "2")                # strings of digits are compared one character at a time

True
True
True
True


You keep the comparison going until you find the first non-matching character.

In [5]:
print("doo doo" < "dog") # "o" comes after "g"

False
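The by-hand procedure can be sketched as a function. This is only an illustration of the rule described above, not how Python implements `<` internally; the names `first_difference` and `less_than` are made up for this example.

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ,
    or None if one string is a prefix of the other (or they are equal)."""
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            return i
    return None


def less_than(a, b):
    """Mimic a < b for strings by comparing character codes."""
    i = first_difference(a, b)
    if i is None:
        # no differing character: the string that ends first is smaller
        return len(a) < len(b)
    return ord(a[i]) < ord(b[i])


print(less_than("doo doo", "dog"))   # False, "o" comes after "g"
print(less_than("base", "baseball")) # True, "base" ends first
print(less_than("11", "2"))          # True, "1" comes before "2"
```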


### String methods

- Strings have built-in functions that belong to the string type itself
- These are called methods; you call them with a '.' after the string, similar to how you call functions in a module

<div>
<img src="attachment:string%20methods.png" width="600"/>
</div>

In [6]:
print(dog.upper())     
print(dog.lower())
print(dog) # calling a method on a string does not change the original variable's value

LUCY
lucy
Lucy


So, how do you update the original variable?

In [9]:
dog = dog.upper()
print(dog)

LUCY


`dog.upper()` is equivalent to `str.upper(dog)`. Programmers rarely use the latter form because `str` is redundant: it is already clear that the variable `dog` stores a string.

In [8]:
str.upper(dog)

'LUCY'

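Stripping removes whitespace (spaces, tabs, newlines) from the ends of a string; whitespace inside the string is kept. `strip` trims both ends, `lstrip` only the left end, and `rstrip` only the right end.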

In [10]:
some_word = "       A       B\nC      "
some_word

'       A       B\nC      '

In [11]:
print(some_word)  # recall that print function formats the string and only 
                  # displays the formatted output

       A       B
C      


In [12]:
# TODO: call strip method
some_word.strip()

'A       B\nC'

In [13]:
# TODO: call lstrip method
some_word.lstrip()

'A       B\nC      '

In [14]:
# TODO: call rstrip method
some_word.rstrip()

'       A       B\nC'

The `find` method returns the index where the first match of the search string begins, or -1 if there is no match.

- `find` requires a search string as argument. 
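Because -1 is also a valid negative index, it is worth checking the result of `find` before using it. A minimal sketch, using a hypothetical variable `course`:

```python
course = "220 is Awesome!"   # hypothetical example string

position = course.find("Awe")
if position != -1:
    print("found at index", position)   # found at index 7
else:
    print("not found")

# find returns -1 for a missing search string instead of raising an error
missing = course.find("xyz")
print(missing)   # -1
```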

In [15]:
some_str = "220 is Awesome!"
print(some_str)

220 is Awesome!


In [20]:
print(some_str.find("1"))   # doesn't exist, hence returns -1
print(some_str.find("2"))   
print(some_str.find("0")) 
print(some_str.find("A")) 
print(some_str.find("some")) 

# TODO: try to find "awe": does it work? How can you make it work?
print(some_str.lower().find("awe"))
print(some_str.upper().find("AWE"))

# TODO: discuss: what method can you invoke prior to invoking find method to successfully find "awe"?


-1
0
2
7
10
7
7


In [23]:
print(some_str.startswith("220"))
print(some_str.startswith("319"))
print(some_str.endswith("some!"))
print(some_str.endswith("Awesome!"))

True
False
True
True


`replace` replaces all matching occurrences and returns a new string.

`string_to_update.replace(search_string, replacement_string)`

In [24]:
print(some_str.replace("e", "E"))
print(some_str.replace("3", "three"))

220 is AwEsomE!
220 is Awesome!


String methods can be called on literals.

In [1]:
print("Hello".replace("H", "h"))
print("Meet me at the bike racks".replace('e', 'o'))

hello
Moot mo at tho biko racks


The `format` method lets us put placeholders (`{}`) in a string; each placeholder is replaced, in order, with the value of the corresponding argument.

In [28]:
email = "Dear {}, your grade for exam1 is {}"
print(email.format("Viyan", "A"))

# TODO: give yourself or your friend some grade
score = "Dear {}, your score for midterm1 is {}"
print(score.format("Yushan", "100"))


Dear Viyan, your grade for exam1 is A
Dear Yushan, your score for midterm1 is 100


In [None]:
# TODO: what will happen when you pass only one argument to format method using email string?
print(???) 
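One way to explore this question without stopping the notebook is to wrap the call in `try`/`except`. A sketch, assuming the same `email` template as above; the variable `message` is introduced only for this example:

```python
email = "Dear {}, your grade for exam1 is {}"

try:
    # two placeholders but only one argument
    message = email.format("Viyan")
except IndexError as err:
    message = None
    print("IndexError:", err)
```

Python raises `IndexError` here because there is no argument to fill the second placeholder.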

### Sequence

- Definition: a sequence is a collection of numbered/ordered values
- String: a sequence of one-character strings
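A few operations work on every sequence, strings included. A brief sketch with a hypothetical variable `course`:

```python
course = "220 is Awesome!"   # hypothetical example string

length = len(course)         # number of items in the sequence
first = course[0]            # indexing: one item
has_is = "is" in course      # membership test
longer = course + "!!"       # concatenation builds a new sequence

print(length)   # 15
print(first)    # 2
print(has_is)   # True
print(longer)   # 220 is Awesome!!!
```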

<div>
<img src="attachment:sequences.png" width="600"/>
</div>

In [29]:
# TODO: find length of some_str
print(some_str)

220 is Awesome!


### Indexing

- enables you to extract one item in your sequence, that is one character in a string
- Syntax: string_var`[index]`
    - index needs to be in range: from `0` to `len(string_var) - 1`, or from `-len(string_var)` to `-1` when counting from the end
    - other index values will produce `IndexError`
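The relationship between positive and negative indices can be checked directly. A small sketch; the variable `n` is introduced only for this example:

```python
day = "Friday"
n = len(day)   # 6

# a negative index i refers to the same character as index len(day) + i
print(day[-1] == day[n - 1])   # True, both are "y"
print(day[-n] == day[0])       # True, both are "F"

# an index outside the valid range raises IndexError
try:
    day[n]
except IndexError as err:
    print("IndexError:", err)
```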

In [2]:
day = "Friday"
print(day)
print(day[1])  # 2nd character
print(day[5])  # last

print(day[-1]) # last
print(day[-2]) # 2nd last
print(day[50]) # this won't work

Friday
r
y
y
a


IndexError: string index out of range

### Slicing
- enables you to extract a sub-sequence
- sub-sequence will be of same type as original sequence
- Syntax: string_var`[start_index:end_index]`:
    - start_index is inclusive
    - end_index is exclusive
    - indices need not be in range; slicing clamps out-of-range indices to the ends of the string instead of raising an error
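The rules above can be verified with a few comparisons. A short sketch:

```python
day = "Friday"

print(day[2:100] == day[2:])    # True, the end index is clamped to len(day)
print(day[10:20] == "")         # True, an empty string rather than an IndexError
print(day[-3:-1] == day[3:5])   # True, both are "da"
print(type(day[1:3]))           # <class 'str'>, a slice of a string is a string
```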

In [4]:
print(day)
print(day[1:3])    # include 1, exclude 3
print(day[:100])  # slicing is forgiving
print(day[1:])     # can skip 2nd number
print(day[:3])     # can skip 1st number
print(day[:])      # this, too!
print(day[-3:-1])  # can use negative indices

Friday
ri
Friday
riday
Fri
Friday
da


### for loops

- can iterate over every item in a sequence

In [31]:
# print each letter of the string using while loop
index = 0
while index < len(day):
    print(day[index])
    index += 1

F
r
i
d
a
y


In [32]:
# print each letter of the string using for loop
# letter is a new variable that takes the value of each character in turn
for letter in day:
    print(letter)

F
r
i
d
a
y


In [None]:
# the sequence after `in` must already be defined
# b is undefined here, so this loop raises NameError
for a in b: 
    print(a)

In [None]:
# print each letter of the string using for loop with range built-in function call
# range enables us to iterate over every index in the string

for idx in range(???):
    print(day[???])
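One possible way to complete the cell above, offered as a sketch rather than the only answer: `range(len(day))` produces every valid index, and each index is then used to look up a character.

```python
day = "Friday"

# range(len(day)) yields 0, 1, 2, 3, 4, 5
for idx in range(len(day)):
    print(idx, day[idx])
```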

In [None]:
# range built-in function: an optional 3rd number is the increment
# let's print every other character in the string

for idx in range(1, len(day), 2):  
    print(day[idx])
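To see which indices the three-argument `range` call produces, they can be collected into a list first. A sketch; the names `indices` and `letters` are introduced only for this example:

```python
day = "Friday"

# start at 1, stop before len(day), step by 2
indices = list(range(1, len(day), 2))
print(indices)   # [1, 3, 5]

letters = ""
for idx in indices:
    letters += day[idx]
print(letters)   # rdy
```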

In [None]:
# Practice: Write a for loop to generate a string that makes an acronym

phrase = "National Collegiate Athletic Association 2022"
acro = ""
for letter in phrase:
    if letter.upper() == letter:
        print(letter)
        # How can we make sure you don't consider spaces and numbers?
        # TODO: try isalpha method (update if condition)
        # TODO: now instead of printing the letter, concatenate the letter to acro

print(acro)
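One possible solution to the practice problem, shown as a sketch that follows the hints in the TODO comments:

```python
phrase = "National Collegiate Athletic Association 2022"
acro = ""
for letter in phrase:
    # isalpha() filters out spaces and digits, which are unchanged by upper()
    if letter.isalpha() and letter.upper() == letter:
        acro += letter

print(acro)   # NCAA
```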

Other string methods: https://www.w3schools.com/python/python_ref_string.asp. Methods in Python have very intuitive names. Please don't memorize the methods.

## Wordle
### Self-practice example
- Read through the program below to understand how it works

In [None]:
def get_wordle_results(guess):
    wordle_result = ""
    for i in range(len(guess)):
        if guess[i] == word_of_the_day[i]:
            wordle_result += "O"
        elif word_of_the_day.find(guess[i]) != -1:
            wordle_result += "_"
        else:
            wordle_result += "X"
    return wordle_result

max_num_guesses = 6
current_num_guesses = 1
word_of_the_day = "CRANE"

print("Welcome to PyWordle!")
print("You have 6 guesses to guess a 5 character word.")
print("X\tThe letter is not in the word.")
print("_\tThe letter is in the word, but in the wrong place.")
print("O\tThe letter is in the correct place!")

while current_num_guesses <= max_num_guesses:
    guess = input("Guess the word: ")
    guess = guess.upper()

    wordle_results = get_wordle_results(guess)
    print("{}\t{}".format(guess, wordle_results))
    if guess == word_of_the_day:
        break
    current_num_guesses += 1
    
if current_num_guesses > max_num_guesses:
    print("Better luck next time!")
    print("The word was: {}".format(word_of_the_day))
else:
    print("You won in {} guesses!".format(current_num_guesses))
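The scoring function above reads `word_of_the_day` from a global variable and assumes the guess is no longer than the answer; a longer guess would raise `IndexError`. The sketch below is a hypothetical variant, not part of the original program: the name `score_guess`, the `answer` parameter, and the choice to return `None` for a wrong-length guess are all assumptions made for illustration.

```python
def score_guess(guess, answer):
    """Hypothetical variant of get_wordle_results that takes the answer as a parameter."""
    if len(guess) != len(answer):
        return None   # assumption: treat wrong-length guesses as invalid
    result = ""
    for i in range(len(guess)):
        if guess[i] == answer[i]:
            result += "O"    # right letter, right place
        elif answer.find(guess[i]) != -1:
            result += "_"    # letter is in the word, wrong place
        else:
            result += "X"    # letter is not in the word
    return result


print(score_guess("REACT", "CRANE"))    # __O_X
print(score_guess("CRANE", "CRANE"))    # OOOOO
print(score_guess("CRANES", "CRANE"))   # None
```

Passing the answer as a parameter makes the function easy to try with different words without changing a global variable.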