## Chapter 8 - Strings

A string is a **sequence**, which means it is an ordered collection of other values.

A string is a sequence of characters. You can access the characters one at a time with the bracket operator.

The expression in brackets is called an index. The index indicates which character in the sequence you want (hence the name).

*Note:* Python's indices start at 0.

In [1]:
fruit = 'banana'
letter = fruit[1]

letter

'a'

As an index you can use an expression that contains variables and operators, but the value of the index has to be an integer.

In [2]:
i = 1
print(fruit[i],fruit[i+1])

a n
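
As a quick check of the integer requirement, the sketch below tries a float index and catches the resulting error. The names `second` and `message` are illustrative only.

```python
fruit = 'banana'

# An integer expression is a valid index.
i = 1
second = fruit[i + 1]  # index 2

# A float is rejected, even when it has no fractional part.
try:
    fruit[1.0]
    message = 'no error'
except TypeError:
    message = 'TypeError'

print(second, message)  # n TypeError
```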


**len** is a built-in function that returns the number of characters in a string.

In [3]:
len(fruit)

6

Since we started counting at zero, the six letters are numbered 0 to 5. To get the last character,
you have to subtract 1 from length:

In [4]:
last = fruit[len(fruit)-1]
last

'a'

Or you can use negative indices, which count backward from the end of the string. The expression fruit[-1] yields the last letter, fruit[-2] yields the second to last, and so on.

In [5]:
fruit[-1]

'a'
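
The relationship between negative and non-negative indices can be checked directly. This is a minimal sketch; `matches` and `out_of_range` are names made up for the illustration.

```python
fruit = 'banana'

# fruit[-k] refers to the same character as fruit[len(fruit) - k].
matches = all(fruit[-k] == fruit[len(fruit) - k]
              for k in range(1, len(fruit) + 1))
print(matches)  # True

# Counting back past the first character raises IndexError.
try:
    fruit[-7]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```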

A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a **traversal**.

One way to do this is with a *while* loop. As an exercise, write a function that takes a string as an argument and displays the letters backward, one per line, using a while loop.

In [6]:
def print_back(string):
    i = len(string)-1
    while i >= 0 :
        print(string[i])
        i -= 1

print_back("Hello")

o
l
l
e
H


Another way to write a traversal is with a *for* loop. In the code below, each time through the loop, the next character in the string is assigned to the variable letter. The loop continues until no characters are left.

In [7]:
for letter in fruit:
    print(letter)

b
a
n
a
n
a
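
When a traversal needs both the character and its index, one option is the built-in `enumerate`, which is not covered in this chapter and is shown here only as a possible alternative to managing the index by hand. The list `pairs` is just for illustration.

```python
fruit = 'banana'

pairs = []
for index, letter in enumerate(fruit):
    # index counts from 0, matching the bracket operator
    pairs.append((index, letter))
    print(index, letter)
```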


In [8]:
prefixes = 'JKLMNOPQ'
suffix = 'ack'
suffix2 = 'uack'

for letter in prefixes:
    if letter == 'O' or letter == 'Q':
        print(letter + suffix2)
    else:
        print(letter + suffix)

Jack
Kack
Lack
Mack
Nack
Ouack
Pack
Quack


A segment of a string is called a **slice**. Selecting a slice is similar to selecting a character:

In [9]:
s = 'Monty Python'
s[0:5]

'Monty'

The operator [n:m] returns the part of the string from the “n-eth” character to the “m-eth” character, including the first but excluding the last.

If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string:

In [10]:
print(fruit[:3])
print(fruit[3:])
print(fruit[:])

ban
ana
banana
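
Two edge cases of slicing are worth trying out. The sketch below uses illustrative names (`empty`, `clipped`) and shows that a slice whose first index is not less than the second is an empty string, and that out-of-range slice bounds are clipped instead of raising an error.

```python
fruit = 'banana'

# First index >= second index: the result is the empty string.
empty = fruit[3:3]

# Unlike fruit[100], a slice bound past the end does not raise IndexError.
clipped = fruit[2:100]

print(repr(empty), len(empty))  # '' 0
print(clipped)                  # nana
```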


Strings are **immutable**, which means you can’t change an existing string. The best you can do is create a new string that is a variation on the original.

The example below concatenates a new first letter onto a slice of greeting. It has no effect on the
original string.

In [11]:
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]
new_greeting

'Jello, world!'
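
To see immutability in action, the following sketch attempts to assign to a character position. The `changed` flag is a name made up for this example.

```python
greeting = 'Hello, world!'

try:
    greeting[0] = 'J'  # item assignment is not supported by str
    changed = True
except TypeError:
    changed = False

print(changed, greeting)  # False Hello, world!
```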

The function find, below, is the inverse of the [] operator. Instead of taking an index and extracting the corresponding character, it takes a character and finds the index where that character appears. If the character is not found, the function returns -1.

This pattern of computation—traversing a sequence and returning when we find what we are looking for—is called a search.

In [12]:
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1

As an exercise, modify find so that it has a third parameter, the index in word where it should start looking.

In [13]:
def find(word, letter, index):
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1

find("Garfunkel","h", 0)

-1
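
For comparison, Python strings already provide a `find` method that behaves much like the function above: it returns -1 when nothing is found and accepts an optional starting index. It also accepts substrings longer than one character. A short sketch, with variable names chosen for illustration:

```python
word = 'banana'

first_a = word.find('a')      # 1
next_a = word.find('a', 2)    # 3: start looking at index 2
missing = word.find('z')      # -1: not found
substring = word.find('nan')  # 2: works for substrings too

print(first_a, next_a, missing, substring)
```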

The following program counts the number of times the letter a appears in a string:

In [14]:
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)

3


This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then incremented each time an a is found. When the loop exits, count contains the result—the total number of a’s. 

As an exercise, encapsulate this code in a function named count, and generalize it so that it accepts the string and the letter as arguments.


Then rewrite the function so that instead of traversing the string, it uses the three-parameter version of find from the previous section.

In [15]:
def count(word,letter):
    count = 0
    for character in word:
        if character == letter:
            count += 1
    return count

count("Yeezy", "e")

2

In [16]:
def count2(word,letter):
    count = 0
    index = 0
    while index < len(word):
        result = find(word, letter, index)
        if result == -1:
            return count
        else:
            count += 1
        index = result + 1
    return count

count2('Yeezy', 'e') #this seems more confusing than the for loop

2

Strings provide **methods** that perform a variety of useful operations. A method is similar to a function—it takes arguments and returns a value—but the syntax is different. 

For example, the method upper takes a string and returns a new string with all uppercase letters. Instead of the function syntax upper(word), it uses the method syntax word.upper().

In [17]:
word = 'banana'
new_word = word.upper()
new_word

'BANANA'

This form of dot notation specifies the name of the method, upper, and the name of the string to apply the method to, word. The empty parentheses indicate that this method takes no arguments.

A method call is called an **invocation**; in this case, we would say that we are invoking upper on word.
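
Not every method has empty parentheses. As an illustration, the sketch below invokes `replace`, a string method that takes two arguments, and confirms that the original string is left untouched.

```python
word = 'banana'

# replace takes the substring to look for and the text to put in its place.
new_word = word.replace('a', 'o')

print(new_word)  # bonono
print(word)      # banana: the original string is unchanged
```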

The word **in** is a boolean operator that takes two strings and returns True if the first appears
as a substring in the second:

In [18]:
print('a' in 'bandana')
print('f' in 'Bronson')

True
False
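
The first operand does not have to be a single character. The sketch below (variable names are illustrative) shows that `in` checks for a contiguous substring, not just for the presence of the individual letters.

```python
word = 'banana'

has_nan = 'nan' in word  # the substring occurs starting at index 2
has_ann = 'ann' in word  # same letters, but never contiguous in this order
has_empty = '' in word   # the empty string is a substring of any string

print(has_nan, has_ann, has_empty)  # True False True
```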


For example, the following function prints all the letters from word1 that also appear in word2:

In [19]:
def in_both(word1, word2):
    for letter in word1:
        if letter in word2:
            print(letter)

In [20]:
in_both('Simon', 'Garfunkel')

n


The relational operators also work on strings. To see if two strings are equal:

In [21]:
if word == 'banana':
    print('All right, bananas.')

All right, bananas.


Other relational operations are useful for putting words in alphabetical order:

In [22]:
def before_after_banana(word):
    if word < 'banana':
        return('Your word, ' + word + ', comes before banana.')
    elif word > 'banana':
        return('Your word, ' + word + ', comes after banana.')
    else:
        return('All right, bananas.')
    
print(before_after_banana('Gwen'))
print(before_after_banana('ish'))

Your word, Gwen, comes before banana.
Your word, ish, comes after banana.


Python does not handle uppercase and lowercase letters the same way people do. All the uppercase letters come before all the lowercase letters, which is why 'Gwen' comes before 'banana' in the example above.

A common way to address this problem is to convert strings to a standard format, such as
all lowercase, before performing the comparison.

In [23]:
def before_after_banana(word):
    compare_word = word.lower()
    if compare_word < 'banana':
        return('Your word, ' + word + ', comes before banana.')
    elif compare_word > 'banana':
        return('Your word, ' + word + ', comes after banana.')
    else:
        return('All right, bananas.')
    
print(before_after_banana('Gwen'))
print(before_after_banana('ish'))

Your word, Gwen, comes after banana.
Your word, ish, comes after banana.
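
The reason uppercase letters sort first can be seen from their numeric codes, using the built-in `ord` (introduced in Exercise 8.5). This is a small sketch with illustrative variable names:

```python
# String comparison goes character by character, using numeric codes.
code_G = ord('G')  # 71
code_b = ord('b')  # 98

raw_comparison = 'Gwen' < 'banana'             # True, because 71 < 98
lowered_comparison = 'Gwen'.lower() < 'banana' # False, because 'g' comes after 'b'

print(code_G, code_b, raw_comparison, lowered_comparison)
```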


**Debugging**

When you use indices to traverse the values in a sequence, it is tricky to get the beginning and end of the traversal right. Here is a function that is supposed to compare two words and return True if one of the words is the reverse of the other, but it contains two errors:

In [24]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) #error 1: should be len(word2)-1, because len(word2) is one past the last valid index
    while j > 0: #error 2: should be j >= 0; the loop stops before j reaches 0, so word2[0] is never checked
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1
    return True

def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2)-1 #fixed
    while j >= 0: #fixed
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1
    return True

is_reverse('pots', 'stop')

True
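
To see how each error shows up, the sketch below reproduces the two faulty stages under hypothetical names (`is_reverse_buggy`, `is_reverse_half_fixed`). The first error raises an exception immediately. The second is quieter: it produces a wrong answer only when the words differ in the one position that is never compared.

```python
def is_reverse_buggy(word1, word2):
    """Original version with both errors."""
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2)      # error 1: one past the last valid index
    while j > 0:        # error 2: stops before index 0
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

def is_reverse_half_fixed(word1, word2):
    """Starting index fixed, loop condition still wrong."""
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) - 1
    while j > 0:
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

try:
    is_reverse_buggy('pots', 'stop')
    first_error = None
except IndexError:
    first_error = 'IndexError'

# 'xtop' is not the reverse of 'pots', but word2[0] is never compared.
false_positive = is_reverse_half_fixed('pots', 'xtop')

print(first_error, false_positive)  # IndexError True
```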

### Glossary

**object:** Something a variable can refer to. For now, you can use “object” and “value” interchangeably.


**sequence:** An ordered collection of values where each value is identified by an integer index.


**item:** One of the values in a sequence.


**index:** An integer value used to select an item in a sequence, such as a character in a string. In Python indices start from 0.


**slice:** A part of a string specified by a range of indices.


**empty string:** A string with no characters and length 0, represented by two quotation marks.


**immutable:** The property of a sequence whose items cannot be changed.


**traverse:** To iterate through the items in a sequence, performing a similar operation on each.


**search:** A pattern of traversal that stops when it finds what it is looking for.


**counter:** A variable used to count something, usually initialized to zero and then incremented.


**invocation:** A statement that calls a method.


**optional argument:** A function or method argument that is not required.

### Exercises

**Exercise 8.1.** Read the documentation of the string methods.

In [25]:
phrase = 'I know what I know'

print(phrase.center(len(phrase)+10))

print(phrase.count('know'))

print(phrase.isalpha())

print(phrase.isnumeric())

print(phrase.split(sep=' '))

print(phrase.strip('know')) #strips any leading or trailing k, n, o or w characters (a set of characters, not the substring 'know')

print("-42".zfill(5))

     I know what I know     
2
False
False
['I', 'know', 'what', 'I', 'know']
I know what I 
-0042


**Exercise 8.2.** There is a string method called count that is similar to the function in Section 8.7. Read the documentation of this method and write an invocation that counts the number of a’s in
'banana'.

In [26]:
'banana'.count('a')

3

**Exercise 8.3.** A string slice can take a third index that specifies the “step size”; that is, the number of spaces between successive characters. A step size of 2 means every other character; 3 means every third, etc.

Step size of -1 goes through the word backwards, so the slice [::-1] generates a reversed string.
Use this idiom to write a one-line version of *is_palindrome*.

In [27]:
def is_palindrome(word):
    return word == word[::-1]

print(is_palindrome("a"))
print(is_palindrome("bb"))
print(is_palindrome("thirsty"))
print(is_palindrome("racecar"))

True
True
False
True
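
The step index from this exercise can be tried on its own. A brief sketch, with variable names chosen for illustration:

```python
fruit = 'banana'

every_other = fruit[::2]  # characters at indices 0, 2, 4
every_third = fruit[::3]  # characters at indices 0, 3
backward = fruit[::-1]    # the whole string, reversed

print(every_other, every_third, backward)  # bnn ba ananab
```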


**Exercise 8.4.** The following functions are all intended to check whether a string contains any lowercase letters, but at least some of them are wrong. For each function, describe what the function actually does (assuming that the parameter is a string).

In [28]:
def any_lowercase1(s): 
    for c in s:
        if c.islower(): #only the first character is ever checked, because both branches return
            return True
        else:
            return False

def any_lowercase2(s):
    for c in s:
        if 'c'.islower(): #tests the literal string 'c', not the variable c, so it is always true; it also returns the string 'True', not a boolean
            return 'True'
        else:
            return 'False'

def any_lowercase3(s):
    for c in s:
        flag = c.islower()
    return flag #flag reflects only the last character, so this fails when the last character is not lowercase (and raises an error for an empty string)

def any_lowercase4(s): #this one works, once the flag is set to True, it will stay True
    flag = False #it is only set to true if we find a lowercase character
    for c in s:
        flag = flag or c.islower()
    return flag

def any_lowercase5(s):
    for c in s:
        if not c.islower(): #returns False at the first non-lowercase character, so this checks whether all characters are lowercase, not any
            return False
    return True
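
For reference, a version that follows the search pattern from earlier in the chapter might look like the sketch below. The name `any_lowercase` and the test strings are assumptions made for this example. The inputs are chosen to separate the faulty variants: an uppercase first letter, an uppercase last letter, no lowercase letters and the empty string.

```python
def any_lowercase(s):
    """Return True if at least one character of s is lowercase."""
    for c in s:
        if c.islower():
            return True   # stop as soon as one lowercase letter is found
    return False          # only reached if no character was lowercase

results = [any_lowercase(s) for s in ('Abc', 'abC', 'ABC', '')]
print(results)  # [True, True, False, False]
```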

**Exercise 8.5.** A Caesar cypher is a weak form of encryption that involves “rotating” each letter by a fixed number of places. To rotate a letter means to shift it through the alphabet, wrapping around to the beginning if necessary, so ’A’ rotated by 3 is ’D’ and ’Z’ rotated by 1 is ’A’. 


To rotate a word, rotate each letter by the same amount. For example, “cheer” rotated by 7 is “jolly” and “melon” rotated by -10 is “cubed”. In the movie 2001: A Space Odyssey, the ship computer is called HAL, which is IBM rotated by -1.


Write a function called *rotate_word* that takes a string and an integer as parameters, and returns a new string that contains the letters from the original string rotated by the given amount. You might want to use the built-in function ord, which converts a character to a numeric code, and chr, which converts numeric codes to characters. Letters of the alphabet are encoded in alphabetical
order, so for example:

ord('c') - ord('a') outputs: 2

Because 'c' is the two-eth letter of the alphabet. But beware: the numeric codes for upper case
letters are different.
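
The behavior of `ord` and `chr` described above can be checked with a few lines. Variable names here are illustrative.

```python
offset = ord('c') - ord('a')       # 2
same_offset = ord('C') - ord('A')  # also 2, although the codes themselves differ
codes = (ord('a'), ord('A'))       # (97, 65)
back = chr(ord('a') + offset)      # 'c': chr reverses ord

print(offset, same_offset, codes, back)
```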

In [29]:
def rotate_word(word, shift):
    code_word = ''
    for char in word.lower():
        if 'a' <= char <= 'z':
            #position in the alphabet (0-25); the modulo wraps in both directions,
            #so negative shifts and shifts larger than 26 also work
            position = (ord(char) - ord('a') + shift) % 26
            new_char = chr(ord('a') + position)
        else:
            new_char = char #leave anything that is not a letter unchanged
        code_word += new_char
    return code_word

def coded_message(message, shift):
    #rotate each word, then rejoin with single spaces; comparing each word to split[-1]
    #would drop the space after any earlier word that equals the last one
    coded_words = []
    for word in message.split(sep=' '):
        coded_words.append(rotate_word(word, shift))
    return ' '.join(coded_words)

In [30]:
rotate_word('Bart', 2)
coded_message("How to Cook for Forty Humans", 13)

'ubj gb pbbx sbe sbegl uhznaf'
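
The solution above converts everything to lowercase. As one possible variation, the sketch below keeps the case of each letter and can be checked against the examples from the exercise statement. The names `rotate_letter` and `rotate_word_keep_case` are hypothetical, and the approach assumes plain ASCII letters.

```python
def rotate_letter(letter, shift):
    """Rotate one ASCII letter by shift places, leaving other characters alone."""
    if 'a' <= letter <= 'z':
        start = ord('a')
    elif 'A' <= letter <= 'Z':
        start = ord('A')
    else:
        return letter
    # The modulo wraps around the alphabet for positive and negative shifts.
    return chr((ord(letter) - start + shift) % 26 + start)

def rotate_word_keep_case(word, shift):
    """Rotate every letter in word by shift, preserving case."""
    result = ''
    for letter in word:
        result += rotate_letter(letter, shift)
    return result

print(rotate_word_keep_case('cheer', 7))    # jolly
print(rotate_word_keep_case('melon', -10))  # cubed
print(rotate_word_keep_case('IBM', -1))     # HAL
```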