# Chapter 8 Lecture Notes

Please read chapter 8 of the textbook.

These notes take 1 to 3 lecture hours to cover.

## Strings

Strings are an important and common data type. They represent characters and
text, and many files and web pages are formatted entirely as text.

A **string** is defined as a sequence of 0 or more characters. The characters
could be letters, digits, punctuation, spaces, tabs, or other symbols.

A **string literal** is a sequence of characters enclosed in single, double, or
triple quotes:

In [1]:
print('you can put a " and \' here!')
print("you can put a \" and ' here!")
print("""you can put a " and ' here!""")

you can put a " and ' here!
you can put a " and ' here!
you can put a " and ' here!


"""-quotes are useful for strings that span multiple lines, and they can contain
both single and double quotes:

In [3]:
intro = """
Triple-quote strings can contain
"-quotes and '-quotes, and can
also span multiple lines.
"""

print(intro)


Triple-quote strings can contain
"-quotes and '-quotes, and can
also span multiple lines.
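
The blank lines in the output above come from line breaks stored inside the string itself. As a small sketch, the built-in `repr` function and the `count` string method (neither is introduced in these notes, so treat them as extras) can make those line breaks visible:

```python
# A triple-quoted string keeps every line break that appears between the quotes.
intro = """
Triple-quote strings can contain
"-quotes and '-quotes, and can
also span multiple lines.
"""

print(repr(intro))        # repr shows each line break as \n
print(intro[0] == '\n')   # True: the string starts with a line break
print(intro.count('\n'))  # 4
```

The first line break comes right after the opening `"""`, and the last one comes just before the closing `"""`.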



You can read individual characters of a string using **indexing**:

In [2]:
fruit = 'banana'
print(fruit[1])  # a

a


`fruit[i]` evaluates to the character at index `i`. The first index of a Python
string is 0, so `fruit[1]` is `a` instead of `b`.

Index 0 is the first character of the string, and index 1 is the second
character:

In [3]:
fruit = 'banana'
print(fruit[0])  # b
print(fruit[1])  # a
print(fruit[2])  # n
print(fruit[3])  # a
print(fruit[4])  # n
print(fruit[5])  # a

b
a
n
a
n
a


If you use an index that is equal to the length of the string, or bigger, you
get an error:

In [1]:
fruit = 'banana'
print(fruit[10])  # IndexError: string index out of range

IndexError: string index out of range

Recall that `len` is a built-in function that returns the length of a string.
You can use it to get the last character like this:

In [5]:
fruit = 'banana'
n = len(fruit)
print(fruit[n-1])  # a, the last character

print(fruit[n])  # IndexError

a


IndexError: string index out of range

Python provides a neat trick to make this easier. Use index -1:

In [None]:
fruit = 'banana'
print(fruit[-1])  # a, the last character
print(fruit[-2])  # n, the second last character
print(fruit[-3])  # a, the third last character

This is known as **negative indexing**, and can be quite convenient.
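
As a quick check, here is a minimal sketch showing that the negative indices name the same characters as the `len`-based expressions used earlier. It reuses the `fruit` and `n` names from the examples above:

```python
fruit = 'banana'
n = len(fruit)

# Each negative index matches a position counted back from the length.
print(fruit[-1] == fruit[n - 1])  # True
print(fruit[-2] == fruit[n - 2])  # True
print(fruit[-3] == fruit[n - 3])  # True
```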

This diagram shows the positive and negative indices of a string:

<img src="banana_indices.png" alt="banana string" width="200"/>

## Example: Print Characters of a String

Suppose we want to print the characters of a string each on their own line. The
easiest way to do this in Python is to use a for-loop:


In [7]:
fruit = 'banana'
for c in fruit:
    print(c)

b
a
n
a
n
a


Another way is to use a while-loop:


In [8]:
fruit = 'banana'
i = 0
while i < len(fruit):
    print(fruit[i])
    i += 1

b
a
n
a
n
a


The while-loop is a little more work, but it is more flexible. For example, we
can print the string in reverse:

In [9]:
fruit = 'banana'
i = len(fruit) - 1
while i >= 0:
    print(fruit[i])
    i -= 1

a
n
a
n
a
b
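
As another illustration of that flexibility, this sketch changes the step of the while-loop so it visits every second character. The `picked` variable is a name made up for this example; it just collects the visited characters:

```python
fruit = 'banana'
picked = ''  # collects the characters we visit
i = 0
while i < len(fruit):
    print(fruit[i])
    picked = picked + fruit[i]
    i += 2  # move ahead two positions instead of one

print(picked)  # bnn
```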


## String Slices

A **slice** is a substring of a string. You can get a slice like this:

In [4]:
fruit = 'banana'
print(fruit[2:5])  # nan
print(fruit[2:6])  # nana
print(fruit[0:3])  # ban
print(fruit[1:2])  # a
print(fruit[2:2])  # empty string

nan
nana
ban
a



In general, if `s` is a string then `s[m:n]` is the slice consisting of the
characters `s[m]`, `s[m+1]`, ..., `s[n-1]`. 

**Important**: The index values of a slice are *asymmetric*. The slice starts at
index `m` and ends at index `n-1`, so the character at index `n` is not included.
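
To make the definition concrete, here is a sketch that builds a slice one character at a time. The function `slice_by_loop` is a hypothetical helper written for these notes, not something built into Python, and it assumes `0 <= m <= n <= len(s)`:

```python
def slice_by_loop(s, m, n):
    """Build s[m:n] one character at a time (hypothetical helper).

    Assumes 0 <= m <= n <= len(s).
    """
    result = ''
    i = m
    while i < n:  # stops before index n
        result = result + s[i]
        i += 1
    return result


fruit = 'banana'
print(slice_by_loop(fruit, 2, 5))                # nan
print(slice_by_loop(fruit, 2, 5) == fruit[2:5])  # True
print(slice_by_loop(fruit, 2, 2) == fruit[2:2])  # True, both are empty
```

In practice you would just write `fruit[2:5]`; the loop is only meant to show which indices the slice covers.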

## Strings are Immutable

An important property of strings is that they are **immutable**. This means that
they cannot be modified. You can read and copy a string, but you **can't**
change its characters or change its length.

For example:

In [5]:
fruit = 'banana'
print(fruit[0])  # reading a string is ok

fruit[0] = 'B'  # TypeError: can't change a string

b


TypeError: 'str' object does not support item assignment

When you concatenate strings (i.e. with `+`), you always create new strings:

In [None]:
fruit = 'banana'

fruit = fruit + 's'
print(fruit)  # bananas, a brand new string
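
One way to see that the original string is untouched is to keep a second name for it before concatenating. This is a small sketch; the variable `original` and the use of the `is` operator (which tests whether two names refer to the same object) are additions for illustration:

```python
fruit = 'banana'
original = fruit     # a second name for the same string

fruit = fruit + 's'  # builds a new string and makes fruit refer to it

print(original)           # banana
print(fruit)              # bananas
print(fruit is original)  # False: they are different strings
```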

### Some Benefits of Immutability

Immutable strings have some nice consequences.

Assigning an immutable string to a variable is a simple and efficient operation
that does not require copying the characters:

In [None]:
fruit = 'banana'
fruit2 = fruit
fruit3 = fruit

`fruit`, `fruit2`, and `fruit3` can all refer to the same string in memory, with
no characters needing to be copied. This saves both time and memory.
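
You can check this sharing yourself with the `is` operator, which tests whether two names refer to the same object, and the built-in `id` function. This is a minimal sketch; neither tool is otherwise covered in these notes:

```python
fruit = 'banana'
fruit2 = fruit
fruit3 = fruit

# Assignment only adds another name for the same string object.
print(fruit2 is fruit)                        # True
print(fruit3 is fruit)                        # True
print(id(fruit) == id(fruit2) == id(fruit3))  # True
```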

Getting a slice of a string can, in principle, also be efficient thanks to
immutability. No characters need to be copied if the implementation uses index
values to refer to the slice. For instance:

In [None]:
s = 'banana'
t = s[2:5]  # nan

`t` is a string, but it could be implemented by something like (`s`, 2, 5), i.e.
a reference to `s` plus the bounds of the slice. With that representation there
is no need to copy or store any extra characters, since `t` can refer to the
ones already in `s`. Whether a slice is actually stored this way is an
implementation detail (the standard CPython interpreter, for one, copies the
characters into a new string). Python handles these details internally, so you,
the programmer, do not need to worry about them and can think of `t` as an
ordinary string of characters.

Immutable strings also make it easier to use a few other important features of
Python, such as dictionaries (which we will see later), and threads (which we
will not cover in this course).

### Some Disadvantages of Immutability

A downside of immutability is that there's no fast and simple way to change the
characters of a string. For example, if you want to change the first character
of a string you need to do something like this:

In [10]:
phrase = 'cat is my favourite word'

phrase = 'b' + phrase[1:]

print(phrase)  # bat is my favourite word

bat is my favourite word


Both `phrase[1:]` and `+` make new strings. If the strings were mutable, we
could quickly and simply change the first character with a statement like
`phrase[0] = 'b'`.
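
The same slice-and-concatenate idea works for any position. Here is a sketch of a hypothetical helper, `replace_char`, that returns a new string with one character changed. It assumes `0 <= i < len(s)` and relies on the fact that a slice bound can be left out to mean "from the start" or "to the end":

```python
def replace_char(s, i, c):
    """Return a copy of s with the character at index i replaced by c.

    Hypothetical helper; assumes 0 <= i < len(s).
    """
    return s[:i] + c + s[i + 1:]


phrase = 'cat is my favourite word'
print(replace_char(phrase, 0, 'b'))  # bat is my favourite word
print(replace_char(phrase, 2, 'n'))  # can is my favourite word
print(phrase)                        # cat is my favourite word (unchanged)
```

Every call builds a brand new string, which is exactly the cost described above.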

Some kinds of string processing are hard to do efficiently in Python. For
instance, implementing a text editor that represents the file as a single big
string would be a bad idea. In a text editor, you need to be able to quickly
insert, delete, and replace characters throughout the file. If it's an immutable
string, changing a single character would require copying the entire file, which
is likely too slow to be usable.

There are alternatives to strings that support mutability and efficient editing,
but we won't cover them in this course.

## Comparing Strings

`==`, `!=`, `<`, `>`, `<=`, and `>=` can be used to compare strings. If `s` and
`t` are strings, then:


|   Operator   | Description                          | Example                                      |
|:------------:|--------------------------------------|----------------------------------------------|
| `==`         | Equal to                             | `'apple' == 'apple'` (True)  <br> `'apple' == 'banana'` (False)  |
| `!=`         | Not equal to                         | `'apple' != 'orange'` (True) <br> `'apple' != 'apple'` (False)  |
| `<`          | Less than                            | `'apple' < 'banana'` (True) <br> `'banana' < 'apple'` (False)   <br> `'apple' < 'apple'` (False)|
| `>`          | Greater than                         | `'banana' > 'apple'` (True) <br> `'apple' > 'banana'` (False)   <br> `'apple' > 'apple'` (False)|
| `<=`         | Less than or equal to                | `'apple' <= 'banana'` (True) <br> `'banana' <= 'apple'` (False)  <br> `'apple' <= 'apple'` (True)|
| `>=`         | Greater than or equal to             | `'banana' >= 'apple'` (True) <br> `'apple' >= 'banana'` (False)  <br> `'apple' >= 'apple'` (True)|

Python compares strings alphabetically, which is quite useful for sorting
strings. Non-alphabetic characters, and the case of the letters, also matter,
and so the general name is **lexicographic comparison**.
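
A few cases where lexicographic order differs from what a dictionary would suggest are sketched below. The example strings are made up for illustration:

```python
print('apple' < 'banana')     # True: ordinary alphabetical order
print('Zebra' < 'apple')      # True: uppercase letters come before lowercase
print('apple' < 'Apple')      # False, for the same reason
print('apple' < 'apple pie')  # True: a prefix comes before the longer string
print('10' < '9')             # True: the character '1' comes before '9'
```

The last line is a common surprise: strings of digits are compared character by character, not as numbers.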

## Writing to Files

Previously, we've seen how to open a text file to read it line by line:

```python
todo_file = open('todo_list.txt')
for line in todo_file:
    print(line, end='')  # end='' to avoid double spacing
todo_file.close()  # close the file when done reading
```

If you want to *write* to a file, you must first open it with the `open`
function, passing `'w'` as the second argument:

In [3]:
#
# Open 'todo_list.txt' for writing.
#
# If 'todo_list.txt' does not exist, it will be created.
# If 'todo_list.txt' already exists, it will be overwritten.
#
todo_file = open('todo_list.txt', 'w')

todo_file.write('How to succeed:\n')
todo_file.write('1. learn to write poems\n')
todo_file.write('2. get a job at a poetry company\n')
todo_file.write('3. profit!\n')

# close it when done writing
todo_file.close()

#
# Print the contents of the file.
#
todo_file = open('todo_list.txt', 'r')  # 'r' for reading (optional)
for line in todo_file:
    print(line, end='')  # end='' to avoid double spacing

How to succeed:
1. learn to write poems
2. get a job at a poetry company
3. profit!


Writing files is very useful, but be careful to write to the right file! Keep
this in mind when you open a file for writing:

- If the file does not exist, then it will be created.
- If the file already exists, then its contents will be **erased**.
- The `'w'` is required. If you leave it out, then Python will open the file for
  reading only.
- When you are done writing the file you should call `close`. The contents you
  write to it may not actually appear in the file itself until you call `close`.
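
The sketch below demonstrates the "erased" point from the list above. To avoid overwriting a real file, it works inside a temporary folder created with the standard `tempfile` module; the file name `demo_log.txt` is made up for this example, and `os`, `tempfile`, and the `read` method are not otherwise covered in these notes:

```python
import os
import tempfile

# Work in a temporary folder so no real file is overwritten.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_log.txt')  # hypothetical file name

log_file = open(path, 'w')
log_file.write('first version\n')
log_file.close()

# Opening the same file with 'w' again erases what was there.
log_file = open(path, 'w')
log_file.write('second version\n')
log_file.close()

# Read the whole file back as one string.
log_file = open(path)
contents = log_file.read()
log_file.close()

print(contents, end='')  # second version

# Clean up the temporary file and folder.
os.remove(path)
os.rmdir(folder)
```

Only the second line survives, because the second `open(path, 'w')` started the file over from empty.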

## Regular Expressions: Not Covered

The discussion of regular expressions in the textbook is not covered in this course.

## Questions

1. What is a string?

2. What are three different ways to represent string literals? Why are there so
   many different ways?

3. If `s` is a non-empty string, what are two different ways to get the last
   character of `s`?

4. Write an expression that uses negative indexing to get the first character of
   a string `s`. Assume `s` is non-empty. 

5. Without using `len`, write an expression that gives the length of `s[m:n]`.
   Assume the slice is non-empty.

6. Suppose `s` is `'discombobulation'`. Using only `+`s and slices of `s`, write
   an expression that evaluates to `'iondicombot'`. Make the expression as short
   as possible.

7. Explain what it means that strings are immutable. What are some benefits of
   immutability? What are some disadvantages?

8. Write a function called `is_3sorted(s)` that returns `True` just when the
   string `s`:
   - is exactly 3 characters long
   - has no duplicate letters
   - has all its letters in alphabetical order

   In every other case, the function returns `False`.

   For example:
   - `is_3sorted('abc')` returns `True`
   - `is_3sorted('acc')` returns `False` (duplicate letters)
   - `is_3sorted('ab')` returns `False` (too short)
   - `is_3sorted('abcd')` returns `False` (too long)
   - `is_3sorted('cab')` returns `False` (not in alphabetical order)

9. Write a statement that opens a file called `log.txt` for writing.

10. If the file `log.txt` already exists, what happens when you open it for
    writing?