# Chapter 8 Lecture Notes

Please read chapter 8 of the textbook.

These notes take 1 - 3 lecture hours to cover.

## Strings

Strings are an important and common data type.

A **string** is defined as a sequence of 0 or more characters. The characters
could be letters, digits, punctuation, spaces, tabs, or other symbols.

A **string literal** is a sequence of characters enclosed in single, double, or
triple quotes:

In [1]:
print('you can put a " and \' here!')
print("you can put a \" and ' here!")
print("""you can put a " and ' here!""")

you can put a " and ' here!
you can put a " and ' here!
you can put a " and ' here!


"""-quotes are useful for strings that span multiple lines, and they can contain
both single and double quotes:

In [3]:
intro = """
Triple-quote strings can contain
"-quotes and '-quotes, and can
also span multiple lines.
"""

print(intro)


Triple-quote strings can contain
"-quotes and '-quotes, and can
also span multiple lines.
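The blank lines around the printed text come from newline characters stored in the string itself: one right after the opening `"""` and one just before the closing `"""`. Here is a small sketch that makes them visible. The variable `poem` is made up for illustration, and `repr` is a built-in function not otherwise used in these notes.

```python
# A made-up triple-quoted string, just for illustration.
poem = """
roses
violets
"""

# repr shows the string with its newlines written as \n
shown = repr(poem)
print(shown)

# The first and last characters are both newlines.
print(poem[0] == '\n')
print(poem[-1] == '\n')
```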



## String Indexing

You can read individual characters of a string using indexing:

In [2]:
fruit = 'banana'
print(fruit[1])  # a

a


`fruit[1]` is `a` (not `b`!) because the first index of a string is 0:
`fruit[0]` is the first character of the string, and `fruit[1]` is the second
character, and so on:

In [3]:
fruit = 'banana'
print(fruit[0])  # b
print(fruit[1])  # a
print(fruit[2])  # n
print(fruit[3])  # a
print(fruit[4])  # n
print(fruit[5])  # a

b
a
n
a
n
a


If you use an index that is equal to the length of the string, or larger, you
get an error:

In [4]:
fruit = 'banana'
print(fruit[6])  # IndexError: string index out of range

IndexError: string index out of range

Recall that `len` is a built-in function that returns the length of a string,
and so you can get the last character like this:

In [None]:
fruit = 'banana'
n = len(fruit)

print(fruit[n-1])  # a, the last character

print(fruit[n])  # IndexError

a


IndexError: string index out of range

## Negative Indexing

An easier way to get the last character is to use index -1:

In [None]:
fruit = 'banana'
print(fruit[-1])  # a, the last character
print(fruit[-2])  # n, the second last character
print(fruit[-3])  # a, the third last character

This is known as **negative indexing**, and can be quite convenient.

This diagram shows the positive and negative indices of a string:

<img src="banana_indices.png" alt="banana string" width="200"/>
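As a rough check of how the two kinds of indices line up, this sketch compares a few negative indices of `'banana'` with the positive indices that name the same characters. It assumes nothing beyond `len` and indexing.

```python
fruit = 'banana'
n = len(fruit)

# For k = 1, 2, 3: fruit[-k] and fruit[n - k] are the same character.
for k in [1, 2, 3]:
    from_end = fruit[-k]
    from_start = fruit[n - k]
    print(k, from_end, from_start, from_end == from_start)
```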

## String Slices

A **slice** is a substring of a string. You can get a slice like this:

In [9]:
fruit = 'banana'
print(fruit[2:5])  # nan

nan


In general, if `s` is a string then `s[m:n]` is the slice consisting of the
characters `s[m]`, `s[m+1]`, ..., `s[n-1]`.

**Important**: The index values of a slice are asymmetric. The slice `s[m:n]`
starts at index `m` and ends at index `n-1`.

`s[m:]` is the slice starting at index `m` and going to the end of the string.
`s[:n]` is the slice from the start of the string up to, but not including, index
`n`.

`s[:]` is a slice that starts at the beginning and goes to the end, which is
the same as the whole string.
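The slice forms described above can be tried out on `'banana'`. This is a minimal sketch; the variable names `middle`, `tail`, `head`, and `whole` are only for illustration.

```python
s = 'banana'

middle = s[2:5]  # the characters at indices 2, 3, and 4
tail = s[2:]     # index 2 through the end of the string
head = s[:2]     # the start up to, but not including, index 2
whole = s[:]     # the entire string

print(middle)  # nan
print(tail)    # nana
print(head)    # ba
print(whole)   # banana

# Splitting at the same index and concatenating gives back the original.
print(head + tail == s)  # True
```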


## Strings are Immutable

An important property of Python strings is that they are **immutable**. This
means that they *cannot be modified*. You can read and copy a string, but you
can't change its characters or change its length.

For example:

In [None]:
fruit = 'banana'
print(fruit[0])  # reading a string is ok

fruit[0] = 'B'  # TypeError: can't change a string

When you concatenate strings (i.e. with `+`), you always create new strings:

In [None]:
fruit = 'banana'

fruit = fruit + 's'
print(fruit)  # bananas, a brand new string

### Some Benefits of Immutability

Immutable strings have a number of nice consequences.

Assignment to a string variable is a simple operation that does not require
copying the characters of the string:

In [None]:
fruit = 'banana'
fruit2 = fruit
fruit3 = fruit

`fruit`, `fruit2`, and `fruit3` can all refer to the same string in memory, with
no characters needing to be copied. This is fast and memory efficient.
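One way to see the sharing is with Python's `is` operator, which tests whether two names refer to the very same object. The `is` operator is not otherwise used in these notes, so treat this as an illustrative sketch.

```python
fruit = 'banana'
fruit2 = fruit
fruit3 = fruit

# All three names refer to one string object; nothing was copied.
print(fruit2 is fruit)  # True
print(fruit3 is fruit)  # True

# "Changing" fruit really makes a new string and re-points the name fruit.
# The other names still refer to the original string.
fruit = fruit + 's'
print(fruit)   # bananas
print(fruit2)  # banana
```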

Another advantage of immutable strings is that they can be used as keys in
dictionaries. Later we will see dictionaries, which are a kind of key-value
table that needs its keys to be immutable. If strings were changeable, then
changing a string key would ruin the table.

### Some Disadvantages of Immutability

Immutability can be a problem when you want to change a string. For example,
there is no fast and simple way to change the first character of a string. You
need to do something like this:

In [10]:
phrase = 'cat is my favourite word'

phrase = 'b' + phrase[1:]

print(phrase)  # bat is my favourite word

bat is my favourite word


Both `phrase[1:]` and `+` make new strings. If strings were mutable, we
could simply write the fast and simple statement `phrase[0] = 'b'`.
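The same slice-and-concatenate idea works for any position, not just the first. Here is a sketch of a helper that does this. The name `replace_char` is hypothetical (it is not a built-in), and it assumes `i` is a valid non-negative index into `s`.

```python
def replace_char(s, i, c):
    """Return a new string like s, but with the character at index i
    replaced by c. Hypothetical helper; assumes 0 <= i < len(s)."""
    return s[:i] + c + s[i+1:]

phrase = 'cat is my favourite word'

print(replace_char(phrase, 0, 'b'))  # bat is my favourite word
print(replace_char(phrase, 2, 'r'))  # car is my favourite word

# The original string is untouched: every call builds a new string.
print(phrase)  # cat is my favourite word
```

Each call copies nearly the whole string, which is the cost discussed in this section.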

Immutable strings can be a problem in programs where performance is critical.
For instance, writing a text editor in Python could be challenging. A text
editor should be able to efficiently change any character anywhere in the file.
Thus representing the entire file as a single string would be a bad idea because
you'd need to re-copy the entire file for every change.

## Comparing Strings

`==`, `!=`, `<`, `>`, `<=`, and `>=` can be used to compare strings. If `s` and
`t` are strings, then:


|   Operator   | Description                          | Example                                      |
|:------------:|--------------------------------------|----------------------------------------------|
| `==`         | Equal to                             | `'apple' == 'apple'` (True)  <br> `'apple' == 'banana'` (False)  |
| `!=`         | Not equal to                         | `'apple' != 'orange'` (True) <br> `'apple' != 'apple'` (False)  |
| `<`          | Less than                            | `'apple' < 'banana'` (True) <br> `'banana' < 'apple'` (False)   <br> `'apple' < 'apple'` (False)|
| `>`          | Greater than                         | `'banana' > 'apple'` (True) <br> `'apple' > 'banana'` (False)   <br> `'apple' > 'apple'` (False)|
| `<=`         | Less than or equal to                | `'apple' <= 'banana'` (True) <br> `'banana' <= 'apple'` (False)  <br> `'apple' <= 'apple'` (True)|
| `>=`         | Greater than or equal to             | `'banana' >= 'apple'` (True) <br> `'apple' >= 'banana'` (False)  <br> `'apple' >= 'apple'` (True)|

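Python compares strings *alphabetically*, which is quite useful for sorting
strings. More precisely, it compares them character by character using each
character's numeric code, so every uppercase letter comes before every
lowercase letter: `'Zebra' < 'apple'` is `True`.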
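Here is a short sketch of string comparison at work. It uses the built-in functions `sorted` and `ord`, and a list of words, none of which have been introduced in these notes, so treat them as assumptions for the sake of the example. `ord` gives the numeric code of a character, which is what Python actually compares.

```python
words = ['cherry', 'apple', 'banana']
in_order = sorted(words)
print(in_order)  # ['apple', 'banana', 'cherry']

# A string that is a prefix of another comes first.
print('apple' < 'apples')  # True

# Uppercase letters have smaller codes than lowercase letters,
# so the ordering is not quite dictionary order when cases are mixed.
print(ord('Z'), ord('a'))  # 90 97
print('Zebra' < 'apple')   # True
```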

## Writing to Files

To *write* to a file, you must first open it with the `open` function, passing
`'w'` as the second argument:

In [3]:
#
# Open 'todo_list.txt' for writing.
#
# If 'todo_list.txt' does not exist, it will be created.
# If 'todo_list.txt' already exists, it will be overwritten.
#
todo_file = open('todo_list.txt', 'w')

todo_file.write('How to succeed:\n')
todo_file.write('1. learn to write poems\n')
todo_file.write('2. get a job at a poetry company\n')
todo_file.write('3. profit!\n')

# close it when done writing
todo_file.close()

#
# Print the contents of the file.
#
todo_file = open('todo_list.txt', 'r')  # 'r' for reading (optional)
for line in todo_file:
    print(line, end='')  # end='' to avoid double spacing

# close it when done reading
todo_file.close()

How to succeed:
1. learn to write poems
2. get a job at a poetry company
3. profit!


Keep this in mind when you open a file for writing:
- The `'w'` is required. If you leave it out, then Python will open the file for
  reading only.
- If the file does not exist, then it will be created.
- If the file already exists, then its contents will be erased. Be careful!
- When you are done writing the file you should call `close`. The content you
  write to it may not actually appear in the file itself until you do this.
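The points above can be seen in a small experiment. This sketch uses Python's `with` statement, which closes the file automatically at the end of the indented block. That is an alternative to calling `close` yourself, not something these notes rely on. It also writes into a temporary folder (via the standard `tempfile` and `os` modules) so that no real file gets erased; the folder and the `path` variable are assumptions made for the example.

```python
import os
import tempfile

# Work in a temporary folder so this sketch can't erase a real file.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'todo_list.txt')

# The file does not exist yet, so opening with 'w' creates it.
with open(path, 'w') as todo_file:
    todo_file.write('How to succeed:\n')
    todo_file.write('1. learn to write poems\n')
# The file is closed here, at the end of the with block.

# Opening the same file with 'w' again erases what was there.
with open(path, 'w') as todo_file:
    todo_file.write('start over\n')

# No mode given, so the file is opened for reading.
with open(path) as todo_file:
    contents = todo_file.read()

print(contents, end='')  # start over
```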

## Regular Expressions: Not Covered

The discussion of regular expressions in the textbook is not covered in this course.

## Questions

1. What is a string?

2. What are three different ways to represent string literals? Why are there so
   many different ways?

3. If `s` is a non-empty string, what are two different ways to get the last
   character of `s`?

4. Write an expression that uses negative indexing to get the first character of
   a string `s`. Assume `s` is non-empty. 

5. Without using `len`, write an expression that gives the length of `s[m:n]`.
   Assume the slice is non-empty.

6. Suppose `s` is `'discombobulation'`. Using only `+`s and slices of `s`, write
   an expression that evaluates to `'iondicombot'`. Make the expression as short
   as possible.

7. Explain what it means that strings are immutable. What are some benefits of
   immutability? What are some disadvantages?

8. Write a function called `is_3sorted(s)` that returns `True` just when the
   string `s`:
   - is exactly 3 characters long
   - has no duplicate letters
   - has all its letters in alphabetical order

   In every other case, the function returns `False`.

   For example:
   - `is_3sorted('abc')` returns `True`
   - `is_3sorted('acc')` returns `False` (duplicate letters)
   - `is_3sorted('ab')` returns `False` (too short)
   - `is_3sorted('abcd')` returns `False` (too long)
   - `is_3sorted('cab')` returns `False` (not in alphabetical order)

9. Write a statement that opens a file called `log.txt` for writing.

10. If the file `log.txt` already exists, what happens when you open it for
    writing?