# Lecture 21 Notes

These notes contain examples of how to use some common string methods, and also
a few larger examples of functions that use strings.

## String Methods

Python strings have many useful built-in **methods**. Here are examples of a few
of them. 

It's important to keep in mind that none of these methods change the string they
are operating on (because strings are immutable in Python). When a string is
returned, the string is a new copy of the original string with the changes
applied (and the original string is unchanged):

In [1]:
s = 'IBM is Big Blue'

print(s.upper())  # 'IBM IS BIG BLUE'
print(s.lower())  # 'ibm is big blue'

print(s.count('B'))  # 3
print(s.count('b'))  # 0
print(s.count(' '))  # 3

print(s.replace(' ', '_'))  # 'IBM_is_Big_Blue'
print(s.replace(' ', ''))   # 'IBMisBigBlue'

print(s.startswith('IBM'))  # True, s starts with 'IBM'
print(s.endswith('Blue.'))  # False, s does not end with 'Blue.' (no period)

print(s.find('is'))  # 4, 'is' starts at index 4 of s
print(s.find('as'))  # -1, 'as' is not found in s

print(s.split())  # ['IBM', 'is', 'Big', 'Blue']

t = '   Done! \n'
print(t.strip())   # 'Done!'
print(t.lstrip())  # 'Done! \n'
print(t.rstrip())  # '   Done!'

IBM IS BIG BLUE
ibm is big blue
3
0
3
IBM_is_Big_Blue
IBMisBigBlue
True
False
4
-1
['IBM', 'is', 'Big', 'Blue']
Done!
Done! 

   Done!
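
As a small illustration of the immutability point above, the sketch below saves
the result of a method call in a new variable (`u` is just a name chosen for
this example) and shows that `s` itself is untouched until we re-assign it:

```python
s = 'IBM is Big Blue'
u = s.upper()   # upper() returns a new string; s is not modified

print(u)        # IBM IS BIG BLUE
print(s)        # IBM is Big Blue

# To "change" s, assign the returned string back to the same variable.
s = s.replace(' ', '_')
print(s)        # IBM_is_Big_Blue
```

A common mistake is to write `s.upper()` on a line by itself and expect `s` to
change; the new string is created and then thrown away.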


You can combine multiple methods together. For example:

```python
>>> s = 'IBM is Big Blue'
>>> s.lower()
'ibm is big blue'
>>> s.lower().count('b')   # Convert s to lowercase, and then
3                          # count how many 'b's it has.

>>> t = '   Done! \n'
>>> t.strip()
'Done!'
>>> t.strip().lower()
'done!'
>>> t.strip().lower().startswith('done') # Remove begin/end whitespace,
True                                     # convert to lower case, and
                                         # check if it starts with 'done'
```

In [2]:
s = 'IBM is Big Blue'
print(s.lower())              # 'ibm is big blue'
print(s.lower().count('b'))   # 3, Convert s to lowercase, and then
                              # count how many 'b's it has.
t = '   Done! \n'
print(t.strip())              # 'Done!', remove begin/end whitespace
print(t.strip().lower())      # 'done!', remove begin/end whitespace, and
                              # convert to lower case.

print(t.strip().lower().startswith('done')) # True, remove begin/end whitespace,
                                            # convert to lower case, and
                                            # check if it starts with 'done'

ibm is big blue
3
Done!
done!
True


**Example** Suppose the `strip()` method didn't exist. How could you implement
it using `lstrip()` and `rstrip()`?

In [3]:
def my_strip(s):
    """ Strips leading and trailing whitespace from s.
    """
    return s.lstrip().rstrip()

print(my_strip('   Done! \n'))  # 'Done!'

Done!
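
One way to gain some confidence in `my_strip` is to compare it to the built-in
`strip()` on a few inputs. This is only a quick check on hand-picked strings
(the list `tests` is made up for this example), not a proof that the two always
agree:

```python
def my_strip(s):
    """Strips leading and trailing whitespace from s."""
    return s.lstrip().rstrip()

# A few sample inputs, including the empty string and an all-whitespace string.
tests = ['   Done! \n', 'Done!', '', '   ', '\t a b \n\n']

for t in tests:
    print(my_strip(t) == t.strip())  # True for each of these inputs
```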


## Example: A String Equality Checking Function

It's easy to check if two strings are the same using `==` and `!=`:

In [4]:
print('Star' == 'Star')  # True
print('star' == 'Star')  # False

print('Star' != 'Star')  # False
print('star' != 'Star')  # True

True
False
False
True


It's instructive to write our own version of these operators to better
understand how they work.

Let's write a function `strings_equal(s, t)` that returns `True` if strings `s`
and `t` are the same length, and have the same characters in the same order.
Otherwise, it returns `False`. It should return the same results as `==`.

In [6]:
def strings_equal(s, t):
    """Returns True if strings s and t are the same, False otherwise.
    """
    # different length strings can't be equal
    if len(s) != len(t):
        return False
    
    # len(s) == len(t) at this point
    for i in range(len(s)):
        if s[i] != t[i]:
            return False
    
    return True

print(strings_equal('star', 'star'))  # True
print(strings_equal('star', 'Star'))  # False

True
False


`strings_equal` works as follows:

- First it checks if the strings are the same length. If they're *not*, then we
  know the strings can't be the same, and return `False` immediately.

- Second, if the strings are the same length, we use a loop to compare each pair
  of characters at the same index location to see if they are the same. If
  they're different, we return `False` immediately.

- Finally, if the loop finishes without returning `False`, then that means all
  the characters in `s` and `t` are the same, and so we return `True`.

Notice that the amount of work the function does depends in part on where the
first difference between the strings appears. For example,
`strings_equal('x123456', 'y123456')` returns `False` immediately after checking
the first characters. But `strings_equal('123456x', '123456y')` takes a little
longer because the first six characters of each string are checked before
getting to the different characters, `'x'` and `'y'`, at the end. If you are
comparing a lot of very long strings, this speed difference could become
noticeable.
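
To make this concrete, here is a sketch of a helper that uses the same loop as
`strings_equal` but counts how many pairs of characters get compared. The name
`count_comparisons` is made up for this example; it is not part of the lecture
code:

```python
def count_comparisons(s, t):
    """Returns how many character pairs a strings_equal-style loop
    compares before it can decide whether s and t are equal.
    (Hypothetical helper, for illustration only.)
    """
    # different lengths: the loop never runs
    if len(s) != len(t):
        return 0

    count = 0
    for i in range(len(s)):
        count += 1
        if s[i] != t[i]:
            return count  # stopped early at the first difference
    return count          # every pair was compared

print(count_comparisons('x123456', 'y123456'))  # 1
print(count_comparisons('123456x', '123456y'))  # 7
print(count_comparisons('star', 'stars'))       # 0
```

The second call needs 7 comparisons: six for the matching characters, plus one
more for the pair `'x'` and `'y'`.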

Using `strings_equal` we can implement `strings_not_equal`:

In [7]:
def strings_not_equal(s, t):
    """Returns True if strings s and t are different, False otherwise.
    """
    return not strings_equal(s, t)

print(strings_not_equal('Star', 'Star'))  # False
print(strings_not_equal('star', 'Star'))  # True

False
True


## Example: Checking if All Characters are the Same

Let's write a function called `all_chars_same(s)` that returns `True` if all the
characters in `s` are the same, and `False` otherwise:

```python
>>> all_chars_same('')
True
>>> all_chars_same('a')
True
>>> all_chars_same('aa')
True
>>> all_chars_same('aaaaaaa')
True
>>> all_chars_same('ab')
False
>>> all_chars_same('aaaaa!')
False
```

Here's one way to implement it:

In [8]:
def all_chars_same(s):
    """Returns True if all the characters are the same, and False otherwise.
    Returns True for the empty string.
    """
    if len(s) <= 1:
        return True
    
    # s has at least 2 characters at this point
    first_char = s[0]
    for c in s:
        if c != first_char:
            return False
    return True

print(all_chars_same(''))       # True
print(all_chars_same('a'))      # True
print(all_chars_same('aa'))     # True
print(all_chars_same('ab'))     # False
print(all_chars_same('aaa'))    # True
print(all_chars_same('aaa aa')) # False

True
True
True
False
True
False


`all_chars_same` works as follows:

- First, if `s` has length 0 or 1 then we can return `True` immediately. `s`
  must have at least 2 characters if any of them are different.
- Second, if the string is length 2 or greater, then the first character is
  saved in `first_char`, and we use a for-loop to go through every character in
  `s` and check if it's equal to `first_char`. If not, we can return `False`
  immediately. 
- Third, if the loop finishes without returning `False`, then all the characters
  must be the same, and so `True` is returned.
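
For comparison, here is a shorter alternative sketch that uses a Python `set`,
which keeps only one copy of each distinct value. The name `all_chars_same_set`
is made up for this example, and this is a different technique than the loop
above, not a replacement for understanding it:

```python
def all_chars_same_set(s):
    """Returns True if all the characters of s are the same.
    Returns True for the empty string.
    (Alternative sketch using a set of the characters.)
    """
    # set(s) has one entry per distinct character in s, so s has
    # all-same characters exactly when the set has 0 or 1 entries.
    return len(set(s)) <= 1

print(all_chars_same_set(''))        # True
print(all_chars_same_set('a'))       # True
print(all_chars_same_set('aaa'))     # True
print(all_chars_same_set('aaa aa'))  # False
```

One trade-off: the set version always looks at every character, while the loop
version can return `False` as soon as it finds a mismatch.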

## Example: Checking if All Characters are Different

Now consider the problem of testing whether all the characters in a string are
*different*, i.e. no 2 characters are equal. This simple function *doesn't*
work:

In [11]:
def incorrect_all_chars_different(s):
    """Returns True if all characters are different, False otherwise.
    Returns True for the empty string.
    """
    return not all_chars_same(s)  # wrong!

print(incorrect_all_chars_different('aabb'))  # True, but should be False

True


`incorrect_all_chars_different` is wrong because the characters not all being
the same doesn't mean they are all different. As shown, it returns `True` for
`'aabb'` because the characters are not all the same, even though they are not
all different either.

So we need another approach. One idea is, for strings with 2 or more characters,
to go through the characters one at a time and use `in` to make sure each one
doesn't appear among the characters after it:

In [15]:
def all_chars_different(s):
    """Returns True if all characters are different, False otherwise.
    Returns True for the empty string.
    """
    n = len(s)
    if n <= 1:
        return True
    
    for i in range(n):
        if s[i] in s[i+1:]:
            return False
    return True

print(all_chars_different('aabb'))    # False
print(all_chars_different('abCd'))    # True
print(all_chars_different('m'))       # True
print(all_chars_different('abcdae'))  # False

False
True
True
False


`s[i+1:]` is a slice starting at the first character *after* `s[i]` and
continuing to the end of the string. In other words, `s[i+1:]` is all the
characters after `s[i]`.
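
To see what the loop in `all_chars_different` is checking, this short sketch
prints each character of a sample string next to the slice of characters that
come after it (the string `'abca'` is just an example):

```python
s = 'abca'
for i in range(len(s)):
    # repr() shows the quotes, which makes the empty slice visible
    print(i, repr(s[i]), repr(s[i+1:]))

# Prints:
# 0 'a' 'bca'
# 1 'b' 'ca'
# 2 'c' 'a'
# 3 'a' ''
```

At `i = 0`, the character `'a'` is in `'bca'`, so the function would return
`False` right away. Also notice that for the last index the slice is the empty
string, not an error, and nothing is `in` the empty slice.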

## Example: Checking for Long Lines

A common restriction in Python source code is that each line of code ought to be
no more than, say, 100 characters long. Long lines can make code harder to read.

So let's write some code that detects long lines in a given text file. The basic
idea is this:

- open the file you want to check
- for each line in the file, do this:
  - if it's longer than 100 characters, then print it

To read a file in Python you first open it:

```python
file = open('filename.txt')
```

Then you can read the file line by line with a for-loop:

```python
for line in file:
    print(line)
```

A subtle detail here is that the string `line` will usually have a `'\n'`
character at the end. This is a newline character, and it's how Python knows to
start a new line when printing the string. We never want to count `\n` when
counting line length, so we will remove it using this helper function:

In [None]:
def chop(s):
    """Removes 1 trailing '\\n' from s (if there is one).
    """
    if s == '':
        return s
    elif s[-1] == '\n':
        return s[:-1]
    else:
        return s
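
Here is a short sketch of how `chop` behaves on a few sample strings, with the
built-in `rstrip('\n')` shown for comparison. `repr` is used so the newline
characters are visible in the output:

```python
def chop(s):
    """Removes 1 trailing newline from s (if there is one)."""
    if s == '':
        return s
    elif s[-1] == '\n':
        return s[:-1]
    else:
        return s

print(repr(chop('abc\n')))    # 'abc'
print(repr(chop('abc')))      # 'abc'
print(repr(chop('')))         # ''
print(repr(chop('abc\n\n')))  # 'abc\n', only one '\n' is removed

# rstrip('\n') removes *all* trailing newlines, not just one
print(repr('abc\n\n'.rstrip('\n')))  # 'abc'
```

When reading a file line by line, each line ends with at most one `'\n'`, so in
that situation the two approaches give the same result.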

Now we can write a function that prints all the long lines in a given file:

In [16]:
def print_long_lines(filename, max_len):
    """Prints the lines in filename that are longer than max_len characters.
    """
    file = open(filename)
    for line in file:
        line = chop(line)  # remove trailing '\n' (if any)
        n = len(line) # or to count tabs as 4 spaces: n = len(line) + 3*line.count('\t')
        if n > max_len:
            print(line)
    file.close()  # close the file once all the lines have been read

print_long_lines('some_long_lines.py', 100)

    if n == 0 or n == 1 or n == 2 or n == 3 or n == 4 or n == 5 or n == 6 or n == 7 or n == 8 or n == 9:
# This comment is over 100 characters long. It's only purpose to is give another line over 100 characters in this file
#####################################################################################################


This is a useful little function, and there are some extra features you could
add to `print_long_lines` to make it even better. For example:

  - print the line numbers of the lines that are too long, making it easier for
    the user to find the long lines in the file
  - print the length of each long line so the user can see exactly how long it
    is
  - print a short version of the line, e.g. just the first 30 characters
    followed by " ..."; this makes the output easier to read
  - if no lines are too long, then print a message saying that
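
As a starting point, here is one possible sketch that combines the features
listed above. The function name `report_long_lines`, the `preview_len`
parameter, and the choice to return the number of long lines are all
assumptions made for this example. It also uses `enumerate` to get line numbers
and a `with` statement to close the file automatically. The demo writes a small
temporary file so that the code runs on its own:

```python
import os
import tempfile

def report_long_lines(filename, max_len, preview_len=30):
    """Prints the line number, length, and a short preview of each line
    in filename that is longer than max_len characters. Returns the
    number of long lines found. (Sketch only; names are hypothetical.)
    """
    count = 0
    with open(filename) as file:  # file is closed when the block ends
        # enumerate gives (line number, line) pairs, counting from 1
        for line_num, line in enumerate(file, start=1):
            line = line.rstrip('\n')  # same effect as chop for file lines
            n = len(line)
            if n > max_len:
                count += 1
                print(f'line {line_num} ({n} chars): {line[:preview_len]} ...')
    if count == 0:
        print(f'No lines longer than {max_len} characters.')
    return count

# Demo: a temporary file with one short line and one 120-character line.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write('short line\n')
    f.write('x' * 120 + '\n')
    demo_name = f.name

report_long_lines(demo_name, 100)  # reports line 2
report_long_lines(demo_name, 200)  # prints the "No lines longer ..." message
os.remove(demo_name)               # clean up the temporary file
```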