# Strings

A string in Python is an immutable sequence of characters.
We've already seen that we can make a string by enclosing
its characters in either single quotes `''` or double quotes `""`.
It does not matter which we use. If the string itself needs to
contain a single quote or a double quote (but not both),
it is easiest to enclose it in the kind of quote
that does not appear inside the string.

In [None]:
astr = 'ah'
bstr = "won't"

### String Operators

A couple of the standard operators have been defined to have meaning with strings:

* `str1 + str2` -> returns a new string with `str1` and `str2` concatenated
* `str1*n` -> returns a new string with `str1` repeated `n` times

Note that `/` and `-` are not defined for strings.

In [None]:
astr = 'ah'
bstr = 'frog'
print(astr+bstr)
print(astr*5)

These can also be combined:

In [None]:
print('a'*3 + 'h'*3 + bstr)
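
As a quick check of the note above that `-` is not defined for strings, here is a minimal sketch. It reuses `astr` and `bstr` from the cells above; the name `result` is only for illustration.

```python
astr = 'ah'
bstr = 'frog'

# '-' is not defined for strings, so this raises a TypeError.
try:
    result = astr - bstr
except TypeError as err:
    result = None
    print('TypeError:', err)

# '*' needs an integer count; the order of the operands does not matter.
print(3 * astr == astr * 3)
```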

### Special Characters

While most characters can simply be typed into a
string, some, such as a line break or a tab, have no
obvious way to be typed inside the quotes.  These are known as special
characters.  The two most common special characters
are:

* `\n` = newline
* `\t` = tab

To use these, we simply type them as part of the
string.  For example:

In [None]:
multiline = 'a string with\na newline'
print(multiline)
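
The tab character works the same way. The short sketch below, using made-up sample text, combines both special characters and shows that each one counts as a single character in the string.

```python
# '\t' inserts a tab and '\n' starts a new line.
table = 'name\tage\nAlice\t30'
print(table)

# Each special character is a single character, even though
# it takes two keystrokes to type.
print(len('\n'))
print(len('a\tb'))
```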

### Escape Characters

Often, choosing whether to enclose a string in single
or double quotes is sufficient, but sometimes it's not
enough.  For instance,
maybe you need to have both an apostrophe and quotation
marks inside your string.  In these cases, we can escape
the character with `\`.  This indicates that the character
immediately following should not be given its special
meaning by Python, but instead should show
up as just that regular character in the string.

* `\'` = single quote
* `\"` = double quote
* `\\` = backslash (since `\` is used to escape, to use it
  as a literal character, it must be used to escape itself)

In [None]:
strquote = "And Harry Potter said \"hello\""
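
The case mentioned above, a string that needs both an apostrophe and quotation marks, might look like the following sketch. The variable names and text are only for illustration.

```python
# Enclosed in double quotes, only the inner double quotes need escaping;
# the apostrophe can be typed as-is.
both = "She said \"it's fine\""
print(both)

# A literal backslash must itself be escaped.
path = 'C:\\temp'
print(path)
print(len(path))  # the escaped backslash counts as one character
```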

### Strings as Lists
Strings can be treated much like lists in Python.  This means
that we can perform most of the same operations we did with lists,
as long as they do not modify the string (strings are immutable).

#### Indexing and Slicing
* indexing - with `strname[i]`
* slicing - `strname[start:stop:stride]`.

As with lists, indices start at 0.  For slicing,
`stride` is optional, and `start` or `stop` can be left empty.

#### Iteration
The same two for loop styles we used for lists can also
be used for iterating through a string:
* iterating through index range and then indexing into string
* iterating directly through the characters in the string

#### Containing Items
As with lists, we can check if a substring is in a string
with `in`.

#### Examples

In [None]:
alpha = 'abcdefghijklmnopqrstuvwxyz'
print(alpha[0])
print(alpha[:5])
repeated = ''
for ch in alpha:
    repeated += ch*2
print(repeated)
print('aa' in alpha)
print('aa' in repeated)
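
The cell above covers indexing, a simple slice, direct iteration, and `in`. The sketch below fills in the remaining pieces described in this section: a slice with a stride, iteration through an index range, and what happens when we try to modify a string. The names `every_fifth` and `vowel_positions` are only for illustration.

```python
alpha = 'abcdefghijklmnopqrstuvwxyz'

# Slicing with a stride: every 5th character, starting at index 0.
every_fifth = alpha[::5]
print(every_fifth)

# Iterating through an index range and indexing into the string.
vowel_positions = []
for i in range(len(alpha)):
    if alpha[i] in 'aeiou':
        vowel_positions.append(i)
print(vowel_positions)

# Unlike lists, strings are immutable, so item assignment fails.
try:
    alpha[0] = 'A'
except TypeError as err:
    print('TypeError:', err)
```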

### String Comparison
We've seen comparison operators for numeric types already (namely
`==`, `<`, `<=`, `>`, `>=`), but they can
be used with strings as well.  Take a look at the following code
and try to predict what it will print:

In [None]:
msg1 = "Hello"
msg2 = "Apple"
msg3 = "Hello"
msg4 = "apple"

print(msg1 == msg2)
print(msg1 == msg3)


print(msg1 < msg2)
print(msg1 < msg4)

You may have thought the output for the last 2 lines would be the same.  To some extent, the comparison is alphabetical, but in reality it compares the strings character by character using the numbers used to store each letter underneath. Because
uppercase and lowercase letters are stored as different numbers by the computer, they end up at very different spots from one another, so the result depends on the case of the operands.  All uppercase letters come before all lowercase letters as far as the computer is concerned.
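
To see the numbers behind the letters, we can use the built-in `ord` function. The sketch below also lowercases both operands first, which is one common way to get a case-insensitive comparison; it relies on the `lower` method covered under String Methods.

```python
# ord() gives the integer used to store a character.
print(ord('A'), ord('H'), ord('a'))

msg1 = "Hello"
msg2 = "Apple"
msg4 = "apple"

# 'H' (72) is larger than 'A' (65) but smaller than 'a' (97).
print(msg1 < msg2)
print(msg1 < msg4)

# Comparing lowercased copies ignores the case of the operands.
print(msg1.lower() < msg2.lower())
print(msg1.lower() < msg4.lower())
```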

## Functions on Strings and String Methods

### Functions
We've seen previously how one of the built-in
functions, `len`, can be applied to strings to
determine the number of characters in the string.
The `min` and `max` functions can also be applied
to strings, returning the character with the smallest or
largest value, respectively.  Letter values increase from
A-Z and then a-z (so all uppercase letters
are smaller than all lowercase letters, just as with string
comparison).

In [None]:
msg = 'Python'
print(len(msg))
print('min is', min(msg), 'max is', max(msg))

### String Methods
As with lists, strings have many methods associated
with the specific object and invoked (called) with
`strname.methodname(args)`.  The full list
of methods for strings is available in the
[documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), but some of the most
commonly used methods are listed below.

Because strings are immutable, none of these
methods actually modify the string.  In cases where
the method returns a string, it is a new string and
the original string is unchanged.

#### Conversion Methods
* `upper()` -> returns copy of the string in all uppercase
* `lower()` -> returns copy of the string in all lowercase
* `capitalize()` -> returns copy of the string with first character capitalized and the rest in lowercase

In [None]:
msg1 = 'hello'
msg2 = 'Goodbye'
msg3 = 'garfield'
print(msg1.upper())
print(msg2.lower())
print(msg3.capitalize())
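
Because these methods return new strings, the original is left untouched unless we reassign the variable. A minimal sketch with a mixed-case sample string (the names are only for illustration):

```python
msg = 'hELLO wORLD'

# upper() returns a new string; msg itself is not modified.
shouted = msg.upper()
print(shouted)
print(msg)

# capitalize() uppercases the first character and lowercases the rest.
print(msg.capitalize())

# To keep the converted version, assign the result to a variable.
msg = msg.lower()
print(msg)
```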

#### Trimming Methods

* `strip()` -> returns copy of the string with leading and trailing whitespace removed
* `lstrip()` -> returns copy of the string with leading whitespace removed
* `rstrip()` -> returns copy of the string with trailing whitespace removed

In [None]:
msg1 = '  2 spaces at the front and some at the end    \n'
print(msg1)
print(msg1.strip())
print(msg1.lstrip())
print(msg1.rstrip())

For each of these methods, it's also possible to pass in a string
as an argument to indicate which characters should be trimmed from
the start and/or end of the string.  The argument is treated as a set
of characters, not as a prefix or suffix: characters are removed from
the beginning/end until one is reached that is not in the set.

In [None]:
msg2 = '-*-*-*-*aaaHELLOzzz*-*-*-*-'
print(msg2.strip('az-*'))

#### Miscellaneous

* `find(substring)` -> returns lowest index in the string where `substring` starts.
  Returns -1 if `substring` is not in the string.
* `rfind(substring)` -> returns highest index in the string where `substring` starts
  (aka last occurrence).  Returns -1 if `substring` is not in the string.
* `index(substring)` -> like `find`, except raises `ValueError` when `substring` not found
* `count(substring)` -> returns the number of non-overlapping
  occurrences of `substring` in string
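
A short sketch of these four methods on a sample word (the string `'banana'` is just an illustration):

```python
msg = 'banana'

print(msg.find('an'))    # lowest index where 'an' starts
print(msg.rfind('an'))   # highest index where 'an' starts
print(msg.find('xy'))    # -1 because 'xy' is not in the string
print(msg.count('ana'))  # occurrences are counted without overlap

# index behaves like find, but raises ValueError when nothing is found.
try:
    msg.index('xy')
except ValueError as err:
    print('ValueError:', err)
```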

#### Converting To/From Lists

Sometimes it's necessary to explicitly go between
a string and a list.  More often than not, this is done by creating a
list of words from a string, or by combining a list of words into
a single string.

* `split()` -> return a list of words in the string by separating on whitespace.
  Consecutive whitespace is treated as a single separator
* `join(lst)` -> concatenates all of the strings in `lst`, separated by whatever
  string is calling the method
  

In [None]:
msg1 = 'This is the song that never ends\nit just goes on and on my friends'
lstmsg1 = msg1.split()
print(lstmsg1)
comma_msg1 = ','.join(lstmsg1)
print(comma_msg1)

The default is to split the string on whitespace.
It's also possible to specify a certain character
to separate entries.  This is often used with other
common delimiters, such as commas.

In [None]:
header = 'Day,Month,Year'
print(header.split(','))

It's also possible to separate
a string into a list of its individual characters by converting it
to a list with `list`.  This could be useful if it's necessary to modify a lot
of the characters as you iterate through a string.
It's not necessary, but it can prevent creating many slightly modified
copies of the string in the process of modifying.

In [None]:
msg1 = 'hello'
print(list(msg1))
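
One way the round trip might look: convert to a list, modify the characters in place, then `join` on the empty string to get a single string back. The replacement rule here (uppercasing every `l`) is arbitrary and only for illustration.

```python
msg1 = 'hello'
chars = list(msg1)

# Lists are mutable, so individual characters can be replaced in place.
for i in range(len(chars)):
    if chars[i] == 'l':
        chars[i] = 'L'

# Joining on the empty string turns the list back into a single string.
result = ''.join(chars)
print(result)
print(msg1)  # the original string is unchanged
```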

#### Formatting

So far, all of the printing we've been doing is just separating
the different strings or variables we wanted to print by commas
in our call to the print function.  This works okay for some
things, but it doesn't work that well if we need to output something
nicely.

To do that, we use the `format` method.  The `format` method
is quite powerful, and can be used for anything from
simply inserting values/variables into a string to more
complex formatting of numbers.

At the simplest, we begin by adding `{}` anywhere in our
string that we would like to insert a value.

In [None]:
val1 = 5
val2 = 2
msg = '{} / {} = {}'.format(val1, val2, val1/val2)
print(msg)

We can also number the brackets, in which case the number
indicates which argument to format should be substituted.
More often than not, we'd do this if we want one of them
to be inserted more than once.

In [None]:
msg = '{1} / {0} = {2}'.format(val1, val2, val2/val1)
print(msg)
msg = '({0} / {1}) * ({1} / {0}) = 1'.format(val1, val2)
print(msg)

It can be ugly to keep track of these numbers, so we can also
name them.  This can be really helpful for long strings with many
values being inserted, particularly if some occur multiple times.

In [None]:
msg = 'hello {name} are you {emotion} today?'.format(name="Alice", emotion="happy")
print(msg)

If we want something like a table to output nicely formatted,
we can pad each value to a minimum width by adding a `:` inside the
brackets and following it with the minimum number of characters.

In [None]:
names = ['Alice', 'Bob', 'John']
emotions = ['happy', 'sad', 'content']

In [None]:
print('{0:10}|{1:10}'.format('name', 'emotion'))
print('-'*10 + '|' + '-'*10)
for i in range(len(names)):
    print('{0:10}|{1:10}'.format(names[i], emotions[i]))

There are other modifiers that can be added to change how the value is aligned within the padded field.
For example, adding `^` directly after the colon centers the value.

In [None]:
print('{0:^10}|{1:^10}'.format('name', 'emotion'))
print('-'*10 + '|' + '-'*10)
for i in range(len(names)):
    print('{0:^10}|{1:^10}'.format(names[i], emotions[i]))
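
Two more alignment modifiers work the same way: `<` left-aligns and `>` right-aligns within the minimum width. The sketch below uses a hypothetical `scores` list, which is not part of the data above, to show a right-aligned numeric column.

```python
names = ['Alice', 'Bob', 'John']
scores = [93, 7, 100]  # hypothetical values, for illustration only

# '<' left-aligns and '>' right-aligns within the minimum width.
rows = []
for i in range(len(names)):
    rows.append('{0:<10}|{1:>5}'.format(names[i], scores[i]))

for row in rows:
    print(row)
```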

##### Formatting Numbers
One of the most helpful uses of `format` is that it allows
an easy way of specifying how to format floating point numbers.
For floating point numbers, the brackets would look something
like `{i:w.ps}` where:
* `i` is the index to substitute into this bracket 
* `w` is the minimum field width 
* `p` is the precision: the number of places after the decimal point to show
  (for `g`, the number of significant digits)
* `s` is a format specifier (`f` for fixed point, `e` for scientific
  notation, and `g` for general which will choose between `e` and
  `f` automatically depending on the magnitude of the number).

Note that you often don't need all of these; many are optional.
The full range of possibilities with `format` is far more
in depth than this and is described in the
[documentation](https://docs.python.org/3.9/library/string.html#formatspec).
If there is a desired output
format you want, it's likely achievable.

##### Examples

In [None]:
largenum = 1.23944938751e5

This says to use as many places before the decimal point as needed,
and to show exactly 3 places after the decimal point (rounding as necessary),
in fixed point notation.

In [None]:
print('{0:.3f}'.format(largenum))

In [None]:
smallnum = 1.2394498e-6

The next two print the same number, both with a minimum width of 11.
The first example uses scientific notation with 2 places after the decimal point.
Note the spaces to the left of the printed result.  The second example uses fixed
point.

In [None]:
print('{0:11.2e}'.format(smallnum))
print('{0:11.9f}'.format(smallnum))
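
The `g` specifier was described above but not shown. As a rough sketch, here it is with a precision of 3, along with an example that leaves out the optional index and width. The values `12.3456` and `3.14159` are arbitrary samples.

```python
largenum = 1.23944938751e5
smallnum = 1.2394498e-6

# General format with 3 significant digits: Python picks scientific
# or fixed point notation depending on the magnitude of the number.
general_large = '{0:.3g}'.format(largenum)
general_small = '{0:.3g}'.format(smallnum)
general_mid = '{0:.3g}'.format(12.3456)
print(general_large)
print(general_small)
print(general_mid)

# The index and width are optional; only the precision and
# format specifier are given here.
short = '{:.2f}'.format(3.14159)
print(short)
```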