# Introductory Notes

Throughout this entire notebook you should be experimenting with the code in the non-text cells. A great way to begin to get a feel for Python is by playing with it. So have some fun by changing the values in the cells and then running them again with Shift-Enter. Before you do, think about what you expect the output to be, and make sure your intuition matches up with what you run. If it doesn't, take some time to think about what happened so you can hone your intuition.

At the end of each section there will be some questions to help further your understanding. Remember, in Python we can always manually test code by running it; however, you should try to think about the answers to these questions before you run some code. This way you can check and verify your understanding of the section's topic.

## Introduction to Strings and Lists

Last week, we learned how to use Python's `while` loops and conditionals, operating on some simple built-in numeric types. Today, we are going to learn about a couple of data structures that will continue to build up your power in Python, and we'll learn about a new type of loop.

### Strings

First, we are going to learn about another common data type, strings. From a high-level perspective, a string is just a bit of text. This could be text that you have read in from a file, HTML that you have pulled from the Internet, or any other text. From Python's perspective, a string (type `str`) is simply a collection of encoded characters. Wait, what's an encoding...?

An encoding is just a fancy way of saying that the characters in our string are stored according to a certain format: a rule for turning each character into bytes and back again. The reason this matters for our Python programs is that text arriving from outside the program is always in some encoding (commonly `ASCII` or `utf-8`, both of which represent characters defined by the Unicode standard), and Python needs to know which one it is to read the text correctly. This isn't something you will run into often, and especially not when defining your own strings (it's probably most prevalent when pulling text from the Internet). However, it's worth noting because there is a good chance that sometime in your Python career, you will end up with Python telling you it doesn't recognize a certain character in one of your strings, and an unexpected encoding will most likely be at the heart of that error.
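To make the idea of an encoding concrete, here is a minimal sketch. It assumes Python 3, where `str.encode()` turns text into bytes and `bytes.decode()` turns bytes back into text; the sample word and variable names are made up for illustration.

```python
# Assumes Python 3. The sample text and variable names are illustrative only.
text = 'caf\u00e9'  # the word "café"; \u00e9 is the accented e

# Encoding turns characters into bytes according to a chosen format.
utf8_bytes = text.encode('utf-8')
print(len(text), len(utf8_bytes))  # 4 characters, but 5 bytes in UTF-8

# Decoding with the same encoding gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)

# ASCII has no accented e, so encoding this text as ASCII fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

The failure in the last step is the kind of "unrecognized character" error described above: the text and the encoding being applied to it don't agree.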

In Python, strings are recognized as a collection of characters surrounded by a set of either single quotation marks (`'...'`) or double quotation marks (`"..."`). So long as you open and close your string with a **matching** set of single or double quotation marks, you are free to use either. The single caveat is that if your text contains a single quotation mark (such as "Don't do that"), you can't simply wrap it in single quotation marks: either use a matching set of **double** quotation marks, or escape the inner quotation mark with a backslash (`\'`). Let's experiment with some strings...


In [1]:
'This is a string.'

'This is a string.'

In [2]:
"This is another string, but this time with double quotation marks."

'This is another string, but this time with double quotation marks.'

In [1]:
'They told me not to do this, but I didn\'t listen.' 

"They told me not to do this, but I didn't listen."

Just like we expected, we can use both single and double quotation marks. What happened in the 3rd case there? We opened the string with a single quotation mark, but the text itself contains a single quotation mark in the word `didn't`. The backslash in `didn\'t` **escapes** that quotation mark, telling Python to treat it as part of the text rather than as the end of the string. Without the backslash, Python would have looked for the next single quotation mark to close the string, found it in the word `didn't`, and assumed the string was closed after `didn`. That would leave `t listen.'` just hanging out, and Python wouldn't know how to interpret it, resulting in a `SyntaxError`. Another solution, as mentioned above, is to use double quotation marks in any case where your text will have single quotation marks in it. For example...

In [2]:
"Now that I've got double quotes, I can use all the contractions!"

"Now that I've got double quotes, I can use all the contractions!"

In [3]:
"Can't, won't, didn't, don't... all the contractions!"

"Can't, won't, didn't, don't... all the contractions!"

As a final note before we dive into string operations, we can store strings in variables in the exact same way that we can store an `int`, `float`, or `complex`.

In [4]:
my_str_variable = 'This is a string variable.' 

In [5]:
my_str_variable # my_str_variable holds the string that we put in it in the above cell. 

'This is a string variable.'
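Before moving on to the questions, the quoting rules from this section can be checked in code. The sketch below is only an illustration: it uses the built-in `compile()` function to ask Python whether a piece of source text is valid, so the broken string can be tested without stopping the notebook. The variable names are made up for this example.

```python
# The broken string is stored inside double quotes so this cell itself is valid.
broken_source = "'They told me not to do this, but I didn't listen.'"

# compile() parses the text as Python code and raises SyntaxError if it is invalid.
try:
    compile(broken_source, '<example>', 'eval')
    raised_syntax_error = False
except SyntaxError:
    raised_syntax_error = True
print(raised_syntax_error)

# Escaping the quote and switching to double quotes produce the same string.
escaped = 'I didn\'t listen.'
double_quoted = "I didn't listen."
print(escaped == double_quoted)
```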

**Introductory String Questions**

1. When does the distinction between using single and double quotes to build a string matter?
2. Fix the following string to be considered valid and not throw an error when run.
    * `'They told me not to do this, but I didn't listen.'`
3. Create a variable that holds a string of your name.
4. Create another variable that holds a string of your best friend's name.

**Introductory String Answers**

1. It depends on the contents of the string. A string delimited by single quotes can't contain an unescaped single quote.
2. Code:
    * `'They told me not to do this, but I didn\'t listen.'`

In [19]:
my_name, best_friends_name = 'Florian', 'Ivan'
my_name, best_friends_name

('Florian', 'Ivan')

#### String Operations

Surprisingly, a couple of our standard mathematical operations will work on strings, namely `+` and `*`. We can use the `+` operator to add two strings together (this is known as string **concatenation**), and we can use the `*` operator to repeat a string a given number of times. Let's take a look...

In [6]:
'My first string' + 'My second string'

'My first stringMy second string'

In [7]:
'Repeating string' * 3

'Repeating stringRepeating stringRepeating string'

Note that Python didn't put spaces between the strings with either the `+` operator or the `*` operator. Why not? Because it wasn't told to! In this case, and in programming in general, we have to be extremely explicit about what we want the computer to do. To fix this, we can add a space in the middle of the first case, and then add a space to the end of our string in the second case.

In [8]:
'My first string' + ' ' + 'My second string'

'My first string My second string'

In [9]:
'Repeating string ' * 3

'Repeating string Repeating string Repeating string '

That looks much better! But, what about that pesky little space at the end of our second string: `'Repeating string Repeating string Repeating string '`. Is there a way to remove this? It turns out there is! One of the methods (a name for a function that is attached to a particular object) that we can call on strings is the `strip()` method. Methods are something that we will cover in much more depth later, but for now just note that we call them on our objects through **dot notation**. We simply place a `.` at the end of our object (`str`, `int`, `float`, any variable, etc.), and then call the method by name. Here's how the use of this **dot notation** looks in practice.

In [10]:
'Repeating string Repeating string Repeating string '.strip()

'Repeating string Repeating string Repeating string'

In [11]:
' Repeating string Repeating string Repeating string '.strip()

'Repeating string Repeating string Repeating string'

So, what did the `strip()` method do? In the first example, it removed the trailing space from the string. In the second example, it removed both the leading and trailing spaces. This is exactly what the `strip()` method does - by default (without any arguments) it removes leading and trailing whitespace (*note, the method can actually remove any leading or trailing characters if you pass them to `strip()`, but whitespace is the default character that it removes*).
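As a small sketch of the note above, here is `strip()` being passed a character to remove, alongside its one-sided relatives `lstrip()` and `rstrip()`. The example strings are made up for illustration.

```python
# Passing characters to strip() removes them from both ends instead of whitespace.
decorated = '***Repeating string***'
stripped_stars = decorated.strip('*')
print(stripped_stars)  # 'Repeating string'

# lstrip() only trims the left side, and rstrip() only trims the right side.
padded = '  Repeating string  '
left_trimmed = padded.lstrip()
right_trimmed = padded.rstrip()
print(repr(left_trimmed))   # 'Repeating string  '
print(repr(right_trimmed))  # '  Repeating string'
```

Using `repr()` in the last two lines keeps the quotation marks in the printed output, which makes the remaining spaces visible.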

Are there other things that we can do with strings? There are tons! Let's store our string in a variable below, so we can get some exposure working with strings in variables.

In [12]:
my_str_variable = 'this IS my STRING to PLAY around WITH.'

In [13]:
my_str_variable.capitalize()

'This is my string to play around with.'

In [14]:
my_str_variable.upper()

'THIS IS MY STRING TO PLAY AROUND WITH.'

In [15]:
my_str_variable.lower()

'this is my string to play around with.'

In [16]:
my_str_variable.replace('STR', 'fl')

'this IS my flING to PLAY around WITH.'

In [17]:
my_str_variable.split()

['this', 'IS', 'my', 'STRING', 'to', 'PLAY', 'around', 'WITH.']

These are some of the most commonly used string methods. You can see above what they do by default: `capitalize()` capitalizes the first letter of the string and lowercases the rest; `upper()` converts all the letters in the string to uppercase, and `lower()` converts them to lowercase; `replace()` replaces all instances of a given substring in your string with another given substring; finally, `split()` splits the string on a given separator (whitespace by default, just as with `strip()`) and returns the pieces as a list. There are many more string methods available, and you can check them out in the [docs](https://docs.python.org/2/library/stdtypes.html#string-methods).
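One detail worth checking for yourself: each of these methods returns a **new** string (or list) and leaves the original untouched. The sketch below shows that, and also shows that method calls can be chained one after another. The names `shouted` and `lowered_words` are made up for this example.

```python
my_str_variable = 'this IS my STRING to PLAY around WITH.'

# upper() returns a new string; my_str_variable itself is not changed.
shouted = my_str_variable.upper()
print(my_str_variable)  # still the original mixed-case text
print(shouted)

# Methods can be chained: lower() runs first, then split() runs on its result.
lowered_words = my_str_variable.lower().split()
print(lowered_words)
```

If you want to keep the result of a method call, store it in a variable (a new one, or the same one to overwrite the original).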

Alternatively, you can find out what methods are available to call on strings from the IPython terminal itself (this is one of the really awesome features of IPython)! This also works in an IPython notebook like this one. If you have a string stored in a variable, you can type the variable name followed by a period, and then press Tab to see all the methods available for strings! For display purposes, we're showing below what you would see if you used tab completion in IPython (if you used it in an IPython notebook instead, you would get a dropdown menu showing what's available on that variable).

```python
In [1]: my_str.  # Hit tab now!

my_str.capitalize  my_str.isalnum     my_str.lstrip      my_str.splitlines
my_str.center      my_str.isalpha     my_str.partition   my_str.startswith
my_str.count       my_str.isdigit     my_str.replace     my_str.strip
my_str.decode      my_str.islower     my_str.rfind       my_str.swapcase
my_str.encode      my_str.isspace     my_str.rindex      my_str.title
my_str.endswith    my_str.istitle     my_str.rjust       my_str.translate
my_str.expandtabs  my_str.isupper     my_str.rpartition  my_str.upper
my_str.find        my_str.join        my_str.rsplit      my_str.zfill
my_str.format      my_str.ljust       my_str.rstrip      
my_str.index       my_str.lower       my_str.split
```


**Note**: This works for all of our variable types! Not only that, but we can also tab complete the names of the variables that IPython currently knows about (those in the **namespace**).
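If tab completion isn't available, the built-in `dir()` function gives a similar listing from plain Python. This is a rough sketch, not what IPython does internally; the filter on the leading underscore is just a convenience to hide special names, and the exact list depends on your Python version.

```python
my_str = 'any string will do'

# dir() lists the names attached to an object; skip the ones starting with '_'.
public_methods = [name for name in dir(my_str) if not name.startswith('_')]
print(public_methods)

# The methods used in this section should all be in that list.
print('strip' in public_methods, 'upper' in public_methods, 'split' in public_methods)
```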

**String Operation Questions**

1. Start off by storing your name in a variable again.
2. Now, use string concatenation (i.e. the `+` operator) to add 'Hello, ' before your name.
3. Given the string 'Hello, Sean', replace each letter 'e' with a 't'.
4. Use the `.split()` method on the string 'Hello, Sean' to split it by the comma (`','`).
    * What happens if you split by a comma and a space (`', '`)?

In [22]:
my_name = 'Florian'
greeting = 'Hello, ' + my_name
greet_sean = 'Hello, Sean'
print(greet_sean)
greet_sean = greet_sean.replace('e', 't')
print(greet_sean)
greet_sean.split(', ')

Hello, Sean
Htllo, Stan


['Htllo', 'Stan']
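For the last question, it can help to see both separators side by side. This is a small sketch using the original 'Hello, Sean' string; the variable names are made up for illustration.

```python
greeting = 'Hello, Sean'

# Splitting on ',' alone leaves the space attached to the second piece.
by_comma = greeting.split(',')
print(by_comma)        # ['Hello', ' Sean']

# Splitting on ', ' removes the comma and the space together.
by_comma_space = greeting.split(', ')
print(by_comma_space)  # ['Hello', 'Sean']
```

The separator is matched exactly as written, so whatever you pass to `split()` is what gets removed from the result.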