# Strings and Containers

We encountered strings last time, and you were introduced to some of the basic aspects of them. At the most basic level, a string is just a group of characters, and we have all kinds of use-cases for dealing with sequences of characters:

*   Opening files
*   Labelling graphs
*   Saving files
*   Scraping webpages
*   And many more

To select individual letters or groups of letters from a string, we can use a technique called `indexing`.


In [None]:
first_name = "Andrew"

## Indexing

Strings are what Python describes as an `iterable`: something that can be iterated over. In plain English, that just means a string is a sequence of things (characters, in this case).

Usually, the string itself has meaning, and that meaning comes from the sequence of characters.

The Python syntax for selecting (`indexing`) a specific character in a string is to follow the string with square brackets containing a number, such as `[1]`, where the number gives the position of the character.

**Warning: Python is 0-indexed. This means that all iterables are numbered from 0, not from 1!**

In [None]:
print(first_name[0])

A


If you want to select a group of consecutive characters, you have to use the `index` of the first character and the `index` of the one after the last one you want, separated by a colon `:`.

So if you want the first 2 characters of `first_name`, you need to select from 0 up to (but not including) 2, i.e. positions 0 and 1. It takes a bit of time to get used to it, and even now I sometimes have to stop, check myself and count on my fingers.

This is called "slicing".

In [None]:
print(first_name[0:2])
# if you're starting at 0, this is equivalent to:
print(first_name[:2])

An
An


Once you've done some experimenting with indexing, there are two additions:

#### negative indexing

If you use a negative index, then you can start from the back: the last character is -1 (not -0 sadly).
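
As a small sketch of negative indexing on single characters, reusing the `first_name` string from above: the comparison with `len` (a function covered later in these notes) is only there to illustrate how a negative index maps onto a positive one.

```python
first_name = "Andrew"

last_char = first_name[-1]        # the last character
second_to_last = first_name[-2]   # the one before the last

# A negative index -k points at the same position as len(string) - k
same_position = first_name[-1] == first_name[len(first_name) - 1]

print(last_char, second_to_last, same_position)
```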

How do you slice off the last two characters?



It's also possible to slice every other element in the sequence, for instance, taking the "A", "d" and "e" out of my `first_name`

In [None]:
print(first_name[::2])

Ade


Think of this syntax as being:

`string_name[start:stop:step]`

Where step indicates how big a jump you're making. If you leave entries blank, Python works out that you want to start at the start and end at the end.
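
Here is a short sketch of how `start`, `stop` and `step` combine. The `alphabet` string is just an illustrative value, not something from the course data, and the negative step is shown as an extra possibility.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"  # illustrative string

every_third = alphabet[2:10:3]  # start at index 2, stop before index 10, jump 3 each time
first_five = alphabet[:5]       # blank start means "from the beginning"
from_twenty = alphabet[20:]     # blank stop means "to the end"
backwards = alphabet[::-1]      # a negative step walks from the end back to the start

print(every_third, first_five, from_twenty)
print(backwards)
```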

And finally, another useful function: `slice`, which allows us to index multiple strings in the same way without having to retype the index information over and over (part of the reason we're doing this is to avoid more manual labour, right?).


In [None]:
test_slice = slice(1, 8, 2)
# note: last_name is not defined in this section, so it must already exist from an earlier cell
full_name = first_name + last_name
print(full_name[test_slice])

nrwa
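
To illustrate the point about reuse, here is a minimal sketch where one `slice` object is applied to two different strings. The labels are invented for this example.

```python
label_slice = slice(0, 3)  # behaves like [0:3]

# Hypothetical labels, made up for illustration
mouse_label = "M01_control"
file_label = "F17_treated"

mouse_id = mouse_label[label_slice]
file_id = file_label[label_slice]
print(mouse_id, file_id)
```

If the part of the label we need ever changes, only the line defining `label_slice` has to be edited.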


Try this exercise - *and remember in many cases there are multiple ways of doing these things, right now what matters is the result*:<br>
Take the string below and count the number of "i" characters, assign the number to a suitable variable name.<br>
Now, count the total number of "i" and "I" characters and compare whether there are more "i" or "I".<br>
What proportion of the total are lower case?<br>
Find the location of the word "Nencki"<br>
Assign the word "Nencki" to a suitable variable name.<br>
Slice out every 4th letter from the string<br>


In [None]:
test_string = "This is the first incarnation of the Nencki Open Lab Python course, and I am having so much fun!"

# Take some time to explore different ways of indexing and interacting with this string

Sometimes, when you're dealing with strings, you want to modify the string dynamically - you might want different things in the string, depending on some specific criteria.<br>
For example, you may want to insert mouse identification numbers, or some other feature for labelling a graph.<br>
Python has two syntax elements for this:<br>
`f-strings` from Python 3.6 onwards<br>
`format` string method<br>

These both provide a route to add things to strings during our analyses, through the use of `{}` within the string:

In [None]:
nencki_label = "Nencki"
string_for_insertion = "I am going to insert a word here: {}"

print(string_for_insertion.format(nencki_label))
print(f"I am going to insert a word here: {nencki_label}")
# notice the f preceding the string quotation marks

# we can insert multiple elements by adding them to the method, separated by commas:
work_statement = "I have worked at {} for {} years"
print(work_statement.format(nencki_label, 3))
# or
print(f"I have worked at {nencki_label} for {3} years")

# We can insert any number of elements easily
target_string = "mouse {}; mouse {}; mouse {}"
print(target_string.format(1, 2, 3))
# or 
print(target_string.format(*"123"))

I am going to insert a word here: Nencki
I am going to insert a word here: Nencki
I have worked at Nencki for 3 years
I have worked at Nencki for 3 years
mouse 1; mouse 2; mouse 3
mouse 1; mouse 2; mouse 3


The last item, using the `*` operator before the string, is an example of "unpacking". It works with any iterable, and tells Python to put each item from the iterable into the curly braces `{}` in sequential order from left to right.
What happens if you try this:<br>
`print(target_string.format(*"1234"))`
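
Unpacking is not limited to strings. As a small sketch, the same `*` syntax works with a list (lists are introduced further down), and gives the same result as writing every item out by hand. The identification numbers are made up.

```python
target_string = "mouse {}; mouse {}; mouse {}"
mouse_ids = [7, 12, 31]  # made-up identification numbers

unpacked = target_string.format(*mouse_ids)
explicit = target_string.format(mouse_ids[0], mouse_ids[1], mouse_ids[2])

print(unpacked)
print(unpacked == explicit)
```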

We can also modify the appearance of the insertion according to specific rules by adding a `:` inside the braces, followed by an instruction written in the [string formatting mini-language](https://docs.python.org/3/library/string.html#formatspec). One particular advantage of this is being able to specify the precision of floats in string reports.

In [None]:
print("this is a string form of 0.3456789 to 3 decimal places: {:.3f}".format(0.3456789))
# notice the : followed by .3 indicating 3 decimal places, and f indicating fixed-point (decimal) notation, as used for floats.

this is a string form of 0.3456789 to 3 decimal places: 0.346
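
The same mini-language works inside f-strings. Below is a brief sketch of three format instructions that tend to be handy for labels and reports; `mouse_number` is a made-up value.

```python
value = 0.3456789
mouse_number = 7  # made-up identification number

two_places = f"{value:.2f}"                # fixed-point with 2 decimal places
as_percent = f"{value:.1%}"                # multiplied by 100 and shown with a % sign
zero_padded = f"mouse_{mouse_number:03d}"  # integer padded with zeros to 3 digits

print(two_places, as_percent, zero_padded)
```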


Of course, strings also have their own function to convert things to string objects: `str`

In [None]:
aa = 1
bb = str(1)
print(type(aa))
print(type(bb))
print(bb)
# note that print shows no quote marks around the string, so the output looks the same as for the integer; type() is what tells us that bb is a string

<class 'int'>
<class 'str'>
1
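
One place where `str` matters in practice is when gluing a number onto a string with `+`. This is a minimal sketch with a made-up `mouse_number`; `int` is shown as the conversion in the other direction.

```python
mouse_number = 3  # made-up value

# "mouse " + mouse_number would raise a TypeError, so we convert first
label = "mouse " + str(mouse_number)

# int converts a string of digits back into a number we can do arithmetic with
next_number = int("42") + 1

print(label, next_number)
```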


Indexing will come up in various forms again and again, so it's really important that you get comfortable with the syntax and the possibilities.

In [None]:
test_string = "This is a test string, and is written to provide this example."


Note about **mutability** 
There are some types in Python that are *immutable*, meaning that they cannot be modified in place. Strings are one of these types. How would you change the first letter `"T"` to a lower case `"t"` by indexing? Can you insert the word `"not"` to read `"This is not a test string, and is written to provide this example."`?
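
As a sketch of what immutability looks like in practice, using a different string so the questions above stay open: assigning to an index fails, so the usual route is to build a new string out of pieces of the old one. The `try`/`except` used here to catch the error is covered later in these notes.

```python
word = "Python"

try:
    word[0] = "J"             # strings do not support item assignment
    assignment_worked = True
except TypeError:
    assignment_worked = False

new_word = "J" + word[1:]     # build a new string instead; the original is untouched

print(assignment_worked, word, new_word)
```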

One final type... For now.

Python has, effectively, a blank value called `None` (its type is `NoneType`). You might see it from time to time. It can be used in comparisons such as `test_string is None`, which return a bool. It's often used as a default argument to functions, or may be returned by functions. We'll go into more detail on how Python functions work later in the course.

In [None]:
print(test_string is None)
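
A function that "returns nothing" actually returns `None`. As a small sketch, `print` itself is one such function, and `None` can also stand in for a value we don't have yet (`measurement` is a hypothetical name).

```python
returned = print("hello")  # print displays the text, but its return value is None
print(returned is None)

measurement = None         # placeholder for a value we do not have yet
has_measurement = measurement is not None
print(has_measurement)
```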

## Other containers

Python has a number of data containers that allow multiple Python objects to be grouped together. In most cases, the objects they can contain are arbitrary, but let's get to some examples.

First, lists.

There are a few ways to create a list in Python; the easiest is to use square brackets `[ ]`:

In [None]:
my_list = []
another_list = [1, 2, 3, 4]
a_list_of_strings = ["1", "2", "3", "4"]

Each element in the list is separated from the next by a comma.
Lists don't need to contain the same types:

In [None]:
a_mixed_list = [1, "2", 3.0, [], 100, None, print]

We can access elements in lists by using the same indexing that we used for strings: `list_name[start:stop:step]`, remembering that:
<br>**Python iterables are 0-indexed!!**
<br>**stop indices are up-to, but not including, the index**
<br>
If we want, we can easily convert a string to a list:

In [None]:
first_name = "Andrew"
list_first_name = list(first_name)
print(list_first_name)

The `list` function allows conversion of any iterable to a list, taking advantage of the type-flexibility of Python. Take a few minutes to play around with some Python lists.
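
As a starting point for that exploration, here is a short sketch showing that the indexing and slicing rules from strings carry over unchanged to the list version of `first_name`.

```python
first_name = "Andrew"
letters = list(first_name)

first_two = letters[0:2]    # slicing a list gives back a (shorter) list
every_other = letters[::2]  # start:stop:step works exactly as for strings
last_item = letters[-1]     # negative indices count from the end

print(first_two, every_other, last_item)
```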

Here, we're going to introduce a couple of functions that you will use **all** the time:
<br>
`range`<br>
and<br>
`len`

`range` allows us to obtain a list of numbers (in Python 3 this is not technically a list, so we'll be doing an explicit conversion), and uses the same structure as the `slice` function of `start, stop, step`:

In [None]:
r_100 = list(range(100))
print(r_100)
r_2_200_2 = list(range(2, 200, 2))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
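
A few more `range` calls, as a sketch of how `start`, `stop` and `step` behave. The last lines illustrate the remark above that a `range` is not itself a list until we convert it.

```python
odd_numbers = list(range(1, 10, 2))  # start at 1, stop before 10, step of 2
countdown = list(range(10, 0, -2))   # a negative step counts downwards

lazy_numbers = range(3)              # not a list yet, just a description of the numbers
is_a_list = isinstance(lazy_numbers, list)

print(odd_numbers)
print(countdown)
print(lazy_numbers, is_a_list, list(lazy_numbers))
```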


This function allows us to create any length iterable that we need to, whenever we like, the use for which we will introduce soon. However, this brings us neatly on to `len`

<br>
The `len` function allows us to learn the length of an iterable, for instance if we want to know how long a string is, or how long a list we've created is.

In [None]:
first_name_length = len(first_name)
print(len(r_2_200_2))
# We can also apply len to strings:
print(len(first_name))

In [None]:
help(str)

If we use the `dir` function, we can see what kind of methods are available to us for lists (ignoring any methods starting and ending with `__` for now):

The key methods that you'll find yourself using often are:<br>
`append`: add something to the end of the list<br>
`extend`: add every item from another iterable (for example a second list) to the end of the list<br>
`count`: count the number of times a given element appears<br>
`insert`: add something to the list (where you specify the location)<br>
`pop`: remove an element from the list and return it (the last element by default)<br>
They all have their uses though, and I encourage you to explore and discover what they all do. Hopefully the names are fairly intuitive.


In [None]:
list_1 = [1, 2, 3, 4, 5]
list_1.append(6)
print(list_1)

list_1.extend([7, 8])  # extend will take any iterable, not just lists
list_1.extend((1, 1, 1, 1, 1))
print(list_1)

print(list_1.count(1))  # here we have to provide an argument to tell the  method what to count

list_1.insert(0, 0)  # inserts an item into the list, the first argument is the index, the second is the object
list_1.insert(-1, len)  # note that index -1 inserts *before* the last element, not at the end; use append to add to the end!
print(list_1)

print(list_1.pop())  # here the default behaviour is to remove the last element
print(list_1)
print(list_1.pop(0))
print(list_1)

# All these operations are carried out in place, which means that they act on the list directly; unlike with strings, we don't have to assign the result to a new variable

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 1, 1, 1]
6
[0, 1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 1, 1, <built-in function len>, 1]
1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 1, 1, <built-in function len>]
0
[1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 1, 1, <built-in function len>]


In [None]:
print(dir(r_100))

### List behaviour
This is a good place to highlight a slightly counterintuitive behaviour that lists have that you should be aware of.<br>
What do you think will happen below?

In [None]:
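a_list = [1, 2, 3, 4, 5]
b_list = a_list
b_list[0] = 0  # lists are mutable, so we can overwrite elements by indexing or slicing
print(a_list)
print(id(a_list))  # id gives a number identifying the object that a name refers to
print(id(b_list))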

[0, 2, 3, 4, 5]
140427615590752
140427615590752
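
What happens here is that `b_list = a_list` does not create a second list; it gives the same list a second name, so a change made through one name is visible through the other. A minimal sketch of the difference between a second name and an actual copy (`c_list` is a name introduced only for this example):

```python
a_list = [1, 2, 3, 4, 5]
b_list = a_list         # a second name for the same list
c_list = a_list.copy()  # a new, independent list with the same contents

b_list[0] = 0           # also changes a_list, because it is the same object
c_list[1] = 99          # only changes the copy

print(a_list, b_list is a_list)
print(c_list, c_list is a_list)
```

`a_list[:]` and `list(a_list)` are other common ways of making this kind of copy.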


Now, try to generate a list of all the numbers from 0-20.<br>
All the even numbers between 1 and 15.<br>
And every 6th number between 1 and 50.<br>
Count the number of times the word apple appears in the `test_sentence`.

In [None]:
test_sentence = """apple banana pear banana apple Apple Orange banana pear 
banana apple Apple Orange banana pear banana Orange banana pear banana apple 
Apple Orange banana pear banana Orange banana pear banana apple Apple Orange 
banana pear banana Orange banana pear banana apple Apple Orange banana pear 
banana Orange banana pear banana apple Apple Orange banana pear banana Orange 
banana pear banana apple Apple Orange banana pear banana Orange banana pear 
banana apple Apple Orange banana pear banana Orange banana pear banana apple 
Apple Orange banana pear banana Orange banana pear banana apple Apple Orange 
banana pear banana Orange banana pear banana apple Apple Orange banana pear 
banana Orange banana pear banana apple Apple Orange banana pear banana Orange 
banana pear banana apple Apple Orange banana pear banana"""
# triple quotes (""" or ''') let a string span several lines; the line breaks become part of the string

## Lists of lists

It's possible, and even desirable to create lists with lists in them:

In [None]:
my_list_of_lists = [[1, 2, 3, 4],
                    [1, 2, 3, 4],
                    [1, 2, 3, 4],
                    [1, 2, 3, 4],
                    [1, 2, 3, 4]]

And if we want to access the contents, we index the outer list first, to specify the row, and then the inner list, to specify the column. Remembering that indices start at 0, row index 2, column index 3 (the third row, fourth column) would be:



In [None]:
my_list_of_lists[2][3]
# and we can chain this together for any number of arbitrarily nested lists
my_list_of_lists[2].append(5)
print(my_list_of_lists)

[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4]]


This is really here for information purposes. Once we start to cover numpy and pandas in a few weeks, it will be pretty rare for this to be something you want to work with.

## Loops and Control Flow

This is the core of "real" programming. The power of a computer is repetition. For example, we may be interested in importing a number of files and processing their contents, and to do this we will need to deal with each of the lines in each file. Computers are able to repeat the steps that we want to perform for as long as necessary.

Python uses two kinds of loops:<br>
`for`<br>
`while`<br>
We'll deal with `for` loops first because those are the most common.
First, we'll print the numbers 1 to 5:

In [None]:
for number in range(1, 6):
    print(number)

1
2
3
4
5


This is the basic syntax - 

```
for thing in iterable:
    do something
```

Note the use of the reserved Python words `for` and `in` (reserved meaning that you can't use them as variable names).
The line **has** to end with a colon; this is one of the signifiers for the Python interpreter. Also notice the indentation. Many of the tools you might use to write Python will take care of this indentation for you; the convention is 4 spaces, although Python accepts any amount as long as it is consistent within a block (some cells below use 2). This is how you can tell what's in the loop. When you see something at a different level of indentation, then it's outside the loop.

In [None]:
for number in range(1, 6):
    print(number)
print("All finished!")

1
2
3
4
5
All finished!


In [None]:
test_list = [1, 2, 3, "a", "b", "c"]  # without the comma, "b" "c" would silently be joined into "bc"
for item in test_list:
  print(item)

for i in range(1, 6):
  print(i**2)



1
2
3
a
b
c
1
4
9
16
25


In [None]:
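list1 = [1, 2, 3, 4, 5]
for i in list1:
  list1[i] += 1  # i is a value taken from the list, not a position, so this eventually indexes past the end
print(list1)

IndexError: list index out of range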
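
The loop above fails because `i` holds the values stored in the list, and those values are then used as positions. One common way to get positions and values together is the built-in `enumerate` function, sketched here as one possible fix rather than the only one:

```python
list1 = [1, 2, 3, 4, 5]

# enumerate yields (position, value) pairs: (0, 1), (1, 2), (2, 3), ...
for index, value in enumerate(list1):
    list1[index] = value + 1

print(list1)
```

Looping over `range(len(list1))` and indexing with each number would work as well.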

In [None]:
list2 = []
for i in list1:
  list2.append(i)
print(list2)

[1, 2, 3, 4, 5]


When looping, sometimes we only want to do something to some of the elements in the iterable. For this we have something called *control flow*, which is just a general term for building some form of decision making into the process: we *control* the *flow* of the program.

This is accomplished primarily using the following keywords:<br>
`if`<br>
`elif`<br>
`else`<br>

`elif` is short for `else if`, and allows multiple conditions to be checked in sequence within the same loop. Consider this simple example:

In [None]:
for i in range(1, 26):
  if i == 1:  # note again the indentation on the next line and the colon on this.
    print("Starting here at number: {}".format(i))  # Everything at this indentation level occurs if i == 1
  elif i % 2 == 0:
    print("Found an even number")
  else:
    print(i)

Starting here at number: 1
Found an even number
3
Found an even number
5
Found an even number
7
Found an even number
9
Found an even number
11
Found an even number
13
Found an even number
15
Found an even number
17
Found an even number
19
Found an even number
21
Found an even number
23
Found an even number
25


### You need to plan carefully what you're evaluating. Python works from the top, so if a value satisfies two conditions in the same `if`/`elif` chain, only the first matching block will run:

In [None]:
for i in range(1, 11):
  if i == 1:
    print("Starting here at number: {}".format(i))
  elif i % 2 == 1:
    print("Found an odd number: {}".format(i))
  elif i % 3 == 0:
    print("Found a multiple of 3: {}".format(i))

The for loop doesn't find the numbers 3 or 9 as multiples of 3, because it finds that they're odd first (first `elif` block)<br>
If you want to evaluate each one, then you have to use multiple `if` statements:

In [None]:
for i in range(1, 11):
  if i == 1:
    print("Starting here at number: {}".format(i))
  if i % 2 == 1:
    print("Found an odd number: {}".format(i))
  if i % 3 == 0:
    print("Found a multiple of 3: {}".format(i))

In [None]:
score = input("Enter Score: ")

Write a program that prints the integers from 1 to 100. But for multiples of three print "Fizz" instead of the number, and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".

In [None]:
# for ...  (write your FizzBuzz loop here)

Remember the question below from our exercise? Let's solve it with the new knowledge we've learned.

In [None]:
comparison_list = [1, 100, 5, 26, 12, 15, 6, 7, 8]
# Using any method you know, show with booleans which values are:
# greater than 7
# less than 20
# equal to or less than 8
# equal to or more than 12
# hint: if you've read ahead in the lecture notes, feel free to try looping
# if you haven't or would like to wait until we cover it, then indexing the different elements is also fine

In [None]:
text = '''But soft what light through yonder window breaks \n
It is the east and Juliet is the sun \n
Arise fair sun and kill the envious moon \n
Who is already sick and pale with grief'''

# The program should build a list of words. For each word on each line check to see if the word is already in the list 
# and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.

### While loops
`while` loops are very simple, but they are easy to get wrong, resulting in an infinite loop, so always be mindful.

`while` evaluates some expression; if that expression is `True`, the loop runs for one more iteration, and if it is `False`, the loop terminates.
For this reason, it's usually a good idea to have some variable that is created outside the loop and modified inside the loop, for example by an `if` statement.
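
The simplest version of that pattern is a counter, sketched below. The names `countdown` and `steps` are only for illustration.

```python
countdown = 3   # created outside the loop
steps = []

while countdown > 0:
    steps.append(countdown)
    countdown -= 1  # modified inside the loop; without this line the loop would never end

print(steps)
```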

In [None]:
finished = False
while not finished:
  user_input = input("Please enter a number greater than 10: ")  # note this new function, allowing us to request input from a user
  if int(user_input) > 10:  # type conversion, because the input function always returns a string.
    finished = True
    print("You entered {}. The loop is now ending...".format(user_input))
  else:
    print("The number you entered is not greater than 10, please try again.")
    pass  # pass is a python keyword that just tells the interpreter to do nothing. It often acts as a nice placeholder that allows you to put some code in before you know exactly what you want to do

Can you see any potential problems with the code above? How might we improve it?<br>
*Hint* How might we use another `if` statement?

An alternative approach Python can use in a situation like this is a `try: except` block.
Basically, the `try: except` block allows the interpreter to attempt something that might produce an error, but won't crash your program if it fails, and it might look something like this:

In [None]:
user_input_2 = input("Please enter a number: ")
try:
  float_input = float(user_input_2)
except ValueError:
  print("That wasn't a number, there was nothing I could do with it...")
else:  # this branch only runs if the conversion succeeded, so float_input is guaranteed to exist
  print("Input converted successfully!!")
  print(float_input / 2)
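
Putting `while` and `try`/`except` together gives a loop that keeps asking until it receives something usable. In this sketch a list of `simulated_inputs` stands in for what a user might type, so that the cell runs without waiting for `input`; all names here are illustrative. The `else` branch of a `try` block only runs when no error was raised.

```python
simulated_inputs = ["andrew", "7", "12.5"]  # stand-ins for what a user might type

position = 0
accepted_number = None
while accepted_number is None and position < len(simulated_inputs):
    user_input = simulated_inputs[position]
    position += 1
    try:
        candidate = float(user_input)
    except ValueError:
        print(f"'{user_input}' is not a number, please try again.")
    else:
        if candidate > 10:
            accepted_number = candidate
        else:
            print(f"{candidate} is not greater than 10, please try again.")

print(accepted_number)
```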

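There are two more elements we're going to show you now: `with` and `open`. The first is a keyword, and the second is a Python function.

`with` is a keyword that works with what's described as a context manager: an object that sets something up when the indented block starts and tidies it up when the block ends. You'll see it most commonly used for opening files, which is what we'll show you today; there, it makes sure the file is closed again once the block finishes, even if an error occurs part-way through.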

In [3]:
file_location = "./sample_data/centroid3.csv"
with open(file_location, "r") as tracking_file:  # again, because we're using the with keyword, we're using a colon and indentation on the next line
    tracking_data = tracking_file.readlines()  # another new function here, reads each separate line into a list
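
To see what `with` is doing for us, here is a self-contained sketch that first writes a tiny file to a temporary folder and then reads it back. The file name and the coordinates are made up for this example; they are not the course data.

```python
import os
import tempfile

# Made-up coordinates written to a temporary file, so the example runs anywhere
example_path = os.path.join(tempfile.mkdtemp(), "example_centroid.csv")
with open(example_path, "w") as output_file:
    output_file.write("x,y\n1.5,2.5\n3.0,4.0\n")

with open(example_path, "r") as input_file:
    raw_lines = input_file.readlines()

print(input_file.closed)  # the file was closed as soon as the with block ended
for line in raw_lines:
    print(repr(line))     # repr shows the hidden "\n" at the end of each line
```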

In [4]:
for line in tracking_data[:10]:
  print(line)

x,y

3.176375,12.65534

2.799197,12.00402

2.330623,12.09214

2.031915,12.10638

1.758865,9.099291

1.433333,9.2

1.333333,4

1.333333,9

423.7333,4.533333



Things to think about:
How can we get rid of blank rows?<br>
Can we store the two sections separately?<br>
How many times does it use the word "the"?<br>
How about the word "scientific"?<br>
How many words in each section?<br>
Anything else that you want to look for using the tools we've shown you?

Do what you'd like in the code block above.

Write a program that repeatedly prompts a user for integer numbers until the user enters 'fin'. Once 'fin' is entered, print out the largest and smallest of the numbers. If the user enters anything other than a valid number, catch it with a try/except, print an appropriate message and ignore the input. Enter 5, 1, andrew, 15, and 9 and match the output below.

In [None]:
# smallest and largest number

Guessing game: use the `random` module to choose a number, which we then try to guess. The program should say whether each guess is close to or far away from the chosen number, and when we find it, it can say "Congrats!"