# Slicing and dicing with strings, lists and arrays

## Overview

```{admonition} Questions
:class: questions
- What is a string?
- What is a list?
- What are arrays?
- What is slicing? It sounds painful...
```
```{admonition} Objectives
:class: objectives
- Manipulate some strings
- Grab parts of lists
- Play with some arrays
```

In the previous section we were working with numbers (`int`s and `float`s) and logical (`bool`) values. In this section we're going to look at the other main data types you'll be using this semester: **strings**, **lists** and **numpy arrays**.

## Strings

A **string** is just a sequence of characters. Almost any text can be a string:

```python

"hello!"

"3 coffees, please."

"it's the end of the world (as we know it...)"

```

Here we'll make a string called `mystring` and give it the value `Take That are the greatest band in the world.` When you define a string you have to enclose it in quotation marks. You can use either single `'` or double `"` quotes, as long as you use the same type at the start and the end.

In [1]:
mystring = "Take That are the greatest band in the world."
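
As a quick check of the quoting rule, here is a minimal sketch. The variable names `single`, `double` and `apostrophe` are made up for this example.

```python
# Single and double quotes produce exactly the same string.
single = 'hello!'
double = "hello!"
print(single == double)  # True

# Using double quotes on the outside lets us put an apostrophe inside.
apostrophe = "it's the end of the world"
print(apostrophe)  # it's the end of the world

# Mixing the two types is a syntax error, so this line is left commented out:
# broken = "hello!'
```

If your string contains an apostrophe, wrapping it in double quotes is usually the easiest option.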

You can join strings together with `+` and repeat them with `*`, the same operators we used for arithmetic:

In [2]:
second_string = " Way better than East 17."

big_string = mystring + second_string

print(big_string)

Take That are the greatest band in the world. Way better than East 17.


In [3]:
little_string = "abc"

print(little_string * 4)

abcabcabcabc
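
One thing to watch out for is that `+` only joins a string to another string. The sketch below assumes a made-up variable called `coffees` and uses Python's built-in `str()` function to turn a number into a string before joining it.

```python
coffees = 3

# str() converts the integer 3 into the string "3" so that + can join it.
order = str(coffees) + " coffees, please."
print(order)  # 3 coffees, please.

# The same symbol behaves differently for strings and for numbers.
print("3" * 4)  # 3333
print(3 * 4)    # 12
```

Leaving out the `str()` would give a `TypeError`, because Python won't guess whether you meant to add numbers or join text.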


## Lists

Lists are ordered collections of objects. The items can be numbers, strings or a mixture of both. Items in a list are separated by commas and the whole list is enclosed in square brackets, `[ ]`.

In [4]:
list1 = [0, 1, 1, 2, 3, 5, 8, 13]

In [5]:
list2 = [5.0, 'dog', 'yellow', 6]

The two cells above contain examples of lists. Recreate these lists in your notebook, run the cells, and print the lists using `print()`.

In [6]:
print(list1)

[0, 1, 1, 2, 3, 5, 8, 13]


In [7]:
print(list2)

[5.0, 'dog', 'yellow', 6]


## Slicing and indexing

Why would we shove a load of random things in a list? What use is that? How do we get the individual entries? We get to them by **indexing** and **slicing**.

We can use the built-in `len()` function to check how many elements are in our lists:

In [8]:
len(list1)

8

In [9]:
len(list2)

4

We can then use square brackets to ask for specific parts of the list. The number in square brackets is called the **index**. Let's look at `list2[1]`:

In [10]:
list2[1]

'dog'

You may have been expecting `list2[1]` to return `5.0` rather than `dog`, as `dog` is the second element in the list, not the first. However, Python doesn't count from 1. **Python counts from zero**. We can confirm this by looking at `list2[0]`:

In [11]:
list2[0]

5.0

and by asking for `list2[4]`:

In [12]:
list2[4]

IndexError: list index out of range

The `IndexError` error message is telling us that we've asked for an index that doesn't exist. The list has 4 elements, so these will have indexes between 0 and 3, not 1 and 4. 
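
The relationship between `len()` and the largest valid index can be sketched as below. The name `last_index` is made up for this example, and the `try`/`except` is only there so that the cell keeps running after the error (we haven't covered it yet, so don't worry about the details).

```python
list2 = [5.0, 'dog', 'yellow', 6]

# The largest valid index is always one less than the length.
last_index = len(list2) - 1
print(last_index)         # 3
print(list2[last_index])  # 6

# Asking for index len(list2) goes one step past the end of the list.
try:
    list2[len(list2)]
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```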

We can select several elements at once by **slicing**. 

We can slice a list using

```python
list1[start:stop]
```

where `start` is the index of the first element we want (remembering that Python counts from zero) and `stop` is the index at which the slice stops. The element at `stop` itself is not included in the slice.

For example, to select the 4th to 7th elements of `list1` we use

In [13]:
list1[3:7]

[2, 3, 5, 8]

Why does `list1[3:7]` return elements 4, 5, 6 and 7? Our `start` value is 3, which makes sense, but why is the `stop` value not 6? 

Think back to the REPL algorithm we looked at in the previous section. When we're slicing the list, Python checks each index and asks "is this the `start` value?" If it is, it starts the slice. It then works along the list asking "is this the `stop` value?" When it gets to the 7th element (`list1[6]`) the answer is no, so that element goes into the slice and Python keeps going. When it gets to the next element (`list1[7]`) it sees that it has reached the `stop` value, so it stops without adding that element to the slice.
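
One way to convince yourself of this is to count. The sketch below uses made-up variable names (`start`, `stop`, `chunk`) and also joins two lists with `+`, which works for lists in the same way as it did for strings.

```python
list1 = [0, 1, 1, 2, 3, 5, 8, 13]

start = 3
stop = 7
chunk = list1[start:stop]

print(chunk)         # [2, 3, 5, 8]
print(len(chunk))    # 4
print(stop - start)  # 4, the same as the number of elements in the slice
print(list1[stop])   # 13, the element at stop is NOT in the slice

# Two slices that share a boundary fit together with no gap and no overlap.
print(list1[0:3] + list1[3:8] == list1)  # True
```

A handy side effect of leaving out the `stop` element is that the length of a slice is just `stop - start`.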

It's OK if you're confused by this. It is a bit counterintuitive. Practice will help. So let's practice!

(ex_slice)=
```{admonition} Exercise: Slicing some lists
:class: practice

Create a new list that contains the values `penguin`, `star`, `42`, `3.1415`, `Branston` (in that order). Call your list whatever you want. 

Find the following values:

1. The length of the list
2. The middle element of the list
3. All except the first element
4. All except the last element

Do each of these in a different cell. **In the cell above or below each question** add some markdown that explains what you're doing. 

[solution](soln_slice.ipynb)

```

## Slicing tricks

The last two parts of the exercise above can be done in another way. We can use **negative indexes** to count from the end rather than the start.

To get the last element in `list1` we can use

In [19]:
list1[-1]

13

or to get all **except** the last one we could use 


In [20]:
list1[0:-1]

[0, 1, 1, 2, 3, 5, 8]

If we're feeling particularly lazy, we don't actually have to give both the `start` and `stop` values. We get the same result as above using

In [21]:
list1[:-1]

[0, 1, 1, 2, 3, 5, 8]

Because we didn't give a `start` value, Python assumes it's zero. This also works if we leave out the `stop` value, in which case the slice runs to the end of the list:

In [22]:
list1[1:]

[1, 1, 2, 3, 5, 8, 13]
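
To tie these shortcuts together, here is a short sketch showing how negative indexes line up with the ordinary ones, and what happens when `start` or `stop` is left out.

```python
list1 = [0, 1, 1, 2, 3, 5, 8, 13]

# Index -1 is the same element as index len(list1) - 1.
print(list1[-1] == list1[len(list1) - 1])  # True
print(list1[-2])                           # 8

# Leaving out stop runs to the end; leaving out start begins at zero.
print(list1[-3:])  # [5, 8, 13]
print(list1[:3])   # [0, 1, 1]

# Leaving out both gives every element.
print(list1[:])    # [0, 1, 1, 2, 3, 5, 8, 13]
```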

For my final trick we'll look at the third way we can slice lists. In addition to the `start` and `stop` values, we can also give it a `stepsize` value:

```python
list1[start:stop:stepsize]
```

So if we wanted every other element in the list we could pass a `stepsize` value of 2:

In [23]:
list1[::2]

[0, 1, 3, 8]
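
The `stepsize` can be combined with `start` and `stop` too. A few more combinations are sketched below, with the indexes that get picked written out in the comments.

```python
list1 = [0, 1, 1, 2, 3, 5, 8, 13]

# Start at index 1, stop before index 7, take every 2nd element.
print(list1[1:7:2])  # indexes 1, 3, 5 -> [1, 2, 5]

# Every 3rd element of the whole list.
print(list1[::3])    # indexes 0, 3, 6 -> [0, 2, 8]

# Every 2nd element, starting from index 1 and running to the end.
print(list1[1::2])   # indexes 1, 3, 5, 7 -> [1, 2, 5, 13]
```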

(ex_flip)=
```{admonition} Exercise: Flip it and reverse it
:class: practice

All the indexing and slicing we've done on lists can be applied to strings too. 

Create a variable called `missy` that contains the string `I put my thang down, flip it and reverse it`

Use the slicing methods from above to create a new variable (called whatever you want) that contains the `missy` string reversed.

[solution](soln_flip.ipynb)
```

## `numpy` arrays

The last type of data we're going to look at today is the `numpy` array.

Before we get started, make sure you have loaded the `numpy` package:

In [27]:
import numpy as np

Arrays are similar to lists and can be sliced and indexed in the same way. The difference is that to create a `numpy` array we have to pass a list, written in square brackets, to the `np.array()` function.

For example, we can create a simple, 1-dimensional array called `array1`:

In [28]:
array1 = np.array([1,2,3,4,5])

In [29]:
array1

array([1, 2, 3, 4, 5])

and we can slice it the same way as we did with our lists:

In [30]:
array1[1:3]

array([2, 3])

In [31]:
array1[::-1]

array([5, 4, 3, 2, 1])

Our arrays don't have to be only one dimension. We can create a 2D array using nested square brackets, with one inner set of brackets for each row:

In [32]:
array2 = np.array([[1,2,3],[4,5,6]])

In [33]:
array2

array([[1, 2, 3],
       [4, 5, 6]])

Slicing a 2D array works in the same way, but we need to be careful with our brackets.

To get the top "row", we would use

In [34]:
array2[0,:]

array([1, 2, 3])

and to get the first column we would use

In [35]:
array2[:,0]

array([1, 4])

To get a single value, e.g. the top right value, we use double brackets:

In [36]:
array2[0][-1]

3

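where the value in the first set of brackets says which row, and the value in the second says which column. Writing `array2[0, -1]`, with the row and column separated by a comma inside a single set of brackets, gives the same value.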
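
The different ways of picking values out of a 2D array can be compared side by side. This is a small sketch using the same `array2` as above.

```python
import numpy as np

array2 = np.array([[1, 2, 3], [4, 5, 6]])

# Two ways of asking for row 0, last column.
print(array2[0][-1])  # 3
print(array2[0, -1])  # 3

# Slices work on each dimension: all rows, columns from index 1 onwards.
print(array2[:, 1:])
# [[2 3]
#  [5 6]]

# Row 1, first two columns.
print(array2[1, 0:2])  # [4 5]
```

In each case the order inside the brackets is row first, then column.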


Similar to how we checked the length of our lists with `len()`, we can use `np.shape()` to check the shape of an array.

In [37]:
np.shape(array1)

(5,)

In [38]:
np.shape(array2)

(2, 3)
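
The shape is also stored on the array itself, which can be handy. The sketch below uses the `.shape`, `.size` and `.ndim` attributes that every `numpy` array has. The names `rows` and `columns` are made up for this example.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([[1, 2, 3], [4, 5, 6]])

# .shape gives the same answer as np.shape()
print(array2.shape)  # (2, 3)

# The shape of a 2D array is (number of rows, number of columns).
rows, columns = array2.shape
print(rows, columns)  # 2 3

# len() only counts along the first dimension, so here it gives the rows.
print(len(array2))   # 2

# .size is the total number of values and .ndim the number of dimensions.
print(array2.size)               # 6
print(array1.ndim, array2.ndim)  # 1 2
```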

We can do higher dimensions, but things get complicated very quickly, so we'll just stick with 1 or 2 dimensions for now.

One of the main reasons for using arrays instead of lists is that `numpy` functions can operate on every element of an array in one go.

For example, if we had an array of angles, we could quickly find the cosine of all of those angles in one go, rather than calculating them one by one. 

In [39]:
angles = np.array([0, (np.pi/2), np.pi, (3*np.pi/2.), (2*np.pi)])

In the `angles` array above I've used `np.pi` for the value of $\pi$, because `np.cos` expects angles in radians.

To get the cosine values of all of these angles we just pass `angles` to `np.cos()`:


In [40]:
cosines = np.cos(angles)

cosines

array([ 1.0000000e+00,  6.1232340e-17, -1.0000000e+00, -1.8369702e-16,
        1.0000000e+00])
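
Two things are worth a closer look here. First, arithmetic on an array acts on each element, which is different from what `*` does to a list. Second, values like `6.1232340e-17` are tiny rounding errors from working with floating point numbers rather than exact zeros. The sketch below is one way to check both, assuming made-up names `numbers_list` and `numbers_array`; `np.allclose()` tests whether two sets of values agree to within a small tolerance.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# Multiplying a list repeats it, just like a string...
print(numbers_list * 2)   # [1, 2, 3, 1, 2, 3]

# ...but multiplying an array multiplies every element.
print(numbers_array * 2)  # [2 4 6]

angles = np.array([0, np.pi/2, np.pi, 3*np.pi/2, 2*np.pi])
cosines = np.cos(angles)

# The cosines match the values we expect, apart from tiny rounding errors.
print(np.allclose(cosines, [1, 0, -1, 0, 1]))  # True
```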

## Key Points
- Data can also be stored in strings, lists and `numpy` arrays
- Indexing and slicing are used to select data