# This notebook was taken from Chapter 1 of [Python Programming for the Humanities](http://www.karsdorp.io/python-course/).

## String manipulation

Many disciplines within the humanities work on texts. Quite naturally, programming for the humanities will focus a lot on manipulating texts. In the last quiz you were asked to define a variable that points to a string that represents your name. We have already seen some basic arithmetic in our very first calculation. Not only numbers but also strings can be added, or, more precisely, *concatenated*:

In [2]:
name = "Suleiman" # insert your first name
book = "The Lord of the Flies"
print(name + " likes " + book + "?")

Suleiman likes The Lord of the Flies?

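Concatenation with `+` only works between strings. As a small sketch (the variable `copies` and its value are made up for illustration), adding a number to a string raises a `TypeError`, so the number has to be turned into a string with `str()` first:

```python
name = "Suleiman"
copies = 3  # a made-up number, used only for illustration

# name + " owns " + copies would raise a TypeError (str + int),
# so we first turn the number into a string with str()
message = name + " owns " + str(copies) + " copies."
print(message)  # Suleiman owns 3 copies.
```
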

A string consists of a sequence of characters. We can access the individual characters with the help of *indexing*. For example, to get only the first letter of your name, you can type in:

In [3]:
first_letter = name[0]
print(first_letter)

S


Notice that to access the first letter, we use the index `0`. This might seem odd, but just remember that indexes in Python start at zero.

---

#### Quiz!

Now, if you know the length of your name, you can ask for its last letter:

In [4]:
last_letter = name[7]
print(last_letter)

n


---

It is rather inconvenient to have to know how long a string is just to find out what its last letter is. Python provides a simple way of indexing a string from the end: negative indexes count backwards, so `-1` refers to the last character:

In [5]:
last_letter = name[-1]
print(last_letter)

n

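For an eight-letter name such as "Suleiman", every negative index points at the same character as a positive one: `-1` matches `7`, and `-8` matches `0`. A quick check, assuming that name:

```python
name = "Suleiman"

# -1 is the last character, -8 the first one (for an 8-character string)
print(name[-1], name[7])   # n n
print(name[-8], name[0])   # S S
```
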

Alternatively, there is the built-in function `len()`, which returns the length of a string (its number of characters):

In [6]:
print(len(name))

8


Do you understand the following?

In [7]:
print(name[len(name)-1])

n

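Because indexes start at zero, the valid indexes of a string run from `0` to `len(name) - 1`. Asking for index `len(name)` goes one position past the end and raises an `IndexError`. A minimal sketch, again assuming the name "Suleiman":

```python
name = "Suleiman"

last_index = len(name) - 1   # valid indexes run from 0 to len(name) - 1
print(name[last_index])      # n

try:
    name[len(name)]          # one position past the end
except IndexError as error:
    print("IndexError:", error)
```
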

---

#### Quiz!

Now, can you write some code that defines a variable `but_last_letter` and assigns to it the second-to-last letter of your name?

In [8]:
but_last_letter = name[-2]
print(but_last_letter)

a


---

You're starting to become a real expert in indexing strings. Now, what if we want to find out what the last two or three letters of our name are? In Python we can use so-called slice indexes, or *slices* for short. To get the first two letters of our name, we type in:

In [9]:
first_two_letters = name[0:2]
print(first_two_letters)

Su


The start index `0` is optional, so we could just as well type in `name[:2]`. This says: take all characters of `name` up to, but not including, index 2. We can also start at index 2 and leave the end index unspecified:

In [12]:
without_first_two_letters = name[2:]
without_first_two_letters

'leiman'

Because we did not specify the end index, Python continues until it reaches the end of our string. To get the last two letters of our name, we can type in:

In [13]:
last_two_letters = name[-2:]
print(last_two_letters)

an


Take a look at the following picture. Do you fully understand it? 
<div style="float: center;"><img style="float: center;" src="http://www.nltk.org/images/string-slicing.png" align=center /></div>

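The same rules can be checked on any string. Here is a small sketch using the phrase "Monty Python" (picked only as an example): a slice `[start:end]` includes `start` but stops just before `end`, and cutting a string at any index `i` into `[:i]` and `[i:]` gives back the original string when the two parts are concatenated.

```python
phrase = "Monty Python"

print(phrase[6:10])    # characters at indexes 6, 7, 8 and 9 -> Pyth
print(phrase[-12:-7])  # the same as phrase[0:5] -> Monty

# Cutting at any index and gluing the parts back gives the original string
i = 5
print(phrase[:i] + phrase[i:] == phrase)  # True
```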
---

#### Quiz!

Can you define a variable `middle_letters` and assign to it all letters of your name except for the first two and the last two?

In [14]:
middle_letters = name[2:-2]  # skip the first two and the last two letters
print(middle_letters)

leim

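Unlike a single index, a slice whose bounds lie beyond the end of the string does not raise an error; Python simply stops at the end of the string. A short sketch, assuming the name "Suleiman":

```python
name = "Suleiman"

print(name[2:-2])         # all but the first two and last two letters: leim
print(name[2:100])        # an end index past the string is clipped: leiman
print(repr(name[100:]))   # a start index past the end gives an empty string: ''
```
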

Given the following two words, can you write code that prints out the word *humanities* using only slicing and concatenation? (So, no quotes are allowed in your code.)

In [16]:
word1 = "human"
word2 = "opportunities"
print(word1[:] + word2[-5:])

humanities

---

##### What we have learnt

To finish this section, here is an overview of what we have learnt. Go through the list and make sure you understand all the concepts.

-  concatenation (joining strings together with `+`)
-  indexing
-  slicing
-  `len()`

---

## Lists

Consider the sentence below:

In [17]:
sentence = "Python's name is derived from the television series Monty Python's Flying Circus."

Words are made up of characters, and so are string objects in Python. As we will see, it is generally preferable to represent our data as naturally as possible. For the sentence above, it seems more natural to describe it in terms of words than in terms of characters. Say we want to access the first word in our sentence. If we type in:

In [18]:
first_word = sentence[0]
print(first_word)

P


Python only prints the first letter of our sentence. (Think about this if you do not understand why.) We can transform our sentence into a `list` of words (represented by strings) using the `split()` method as follows:

In [19]:
words = sentence.split()
print(words)

["Python's", 'name', 'is', 'derived', 'from', 'the', 'television', 'series', 'Monty', "Python's", 'Flying', 'Circus.']


When called without arguments, `split()` splits the sentence on whitespace and returns a list of words. Note that punctuation stays attached to the words, as in `'Circus.'`. In many ways a list works like a string: we can access its individual items using indexes, and we can use slices to access parts of the list. Let's try it!

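A short sketch of indexing and slicing a list, using the sentence from above. Note that a slice of a list is again a list, just as a slice of a string is again a string:

```python
sentence = "Python's name is derived from the television series Monty Python's Flying Circus."
words = sentence.split()

print(words[0])     # first word: Python's
print(words[-1])    # last word, with its full stop: Circus.
print(words[1:4])   # a slice is a list: ['name', 'is', 'derived']
print(len(words))   # number of words: 12
```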
---

#### Quiz!

Write a small program that defines a variable `first_word` and assigns to it the first word of our word list. Play around a little with the indexes to see if you really understand how it works.

In [20]:
first_word = words[0]  # words is the list created with sentence.split()
print(first_word)

Python's


---

A `list` acts like a container in which we can store all kinds of information. We can access a list using indexes and slices. We can also add new items to a list with the method `append()`. Let's see how it works. Say we want to keep a list of all our good reads. We start with an empty list and add some good books to it:

In [21]:
# start with an empty list
good_reads = []
good_reads.append("The Hunger games")
good_reads.append("A Clockwork Orange")
print(good_reads)

['The Hunger games', 'A Clockwork Orange']

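`append()` changes the list itself and returns `None`. A common mistake is to assign its result back to the variable, which replaces the list with `None`. A small sketch:

```python
good_reads = []
good_reads.append("The Hunger games")
print(len(good_reads))   # 1

result = good_reads.append("A Clockwork Orange")
print(result)            # None: append() modifies the list in place
print(len(good_reads))   # 2

# Don't write good_reads = good_reads.append(...): good_reads would become None
```
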

Now, if for some reason we no longer like a particular book, we can replace it by assigning a new title to its index:

In [22]:
good_reads[0] = "Pride and Prejudice"
print(good_reads)

['Pride and Prejudice', 'A Clockwork Orange']

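Assignment only works for positions that already exist. Assigning to an index past the end raises an `IndexError`, so new items have to be added with `append()`. A minimal sketch:

```python
good_reads = ["Pride and Prejudice", "A Clockwork Orange"]

try:
    good_reads[2] = "Bel Canto"   # index 2 does not exist yet
except IndexError:
    print("Index 2 is out of range; use append() to add a new item")

good_reads.append("Bel Canto")
print(good_reads)   # ['Pride and Prejudice', 'A Clockwork Orange', 'Bel Canto']
```
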

---

#### Quiz!

Here's another small quiz! Try to change the title of the second book in our good reads collection.

In [23]:
good_reads[1] = "Notes of a Native Son"
print(good_reads)

['Pride and Prejudice', 'Notes of a Native Son']


---

We just changed one element in a list. Note that if you try the same thing with a string, you will get an error:

In [24]:
name = "Pythen"
name[4] = "o"

TypeError: 'str' object does not support item assignment

This is because strings (and some other types) are *immutable*. That is, they cannot be changed, as opposed to lists, which *are* mutable. Let's explore some other ways in which we can manipulate lists.

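Although a string cannot be changed in place, we can build a new string out of slices of the old one and let the variable point to the result. A short sketch that repairs the misspelled name from the example above:

```python
name = "Pythen"

# Strings cannot be changed in place, but we can build a new string
# from slices of the old one and rebind the variable to it
name = name[:4] + "o" + name[5:]
print(name)   # Python
```
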
#### remove()

Let's assume our good reads collection has grown a lot and we would like to remove some of the books from the list. Python provides the method `remove()`, which acts upon a list and takes as its argument the item we would like to remove. If the item occurs more than once, only its first occurrence is removed.

In [25]:
good_reads = ["The Hunger games", "A Clockwork Orange", 
              "Pride and Prejudice", "Water for Elephants",
              "The Shadow of the Wind", "Bel Canto"]

good_reads.remove("Water for Elephants")

print(good_reads)

['The Hunger games', 'A Clockwork Orange', 'Pride and Prejudice', 'The Shadow of the Wind', 'Bel Canto']

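To see that `remove()` only deletes the first match, here is a small sketch with a made-up shelf that holds the same title twice:

```python
shelf = ["Bel Canto", "A Clockwork Orange", "Bel Canto"]

shelf.remove("Bel Canto")   # only the first match is removed
print(shelf)                # ['A Clockwork Orange', 'Bel Canto']
```
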

If we try to remove a book that is not in our collection, Python raises a `ValueError` (don't be afraid, your computer won't break ;-)):

In [26]:
good_reads.remove("White Oleander")

ValueError: list.remove(x): x not in list

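One way to avoid this error, sketched here with an `if` statement, is to check first whether the book is in the list with the `in` operator, which evaluates to `True` or `False`:

```python
good_reads = ["The Hunger games", "A Clockwork Orange", "Bel Canto"]
book = "White Oleander"

if book in good_reads:
    good_reads.remove(book)
else:
    print(book + " is not in the collection")
```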
---

#### Quiz!

Define a variable `good_reads` as an empty list. Now add some of your favorite books to it (at least three) and print the last two books you added. 

In [28]:
good_reads = []
good_reads.append("Notes of a Native Son")
good_reads.append("The Wind-Up Bird Chronicle")
good_reads.append("The Prince")
print(good_reads[-2:])  # the last two items, however long the list is

['The Wind-Up Bird Chronicle', 'The Prince']


---

Just as with strings, we can concatenate two lists. Here is an example:

In [29]:
# first, we specify two lists of strings:
good_reads = ["The Hunger games", "A Clockwork Orange", 
              "Pride and Prejudice", "Water for Elephants",
              "The Shadow of the Wind", "Bel Canto"]

bad_reads = ["Fifty Shades of Grey", "Twilight"]

all_reads = good_reads + bad_reads
print(all_reads)

['The Hunger games', 'A Clockwork Orange', 'Pride and Prejudice', 'Water for Elephants', 'The Shadow of the Wind', 'Bel Canto', 'Fifty Shades of Grey', 'Twilight']

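Concatenating two lists with `+` creates a new list; the two original lists are left unchanged, and the length of the result is the sum of their lengths. A shortened sketch:

```python
good_reads = ["Pride and Prejudice", "Bel Canto"]
bad_reads = ["Twilight"]

all_reads = good_reads + bad_reads
print(all_reads)    # ['Pride and Prejudice', 'Bel Canto', 'Twilight']
print(len(all_reads) == len(good_reads) + len(bad_reads))   # True
print(good_reads)   # unchanged: ['Pride and Prejudice', 'Bel Canto']
```
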

#### sort()

It is always nice to organise your bookshelf. The method `sort()` sorts our collection in place, that is, it rearranges the list itself:

In [30]:
good_reads.sort()
print(good_reads)

['A Clockwork Orange', 'Bel Canto', 'Pride and Prejudice', 'The Hunger games', 'The Shadow of the Wind', 'Water for Elephants']

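Because `sort()` rearranges the list itself, it returns `None`. If you want to keep the original order, the built-in function `sorted()` returns a new sorted list instead. Strings are compared character by character, and uppercase letters come before lowercase ones, so a lowercase title (like the made-up "bel canto" below) ends up at the back unless you pass `key=str.lower`:

```python
titles = ["bel canto", "Water for Elephants", "A Clockwork Orange"]

# sorted() returns a new, sorted list and leaves the original alone
alphabetical = sorted(titles)
print(alphabetical)   # ['A Clockwork Orange', 'Water for Elephants', 'bel canto']
print(titles)         # still in the original order

# key=str.lower compares the lowercased titles, ignoring case
print(sorted(titles, key=str.lower))   # ['A Clockwork Orange', 'bel canto', 'Water for Elephants']

# sort() rearranges the list itself and returns None
result = titles.sort()
print(result)                  # None
print(titles == alphabetical)  # True
```
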

#### nested lists

Up to this point, our lists have only consisted of strings. However, a list can contain all kinds of data types, such as integers and even other lists! Do you understand what is happening in the following example?

In [31]:
nested_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(nested_list[0])
print(nested_list[0][0])

[1, 2, 3, 4]
1

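Each index in a chain selects one level of nesting: the first picks an inner list, the second picks an item from it. A few more examples with the same `nested_list`:

```python
nested_list = [[1, 2, 3, 4], [5, 6, 7, 8]]

print(len(nested_list))     # 2: the outer list holds two inner lists
print(nested_list[1])       # [5, 6, 7, 8]
print(nested_list[1][2])    # 7: third item of the second inner list
print(nested_list[-1][-1])  # 8: last item of the last inner list
```
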

We can put this to use to enhance our good reads collection with a score for every book. Each entry in our collection will be a list of two elements, the title of the book and a score between 1 and 10, in that order: `[title, score]`. We initialize an empty list:

In [32]:
good_reads = []

And add five books to it:

In [35]:
good_reads.append(["Pride and Prejudice", 8])
good_reads.append(["A Clockwork Orange", 9])
good_reads.append(["Notes of a Native Son", 10])
good_reads.append(["The Wind-Up Bird Chronicle", 8])
good_reads.append(["The Prince", 9])
print(good_reads[0][1])

8

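Since the inner lists are mutable too, we can also update a single score. And when a list of lists is sorted, the inner lists are compared item by item, so our `[title, score]` entries end up ordered by title. A short sketch with two of the books:

```python
good_reads = [["Pride and Prejudice", 8], ["A Clockwork Orange", 9]]

print(good_reads[1][0])   # title of the second book: A Clockwork Orange
good_reads[1][1] = 10     # update the score of the second book
print(good_reads)         # [['Pride and Prejudice', 8], ['A Clockwork Orange', 10]]

good_reads.sort()         # compares the titles first, so this sorts by title
print(good_reads)         # [['A Clockwork Orange', 10], ['Pride and Prejudice', 8]]
```
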

---

#### Quiz!

Update the `good_reads` collection with some of your own books and give them all a score. Can you print out the score you gave to the first book in the list? (Tip: you can pile up indexes)

---

##### What we have learnt

To finish this section, here is an overview of the new concepts and functions you have learnt. Go through them and make sure you understand them all.

-  list
-  *mutable* versus *immutable*
-  `.split()`
-  `.append()`
-  nested lists
-  `.remove()`
-  `.sort()`

---