# Python Basics 2
## Notebook 2
<br>

### In this Notebook we'll cover:<br>

3. Python collections<br>
    3.1 Lists<br>
    3.2 Dictionaries<br><br>

***

## 3. Python collections

<br>

Collections are containers that are used to store collections of data. There are four collection data types in Python:

- **List**
- Tuple
- Set
- **Dictionary**

We are going to review **Lists** and **Dictionaries**
<br><br>

### 3.1 Lists

A list is a collection which is ordered and changeable. In Python, lists are written with square brackets

In [1]:
my_list = ['element 1', 'element 2', 'element 3', 'element 4', 'element 5']

print(type(my_list))

print(my_list)

<class 'list'>
['element 1', 'element 2', 'element 3', 'element 4', 'element 5']


<br>
In Python, lists have a very similar behaviour to strings, so we can use what we already know about strings to manipulate lists as well:

- We can **get the number of elements** in the list using the function `len()`<br><br>

- We can **access elements** of the list using square brackets and the index of the item we want to extract<br><br>

- We can **extract slices** using square brackets including the start and end index of the elements we want to extract (Remember Python does not include the end index)<br><br>

- We can **join two lists** using the `+` operator

In [2]:
print('The length of this list is: ' + str( len(my_list) ))
print('\n The first item is: ' + my_list[0] )
print('\nThe first two items are: ' + str( my_list[0:2] ))
print('\nJoining two lists together: ' + str( my_list[0:2] + my_list[0:2] ))

The length of this list is: 5

 The first item is: element 1

The first two items are: ['element 1', 'element 2']

Joining two lists together: ['element 1', 'element 2', 'element 1', 'element 2']


<br><br>
You can **change the value** of an item by selecting it and assigning a new value to it

In [3]:
my_list[3] = 'new value'

print(my_list)

['element 1', 'element 2', 'element 3', 'new value', 'element 5']


<br><br>
You can **iterate over the elements** of a list in exactly the same way as we did with the string

In [4]:
for elem in my_list:
    print(elem)

element 1
element 2
element 3
new value
element 5


<br><br>
You can **add new elements** to the list using the `append()` method

Notice that since it is a method, the syntax is <font color='#86b300'>**variable**</font><font color='#cc0066'>**.method()**</font>

In [5]:
my_list.append('appended value')
print(my_list)

['element 1', 'element 2', 'element 3', 'new value', 'element 5', 'appended value']


<br><br>
You can **remove elements** to the list using the `remove()`

In [6]:
my_list.remove('element 5')
print(my_list)

['element 1', 'element 2', 'element 3', 'new value', 'appended value']


Or the `pop()` method if you want to remove it by its index

In [7]:
my_list.pop(2)

'element 3'

<br><br>
You can see all of the methods you can use on lists [here](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
<br><br>

<br><br>
<font color='#993366'>Create your own list with at least three characters. Add a new element, remove one and iterate over its elements in a `for` loop, printing each element</font>

<br><br>

### 3.2 Dictionaries

A dictionary is a collection which is **unordered**, changeable **and indexed** (this means that each element has a 'name'). In Python dictionaries are written with curly brackets, and they have <font color='#009999'>**keys**</font> (the 'names') and <font color='#cc0066'>**values**</font>

In [8]:
my_dict = {'key_1':'val_1', 'key_2':'val_2', 'key_3': 'val_3'}
print(type(my_dict))
print(my_dict)

<class 'dict'>
{'key_1': 'val_1', 'key_2': 'val_2', 'key_3': 'val_3'}


<br><br>
You can access the items of a dictionary by referring to its key name, inside square brackets

In [9]:
print('Dictionary value for entry with key = \'key_2\': ' + my_dict['key_2'])

Dictionary value for entry with key = 'key_2': val_2


<br><br>
You can **change the value** of an item by referring to its <font color='#009999'>**key**</font> name and assigning a new value to it

In [10]:
my_dict['key_2'] = 'new_val_2'
print(my_dict)

{'key_1': 'val_1', 'key_2': 'new_val_2', 'key_3': 'val_3'}


<br><br>
You can **iterate over the elements** of a list in the same way as we did with strings and with lists

The only difference is that the iterator will only return the <font color='#009999'>**key**</font>, and not the <font color='#cc0066'>**value**</font> of the element

In [11]:
for elem in my_dict:
    print(elem)

key_1
key_2
key_3


To obtain the value you just need to access it using square brackets

In [12]:
for elem in my_dict:
    print(my_dict[elem])

val_1
new_val_2
val_3


Or use the method `values()`

In [13]:
for vals in my_dict.values():
    print(vals)

val_1
new_val_2
val_3


<br><br>
You can **add new elements** to the dictionary in the same way as we modified the value of an existing entry, the only difference is that the <font color='#009999'>**key**</font> you are using here must be new

In [14]:
my_dict['new_key'] = 'new_value'
print(my_dict)

{'key_1': 'val_1', 'key_2': 'new_val_2', 'key_3': 'val_3', 'new_key': 'new_value'}


<br><br>
You can **remove elements** to the list using the `pop()` method, specifying the <font color='#009999'>**key**</font>

In [15]:
my_dict.pop('new_key')
print(my_dict)

{'key_1': 'val_1', 'key_2': 'new_val_2', 'key_3': 'val_3'}


<br><br>

You can see all of the methods you can use on dictionaries [here](https://www.w3schools.com/python/python_dictionaries.asp)

<br><br>
<font color='#993366'>Create your own dictionary, add a new element, remove one and iterate over its keys, printing its corresponding value on each iteration</font>


<br><br><br><br>

***
***

# Example

Following the example we were working on last week, the following block of code loads the alice.txt file as a string
<br><br>

**Note:** The way in which these problems are solved is not very efficient, and sometimes there are special Python functions or methods that do the same thing in a simpler way, but we are doing it this way to practice the concepts we have seen so far

In [16]:
file = open('alice.txt','r')
alice = file.read()
file.close()

<br><br>
We can split the string into a list of chapters using the `split()` method

**Note:** We

In [17]:
alice_by_chapter = alice.split('CHAPTER')

print('The book has ' + str(len(alice_by_chapter)) + ' chapters')

The book has 13 chapters


<br><br>
If we see the content of the first chapter we see it's not actually a chapter but the title of the book, so let's remove it

In [18]:
print('First Chapter: ' + alice_by_chapter[0])

#  Remove first element of the list
alice_by_chapter.pop(0)

First Chapter: [Alice's Adventures in Wonderland by Lewis Carroll 1865]




"[Alice's Adventures in Wonderland by Lewis Carroll 1865]\n\n"

In [19]:
print('\nNew first Chapter:\n\nCHAPTER' + alice_by_chapter[0])


New first Chapter:

CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, 'and what is the use of a book,' thought Alice 'without pictures or
conversation?'

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure
of making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.

There was nothing so VERY remarkable in that; nor did Alice think it so
VERY much out of the way to hear the Rabbit say to itself, 'Oh dear!
Oh dear! I shall be late!' (when she thought it over afterwards, it
occurred to her that she ought to have wondered at this, but at the time
it all seemed quite natural); but when the Rabbit actually TOOK A WATC

<br><br>
Let's create a new string variable with the first chapter of the book, divide it by paragraphs and remove the first one because it's the title of the chapter and not an actual paragraph, and count the number of paragraphs it contains

In [20]:
# Select first chapter
ch_1 = alice_by_chapter[0]

# Split the string by paragraphs (double line breaks)
ch_1_by_paragraph = ch_1.split('\n\n')

# Remove the first element of the list
ch_1_by_paragraph.pop(0)


print('\nFirst paragraph of the first chapter:\n\n' + ch_1_by_paragraph[0])

print('\nThe first chapter has ' + str(len(ch_1_by_paragraph)) + ' paragraphs')


First paragraph of the first chapter:

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, 'and what is the use of a book,' thought Alice 'without pictures or
conversation?'

The first chapter has 32 paragraphs


<br>

***

Let's now divide the whole book into paragraphs and print only the paragraphs that mention the character The Dodo.

First we need to create our new list, removing the title of the book and ignoring (for now) the paragraphs that contain the title of the chapters

In [21]:
# 1. Split the string by double line breaks
alice_by_paragraphs = alice.split('\n\n')

# 2. Drop the first element (because it contains the title)
alice_by_paragraphs.pop(0)

"[Alice's Adventures in Wonderland by Lewis Carroll 1865]"

To print the paragraphs where The Dodo is mentioned, let's use a `for` loop with an `if` statement inside using the method `count()` that we saw last week to count substring ocurrences to know if this character is mentioned or not

In [22]:
# Iterate over each paragraph in the list
for paragraph in alice_by_paragraphs:
    
    # If statement checking if the word "Rabbit" is in the string
    if paragraph.count('Dodo')>0:
        
        # Body of the if statement
        print(paragraph + '\n')

It was high time to go, for the pool was getting quite crowded with the
birds and animals that had fallen into it: there were a Duck and a Dodo,
a Lory and an Eaglet, and several other curious creatures. Alice led the
way, and the whole party swam to the shore.

'In that case,' said the Dodo solemnly, rising to its feet, 'I move
that the meeting adjourn, for the immediate adoption of more energetic
remedies--'

'What I was going to say,' said the Dodo in an offended tone, 'was, that
the best thing to get us dry would be a Caucus-race.'

'What IS a Caucus-race?' said Alice; not that she wanted much to know,
but the Dodo had paused as if it thought that SOMEBODY ought to speak,
and no one else seemed inclined to say anything.

'Why,' said the Dodo, 'the best way to explain it is to do it.' (And, as
you might like to try the thing yourself, some winter day, I will tell
you how the Dodo managed it.)

First it marked out a race-course, in a sort of circle, ('the exact
shape doesn't matter,'

<br><br>
Now, let's print the first paragraph where the Queen of Hearts is mentioned

For this, it's better to use a `while` loop because we don't need to iterate over all the list, just until the first appearance

In [23]:
# Boolean indicating if the queen has appeared
continue_iterating = True

# Integer keeping track of the paragraph number
idx = 0

# While loop. This will continue iterating until the variable continue_iterating is set to FALSE
while continue_iterating:

    # Get corresponding paragraph
    paragraph = alice_by_paragraphs[idx]
    
    # Check if the Queen of Hearts is mentioned
    if paragraph.count('Queen of Hearts')>0:
        # The Queen of Hearts has been mentioned. We need to stop the loop
        continue_iterating = False
    else:
        # Update index to move to the next chapter
        idx = idx+1
        
print('\nThe Queen of Hearts first appears on paragraph ' + str(idx) + ':\n')

print(alice_by_paragraphs[idx])


The Queen of Hearts first appears on paragraph 365:

The Hatter shook his head mournfully. 'Not I!' he replied. 'We
quarrelled last March--just before HE went mad, you know--' (pointing
with his tea spoon at the March Hare,) '--it was at the great concert
given by the Queen of Hearts, and I had to sing


<br><br>
***
Let's count all the occurrences in the book of the main characters:

- Alice
- The White Rabbit
- The Queen of Hearts
- The Cheshire Cat
- The Caterpillar
- The Mad Hatter
- The March Hare
- The Dormouse
- The Gryphon
- The Mock Turtle
- The Knave of Hearts
- The Mouse
- The Dodo

A good way to store this information is in a dictionary, because we can use the character's name to identify each occurence count and later on extract the counts using the character's name, without having to iterate

In [24]:
# Define list of all the Characters
list_of_characters = ['Alice', 'White Rabbit', 'Queen of Hearts', 'Cheshire Cat', 'Caterpillar',
                      'Hatter', 'March Hare', 'Dormouse', 'Gryphon', 'Mock Turtle',
                      'Knave of Hearts', 'Mouse', 'Dodo']

# Create dictionary where we will store the mentions
characters_mentions = {}

# Iterate over every character
for character in list_of_characters:
    
    # Count number of mentions in the whole book
    mentions = alice.count(character)
    
    # Save the number of mentions in the dictionary using the name as key
    characters_mentions[character] = mentions

<br><br>
How many times is the Caterpillar mentioned in the book?

In [25]:
print('The Caterpillar is mentioned ' + str(characters_mentions[character]) + ' times')

The Caterpillar is mentioned 13 times


<br><br>
Which is the most mentioned character and how many times are they mentioned?

In [26]:
# Set initial values for the maximum number of mentions found so far and the character it belongs to
most_mentions = 0
character_with_most_mentions = 'No character'

# Iterate over each key of the dictionary
for character in list_of_characters:
    
    # Get appearance count
    mentions = characters_mentions[character]
    
    # Check if it's the largest number of mentions seen so far
    if mentions > most_mentions:
        # This is the largest number of mentions, we need to update our variables
        most_mentions = mentions
        character_with_most_mentions = character

print('The character with most mentions is ' + character_with_most_mentions + ' with ' + 
      str(most_mentions) + ' mentions')

The character with most mentions is Alice with 396 mentions
