# Data Structures
We've looked at variables and data types; now let's look at data structures. We'll cover `lists`, `tuples`, and `dictionaries`. These let us group our data and variables together, which makes it easier to load, clean, analyze, and visualize our data.

## Lists
A `list` is a group of data points collected in a sequence. `Lists` are made with square brackets `[]`, with the items separated by commas.

In the example below, it's important to have quotes to denote that the list items are strings.  Otherwise, Python will interpret these items as named variables.

In [1]:
list_a = ['Yuting', 'John', 'Karina', 'Rahul']
list_a

['Yuting', 'John', 'Karina', 'Rahul']

One of the main things that we'll want to do with a list is access the list items. We can do this by using the list index.

The index of a list always starts at 0. Thus, Karina is the third item, or index 2, in the list.

In [2]:
list_a[2]

'Karina'

A `negative index` counts from the end of the list: -1 references the last item, -2 the second-to-last item, and so on. In this case, the third item from the end of the list is John.

In [3]:
list_a[-3]

'John'
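As a small sketch of how positive and negative indexes relate, the block below uses a separate, hypothetical list called `names` (so that `list_a` is left untouched) and checks that a negative index points at the same item as the list length minus that number.

```python
# Hypothetical list used only for this illustration
names = ['Yuting', 'John', 'Karina', 'Rahul']

last_item = names[-1]        # same item as names[len(names) - 1]
third_from_end = names[-3]   # same item as names[len(names) - 3]

print(last_item)                            # Rahul
print(third_from_end)                       # John
print(names[-3] == names[len(names) - 3])   # True
```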

Lists can contain a single data type or a combination of data types. In the following code block, list_b contains numeric data and list_c contains multiple data types.

In [4]:
# Create a numeric list
list_b = [12, 11, 13]
print(list_b)

# Create a mixed list
list_c = [1, 'a', True, 2.56]
print(list_c)

[12, 11, 13]
[1, 'a', True, 2.56]


A list can also contain items other than single points of data. For example, lists can contain other lists. In the example below, list_d contains list_a as well as a further data point of 'Pavel', while list_e contains lists, numeric data, and string data.

In [5]:
list_d = [list_a, 'Pavel']
print(list_d)

list_e = [list_c, list_b, 1, 9.63, 'a']
print(list_e)

[['Yuting', 'John', 'Karina', 'Rahul'], 'Pavel']
[[1, 'a', True, 2.56], [12, 11, 13], 1, 9.63, 'a']


We can also combine multiple lists together to form one larger list. Notice how this is different from creating a list of lists: list_f contains seven individual data points but no nested lists, unlike list_d in the example above.

In [6]:
list_f = list_a + list_b
print(list_f)

['Yuting', 'John', 'Karina', 'Rahul', 12, 11, 13]
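One way to see the difference between nesting and combining is to compare lengths. The sketch below uses two short, hypothetical lists (`names` and `numbers`) rather than the lists defined above.

```python
# Hypothetical lists used only for this illustration
names = ['Yuting', 'John']
numbers = [12, 11, 13]

nested = [names, numbers]    # a list that holds two lists
combined = names + numbers   # one flat list of individual items

print(len(nested))     # 2
print(len(combined))   # 5
print(nested[0][1])    # John (second item of the first inner list)
```

Reaching an item inside a nested list takes two indexes, one for the inner list and one for the item within it.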


## Manipulating Lists
The `sort()` method arranges list values in alphabetical or ascending order by default, and it changes the list in place. However, we cannot sort lists that mix types that cannot be compared with each other, such as strings and integers.

In [7]:
# Sort a string list
list_a.sort()
print(list_a)

# Sort a numeric list
list_b.sort()
print(list_b)

# Sort a mixed list
list_c.sort()
print(list_c)

['John', 'Karina', 'Rahul', 'Yuting']
[11, 12, 13]


TypeError: '<' not supported between instances of 'str' and 'int'
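If a mixed list does need an ordering, one possible workaround (an assumption for illustration, not something this chapter relies on) is to compare the items by their text form using the built-in `sorted()` function with `key=str`. The sketch below uses a hypothetical list called `mixed` and also shows how the error can be caught.

```python
# Hypothetical list used only for this illustration
mixed = [1, 'a', True, 2.56]

try:
    mixed.sort()  # strings and numbers cannot be compared directly
except TypeError as error:
    print('Could not sort:', error)

# Compare the items by their string form instead; sorted() returns a new list
as_text = sorted(mixed, key=str)
print(as_text)   # [1, 2.56, True, 'a']
```

Sorting by text follows character order, so this is not a numeric sort; it only gives the mixed items a consistent order.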

If we want to sort a list in reverse order, from largest to smallest value or in reverse alphabetical order, we can use the `reverse` argument of the `sort()` method.

In [8]:
# Sort a string list in reverse alphabetical order
list_a.sort(reverse = True)
print(list_a)

['Yuting', 'Rahul', 'Karina', 'John']


In [9]:
# Sort a numeric list in descending order
list_b.sort(reverse = True)
print(list_b)

[13, 12, 11]


A list is mutable, which means that we can edit and update it. This is important to note: the ability to update, edit, and delete list items is a benefit of choosing this type of data structure.

In [10]:
# list_a in reverse sort order
print(list_a)

['Yuting', 'Rahul', 'Karina', 'John']


We can add items to the end of a list using the `append()` method.

In [11]:
# Append a new item, 'Rosie', to the list
list_a.append('Rosie')
print(list_a)

['Yuting', 'Rahul', 'Karina', 'John', 'Rosie']


Assigning to a list index with `=` replaces the value at that index.

In [12]:
# Replace the second list item, 'Rahul', with 'Oscar'
list_a[1] = 'Oscar'
print(list_a)

['Yuting', 'Oscar', 'Karina', 'John', 'Rosie']


`del` removes the item at the given index, and the items after it shift down by one index.

In [13]:
# Delete 'John' from the list
del list_a[3]
print(list_a)

['Yuting', 'Oscar', 'Karina', 'Rosie']
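To make the index shift after a deletion visible, the sketch below checks the same index before and after `del`. It uses a hypothetical list called `people` so that `list_a` is not changed again.

```python
# Hypothetical list used only for this illustration
people = ['Yuting', 'Oscar', 'Karina', 'John', 'Rosie']

before = people[3]   # 'John' sits at index 3
del people[3]        # remove 'John'
after = people[3]    # 'Rosie' has moved from index 4 to index 3

print(before)        # John
print(after)         # Rosie
print(len(people))   # 4
```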


## Tuples
A tuple is a collection of values separated by commas and enclosed in `()`. It can contain all of the same items as a list: different data types as well as lists and tuples themselves.

In [14]:
# Create a string tuple
tuple_a = ('a', 'b', 'c')
print(tuple_a)

('a', 'b', 'c')


In [15]:
# Create a numeric tuple
tuple_b = (1, 2, 3)
print(tuple_b)

(1, 2, 3)


In [16]:
# Create a mixed tuple
tuple_c = (1, 'a', True, 2.56)
print(tuple_c)

(1, 'a', True, 2.56)


In [17]:
# Create a tuple that contains tuples and a list
tuple_d = (tuple_b, list_a, tuple_c)
print(tuple_d)

((1, 2, 3), ['Yuting', 'Oscar', 'Karina', 'Rosie'], (1, 'a', True, 2.56))


As well as tuples containing lists, lists can also contain tuples.

In [18]:
list_g = [list_e, tuple_c]
list_g

[[[1, 'a', True, 2.56], [13, 12, 11], 1, 9.63, 'a'], (1, 'a', True, 2.56)]
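Notice that the nested numeric list in the output above appears as `[13, 12, 11]`, even though it was `[12, 11, 13]` when list_e was first printed. A container stores a reference to a list rather than a copy, so sorting list_b in place also changed what list_e shows. The sketch below reproduces this with two hypothetical lists, `inner` and `outer`.

```python
# Hypothetical lists used only for this illustration
inner = [12, 11, 13]
outer = [inner, 'a']

inner.sort(reverse=True)   # changes inner in place

print(outer)               # [[13, 12, 11], 'a']
print(outer[0] is inner)   # True (both names refer to the same list)
```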

Similar to how we combined multiple lists into a larger list, we can do the same thing with tuples.

In [19]:
tuple_e = tuple_a + tuple_b
print(tuple_e)

('a', 'b', 'c', 1, 2, 3)


As we've seen, tuples are similar to lists with one key difference: they are immutable, meaning they cannot be changed after they are created. This is important to note, as it can be a benefit of choosing this type of data structure: items in a tuple cannot be appended, replaced, or deleted.

In [20]:
# Append a new item, 'a', to the tuple
tuple_a.append('a')
print(tuple_a)

AttributeError: 'tuple' object has no attribute 'append'

In [21]:
# Replace the second tuple item
tuple_a[1] = 'e'
print(tuple_a)

TypeError: 'tuple' object does not support item assignment

In [22]:
# Delete an item from the tuple
del tuple_a[2]
print(tuple_a)

TypeError: 'tuple' object doesn't support item deletion
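Because a tuple cannot be edited in place, a common approach is to build a new tuple instead, for example with the `+` concatenation shown earlier. The sketch below uses a hypothetical tuple called `letters` and catches the error so that the rest of the code can still run.

```python
# Hypothetical tuple used only for this illustration
letters = ('a', 'b', 'c')

try:
    letters[1] = 'e'   # item assignment is not allowed on a tuple
except TypeError as error:
    print('Tuples cannot be changed in place:', error)

# Concatenation creates a new tuple and leaves the original as it was
longer = letters + ('d',)

print(letters)   # ('a', 'b', 'c')
print(longer)    # ('a', 'b', 'c', 'd')
```

The trailing comma in `('d',)` is what makes it a one-item tuple; without it, the parentheses would only group a string.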

## Dictionaries
A dictionary is a collection of key-value pairs enclosed in `{}`. Dictionaries are mutable, meaning that they can be edited. Unlike a list, a dictionary is not accessed by a positional index: we use the key as the index to return the value associated with that key. (Since Python 3.7, dictionaries keep their items in insertion order.)

In [23]:
# A dictionary containing the first name, last name, and age for one person
dict_a = {'first_name': 'Frank', 'last_name': 'Park', 'age': 20}
print(dict_a)

{'first_name': 'Frank', 'last_name': 'Park', 'age': 20}


In [24]:
dict_a['first_name']

'Frank'
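Since dictionaries are mutable, the same key syntax can be used to update, add, and delete entries. The following is a minimal sketch on a hypothetical dictionary called `person`; the `'city'` key and its value are made up for illustration.

```python
# Hypothetical dictionary used only for this illustration
person = {'first_name': 'Frank', 'last_name': 'Park', 'age': 20}

person['age'] = 21          # update the value of an existing key
person['city'] = 'Boston'   # add a new key-value pair (made-up example key)
del person['last_name']     # remove a key and its value

print(person)   # {'first_name': 'Frank', 'age': 21, 'city': 'Boston'}
```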

Dictionaries do not allow duplicate keys: every key must be unique, although different keys may share the same value. This is different from a list, which can contain duplicate values.

In [25]:
# A list containing duplicate values
list_g = ['Frank', 'Frank']
print(list_g)

['Frank', 'Frank']


In [26]:
# A dictionary with a duplicate key and value pair
dict_b = {'first_name': 'Frank', 'first_name': 'Frank', 'last_name': 'Park', 'age': 20}
print(dict_b)

{'first_name': 'Frank', 'last_name': 'Park', 'age': 20}


In [27]:
# A dictionary with a duplicate key: only the last value is kept
dict_c = {'first_name': 'Frank', 'first_name': 'Frank1', 'last_name': 'Park', 'age': 20}
print(dict_c)

{'first_name': 'Frank1', 'last_name': 'Park', 'age': 20}
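The uniqueness rule applies to keys, not to values. As a brief sketch with two hypothetical dictionaries, `pets` and `scores`, the first keeps both entries because the keys differ, while the second keeps only the last value given for the repeated key.

```python
# Hypothetical dictionaries used only for this illustration
pets = {'first_pet': 'Rex', 'second_pet': 'Rex'}   # same value, different keys
scores = {'a': 1, 'a': 2}                          # same key given twice

print(len(pets))     # 2
print(scores)        # {'a': 2}
print(len(scores))   # 1
```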


A common method for creating dictionaries is to build them from lists. Each key is a label that we choose (here it matches the list's name), and each value is the whole list.

In [28]:
sales = [100, 200, 240, 400, 100, 500]
stores = ['Store A', 'Store B', 'Store A', 'Store C', 'Store D', 'Store B']

dict_sales = {'Sales': sales, 'Stores': stores}
print(dict_sales)

{'Sales': [100, 200, 240, 400, 100, 500], 'Stores': ['Store A', 'Store B', 'Store A', 'Store C', 'Store D', 'Store B']}
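When the values are lists, a single item can be reached by first using the key and then the list index. The sketch below rebuilds a shortened, hypothetical version of `dict_sales` so that it runs on its own.

```python
# Shortened, hypothetical version of the data above
sales = [100, 200, 240]
stores = ['Store A', 'Store B', 'Store A']
dict_sales = {'Sales': sales, 'Stores': stores}

first_store = dict_sales['Stores'][0]   # key first, then list index
first_sale = dict_sales['Sales'][0]

print(first_store, first_sale)   # Store A 100
```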


To inspect all of the items in a dictionary, we can use the `items()` method. It is especially useful in a `for loop`, because it gives us access to both the keys and the values of the dictionary. We'll see this a little bit later in this chapter.

In [29]:
dict_sales.items()

dict_items([('Sales', [100, 200, 240, 400, 100, 500]), ('Stores', ['Store A', 'Store B', 'Store A', 'Store C', 'Store D', 'Store B'])])
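As a brief preview of that pattern, the sketch below loops over `items()` on a shortened, hypothetical version of `dict_sales` and collects how many values each key holds. The names `key`, `values`, and `counts` are chosen only for this illustration.

```python
# Shortened, hypothetical version of the data above
dict_sales = {'Sales': [100, 200, 240],
              'Stores': ['Store A', 'Store B', 'Store A']}

counts = {}
for key, values in dict_sales.items():
    # Each pass through the loop gives one key and its list of values
    counts[key] = len(values)
    print(key, values)

print(counts)   # {'Sales': 3, 'Stores': 3}
```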