#Data Analysis with Python
Estimated time needed: 60 minutes

##Objectives
After completing this lab you will be able to:

*   Define tuples, lists, dictionaries, and sets
*   Create reusable Python functions
*   Understand the mechanics of Python file objects and interact with the local hard drive

##Table of Contents
1.   Data Structures and Sequences
        * Tuple
        * List
        * Dictionary
        * Set
        * Built-In Sequence Functions
        * List, Set, and Dictionary Comprehensions
        * Functions
        * Files and the Operating System
        * Conclusion

###1.   Data Structures and Sequences

####Tuple
A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed. The easiest way to create one is with a comma-separated sequence of values wrapped in parentheses:

In [None]:
tup = (4, 5, 6) #or tup = 4, 5, 6

tup

(4, 5, 6)

You can convert any sequence or iterator to a tuple by invoking tuple:

In [None]:
tuple([4, 0, 2])

(4, 0, 2)

In [None]:
tup = tuple('string')

tup

('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [] as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python:

In [None]:
tup[0]

's'

When you're defining tuples within more complicated expressions, it’s often necessary to enclose the values in parentheses, as in this example of creating a tuple of tuples:

In [None]:
nested_tup = (4, 5, 6), (7, 8)

nested_tup

((4, 5, 6), (7, 8))

In [None]:
nested_tup[0]

(4, 5, 6)

In [None]:
nested_tup[1]

(7, 8)

While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot.

If an object inside a tuple is mutable, such as a list, you can modify it in place:

In [None]:
tup = tuple(['foo', [1, 2], True])

tup[1].append(3)
tup

('foo', [1, 2, 3], True)
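
The difference between rebinding a slot and mutating the object stored in it can be checked directly. The following is a minimal sketch; the flag name `reassigned` is only illustrative:

```python
tup = ('foo', [1, 2], True)

# Rebinding a slot of the tuple is not allowed and raises TypeError.
try:
    tup[2] = False
    reassigned = True
except TypeError:
    reassigned = False

# Mutating the list stored in slot 1 is allowed.
tup[1].append(3)

print(reassigned)  # False
print(tup)         # ('foo', [1, 2, 3], True)
```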

You can concatenate tuples using the **+** operator to produce longer tuples:

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating that many copies of the tuple:

In [None]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Note that the objects themselves are not copied, only the references to them.
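
A small sketch of what sharing references means in practice, using a hypothetical list named `inner` as the repeated element:

```python
inner = [1, 2]
repeated = (inner,) * 3

# All three slots refer to the same list object, so one append shows up everywhere.
inner.append(3)

print(repeated)                    # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
print(repeated[0] is repeated[2])  # True
```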

#####Unpacking tuples

If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign:



In [None]:
tup = (4,5,6)

a, b, c = tup
b

5

A common use of variable unpacking is iterating over sequences of tuples or lists:

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


There are some situations where you may want to "pluck" a few elements from the beginning of a tuple. There is a special syntax that can do this, **\*rest**, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [None]:
values = 1, 2, 3, 4, 5

a, b, *rest = values
rest

[3, 4, 5]

This rest bit is sometimes something you want to discard; there is nothing special about the rest name. As a matter of convention, many Python programmers will use the underscore (_) for unwanted variables:

In [None]:
values = 1, 2, 3, 4, 5

a, b, *_ = values
_

[3, 4, 5]
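
The same star syntax in a function signature can be sketched as follows. The function name `split_first` and the variables `head` and `tail` are hypothetical. Note that in a signature the extra arguments arrive as a tuple, whereas unpacking in an assignment produces a list:

```python
def split_first(first, *rest):
    # `rest` collects any remaining positional arguments as a tuple.
    return first, rest

head, tail = split_first(1, 2, 3, 4, 5)

print(head)  # 1
print(tail)  # (2, 3, 4, 5)
```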

#####Tuple methods


Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is **count**, which counts the number of occurrences of a value:

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)

a.count(2)

4

####List

In contrast with tuples, lists are variable length and their contents can be modified in place. Lists are **mutable**. You can define them using square brackets [] or using the list type function:

In [None]:
a_list = [2, 3, 7, None]

tup = ("foo", "bar", "baz")

b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [None]:
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

#####Adding and removing elements

Elements can be appended to the end of the list with the **append** method:

In [None]:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

Using **insert** you can insert an element at a specific location in the list:

In [None]:
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']



---

Inserting elements into a sequence is **computationally expensive** compared to appending because it requires shifting internal references. If you need to insert elements at both the beginning and end of a sequence, consider using [collections.deque](https://www.geeksforgeeks.org/deque-in-python/), a double-ended queue optimized for this purpose found in the Python Standard Library.

---
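
A minimal sketch of the deque alternative, assuming a small hypothetical queue of strings:

```python
from collections import deque

queue = deque(['foo', 'bar'])
queue.appendleft('start')  # insert at the beginning
queue.append('end')        # insert at the end

print(list(queue))  # ['start', 'foo', 'bar', 'end']
```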







The inverse operation to insert is **pop**, which removes and returns an element at a particular index:



In [None]:
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with **remove**, which locates the first such value and removes it from the list:

In [None]:
b_list.remove('red')
b_list

['foo', 'baz', 'dwarf']

Check if a list contains a value using the **in** keyword:

In [None]:
'dwarf' in b_list

True

The keyword **not** can be used to negate **in**:

In [None]:
'dwarf' not in b_list

False



---
Checking whether a **list** contains a value is a lot **slower than** doing so with **dictionaries and sets** (to be introduced shortly), as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.


---
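
If the same collection is checked many times, one option is to convert it to a set once and test membership against that. The sketch below uses hypothetical names (`values`, `lookup`) and only shows that both containers give the same answers; it does not measure the speed difference:

```python
values = list(range(100_000))
lookup = set(values)  # one-time conversion cost

in_list = 99_999 in values   # found by scanning the list
in_set = 99_999 in lookup    # found via a hash lookup
missing = -1 in lookup

print(in_list, in_set, missing)  # True True False
```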





#####Concatenating and combining lists

Similar to tuples, adding two lists together with **+** concatenates them:

In [None]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list **already defined**, you can append multiple elements to it using the **extend** method:

In [None]:
x = [4, None, "foo"]

x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

Note that list concatenation by addition is a **comparatively expensive operation** since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable. Thus:

```
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
```

is **faster** than the concatenative alternative:

```
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
```
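
The reason for the difference can be seen by checking object identity. This is a sketch with hypothetical sample data in `list_of_lists`; it illustrates that extend modifies the existing list while + builds a new one each time, without timing anything:

```python
list_of_lists = [[1, 2], [3], [4, 5, 6]]  # hypothetical sample data

everything = []
original = everything
for chunk in list_of_lists:
    everything.extend(chunk)
same_object = everything is original  # True: the list was extended in place

rebuilt = []
start = rebuilt
for chunk in list_of_lists:
    rebuilt = rebuilt + chunk
new_object = rebuilt is not start     # True: each + created a new list

print(everything)  # [1, 2, 3, 4, 5, 6]
print(rebuilt)     # [1, 2, 3, 4, 5, 6]
```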

#####Sorting

You can sort a list in place (without creating a new object) by calling its **sort** method:

In [None]:
a = [7, 2, 5, 1, 3]

a.sort()
a

[1, 2, 3, 5, 7]

**sort** has a few options that will occasionally come in handy. One is the ability to pass a secondary sort key—that is, a function that produces a value to use to **sort the objects**. For example, we could sort a collection of strings by their lengths:

In [None]:
b = ["saw", "small", "He", "foxes", "six"]

b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']
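
Another option is the `reverse` argument, which can be combined with `key`. A short sketch on the same list of strings:

```python
b = ["saw", "small", "He", "foxes", "six"]

# Longest strings first; strings of equal length keep their original relative order.
b.sort(key=len, reverse=True)

print(b)  # ['small', 'foxes', 'saw', 'six', 'He']
```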

#####Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of **start:stop** passed to the indexing operator **[]**:

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]

seq[1:5]

[2, 3, 7, 5]

While the element at the **start** index is **included**, the **stop** index is **not included**, so that the number of elements in the result is stop - start.
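
The stop - start rule can be verified with a quick sketch (the names `start`, `stop`, and `section` are illustrative):

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
start, stop = 1, 5

section = seq[start:stop]

print(section)                       # [2, 3, 7, 5]
print(len(section) == stop - start)  # True
print(seq[stop])                     # 6, the element at the stop index is excluded
```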

Either the start or stop can be **omitted**, in which case they default to the start of the sequence and the end of the sequence, respectively:

In [None]:
seq[:5]

[7, 2, 3, 7, 5]

In [None]:
seq[3:]

[7, 5, 6, 0, 1]

***Negative indices*** slice the sequence relative to the end:

In [None]:
seq[-4:]

[5, 6, 0, 1]

In [None]:
seq[-6:-2]

[3, 7, 5, 6]

A **step** can also be used after a second colon to, say, take every fourth element:

In [None]:
seq[::4]

[7, 5]

A clever use of this is to pass **-1**, which has the useful effect of reversing a list or tuple:

In [None]:
seq[::-1]

[1, 0, 6, 5, 7, 3, 2, 7]

####Dictionary

A dictionary stores a collection of **key-value pairs**, where key and value are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key.

One approach for creating a dictionary is to use curly braces **{}** and colons to separate keys and values:

In [None]:
empty_dict = {}

d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

In [None]:
d1[7] = "an integer"
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

You can delete values using either the **del** keyword or the **pop** method (which simultaneously returns the value and deletes the key):

In [None]:
del d1[7]
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}
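
For comparison, here is a sketch of the **pop** variant. The `"dummy"` key is a hypothetical entry added only for this illustration:

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4], "dummy": "another value"}

# pop deletes the key and returns the value that was stored under it.
removed = d1.pop("dummy")

print(removed)  # another value
print(d1)       # {'a': 'some value', 'b': [1, 2, 3, 4]}
```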

The **keys** and **values** methods give you iterators of the dictionary's keys and values, respectively. The order of the keys depends on the order of their insertion, and these methods output the keys and values in the same respective order:

In [None]:
list(d1.keys())

['a', 'b']

In [None]:
list(d1.values())

['some value', [1, 2, 3, 4]]

If you need to **iterate** over both the keys and values, you can use the items method to iterate over the keys and values as 2-tuples:

In [None]:
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4])]
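
Because each item is a 2-tuple, it can be unpacked directly in a for loop. A minimal sketch, collecting formatted strings in a hypothetical list named `lines`:

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

lines = []
for key, value in d1.items():
    # Each item is a (key, value) pair unpacked into two variables.
    lines.append(f"{key} -> {value}")

print(lines)  # ['a -> some value', 'b -> [1, 2, 3, 4]']
```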

You can **merge** one dictionary into another using the **update** method:

In [None]:
d1.update({"b": "foo", "c": 12})
d1

{'a': 'some value', 'b': 'foo', 'c': 12}

#####Creating dictionaries from sequences

You will occasionally end up with two sequences that you want to pair up element-wise in a dictionary. As a first cut, you might write code like this:



```
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```



Since a dictionary is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples:

In [None]:
tuples = zip(range(5), reversed(range(5)))
tuples

<zip at 0x7f61fe0f1c80>

In [None]:
mapping = dict(tuples)
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
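
Applied to the earlier loop, the same idea might look like the sketch below. The contents of `key_list` and `value_list` are hypothetical. It also shows that a zip object is a one-shot iterator, so it is empty after `dict` has consumed it:

```python
key_list = ["a", "b", "c"]   # hypothetical keys
value_list = [1, 2, 3]       # hypothetical values

pairs = zip(key_list, value_list)
mapping = dict(pairs)

# The zip iterator has been consumed by dict(), so nothing is left in it.
leftover = list(pairs)

print(mapping)   # {'a': 1, 'b': 2, 'c': 3}
print(leftover)  # []
```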

#####Default values

It’s common to have logic like:

```
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```



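The dictionary methods **get** and **pop** can take a default value to be returned, so the above if-else block can be written simply as shown next. Without a default, **get** returns None when the key is missing, while **pop** raises a KeyError.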

```
value = some_dict.get(key, default_value)
```
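
A short sketch of how these defaults behave, using a hypothetical one-entry dictionary:

```python
some_dict = {"a": 1}

present = some_dict.get("a", 0)            # key exists, so its value is returned
absent = some_dict.get("z", 0)             # key missing, so the default is returned
no_default = some_dict.get("z")            # key missing and no default given
popped = some_dict.pop("z", "missing")     # pop with a default does not raise

print(present, absent, no_default, popped)  # 1 0 None missing
```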



When setting values, it may be that the values in a dictionary are themselves another kind of collection, like a **list**.

For example, you could imagine categorizing a list of words by their first letters as a dictionary of lists:

In [None]:
words = ["apple", "bat", "bar", "atom", "book"]

by_letter = {}

for word in words:
    first_letter = word[0]
    if first_letter not in by_letter:
        by_letter[first_letter] = [word]
    else:
        by_letter[first_letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The **setdefault** dictionary method can be used to simplify this workflow. The preceding for loop can be rewritten as:

```
by_letter = {}

for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
```



The built-in collections module has a useful class, **defaultdict**, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dictionary:


```
from collections import defaultdict

by_letter = defaultdict(list)

for word in words:
    by_letter[word[0]].append(word)
```
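
As a quick check, the sketch below runs the **setdefault** and **defaultdict** versions on the same word list and compares the results. The names `with_setdefault` and `with_defaultdict` are only illustrative:

```python
from collections import defaultdict

words = ["apple", "bat", "bar", "atom", "book"]

with_setdefault = {}
for word in words:
    with_setdefault.setdefault(word[0], []).append(word)

with_defaultdict = defaultdict(list)
for word in words:
    with_defaultdict[word[0]].append(word)

print(with_setdefault)                      # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
print(with_setdefault == with_defaultdict)  # True
```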



#####Valid dictionary key types

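Dictionaries in Python use keys and values, where values can be any Python object and keys generally have to be **immutable** (unchangeable) objects like numbers, strings, or tuples whose elements are themselves immutable. The technical requirement is that a key be *hashable*. We can check whether an object can be used as a key using the **hash()** function:

```
hash("string")
# e.g. 3634226001988967898 (the exact value varies between Python sessions)

hash((1, 2, (2, 3)))
# e.g. -9209053662355515447

hash((1, 2, [2, 3]))  # raises TypeError because the inner list is mutable (unhashable)

```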



To use a **list as a key**, one option is to convert it to a tuple, which can be hashed as long as its elements also can be:

In [None]:
d = {}

d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}
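
One way to test hashability without letting the error escape is a small helper like the one sketched here. The function name `is_hashable` is hypothetical and not part of the standard library:

```python
def is_hashable(obj):
    # hash() raises TypeError for unhashable objects such as lists.
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable("string"))            # True
print(is_hashable((1, 2, (2, 3))))      # True
print(is_hashable((1, 2, [2, 3])))      # False
print(is_hashable(tuple([1, 2, 3])))    # True
```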

####Set

A set is an **unordered** collection of unique elements. A set can be created in two ways: via the **set** function or via a set literal with curly braces:

In [1]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [2]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations like union, intersection, difference, and symmetric difference. Consider these two example sets:

In [3]:
a = {1, 2, 3, 4, 5}

b = {3, 4, 5, 6, 7, 8}

The **union** of these two sets is the set of distinct elements occurring in either set. This can be computed with either the union method or the **|** binary operator:

In [4]:
a.union(b) # or  a | b

{1, 2, 3, 4, 5, 6, 7, 8}

The **intersection** contains the elements occurring in both sets. The **&** operator or the intersection method can be used:

In [5]:
a.intersection(b) # or a & b

{3, 4, 5}
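
The remaining two operations mentioned above, difference and symmetric difference, can be sketched on the same example sets. The results are passed through `sorted` only to make the printed order predictable:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a.difference(b)                # or a - b
in_one_only = a.symmetric_difference(b)    # or a ^ b

print(sorted(only_in_a))    # [1, 2]
print(sorted(in_one_only))  # [1, 2, 6, 7, 8]
```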

#####Commonly used set methods

######`a.add(x)`

Add element x to set a.

######`a.clear()`

Reset set a to an empty state, discarding all of its elements.


######`a.remove(x)`

Remove element x from set a, raising KeyError if x is not present.

######`a.pop()`

Remove and return an arbitrary element from set a, raising KeyError if the set is empty.

######`a.union(b)`

All of the unique elements in a and b. Equivalent to `a | b`.

######`a.update(b)`

Set the contents of a to be the union of the elements in a and b. Equivalent to `a |= b`.

######`a.intersection(b)`

All of the elements in both a and b. Equivalent to `a & b`.

######`a.intersection_update(b)`

Set the contents of a to be the intersection of the elements in a and b. Equivalent to `a &= b`.

######`a.difference(b)`

The elements in a that are not in b. Equivalent to `a - b`.

######`a.difference_update(b)`

Set a to the elements in a that are not in b. Equivalent to `a -= b`.

######`a.symmetric_difference(b)`

All of the elements in either a or b but not both. Equivalent to `a ^ b`.

######`a.symmetric_difference_update(b)`

Set a to contain the elements in either a or b but not both. Equivalent to `a ^= b`.

######`a.issubset(b)`

True if the elements of a are all contained in b. Equivalent to `a <= b`.

######`a.issuperset(b)`

True if the elements of b are all contained in a. Equivalent to `a >= b`.

######`a.isdisjoint(b)`

True if a and b have no elements in common.
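
A brief sketch of a few of these methods in use. The copies `c` and `d` are hypothetical names, introduced so that the in-place operations leave the original set `a` unchanged:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

c = a.copy()
c |= b   # in-place union, same effect as c.update(b)

d = a.copy()
d &= b   # in-place intersection, same effect as d.intersection_update(b)

subset = {1, 2, 3}.issubset(a)
superset = a.issuperset({1, 2, 3})
disjoint = a.isdisjoint({9, 10})

print(sorted(c))                   # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(d))                   # [3, 4, 5]
print(subset, superset, disjoint)  # True True True
```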