# 3.1 Data Structures and Sequences

1. [Tuple](#tuple)
2. [Lists](#lists)
3. [Dictionary](#dictionary)
4. [Sets](#sets)
5. [Built-in Sequence Functions](#sequence)
6. [List, Set, and Dictionary Comprehensions](#comprehension)

<a name="tuple"></a>
## Tuple

A tuple is a fixed-length, immutable *sequence* of Python objects.  

Once it's assigned, **it cannot be changed**  

Standard construction is like a list with parentheses instead of brackets: `myTup = (4, 5, 6)`  

In a lot of contexts, you don't even need the parentheses and can just do `myTup = 4, 5, 6`  

You can also use the `tuple()` function, either to convert an existing object or to initialize a new tuple: `myTup = tuple([4, 5, 6])`

Any sequence or iterator can be converted to a tuple:

In [184]:
myList=[4, 0, 2]
print(myList)
myTup=tuple(myList)
print(myTup)
myString="string"
print(myString)
myTup2=tuple(myString)
print(myTup2)

[4, 0, 2]
(4, 0, 2)
string
('s', 't', 'r', 'i', 'n', 'g')


Like most sequence types, we use brackets `[]` to access the elements. (Remember it's 0-indexed, unlike R's 1-index)

In [185]:
print(myTup[0])
print(myTup2[3])

4
i


### Nested Tuples

You can create tuples within tuples, nested to any depth (only one layer is shown here).  

Just chain the brackets together to access deeper elements (same idea as chaining R's `[[]]` and `$` when accessing nested lists)

In [186]:
myNestedTup = (4, 5, 6), (7, 8)
print(myNestedTup)
print(myNestedTup[0])
print(myNestedTup[1])
print(myNestedTup[0][1])

((4, 5, 6), (7, 8))
(4, 5, 6)
(7, 8)
5


### Modifying Tuples

Tuples themselves are immutable. So once you make a tuple, you can't modify which object is stored in a particular slot.  

The only real 'modification' you can make is if an object within the tuple is mutable.  

In the example below, I make a tuple with a string, a list, and a boolean. I can't change the boolean to be `False`, but I can modify the list in place.

In [187]:
myTup = tuple(['foo', [1,2], True])
myTup2 = ('foo', [1,2], True) # alternative method of making it
print(myTup)
print(myTup2)
print(type(myTup))
myTup[1][0] = 5 # able to modify the list
print(myTup)

('foo', [1, 2], True)
('foo', [1, 2], True)
<class 'tuple'>
('foo', [5, 2], True)


In [188]:
# myTup[2] = False # This will fail if executed
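
A minimal sketch of what that failure looks like, catching the error so the cell doesn't crash. The names `demoTup` and `err_msg` are made up for illustration:

```python
demoTup = ('foo', [1, 2], True)

# Reassigning a slot raises TypeError because tuples are immutable
try:
    demoTup[2] = False
except TypeError as err:
    err_msg = str(err)
    print(f"TypeError: {err_msg}")

# Mutating the list stored inside the tuple is still allowed
demoTup[1].append(3)
print(demoTup)
```

The tuple still holds the same three objects afterwards; only the contents of the list changed.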

Tuples can be combined together using `+` and can be repeated with `*`:

In [189]:
myTup = (4, None, 'foo')
myTup2 = (6, 0)
myTup3 = ('bar',)
print(myTup)
print(myTup2)
print(myTup3)
myComboTup = myTup + myTup2 + myTup3
print(myComboTup)
myMultiTup = myTup * 3
print(myMultiTup)


(4, None, 'foo')
(6, 0)
('bar',)
(4, None, 'foo', 6, 0, 'bar')
(4, None, 'foo', 4, None, 'foo', 4, None, 'foo')


### Unpacking Tuples

This is a feature that doesn't have a direct analog in R. If you assign to a "tuple-like expression" (i.e. comma-separated variable names), Python will try to unpack the value on the right-hand side of the assignment into those variables.  

The following will unpack myTup into 3 different variables:

In [190]:
myTup = (4, 5, 6)
a, b, c = myTup
# (a, b, c) = myTup # (equivalent)
print(b)

5


Notice that the number of variables on the left-hand side has to match the number of elements in the value being unpacked:

```python
a, b = myTup
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 a, b = myTup

ValueError: too many values to unpack (expected 2)
```

As long as the structure on the left side matches that on the right, it can be unpacked. Below is the unpacking of a nested tuple. Each of the four variables ends up bound to an integer:

In [191]:
myNest = (4, 5, (6, 7))
a, b, (c, d) = myNest
print(a)
print(type(a))
print(d)
print(type(d))

4
<class 'int'>
7
<class 'int'>


#### A few common ways to use this functionality:

1. Swap variable values without using a temporary variable

2. Iterate over sequences of tuples or lists

3. Returning multiple values from a function (will be shown later)


Swapping:

In [192]:
a, b = 1, 2
print(a)
print(b)
b, a = a, b
print(a)
print(b)

1
2
2
1


Iterating:

In [193]:
mySeq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# If we don't use it:
for item in mySeq:
    print(item)
    print(type(item))
    
# If we do use it:
for a, b, c in mySeq:
    print(f'var a={a}, var b={b}, var c={c}')
    print(f'type a={type(a)}, type b={type(b)}, type c={type(c)}')

(1, 2, 3)
<class 'tuple'>
(4, 5, 6)
<class 'tuple'>
(7, 8, 9)
<class 'tuple'>
var a=1, var b=2, var c=3
type a=<class 'int'>, type b=<class 'int'>, type c=<class 'int'>
var a=4, var b=5, var c=6
type a=<class 'int'>, type b=<class 'int'>, type c=<class 'int'>
var a=7, var b=8, var c=9
type a=<class 'int'>, type b=<class 'int'>, type c=<class 'int'>


Note that the same element-agreement rules apply. If you have `for a, b, c in mySeq`, then every element of mySeq must have 3 things:
<br/>
<br/>
<br/>
  
  
```python
mySeq2 = [(1, 2, 3), (4, 5), (7, 8, 9)]
for a, b, c in mySeq2:
    print(f'var a={a}, var b={b}, var c={c}')
    print(f'type a={type(a)}, type b={type(b)}, type c={type(c)}')

var a=1, var b=2, var c=3
type a=<class 'int'>, type b=<class 'int'>, type c=<class 'int'>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[18], line 2
      1 mySeq2 = [(1, 2, 3), (4, 5), (7, 8, 9)]
----> 2 for a, b, c in mySeq2:
      3     print(f'var a={a}, var b={b}, var c={c}')
      4     print(f'type a={type(a)}, type b={type(b)}, type c={type(c)}')

ValueError: not enough values to unpack (expected 3, got 2)
```
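
The third use listed earlier (returning multiple values from a function) can be sketched like this. `min_max` is a hypothetical helper written just for this example; a function that "returns several values" is really returning one tuple, which the caller then unpacks:

```python
def min_max(values):
    """Return the smallest and largest value as a 2-tuple."""
    return min(values), max(values)

# Unpack the returned tuple straight into two variables
lo, hi = min_max([7, 2, 5, 1, 3])
print(lo, hi)
```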
<br/>
<br/>
<br/>

### Rest

Unpacking can be annoying if you only want a few of the elements in a tuple and don't care about the others. You can use `*rest` to capture everything else. The same `*` syntax is also used in function signatures to collect an arbitrarily long list of positional arguments (like R's `...`).  

The name doesn't have to be `rest`, though. A common convention is to use an underscore (`*_`) for values you don't intend to use. The following two forms work the same way:

In [194]:
values1 = (1, 2, 3, 4, 5)
a1, b1, *rest = values1
print(a1)
print(rest)

values2 = (8, 9, 10, 11, 12)
a2, b2, *_ = values2
print(a2)
print(_)

1
[3, 4, 5]
8
[10, 11, 12]
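
A small sketch of the function-argument use of `*`. `describe` is a made-up function for illustration. One difference worth noticing: in a function signature the starred name receives a tuple, while in an unpacking assignment it receives a list:

```python
def describe(first, *others):
    """Hypothetical helper: 'others' collects any extra positional arguments."""
    return first, others

print(describe(1))
print(describe(1, 2, 3))

# In an unpacking assignment, the starred name receives a list
head, *tail = (1, 2, 3)
print(type(tail))
```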


### Count

The final tuple feature we'll review is `count`. This is a tuple method that is loosely similar to R's `table()` in that it returns a number of occurrences, but you have to specify the single value you want to count. The example below gives the number of times 2 appears in the tuple:

In [195]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4
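
As a side note that goes a little beyond these notes: if you want something closer to R's `table()`, which tallies every distinct value at once, one option is `collections.Counter` from the standard library. A minimal sketch (`tally` is just an illustrative name):

```python
from collections import Counter

a = (1, 2, 2, 2, 3, 4, 2)
tally = Counter(a)            # maps each distinct value to its number of occurrences
print(tally[2])               # same answer as a.count(2)
print(tally.most_common(1))   # the most frequent value and its count
```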

<a name="lists"></a>
## Lists

Lists in Python are very similar to lists in R. They're also very similar to tuples in their functionality, except for the fact that they can be modified.   

They're variable in length and can be modified in place.  

As we've already seen in the examples, lists can be created using brackets `[]` or the `list` type function.  

In [196]:
myList = [2, 3, 7, None]
myTup = (1, "foo", 5)
myOtherList = list(myTup)
print(myList)
myList[1] = 9
print(myList)
print(myOtherList)


[2, 3, 7, None]
[2, 9, 7, None]
[1, 'foo', 5]



The `list` type function is commonly used to materialize an iterator or other lazy object (i.e. get the actual values out of something like a `range` instead of just the object itself)

In [197]:
myRange = range(10)
print(myRange)
print(list(myRange))

range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


### Adding and Removing Elements

Add stuff to the end of a list with `append`.  

Add stuff anywhere using `insert` (but careful, this is computationally expensive on big lists because every element after the insertion point has to shift over).  

`pop` essentially "pops out" a list element, so it removes it from the list object and also returns it.  

You can use `remove` to remove **the first occurrence** of a value in a list.  

Finally, simply use `in` and `not in` to check if a value is in a list.

In [198]:
myList = ['foo', 'peekaboo', 'baz']
myList.append('dwarf')
print(f"Add dwarf: {myList}\n")
myList.insert(1, "red")
print(f" Insert red: {myList}\n")
myPop = myList.pop(2)
print(f"Popped value: {myPop}\n")
print(f"List with popped value removed: {myList}\n")
myList.append("foo")
print(f"Add another foo: {myList}\n")
myList.remove("foo")
print(f"Remove first foo: {myList}")

Add dwarf: ['foo', 'peekaboo', 'baz', 'dwarf']

 Insert red: ['foo', 'red', 'peekaboo', 'baz', 'dwarf']

Popped value: peekaboo

List with popped value removed: ['foo', 'red', 'baz', 'dwarf']

Add another foo: ['foo', 'red', 'baz', 'dwarf', 'foo']

Remove first foo: ['red', 'baz', 'dwarf', 'foo']
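
The cell above doesn't exercise `in` and `not in`, so here is a short sketch. `demoList` is a made-up list for illustration. A membership check is also a simple guard before `remove`, which raises `ValueError` when the value is absent:

```python
demoList = ['red', 'baz', 'dwarf', 'foo']

has_dwarf = 'dwarf' in demoList
missing_qux = 'qux' not in demoList
print(has_dwarf)
print(missing_qux)

# Only call remove when the value is actually present
if 'qux' in demoList:
    demoList.remove('qux')
print(demoList)
```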


### Concatenating and Combining

Same as tuples, just use `+` to concatenate multiple lists together. This is comparatively expensive because it has to create a new list and copy the elements over.  

If you're building up an existing list, the `extend` method is usually preferable. It appends each element of an iterable (of any type) to the list in place, so you can use it to tack one list onto another.  

The functionality is a little different though, because `extend` modifies the existing list and returns `None`...you can't assign the result to a new variable in the same call.

In [199]:
myList = [4, None, "foo"]
myList2 = [7, 8, (2, 3)]
out1 = myList + myList2
print(f"Add two lists with + (and also assign to a new variable): {out1}")
print(f"This keeps myList and myList2 unchanged:\n\t{myList}\n\t{myList2}\n")
out2 = myList.extend(myList2) 
print(f"Trying to extend and assign to out2 returns None for out2 and adds it to myList:\n\tout2: {out2}\n\tmyList: {myList}\n")
out3 = myList.extend(["a", "b", 23])
print(f"Trying to extend and assign to out3 returns None for out3 and continues to add to myList:\n\tout3: {out3}\n\tmyList: {myList}\n")

Add two lists with + (and also assign to a new variable): [4, None, 'foo', 7, 8, (2, 3)]
This keeps myList and myList2 unchanged:
	[4, None, 'foo']
	[7, 8, (2, 3)]

Trying to extend and assign to out2 returns None for out2 and adds it to myList:
	out2: None
	myList: [4, None, 'foo', 7, 8, (2, 3)]

Trying to extend and assign to out3 returns None for out3 and continues to add to myList:
	out3: None
	myList: [4, None, 'foo', 7, 8, (2, 3), 'a', 'b', 23]



### Sorting

The `sort` method has a few different uses for lists.

1. You can sort a list in place with it (i.e. you don't have to make a new object).

2. You can also provide a sort key to tell it how to sort. The key is any function that takes one element and returns the value to sort by; the example given sorts strings by length using `len`.

In [200]:
a = [7, 2, 5, 1, 3]
print(f"Original list: {a}\n")
a.sort() # No output here and don't have to do like a = a.sort() or anything
print(f"Standard sort: {a}\n")
a.sort(reverse=True)
print(f"Using reverse flag: {a}\n")

b = ["saw", "small", "He", "foxes", "six"]
print(f"New list of strings: {b}\n")
b.sort()
print(f"Default sort for strings is alphabetical (capital then lowercase): {b}\n")
b.sort(key=len)
print(f"Provide a key to change the sort: {b}\n")

Original list: [7, 2, 5, 1, 3]

Standard sort: [1, 2, 3, 5, 7]

Using reverse flag: [7, 5, 3, 2, 1]

New list of strings: ['saw', 'small', 'He', 'foxes', 'six']

Default sort for strings is alphabetical (capital then lowercase): ['He', 'foxes', 'saw', 'six', 'small']

Provide a key to change the sort: ['He', 'saw', 'six', 'foxes', 'small']
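
A couple of other key options, as a sketch. Any one-argument function works as a key, so the two keys below (`str.lower` and a lambda picking the last character) are just examples I picked, not an exhaustive list:

```python
words = ["saw", "small", "He", "foxes", "six"]

# Case-insensitive alphabetical order: the key is applied to each element before comparing
words.sort(key=str.lower)
by_lower = words.copy()
print(by_lower)

# A lambda works as a key too: here, sort by the last character of each word
words.sort(key=lambda w: w[-1])
print(words)
```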



### Slicing

Slicing is very similar to how you slice in R. Only thing is you can't use `c(a,b)`, you have to just do `a:b`  

Similar to `range()`, the first value is inclusive while the second value is exclusive!  

You can also use slices to modify the list and replace values at certain indices (same as you would do in R)  

Not sure if you can do this in R, but can do `a:` to get from a to the end or `:b` to get from the beginning to b in your slice.  

Finally, you can use negative indices to slice relative to the end.  

Here is a little diagram to help visualize it since it's a bit different than R:

<img src="./myImages/fig3.1_slicing.png" width="500"/>

Can also input a `step` using a second colon, to indicate which elements to take (i.e. step of 2 would take every other element, step of 3 would take every 3rd, etc.)  

A step of -1 will reverse a list.

In [201]:
mySeq = [7, 2, 3, 7, 5, 6, 0, 1]
print(f"Original list: {mySeq}\n")
print(f"mySeq[1:5] gets the index 1 (second element) through index 4 (fifth element): {mySeq[1:5]}\n")
print(f"mySeq[:5] gets beginning until index 4 (5th element): {mySeq[:5]}\n")
print(f"mySeq[3:] gets index 3 (fourth element) to the end: {mySeq[3:]}\n")
print(f"mySeq[-4:] gets four elements from the end until the end: {mySeq[-4:]}\n")
print(f"mySeq[:-3] gets the beginning until the 3rd index from the end (4th element from end): {mySeq[:-3]}\n")
print(f"mySeq[-6:-2] gets 6 elements from the end to three elements from the end: {mySeq[-6:-2]}\n\n\n")
print(f"mySeq[::2] gets every other element: {mySeq[::2]}\n")
print(f"mySeq[::-1] reverses it: {mySeq[::-1]}")

Original list: [7, 2, 3, 7, 5, 6, 0, 1]

mySeq[1:5] gets the index 1 (second element) through index 4 (fifth element): [2, 3, 7, 5]

mySeq[:5] gets beginning until index 4 (5th element): [7, 2, 3, 7, 5]

mySeq[3:] gets index 3 (fourth element) to the end: [7, 5, 6, 0, 1]

mySeq[-4:] gets four elements from the end until the end: [5, 6, 0, 1]

mySeq[:-3] gets the beginning until the 3rd index from the end (4th element from end): [7, 2, 3, 7, 5]

mySeq[-6:-2] gets 6 elements from the end to three elements from the end: [3, 7, 5, 6]



mySeq[::2] gets every other element: [7, 3, 5, 0]

mySeq[::-1] reverses it: [1, 0, 6, 5, 7, 3, 2, 7]
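
The cell above only reads from slices. Here is a small sketch of assigning to a slice, which is the "replace values at certain indices" case. The replacement doesn't have to be the same length as the slice it replaces:

```python
mySeq = [7, 2, 3, 7, 5, 6, 0, 1]

mySeq[3:5] = [6, 3]   # replace the values at indices 3 and 4
print(mySeq)

mySeq[1:3] = [9]      # replace two elements with one, so the list shrinks
print(mySeq)
```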


<a name="dictionary"></a>
## Dictionary

Aka a *hash map* or *associative array*.  

A dictionary is a collection of key-value pairs. Each key is associated with a value so that the value can be retrieved, inserted, modified, or deleted given its key.  

A common method for building a dictionary is to use curly braces `{}`, with keys separated from their values by colons (and then key-val pairs are separated with commas, like you would expect).  

Cool/useful thing about dictionaries is that the values can be any class, even another dictionary!  

Keys have to be hashable (in practice, immutable) objects; more on this later.

Build a dictionary:

In [202]:
myDict = {"a": "some value", "b": [1, 2, 3, 4]}
myDict

{'a': 'some value', 'b': [1, 2, 3, 4]}

Add a key-value pair with brackets:

In [203]:
myDict[7] = "an integer"
print(f"New dictionary: {myDict}\n")

New dictionary: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}



Access using brackets as well. Note that these aren't numeric indices like in a list to get the nth element, they have to be a key:

In [204]:
print(f"Value for key b: {myDict['b']}")
print(f"Value for key 7: {myDict[7]}")

Value for key b: [1, 2, 3, 4]
Value for key 7: an integer


Use `in` and `not in` to check if a dictionary has a certain **key**:

In [205]:
"b" in myDict

True

Can `pop` a value (where it gets removed and returned, like with lists), or use `del` to just delete it:

In [206]:
myDict[5] = "new dict entry with 5 as key"
myDict["dummy"] = "dummy dict value"
print(f"New dictionary: {myDict}\n")
del myDict[5] # notice how this is modified in place
print(f"Dictionary with my key = 5 element removed: {myDict}\n")
dummyVal = myDict.pop('dummy')
print(f"When popping, get:\n\tnew myDict: {myDict}\n\tpopped val: {dummyVal}\n")

New dictionary: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'new dict entry with 5 as key', 'dummy': 'dummy dict value'}

Dictionary with my key = 5 element removed: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 'dummy': 'dummy dict value'}

When popping, get:
	new myDict: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
	popped val: dummy dict value



`keys` and `values` are both dictionary methods. They return *view objects* over the keys and values, respectively. You can iterate over a view directly (e.g. in a for loop), but you have to convert it to a list if you want to index into it or print it as a plain list.

In [207]:
print(f"Without casting: \n\tmyDict.keys(): {myDict.keys()}\n\ttype: {type(myDict.keys())}\n\n")
print(f"With casting: \n\tlist(myDict.keys()): {list(myDict.keys())}\n\ttype: {type(list(myDict.keys()))}\n\n")
print("Iterate over the keys and print their values:\n")
for k in myDict.keys():
    print(f"Value for key {k} is: {myDict[k]}\n")

Without casting: 
	myDict.keys(): dict_keys(['a', 'b', 7])
	type: <class 'dict_keys'>


With casting: 
	list(myDict.keys()): ['a', 'b', 7]
	type: <class 'list'>


Iterate over the keys and print their values:

Value for key a is: some value

Value for key b is: [1, 2, 3, 4]

Value for key 7 is: an integer



The `items` method will iterate over keys and values at the same time and return them as tuples of length 2:

In [208]:
list(myDict.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]

In [209]:
type(list(myDict.items())[0])

tuple
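
Since `items` yields 2-tuples, it combines nicely with the tuple unpacking from earlier. A minimal sketch, using a fresh dictionary (`demoDict` and `pairs` are illustrative names) so it runs on its own:

```python
demoDict = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}

pairs = []
for key, value in demoDict.items():   # each item is a (key, value) tuple, unpacked here
    pairs.append((key, value))
    print(f"{key} -> {value}")
```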

Use the `update` method to modify dictionaries in place. You can use this to add new key/value pairs and also to replace an existing key's value:

In [210]:
myDict.update({"c" : "new value1", "d" : "other new value"})
print(f"Dictionary with two new key/value pairs added:\n\t{myDict}\n")
myDict.update({"b" : "new val for b-key"})
print(f"Dictionary with a key's value replaced:\n\t{myDict}\n")

Dictionary with two new key/value pairs added:
	{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 'c': 'new value1', 'd': 'other new value'}

Dictionary with a key's value replaced:
	{'a': 'some value', 'b': 'new val for b-key', 7: 'an integer', 'c': 'new value1', 'd': 'other new value'}



### Creating Dictionaries from sequences

First, have to take a look at the `zip()` function, which yields tuples of length n (depending on number of args)

```python
Init signature: zip(self, /, *args, **kwargs)
Docstring:     
zip(*iterables, strict=False) --> Yield tuples until an input is exhausted.

   >>> list(zip('abcdefg', range(3), range(4)))
   [('a', 0, 0), ('b', 1, 1), ('c', 2, 2)]

The zip object yields n-length tuples, where n is the number of iterables
passed as positional arguments to zip().  The i-th element in every tuple
comes from the i-th iterable argument to zip().  This continues until the
shortest argument is exhausted.

If strict is true and one of the arguments is exhausted before the others,
raise a ValueError.
Type:           type
Subclasses:    
```

The idea here is that you use zip to make a bunch of 2-tuples and the first element becomes the key while the second element becomes the value.

#### First just use zip on its own

In [211]:
### Make lists and then zip them
key_list = ["a", "b", "c", "d", "e"]
val_list = range(5)
myTups = zip(key_list, val_list)

### Inspect object
print(myTups) ### Notice that zips are *iterators*, have to coerce to list
print(list(myTups))

### Non-pythonic way you might make a dictionary:
badMap = {}
myTups = zip(key_list, val_list) # have to remake the zip because list(myTups) above already consumed the iterator
for key, val in myTups:
    badMap[key] = val
    
### Pythonic way
myTups = zip(key_list, val_list)
goodMap = dict(myTups)

print(badMap)
print(goodMap)

<zip object at 0x10ed7ac80>
[('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)]
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
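
The reason the zip has to be rebuilt each time is that a zip object is an iterator: once something has looped over it, it is exhausted. A tiny sketch with made-up inputs:

```python
pairs = zip(["a", "b"], [0, 1])

first_pass = list(pairs)    # consumes every tuple from the iterator
second_pass = list(pairs)   # nothing left to yield

print(first_pass)
print(second_pass)
```

If you need the pairs more than once, store `list(zip(...))` in a variable and reuse that.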


### Default Values

This is handy whenever a lookup might miss. It's somewhat analogous to an inline `ifelse()` in R, but not quite.  

Essentially, if you have logic that searches a dictionary for a key and returns the value, you will want an else statement to give a value when that key isn't in the dictionary. The `get` method with a default value is a shortcut for this.  

`myDict.get(key)` will return `None` if key is not in myDict and no default is given.

```python
### Standard logic
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

### Use default val in the get method.
### default_value has to be assigned previously
value = some_dict.get(key, default_value)
```
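
A quick runnable sketch of `get`, using a made-up dictionary (`demoDict` is just for illustration). Note that `get` only looks the key up; it never adds it to the dictionary:

```python
demoDict = {"a": 1, "b": 2}

print(demoDict.get("a"))       # key exists, so its value is returned
print(demoDict.get("z"))       # key missing and no default given
print(demoDict.get("z", 0))    # key missing, so the supplied default is returned
print("z" in demoDict)         # get did not insert the key
```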

Below are some examples of using default values.  
The task is to create a dictionary that categorizes a list of words by their first letter:

In [212]:
### Non-pythonic way
words = ["apple", "bat", "bar", "atom", "book", "cat", "carpet", "cash", "chop"]
byLetter_dict = {}

for word in words:                         # iterate over the list
    letter = word[0]                       # slice each word, getting the first character
    if letter not in byLetter_dict:        # check if the first letter of this word is already a key in the dictionary
        byLetter_dict[letter] = [word]     # if it isn't, then add letter as a key and the value is a list with the word as an element
    else:
        byLetter_dict[letter].append(word) # if it is already, then append the word to the list that already exists
        
print(byLetter_dict)

### Quicker way using setdefault
byLetter_dict = {}
for word in words:
    letter = word[0]                                      # same as above
    byLetter_dict.setdefault(letter, []).append(word)     # setdefault's help: 'Return the value of the specified key. If the key does not exist: insert the key, with the specified value'
                                                          # So this checks byLetter_dict if letter is already a key. If it IS, then it appends the word to the value for that key
                                                          # If it ISN'T, then it first creates a list as the default value, then it goes ahead and appends the word to it.

print(byLetter_dict)

### Another way using `collections` module
from collections import defaultdict
byLetter_dict = defaultdict(list)
print(byLetter_dict)
for word in words:
    byLetter_dict[word[0]].append(word)
print(byLetter_dict)

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book'], 'c': ['cat', 'carpet', 'cash', 'chop']}
{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book'], 'c': ['cat', 'carpet', 'cash', 'chop']}
defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book'], 'c': ['cat', 'carpet', 'cash', 'chop']})


### Valid dictionary key types

Keys have to be **hashable** (read: immutable)

They can be scalars (int, float, string) or tuples that are made up of only immutable objects.

The `hash()` function returns an object's hash value and raises a `TypeError` if the object isn't hashable, so it's a quick way to check whether something can be a dictionary key.

If you want to use a list as a key, have to convert it to a tuple first!
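
A sketch of checking hashability with `hash()` and of the list-to-tuple conversion. The variable names (`hash_error`, `d`) are made up for this example:

```python
print(hash("string") == hash("string"))   # hashable: equal values give equal hashes

try:
    hash((1, 2, [2, 3]))                   # a tuple containing a list is not hashable
except TypeError as err:
    hash_error = str(err)
    print(f"TypeError: {hash_error}")

d = {}
d[tuple([1, 2, 3])] = 5                    # convert the list to a tuple to use it as a key
print(d)
```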

<a name="sets"></a>
## Sets

A set is an unordered collection of unique elements.

Can use the `set()` function or the "set literal" which is curly braces. 

How does a set's `{}` differ from a dictionary's? The rules are:
1. Empty braces (`{}`) create an empty dictionary (use `set()` for an empty set)
2. Braces with colons (`{"key1": "val1", "key2": "val2"}`) create a populated dictionary
3. Braces with comma-separated values only (`{"val1", "val2", "val3", "val4"}`) create a set

How does a set differ from a list?
1. Sets can only contain **unique elements**. If you initialize a set with duplicates, they will be removed.
2. Sets are **unordered**. 
3. They're searched using hashes 
    - this means the elements also have to be hashable (i.e. immutable).
    - have to do the list -> tuple conversion if you want a list-like object in a set
    - Using the `in` operator is much more efficient than in a list
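
A short sketch of the points above: what the different brace forms create, and the list-to-tuple conversion needed for set elements. The names `bad` and `ok` are illustrative only:

```python
print(type({}))          # empty braces give a dict
print(type(set()))       # use set() for an empty set
print(type({"val1"}))    # braces with values and no colons give a set

try:
    bad = {[1, 2], [3, 4]}                # lists are unhashable, so this raises TypeError
except TypeError:
    ok = {tuple([1, 2]), tuple([3, 4])}   # converting to tuples works
    print(ok)
```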

#### Set operators

Sets allow use of mathematical set operators. See the table below.  

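Each operation also has an *in-place* version (the `_update` methods and the augmented operators like `|=` and `&=`) that modifies the existing set instead of returning a new one.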

<img src="./myImages/table3.1_setOperations.png" width = 600>

In [213]:
### Unique (the printed order looks sorted here, but sets are unordered)
a = set([5, 3, 1, 2, 3, 2, 2, 4])
b = {8, 8, 7, 4, 3, 5, 6}
print(a)
print(b)

{1, 2, 3, 4, 5}
{3, 4, 5, 6, 7, 8}


In [214]:
### Set operators
print(a.union(b))
print(a | b)
print(a.intersection(b))
print(a & b)

{1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 4, 5, 6, 7, 8}
{3, 4, 5}
{3, 4, 5}


In [215]:
### In-place operations (aka the _update versions)
c = a.copy()
print(c)
c |= b
print(c)

d = a.copy()
print(d)
d.intersection_update(b)
print(d)

{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 4, 5}
{3, 4, 5}


In [216]:
### Check subsets/supersets
a_set = {1, 2, 3, 4, 5}
print({1, 2, 3}.issubset(a_set))
print(a_set.issuperset({1, 2, 3}))

True
True
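
Two more operations that the cells above don't exercise are difference and symmetric difference. I'm assuming these are in the table of set operations, since they are standard set methods. A minimal sketch with the same values as `a` and `b` above (`only_a` and `either_not_both` are made-up names):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_a = a - b            # same as a.difference(b): elements in a but not in b
either_not_both = a ^ b   # same as a.symmetric_difference(b): in exactly one of the sets

print(only_a)
print(either_not_both)
```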


<a name="sequence"></a>
## Built-In Sequence Functions

The built-ins are efficient and good to know!

1. `enumerate` - keep track of the index of the current item while iterating over a sequence  
2. `sorted` - returns a new, sorted list from the elements of any sequence
3. `zip` - pairs up the elements of the provided lists/tuples/sequences, yielding tuples (wrap it in `list()` to get a list of tuples)
4. `reversed` - iterates over elements of a sequence in reverse order


In [217]:
###
### ENUMERATE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
###
myCollection = ["a", "b", "c", "d", "e"]

### DIY
index = 0
for value in myCollection:
    print(f"Value: {value}; Index: {index}")
    index += 1

print("\n")
### Enumerate
for index, value in enumerate(myCollection):
    print(f"Value: {value}; Index: {index}")
    
###
### SORTED ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
###

print("\n")
print(sorted([7, 1, 2, 6, 0, 3, 2]))
print(sorted("horse race"))

###
### ZIP ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
###

print("\n")
### Basic
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
print(list(zipped))

print("\n")
### Adding shorter sequence shortens zip
seq3 = [False, True]
print(list(zip(seq1, seq2, seq3)))

print("\n")
### Iterate over multiple sequences with zip:
for (a, b) in zip(seq1, seq2):
    print(f"{a}, {b}")
    
### The parentheses above are optional; this loop is equivalent
for a, b in zip(seq1, seq2):
    print(f"{a}, {b}")

print("\n")
### With enumerate:
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")
    
###
### REVERSED ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
###

### Just like zip, reversed returns an iterator, so coerce to list to actually see the values.
print("\n")
print(reversed(range(10)))
print(list(reversed(range(10))))

Value: a; Index: 0
Value: b; Index: 1
Value: c; Index: 2
Value: d; Index: 3
Value: e; Index: 4


Value: a; Index: 0
Value: b; Index: 1
Value: c; Index: 2
Value: d; Index: 3
Value: e; Index: 4


[0, 1, 2, 2, 3, 6, 7]
[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']


[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]


[('foo', 'one', False), ('bar', 'two', True)]


foo, one
bar, two
baz, three
foo, one
bar, two
baz, three


0: foo, one
1: bar, two
2: baz, three


<range_iterator object at 0x10599a2e0>
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


<a name="comprehension"></a>
## List, Set, and Dictionary Comprehensions

Python feature that allows you to *concisely* form a new list (or set or dictionary) by filtering the elements of a collection and transforming the ones that pass the filter.

### List

Basic form:

`[expr for value in collection if condition]`

Equivalent for loop:

```python
result = []
for value in collection:
    if condition:
        result.append(expr)
```

Example: Given a list of strings, filter out strings with length 2 or less and convert those remaining to uppercase.

In [218]:
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

### Dictionary

```python
dict_comp = {key-expr: value-expr for value in collection
            if condition}
```

Example: lookup map of strings and their locations

In [219]:
loc_mapping = {value: index for index, value in enumerate(strings)}
print(loc_mapping)

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}


### Set

`set_comp = {expr for value in collection if condition}`

Get unique lengths from list of strings (all and with if statement):

In [220]:
unique_lengths = {len(x) for x in strings}
unique_lengths2 = {len(x) for x in strings if len(x) > 2}
print(unique_lengths)
print(unique_lengths2)

{1, 2, 3, 4, 6}
{3, 4, 6}


### Nested List Comprehensions

You can use the same syntax to go over nested lists. The format is similar: you will have as many in-line `for` clauses as you have levels of nesting, written in the same order as the equivalent nested for loops (outermost first).

Below are two examples that simplify nested lists.

1. Find all names in nested list that have two or more lowercase a's (`count` is case-sensitive).
2. Flatten a list of tuples into list of integers

In [221]:
myNames = [["John", "Emily", "Michael", "Mary", "Steven"],
           ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

### Regular for loop method
foundNames = []
for names in myNames:
    for name in names:
        if name.count("a") >= 2:
            foundNames.append(name)

print(foundNames)

### Using one list comprehension
foundNames = []
for names in myNames:
    hasAs = [name for name in names if name.count("a") >= 2] # Remember the format is [`expr` `for val in collection` `if`]
                                                             # `expr` is name (so just keep the name as-is)
                                                             # `for val` goes over each name in the names list
                                                             # `if` checks if there are 2 or more a's
    foundNames.extend(hasAs)                                 # extend the blank list.
    
print(foundNames)

### Using nested list comprehension
### Essentially the same as above (i.e. [`expr` `for` `if`]), we just have two for loops combined.
foundNames = [name for names in myNames for name in names if name.count("a") >= 2]
print(foundNames)


['Maria', 'Natalia']
['Maria', 'Natalia']
['Maria', 'Natalia']


This one doesn't have an if statement:

In [222]:
myTuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

### Regular
flat = []
for tup in myTuples:
    for t in tup:
        flat.append(t)
print(flat)

### One list comp
flat = []
for tup in myTuples:
    flat.extend([t for t in tup])
print(flat)

### Two list comp
flat = [t for tup in myTuples for t in tup]
print(flat)

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
