# Beginner Python and Math for Data Science
## Lecture 15 (OPTIONAL)
### Zipping and Unzipping

__Purpose:__
The purpose of this lecture is to understand zipping and unzipping. 

__At the end of this lecture you will be able to:__
1. Understand the definition of zipping and unzipping
2. Apply zipping in for loops

## 1.1 Zipping and Unzipping

__Overview:__
- __[Zipping](http://python-reference.readthedocs.io/en/latest/docs/functions/zip.html)__: Zipping is a convenient feature in Python that allows you to combine 2 or more sequences into a single sequence
- The new sequence consists of `n-tuples`, where the i-th tuple contains the i-th element from each of the argument sequences and `n` is the number of sequences passed in (so `n` is the length of each tuple, not the length of the result)
- For example, 2 objects of type `list` can be "zipped" together, and the result (once converted to a list) is a `list` of `tuple` objects looking like this: `[(element 0 of list 1, element 0 of list 2), (element 1 of list 1, element 1 of list 2), ...]`
- __Unzipping:__ Unzipping is the opposite of __Zipping__ and is performed by passing the zipped sequence to `zip()` with the `*` operator

__Helpful Points:__
1. The term "zipping" is most commonly used to __["zip"](https://en.wikipedia.org/wiki/Zip_(file_format))__ files, which means to "compress" a series of files. In Python, `zip()` does not compress anything: it pairs up the elements of several sequences position by position, much like the teeth of a zipper interlock
2. If the sequences that are passed in are not of equal length, the result is truncated to the length of the shortest sequence
3. When using the `zip()` function directly, the result is a `zip` object rather than a `list`; to get a list, pass the result to the `list()` function
4. Zipping is very useful (and common) for iterating over multiple sequences at once (see Part 3 in examples below)
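
Point 3 has a practical consequence: in Python 3 the `zip` object is a lazy iterator, so it can only be consumed once. The following is a minimal sketch of that behaviour; the names `letters`, `numbers`, `pairs`, `first_pass` and `second_pass` are illustrative and not part of the lecture.

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

pairs = zip(letters, numbers)   # a lazy zip object, not a list

first_pass = list(pairs)        # consumes the iterator
second_pass = list(pairs)       # nothing is left to consume

print(first_pass)   # [('a', 1), ('b', 2), ('c', 3)]
print(second_pass)  # []
```

If the zipped result is needed more than once, it is usually simplest to store `list(zip(...))` in a variable and reuse that list.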

__Practice:__ Examples of Zipping and Unzipping in Python 

### Part 1 (Zipping):

### Example 1.1 (Zip With No Arguments):

In [1]:
zip()

<zip at 0x10c717d08>

In [2]:
list(zip())

[]

With no arguments, the `zip()` function returns an empty `zip` object, which becomes an empty list after converting with `list()`

### Example 1.2 (Zip with One Argument):

In [3]:
my_list = [1,2,3]

In [4]:
zip(my_list)

<zip at 0x10c717b48>

In [5]:
list(zip(my_list))

[(1,), (2,), (3,)]

With 1 argument, the `zip()` function returns a list of `1-tuples` (after converting to a list, of course)

### Example 1.3 (Zip with Multiple Arguments of the Same Length):

In [19]:
my_list_1 = [1,2,3]
my_list_2 = [4,5,6]

In [22]:
zip(my_list_1, my_list_2)

<zip at 0x10c755408>

In [24]:
list(zip(my_list_1, my_list_2))

[(1, 4), (2, 5), (3, 6)]

- With 2 arguments, the `zip()` function returns a list of `2-tuples` (after converting to a list, of course)
- Notice the `2-tuple` at position `0` contains the 0th element of `my_list_1` and the 0th element of `my_list_2` 
- Notice the `2-tuple` at position `1` contains the 1st element of `my_list_1` and the 1st element of `my_list_2` 
- Notice the `2-tuple` at position `2` contains the 2nd element of `my_list_1` and the 2nd element of `my_list_2` 

### Example 1.4 (Zip with Multiple Arguments of Different Lengths):

In [25]:
my_list_1 = [1,2,3]
my_list_2 = [4,5,6]
my_list_3 = [7,8,9,10]

In [26]:
zip(my_list_1, my_list_2, my_list_3)

<zip at 0x10c65b848>

In [27]:
list(zip(my_list_1, my_list_2, my_list_3))

[(1, 4, 7), (2, 5, 8), (3, 6, 9)]

In [28]:
len(list(zip(my_list_1, my_list_2, my_list_3)))

3

- With 3 arguments, the `zip()` function returns a list of `3-tuples` (after converting to a list, of course)
- Notice the `3-tuple` at position `0` contains the 0th element of `my_list_1`, 0th element of `my_list_2` and the 0th element of `my_list_3`
- Notice the `3-tuple` at position `1` contains the 1st element of `my_list_1`, 1st element of `my_list_2` and the 1st element of `my_list_3` 
- Notice the `3-tuple` at position `2` contains the 2nd element of `my_list_1`, 2nd element of `my_list_2` and the 2nd element of `my_list_3` 
- Notice the resulting `list` is of length 3 since that is the length of the shortest sequence that was passed in, so the last element of `my_list_3` (the value `10`) is dropped
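
If dropping the extra element is not what you want, the standard library's `itertools.zip_longest` pads the shorter sequences instead of truncating. This function is not covered in the lecture; the sketch below reuses the lists from Example 1.4 and assumes `None` is an acceptable placeholder (passed through the `fillvalue` argument). The names `truncated` and `padded` are illustrative.

```python
from itertools import zip_longest

my_list_1 = [1, 2, 3]
my_list_2 = [4, 5, 6]
my_list_3 = [7, 8, 9, 10]

# zip() stops at the shortest input
truncated = list(zip(my_list_1, my_list_2, my_list_3))

# zip_longest() continues to the longest input and fills the gaps with fillvalue
padded = list(zip_longest(my_list_1, my_list_2, my_list_3, fillvalue=None))

print(len(truncated))  # 3
print(len(padded))     # 4
print(padded[-1])      # (None, None, 10)
```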

### Part 2 (Unzipping)

### Example 2.1 (Unzipping Example 1.2 above):

In [29]:
my_list = [1,2,3]
my_list_zip = list(zip(my_list))
print(my_list_zip)

[(1,), (2,), (3,)]


In [30]:
a = list(zip(*my_list_zip)) # this is actually unpacking at work 
print(a)

[(1, 2, 3)]
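
The `*` in `zip(*my_list_zip)` unpacks the list so that each tuple inside it becomes a separate argument to `zip()`. A minimal sketch of that equivalence, using the values from Example 2.1 (the names `with_star` and `written_out` are illustrative):

```python
my_list_zip = [(1,), (2,), (3,)]

# The * operator passes each element of the list as its own argument
with_star = list(zip(*my_list_zip))

# Writing the three arguments out by hand makes the same call
written_out = list(zip((1,), (2,), (3,)))

print(with_star)                 # [(1, 2, 3)]
print(with_star == written_out)  # True
```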


### Example 2.2 (Unzipping Example 1.3 above):

In [31]:
my_list_1 = [1,2,3]
my_list_2 = [4,5,6]
my_list_zip = list(zip(my_list_1, my_list_2))
print(my_list_zip)

[(1, 4), (2, 5), (3, 6)]


In [32]:
a, b = list(zip(*my_list_zip))
print("The unzipped tuple is {} and the original list was {}, therefore the equality with original list is {}".format(a, my_list_1, my_list_1 == list(a)))
print("The unzipped tuple is {} and the original list was {}, therefore the equality with original list is {}".format(b, my_list_2, my_list_2 == list(b)))

The unzipped tuple is (1, 2, 3) and the original list was [1, 2, 3], therefore the equality with original list is True
The unzipped tuple is (4, 5, 6) and the original list was [4, 5, 6], therefore the equality with original list is True


### Example 2.3 (Unzipping Example 1.4 above):

In [33]:
my_list_1 = [1,2,3]
my_list_2 = [4,5,6]
my_list_3 = [7,8,9,10]
my_list_zip = list(zip(my_list_1, my_list_2, my_list_3))
print(my_list_zip)

[(1, 4, 7), (2, 5, 8), (3, 6, 9)]


In [34]:
a, b, c = list(zip(*my_list_zip))
print("The unzipped tuple is {} and the original list was {}, therefore the equality with original list is {}".format(a, my_list_1, my_list_1 == list(a)))
print("The unzipped tuple is {} and the original list was {}, therefore the equality with original list is {}".format(b, my_list_2, my_list_2 == list(b)))
print("The unzipped tuple is {} and the original list was {}, therefore the equality with original list is {}".format(c, my_list_3, my_list_3 == list(c)))

The unzipped tuple is (1, 2, 3) and the original list was [1, 2, 3], therefore the equality with original list is True
The unzipped tuple is (4, 5, 6) and the original list was [4, 5, 6], therefore the equality with original list is True
The unzipped tuple is (7, 8, 9) and the original list was [7, 8, 9, 10], therefore the equality with original list is False


The last check for the third unzipped tuple is `False` since the list was truncated when zipping due to the inequality of lengths 
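
Unzipping returns tuples, which is why the comparisons above wrap each result in `list()`. If lists are needed throughout, one option is to convert every unzipped tuple while unpacking. A small sketch, assuming the zipped list from Example 2.3 (the loop name `column` is illustrative):

```python
my_list_zip = [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Convert each unzipped tuple to a list while unpacking
a, b, c = [list(column) for column in zip(*my_list_zip)]

print(a)  # [1, 2, 3]
print(b)  # [4, 5, 6]
print(c)  # [7, 8, 9]
```

Note that `c` still has only three elements: converting to a list does not bring back the value that was dropped during zipping.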

### Part 3 (Zipping in `for` loops):

- If we need to iterate over multiple sequences, there is a way (albeit a messy one) to do it using the skills learned up to this point
- However, zipping makes this much cleaner and more concise

### Example 3.1 (Iterating over Multiple Sequences - Method 1):

In [37]:
city = ["Chicago", "Seattle", "New York City"]
state = ["Illinois", "Washington", "New York"]
employee = ["Clark", "Bruce", "Paul"]

In [38]:
# loop through multiple sequences without zipping
for i in range(len(city)):
    print("{} works in {}, {}".format(employee[i], city[i], state[i]))

Clark works in Chicago, Illinois
Bruce works in Seattle, Washington
Paul works in New York City, New York


### Example 3.2 (Iterating over Multiple Sequences - Method 2):

In [39]:
employee = ["Clark", "Bruce", "Paul"]
city = ["Chicago", "Seattle", "New York City"]
state = ["Illinois", "Washington", "New York"]

In [40]:
# loop through multiple sequences with zipping
# (the loop variables have different names from the lists, so the lists are not overwritten)
for name, town, region in zip(employee, city, state):
    print("{} works in {}, {}".format(name, town, region))

Clark works in Chicago, Illinois
Bruce works in Seattle, Washington
Paul works in New York City, New York


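Why does this work in the way we want? Each item produced by `zip()` is a tuple holding one element from each sequence, and the `for` loop unpacks that tuple into the loop variables. The cells below show these tuples explicitly.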

In [41]:
city = ["Chicago", "Seattle", "New York City"]
state = ["Illinois", "Washington", "New York"]
employee = ["Clark", "Bruce", "Paul"]

In [46]:
employee, city, state = zip(employee, city, state)
print(employee)
print(city)
print(state)

('Clark', 'Chicago', 'Illinois')
('Bruce', 'Seattle', 'Washington')
('Paul', 'New York City', 'New York')


In [48]:
# NOTE: run the above lines again, without re-initializing the variables city, state and employee
employee, city, state = zip(employee, city, state)
print(employee)
print(city)
print(state)

('Clark', 'Bruce', 'Paul')
('Chicago', 'Seattle', 'New York City')
('Illinois', 'Washington', 'New York')


We see that zipping twice is basically like unzipping: we get back the original sequences, albeit as tuples rather than in `list` form.
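
The same round trip can be written without overwriting the original variables, which makes the result easier to check. A minimal sketch, reusing the lists above (the names `rows` and `columns` are illustrative):

```python
employee = ["Clark", "Bruce", "Paul"]
city = ["Chicago", "Seattle", "New York City"]
state = ["Illinois", "Washington", "New York"]

# First zip: one tuple per person
rows = list(zip(employee, city, state))

# Second zip (with *): back to one tuple per original sequence
columns = list(zip(*rows))

print(rows[0])     # ('Clark', 'Chicago', 'Illinois')
print(columns[0])  # ('Clark', 'Bruce', 'Paul')
print(columns == [tuple(employee), tuple(city), tuple(state)])  # True
```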