# Introducing: Collections Data Types
Data types covered from Python's collections module:
- ChainMap
- Counter
- deque (double-ended queue)
- defaultdict
- OrderedDict
- namedtuple

In [1]:
import collections
from timeit import timeit

## ChainMap
When your approach to solving a problem involves multiple layers of dictionary mappings, a ChainMap groups those dictionaries into a single view that behaves like a prioritized list: a lookup searches each mapping in order and returns the value from the first mapping that contains the key.

In [38]:
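# Two dictionaries that share the key 'c'; the first one listed has priority
map_chain = collections.ChainMap({'a': 1, 'b': 2, 'c': 3}, {'c': 5, 'd': 6, 'e': 7})
print(map_chain)
print(map_chain.maps[0])
print("C is: ",map_chain['c']) # There are two 'c' keys in the chainmap; notice which is given priority
# The stored mappings remain mutable, and the chain reflects changes immediately
map_chain.maps[0]['c']= 7
print("C is now: ",map_chain['c'])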

ChainMap({'a': 1, 'b': 2, 'c': 3}, {'c': 5, 'd': 6, 'e': 7})
{'a': 1, 'b': 2, 'c': 3}
C is:  3
C is now:  7
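Lookups are only half of the story. The sketch below (using hypothetical dictionaries named first and second) shows how writes behave: a ChainMap sends every assignment and deletion to its first mapping only, and it holds references to the original dicts rather than copies.

```python
import collections

# Two hypothetical layers: the first dict has priority on lookups
first = {'a': 1, 'b': 2, 'c': 3}
second = {'c': 5, 'd': 6, 'e': 7}
chain = collections.ChainMap(first, second)

# Lookups search the maps left to right; 'd' only exists in the second map
print(chain['d'])   # 6

# Writes only ever touch the first mapping (chain.maps[0]), i.e. the dict `first`
chain['d'] = 60
print(first)        # {'a': 1, 'b': 2, 'c': 3, 'd': 60}
print(second)       # {'c': 5, 'd': 6, 'e': 7}
print(chain['d'])   # 60, because the first map now has priority for 'd'

# Deletions are also limited to the first mapping
try:
    del chain['e']
except KeyError:
    print("'e' lives in the second map, so del cannot remove it")
```

Because writes never reach the later maps, those maps can safely hold shared defaults.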


In [40]:
# map_chain.new_child(dict) returns a new ChainMap with the given dictionary at the front,
# giving it priority to override the pre-existing maps.
# The original map_chain is left unchanged, so overrides can be layered without modifying it.
extended_chain = map_chain.new_child({'f':8,'g':9,'c':10})
print(extended_chain)
print(extended_chain['c'])

ChainMap({'f': 8, 'g': 9, 'c': 10}, {'a': 1, 'b': 2, 'c': 7}, {'c': 5, 'd': 6, 'e': 7})
10
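A common use of new_child() is layering overrides on top of defaults. This sketch uses hypothetical settings dictionaries (defaults and a child layer) to show that the parent chain is left untouched, and that .parents returns the chain without its first map.

```python
import collections

# Hypothetical settings: the lowest-priority layer
defaults = {'theme': 'light', 'font_size': 12, 'autosave': True}
settings = collections.ChainMap(defaults)

# new_child() returns a NEW ChainMap with the given dict placed in front
user_settings = settings.new_child({'theme': 'dark'})

print(user_settings['theme'])      # dark  (found in the child layer)
print(user_settings['font_size'])  # 12    (falls through to defaults)
print(settings['theme'])           # light (the original chain is unchanged)

# .parents is a ChainMap of every map except the first one
print(user_settings.parents['theme'])  # light
```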


## Counter
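collections.Counter() is a dict subclass that maps each element to the number of times it occurs, which makes it a convenient way to count the items of any iterable (for example, the characters of a string).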

In [3]:
count_obj = collections.Counter("ABCDEFEFEFG H EFEF IJEFEF")
print(count_obj)

Counter({'E': 7, 'F': 7, ' ': 3, 'A': 1, 'B': 1, 'C': 1, 'D': 1, 'G': 1, 'H': 1, 'I': 1, 'J': 1})


In [4]:
count_obj['R'] # Note how a missing element returns a count of 0 instead of raising a KeyError

0
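Reading a missing element returns 0 but does not add it to the Counter, and a zero count is still a stored entry until it is deleted. A small sketch on a throwaway Counter:

```python
import collections

counts = collections.Counter("banana")
print(counts['z'])    # 0, no KeyError
print('z' in counts)  # False: reading a missing key does not insert it

# Setting a count to 0 keeps the entry; del removes it entirely
counts['a'] = 0
print('a' in counts)  # True
del counts['a']
print('a' in counts)  # False
```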

In [13]:
# Use Counter.update() to add the counts derived from another iterable or mapping (counts are added, not replaced)
count_obj2 = collections.Counter({'q':3,'w':7,'e':2,'r':8,'y':12})
print(count_obj2)
count_obj2.update("Jazz")
print(count_obj2)
count_obj2.update("grande cafe mocha")
print(count_obj2)

Counter({'y': 12, 'r': 8, 'w': 7, 'q': 3, 'e': 2})
Counter({'y': 12, 'r': 8, 'w': 7, 'q': 3, 'e': 2, 'z': 2, 'J': 1, 'a': 1})
Counter({'y': 12, 'r': 9, 'w': 7, 'e': 4, 'a': 4, 'q': 3, 'z': 2, ' ': 2, 'c': 2, 'J': 1, 'g': 1, 'n': 1, 'd': 1, 'f': 1, 'm': 1, 'o': 1, 'h': 1})
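Unlike dict.update(), Counter.update() adds to existing counts. Counters can also be reduced with subtract() and combined with + and -. A sketch with a hypothetical stock count:

```python
import collections

stock = collections.Counter({'apples': 5, 'pears': 2})

# update() ADDS counts (a plain dict.update() would overwrite them)
stock.update({'apples': 3})
print(stock['apples'])  # 8

# subtract() removes counts and may leave zero or negative values
stock.subtract({'pears': 3})
print(stock['pears'])   # -1

# Binary + and - combine counters and drop any result that is not positive
restock = collections.Counter({'pears': 4})
print(stock + restock)  # Counter({'apples': 8, 'pears': 3})
print(stock - restock)  # Counter({'apples': 8})
```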


In [18]:
# CounterObject.most_common() returns (element, count) pairs sorted from most to least common
# You can also use CounterObject.keys(), .values(), and .items(), as with a regular dict
print(count_obj.most_common())

[('E', 7), ('F', 7), (' ', 3), ('A', 1), ('B', 1), ('C', 1), ('D', 1), ('G', 1), ('H', 1), ('I', 1), ('J', 1)]
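Passing an argument n to most_common() limits the result to the n most frequent elements. Elements with equal counts keep the order in which they were first encountered, so the last entry of the full list is a least common element:

```python
import collections

count_obj = collections.Counter("ABCDEFEFEFG H EFEF IJEFEF")

# Only the three highest counts, most frequent first
print(count_obj.most_common(3))     # [('E', 7), ('F', 7), (' ', 3)]

# The last entry of the full list has the lowest count
print(count_obj.most_common()[-1])  # ('J', 1)
```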


In [19]:
bool_count = collections.Counter()
survey_data = [True, False, False, False, True, True, False, True, False]
for i in survey_data:
    bool_count[i]+=1
print(bool_count)

Counter({False: 5, True: 4})
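The manual loop above is a good illustration of missing keys defaulting to 0, but Counter can also consume the iterable directly and produce the same result in one call:

```python
import collections

survey_data = [True, False, False, False, True, True, False, True, False]

# Counting the whole list at construction time
bool_count = collections.Counter(survey_data)
print(bool_count)  # Counter({False: 5, True: 4})

# Share of respondents who answered True
print(bool_count[True] / len(survey_data))
```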


## Deque

Deque is short for "Double-Ended Queue." 

If you find yourself using a list where elements are frequently added to or removed from both ends, a deque could be a good candidate, as deques can quickly add and remove elements at both sides.

Deque methods: .pop(), .popleft(), .append(), .appendleft() run in O(1), or constant time.

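The benefit of a deque over a list usually comes down to the left end: deque.appendleft(x) and deque.popleft() run in O(1), whereas list.insert(0, x) and list.pop(0) run in O(n), because every other element in the list must shift by one position.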

Warning: deques support indexing, but not slicing. Indexing is fast at both ends but slows to O(n) toward the middle.
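A short sketch of the warning above, with itertools.islice as one possible workaround when a slice of a deque is needed:

```python
import collections
import itertools

line = collections.deque(["Customer_A", "Customer_B", "Customer_C", "Customer_D"])

# Indexing works, including negative indices
print(line[0], line[-1])  # Customer_A Customer_D

# Slicing is not supported and raises TypeError
try:
    line[1:3]
except TypeError as err:
    print("TypeError:", err)

# Workaround: iterate over the wanted range with itertools.islice
print(list(itertools.islice(line, 1, 3)))  # ['Customer_B', 'Customer_C']
```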

#### Basic Use

In [6]:
CustomerDeque = collections.deque(["Customer_A","Customer_B","Customer_C","Customer_D","Customer_E"])
print(CustomerDeque)

deque(['Customer_A', 'Customer_B', 'Customer_C', 'Customer_D', 'Customer_E'])


In [7]:
CustomerDeque.append('Customer_F')
CustomerDeque.appendleft("VIP_Customer")
print(CustomerDeque)

deque(['VIP_Customer', 'Customer_A', 'Customer_B', 'Customer_C', 'Customer_D', 'Customer_E', 'Customer_F'])


In [76]:
CustomerDeque.pop()
CustomerDeque.popleft()
print(CustomerDeque)

deque(['Customer_A', 'Customer_B', 'Customer_C', 'Customer_D', 'Customer_E'])


In [81]:
# Use Deque.rotate(n) to shift elements
#     if(n positive)   => rotate to right
#     elif(n negative) => rotate to left
CustomerDeque = collections.deque(["Customer_A","Customer_B","Customer_C","Customer_D","Customer_E"])
print(CustomerDeque)
CustomerDeque.rotate(2)
print(CustomerDeque)
CustomerDeque.rotate(-1)
print(CustomerDeque)

deque(['Customer_A', 'Customer_B', 'Customer_C', 'Customer_D', 'Customer_E'])
deque(['Customer_D', 'Customer_E', 'Customer_A', 'Customer_B', 'Customer_C'])
deque(['Customer_E', 'Customer_A', 'Customer_B', 'Customer_C', 'Customer_D'])
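One way to picture rotate(): rotating one step to the right is equivalent to moving the last element to the front, as this sketch checks:

```python
import collections

a = collections.deque(["Customer_A", "Customer_B", "Customer_C"])
b = collections.deque(a)  # independent copy

# One step right == pop from the right end, append on the left end
a.rotate(1)
b.appendleft(b.pop())
print(a == b)  # True
print(a)       # deque(['Customer_C', 'Customer_A', 'Customer_B'])

# One step left moves the first element to the back
a.rotate(-1)
print(a)       # deque(['Customer_A', 'Customer_B', 'Customer_C'])
```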


#### Benchmarking

In [97]:
# N = 5000 for both structures
CustomerList = ["Customer-#"+str(x+1) for x in range(5000)]
CustomerDeque = collections.deque(CustomerList)
print(len(CustomerList),len(CustomerDeque))

5000 5000


In [98]:
%%time
# note -- %%time reports the execution time of a single run of this cell (%%timeit would repeat it many times)
CustomerList.insert(0,"VIP_Customer")

CPU times: user 10 µs, sys: 0 ns, total: 10 µs
Wall time: 14.1 µs


In [99]:
%%time
# note -- %%time reports the execution time of a single run of this cell (%%timeit would repeat it many times)
CustomerDeque.appendleft("VIP_Customer")

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 7.87 µs
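A single %%time run at the microsecond scale is noisy. A somewhat sturdier comparison, sketched below with the timeit function imported in the first cell, repeats each operation many times; pairing each insert at the front with a pop from the front keeps both containers at N elements. Absolute timings depend on the machine, so no expected numbers are shown.

```python
import collections
from timeit import timeit

N = 5000
customer_list = ["Customer-#" + str(x + 1) for x in range(N)]
customer_deque = collections.deque(customer_list)

# Each callable runs `number` times; insert + pop keeps the size constant
list_time = timeit(lambda: (customer_list.insert(0, "VIP"), customer_list.pop(0)), number=10_000)
deque_time = timeit(lambda: (customer_deque.appendleft("VIP"), customer_deque.popleft()), number=10_000)

print(f"list  insert(0)/pop(0):       {list_time:.4f} s")
print(f"deque appendleft()/popleft(): {deque_time:.4f} s")
```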


## "Default" Dictionaries
collections.defaultdict is, ironically, not Python's default dictionary implementation. It is a dict subclass that handles what would otherwise raise a KeyError by calling a default factory (such as int or list) to create a value for the missing key.

Perhaps the easiest way to demonstrate this is to consider the first iteration through the for loop below, focusing especially on the line:
> medals_won\[medal\]+=1 

- During the first iteration, medal=='gold' and the instruction is to increment the value of medals_won\['gold'\].
- Problem: medals_won\['gold'\] does not exist yet.
- The defaultdict calls its default factory, int(), which returns 0, and inserts the key-value pair 'gold': 0.
- The value is then incremented to 1, and the loop continues similarly.
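For comparison, a plain dict needs the missing-key case handled by hand, for example with dict.get(). This sketch counts the same medals both ways:

```python
import collections

medals = ['gold', 'bronze', 'silver', 'none', 'bronze', 'gold', 'bronze', 'none']

# Plain dict: supply the 0 explicitly for keys not seen yet
plain = {}
for medal in medals:
    plain[medal] = plain.get(medal, 0) + 1

# defaultdict(int): int() supplies the 0 automatically
auto = collections.defaultdict(int)
for medal in medals:
    auto[medal] += 1

print(plain == auto)  # True
print(int())          # 0
```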

In [80]:
# format: var_identifier = collections.defaultdict(optional_default_factory), where the factory is a callable such as int, list, or a lambda
medals_won = collections.defaultdict(int)
for medal in ['gold','bronze','silver','none','bronze','gold','bronze', 'none']:
    medals_won[medal]+=1

print(medals_won)

defaultdict(<class 'int'>, {'gold': 2, 'bronze': 3, 'silver': 1, 'none': 2})
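Another common pattern is defaultdict(list) for grouping. The (athlete, medal) pairs below are hypothetical; the first time a medal appears, list() supplies an empty list to append to.

```python
import collections

# Hypothetical results to group by medal
results = [("Ana", "gold"), ("Ben", "bronze"), ("Cy", "gold"), ("Di", "silver")]

athletes_by_medal = collections.defaultdict(list)
for athlete, medal in results:
    athletes_by_medal[medal].append(athlete)

print(dict(athletes_by_medal))
# {'gold': ['Ana', 'Cy'], 'bronze': ['Ben'], 'silver': ['Di']}
```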


In [87]:
# Example with defaultdict(tuple): a missing key starts out as the empty tuple ()
schools = ["University of Michigan","Michigan State University", "New York University",
           "Ohio State University", "University of California - San Diego"]
colors = [("Maize","Blue"),("Green","White"),("Purple","White"),("Scarlet","Gray"),("Navy Blue","Gold")]

collegiate_colors = collections.defaultdict(tuple)
for a,b in zip(schools, colors):
    print(a,b)
    # += first reads the missing key (receiving the default ()), then concatenates the color tuple onto it
    collegiate_colors[a] += b

print("\n"+"="*80)
for college in collegiate_colors:
    print(str(college)+'\t'+str(collegiate_colors[college]))

University of Michigan ('Maize', 'Blue')
Michigan State University ('Green', 'White')
New York University ('Purple', 'White')
Ohio State University ('Scarlet', 'Gray')
University of California - San Diego ('Navy Blue', 'Gold')

================================================================================
University of Michigan	('Maize', 'Blue')
Michigan State University	('Green', 'White')
New York University	('Purple', 'White')
Ohio State University	('Scarlet', 'Gray')
University of California - San Diego	('Navy Blue', 'Gold')


#### Also Worth Noting
- defaultdict(lambda: value) can be used to supply any default value, not just the empty or zero value of a type.
- In the for-loop through the players below, the loop body merely reads Player_HealthPoints\[player\] without any assignment; in a defaultdict, that read alone is enough to establish the key-value pair.

In [89]:
# Example with defaultdict(lambda) 
Player_HealthPoints = collections.defaultdict(lambda:500)
players = ["Jessie","Jaehee","Jabari","John"]

# Simply reading a missing key inserts it with the default value
for player in players:
    Player_HealthPoints[player]
print(Player_HealthPoints)

defaultdict(<function <lambda> at 0x10ed05840>, {'Jessie': 500, 'Jaehee': 500, 'Jabari': 500, 'John': 500})
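Only square-bracket access triggers the default factory; .get() and the in operator leave the dictionary unchanged. The factory is also an ordinary attribute, default_factory, that can be reassigned. A sketch:

```python
import collections

player_hp = collections.defaultdict(lambda: 500)

# Neither .get() nor `in` calls the default factory
print(player_hp.get("Jessie"))  # None
print("Jessie" in player_hp)    # False

# Square-bracket access does: the key is inserted with the default value
print(player_hp["Jessie"])      # 500
print("Jessie" in player_hp)    # True

# Removing the factory restores ordinary dict behavior for missing keys
player_hp.default_factory = None
try:
    player_hp["John"]
except KeyError:
    print("KeyError: no default_factory set")
```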


## Ordered Dictionaries
Since Python 3.7, built-in dictionaries are guaranteed to remember the order of element insertion (CPython 3.6 already did so as an implementation detail). In earlier versions, the standard built-in dictionary did not retain insertion order, so printing or iterating over it produced the elements in an arbitrary order.

In case you find yourself working in an earlier version of Python, collections.OrderedDict provides insertion-ordered dictionaries. It remains useful in modern Python too, for order-aware extras such as move_to_end() and order-sensitive equality.

In [125]:
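# Build from (key, value) pairs: on older Pythons, a dict literal may have lost its order before OrderedDict sees it
ordered = collections.OrderedDict([('a',1),('b',3),('c',5),('d',7),('e',9)])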

In [131]:
for key, value in ordered.items():
    print(key, value)

a 1
b 3
c 5
d 7
e 9
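Beyond preserving order, OrderedDict offers a few order-aware operations that a plain dict lacks. A short sketch:

```python
import collections

ordered = collections.OrderedDict([('a', 1), ('b', 3), ('c', 5)])

# move_to_end() reorders an existing key; last=False moves it to the front
ordered.move_to_end('a')
print(list(ordered))                 # ['b', 'c', 'a']
ordered.move_to_end('a', last=False)
print(list(ordered))                 # ['a', 'b', 'c']

# popitem(last=False) removes from the front (FIFO order)
print(ordered.popitem(last=False))   # ('a', 1)

# Equality between two OrderedDicts is order-sensitive; between plain dicts it is not
x = collections.OrderedDict([('p', 1), ('q', 2)])
y = collections.OrderedDict([('q', 2), ('p', 1)])
print(x == y)                        # False
print(dict(x) == dict(y))            # True
```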


## Named Tuples
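Our final collections data type to explore, the named tuple, is effectively a more self-documenting form of tuple: collections.namedtuple() creates a tuple subclass whose fields can be accessed by name as well as by index.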

In [144]:
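# First, we pass the type name, in this case 'Employee_Data', which is the name shown when instances are printed
# (the variable Employee is what we call to create instances; conventionally the two names match)
# Second, we pass in a list of fields that each instance of the named tuple should have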
Employee = collections.namedtuple('Employee_Data',['name','department','role','supervisor'])

In [147]:
# Once established, we can create new instances of our named-tuple type with this syntax
# Using keyword arguments (name=, department=, etc.) is not required, but is good practice
josh = Employee(name="Joshua", department="Engineering", role="Naval Architect", supervisor="Karen")

In [148]:
print(josh)

Employee_Data(name='Joshua', department='Engineering', role='Naval Architect', supervisor='Karen')
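The printed name comes from the first argument to namedtuple(), not from the variable the class is bound to. This sketch inspects the generated class and confirms that instances are real tuples:

```python
import collections

Employee = collections.namedtuple('Employee_Data', ['name', 'department', 'role', 'supervisor'])

# _fields lists the field names in order
print(Employee._fields)         # ('name', 'department', 'role', 'supervisor')

# The class name used in repr() is the type name passed in
print(Employee.__name__)        # Employee_Data

# Positional construction also works, and instances are tuples
josh = Employee("Joshua", "Engineering", "Naval Architect", "Karen")
print(isinstance(josh, tuple))  # True
```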


In [161]:
# Accessing by index, as one would with a traditional tuple
print(josh[0],"|",josh[1],"|",josh[2],"|", josh[3])
# Accessing by field-name, where the namedtuple acts more like a class with attributes than a tuple 
print(josh.name,"|", josh.department,"|", josh.role,"|", josh.supervisor)

Joshua | Engineering | Naval Architect | Karen
Joshua | Engineering | Naval Architect | Karen
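Like any tuple, a named tuple is immutable. To "change" a field, _replace() builds a new instance, and _asdict() converts an instance to a dictionary (a plain dict since Python 3.8). A sketch reusing the Employee example, with a hypothetical new role:

```python
import collections

Employee = collections.namedtuple('Employee_Data', ['name', 'department', 'role', 'supervisor'])
josh = Employee(name="Joshua", department="Engineering", role="Naval Architect", supervisor="Karen")

# Fields cannot be reassigned in place
try:
    josh.role = "Manager"
except AttributeError:
    print("namedtuple fields cannot be reassigned")

# _replace() returns a NEW instance with the selected fields changed
promoted = josh._replace(role="Engineering Manager")
print(promoted.role, "|", josh.role)     # Engineering Manager | Naval Architect

# _asdict() maps field names to values
print(promoted._asdict()['supervisor'])  # Karen
```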
