# Lists

• Mutable, ordered sequence for storing elements of the same or different types. Many operations are available on lists, especially when all elements share one type.    
• Type a list instance followed by a dot and press Tab to see the available methods. Lists, tuples, and other data structures also work with built-in functions such as max(), min(), and len().

In [1]:
example = ['abc', 1472, ['another', 'list'], 23, True]
example

['abc', 1472, ['another', 'list'], 23, True]

In [2]:
for item in example: # item is an arbitrary name, use whatever makes the most sense in the context of the loop 
    print(item)

abc
1472
['another', 'list']
23
True


In [3]:
example.append('add this to the end')
print(example)
print(id(example))

['abc', 1472, ['another', 'list'], 23, True, 'add this to the end']
4467603968


• When adding one large list to another, use extend, which appends the second list's items to the existing list in place (the id stays the same).    
• This is better than list1 + list2, which creates a new list and copies the items of both lists into it.

In [4]:
example.extend([0, 1, 2, 3, 4, 5, 6, 7, 8, 1472, 0]) 
print(example)
print(id(example))

['abc', 1472, ['another', 'list'], 23, True, 'add this to the end', 0, 1, 2, 3, 4, 5, 6, 7, 8, 1472, 0]
4467603968


In [5]:
extra = [1, 2]
example = example + extra
print(id(example))

4466659728


• Example of built in list capabilities

In [6]:
print(f"The number 1472 appears {example.count(1472)} times in the list")

The number 1472 appears 2 times in the list


In [7]:
example.pop(0) # Removes and returns the item at index 0; with no index, pop() removes the last item
print(example)

[1472, ['another', 'list'], 23, True, 'add this to the end', 0, 1, 2, 3, 4, 5, 6, 7, 8, 1472, 0, 1, 2]


In [8]:
if 0 in example:
    example.remove(0) # only removes a single instance... use while to remove all
print(example)

[1472, ['another', 'list'], 23, True, 'add this to the end', 1, 2, 3, 4, 5, 6, 7, 8, 1472, 0, 1, 2]
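
• A minimal sketch of the "use while to remove all" comment above. The `values` list is hypothetical sample data, not the `example` list from the cells; the list comprehension is shown as an alternative that builds a new list in one pass.

```python
# Hypothetical sample data, not taken from the cells above
values = [0, 1, 0, 2, 0, 3]

# remove() deletes only the first match, so loop until none are left
while 0 in values:
    values.remove(0)
print(values)

# A list comprehension builds a new list in a single pass instead
filtered = [v for v in [0, 1, 0, 2, 0, 3] if v != 0]
print(filtered)
```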


In [9]:
example.insert(0, 7) # Insert into index 0 the number 7, shifts all items to the right one
print(example)
print("example now has {} items in it".format(len(example)))

[7, 1472, ['another', 'list'], 23, True, 'add this to the end', 1, 2, 3, 4, 5, 6, 7, 8, 1472, 0, 1, 2]
example now has 18 items in it


In [10]:
example.clear() # empties the list in place; example = [] would instead rebind the name to a new list
print(example)

[]


In [11]:
example = [1,5,7,233,23,12,78,45]
print(max(example))

233


# Deques
• A list-like container with fast appends and pops on either end. Note the speed difference versus a list when updating the front: a list has to shift every item when inserting at the front, which is O(N), whereas a deque only adjusts its end pointers, which is O(1). 

• Adding or popping items from either end of a deque has O(1) complexity.    
• Adding or popping items at the front of a list is O(N); appending to the end of a list is amortized O(1).

In [12]:
nums = list(range(20000))

In [13]:
%%timeit
nums.pop(0)
nums.insert(0, 10)

12.4 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [14]:
from collections import deque
nums = deque(range(20000))

In [15]:
%%timeit
nums.popleft()
nums.appendleft(10)

95.7 ns ± 0.768 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
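
• `%%timeit` is an IPython magic, so the cells above only run inside a notebook. The same comparison can be sketched with the standard-library `timeit` module. The `loops` count and the helper names `churn_list` and `churn_deque` are assumptions chosen for illustration, and absolute timings will vary by machine.

```python
from collections import deque
from timeit import timeit

size = 20_000          # same size as the cells above
loops = 1_000          # assumed repeat count, kept small so it runs quickly

as_list = list(range(size))
as_deque = deque(range(size))

def churn_list():
    as_list.pop(0)         # shifts every remaining item left
    as_list.insert(0, 10)  # shifts every item right again

def churn_deque():
    as_deque.popleft()     # no shifting needed
    as_deque.appendleft(10)

list_seconds = timeit(churn_list, number=loops)
deque_seconds = timeit(churn_deque, number=loops)
print(f"list:  {list_seconds / loops * 1e9:.0f} ns per loop")
print(f"deque: {deque_seconds / loops * 1e9:.0f} ns per loop")
```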


# Tuples 

• Immutable structure that can also hold multiple data types. Tuples have a smaller memory footprint than lists.    
• Being immutable, a tuple is allocated exactly enough space for the items it is created with, no more.    
• Lists and other mutable structures such as dictionaries allocate extra space so that more data can be added later.     
• Depending on the number of elements, this can be a fair amount of extra space compared with a tuple.
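
• A rough way to see the footprint difference is `sys.getsizeof`. This is a sketch only: the byte counts are CPython implementation details and differ between versions, and `getsizeof` measures the container itself, not the objects it references.

```python
from sys import getsizeof

items = list(range(10))
as_tuple = tuple(items)

# The tuple holds exactly ten references; the list carries extra bookkeeping
print(getsizeof(as_tuple), getsizeof(items))

# Appending makes the list reserve room for more than one extra item,
# so its reported size grows in steps rather than on every append
grown = []
sizes = []
for n in range(20):
    grown.append(n)
    sizes.append(getsizeof(grown))
print(sorted(set(sizes)))
```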

In [16]:
empty = ()  # Empty tuple
single_element_tuple = ("str",) # Trailing comma needed when only one item is present
tup = ('green', True, 'Howdy!', 11)

In [17]:
color, boolean, greeting, answer = tup # Tuple unpacking, also works with lists
print(greeting)

Howdy!


In [18]:
print(f"Count of 'green' == {tup.count('green')}")

Count of 'green' == 1


In [19]:
print(tup.index(11))

3


# Named tuples

• Tuple with named fields, so code can read values by name instead of by index. Still keeps a low memory profile like a normal tuple.     
• Most of the time you can get away with a dictionary, but nonetheless a useful data structure.

In [20]:
from collections import namedtuple
from sys import getsizeof

In [21]:
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, pay')
rawdata = ["Chris", 49, "Engineer", "Design", "190000"]
example = EmployeeRecord("Chris", 49, "Engineer", "Design", "190000")

print(f"Age : {example.age}") # This versus doing example[1]
print(f"named tuple size: {getsizeof(example)} bytes")
print(f"List size: {getsizeof(rawdata)} bytes")

Age : 49
named tuple size: 96 bytes
List size: 112 bytes


In [22]:
print(example)
print(id(example))

EmployeeRecord(name='Chris', age=49, title='Engineer', department='Design', pay='190000')
4466611264


In [23]:
example = example._replace(age=33)
print(example)
print(id(example)) # creates a new named tuple instance when updating due to immutability 

EmployeeRecord(name='Chris', age=33, title='Engineer', department='Design', pay='190000')
4466612720


In [24]:
# Can also be treated like a regular tuple...
for entry in example:
    print(entry)

Chris
33
Engineer
Design
190000
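
• To compare the memory profile of a named tuple with the dictionary alternative mentioned above, here is a small sketch. `Point` is a hypothetical record type used only for this comparison, and the sizes reported by `getsizeof` are CPython-specific.

```python
from collections import namedtuple
from sys import getsizeof

# Hypothetical record type, used only for this comparison
Point = namedtuple('Point', 'x, y, z')

as_named = Point(1, 2, 3)
as_dict = as_named._asdict()   # same fields as a dictionary

print(as_named.x, as_dict['x'])
print(getsizeof(as_named), getsizeof(as_dict))
```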


# Dictionaries 

• Workhorse of Python. Key-value pairs with average-case O(1) updates and retrievals.     
• Much of Python itself makes heavy use of dictionaries, including classes and their instances. 
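
• As a quick illustration of classes relying on dictionaries: instance attributes of an ordinary class are stored in a plain dict, reachable through `vars()` or `__dict__`. `Employee` is a hypothetical class made up for this sketch.

```python
class Employee:                # hypothetical class for illustration
    def __init__(self, name, age):
        self.name = name
        self.age = age

worker = Employee("Chris", 49)

# Instance attributes live in an ordinary dictionary
print(vars(worker))            # same object as worker.__dict__

# Writing to that dictionary is visible as an attribute
vars(worker)["title"] = "Engineer"
print(worker.title)
```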

In [25]:
dict1 = dict(A=1, Z=-1)
dict2 = {'A': 1, 'Z': -1}
dict3 = dict(zip(['A', 'Z'], [1, -1]))
dict4 = dict([('A', 1), ('Z', -1)])
dict5 = dict({'Z': -1, 'A': 1})

print(dict1 == dict2 == dict3 == dict4 == dict5)

True


In [26]:
citizens = {'Belgium' : 'Isabella', 'British': 'Nathan', 'Swiss' : 'Ranik', 'Nepal' : 'Sarala'}
print(f"Keys: {citizens.keys()}")
print(f"Values: {citizens.values()}")
print(f"Items: {citizens.items()}") # In Python 3 these are views; iteritems() existed only in Python 2

Keys: dict_keys(['Belgium', 'British', 'Swiss', 'Nepal'])
Values: dict_values(['Isabella', 'Nathan', 'Ranik', 'Sarala'])
Items: dict_items([('Belgium', 'Isabella'), ('British', 'Nathan'), ('Swiss', 'Ranik'), ('Nepal', 'Sarala')])
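
• The objects returned by keys(), values(), and items() are views rather than copies, so they reflect later changes to the dictionary. A small sketch, using a shortened copy of the `citizens` data:

```python
citizens = {'Belgium': 'Isabella', 'British': 'Nathan'}

keys = citizens.keys()          # a view, not a copy
citizens['Swiss'] = 'Ranik'     # added after the view was created
print('Swiss' in keys)

# items() yields (key, value) pairs one at a time when looped over
pairs = [f"{country}: {name}" for country, name in citizens.items()]
print(pairs)
```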


• Membership check

In [27]:
if 'Belgium' in citizens: # membership tests check keys, so .keys() is not needed
    print(True)

True


• Operations that set or get a value with a fallback default instead of raising a KeyError. pop removes the key and returns its value, or the default if the key is missing

In [28]:
value = citizens.pop('British', 'return this instead')
print(value)
print(citizens)

Nathan
{'Belgium': 'Isabella', 'Swiss': 'Ranik', 'Nepal': 'Sarala'}


• Get the value if the key exists, without modifying the dictionary 

In [29]:
value = citizens.get('British', 'return this instead')
print(value)
print(citizens)

return this instead
{'Belgium': 'Isabella', 'Swiss': 'Ranik', 'Nepal': 'Sarala'}


• Use setdefault to look up a key and, if it is missing, add it to the dictionary with the given default


In [30]:
value = citizens.setdefault('British', 'but add it as well')
print(value)
print(citizens)

but add it as well
{'Belgium': 'Isabella', 'Swiss': 'Ranik', 'Nepal': 'Sarala', 'British': 'but add it as well'}


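• setdefault can also take the result of a function call as its default.    
• Say, for instance, you wanted to fill a personal cache from a database call. Note that the call runs before setdefault does, so it executes even when the key is already present (as it is here, which is why the stored value is unchanged). 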

In [31]:
def fake_database_call(country):    
    return "Nathan"

citizens.setdefault("British", fake_database_call("British"))  
print(citizens)

{'Belgium': 'Isabella', 'Swiss': 'Ranik', 'Nepal': 'Sarala', 'British': 'but add it as well'}
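
• A sketch showing why the cell above leaves the value unchanged and what it costs: the default passed to setdefault is computed first, so the function still runs when the key already exists. The `calls` list and the `cache` dictionary are assumptions added here to count invocations; an explicit membership check is one way to skip the call.

```python
calls = []                      # records each call so we can count them

def fake_database_call(country):
    calls.append(country)
    return "Nathan"

cache = {"British": "already cached"}

# The argument is evaluated before setdefault runs, so the call happens
# even though the key exists and the result is thrown away
cache.setdefault("British", fake_database_call("British"))

# Checking first skips the call when the key is already present
if "British" not in cache:
    cache["British"] = fake_database_call("British")

print(cache)
print(len(calls))
```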


• Update multiple items at once

In [32]:
citizens.update({
    "British" : "New",
    "Portugal" : "Timon"
})

In [33]:
citizens["British"] = "Chip"
citizens

{'Belgium': 'Isabella',
 'Swiss': 'Ranik',
 'Nepal': 'Sarala',
 'British': 'Chip',
 'Portugal': 'Timon'}

• Set operations (&, -, |) work on keys() and items() views; values() does not support them because values need not be unique or hashable 

In [34]:
original = {'a' : 1, 'b' : 2, 'c' : 3}
new = {'b' : 2, 'c' : 4, 'd' : 6}

print('Common keys:', original.keys() & new.keys())
print('Keys from original not in new:', original.keys() - new.keys())
print('Key, value pairs in common:', original.items() & new.items())

Common keys: {'b', 'c'}
Keys from original not in new: {'a'}
Key, value pairs in common: {('b', 2)}


In [35]:
# Make a new dictionary with certain keys removed
subtraction = {key:original[key] for key in original.keys() - {'c'}}
subtraction

{'b': 2, 'a': 1}

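• Combine two dictionaries into a single view with ChainMap; on duplicate keys the first mapping wins, and the underlying dictionaries are left intact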

In [36]:
original = {'a' : 1, 'b' : 2, 'c' : 3}
new = {'b' : 4, 'c' : 5, 'd' : 6}

from collections import ChainMap
chained = ChainMap(original, new)
print('b', chained['b'])      # Outputs 2, from original
print('c', chained['c'])      # Outputs 3, from original
print('d', chained['d'])      # Outputs 6, from new
print(chained)

b 2
c 3
d 6
ChainMap({'a': 1, 'b': 2, 'c': 3}, {'b': 4, 'c': 5, 'd': 6})
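
• A short sketch of how writes behave on a ChainMap: lookups search every mapping in order, but assignments always go to the first one. The flat merge at the end is an alternative for when a single ordinary dictionary is wanted; it copies the data instead of wrapping it.

```python
from collections import ChainMap

original = {'a': 1, 'b': 2, 'c': 3}
new = {'b': 4, 'c': 5, 'd': 6}
chained = ChainMap(original, new)

chained['d'] = 60        # writes always land in the first mapping
print(original)
print(new)               # untouched

# A flat merge with the same precedence (original wins on duplicate keys)
merged = {**new, **original}
print(merged)
```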


• Look up a value several layers deep, or return 'Not in dictionary'. Handy with Elasticsearch responses for avoiding KeyError exceptions.

In [37]:
dictionary = {'_source' : {'results': 'all the data', 'people': 'All the people'}}
people = dictionary.get('_source', {}).get('people', 'Not in dictionary')
label = dictionary.get('_source', {}).get('label', 'Not in dictionary')
print("People : ", people )
print("Label : ", label)

People :  All the people
Label :  Not in dictionary


• Using '' as the default return value also lets us iterate over the result of a nested dictionary lookup.    
• This does not work if '' is replaced with a non-iterable item such as None.    

In [38]:
people = {'Tomas': [21, 'student'], 'Julio': [30, 'engineer'], 'Mike': [31, 'manager'], 'Mez': [30, 'artist', 'another']}
people

{'Tomas': [21, 'student'],
 'Julio': [30, 'engineer'],
 'Mike': [31, 'manager'],
 'Mez': [30, 'artist', 'another']}

In [39]:
for info in people.get('Mez', ''):
    print(info)

30
artist
another


In [40]:
for entity in people.get('NotPresent', ''): # If '' is changed to None this raises a TypeError
    print(entity, end = ' ')
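
• A sketch of the failure mentioned above and one possible workaround. The `people` dictionary is a shortened copy of the one in the cells; the `or []` fallback is a suggestion added here, and it also replaces any other falsy result (such as an empty string) with an empty list.

```python
people = {'Mez': [30, 'artist', 'another']}

# A None default cannot be looped over
try:
    for entity in people.get('NotPresent', None):
        print(entity)
except TypeError as error:
    message = str(error)
    print(message)

# "or" swaps in an empty list when the lookup returns None (or anything falsy)
seen = []
for entity in people.get('NotPresent') or []:
    seen.append(entity)
print(seen)
```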

# Collections module dictionary tools

• defaultdict: supplies a default value for every new key    

In [41]:
from collections import defaultdict
ddict = defaultdict(int)  # int is the default factory; int() returns 0 for a missing key
ddict['year'] += 1
print(ddict['year'])

ddict['year'] = 1999
ddict['year'] += 1
print(ddict['year'])

1
2000
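
• Another common use of defaultdict is grouping, with list as the default factory. A minimal sketch; the `staff` pairs are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical (department, name) pairs
staff = [('Design', 'Chris'), ('Sales', 'Ada'), ('Design', 'Sally')]

by_department = defaultdict(list)   # missing keys start as an empty list
for department, name in staff:
    by_department[department].append(name)

print(dict(by_department))
```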


• Counter from collections for getting occurrence counts

In [42]:
from collections import Counter 
words = ['hello', 'how', 'are', 'you', 'doing']

counts = Counter(words)
print(counts)
print("Most Common:", counts.most_common(3))

Counter({'hello': 1, 'how': 1, 'are': 1, 'you': 1, 'doing': 1})
Most Common: [('hello', 1), ('how', 1), ('are', 1)]


In [43]:
more = ['add', 'how', 'are', 'words', 'here']

counts.update(more)
print(counts)
print("Counts on 'are':", counts['are'])

Counter({'how': 2, 'are': 2, 'hello': 1, 'you': 1, 'doing': 1, 'add': 1, 'words': 1, 'here': 1})
Counts on 'are': 2


In [44]:
counts.subtract(more)

In [45]:
print(counts)

Counter({'hello': 1, 'how': 1, 'are': 1, 'you': 1, 'doing': 1, 'add': 0, 'words': 0, 'here': 0})
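
• As the output above shows, subtract keeps entries whose count has dropped to zero (counts can also go negative). A sketch of clearing them out with the unary plus operator on a Counter; the word lists here are shortened sample data.

```python
from collections import Counter

counts = Counter(['how', 'are', 'you'])
counts.subtract(['are', 'words'])   # 'are' drops to 0, 'words' to -1
print(counts)

positive_only = +counts             # unary plus keeps only counts above zero
print(positive_only)
```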


In [46]:
from collections import Counter, defaultdict
ddict = defaultdict(Counter)

ddict['nums'].update(['1', '2','3'])
ddict['test'].update(["this", "is", "nice"])
ddict['test'].update(["this", "is", "nice"])
ddict['test'].update(["this", "is", "nice"])
print(ddict)

defaultdict(<class 'collections.Counter'>, {'nums': Counter({'1': 1, '2': 1, '3': 1}), 'test': Counter({'this': 3, 'is': 3, 'nice': 3})})


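### Other collections types to explore... 

• OrderedDict: dict subclass that remembers the order entries were added (plain dicts also keep insertion order since Python 3.7; OrderedDict adds order-aware methods such as move_to_end)    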
• UserDict: A wrapper around dictionary objects for easier dict subclassing    
• UserList: A wrapper around list objects for easier list subclassing    
• UserString: A wrapper around string objects for easier string subclassing   
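
• A brief sketch of two of these types in use. `LowerKeyDict` is a hypothetical subclass invented for illustration; it assumes string keys.

```python
from collections import OrderedDict, UserDict

class LowerKeyDict(UserDict):       # hypothetical subclass for illustration
    """Dictionary that lowercases string keys on the way in."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

codes = LowerKeyDict({'BE': 'Belgium'})   # initial data also passes through __setitem__
codes['CH'] = 'Switzerland'
print(codes)

# OrderedDict adds order-aware methods such as move_to_end
ordered = OrderedDict(a=1, b=2, c=3)
ordered.move_to_end('a')
print(list(ordered))
```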

# Heaps 

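• heapq implements a min-heap on top of a regular list; a max-heap can be emulated by pushing negated values.     
• O(log n) push and pop.     
• O(n log n) to push n items onto the heap one at a time; heapq.heapify builds a heap from an existing list in O(n).     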
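
• A minimal sketch of the basic heap operations behind those complexities. The `ages` list is sample data, and negating values is just one common way to get max-heap behaviour from heapq.

```python
import heapq

ages = [32, 19, 24, 22]

heap = list(ages)
heapq.heapify(heap)              # in place, O(n)
heapq.heappush(heap, 18)         # O(log n)
youngest = heapq.heappop(heap)   # O(log n), always the smallest item
print(youngest)

# Max-heap behaviour by storing negated values
max_heap = [-age for age in ages]
heapq.heapify(max_heap)
oldest = -heapq.heappop(max_heap)
print(oldest)
```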

In [47]:
import heapq

rows = [
    {'name': 'Steve', 'age': 19},
    {'name': 'John', 'age': 24},
    {'name': 'Sally', 'age': 32},
    {'name': 'Ada', 'age': 22}
]

top_three = heapq.nsmallest(3, rows, key=lambda x: x['age'])  # the three youngest, smallest age first

In [48]:
for item in top_three:
    print(item)

{'name': 'Steve', 'age': 19}
{'name': 'Ada', 'age': 22}
{'name': 'John', 'age': 24}


In [49]:
top_three

[{'name': 'Steve', 'age': 19},
 {'name': 'Ada', 'age': 22},
 {'name': 'John', 'age': 24}]

# Set
• The set type is mutable, while frozenset is immutable. Both are unordered collections of unique, hashable objects.    
• Good for deduplicating values when temporarily storing data for processing 
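
• A small sketch of deduplication and of where frozenset is useful. The `raw` list and the `distances` dictionary are hypothetical sample data; `dict.fromkeys` is shown as an alternative for when first-seen order matters, since a set does not guarantee order.

```python
raw = ['b', 'a', 'b', 'c', 'a']

unique = set(raw)                     # duplicates dropped, order not guaranteed
in_order = list(dict.fromkeys(raw))   # duplicates dropped, first-seen order kept
print(unique, in_order)

# frozenset is hashable, so it can be a dictionary key; a plain set cannot
distances = {frozenset({'A', 'B'}): 12}
print(distances[frozenset({'B', 'A'})])
```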

In [50]:
small = {1, 5, 6, 2}
large = set([6, 8, 9, 10, 8, 8])
print(small)
print(large)

{1, 2, 5, 6}
{8, 9, 10, 6}


In [51]:
all_ = small | large # union
intersection = small & large # intersection
difference = small - large # subtract out overlapping
print(all_)
print(intersection)
print(difference)

{1, 2, 5, 6, 8, 9, 10}
{6}
{1, 2, 5}
