# General

- The following Jupyter notebooks are aimed at the CPython implementation
- Python is dynamically typed and passes arguments by object reference (often called "pass by assignment")... you can think of variable names as pointers to objects / data structures in memory (the heap in particular)
- Every object has an identity, type, and value; CPython also tracks a reference count for each object
- In CPython, id(x) is the memory address of the object that the name x refers to

In [None]:
import sys
example_string = "Example string"
print("id:", id(example_string))
print("type:", type(example_string))
print(f"value: '{example_string}'")

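- The reference count looks higher than you might expect: sys.getrefcount() holds a temporary reference to its argument while it runs, and the notebook or interpreter may keep references of its own, so the count shown below is typically 2 or more, not 1...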

In [None]:
sys.getrefcount(example_string) 
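- A minimal sketch of how the count moves as names are added and removed, assuming CPython. The names data and alias are made up for this illustration, and only the differences between counts are meaningful since the absolute numbers vary by environment.

```python
import sys

data = []                          # a fresh list, referenced by the name data
baseline = sys.getrefcount(data)   # includes the temporary reference held by the call

alias = data                       # a second name for the same object
with_alias = sys.getrefcount(data)

del alias                          # dropping the name releases its reference
after_del = sys.getrefcount(data)

print(with_alias - baseline)       # 1
print(after_del - baseline)        # 0
```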

- Garbage collection (gc) in CPython is primarily driven by reference counts
- An object's memory is freed as soon as its reference count hits zero
- Reference cycles (for example a doubly linked list, or any objects that refer to each other) never reach zero on their own, so they are cleaned up by the separate cyclic collector in the gc module
- It is easy to view the variables alive in a notebook session, as seen below...
- See all Jupyter magic commands with %lsmagic

In [None]:
%whos 
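- The reference cycle case from the notes above can be sketched roughly as follows, assuming CPython. Node is a hypothetical helper class, and the weak reference is only there so we can observe whether the object is still alive.

```python
import gc
import weakref

class Node:
    """Hypothetical helper class used only for this illustration."""
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a     # a and b refer to each other: a reference cycle
probe = weakref.ref(a)          # a weak reference does not keep a alive

del a, b                        # names are gone, but the cycle keeps both counts above zero
gc.collect()                    # ask the cyclic collector to run now
print(probe() is None)          # True once the cycle has been collected
```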

- Easy to view locals, globals, and builtins with following commands...

In [None]:
#locals()
#globals()
#vars(__builtins__) # vars is similar to .__dict__ to view object's dictionary 

- Everything in python is an object! What does that mean??
- Python objects themselves follow an inheritance structure
- For example, items that are considered sequences, mappings, or sets all inherit behavior from parent classes that lets the user work with them seamlessly... we'll elaborate on these examples in the following notebooks
- Most objects hold their instance attributes in a dictionary (classes that define \__slots__ are an exception), view it with vars(object)
- I won't go into full details, but read more here... 
- https://docs.python.org/3/reference/datamodel.html
- Inspect an object with ?? in notebooks

In [None]:
str.index??

- View available . operations

In [None]:
dir(str)

- See what operations are available to mappings in addition to basic objects
- Sets explained more below...

In [None]:
from collections import abc
set(dir(abc.Mapping)) - set(dir(object))

- There are immutable and mutable data structures, a high level overview...
- Immutable objects are fixed size and cannot be modified in place
- They are also usually "hashable" (as long as everything they contain is hashable), which is important in several places
- When you "change" an immutable object, Python actually creates a new object and decrements the reference count of the previous one. This includes numbers, strings, and tuples.
- Mutable objects can be updated in place, and some (lists in particular) keep hidden spare capacity so appends are amortized O(1). This includes lists, dictionaries, and deques.
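- A small sketch of the difference described above. The variable names are made up for the example; a second name is kept for each original object so the identity checks are reliable.

```python
point = (1, 2)
old_point = point            # keep a second name for the original tuple
point += (3,)                # builds a new tuple and rebinds point
print(point is old_point)    # False
print(old_point)             # (1, 2)

items = [1, 2]
old_items = items
items += [3]                 # extends the same list in place
print(items is old_items)    # True

try:
    hash((1, [2]))           # a tuple holding a list is immutable but not hashable
    hashable = True
except TypeError as err:
    hashable = False
    print("TypeError:", err)
```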

# Select notes on primitives

### Integer (floor) division in Python always rounds towards minus infinity 
- Can cause some headaches if you're not aware of it...
- If you want truncation towards zero for negative numbers instead, use int(a / b) on the true division result
- "/" between two ints is floor division in Python 2; "//" is floor division in both 2 and 3
- "/" is true division in Python 3

In [None]:
print("True Division 7/4 : {}".format(7/4))
print("True Negative Division -7/4 : {}".format(-7/4))
print("Integer Division 7//4 : {}\n".format(7//4))

print("Unexpected behavior:")
print("Integer Negative Division -7//4 : {}, rounds down negative\n".format(-7//4))

print("Using int to truncate towards zero:")
print("int(1.75) : {}".format(int(1.75)))
print("int(-1.75) : {}".format(int(-1.75)))

In [None]:
%%python2
print 7/4
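- A short Python 3 sketch comparing the rounding behaviors side by side. The variable names are illustrative only.

```python
import math

a, b = -7, 4

floor_result = a // b                  # floor division rounds towards minus infinity
trunc_result = int(a / b)              # int() truncates towards zero
quotient, remainder = divmod(a, b)     # same floor rule: quotient * b + remainder == a

print(floor_result)                    # -2
print(trunc_result)                    # -1
print(quotient, remainder)             # -2 1
print(math.floor(a / b), math.trunc(a / b))   # -2 -1
```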

### Floating point numbers are binary approximations of decimal values

- Floats are stored in binary (IEEE 754 double precision on most systems), and many base 10 fractions such as 0.1 have no exact binary representation
- Use the decimal module when exact decimal precision is necessary, or just use round(num, precision) for other cases

In [None]:
print("3 * .1 - .3 = {}".format(3* .1 - .3)) 
print("Above using round(3* .1 - .3,2) : {}\n".format(round(3* .1 - .3,2)))

from decimal import Decimal as D
# Make sure you're using strings as input to Decimal... otherwise you hit the same issue
print("Using Decimal with D('num') : 3 * .1 - .3 = {}".format(D('3') * D('.1') - D('.3')))
print("Using Decimal with D(num) : 3 * .1 - .3 = {}".format(D(3) * D(.1) - D(.3)))
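- A sketch of a few common ways to cope with the approximation. math.isclose and fractions.Fraction are not used elsewhere in these notes; they are shown here as standard library options rather than as the recommended approach.

```python
import math
from decimal import Decimal
from fractions import Fraction

total = 0.1 + 0.2
print(total == 0.3)                 # False, both sides carry binary rounding error
print(math.isclose(total, 0.3))     # True, compares within a relative tolerance

# Passing a float to Decimal exposes the exact binary value that 0.1 is stored as
print(Decimal(0.1))
print(Decimal(0.1) == Decimal('0.1'))   # False

# Fractions keep exact rational values
exact = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
print(exact)                        # True
```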

### Joining Strings
- 'separator string'.join(iterable of strings)
- Much more efficient than repeated += because strings are immutable, so each += builds a new string

In [None]:
list_ = ['Who', 'are', 'you', 'who', 'are', 'so', 'wise', 'in', 'the', 'ways', 'of', 'science']
joined = '+*> '.join(list_)
print(joined)
    
# Takes any string iterable... using built in reversed function for example 
reverse = ' - '.join(reversed(list_)) 
print(reverse)

- Print a row of items other than strings..
- Print is a function in python 3, so we can use optional sep and end parameters

In [None]:
row = ['test', 50, 22, ['a','b']]
print(*row, sep=', ', end = '!!\n' )    
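- If you need the joined row as a string rather than printed output, one option is to convert each item with str first. A minimal sketch:

```python
row = ['test', 50, 22, ['a', 'b']]

try:
    ', '.join(row)                   # join only accepts strings
    join_failed = False
except TypeError as err:
    join_failed = True
    print("TypeError:", err)

line = ', '.join(map(str, row))      # convert every item to str, then join
print(line)                          # test, 50, 22, ['a', 'b']
```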

### F strings 

- Python 3.6+    
- Pythonic string handling
- allows for variables, functions, and other object operations in {}  
- Significantly more readable vs % and .format options when calling lots of arguments  
- Still can use \__format__ special handling

In [None]:
print(f"Operations are valid in curly brackets 3 + 4 = {3+4}, like this")

In [None]:
def unnecessary_function(input_):
    return input_

question = "Why don't you just make .format() louder?"
print(f"{question} \nfstrings go to {unnecessary_function(11)}...") 

In [None]:
x = 20
print(f"formatting still works... {x*2 :03d}")
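- A sketch of a few format specs and of a custom \__format__ hook. Celsius is a hypothetical class invented for this example, and the default spec it falls back to is an arbitrary choice.

```python
class Celsius:
    """Hypothetical class used only to illustrate __format__."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Reuse float formatting for the number, then append the unit
        return format(self.degrees, spec or ".1f") + " C"

temp = Celsius(21.456)
plain = f"{temp}"              # empty spec, so the class falls back to .1f
precise = f"{temp:.2f}"        # the spec after the colon is passed to __format__
grouped = f"{1234567:,}"       # thousands separator
quoted = f"{'hi'!r}"           # !r applies repr() before formatting

print(plain, precise, grouped, quoted)   # 21.5 C 21.46 C 1,234,567 'hi'
```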

# Data Structures

- Time complexity list here for those interested: https://wiki.python.org/moin/TimeComplexity

### Lists

- Container structure capable of holding nested and mixed data types
- Mutable, so can be updated in place and has overhead free space for amortized adding of elements
- Created with [] notation
- Subclass collections.UserList (from collections import UserList) to make your own list-like object... 

In [None]:
example = ['abc', (1234, 'list'), 23.12, True]
for item in example: # item is an arbitrary name, use whatever name makes the most sense in the context of the loop 
    print(item)

- Available options for lists

In [None]:
dir(example)
#example.sort??

##### Adding to a list is less straightforward than one might think...
- list_object = list_object + list_to_add calls .\__add__, which returns a completely new list
- list_object += list_to_add calls .\__iadd__, which modifies the list in place (.append and .extend also modify in place)
- Overlooking this difference can be very costly depending on list size or implementation 
- View these in http://pythontutor.com/ if any confusion... 

In [None]:
print(id(example))
example = example + ["new list"]
print(example)
print(id(example))

In [None]:
print(id(example))
example += ["in place"]
print(example)
print(id(example))

In [None]:
print(id(example))
example.extend([0, 1, 2, 3]) 
print(example)
print(id(example))
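- Beyond the cost, the difference also matters whenever a second name points at the same list. A minimal sketch with made-up names:

```python
original = [1, 2]
alias = original

original = original + [3]    # __add__: builds a new list, alias keeps the old one
print(alias)                 # [1, 2]
print(original)              # [1, 2, 3]

alias2 = original
original += [4]              # __iadd__: extends in place, so alias2 sees the change
print(alias2)                # [1, 2, 3, 4]
print(alias2 is original)    # True
```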

- Mutable sequences can easily be modified with slicing
- We'll get into slicing later if this is unclear

In [None]:
example[1:8] = [2,4,5]
example

- Don't forget python builtins are available for usage as well... 

In [None]:
max([1,5,7,233,23,12,78,45])

### Deques

- A list-like container with fast appends and pops on either end... Notice the speed difference vs a list when updating the front. This is because a list has to shift every item when inserting at the front, O(N), whereas a deque just updates a pointer, O(1). 
- Popping and adding items from either end of a deque is O(1)
- Other queues https://docs.python.org/3/library/queue.html
- Append and popleft operations are atomic 
- Mutable


In [None]:
nums = [x for x in range(20000)]

In [None]:
%%timeit
nums.pop(0)
nums.insert(0, 10)

In [None]:
from collections import deque
nums = deque(x for x in range(20000))
#dir(deque)

In [None]:
%%timeit
nums.popleft()
nums.appendleft(10)
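- Two other deque features that are often handy, sketched briefly: a bounded deque via maxlen and rotate. The variable names are illustrative.

```python
from collections import deque

recent = deque(maxlen=3)     # bounded: old items fall off the opposite end
for n in range(5):
    recent.append(n)
print(recent)                # deque([2, 3, 4], maxlen=3)

ring = deque([1, 2, 3, 4])
ring.rotate(1)               # shift everything one step to the right
print(ring)                  # deque([4, 1, 2, 3])
```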

### Tuples 

- Immutable structure also able to handle multiple data types
- Tuples have a smaller memory footprint vs lists
- Being immutable, a tuple is allocated exactly enough space for the items it holds at creation, with no spare capacity
- A few items in Python that are tuples... positional arguments collected by \*args, return statements with multiple values, etc
- Created with (), ("single",), or tuple()

- Extras in a list not in a tuple... 


In [None]:
set(dir(list))-set(dir(tuple))

In [None]:
dir(tuple)

In [None]:
example = ('green', True, 'Howdy!', 11)
example

- Tuple unpacking, also works with lists

In [None]:
color, boolean, greeting, answer = example 
print(greeting)
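- Unpacking also supports a starred name that collects whatever is left over, and it is what makes the one-line swap idiom work. A short sketch:

```python
example = ('green', True, 'Howdy!', 11)

first, *middle, last = example   # the starred name always receives a list
print(first, middle, last)       # green [True, 'Howdy!'] 11

a, b = 1, 2
a, b = b, a                      # the right side is packed into a tuple, then unpacked
print(a, b)                      # 2 1
```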

##### Tuples are immutable, but ones containing mutable objects can be mutated

- Lists inside tuples are just pointers, so if the underlying list is modified, the tuple's contents change too
- Augmented assignment on a tuple element is surprisingly not atomic... tup[i] += [...] first extends the list in place, then fails when assigning the result back into the tuple, so you get both the update and an error

In [None]:
list_ = [1,2,3,4]
tup = ('Green', list_, 'Howdy!', 7)
print(id(tup), tup)

list_.append(5) 
print(id(tup), tup) # !!

- Assigning through the tuple index raises a TypeError, but the list inside is still updated

In [None]:
tup[1] += [10]

In [None]:
tup
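- The same behavior, sketched with the error caught so both effects are visible in one run. Calling a method on the inner list avoids the failed assignment entirely.

```python
inner = [1, 2, 3]
tup = ('Green', inner)

try:
    tup[1] += [10]           # the list is extended first, then the assignment fails
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

print(tup)                   # ('Green', [1, 2, 3, 10])

tup[1].extend([11])          # mutating the list directly raises nothing
print(tup)                   # ('Green', [1, 2, 3, 10, 11])
```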

### Named tuples

- Tuple with naming additions to make indexing code more readable. Still keeps a low memory profile like normal tuples.     
- Most of the time you can get away with a dictionary if you don't need immutability or a low footprint, but nonetheless a useful data structure
- There are two flavors of named tuple... collections.namedtuple and the typed typing.NamedTuple
- Handles a lot of built in logic such as value comparison for you
- Acts just like a regular tuple for the most part
- Immutable 

In [None]:
import typing
import collections

In [None]:
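Employee = typing.NamedTuple('Employee', [('name', str), ('age', int), ('title', str), ('pay', int)])  # list-of-pairs form; the keyword form is deprecated in newer Python versions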
Worker = collections.namedtuple('Worker', 'name, age, title, pay')

In [None]:
print(set(dir(Employee)) - set(dir(Worker)))
#dir(Worker)

In [None]:
raw_list = ["Chris", 49, "Engineer", 190000]
from_typed = Employee("Chris", 49, "Engineer", "190000")  # pay is a str here: type hints are not enforced at runtime
from_collections = Worker("Chris", 49, "Engineer", 190000)

In [None]:
from_typed

In [None]:
import sys
print(f"Size of raw_list: {sys.getsizeof(raw_list)}")
print(f"Size of from_typed: {sys.getsizeof(from_typed)}")
print(f"Size of from_collections: {sys.getsizeof(from_collections)}")

- Instead of calling by index we can now use the name specified for the named tuple

In [None]:
print(f"Age : {from_collections.age}") # This versus doing from_collections[1]

- Using _replace to update a field returns a new namedtuple object as expected with immutable types

In [None]:
print(id(from_collections))
from_collections = from_collections._replace(age=33)
print(from_collections)
print(id(from_collections)) 

In [None]:
dir(from_collections)

- All regular tuple functions should still work

In [None]:
for entry in from_collections:
    print(entry)
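- typing.NamedTuple also has a class-based syntax that allows default values. A sketch, where Contractor and its fields are hypothetical and not part of the examples above:

```python
from typing import NamedTuple

class Contractor(NamedTuple):
    """Hypothetical record type used only for illustration."""
    name: str
    age: int
    title: str = "Engineer"      # fields with defaults must come last

person = Contractor("Dana", 41)
print(person)                    # Contractor(name='Dana', age=41, title='Engineer')
print(person._asdict())          # field names mapped to values
print(person == ("Dana", 41, "Engineer"))   # True, still compares like a plain tuple
```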

### Dictionaries 

- The workhorse of python
- Use version 3.6+ to get the best performance for dictionaries (3.3 key sharing, 3.6 compact dict)
- Lots of items in Python make heavy usage of dictionaries, including classes
- The collections module has a UserDict as well for creating your own dictionary implementation 

In [None]:
dict1 = dict(A=1, Z=-1)
dict2 = {'A': 1, 'Z': -1}
dict3 = dict(zip(['A', 'Z'], [1, -1]))
dict4 = dict([('A', 1), ('Z', -1)])
dict5 = dict({'Z': -1, 'A': 1})

print(dict1 == dict2 == dict3 == dict4 == dict5)
#dir(dict)

- .keys(), .values(), and .items() return instances of classes called dict_keys, dict_values, and dict_items
- These are read-only dict views that reflect the dictionary's current contents
- In Python 2 these calls returned copies of the data as lists
- iteritems is also no longer an option in Python 3

In [None]:
citizens = {'Belgium' : 'Isabella', 'British': 'Nathan', 'Swiss' : 'Ranik', 'Nepal' : 'Sarala'}
print(f"Keys: {citizens.keys()}")
print(f"Values: {citizens.values()}")
print(f"Items: {citizens.items()}") 

- Membership check using \__contains__ is O(1) on average for dict keys since a dict is a hash map
- No need for using .keys() unless it's a special case, such as to avoid infinite loops in magic methods

In [None]:
if 'Belgium' in citizens:
    print(True)

- Operations that allow setting or getting default values without receiving key value errors
- \__missing__ is a magic method we'll explore more later for similar cases

In [None]:
value = citizens.pop('British', 'return this instead')
print(value)
print(citizens)

- Get the value if it exists, but don't modify the dictionary 

In [None]:
value = citizens.get('British', 'Not present')
print(value)
print(citizens)

- Use setdefault to return a key's value, adding the key with a default if it is missing


In [None]:
value = citizens.setdefault('British', 'Nathan')
print(value)
print(citizens)

- Can also pass the result of a function call to .setdefault, but note the call is evaluated every time, even when the key already exists    
- Say for instance you wanted to update a personal cache with a call to a database... 

In [None]:
def fake_database_call(country):    
    return "Nathan"

citizens.setdefault("British", fake_database_call("British"))  
print(citizens)
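- A sketch of why the eager evaluation matters for an expensive lookup. fake_lookup and the calls list are hypothetical stand-ins for a real database call and its cost.

```python
calls = []

def fake_lookup(country):
    """Hypothetical stand-in for an expensive database call."""
    calls.append(country)
    return "Nathan"

cache = {"British": "Chip"}

# The argument is evaluated before setdefault runs, so the lookup happens anyway
cache.setdefault("British", fake_lookup("British"))
print(cache["British"], len(calls))   # Chip 1

# An explicit membership check only pays for the lookup when the key is missing
if "Swiss" not in cache:
    cache["Swiss"] = fake_lookup("Swiss")
if "British" not in cache:
    cache["British"] = fake_lookup("British")
print(calls)                          # ['British', 'Swiss']
```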

- Update multiple items at once

In [None]:
citizens.update({
    "British" : "New",
    "Portugal" : "Timon"
})
citizens

- Standard way of updating single items

In [None]:
citizens["British"] = "Chip"
citizens

- Can do set operations on the keys and items views (values are not set-like)

In [None]:
default = {"output" : "internal speakers" , "volume" : ".1", "equalizer" : "lounge"}
user = {'output' : "headphones", "volume" : ".2", "equalizer" : "lounge", "playback speed" : ".5"}

print('Common keys:', default.keys() & user.keys())
print('Keys from user not in default :', user.keys() - default.keys())
print('Key, value pairs in common:', default.items() & user.items())

- Combine two dictionaries without copying either one; lookups search the mappings in order
- Useful in areas where you might want local and global settings with local taking priority...

In [None]:
default = {"output" : "internal speakers" , "volume" : ".1", "equalizer" : "lounge"}
user = {"output" : "headphones", "volume" : ".2", "equalizer" : "lounge", "playback speed" : ".5"}

from collections import ChainMap
chained = ChainMap(user, default)
print("output:", chained["output"])      
print("volume:", chained["volume"])     
print("equalizer:", chained["equalizer"])    
print(chained)
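- One detail worth sketching: writes through a ChainMap only touch the first mapping, which keeps the defaults untouched. The dictionaries here are trimmed-down, illustrative versions of the ones above; the {**a, **b} merge is shown as an alternative when a plain copy is fine.

```python
from collections import ChainMap

default = {"volume": ".1", "equalizer": "lounge"}
user = {"volume": ".2"}

settings = ChainMap(user, default)
settings["equalizer"] = "rock"    # writes go to the first mapping only

print(user)                       # {'volume': '.2', 'equalizer': 'rock'}
print(default)                    # {'volume': '.1', 'equalizer': 'lounge'}

merged = {**default, **user}      # a plain dict copy, later mappings win
print(merged)                     # {'volume': '.2', 'equalizer': 'rock'}
```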

- Check multiple layers deep or return 'Not in dictionary', Elasticsearch gold for avoiding key errors...

In [None]:
dictionary = {'_source' : {'results': 'All of the data', 'people': 'People info'}}
people = dictionary.get('_source', {}).get('people', 'Not in dictionary')
label = dictionary.get('_source', {}).setdefault('label', 'Not in dictionary')
print("People : ", people)
print("Label : ", label)

- Using '' as a default return value we can also iterate on nested dictionary calls...    
- Does not work if '' is replaced with a non iterable item such as None    

In [None]:
people = {'Tomas': [21, 'student'], 'Julio': [30, 'engineer'], 'Mike': [31, 'manager'], 'Mez': [30, 'artist', 'another']}
people

In [None]:
for info in people.get('Mez'):
    print(info)

In [None]:
for entity in people.get('NotPresent', ''): # If '' is changed to None this raises a TypeError
    print(entity, end = ' ')

### Collections module dictionary tools

- Default dictionary, set default for every new key    

In [None]:
from collections import defaultdict
ddict = defaultdict(int)  # int is the default type (0 the value)
ddict['year'] += 1
print(ddict['year'])

ddict['year'] = 1999
ddict['year'] += 1
print(ddict['year'])
#dir(ddict)
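- Another common use is grouping with defaultdict(list). A sketch with made-up data; note that merely looking up a missing key creates it.

```python
from collections import defaultdict

people = [("Tomas", "student"), ("Julio", "engineer"), ("Mez", "student")]

by_role = defaultdict(list)       # missing keys start out as an empty list
for name, role in people:
    by_role[role].append(name)
print(dict(by_role))              # {'student': ['Tomas', 'Mez'], 'engineer': ['Julio']}

before = "manager" in by_role     # membership checks do not create keys
by_role["manager"]                # but a plain lookup does
after = "manager" in by_role
print(before, after)              # False True
```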

- Collections counter for easy dictionary counts
- Caution: most_common(n) cuts off at n items, so it will not show every element when several are tied on count

In [None]:
from collections import Counter 
words = ['hello', 'how', 'are', 'you', 'doing']

counts = Counter(words)
print(counts)
print("Most Common:", counts.most_common(3))

In [None]:
more = ['add', 'how', 'are', 'words', 'here']

counts.update(more)
print(counts)
print("Counts on 'are':", counts['are'])

In [None]:
#dir(counts)
counts.subtract??

In [None]:
counts.subtract(more)

In [None]:
print(counts)
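- A sketch of two Counter details touched on above: finding every item tied for the top count, and clearing out the zero counts that subtract leaves behind. The letters are arbitrary sample data.

```python
from collections import Counter

letters = Counter(['a', 'b', 'b', 'c', 'c'])

print(letters.most_common(1))     # only one of the tied leaders is returned

top = max(letters.values())
tied = sorted(k for k, v in letters.items() if v == top)
print(tied)                       # ['b', 'c']

letters.subtract(['a'])           # 'a' stays in the Counter with a count of 0
print(letters['a'])               # 0
positive = +letters               # unary plus keeps only counts above zero
print(positive)
```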

- Merge defaultdict and counter for extra power

In [None]:
from collections import Counter
from collections import defaultdict
ddict = defaultdict(Counter)

ddict['nums'].update(['1', '2','3'])
ddict['test'].update(["this", "is", "nice"])
ddict['test'].update(["this", "is", "nice"])
ddict['test'].update(["this", "is", "nice"])
print(ddict)

- Mapping proxy is a good way to return a read only dictionary
- Dynamically references the underlying dict
- Item assignment (\__setitem__) on the proxy always raises a TypeError 

In [None]:
from types import MappingProxyType # 3.3 +
d = {1: 'A'}
d_proxy = MappingProxyType(d)
d_proxy

In [None]:
d_proxy['B'] = 2
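- A sketch showing both properties together: writes through the proxy fail, while changes to the underlying dict show up in the proxy. The settings name is made up for the example.

```python
from types import MappingProxyType

settings = {"volume": 1}
view = MappingProxyType(settings)

try:
    view["volume"] = 2            # the proxy does not support item assignment
    blocked = False
except TypeError as err:
    blocked = True
    print("TypeError:", err)

settings["volume"] = 3            # the owner can still change the real dict
print(view["volume"])             # 3, the proxy reflects the change
```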

In [None]:
from collections import UserDict
UserDict??

### Heaps 

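- heapq implements a min heap on top of a regular list (negate the keys to simulate a max heap)     
- O(log n) push and pop     
- O(n log n) to push all items on to the heap one at a time; heapq.heapify builds a heap from an existing list in O(n)     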

In [None]:
import heapq

rows = [
    {'name': 'Steve', 'age': 19},
    {'name': 'John', 'age': 24},
    {'name': 'Sally', 'age': 24},
    {'name': 'Ada', 'age': 23}
]

top_three = heapq.nsmallest(3, rows, key=lambda x: x['age'])

In [None]:
top_three
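- The basic heap operations, sketched with plain numbers. The negation trick for a max heap is a common workaround rather than a dedicated heapq feature.

```python
import heapq

nums = [5, 1, 8, 3]
heapq.heapify(nums)               # rearrange the list into a min heap in place
heapq.heappush(nums, 0)
smallest = heapq.heappop(nums)    # always removes the smallest item
next_smallest = heapq.heappop(nums)
print(smallest, next_smallest)    # 0 1

negated = [-n for n in [5, 1, 8, 3]]
heapq.heapify(negated)            # a min heap of negated values acts as a max heap
largest = -heapq.heappop(negated)
print(largest)                    # 8
```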

### Sets
- The set type is mutable, while frozenset is immutable
- They are unordered collections of hashable objects
- Good for deduplicating values when storing data for processing 
- Use a dictionary if you need ordered keys; keys are still hashed, but insertion order is kept (3.7+) 
- Memory overhead can be high since ~1/3 of the structure will be left empty to limit collisions 
- Object must be hashable to add to set

In [None]:
small = {1, 5, 6, 2}
large = set([6, 8, 9, 10, 8, 8])
print(small)
print(large)

In [None]:
dir(small)

- Various compare logic...

In [None]:
all_ = small | large # union small.__or__(large)
intersection = small & large # intersection small.__and__(large)
difference = small - large # subtract out overlapping small.__sub__(large)
print(all_)
print(intersection)
print(difference)

##### One caution with sets is that they only retain one of any items that compare equal
- Two items count as the same element when they have the same hash and compare equal with ==, so the second one is not added... 
- False and True hash to 0 and 1 respectively, and also compare equal to 0 and 1 (and 1.0)
- Notice what happens below...

In [None]:
test = set()
test.add(True)
test.add(0)
test.add("testing")
test.add("testing")
test.add(1)
test.add(1.0)
test.add(False)
test.add(None)

test

- This happens with dicts as well...

In [None]:
test = {}
test[True] = 1
test[0] = 2
test["testing"] = 3
test[1] = 4
test[1.0] = 5
test[False] = 6
test[None] = 7

test
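- A sketch separating the two conditions. It relies on a CPython implementation detail (hash(-1) and hash(-2) happen to be the same value) to show that a shared hash alone does not make items duplicates.

```python
# Equal values must have equal hashes, so these collapse into one element
print(hash(True) == hash(1) == hash(1.0))   # True
print(True == 1 == 1.0)                     # True
same = {True, 1, 1.0}
print(len(same))                            # 1

# CPython detail: -1 and -2 share a hash but are not equal, so both are kept
print(hash(-1) == hash(-2))                 # True on CPython
both = {-1, -2}
print(len(both))                            # 2

# A dict keeps the first key object it saw and the last value assigned
flags = {True: "first"}
flags[1] = "second"
print(flags)                                # {True: 'second'}
```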

# Mutability vs Immutability Concerns

- Extreme care needs to be used around shallow and deep copies to make sure you're not unintentionally manipulating data

### Mutable items

- Without copying, just adding a reference
- Pointing and manipulating same data

In [None]:
names = ['Toni', 'John', 'Robin', 'Kali']
print(id(names), names, "\n")

people = names # Names and people point to the same list
people.pop() # So removing an item from people also removes from names
print(id(names), names) # Same ID, same data 
print(id(people), people) # Same ID, same data 

- Shallow copy
- Only first level pointers are copied

In [None]:
names = ['Toni', 'John', 'Robin', 'Kali']
print(id(names), names, "\n")

people = names[:] # Create a new list separate from names, names.copy() would also work
people.pop() 
print(id(names), names)
print(id(people), people)

- Deep copy
- Copy all levels to avoid issues

In [None]:
from copy import deepcopy

In [None]:
people = [['Toni', 'John'], ['Robin', 'Kali']]
print(id(people), people, "\n")

copy = people.copy()
copy[0][0] = "Hello"
print(id(people), people)
print(id(copy), copy)

In [None]:
people = [['Toni', 'John', 'Robin', 'Mike'], ['Steve', 'Caroline', 'Emma', 'Joe', 'Kali']]
print(id(people), people, "\n")

copy = deepcopy(people)
copy[0][0] = 1
print(id(people), people)
print(id(copy), copy)

### Immutable item
- Keeps same pointer since immutable...

- Shallow copy

In [None]:
tuple1 = (1, 3, "test")
tuple2 = tuple1[:]

print(id(tuple1))
print(id(tuple2))

- Deep copy

In [None]:
tuple1 = (1, 3, "test")
tuple2 = deepcopy(tuple1)

print(id(tuple1))
print(id(tuple2))

### Don't use mutable, or runtime objects as default arguments for functions

In [None]:
def unexpected(value, list_=[]):
    list_.append(value)
    print(list_)
    
unexpected(11)
unexpected('bananas', list_ =["test"])
unexpected("hello")

- time.time() has similar issues 
- timing here will be set once, to the value at the moment the function is defined 

In [None]:
import time
def unexpected_2(timing=time.time()): 
    print(timing)

unexpected_2()
time.sleep(5)
unexpected_2()


- Use None as the default and check with "is None" to detect a missing argument..

In [None]:
def spam(a, b=None):
    if b is None: # must use None here 
        b = []
    print(a, b)

#def spam(a, b=None):
#    if not b:    # This causes silent errors due to all items that evaluate to false 
#        b = []
#    print(a, b)
        
spam("11", 0)
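- The same None pattern also fixes the time.time() case. A sketch, where stamped is a hypothetical function name and the short sleep is only there to separate the two calls:

```python
import time

def stamped(message, timing=None):
    """Hypothetical helper: the timestamp is taken at call time, not definition time."""
    if timing is None:
        timing = time.time()
    return timing, message

first, _ = stamped("a")
time.sleep(0.05)
second, _ = stamped("b")
print(second - first)                  # elapsed seconds between the two calls

fixed, _ = stamped("c", timing=5.0)    # an explicit value is still respected
print(fixed)                           # 5.0
```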

### Modifying data in place unintentionally

- Be explicit about modifying underlying input data in function calls

In [None]:
names = ['Toni', 'John', 'Robin', 'Mike', 'Steve', 'Caroline', 'Emma', 'Joe', 'Kali']
print(names)

In [None]:
sorted_ = sorted(names) # sorted() returns a new sorted list and leaves names untouched
print(names)

In [None]:
sorted_ = names.sort() # sort() modifies the underlying list in place and returns None, so sorted_ is None
print(names)