# python review

## Basic control
### if / elif / else
### while loop / for loop (foreach)
* else in loop: nobreak
* avoid changing the list in the loop body.

## Variable

The equal "=" sign in the assignment shouldn't be seen as "is equal to". It should be "read" or interpreted as "is set to", meaning in our example "the variable i is set to 42". 

### Object References
As variables are pointing to objects and objects can be of arbitrary data type, variables cannot have types associated with them. 

__id Function__
help> keywords

## Object
* id + type + content
* name <- namespaces (name : obj ref)

## Boolean
* True: 1 / False: 0

__False__:
* numerical zero values (0, 0.0, 0.0+0.0j),
* the Boolean value False,
* empty strings,
* empty lists and empty tuples,
* empty dictionaries.
* plus the special value None.
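
A quick way to check the list above is to pass each value to `any()`. This is only a sketch; the name `falsy_values` is made up for illustration, and the empty set is added as one more empty container.

```python
# Every value in this list is treated as False in a Boolean context.
falsy_values = [0, 0.0, 0j, False, "", [], (), {}, set(), None]

# any() returns True as soon as one element is truthy, so it stays False here.
all_falsy = not any(falsy_values)

# Non-empty containers are truthy, even if their contents are falsy.
non_empty_is_truthy = bool([0])
```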

## Numbers
* Integer
* Long Integer (Python 2 only; Python 3 has a single arbitrary-precision `int`)
* Floating-point numbers
* Complex numbers
### Integer Division
* "true division" performed by "/" -> always returns a float
* "floor division" performed by "//" -> returns an int when both operands are ints
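
A small sketch of the two division operators in Python 3 (the variable names are only for illustration):

```python
# True division always yields a float.
true_quotient = 7 / 2        # 3.5

# Floor division rounds toward negative infinity.
floor_quotient = 7 // 2      # 3, an int because both operands are ints
float_floor = 7.0 // 2       # 3.0, a float because one operand is a float
negative_floor = -7 // 2     # -4, not -3
```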

## Strings
All strings in Python 3 are sequences of "pure" Unicode characters, no specific encoding like UTF-8. 

's' vs "s" vs '''s''' 

__(a is b) vs (a == b)__: _is_ will return True if two variables point to the same object, _==_ will return True if the objects referred to by the variables are equal.
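
The difference shows up most clearly with two lists that have equal contents. A minimal sketch with illustrative names:

```python
a = [1, 2, 3]
b = a            # a second name for the same list object
c = list(a)      # a new list with equal contents

same_object = a is b          # identity: both names point to one object
equal_contents = a == c       # equality: contents compare equal
distinct_objects = a is not c # but they are two separate objects
```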


### Operators / functions
* Concatenation + ( vs join)
* Repetition *
* Indexing: "Python"[0]
* Slicing: "Python"[2:4] -> "th"
* Size: len("Python") -> 6

### Immutable Strings
### Byte Strings
Every string or text in Python 3 is Unicode, but encoded Unicode is represented as binary data. The type used to hold text is str, the type used to hold data is bytes.

### Bitwise operators
`<<, >>, &, |, ~, ^`

In [None]:
x = b"Hallo"
t = x.decode("UTF-8")  # bytes -> str; str(x) would give the text "b'Hallo'"
u = t.encode("UTF-8")  # str -> bytes

## Sequential Data Types
* Has `len()` function
* Supports slicing

| Immutable | Mutable |
|:--|:-- |
| String  |  |
| Tuples | Lists |
| Bytes | ByteArrays |


They have some underlying concepts in common:
* The items or elements of strings, lists and tuples are *ordered* in a defined sequence
* The elements can be accessed via indices

### Python Lists []
Ordered sequence of values. A list in Python is an ordered group of items or elements. The elements don't have to be of the same type.

The main properties of Python lists:
* They are ordered
* They contain arbitrary objects
* Elements of a list can be accessed by an index
* They are arbitrarily nestable, i.e. they can contain other lists as sublists
* Variable size
* They are mutable, i.e. the elements of a list can be changed

Lists can have sublists as elements.

### Tuples ()
A tuple is an immutable list.
* Tuples are usually slightly faster to create than lists and use less memory.
* If you know that some data doesn't have to be changed, you should use tuples instead of lists, because this protects your data against accidental changes.
* Tuples can be used as keys in dictionaries, while lists can't.

### List Comprehension
```Python
>>> s = "Toronto is the largest City in Canada"
>>> t = "Python courses in Toronto by Bodenseo"
>>> s = "".join(["".join(x) for x in zip(s,t)])
>>> s
'TPoyrtohnotno  ciosu rtshees  lianr gTeosrto nCtiot yb yi nB oCdaennasdeao'
```

## Copy
* Copy with the Slice Operator
* Using the method deepcopy

```Python
from copy import deepcopy
lst2 = deepcopy(lst1)
```
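The two approaches differ once the list contains sublists. A minimal sketch (the list contents are made up for illustration):

```python
from copy import deepcopy

lst1 = ["a", ["b", "c"]]
shallow = lst1[:]          # slice copy: only the outer list is duplicated
deep = deepcopy(lst1)      # nested lists are duplicated as well

lst1[1][0] = "changed"     # mutate the inner list through the original

# shallow[1] is the very same inner list as lst1[1]; deep[1] is independent.
shallow_sees_change = shallow[1][0]
deep_is_untouched = deep[1][0]
```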

# Dictionaries {}
* key-value pairs / has len()
* Only hashable objects can be used as keys (immutable built-ins such as strings, numbers and tuples of hashables; not lists or dicts)
* see *Modern Dict.ipynb*

### Operator
```Python
len(d) # returns the number of stored entries, i.e. the number of (key,value) pairs.
del d[k] # deletes the key k together with its value
k in d # True, if a key k exists in the dictionary d
k not in d # True, if a key k doesn't exist in the dictionary d
```
### Methods
* D.__pop__(k[,d]) -> v  
* popitem(...)
* .get(k) return None if not found
* d.copy() -- Shallow copy
* d.clear()

### Iterating over a dict
```Python
for k in d:
    print(k)
for k in d.keys():
    print(k)
for v in d.values():
    print(v)
for k in d:
    print(d[k])
```

### Turn Lists into Dictionaries
``` Python
>>> dishes = ["pizza", "sauerkraut", "paella", "hamburger"]
>>> countries = ["Italy", "Germany", "Spain", "USA"," Switzerland"]
>>> country_specialities = list(zip(countries, dishes))
>>> country_specialities_dict = dict(country_specialities)
>>> print(country_specialities_dict)
{'Germany': 'sauerkraut', 'Italy': 'pizza', 'USA': 'hamburger', 'Spain': 'paella'} 
```

zip() in Python 3 returns an iterator. Iterators exhaust themselves once they have been consumed.
Also, there is `itertools.zip_longest`, which pads the shorter input instead of stopping at it.
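
Both points can be seen in a few lines. This is a sketch with made-up sample data:

```python
from itertools import zip_longest

pairs = zip(["Italy", "Germany"], ["pizza", "sauerkraut"])
first_pass = list(pairs)     # consumes the iterator
second_pass = list(pairs)    # nothing is left the second time

# zip stops at the shortest input; zip_longest fills the gaps instead.
padded = list(zip_longest(["Italy", "Germany", "Spain"], ["pizza"], fillvalue=None))
```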

# Sets and Frozensets 
`set()` / `{1, 2}` / `frozenset()` : An unordered bag of unique values. Note that `{}` creates an empty dict; an empty set is written `set()`.
* Frozensets: immutable & hashable

### Operations
* `add(element)`
* `clear()`
* `copy()`
* `difference()` / `difference_update()` (x = x - y) / `-`
* `discard(el)` / `remove(el)`
* `x.union(y)` / `|`
* `x.intersection(y)` / `&`
* `isdisjoint()`
* `<=` is an abbreviation for `issubset()` and `>=` for `issuperset()`
* `<` is used to check if a set is a proper subset of a set. / `>` is used to check if a set is a proper superset of a set.
* `pop()`
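
A short sketch of the most common operations from the list above (the sets `x` and `y` are illustrative):

```python
x = {1, 2, 3}
y = {3, 4}

union = x | y                    # same as x.union(y)
intersection = x & y             # same as x.intersection(y)
difference = x - y               # same as x.difference(y)
proper_subset = {1, 2} < x       # True: a subset that is not equal to x
not_proper = x < x               # False: a set is not a proper subset of itself

x.discard(99)                    # no error if the element is missing
try:
    x.remove(99)                 # remove() raises KeyError instead
    remove_raised = False
except KeyError:
    remove_raised = True
```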

## Print
```python
print(value1, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
print("format %5d" % (tuples))
```
* placeholder: %[flags][width][.precision]type 

| Conversion | Meaning |
|:--|:---|
| d | Signed integer decimal.|
| i | Signed integer decimal.|
| o | Unsigned octal.|
| u | Obsolete and equivalent to 'd', i.e. signed integer decimal.|
| x | Unsigned hexadecimal (lowercase).|
| X | Unsigned hexadecimal (uppercase).|
| e | Floating point exponential format (lowercase).|
| E | Floating point exponential format (uppercase).|
| f | Floating point decimal format.|
| F | Floating point decimal format.|
| g | Same as "e" if exponent is less than -4 or not less than precision, "f" otherwise.|
| G | Same as "E" if exponent is less than -4 or not less than precision, "F" otherwise.|
| c | Single character (accepts integer or single character string).|
| r | String (converts any python object using repr()).|
| s | String (converts any python object using str()).|
| % | No argument is converted, results in a "%" character in the result.|

* String method "format" https://www.python-course.eu/python3_formatted_output.php
```Python
template.format(p0, p1, ..., k0=v0, k1=v1, ...)
```
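To compare the two styles, here is a sketch that formats the same values once with the `%` operator and once with `format`. The values and the keyword `name` are made up for illustration:

```python
price = 3.14159

# %[flags][width][.precision]type: zero-padded float, right-aligned int, left-aligned string
old_style = "%08.3f|%5d|%-5s|" % (price, 42, "ab")

# The same layout with str.format, using positional and keyword arguments
new_style = "{0:08.3f}|{1:5d}|{name:<5}|".format(price, 42, name="ab")
```

Both expressions produce the same string, `0003.142|   42|ab   |`.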

## Functions
```Python
def function_name(parameter_list):
    """Optional docstring."""
    ...  # statements, i.e. the function body
```

### Optional Parameters
### Docstring: """ """
### Parameters: 
* arbitrary number of parameters: *args
* arbitrary number of keyword parameters: **kwargs

### Return / Return multiple values
The return value is immediately stored via unpacking into the variables
### Arbitrary Number of Parameters
```Python
def arithmetic_mean(first, *values):
    """ This function calculates the arithmetic mean of a non-empty
        arbitrary number of numerical values """

    return (first + sum(values)) / (1 + len(values))

x = [3, 5, 9]
arithmetic_mean(*x) # *x will "unpack" or singularize the list. 
```
Arbitrary Number of Keyword Parameters
```Python
def f(**kwargs):
    print(kwargs)
```

In [None]:
# transposing 2d matrix
my_list = [('a', 232), 
           ('b', 343), 
           ('c', 543), 
           ('d', 23)]
list(zip(*my_list))

## Recursion
### Factorial number
```Python
def factorial(n):
    if n == 0: # n == 1
        return 1
    else:
        return n * factorial(n-1)
        
def iterative_factorial(n):
    result = 1
    for i in range(2,n+1):
        result *= i
    return result
```
### Fibonacci numbers
```Python
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
        
def fibi(n):
    old, new = 0, 1
    if n == 0:
        return 0
    for i in range(n-1):
        old, new = new, old + new
    return new
    
memo = {0:0, 1:1}
def fibm(n):
    if n not in memo:
        memo[n] = fibm(n-1) + fibm(n-2)
    return memo[n]
    
fibl = lambda n: n if n < 2 else fibl(n-1) + fibl(n-2)  # n <= 2 would return fibl(2) == 2
```

## Parameters and Arguments
Python uses a mechanism, which is known as "Call-by-Object", sometimes also called "Call by Object Reference" or "Call by Sharing".

If you pass immutable arguments like integers, strings or tuples to a function, the passing acts like call-by-value.

Python initially behaves like call-by-reference, but as soon as we are changing the value of such a variable, i.e. as soon as we assign a new object to it, Python "switches" to call-by-value.
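
The "switch" described above is the difference between rebinding a parameter name and mutating the object it refers to. A minimal sketch, with hypothetical function names `rebind` and `mutate`:

```python
def rebind(items):
    # Assignment creates a new list and binds the local name to it;
    # the caller's list is untouched (looks like call-by-value).
    items = items + [99]
    return items

def mutate(items):
    # In-place change of the shared object (looks like call-by-reference).
    items.append(99)

original = [1, 2]
rebind(original)
after_rebind = list(original)
mutate(original)
after_mutate = list(original)
```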

### Side effects

# Namespaces and scopes
* python dictionary: key -> object
* __global__ names of a module
* __local__ names in a function or method invocation
* __built-in__ names: this namespace contains built-in functions (e.g. abs(), len(), ...) and built-in exception names

## Scopes
During program execution there are the following nested scopes available:
* the innermost scope is searched first and it contains the local names
* the scopes of any enclosing functions, which are searched starting with the nearest enclosing scope
* the next-to-last scope contains the current module's global names
* the outermost scope, which is searched last, is the namespace containing the built-in names

## Global, Local and nonlocal Variables
nonlocal variables have a lot in common with global variables. One difference to global variables lies in the fact that it is not possible to change variables from the module scope, i.e. variables which are not defined inside of a function, by using the nonlocal statement. 
* default: local
* nonlocal: nested functions; needs to be defined in enclosing functions
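
A typical use of `nonlocal` is a closure that keeps state between calls. The following is only a sketch; `make_counter` and `increment` are illustrative names:

```python
def make_counter():
    count = 0                 # lives in the enclosing function's scope

    def increment():
        nonlocal count        # rebind count in make_counter, not a new local
        count += 1
        return count

    return increment

counter = make_counter()
values = [counter(), counter(), counter()]
```

Without the `nonlocal` statement, `count += 1` would raise an `UnboundLocalError`, because assignment makes `count` local to `increment` by default.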

# Decorators
A decorator in Python is any callable Python object that is used to modify a function or a class. A reference to a function "func" or a class "C" is passed to a decorator and the decorator returns a modified function or class. The modified functions or classes usually contain calls to the original function "func" or class "C". 
* Function decorators
* Class decorators -> \_\_call\_\_ function

```Python
def decorator1(f):
    def helper():
        print("Decorating", f.__name__)
        f()
    return helper

@decorator1
def foo():
    print("inside foo()")

foo()
```
```Python
class decorator2:
    
    def __init__(self, f):
        self.f = f
        
    def __call__(self):
        print("Decorating", self.f.__name__)
        self.f()

@decorator2
def foo():
    print("inside foo()")

foo()
```

* __Syntactic Sugar__

`foo = decorator1(foo)`
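
The decorators above only work for functions without parameters. A common variation, sketched here with the hypothetical names `logged` and `add`, forwards arbitrary arguments and uses `functools.wraps` to keep the original name and docstring:

```python
from functools import wraps

def logged(func):
    @wraps(func)                       # copy __name__, __doc__ etc. onto helper
    def helper(*args, **kwargs):
        print("Decorating", func.__name__)
        return func(*args, **kwargs)   # forward arguments and return value
    return helper

@logged
def add(x, y):
    """Return x + y."""
    return x + y

result = add(3, 4)
```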

# Lambda, filter, reduce and map
lambda: make function
* Advantage: 1. fairly simple 2. only used once
```Python
add = lambda x, y : x + y  # named add so the built-in sum() is not shadowed
add(3,4)
```
Mapping: r = map(func, seq)

Filtering: __filter(function, sequence)__ offers an elegant way to filter out all the elements of a sequence "sequence", for which the function function returns True.

Reducing: The function __reduce(func, seq)__ continually applies the function func() to the sequence seq. It returns a single value. In Python 3, `reduce` lives in the `functools` module.
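
The three functions side by side, as a small sketch on a made-up list of numbers:

```python
from functools import reduce   # reduce is not a builtin in Python 3

numbers = [1, 2, 3, 4, 5, 6]

# map and filter return iterators in Python 3, so wrap them in list()
squares = list(map(lambda x: x * x, numbers))
evens = list(filter(lambda x: x % 2 == 0, numbers))

# reduce folds the sequence into one value: ((((1+2)+3)+4)+5)+6
total = reduce(lambda acc, x: acc + x, numbers)
```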


# List [] / generator() / dict{k:v} / set{} comprehension
A clean and readable way to create these data types
```Python
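from math import sqrt

# List (the celsius values are hypothetical sample data)
celsius = [39.2, 36.5, 37.3, 37.8]
fahrenheit = [(9 / 5) * x + 32 for x in celsius]

# List of tuples
pythagorean_triple = [(x,y,z) for x in range(1,30) for y in range(x,30) for z in range(y,30) if x**2 + y**2 == z**2]

# Set: all non-primes below n (range() needs ints, so convert sqrt's float result)
n = 100
no_prime = {j for i in range(2, int(sqrt(n)) + 1) for j in range(i*2, n, i)}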
```
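The heading also mentions dict and generator comprehensions. A minimal sketch of those forms, using a made-up list `words`:

```python
words = ["red", "green", "blue"]

# Dict comprehension: key: value pairs inside braces
lengths = {word: len(word) for word in words}

# Generator expression: parentheses, values are produced lazily one at a time
total_chars = sum(len(word) for word in words)

# Set comprehension for comparison: braces without a colon
initials = {word[0] for word in words}
```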


# Generators
https://www.python-course.eu/python3_generators.php

A generator is a function which returns a generator object. This generator object can be seen like a function which produces a sequence of results instead of a single object. This sequence of values is produced by iterating over it.

Generators differ from ordinary functions both syntactically and semantically:
* yield statement -> turns a function into a generator
* "pause" the code until the next `next()` call

An exhausted generator raises a `StopIteration` exception.
* \_\_iter\_\_() and next()

```Python
__iter__
__next__; next() # StopIteration
yield
noreset # create another one
return # = raise StopIteration()
send # wait at a yield statement
next # send None
```

In [None]:
def simple_generator_function():
    print("before yield")
    yield 1
    print("1-2")
    yield 2
    yield 3

our_generator = simple_generator_function()

In [None]:
next(our_generator)
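
The interplay of `next()`, `send()` and `StopIteration` can be sketched with a small countdown generator. The name `countdown` and its behaviour are made up for illustration:

```python
def countdown(start):
    while start > 0:
        received = yield start    # pauses here until next() or send()
        if received is not None:
            start = received      # send(value) overrides the counter
        else:
            start -= 1            # plain next() behaves like send(None)

gen = countdown(3)
first = next(gen)        # runs to the first yield
second = gen.send(10)    # resumes with received == 10, yields the new start
third = next(gen)        # counts down from there

short = countdown(1)
next(short)
try:
    next(short)          # the loop ends, so the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True
```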

![Generator](Asset/relationships.png)

# Modular Programming
Components can be independently created / tested
* Goal: minimization of dependencies
* Module: .py python code
> * usually contain functions / classes / "plain" statements (used to initialize the module)
> * import once
* import: module: import math (as ...)
* import: object from module: from math import sin, pi
> `*` only when working in the interactive Python shell

* code in module: execution 
* reload:  `from importlib import reload`

* Executing Modules as Scripts: `python fibo.py` <- `__name__` is set to `__main__`
* Kinds of Modules:
> * Written in Python (suffix .py)
> * Dynamically linked C modules (.dll, .pyd, .so, .sl, ...)
> * C-Modules linked with the Interpreter. To get a complete list of these modules:

```Python
import sys
print(sys.builtin_module_names)
```
* `dir(module)` lists content of a module


### Module search Path
1. The directory of the top-level file, i.e. the file being executed.
2. The directories of PYTHONPATH, if this environment variable is set.
3. standard installation path Linux/Unix e.g. in /usr/lib/python3.5.

Find out where a module is located: `module.__file__`. (Not for statically linked C libraries)

# Package
Directory with `__init__.py`
* Either empty or contain valid Python code
* Code will be executed when the package is imported (init a package)

* Neither "a" nor "b" can be accessed by solely importing simple_package.
* Auto load: `__init__.py` to include:
```Python
import simple_package.a
import simple_package.b
```

### Using relative path: 
* `sound/__init__.py`: `from . import effects` 
* `effects/__init__.py`: `from .. import formats`
* `effects/__init__.py`: `from ..filters import karaoke`

### Import a complete package
* Add `__all__` to `sound/__init__.py` : `__all__ = ["formats", "filters", "effects", "foobar"]`
* `__all__`: a list of module and package names to be imported when `from package import *` is encountered.
* Can do it for all levels
* `import *` still bad practice




# Exceptions
```Python
try:
    x = float(input("Your number: "))
    inverse = 1.0 / x
except ValueError:
    print("You should have given either an int or a float")
except ZeroDivisionError:
    print("Infinity")
else:
    print("No exception raised")
finally:
    print("There may or may not have been an exception.")
```

# Tests
## Kinds of Errors
* Errors caused by lack of understanding of a language construct.
* Errors due to logically incorrect code conversion.

## Unit tests: Module Tests with \_\_name\_\_
```Python
if __name__ == "__main__":
    if fib(0) == 0 and fib(10) == 55 and fib(50) == 12586269025:
        print("Test for the fib function was successful!")
    else:
        print("The fib function is returning wrong values!")
```
    
## doctest
`doctest` is a test framework that comes prepackaged with Python.

In [None]:
# doctest
import doctest

def fib(n):
    """ 
    Calculates the n-th Fibonacci number iteratively  

    >>> fib(0)
    0
    >>> fib(1)
    1
    >>> fib(10) 
    55
    >>> fib(15)
    610
    """
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a
doctest.testmod()
    

## test-driven development (TDD)

## unittest
* Based on JUnit and Smalltalk.
* TestCase, TestSuite, and so on
* TextTestRunner

Advantage: Not defined inside of the module

Disadvantage: Increased work to create the test cases

In [None]:
import unittest
from fibonacci import fib

class FibonacciTest(unittest.TestCase):

    def testCalculation(self):
        self.assertEqual(fib(0), 0)
        self.assertEqual(fib(1), 1)
        self.assertEqual(fib(5), 5)
        self.assertEqual(fib(10), 55)
        self.assertEqual(fib(20), 6765)

if __name__ == "__main__": 
    unittest.main()

# Class
* Describe how to produce an object
* Classes are objects too: As soon as you use the keyword _class_, Python executes it and creates an OBJECT
* This object (the class) is itself capable of creating objects (the instances), and this is why it's a class.

## Type
It can also create classes on the fly.
```Python
type(name of the class, 
     tuple of the parent class (for inheritance, can be empty), 
     dictionary containing attributes names and values)
```
* Attributes: a dictionary
```Python
class Foo(object):
      bar = True
# Can be translated to:
Foo = type('Foo', (), {'bar':True})
```
* Inherit

```Python
class FooChild(Foo):
    pass
# Would be
FooChild = type('FooChild', (Foo,), {})
```
* Add methods to class

```Python
def echo_bar(self):
    print(self.bar)

FooChild = type('FooChild', (Foo,), {'echo_bar': echo_bar})
```

# Metaclass
A metaclass is the class of a class. Like a class defines how an instance of the class behaves, a metaclass defines how a class behaves. A class is an instance of a metaclass.
* __intercept__ a class creation/ __modify__ the class / __return__ the modified class
* subclass of _type_; type is a metaclass
* Most commonly used as a class-factory
* for simple alterations, use 1. monkey patching 2. class decorators.

```Python
MyClass = MetaClass()
MyObject = MyClass()
```

## \_\_metaclass\_\_ attribute
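Python will use the metaclass to create the class Foo. The `__metaclass__` attribute shown here is Python 2 syntax; Python 3 passes the metaclass as a keyword in the class header instead: `class Foo(metaclass=something)`.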
```Python
class Foo(object):
  __metaclass__ = something...
```
You write class Foo(object) first, but the class object Foo is not created in memory yet.

__Python will look for \_\_metaclass\_\_ in the class definition. If it finds it, it will use it to create the object class Foo. If it doesn't, it will use _type_ to create the class.__

* The main purpose of a metaclass is to change the class automatically, when it's created.
* \_\_metaclass\_\_ can be any callable, doesn't need to be a formal class.

Class creation
1. \_\_new\_\_ is the method called before \_\_init\_\_; it creates the object and returns it
2. \_\_init\_\_ just initializes the object passed as parameter
3. \_\_call\_\_ method allows the class's instance to be called as a function, not always modifying the instance itself.

```Python
# Verbose parameter names:
# def __new__(upperattr_metaclass, future_class_name,
#             future_class_parents, future_class_attr):
# Conventional short names:
def __new__(cls, clsname, bases, dct):
    ...
    return type.__new__(cls, clsname, bases, uppercase_attr)
    # same as (metaclasses inheriting from metaclasses)
    # return super(UpperAttrMetaclass, cls).__new__(cls, clsname, bases, uppercase_attr)
```
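A runnable Python 3 sketch of such a metaclass follows. The body of `__new__` is elided in the notes above, so the upper-casing logic here is an assumption inferred from the names `UpperAttrMetaclass` and `uppercase_attr`; the class `Foo` and its attribute `bar` are illustrative.

```python
class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, dct):
        # Assumed behaviour: upper-case every attribute name except dunders
        uppercase_attr = {
            (name if name.startswith("__") else name.upper()): value
            for name, value in dct.items()
        }
        # super() also works when metaclasses inherit from other metaclasses
        return super().__new__(cls, clsname, bases, uppercase_attr)

class Foo(metaclass=UpperAttrMetaclass):   # Python 3 syntax
    bar = "bip"

has_upper = hasattr(Foo, "BAR")
has_lower = hasattr(Foo, "bar")
```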

3. Projects written in Python

8. Basics of OOP
    a. remember sample file 
    b. `__init__`, self, cls
    c. inherited, override 
9. Python related questions ready to ask interviewer
    a. Unit test 
    b. Versions
    c. sqlalchemy
10. Basics of other technologies [T-Shaped skillset] 
    a. git 
    b. linux
    c. sql
    
    
    
Unicode

# OOP
* Encapsulation
* Data Abstraction
* Polymorphism
* Inheritance

## First-class Everything
Guido wanted all objects to have equal status. They can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth.

## Attributes
* Dynamically create arbitrary new attributes for existing instances of a class.
    `x.name = "Steve"`
* Use `.__dict__` to show attributes and values
* Access _attributes_ "brand"
    1. Check "brand" is a key of `y.__dict__`
    2. Check if "brand" is a key of `Robot.__dict__`
    3. If not, attribute name is not defined, access it will raise an `AttributeError`
* Use `getattr(object, name[, default])` to prevent such exception
* Binding attributes to objects is a general concept in Python. 
* Attribute to function: a replacement for static function variables (e.g., counter)
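
The lookup order and the `getattr` default can be sketched as follows. The class `Robot` and the attribute `brand` follow the names used above; the values and the attribute `energy` are made up:

```python
class Robot:
    brand = "Kuka"                 # class attribute, stored in Robot.__dict__

y = Robot()
y.name = "Steve"                   # instance attribute, stored in y.__dict__

instance_keys = list(y.__dict__)           # only the instance's own attributes
found_on_class = y.brand                   # not in y.__dict__, found in Robot.__dict__
fallback = getattr(y, "energy", 100)       # default instead of AttributeError

# Attributes can be bound to functions too, e.g. as a call counter
def f(x):
    f.counter = getattr(f, "counter", 0) + 1
    return x + 3

f(1)
f(2)
```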

## Methods
* A method is "just" a function which is defined inside of a class.
* The first parameter is used as a reference to the calling instance. `self`
* Methods vs functions
    * It belongs to a class, and it is defined within a class
    * The first parameter in the definition of a method has to be a reference to the instance, which called the method. This parameter is usually called "self".
* For a Class _C_, an instance _x_ of _C_ and a method _m_ of _C_ the following three method calls are __equivalent__: 
    * `type(x).m(x, ...)`
    * `C.m(x, ...)`
    * `x.m(...)`

## `__init__` method
* Used to initialize an instance 
* magic method

## `__str__` and `__repr__`
* `__str__` for end user, nicely printed. `__repr__` for internal representation of an object
* `o == eval(repr(o))`
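
A sketch of a class that satisfies the round-trip rule above. `Point` is a hypothetical example class, and `__eq__` is defined so that the comparison is meaningful:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Unambiguous, ideally valid Python that recreates the object
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Readable form for end users; print() uses this
        return f"({self.x}, {self.y})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
round_trip = eval(repr(p))    # rebuilds an equal object from the repr
```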


## Data Abstraction, Data Encapsulation, and Information Hiding

![DataAbstraction](Asset/data_abstraction.png)
* Encapsulation is seen as the bundling of data with the methods that operate on that data. 
* Information hiding is the principle that some internal information or data is "hidden", so that it can't be accidentally changed. 
    * Data encapsulation via methods doesn't necessarily mean that the data is hidden. 
* Data Abstraction = Data Encapsulation + Data Hiding 

## Encapsulation
* Often accomplished by providing _getter_ and _setter_ methods
    * Getter methods do not change the values of attributes, they just return the values. 
    * Setter methods used for changing the values of attributes

## Public, Protected and Private Attributes
* Private attributes should only be used by the owner, i.e. inside of the class definition itself.
* Protected (restricted) Attributes may be used, but at your own risk. Essentially, this means that they should only be used under certain conditions.
* Public Attributes can and should be freely used.

| Naming | Type | Meaning |
|--------|------|---------|
| `name` | Public | These attributes can be freely used inside or outside of a class definition. |
| `_name` | Protected | Protected attributes should not be used outside of the class definition, unless inside of a subclass definition.  |
| `__name` | Private | Python name-mangles this kind of attribute to `_ClassName__name`, so it is not reachable as `obj.__name` from outside the class definition. It is meant to be read and written only inside the class definition itself. |

## Destructor
`__del__`

## Class Attributes vs Instance Attributes
* Instance attributes are owned by the specific instances of a class
* Class attributes are shared by all the instances of the class. 
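* Class attributes are defined outside all the methods, usually right below the class header.
* To change a class attribute, assign via the class: `ClassName.AttributeName = value`; assigning via an instance creates a new instance attribute instead.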
* class attributes and object attributes are stored in __separate__ dictionaries
    * `x.__dict__` vs `y.__dict__` vs `A.__dict__`

```Python
class C: 
    counter = 0
    
    def __init__(self): 
        type(self).counter += 1
    def __del__(self):
        type(self).counter -= 1
```
* `type(self)` is better than `C` if such a class is used as a superclass

## @staticmethod
A method which we can call via the class name or via an instance, without having to pass a reference to an instance to it.


## @classmethod
* Like static methods class methods are not bound to instances
* Unlike static methods class methods are bound to a class. 
* The first parameter of a class method is a reference to a class, i.e. a class object. 
* They can be called via an instance or the class name. 

Use cases:
* In the definition of the factory methods
* They are often used, where we have static methods, which have to call other static methods. To do this, we would have to hard code the class name, if we had to use static methods. This is a problem, if we are in a use case, where we have inherited classes.
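
Both use cases can be sketched in one small hierarchy. All class and method names here (`Pet`, `Dog`, `from_csv`, `describe`, `is_valid_name`) are hypothetical:

```python
class Pet:
    sound = "..."

    def __init__(self, name):
        self.name = name

    @staticmethod
    def is_valid_name(name):        # needs neither self nor cls
        return name.isalpha()

    @classmethod
    def from_csv(cls, line):        # factory: cls is the class it was called on
        return cls(line.split(",")[0].strip())

    @classmethod
    def describe(cls):              # no hard-coded class name, so subclasses work
        return f"{cls.__name__} says {cls.sound}"

class Dog(Pet):
    sound = "woof"

dog = Dog.from_csv("Rex, 3")
```

Because `from_csv` receives `cls`, calling it on `Dog` builds a `Dog` and not a `Pet`; a static method would have had to name the class explicitly.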

## Properties
* Getters & Setters: Mutator methods. Ensure the principle of data encapsulation.
* The Pythonic way to introduce attributes is to make them public

* `@property`: getting a value
* `@x.setter`: set value for x: `def x(self, x):`
    * put `self.x=x` in the `__init__` method
    * Two methods with the same name and a different number of parameters: possible due to decorating
* Less elegant way:
    * `x = property(get_x, set_x)`
    * `self.set_x(x)` in `__init__`
        * Problem: two ways to access / change the value x:
          `p1.x=42` or `p1.set_x(42)`
        * There should be one-- and preferably only one --obvious way to do it.
* Doesn't need to be one-to-one connection between properties and attributes
* Need to consider: 
    * Will the value of "OurAtt" be needed by the possible users of our class?
    * If not, we can or should make it a private attribute.
    * If it has to be accessed, we make it accessible as a public attribute
    * We will define it as a private attribute with the corresponding property, if and only if we have to do some checks or transformation of the data.
    * Alternatively, you could use a getter and a setter, but using a property is the Pythonic way to deal with it!

Old design:
```Python
class OurClass:

    def __init__(self, a):
        self.OurAtt = a


x = OurClass(10)
print(x.OurAtt)
```
_Converts into_
```Python
class OurClass:

    def __init__(self, a):
        self.OurAtt = a

    @property
    def OurAtt(self):
        return self.__OurAtt

    @OurAtt.setter
    def OurAtt(self, val):
        if val < 0:
            self.__OurAtt = 0
        elif val > 1000:
            self.__OurAtt = 1000
        else:
            self.__OurAtt = val


x = OurClass(10)
print(x.OurAtt)
```

## Inheritance
* Superclass / ancestor class / parent class / base class
* Subclass / heir class / child class / derived class
* in `__init__` method, call `super().__init__(*args, **kwargs)`

### Overriding vs overloading vs overwriting
* Method __overriding__ allows a subclass to provide a different implementation of a method that is already defined by its superclass or by one of its superclasses, by providing a method with the same name, same parameters or signature, and same return type as the method of the parent class. 
* __Overloading__ is the ability to define the same method, with the same name but with a different number of arguments and types.

### Multiple Inheritance
* Multiple inheritance with __old-style__ classes: depth-first and then left-to-right.
* New-style class: MRO
    * `A.__mro__` or `A.mro()`
* Rules - C3 Method Resolution Order:
    * C + (C1 C2 ... CN) = C C1 C2 ... CN
    * L[object] = object
    * L[C(B1 ... BN)] = C + merge(L[B1] ... L[BN], B1 ... BN)
    * e.g., L[C(B)] = C + merge(L[B],B) = C + L[B]

```Python
O = object
class F(O): pass # F merge(O,O) => FO
class E(O): pass # EO
class D(O): pass # DO
class C(D,F): pass # C merge(DO FO DF) => C D F O
class B(D,E): pass # B merge(DO EO DE) => B D E O
class A(B,C): pass # A merge(BDEO CDFO BC) => A B C D E F O
```
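The hand-computed linearization can be checked against the interpreter. This sketch repeats the class definitions from above and reads the result from `__mro__`:

```python
O = object
class F(O): pass
class E(O): pass
class D(O): pass
class C(D, F): pass
class B(D, E): pass
class A(B, C): pass

# Names of the classes in method resolution order
mro_names = [cls.__name__ for cls in A.__mro__]
```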

    
## Polymorphism
The ability to present the same interface for differing underlying forms.
* Python is implicitly polymorphic.
* In statically typed languages such as Java or C++, f has to be overloaded to implement the various type combinations.

# Python Data model
AKA Magic Methods and Operator Overloading
https://docs.python.org/3/reference/datamodel.html

* `is` will return True if two variables point to the same object, `==` if the objects referred to by the variables are equal.

* `type()` An object's type determines the operations that the object supports. 


In [None]:
class Root:
    def draw(self):
        # the delegation chain stops here
        assert not hasattr(super(), 'draw')

class Shape(Root):
    def __init__(self, shapename, **kwds):
        self.shapename = shapename
        super().__init__(**kwds)
    def draw(self):
        print('Drawing.  Setting shape to:', self.shapename)
        super().draw()

class ColoredShape(Shape):
    def __init__(self, color, **kwds):
        self.color = color
        super().__init__(**kwds)
    def draw(self):
        print('Drawing.  Setting color to:', self.color)
        super().draw()

cs = ColoredShape(color='blue', shapename='square')
cs.draw()

## Module level property

In [None]:
# module.py

# getter and setter :(
_a = 10
def get_a():
    global _a
    return _a

def set_a(value):
    global _a
    if value < 0:
        raise ValueError('must be positive')
    _a = value

# module level property
_b = 100
@property
def b(self):
    return self._b

@b.setter
def b(self, value):
    if value < 0:
        raise ValueError('must be positive')
    self._b = value

if __name__ != '__main__':
    from sys import modules
    self = modules[__name__] # grab myself (the thing this module is supposed to be)
    mod = type(self)         # grab type of myself
    body = {x: getattr(self, x) for x in dir(self)} # contents
    prop = {k: v for k, v in body.items() if isinstance(v, property)} # contents that happen to be properties
    mod = type(mod.__name__, (mod,), prop)
    
    # create a new class with all the properties
    self = modules[__name__] = mod(__name__)
    for k, v in {k: v for k, v in body.items() if k not in prop}.items():
        setattr(self, k, v)
        
import module
module.a = 1 # module.a is attribute lookup; create own module type


# Transforming Code into Beautiful, Idiomatic Python

Notes from Raymond Hettinger's talk at pycon US 2013 [video](http://www.youtube.com/watch?feature=player_embedded&v=OSGv2VnC0go), [slides](https://speakerdeck.com/pyconslides/transforming-code-into-beautiful-idiomatic-python-by-raymond-hettinger-1).

The code examples and direct quotes are all from Raymond's talk. I've reproduced them here for my own edification and the hopes that others will find them as handy as I have!

## Looping over a range of numbers

```python
for i in [0, 1, 2, 3, 4, 5]:
    print i**2

for i in range(6):
    print i**2
```

__Better__

```python
for i in xrange(6):
    print i**2
```
`xrange` creates an iterator over the range producing the values one at a time. This approach is much more memory efficient than `range`. `xrange` was renamed to `range` in python 3.

## Looping over a collection

```python
colors = ['red', 'green', 'blue', 'yellow']

for i in range(len(colors)):
    print colors[i]
```

__Better__

```python
for color in colors:
    print color
```

## Looping backwards

```python
colors = ['red', 'green', 'blue', 'yellow']

for i in range(len(colors)-1, -1, -1):
    print colors[i]
```

__Better__

```python
for color in reversed(colors):
    print color
```

## Looping over a collection and indices

```python
colors = ['red', 'green', 'blue', 'yellow']

for i in range(len(colors)):
    print i, '--->', colors[i]
```

__Better__

```python
for i, color in enumerate(colors):
    print i, '--->', color
```
> It's fast and beautiful and saves you from tracking the individual indices and incrementing them.

> Whenever you find yourself manipulating indices [in a collection], you're probably doing it wrong.

## Looping over two collections

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

n = min(len(names), len(colors))
for i in range(n):
    print names[i], '--->', colors[i]

for name, color in zip(names, colors):
    print name, '--->', color
```

__Better__

```python
for name, color in izip(names, colors):
    print name, '--->', color
```

`zip` creates a new list in memory and takes more memory. `izip` is more efficient than `zip`.
Note: in python 3 `izip` was renamed to `zip` and promoted to a builtin replacing the old `zip`.

## Looping in sorted order

```python
colors = ['red', 'green', 'blue', 'yellow']

# Forward sorted order
for color in sorted(colors):
    print color

# Backwards sorted order
for color in sorted(colors, reverse=True):
    print color
```

## Custom Sort Order

```python
colors = ['red', 'green', 'blue', 'yellow']

def compare_length(c1, c2):
    if len(c1) < len(c2): return -1
    if len(c1) > len(c2): return 1
    return 0

print sorted(colors, cmp=compare_length)
```

__Better__

```python
print sorted(colors, key=len)
```

The original is slow and unpleasant to write. Also, the `cmp` argument of `sorted` is no longer available in python 3 (`functools.cmp_to_key` converts an old comparison function into a key function).

## Call a function until a sentinel value

```python
blocks = []
while True:
    block = f.read(32)
    if block == '':
        break
    blocks.append(block)
```

__Better__

```python
from functools import partial

blocks = []
for block in iter(partial(f.read, 32), ''):
    blocks.append(block)
```

In this form `iter` takes two arguments. The first is a callable that is invoked over and over again; the second is a sentinel value that ends the iteration when the callable returns it.
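
A self-contained sketch of the same pattern, assuming Python 3 and using `io.StringIO` as a stand-in for an open text file:

```python
from functools import partial
from io import StringIO

f = StringIO('abcdefghij')   # stand-in for an open text file

# Call f.read(4) repeatedly until it returns the sentinel ''
blocks = list(iter(partial(f.read, 4), ''))
```

For a file opened in binary mode the sentinel would be `b''` instead of `''`.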

## Distinguishing multiple exit points in loops

```python
def find(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i
```

__Better__

```python
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        return -1
    return i
```

Every `for` loop can take an `else` clause. It runs only when the loop finishes without hitting `break` (read it as "no break").
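
A short sketch of when the `else` branch runs, assuming Python 3; `describe` is a hypothetical helper used only for illustration:

```python
def describe(numbers):
    for n in numbers:
        if n < 0:
            result = 'found a negative: %d' % n
            break
    else:
        # Runs only when the loop finished without hitting break,
        # which includes the case of an empty sequence
        result = 'no negatives'
    return result

with_negative = describe([1, -2, 3])
without_negative = describe([1, 2])
empty = describe([])
```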

## Looping over dictionary keys

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

for k in d:
    print k

for k in d.keys():
    if k.startswith('r'):
        del d[k]
```

When should you use the second and not the first? When you're mutating the dictionary.

> If you mutate something while you're iterating over it, you're living in a state of sin and deserve what ever happens to you.

In Python 2, `d.keys()` makes a copy of all the keys and stores them in a list, so you can modify the dictionary while looping over that list.
Note: in Python 3, to mutate a dictionary while iterating over its keys you have to write `list(d.keys())` explicitly, because `d.keys()` returns a "dictionary view" (an iterable that provides a dynamic view of the dictionary's keys). See [documentation](https://docs.python.org/3/library/stdtypes.html#dict-views).
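
A minimal sketch of both safe options, assuming Python 3: snapshot the keys before deleting, or build a new dictionary instead of mutating the old one.

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# list(d) snapshots the keys, so deleting inside the loop is safe
for k in list(d):
    if k.startswith('r'):
        del d[k]

# Alternative: build a new dictionary instead of mutating the old one
d2 = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
d2 = {k: v for k, v in d2.items() if not k.startswith('r')}
```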

## Looping over dictionary keys and values

```python
# Not very fast, has to re-hash every key and do a lookup
for k in d:
    print k, '--->', d[k]

# Makes a big huge list
for k, v in d.items():
    print k, '--->', v
```

__Better__

```python
for k, v in d.iteritems():
    print k, '--->', v
```

`iteritems()` is better because it returns an iterator instead of building a list.
Note: Python 3 has no `iteritems()`; its `items()` returns a view and behaves much like `iteritems()` did. See [documentation](https://docs.python.org/3/library/stdtypes.html#dict-views).
 
## Construct a dictionary from pairs

```python
from itertools import izip  # Python 2 only

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

d = dict(izip(names, colors))
# {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
```
For Python 3: `d = dict(zip(names, colors))`

## Counting with dictionaries

```python
colors = ['red', 'green', 'red', 'blue', 'green', 'red']

# Simple, basic way to count. A good start for beginners.
d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1

# {'blue': 1, 'green': 2, 'red': 3}
```

__Better__

```python
d = {}
for color in colors:
    d[color] = d.get(color, 0) + 1

# Slightly more modern but has several caveats, better for advanced users
# who understand the intricacies
from collections import defaultdict

d = defaultdict(int)
for color in colors:
    d[color] += 1
```
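
For plain counting, the standard library also offers `collections.Counter`, which is not covered in the notes above. A brief sketch, assuming Python 3:

```python
from collections import Counter

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

counts = Counter(colors)
top = counts.most_common(1)      # most frequent item with its count
missing = counts['purple']       # absent keys count as zero, no KeyError
```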

## Grouping with dictionaries -- Part I and II

```python
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

# In this example, we're grouping by name length
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)

# {5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}

d = {}
for name in names:
    key = len(name)
    d.setdefault(key, []).append(name)
```

__Better__

```python
from collections import defaultdict

d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)
```

## Is a dictionary popitem() atomic?

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

while d:
    key, value = d.popitem()
    print key, '-->', value
```

`popitem` is atomic so you don't have to put locks around it to use it in threads.

## Linking dictionaries

```python
import argparse
import os

defaults = {'color': 'red', 'user': 'guest'}
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args([])
command_line_args = {k:v for k, v in vars(namespace).items() if v}

# The common approach below allows you to use defaults at first, then override them
# with environment variables and then finally override them with command line arguments.
# It copies data like crazy, unfortunately.
d = defaults.copy()
d.update(os.environ)
d.update(command_line_args)
```

__Better__

```python
from collections import ChainMap

d = ChainMap(command_line_args, os.environ, defaults)
```

`ChainMap` was introduced in Python 3.3. Fast and beautiful.
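
A runnable sketch of the lookup order, assuming Python 3.3+; the `environ` and `command_line_args` dictionaries are hypothetical stand-ins for `os.environ` and the parsed arguments:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
environ = {'user': 'alice'}              # hypothetical stand-in for os.environ
command_line_args = {'color': 'blue'}    # hypothetical parsed arguments

d = ChainMap(command_line_args, environ, defaults)

# Lookups search the mappings left to right; nothing is copied
color = d['color']
user = d['user']

# ChainMap holds references, so later changes to a layer are visible
defaults['theme'] = 'dark'
theme = d['theme']
```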

## Improving Clarity
 * Positional arguments and indices are nice
 * Keywords and names are better
 * The first way is convenient for the computer
 * The second corresponds to how humans think

## Clarify function calls with keyword arguments

```python
twitter_search('@obama', False, 20, True)
```

__Better__

```python
twitter_search('@obama', retweets=False, numtweets=20, popular=True)
```

This is slightly (microseconds) slower, but it is worth it for the code clarity and developer time savings.

## Clarify multiple return values with named tuples

```python
# Old testmod return value
doctest.testmod()
# (0, 4)
# Is this good or bad? You don't know because it's not clear.
```

__Better__

```python
# New testmod return value, a namedtuple
doctest.testmod()
# TestResults(failed=0, attempted=4)
```

A namedtuple is a subclass of tuple, so it still works like a regular tuple but is friendlier to read.

To make a namedtuple:

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])
```
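
A short usage sketch, assuming Python 3, showing that the result can be read by name, by index, or by unpacking:

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])
r = TestResults(failed=0, attempted=4)

by_name = r.failed           # access by field name
by_index = r[1]              # still indexable like a tuple
failed, attempted = r        # still unpacks like a tuple
text = repr(r)
```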

## Unpacking sequences

```python
p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'

# A common approach / habit from other languages
fname = p[0]
lname = p[1]
age = p[2]
email = p[3]
```

__Better__

```python
fname, lname, age, email = p
```

The second approach uses tuple unpacking and is faster and more readable.

## Updating multiple state variables

```python
def fibonacci(n):
    x = 0
    y = 1
    for i in range(n):
        print x
        t = y
        y = x + y
        x = t
```

__Better__

```python
def fibonacci(n):
    x, y = 0, 1
    for i in range(n):
        print x
        x, y = y, x + y
```

Problems with the first approach:

 * x and y are state, and state should be updated all at once; between the lines of a step-by-step update the state is mismatched, which is a common source of bugs
 * ordering matters
 * it's too low level


The second approach is more high-level, doesn't risk getting the order wrong and is fast.

## Simultaneous state updates

```python
tmp_x = x + dx * t
tmp_y = y + dy * t
tmp_dx = influence(m, x, y, dx, dy, partial='x')
tmp_dy = influence(m, x, y, dx, dy, partial='y')
x = tmp_x
y = tmp_y
dx = tmp_dx
dy = tmp_dy
```

__Better__

```python
x, y, dx, dy = (x + dx * t,
                y + dy * t,
                influence(m, x, y, dx, dy, partial='x'),
                influence(m, x, y, dx, dy, partial='y'))
```

## Efficiency
 * A fundamental rule of optimization
 * Don't cause data to move around unnecessarily
 * It takes only a little care to avoid O(n**2) behavior instead of linear behavior

> Basically, just don't move data around unnecessarily.

## Concatenating strings

```python
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

s = names[0]
for name in names[1:]:
    s += ', ' + name
print s
```

__Better__

```python
print ', '.join(names)
```

## Updating sequences

```python
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

del names[0]
# The below are signs you're using the wrong data structure
names.pop(0)
names.insert(0, 'mark')
```

__Better__

```python
from collections import deque

names = deque(['raymond', 'rachel', 'matthew', 'roger',
               'betty', 'melissa', 'judith', 'charlie'])

# More efficient with deque
del names[0]
names.popleft()
names.appendleft('mark')
```
## Decorators and Context Managers
 * Helps separate business logic from administrative logic
 * Clean, beautiful tools for factoring code and improving code reuse
 * Good naming is essential.
 * Remember the Spiderman rule: With great power, comes great responsibility!

## Using decorators to factor-out administrative logic

```python
# Mixes business / administrative logic and is not reusable
def web_lookup(url, saved={}):
    if url in saved:
        return saved[url]
    page = urllib.urlopen(url).read()
    saved[url] = page
    return page
```

__Better__

```python
@cache
def web_lookup(url):
    return urllib.urlopen(url).read()
```

Note: since Python 3.2 the standard library provides a decorator for this: `functools.lru_cache`.
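
The `cache` decorator used above is not defined in these notes. A minimal sketch of what such a decorator might look like, assuming Python 3 and positional, hashable arguments only; `slow_square` and `calls` are hypothetical names used to show that repeated calls are served from the cache:

```python
from functools import wraps

def cache(func):
    # Hypothetical minimal memoizer: positional, hashable arguments only
    saved = {}

    @wraps(func)
    def wrapper(*args):
        if args not in saved:
            saved[args] = func(*args)
        return saved[args]
    return wrapper

calls = []

@cache
def slow_square(n):
    calls.append(n)          # record each real computation
    return n * n

results = [slow_square(4), slow_square(4), slow_square(5)]
```

The business logic (`slow_square`) stays free of bookkeeping, and the decorator can be reused on any function with hashable arguments.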

## Factor-out temporary contexts

```python
from decimal import Decimal, getcontext, setcontext

# Save the old context, change the precision, then restore the old context
old_context = getcontext().copy()
getcontext().prec = 50
print Decimal(355) / Decimal(113)
setcontext(old_context)
```

__Better__

```python
from decimal import Context, Decimal, localcontext

with localcontext(Context(prec=50)):
    print Decimal(355) / Decimal(113)
```

## How to open and close files

```python
f = open('data.txt')
try:
    data = f.read()
finally:
    f.close()
```

__Better__

```python
with open('data.txt') as f:
    data = f.read()
```

## How to use locks

```python
import threading

# Make a lock
lock = threading.Lock()

# Old-way to use a lock
lock.acquire()
try:
    print 'Critical section 1'
    print 'Critical section 2'
finally:
    lock.release()
```

__Better__

```python
# New-way to use a lock
with lock:
    print 'Critical section 1'
    print 'Critical section 2'
```

## Factor-out temporary contexts

```python
import os

try:
    os.remove('somefile.tmp')
except OSError:
    pass
```

__Better__

```python
with ignored(OSError):
    os.remove('somefile.tmp')
```

`ignored` was new in Python 3.4, [documentation](http://docs.python.org/dev/library/contextlib.html#contextlib.ignored).
Note: `ignored` is actually called `suppress` in the standard library (`contextlib.suppress`).
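
A short sketch of the standard-library version, assuming Python 3.4+; the temporary-directory path is chosen only for illustration:

```python
import os
import tempfile
from contextlib import suppress

path = os.path.join(tempfile.gettempdir(), 'somefile.tmp')  # illustrative path

with suppress(OSError):
    os.remove(path)          # a missing file raises FileNotFoundError, an OSError

removed_or_absent = not os.path.exists(path)
```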

To make your own `ignored` context manager in the meantime:

```python
from contextlib import contextmanager

@contextmanager
def ignored(*exceptions):
    try:
        yield
    except exceptions:
        pass
```

> Stick that in your utils directory and you too can ignore exceptions

## Factor-out temporary contexts

```python
import sys

# Temporarily redirect standard out to a file and then return it to normal
with open('help.txt', 'w') as f:
    oldstdout = sys.stdout
    sys.stdout = f
    try:
        help(pow)
    finally:
        sys.stdout = oldstdout
```

__Better__

```python
with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        help(pow)
```

`redirect_stdout` was added in Python 3.4 as `contextlib.redirect_stdout`, [bug report](http://bugs.python.org/issue15805).

To roll your own `redirect_stdout` context manager:

```python
import sys
from contextlib import contextmanager

@contextmanager
def redirect_stdout(fileobj):
    oldstdout = sys.stdout
    sys.stdout = fileobj
    try:
        yield fileobj
    finally:
        sys.stdout = oldstdout
```
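
A quick sketch of the standard-library version, assuming Python 3.4+, capturing output in an in-memory buffer instead of a file:

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print('captured')        # goes to buffer, not the terminal

text = buffer.getvalue()
```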

## Concise Expressive One-Liners
Two conflicting rules:

 * Don't put too much on one line
 * Don't break atoms of thought into subatomic particles

Raymond's rule:

 * One logical line of code equals one sentence in English

## List Comprehensions and Generator Expressions

```python
result = []
for i in range(10):
    s = i ** 2
    result.append(s)
print sum(result)
```

__Better__

```python
print sum(i**2 for i in xrange(10))
```

The first way tells you what to do; the second way tells you what you want.
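
The same comparison in Python 3 syntax, as a small sketch: a list comprehension materializes every value, while a generator expression hands values to `sum` one at a time.

```python
squares_list = [i ** 2 for i in range(10)]   # builds the whole list first
total_from_list = sum(squares_list)

total = sum(i ** 2 for i in range(10))       # no intermediate list
```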

# Names and Values Assignments

Assignment makes the name on the left side refer to the value on the right side.

## Assignment

- __Assignment__: make this _name_ refer to that _value_
    - `x = 23`:  the name “x” refers to the value 23 or x is bound to 23.
- An assignment statement can make a second (or third, ...) name refer to the same value.
    - `y = x`: Now x and y both refer to the same _value_; _y doesn't refer to x, but 23_
    - two names, but only one value.
- If two names refer to the same value, this doesn’t magically link the two names. 
    - `x = 12`: Reassigning x leaves y alone.
-  __No__ mechanism for making a name refer to another name.
- Assigning a value to a name __never__ copies the data, it __never__ makes a new value. 
    - When working with `list`, assignment doesn't turn list into two lists
- Anything that can appear on the left-hand side of an assignment statement is a reference
- __Call by assignment__

## Mutable aliasing
- A mutable with more than one name: when the value changes, all names see the change.
- Numbers, strings, and tuples are all __immutable__.
- The aliasing problem can’t happen with immutable values; you can’t change the value in place.
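
A minimal sketch of the difference, assuming Python 3: mutating a shared list is visible through every name, while rebinding a name that refers to an int leaves the other name alone.

```Python
nums = [1, 2, 3]
other = nums             # second name for the same list
other.append(4)          # mutation is visible through both names

x = 23
y = x                    # second name for the same int
x = x + 1                # rebinding x; y still refers to 23
```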

## Change
- Changing an int: __rebinding__
     - `x = x + 1`
     
- Changing a list: __mutating__
     - `nums.append(7)`
     - `nums += other_list`
     
- Can also __rebind__ lists:
    - `nums = nums + [7]`
    - the + operator here makes an entirely new list, and then the name nums is rebound to it.
    
- Can't mutate an int: ints are __immutable__
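
A short sketch contrasting the two list cases, assuming Python 3; `alias` is an extra name added only to make the effect observable:

```Python
nums = [1, 2, 3]
alias = nums

nums += [4]                      # mutates in place; alias sees the change
same_object = nums is alias

nums = nums + [5]                # builds a new list and rebinds nums
rebound = nums is not alias
```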

## Assignment Variants

```Python
x += y
x = x + y          # conceptually
x = x.__iadd__(y)  # actually
x = type(x).__iadd__(x, y)

# pseudo-code
class List:
    def __iadd__(self, other):
        self.extend(other)
        return self
    
# These are the same:
num_list += [4, 5]
num_list.extend([4, 5]);   num_list = num_list
```

- Compound data structures hold references to values:
    - list elements, dictionary keys and values, object attributes, and so on.
    - Each of these left-hand sides is a reference:
```Python
my_obj.attr = 23
my_dict[key] = 24
my_list[index] = 25
my_obj.attr[key][index].attr = "etc, etc"
```

- “i = x” assigns to the name i, but “i[0] = x” assigns to the first element of i’s value
- More ways to assign to the name X. These statements __are__ assignments, not just similar to them:
```Python
X = ...
for X in ...
[... for X in ...]
(... for X in ...)
{... for X in ...}
class X(...):
def X(...):
def fn(X): ... ; fn(12)
with ... as X:
except ... as X:
import X
from ... import X
import ... as X
from ... import ... as X
```

```Python
# Multiply them all by 10
nums = [1, 2, 3]
for x in nums:          # x = nums[0] ...
    x = x * 10
print(nums)             # [1, 2, 3]   :(
```

- Our loop never modifies the original list because we are simply reassigning the name x over and over again.
- The best advice is to _avoid mutating lists_, and instead to make new lists:
    - `nums = [ 10*x for x in nums ]`

## Function arguments
- Function arguments are assignments
- A default argument value belongs to the function definition: it is created once, when `def` runs, and it is not destroyed after the function returns. Don't use a mutable value (such as a list) as a default argument.

```Python
def append_twice(a_list, val):
    '''Mutates argument'''
    a_list.append(val)
    a_list.append(val)

def append_twice_bad(a_list, val):
    '''useless!'''
    a_list = a_list + [val, val]

def append_twice_good(a_list, val):
    '''Returns new list'''
    a_list = a_list + [val, val]
    return a_list
```
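
The default-argument pitfall mentioned above can be sketched as follows, assuming Python 3; `add_item_bad` and `add_item_good` are hypothetical names:

```Python
def add_item_bad(item, items=[]):
    # The default list is created once, when the function is defined
    items.append(item)
    return items

def add_item_good(item, items=None):
    # Create a fresh list on each call when none is supplied
    if items is None:
        items = []
    items.append(item)
    return items

first = add_item_bad('a')
second = add_item_bad('b')       # same list object as first

third = add_item_good('a')
fourth = add_item_good('b')      # independent list
```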


## Any name -> any value @ any time
Dynamic typing
- names have no type, only value holds type
- values have no scope
    - a function has a local var -> the name is scoped to the function
    - when the function returns, the name is destroyed
    - but if the name’s value has other references, it will live on beyond the function call
    - It is a local name, not a local value

```Python
# Making a 2D list
board = [[0] * 8] * 8                # bad
board = [[0] * 8 for _ in range(8)]  # good
```
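
A smaller sketch of why the first form is bad, assuming Python 3: the outer `*` copies references to one inner list, so writing to one row changes them all.

```Python
bad = [[0] * 3] * 3                  # three references to one inner list
bad[0][0] = 1

good = [[0] * 3 for _ in range(3)]   # three independent inner lists
good[0][0] = 1

rows_shared = bad[0] is bad[1]
```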

# Concurrency
https://pybay.com/site_media/slides/raymond2017-keynote/intro.html

## Introduction
### Our Goal
Walk through two examples of threading and multiprocessing to illustrate rules and best practices for taking advantage of concurrency.

### Why Concurrency?
1. Improve perceived responsiveness
1. Improve speed
1. Because that is how the real world works

### Martelli Model of Scalability
1. 1 core: Single thread and single process
1. 2-8 cores: Multiple threads and multiple processes
1. 9+ cores: Distributed processing

Martelli’s observation: As time goes on, the second category becomes less common and relevant. Single cores become more powerful. Big datasets grow ever larger.

### Global Interpreter Lock
CPython has a lock for its internal shared global state.

The unfortunate effect of the GIL is that no more than one thread can run Python bytecode at a time.

For I/O bound applications, the GIL doesn’t present much of an issue. For CPU bound applications, using threading makes the application speed worse. Accordingly, that drives us to multi-processing to gain more CPU cycles.


### Threads vs Processes
“Your weakness is your strength and your strength is your weakness”.

The strength of threads is __shared state__. The weakness of threads is shared state (managing __race conditions__).

The strength of processes is their __independence__ from one another. The weakness of processes is lack of communication (hence the need for IPC and object pickling and other overhead).

### Threads vs Async
#### Threads
Threads switch __preemptively__. This is convenient because you don’t need to add explicit code to cause a task switch.

The cost of this convenience is that you have to __assume__ a switch can happen at any time. Accordingly, __critical sections__ have to be guarded with locks.

The limit on threads is total CPU power minus the cost of task switches and synchronization overhead.

#### Async
Async switches __cooperatively__, so you do need to add explicit code (`yield` or `await`) to cause a task switch.

Now you control when task switches occur, so locks and other synchronization are no longer needed.

Also, the cost of task switches is very low. Calling a pure Python function has more overhead than restarting a generator or awaitable.

This means that async is very __cheap__.

In return, you’ll need a __non-blocking version__ of just about everything you do. Accordingly, the async world has a huge ecosystem of support tools. This increases the learning curve.
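
A minimal sketch of cooperative switching, assuming Python 3.7+ and the standard `asyncio` module; `task`, `main`, and `log` are hypothetical names. Each `await asyncio.sleep(0)` is an explicit point where the running task hands control back to the event loop.

```Python
import asyncio

log = []

async def task(name, steps):
    for i in range(steps):
        log.append((name, i))
        await asyncio.sleep(0)   # explicit switch point: yield to the event loop

async def main():
    # Run both tasks concurrently on one thread
    await asyncio.gather(task('a', 2), task('b', 2))

asyncio.run(main())
```

Between two `await` points a task cannot be interrupted by another task, which is why no lock is needed around `log.append`.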

### Comparison
- Async maximizes CPU utilization because it has less overhead than threads.
- Threading typically works with existing code and tools as long as locks are added around critical sections.
- For complex systems, async is much easier to get right than threads with locks.
- Threads require very little tooling (locks and queues).
- Async needs a great deal of tooling (futures, event loops, and non-blocking versions of just about everything).

## Two Simple Examples
Updating and printing a counter:

```Python
counter = 0

print('Starting up')
for i in range(10):
    counter += 1
    print('The count is %d' % counter)
    print('---------------')
print('Finishing up')
```

Displaying the homepage sizes for multiple websites:

```Python
import urllib.request

sites = [
    'https://www.yahoo.com/',
    'http://www.cnn.com',
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.perl.org',
    'http://www.cisco.com',
    'http://www.facebook.com',
    'http://www.twitter.com',
    'http://www.macrumors.com/',
    'http://arstechnica.com/',
    'http://www.reuters.com/',
    'http://abcnews.go.com/',
    'http://www.cnbc.com/',
]

for url in sites:
    with urllib.request.urlopen(url) as u:
        page = u.read()
        print(url, len(page))
```

## Threading Example

### Scripting style
Start with working code that is clear, simple, and runs top to bottom. This is easy to develop and test incrementally.

```Python
counter = 0

print('Starting up')
for i in range(10):
    counter += 1
    print('The count is %d' % counter)
    print('---------------')
print('Finishing up')
```

That gives us the obvious output:

```
Starting up
The count is 1
---------------
The count is 2
---------------
The count is 3
---------------
The count is 4
---------------
The count is 5
---------------
The count is 6
---------------
The count is 7
---------------
The count is 8
---------------
The count is 9
---------------
The count is 10
---------------
Finishing up
```

> __Note__: Get your app tested and debugged in a single-threaded mode first before you start threading. Threading NEVER makes debugging easier.

### Function style
A next step in development is to factor re-usable code into functions.

```Python
counter = 0

def worker():
    'My job is to increment the counter and print the current count'
    global counter

    counter += 1
    print('The count is %d' % counter)
    print('---------------')

print('Starting up')
for i in range(10):
    worker()
print('Finishing up')
```

### Multi-threading is easy!
It is just a matter of launching a few worker threads.

```Python
import threading

counter = 0

def worker():
    'My job is to increment the counter and print the current count'
    global counter

    counter += 1
    print('The count is %d' % counter)
    print('---------------')

print('Starting up')
for i in range(10):
    threading.Thread(target=worker).start()
print('Finishing up')
```

### Testing proves the code is correct!
A simple test run compares perfectly to the original output:

```
$ python3.6 threading_multi1.py
Starting up
The count is 1
---------------
The count is 2
---------------
The count is 3
---------------
The count is 4
---------------
The count is 5
---------------
The count is 6
---------------
The count is 7
---------------
The count is 8
---------------
The count is 9
---------------
The count is 10
---------------
Finishing up
```

### Can you spot the race conditions?
Most people spot the “counter increment” race condition, but most don’t immediately see the “print function” race condition.

> __Note__: Testing cannot prove the absence of errors. It is still useful, but don’t rely on it alone. Many interesting race conditions don’t reveal themselves in test environments. You need proof.

Why didn’t testing reveal the flaws? The window in which the threads can interfere is so short that the problem rarely shows up.

What can we do to improve the effectiveness of testing?

### Fuzzing
Fuzzing is a technique for amplifying race conditions: insert random amounts of sleep between steps so that thread switches land at awkward moments.

```Python
import threading, time, random

##########################################################################################
# Fuzzing is a technique for amplifying race condition errors to make them more visible

FUZZ = True

def fuzz():
    if FUZZ:
        time.sleep(random.random())

###########################################################################################

counter = 0

def worker():
    'My job is to increment the counter and print the current count'
    global counter

    fuzz()
    oldcnt = counter
    fuzz()
    counter = oldcnt + 1
    fuzz()
    print('The count is %d' % counter, end='')
    fuzz()
    print()
    fuzz()
    print('---------------', end='')
    fuzz()
    print()
    fuzz()

print('Starting up')
fuzz()
for i in range(10):
    threading.Thread(target=worker).start()
    fuzz()
print('Finishing up')
fuzz()
```

This technique is limited to relatively small blocks of code and is imperfect in that it can’t prove the absence of errors.

Still, fuzzed tests do reveal the presence of errors:

```
Starting up
The count is 1The count is 2The count is 2The count is 2


---------------The count is 3
---------------The count is 4
---------------
---------------The count is 4


The count is 5------------------------------
Finishing up


The count is 5
------------------------------



The count is 6---------------
---------------
```

### David Baron at Mozilla’s San Francisco office
    "Must be this tall to write multi-threaded code"

### More Careful Threading with Queues
Interestingly, the rules for threading are not just for computing and programming. The physical world is full of concurrency as well. Many of these techniques have physical analogs that are useful for managing people and projects.

> #### Note: RR 1000
>
> ALL shared resources SHALL be run in EXACTLY ONE thread. ALL communication with that thread SHALL be done using an atomic message queue: typically the `queue` module, email, or message queues like RabbitMQ or ZeroMQ. Interestingly, you can communicate via a database as well.
>
> Resources that need this technique: global variables, user input, output devices, files, sockets, etc.
>
> Some resources that already have locks inside (thread-safe): logging module, decimal module (thread local variables), databases (reader locks and writer locks), email (this is an atomic message queue).



> #### Note: RR 1001
> 
> One category of sequencing problems is to make sure that step A and step B happen sequentially. The solution is to put both in the same thread where all actions proceed sequentially.

> #### Note: RR 1002
> 
> To implement a “barrier” that waits for parallel threads to complete, just `join()` all of the threads.

> #### Note: RR 1003
> 
> You can’t _wait_ on daemon threads to complete (they are infinite loops). Instead, you `join()` on the queue itself. It waits until all the requested tasks are marked as being done.

> #### Note: RR 1004
> 
> Sometimes you need a _global variable_ to communicate between functions. Global variables work great for this purpose in a single threaded program. In multi-threaded code, the mutable global state is a __disaster__. The better solution is to use a `threading.local()` that is __global WITHIN a thread__ but not without.
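
A small sketch of `threading.local()`, assuming Python 3; `context`, `results`, and `worker` are hypothetical names. Each thread sees its own `context.user`, and the results travel back through a queue rather than through shared state.

```Python
import queue
import threading

context = threading.local()      # attributes set here are private to each thread
results = queue.Queue()          # thread-safe channel back to the main thread

def worker(name):
    context.user = name          # does not clash with other threads
    results.put((name, context.user))

threads = [threading.Thread(target=worker, args=(n,)) for n in ('a', 'b')]
for t in threads:
    t.start()
for t in threads:
    t.join()

collected = sorted(results.get() for _ in threads)
main_has_user = hasattr(context, 'user')   # the main thread never set it
```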

> #### Note: RR 1005
> 
> Never try to kill a thread from something external to that thread. You never know if that thread is holding a lock. Python doesn’t provide a direct mechanism for killing threads externally; you can do it using ctypes, but that is a recipe for a deadlock.
>
> If you want a thread to die, make it periodically check a message queue or a global flag, then release its own locks and exit gracefully.
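
One way to sketch the graceful shutdown described above, assuming Python 3: the worker reads from a queue and ends itself when it receives a sentinel. The `STOP` sentinel, `tasks`, and `done` are hypothetical names.

```Python
import queue
import threading

tasks = queue.Queue()
done = []                # written only by the worker thread

STOP = None              # hypothetical sentinel meaning "please exit"

def worker():
    while True:
        item = tasks.get()
        if item is STOP:
            tasks.task_done()
            break                # the thread ends itself
        done.append(item * 2)
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for n in (1, 2, 3):
    tasks.put(n)
tasks.put(STOP)

tasks.join()             # wait until every queued item is marked done
t.join()                 # then wait for the thread itself to finish
still_alive = t.is_alive()
```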

### Applying all the rules

```Python
import threading, time, random, queue

##########################################################################################
# Fuzzing is a technique for amplifying race condition errors to make them more visible

FUZZ = True

def fuzz():
    if FUZZ:
        time.sleep(random.random())

###########################################################################################

counter = 0

counter_queue = queue.Queue()

def counter_manager():
    'I have EXCLUSIVE rights to update the counter variable'
    global counter

    while True:
        increment = counter_queue.get()
        fuzz()
        oldcnt = counter
        fuzz()
        counter = oldcnt + increment
        fuzz()
        print_queue.put([
            'The count is %d' % counter,
            '---------------'])
        fuzz()
        counter_queue.task_done()

t = threading.Thread(target=counter_manager)
t.daemon = True
t.start()
del t

###########################################################################################

print_queue = queue.Queue()

def print_manager():
    'I have EXCLUSIVE rights to call the "print" keyword'
    while True:
        job = print_queue.get()
        fuzz()
        for line in job:
            print(line, end='')
            fuzz()
            print()
            fuzz()
        print_queue.task_done()
        fuzz()

t = threading.Thread(target=print_manager)
t.daemon = True
t.start()
del t

###########################################################################################

def worker():
    'My job is to increment the counter and print the current count'
    counter_queue.put(1)
    fuzz()

print_queue.put(['Starting up'])
fuzz()

worker_threads = []
for i in range(10):
    t = threading.Thread(target=worker)
    worker_threads.append(t)
    t.start()
    fuzz()
for t in worker_threads:
    fuzz()
    t.join()

counter_queue.join()
fuzz()
print_queue.put(['Finishing up'])
fuzz()
print_queue.join()
fuzz()
```

### Cleaned-up code without fuzzing

```Python
import threading, queue

###########################################################################################

counter = 0

counter_queue = queue.Queue()

def counter_manager():
    'I have EXCLUSIVE rights to update the counter variable'
    global counter

    while True:
        increment = counter_queue.get()
        counter += increment
        print_queue.put([
            'The count is %d' % counter,
            '---------------'])
        counter_queue.task_done()

t = threading.Thread(target=counter_manager)
t.daemon = True
t.start()
del t

###########################################################################################

print_queue = queue.Queue()

def print_manager():
    'I have EXCLUSIVE rights to call the "print" keyword'
    while True:
        job = print_queue.get()
        for line in job:
            print(line)
        print_queue.task_done()

t = threading.Thread(target=print_manager)
t.daemon = True
t.start()
del t

###########################################################################################

def worker():
    'My job is to increment the counter and print the current count'
    counter_queue.put(1)

print_queue.put(['Starting up'])
worker_threads = []
for i in range(10):
    t = threading.Thread(target=worker)
    worker_threads.append(t)
    t.start()
for t in worker_threads:
    t.join()

counter_queue.join()
print_queue.put(['Finishing up'])
print_queue.join()
```

### Careful Threading with locks
- Locks are hard to use

```Python
import threading, time, random

##########################################################################################
# Fuzzing is a technique for amplifying race condition errors to make them more visible

FUZZ = True

def fuzz():
    if FUZZ:
        time.sleep(random.random())

###########################################################################################

counter_lock = threading.Lock()
printer_lock = threading.Lock()

counter = 0

def worker():
    'My job is to increment the counter and print the current count'
    global counter
    with counter_lock:
        oldcnt = counter
        fuzz()
        counter = oldcnt + 1
        fuzz()
        with printer_lock:
            print('The count is %d' % counter, end='')
            fuzz()
            print()
            fuzz()
            print('---------------', end='')
            fuzz()
            print()
        fuzz()

with printer_lock:
    print('Starting up', end='')
    fuzz()
    print()
fuzz()

worker_threads = []
for i in range(10):
    t = threading.Thread(target=worker)
    worker_threads.append(t)
    t.start()
    fuzz()
for t in worker_threads:
    t.join()
    fuzz()

with printer_lock:
    print('Finishing up', end='')
    fuzz()
    print()

fuzz()
```

Let’s see how well it runs:

```
Starting up
The count is 1
---------------
The count is 2
---------------
The count is 3
---------------
The count is 4
---------------
The count is 5
---------------
The count is 6
---------------
The count is 7
---------------
The count is 8
---------------
The count is 9
---------------
The count is 10
---------------
Finishing up
```

It is perfect!

### Cleaned-up code without fuzzing

Now, let’s clean it up:

```Python
import threading

counter_lock = threading.Lock()
printer_lock = threading.Lock()

counter = 0

def worker():
    'My job is to increment the counter and print the current count'
    global counter
    with counter_lock:
        counter += 1
        with printer_lock:
            print('The count is %d' % counter)
            print('---------------')

with printer_lock:
    print('Starting up')

worker_threads = []
for i in range(10):
    t = threading.Thread(target=worker)
    worker_threads.append(t)
    t.start()
for t in worker_threads:
    t.join()

with printer_lock:
    print('Finishing up')
```

Results:

- It is perfect!
- It is beautiful.
- It is simpler than using queues.

### Notes on Locks
>#### Note: RR 1005
>
>Locks don’t lock anything. They are just flags and can be ignored. It is a cooperative tool, not an enforced tool.
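
A minimal sketch of that point, assuming Python 3: a thread that never acquires the lock is not stopped by it. `rude` and `shared` are hypothetical names.

```Python
import threading

lock = threading.Lock()
shared = []

def rude():
    shared.append('rude')    # ignores the lock; nothing stops it

with lock:                   # the main thread holds the lock
    t = threading.Thread(target=rude)
    t.start()
    t.join()                 # rude finishes even though the lock is held
    snapshot = list(shared)
```

The lock protects data only when every piece of code that touches the data agrees to acquire it first.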

>#### Note: RR 1006
>
>In general, locks should be considered a low level primitive that is difficult to reason about in non-trivial examples. For more complex applications, you’re almost always better off with using atomic message queues.

>#### Note: RR 1007
>
>The more locks you acquire at one time, the more you lose the advantages of concurrency.

### Dining Philosophers
The rules given above help you reliably create multi-threaded code if the underlying data flow is a DAG (directed acyclic graph).

When the control flow or data flow is circular, the problem can be much harder. At that point, more formal design and verification techniques are warranted. Otherwise, it can be quite difficult in complex applications to avoid deadlock, thread starvation, or unfair solutions.
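
One classic mitigation for the circular case, not taken from the notes above, is to impose a global ordering on the locks so that no cycle of waiting threads can form. The sketch below assumes five philosophers and a hypothetical `meals` list for bookkeeping. It addresses deadlock only and says nothing about fairness or starvation:

```python
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]
meals = [0] * N                   # meals[i] is only written by philosopher i

def philosopher(i, rounds=100):
    left, right = i, (i + 1) % N
    # Always pick up the lower-numbered fork first; this breaks the cycle
    first, second = sorted((left, right))
    for _ in range(rounds):
        with forks[first]:
            with forks[second]:
                meals[i] += 1

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If every philosopher instead grabbed the left fork first, all five could hold one fork each and wait forever for the other.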

## Multi-processing Example
### Scripting style
Start with working code that is clear, simple, and runs top to bottom. This is easy to develop and test incrementally.

```Python
import urllib.request

sites = [
    'https://www.yahoo.com/',
    'http://www.cnn.com',
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.perl.org',
    'http://www.cisco.com',
    'http://www.facebook.com',
    'http://www.twitter.com',
    'http://www.macrumors.com/',
    'http://arstechnica.com/',
    'http://www.reuters.com/',
    'http://abcnews.go.com/',
    'http://www.cnbc.com/',
]

for url in sites:
    with urllib.request.urlopen(url) as u:
        page = u.read()
        print(url, len(page))
```

### Function style

```Python
import urllib.request

sites = [
    'https://www.yahoo.com/',
    'http://www.cnn.com',
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.perl.org',
    'http://www.cisco.com',
    'http://www.facebook.com',
    'http://www.twitter.com',
    'http://www.macrumors.com/',
    'http://arstechnica.com/',
    'http://www.reuters.com/',
    'http://abcnews.go.com/',
    'http://www.cnbc.com/',
]

def sitesize(url):
    ''' Determine the size of a website '''
    with urllib.request.urlopen(url) as u:
        page = u.read()
        return url, len(page)

for result in map(sitesize, sites):
    print(result)
```

> __Note__: A good development strategy is to use map to test the code in a single process and single thread mode before switching to multi-processing.
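
A small sketch of that strategy, using a network-free stand-in function (`word_length` and `words` are made up for illustration). The same function is first run through the built-in `map`, then through a pool's `map`, and the results should agree:

```python
from multiprocessing.pool import ThreadPool

def word_length(word):
    'Stand-in for sitesize(): returns its argument together with the result'
    return word, len(word)

words = ['lawn', 'mowing', 'baby']

# Step 1: single process, single thread; easy to debug
serial = list(map(word_length, words))

# Step 2: same function, same inputs, now spread across a pool
with ThreadPool(3) as pool:
    pooled = pool.map(word_length, words)
```

Only the call site changes, so any difference in behavior points at the concurrency layer rather than the business logic.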

### What is parallelizable?
A __key pattern__ of thinking is to divide the world into “lawn mowing” versus “baby making”: identifying tasks that are significantly parallelizable versus those that are intrinsically serial.

Amdahl’s Law (according to wikipedia):

> Amdahl’s law is often used in parallel computing to predict the theoretical speedup when using multiple processors. 
>
> For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this reason parallel computing is relevant only for a low number of processors and very parallelizable programs.
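
The figure in the quote follows from the usual statement of Amdahl’s law, speedup = 1 / ((1 − p) + p / n), where p is the parallelizable fraction and n the number of processors. A short sketch (the helper name `amdahl_speedup` is invented here) makes the ceiling visible:

```python
def amdahl_speedup(p, n):
    'Theoretical speedup for parallel fraction p on n processors'
    return 1.0 / ((1.0 - p) + p / n)

# With p = 0.95 the speedup approaches, but never reaches, 1 / (1 - p) = 20
for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Adding processors gives diminishing returns because the serial fraction (1 − p) is unaffected by n.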


Detailed example:

```Python
def sitesize(url):
    ''' Determine the size of a website

    This is non-parallelizable:
    * UDP DNS request for the url
    * UDP DNS response
    * Acquire socket from the OS
    * TCP Connection:  SYN, SYN/ACK, ACK
    * Send HTTP Request for the root resource
    * Wait for the TCP response which is broken-up
      into packets.
    * Count the characters of the webpage

    This is a bit parallelizable:
    Do ten times in parallel (channel bonding):
        1) DNS lookup (UDP request and resp)
        2) Acquire the socket
        3) Send HTTP range requests
        4) The sections come back in parallel
           across different pieces of fiber.
           http://stackoverflow.com/questions/8293687/sample-http-range-request-session
        5) Count the characters for a single
           block as received.
    Add up the 10 results!

    '''
    u = urllib.request.urlopen(url)
    page = u.read()
    return url, len(page)
```

### Pools of processes

```Python
import urllib.request
from multiprocessing.pool import ThreadPool as Pool
# from multiprocessing.pool import Pool

sites = [
    'https://www.yahoo.com/',
    'http://www.cnn.com',
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.perl.org',
    'http://www.cisco.com',
    'http://www.facebook.com',
    'http://www.twitter.com',
    'http://www.macrumors.com/',
    'http://arstechnica.com/',
    'http://www.reuters.com/',
    'http://abcnews.go.com/',
    'http://www.cnbc.com/',
]

def sitesize(url):
    ''' Determine the size of a website '''
    with urllib.request.urlopen(url) as u:
        page = u.read()
        return url, len(page)

pool = Pool(10)
for result in pool.imap_unordered(sitesize, sites):
    print(result)
```

> __Note__: `imap_unordered` is used to improve responsiveness: each result is yielded as soon as it is ready rather than in input order.

> __Note__: The use of imap_unordered is made possible by designing the function to return both its argument and its result as a tuple.
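
A minimal sketch of why returning the argument matters, using a made-up `slow_square` function whose delays are chosen so that results tend to arrive out of input order. Because each result carries its own argument, the pairs can be matched up regardless of arrival order:

```python
import time
from multiprocessing.pool import ThreadPool

def slow_square(x):
    'Returns (argument, result) so the caller can tell results apart'
    time.sleep(0.05 * (3 - x))    # larger inputs finish sooner
    return x, x * x

with ThreadPool(3) as pool:
    results = list(pool.imap_unordered(slow_square, [0, 1, 2]))

squares = dict(results)           # arrival order no longer matters
```

Had `slow_square` returned only `x * x`, there would be no way to know which input produced which output.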

### Hazards of thin channel communication (SQL Version)
> __Note__: Don’t make too many trips back and forth

> __Note__: Do significant work on each trip

> __Note__: Don’t send or receive a lot of data

```Python
import collections

# `c` is assumed to be an open DB-API cursor (for example, from sqlite3)

###########################################################################################
# Bringing too much back and not doing enough work while you are there

summary = collections.Counter()
for employee, dept, salary in c.execute('SELECT employee, dept, salary FROM Employee'):
    summary[dept] += salary


###########################################################################################
# Too many trips back and forth

summary = dict()
# Fetch the department names first so the cursor can be reused inside the loop
depts = [row[0] for row in c.execute('SELECT DISTINCT dept FROM Employee')]
for dept in depts:
    c.execute('SELECT SUM(salary) FROM Employee WHERE dept = ?', (dept,))
    summary[dept] = c.fetchone()[0]


###########################################################################################
# The right way is one trip where a lot of work gets done and only a summary result is returned

summary = dict(c.execute('SELECT dept, SUM(salary) FROM Employee GROUP BY dept'))
```
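
To try the one-trip version end to end, here is a self-contained sketch using an in-memory `sqlite3` database. The table contents are invented sample data, and the `?` placeholder style is specific to `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE Employee (employee TEXT, dept TEXT, salary INTEGER)')
c.executemany(
    'INSERT INTO Employee VALUES (?, ?, ?)',
    [('ann', 'eng', 100), ('bob', 'eng', 150), ('cat', 'ops', 90)],  # sample rows
)

# One round trip: the database does the aggregation and returns one row per dept
summary = dict(c.execute('SELECT dept, SUM(salary) FROM Employee GROUP BY dept'))
conn.close()
```

Each returned row is a `(dept, total)` pair, which is why the cursor can be passed straight to `dict()`.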

### Performance hazards for multi-processing

```Python
###########################################################################################
# Too many trips back and forth
# If the input iterable to map is very large, it suggests you're making too many trips

def sitesize(url, start):
    req = urllib.request.Request(url)
    req.add_header('Range', 'bytes=%d-%d' % (start, start+1000))
    u = urllib.request.urlopen(req)
    block = u.read()
    return url, len(block)


###########################################################################################
# Not doing enough work relative to the travel time
# Once you get to a process, be sure to do enough work to make the trip worthwhile

def sitesize(url, results):
    with urllib.request.urlopen(url) as u:
        while True:
            line = u.readline()
            if not line:          # end of page: stop instead of looping forever
                break
            results.put((url, len(line)))


###########################################################################################
# Taking too much with you or bringing too much back

def sitesize(url):
    u = urllib.request.urlopen(url)
    page = u.read()
    return url, page
```

### Other Multi-processing notes
> __Note__: Never run a multi-processing example from within an IDE that runs in the same process as the code you are developing. Otherwise, the forking step will fork the IDE itself as well as your code.

> __Note__: When partitioning into subtasks, a common challenge is how to handle data at the boundaries of the partition.

> __Note__: Setting the number of processes is a bit of an art. If the code is CPU bound, the number of cores times two is a reasonable starting point. If the code is IO bound, the number of processes can be much higher than the number of cores. Experimentation is the key.

## Combining Threading and Forking

### General Comments
"There are those who believe that if you mix threading and forking that you are living in a state of sin and deserve whatever happens to you." - Tim Peters

### Recipe for Deadlocks
This code was submitted to the bug tracker last year in http://bugs.python.org/issue27422 :

```Python
#!/usr/bin/env python3
# coding:utf8

import sys
import multiprocessing
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run(arg):
    print("starting %s" % arg)
    p = multiprocessing.Process(target=print, args=("running", arg))
    p.start()
    p.join()
    print("finished %s" % arg)


if __name__ == "__main__":
    n = 16
    tests = range(n)
    with ThreadPoolExecutor(n) as pool:
        for r in pool.map(run, tests):
            pass
```

> __Note__: The general rule is “__thread after you fork not before__”. Otherwise, the locks used by the thread executor will get duplicated across processes. If one of those processes dies while it has the lock, all of the other processes using that lock will deadlock.

## Async Example
This example shows how to build a performant non-blocking server from scratch, how to isolate the user’s business logic in callbacks, how to write the callback logic in-line with generators, and how to schedule timed events.

Focus on User's business logic:

```Python
import socket, time, types, select
from collections import namedtuple
from heapq import heappush, heappop

######### Reactor ####################################################################

ScheduledEvent = namedtuple('ScheduledEvent', ['event_time', 'task'])
Session = namedtuple('Session', ['address', 'file'])

events = []                   # heap with events prioritized by earliest time
sessions = {}                 # { csocket : Session(address, file)}
callback = {}                 # { csocket : callback(client, line) }
generators = {}               # { csocket : inline callback generator}

def reactor(host='localhost', port=9600):
    'Main event loop that triggers the appropriate business logic callbacks'
    s = socket.socket()
    s.bind((host, port))
    s.listen(5)
    s.setblocking(0)          # Make asynchronous.  Never wait on a client socket.
    sessions[s] = None
    print('Server up, running, and waiting for call on %s %s' % (host, port))
    try:
        while True:
            # Serve existing clients BUT only if they already have data ready
            ready_to_read, _, _ = select.select(sessions, [], [], 0.1)
            for c in ready_to_read:
                if c is s:
                    c, a = c.accept()
                    connect(c, a)
                    continue
                line = sessions[c].file.readline()
                if line:
                    callback[c](c, line.rstrip())
                else:
                    disconnect(c)

            # Run events scheduled at the appropriate event time
            while events and events[0].event_time <= time.monotonic():
                event = heappop(events)
                event.task()
    finally:
        s.close()

def connect(c, a):
    'Reactor logic for new connections'
    sessions[c] = Session(a, c.makefile())
    on_connect(c)                            # call into user's business logic

def disconnect(c):
    'Reactor logic to end sessions'
    on_disconnect(c)                         # call into user's business logic
    sessions[c].file.close()
    c.close()
    del sessions[c]
    del callback[c]

def add_task(event_time, task):
    'Helper function to schedule one-time tasks at specific time'
    heappush(events, ScheduledEvent(event_time, task))

def call_later(delay, task):
    'Helper function to schedule one-time tasks after a given delay'
    add_task(time.monotonic() + delay, task)   # same clock the reactor loop compares against

def call_periodic(delay, interval, task):
    'Helper function to schedule recurring tasks'
    def inner():
        task()
        call_later(interval, inner)
    call_later(delay, inner)


def on_connect(c):
    g = nbcaser(c)          # 'g' is a coroutine
    generators[c] = g       # generators -> awaitables
    callback[c] = g.send(None)  # we do this to advance `nbcaser` coroutine
                                # to yield through the 'readline' coroutine
                                # which will sleep on its 'yield' expression

def on_disconnect(c):
    g = generators.pop(c)
    g.close()

@types.coroutine
def readline(c):
    'A non-blocking readline to use with two-way generators'
    def inner(c, line):
        g = generators[c]
        try:
            callback[c] = g.send(line)  # `g.send(line)` will resume the `yield inner` point
        except StopIteration:
            disconnect(c)
    line = yield inner
    return line

def sleep(c, delay):
    'A non-blocking sleep to use with two-way generators'
    def inner():
        g = generators[c]
        callback[c] = next(g)
    call_later(delay, inner)
    return lambda *args: callback[c]


######### User's Business Logic ######################################################

def announcement():
    print('The event loop is still running at:', time.ctime())

call_periodic(delay=1, interval=15, task=announcement)

async def nbcaser(c):
    upper, title = 'upper', 'title'
    mode = upper
    print("Received connection from", sessions[c].address)
    try:
        c.sendall(b'<welcome: starting in upper case mode>\n')
        while 1:
            line = await readline(c)  # Non-blocking version
            if line == 'quit':
                c.sendall(b'quit\r\n')
                return
            if mode is upper and line == 'title':
                c.sendall(b'<switching to title case mode>\r\n')
                mode = title
                continue
            if mode is title and line == 'upper':
                c.sendall(b'<switching to upper case mode>\r\n')
                mode = upper
                continue
            print(sessions[c].address, '-->', line)
            if mode is upper:
                c.sendall(b'Upper-cased: %a\r\n' % line.upper())
            else:
                c.sendall(b'Title-cased: %a\r\n' % line.title())
    finally:
        print(sessions[c].address, 'quit')


if __name__ == '__main__':
    reactor('localhost', 9600)
```

Credit: Yury Selivanov helped me convert this example from using yield and two-way generators to using coroutines and await.
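
The timed-event part of the reactor can be studied in isolation. The sketch below mirrors `call_later` and the "run events" loop using a heap keyed on `time.monotonic()`. The `tie_breaker` counter and the `fired` list are additions for this example: the counter keeps the heap from comparing two task functions when their times are equal.

```python
import itertools
import time
from heapq import heappush, heappop

events = []                       # heap of (event_time, sequence, task)
tie_breaker = itertools.count()   # assumption: not present in the reactor above
fired = []

def call_later(delay, task):
    'Schedule a one-time task after the given delay'
    heappush(events, (time.monotonic() + delay, next(tie_breaker), task))

def run_due_events():
    'Run every task whose scheduled time has passed, earliest first'
    while events and events[0][0] <= time.monotonic():
        _, _, task = heappop(events)
        task()

call_later(0.02, lambda: fired.append('second'))
call_later(0.01, lambda: fired.append('first'))
call_later(60, lambda: fired.append('never'))   # not due yet

time.sleep(0.05)
run_due_events()
```

The heap keeps the earliest event at index 0, so the loop only has to peek at one element to decide whether anything is due.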

# Async Miguel

## Comparison
### Multiple Processes
- The OS does all the multi-tasking work
- Only option for multi-core concurrency

### Multiple Threads
- The OS does all the multi-tasking work
- In CPython, the GIL prevents multi-core concurrency

### Asynchronous Programming
- No OS intervention
- One process, one thread

## Practical definition of Async
A style of concurrent programming in which tasks release the CPU during _waiting periods_, so that _other tasks_ can use it.

## How is Async Implemented?
### Suspend and Resume
- Async functions need the ability to _suspend_ and _resume_
- A function that enters a waiting period is suspended, and only resumed when the wait is over
- Four ways to implement suspend/resume in Python:
    1. Callback functions (nasty)
    1. Generator functions
    1. Async/await (Python 3.5+)
    1. Greenlets (requires greenlet package)
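
As a brief illustration of the async/await option, the sketch below runs two coroutines that suspend at `await asyncio.sleep(...)`. The names `waiter` and `order` and the delays are invented for this example, and `asyncio.run` requires Python 3.7 or later:

```python
import asyncio

order = []

async def waiter(name, delay):
    order.append(name + ' start')
    await asyncio.sleep(delay)    # suspend here; other tasks may run meanwhile
    order.append(name + ' done')

async def main():
    # Both coroutines are in their waiting period at the same time
    await asyncio.gather(waiter('slow', 0.2), waiter('fast', 0.05))

asyncio.run(main())
```

The slow task starts first but finishes last, because it released the CPU while waiting.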
    
### Scheduling Asynchronous Tasks
- Async frameworks need a scheduler, usually called "__Event Loop__"
- The loop keeps track of all the running tasks
- When a function is suspended, control returns to the loop, which then finds another function to start or resume
- This is called "cooperative multi-tasking"
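
The points above can be sketched with plain generator functions and a toy round-robin loop. This is an illustration only, not how any particular framework is implemented; `task`, `run` and `log` are hypothetical names:

```python
from collections import deque

log = []

def task(name, steps):
    for i in range(steps):
        log.append((name, i))
        yield                     # suspend: hand control back to the loop

def run(tasks):
    'Toy event loop: resume each ready task in turn until all are finished'
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)         # resume the task until its next yield
        except StopIteration:
            continue              # task finished; drop it
        ready.append(current)     # still running; queue it again

run([task('a', 2), task('b', 3)])
```

Nothing forces a task to yield; if one never does, no other task gets to run, which is the cooperative part of cooperative multi-tasking.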

## Examples

https://gist.github.com/miguelgrinberg/f15bc03471f610cfebeba62438435508

https://docs.python.org/3/library/asyncio-task.html#example-chain-coroutines


```Python
import asyncio
loop = asyncio.get_event_loop()

async def hello():
    await asyncio.sleep(3)
    print('Hello!')

if __name__ == '__main__':
    loop.run_until_complete(hello())
```

```
RuntimeError: This event loop is already running

Hello!
```
