In [13]:
import os

# Effective Python

## 1. Pythonic thinking

- Pythonic style is not regimented or enforced by the compiler; it has emerged over time through the experience of the community
- prefer to be explicit, choose the simple over the complex, and maximize readability
- the best way in Python is the *pythonic* way

In [1]:
# To read the Zen of Python
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### 1. Know which version of Python you're using

In [3]:
!python --version

Python 3.6.6 :: Anaconda, Inc.


In [4]:
#or
!python3 --version

Python 3.6.6 :: Anaconda, Inc.


In [5]:
import sys
print(sys.version_info)
print(sys.version)

sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)
3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]


### 2. PEP 8 style guide

- stands for Python Enhancement Proposal #8
- Can be found here: https://www.python.org/dev/peps/pep-0008/


**Whitespace:**
- use spaces instead of tabs for indentation
- use exactly 4 spaces for each level of indenting
  - when a long expression continues onto more lines, indent the continuation lines by 4 extra spaces
- lines should be 79 characters or less
- In a file, functions and classes should be separated by two blank lines
- in a class, methods should be separated by one blank line
- put one space before and after variable assignment (e.g. a = 8)

**Naming:**
- functions, variables, and attributes should be in `lowercase_underscore` format
- [protected instance attributes](https://stackoverflow.com/questions/797771/python-protected-attributes) should be in `_leading_underscore` format
- private instance attributes should be in `__double_leading_underscore` format (e.g. passwords, usernames)
- Classes and exceptions should be in `CapitalizedWord` format
- Module-level constants should be in `ALL_CAPS` format
- Instance methods in classes should use `self` as the name of the first parameter
- Class methods should use `cls` as the name of the first parameter
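A minimal sketch of the naming rules above in one place. Every name here (`MAX_RETRIES`, `RetryLimitError`, `DownloadJob`, `from_host`) is made up for illustration and does not come from PEP 8 or the book.

```python
MAX_RETRIES = 3                      # module-level constant: ALL_CAPS


class RetryLimitError(Exception):    # exception: CapitalizedWord
    pass


class DownloadJob:                   # class: CapitalizedWord
    def __init__(self, url):
        self.url = url               # public attribute: lowercase_underscore
        self._attempts = 0           # protected attribute: one leading underscore
        self.__token = 'secret'      # private attribute: two leading underscores

    def record_attempt(self):        # instance method: first parameter is self
        self._attempts += 1
        if self._attempts > MAX_RETRIES:
            raise RetryLimitError(self.url)

    @classmethod
    def from_host(cls, host):        # class method: first parameter is cls
        return cls('https://' + host)


job = DownloadJob.from_host('example.com')
print(job.url)   # https://example.com
```

Note that the double underscore only triggers name mangling (the attribute is stored as `_DownloadJob__token`); it is a convention for avoiding accidental access, not real access control.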

**Expressions and Statements:**
- Don't check for empty values by checking the length (e.g. `if len(somelist) == 0`). Use `if not somelist` and rely on empty values evaluating to `False`
 - the same goes for non-empty values: use `if somelist`, since non-empty values evaluate to `True`
- Don't write `if`/`for`/`while` statements on a single line; spread them over multiple lines for readability
- put all `import` statements at the top of the file
- Group imports into sections (standard library, third-party modules, your own modules) and keep each section in alphabetical order
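The emptiness rule can be sketched like this; `describe` is a hypothetical helper used only to show the idiom.

```python
def describe(items):
    """Return a short label based on whether a container is empty."""
    # Empty containers ([], '', {}, set()) are falsy, so no len() check is needed.
    if not items:
        return 'empty'
    return 'has items'


print(describe([]))       # empty
print(describe(['a']))    # has items
```

One thing to keep in mind: the check tests the container, not its contents, so `[0]` still counts as non-empty.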


**Pylint**: 

Use pylint to analyze your code for PEP-8 compliance. 


**To look up later**:
[Can we use pylint in Jupyter notebooks?](https://stackoverflow.com/questions/50358327/using-pylint-in-ipython-jupyter-notebook)

In [6]:
!pip install pylint



### 3. Know the differences between `bytes`, `str`, and `unicode`

- there are many ways to represent Unicode characters as binary data (raw 8-bit values). The most common encoding is UTF-8.
- In Python 3, sequences of characters can be represented as either `str` or `bytes`. Instances of `str` contain Unicode characters; instances of `bytes` contain raw 8-bit values. (The separate `unicode` type exists only in Python 2.)
- Unicode to binary: `encode`
- binary to Unicode: `decode`
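A short round trip makes the two directions concrete. The sample text is arbitrary; the point is that one character can take more than one byte in UTF-8.

```python
text = 'café'                         # str: a sequence of Unicode characters
raw = text.encode('utf-8')            # str -> bytes
print(raw)                            # b'caf\xc3\xa9'
print(len(text), len(raw))            # 4 5
print(raw.decode('utf-8') == text)    # True
```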

You need a helper that takes a `str` or `bytes` and always returns a `str`:

In [7]:
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value # instance of str

And the other way around (takes a `str` or `bytes` and always returns `bytes`):

In [8]:
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value # instance of bytes

- `str` and `bytes` instances are never equivalent in Python 3, not even the empty strings, so be deliberate!
- A common pitfall is the built-in `open`: it defaults to text mode, which expects `str` (encoded for you, typically as UTF-8) rather than `bytes`, so writing binary data fails:

In [16]:
with open('random.bin', 'w') as f:
    f.write(os.urandom(10))

TypeError: write() argument must be str, not bytes

In [17]:
# Fix it by opening the file in binary mode ('wb')

with open('random.bin', 'wb') as f:
    f.write(os.urandom(10))

### 4. Write Helper Functions instead of complex expressions

- when an expression (say, filtering or converting a value) starts to need several operations, move it into a helper function
- Don't use complicated one-liners that no one can read later!

Example:

In [22]:
from urllib.parse import parse_qs
my_values = parse_qs('red=5&blue=0&green=',
                    keep_blank_values=True)
print(repr(my_values))

{'red': ['5'], 'blue': ['0'], 'green': ['']}


Let's say we want to check for the following and get back a number: 

In [27]:
print('Red\t', my_values.get('red'))
print('Green\t', my_values.get('green'))
print('Opacity\t', my_values.get('opacity'))

Red	 ['5']
Green	 ['']
Opacity	 None


We can try again with `or` expressions, which fall back to 0 when the value is empty or missing:

In [32]:
red = my_values.get('red', [''])[0] or 0
green = my_values.get('green', [''])[0] or 0
opacity = my_values.get('opacity', [''])[0] or 0
print('Red\t%r' % red)
print('Green\t%r' % green)
print('Opacity\t%r' % opacity)

Red	'5'
Green	0
Opacity	0


This still isn't quite right: `red` comes back as the string `'5'`, but we want an integer every time.

In [35]:
red = int(my_values.get('red', [''])[0] or 0) # no one can read this

In [33]:
red = my_values.get('red', [''])
red = int(red[0]) if red[0] else 0

The second version, with an `if/else` conditional expression, is clearer and can be enough for simple situations.

Best solution: **write a helper function**

In [36]:
def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found

In [38]:
green = get_first_int(my_values, 'green')
green

0

"Don't let Python's pithy syntax for complex expressions get you into a mess like this"

### 5. Know how to slice sequences

"Slicing can be extended to any Python class that implements the `__getitem__` and `__setitem__` special methods"

In [55]:
a = list('abcdefgh')
a

In [60]:
print('First four:', a[:4]) # you should leave out the 0 to reduce the visual noise
print('Last four:', a[-4:]) # leave out the final index because it's redundant
print('Middle two:', a[3:-3])

First four: ['a', 'b', 'c', 'd']
Last four: ['e', 'f', 'g', 'h']
Middle two: ['d', 'e']


*Note*: `somelist[-0:]` is the same as `somelist[0:]`, so it returns a copy of the whole list!

- The result of slicing a list is a whole new list. Modifying the result of slicing will not affect the original list.
- But when a slice is the target of an assignment (e.g. `somelist[2:4] = [1, 2, 3]`), it does modify the original list: the specified range is replaced in place
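Assigning to a slice replaces that range in the original list, and the two lengths do not have to match. A minimal sketch (the variable names are chosen only for illustration):

```python
letters = list('abcdefgh')
alias = letters               # second name for the same list object
letters[2:7] = [99, 22, 14]   # five items replaced by three; the list shrinks in place
print(letters)                # ['a', 'b', 99, 22, 14, 'h']
print(alias is letters)       # True: no new list was created
```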

In [66]:
b = a[4:]
print("Before:\t", b)
b[1] = 99
print("After:\t", b)
print("No change to a:\t", a)

Before:	 ['e', 'f', 'g', 'h']
After:	 ['e', 99, 'g', 'h']
No change to a:	 ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']


- if you leave out both start and end indexes, you're making a copy of the original

In [67]:
b = a[:]
assert b == a and b is not a

Note that this is different from plain assignment, where `b` IS `a` (the same object, not a copy), so the same assertion fails:

In [68]:
b = a
assert b == a and b is not a

AssertionError: 

### 6. Avoid using `start`, `end` and `stride` in a single slice

In [69]:
a = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
odds = a[::2]
evens = a[1::2]
print(odds)
print(evens)

['red', 'yellow', 'blue']
['orange', 'green', 'purple']


**Issues**

1. The stride syntax (the third element) can behave unexpectedly and cause bugs. For example, reversing with a stride of -1 works for `str`, but it breaks Unicode text that has already been encoded as a UTF-8 byte string.

In [70]:
w = '北海道'
x = w.encode('utf-8')
y = x[::-1]
z = y.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

2. Negative strides other than -1 are rarely useful.

The stride part of a slice can be confusing, hard to understand later, and may cause unexpected behaviour.

If you have to use a stride, avoid combining it with start and end indexes in the same slice.

If you need both, use two steps: stride first, then slice the result. Doing it in two steps creates an extra shallow copy of the data, so the first operation should reduce the list by as much as possible.
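The two-step approach might look like this (a small sketch with made-up variable names):

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
every_other = a[::2]          # stride first: ['a', 'c', 'e', 'g']
middle = every_other[1:-1]    # then slice:   ['c', 'e']
print(middle)
```

Each line does one thing, so a reader does not have to work out how start, end, and stride interact in a single expression.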

**Tip**: If you don't have enough memory for the extra copy, try the `islice` function from `itertools` ([islice()](https://docs.python.org/3/library/itertools.html#itertools.islice)), [example](https://realpython.com/python-itertools/)
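A minimal sketch of `islice`, assuming the data arrives as an iterator (here faked with `range`). It takes start, stop, and step like a slice, but yields items lazily and does not accept negative values.

```python
from itertools import islice

numbers = iter(range(1000000))             # stand-in for a large or streaming input
every_third = islice(numbers, 10, 25, 3)   # start=10, stop=25, step=3; no copy is made
picked = list(every_third)
print(picked)                              # [10, 13, 16, 19, 22]
```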

### 7. Use list comprehensions instead of `map` and `filter`

- Unless you're applying a single-argument function (e.g. `mean`), a list comprehension is clearer to read than the `map` built-in function.

In [77]:
a = list(range(1,11))
squared = [x**2 for x in a]
print(squared)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


In [78]:
squares = map(lambda x: x**2, a) # the same with `map` (which returns a lazy iterator in Python 3)

List comprehensions also let you filter the input, keeping only the items you are interested in:

In [79]:
even_squares = [x**2 for x in a if x % 2 == 0]
print(even_squares)

[4, 16, 36, 64, 100]
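For comparison, here is a sketch of the same result written with `map` and `filter`. It is included only to show why the comprehension reads better.

```python
a = list(range(1, 11))

# map and filter return lazy iterators in Python 3, so wrap them in list().
alt = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, a)))
even_squares = [x**2 for x in a if x % 2 == 0]

print(alt)                    # [4, 16, 36, 64, 100]
print(alt == even_squares)    # True
```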


You can do the same for dictionaries and sets!

In [82]:
chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}
rank_dict = {rank: name for name, rank in chile_ranks.items()} # reverse the keys/values in the original dictionary
chile_len_set = {len(name) for name in rank_dict.values()} # get the name length for each
print(rank_dict)
print(chile_len_set)

{1: 'ghost', 2: 'habanero', 3: 'cayenne'}
{8, 5, 7}


### 8. Avoid more than two expressions in List Comprehensions

In some cases it is fine to use two `for` subexpressions in a list comprehension, for example to *flatten a matrix into a single list*:

In [84]:
matrix = [[1,2,3], [4,5,6], [7,8,9]]
flat = [x for row in matrix for x in row]
print(flat)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Others quickly become long and hard to read, which is **NOT GOOD**:

In [85]:
squared = [[x**2 for x in row] for row in matrix]
print(squared)

[[1, 4, 9], [16, 25, 36], [49, 64, 81]]


or...

In [86]:
filtered = [[x for x in row if x % 3 == 0]
           for row in matrix if sum(row) >= 10]
print(filtered)

[[6], [9]]


**Bottom line**: Avoid using more than 2 expressions in one list comprehension! As soon as it gets more complex than that, write a helper function!
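As a sketch of that advice, the filtering example could be moved into a helper. The function name and the `min_sum` parameter are hypothetical, chosen for this illustration.

```python
def multiples_of_three_in_large_rows(matrix, min_sum=10):
    """Keep the multiples of 3 from every row whose sum is at least min_sum."""
    result = []
    for row in matrix:
        if sum(row) >= min_sum:
            result.append([x for x in row if x % 3 == 0])
    return result


matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(multiples_of_three_in_large_rows(matrix))   # [[6], [9]]
```

The loop is longer than the one-liner, but the row condition and the item condition are now visibly separate.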

### 9. Consider generator expressions for large comprehensions

List comprehensions generate a whole new list, which might not be great for memory, especially with large inputs.

Example: you want to read a file and return the number of characters on each line. Doing this with a list comprehension would require holding the length of every line of the file in memory. 

To solve this, we should use **generator expressions**, which are a generalization of list comprehensions and generators. Generator expressions don't materialize the whole output sequence when they're run; they evaluate to an iterator that yields one item at a time from the expression.

The syntax is the same as a list comprehension, but wrapped in `()` instead of `[]`:

In [94]:
it = (len(x) for x in open('myfile.txt'))
print(it)

<generator object <genexpr> at 0x10de9b620>


In [95]:
print(next(it))
print(next(it))

7
1


You can also use the generator object as an input to another generator expression:

In [96]:
roots = ((x, x**0.5) for x in it)
print(next(roots))

(6, 2.449489742783178)


"When you're looking for a way to compose functionality that's operating on a large stream of input, generator expressions are the best tool for the job"

- Don't iterate over a generator more than once: it is stateful and is exhausted after a single pass!
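A small sketch of what "used up" means in practice (the word list is arbitrary):

```python
lengths = (len(word) for word in ['alpha', 'be', 'gamma'])
first_pass = list(lengths)
second_pass = list(lengths)   # nothing left: the generator is already exhausted
print(first_pass)             # [5, 2, 5]
print(second_pass)            # []
```

No error is raised on the second pass, which is what makes this mistake easy to miss.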

### 10. Prefer `enumerate` over `range`

Don't use `range` when the list/iterable you want to iterate over is not actually a range of integers. For example, instead of doing this:

In [97]:
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print('%d: %s' % (i+1, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


This looks very clumsy compared to using `enumerate`:

In [98]:
for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i+1, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


In [100]:
for i, flavor in enumerate(flavor_list, 1): # you can even specify from which number it should start counting
    print('%d: %s' % (i, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


### 11. Use `zip` to process iterators in parallel

When you find that you have some lists/iterables that are related and you want to work through them in parallel/together, use `zip`. 

In [112]:
names = ['Cecilia', 'Lise', 'Marie']
letters = [len(n) for n in names]
max_letters = 0

In Python 3, `zip` wraps two or more iterators with a lazy generator. The `zip` generator yields tuples containing the next value from each iterator.

In [106]:
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count

Problem: `zip` silently truncates its output if the input iterators are not the same length. For example:

In [113]:
names.append('Rosalind')
for name, count in zip(names, letters):
    print(name)

Cecilia
Lise
Marie


It yields only as many items as the shortest input. If the lengths may differ, another option is `zip_longest` from `itertools`: [zip_longest()](https://docs.python.org/3/library/itertools.html#itertools.zip_longest)
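A minimal sketch of `zip_longest` on inputs of unequal length. The hard-coded `letters` list and the choice of `fillvalue=0` are assumptions for this example; the default fill value is `None`.

```python
from itertools import zip_longest

names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
letters = [7, 4, 5]                      # one entry short on purpose

# zip_longest keeps going until the longest input is exhausted,
# padding the shorter one with fillvalue.
pairs = list(zip_longest(names, letters, fillvalue=0))
print(pairs)
# [('Cecilia', 7), ('Lise', 4), ('Marie', 5), ('Rosalind', 0)]
```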

### 12. Avoid `else` blocks after `for` and `while` Loops

In [115]:
for i in range(3):
    print('Loop {}'.format(i))
else:
    print("Else block!")

Loop 0
Loop 1
Loop 2
Else block!


Don't use this because it's counterintuitive. We might expect the `else` here to mean "run this block if the loop doesn't finish", but it's actually the opposite: it runs when the loop finishes without interruption. Using a `break` statement in the loop skips the `else` block.

In [116]:
for i in range(3):
    print('Loop {}'.format(i))
    if i == 1:
        break
else:
    print('Else block!')

Loop 0
Loop 1


The `else` block also runs in other surprising cases, such as when the `while` condition is initially false:

In [119]:
while False:
    print("Never runs!")
else:
    print("While Else block!")

While Else block!
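One common alternative, sketched here under the assumption that the loop is searching for something, is to move the loop into a helper function and return early. The `coprime` example is illustrative only.

```python
def coprime(a, b):
    """Return True if a and b share no common divisor greater than 1."""
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            return False      # early return takes the place of break
    return True               # reached only if the loop found nothing


print(coprime(4, 9))   # True
print(coprime(3, 6))   # False
```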


### 13. Take advantage of the full `try/except/else/finally` sequence

Use `finally` when you want exceptions to propagate up to the caller but also need cleanup code to run first. A typical example is reliably closing a file handle:

In [127]:
handle = open('myfile.txt')
try:
    data = handle.read() # May raise UnicodeDecodeError
finally:
    handle.close() # this will always run, even if an error is raised

Use an `else` block for code that should run only when the `try` block succeeds. This keeps the `try` block small and visually separates the success case from the `except` handlers. An `else` block can also perform additional actions before the cleanup code in `finally` runs. For example:

In [132]:
import json
def load_json_key(data, key):
    try:
        result_dict = json.loads(data) # may raise ValueError
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]  # may raise KeyError, which will propagate up, unlike the one above!
    

In [131]:
load_json_key(data, 'alpha')

NameError: name 'JSONDecodeError' is not defined

This overall makes the exception propagation behavior clear.
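To make the two failure paths visible, here is a self-contained sketch that repeats the function and calls it with made-up inputs. `json.JSONDecodeError` is a subclass of `ValueError`, which is why the `except ValueError` clause catches it.

```python
import json


def load_json_key(data, key):
    try:
        result_dict = json.loads(data)   # may raise ValueError
    except ValueError as e:
        raise KeyError from e            # malformed JSON is reported as KeyError
    else:
        return result_dict[key]          # a missing key raises KeyError directly


print(load_json_key('{"alpha": 1}', 'alpha'))    # 1

try:
    load_json_key('{bad json', 'alpha')
except KeyError as err:
    # "raise ... from e" stores the original error in __cause__
    print(type(err.__cause__).__name__)          # JSONDecodeError
```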

Example of where you might use all of them at once:

In [134]:
UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+') # may raise IOError
    try:
        data = handle.read()  # may raise UnicodeDecodeError
        op = json.loads(data) # may raise ValueError
        value = (
            op['numerator']/
            op['denominator'])# may raise ZeroDivisionError
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)  # this may raise IOError
        return value
    finally:
        handle.close()        # always runs

## 2. Functions

### 14. Prefer Exceptions to returning `None`
- Functions that return `None` instead of raising an error are error-prone, as multiple things can usually make the function return `None`.
- Raise exceptions/errors to indicate special situations instead of returning `None`. Also document stuff!
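To see why `None` is error-prone, consider a hypothetical variant (`divide_or_none` is a made-up name) that returns `None` on failure. A caller who tests the result for truthiness will misread a legitimate zero as an error.

```python
def divide_or_none(a, b):
    """Hypothetical version that signals failure by returning None."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


result = divide_or_none(0, 5)   # a valid division whose result is 0.0
if not result:
    # Wrong: 0.0 is falsy too, so a valid result is treated as an error.
    print('Invalid inputs')
```

Raising an exception instead removes the ambiguity, because a returned value is then always a real result.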

In [1]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

In [2]:
x, y = 5, 0
try:
    result = divide(x, y)
except ValueError:
    print("Invalid inputs")
else:
    print("Result is %.1f" % result)

Invalid inputs


### 15. Know how closures interact with variable scope

This is all about what happens if you have a `def` inside another function definition. For example: 

In [6]:
def sort_priority(values, group):
    found = False          # Scope: 'sort_priority'
    def helper(x):
        if x in group:
            found = True   # Scope: 'helper' -- BAD!
            return (0, x)
        return (1,x)
    values.sort(key = helper)
    return found

In [8]:
numbers = [8,3,1,2,5,4,7,6]
group = {2,3,5,7}
found = sort_priority(numbers, group)
print("Found:", found) # this makes no sense, because we DID find some of the numbers 
print(numbers)

Found: False
[2, 3, 5, 7, 1, 4, 6, 8]


The sorted results are correct, but the `found` result is wrong.

When you reference a variable in an expression, the Python interpreter looks for it in this order:
1. Current function's scope
2. Any enclosing scope (such as a function that contains the current one)
3. Scope of the module (i.e. the .py file, or the _global scope_)
4. The built-in scope, i.e. predefined Python variables (like `list`)


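Assignment works differently from lookup: if the variable is not defined in the current scope, assigning to it creates a new local variable. That is why `found = True` inside `helper` never updates the `found` in `sort_priority`. This problem is also called *the scoping bug*.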
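The lookup order and the effect of assignment can be sketched with a toy example (all names are invented for illustration):

```python
name = 'module'                # module (global) scope


def outer():
    name = 'enclosing'         # enclosing function scope

    def inner_reads():
        return name            # not defined locally, so the enclosing scope is used

    def inner_assigns():
        name = 'local'         # assignment creates a new local variable
        return name

    return inner_reads(), inner_assigns(), name


print(outer())       # ('enclosing', 'local', 'enclosing')
print(name)          # module
print(len('abc'))    # 3: len is found last, in the built-in scope
```

The third element of the tuple shows that the assignment in `inner_assigns` left the enclosing variable untouched.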

**Fix**
You can use the `nonlocal` statement to indicate that the scope of a variable should be extended to the enclosing function (not to the global scope of the module though!)

In [11]:
def sort_priority2(numbers, group):
    found = False
    def helper(x):
        nonlocal found   #This is where you declare this variable as non-local
        if x in group:
            found = True
            return(0,x)
        return(1,x)
    numbers.sort(key = helper)
    return found

In [13]:
found = sort_priority2(numbers, group)
print("Found:", found) # now this works!
print(numbers)

Found: True
[2, 3, 5, 7, 1, 4, 6, 8]


A much better way to do this however (beyond very simple helper functions) would be with a class:

In [14]:
class Sorter(object):
    def __init__(self, group):
        self.group = group
        self.found = False
        
    def __call__(self, x):
        if x in self.group:
            self.found = True
            return (0,x)
        return (1,x)

In [18]:
sorter = Sorter(group)
numbers.sort(key = sorter)
assert sorter.found is True

### 16. Consider generators instead of returning lists

Generators are functions that use `yield` expressions. Calling a generator function doesn't run its body; it immediately returns an iterator, and each call to `next` advances the body to the following `yield`.

Returning a list of results has two problems:
- Code is clunky and dense
- You have to hold the full list in memory

For example, **instead of this:**

In [19]:
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index +1)
    return result

In [None]:
address = 'Four score and seven years ago...'

In [31]:
%%timeit
result = index_words(address)

2.92 µs ± 78 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [32]:
print(result)

[0, 5, 11, 15, 21, 27]


A better way to do this is to write a generator function and, when a list is needed, pass the result to `list()`:

In [24]:
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index +1     # do this instead of appending! Notice there are no appends here

In [30]:
%%timeit
result = list(index_words_iter(address))

3.07 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [33]:
print(result)

[0, 5, 11, 15, 21, 27]


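The list version holds all results in memory before returning them, which can be a problem for huge inputs. A generator avoids this. Example of a generator that reads a text file one line at a time and yields the offset of one word at a time: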

In [40]:
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

In [43]:
from itertools import islice

with open('address.txt', 'r') as f:
    it = index_file(f)
    results = islice(it, 0, 10)
    print(list(results))

[0, 6, 14, 19, 24, 29, 34, 42, 48, 52]


The only caveat is that the iterators returned by generators are stateful and cannot be reused!

### 17. Be defensive when iterating over arguments

