# Pythonic thinking

In [10]:
# Install pycodestyle for PEP8 to work
# add a global setting in pycodestyle config to ignore the no new line rule

# pip install pycodestyle
# pip install pycodestyle_magic
%load_ext pycodestyle_magic

The pycodestyle_magic extension is already loaded. To reload it, use:
  %reload_ext pycodestyle_magic


## Item 2: PEP8

Whitespace:
- use spaces instead of tabs for indentation
- four spaces for each level of syntactically significant indentation
- lines of 79 characters or fewer
- continuation lines of long expressions indented by four extra spaces
- in a file, functions and classes separated by two blank lines
- in a class, methods separated by one blank line
- no spaces around: list indexes, function calls, keyword argument assignments
- one space before and after variable assignments
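
The whitespace rules above can be seen together in one small module. This is only an illustrative sketch: `MAX_RETRIES`, `scale`, and `Counter` are made-up names, not part of any library.

```python
MAX_RETRIES = 3  # one space before and after the assignment


# Two blank lines separate top-level functions and classes.
def scale(values, factor=2):
    # No spaces around the list index, the call parentheses,
    # or the keyword default `factor=2`.
    first = values[0]
    return first * factor


class Counter:
    def __init__(self):
        self.count = 0

    # One blank line separates methods inside a class.
    def increment(self, step=1):
        self.count += step
        return self.count


# A long expression continues on the next lines, indented by four spaces.
total = (
    scale([1, 2, 3], factor=MAX_RETRIES)
    + Counter().increment(step=2)
)
print(total)
```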

Naming:
- `lowercase_underscore` - functions, variables, attributes
- `_leading_underscore` - protected instance attributes (by convention, for use within the class and its subclasses)
- `__double_leading_underscore` - private instance attributes (for use within the class only)
- `CapitalizedWord` - classes and exceptions
- `ALL_CAPS` - module-level constants
- `self` - name of the first parameter of instance methods
- `cls` - name of the first parameter of class methods
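
A short sketch that uses each naming convention once may help. The names `DEFAULT_LIVES`, `KittenError`, and `Kitten` are invented for illustration. Note that Python does not enforce "protected"; the double underscore only triggers name mangling.

```python
DEFAULT_LIVES = 9  # ALL_CAPS: module-level constant


class KittenError(Exception):
    """CapitalizedWord: exceptions are named like classes."""


class Kitten:
    def __init__(self, name):
        self.name = name              # lowercase_underscore: public attribute
        self._mood = 'sleepy'         # protected, by convention only
        self.__lives = DEFAULT_LIVES  # private: stored as _Kitten__lives

    def lives_left(self):             # instance method: first parameter is self
        return self.__lives

    @classmethod
    def unnamed(cls):                 # class method: first parameter is cls
        return cls('unnamed')


kitten = Kitten.unnamed()
print(kitten.name, kitten.lives_left())
```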

Expressions and statements:
- use inline negation (`if a is not b`) instead of negating a positive expression (`if not a is b`)
- don't check empty by checking length, use `if not <item>` (empty values implicitly evaluate to `False`)
- don't check non-empty by checking length, use `if <item>`
- avoid single-line `if`, `for`, `while`, and `except` compound statements; spread them over multiple lines for clarity
- put `import` statements at the top of the file
- use absolute names for importing modules, e.g. `from bar import foo`
- for relative imports use explicit syntax, e.g. `from . import foo`
- group imports in this order: standard library modules, third-party modules, your own modules; sort each group alphabetically
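
As a rough sketch of the emptiness checks and inline negation described above (the function `describe` and its parameters are hypothetical):

```python
def describe(items, owner=None):
    # Empty sequences are falsy, so there is no need for len(items) == 0.
    if not items:
        return 'nothing'
    # Inline negation: `owner is not None`, rather than `not owner is None`.
    if owner is not None:
        return '%d items for %s' % (len(items), owner)
    return '%d items' % len(items)


print(describe([]))
print(describe(['a', 'b'], owner='Ada'))
print(describe('ab'))
```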

## Item 3: `bytes`, `str`, and `unicode`

Python 3:
- `bytes` - sequences of raw 8-bit values (integers from 0 to 255); a binary format that is good for storing data and sending it to other applications; `bytes` literals can only contain ASCII characters
- `str` - Unicode characters, no binary encoding associated (represented internally as a sequence of Unicode codepoints), default type when creating a string

Python 2:
- `unicode` - Unicode characters, no binary encoding associated
- `str` - raw 8-bit values

Represent Unicode characters as binary:
- most common encoding is `UTF-8`
- Unicode -> binary - `encode`
- binary -> Unicode - `decode`

Encode and decode at the furthest boundary of your interfaces. The core of the program should use Unicode (`str` in Python 3, `unicode` in Python 2) and should not assume a specific character encoding. This lets you accept alternative text encodings on input while staying strict about the output encoding (ideally `UTF-8`).

The two most common cases:
- operating on raw 8-bit values that are `UTF-8`-encoded characters (or use a different encoding)
- operating on Unicode characters that have no specific encoding

It's useful to have helper functions that convert input to `bytes` and `str` (Python 3; Python 2 would need `unicode` and `str`).

In [26]:
%%pycodestyle


def to_str(bytes_or_str):
    """
    @input - bytes or str
    @output - str
    """
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value


def to_bytes(bytes_or_str):
    """
    @input - bytes or str
    @output - bytes
    """
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

There are gotchas when dealing with raw 8-bit values and Unicode.

1. In Python 2, `unicode` and `str` seem to be the same type when a `str` only contains 7-bit ASCII characters:
    - you can combine them together using `+`
    - you can compare them using equality and inequality operators
    - you can use `unicode` in format strings like `'%s'`


2. In Python 3, `bytes` and `str` instances are never equivalent (not even when empty), so you must be more deliberate about types. You can't combine them with `+` or order them with `<` and `>` (both raise `TypeError`), and `==` between them is always `False`. They're not friends. They don't mix.


3. Operations involving file handles (`open`) default to text mode in Python 3 (Unicode `str`, encoded with a platform-dependent default that is often `UTF-8`) and to binary data in Python 2.
    - in Python 2, writing binary data with `with open(file, 'w')` works; in Python 3 it fails
    - in Python 3, open the file in binary write mode instead: `with open(file, 'wb')`
    - the same applies to reading: Python 2 uses `r` for binary data, Python 3 requires `rb`
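
The Python 3 gotchas (points 2 and 3) can be reproduced with a minimal sketch like the one below. The file name `data.bin` and the temporary directory are arbitrary choices for illustration.

```python
import os
import tempfile

# Equality never raises, but it is always False, even for empty values.
print(b'' == '')              # False
print(b'kitten' == 'kitten')  # False

# Concatenating bytes and str raises TypeError.
try:
    b'tiny ' + 'kitten'
except TypeError as error:
    print('TypeError:', error)

path = os.path.join(tempfile.mkdtemp(), 'data.bin')

# Text mode ('w') expects str, so writing bytes raises TypeError.
try:
    with open(path, 'w') as handle:
        handle.write(b'\xf1\xf2\xf3')
except TypeError as error:
    print('TypeError:', error)

# Binary mode ('wb' and 'rb') accepts and returns bytes.
with open(path, 'wb') as handle:
    handle.write(b'\xf1\xf2\xf3')

with open(path, 'rb') as handle:
    data = handle.read()
print(data)
```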

In [30]:
print(type('Kitten!'))
print(type(b'Kitten!'))

<class 'str'>
<class 'bytes'>


In [36]:
string = '$%@#'

# print bytes
encoded = string.encode('utf-8')
print(encoded)

# print str
encoded.decode()

b'$%@#'


'$%@#'

### But why?
A byte string on its own carries no record of the encoding that produced it. Technically, at the lowest level, everything is made of bytes, but to be practically usable as text in applications we need to know the encoding. Using Unicode strings helps us preserve this information: the encoding is dealt with once, when the bytes are decoded.

Sources:
- https://timothybramlett.com/Strings_Bytes_and_Unicode_in_Python_2_and_3.html
- https://medium.com/better-programming/strings-unicode-and-bytes-in-python-3-everything-you-always-wanted-to-know-27dc02ff2686

### The Unicode sandwich
As mentioned before, you want `bytes` on input and output, but manipulate `str` in your application.

```
------------------
   bytes (input)    <-- data from the outside world
------------------
     decode()
------------------
str (manipulation)  <-- access to many useful string manipulation libraries and methods
------------------
     encode()
------------------
  bytes (output)    <-- send it back to the outside world
------------------
```
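
A minimal sketch of the sandwich, assuming a hypothetical function `shout` that receives bytes in a known input encoding and always emits `UTF-8`:

```python
def shout(raw_input, encoding='utf-8'):
    text = raw_input.decode(encoding)  # bytes -> str at the input boundary
    text = text.strip().upper()        # the core works with str only
    return text.encode('utf-8')        # str -> bytes at the output boundary


print(shout(b'  tiny kitten \n'))

# Input in a different encoding is accepted; output is still UTF-8.
latin1_bytes = 'caf\u00e9'.encode('latin-1')
print(shout(latin1_bytes, encoding='latin-1'))
```

The input encoding is a parameter because the bytes themselves do not say how they were encoded; the caller has to know.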

Fun talk about Unicode: https://nedbatchelder.com/text/unipain.html

## Item 4: Helper functions
Python is pretty expressive, which may encourage you to write single-line expressions with a lot of logic. But readability outweighs brevity.

In [52]:
from urllib.parse import parse_qs

In [76]:
my_values = parse_qs('red=5&blue=0&green=', keep_blank_values=True)
print(repr(my_values))

print('\nRed:     ', my_values.get('red')) 
print('Green:   ', my_values.get('green'))
print('Opacity: ', my_values.get('opacity'))

red = my_values.get('red', [''])[0] or 0
green = my_values.get('green', [''])[0] or 0
opacity = my_values.get('opacity', [''])[0] or 0

print('\nRed:     %r' % red) 
print('Green:   %r' % green)
print('Opacity: %r' % opacity)

red = int(my_values.get('red', [''])[0] or 0)
green = int(my_values.get('green', [''])[0] or 0)
opacity = int(my_values.get('opacity', [''])[0] or 0)

print('\nRed:     %r' % red) 
print('Green:   %r' % green)
print('Opacity: %r' % opacity)

# the ternary (if/else) expression is a bit clearer, but not much of an improvement
red = my_values.get('red', [''])
red = int(red[0]) if red[0] else 0
green = my_values.get('green', [''])
green = int(green[0]) if green[0] else 0
opacity = my_values.get('opacity', [''])
opacity = int(opacity[0]) if opacity[0] else 0

print('\nRed:     %r' % red) 
print('Green:   %r' % green)
print('Opacity: %r' % opacity)

{'red': ['5'], 'blue': ['0'], 'green': ['']}

Red:      ['5']
Green:    ['']
Opacity:  None

Red:     '5'
Green:   0
Opacity: 0

Red:     5
Green:   0
Opacity: 0

Red:     5
Green:   0
Opacity: 0


In [72]:
def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found


get_first_int(my_values, 'green')

0

## Item 5: Slicing sequences
Slicing extracts a subset of a sequence with minimal effort. It is built in for `list`, `str`, and `bytes`, and can be extended to any Python class that implements `__getitem__` and `__setitem__`.

Basic syntax: `somelist[start:end]` (start inclusive, end exclusive)
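
To illustrate how a custom class can support the same syntax, here is a small sketch. `Playlist` is a made-up class that simply delegates to an internal list; a real container might need more careful handling.

```python
class Playlist:
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __getitem__(self, index):
        # index is an int for playlist[0] and a slice object for playlist[1:3].
        if isinstance(index, slice):
            return Playlist(self._tracks[index])
        return self._tracks[index]

    def __setitem__(self, index, value):
        # Delegating to the list handles both item and slice assignment.
        self._tracks[index] = value

    def __len__(self):
        return len(self._tracks)

    def __repr__(self):
        return 'Playlist(%r)' % self._tracks


playlist = Playlist(['a', 'b', 'c', 'd'])
print(playlist[1:3])
playlist[:2] = ['x']
print(playlist)
```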

In [125]:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

print('First four                  ', letters[:4])
print('Last four                                       ', letters[-4:])
print('Middle two                                 ', letters[3:-3])

# Would raise AssertionError if the slices were not equal
assert letters[:5] == letters[0:5]
assert letters[5:] == letters[5:len(letters)]

print("All                         ", letters[:])
print("First five                  ", letters[:5])
print("All but last                ", letters[:-1])
print("Everything after first four                     ", letters[4:])
print("Last three                                           ", letters[-3:])
print("Third to fifth                        ", letters[2:5])
print("Third to second last                  ", letters[2:-1])
print("Third last to second last                            ", letters[-3:-1])

# indexing letters[20] directly raises an IndexError
# but slicing out of bounds is fine
print("First twenty items          ", letters[:20])
print("Last twenty items           ", letters[-20:])

# -0 is the same as 0, so this is a copy of the original list
letters[-0:]

First four                   ['a', 'b', 'c', 'd']
Last four                                        ['e', 'f', 'g', 'h']
Middle two                                  ['d', 'e']
All                          ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
First five                   ['a', 'b', 'c', 'd', 'e']
All but last                 ['a', 'b', 'c', 'd', 'e', 'f', 'g']
Everything after first four                      ['e', 'f', 'g', 'h']
Last three                                            ['f', 'g', 'h']
Third to fifth                         ['c', 'd', 'e']
Third to second last                   ['c', 'd', 'e', 'f', 'g']
Third last to second last                             ['f', 'g']
First twenty items           ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Last twenty items            ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']


['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

Slicing a list creates a new list. The new list holds references to the same objects as the original, but modifying the result of slicing (for example, replacing an item) won't affect the original list.

In [115]:
letters_two = letters[4:]
print("Before:                         ", letters_two)
letters_two[1] = 'kitten'
print("After:                          ", letters_two)
print("No change:  ", letters)

Before:                          ['e', 'f', 'g', 'h']
After:                           ['e', 'kitten', 'g', 'h']
No change:   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']


When used in assignments, slices replace the specified range in the original list. The assigned values don't need to match the slice in length (unlike tuple unpacking such as `a, b = c[:2]`). Values before and after the slice are preserved, and the list grows or shrinks to fit.

In [119]:
print("Before ", letters)
letters[2:7] = ['tiny', 'lil', 'kitten']
print("After  ", letters)

Before  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
After   ['a', 'b', 'tiny', 'lil', 'kitten', 'h']
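
The contrast with tuple unpacking can be sketched as follows; the list `values` is just sample data.

```python
values = ['a', 'b', 'c']

# Unpacking requires the slice length to match the number of targets.
first, second = values[:2]

try:
    first, second = values[:3]
except ValueError as error:
    print('ValueError:', error)

# Slice assignment has no such constraint: two items are replaced by four.
values[:2] = [1, 2, 3, 4]
print(values)
```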


Leaving both start and end empty will make a copy of the original list.

In [123]:
a = [1, 2]
b = a[:]

assert b == a and b is not a

Assigning to a slice with no start or end replaces the list's entire contents in place with a copy of what's referenced on the right-hand side, instead of allocating a new list.

In [124]:
b = a
print("Before a ", a)
print("Before b ", b)
a[:] = ['tiny', 'lil', 'kitten']
assert a is b
print("After a  ", a)
print("After b  ", b)

Before a  [1, 2]
Before b  [1, 2]
After a   ['tiny', 'lil', 'kitten']
After b   ['tiny', 'lil', 'kitten']


Questions:
- what does it mean to contain a raw 8-bit value?