# **Function Utilities**



## Lambda
Lambdas are ideal for callbacks, where a function is passed as an argument to another function.

In [None]:
def square_fn(x):
  return x * x

square_ld = lambda x : x * x

**NOTE**: Avoid using lambda to define a function that will be called in multiple places; use `def` so that the function has a proper name.

## Map
`map(function, iterable)` applies `function` to every item in `iterable` and returns a lazy map object.

In [None]:
# 1st option
nums = [1, 2, 3, 4, 5]
nums_square = [x * x for x in nums]

# 2nd option
nums_square_1 = map(square_fn, nums)
nums_square_2 = map(square_ld, nums)
print(list(nums_square_1))

[1, 4, 9, 16, 25]


`map` can also apply a function across the items of multiple iterables in parallel.

In [None]:
a, b = 3, -0.3
xs = [2, 3, 4, 5]
labels = [6.4, 8.9, 10.9, 15.3]

diffs = map(lambda x, y : (x * a + b - y) ** 2, xs, labels)
result = sum(diffs) ** 0.5 / len(xs)
print(result)

0.3092329219213246


## Filter
`filter` applies a boolean function to each item in an iterable, like `map`, except that it returns a filter object containing only the items for which the function returns True.

In [None]:
errors = [0.1, 0.5, 0.6, 0.7]
bad_preds = filter(lambda x : x > 0.5, errors)
print(list(bad_preds))

[0.6, 0.7]


## Reduce
`reduce` applies a two-argument function cumulatively to the items of an iterable, reducing them to a single value.

In [None]:
from functools import reduce
nums = [2, 3, 4, 3]
prod = reduce(lambda x, y : x * y, nums)
print(prod)

72
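
`reduce` also accepts an optional third argument, an initial value, which is returned as-is when the iterable is empty. A minimal sketch; the names `prod` and `empty_prod` are only illustrative:

```python
from functools import reduce

nums = [2, 3, 4, 3]

# Without an initial value, reduce starts from the first item: ((2 * 3) * 4) * 3
prod = reduce(lambda x, y: x * y, nums)

# With an initial value, the fold starts from it, so an empty list is safe
empty_prod = reduce(lambda x, y: x * y, [], 1)

print(prod, empty_prod)  # 72 1
```

Calling `reduce` on an empty iterable without an initial value raises `TypeError`, so passing one is a reasonable default when the input may be empty.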


## Functions on integer list

In [None]:
#sort
a = {'cream': 5, 'cake': 2, 'hotdog': 7}
a = sorted(a.items(), key = lambda i : i[1], reverse = True)
a

[('hotdog', 7), ('cream', 5), ('cake', 2)]

In [None]:
#sum
a = [2, 4, 5, 3]
s = sum(a)
s

14

# **Ternary Operator**

A shorter way to express a conditional assignment.

In [None]:
is_rain = True
path = "wet" if is_rain else "dry"
# tuple-indexing form: False selects index 0, True selects index 1
path = ("dry", "wet")[is_rain]
path

'wet'

# **List Manipulation**
Include: 
* unpacking
* slicing
* replace
* flatten
* generator
* zip




In [None]:
#unpacking
elems = [1, 2, 3, 4]
a, b, c, d = elems
e, *s_elems, f = elems

In [None]:
#slicing to reverse
elems = [1, 2, 3, 4]
print(elems[::-1])

# slicing to delete
del elems[::2]

In [None]:
#replace
elems = list(range(10))  # a range object does not support slice assignment
elems[1:2] = [10, 20, 30]
#insert
elems[1:1] = [10, 20, 30]

In [None]:
#flatten list using sum
list_of_lists = [[1], [2, 3], [4, 5, 6]]
sum(list_of_lists, [])

#flatten list recursively
nested_lists = [[1, 2], [[3, 4], [5, 6], [[7, 8], [9, 10], [[11, [12, 13]]]]]]
flatten = lambda x : [y for l in x for y in flatten(l)] if type(x) is list else [x]
flatten(nested_lists)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

NOTE: Use a generator instead of a list when you need to process a large amount of data, since a generator produces items one at a time instead of holding them all in memory.

In [None]:
#list
tokens = ['i', 'want', 'to', 'go', 'to', 'school']
def ngrams(tokens, n):
  length = len(tokens)
  grams = []
  for i in range(0, length - n + 1):
    grams.append(tokens[i:i+n])
  return grams

ngrams(tokens, 3)

[['i', 'want', 'to'],
 ['want', 'to', 'go'],
 ['to', 'go', 'to'],
 ['go', 'to', 'school']]

In [None]:
#generator
def ngrams(tokens, n):
  length = len(tokens)
  for i in range(0, length - n + 1):
    yield tokens[i:i+n]

ngrams_generator = ngrams(tokens, 3)
for ngram in ngrams_generator:
  print(ngram)

['i', 'want', 'to']
['want', 'to', 'go']
['to', 'go', 'to']
['go', 'to', 'school']


In [None]:
#zip 
def ngrams(tokens, n):
  length = len(tokens)
  slices = (tokens[i:length-n+i+1] for i in range(n))
  return zip(*slices)

ngrams_generator = ngrams(tokens, 3)
print(ngrams_generator) # zip objects are lazy iterators, consumed like generators
for ngram in ngrams_generator:
    print(ngram)

<zip object at 0x7f90231ad050>
('i', 'want', 'to')
('want', 'to', 'go')
('to', 'go', 'to')
('go', 'to', 'school')


In [None]:
# unzip a list
student_list = [('tham', 10), ('thu', 20)]
student_names, student_age = list(zip(*student_list))
print(student_names)
print(student_age)

('tham', 'thu')
(10, 20)


**NOTE**: Pay attention to how zip is used here; `*` unpacks the list into separate arguments.

`zip(ite_1, ite_2, ...)`: think of it as transposing a matrix whose rows are the input iterables. It returns an iterator whose items are the columns of that matrix, i.e. the i-th item is a tuple of the i-th elements of every input.
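
The transpose picture can be checked on a small matrix. This is only an illustrative sketch; `matrix`, `transposed` and `pairs` are made-up names:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# Each row is passed to zip as a separate iterable, so zip yields the columns
transposed = list(zip(*matrix))
print(transposed)  # [(1, 4), (2, 5), (3, 6)]

# zip stops at the shortest iterable, so unequal lengths are silently truncated
pairs = list(zip([1, 2, 3], "ab"))
print(pairs)  # [(1, 'a'), (2, 'b')]
```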

# **collections & arraylike**

## `namedtuple`

In [None]:
from collections import namedtuple
Billionaire = namedtuple("Billionaire", ("name", "asset", "hobby"))

bobby = Billionaire("Bobby", "1e11 $", "earning money")
print(bobby.asset)

1e11 $


**NOTE**: *When should tuple be used over list?*

The key difference between tuple and list is that lists are mutable objects while tuples are immutable.

Because lists can grow, Python may allocate extra memory for a list so that later appends do not need to resize it every time, whereas a tuple is allocated just the memory it needs.

-> Therefore, tuples are **more memory efficient** than lists. Furthermore, lookups in a tuple are slightly faster than in a list.

Ref: https://towardsdatascience.com/python-tuples-when-to-use-them-over-lists-75e443f9dcd7

The relation between *list and tuple* is similar to relation between *set and frozenset*

In [None]:
import sys

a_list = [1, 2, 3, 4]
a_tuple = (1, 2, 3, 4)

print(sys.getsizeof(a_list))
print(sys.getsizeof(a_tuple))

104
88
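
One practical consequence of the list/tuple and set/frozenset analogy is hashability: the immutable variants can be used as dictionary keys (a tuple only when its elements are hashable too). A small sketch with made-up names (`point`, `tags`, `lookup`):

```python
point = (1, 2)                 # tuple: immutable, hashable
tags = frozenset({"a", "b"})   # frozenset: immutable, hashable

lookup = {point: "origin-ish", tags: "two tags"}
print(lookup[(1, 2)], lookup[frozenset({"b", "a"})])

# Mutable counterparts are unhashable, so they cannot be dictionary keys
for mutable in ([1, 2], {"a", "b"}):
    try:
        hash(mutable)
    except TypeError as err:
        print(type(mutable).__name__, "->", err)
```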


## define an object as an iterator

Some built-in iterable objects in Python are list, set, dictionary, ...

You can turn a class into an iterator by defining two magic methods: `__iter__` and `__next__`.

The main advantage of an iterator is **lazy evaluation**: items are computed only when requested, so you can start consuming a dataset while it is still being produced.

Ref: https://www.freecodecamp.org/news/how-and-why-you-should-use-python-generators-f6fb56650888/

In [None]:
def check_prime(number):
  for divisor in range(2, int(number**0.5) + 1):
    if number % divisor == 0:
      return False
  
  return True

class Primes: 
  def __init__(self, max):
    self.max = max
    self.number = 1

  def __iter__(self):
    return self

  def __next__(self):
    # loop instead of recursing, so a long gap between primes cannot exhaust the call stack
    while True:
      self.number += 1
      if self.number >= self.max:
        raise StopIteration
      if check_prime(self.number):
        return self.number
  
primes = Primes(10000000000000)
for x in primes:
  print(x)
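
The same lazy behaviour can be had with a generator function, which builds the iterator protocol for you. The sketch below is one possible way to write it; `is_prime`, `primes` and `first_ten` are illustrative names, and `itertools.islice` is used to take only the first few values from an endless stream:

```python
from itertools import islice

def is_prime(number):
    """Trial division; fine for a small demonstration."""
    if number < 2:
        return False
    return all(number % d for d in range(2, int(number ** 0.5) + 1))

def primes():
    """Yield primes forever; each value is computed only when requested."""
    number = 2
    while True:
        if is_prime(number):
            yield number
        number += 1

# Only ten primes are ever computed, even though the generator is unbounded
first_ten = list(islice(primes(), 10))
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```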

# **Classes and Magic methods**

## Overview
Magic methods are prefixed and suffixed with a double underscore `__`, which is why they are also called dunder methods.

In [None]:
class Node:
  __slots__ = ('value', 'left', 'right')
  def __init__(self, value, left = None, right = None):
    self.value = value
    self.left = left
    self.right = right
  
  def __repr__(self):
    strings = [f'value: {self.value}']
    strings.append(f'left: {self.left.value}' if self.left else 'left: None')
    strings.append(f'right: {self.right.value}' if self.right else 'right: None')
    return ', '.join(strings)
  def __eq__(self, other):
    return self.value == other.value

  def __lt__(self, other):
    return self.value < other.value

  def __ge__(self, other):
    return self.value >= other.value

left = Node(4)
root = Node(5, left)
print(left == root)
print(left < root)
print(left >= root)

False
True
False


## `__init__` and `__call__`

`__init__`: the initializer, run when an instance is created to set up its state

`__call__`: run when the instance itself is called like a function, e.g. `instance(args)`

In [None]:
import time
class Alarm:
  def __init__(self, sound):
    self._sound = sound

  def __call__(self, count):
    time.sleep(count)
    for i in range(6): 
      print(self._sound.upper())
      time.sleep(1)

myalarm = Alarm("TING")
myalarm(2)
  

TING
TING
TING
TING
TING
TING


# **Object Attributes**

A brief way to declare attributes in `__init__`.

The `locals()` function returns a dictionary of all variables defined in the local namespace.

The `__dict__` attribute holds a dictionary of the object's attributes.

In [None]:
class Family:
  def __init__(self, number = 5, location = "countryside"):
    params = locals()
    del params['self']
    self.__dict__ = params

fam = Family()
fam.__dict__

{'location': 'countryside', 'number': 5}

`**kwargs`: for arbitrary list of attributes

In [None]:
class Man:
  def __init__(self, **kwargs):
    self.__dict__ = kwargs

man = Man(weight=54, country = "VN", job = "singer")
man.__dict__

{'country': 'VN', 'job': 'singer', 'weight': 54}

NOTE: Read more about `__slots__`: https://stackoverflow.com/questions/472000/usage-of-slots/28059785#28059785

Sometimes we want to examine the attributes and methods of an object at runtime. In such cases, introspection functions can help us. Some of them are `dir`, `type`, `id`.

In [None]:
b = {'a': 1}
dir(b) # return list of attributes and methods of object
type(b) # return type of object
id(b) # return id of object

dict

Some functions are associated with class attribute:

`getattr(self, key)`: similar to `self.key`

`setattr(self, key, val)`: similar to `self.key = val`

`property(fget, fset, fdel, doc)`: returns a property object (a managed attribute) that has fget as getter, fset as setter, fdel as deleter and doc as docstring.

`hasattr(self, key)`: check if key is an attribute of object
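
A short sketch of these four functions working together. The `Temperature` class and its attribute names are hypothetical, chosen only for the illustration:

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    def _get_fahrenheit(self):
        return self._celsius * 9 / 5 + 32

    def _set_fahrenheit(self, value):
        self._celsius = (value - 32) * 5 / 9

    # property() builds a managed attribute from plain getter/setter functions
    fahrenheit = property(_get_fahrenheit, _set_fahrenheit, doc="Temperature in Fahrenheit")

temp = Temperature(100)
print(getattr(temp, "fahrenheit"))    # same as temp.fahrenheit -> 212.0
setattr(temp, "fahrenheit", 32)       # same as temp.fahrenheit = 32
print(temp._celsius)                  # 0.0
print(hasattr(temp, "kelvin"))        # False: no such attribute
print(getattr(temp, "kelvin", None))  # a default avoids AttributeError
```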

# **Wildcard Import**

Suppose that in file.py you import as follows:

`from parts import *` 

And below is your parts.py
```
  import numpy
  import tensorflow
  class Encoder:
      ...
  class Decoder:
      ...
  class Loss:
      ...
  def helper(*args, **kwargs):
      ...
  def utils(*args, **kwargs):
      ...
```
In this case, file.py will import Encoder, Decoder, Loss, helper, utils as well as numpy and tensorflow.

If you want parts.py to export only Encoder, Decoder, Loss for other modules to use, specify that with the `__all__` variable:

```
__all__ = ['Encoder', 'Decoder', 'Loss']
import numpy
import tensorflow
class Encoder:
    ...
```

I recommend you use it as a habit
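
The effect of `__all__` can be observed without creating files. The sketch below builds a throwaway module in memory; the module name `parts_demo` is hypothetical, and `math` stands in for the heavy imports so that the example runs anywhere:

```python
import sys
import types

source = '''
__all__ = ['Encoder']
import math
class Encoder: ...
class Decoder: ...
'''

# Build an in-memory module and register it so the import system can find it
module = types.ModuleType("parts_demo")
exec(source, module.__dict__)
sys.modules["parts_demo"] = module

# Run the wildcard import in a fresh namespace and see which names arrived
namespace = {}
exec("from parts_demo import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['Encoder']
```

Neither `Decoder` nor `math` leaks into the importing namespace, because only the names listed in `__all__` are exported by a wildcard import.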

# **Decorator**

A decorator is a function that wraps another function. Think of the decorated function as an activity you want to do (e.g. buy clothes, buy food); the decorator is a necessary procedure that goes along with that activity (e.g. checking whether you can afford it).

## Defining Decorator

In [None]:
# 1st example : time decorator
import time
def timeit(fn): 
    # *args and **kwargs are to support positional and named arguments of fn
    def get_time(*args, **kwargs): 
        start = time.time() 
        output = fn(*args, **kwargs)
        print(f"Time taken in {fn.__name__}: {time.time() - start:.7f}")
        return output  # make sure that the decorator returns the output of fn
    return get_time

import functools

@functools.lru_cache()
def fib_helper(n):
    if n < 2:
        return n
    return fib_helper(n - 1) + fib_helper(n - 2)

@timeit
def fib(n):
    """ fib is a wrapper function so that later we can change its behavior
    at the top level without affecting the behavior at every recursion step.
    """
    return fib_helper(n)

In [None]:
# 2nd example : log decorator
from functools import wraps
def logit(func):
  @wraps(func)
  def log_decorator(*args, **kwargs):
    print(func.__name__, "was called")
    return func(*args, **kwargs)
  return log_decorator

@logit
def hello(name = "thu"):
  """
    chao bang tieng viet
  """
  print("chao", name)

hello(name= "heo")

# see change if you remove @wraps
print(hello.__name__)
print(hello.__doc__)

hello was called
chao heo
hello

    chao bang tieng viet
  


In [None]:
# 3rd example: add argument to decorator
from functools import wraps

def logit(logfile = 'out.log'):
  def log_decorator(func):
    @wraps(func)
    def log_wrapper(*args, **kwargs):
      log_string = func.__name__ + " was called"
      print(log_string)
      # append the message to the logfile
      with open(logfile, 'a') as f:
        f.write(log_string + '\n')
      # call the wrapped function and pass its result through
      return func(*args, **kwargs)
    return log_wrapper
  return log_decorator

@logit(logfile='func2.log')
def myfunc2():
    pass

A decorator can also be a class, in addition to a function.
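
One common way to write a class-based decorator is to accept the function in `__init__` and run it in `__call__`. The `CountCalls` class below is a hypothetical sketch of that pattern, not a standard-library utility:

```python
from functools import update_wrapper

class CountCalls:
    """Class-based decorator: the instance replaces the decorated function."""

    def __init__(self, func):
        update_wrapper(self, func)   # copy __name__, __doc__, ... onto the instance
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)

@CountCalls
def add(x, y):
    """Return x + y."""
    return x + y

add(1, 2)
add(3, 4)
print(add.__name__, add.calls)  # add 2
```

A class is convenient when the decorator needs to keep state between calls, as the counter does here.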

## Built-in Decorators

`@property`: define getter

`@<name_attribute>.setter`: define setter

`@staticmethod`: static method
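
A minimal sketch showing the three decorators together. The `Circle` class is made up for the illustration:

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius          # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value

    @staticmethod
    def unit():
        # No self or cls: behaves like a plain function living in the class namespace
        return Circle(1)

circle = Circle.unit()
circle.radius = 3
print(circle.radius)  # 3
```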


# **Caching**

NOTE: Read more about caching: https://docs.python.org/3/library/functools.html

Memoization is a form of caching.

It stores the previous results of a function, keyed by the arguments they were computed from.
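
To make the idea concrete, here is a hand-rolled sketch of memoization with a plain dictionary. The names `memoize` and `fib_memo` are hypothetical, and the sketch only handles hashable positional arguments:

```python
from functools import wraps

def memoize(fn):
    cache = {}  # maps positional-argument tuples to results

    @wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    wrapper.cache = cache  # exposed only so the example can inspect it
    return wrapper

@memoize
def fib_memo(n):
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(30), len(fib_memo.cache))  # 832040 31
```

Each value from 0 to 30 is computed once and then served from the dictionary, which is what turns the exponential recursion into a linear one.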

In [None]:
import functools
@functools.lru_cache()
def fib_helper(n):
    if n < 2:
        return n
    return fib_helper(n - 1) + fib_helper(n - 2)

@timeit
def fib(n):
    """ fib is a wrapper function so that later we can change its behavior
    at the top level without affecting the behavior at every recursion step.
    """
    return fib_helper(n)

In [None]:
fib(50)
fib(50)

Time taken in fib: 0.0000601
Time taken in fib: 0.0000026


12586269025

In [None]:
fib(60)

Time taken in fib: 0.0000041


1548008755920

You can also define a cache decorator yourself (this one was learned from sumy).

NOTE: It is commonplace in text processing to **cache an attribute** of an object, which avoids computing it multiple times.

In [None]:
stopwords = ["is", "the"]

def cached_property(getter):
  @wraps(getter)
  def decorator(self):
    key = "_cached_property_" + getter.__name__
    if not hasattr(self, key):
      setattr(self, key, getter(self))

    return getattr(self, key)
  return property(decorator)

class Document:
  def __init__(self, words):
    self._words = words

  @cached_property
  def words(self):
    print("Hello")
    return tuple(w for w in self._words if w not in stopwords)

text = "a b c d e e f d f d d" * 100
doc = Document(text.split())
print(doc.words)
print(doc.words)
print(doc.words)

Hello
('a', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e', 'e', 'f', 'd', 'f', 'd', 'da', 'b', 'c', 'd', 'e',

# **Magic variables**


`*args` and `**kwargs` are usually used when writing **function decorators** or when **monkey patching**.

In [None]:
# *args
def test_args(*args):
  for arg in args:
    print(arg)

test_args(1, 2, 3, 4, 5)

1
2
3
4
5


In [None]:
# **kwargs (keyword args)
def test_kwargs(**kwargs):
  for key, val in kwargs.items():
    print(key, val)

test_kwargs(a = 1, b=2)
dict_args = {'a': 1, 'b' : 2}
test_kwargs(**dict_args)

a 1
b 2
a 1
b 2
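
For the monkey patching case, `*args` and `**kwargs` let the replacement forward whatever the caller passed without knowing the original signature. A small sketch that temporarily patches `math.sqrt`; `logged_sqrt` and `calls` are illustrative names:

```python
import math

original_sqrt = math.sqrt
calls = []

def logged_sqrt(*args, **kwargs):
    # Record the call, then forward the arguments unchanged
    calls.append(args)
    return original_sqrt(*args, **kwargs)

math.sqrt = logged_sqrt        # patch
try:
    result = math.sqrt(16)
finally:
    math.sqrt = original_sqrt  # always restore the original

print(result, calls)  # 4.0 [(16,)]
```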


# **Debugging**

Use `pdb.set_trace()` to set a breakpoint.
Some useful commands for controlling the debugger:
- `c`: continue execution
- `w`: show the current position in the stack trace
- `a`: print the argument list of the current function
- `n`: execute the next line

In [None]:
import pdb

def make_bread():
    pdb.set_trace()
    print("1")
    print("2")
    return "I don't have time"

print(make_bread())

# **Comment in Python**

## Doc string

In [None]:
def merge(sent_1, sent_2):
  """
  Append the 2nd sentence to the 1st sentence.

  :param sent_1: first sentence (string)
  :param sent_2: second sentence (string)
  :returns: the concatenated string
  """
  return sent_1 + sent_2

# of course, you shouldn't write a docstring for a function as simple as the one above

## Comment utilities in PyCharm & IntelliJ

\# TODO: add exception handling

Besides, you can use some unsupported tags like:

\# SEE: link references

\# NOTE: this has bug

...

# **Exception Handling**

The process of exception handling happens as follows:

- First, Python saves the state of execution at the moment the error occurs

- Then it executes a piece of code known as the exception handler

- Finally, it continues the normal flow using the previously saved state

**TODO**: `try`...`except`...`else`...`finally`
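
A minimal sketch of how the four clauses fit together. The function `safe_divide` and the `steps` list are hypothetical and exist only to record which clauses ran:

```python
def safe_divide(numerator, denominator):
    steps = []
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        steps.append("except")      # runs only if the try block raised
        result = None
    else:
        steps.append("else")        # runs only if the try block did not raise
    finally:
        steps.append("finally")     # always runs, error or not
    return result, steps

print(safe_divide(6, 3))  # (2.0, ['else', 'finally'])
print(safe_divide(1, 0))  # (None, ['except', 'finally'])
```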

In [4]:
#use 'raise' to define custom error exception - simple
def compute_age(birth_year, cur_year):
  if (birth_year > cur_year):
    raise ValueError("Current year should not be earlier than birth year")
  return cur_year - birth_year

compute_age(2021, 2008)

ValueError: ignored

You can use `assert <condition>` to state a condition that must hold. If the condition is true, execution continues; otherwise, Python raises `AssertionError`.

In [2]:
def compare_length(str1, str2):
  assert len(str1) != len(str2)
  return len(str1) > len(str2)

compare_length("aa", "bb")

AssertionError: ignored

# **Tips**
1. Avoid using the `global` keyword
2. Use `__slots__` in classes that will be instantiated many times (i.e. hundreds or thousands of times)
Detail: https://stackoverflow.com/questions/472000/usage-of-slots
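
A quick sketch of what `__slots__` changes. `PlainPoint` and `SlotPoint` are made-up classes; the point is that a slotted instance has no per-instance `__dict__` and rejects attributes that are not listed:

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlotPoint:
    __slots__ = ("x", "y")   # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

plain, slotted = PlainPoint(1, 2), SlotPoint(1, 2)
print(hasattr(plain, "__dict__"), hasattr(slotted, "__dict__"))  # True False

try:
    slotted.z = 3            # not listed in __slots__
except AttributeError as err:
    print("AttributeError:", err)
```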