# Setup

First, set up VS Code and a Python environment to run this notebook.

## On local PC

* Install Miniconda and create a virtual environment for these learning activities.
  * `conda create --name=learn python=3.13`
  * `conda activate learn`
* Then install `ipykernel` into that environment (use `which pip` to check that `pip` points inside it).
  * `which pip`
  * `pip install ipykernel`

## On VS Code

* Install the Python and Jupyter extensions from Microsoft.
* Go to the Extensions view by clicking on the icon on the sidebar or pressing Ctrl+Shift+X (Windows/Linux) or Cmd+Shift+X (macOS).
* Search for and install:
  * "Jupyter" extension by Microsoft. This enables Jupyter notebook integration.
  * "Python" extension by Microsoft for enhanced Python support.
* Set Up Your Environment:
  * Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P).
  * Type "Python: Select Interpreter" and select that command.
  * Choose the interpreter of the `learn` environment from the list, or enter its path manually.
* Create a New Jupyter Notebook:
  * Create a new file with the `.ipynb` extension.
  * Run cells with Ctrl+Enter (execute the current cell) or Shift+Enter (execute and move to the next cell).
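
To confirm that the notebook kernel really runs inside the new environment, a quick check like the one below can go in the first cell. This is only a sketch: `describe_interpreter` is a hypothetical helper name, and the printed path will differ from machine to machine.

```python
import sys

def describe_interpreter():
    """Return the path and version of the interpreter running this kernel."""
    return {
        # Should point inside the `learn` environment if the kernel is set up correctly
        "executable": sys.executable,
        "version": ".".join(map(str, sys.version_info[:3])),
    }

info = describe_interpreter()
print(info["executable"])
print(info["version"])
```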

# Generators

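* Any function that contains `yield` is a generator function; calling it returns a generator object without running the body.
  * The `yield` keyword pauses execution, saves the function's local state, and resumes from that point on the next `next()` call.
* Generators are memory efficient because they produce items on the fly instead of building the whole sequence in memory.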
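
The memory claim can be checked roughly with `sys.getsizeof`. The sketch below compares a list comprehension with a generator expression; the variable names are illustrative, and `getsizeof` only measures the container itself, so the numbers are a rough indication rather than a full memory profile.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]   # all values are stored at once
squares_gen = (x * x for x in range(n))    # values are produced one at a time

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small and independent of n

# Both yield the same values when consumed
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```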

In [36]:
def fibonacci():
    print("init a and b")
    a, b = 0, 1

    while True:
        print("before yield")
        yield a
        print("after yield, updating a and b")
        a, b = b, a + b
        print("a and b updated. loop again")

fib = fibonacci()
print(type(fib))
for _ in range(4):
    print("calling fib")
    print(next(fib))
    print("done this iteration\n")

<class 'generator'>
calling fib
init a and b
before yield
0
done this iteration

calling fib
after yield, updating a and b
a and b updated. loop again
before yield
1
done this iteration

calling fib
after yield, updating a and b
a and b updated. loop again
before yield
1
done this iteration

calling fib
after yield, updating a and b
a and b updated. loop again
before yield
2
done this iteration
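
Because `fibonacci()` never stops on its own, the caller decides how many values to take. Besides calling `next()` in a loop as above, one option is `itertools.islice`, sketched here with a version of the generator without the trace prints:

```python
from itertools import islice

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice pulls only the first 8 values from the infinite generator
first_eight = list(islice(fibonacci(), 8))
print(first_eight)  # [0, 1, 1, 2, 3, 5, 8, 13]
```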



In [34]:
def read_lines(file_path):
    with open(file_path, 'r') as file:
        count = 0
        chunk = []
        for line in file:
            print("# yielding a single line")
            yield line.strip()

            chunk.append(line)
            count += 1
            if count == 2:
                print("# previous yielded two lines in chunk, yielding the chunk")
                yield ''.join(chunk).strip()
                count = 0
                chunk = []

                yield "inserted line for fun"

# Usage:
print("example of using generator to read file line by line")
print()
for line in read_lines('./conn.py'):
    print(line)


example of using generator to read file line by line

# yielding a single line
import paramiko
# yielding a single line
import time
# previous yielded two lines in chunk, yielding the chunk
import paramiko
import time
inserted line for fun
# yielding a single line
import random
# yielding a single line
from retrying import retry
# previous yielded two lines in chunk, yielding the chunk
import random
from retrying import retry
inserted line for fun
# yielding a single line
from timeout_decorator import timeout
# yielding a single line

# previous yielded two lines in chunk, yielding the chunk
from timeout_decorator import timeout
inserted line for fun
# yielding a single line
# SSH credentials
# yielding a single line
HOST = "192.168.4.64"
# previous yielded two lines in chunk, yielding the chunk
# SSH credentials
HOST = "192.168.4.64"
inserted line for fun
# yielding a single line
USERNAME = "xzhao2"
# yielding a single line
PASSWORD = "clover"
# previous yielded two lines in chunk, yi

# Scope

Python passes arguments by object reference (sometimes called "call by sharing"): each parameter is a new local name bound to the same object the caller passed. Whether the caller sees a change depends on whether the function mutates that object or only rebinds the local name.

* Mutable objects (like lists or dictionaries) can be modified in place within the function, and the caller sees the change.
* Immutable objects (like integers or strings) cannot be changed; reassigning the parameter inside the function only rebinds the local name and doesn't affect the original object.

In [15]:
def fun(a, b):
    a += ", world"
    # strings are immutable, so `+=` builds a new string and rebinds the local name `a`;
    # the global `a` still refers to the original "hello"
    b.append(", world")
    # `b` refers to the same list object as the global `b`, so the append is visible outside
    print(a)

a = "hello"
b = ["hello"]
fun(a, b)
print(a)
print(b)


hello, world
hello
['hello', ', world']
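
The difference between rebinding and mutating can also be seen with `id()`, which identifies the object a name currently refers to. A small sketch, with `rebind` and `mutate` as made-up function names:

```python
def rebind(text):
    text += ", world"        # creates a new str and rebinds the local name
    return id(text)

def mutate(items):
    items.append(", world")  # changes the caller's list in place
    return id(items)

greeting = "hello"
words = ["hello"]

rebound_id = rebind(greeting)
mutated_id = mutate(words)

print(rebound_id == id(greeting))  # False: the local name pointed at a new object
print(mutated_id == id(words))     # True: same list object inside and outside
print(greeting, words)
```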


# List

Let's try lists and sets: converting a list to a set removes duplicates, and a list offers several ways to delete elements.

In [None]:
my_list = [6, 1, 2, 3, 2, 4, 5, 1]
print(my_list)
my_set = set(my_list)
print(my_set)
print(list(my_set))

# deletions
del my_list[6] # remove the element at index 6
print(my_list)

my_list.pop(3) # remove the element at index 3
print(my_list)

my_list.remove(1) # remove the first occurrence of value 1
print(my_list)

del my_list[:] # remove all elements
print(my_list)

[6, 1, 2, 3, 2, 4, 5, 1]
{1, 2, 3, 4, 5, 6}
[1, 2, 3, 4, 5, 6]
[6, 1, 2, 3, 2, 4, 1]
[6, 1, 2, 2, 4, 1]
[6, 2, 2, 4, 1]
[]
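
A set does not promise any particular order, so the sorted-looking output above should not be relied on. If duplicates need to be removed while keeping the original order, one common approach is `dict.fromkeys`, sketched here:

```python
my_list = [6, 1, 2, 3, 2, 4, 5, 1]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(my_list))
print(unique_in_order)  # [6, 1, 2, 3, 4, 5]

# sorted() gives a deterministic order when the original order does not matter
unique_sorted = sorted(set(my_list))
print(unique_sorted)    # [1, 2, 3, 4, 5, 6]
```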


# Dictionary

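Let's look at the dictionary, which is basically a hash table. CPython's built-in `dict` resolves collisions with open addressing; the toy class below uses separate chaining (a list per bucket) instead, because it is the easiest scheme to follow.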

In [7]:
# Simple example of separate chaining using lists as buckets
class HashTable:
    def __init__(self):
        self.buckets = [[] for _ in range(5)]
    
    def get_bucket(self, key_hash):
        return self.buckets[key_hash % len(self.buckets)]

    def insert(self, key, value):
        bucket = self.get_bucket(hash(key))
        h = hash(key) % len(self.buckets)
        print(f"key: {key}, hash: {h}, bucket: {bucket}")
        # Search the bucket for the key
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        # If not found, append to the bucket
        bucket.append((key, value))

    def delete(self, key):
        bucket = self.get_bucket(hash(key))
        for i, (k, v) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)

    def get(self, key):
        bucket = self.get_bucket(hash(key))
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

# Create a hash table and insert some values
ht = HashTable()
ht.insert("apple", 1)
ht.insert("banana", 2)
ht.insert("apply", 3)
ht.insert("apple", 5)

try:
    print(ht.get("apple"))   # Output: 5 (the second insert overwrote 1)
    print(ht.get("banana"))  # Output: 2
    print(ht.get("apply"))
    print(ht.get("melon"))
except KeyError as e:
    print(f"get error: {e} is not in the hash table")

ht.delete("apply")
try:
    print(ht.get("apply"))
except KeyError as e:
    print(f"get error: {e} is not in the hash table")


key: apple, hash: 3, bucket: []
key: banana, hash: 0, bucket: []
key: apply, hash: 2, bucket: []
key: apple, hash: 3, bucket: [('apple', 1)]
5
2
3
get error: 'melon' is not in the hash table
get error: 'apply' is not in the hash table
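
For contrast with separate chaining, here is a minimal sketch of open addressing with linear probing, where a colliding key moves to the next free slot instead of joining a bucket list. `ProbingTable` is a hypothetical toy class: it assumes the table never fills up, does not resize or delete, and is not how CPython's `dict` is actually implemented (CPython uses a different probe sequence).

```python
class ProbingTable:
    """Toy open-addressing hash table using linear probing."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot is None or a (key, value) pair

    def _probe(self, key):
        # Start at the hashed index and walk forward until the key
        # or an empty slot is found.
        start = hash(key) % len(self.slots)
        for offset in range(len(self.slots)):
            index = (start + offset) % len(self.slots)
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                return index
        raise RuntimeError("table is full")

    def insert(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key):
        slot = self.slots[self._probe(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]

table = ProbingTable()
table.insert(1, "a")   # hash(1) % 8 == 1
table.insert(9, "b")   # hash(9) % 8 == 1 as well, so it moves on to slot 2
print(table.get(1), table.get(9))
print(table.slots[1], table.slots[2])
```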


In [None]:
sales_data = [
    {
        "region": "North America",
        "products": [
            {"product_id": 101, "name": "Laptop", "sales": 500},
            {"product_id": 102, "name": "Phone", "sales": 700}
        ]
    },
    {
        "region": "Europe",
        "products": [
            {"product_id": 201, "name": "Tablet", "sales": 400},
            {"product_id": 202, "name": "Smartwatch", "sales": 350}
        ]
    }
]

# Accessing nested data
print(sales_data[0]["products"][0]["name"])  # Output: Laptop

# Modifying sales data for a product in Europe
sales_data[1]["products"][1]["sales"] = 400
# print(sales_data)

for item in sales_data:
    for product in item["products"]:
        print(item["region"], product["name"], product["sales"])

if "Europe" in sales_data[1].values():
    print("Yes")

Laptop
North America Laptop 500
North America Phone 700
Europe Tablet 400
Europe Smartwatch 400
Yes
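
Indexing with `[...]` raises `KeyError` when a key is missing. With nested data like this, `dict.get()` with a default is a common way to stay safe. The sketch below uses a shortened copy of the data plus a hypothetical "Asia" entry without a `"products"` key to show the effect:

```python
sales_data = [
    {"region": "North America",
     "products": [{"name": "Laptop", "sales": 500}, {"name": "Phone", "sales": 700}]},
    {"region": "Europe",
     "products": [{"name": "Tablet", "sales": 400}]},
    {"region": "Asia"},  # hypothetical entry with no "products" key
]

# item.get("products", []) falls back to an empty list instead of raising KeyError
sales_by_region = {
    item["region"]: sum(p["sales"] for p in item.get("products", []))
    for item in sales_data
}
print(sales_by_region)  # {'North America': 1200, 'Europe': 400, 'Asia': 0}
```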


# Misc

## docstring and math module

In [30]:
def get_total_sales(data):
    """
    Calculate the total sales from the sales data.

    Args:
        data (list): A list of dictionaries containing sales data. Each dictionary represents a region and contains a list of products with their sales.

    Returns:
        int: The total sales from all regions and products.
    """
    total_sales = 0
    for region in data:
        for product in region["products"]:
            total_sales += product["sales"]
    return total_sales

# Example usage
total_sales = get_total_sales(sales_data)
print(f"Total Sales: {total_sales}")

print(get_total_sales.__doc__)

Total Sales: 2000

Calculate the total sales from the sales data.

Args:
    data (list): A list of dictionaries containing sales data. Each dictionary represents a region and contains a list of products with their sales.

Returns:
    int: The total sales from all regions and products.
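
The raw `__doc__` string keeps the leading newline of the triple-quoted literal (and, depending on the Python version, the indentation). `inspect.getdoc()` returns a cleaned-up version. A short sketch with a trimmed copy of the function:

```python
import inspect

def get_total_sales(data):
    """
    Calculate the total sales from the sales data.

    Returns:
        int: The total sales from all regions and products.
    """
    return sum(p["sales"] for region in data for p in region["products"])

raw = get_total_sales.__doc__
clean = inspect.getdoc(get_total_sales)

print(repr(raw[:12]))  # starts with a newline
print(clean)           # leading/trailing blank lines and common indentation removed
```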



In [33]:
import math

print(5/2)
print(5//2)
print(math.ceil(5/2))

2.5
2
3
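
With negative numbers the rounding direction matters: `//` floors toward negative infinity, which is not the same as truncating toward zero. A short illustration:

```python
import math

print(-5 / 2)              # -2.5
print(-5 // 2)             # -3  (floor: rounds toward negative infinity)
print(math.floor(-5 / 2))  # -3
print(math.trunc(-5 / 2))  # -2  (truncates toward zero)
print(math.ceil(-5 / 2))   # -2
print(divmod(-5, 2))       # (-3, 1) because -3 * 2 + 1 == -5
```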


## about `*args` and `**kwargs`

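* `*args` collects extra positional arguments into a tuple and `**kwargs` collects extra keyword arguments into a dict, so both can be iterated with a `for` loop.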

In [None]:
def example_function(arg1, *args, **kwargs):
    print("First argument:", arg1)
    
    print("\nAdditional arguments (*args):")
    for arg in args:
        print(arg)
    
    print("\nKeyword arguments (**kwargs):")
    for key, value in kwargs.items():
        print(f"{key}: {value}")

# Example usage
example_function(1, 2, 3, 4, name="Alice", age=30, city="New York")

First argument: 1

Additional arguments (*args):
2
3
4

Keyword arguments (**kwargs):
name: Alice
age: 30
city: New York
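
The same `*` and `**` syntax also works in the other direction, unpacking a tuple or dict at the call site. A small sketch with a hypothetical `describe` function:

```python
def describe(name, age, city="unknown"):
    return f"{name}, {age}, {city}"

positional = ("Alice", 30)
keywords = {"city": "New York"}

# Equivalent to describe("Alice", 30, city="New York")
print(describe(*positional, **keywords))  # Alice, 30, New York
print(describe(*positional))              # Alice, 30, unknown
```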


## lambda function

* A lambda can take any number of parameters, but its body must be a single expression (no statements).

In [None]:
a = lambda x, y : x+y
print(a(7, 19))

26
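
Lambdas are most useful as short throwaway functions passed to another call, for example as the `key` of `sorted()` or `max()`. The data below is made up for illustration:

```python
people = [("Alice", 30), ("Bob", 25), ("Charlie", 35)]

# Sort and pick by the second element of each tuple (the age)
by_age = sorted(people, key=lambda person: person[1])
oldest = max(people, key=lambda person: person[1])

print(by_age)  # [('Bob', 25), ('Alice', 30), ('Charlie', 35)]
print(oldest)  # ('Charlie', 35)
```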


## Comprehension

* A compact syntax for building a list, set, or dict from an iterable. It is also often somewhat faster than the equivalent `for` loop with `append`, though the gain depends on the workload.

In [None]:
# list, set, and dict comprehensions
old_list = [1, 3, 4, 6, 7, 8, 3, 4, 9]
new_list = [x*2 for x in old_list]
print(new_list)
a_set = {x*2 for x in old_list}
print(a_set)
a_dict = {x: x*4 for x in old_list}
print(a_dict)

del a_dict[3]
print(a_dict)

# slicing returns a new list, so modifying it does not change old_list
sliced = old_list[:3]
print(sliced)
sliced[0] = 100
print(sliced)
print(old_list)

[2, 6, 8, 12, 14, 16, 6, 8, 18]
{2, 6, 8, 12, 14, 16, 18}
{1: 4, 3: 12, 4: 16, 6: 24, 7: 28, 8: 32, 9: 36}
{1: 4, 4: 16, 6: 24, 7: 28, 8: 32, 9: 36}
[1, 3, 4]
[100, 3, 4]
[1, 3, 4, 6, 7, 8, 3, 4, 9]
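
Comprehensions can also filter with a trailing `if`, or choose between values with a conditional expression. A short sketch reusing the same list:

```python
old_list = [1, 3, 4, 6, 7, 8, 3, 4, 9]

# Trailing `if` filters items: only even numbers are kept, then doubled
evens_doubled = [x * 2 for x in old_list if x % 2 == 0]
print(evens_doubled)  # [8, 12, 16, 8]

# A conditional expression chooses the value for every item
labels = ["even" if x % 2 == 0 else "odd" for x in old_list[:4]]
print(labels)         # ['odd', 'odd', 'even', 'even']
```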


## range() is exclusive of the end value

In [None]:
l=list(range(3))
print(l)

[0, 1, 2]
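
`range()` also accepts a start and a step, and the end value stays exclusive in every form. It is a lazy sequence, so even a huge range does not build a list:

```python
stepped = list(range(2, 10, 3))
countdown = list(range(5, 0, -1))
big = range(10**12)          # no list is created here

print(stepped)    # [2, 5, 8]
print(countdown)  # [5, 4, 3, 2, 1]
print(len(big))   # 1000000000000
```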


# Class and OOP

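- **Overriding** redefines a method in a subclass to provide a specific implementation; it is the usual mechanism behind polymorphism.
- **Overloading** means several functions or methods share a name but differ in their parameter lists. Python has no built-in overloading: a later `def` with the same name replaces the earlier one, so default arguments, `*args`, or `functools.singledispatch` are used instead.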
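
A minimal sketch of both ideas: overriding with a subclass, and an overloading-like effect using `functools.singledispatch`, which dispatches on the type of the first argument. The class and function names are made up for illustration.

```python
from functools import singledispatch

class Animal:
    def sound(self):
        return "..."

class Dog(Animal):
    def sound(self):  # overrides Animal.sound
        return "Woof"

@singledispatch
def describe(value):
    # Fallback used when no more specific implementation is registered
    return f"object: {value!r}"

@describe.register
def _(value: int):
    return f"int: {value}"

@describe.register
def _(value: str):
    return f"str: {value}"

print([animal.sound() for animal in (Animal(), Dog())])  # ['...', 'Woof']
print(describe(3), describe("x"), describe(2.5))
```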

## interface class vs abstract class

* Conceptually, an interface class contains only abstract methods that child classes must implement,
* while an abstract class can have both abstract and concrete methods.
* Python's `abc` module (Abstract Base Classes) provides `ABC` and `abstractmethod` to express this idea.

```
from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def sound(self):
        pass

class Dog(Animal):
    def sound(self):
        print("Bark")

class Cat(Animal):
    def sound(self):
        print("Meow")
```
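
What `ABC` buys in practice is that a class with unimplemented abstract methods cannot be instantiated. The sketch below uses hypothetical `Shape` and `Square` classes to show an abstract class that mixes an abstract method with a concrete one:

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

    def describe(self):  # concrete method shared by all subclasses
        return f"{type(self).__name__} with area {self.area()}"

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

try:
    Shape()  # abstract method `area` is not implemented
except TypeError as exc:
    print("cannot instantiate:", exc)

description = Square(3).describe()
print(description)  # Square with area 9
```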


In [38]:
class Dog:
    def __init__(self, name=None):
        if name:
            self.name = name
        else:
            self.name = "Unknown"

    def bark(self):
        print(f"[{self.name}:] Woof!")

# Dynamically adding a color attribute to the Dog class
Dog.color = "Brown"

# Creating an instance of Dog
dog_instance = Dog("huahua")
dog_instance.bark()  # Outputs "[huahua:] Woof!"

# Accessing the dynamically added color attribute
print(dog_instance.color)  # Outputs "Brown"

# Dynamically adding a sleep method to the Dog class
def sleep(self):
    print("ZZZ")

Dog.sleep = sleep

# Accessing the dynamically added sleep method
dog_instance.sleep()  # Outputs "ZZZ"

[huahua:] Woof!
Brown
ZZZ
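
An attribute added to the class is shared by every instance, but assigning through an instance creates a separate instance attribute that shadows it. A small sketch with a stripped-down `Dog`:

```python
class Dog:
    color = "Brown"   # class attribute, shared by all instances

a = Dog()
b = Dog()
a.color = "Black"     # creates an instance attribute on `a` only

before = (a.color, b.color, Dog.color)
print(before)         # ('Black', 'Brown', 'Brown')

del a.color           # removing the instance attribute exposes the class one again
after = a.color
print(after)          # Brown
```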


# map()

* Given one or more iterables, `map()` applies a function to each item in turn.
* It returns a lazy map object (an iterator), which can be converted to a list with `list()`.

In [66]:
numbers = [1, 2, 3, 4]
squared = map(lambda x: x ** 2, numbers)
print(type(squared))
print(squared)
print(list(squared))

# Using map() with multiple iterables
list1 = [1, 2, 3]
list2 = [4, 5, 6]

added = map(lambda x, y: x + y, list1, list2)
print(list(added))

# with filter
numbers = [1, 2, 3, 4, 5]
odds = filter(lambda x: x % 2 != 0, numbers)
squared_odds = map(lambda x: x ** 2, odds)
print(list(squared_odds)) 

# with object
class Person:
    def __init__(self, name):
        self.name = name

people = [Person("Alice"), Person("Bob"), Person("Charlie")]
names = map(lambda x: f"NAME: {x.name.upper()}", people)
print(list(names)) 

# with reduce
from functools import reduce

numbers = [2, 3, 4]
squared = map(lambda x: x ** 2, numbers)
product = reduce(lambda x, y: x * y, squared)
print(product)

# with a custom iterable (its __iter__ is a generator function)
class MyIterator:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        for i in range(self.start, self.end + 1):
            yield i

iterator = MyIterator(1, 5)
mapped = map(lambda x: x * 2, iterator)
print(list(mapped))

# error handling
def process_item(item):
    try:
        # Attempt to perform an operation that might raise an exception
        result = int(item) * 2
        return result
    except ValueError:
        # Handle the exception, e.g., by returning a default value or logging the error
        print(f"Error processing item: {item}. Returning None.")
        return None

data = [1, 2, 'a', 4, 5]
processed_data = list(map(process_item, data))
print(processed_data)
# Expected output: [2, 4, None, 8, 10]

<class 'map'>
<map object at 0x7637a1b3b520>
[1, 4, 9, 16]
[5, 7, 9]
[1, 9, 25]
['NAME: ALICE', 'NAME: BOB', 'NAME: CHARLIE']
576
[2, 4, 6, 8, 10]
Error processing item: a. Returning None.
[2, 4, None, 8, 10]
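
One thing to keep in mind is that a map object is a single-pass iterator: once it has been consumed, it is empty. With several iterables, `map()` also stops at the shortest one. A short sketch:

```python
numbers = [1, 2, 3]
squared = map(lambda x: x ** 2, numbers)

first_pass = list(squared)
second_pass = list(squared)   # the iterator is already exhausted
print(first_pass)   # [1, 4, 9]
print(second_pass)  # []

# With iterables of different lengths, map stops at the shortest
pairs = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))
print(pairs)        # [11, 22]
```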


In [59]:
import multiprocessing
import time

def square(x):
    return x ** 2

data = [i for i in range(10000000)]

# Using built-in map()
sequential_start = time.time()
squared_sequential = list(map(square, data))
print("Sequential execution time:", time.time() - sequential_start)

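# Using multiprocessing's map(); for a function this cheap, the cost of pickling items
# and moving them between processes likely outweighs the parallel speed-up (see timings below)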
processes_start = time.time()
with multiprocessing.Pool(processes=16) as pool:
    squared_parallel = pool.map(square, data)
print("Parallel execution time:", time.time() - processes_start)
print(squared_parallel[:10])

Sequential execution time: 1.3616161346435547
Parallel execution time: 2.9431400299072266
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
