## Advanced Python - Notes

### The Python Data Model
The Python Data Model is what gives the language its consistency. It can be seen as a framework that formalizes the interfaces of Python's building blocks. In other words, it is a collection of interfaces that define how Python's built-in behaviors and operations interact with objects (through dunder methods).  
When we write code that uses special syntax (such as `a + b`, `len(x)` or `obj[key]`), the interpreter invokes the corresponding special methods (`__add__`, `__len__`, `__getitem__`, all following the `__some_method__` naming pattern) to perform these basic object operations.  
When custom classes implement special (dunder) methods, their instances can behave like built-in types. This is how operator overloading is achieved in Python.  
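As an illustration, here is a minimal sketch of a user-defined class that hooks into the data model. `Vector2D` is a hypothetical example, not part of any library: implementing `__add__`, `__eq__`, `__len__` and `__repr__` lets its instances respond to `+`, `==`, `len()` and `repr()`.

```python
class Vector2D:
    """Hypothetical 2D vector used only to illustrate dunder methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called by the interpreter for `self + other`
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for `self == other`
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __len__(self):
        # Called by len(); here, the number of components
        return 2

    def __repr__(self):
        # Called by repr(), and by print() since no __str__ is defined
        return f"Vector2D({self.x}, {self.y})"


v = Vector2D(1, 2) + Vector2D(3, 4)
print(v)       # Vector2D(4, 6)
print(len(v))  # 2
```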
  
All data in a Python program is represented by objects or by relations between objects. In other words, data is encapsulated within objects in Python. An object is a collection of data and methods that act on that data. This allows for a higher level of abstraction, where the user can focus on the operations that can be performed with such objects rather than on the internal implementation of the data. These objects can interact with each other in various ways, such as passing messages or invoking methods. Python also treats code as objects. This is an important feature of high-level languages when we think in a functional manner, because in this model functions can be treated as just another object: they can be assigned to variables, stored in containers, passed as arguments and returned from other functions.  
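A short sketch of functions being handled like any other object (`square` and `apply_all` are hypothetical names chosen for the example):

```python
def square(x):
    return x * x


def apply_all(functions, value):
    # Functions are ordinary objects, so they can be received in a list
    return [f(value) for f in functions]


operations = [square, abs, lambda x: -x]  # stored in a container like any value
print(apply_all(operations, -3))          # [9, 3, 3]
print(type(square).__name__)              # function
print(type(square.__code__).__name__)     # code: the compiled body is an object too
```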
  
Each object in Python has an identity, a type and a value. It is important to note that an object's identity never changes once it has been created. In CPython, this identity can be thought of as the memory address of the object. The `is` operator compares the identities of two objects, whereas `==` compares their values.  
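A minimal example of the difference between identity (`is`) and equality (`==`):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal value, but a separate list object
c = a           # a second name bound to the same object as `a`

print(a == b)          # True: same value
print(a is b)          # False: different objects
print(a is c)          # True: same object
print(id(a) == id(c))  # True: `is` compares identities
```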

### ID
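The `id` function returns the identity of an object: an integer that is unique and constant for that object during its lifetime (in CPython, it is the object's memory address). Although every Python object has an identity, the results of comparing identities are sometimes shaped by interpreter optimizations, as shown below:  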

```python
first_value = 5
print(id(first_value))

second_value = 2
print(id(second_value))

third_value = 2
print(id(third_value))

print(first_value is second_value)
print(second_value is third_value)
```

Output:

```
139683335127408
139683335127312
139683335127312
False
True
```


The pattern observed above only applies to integers between -5 and 256 (inclusive). This is because CPython keeps these small integers in a cache for performance reasons: the integer objects are pre-allocated and reused in an attempt to save memory and allocation time.  
So when we create two variables with the same small value, Python reuses the pre-allocated object, and both variables point to the same object in memory.  
  
For integers outside this range, Python allocates a new object each time such a value is created, at least when each statement is compiled separately, as in this notebook.  
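A sketch that makes the cache visible without depending on how the code is compiled: `int()` builds the numbers at runtime from strings. This relies on CPython's small-integer cache, an implementation detail that other interpreters are not required to share.

```python
# Integers built at runtime, so no compile-time constant sharing is involved
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a is small_b)  # True in CPython: 256 comes from the small-int cache
print(big_a is big_b)      # False in CPython: each 257 is a new object
print(big_a == big_b)      # True: equal values regardless of identity
```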

```python
first_value = 257
print(id(first_value))

second_value = 257
print(id(second_value))

third_value = -6
print(id(third_value))

fourth_value = -6
print(id(fourth_value))

print(first_value is second_value)
print(third_value is fourth_value)
```

Output:

```
2817949078032
2817949078064
2817949077968
2817949078352
False
False
```


```python
first_value = 256
print(id(first_value))

second_value = 256
print(id(second_value))

third_value = -5
print(id(third_value))

fourth_value = -5
print(id(fourth_value))

print(first_value is second_value)
print(third_value is fourth_value)
```

Output:

```
140734695319960
140734695319960
140734695311608
140734695311608
True
True
```


To some extent this behavior extends to strings, but it does not extend to floats:

```python
first_value = 2.56
print(id(first_value))

second_value = 2.56
print(id(second_value))

third_value = -5.0
print(id(third_value))

fourth_value = -5.0
print(id(fourth_value))

print(first_value is second_value)
print(third_value is fourth_value)
```

Output:

```
2817949077392
2817949077360
2817949076880
2817949077328
False
False
```


```python
first_value = "Hello_world"
print(id(first_value))

second_value = "Hello_world"
print(id(second_value))

third_value = "Hello, world!"
print(id(third_value))

fourth_value = "Hello, world!"
print(id(fourth_value))

print(first_value is second_value)
print(third_value is fourth_value)
```

Output:

```
140002182303664
140002182303664
140002182303280
140002182307952
True
False
```


For strings, CPython performs string interning selectively: string literals that look like identifiers (only letters, digits and underscores, such as `"Hello_world"`) are usually interned automatically, while a literal like `"Hello, world!"` is not.  
Interning is a type of caching applied to specific object instances: Python reuses the existing string object instead of creating a new one every time the same string literal is used.  
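Interning can also be requested explicitly with `sys.intern`, which returns the canonical copy of a string. A small sketch that builds the strings at runtime, so the compiler cannot share them:

```python
import sys

# Build equal strings at runtime so they start out as distinct objects
left = "".join(["Hello", ", ", "world!"])
right = "".join(["Hello", ", ", "world!"])
print(left == right, left is right)  # True False

# sys.intern returns the same object for equal strings
left = sys.intern(left)
right = sys.intern(right)
print(left is right)  # True
```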
  

Python can also perform optimizations with immutable objects that look like the interning performed with integers and strings but are actually compiler optimizations: when equal immutable constants appear in the same compiled unit of code, the compiler can store them once and reuse the same object.  

```python
first_value = (1, 2, 3)
print(id(first_value))

second_value = (1, 2, 3)
print(id(second_value))

third_value = {1, 2, 3}
print(id(third_value))

fourth_value = {1, 2, 3}
print(id(fourth_value))

print(first_value is second_value)
print(third_value is fourth_value)


# Optimizations occur here:
def test_id():
    a = (1, 2, 3)
    b = (1, 2, 3)
    print(f"a is b? The answer is: {a is b}")


test_id()
```

Output:

```
140002182175680
140002182110016
140002182123072
140002182124416
False
False
a is b? The answer is: True
```


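The above optimization shows up inside the function because the whole function body is compiled as a single code object, and CPython stores equal constants such as `(1, 2, 3)` only once per code object, so `a` and `b` end up referencing the same tuple. At the top level of the notebook, IPython/Jupyter compiles each statement of a cell separately, so each tuple literal produces its own object. The sets are distinct in every case, because sets are mutable and a new set is built each time the expression runs.  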
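To check this without the notebook machinery, here is a sketch that compiles the same top-level statements once, as a single unit, with the built-in `compile()` and runs them with `exec()`. In CPython the equal constants are merged, so the top-level names now share objects too (an implementation detail, not a language guarantee):

```python
source = """
a = (1, 2, 3)
b = (1, 2, 3)
big_a = 257
big_b = 257
"""

namespace = {}
# One compile() call produces one code object for all four assignments
exec(compile(source, "<demo>", "exec"), namespace)

print(namespace["a"] is namespace["b"])          # True in CPython
print(namespace["big_a"] is namespace["big_b"])  # True in CPython
```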

The *id* function is useful in scenarios where object identities need to be managed explicitly. These situations generally involve identifying duplicates, caching/memoization keyed by object identity, and low-level optimizations. Even so, `id` is rarely needed in everyday Python code and is best reserved for these specific use cases.
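A minimal sketch of the "identifying duplicates" use case: keeping objects that are distinct by identity, even when they compare equal. The helper name `unique_by_identity` is hypothetical. Ids are only meaningful while the objects are alive, because an id can be reused once its object is garbage collected; here the input list keeps every object alive.

```python
def unique_by_identity(items):
    """Return items in order, dropping repeats of the *same* object."""
    seen_ids = set()
    unique = []
    for item in items:
        if id(item) not in seen_ids:  # identity check, not equality
            seen_ids.add(id(item))
            unique.append(item)
    return unique


first = [1]
second = [1]  # equal to `first`, but a different object
result = unique_by_identity([first, second, first])
print(len(result))  # 2: `first` appears twice but is kept once
```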

### Type  
An object's type defines which operations can be applied to it and the possible values that objects of that type can have. The `type()` function returns the type of an object.

```python
a = 5
print(type(a))
b = [1, 2, 3]
print(type(b))
print(type(type))
print(type(object))
```

Output:

```
<class 'int'>
<class 'list'>
<class 'type'>
<class 'type'>
```


The `type` object is a metaclass and also a function that returns the type of an object. `type` is an instance of itself, which forms a recursive loop, and it can be considered the foundational metaclass of the language: every class, including `object` and `type` itself, is an instance of `type`, directly or through a metaclass derived from it.  
This self-referential characteristic is what allows `type` to act as a foundation for all classes. Python allows us to create custom metaclasses for our classes, and these metaclasses are also derived from `type`.  

It is important to distinguish a metaclass from a base class. `type` is the default metaclass in Python: it defines how a class behaves and allows customizing how classes are created. A metaclass can be thought of as the class of a class (a class is an instance of its metaclass). In this context, every class in Python is an instance of `type`, even `type` itself.  
The base class, on the other hand, is the class inherited by another class. In Python, `object` is the base class inherited by all classes (in Python 3 every class is a new-style class, including `type`), together with all its attributes and methods.  
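The relationships described above can be checked directly. A small sketch, where `Meta` and `Plugin` are hypothetical names for a custom metaclass and a class that uses it:

```python
# `type` is an instance of itself, and `object` is an instance of `type`
print(type(type) is type)        # True
print(isinstance(object, type))  # True
# ...while `object` is the base class of everything, including `type`
print(issubclass(type, object))  # True


class Meta(type):
    """Hypothetical metaclass: its __new__ runs whenever a class using it is created."""

    def __new__(mcls, name, bases, namespace):
        namespace.setdefault("created_by", mcls.__name__)
        return super().__new__(mcls, name, bases, namespace)


class Plugin(metaclass=Meta):
    pass


print(type(Plugin) is Meta)      # True: the class is an instance of its metaclass
print(isinstance(Plugin, type))  # True: Meta derives from type
print(Plugin.created_by)         # Meta
```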

Besides getting the type of an object, the `type` function can be used to create a class dynamically:  

```python
# type(name, bases, namespace) builds a class object at runtime
MyDynamicClass = type(
    "MyDynamicClass", (object,), {"x": 5, "y": 10, "z": staticmethod(lambda x: x * x)}
)
instance = MyDynamicClass()
print(instance.x)
print(instance.z(5))
```

Output:

```
5
25
```


Dynamically created classes are mostly suited to situations where the properties, methods, or base classes of the objects in one's code are determined by external factors that are only known at runtime, such as user input, configuration data, or external schemas. This approach also fits scenarios where static class creation would lead to excessive code duplication or complexity.  
A helpful way to decide whether to use this approach is to evaluate whether the behavior of a class depends on conditions only known at runtime, requires classes to be generated or modified dynamically, aims to avoid code duplication, or allows end users to modify/extend the system without altering the core codebase.  
Although very useful in certain situations, using `type` in this manner adds a significant level of complexity to the code, so its use should always be well considered, keeping the possible trade-offs in mind.  
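A sketch of the runtime-configuration case, assuming a hypothetical schema (for example, parsed from a configuration file) that maps field names to default values. `make_record_class` and the schema contents are illustrative only, not part of any library:

```python
def make_record_class(name, defaults):
    """Build a class whose attributes come from a runtime-only schema."""

    def __init__(self, **values):
        unknown = set(values) - set(defaults)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        for field, default in defaults.items():
            setattr(self, field, values.get(field, default))

    def __repr__(self):
        fields = ", ".join(f"{f}={getattr(self, f)!r}" for f in defaults)
        return f"{name}({fields})"

    # type(name, bases, namespace) creates the class at runtime
    return type(name, (object,), {"__init__": __init__, "__repr__": __repr__})


schema = {"host": "localhost", "port": 8080}  # e.g. loaded from a config file
ServerConfig = make_record_class("ServerConfig", schema)
print(ServerConfig(port=9000))  # ServerConfig(host='localhost', port=9000)
```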

There are a few use cases that cover the vast majority of situations where creating a class dynamically is helpful. One of them is when a class must be created at runtime but should still be understood by static type checkers:

```python
from typing import cast


class BaseClass:
    pass


# cast() returns its argument unchanged at runtime; it only tells static type
# checkers to treat the value as the given type (here, a subclass of BaseClass).
SubClass = cast(type[BaseClass], type("SubClass", (BaseClass,), {}))
SubClass
```

Output:

```
__main__.SubClass
```

It is also possible to create a class dynamically with slots:

```python
MyClass = type("MyClass", (), {"__slots__": ("a", "b")})
print(MyClass.a)
```

Output:

```
<member 'a' of 'MyClass' objects>
```


Slots allow the explicit specification of which instance attributes an object is expected to have. This results in faster attribute access and reduced memory usage. The memory savings occur because instances do not use a dynamic dictionary (`__dict__`) for attribute storage; instead, they use a fixed-size array of slots, which is less memory-intensive than a dictionary. Additionally, accessing attributes stored in `__slots__` follows a more direct memory access pattern, with less indirection than a dictionary lookup. Moreover, using `__slots__` prevents instances from being assigned attributes that are not listed in it (provided no base class supplies a `__dict__`).

It's worth mentioning that dynamic attribute assignment is still possible when using slots, although some of the memory size benefits are lost. This can be achieved by including `"__dict__"` in the `__slots__` definition, allowing for dynamic attributes in addition to those specified. The use of slots requires careful consideration due to the changes in class behavior it introduces.
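A sketch of both behaviors, with hypothetical class names `Point` and `FlexiblePoint`:

```python
class Point:
    __slots__ = ("x", "y")  # only these attributes are allowed

    def __init__(self, x, y):
        self.x = x
        self.y = y


class FlexiblePoint:
    __slots__ = ("x", "y", "__dict__")  # "__dict__" re-enables dynamic attributes

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(hasattr(p, "__dict__"))  # False: no per-instance dictionary
try:
    p.z = 3                    # not listed in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)

fp = FlexiblePoint(1, 2)
fp.z = 3                       # stored in fp.__dict__; x and y stay in slots
print(fp.__dict__)             # {'z': 3}
```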

More information about `__slots__` can be found [here](https://stackoverflow.com/questions/472000/usage-of-slots).

It is also useful to know that keyword arguments can be passed when creating a subclass dynamically: `type()` forwards them to the base class's `__init_subclass__` hook, just like keywords in a `class` statement:

```python
class BaseClass:
    def __init_subclass__(cls, my_name, **kwargs):
        # Forward the remaining keyword arguments so cooperative
        # (multiple) inheritance keeps working
        super().__init_subclass__(**kwargs)
        print(f"Subclass created and my name is {my_name}")


MyDynamicSubClass = type("MyDynamicSubClass", (BaseClass,), {}, my_name="John")
```

Output:

```
Subclass created and my name is John
```


When is this pattern useful?

- Runtime configuration: when specific information is only available at runtime, allowing dynamic class creation and customization.
- Validation and registration: for checking subclasses as they are defined and registering them, to ensure data integrity and uniqueness.
- Default data provision: to pre-populate subclasses with relevant data based on their specific type or context.  
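A sketch of the validation-and-registration case, where a base class keeps a registry of its subclasses keyed by a name passed as a class keyword argument. `Exporter`, `registry` and `format_name` are hypothetical names:

```python
class Exporter:
    registry = {}

    def __init_subclass__(cls, format_name, **kwargs):
        super().__init_subclass__(**kwargs)
        if format_name in Exporter.registry:  # validation: names must be unique
            raise ValueError(f"duplicate format name: {format_name!r}")
        Exporter.registry[format_name] = cls  # registration


class CsvExporter(Exporter, format_name="csv"):
    pass


# The same hook runs for classes created with type()
JsonExporter = type("JsonExporter", (Exporter,), {}, format_name="json")

print(sorted(Exporter.registry))  # ['csv', 'json']
```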

Further information about `__init_subclass__` can be found [here](https://stackoverflow.com/questions/63473901/python-dynamically-create-class-while-providing-arguments-to-init-subclass).

### Value
The value of an object may or may not be changeable. This capability defines whether an object is mutable or immutable. But immutability is a tricky concept, because an immutable object can hold a reference to a mutable one. In that scenario, the value of the immutable object *can change* when the mutable object it refers to changes. In this sense, immutability is different from having an unchangeable value. The `id` of an object, however, never changes during its lifetime.

```python
num = 256
lst = [1, 2, num]
tpl = (1, 2, lst)
print(tpl)
print(id(num))
print(id(lst))
print(id(tpl))
num = (5545, "my tuple")
lst[2] = num
print(tpl)
print(id(num))
print(id(lst))
print(id(tpl))
```

Output:

```
(1, 2, [1, 2, 256])
8893480
140518044983424
140518044832640
(1, 2, [1, 2, (5545, 'my tuple')])
140518079995392
140518044983424
140518044832640
```


In the above code, the `id`s of `lst` and `tpl` did not change: the list was modified in place, and the tuple still holds a reference to that same list. Only `num` shows a new `id`, because the name was rebound to a different object.  
Another important point is that, for immutable types, operations that compute a new value may actually return a reference to an existing object with the same type and value: after `a = 1; b = 1`, `a` and `b` may or may not refer to the same object, depending on the implementation. For mutable types this is not allowed: after `c = []; d = []`, `c` and `d` are guaranteed to refer to two different, newly created lists.  
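A short sketch of the distinction: the tuple's own slots cannot be reassigned, but the list it references can still be mutated, which also makes such a tuple unhashable:

```python
container = (1, 2, [3])

container[2].append(4)  # allowed: mutates the list the tuple refers to
print(container)        # (1, 2, [3, 4])

try:
    container[2] = [9]  # not allowed: the tuple itself is immutable
except TypeError as exc:
    print("TypeError:", exc)

try:
    hash(container)     # a tuple is hashable only if all its items are
except TypeError as exc:
    print("TypeError:", exc)

c = []
d = []
print(c is d)           # False: new mutable objects are always distinct
```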

### Data Structures
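The built-in Python sequences can be grouped into container sequences and flat sequences. Container sequences (such as `list`, `tuple` and `collections.deque`) hold references to objects, which can be of different types, while flat sequences (such as `str`, `bytes` and `array.array`) store the values of their items directly in their own memory space and can only hold items of a single primitive type.  
In general, flat sequences are more compact, although they are limited to holding primitive values such as characters, bytes and numbers.  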
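Every Python object in memory has a header with metadata. The simplest Python object, a `float`, has two metadata fields and a value field:  
- `ob_refcnt`: the object's reference count  
- `ob_type`: a pointer to the object's type  
- `ob_fval`: a C `double` holding the value  

On a 64-bit CPython build, each of these fields takes 8 bytes. For this reason an array of floats is more compact than a tuple of floats: the array is a single object holding the raw values of its items, while the tuple consists of several objects (the tuple itself, which holds pointers, and each float object it refers to).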
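The arithmetic can be checked with the standard library. A sketch assuming a typical 64-bit CPython build; exact `sys.getsizeof` values vary between versions and builds, so the comments on them are indicative only:

```python
import array
import sys

values = [1.5, 2.1, 3.8]
arr = array.array("d", values)  # "d" = C double, stored inline in the array
tpl = tuple(values)             # stores pointers to separate float objects

print(arr.itemsize)             # 8 bytes per double
print(len(arr.tobytes()))       # 24: the three raw values, no per-item headers
print(sys.getsizeof(1.5))       # typically 24: 16-byte header + 8-byte value

# The tuple's items are full float objects, each with its own header
tuple_total = sys.getsizeof(tpl) + sum(sys.getsizeof(v) for v in tpl)
print(tuple_total > sys.getsizeof(arr))  # True: the flat array is more compact
```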


```python
# https://stackoverflow.com/questions/68630/are-tuples-more-efficient-than-lists-in-python
import array
import dis
import sys

from pympler import asizeof  # third-party package: pip install pympler

# Disassemble the expression that builds the tuple (compiled only, not executed)
tpl_code = compile("tuple([1.5, 2.1, 3.8])", "", "exec")
dis.dis(tpl_code)
print(f"getsizeof: {sys.getsizeof(tuple([1.5, 2.1, 3.8]))}")
print(f"asizeof: {asizeof.asizeof(tuple([1.5, 2.1, 3.8]))}")

print(50 * "_")
print(50 * " ")

arr_code = compile('array.array("d", [1.5, 2.1, 3.8])', "", "exec")
dis.dis(arr_code)
print(f'getsizeof: {sys.getsizeof(array.array("d", [1.5, 2.1, 3.8]))}')
print(f'asizeof: {asizeof.asizeof(array.array("d", [1.5, 2.1, 3.8]))}')
```

Output:

```
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (tuple)
              6 BUILD_LIST               0
              8 LOAD_CONST               0 ((1.5, 2.1, 3.8))
             10 LIST_EXTEND              1
             12 CALL                     1
             20 POP_TOP
             22 RETURN_CONST             1 (None)
getsizeof: 64
asizeof: 136
__________________________________________________
                                                  
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (array)
              4 LOAD_ATTR                1 (NULL|self + array)
             24 LOAD_CONST               0 ('d')
             26 BUILD_LIST               0
             28 LOAD_CONST               1 ((1.5, 2.1, 3.8))
             30 LIST_EXTEND              1
             32 CALL                     2
             40 POP_TOP
             42 RETURN_CONST             2 (None)
getsizeof: 104
```


In the above code we used two ways of calculating the size of a Python object. `getsizeof` returns the size of the object passed to it only, whereas `asizeof` returns the size of that object plus the sizes of the objects referenced within it.  
In general, `getsizeof` only reports the raw memory allocated for the object itself (its header and immediate contents), based on the interpreter's internal calculations.  
`asizeof` works by traversing the object's contents and attributes and recursively adding up the sizes of all the objects it finds. It also avoids counting the same object twice when it is referenced more than once, by keeping track of the IDs of the objects it has already visited (using a `set` or a similar mechanism that lets it recognize already-measured objects). It is important to note that it uses `getsizeof` to measure the size of the objects it encounters and also takes into account the extra space allocated by container objects, since some Python objects over-allocate to amortize append operations. Furthermore, it is capable of detecting shared objects, such as instances of classes with `__slots__` or some interned objects like strings. Although very useful, `asizeof` returns an *estimate* of the size of the object plus all of its referents (_referents_ are the objects accessible from the parent object by traversing its attributes and contents).  
In summary, `asizeof` does the following:  
1 - Identifies the object type;  
2 - Consults a predefined table with estimated base sizes for different object types;  
3 - Evaluates the contents and (a) if applicable, recursively iterates through them and/or (b) for each element, repeats steps 1-3 to estimate its size and adds it to the total;  
4 - Keeps track of objects by analyzing internal references within the parent objects and adds the contribution of referenced objects (avoiding double counting);  
5 - Applies additional estimates for header information and garbage collection, while avoiding double counting;  
6 - Returns the summed-up size of the traversed object.   
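The traversal idea can be sketched in a few lines. This is a simplified illustration with a hypothetical `deep_sizeof` helper, not Pympler's implementation: it only follows the items of built-in containers and mappings, relies on `sys.getsizeof` for each object, and uses a set of ids to avoid counting shared objects twice.

```python
import sys
from collections.abc import Mapping


def deep_sizeof(obj, _seen=None):
    """Rough recursive size estimate (simplified sketch, not Pympler's algorithm)."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # shared reference: already counted
        return 0
    _seen.add(id(obj))
    size = sys.getsizeof(obj)  # flat size of this object only
    if isinstance(obj, Mapping):
        children = list(obj.keys()) + list(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = obj
    else:
        children = ()  # treat everything else as a leaf
    for child in children:
        size += deep_sizeof(child, _seen)
    return size


inner = [1.5]
outer = [inner, inner]  # the same list referenced twice
print(deep_sizeof(outer) == sys.getsizeof(outer) + sys.getsizeof(inner) + sys.getsizeof(1.5))
```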

Another way to classify collections in Python is by mutability. The difference between mutable and immutable sequences is clear, but it is worth mentioning that, in `collections.abc`, `MutableSequence` inherits from `Sequence`, so mutable sequences provide all the methods of immutable ones (plus the methods that modify them in place).
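A quick way to see this hierarchy is through the abstract base classes in `collections.abc`:

```python
from collections.abc import MutableSequence, Sequence

print(issubclass(MutableSequence, Sequence))  # True: the ABC hierarchy
print(issubclass(list, MutableSequence))      # True
print(issubclass(tuple, MutableSequence))     # False: tuples are immutable
print(issubclass(tuple, Sequence))            # True

# Methods required by Sequence are therefore available on mutable sequences too
shared = {"index", "count", "__getitem__", "__len__", "__contains__"}
print(shared <= set(dir(list)) and shared <= set(dir(tuple)))  # True
```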

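#### Local Scope in List Comprehensions and Generator Expressions
List comprehensions and their siblings (set and dict comprehensions and generator expressions) have a local scope that holds the variables assigned in their `for` clauses. An assignment expression (`:=`) inside them, however, binds its target in the enclosing scope, and the variable of a regular `for` loop remains available after the loop ends.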

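```python
x = "ABC"
codes = [ord(j) for j in x]
# print(j)  # NameError: `j` only exists inside the comprehension

codes = [last := ord(i) for i in x]
print(last)  # `:=` binds `last` in the enclosing scope
# print(i)  # NameError: `i` does not leak either

for i in x:
    pass
print(i)  # a regular for-loop variable survives the loop
```

Output:

```
67
C
```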
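The same scoping rules apply to generator expressions. A small sketch that uses them to capture the first matching item, a common idiom with `any()` (`words` and `found` are illustrative names):

```python
words = ["apple", "banana", "cherry"]

# `any` stops at the first True; `:=` makes the matching word visible here
if any((found := word).startswith("b") for word in words):
    print(found)  # banana

try:
    print(word)   # the generator's loop variable does not leak
except NameError:
    print("word is not defined here")
```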
