# Object references, mutability, and recycling

A name is not the object; a name is a separate thing.

Variables are labels, not boxes.

- garbage collection
- weak references, which "remember" objects without keeping them alive



## 1. Variables are not boxes

Python variables are like reference variables in Java, so it's better to think of them as labels attached to objects.

In [1]:
# Variables a and b hold references to the same list, not copies of the list
a = [1,2,3]
b = a
a.append(4)
b

[1, 2, 3, 4]

"Variable s is assigned to the seesaw", but never "The seesaw is assigned to variable s"

After all, the object is created before the assignment.

In [2]:
class Gizmo:
    def __init__(self):
        print('Gizmo id: %d' % id(self))
        

In [3]:
x = Gizmo()


Gizmo id: 2122018808888


In [4]:
# Gizmo was actually instantiated before the multiplication was attempted.
y = Gizmo() * 10

Gizmo id: 2122018809560


TypeError: unsupported operand type(s) for *: 'Gizmo' and 'int'

In [5]:
# But variable y was never created, because the exception happened
# while the right-hand side of the assignment was being evaluated.
dir()

['Gizmo',
 'In',
 'Out',
 '_',
 '_1',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'a',
 'b',
 'exit',
 'get_ipython',
 'quit',
 'x']

To understand an assignment in Python, always read the right-hand side first: that's where the object is created or retrieved. After that, the variable on the left is bound to the object, like a label stuck to it.

## 2. Identity, equality and aliases

In this example, `lewis` and `charles` are aliases: two variables bound to the same object.

On the other hand, `alex` is not an alias for `charles`: these variables are bound to distinct objects.

The objects bound to `alex` and `charles` have the same value (that is what `==` compares), but they have different identities.

In [7]:
charles = {'name': 'Charles L. Dodgson', 'born':1832}
lewis = charles
lewis is charles

True

In [8]:
id(lewis), id(charles)

(2122019004616, 2122019004616)

In [9]:
lewis['balance'] = 950
charles

{'balance': 950, 'born': 1832, 'name': 'Charles L. Dodgson'}

In [10]:
alex = {'name': 'Charles L. Dodgson', 'born':1832, 'balance':950}
# The objects compare equal, because of the __eq__ implementation in the dict class.
alex == charles

True

In [12]:
alex is not charles

True

In [13]:
id(alex)

2122018828552

Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The `is` operator compares the identities of two objects; the `id()` function returns an integer representing an object's identity.

The real meaning of an object's id is implementation-dependent. In CPython, `id()` returns the memory address of the object.

The key point is that the id is guaranteed to be a unique numeric label among the objects alive at the same time, and it will never change during the life of the object.

In practice, identity checks are most often done with the `is` operator, not by comparing ids.

### Choosing between == and is



The `==` operator compares the values of objects (the data they hold), while `is` compares their identities.

If you are comparing a variable to a singleton, then it makes sense to use `is`.

The most common case is checking whether a variable is bound to `None`:

```python
x is None

x is not None
```

The `is` operator is faster than `==` because it cannot be overloaded: Python does not have to find and invoke special methods to evaluate it, and computing it is as simple as comparing two integer ids.

Most built-in types override `__eq__` with more meaningful implementations that actually take into account the values of the object attributes.
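As a minimal sketch of the difference, the hypothetical `Point` class below (not part of the original notes) overrides `__eq__` so that `==` compares coordinates, while `is` still compares identities:

```python
class Point:
    """Hypothetical value class, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare by value: two points are equal if their coordinates match.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)

same_value = p1 == p2     # __eq__ compares the coordinates
same_object = p1 is p2    # identity check: p1 and p2 are distinct objects
```

Here `same_value` is `True` and `same_object` is `False`: equal values, different identities.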

### The relative immutability of tuples

Tuples, like most Python collections (lists, dicts, sets, etc.), hold references to objects.
(On the other hand, single-type sequences like `str`, `bytes` and `array.array` are flat: they don't contain references but physically hold their data in contiguous memory.)


The immutability of tuples really refers to the physical contents of the `tuple` data structure (i.e. the references it holds), and does not extend to the referenced objects.

This is also why some tuples are unhashable: a tuple that contains a mutable item, such as a list, cannot be hashed.
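A small sketch of the hashability point, using made-up tuples: hashing a tuple hashes its items, so a single unhashable item makes the whole tuple unhashable.

```python
hashable = (1, 2, (30, 40))      # every item is immutable and hashable
unhashable = (1, 2, [30, 40])    # the last item is a list

hash_ok = isinstance(hash(hashable), int)

try:
    hash(unhashable)
except TypeError as exc:
    # Hashing the tuple fails because hashing the inner list fails.
    error_message = str(exc)
else:
    error_message = None
```

For the same reason, `unhashable` cannot be used as a dict key or as a set element.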

In [14]:
t1 = (1,2,[30,40])
t2 = (1,2,[30,40])
t1==t2

True

In [15]:
id(t1[-1])

2122035929480

In [16]:
t1[-1].append(99)

In [17]:
t1

(1, 2, [30, 40, 99])

In [18]:
# The identity of t1[-1] has not changed, only its value.
id(t1[-1])

2122035929480

In [19]:
t1==t2

False

## 3. Copies are shallow by default

A copy is an equal object with a different id. But if an object contains other objects, should the copy also duplicate the inner objects, or is it ok to share them?



### Copies are shallow by default

The easiest way to copy a list (or most built-in mutable collections) is to use the built-in constructor for the type itself.


In [20]:
l1 = [3, [55, 44], (7,8,9)]

In [21]:
l2 = list(l1)
l2

[3, [55, 44], (7, 8, 9)]

In [22]:
l2 == l1

True

In [23]:
l2 is l1

False

For lists and other mutable sequences, the shortcut
```python
l2 = l1[:]
```
also makes a copy.

Using the constructor or `[:]` produces a `shallow copy`, i.e. the outermost container is duplicated, but the copy is filled with `references` to the same items held by the original container.

In [28]:
# l2 is a shallow copy of l1.
l1 = [3, [66, 55, 44], (7,8,9)]
l2 = list(l1)
l1.append(100)
# This affects l2 because l2[1] is bound to the same list as l1[1].
l1[1].remove(55)

In [29]:
print('l1:', l1)
print('l2:', l2)

l1: [3, [66, 44], (7, 8, 9), 100]
l2: [3, [66, 44], (7, 8, 9)]


In [30]:
# For a mutable object like the list referred to by l2[1], the += operator changes the list in place.
l2[1] += [33,22]
# += on a tuple creates a new tuple and rebinds the variable l2[2] here.
# Now the tuples at index 2 of l1 and l2 are no longer the same object.
l2[2] += (10,11)

In [31]:
print('l1:',l1)
print('l2:',l2)

l1: [3, [66, 44, 33, 22], (7, 8, 9), 100]
l2: [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]


### Deep and shallow copies of arbitrary objects

Sometimes you need to make deep copies, i.e. duplicates that do not share references of embedded objects. 

The `copy` module provides the `deepcopy` and `copy` functions that return deep and shallow copies of arbitrary objects.

In [32]:
class Bus:
    
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
            
        else:
            self.passengers = list(passengers)
            
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
        

In [34]:
import copy
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
# shallow copy
bus2 = copy.copy(bus1)
# deep copy
bus3 = copy.deepcopy(bus1)
# Using copy and deepcopy we create three distinct Bus instances.
id(bus1), id(bus2), id(bus3)

(2122036220760, 2122036220816, 2122036184792)

In [35]:
bus1.drop('Bill')
bus2.passengers

['Alice', 'Claire', 'David']

In [36]:
# bus1 and bus2 share the same list object, because bus2 is a shallow copy of bus1.
id(bus1.passengers), id(bus2.passengers), id(bus3.passengers)

(2122036198856, 2122036198856, 2122018814792)

In [37]:
# bus3 is a deep copy of bus1, so its passengers attribute refers to another list.
bus3.passengers

['Alice', 'Bill', 'Claire', 'David']

The `deepcopy` function remembers the objects already copied, so it handles cyclic references gracefully.

In [38]:
# Cyclic references : b refers to a, and then is appended to a; deepcopy still manages to copy a.
a = [10, 20]
b = [a, 30]
a.append(b)

In [39]:
a

[10, 20, [[...], 30]]

In [40]:
from copy import deepcopy
c = deepcopy(a)
c

[10, 20, [[...], 30]]

A deep copy may be too deep in some cases. For example, objects may refer to external resources or singletons that should not be copied.

You may control the behavior of both `copy` and `deepcopy` by implementing the `__copy__()` and `__deepcopy__()` special methods.
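The following is only a sketch of how these hooks might be used. The `Connection` and `Session` classes are hypothetical: `Connection` stands in for an external resource that should be shared rather than duplicated, even by a deep copy.

```python
import copy


class Connection:
    """Hypothetical stand-in for an external resource that must not be copied."""


class Session:
    def __init__(self, connection, history=None):
        self.connection = connection
        self.history = [] if history is None else list(history)

    def __copy__(self):
        # Shallow copy: a new Session whose attributes are shared with the original.
        new = type(self).__new__(type(self))
        new.__dict__.update(self.__dict__)
        return new

    def __deepcopy__(self, memo):
        # Deep copy the history, but keep sharing the connection.
        new = type(self).__new__(type(self))
        memo[id(self)] = new          # register the copy to handle cycles
        new.connection = self.connection
        new.history = copy.deepcopy(self.history, memo)
        return new


conn = Connection()
original = Session(conn, [['login'], ['query']])
shallow = copy.copy(original)
deep = copy.deepcopy(original)
```

After this runs, `shallow.history is original.history` holds, `deep.history` is an independent list with independent inner lists, and all three sessions share `conn`.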

## 4. Function parameters as references

The only mode of parameter passing in Python is `call by sharing`.

`Call by sharing` means that each formal parameter of the function gets a copy of the corresponding reference in the arguments.

In other words, the parameters inside the function become aliases of the actual arguments.

As a result, a function can change any mutable object passed to it as an argument, but it cannot change the identity of that object: rebinding a parameter never affects the caller's variable.
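To make the distinction between rebinding and mutating concrete, here is a small sketch with two hypothetical functions, `rebind` and `mutate`:

```python
def rebind(items):
    # Rebinding the local name creates a new list; the caller's list is untouched.
    items = items + ['rebound']
    return items


def mutate(items):
    # Mutating the shared object is visible to the caller.
    items.append('mutated')
    return items


original = ['a']

rebound = rebind(original)
after_rebind = list(original)    # snapshot of the caller's list after rebind()

mutated = mutate(original)
```

`after_rebind` is still `['a']`, while after `mutate(original)` the caller's list is `['a', 'mutated']` and `mutated is original`.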

In [41]:
# A function may change any mutable object it receives
def f(a, b):
    a += b
    return a


In [42]:
x = 1
y = 2
f(x,y)

3

In [43]:
x, y

(1, 2)

In [44]:
a = [1,2]
b = [3,4]
f(a,b)

[1, 2, 3, 4]

In [45]:
# list is changed
a, b

([1, 2, 3, 4], [3, 4])

In [46]:
t = (10,20)
u = (30, 40)
f(t,u)

(10, 20, 30, 40)

In [47]:
# tuple is unchanged
t, u

((10, 20), (30, 40))

### Mutable types as parameter defaults : bad idea

Another issue related to function parameters is the use of mutable values for defaults.

You should avoid mutable objects as default values for parameters.

In [1]:
# danger of a mutable default
class HauntedBus:
    # When the passengers argument is not passed,
    # this parameter is bound to the default list object, which is initially empty.
    def __init__(self, passengers=[]):
        # This assignment makes self.passengers an alias for passengers, which is itself
        # an alias for the default list when no passengers argument is given.
        self.passengers = passengers
    
    def pick(self, name):
        # When append() and remove() are called on self.passengers, they may actually be
        # mutating the default list, which is an attribute of the function object.
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)

In [2]:
bus1 = HauntedBus(['Alice', 'Bill'])
bus1.passengers

['Alice', 'Bill']

In [3]:
bus1.pick('Charlie')

In [4]:
bus1.drop('Alice')

In [5]:
bus1.passengers

['Bill', 'Charlie']

In [6]:
# bus2 starts empty, so the default empty list is assigned to self.passengers.
bus2 = HauntedBus()
bus2.pick('Carrie')
bus2.passengers

['Carrie']

In [7]:
dir()

['HauntedBus',
 'In',
 'Out',
 '_',
 '_2',
 '_5',
 '_6',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'bus1',
 'bus2',
 'exit',
 'get_ipython',
 'quit']

In [8]:
# bus3 also starts empty, so again the default list is assigned.
bus3 = HauntedBus()
# default is no longer empty
bus3.passengers

['Carrie']

In [9]:
bus3.pick('Dave')
bus2.passengers

['Carrie', 'Dave']

In [10]:
# The problem : bus2.passengers and bus3.passengers refer to the same list.
bus2.passengers is bus3.passengers

True

In [11]:
bus1.passengers

['Bill', 'Charlie']

In [12]:
dir(HauntedBus.__init__)

['__annotations__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

In [13]:
HauntedBus.__init__.__defaults__

(['Carrie', 'Dave'],)

In [15]:
HauntedBus.__init__.__defaults__[0]

['Carrie', 'Dave']

In [14]:
# bus2.passengers is an alias bound to the first element of the HauntedBus.__init__.__defaults__ attribute.
HauntedBus.__init__.__defaults__[0] is bus2.passengers

True

The problem is that `HauntedBus` instances that don't get an initial passenger list end up sharing the same passenger list among themselves.

`self.passengers` becomes an alias for the default value of the `passengers` parameter.

This happens because each default value is evaluated when the function is defined (i.e. usually when the module is loaded), and the default values become attributes of the function object. If a default value is a mutable object and you change it, the change affects every future call of the function.
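The same effect can be reproduced with a plain function. This sketch uses a hypothetical, deliberately buggy `append_to` function to show that the default list is created once and then reused by every call that relies on it:

```python
def append_to(item, target=[]):   # deliberately buggy: mutable default
    target.append(item)
    return target


# The default already exists before the first call: it was built at definition time.
default_before = list(append_to.__defaults__[0])   # snapshot of the default

first = append_to(1)
second = append_to(2)             # reuses the very same default list
```

Both calls return the same list object, which is also the object stored in `append_to.__defaults__[0]`.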



The issue with mutable defaults explains why `None` is often used as the default value for parameters that may receive mutable values.

```python
class Bus:
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers  = []
        else:
            # Assign a copy of it to self.passengers.
            self.passengers = list(passengers)
            
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
```



### Defensive programming with mutable parameters



In [16]:
class TwilightBus:
    
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
            
        else:
            # this assignment makes self.passengers an alias for passengers
            # which is itself an alias for the actual argument passed to __init__ .
            self.passengers = passengers
            
    def pick(self, name):
        # Actually mutating the original list received as argument to the constructor.
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
        

In [17]:
basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']
bus = TwilightBus(basketball_team)
bus.drop('Tina')
bus.drop('Pat')
basketball_team

['Sue', 'Maya', 'Diana']

The problem here is that the bus is aliasing the list that is passed to the constructor.

Instead, it should keep its own passenger list.

`self.passengers` should be initialized with a `copy` of it.

In [None]:
class TwilightBus:
    
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
            
        else:
            # Make a copy of the passenger list, or convert it to a list if it's not one.
            self.passengers = list(passengers)

    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
        

This solution is more flexible: now the argument passed to the `passengers` parameter may be a `tuple` or any other iterable, like a `set` or even database results, because the `list` constructor accepts any iterable.
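As a brief illustration of that flexibility, the sketch below uses a hypothetical `SafeBus` class that copies its argument with `list()`, then builds buses from a tuple and from a generator expression:

```python
class SafeBus:
    """Hypothetical bus that always keeps its own passenger list."""

    def __init__(self, passengers=None):
        self.passengers = [] if passengers is None else list(passengers)

    def drop(self, name):
        self.passengers.remove(name)


team = ('Sue', 'Tina', 'Maya')           # a tuple, not a list
bus = SafeBus(team)
bus.drop('Tina')                         # only the bus's own copy changes

from_generator = SafeBus(name.upper() for name in team)
```

The original `team` tuple is left untouched, and any iterable works as the argument.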

## 5. del and garbage collection

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected.

The `del` statement deletes names, not objects.

An object may be garbage collected as the result of a `del` statement, but only if the variable deleted holds the last reference to the object, or if the object becomes unreachable. (If two objects refer to each other, they may be destroyed if the garbage collector determines that they are otherwise unreachable because their only references are their mutual references.)
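The case of mutually referring objects can be sketched as follows. The `Node` class is hypothetical, and a weak reference is used only as a probe to observe whether the object still exists; the explicit `gc.collect()` call assumes CPython's cyclic garbage collector.

```python
import gc
import weakref


class Node:
    """Hypothetical class; its instances can be weakly referenced."""

    def __init__(self):
        self.other = None


left, right = Node(), Node()
left.other = right      # left -> right
right.other = left      # right -> left: a reference cycle

probe = weakref.ref(left)   # observes the object without keeping it alive

# After this, the two nodes are only reachable through each other.
del left, right

gc.collect()                # ask the cyclic collector to run now
alive_after_collect = probe() is not None
```

Once the collector has run, `alive_after_collect` is `False`: the cycle was unreachable, so both nodes were destroyed.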

Rebinding a variable may also cause the number of references to an object to reach zero, causing its destruction.

In CPython, the primary algorithm for garbage collection is `reference counting`.

Each object keeps count of how many references point to it.

As soon as that `refcount` reaches zero, the object is immediately destroyed:
CPython calls the `__del__` method on the object (if defined) and then frees the memory allocated to the object.
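A rough way to watch the count change in CPython is `sys.getrefcount`. This is only a sketch: the absolute numbers it returns are an implementation detail, so only the differences between calls are meaningful here.

```python
import sys

data = ['spam']
baseline = sys.getrefcount(data)     # absolute value is an implementation detail

alias = data                         # a second name bound to the same list
with_alias = sys.getrefcount(data)   # one more reference than before

del alias                            # removes the name, not the list
after_del = sys.getrefcount(data)    # back to the baseline
```

Binding `alias` raises the count by one and `del alias` lowers it again, while the list itself stays alive because `data` still refers to it.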

Use `weakref.finalize` to register a callback function to be called when an object is destroyed.

In [48]:
import weakref

s1 = {1,2,3}
# s1 and s2 are aliases referring to the same set, {1,2,3}
s2 = s1
# This function must not be a bound method of the object about to be destroyed,
# or otherwise hold a reference to it.
def bye():
    print('Gone with the wind')
    

In [49]:
# Register the bye callback on the object referred to by s1.
ender = weakref.finalize(s1, bye)
# The .alive attribute is True before the finalize object is called.
ender.alive

True

In [50]:
del s1
# As discussed, del does not delete an object, just a reference to it.
ender.alive

True

In [51]:
# Rebinding the last reference, s2, makes {1,2,3} unreachable.
s2 = 'spam'
ender.alive

Gone with the wind


False

`del` does not delete objects, but objects may be deleted as a consequence of being unreachable after `del` is used.

You may wonder why {1,2,3} was destroyed at all: the `s1` reference was passed to the `finalize` function, which must have held on to it in order to monitor the object and invoke the callback. This works because `finalize` holds a `weak reference` to {1,2,3}.

## 6. Weak references

The presence of references is what keeps an object alive in memory.

Weak references to an object do not increase its reference count. Therefore, a weak reference does not prevent its target, `the referent`, from being garbage collected.

In [52]:
# A weak reference is a callable that returns the referenced object, or None
# if the referent is no more.
import weakref
a_set = {0,1}
# The wref weak reference object is created here and inspected in the next cell.
wref = weakref.ref(a_set)

In [53]:
wref

<weakref at 0x000001EE13373EF8; to 'set' at 0x000001EE1334AE48>

In [54]:
# Invoking wref() returns the referenced object, {0,1}.
# Because this is a console session, the result {0, 1} is bound to the _ variable.
wref()

{0, 1}

In [55]:
# a_set no longer refers to the {0,1} set, so its reference count is decreased.
# But the _ variable still refers to it.
a_set = {2,3,4}
wref()

{0, 1}

In [56]:
# When this expression is evaluated, {0,1} is still alive, so wref() is not None.
# Afterwards _ is bound to the resulting value, False,
# so _ no longer refers to {0,1}.
wref() is None

False

In [57]:
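# In a plain Python console the {0,1} object would be gone by now, and this would be True.
# In IPython/Jupyter it is still False, because the output history (Out, __, ___)
# keeps strong references to earlier results such as {0,1}.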
wref() is None

False

Consider using `WeakKeyDictionary`, `WeakValueDictionary`, `WeakSet` and `finalize`, which use weak references internally, instead of creating and handling your own `weakref.ref` instances by hand.

### The WeakValueDictionary Skit

The class `WeakValueDictionary` implements a mutable mapping where the values are weak references to objects.

When a referred object is garbage collected elsewhere in the program, the corresponding key is automatically removed from the `WeakValueDictionary`.

This is commonly used for caching.



In [1]:
class Cheese:
    
    def __init__(self, kind):
        self.kind = kind
        
    def __repr__(self):
        return 'Cheese(%r)' % self.kind

In [2]:
import weakref
stock = weakref.WeakValueDictionary()
catalog = [Cheese('Red Leicester'), Cheese('Tilsit'), Cheese('Brie'), Cheese('Parmesan')]


In [3]:
for cheese in catalog:
    # stock maps the name of the cheese to a weak reference to the cheese instance in the catalog.
    stock[cheese.kind] = cheese

In [4]:
sorted(stock.keys())

['Brie', 'Parmesan', 'Red Leicester', 'Tilsit']

In [5]:
for key, val in stock.items():
    print(key, val)

Tilsit Cheese('Tilsit')
Brie Cheese('Brie')
Red Leicester Cheese('Red Leicester')
Parmesan Cheese('Parmesan')


In [6]:
# After catalog is deleted, most cheeses are gone from the stock.
del catalog
print(sorted(stock.keys()))
dir()

['Parmesan']


['Cheese',
 'In',
 'Out',
 '_',
 '_4',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'cheese',
 'exit',
 'get_ipython',
 'key',
 'quit',
 'stock',
 'val',
 'weakref']

In [7]:
for key, val in stock.items():
    print(key, val)

Parmesan Cheese('Parmesan')


In [6]:
del cheese
print(sorted(stock.keys()))
print(dir())

[]
['Cheese', 'In', 'Out', '_', '_4', '_5', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_i6', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'quit', 'stock', 'weakref']


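A `temporary variable` may cause an object to last longer than expected by holding a reference to it. Here, the `cheese` loop variable is a global that still refers to the last cheese in the catalog, which is why 'Parmesan' survived `del catalog`. This is usually not a problem with local variables: they are destroyed when the function returns.

But global variables never go away unless explicitly deleted.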

`WeakKeyDictionary` can be used to associate additional data with an object owned by other parts of an application without adding attributes to those objects.

This can be especially useful with objects that override attribute accesses.
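A minimal sketch of that use case, with a hypothetical `Widget` class and a made-up `notes` mapping. The immediate removal of the entry after `del` relies on CPython's reference counting.

```python
import weakref


class Widget:
    """Hypothetical class owned by some other part of the application."""


# Extra data about widgets, stored outside the widgets themselves.
notes = weakref.WeakKeyDictionary()

w = Widget()
notes[w] = 'needs repainting'
note_while_alive = notes[w]
entries_while_alive = len(notes)

del w                              # the last strong reference goes away
entries_after_del = len(notes)     # the entry disappears with its key
```

The mapping never keeps a widget alive: once the widget is gone, so is its note.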


`WeakSet`: a set class that keeps weak references to its elements. An element will be discarded when no strong reference to it exists any more.

If you need to build a class that is aware of every one of its instances, a good solution is to create a class attribute with a `WeakSet` to hold the references to the instances.

Otherwise, if a regular `set` were used, the instances would never be garbage collected, because the class itself would hold strong references to them, and classes live as long as the Python process unless you deliberately delete them.
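One possible shape for such a class is sketched below. The `Tracked` class and its `live_names` helper are hypothetical, and the immediate disappearance of a deleted instance assumes CPython's reference counting.

```python
import weakref


class Tracked:
    """Hypothetical class that is aware of its live instances."""

    _instances = weakref.WeakSet()   # weak references only

    def __init__(self, name):
        self.name = name
        Tracked._instances.add(self)

    @classmethod
    def live_names(cls):
        return sorted(obj.name for obj in cls._instances)


a = Tracked('a')
b = Tracked('b')
names_before = Tracked.live_names()

del b                                # the WeakSet does not keep b alive
names_after = Tracked.live_names()
```

`names_before` lists both instances, while `names_after` contains only `'a'`.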

### Limitations of weak references

Not every Python object may be the target, or referent, of a weak reference. Basic `list` and `dict` instances may not be referents, but a plain subclass of either can solve this problem easily.

In [8]:
class MyList(list):
    """list subclass whose instances may be weakly referenced"""
    
a_list = MyList(range(10))

# a_list can be the target of a weak reference
wref_to_a_list = weakref.ref(a_list)

In [9]:
b_list = list(range(10))

wref_to_b_list = weakref.ref(b_list)

TypeError: cannot create weak reference to 'list' object

A `set` instance can be a referent; that's why a `set` was used in the earlier examples.

User-defined types also pose no problem, which explains why the silly `Cheese` class was needed.

But `int` and `tuple` instances cannot be targets of weak references, even if subclasses of those types are created. Most of these limitations are implementation details of CPython.

## 7. Tricks Python plays with immutables

For a `tuple` `t`, `t[:]` does not make a copy, but returns a reference to the same object. You also get a reference to the same tuple if you write `tuple(t)`.

In [10]:
t1 = (1,2,3)
t2 = tuple(t1)

In [11]:
# t1 and t2 are bound to the same object.
t2 is t1

True

In [12]:
t3 = t1[:]
t3 is t1

True

The same behavior can be observed with instances of `str`, `bytes` and `frozenset`.

Note that a `frozenset` is not a sequence, so `fs[:]` does not work if `fs` is a `frozenset`.

But `fs.copy()` has the same effect: it returns a reference to the same object, and not a copy at all.
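A quick check of the `frozenset` case, contrasted with a mutable `set`. The identity results are CPython behavior, not a language guarantee, so treat this as an illustration only.

```python
fs = frozenset([1, 2, 3])

copy_is_same = fs.copy() is fs           # no new object is created
rebuilt_is_same = frozenset(fs) is fs    # the constructor also returns fs itself

s = {1, 2, 3}
set_copy_is_same = s.copy() is s         # a mutable set really is copied
```

In CPython the first two results are `True` and the last one is `False`. Sharing is harmless for the immutable `frozenset`, but a mutable `set` needs a real copy.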

In [13]:
t1 = (1,2,3)
t3 = (1,2,3)
t3 is t1

False

In [14]:
s1 = 'ABC'
s2 = 'ABC'
s2 is s1

True

The sharing of string literals is an optimization technique called `interning`.

CPython uses the same technique with small integers to avoid unnecessary duplication of popular numbers like 0, -1 and 42.

Never depend on `str` or `int` interning.

Always use `==` and not `is` to compare them for equality.

## 8. Chapter Summary

Every Python object has `an identity`, `a type` and `a value`. Only the value of an object may change over time.

The fact that variables hold references has many practical consequences in Python programming.

1. Simple assignment does not create copies.
2. Augmented assignment with `+=` or `*=` creates new objects if the left-hand variable is bound to an immutable object, but may modify a mutable object in place.
3. Assigning a new value to an existing variable does not change the object previously bound to it. This is called a `rebinding`: the variable is now bound to a different object. If that variable was the last reference to the previous object, that object will be garbage collected.
4. Function parameters are passed as aliases, which means the function may change any mutable object received as an argument. There is no way to prevent this, except making local copies or using immutable objects (e.g. passing a tuple instead of a list).
5. Using mutable objects as default values for function parameters is dangerous because if the parameters are changed in place, then the default is changed, affecting every future call that relies on the default.
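Point 2 in the list above can be checked with a short sketch that compares a list and a tuple under `+=` (the variable names are illustrative only):

```python
a_list = [1, 2]
list_id_before = id(a_list)
a_list += [3]                     # the list is extended in place
list_same_object = id(a_list) == list_id_before

a_tuple = (1, 2)
alias = a_tuple                   # keep the original tuple for comparison
a_tuple += (3,)                   # builds a new tuple and rebinds the name
tuple_same_object = a_tuple is alias
```

The list keeps its identity, while `a_tuple` ends up bound to a new object and `alias` still refers to the original `(1, 2)`.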

In CPython, objects are discarded as soon as the number of references to them reaches zero.

They may also be discarded if they form groups with cyclic references but no outside references.