# Interning: Integers

Reusing objects on demand.

1. At startup, Python (CPython) pre-loads (caches) a global list of integers in the range [-5, 256].
2. This range was chosen because these small integers show up very often.

### Integer object caching

1. Integers from -5 to 256 are singleton objects: only one object exists for each value. Asking for that value again gives back the original object instead of creating a new one.
2. That's why, when an integer with one of these values is created, the original object is returned. If the value is outside this range, a new object is generally created each time.
3. Any time an integer in that range is referenced, Python uses the cached version of that object.
4. Something similar happens with strings, but only a small number of string objects are cached.

In [27]:
a = 500
b = 500
print(id(a), '--> Not same')
print(id(b), '--> Not same')

a = 25
b = 25
print(id(a), '--> same')
print(id(b), '--> same')

2227600139824 --> Not same
2227600137168 --> Not same
2227522307056 --> same
2227522307056 --> same
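
The ids above come from a notebook session. In a plain script, equal literals can be merged by the compiler, which hides the effect. The sketch below avoids that by building each integer at run time with `int()`. The helper name `same_object` is made up for this illustration, and the results depend on CPython's cache, not on the language itself.

```python
def same_object(text):
    """Parse text twice and report whether both results are one object."""
    return int(text) is int(text)

# Values just inside and just outside the cached range [-5, 256].
for text in ("-6", "-5", "25", "256", "257", "500"):
    print(text, same_object(text))
```

On CPython, only the values inside the range should report `True`.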


# Interning: Strings

As Python code is compiled, identifiers are interned (important!).

Identifiers are:
- variable names
- function names
- class names
- etc.

Python also interns some strings automatically, but not all of them:
- string literals that look like identifiers (e.g. 'hello_world').
- a literal that starts with a digit is not a valid identifier, but it may still get interned.

<p style="text-align: center;">But don't count on it</p>
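
One way to observe identifier interning is to look at the names stored in a function's code object. This is a sketch that relies on CPython behaviour. The function `greet` and its parameter `user_name` are hypothetical names chosen for the example.

```python
import sys

def greet(user_name):
    return user_name

# The parameter name as stored by the compiler in the code object.
param = greet.__code__.co_varnames[0]

# An equal string built at run time; joining two pieces creates a new object.
built = "".join(["user", "_name"])

print(built == param)              # same value
print(built is param)              # not the same object
print(sys.intern(built) is param)  # intern() hands back the compiler's copy
```

If the last line prints `True`, the identifier was already in the intern table before `sys.intern()` was called.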

### Why does Python intern strings?
It's all about optimization (speed and, possibly, memory).

Python, both internally and in the code you write, performs lots and lots of dictionary lookups on string keys (for example, every time a variable is accessed by name), which means a lot of string equality testing.

If two strings have the same value and both are interned, they point to the same memory address. They can then be compared by identity (`is`), which only compares addresses, instead of character by character.

### Force strings to be interned
Not all strings are interned automatically by Python, but we can force it using ```sys.intern()```.

```
import sys

a = sys.intern('the quick brown fox')
b = sys.intern('the quick brown fox')
```
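Comparing by identity is fast, because only the two memory addresses are compared:
```
a is b  # True
```
Comparing by value gives the same answer, but when two strings are not the same object, `==` may have to compare them character by character, which is slower:
```
a == b  # True
```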

Both strings must be created through ```sys.intern()```; otherwise they may be different objects in memory.

In [38]:
# it looks like an identifier (like a variable name), so it will be interned
a = 'hello'
b = 'hello'

print(id(a), id(b))
print(a == b, a is b)

2227596448368 2227596448368
True True


In [39]:
# there is a space, so it doesn't look like an identifier (like a variable name); it will not be interned
a = 'hello world'
b = 'hello world'

print(id(a), id(b))
print(a == b, a is b)

2227601034032 2227601034416
True False


In [41]:
# this is long, but it still looks like an identifier (like a variable name)
a = "_this_is_a_long_string_that_could_be_used_as_an_identifier"
b = "_this_is_a_long_string_that_could_be_used_as_an_identifier"

a is b

True

In [68]:
from sys import intern
import time

In [57]:
a = intern('hello world')
b = intern('hello world')
c = 'hello world'

print(id(a), id(b), '--> Same memory address')
print(a is b, '--> Compare by memory address')
print(a == b, '--> Compare by values (data)')

print()

print(id(a), id(c), '--> Different memory address')
print(a is c, '--> Compare by memory address')
print(a == c, '--> Compare by values (data)')

2227600560752 2227600560752 --> Same memory address
True --> Compare by memory address
True --> Compare by values (data)

2227600560752 2227601192560 --> Different memory address
False --> Compare by memory address
True --> Compare by values (data)


In [74]:
def compare_using_equals(n):
    # string repeated 200 times
    a = 'a long string that is not interned' * 200
    b = 'a long string that is not interned' * 200
    for i in range(n):
        if a == b:
            pass

In [73]:
def compare_using_interning(n):
    # string repeated 200 times
    a = intern('a long string that is not interned' * 200)
    b = intern('a long string that is not interned' * 200)
    for i in range(n):
        if a is b:
            pass

In [80]:
start = time.perf_counter()
compare_using_equals(10000000)
print('equality:', time.perf_counter() - start)

equality: 3.5030590999958804


In [79]:
start = time.perf_counter()
compare_using_interning(10000000)
print('interning:', time.perf_counter() - start)

interning: 0.8154997999954503
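
The same comparison can also be sketched with the standard `timeit` module, which runs the timing loop for us. The variable names and the repeat count here are arbitrary choices for the example, and the absolute numbers will vary from machine to machine.

```python
import sys
import timeit

base = 'a long string that is not interned'

# Two equal strings built at run time, so they are separate objects.
plain_a = base * 200
plain_b = base * 200

# Interning both gives two names for one shared object.
interned_a = sys.intern(plain_a)
interned_b = sys.intern(plain_b)

# Time a value comparison against an identity comparison.
equal_time = timeit.timeit(lambda: plain_a == plain_b, number=200_000)
identity_time = timeit.timeit(lambda: interned_a is interned_b, number=200_000)

print('equality:', equal_time)
print('interning:', identity_time)
```

Note that part of each measurement is the cost of calling the lambda, so the gap between the two numbers may look smaller than in the loops above.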
