### Serialization
Serialization converts Python objects into a sequence of bytes. Deserialization does the reverse: it converts a sequence of bytes back into Python objects. Why do we need these processes?
* computers store and transmit bits; we divide bits into groups of eight and call them bytes
* the interpreter builds objects on top of bytes for you, so you can focus on high-level objects such as lists and dictionaries
* when you need to transfer or store data, it must be converted to a sequence of bytes

How can we do the conversion?
* JSON, YAML, pickle, XML and more (all are serialization formats)
* serialization is usually done at the edges of a program: when you receive data from a socket, write to a file, write to a database, etc.
* you deserialize bytes from the input into Python objects, work with the Python objects, and serialize them back into bytes for the output
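
The "serialize only at the edges" idea above can be sketched as follows. This is a minimal illustration, not code from the original notes: the function name `handle_request` and the order payload are hypothetical, and JSON is chosen only because it ships with Python.

```python
import json


def handle_request(raw: bytes) -> bytes:
    """Hypothetical handler: bytes come in, bytes go out, objects in between."""
    # Input edge: bytes -> Python objects
    order = json.loads(raw.decode("utf-8"))

    # Core logic works only with Python objects (dicts, lists, numbers)
    total = sum(item["price"] * item["qty"] for item in order["items"])
    reply = {"order_id": order["id"], "total": total}

    # Output edge: Python objects -> bytes
    return json.dumps(reply).encode("utf-8")


incoming = b'{"id": 7, "items": [{"price": 2.5, "qty": 4}, {"price": 1.0, "qty": 3}]}'
outgoing = handle_request(incoming)
print(outgoing)
```

Keeping the conversion in one place means the rest of the program never has to know which format was used on the wire.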

Factors to consider when choosing a format for serialization and deserialization
* languages supported: JSON is supported by nearly every language, and is therefore popular in APIs
* types supported: JSON has no datetime or timestamp type, so you need to convert such values to strings or numbers
* schema: a schema defines what your data looks like. It helps validate data and detect errors
* performance: both CPU and bandwidth cost money
  + how much time and CPU the conversion requires
  + how many bytes are sent or stored
* examples: use JSON for external APIs and Protocol Buffers for internal communication
  + JSON doesn't support tuples. A tuple comes back as a list, which may cause problems if you rely on the immutability of tuples
* only serialize data at the edges of the program
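
The two JSON type limitations mentioned above (tuples and datetimes) can be seen with the standard `json` module. This is a small sketch; storing the datetime as an ISO 8601 string is one common workaround, chosen here as an assumption rather than something the notes prescribe.

```python
import json
from datetime import datetime

# A tuple goes in, a list comes out
point = (1, 2)
restored = json.loads(json.dumps(point))
print(type(restored).__name__, restored)

# datetime has no JSON equivalent, so json.dumps raises TypeError
when = datetime(2021, 12, 18, 23, 16, 58)
try:
    json.dumps({"when": when})
except TypeError as err:
    print("json cannot encode datetime:", err)

# Workaround (assumption): store an ISO 8601 string and parse it on the way back
encoded = json.dumps({"when": when.isoformat()})
decoded = datetime.fromisoformat(json.loads(encoded)["when"])
print(decoded == when)
```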

### Overview of commonly used formats:
* pickle is a Python-only format that can serialize almost any Python type, including your custom classes
  + unpickling can execute arbitrary code, so be security-aware: never load pickle data from a source you don't trust
  + pickle can serialize Python objects such as sets, datetimes, and named functions, but not lambda expressions
* JSON is the most popular serialization format. It is text-based and supports a limited set of types
  + does not support timestamps
  + no comments, which makes it a poor fit for configuration files, where you want comments
  + looks like Python literals, but the two are not the same
* YAML and TOML are text-based and popular for configuration
  + YAML can be complex and slow to serialize, which is not a problem for configuration
  + TOML is newer and has been adopted by the Python world, for example in pyproject.toml
* CSV and XML are old and established formats
  + CSV can be imported into and exported from Excel
  + CSV doesn't have a schema
  + everything in CSV is a string
  + XML has schemas and is also a text format
* msgpack and BSON are binary formats without a schema (think of them as binary JSON)
  + BSON is used by MongoDB, so it is widely used and well debugged
  + msgpack (MessagePack) is an open format and supports a wider range of programming languages
* Protocol Buffers is a format that started at Google and has become very popular
  + also called protobuf
  + you write a .proto file that defines data types called messages, then use the protoc compiler to generate serialization code for various languages
  + one message definition serves all the languages that use it
  + its binary format makes it very efficient
  + mostly used for internal communication between services
* SQL: a language for defining and querying data in relational databases
  + well established for querying data
  + has a schema
  + supports a wide range of types
  + Python ships the sqlite3 module, which supports it
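
As a rough illustration of the SQL bullet above, the sketch below uses the standard `sqlite3` module with an in-memory database. The table name `moves` and its columns are hypothetical, picked only to show a schema with typed columns rejecting bad data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The schema declares column names, types, and constraints up front
conn.execute(
    "CREATE TABLE moves ("
    "id INTEGER PRIMARY KEY, limb TEXT NOT NULL, seconds REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO moves (limb, seconds) VALUES (?, ?)",
    [("jump", 1.0), ("step", 2.0)],
)

rows = conn.execute("SELECT limb, seconds FROM moves ORDER BY seconds").fetchall()
print(rows)

# The schema catches invalid data: limb may not be NULL
schema_error = None
try:
    conn.execute("INSERT INTO moves (limb, seconds) VALUES (?, ?)", (None, 3.0))
except sqlite3.IntegrityError as err:
    schema_error = str(err)
print(schema_error)

conn.close()
```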

### Pickle
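* the pickle API has two sets of functions:
  + `dump` and `load` work with files: `dump(obj, file)` writes to the file and returns None; `load(file)` reads from the file and returns the object
  + `dumps` and `loads` work in memory: `dumps(obj)` returns a bytes object; `loads(data)` turns those bytes back into the object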
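
Before the larger example, here is a small sketch of what pickle can and cannot handle, using only `dumps` and `loads`. The `payload` dictionary is made up for illustration. The lambda case catches two exception types because the exact exception may differ between Python versions.

```python
import pickle
from datetime import datetime

# Sets, datetimes, and named functions (pickled by reference) round-trip fine
payload = {"tags": {"a", "b"}, "when": datetime(2021, 12, 18), "func": len}
restored = pickle.loads(pickle.dumps(payload))
print(restored == payload, restored["func"] is len)

# A lambda has no importable name, so pickle refuses to serialize it
try:
    pickle.dumps(lambda x: x + 1)
    lambda_ok = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_ok = False
    print("cannot pickle lambda:", err)
```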

In [5]:
"""Dance moves"""
from datetime import datetime, timedelta


class Move:
    """Move is a dance move"""
    def __init__(self, time, limb, what):
        self.time = time
        self.limb = limb
        self.what = what

    def __repr__(self):
        cls = self.__class__.__name__
        return f'{cls}({self.time!r}, {self.limb!r}, {self.what!r})'


# Dance moves
second = timedelta(seconds=1)
now = datetime.now()
move1 = Move(now + 1*second, 'jump', 'to the left')
move2 = Move(now + 2*second, 'step', 'to the right')
move3 = Move(now + 3*second, 'hands', 'on your hips')
move4 = Move(now + 4*second, 'knees', 'bring in tight')

#### `dumps(object)` and `loads(data)` work with Python objects and binary data
* `data = dumps(object)` returns the serialized object as bytes. Here the result is saved in `data`
* `loads(data)` takes the serialized bytes and returns the deserialized Python object

In [6]:
import pickle

Serialize a Python object with `pickle.dumps(object)`

In [7]:
data = pickle.dumps(move1)
print("data: ", data)
print("type of data: ", type(data))

data:  b'\x80\x04\x95p\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x04Move\x94\x93\x94)\x81\x94}\x94(\x8c\x04time\x94\x8c\x08datetime\x94\x8c\x08datetime\x94\x93\x94C\n\x07\xe5\x0c\x12\x17\x10:\x06\x11\x9f\x94\x85\x94R\x94\x8c\x04limb\x94\x8c\x04jump\x94\x8c\x04what\x94\x8c\x0bto the left\x94ub.'
type of data:  <class 'bytes'>


In [8]:
dance = [move1, move2, move3, move4]
data_dance = pickle.dumps(dance)

Deserialize bytes into a Python object with `pickle.loads(data)`

In [9]:
move1d = pickle.loads(data)
move1d

Move(datetime.datetime(2021, 12, 18, 23, 16, 58, 397727), 'jump', 'to the left')

In [10]:
danced = pickle.loads(data_dance)
danced

[Move(datetime.datetime(2021, 12, 18, 23, 16, 58, 397727), 'jump', 'to the left'),
 Move(datetime.datetime(2021, 12, 18, 23, 16, 59, 397727), 'step', 'to the right'),
 Move(datetime.datetime(2021, 12, 18, 23, 17, 0, 397727), 'hands', 'on your hips'),
 Move(datetime.datetime(2021, 12, 18, 23, 17, 1, 397727), 'knees', 'bring in tight')]

#### `dump(object, file)` and `load(file)` work with files

In [11]:
with open('move1.pkl', 'wb') as out:
    pickle.dump(move1, out)

In [12]:
with open('move1.pkl', 'rb') as fp:
    move1f = pickle.load(fp)

move1f

Move(datetime.datetime(2021, 12, 18, 23, 16, 58, 397727), 'jump', 'to the left')

In [13]:
with open("dance.pkl", 'wb') as out:
    pickle.dump(dance, out)

In [14]:
with open("dance.pkl", 'rb') as fp:
    dancef = pickle.load(fp)
    
dancef    

[Move(datetime.datetime(2021, 12, 18, 23, 16, 58, 397727), 'jump', 'to the left'),
 Move(datetime.datetime(2021, 12, 18, 23, 16, 59, 397727), 'step', 'to the right'),
 Move(datetime.datetime(2021, 12, 18, 23, 17, 0, 397727), 'hands', 'on your hips'),
 Move(datetime.datetime(2021, 12, 18, 23, 17, 1, 397727), 'knees', 'bring in tight')]

#### work with sockets for streaming
* we can write one object after another to the socket and read them one at a time at the receiving end
* the following code snippet uses socketpair to simulate client-server

In [15]:
from socket import socketpair

# create read and write socket pair
ws, rs = socketpair()

# create a write file and a read file for the socket
w, r = ws.makefile('wb'), rs.makefile('rb')

# Serialize
pickle.dump(move1, w)
pickle.dump(move2, w)
pickle.dump(move3, w)
pickle.dump(move4, w)

# flush so the buffered data is actually sent to the other (read) side
w.flush()

# Deserialize the objects on the other side of the socket
for _ in range(4):
    move = pickle.load(r)
    print(f'{move.limb} {move.what}')

jump to the left
step to the right
hands on your hips
knees bring in tight


### Shelve
* a shelf is like a dictionary that is persisted to disk
* it can store many objects, each under its own key
* handy for storing objects when moving them to a full database is more effort than it is worth

In [16]:
import shelve

# store objects
db = shelve.open('dance.db')
db['1'] = move1
db['2'] = move2
db['3'] = move3
db['4'] = move4
db.close()

# reopen the shelf to read; the with block closes it afterwards
with shelve.open('dance.db') as db:
    print(db['1'])

Move(datetime.datetime(2021, 12, 18, 23, 16, 58, 397727), 'jump', 'to the left')


### Serialization with repr
* there are two ways to represent an object as a string in Python
  + str: for end users
  + repr: for developers
  + the repr is usually text that can be used to re-create the Python object
  + add a `__repr__` to your own types, and use the repr format when logging, so you can copy and paste a log entry into the interpreter to inspect the object
  + all of the following print the repr of an object `obj`
    + print('%r' % obj)
    + print(repr(obj))
    + print(f'{obj!r}')
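
The claim that a repr can usually re-create the object can be checked for built-in types with `ast.literal_eval`, which parses literals without running arbitrary code. This is a minimal sketch; the `record` dictionary is illustrative, and the approach only covers plain literals, not custom classes.

```python
import ast

record = {"id": 1, "name": "Parzival", "keys": ["copper", "jade"]}

# repr produces text that looks like the source code for the object
text = repr(record)
print(text)

# literal_eval turns that text back into an equal object
rebuilt = ast.literal_eval(text)
print(rebuilt == record)

# The three spellings listed above produce the same string
same = ('%r' % 'x') == repr('x') == f"{'x'!r}"
print(same)
```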

In [17]:
i, s = 1, '1'

# using str
print(i, s)

# using repr
print(repr(i), repr(s))

1 1
1 '1'


#### Example of using repr to print object in logger and re-create the object

In [18]:
"""repr example"""
import logging


logging.basicConfig(level=logging.INFO, filename='game.log')


class Player:
    """A player in the game"""
    def __init__(self, id, name, keys):
        self.id = id
        self.name = name
        self.keys = keys

    def __repr__(self):
        cls = self.__class__.__name__  # self can be a subclass
        return f'{cls}({self.id!r}, {self.name!r}, {self.keys!r})'


p1 = Player(1, 'Parzival', {'copper', 'jade'})
logging.info('p1 is %r', p1)

In [19]:
!cat game.log

INFO:root:p1 is Player(1, 'Parzival', {'copper', 'jade'})
INFO:root:p1 is Player(1, 'Parzival', {'jade', 'copper'})


In [20]:
# re-create the object from the logger entry
p1 = Player(1, 'Parzival', {'copper', 'jade'})

p1

Player(1, 'Parzival', {'jade', 'copper'})

In [21]:
p1.id

1

#### namedtuple and dataclass for repr
* you can write your own `__repr__` method for your class, but it is tedious
* namedtuple and dataclass generate a `__repr__` for you and shorten the code you need to write

In [22]:
from collections import namedtuple

Player = namedtuple('Player', 'id name keys')

p1 = Player(1, 'Parzival', {'copper', 'jade'})
print(repr(p1))

Player(id=1, name='Parzival', keys={'jade', 'copper'})


In [23]:
from dataclasses import dataclass

@dataclass
class Player:
    """A player in the game"""
    id: int
    name: str
    keys: set
        
p1 = Player(1, 'Parzival', {'copper', 'jade'})
print(p1)

Player(id=1, name='Parzival', keys={'jade', 'copper'})


### Another option: use Python configuration files instead of a serialization format
* a config file can contain any valid Python code
* security: loading the file executes that code, so only load config files you trust
* use `types.SimpleNamespace` and `exec` to load a config.py file
* or use `importlib.machinery` to load the config.py file as a module
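
The files load_config.py and load_config_imp.py used in the cells below are not shown in these notes. The sketch that follows is only a guess at how the two approaches could look: the function names `load_config_exec` and `load_config_import` are hypothetical, and the temporary config.py is created just to make the example self-contained.

```python
import importlib.util
import tempfile
from importlib.machinery import SourceFileLoader
from pathlib import Path
from types import SimpleNamespace


def load_config_exec(path):
    """Run the file with exec and expose its top-level names as attributes."""
    scope = {}
    exec(Path(path).read_text(), scope)
    # Drop names such as __builtins__ that exec adds to the scope
    public = {k: v for k, v in scope.items() if not k.startswith("_")}
    return SimpleNamespace(**public)


def load_config_import(path, name="config"):
    """Load the file as a regular module object via importlib."""
    loader = SourceFileLoader(name, str(path))
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.py"
    cfg_path.write_text("port = 8080\nnum_workers = 100  # comments are allowed\n")

    ns = load_config_exec(cfg_path)
    mod = load_config_import(cfg_path)

print(ns.port, mod.num_workers)
```

Both variants execute whatever code the file contains, which is why the security bullet above matters.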

In [29]:
%run load_config.py
config = load_config('config.py')
config

namespace(api_key='test-api-key', log_level=20, port=8080, num_workers=100)

In [30]:
%run load_config_imp.py
config = load_config('config.py')
config

<module 'config' from 'config.py'>

In [31]:
config.num_workers

100

#### Exercise: add a `__repr__` method to a class and load data from a pickle file
* the following is the content of rides.py, which must be importable (in the same folder or on the Python path) for pickle.load() to work

In [32]:

"""Add __repr__ to Ride class and then loads rides from rides.pkl (pickle
format) and print repr of each ride.
"""
from datetime import datetime


class Ride:
    def __init__(self, start, end, distance, num_passengers):
        
        self.start = start  # type: datetime
        self.end = end  # type: datetime
        self.distance = distance  # type: float
        self.num_passengers = num_passengers  # type: int
        
    def __repr__(self):
        cls = self.__class__.__name__
        start, end, distance, num_passengers = self.start, self.end, self.distance, self.num_passengers
        return f'{cls}({start!r}, {end!r}, {distance!r}, {num_passengers!r})'

In [33]:
with open('rides.pkl', 'rb') as fp:
    # rides.pkl holds several rides pickled one after another; load until EOF
    while True:
        try:
            ride = pickle.load(fp)
            print(repr(ride))  # uses Ride.__repr__
        except EOFError:
            break


### Important to know
What does it mean when pickle.load() fails with the error message
"ModuleNotFoundError: No module named 'Data'"?

The program that created the pickle file imported Data, and the pickled object contains references to that module. The program that loads the pickled object must be able to import the same module to resolve those references. Either put the location of Data.py on your PYTHONPATH (or add the location to sys.path), or copy the module to a place where your program can find it.
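
The failure and the `sys.path` fix can be reproduced in one script. This is a sketch under stated assumptions: a stand-in Data.py with a hypothetical `Record` class is written to a temporary folder, and removing the folder from `sys.path` simulates a second program that cannot import the module.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as module_dir:
    # Stand-in for the real Data.py (its contents are an assumption)
    Path(module_dir, "Data.py").write_text(
        "class Record:\n"
        "    def __init__(self, value):\n"
        "        self.value = value\n"
    )

    # "Producer" program: Data is importable, so pickling works
    sys.path.insert(0, module_dir)
    importlib.invalidate_caches()
    import Data
    blob = pickle.dumps(Data.Record(42))

    # "Consumer" program without Data on its path: loading fails
    sys.path.remove(module_dir)
    del sys.modules["Data"]
    try:
        pickle.loads(blob)
        error = None
    except ModuleNotFoundError as err:
        error = str(err)
    print(error)

    # Fix: make the module's folder importable, then load again
    sys.path.insert(0, module_dir)
    importlib.invalidate_caches()
    restored = pickle.loads(blob)
    print(restored.value)

    # Clean up the changes made for the demonstration
    sys.path.remove(module_dir)
    sys.modules.pop("Data", None)
```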
