## Serialization and Deserialization

- [**Pickling**](#pickling)
- [**JSON Serialization**](#json_serialization)
- [**Custom JSON Encoding**](#custom_json_encoding)
- [**Using JSONEncoder**](#using_jsonencoder)
- [**Custom JSON Decoding**](#custom_json_decoding)
- [**Using JSONDecoder**](#using_jsondecoder)

---

### Pickling <a name='pickling'></a>

The `pickle` module is a Python-specific way of serializing and deserializing objects. It converts objects into a **binary format** that can be written to files for data persistence and reloaded as needed.

* Pickle a string:

In [1]:
import pickle

In [2]:
ser = pickle.dumps('Taylor Swift')
print(ser)

b'\x80\x04\x95\x10\x00\x00\x00\x00\x00\x00\x00\x8c\x0cTaylor Swift\x94.'


In [3]:
deser = pickle.loads(ser)
print(deser)

Taylor Swift


* Pickle a number:

In [4]:
ser = pickle.dumps(1114)
print(ser)

b'\x80\x04\x95\x04\x00\x00\x00\x00\x00\x00\x00MZ\x04.'


In [5]:
deser = pickle.loads(ser)
print(deser)

1114


* Pickle a list:

In [6]:
ser = pickle.dumps([1, 2, ('Taylor', 'Swift')])
print(ser)

b'\x80\x04\x95\x1c\x00\x00\x00\x00\x00\x00\x00]\x94(K\x01K\x02\x8c\x06Taylor\x94\x8c\x05Swift\x94\x86\x94e.'


In [7]:
# Note that the deserialized object is equal to the original, but it is not the same object
deser = pickle.loads(ser)
print(deser)

[1, 2, ('Taylor', 'Swift')]
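
The examples above keep the pickled bytes in memory. A minimal sketch of the file round trip follows, together with a check of the equality-versus-identity note. The file name `example.pkl` and the temporary directory are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

original = [1, 2, ('Taylor', 'Swift')]

# Hypothetical file location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.pkl')

# pickle writes bytes, so the file must be opened in binary mode
with open(path, 'wb') as f:
    pickle.dump(original, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == original)  # True: same value
print(restored is original)  # False: a new object was built
```

`pickle.dump()` and `pickle.load()` work on file objects, while `pickle.dumps()` and `pickle.loads()` work on `bytes`.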


---

### JSON Serialization <a name='json_serialization'></a>

`JSON` is considered a safer approach to serialization/deserialization than `pickle`: loading JSON only produces plain data types, whereas unpickling untrusted data can execute arbitrary code. On the other hand, it supports only a limited set of data types:

* string
* int
* float
* boolean
* array (list)
* object (dictionary)
* null (`None`)

Out of the box, it cannot faithfully handle types such as:

* complex number
* decimal
* tuple (written as an array, so it is read back as a list)
* set
* datetime
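
These limits can be checked directly. The sketch below tries to serialize one sample value per type and records which ones raise `TypeError`. The sample values are arbitrary. Note that a tuple does not raise: it is silently written as a JSON array.

```python
import json
from datetime import datetime
from decimal import Decimal

# One arbitrary sample value per type
samples = {
    'complex': 1 + 2j,
    'decimal': Decimal('0.1'),
    'set': {1, 2},
    'datetime': datetime(2020, 1, 1),
}

failed = []
for name, value in samples.items():
    try:
        json.dumps(value)
    except TypeError as exc:
        failed.append(name)
        print(f'{name}: {exc}')

# A tuple is accepted, but it is written as a JSON array ...
encoded = json.dumps((1, 2))
# ... so it comes back as a list, not a tuple
decoded = json.loads(encoded)
print(encoded, type(decoded))
```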

In [8]:
import json

In [9]:
d1 = {'a': 100, 'b': 200}
# json.dump() would instead write the output directly to a file object
d1_json = json.dumps(d1)
print(f'{type(d1_json)}\n{d1_json}')

<class 'str'>
{"a": 100, "b": 200}


In [10]:
d2 = json.loads(d1_json)
print(f'{type(d2)}\n{d2}')

<class 'dict'>
{'a': 100, 'b': 200}


Note that JSON object keys are always strings. `json.dumps()` therefore converts non-string keys such as `int` to `str`, and `json.loads()` returns dictionaries with `str` keys, which can differ from the original type.

In [11]:
d1 = {1: 100, 2: 200}
d1_json = json.dumps(d1)
print(f'{type(d1_json)}\n{d1_json}')

<class 'str'>
{"1": 100, "2": 200}


In [12]:
d2 = json.loads(d1_json)
print(f'{type(d2)}\n{d2}')

<class 'dict'>
{'1': 100, '2': 200}
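
If the original key type is known in advance, it can be restored while loading. The sketch below is one possible way to do that with the `object_pairs_hook` argument of `json.loads()`. The helper `int_keys` is a hypothetical name, and it assumes every key in the document is an integer.

```python
import json

d1 = {1: 100, 2: 200}
d1_json = json.dumps(d1)

def int_keys(pairs):
    # Hypothetical helper: `pairs` is the list of (key, value) tuples
    # decoded for one JSON object; assumes every key is an integer
    return {int(key): value for key, value in pairs}

d2 = json.loads(d1_json, object_pairs_hook=int_keys)
print(d2)  # {1: 100, 2: 200}
```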


---

### Custom JSON Encoding <a name='custom_json_encoding'></a>

When `json` encounters a type it cannot serialize, a callable can be supplied through the `default` argument. It is called with the unsupported object and must return something that `json` can serialize.

In [13]:
from datetime import datetime

In [14]:
log_record = {
    'time': datetime.utcnow(),
    'message': 'testing',
    'args': (10, 'test')
}

In [15]:
json.dumps(log_record, default=lambda x: 'Unknown serialization')

'{"time": "Unknown serialization", "message": "testing", "args": [10, "test"]}'
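
The lambda above replaces every unsupported value with the same placeholder string. A slightly more useful sketch dispatches on the type and raises `TypeError` for anything it does not recognize. The function name `custom_default`, the fixed timestamp and the `tags` field are assumptions for illustration.

```python
import json
from datetime import datetime

def custom_default(obj):
    # Called only for objects that json cannot serialize on its own
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)
    # Fail loudly instead of silently writing a placeholder
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

record = {
    'time': datetime(2023, 9, 27, 11, 13, 1),  # fixed value for reproducibility
    'message': 'testing',
    'tags': {'b', 'a'},
}
encoded = json.dumps(record, default=custom_default)
print(encoded)
# {"time": "2023-09-27T11:13:01", "message": "testing", "tags": ["a", "b"]}
```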

---

### Using JSONEncoder <a name='using_jsonencoder'></a>

`JSONEncoder` is the class in the `json` module that performs serialization. It can be subclassed to customize how serialization is done.

In [16]:
class CustomJSONEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
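        # Hard-code the encoder options; the arguments that json.dumps()
        # passes in through *args/**kwargs are deliberately ignored here
        super().__init__(skipkeys=True,
                         allow_nan=False,
                         indent='---',
                         separators=(':', ' = '))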
    
    # Override default() method
    def default(self, arg):
        if isinstance(arg, datetime):
            return arg.isoformat()
        # Delegate back to the parent class
        return super().default(arg)

In [17]:
log_record = {
    'time': datetime.utcnow(),
    'message': 'testing',
    'args': (10, 'test')
}

In [18]:
# Use customized JSONEncoder class
json.dumps(log_record, cls=CustomJSONEncoder)

'{\n---"time" = "2023-09-27T11:13:01.984383":\n---"message" = "testing":\n---"args" = [\n------10:\n------"test"\n---]\n}'
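
Because the encoder above fixes its options in `__init__`, arguments such as `indent` passed to `json.dumps()` have no effect on it. As an alternative sketch, a subclass that overrides only `default()` leaves those options under the caller's control. The class name `ForwardingEncoder` and the sample record are assumptions for illustration.

```python
import json
from datetime import datetime
from decimal import Decimal

class ForwardingEncoder(json.JSONEncoder):
    # No __init__ override: options given to json.dumps() reach the encoder
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)
        # Delegate back to the parent class, which raises TypeError
        return super().default(obj)

record = {'time': datetime(2023, 9, 27, 11, 13, 1), 'price': Decimal('9.99')}

compact = json.dumps(record, cls=ForwardingEncoder)
pretty = json.dumps(record, cls=ForwardingEncoder, indent=2, sort_keys=True)
print(compact)
print(pretty)
```

Writing a `Decimal` as a string is a design choice made here to avoid losing precision; the reader of the JSON then has to know that the field holds a decimal number.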

---

### Custom JSON Decoding <a name='custom_json_decoding'></a>

Decoding JSON is usually more complex than encoding it, because the JSON text carries no information about the original Python types. The following techniques help restore values to the correct types.

> Schema: Embed the object type in the JSON text so that the decoder knows how to parse the value.

In [19]:
string = """
{
    "time": 
    {
        "objecttype": "datetime",
        "value": "1996-11-14T02:30:00"
    },
    "message": "blessing moment"
}
"""

> Extra arguments: Pass specific arguments to `json.loads()` to meet different parsing needs.

* object_hook: A callable that is applied to every decoded JSON object (dictionary), from the innermost level outwards. Its return value replaces that object.

In [20]:
def custom_decoder(arg):
    if 'objecttype' in arg and arg['objecttype'] == 'datetime':
        # Return the parsed value so that it replaces the tagged dictionary
        return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
    return arg

In [21]:
json.loads(string, object_hook=custom_decoder)

{'time': datetime.datetime(1996, 11, 14, 2, 30), 'message': 'blessing moment'}
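
The schema idea works best when the encoder and the decoder agree on it. The sketch below shows a full round trip under that assumption: a `default` callable writes the tagged form and an `object_hook` callable reads it back. The function names `encode_tagged` and `decode_tagged` are hypothetical.

```python
import json
from datetime import datetime

def encode_tagged(obj):
    # Encoder side of the schema: wrap datetimes in a tagged object
    if isinstance(obj, datetime):
        return {'objecttype': 'datetime', 'value': obj.isoformat()}
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

def decode_tagged(obj):
    # Decoder side: called for every decoded JSON object, innermost first
    if obj.get('objecttype') == 'datetime':
        return datetime.fromisoformat(obj['value'])
    return obj

record = {'time': datetime(1996, 11, 14, 2, 30), 'message': 'blessing moment'}

text = json.dumps(record, default=encode_tagged)
restored = json.loads(text, object_hook=decode_tagged)

print(text)
print(restored == record)  # True
```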

* parse_float: A callable that receives each JSON float as a string and returns the value to use in its place.

In [22]:
from decimal import Decimal

def make_decimal(arg):
    print('Float received: ', type(arg), arg)
    return Decimal(arg)

In [23]:
string = """
{
    "a": 10,
    "b": 0.7,
    "c": 0.99
}
"""

In [24]:
json.loads(string, parse_float=make_decimal)

Float received:  <class 'str'> 0.7
Float received:  <class 'str'> 0.99


{'a': 10, 'b': Decimal('0.7'), 'c': Decimal('0.99')}
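
One reason to parse floats as `Decimal` is to avoid binary floating-point rounding. A minimal sketch, using arbitrary sample values and passing `Decimal` itself as the callable:

```python
import json
from decimal import Decimal

text = '{"a": 0.1, "b": 0.2}'

as_float = json.loads(text)
as_decimal = json.loads(text, parse_float=Decimal)

print(as_float['a'] + as_float['b'])      # 0.30000000000000004
print(as_decimal['a'] + as_decimal['b'])  # 0.3
```

This works because the callable receives the original text of the number (`'0.1'`), not a value that has already been converted to `float`.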

* parse_int: A callable that receives each JSON integer as a string and returns the value to use in its place.

In [25]:
def make_binary(arg):
    print('Int received: ', type(arg), arg)
    return bin(int(arg))

In [26]:
string = """
{
    "a": 10,
    "b": 0.7,
    "c": 0.99
}
"""

In [27]:
json.loads(string, parse_int=make_binary)

Int received:  <class 'str'> 10


{'a': '0b1010', 'b': 0.7, 'c': 0.99}

---

### Using JSONDecoder <a name='using_jsondecoder'></a>

`JSONDecoder` is the class in the `json` module that performs deserialization. It can be subclassed to customize how deserialization is done.

In [28]:
class CustomDecoder(json.JSONDecoder):
    
    # Override decode() method
    def decode(self, arg):
        obj = json.loads(arg)
        # Check the parsed object rather than the raw string
        if isinstance(obj, dict) and 'points' in obj:
            obj['points'] = [Point(x, y) for x, y in obj['points']]
        return obj

In [29]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def __repr__(self):
        return f'Point(x = {self.x}, y = {self.y})'

In [30]:
s1 = '''
{
    "points": [
        [1, 2],
        [-1, -2]
    ]
}
'''

In [31]:
s2 = '''
{
    "a": 100,
    "b": [1, 2, 3],
    "c": "python",
    "d": {
        "e": 4,
        "f": 5.5
    }
}
'''

In [32]:
json.loads(s1, cls=CustomDecoder)

{'points': [Point(x = 1, y = 2), Point(x = -1, y = -2)]}

In [33]:
json.loads(s2, cls=CustomDecoder)

{'a': 100, 'b': [1, 2, 3], 'c': 'python', 'd': {'e': 4, 'f': 5.5}}
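
Overriding `decode()` as above only inspects the top-level object. As an alternative sketch, a subclass can install an `object_hook` in `__init__`, so the conversion also applies to nested objects. The class name `PointDecoder`, the method name `convert_points` and the sample document are assumptions for illustration.

```python
import json

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x = {self.x}, y = {self.y})'

class PointDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        # Install the hook unless the caller supplied one explicitly
        kwargs.setdefault('object_hook', self.convert_points)
        super().__init__(*args, **kwargs)

    @staticmethod
    def convert_points(obj):
        # Called for every decoded JSON object, at any nesting depth
        if 'points' in obj:
            obj['points'] = [Point(x, y) for x, y in obj['points']]
        return obj

nested = json.loads('{"shape": {"points": [[1, 2], [-1, -2]]}}', cls=PointDecoder)
print(nested)
# {'shape': {'points': [Point(x = 1, y = 2), Point(x = -1, y = -2)]}}
```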