<div align="center">
<img src="https://upload.wikimedia.org/wikipedia/commons/a/a8/%D0%9B%D0%9E%D0%93%D0%9E_%D0%A8%D0%90%D0%94.png" width=400px/>
<br /><br />
<b><font size=7 face="Arial">Serialization and deserialization</font></b>
<br />
<h4>2021</h4>
</div>

<div align="center"><b><font size=6>Why do we need this?</font></b></div>

<div align="center"><img src="https://blogdotxkcddotcom.files.wordpress.com/2019/08/sendafile_1.png?w=1191&h=334" width="400px"/></div>

1. Web API: JSON/RPC/...
2. Application configuration
3. Caching / storing data in a database

...

Formats:

1. Text: JSON, YAML, XML, ...
2. Binary: Pickle, protobuf, FlatBuffers, ...

See also: <a href="https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats">Comparison of data-serialization formats</a>

<div align="center"><b><font size=6>JSON</font></b></div>
<div align="center"><img src="https://www.json.org/img/json160.gif"/></div>

```json
{
    "orders": [
        {
            "id": "2345328",
            "date": "June 20, 2020 10:45:34",
            "trackId": "XGB2567TD",
            "customer": {
                "custId": "106156",
                "fname": "Max",
                "lname": "Hatfield",
                "city": "NY"
            }
        }
    ]
}
```

JSON - JavaScript Object Notation

Formal description: https://www.json.org/json-en.html

See also: <a href="https://www.youtube.com/playlist?list=PLEzQf147-uEoNCeDlRrXv6ClsLDN-HtNm">Videos about JSON</a>

Libraries for working with JSON:
1. json 
2. simplejson
3. python-rapidjson
4. orjson
5. ujson

The json module has 4 main functions: 2 for working with streams and 2 for working with strings.

Stream:
1. dump
2. load

String:
1. dumps
2. loads

In [1]:
import json

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
data_dump = json.dumps(data)
print(data_dump)

with open('result.json', 'w') as fout:
    json.dump(data, fout)
!cat result.json

["foo", {"bar": ["baz", null, 1.0, 2]}]
["foo", {"bar": ["baz", null, 1.0, 2]}]

In [2]:
data_parsed = json.loads(data_dump)
print(data_parsed)

with open('result.json') as fin:
    print(json.load(fin))

['foo', {'bar': ['baz', None, 1.0, 2]}]
['foo', {'bar': ['baz', None, 1.0, 2]}]


In [3]:
print(data == data_parsed)

False


In [4]:
print(data)
print(data_parsed)

['foo', {'bar': ('baz', None, 1.0, 2)}]
['foo', {'bar': ['baz', None, 1.0, 2]}]


| Python  | JSON  |
|:---|:---|
| dict  | Object |
| list  | Array  |
| tuple  | Array  |
| str  | String  |
| int  | Number (int)  |
| float  | Number (real)  |
| True  | true  |
| False  | false  |
| None  | null  |

What about Decimal, complex, datetime, ...?

In [5]:
from decimal import Decimal
num = Decimal('0.1')
json.dumps(num)

TypeError: Object of type Decimal is not JSON serializable

In [6]:
def my_encode(obj):
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError('Unknown object type {}'.format(type(obj)))
    
print(json.dumps(num, default=my_encode))
print(json.dumps(num, default=str))

"0.1"
"0.1"


In [7]:
class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Decimal):
            return str(obj)
        return json.JSONEncoder.default(self, obj)
    
    def encode(self, obj):
        res = json.JSONEncoder.encode(self, obj)
        if isinstance(obj, list):
            return 'formatted:{}'.format(res)
        return res

data = ['hello world', Decimal('1.23'), [1.1234, 2, 3]]

print(json.dumps(data, cls=MyEncoder))

formatted:["hello world", "1.23", [1.1234, 2, 3]]


How do we load Decimal, complex, datetime, ...?

In [8]:
class DecimalEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Decimal):
            return {'__Decimal__': str(obj)}
        return json.JSONEncoder.default(self, obj)
    
def as_Decimal(dct):
    val = dct.get('__Decimal__')
    if val is not None:
        return Decimal(val)
    return dct

a = [Decimal('0.1'), Decimal('0.001')]
a_json = json.dumps(a, cls=DecimalEncoder)
print(a)
print(a_json)
b = json.loads(a_json, object_hook=as_Decimal)
print(b)

[Decimal('0.1'), Decimal('0.001')]
[{"__Decimal__": "0.1"}, {"__Decimal__": "0.001"}]
[Decimal('0.1'), Decimal('0.001')]


In [9]:
json_str = '[0.01, 0.001]'
a = json.loads(json_str, parse_float=Decimal)
print(a)

[Decimal('0.01'), Decimal('0.001')]
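
The same `default` / `object_hook` pair can cover the other types mentioned in the question. A minimal sketch for `datetime` and `complex`, assuming the tag names `__datetime__` and `__complex__` (invented here for illustration, mirroring the `__Decimal__` tag used above):

```python
import json
from datetime import datetime


def encode_extra(obj):
    # json calls this only for objects it cannot serialize by itself.
    if isinstance(obj, datetime):
        return {'__datetime__': obj.isoformat()}
    if isinstance(obj, complex):
        return {'__complex__': [obj.real, obj.imag]}
    raise TypeError('Unknown object type {}'.format(type(obj)))


def decode_extra(dct):
    # json calls this for every decoded JSON object (dict).
    if '__datetime__' in dct:
        return datetime.fromisoformat(dct['__datetime__'])
    if '__complex__' in dct:
        real, imag = dct['__complex__']
        return complex(real, imag)
    return dct


original = {'created': datetime(2021, 10, 5, 12, 30), 'z': 3 + 2j}
text = json.dumps(original, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
print(text)
print(restored == original)
```

The tagged-dict convention is only a private agreement between the writer and the reader; any other JSON consumer will see plain objects.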


Q: How can this be made more convenient? <br>
A: Use other JSON libraries or add-on modules.

Useful modules:
* dataclasses-json

In [10]:
import simplejson

a = [0.1, Decimal('0.001')]
a_json = simplejson.dumps(a)
print(a_json)

a_parsed = simplejson.loads(a_json)
print(a_parsed, type(a_parsed[1]))

a_parsed_dec = simplejson.loads(a_json, use_decimal=True)
print(a_parsed_dec, type(a_parsed_dec[0]))

[0.1, 0.001]
[0.1, 0.001] <class 'float'>
[Decimal('0.1'), Decimal('0.001')] <class 'decimal.Decimal'>


**JSON pitfalls**

In [11]:
# Keys are always str

dct = {
    1: 'one',
    2: 'two',
    3: 'three',
}

dct_json = json.dumps(dct)
print('json_str:', dct_json)
print(json.loads(dct_json))

json_str: {"1": "one", "2": "two", "3": "three"}
{'1': 'one', '2': 'two', '3': 'three'}
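
If integer keys have to survive the round trip, one option is to convert them back while loading. A sketch using `object_pairs_hook`; the helper name `int_keys` is hypothetical, and the approach assumes that no key was originally a digit-only string, because such keys would be converted as well:

```python
import json


def int_keys(pairs):
    # pairs is a list of (key, value) tuples for one JSON object.
    # Keys that parse as integers become int, the rest stay str.
    result = {}
    for key, value in pairs:
        try:
            result[int(key)] = value
        except ValueError:
            result[key] = value
    return result


dct = {1: 'one', 2: 'two', 3: 'three'}
restored = json.loads(json.dumps(dct), object_pairs_hook=int_keys)
print(restored)
print(restored == dct)
```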


In [12]:
# Multiple dumps 

val1 = [1, 2, 3]
val2 = {'key': 'value'}

with open('bad.json', 'w') as fout:
    json.dump(val1, fout)
    fout.write('\n')
    json.dump(val2, fout)
    
!cat bad.json

[1, 2, 3]
{"key": "value"}

In [13]:
with open('bad.json') as fin:
    json.load(fin)

JSONDecodeError: Extra data: line 2 column 1 (char 10)

In [14]:
with open('bad.json') as fin:
    for line in fin:
        print(json.loads(line))

[1, 2, 3]
{'key': 'value'}
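
Reading line by line works only while every document fits on one line. When documents may span several lines, `json.JSONDecoder.raw_decode` can be used to pull them out one after another. A minimal sketch; the helper name `iter_json_documents` is made up for this example:

```python
import json


def iter_json_documents(text):
    """Yield every JSON document found in text, one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


docs = list(iter_json_documents('[1, 2, 3]\n{"key":\n "value"}'))
print(docs)
```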


In [15]:
# repr misuse

arr = [1, 2, 3]
print(repr(arr))
print(json.loads(repr(arr)))

arr2 = ["Hello world", "!"]
print(repr(arr2))
print(json.loads(repr(arr2)))

[1, 2, 3]
[1, 2, 3]
['Hello world', '!']


JSONDecodeError: Expecting value: line 1 column 2 (char 1)
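
The second call fails because `repr` emits Python syntax (single-quoted strings), which only sometimes coincides with JSON. A short sketch of the two consistent pairings: `json.dumps` with `json.loads`, and `repr` with `ast.literal_eval`:

```python
import ast
import json

arr2 = ["Hello world", "!"]

# repr() produces Python syntax with single quotes, which is not valid JSON.
python_text = repr(arr2)
# json.dumps() produces JSON, which json.loads() can read back.
json_text = json.dumps(arr2)

print(python_text)
print(json_text)
print(json.loads(json_text) == arr2)
# If the input really is a Python literal, ast.literal_eval parses it
# without evaluating arbitrary expressions.
print(ast.literal_eval(python_text) == arr2)
```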

**Useful dump/dumps arguments**

`json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)`

`json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)`

In [17]:
# indent + sort_keys

data = [
    {
        'name': 'Max',
        'age': 20,
    },
    {
        'name': 'Alex',
        'age': 31,
    }
]

print(json.dumps(data))
print(json.dumps(data, indent=2, sort_keys=True))

[{"name": "Max", "age": 20}, {"name": "Alex", "age": 31}]
[
  {
    "age": 20,
    "name": "Max"
  },
  {
    "age": 31,
    "name": "Alex"
  }
]
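
The `separators` argument from the signature above works in the opposite direction: it removes the spaces that `json` inserts by default, which helps when size matters more than readability. A small sketch with the same kind of data:

```python
import json

data = [{'name': 'Max', 'age': 20}, {'name': 'Alex', 'age': 31}]

default_text = json.dumps(data)
# (item_separator, key_separator) without the default trailing spaces.
compact_text = json.dumps(data, separators=(',', ':'))

print(default_text)
print(compact_text)
print(len(default_text), len(compact_text))
```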


In [18]:
# ensure_ascii is a dump/dumps argument; load/loads have no counterpart
msg = 'Привет мир!'

with open('file_ascii.txt', 'w') as fout:
    json.dump(msg, fout)
!cat file_ascii.txt
!echo "\n"

with open('file_utf8.txt', 'w', encoding='utf8') as fout:
    json.dump(msg, fout, ensure_ascii=False)
!cat file_utf8.txt

"\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440!"

"Привет мир!"

In [19]:
with open('file_utf8.txt', encoding='utf8') as fin:
    print(json.load(fin))

Привет мир!


It is better to specify the file encoding explicitly when using `ensure_ascii=False`.

Otherwise `open` uses `locale.getpreferredencoding()`.

In [2]:
# allow_nan

num = float('inf')
print(json.dumps(num))
print(json.dumps(num, allow_nan=False))

Infinity


ValueError: Out of range float values are not JSON compliant
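
The same non-standard values are also accepted on input by default. A sketch of rejecting them while loading through the `parse_constant` hook; the function name `reject_constant` is chosen here for illustration:

```python
import json


def reject_constant(name):
    # name is one of 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError('Non-standard JSON constant: {}'.format(name))


text = '[1.5, Infinity]'
lenient = json.loads(text)  # accepted by default
print(lenient)

try:
    json.loads(text, parse_constant=reject_constant)
    rejected = False
except ValueError as exc:
    rejected = True
    print('rejected:', exc)
```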

**Useful load/loads arguments**

`json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)`

`json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)`

In [21]:
json.loads('{"foo": "bar"}', object_pairs_hook=print)
json.loads('{"foo": "bar"}', object_hook=print)

[('foo', 'bar')]
{'foo': 'bar'}
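
Because `object_pairs_hook` receives the raw list of pairs, it can notice things a dict cannot hold, such as repeated keys. A sketch of a strict loader; the helper name `no_duplicates` is hypothetical:

```python
import json


def no_duplicates(pairs):
    # pairs keeps every (key, value) pair, including repeated keys.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError('Duplicate keys in JSON object: {}'.format(keys))
    return dict(pairs)


text = '{"foo": 1, "foo": 2}'
lenient = json.loads(text)  # the last value silently wins
print(lenient)

try:
    json.loads(text, object_pairs_hook=no_duplicates)
    duplicate_found = False
except ValueError as exc:
    duplicate_found = True
    print('error:', exc)
```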


<div align="center"><b><font size=6>YAML</font></b></div>
<div align="center"><img src="https://upload.wikimedia.org/wikipedia/commons/9/92/Yaml_logo.png"/></div>

<div align="center"><img src="https://miro.medium.com/max/2000/1*2Rly7p5CqW8-sb3CneOOxQ.png" /></div>

YAML - Yet Another Markup Language<br>
YAML - YAML Ain't Markup Language

Official site: https://yaml.org <br>
Specification (ver. 1.1): https://yaml.org/spec/1.1/current.html <br>
Specification (ver. 1.2): https://yaml.org/spec/1.2.2/

Quick guide: https://learnxinyminutes.com/docs/yaml/

See also: <a href="https://stackoverflow.com/questions/1726802/what-is-the-difference-between-yaml-and-json/1729545#1729545">What is the difference between YAML and JSON?</a>

### YAML is a superset of JSON (since version 1.2)
<div align="center"><img src="https://imgs.xkcd.com/comics/standards.png" /></div>

#### Mapping

```yaml
one: 1
two: 2 # comment
0.125: float key
1: one
2: 'two'
"key with :": "value"
flag: false
null_value: null
```

#### Nested mapping

```yaml
nested_map_1:
  key: value
  nested_map_2:
      new_key: new_value
```

#### Sequence
   
```yaml
- Item 1
- Item 2
-
  - nested item 1
  - nested item 2
- - new nested item 1
  - new nested item 2
- - - yet another item 1
    - yet another item 2
-
  - nested key 1: value
  - nested key 2: value2  
```

#### Sequence inside a mapping
```yaml
outer_key:
  inner_key:
    - item 1
    - item 2
```

#### JSON in YAML
```yaml
json_map: {"key": "value"}
json_seq: [1, 2, 3, "hello"]
quotes are optional: {key: [1, 3, 3, hello]}
```
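
Because flow-style YAML accepts JSON syntax, a JSON document can usually be read with a YAML parser. A small check, assuming the third-party PyYAML package is installed (it implements YAML 1.1, so rare JSON inputs may still be treated differently); the sample document is made up:

```python
import json

import yaml  # PyYAML, a third-party package

json_text = '{"orders": [{"id": "2345328", "paid": true, "total": 12.5, "note": null}]}'

from_json = json.loads(json_text)
from_yaml = yaml.safe_load(json_text)

print(from_yaml)
print(from_json == from_yaml)
```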

#### Sets

```yaml
set1:
  ? item1
  ? item2
  ? item3
  
set2: {item1, item2, item3}

set3:
  item1: null
  item2: null
  item3: null
```

YAML examples from the course:
* https://gitlab.manytask.org/py-tasks/public-2021-fall/-/blob/master/.gitlab-ci.yml

#### Dates
```yaml
datetime: 2020-08-10T10:30:42.3Z
datetime_with_space: 2020-08-10 10:30:42
date: 2020-08-10
```
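
Unquoted values that look like timestamps are resolved to date objects by the loader, which can be surprising when a string was intended. A sketch with PyYAML (assumed to be installed); the `quoted` key is added here only for contrast:

```python
import datetime

import yaml  # PyYAML, a third-party package

data = yaml.safe_load('''
datetime_with_space: 2020-08-10 10:30:42
date: 2020-08-10
quoted: "2020-08-10"
''')

for key, value in data.items():
    print(key, repr(value))
```

Quoting the value keeps it a plain string.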

#### Tags and types

```yaml
explicit_string: !!str 1.23
py_complex: !!python/complex 3+2j
```
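
An explicit tag overrides the type the loader would otherwise infer from the text. A sketch with PyYAML (assumed to be installed); the keys `implicit_float` and `explicit_float` are added here for comparison:

```python
import yaml  # PyYAML, a third-party package

data = yaml.safe_load('''
implicit_float: 1.23
explicit_string: !!str 1.23
explicit_float: !!float "1.23"
''')

for key, value in data.items():
    print(key, repr(value))
```

Python-specific tags such as `!!python/complex` are not handled by the safe loaders.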

#### Binary data (base64-encoded)

```yaml
generic: !!binary |
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
 OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
 +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
 AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
```

Libraries for working with YAML:
1. PyYAML
2. ruamel.yaml

Main functions of the PyYAML module:
1. `load` / `safe_load` / `unsafe_load` / `full_load`
2. `dump` / `safe_dump`
3. `load_all` / `safe_load_all` / `unsafe_load_all` / `full_load_all`
4. `dump_all` / `safe_dump_all`

**Why so many?**
1. `def load(stream, Loader=None)`
2. `def dump(data, stream=None, Dumper=Dumper, **kwds)`

Arbitrary code execution issues:
1. https://github.com/yaml/pyyaml/pull/386
2. https://github.com/yaml/pyyaml/issues/420

<p style="color:red">
If you do not trust the source of a YAML file, use only the safe functions.
</p>
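
A sketch of what the safe functions protect against, assuming PyYAML is installed. The document below asks the loader to call a Python function; `os.getcwd` is a harmless stand-in chosen for this example:

```python
import yaml  # PyYAML, a third-party package

# A document that asks the loader to call a Python function while loading.
untrusted = '!!python/object/apply:os.getcwd []'

try:
    yaml.safe_load(untrusted)
    refused = False
except yaml.YAMLError as exc:
    refused = True
    print('safe_load refused:', type(exc).__name__)

# safe_load(s) is shorthand for load(s, Loader=yaml.SafeLoader).
trusted = 'key: [1, 2, 3]'
same = yaml.safe_load(trusted) == yaml.load(trusted, Loader=yaml.SafeLoader)
print(same)
```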

In [22]:
import yaml

data = yaml.safe_load('''
key 1: 
  - Item 1
  - Item 2
key 2:
  inner key: 10.5
''')
print(data)

{'key 1': ['Item 1', 'Item 2'], 'key 2': {'inner key': 10.5}}


In [23]:
print(yaml.safe_dump(data))

key 1:
- Item 1
- Item 2
key 2:
  inner key: 10.5



In [24]:
with open('sample.yaml', 'w') as fout:
    yaml.safe_dump(data, fout)
    
!cat sample.yaml
!echo "\n"

with open('sample.yaml') as fin:
    print(yaml.safe_load(fin) == data)

key 1:
- Item 1
- Item 2
key 2:
  inner key: 10.5


True


In [25]:
arr1 = [1, 2, 3]
arr2 = [4, 5, 6]

with open('multiple.yaml', 'w') as fout:
    yaml.safe_dump(arr1, fout)
    yaml.safe_dump(arr2, fout)
    
!cat multiple.yaml
!echo "\n"

with open('multiple.yaml') as fin:
    print(yaml.safe_load(fin))

- 1
- 2
- 3
- 4
- 5
- 6


[1, 2, 3, 4, 5, 6]


In [26]:
arr1 = [1, 2, 3]
arr2 = [4, 5, 6]

with open('multiple_fix.yaml', 'w') as fout:
    yaml.safe_dump(arr1, fout, explicit_start=True)
    yaml.safe_dump(arr2, fout, explicit_start=True)
    
!cat multiple_fix.yaml
!echo "\n"

with open('multiple_fix.yaml') as fin:
    for arr in yaml.safe_load_all(fin):
        print(arr)

---
- 1
- 2
- 3
---
- 4
- 5
- 6


[1, 2, 3]
[4, 5, 6]


In [27]:
# safe_dump vs safe_dump_all

arr = [1, 2, 3, 4]
print(yaml.safe_dump(arr, explicit_start=True))
print()
print(yaml.safe_dump_all(arr, explicit_start=True))

---
- 1
- 2
- 3
- 4


--- 1
--- 2
--- 3
--- 4
...



In [28]:
arr = [1, 2, 3, 4]
dump_str = yaml.safe_dump_all(arr, explicit_start=True)
for item in yaml.safe_load_all(dump_str):
    print(item, type(item))
    
print(yaml.safe_load(dump_str))

1 <class 'int'>
2 <class 'int'>
3 <class 'int'>
4 <class 'int'>


ComposerError: expected a single document in the stream
  in "<unicode string>", line 1, column 5:
    --- 1
        ^
but found another document
  in "<unicode string>", line 2, column 1:
    --- 2
    ^

<div align="center"><b><font size=6>Python Pickle</font></b></div>

A module for serializing and deserializing arbitrary Python objects.

Key features:
1. Binary format
2. Supports most Python objects
3. Not secure (deserialization can lead to arbitrary code execution)
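
To illustrate the last point: a pickle can name any importable callable and have it invoked during loading. The sketch below overrides `Unpickler.find_class` to allow only a short list of globals. The names `Payload`, `ALLOWED` and `restricted_loads` are chosen here for illustration, and an allow-list like this reduces the risk rather than making untrusted pickles safe.

```python
import builtins
import io
import pickle


class Payload:
    # __reduce__ tells pickle how to rebuild the object: call eval("1 + 1").
    # A real attack would call something far less harmless.
    def __reduce__(self):
        return (eval, ('1 + 1',))


class RestrictedUnpickler(pickle.Unpickler):
    # Only these builtins may be referenced by a pickle (illustrative list).
    ALLOWED = {'range', 'complex', 'set', 'frozenset', 'slice'}

    def find_class(self, module, name):
        if module == 'builtins' and name in self.ALLOWED:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            'global {}.{} is forbidden'.format(module, name))


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


malicious = pickle.dumps(Payload())
executed = pickle.loads(malicious)  # eval ran during loading
print(executed)
print(restricted_loads(pickle.dumps({1, 2, 3})))

try:
    restricted_loads(malicious)
    refused = False
except pickle.UnpicklingError as exc:
    refused = True
    print('refused:', exc)
```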

The pickle format has 6 protocol versions: <br>
0 - the original "human-readable" format <br>
1 - the old binary format <br>
2 - binary format added in Python 2.3 <br>
3 - binary format added in Python 3.0 (the default in Python 3.0-3.7) <br>
4 - newer binary format, added in Python 3.4 (the default starting with Python 3.8) <br>
5 - improved binary format, added in Python 3.8 <br>

Any protocol can be used if desired. <br>
However, the higher the protocol version, the more recent the Python needed to read the pickle.

In [29]:
import sys
import pickle

print(sys.version_info)
print(pickle.HIGHEST_PROTOCOL)
print(pickle.DEFAULT_PROTOCOL)

sys.version_info(major=3, minor=8, micro=12, releaselevel='final', serial=0)
5
4


The pickle module has 4 main functions: 2 for working with streams and 2 for working with byte strings.

Stream:
1. dump
2. load

Byte string:
1. dumps
2. loads

In [30]:
data = {
    'one': 1,
    'two': 2,
    'three': 3,
}

for protocol_version in range(6):
    data_dump = pickle.dumps(data, protocol=protocol_version)
    print(protocol_version, ':', data_dump)
    print(data == pickle.loads(data_dump))

0 : b'(dp0\nVone\np1\nI1\nsVtwo\np2\nI2\nsVthree\np3\nI3\ns.'
True
1 : b'}q\x00(X\x03\x00\x00\x00oneq\x01K\x01X\x03\x00\x00\x00twoq\x02K\x02X\x05\x00\x00\x00threeq\x03K\x03u.'
True
2 : b'\x80\x02}q\x00(X\x03\x00\x00\x00oneq\x01K\x01X\x03\x00\x00\x00twoq\x02K\x02X\x05\x00\x00\x00threeq\x03K\x03u.'
True
3 : b'\x80\x03}q\x00(X\x03\x00\x00\x00oneq\x01K\x01X\x03\x00\x00\x00twoq\x02K\x02X\x05\x00\x00\x00threeq\x03K\x03u.'
True
4 : b'\x80\x04\x95\x1f\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03one\x94K\x01\x8c\x03two\x94K\x02\x8c\x05three\x94K\x03u.'
True
5 : b'\x80\x05\x95\x1f\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03one\x94K\x01\x8c\x03two\x94K\x02\x8c\x05three\x94K\x03u.'
True
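
The byte strings above are programs for a small stack machine, and the standard `pickletools` module can disassemble them. A sketch on a smaller dict (the sample data is made up), using protocol 0 because it gives the shortest listing:

```python
import io
import pickle
import pickletools

data = {'one': 1}
out = io.StringIO()
# Disassemble the protocol 0 pickle into one opcode per line.
pickletools.dis(pickle.dumps(data, protocol=0), out=out)
listing = out.getvalue()
print(listing)
```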


In [31]:
with open('simple.pkl', 'wb') as fout:
    pickle.dump(data, fout)

with open('simple.pkl', 'rb') as fin:
    print(pickle.load(fin))

{'one': 1, 'two': 2, 'three': 3}


In [5]:
# Missing class example 

from dataclasses import dataclass
import pickle
from decimal import Decimal

@dataclass
class Item:
    name: str
    price: Decimal = Decimal('0.0')
    quantity: int = 0
        
    @property
    def total_cost(self):
        return self.price * self.quantity
    

item = Item('book', price=Decimal('1.23'), quantity=2)
print(item.total_cost)

item_pickled = pickle.dumps(item, protocol=0)
print(item_pickled)

# del Item

print(pickle.loads(item_pickled))

2.46
b'ccopy_reg\n_reconstructor\np0\n(c__main__\nItem\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nVname\np6\nVbook\np7\nsVprice\np8\ncdecimal\nDecimal\np9\n(V1.23\np10\ntp11\nRp12\nsVquantity\np13\nI2\nsb.'
Item(name='book', price=Decimal('1.23'), quantity=2)


In [33]:
# lambda example

my_lambda = lambda x: x ** 2
print(my_lambda(2))

pickle.dumps(my_lambda)

4


PicklingError: Can't pickle <function <lambda> at 0x10dc2a430>: attribute lookup <lambda> on __main__ failed

In [8]:
def func(x):
    return x + 2

pickle.dumps(func, protocol=0)

b'c__main__\nfunc\np0\n.'

Pickle stores **only** the attributes of an object. <br>
For classes and functions it stores **only** identifiers (module and name), which later make it possible to "restore" the object.

**Useful arguments** <br>
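
A quick way to see this, using a function from the standard library so that the example does not depend on where it is run:

```python
import json
import pickle

dumped = pickle.dumps(json.dumps, protocol=0)
print(dumped)

# The pickle holds only the module and function name, not the code,
# so loading it imports the module and looks the name up again.
restored = pickle.loads(dumped)
print(restored is json.dumps)
```

This is also why unpickling fails when the class or function is no longer importable under the same name.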
`pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)` <br>
`pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)`

In [34]:
# Custom pickle logic
from urllib3 import PoolManager


class HTTPAdapter:
    __attrs__ = ('pool_connections', 'pool_timeout')
    
    def __init__(self, pool_connections, pool_timeout):
        self.pool_connections = pool_connections
        self.pool_timeout = pool_timeout
        self.pool_manager = None  # Set in _init_pool_manager
        self._init_pool_manager()
    
    def __getstate__(self):
        print('In __getstate__')
        return {attr: getattr(self, attr, None) for attr in self.__attrs__}
    
    def __setstate__(self, state):
        print('In __setstate__')
        for attr, value in state.items():
            setattr(self, attr, value)
        self._init_pool_manager()  # Reinit pool after deserialization
    
    def _init_pool_manager(self):
        self.pool_manager = PoolManager(self.pool_connections, timeout=self.pool_timeout)
        
        
adapter = HTTPAdapter(42, 32)
adapter_dump = pickle.dumps(adapter, protocol=0)
print(adapter_dump)
print('======Break======')
pickle.loads(adapter_dump)

In __getstate__
b'ccopy_reg\n_reconstructor\np0\n(c__main__\nHTTPAdapter\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nVpool_connections\np6\nI42\nsVpool_timeout\np7\nI32\nsb.'
In __setstate__


<__main__.HTTPAdapter at 0x10dc3f670>