pickle

related file

  • Lib/pickle.py
  • Modules/_pickle.c
  • Modules/clinic/_pickle.c.h

introduction

We use the pickle module to serialize and deserialize Python objects. There are several pickle protocol versions; the examples below use version 4

The pickle module uses the faster _pickle module implemented in C (Modules/_pickle.c) when it is available. If the _pickle module cannot be found, the pure Python implementation (Lib/pickle.py) is used instead
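A minimal sketch for checking which implementation is active. It assumes the layout of Lib/pickle.py, where the pure Python class is kept under the private name `_Pickler` and `Pickler` is rebound to the C class when `_pickle` can be imported; the helper name `active_pickler_is_c` is made up for this example.

```python
import pickle


def active_pickler_is_c() -> bool:
    """Return True when pickle.Pickler is not the pure Python class."""
    # Assumption: Lib/pickle.py keeps the Python implementation as _Pickler
    # and rebinds Pickler to the C class when _pickle imports successfully.
    return pickle.Pickler is not pickle._Pickler


print(active_pickler_is_c())
```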

| Type | Implementation |
| --- | --- |
| None | save_none |
| bool | save_bool |
| int | save_long |
| float | save_float |
| bytes | save_bytes |
| str | save_str |
| tuple | save_tuple |
| list | save_list |
| dict | save_dict |
| set | save_set |
| frozenset | save_frozenset |
| FunctionType | save_global |
| | save_reduce |

implementation

Whenever you call dump, some extra information is added to the result

For protocol version 2 and above, the first byte is an identifier indicating that the following binary content is encoded in the "pickle protocol"

The second byte is the protocol version

The final byte is a stop symbol indicating the end of the binary content

pickle_head
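The layout described above can be checked by slicing a small pickle apart. This is only a sketch: `split_pickle` is a hypothetical helper, and it assumes a protocol 2 or newer pickle that is too small to contain a frame.

```python
import pickle


def split_pickle(data: bytes):
    """Split a small protocol >= 2 pickle into (identifier, version, body, stop)."""
    # data[1] is an int, the other three parts are kept as bytes
    return data[0:1], data[1], data[2:-1], data[-1:]


ident, version, body, stop = split_pickle(pickle.dumps(None, protocol=4))
print(ident, version, body, stop)
```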

None

NONE = b'N'   # push None

def save_none(self, obj):
	self.write(NONE)

The data is N here, wrapped in the identifier, version and stop bytes described above

>>> import pickle
>>> pickle.dumps(None)
b'\x80\x04N.'

bool

bool is similar to None

NEWTRUE        = b'\x88'  # push True
NEWFALSE       = b'\x89'  # push False

def save_bool(self, obj):
	if self.proto >= 2:
		self.write(NEWTRUE if obj else NEWFALSE)

The data here is b'\x88' (True) or b'\x89' (False)

>>> import pickle
>>> pickle.dumps(True)
b'\x80\x04\x88.'
>>> pickle.dumps(False)
b'\x80\x04\x89.'
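The save_bool excerpt above only shows the branch for protocol 2 and newer. As a small illustration, passing an older protocol to dumps shows the difference: the value is written in a text form and there is no protocol header at all. The variable names are only for this example.

```python
import pickle

# Protocol 2 and newer: header bytes plus the one-byte NEWTRUE opcode
new_style = pickle.dumps(True, protocol=2)

# Protocol 1: no header, True is written as the text opcode stored in pickle.TRUE
old_style = pickle.dumps(True, protocol=1)

print(new_style, old_style)
```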

int

An integer is saved in one of several formats depending on its value

int

int2
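To see which format is chosen for a given value, the standard pickletools module can list the opcodes of a pickle. The sketch below uses a hypothetical helper, `int_opcode`, that returns the name of the first opcode that is not part of the header, frame or stop bytes.

```python
import pickle
import pickletools


def int_opcode(n: int) -> str:
    """Name of the opcode that carries integer n in a protocol 4 pickle."""
    for opcode, _arg, _pos in pickletools.genops(pickle.dumps(n, protocol=4)):
        if opcode.name not in ("PROTO", "FRAME", "STOP"):
            return opcode.name
    raise ValueError("no data opcode found")


for n in (2, 256, 65536, -1, 2**31):
    print(n, int_opcode(n))
```

Small non-negative values get one or two byte encodings, values that fit in a signed 32-bit integer get four bytes, and anything larger falls back to a variable-length form.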

float

A float is saved as an 8-byte big-endian double in IEEE 754 format

float
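The float encoding can be reproduced with the struct module. This is a sketch for protocol 1 and newer; `float_payload` is a hypothetical helper that builds the opcode byte followed by the eight packed bytes.

```python
import pickle
import struct


def float_payload(x: float) -> bytes:
    """BINFLOAT opcode followed by x packed as a big-endian IEEE 754 double."""
    return pickle.BINFLOAT + struct.pack(">d", x)


data = pickle.dumps(1.5, protocol=4)
print(float_payload(1.5).hex(), float_payload(1.5) in data)
```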

bytes

A bytes object is saved directly as the data part below

The head part varies according to the data size

bytes
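A quick way to watch the head change with the data size is to ask pickletools for the opcode name. The helper `bytes_opcode` below is made up for this sketch, and only two sizes are tried.

```python
import pickle
import pickletools


def bytes_opcode(payload: bytes) -> str:
    """Name of the opcode that carries payload in a protocol 4 pickle."""
    for opcode, _arg, _pos in pickletools.genops(pickle.dumps(payload, protocol=4)):
        if opcode.name not in ("PROTO", "FRAME", "STOP"):
            return opcode.name
    raise ValueError("no data opcode found")


print(bytes_opcode(b"abc"))        # length fits in one byte
print(bytes_opcode(b"x" * 300))    # length needs a four-byte field
```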

str

str is similar to bytes, except that the str is encoded as UTF-8 before it is dumped

str
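The UTF-8 step can be made visible with a non-ASCII string. A minimal sketch, assuming protocol 4 and a string whose encoded form is shorter than 256 bytes; `short_str_payload` is a hypothetical helper.

```python
import pickle


def short_str_payload(s: str) -> bytes:
    """Opcode, one-byte length and UTF-8 data for a short str (protocol 4)."""
    encoded = s.encode("utf-8")
    return pickle.SHORT_BINUNICODE + bytes([len(encoded)]) + encoded


data = pickle.dumps("é", protocol=4)
print(short_str_payload("é"), short_str_payload("é") in data)
```

The length byte counts encoded bytes, not characters, so the one-character string "é" is stored with a length of 2.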

tuple

tuple is more complicated than the other basic types

If the tuple is empty

tuple0

Let's see an example

dumps(("a", "b", (2, )))
b'\x80\x04\x95\x0f\x00\x00\x00\x00\x00\x00\x00\x8c\x01a\x94\x8c\x01b\x94K\x02\x85\x94\x87\x94.'

\x80\x04 is the pickle protocol identifier and the protocol version

\x95\x0f\x00\x00\x00\x00\x00\x00\x00 is the frame symbol (\x95) followed by the frame size (8 bytes, little endian)

. in the last byte is the STOP symbol

The rest is the data

tuple1
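The byte string from the tuple example can be walked opcode by opcode with pickletools. The sketch below reuses the exact bytes shown above; the variable names are only for illustration.

```python
import pickle
import pickletools
import struct

data = (b"\x80\x04\x95\x0f\x00\x00\x00\x00\x00\x00\x00"
        b"\x8c\x01a\x94\x8c\x01b\x94K\x02\x85\x94\x87\x94.")

# The frame size is an unsigned 64-bit little-endian integer after \x95
frame_size = struct.unpack("<Q", data[3:11])[0]

# Collect the opcode names in the order they appear
names = [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]

print(frame_size)
print(names)
print(pickle.loads(data))
```

The inner tuple (2, ) is closed by TUPLE1 and the outer three-element tuple by TUPLE3, each followed by a MEMOIZE.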

I found that dumps does not support self-referencing tuples (see how to Build Self-Referencing Tuples)

list

Let's see an example again

dumps(["a", "b", (2, )])
b'\x80\x04\x95\x11\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x01a\x94\x8c\x01b\x94K\x02\x85\x94e.'

The first several bytes are the pickle protocol identifier, the protocol version and the frame size

The last byte is the STOP symbol

The data can be described as (]\x94(\x8c\x01a\x94\x8c\x01b\x94K\x02\x85\x94e)

A list is dumped batch by batch (the default batch size is 1000)

list1
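Batching can be observed by counting how many APPENDS opcodes a long list produces. A sketch, assuming protocol 4 and the batch size of 1000 mentioned above; `count_appends` is a hypothetical helper.

```python
import pickle
import pickletools


def count_appends(items) -> int:
    """Number of APPENDS opcodes emitted when pickling list(items)."""
    data = pickle.dumps(list(items), protocol=4)
    return sum(1 for opcode, _arg, _pos in pickletools.genops(data)
               if opcode.name == "APPENDS")


print(count_appends(range(1000)))   # fits in a single batch
print(count_appends(range(2500)))   # batches of 1000, 1000 and 500
```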

dict and set are similar to list and tuple: the output begins and ends with symbols that indicate the type, and dump iterates over the elements and recursively calls itself on each one
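The begin and end symbols for dict and set can be listed with pickletools. This is a sketch for protocol 4 with two-element containers; `data_opcodes` is a made-up helper that drops the header, frame and stop opcodes.

```python
import pickle
import pickletools


def data_opcodes(obj) -> list:
    """Opcode names of a protocol 4 pickle without PROTO, FRAME and STOP."""
    data = pickle.dumps(obj, protocol=4)
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)
            if opcode.name not in ("PROTO", "FRAME", "STOP")]


dict_ops = data_opcodes({"a": 1, "b": 2})
set_ops = data_opcodes({1, 2})
print(dict_ops[0], dict_ops[-1])
print(set_ops[0], set_ops[-1])
```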

type

If the object to be saved is a type

class A(object):
    a = "a"
    b = "b"

    def run(self):
        print(self.a, self.b)

pickle.dumps(A)
b'\x80\x04\x95\x12\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x01A\x94\x93\x94.'

The data part is \x8c\x08__main__\x94\x8c\x01A\x94\x93\x94

type1

dumps(A) saves the module name (__main__) and the object name (A) in str format
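Because only the two names are stored, the same bytes can be rebuilt by hand. The sketch below uses collections.OrderedDict instead of A so that it does not depend on where the class is defined; `reference_bytes` is a hypothetical helper and assumes protocol 4 and names shorter than 256 bytes.

```python
import pickle
from collections import OrderedDict


def reference_bytes(cls) -> bytes:
    """Module name, qualified name and STACK_GLOBAL, as written for a class."""
    def short_str(s: str) -> bytes:
        raw = s.encode("utf-8")
        # Each name is written as a short str and then memoized
        return pickle.SHORT_BINUNICODE + bytes([len(raw)]) + raw + pickle.MEMOIZE
    return short_str(cls.__module__) + short_str(cls.__qualname__) + pickle.STACK_GLOBAL


data = pickle.dumps(OrderedDict, protocol=4)
print(reference_bytes(OrderedDict) in data)
print(pickle.loads(data) is OrderedDict)
```

Loading returns the very same class object, since the loader imports the module and looks the name up rather than rebuilding the class.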

object

If the object to be saved is an instance

a = A()
pickle.dumps(a)
b'\x80\x04\x95\x15\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x01A\x94\x93\x94)\x81\x94.'

The data part is \x8c\x08__main__\x94\x8c\x01A\x94\x93\x94)\x81\x94

The only difference is that some extra information is appended after the previously dumped result

A tuple holds the args needed for the instance call; in the current case a = A() there are no args, so it is an EMPTY_TUPLE

A NEWOBJ symbol indicates that cls.__new__(cls, *args) must be called when the dumped result is loaded

object1
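The tail of an instance pickle can be inspected with pickletools. This sketch pickles a plain object() rather than an instance of A, so that it runs regardless of where a class is defined; that substitution is an assumption of the example, not something taken from the text above.

```python
import pickle
import pickletools

data = pickle.dumps(object(), protocol=4)
names = [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]
print(names[-4:])

# Rough equivalent of what the loader does for NEWOBJ with an empty args tuple
cls, args = object, ()
rebuilt = cls.__new__(cls, *args)
print(type(rebuilt) is type(pickle.loads(data)))
```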