- cpython/Objects/bytesobject.c
- cpython/Include/bytesobject.h
- cpython/Objects/clinic/bytesobject.c.h
The memory layout of PyBytesObject resembles that of the tuple object and the int object, but is simpler than either.
A bytes object is immutable: whenever you need to modify one, you must create a new object, which keeps the implementation simple.
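As a minimal sketch of what immutability means at the Python level (the variable names here are illustrative only), item assignment on a bytes object fails, and a "modification" is really the construction of a new object:

```python
# bytes objects do not support in-place modification.
original = b"abc"
try:
    original[0] = 0x7A  # attempt to overwrite the first byte
    mutated = True
except TypeError:
    mutated = False

# "Modifying" a bytes object means building a new one from pieces of the old.
updated = b"z" + original[1:]
```

After this runs, `original` is unchanged and `updated` is a separate object.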
s = b""
Let's initialize a bytes object with ASCII characters:
s = b"abcdefg123"
s = "我是帅哥".encode("utf8")
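To make the non-ASCII case concrete, the sketch below (illustrative names, not taken from the CPython source) compares the number of characters in the string with the number of bytes stored after UTF-8 encoding. Each of these four characters takes three bytes in UTF-8, so the bytes object holds twelve bytes of data.

```python
text = "我是帅哥"
encoded = text.encode("utf8")

char_count = len(text)      # number of Unicode characters in the str
byte_count = len(encoded)   # number of bytes stored in the bytes object

# Indexing a bytes object yields an int, not a one-character string.
first_byte = encoded[0]

# Decoding with the same codec restores the original text.
round_trip = encoded.decode("utf8")
```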
The ob_shash field stores the hash value of the bytes object; a value of -1 means the hash has not been computed yet.
The first time the hash value is computed, it is cached in the ob_shash field.
The cached value avoids recomputation and speeds up dictionary lookups.
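The cache itself is not directly visible from Python code, so the following is only a sketch of the behaviour it supports: repeated calls to `hash()` on the same object return the same value, and an equal bytes object finds the same dictionary entry.

```python
key = b"abcdefg123"

first = hash(key)   # on CPython, the first call computes the hash
second = hash(key)  # later calls can reuse the stored value

# Dictionary lookups hash the key, so bytes keys benefit from the cache.
lookup = {key: "value"}
found = lookup[b"abcdefg123"]
```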
The ob_size field is part of every PyVarObject. PyBytesObject uses it to store the length of the data, which keeps len() at O(1) and lets the object hold arbitrary binary data, including embedded null bytes that a null-terminated C string alone could not delimit.
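A small sketch of why an explicit length matters: a C routine that stops at the first null byte would see only two bytes here, whereas the bytes object reports and slices its full contents.

```python
data = b"ab\x00cd"

length = len(data)   # counts every byte, including the embedded null
tail = data[3:]      # bytes after the null are still part of the object
null_index = data.index(b"\x00")
```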
PyBytesObject is a Python wrapper around a C-style null-terminated string, with ob_shash caching the hash value and ob_size storing the length of the data.
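One way to observe this layout from Python is `sys.getsizeof`. The sketch below assumes CPython, where the reported size is a fixed header (including the trailing null byte) plus one byte per element. The absolute header size depends on the build, so only the differences are compared.

```python
import sys

# Size of an empty bytes object: object header, ob_size, ob_shash,
# and the single trailing null byte (the exact value is build-dependent).
empty_size = sys.getsizeof(b"")

# Each additional byte of data adds exactly one byte to the reported size.
extra = [sys.getsizeof(b"x" * n) - empty_size for n in range(5)]
```

On a typical 64-bit CPython build `empty_size` is commonly 33, though that figure should be treated as an observation rather than a guarantee.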
The implementation of PyBytesObject resembles the embstr encoding in Redis:
redis-cli
127.0.0.1:6379> set a "hello"
OK
127.0.0.1:6379> object encoding a
"embstr"