bytes

contents

related file

  • cpython/Objects/bytesobject.c
  • cpython/Include/bytesobject.h
  • cpython/Objects/clinic/bytesobject.c.h

memory layout

memory layout

The memory layout of PyBytesObject resembles that of the tuple object and the int object, but it is simpler than either.
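The layout can be probed from Python with sys.getsizeof. This is a minimal sketch that assumes CPython; absolute sizes depend on the platform and build, so it only compares differences rather than fixed numbers.

```python
import sys

# Size of the empty bytes object: the fixed header (reference count, type
# pointer, ob_size, ob_shash) plus one byte for the trailing null terminator.
empty_size = sys.getsizeof(b"")

# The payload is stored inline after the header, so each extra byte of data
# should grow the object by exactly one byte.
for payload in (b"a", b"abc", b"abcdefg123"):
    overhead = sys.getsizeof(payload) - len(payload)
    print(len(payload), sys.getsizeof(payload), overhead == empty_size)
```

If the overhead is the same for every payload, the data really does sit directly behind a fixed-size header.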

example

empty bytes

A bytes object is immutable: whenever you need to modify one, you must create a new object, which keeps the implementation simple.

s = b""

empty
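The immutability described above can be observed directly. A small sketch, using only built-in behavior of bytes:

```python
s = b"abc"

# In-place modification is rejected: bytes does not support item assignment.
try:
    s[0] = 0x7A
    mutated = True
except TypeError:
    mutated = False

# "Modifying" a bytes object means building a new one from the old contents.
t = b"z" + s[1:]

print(mutated)  # False
print(s, t)     # b'abc' b'zbc'
```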

ascii characters

Let's initialize a bytes object with ASCII characters:

s = b"abcdefg123"

ascii

nonascii characters

s = "我是帅哥".encode("utf8")

nonascii
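For the non-ASCII case, the bytes object stores the encoded form, so its length is counted in bytes rather than characters. A short sketch of what the example above produces:

```python
text = "我是帅哥"
s = text.encode("utf8")

# Four characters, each of which takes three bytes in UTF-8.
print(len(text))  # 4
print(len(s))     # 12

# Indexing a bytes object yields integers in range(256), not characters.
print(all(0 <= byte <= 255 for byte in s))  # True

# Decoding with the same codec recovers the original text.
print(s.decode("utf8") == text)  # True
```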

summary

ob_shash

The ob_shash field stores the hash value of the bytes object; a value of -1 means the hash has not been computed yet.

The first time the hash value is computed, it is cached in the ob_shash field.

The cached hash value avoids recalculation and speeds up dictionary lookups.
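The cache itself is not visible from Python code, so the following is only a pure-Python model of the idea, not CPython's implementation. The class CachedHashBytes and its attributes _shash and computations are hypothetical names chosen for illustration.

```python
class CachedHashBytes:
    """Hypothetical model of the ob_shash caching idea (not CPython code)."""

    def __init__(self, data: bytes):
        self._data = data
        self._shash = -1       # -1 plays the role of "not computed yet"
        self.computations = 0  # counts how often the real hash is computed

    def __hash__(self):
        if self._shash == -1:
            self.computations += 1
            # hash() never returns -1 in CPython, so -1 is a safe sentinel.
            self._shash = hash(self._data)
        return self._shash

    def __eq__(self, other):
        return isinstance(other, CachedHashBytes) and self._data == other._data


key = CachedHashBytes(b"abcdefg123")
table = {key: "value"}                    # first hash: computed and cached
lookups = [table[key] for _ in range(3)]  # later hashes: served from the cache
print(key.computations)  # 1
```

Every dictionary lookup needs the hash of the key, so a key that is looked up repeatedly pays for the hash computation only once.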

ob_size

The ob_size field is present in every PyVarObject. PyBytesObject uses it to store the length of the data, which keeps len() at O(1) and lets the object hold arbitrary binary data (which may contain null bytes) instead of relying on the null terminator to find the end.
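A brief sketch of why a stored length matters: with an embedded null byte, a C-style scan for the terminator would report the wrong length, while len() still returns the full size.

```python
data = b"ab\x00cd"

# len() reports the stored length, including the embedded null byte.
print(len(data))  # 5

# A C-style scan for the first null byte would stop early, at index 2.
print(data.index(0))  # 2

# The bytes after the embedded null are still part of the object.
print(data[3:])  # b'cd'
```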

summary

PyBytesObject is a Python wrapper around a C-style null-terminated string, with ob_shash caching the hash value and ob_size storing the length of the data.

The implementation of PyBytesObject resembles the embstr encoding in Redis:

redis-cli
127.0.0.1:6379> set a "hello"
OK
127.0.0.1:6379> object encoding a
"embstr"