Projects to optimize CPython 3.7
- Multiple interpreters per process
- Gilectomy: GIL-less CPython
- Add a JIT to CPython? :-) (see Pyston and Pyjion)
- MERGED: Issue #26110: LOAD_METHOD and CALL_METHOD
- Issue #28158: Implement LOAD_GLOBAL opcode cache
- Free list for single-digit ints
- Convert more C functions to METH_FASTCALL and Argument Clinic
- Argument Clinic should understand *args and **kwargs parameters
- Argument Clinic: Fix signature of optional positional-only arguments
- _struct module
- DONE: print() function.
TODO: convert to Argument Clinic (need
- Search for Argument Clinic open issues
- Better bytecode/AST?
- Split PyGC_Head from object (ML thread)
- The size of a 1-tuple drops from 7 words (3 (gc head) + 3 (PyVarObject) + 1) to 5 words (1 (pointer to gc head) + 3 + 1).
- Embed some tuples into code object.
For (None,), the code object uses 8 words (or 6 if the above optimization lands) for the tuple and the pointer to it. It could be 2 words (the length and one PyObject*).
- It may reduce RAM usage and improve cache utilization.
- Optimize option for stripping
- Saves one dict per (annotated) function.
-O3 may be OK, but an individual optimization flag (e.g. -Odocstring) would be better. It affects PEP 488.
- Interned-key only dict: Most name lookups use interned strings. If a dict contains only interned keys, lookup can compare pointers only, and the hash can be dropped from dict entries. This can reduce memory usage and improve cache utilization of namespace dicts.
- Global freepool: Many types have their own freepool. Sharing a freepool can improve memory and cache efficiency. PyMem_FastFree(void* ptr, size_t size) could store a memory block in the freepool, and PyMem_Malloc could check the global freepool first.
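
The global freepool idea can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions, not CPython code: the class name GlobalFreePool, the per-size cap and the reused/fresh counters are hypothetical, and a bytearray stands in for a raw memory block. It shows why the proposed PyMem_FastFree takes a size argument: the pool can file the block under its size without inspecting it, and the allocation path can look in that same list first.

```python
from collections import defaultdict


class GlobalFreePool:
    """Toy model of one size-keyed freepool shared by every type."""

    def __init__(self, max_blocks_per_size=4):
        # Hypothetical cap: the notes do not say how large the pool may grow.
        self.max_blocks_per_size = max_blocks_per_size
        self._free = defaultdict(list)  # block size -> cached blocks
        self.reused = 0                 # allocations served from the pool
        self.fresh = 0                  # allocations that needed new memory

    def malloc(self, size):
        """Models PyMem_Malloc checking the global freepool first."""
        blocks = self._free.get(size)
        if blocks:
            self.reused += 1
            return blocks.pop()
        self.fresh += 1
        return bytearray(size)          # stands in for a real allocation

    def fast_free(self, block, size):
        """Models the proposed PyMem_FastFree(ptr, size)."""
        blocks = self._free[size]
        if len(blocks) < self.max_blocks_per_size:
            blocks.append(block)        # keep the block for later reuse
        # Otherwise the block is dropped; a real allocator would free it.


pool = GlobalFreePool()
a = pool.malloc(48)      # block used by one type
pool.fast_free(a, 48)    # returned to the shared pool
b = pool.malloc(48)      # another type asking for the same size reuses it
c = pool.malloc(64)      # no cached block of this size, so a fresh one
```

Because the pool is keyed by size rather than by type, a block released by one type can serve any other type that needs the same size, which is where the expected memory and cache benefit would come from.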