.. highlight:: c

.. _extension-modules:

Defining extension modules
--------------------------

A C extension for CPython is a shared library (for example, a ``.so`` file
on Linux or a ``.pyd`` DLL on Windows) that is loadable into the Python
process (for example, because it was compiled with compatible compiler
settings) and that exports an :ref:`initialization function <extension-export-hook>`.

To be importable by default (that is, by
:py:class:`importlib.machinery.ExtensionFileLoader`),
the shared library must be available on :py:attr:`sys.path`,
and must be named after the module name plus an extension listed in
:py:attr:`importlib.machinery.EXTENSION_SUFFIXES`.
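
As a rough illustration, the lookup described above can be reproduced with :py:mod:`importlib`. The sketch below is a minimal example. It assumes that a single directory is searched rather than every entry of :py:attr:`sys.path`, and the helper name ``find_extension_spec`` is hypothetical. It combines :py:class:`importlib.machinery.FileFinder` with :py:class:`~importlib.machinery.ExtensionFileLoader` and :py:attr:`~importlib.machinery.EXTENSION_SUFFIXES`, and it only locates a matching file without loading it.

.. code-block:: python

   import importlib.machinery


   def find_extension_spec(name, directory):
       """Return a module spec if *directory* contains an extension module
       file for the top-level module *name*, otherwise None.

       Hypothetical helper: it only locates the file and never loads it.
       """
       finder = importlib.machinery.FileFinder(
           directory,
           # Pair the extension loader with the accepted file suffixes,
           # roughly as the default path-based finder does.
           (importlib.machinery.ExtensionFileLoader,
            importlib.machinery.EXTENSION_SUFFIXES),
       )
       return finder.find_spec(name)

Calling it for each :py:attr:`sys.path` entry in order approximates the search for a top-level module such as ``spam``. A submodule is searched for in its parent package's ``__path__`` instead. The default path-based finder also considers source and bytecode files, which this sketch ignores.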

.. note::

   Building, packaging and distributing extension modules is best done with
   third-party tools, and is out of scope of this document.
   One suitable tool is Setuptools, whose documentation can be found at
   https://setuptools.pypa.io/en/latest/setuptools.html.

Normally, the initialization function returns a module definition initialized
using :c:func:`PyModuleDef_Init`.
This allows splitting the creation process into several phases:

- Before any substantial code is executed, Python can determine which
  capabilities the module supports, and it can adjust the environment or
  refuse to load an incompatible extension.
- By default, Python itself creates the module object -- that is, it does
  the equivalent of :py:meth:`object.__new__` for classes.
  It also sets initial attributes like :attr:`~module.__package__` and
  :attr:`~module.__loader__`.
- Afterwards, the module object is initialized using extension-specific
  code -- the equivalent of :py:meth:`~object.__init__` on classes.

This is called *multi-phase initialization* to distinguish it from the legacy
(but still supported) *single-phase initialization* scheme,
where the initialization function returns a fully constructed module.
See the :ref:`single-phase-initialization section below <single-phase-initialization>`
for details.

.. versionchanged:: 3.5

   Added support for multi-phase initialization (:pep:`489`).
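
Seen from Python, the creation and execution phases correspond to two steps of the generic :py:mod:`importlib` loading protocol. First, :py:func:`importlib.util.module_from_spec` calls the loader's ``create_module()`` and then sets attributes such as ``__spec__`` and ``__loader__``. Second, the loader's ``exec_module()`` runs the module-specific initialization. For extension modules the loader is :py:class:`~importlib.machinery.ExtensionFileLoader`. The following minimal sketch uses a hypothetical helper, ``import_in_phases``, to create a fresh module instance this way without registering it in :py:attr:`sys.modules`. Which C-level code runs in each step is an implementation detail.

.. code-block:: python

   import importlib.util


   def import_in_phases(name):
       """Create and execute a fresh instance of the module *name*.

       Hypothetical helper: the new module is not added to sys.modules.
       """
       spec = importlib.util.find_spec(name)
       if spec is None:
           raise ModuleNotFoundError(f"No module named {name!r}", name=name)
       # Creation phase: the loader creates the module object (or a default
       # one is made), then attributes such as __spec__ and __loader__ are set.
       module = importlib.util.module_from_spec(spec)
       # Execution phase: run the module's own initialization code.
       spec.loader.exec_module(module)
       return module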


Multiple module instances
.........................

By default, extension modules are not singletons.
For example, if the :py:attr:`sys.modules` entry is removed and the module
is re-imported, a new module object is created, and typically populated with
fresh method and type objects.
The old module is subject to normal garbage collection.
This mirrors the behavior of pure-Python modules.
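
The re-import described above can be tried from Python. The following sketch uses a hypothetical helper, ``reimport``. It works for any importable module name, including pure-Python modules, whose behavior extension modules mirror by default.

.. code-block:: python

   import importlib
   import sys


   def reimport(name):
       """Import *name*, drop it from sys.modules, and import it again.

       Hypothetical helper; returns the (old, new) module objects and leaves
       the new module registered in sys.modules.
       """
       old = importlib.import_module(name)
       # Removing the entry makes the next import create a new module object.
       del sys.modules[name]
       new = importlib.import_module(name)
       return old, new

For a pure-Python module, the two returned objects differ (``old is not new``), and so do the functions they define. An extension module using multi-phase initialization typically behaves the same way. The :ref:`single-phase scheme <single-phase-initialization>`, by contrast, shares some objects between instances.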

Additional module instances may be created in
:ref:`sub-interpreters <sub-interpreter-support>`
or after Python runtime reinitialization
(:c:func:`Py_Finalize` and :c:func:`Py_Initialize`).
In these cases, sharing Python objects between module instances would likely
cause crashes or undefined behavior.

To avoid such issues, each instance of an extension module should
be *isolated*: changes to one instance should not implicitly affect the others,
and all state owned by the module, including references to Python objects,
should be specific to a particular module instance.
See :ref:`isolating-extensions-howto` for more details and a practical guide.

A simpler way to avoid these issues is
:ref:`raising an error on repeated initialization <isolating-extensions-optout>`.

All modules are expected to support
:ref:`sub-interpreters <sub-interpreter-support>`, or otherwise explicitly
signal a lack of support.
This is usually achieved by isolation or blocking repeated initialization,
as above.
A module may also be limited to the main interpreter using
the :c:data:`Py_mod_multiple_interpreters` slot.


.. _extension-export-hook:

Initialization function
.......................

The initialization function defined by an extension module has the
following signature:

.. c:function:: PyObject* PyInit_modulename(void)

For modules with ASCII-only names, the function must be named
:samp:`PyInit_{<name>}`, with ``<name>`` replaced by the name of the module.
For a submodule such as ``pkg.spam``, only the last component of the name
(``spam``) is used.

When using :ref:`multi-phase-initialization`, non-ASCII module names
are allowed. In this case, the initialization function name is
:samp:`PyInitU_{<name>}`, with ``<name>`` encoded using Python's
*punycode* encoding with hyphens replaced by underscores. In Python:

.. code-block:: python

   def initfunc_name(name):
       try:
           # ASCII-only names map directly to PyInit_<name>.
           suffix = b'_' + name.encode('ascii')
       except UnicodeEncodeError:
           # Other names use punycode, with '-' replaced by '_'.
           suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
       return b'PyInit' + suffix
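
The encoding above can be reversed, assuming module names are valid identifiers and therefore contain no hyphens. Punycode's encoded part consists only of ASCII letters and digits, so in a ``PyInitU_`` name the last underscore is the one that replaced punycode's ``-`` delimiter. The following sketch of a hypothetical inverse helper, ``module_name_from_initfunc``, may be handy when inspecting the symbols a shared library exports. It assumes its input was produced by the scheme above.

.. code-block:: python

   def module_name_from_initfunc(symbol):
       """Recover a module name from an initialization function name.

       Hypothetical helper; *symbol* is a bytes object such as b'PyInit_spam'.
       """
       if symbol.startswith(b'PyInitU_'):
           encoded = symbol[len(b'PyInitU_'):]
           basic, sep, extended = encoded.rpartition(b'_')
           # Restore punycode's '-' delimiter. It is absent when the name
           # has no ASCII characters at all.
           punycode = basic + b'-' + extended if sep else extended
           return punycode.decode('punycode')
       if symbol.startswith(b'PyInit_'):
           return symbol[len(b'PyInit_'):].decode('ascii')
       raise ValueError(f'not an initialization function name: {symbol!r}')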

It is recommended to define the initialization function using a helper macro:

.. c:macro:: PyMODINIT_FUNC

   Declare an extension module initialization function.
   This macro:

   * specifies the :c:expr:`PyObject*` return type,
   * adds any special linkage declarations required by the platform, and
   * for C++, declares the function as ``extern "C"``.

For example, a module called ``spam`` would be defined like this::

   static struct PyModuleDef spam_module = {
       .m_base = PyModuleDef_HEAD_INIT,
       .m_name = "spam",
       ...
   };

   PyMODINIT_FUNC
   PyInit_spam(void)
   {
       return PyModuleDef_Init(&spam_module);
   }

It is possible to export multiple modules from a single shared library by
defining multiple initialization functions. However, importing them requires
using symbolic links or a custom importer, because by default only the
function corresponding to the filename is found.
See the `Multiple modules in one library <https://peps.python.org/pep-0489/#multiple-modules-in-one-library>`__
section in :pep:`489` for details.
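
The sketch below illustrates the custom-importer approach with a hypothetical meta path finder, ``SharedLibraryFinder``, that maps several module names to a single library file. It assumes the library exports a :samp:`PyInit_{<name>}` function for every listed name. It relies on :py:class:`~importlib.machinery.ExtensionFileLoader` deriving the initialization function name from the module name in the spec rather than from the file name.

.. code-block:: python

   import importlib.abc
   import importlib.machinery
   import importlib.util


   class SharedLibraryFinder(importlib.abc.MetaPathFinder):
       """Hypothetical finder serving several modules from one shared library.

       The library at *library_path* is assumed to export one
       initialization function per name in *module_names*.
       """

       def __init__(self, library_path, module_names):
           self.library_path = library_path
           self.module_names = frozenset(module_names)

       def find_spec(self, fullname, path=None, target=None):
           if fullname not in self.module_names:
               return None  # Let the other finders handle this name.
           # The loader derives the PyInit_* symbol from fullname, not from
           # the file name, so every listed module can share the same file.
           loader = importlib.machinery.ExtensionFileLoader(
               fullname, self.library_path)
           return importlib.util.spec_from_file_location(
               fullname, self.library_path, loader=loader)

It could be registered with ``sys.meta_path.insert(0, SharedLibraryFinder('/path/to/libmulti.so', ['spam', 'eggs']))``, where the path and module names are placeholders. After that, ``import spam`` and ``import eggs`` would both load from the same file.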

The initialization function is typically the only non-\ ``static``
item defined in the module's C source.


.. _multi-phase-initialization:

Multi-phase initialization
..........................

Normally, the :ref:`initialization function <extension-export-hook>`
(``PyInit_modulename``) returns a :c:type:`PyModuleDef` instance with
non-``NULL`` :c:member:`~PyModuleDef.m_slots`.
Before it is returned, the ``PyModuleDef`` instance must be initialized
using the following function:


.. c:function:: PyObject* PyModuleDef_Init(PyModuleDef *def)

   Ensure a module definition is a properly initialized Python object that
   correctly reports its type and reference count.

   Return *def* cast to ``PyObject*``, or ``NULL`` if an error occurred.

   Calling this function is required for :ref:`multi-phase-initialization`.
   It should not be used in other contexts.

   Note that Python assumes that ``PyModuleDef`` structures are statically
   allocated.
   This function may return either a new reference or a borrowed one;
   this reference must not be released.

   .. versionadded:: 3.5


.. _single-phase-initialization:

Legacy single-phase initialization
..................................

.. attention::
   Single-phase initialization is a legacy mechanism to initialize extension
   modules, with known drawbacks and design flaws. Extension module authors
   are encouraged to use multi-phase initialization instead.

In single-phase initialization, the
:ref:`initialization function <extension-export-hook>` (``PyInit_modulename``)
should create, populate and return a module object.
This is typically done using :c:func:`PyModule_Create` and functions like
:c:func:`PyModule_AddObjectRef`.

Single-phase initialization differs from the :ref:`default <multi-phase-initialization>`
in the following ways:

* Single-phase modules are, or rather *contain*, “singletons”.

  When the module is first initialized, Python saves the contents of
  the module's ``__dict__`` (that is, typically, the module's functions and
  types).

  For subsequent imports, Python does not call the initialization function
  again.
  Instead, it creates a new module object with a new ``__dict__``, and copies
  the saved contents to it.
  For example, given a single-phase module ``_testsinglephase``
  [#testsinglephase]_ that defines a function ``sum`` and an exception class
  ``error``:

  .. code-block:: python

     >>> import sys
     >>> import _testsinglephase as one
     >>> del sys.modules['_testsinglephase']
     >>> import _testsinglephase as two
     >>> one is two
     False
     >>> one.__dict__ is two.__dict__
     False
     >>> one.sum is two.sum
     True
     >>> one.error is two.error
     True

  The exact behavior should be considered a CPython implementation detail.

* To work around the fact that ``PyInit_modulename`` does not take a *spec*
  argument, some state of the import machinery is saved and applied to the
  first suitable module created during the ``PyInit_modulename`` call.
  Specifically, when a sub-module is imported, this mechanism prepends the
  parent package name to the name of the module.

  A single-phase ``PyInit_modulename`` function should create “its” module
  object as soon as possible, before any other module objects can be created.

* Non-ASCII module names (``PyInitU_modulename``) are not supported.

* Single-phase modules support module lookup functions like
  :c:func:`PyState_FindModule`.
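
The “singleton” behavior described in the first item above can be modeled roughly in pure Python. This is a toy sketch, not CPython's implementation. The names ``load_single_phase`` and ``_saved_contents`` are hypothetical, and the ``init`` callable stands in for ``PyInit_modulename``.

.. code-block:: python

   import types

   # Hypothetical cache: module name -> saved copy of the first module's __dict__.
   _saved_contents = {}


   def load_single_phase(name, init):
       """Toy model of loading a single-phase module called *name*.

       *init* stands in for PyInit_modulename: it returns a fully
       populated module object.
       """
       saved = _saved_contents.get(name)
       if saved is None:
           # First import: run the initialization function and save a
           # shallow copy of the resulting module's contents.
           module = init()
           _saved_contents[name] = dict(module.__dict__)
           return module
       # Later imports: skip init() and copy the saved contents into a
       # brand-new module object with its own __dict__.
       module = types.ModuleType(name)
       module.__dict__.update(saved)
       return module

Suppose ``init`` builds a module containing ``sum`` and ``error``. Two calls to ``load_single_phase`` with the same name then reproduce the results of the session above: the module objects and their dictionaries are distinct, the ``sum`` and ``error`` objects are shared, and ``init`` runs only once.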

.. [#testsinglephase] ``_testsinglephase`` is an internal module used
   in CPython's self-test suite; your installation may or may not
   include it.