Vincent-de-Comarmond/cpython-extension

CPython

DISCLAIMER:

I am not an expert in any of the following. Not at all. So … whatever you see below might be (and probably is) very badly done. The algorithms are definitely not very efficient and the way everything is glued together is probably sub-optimal or plain bad.

The project can be cloned from here:

Key resources:

Preamble: WTF is CPython

What you know as Python is actually CPython. It is the reference/standard implementation of the Python programming language and by far the most common. Other implementations do exist and are used in various contexts. So the long and short of this is that Python/CPython (I won’t differentiate between the two anymore) is an interpreted language whose interpreter is written in C. I.e. it is something like a translator/interpreter, written in C, that reads a string/text-file.

Let’s think about why this is. An interpreted language (i.e. one that’s not compiled to machine code) cannot run on hardware by itself. This is because hardware cannot understand “words” or strings. Computer chips understand electrical inputs in the form of 0s and 1s. Ultimately everything needs to get cooked down to this. Whatever is translating your Python to 0s and 1s cannot itself be made of the same words you see. At some stage whatever is interpreting Python must change the words to numbers. For this, compiled machine code is necessary.
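
One way to peek at this translation step is the standard-library dis module. This is a small sketch and not part of the project: CPython first compiles the source text to bytecode (plain numbers), and the interpreter loop, which is written in C, then executes that bytecode. The exact instruction names differ between Python versions, so none are assumed here.

```python
import dis

source = "total = 1 + 2"

# compile() turns the text into a code object; co_code holds the raw bytecode.
code_object = compile(source, "<demo>", "exec")
raw_bytes = code_object.co_code

# dis gives a human-readable listing of the same bytecode.
instruction_names = [instr.opname for instr in dis.get_instructions(code_object)]

print(list(raw_bytes)[:8])   # the first few bytes, shown as plain integers
print(instruction_names)     # names differ between CPython versions
```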

So … like how is this useful?

Have you ever wondered how Numpy (for example) is so fast compared to pure python? Or why anybody would choose Pandas’ very non-pythonic syntax? The reason Pandas is useful is because it’s faster than “normal Python” once you’re working with more than a trivial amount of data.

The reason both of these libraries are “faster” than writing things by hand in Python is where the C in CPython comes in.

  • The standard Python is written in C
  • Is it possible to extend the “standard” Python by writing libraries in C (or even just extend the language itself)?
    • It turns out that this is indeed possible (though let’s leave modifying the language itself to the professionals).
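
The gap between C-implemented and interpreted code can be seen without installing anything. The sketch below is an illustration only (the function name python_loop_sum is made up here): it compares the built-in sum, which CPython implements in C, against the same addition written as a Python loop. Absolute timings depend on the machine, so no numbers are claimed.

```python
import timeit

numbers = list(range(100_000))


def python_loop_sum(values):
    """Add the values one at a time in interpreted Python."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches must agree before timing means anything.
assert python_loop_sum(numbers) == sum(numbers)

loop_seconds = timeit.timeit(lambda: python_loop_sum(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: sum(numbers), number=20)

print(f"python loop: {loop_seconds:.4f}s, built-in sum: {builtin_seconds:.4f}s")
```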

The project: Make a C-based library to extend Python

So … let us try to make a C-extension to Python. Let’s do something computationally-intensive (like finding prime numbers).

What the project looks like:

.
├── LICENSE
├── pure_c
│   ├── primefinderfunc.c
│   ├── primefinderfunc.h
│   └── primefindermain.c
├── python_extension     # This is the working directory
│   ├── primefinder.c
│   ├── primefinderfunc.c
│   ├── primefinderfunc.h
│   ├── setup.py
│   ├── speedtest.py
│   └── tests_prime_finding.py
└── README.org

Feel free to ignore the pure_c directory. This is where I tested my algorithm using purely C to ensure it worked before trying to convert it to a Python extension.

The C code

Note again - I am not an expert in this. So this stuff is … very sketchy at best. I’ve tried to comment/document the code to make it as understandable as possible. Please read the comments for direction.

primefinderfunc.c

This contains the function which computes primes. Note the statement #include <stdbool.h> - this is how one does imports in C. Note that the function generatePrimes does not return anything, but populates the array given to it. This is because one can’t really return an array from a function in C (this is a bit of a complicated question).

The code here is pretty simple.

primefinderfunc.h

This is how we make things visible to other files in C. There is nearly nothing here except the function prototype (declaration) void generatePrimes(int, long[]);. The C pre-processor pastes the contents of the header wherever one writes #include "primefinderfunc.h", so the compiler knows the function's signature in that file. The function body itself (defined above in primefinderfunc.c) is compiled separately, and the linker connects each call to it when the final binary is built.

Note also the code:

#ifndef PRIMEFINDERFUNC_H
#define PRIMEFINDERFUNC_H
// ...
#endif

This is an include guard. It prevents the contents of the header (.h file) from being included more than once in the same source file (which can happen in a more complicated project), which could cause redefinition errors.

primefinder.c

This is where things get … ugly. This is the “glue” that sticks the simple C code to Python. Again, I have gone out of my way to comment the code to make it understandable. I do not understand everything that is happening here, but I try to explain what I can.

Beware of the following.

#include <python3.12/Python.h> // BEWARE - this may vary on your system

  • You will need the Python development headers installed on your system for this, and they will have to be on your C compiler’s “include” path. On some systems this is trivial. On other systems I have no idea how this is automated (I just checked numpy … their build processes, with Azure Pipelines and Meson builds, are far more complex in order to manage multiple environments).
  • Your path <python3.12/Python.h> may well be different from mine. Again, I’m unsure about how to deal with this generally and it’s out of the scope of this.
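
To find out where the headers live for a given interpreter, the standard-library sysconfig module can be asked directly. This is a minimal sketch, not part of the project. The Python documentation spells the include as #include <Python.h>, and build tools such as setuptools normally pass this directory to the compiler, so the version-specific prefix may not be needed when building through setup.py (treat that as an assumption for your own setup).

```python
import os
import sysconfig

# Directory that should hold Python.h for the interpreter running this script.
include_dir = sysconfig.get_paths()["include"]
header_path = os.path.join(include_dir, "Python.h")

print("include directory:", include_dir)
# False here usually means the development headers are not installed.
print("Python.h present:", os.path.exists(header_path))
```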

The Python code

Installing the C extension in Python

  • pip install .: Install our C extension
  • You will probably need a C compiler on your system. GNU/Linux is built with C … so this was not a problem for me. For Windows you might (maybe) have to install Microsoft’s C/C++ build tools. I’m not sure about Mac.

setup.py

This is the Python half of the glue sticking together Python and C. This allows us to install the C module as a Python module and use the C function/s as if they were Python (I’m simplifying a bit, but anyways).

What the code does is pretty self-explanatory. Essentially it all boils down to “install this C module as a Python module.”

tests_prime_finding.py

This, more than anything, is just a demonstration of unit-testing with some interesting things thrown in.

Let’s quickly glance at some interesting stuff

from contextlib import redirect_stdout
from io import StringIO

import unittest

# Custom
from speedtest import timeit, generate_primes
from c_primefinder import find_primes


@timeit
def noarg_func():
    """Silly testing function"""
    return 1
...

NOTES/Things of Interest:

  • from contextlib import redirect_stdout: This allows us to capture the output of print statements and write it elsewhere
  • from io import StringIO: This provides us with a “virtual file” for reading/writing strings to. So this reads/writes (strings) to memory rather than to a file. It can be very useful in some contexts.
  • from speedtest import timeit ...: We import the timeit decorator (more on that later) and use it with @timeit on top of a function definition.
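
The two standard-library imports above combine as follows. This is a minimal sketch with a made-up function (noisy_function is hypothetical, not from the project): anything printed inside the with block lands in the in-memory buffer instead of the terminal, which makes printed output testable.

```python
from contextlib import redirect_stdout
from io import StringIO


def noisy_function():
    """Hypothetical stand-in for any function that prints."""
    print("hello from noisy_function")
    return 1


buffer = StringIO()              # in-memory "file"
with redirect_stdout(buffer):    # print() inside this block writes to buffer
    result = noisy_function()

captured = buffer.getvalue()     # everything that was printed, as one string
print(repr(captured))
```

In a unit test, captured can then be checked with assertEqual or assertIn like any other string.
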
...
def test_python_function(self):
    """Test to see that the python implementation gives us the correct result"""
    # ...
    python_result = generate_primes.__wrapped__(len(TestPrimeNumbers.PRIMES))
    self.assertEqual(tuple(python_result), TestPrimeNumbers.PRIMES)
...

As commented in the code, the .__wrapped__ attribute of a decorated function gives us the undecorated function. This attribute is set by functools.wraps, so it is only available when the decorator uses it.
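
A minimal sketch of how .__wrapped__ comes about, using a made-up decorator (announce and double are hypothetical names, not from the project). functools.wraps copies the original function's metadata onto the wrapper and stores the original under __wrapped__.

```python
import functools


def announce(func):
    """Hypothetical decorator that prints before calling func."""

    @functools.wraps(func)   # copies metadata and sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


@announce
def double(value):
    return value * 2


decorated_result = double(4)                 # goes through wrapper, so it prints
undecorated_result = double.__wrapped__(4)   # original function, no printing
```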

To run the unit-tests one can use either:

python -m unittest tests_prime_finding.py
# or
python tests_prime_finding.py # Only if unittest.main() is included in the script

speedtest.py

Things are finally coming to a head. We are going to test if our c-based library (together with all the glue of moving things between C and Python) is faster than the same algorithm in Python. There’s a bunch of interesting things here.

Firstly you can see the timeit decorator. A decorator is a function which modifies another function. In this case the decorator prints out the function’s name, its arguments and how long it took to execute, as myfunction(arg1, arg2, kwarg1=val1, kwarg2=val2) execution time 0.001s. Pyspark’s udf can also be used as a decorator. Decorators can be very useful, but shouldn’t be used thoughtlessly, as they can make the code more difficult to understand.

Please note that there are far better (more reliable) ways to test performance in Python, such as the standard library’s timeit module. I have used this here to show an example of writing a decorator.

Decorators can be applied in one of 2 ways: either by using an @ on top of the function one wants to decorate, or by calling the decorating function and passing in the function you want to modify as an argument. Like so:

...
# First way of applying a decorator
@timeit
def generate_primes(desired_primes: int) -> List[int]:
    ...

# Second way ... making another function by using it as an argument to a decorator
timeit(find_primes)(DESIRED_NUM_PRIMES)
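
For reference, a decorator with the behaviour described could look roughly like the sketch below. This is not the code from speedtest.py, only an assumption about its general shape: the output format follows the description in the text, and the helper functions add and multiply are invented for the demonstration.

```python
import functools
import time


def timeit(func):
    """Print the call, its arguments and the elapsed time, then return the result."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        parts = [repr(a) for a in args] + [f"{k}={v!r}" for k, v in kwargs.items()]
        print(f"{func.__name__}({', '.join(parts)}) execution time {elapsed:.6f}s")
        return result

    return wrapper


@timeit                          # first way: the @ syntax
def add(a, b=0):
    return a + b


def multiply(a, b):
    return a * b


first = add(1, b=2)
second = timeit(multiply)(3, 4)  # second way: call the decorator directly
```

The second form leaves multiply itself undecorated, which is handy when the function comes from another module (as find_primes does).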

In this script there are also two functions def generate_primes_c_copy(desired_primes: int, prime_numbers: List[int]) -> None: and def generate_primes(desired_primes: int) -> List[int]:. These “look” very funny (non-pythonic), but this is because I’ve tried to copy the C algorithm exactly (so that we can have a fair test).
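
To give a feel for what “copying the C style” means, here is a sketch of the two functions using the signatures quoted above. The bodies are an assumption (plain trial division against the primes found so far), not the repository's actual algorithm. The C-like part is that the caller allocates a fixed-size list and the worker fills it in place, just as a C function fills an array passed to it.

```python
from typing import List


def generate_primes_c_copy(desired_primes: int, prime_numbers: List[int]) -> None:
    """Fill the pre-sized list in place, C style, using trial division."""
    found = 0
    candidate = 2
    while found < desired_primes:
        is_prime = True
        for i in range(found):
            divisor = prime_numbers[i]
            if divisor * divisor > candidate:
                break                    # no divisor up to sqrt(candidate)
            if candidate % divisor == 0:
                is_prime = False
                break
        if is_prime:
            prime_numbers[found] = candidate
            found += 1
        candidate += 1


def generate_primes(desired_primes: int) -> List[int]:
    """Allocate the "array" up front, then let the C-style function populate it."""
    prime_numbers = [0] * desired_primes
    generate_primes_c_copy(desired_primes, prime_numbers)
    return prime_numbers


print(generate_primes(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```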

The results and me showing off some org-mode stuff

The script needs the desired number of primes to generate as an input. So call the script like so. Also, note, org-mode is amazing (for many reasons) - one of which is that it allows you to run nearly any language (and any combination of languages) in an org notebook, and even to pass results from one language to another language.

cd python_extension  # Remember this is supposed to be the working directory
source venv/bin/activate
python speedtest.py 10000
python speedtest.py 50000

find_primes(10000) execution time 0.370544s
generate_primes(10000) execution time 2.991399s
find_primes(50000) execution time 9.217098s
generate_primes(50000) execution time 75.331726s

And there you have it - the C extension is roughly 8x faster than the pure python solution.
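
The speedup can be checked by dividing the timings reported above. The numbers in this small sketch are copied from that output; only the dictionary layout is made up.

```python
# Timings in seconds, copied from the speedtest.py output above.
timings = {
    10_000: {"find_primes": 0.370544, "generate_primes": 2.991399},
    50_000: {"find_primes": 9.217098, "generate_primes": 75.331726},
}

# Ratio of pure-Python time to C-extension time for each run.
speedups = {
    count: t["generate_primes"] / t["find_primes"] for count, t in timings.items()
}

for count, ratio in speedups.items():
    print(f"{count} primes: C extension is about {ratio:.1f}x faster")
```

Both runs come out at a little over 8x, so the advantage stays about the same as the workload grows.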

NOTE: This is still very much slower than a pure-C solution would be - if you look at primefinder.c you’ll see that a huge amount of work goes into converting Python objects to C values and C values back to Python objects. If we were only doing C, this inefficiency would be removed.

Appendix: Org-mode supported languages:

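| Language | Identifier | Documentation | Maintainer |
|---+---+---+---|
| AWK | awk | ob-doc-awk | Tyler Smith |
| Apache Groovy | groovy | | Palak Mathur |
| Arduino | arduino | | |
| Asymptote | asymptote | ob-doc-asymptote | |
| C | C | ob-doc-c | Thierry Banel |
| C++ | cpp | ob-doc-c | Thierry Banel |
| CLI | shell | ob-doc-shell | |
| CSS | css | ob-doc-css | |
| Calc | calc | | Tom Gillespie |
| Clojure | clojure | ob-doc-clojure | Daniel Kraus |
| Comint mode | comint | | |
| Coq | coq | | |
| D | D | ob-doc-c | Thierry Banel |
| Emacs Lisp | emacs-lisp, elisp | ob-doc-elisp | |
| Eshell | eshell | ob-doc-eshell | star diviner |
| FOMUS | fomus | | |
| Fortran | F90 | | |
| GNU Screen | screen | ob-doc-screen | Ken Mankoff |
| GNU sed | sed | | |
| Gforth | forth | | |
| Gnuplot | gnuplot | ob-doc-gnuplot | Ihor Radchenko |
| Graphviz | dot | ob-doc-dot | Justin Abrahms |
| Haskell | haskell | | Lawrence Bottorff |
| J | J | ob-doc-J | |
| Java | java | ob-doc-java | Ian Martins |
| Julia | julia | ob-doc-julia | Pedro Bruel |
| LaTeX | latex | ob-doc-LaTeX | |
| LilyPond | ly | ob-doc-lilypond | |
| Lisp | lisp | ob-doc-lisp | |
| Lua | lua | ob-doc-lua | |
| MATLAB® | matlab | ob-doc-octave-matlab | |
| Make | makefile | ob-doc-makefile | |
| Mathematica | mathematica | | |
| Mathomatic™ | mathomatic | ob-doc-mathomatic | |
| Maxima | max | ob-doc-maxima | |
| Mono | csharp | | |
| Mono | vbnet | | |
| Mozart | oz | ob-doc-oz | |
| Mscgen | mscgen | ob-doc-mscgen | |
| Node.js | js | ob-doc-js | |
| OCaml | ocaml | | |
| Octave | octave | ob-doc-octave | |
| Org mode | org | ob-doc-org | |
| PHP | php | | |
| Perl | perl | ob-doc-perl | Corwin Brust |
| PicoLisp | picolisp | ob-doc-picolisp | |
| PlantUML | plantuml | ob-doc-plantuml | |
| Processing | processing | | Jarmo Hurri |
| Python | python | ob-doc-python | Jack Kamm |
| R | R | ob-doc-R | Jeremie Juste |
| Redis | redis | | |
| Ruby | ruby | | |
| SMILES | smiles | | |
| SPICE | spice | | |
| SQL | sql | ob-doc-sql | Daniel Kraus |
| SQLite | sqlite | ob-doc-sqlite | Nick Savage |
| Sass | sass | | |
| Scheme | scheme | ob-doc-scheme | |
| Shen | shen | | |
| Stan | stan | ob-doc-stan | |
| Stata | stata | ob-doc-stata | |
| SuperCollider | sclang | | |
| Tcl | tcl | ob-doc-tcl | |
| Vala | vala | ob-doc-vala | |
| abc | abc | ob-doc-abc | |
| ditaa | ditaa | ob-doc-ditaa | |
| ebnf2ps | ebnf | | |
| hledger | hledger | | |
| io | io | | |
| ledger | ledger | ob-doc-ledger | |
| ΕΥΚΛΕΙΔΗΣ | eukleides | ob-doc-eukleides | |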
