CPython

DISCLAIMER:

I am not an expert in any of the following Not at all. So … whatever you see below might be (and probably is) very badly done. The algorithms are definitely not very efficient and the way everything is glued together is probably sub-optimal or plain bad.

The project can be cloned from here:

git clone https://github.com/Vincent-de-Comarmond/cpython-extension.git
git clone git@github.com:Vincent-de-Comarmond/cpython-extension.git
https://github.com/Vincent-de-Comarmond/cpython-extension

Key resources:

Preamble: WTF is CPython

What you know as Python is actually CPython. It is the reference/standard implementation of the Python programming language and the most common (by far). Other implementations do exist and are used in various contexts. So the long of the short of this is that Python/CPython (I won’t differentiate between the two anymore) is actually an interpreted language written in C. I.e. it is something like translator/interpreter of a string/text-file written in C.

Let’s think about why this is. An interpreted language (i.e. one that’s not compiled to machine code) cannot run on hardware by itself. This is because hardware cannot understand “words” or strings. Computer chips understand electrical inputs in the form of 0s and 1s. Ultimately everything needs to get cooked down to this. Whatever is translating your Python to 0s and 1s cannot be the same words you see - at some stage whatever is interpreting Python must change the words to numbers. For this compiled/machine code is necessary.

So … like how is this useful?

Have you ever wondered how Numpy (for example) is so fast compared to pure python? Or why anybody would choose Pandas’ very non-pythonic syntax? The reason Pandas is useful is because it’s faster than “normal Python” one you’re working with more than a trivial amount of data.

The reason both of these libraries are “faster” than writing things by hand in Python is where the C in CPython comes in.

The standard Python is written in C
Is it possible to extend the “standard” Python by writing libraries in C (or even just extend the language itself)?
- It turns out that this is indeed possible (though let’s leave modifying the language itself to the professionals).

The project: Make a C-based library to extend Python

So … let us try to make a C-extension to Python. Let’s do something computationally-intensive (like finding prime numbers).

What the project looks like:

.
├── LICENSE
├── pure_c
│   ├── primefinderfunc.c
│   ├── primefinderfunc.h
│   └── primefindermain.c
├── python_extension     # This is the working directory
│   ├── primefinder.c
│   ├── primefinderfunc.c
│   ├── primefinderfunc.h
│   ├── setup.py
│   ├── speedtest.py
│   ├── tests_prime_finding.py
└── README.org

Feel free to ignore the pure_c directory. This is where I tested my algorithm using purely C to ensure it worked before trying to convert it to a Python extension.

The C code

Note again - I am not an expert in this. So this stuff is … very sketchy at best. I’ve tried to comment/document the code to make it as understandable as possible. Please read the comments for direction.

primefinderfunc.c

This contains the function which computes primes. Note the statement #include <stdbool.h> - this is how one does imports in C. Note that the function generatePrimes does not return anything, but populates the array given to it. This is because one can’t really return an array from a function in C (this is a bit of a complicated question).

The code here is pretty simple.

primefinderfunc.h

This is how we make things exportable in C. There is nearly nothing here except the function stub void generatePrimes(int, long[]);. The C pre-processor finds the function body (defined above in primefinderfunc.c) based on the function signature of this stub and copies the function to wherever one writes #include "primefinderfunc.h", thereby making it visible in other scripts.

Note also the code:

#ifndef PRIMEFINDERFUNC_H
#define PRIMEFINDERFUNC_H
// ...
#endif

This simply prevents the code in the header (.h file) from being read multiple times (in a more complicated project) which would cause an error.

primefinder.c

This is where things get … ugly. This is the “glue” that glues the simple C code to Python. Again, I have gone out-of-my-way to try to comment the code to make it understandable. I do not understand everything that is happening here, but I try to explain what I can.

Beware of the following.

#include <python3.12/Python.h> // BEWARE - this may vary on your system

You will need the Python development headers installed on your system for this and it will have to be on you C compiler’s “include” path. On some systems this is trivial. On other systems I have no idea how this is automated (I just checked numpy … their build processes azure pipeline, meson build - are fare more complex to manage multiple environments).
Your path <python3.12/Python.h> may well be different from mine. Again, I’m unsure about how to deal with this generally and it’s out of the scope of this.

The Python code

Installing the C extension in Python

pip install .: Install our c-extension
You will probably need a c compiler on your system. GNU/Linux is built with C … so this was not a problem for me. For windows you might (maybe) have to install Microsoft’s C/C++ tools. I’m not sure about Mac.

setup.py

This is the Python half of the glue sticking together Python and C. This allows us to install the C module as a Python module and use the C function/s as if they were Python (I’m simplifying a bit, but anyways).

What the code does is pretty self-explanatory. Essentially it all boils down to “install this C module as a Python module.”

tests_prime_finding.py

This, more than anything, is just a demonstration of unit-testing with some interesting things thrown in.

Let’s quickly glance at some interesting stuff

from contextlib import redirect_stdout
from io import StringIO

import unittest

# Custom
from speedtest import timeit, generate_primes
from c_primefinder import find_primes


@timeit
def noarg_func():
    """Silly testing function"""
    return 1
...

NOTES/Things of Interest:

from contextlib import redirect_stdout: This allows us to capture outputs for print statements and write them elsewhere
from io import StringIO: This provides us with a “virtual file” for reading/writing strings to. So this reads/writes (strings) to memory rather than to a file. It can be very useful for some contexts.
from speedtest import timeit ...: We import the timeit decorator (more on that later) and use it with @timeit on top of a function signature.

...
def test_python_function(self):
    """Test to see that the python implementation gives us the correct result"""
    # ...
    python_result = generate_primes.__wrapped__(len(TestPrimeNumbers.PRIMES))
    self.assertEqual(tuple(python_result), TestPrimeNumbers.PRIMES)
...

As commented in the code the .__wrapped__ attribute of a decorated function gives us the undecorated function.

To run the unit-tests one can use either:

python -m unittest tests_prime_finding.py
# or
python tests_prime_finding.py # Only if unittest.main() is included in the script

speedtest.py

Things are finally coming to a head. We are going to test if our c-based library (together with all the glue of moving things between C and Python) is faster than the same algorithm in Python. There’s a bunch of interesting things here.

Firstly you can see the timeit decorator. A decorator is a function which modifies another function. In this case the decorator prints out how long it took for the function to execute, it’s arguments etc. as myfunction(arg1, arg2, kwarg1=val1, kwarg2=val2) execution time 0.001s. Pyspark’s udf is also a decorator. Decorators can be very useful, but shouldn’t be used thoughtlessly, as they can make the code more difficult to understand. Please note that there are far better ways (more reliable) to test the performance in python - I have used this here to show an example of writing a decorator. Decorators can be applied in one of 2 ways - either by using an @ on top of the function one wants to decorate, or by calling the decorating function and passing in the function you want to modify as an argument. Like so:

...
# First way of applying a decorator
@timeit
def generate_primes(desired_primes: int) -> List[int]:
    ...

# Second way ... making another function by using it as an argument to a decorator
timeit(find_primes)(DESIRED_NUM_PRIMES)

In this script there are also two functions def generate_primes_c_copy(desired_primes: int, prime_numbers: List[int]) -> None: and def generate_primes(desired_primes: int) -> List[int]:. These “look” very funny (non-pythonic), but this is because I’ve tried to copy the C algorithm exactly (so that we can have a fair test).

The results and me showing off some org-mode stuff

The script needs the desired number of primes to generate as an input. So call the script like so. Also, note, org-mode is amazing (for many reasons) - one of which is that it allows you to run nearly any language (and any combination of languages) in an org notebook, and even to pass results from one language to another language.

cd python_extension  # Remember this is suppossed to be the working directory
source venv/bin/activate
python speedtest.py 10000
python speedtest.py 50000

find_primes(10000)	execution	time	0.370544s
generate_primes(10000)	execution	time	2.991399s
find_primes(50000)	execution	time	9.217098s
generate_primes(50000)	execution	time	75.331726s

And there you have it - the C extension is nearly 10x faster than the pure python solution.

NOTE: This is still very much slower than a pure-c solution would be - if you look primefinder.c you’ll see that a huge amount of work goes into converting Python Objects to C and C things to Python Objects. If were were only doing C - this inefficiency would be removed.

Appendix: Org-mode supported languages:

Language	Identifier	Documentation	Maintainer
AWK	awk	ob-doc-awk	Tyler Smith
ApacheGroovy	groovy		Palak Mathur
Arduino	arduino
Asymptote	asymptote	ob-doc-asymptote
C	C	ob-doc-c	Thierry Banel
C++	cpp	ob-doc-c	Thierry Banel
CLI	shell	ob-doc-shell
CSS	css	ob-doc-css
Calc	calc		Tom Gillespie
Clojure	clojure	ob-doc-clojure	Daniel Kraus
Comintmode	comint
Coq	coq
D	D	ob-doc-c	Thierry Banel
EmacsLisp	emacs-lisp,elisp	ob-doc-elisp
Eshell	eshell	ob-doc-eshell	star diviner
FOMUS	fomus
Fortran	F90
GNUScreen	screen	ob-doc-screen	Ken Mankoff
GNUsed	sed
Gforth	forth
Gnuplot	gnuplot	ob-doc-gnuplot	Ihor Radchenko
Graphviz	dot	ob-doc-dot	Justin Abrahms
Haskell	haskell		Lawrence Bottorff
J	J	ob-doc-J
Java	java	ob-doc-java	Ian Martins
Julia	julia	ob-doc-julia	Pedro Bruel
LaTeX	latex	ob-doc-LaTeX
LilyPond	ly	ob-doc-lilypond
Lisp	lisp	ob-doc-lisp
Lua	lua	ob-doc-lua
MATLAB®	matlab	ob-doc-octave-matlab
Make	makefile	ob-doc-makefile
Mathematica	mathematica
Mathomatic™	mathomatic	ob-doc-mathomatic
Maxima	max	ob-doc-maxima
Mono	csharp
Mono	vbnet
Mozart	oz	ob-doc-oz
Mscgen	mscgen	ob-doc-mscgen
Node.js	js	ob-doc-js
OCaml	ocaml
Octave	octave	ob-doc-octave
Orgmode	org	ob-doc-org
PHP	php
Perl	perl	ob-doc-perl	Corwin Brust
PicoLisp	picolisp	ob-doc-picolisp
PlantUML	plantuml	ob-doc-plantuml
Processing	processing		Jarmo Hurri
Python	python	ob-doc-python	Jack Kamm
R	R	ob-doc-R	Jeremie Juste
Redis	redis
Ruby	ruby
SMILES	smiles
SPICE	spice
SQL	sql	ob-doc-sql	Daniel Kraus
SQLite	sqlite	ob-doc-sqlite	Nick Savage
Sass	sass
Scheme	scheme	ob-doc-scheme
Shen	shen
Stan	stan	ob-doc-stan
Stata	stata	ob-doc-stata
SuperCollider	sclang
Tcl	tcl	ob-doc-tcl
Vala	vala	ob-doc-vala
abc	abc	ob-doc-abc
ditaa	ditaa	ob-doc-ditaa
ebnf2ps	ebnf
hledger	hledger
io	io
ledger	ledger	ob-doc-ledger
ΕΥΚΛΕΙΔΗΣ	eukleides	ob-doc-eukleides

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
pure_c		pure_c
python_extension		python_extension
.gitignore		.gitignore
LICENSE		LICENSE
README.html		README.html
README.org		README.org

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CPython

DISCLAIMER:

Key resources:

Preamble: WTF is CPython

So … like how is this useful?

The project: Make a C-based library to extend Python

What the project looks like:

The C code

primefinderfunc.c

primefinderfunc.h

primefinder.c

Beware of the following.

The Python code

Installing the C extension in Python

setup.py

tests_prime_finding.py

speedtest.py

The results and me showing off some org-mode stuff

Appendix: Org-mode supported languages:

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

Vincent-de-Comarmond/cpython-extension

Folders and files

Latest commit

History

Repository files navigation

CPython

DISCLAIMER:

Key resources:

Preamble: WTF is CPython

So … like how is this useful?

The project: Make a C-based library to extend Python

What the project looks like:

The C code

primefinderfunc.c

primefinderfunc.h

primefinder.c

Beware of the following.

The Python code

Installing the C extension in Python

setup.py

tests_prime_finding.py

speedtest.py

The results and me showing off some org-mode stuff

Appendix: Org-mode supported languages:

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages