PHP extension implementing the custom Cityhash algorithm found in the Clickhouse source code. Why? Well, I needed the exact hash function for some work-related projects and ended up building my own (and first! :D) php extension with its own source code.
According to the Clickhouse docs, the current Cityhash function is the "original" one from Google (or at least it's linked without further explanation), BUT the version they're running is a different one.
The proper files are located in the Clickhouse repo (https://github.com/ClickHouse/ClickHouse/tree/master/contrib/cityhash102) and, despite having the same headers and copyright info from Google, their content is slightly different. The original implementation seems to be the one at https://github.com/google/cityhash.
(anyway, I'm usually referring to this one as clickhouse_cityhash)
What's included:
- A basic php extension, ready to be compiled and added to any php.ini or similar, that will enable specific Clickhouse Cityhash custom functions
- A C version ported from the original Clickhouse Cityhash algorithm that can be found on their repo
- More than 300 different hashes created from the original cityhash64 function (on Tinybird) for testing against this specific extension
- A small C program for testing the C implementation without dealing with anything php-related
Requirements:
- gcc, make, etc.
- php dev header files, which also provide phpize ("the latest ones" will probably be fine)
In this main folder you should have the following:
- config.m4, a php-extension-related config file used to build the whole extension
- php_clickhouse_cityhash.c and php_clickhouse_cityhash.h, the main php extension files
- /include, the path with the Cityhash libs
- /tests, the path with test files and extra utils to make sure everything works fine
Everything else will probably be aux/tmp stuff generated by the extension configuration.
Now we can start by building the php extension stuff around this:
$ phpize
Configuring for:
PHP Api Version: 20220829
Zend Module Api No: 20220829
Zend Extension Api No: 420220829
This will create extra files in the directory, tied to the installed php version. Once everything is created, configure the makefiles:
$ ./configure
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for a sed that does not truncate output... /bin/sed
...
...
If those dependencies are okay, it'll create a Makefile ready to use to compile / install the extension:
$ make
$ sudo make install   # if desired
The make process will create a clickhouse_cityhash.so under a new /modules dir. You can now load this file whenever you want to, attach it to a php.ini file, automate some deployment pipeline, etc.
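And now you can test it with the php test file (if you skipped make install, use extension=modules/clickhouse_cityhash.so so php loads the freshly built file):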
$ php -d extension=clickhouse_cityhash.so tests/php/test.php
For cleaning files:
$ make clean
$ phpize --clean
Once the module is loaded, it'll add five more functions from the original Cityhash lib:
clickhouse_cityhash64(string/int $str): string
clickhouse_cityhash64_with_seed(string/int $str, int $seed): string
clickhouse_cityhash64_with_seeds(string/int $str, int $seed_0, int $seed_1): string
clickhouse_cityhash128(string/int $str): string
clickhouse_cityhash128_with_seed(string/int $str, int $seed_part_1, int $seed_part_2): string
(notice the "clickhouse_cityhash" naming to avoid confusions with a "regular" cityhash implementation)
The most important call here is the first one, clickhouse_cityhash64(string $str): string. This is the one available on Tinybird and the one that needs to be compared with the provided sample hashes. The other ones were added to cover all the different Cityhash methods but, as far as I know, aren't available on Tinybird.
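As a rough illustration of what "comparing with the sample hashes" means, here is a minimal C sketch of a table-driven check. Everything in it is an assumption made for the example: the names hash_sample, count_mismatches and standin_hash64 are hypothetical, and the stand-in hash is FNV-1a (NOT Cityhash), used only so the snippet compiles without the files under /include. In practice you would pass the real 64-bit function and the sample hashes shipped under /tests.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature: any function that maps bytes to a 64-bit hash. */
typedef uint64_t (*hash64_fn)(const char *data, size_t len);

/* One sample: an input string and the expected hash as decimal text. */
struct hash_sample {
    const char *input;
    const char *expected;
};

/* Returns how many samples do NOT match the expected decimal hash. */
size_t count_mismatches(hash64_fn hash, const struct hash_sample *samples, size_t n)
{
    assert(hash != NULL);
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        char actual[21]; /* 20 digits max for a uint64_t, plus the NUL */
        uint64_t h = hash(samples[i].input, strlen(samples[i].input));
        snprintf(actual, sizeof actual, "%" PRIu64, h);
        if (strcmp(actual, samples[i].expected) != 0) {
            failures++;
        }
    }
    return failures;
}

/* Stand-in hash (FNV-1a, 64 bit). This is NOT Cityhash: swap in the real one. */
uint64_t standin_hash64(const char *data, size_t len)
{
    uint64_t h = UINT64_C(14695981039346656037); /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= UINT64_C(1099511628211);            /* FNV prime */
    }
    return h;
}
```

Comparing decimal strings rather than integers keeps the check aligned with what the php functions return.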
Keep in mind:
- the input string can also be a number (basically anything that can be easily cast to a string: a float is okay, an array, nope)
- all function params are required (classic php errors otherwise)
- yes, hashes are purely numeric but we're returning strings: we're dealing with unsigned 64-bit numbers (or 128!), and PHP_INT_MAX (2^63 - 1 on 64-bit builds) is below the max hash value
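The string return type can be sketched in plain C. This is only an illustration, not the extension's actual code: hash64_to_decimal and exceeds_php_int_max are hypothetical helpers, and the comparison assumes a 64-bit php build where PHP_INT_MAX equals INT64_MAX.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes an unsigned 64-bit hash as decimal text.
 * Returns the number of characters written, or -1 if buf is too small. */
int hash64_to_decimal(uint64_t hash, char *buf, size_t buf_len)
{
    assert(buf != NULL);
    int n = snprintf(buf, buf_len, "%" PRIu64, hash);
    return (n < 0 || (size_t)n >= buf_len) ? -1 : n;
}

/* Returns 1 when the hash would not fit in a signed 64-bit integer
 * (assumption: PHP_INT_MAX == INT64_MAX, as on 64-bit php builds). */
int exceeds_php_int_max(uint64_t hash)
{
    return hash > (uint64_t)INT64_MAX;
}
```

Any hash whose top bit is set falls outside the signed range, which is why a decimal string is the safer thing to hand back to php.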
clickhouse_cityhash128_with_seed takes ONE 128-bit seed split into TWO segments: first and second (internally it'll be a PAIR)
- https://stackoverflow.com/questions/3632160/how-to-make-a-php-extension
- https://wiki.php.net/internals/extensions
- https://www.phpinternalsbook.com/php7/extensions_design/php_functions.html
- https://www.phpinternalsbook.com/php7/internal_types/strings/printing_functions.html
- https://github.com/php/php-src/blob/ef4b2fc283ddaf9bd692015f1db6dad52171c3ce/README.PARAMETER_PARSING_API
- https://github.com/php/php-src/blob/648be8600ff89e1b0e4a4ad25cebad42b53bed6d/Zend/zend_API.h
- https://github.com/php/php-src/blob/master/docs/parameter-parsing-api.md
- http://php.adamharvey.name/manual/en/internals2.funcs.php
- https://docstore.mik.ua/orelly/webprog/php/ch14_07.htm
- https://clickhouse.com/docs/en/sql-reference/functions/hash-functions/
- https://github.com/ClickHouse/ClickHouse/tree/master/contrib/cityhash102 (main folder with Clickhouse custom implementation)
- https://github.com/google/cityhash (Google's "original" Cityhash; notice this is NOT the implemented version on Clickhouse/Tinybird/here)
- https://github.com/go-faster/city (golang library that implements both the "classic" Cityhash and the Clickhouse Specific Cityhash)