theisolinearchip/clickhouse_cityhash_php_extension

clickhouse_cityhash_php extension

PHP extension implementing the custom Cityhash algorithm found in the Clickhouse source code. (Why? Well, I needed the exact hash function for some work-related projects and ended up building my own PHP extension, my first one! :D, with its own source code.)

According to the Clickhouse docs, the current Cityhash function is the "original" from Google (or at least it's linked without further explanation), BUT the version they actually run is a different one.

The proper files are located here and, despite having the same headers and copyright info from Google, their content is slightly different. The original implementation seems to be found here.

(anyway, I usually refer to this one as clickhouse_cityhash)

What do we have here

  • A simple basic php extension, ready to be compiled and added to any php.ini or similar, that will enable specific Clickhouse Cityhash custom functions
  • A C version ported from the original Clickhouse Cityhash algorithm that can be found on their repo
  • More than 300 different hashes created from the original cityhash64 function (on Tinybird) for testing against this specific extension
  • A small C program for testing the C implementation without dealing with anything php-related

Requirements

  • gcc, make, etc.
  • php header dev files ("the latest ones" will probably be fine)

Installing

In this main folder you should have the following:

  • config.m4, a php-extension-related config file. Used to build the whole extension
  • php_clickhouse_cityhash.c and php_clickhouse_cityhash.h, main php extension files
  • /include, path with the Cityhash libs
  • /tests, path with test files and extra utils to make sure everything works fine

Everything else will probably be aux/tmp stuff generated by the extension configuration.

Now we can start by building the php extension stuff around this:

$ phpize

Configuring for:
PHP Api Version:         20220829
Zend Module Api No:      20220829
Zend Extension Api No:   420220829

This will create extra files around the directory related to the php version. Once we have everything created, configure the makefiles:

$ ./configure

checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for a sed that does not truncate output... /bin/sed
...
...

If those dependencies are okay, it'll create a Makefile ready to use to compile / install the extension:

$ make
$ sudo make install (if desired)

The make process will create a clickhouse_cityhash.so under a new /modules dir. You can now use this file to load it whenever you want to, attach it to a php.ini file, automate some deployment pipeline, etc.

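And now you can test it with the php test file (this assumes the extension was installed with make install; otherwise pass the path to the built .so under modules/ instead):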

$ php -d extension=clickhouse_cityhash.so tests/php/test.php 

For cleaning files:

$ make clean
$ phpize --clean

Usage

Once the module is loaded, it'll add five more functions from the original Cityhash lib:

clickhouse_cityhash64(string/int $str): string
clickhouse_cityhash64_with_seed(string/int $str, int $seed): string
clickhouse_cityhash64_with_seeds(string/int $str, int $seed_0, int $seed_1): string

clickhouse_cityhash128(string/int $str): string
clickhouse_cityhash128_with_seed(string/int $str, int $seed_part_1, int $seed_part_2): string

(notice the "clickhouse_cityhash" naming, used to avoid confusion with a "regular" cityhash implementation)

The most important call here is the first one, clickhouse_cityhash64(string $str): string: this is the one available on Tinybird and the one that needs to be compared with the provided sample hashes. The other ones were added to cover all the different Cityhash methods but, as far as I know, aren't available on Tinybird.
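Checking an implementation against the sample hashes boils down to hashing each input and comparing the result with the expected value. Below is a minimal C sketch of that loop. It is not the test program shipped under /tests: the `hash64_fn` signature, the `sample` struct and `placeholder_hash` are assumptions made for illustration, and the real function name and prototype in /include may differ.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature for the 64-bit hash under test; the real
   prototype in /include may differ. */
typedef uint64_t (*hash64_fn)(const char *data, size_t len);

/* One sample: an input string and its expected hash as a decimal string. */
struct sample {
    const char *input;
    const char *expected;
};

/* Hashes every sample with `hash` and returns the number of mismatches. */
static size_t count_mismatches(hash64_fn hash, const struct sample *samples, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        char actual[21]; /* up to 20 digits for a uint64_t, plus NUL */
        snprintf(actual, sizeof actual, "%" PRIu64,
                 hash(samples[i].input, strlen(samples[i].input)));
        if (strcmp(actual, samples[i].expected) != 0) {
            fprintf(stderr, "mismatch for \"%s\": expected %s, got %s\n",
                    samples[i].input, samples[i].expected, actual);
            mismatches++;
        }
    }
    return mismatches;
}

/* Placeholder standing in for the real hash so the sketch compiles on
   its own: it only returns the input length. */
static uint64_t placeholder_hash(const char *data, size_t len)
{
    (void)data;
    return (uint64_t)len;
}
```

To try it against the real code, swap `placeholder_hash` for the 64-bit hash function from the C port and fill the `sample` array from the provided hashes.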

Keep in mind:

  • input string can also be a number (basically anything that can be easily cast to a string: floats will be okay, arrays nope)
  • all function params are required (classic php errors otherwise)
  • yes, hashes are purely numeric but we're returning strings: we're dealing with unsigned 64-bit numbers (or 128!), and PHP_INT_MAX (2^63 - 1 on 64-bit builds) is below the max hash value
  • clickhouse_cityhash128_with_seed uses ONE 128-bit seed split into TWO segments: first and second (internally it'll be a PAIR)
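The last two points can be illustrated with a small C sketch. It is not taken from the extension source: `hash64_to_decimal`, `exceeds_signed_64`, `struct seed128` and `make_seed128` are hypothetical names used only to show why an unsigned 64-bit value is handed back as a decimal string and how a 128-bit seed can be carried as a pair of 64-bit halves.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats an unsigned 64-bit hash as a decimal string.
   buf needs at least 21 bytes (20 digits for UINT64_MAX plus NUL).
   Returns 0 on success, -1 if the buffer is too small. */
static int hash64_to_decimal(uint64_t hash, char *buf, size_t buf_len)
{
    int written = snprintf(buf, buf_len, "%" PRIu64, hash);
    return (written > 0 && (size_t)written < buf_len) ? 0 : -1;
}

/* Returns 1 when the hash does not fit in a signed 64-bit integer, which is
   the range of a PHP int on 64-bit builds. */
static int exceeds_signed_64(uint64_t hash)
{
    return hash > (uint64_t)INT64_MAX;
}

/* Hypothetical pair type: one 128-bit seed carried as two 64-bit halves. */
struct seed128 {
    uint64_t first;
    uint64_t second;
};

/* Builds the pair from the two seed parts passed by the caller. */
static struct seed128 make_seed128(uint64_t part_1, uint64_t part_2)
{
    struct seed128 seed = { part_1, part_2 };
    return seed;
}
```

Any hash above INT64_MAX would overflow a PHP int, so returning the decimal string keeps every digit intact.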

Links

php extension stuff

cityhash / clickhouse stuff
