Skip to content
/ keras-hrp Public

Hashed Random Projection layer for TF2/Keras

License

Notifications You must be signed in to change notification settings

ulf1/keras-hrp

Repository files navigation

PyPI version PyPi downloads DOI

keras-hrp: Hashed Random Projection layer for TF2/Keras

Hashed Random Projection layer for TF2/Keras.

Usage

Hashed Random Projections (HRP), binary representations, encoding/decoding for storage (notebook)

Generate a HRP layer with a new hyperplane

The random projection or hyperplane is randomly initialized. The initial state of the PRNG (random_state) is required (Default: 42) to ensure reproducibility.

import keras_hrp as khrp
import tensorflow as tf

BATCH_SIZE = 32
NUM_FEATURES = 64
OUTPUT_SIZE = 1024

# demo inputs
inputs = tf.random.normal(shape=(BATCH_SIZE, NUM_FEATURES))

# instantiate layer 
layer = khrp.HashedRandomProjection(
    output_size=OUTPUT_SIZE,
    random_state=42   # Default: 42
)

# run it
outputs = layer(inputs)
assert outputs.shape == (BATCH_SIZE, OUTPUT_SIZE)

Instiantiate HRP layer with given hyperplane

import keras_hrp as khrp
import tensorflow as tf
import numpy as np

BATCH_SIZE = 32
NUM_FEATURES = 64
OUTPUT_SIZE = 1024

# demo inputs
inputs = tf.random.normal(shape=(BATCH_SIZE, NUM_FEATURES))

# create hyperplane as numpy array
myhyperplane = np.random.randn(NUM_FEATURES, OUTPUT_SIZE)

# instantiate layer 
layer = khrp.HashedRandomProjection(hyperplane=myhyperplane)

# run it
outputs = layer(inputs)
assert outputs.shape == (BATCH_SIZE, OUTPUT_SIZE)

Serialize Boolean to Int8

Python stores 1-bit boolean values always as 8-bit integers or 1-byte. Some database technologies behave in similar way, and use up 8x-times of the theoretically required storage space (e.g., Postgres boolean uses 1-byte instead of 1-bit). In order to save memory or storage space, chuncks of 8 boolean vector elements can be transformed into one 1-byte int8 number.

import keras_hrp as khrp
import numpy as np

# given boolean values
hashvalues = np.array([1, 0, 1, 0, 1, 1, 0, 0])

# serialize boolean to int8
serialized = khrp.bool_to_int8(hashvalues)

# deserialize int8 to boolean
deserialized = khrp.int8_to_bool(serialized)

# check
np.testing.assert_array_equal(deserialized, hashvalues)

Appendix

Installation

The keras-hrp git repo is available as PyPi package

pip install keras-hrp
pip install git+ssh://git@github.com/ulf1/keras-hrp.git

Install a virtual environment

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir

(If your git repo is stored in a folder with whitespaces, then don't use the subfolder .venv. Use an absolute path without whitespaces.)

Python commands

  • Jupyter for the examples: jupyter lab
  • Check syntax: flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
  • Run Unit Tests: PYTHONPATH=. pytest

Publish

# pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist 
twine upload -r pypi dist/*

Clean up

find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv

Support

Please open an issue for support.

Contributing

Please contribute using Github Flow. Create a branch, add commits, and open a pull request.

Acknowledgements

The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 433249742 (GU 798/27-1; GE 1119/11-1).

Maintenance

  • till 31.Aug.2023 (v0.1.0) the code repository was maintained within the DFG project 433249742
  • since 01.Sep.2023 (v0.2.0) the code repository is maintained by @ulf1.

Citation

Please cite the arXiv Preprint when using this software for any purpose.

@misc{hamster2023rediscovering,
      title={Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings}, 
      author={Ulf A. Hamster and Ji-Ung Lee and Alexander Geyken and Iryna Gurevych},
      year={2023},
      eprint={2304.02481},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

Hashed Random Projection layer for TF2/Keras

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published

Languages