
The hashing trick that reduces GPU memory usage for EmbeddingBag.






This is a sub-project of dlrm_ssm. It applies the idea from the paper Compressing Neural Networks with the Hashing Trick to the EmbeddingBag module in PyTorch. HashedEmbeddingBag reduces the GPU memory usage of EmbeddingBag while affecting accuracy only slightly. Most of the code is modified from PyTorch. We use it as a comparison to other embedding compression methods in our paper Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems.
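As a rough illustration of the idea, the sketch below keeps all weights in one shared 1D buffer and maps every (row, column) position of the virtual embedding table to a buffer slot with a hash function. The hash function, its constants, and the helper names (make_buffer, hashed_slot, lookup) are assumptions made for illustration; they are not the ones used by this package's kernels.

```python
import random

def make_buffer(num_embeddings, embedding_dim, compression, seed=0):
    """Allocate the shared weight buffer (a plain list standing in for a tensor)."""
    size = max(1, int(round(num_embeddings * embedding_dim * compression)))
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]

def hashed_slot(row, col, buffer_size):
    """Hypothetical hash: map a (row, col) position to a slot in the buffer."""
    return (row * 9824516537 + col * 57857966300227) % buffer_size

def lookup(buffer, row, embedding_dim):
    """Rebuild one virtual embedding row by reading hashed slots of the buffer."""
    return [buffer[hashed_slot(row, col, len(buffer))] for col in range(embedding_dim)]

# A virtual 1000 x 16 table backed by a buffer one tenth of its size.
buffer = make_buffer(num_embeddings=1000, embedding_dim=16, compression=0.1)
vector = lookup(buffer, row=42, embedding_dim=16)
print(len(buffer), len(vector))
```

Because several positions of the virtual table share the same slot, the table needs far fewer stored weights, at the cost of collisions between entries.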

How to install


The project requires cudatoolkit-dev in order to compile. If you installed the CUDA Toolkit from Nvidia's official website, it should work as is.

If you installed the CUDA Toolkit from conda, I suggest creating a separate environment and installing cudatoolkit-dev with:

conda install -c conda-forge cudatoolkit-dev


First, clone the repository:

git clone

Then go into the repository directory:

cd HashedEmbeddingBag

And install the package:

python install

How to use

The API is similar to the EmbeddingBag API of PyTorch.


For HashedEmbeddingBag:

name                   explanation
num_embeddings (int)   size of the dictionary of embeddings
embedding_dim (int)    size of each embedding vector
compression (float)    ratio of the weight size of HashedEmbeddingBag to the weight size of PyTorch's EmbeddingBag
mode (string)          aggregation mode; sum, mean, and max are all supported
_weight (Tensor)       a one-dimensional Tensor used as the hashing buffer; if _weight is provided, the compression parameter is ignored
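To make the compression parameter concrete, the sketch below compares the number of stored weights and the memory footprint of a regular table and a hashed one. It assumes that the hashing buffer holds round(num_embeddings * embedding_dim * compression) weights and that weights are float32 (4 bytes each); the exact rounding used by the package may differ, and the helper names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as float32

def table_weights(num_embeddings, embedding_dim):
    """Number of weights in a regular (uncompressed) embedding table."""
    return num_embeddings * embedding_dim

def hashed_weights(num_embeddings, embedding_dim, compression):
    """Assumed number of weights in the shared hashing buffer."""
    return max(1, int(round(num_embeddings * embedding_dim * compression)))

full = table_weights(1000, 16)
hashed = hashed_weights(1000, 16, 0.1)
print(full * BYTES_PER_FLOAT32, "bytes uncompressed")
print(hashed * BYTES_PER_FLOAT32, "bytes with compression = 0.1")
```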

For forwarding:

  • input and offsets have to be of the same type, either int or long.
  • If input is 2D of shape (B, N),
    • it will be treated as B bags (sequences) each of fixed length N, and this will return B values aggregated in a way depending on the mode. offsets is ignored and required to be None in this case.
  • If input is 1D of shape (N),
    • it will be treated as a concatenation of multiple bags (sequences). offsets is required to be a 1D tensor containing the starting index positions of each bag in input. Therefore, for offsets of shape (B), input will be viewed as having B bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
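The 1D case can be pictured with a small pure-Python reference. This is only a sketch of the bag semantics described above for the sum mode, using plain lists instead of tensors; split_bags and sum_bags are hypothetical helper names, not part of the package.

```python
def split_bags(indices, offsets):
    """Split a flat index list into bags; offsets[i] is where bag i starts."""
    ends = list(offsets[1:]) + [len(indices)]
    return [list(indices[start:end]) for start, end in zip(offsets, ends)]

def sum_bags(table, indices, offsets):
    """Sum the table rows of each bag; an empty bag yields a zero vector."""
    dim = len(table[0])
    out = []
    for bag in split_bags(indices, offsets):
        vec = [0.0] * dim
        for idx in bag:
            vec = [a + b for a, b in zip(vec, table[idx])]
        out.append(vec)
    return out

table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 embeddings of dimension 2
indices = [0, 1, 2, 2]
offsets = [0, 2, 2]  # bag 0 = [0, 1], bag 1 = [] (empty), bag 2 = [2, 2]
result = sum_bags(table, indices, offsets)
print(result)
```

With offsets of length 3, the flat input is viewed as 3 bags, and the empty middle bag comes back as a vector of zeros.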


Suppose we want to embed a categorical feature with 1000 different values, and each embedding vector has length 16:

import torch
import hashedEmbeddingBag

num_categories = 1000
embedding_dim = 16
compression = 0.1
EE = hashedEmbeddingBag.HashedEmbeddingBag(num_categories, embedding_dim, compression, "sum")

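Suppose the input contains 18 bags, and each bag contains 0 to 7 category values:

bag_num = 18

# high is exclusive in torch.randint, so high=8 yields bag sizes 0..7
bag_size = torch.randint(low=0, high=8, size=(bag_num,))
indices_num = bag_size.sum().item()

# high=num_categories so that every category index 0..999 can occur
indices = torch.randint(low=0, high=num_categories, size=(indices_num,))
# offsets[i] is the position in indices where bag i starts
offsets = torch.cat([torch.zeros(1, dtype=torch.long), bag_size.cumsum(dim=0)[:-1]])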

To compute the embeddings, run:

embeddings = EE(indices, offsets)

embeddings will be an 18 x 16 Tensor.
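The offsets in this example are the exclusive cumulative sum of the bag sizes: each entry is the position in indices where the corresponding bag starts. A minimal pure-Python sketch of that computation, with a hypothetical helper name and plain lists in place of tensors, may help when checking your own inputs:

```python
from itertools import accumulate

def offsets_from_sizes(bag_sizes):
    """Exclusive cumulative sum: the starting position of each bag."""
    if not bag_sizes:
        return []
    return [0] + list(accumulate(bag_sizes))[:-1]

bag_sizes = [3, 0, 2, 1]  # the second bag is empty
offsets = offsets_from_sizes(bag_sizes)
print(offsets)
```

The number of offsets equals the number of bags, which is also the number of rows in the output.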


