Start by loading a JSON file of our emojis:

In [None]:
import json

with open('emojis.json') as json_data:
    emojis = json.load(json_data)

print(emojis[-1])

{'name': 'water wave', 'emoji': '🌊'}


Next, we have to convert each emoji name to a vector. We call these vectors "embeddings".

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

In [None]:
emoji_names = [emoji['name'] for emoji in emojis]
emoji_name_embeddings = model.encode(emoji_names)

print(emoji_name_embeddings[0])

Now, convert our search query to an embedding vector as well:

In [16]:
query = "thai green curry"
query_embedding = model.encode([query])[0]

print(query_embedding)

[-3.96825597e-02  1.01327114e-02 -7.52246752e-03 -1.28105339e-02
  1.88992172e-02  2.91211554e-03  6.62427545e-02 -2.74637397e-02
  1.26501948e-01 -5.16763479e-02  1.14437528e-01 -8.20542425e-02
  1.65764053e-04 -3.99781158e-03  6.68256432e-02 -7.06777023e-03
  4.67435047e-02 -5.31320320e-03 -9.45683792e-02 -1.81885257e-01
 -6.64948970e-02  3.63597050e-02 -5.48365386e-03  2.85144392e-02
 -9.23436657e-02 -1.85490567e-02  6.01369850e-02  2.77057793e-02
 -3.21628302e-02 -5.71679212e-02 -5.25158644e-02 -3.67968567e-02
 -6.36268333e-02  3.80199999e-02 -1.22505322e-01 -5.47966966e-03
 -5.89681230e-02 -1.56650525e-02  5.26091643e-02  3.34644727e-02
 -2.92109586e-02  1.15965409e-02  8.99019092e-02 -6.98408037e-02
  7.44570047e-02  3.96279767e-02 -4.66899127e-02 -2.03601476e-02
  4.63716360e-03 -2.32769176e-02 -1.81581434e-02 -6.96181180e-03
 -3.96640748e-02  4.86699305e-02  6.18821047e-02 -7.15366825e-02
 -4.60323393e-02 -3.28911059e-02  2.81647239e-02  3.36098559e-02
  3.05489171e-02  2.28697

Find the emoji whose name embedding is closest to our query embedding, measured by cosine similarity:

In [17]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

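def encode(query):
    # Embed a single string and return its vector.
    return model.encode([query])[0]

def closest_emoji(query_embedding):
    # Cosine similarity between the query and every emoji name embedding.
    similarities = cosine_similarity([query_embedding], emoji_name_embeddings)[0]
    # Index of the most similar emoji name, and the score at that index.
    match_index = np.argmax(similarities)
    score = similarities[match_index]
    return emojis[match_index], score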

closest_emoji(encode("thai green curry"))

({'name': 'curry rice', 'emoji': '🍛'}, 0.6853474)
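
For intuition, the score above is the cosine of the angle between two vectors: 1 means they point the same way, 0 means they are unrelated. Here is a minimal sketch of that computation in plain Python. The `cosine` helper and the 3-dimensional toy vectors are made up for illustration; they are not real model output and this is not how scikit-learn implements it internally.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, not real embeddings).
same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # perpendicular vectors

print(same_direction, orthogonal)
```

Because only the angle matters, scaling a vector does not change its score, which is why the two parallel vectors come out as (almost exactly) 1.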

But we don't have to stop here. We can also use the embeddings to do emoji arithmetic:

In [31]:
closest_emoji(encode("heart") - encode("love"))

({'name': 'anatomical heart', 'emoji': '🫀'}, 0.4898693)

In [47]:
closest_emoji(encode("mother") - encode("job"))

({'name': 'pregnant woman', 'emoji': '🤰'}, 0.31187236)

In [55]:
closest_emoji(encode("strawberry cheesecake") + encode("healthy"))

({'name': 'strawberry', 'emoji': '🍓'}, 0.6214577)

In [62]:
closest_emoji(encode("mcdonalds") - encode("fast food"))

({'name': 'flag Heard  McDonald Islands', 'emoji': '🇭🇲'}, 0.31021857)

In [128]:
closest_emoji(encode("marriage") - encode("ring"))

({'name': 'wedding', 'emoji': '💒'}, 0.32908672)

The truth is that these arithmetic examples are not perfect. The handpicked ones above work well, but a bunch of others I tried didn't.
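
One way to judge how fragile a result is: look at the top few matches instead of only the best one. The sketch below assumes a hypothetical `top_k` helper and uses tiny 2-dimensional toy vectors in place of the notebook's `emoji_name_embeddings`, so it runs without the model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, names, vectors, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scored = [(cosine(query_vec, vec), name) for name, vec in zip(names, vectors)]
    scored.sort(reverse=True)  # highest score first
    return [(name, score) for score, name in scored[:k]]

# Toy data (hypothetical stand-ins for emoji names and their embeddings).
names = ["curry rice", "strawberry", "water wave"]
vectors = [[1.0, 0.1], [0.2, 1.0], [-1.0, 0.3]]
query = [0.9, 0.2]

results = top_k(query, names, vectors, k=2)
for name, score in results:
    print(name, round(score, 3))
```

If the runner-up scores nearly as high as the winner, the match is probably not very meaningful.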