# Embeddings: The semantics of language

We have a handful of sentences and want to measure how close they are in meaning.
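One common way to make "close in meaning" measurable, and the one computed later in this notebook, is the cosine similarity between two embedding vectors. As a reminder of the definition, with $\mathbf{a}$ and $\mathbf{b}$ as notation introduced here for two embeddings of the same dimension $n$:

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{k=1}^{n} a_k b_k}{\sqrt{\sum_{k=1}^{n} a_k^2} \, \sqrt{\sum_{k=1}^{n} b_k^2}}
```

A value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. If a distance is wanted instead, $1 - \cos(\mathbf{a}, \mathbf{b})$ is a frequently used convention; the notebook itself reports the similarity directly.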

In [None]:
%pip install sentence-transformers==5.0.0
%pip install numpy==2.3.1

## 1. The data

In [4]:
sentences = [
    'The dogs play with the ball on the grass',
    'The pack rolls the round thing on a meadow',
    'Die Hunde spielen mit dem Ball auf der Wiese',
    'Das Rudel rollt das runde Ding auf dem Rasen herum',
    'The balloon rises to the sky',
    'The pigeon lands on the roof',
]

## 2. Create embeddings using Hugging Face models

Suitable embedding models can be found, for example, on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

In [2]:
from sentence_transformers import SentenceTransformer

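# Load a multilingual embedding model from the Hugging Face Hub.
model_name = 'intfloat/multilingual-e5-large'
embedding_model = SentenceTransformer(model_name)

# Encode all sentences at once; the result holds one vector per sentence.
# Note: the E5 model card recommends prefixing inputs with 'query: ' or 'passage: ';
# the sentences are encoded without a prefix here.
embeddings = embedding_model.encode(sentences)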

print(embeddings[0])
for embedding in embeddings:
    print(len(embedding))



[ 0.00696113  0.00294617 -0.03543508 ... -0.00045179 -0.03532086
 -0.01644677]
1024
1024
1024
1024
1024
1024
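Each sentence is now a 1024-dimensional vector, and the vectors can be compared by the angle between them. Before applying this to the real embeddings, the following minimal sketch illustrates the idea on tiny vectors that are easy to check by hand. The function name `pairwise_cosine_similarity` and the `toy_vectors` are hypothetical stand-ins introduced here, not part of the notebook; the sketch assumes that no vector has zero length.

```python
import numpy as np

def pairwise_cosine_similarity(vectors):
    """Return the matrix of cosine similarities between all rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    # Scale every row to unit length (assumes no all-zero row).
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    # The dot product of two unit vectors is the cosine of the angle between them.
    return unit @ unit.T

# Hypothetical 2-D stand-ins for real embeddings.
toy_vectors = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as the first vector, different length
    [0.0, 3.0],   # orthogonal to the first vector
    [-1.0, 0.0],  # opposite direction to the first vector
])

similarity_matrix = pairwise_cosine_similarity(toy_vectors)
print(np.round(similarity_matrix, 4))
```

The first two rows have similarity 1 even though their lengths differ, which shows that cosine similarity ignores magnitude and only looks at direction. The same function could in principle be applied to the `embeddings` array from the cell above to obtain all pairwise scores in one matrix.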


In [5]:
import numpy as np

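def cosine_similarity(left, right):
    # Cosine of the angle between two vectors: dot product divided by the product of their lengths.
    return np.dot(left, right) / (np.linalg.norm(left) * np.linalg.norm(right))

# Compare every pair of sentences once, including each sentence with itself (j >= i).
for i, left in enumerate(sentences):
    for j, right in enumerate(sentences):
        if j < i:
            continue
        similarity = cosine_similarity(embeddings[i], embeddings[j])
        print(f'{similarity:.4f}: {left} <-> {right}')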


1.0000: The dogs play with the ball on the grass <-> The dogs play with the ball on the grass
0.8134: The dogs play with the ball on the grass <-> The pack rolls the round thing on a meadow
0.8993: The dogs play with the ball on the grass <-> Die Hunde spielen mit dem Ball auf der Wiese
0.8116: The dogs play with the ball on the grass <-> Das Rudel rollt das runde Ding auf dem Rasen herum
0.7384: The dogs play with the ball on the grass <-> The balloon rises to the sky
0.7284: The dogs play with the ball on the grass <-> The pigeon lands on the roof
1.0000: The pack rolls the round thing on a meadow <-> The pack rolls the round thing on a meadow
0.7888: The pack rolls the round thing on a meadow <-> Die Hunde spielen mit dem Ball auf der Wiese
0.8414: The pack rolls the round thing on a meadow <-> Das Rudel rollt das runde Ding auf dem Rasen herum
0.7446: The pack rolls the round thing on a meadow <-> The balloon rises to the sky
0.7437: The pack rolls the round thing on a meadow <-> T