# YDB Vector Store Example

## Setup

First, start a local YDB instance using this [Docker Compose file](https://github.com/ydb-platform/langchain-ydb/blob/main/docker/docker-compose.yml) and the command `docker compose up -d --wait`.
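
Before continuing, it can help to confirm that the database is actually accepting connections. The following is a minimal sketch using only the Python standard library. The helper `is_port_open` is hypothetical (not part of `langchain-ydb`), and the endpoint `localhost:2136` is an assumption about what the compose file exposes; adjust it if your setup differs.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


# Assumption: YDB listens on localhost:2136 (not stated in this notebook).
print("YDB reachable:", is_port_open("localhost", 2136))
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the database has finished initializing.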



Install the `langchain-ydb` Python package:

In [1]:
!pip install -qU langchain-ydb

Then prepare an embedding model to work with:

In [2]:
!pip install -qU langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

Finally, create the YDB vector store:

In [3]:
from langchain_ydb.vectorstores import YDB, YDBSettings

vector_store = YDB(
    embeddings,
    config=YDBSettings(
        table="langchain_ydb_example_notebook",
        drop_existing_table=True,
    ),
)

## Operations with YDB Vector Store

Prepare data to work with:

In [4]:
data = [
    (
        "The Earth revolves around the Sun once every 365.25 days.",
        {"category": "astronomy"}
    ),
    (
        "Water boils at 100 degrees Celsius at standard atmospheric pressure.",
        {"category": "science"}
    ),
    (
        "Light travels at approximately 299,792 kilometers per second in a vacuum.",
        {"category": "science"}
    ),
    (
        "The Great Wall of China is over 13,000 miles long.",
        {"category": "history"}
    ),
    (
        "Mount Everest is the highest mountain in the world, standing at 29,032 feet.",
        {"category": "geography"}
    ),
    (
        "The Amazon Rainforest is the largest tropical rainforest, covering over 5.5 "
        "million square kilometers.",
        {"category": "geography"}
    ),
    (
        "The human body contains 206 bones.",
        {"category": "biology"}
    ),
    (
        "The Pacific Ocean is the largest ocean on Earth, covering more than "
        "63 million square miles.",
        {"category": "geography"}
    ),
    (
        "The speed of sound in air is around 343 meters per second at "
        "room temperature.",
        {"category": "science"}
    ),
    (
        "A leap year occurs every four years to help synchronize the calendar year "
        "with the solar year.",
        {"category": "astronomy"}
    ),
    (
        "The cheetah is the fastest land animal, capable of running up to 75 miles per "
        "hour.",
        {"category": "biology"}
    ),
    (
        "Venus is the hottest planet in our solar system, with surface temperatures of "
        "around 467 degrees Celsius.",
        {"category": "astronomy"}
    ),
    (
        "Honey never spoils. Archaeologists have found pots of honey in "
        "ancient Egyptian tombs that are over 3,000 years old and still edible.",
        {"category": "history"}
    ),
    (
        "The heart of a resting adult pumps about 70 milliliters of blood per beat.",
        {"category": "biology"}
    ),
    (
        "The blue whale is the largest animal on Earth, growing up to "
        "100 feet long and weighing as much as 200 tons.",
        {"category": "biology"}
    ),
    (
        "The Eiffel Tower in Paris was completed in 1889 and was the tallest structure "
        "in the world until 1930.",
        {"category": "history"}
    ),
    (
        "Sharks have been around for over 400 million years, surviving several mass "
        "extinction events.",
        {"category": "biology"}
    ),
    (
        "Bananas are berries, while strawberries are not. Botanically, berries "
        "come from the ovary of a single flower with seeds embedded in the flesh.",
        {"category": "biology"}
    ),
    (
        "Tokyo is the most populous city in the world, with a population of over 37 "
        "million people in the metropolitan area.",
        {"category": "geography"}
    ),
    (
        "The Mona Lisa, painted by Leonardo da Vinci, is one of the most famous "
        "works of art and is displayed in the Louvre Museum in Paris.",
        {"category": "art"}
    )
]


texts = [row[0] for row in data]
metadatas = [row[1] for row in data]
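
The two lists must stay aligned, because the text at each position is paired with the metadata at the same position. As a quick sanity check, the sketch below splits a small sample in the same `(text, metadata)` shape and counts the categories. The `sample` list is a hypothetical subset of the data above, used only so the snippet runs on its own.

```python
from collections import Counter

# Hypothetical three-row subset, in the same (text, metadata) shape as `data`.
sample = [
    ("Water boils at 100 degrees Celsius at standard atmospheric pressure.",
     {"category": "science"}),
    ("The human body contains 206 bones.",
     {"category": "biology"}),
    ("Light travels at approximately 299,792 kilometers per second in a vacuum.",
     {"category": "science"}),
]

# Unpack each tuple by name instead of indexing with row[0] / row[1].
sample_texts = [text for text, _ in sample]
sample_metadatas = [meta for _, meta in sample]

# Both lists must have the same length and order.
assert len(sample_texts) == len(sample_metadatas)

# Count how many rows fall into each category.
category_counts = Counter(meta["category"] for meta in sample_metadatas)
print(category_counts.most_common())
```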


Insert the data into the vector store:

In [5]:
ids = vector_store.add_texts(texts, metadatas)

Inserting data...: 100%|██████████| 20/20 [00:01<00:00, 11.76it/s]


Similarity search:

In [6]:
vector_store.similarity_search("Any facts about Tokyo?", k=2)

[Document(metadata={'category': 'geography'}, page_content='Tokyo is the most populous city in the world, with a population of over 37 million people in the metropolitan area.'),
 Document(metadata={'category': 'history'}, page_content='The Great Wall of China is over 13,000 miles long.')]

Similarity search with score:

In [7]:
result = vector_store.similarity_search_with_score("What objects are huge?", k=4)
for res, score in result:
    print(f"[SIM={score:.3f}] {res.metadata['category']} \t | {res.page_content}")

[SIM=0.508] biology 	 | The blue whale is the largest animal on Earth, growing up to 100 feet long and weighing as much as 200 tons.
[SIM=0.373] history 	 | The Great Wall of China is over 13,000 miles long.
[SIM=0.339] geography 	 | The Pacific Ocean is the largest ocean on Earth, covering more than 63 million square miles.
[SIM=0.305] geography 	 | The Amazon Rainforest is the largest tropical rainforest, covering over 5.5 million square kilometers.
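
To build intuition for the scores, here is a rough sketch of scored top-k retrieval in plain Python. It assumes the score is a cosine similarity, which this notebook does not state, and it uses made-up three-dimensional vectors instead of real embeddings. The names `cosine_similarity`, `top_k`, and `toy_docs` are hypothetical and are not part of `langchain-ydb`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k):
    """Score every document against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only; a real embedding model produces
# much longer vectors.
toy_docs = [
    ("blue whale", [0.9, 0.1, 0.0]),
    ("Great Wall", [0.6, 0.6, 0.1]),
    ("honey", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]

toy_results = top_k(toy_query, toy_docs, k=2)
for text, score in toy_results:
    print(f"[SIM={score:.3f}] {text}")
```

This brute-force loop compares the query with every document. A vector store does the equivalent work inside the database, so the sketch illustrates the ranking idea rather than how YDB executes the search.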


Similarity search with score and filter:

In [8]:
result = vector_store.similarity_search_with_score(
    "What objects are huge?", filter={"category":"geography"}
)
for res, score in result:
    print(f"[SIM={score:.3f}] {res.metadata['category']} \t | {res.page_content}")

[SIM=0.339] geography 	 | The Pacific Ocean is the largest ocean on Earth, covering more than 63 million square miles.
[SIM=0.305] geography 	 | The Amazon Rainforest is the largest tropical rainforest, covering over 5.5 million square kilometers.
[SIM=0.265] geography 	 | Mount Everest is the highest mountain in the world, standing at 29,032 feet.
[SIM=0.234] geography 	 | Tokyo is the most populous city in the world, with a population of over 37 million people in the metropolitan area.
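
The filtered query returns four geography documents, although only two of them appear in the unfiltered top four. That is consistent with the filter being applied before the results are cut down to `k`, though the notebook does not say how `langchain-ydb` implements it. The sketch below contrasts the two orders using the scores printed above; the function names are hypothetical.

```python
# (score, category, subject) rows copied from the two outputs above
# for the query "What objects are huge?".
scored_rows = [
    (0.508, "biology", "blue whale"),
    (0.373, "history", "Great Wall of China"),
    (0.339, "geography", "Pacific Ocean"),
    (0.305, "geography", "Amazon Rainforest"),
    (0.265, "geography", "Mount Everest"),
    (0.234, "geography", "Tokyo"),
]


def filter_then_top_k(rows, category, k):
    """Keep only matching rows, then take the k highest scores."""
    matching = [row for row in rows if row[1] == category]
    return sorted(matching, key=lambda row: row[0], reverse=True)[:k]


def top_k_then_filter(rows, category, k):
    """Take the k highest scores overall, then drop non-matching rows."""
    best = sorted(rows, key=lambda row: row[0], reverse=True)[:k]
    return [row for row in best if row[1] == category]


filter_first = filter_then_top_k(scored_rows, "geography", k=4)
filter_last = top_k_then_filter(scored_rows, "geography", k=4)

print(len(filter_first), [row[2] for row in filter_first])
print(len(filter_last), [row[2] for row in filter_last])
```

Filtering first fills all four slots with geography documents, while filtering last keeps only the two that were already in the overall top four.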
