# Hybrid Search: dense and sparse vectors

The LlamaIndex integration with Qdrant also supports sparse embeddings. From the user's perspective little changes, because retrieval still goes through the same interface. Sparse and dense vectors work best in different setups, so combining them lets us get the best of both worlds, and there are a few parameters we can use to control how they are combined.

Let's start by recreating our pipeline again, this time using the other collection, which also contains sparse vectors.


In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    embed_model="local:BAAI/bge-large-en"
)

In [3]:
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

import os

client = QdrantClient(
    os.environ.get("QDRANT_URL"), 
    api_key=os.environ.get("QDRANT_API_KEY"),
)
vector_store_hybrid = QdrantVectorStore(
    client=client,
    collection_name="hacker-news-hybrid",
    enable_hybrid=True,
    batch_size=20,  # this is important for the ingestion
)

In [4]:
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store_hybrid,
    service_context=service_context,
)

## Differences between sparse and dense vectors

Sparse vectors live in high-dimensional spaces, and the majority of their elements are zero. A single dimension represents a single word, so the dimensionality of the space is equal to the size of the vocabulary, and each vector has just a few non-zero values.
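To make this concrete, here is a minimal sketch in plain Python with a made-up five-word vocabulary. It stores only the non-zero entries of a vector as parallel lists of indices and values, and computes a dot product between two such vectors. The vocabulary, the weights, and the helper names `to_sparse` and `sparse_dot` are illustrative assumptions, not part of LlamaIndex or Qdrant.

```python
# Toy vocabulary: one dimension per word (hypothetical, for illustration only).
vocabulary = ["best", "gpu", "learn", "programming", "python"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def to_sparse(weights):
    """Convert a {word: weight} mapping into sorted (indices, values) lists."""
    pairs = sorted((word_to_index[w], v) for w, v in weights.items() if v != 0.0)
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return indices, values


def sparse_dot(a, b):
    """Dot product of two sparse vectors given as (indices, values)."""
    b_lookup = dict(zip(*b))
    # Only dimensions that are non-zero in both vectors contribute.
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(*a))


query = to_sparse({"learn": 1.0, "programming": 1.0})
document = to_sparse({"gpu": 2.0, "programming": 1.5, "learn": 0.5})

print(query)                        # ([2, 3], [1.0, 1.0])
print(sparse_dot(query, document))  # 2.0
```

With a realistic vocabulary of tens of thousands of words, the same two lists stay short, which is why this layout is far cheaper than storing every zero.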

There are various ways to create sparse vectors, but the most common one is to use the TF-IDF or BM25 representation. It's a simple and effective way to represent the importance of words in a document, and in many cases it creates a solid baseline for search.
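As a rough illustration of the idea, the sketch below computes one simple TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency) for three made-up documents. Real implementations usually add smoothing and normalization, and BM25 uses a different weighting formula, so treat this as a sketch under those assumptions rather than a reference implementation.

```python
import math
from collections import Counter

# Three tiny, made-up documents (illustrative only).
documents = [
    "learn gpu programming",
    "learn python programming",
    "gpu architecture",
]
tokenized = [doc.split() for doc in documents]

# Document frequency: in how many documents does each word occur?
document_frequency = Counter(word for tokens in tokenized for word in set(tokens))


def tf_idf(tokens):
    """Return a sparse {word: weight} mapping for one tokenized document."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    return {
        word: (count / len(tokens)) * math.log(n_docs / document_frequency[word])
        for word, count in counts.items()
    }


vectors = [tf_idf(tokens) for tokens in tokenized]
for doc, vector in zip(documents, vectors):
    print(doc, "->", vector)
```

Words that appear in only one document, such as "python", receive a higher weight than words shared by several documents, such as "learn".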

LlamaIndex uses SPLADE by default. Like dense embedding models, it is based on transformers, but it produces sparse vectors. **The main advantage of using sparse vectors is that they overcome the problem of vocabulary mismatch**. If a word is not present in the vocabulary of the dense embedding model, we can still represent it using the sparse vectors.

## Using sparse vectors only

Before we dive into hybrid search, let's see what can be achieved by using sparse vectors alone. We already know the nodes retrieved by dense vectors, so it makes sense to compare the results returned by both methods.

In [5]:
from llama_index.vector_stores.types import VectorStoreQueryMode
from llama_index.indices.vector_store import VectorIndexRetriever

sparse_retriever = VectorIndexRetriever(
    index=index,
    vector_store_query_mode=VectorStoreQueryMode.SPARSE,
    sparse_top_k=5,
)

In [6]:
nodes = sparse_retriever.retrieve("What is the best way to learn programming?")
for i, node in enumerate(nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: Best way to learn GPU programming?

I&#x27;d like to learn GPU programming but I&#x27;m having difficulty finding high-quality resources. I tried a class at coursera and was severely disappointed by both quality and content.<p>What are the best resources for learning things like GPU architecture, CUDA, Triton, etc?<p>My goal is to do be able to do something like take a description of Flash Attention and implement it from scratch, or optimize existing CUDA code.

Disclosure: I&#x27;ve never done it.  But I looked at some CUDA code from Leela Chess Zero and it made reasonable sense.  It&#x27;s just C++ with some slight changes.  The GPU architecture is a little bit quirky but not that complicated either.  Plus there are libraries like pytorch that handle most of the GPU stuff for you.<p>I would say ML concepts and algorithms are way more complicated than GPU programming per se.  The fast.ai lectures were pretty understandable when I watched some of them a few years ago, but att

## Hybrid search

There are some specific use cases in which we may prefer to use just the sparse vectors. But the two methods can complement each other, and we usually need to find the sweet spot between them. The `VectorIndexRetriever` class allows us to control the parameters of the search. We can set the `sparse_top_k` and `similarity_top_k` parameters to control the number of results returned by each method. We can also set the `alpha` parameter to control the relative weight of each method (`0.0` = sparse vectors only, `1.0` = dense vectors only).
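To build some intuition for `alpha`, here is a minimal sketch of one plausible way to blend two result lists: min-max normalize each set of scores, then take a weighted sum. This is an assumption made for illustration; the fusion that LlamaIndex actually performs may differ in its details. The function names, node IDs, and scores below are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale {node_id: score} to the [0, 1] range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - low) / (high - low) for node_id, s in scores.items()}


def fuse(dense_scores, sparse_scores, alpha, top_k):
    """Blend scores: alpha=1.0 is dense only, alpha=0.0 is sparse only."""
    dense = min_max_normalize(dense_scores)
    sparse = min_max_normalize(sparse_scores)
    fused = {
        node_id: alpha * dense.get(node_id, 0.0)
        + (1.0 - alpha) * sparse.get(node_id, 0.0)
        for node_id in dense.keys() | sparse.keys()
    }
    # Highest fused score first, keep only the top_k node IDs.
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


# Hypothetical scores for four nodes; the IDs and numbers are made up.
dense_scores = {"a": 0.82, "b": 0.80, "c": 0.75}
sparse_scores = {"c": 12.0, "d": 9.0, "a": 3.0}

print(fuse(dense_scores, sparse_scores, alpha=0.1, top_k=2))  # ['c', 'd']
print(fuse(dense_scores, sparse_scores, alpha=0.9, top_k=2))  # ['a', 'b']
```

Normalizing first matters because dense similarity scores and sparse scores are on different scales. Without it, the larger raw values would dominate the sum regardless of `alpha`.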

In [7]:
hybrid_retriever = VectorIndexRetriever(
    index=index,
    vector_store_query_mode=VectorStoreQueryMode.HYBRID,
    sparse_top_k=5,
    similarity_top_k=5,
    alpha=0.1,
)

In [8]:
nodes = hybrid_retriever.retrieve("What is the best way to learn programming?")
for i, node in enumerate(nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: Best way to learn GPU programming?

I&#x27;d like to learn GPU programming but I&#x27;m having difficulty finding high-quality resources. I tried a class at coursera and was severely disappointed by both quality and content.<p>What are the best resources for learning things like GPU architecture, CUDA, Triton, etc?<p>My goal is to do be able to do something like take a description of Flash Attention and implement it from scratch, or optimize existing CUDA code.

Disclosure: I&#x27;ve never done it.  But I looked at some CUDA code from Leela Chess Zero and it made reasonable sense.  It&#x27;s just C++ with some slight changes.  The GPU architecture is a little bit quirky but not that complicated either.  Plus there are libraries like pytorch that handle most of the GPU stuff for you.<p>I would say ML concepts and algorithms are way more complicated than GPU programming per se.  The fast.ai lectures were pretty understandable when I watched some of them a few years ago, but att

In [9]:
# We shouldn't be modifying the alpha parameter after the retriever has been created
# but that's the easiest way to show the effect of the parameter
hybrid_retriever._alpha = 0.9

nodes = hybrid_retriever.retrieve("What is the best way to learn programming?")
for i, node in enumerate(nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: What would you look for in a platform to learn programming?

Hey everyone!<p>I&#x27;m curious, what does the perfect programming education platform look like to you?<p>I&#x27;m an experienced developer, but I really think that the current options for learning programming could be a lot better. I know that there are platforms like CodeCademy and places to watch video courses like YouTube and Udemy. There are also so many scammy &quot;learn to code&quot; sites (CodeFinity).<p>The pattern I notice is that platforms like CodeCademy are web-apps and are very career-path-oriented (i.e. get certifications). I personally think that having a platform which was a desktop app would be a better solution. Rather than focusing on career-tracks, you could follow courses to build a specific project using an integrated IDE, 100% on your machine.<p>How important are career-tracks and certifications to you? Or, would you rather just learn to build a specific project on your own machine all in o