# 🔍 BERT Semantic Search

This notebook builds a small semantic search engine: it embeds a sample corpus with BERT-based sentence embeddings (via `sentence-transformers`), indexes the embeddings with FAISS for fast retrieval, and returns the passages closest in meaning to a natural-language query.


## 1️⃣ Install & Import Libraries

In [None]:
!pip install -q sentence-transformers faiss-cpu
from sentence_transformers import SentenceTransformer
import faiss
import json
import numpy as np
import pandas as pd


## 2️⃣ Load Corpus

In [None]:
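# Load the sample corpus: a JSON list of objects with 'id' and 'text' fields
with open('data/corpus.json', encoding='utf-8') as f:
    corpus = json.load(f)
texts = [item['text'] for item in corpus]
ids = [item['id'] for item in corpus]
pd.DataFrame(corpus)  # preview the corpus as a table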
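
The loading cell assumes every corpus entry is an object with `id` and `text` keys. A minimal sketch of a check that could be run before encoding follows; the helper name `validate_corpus` and the sample entries are hypothetical and not part of the notebook's data.

```python
def validate_corpus(corpus):
    """Return (ids, texts) after checking each entry has 'id' and a non-empty string 'text'.

    Hypothetical helper: the field names mirror the loading cell above.
    """
    if not isinstance(corpus, list):
        raise ValueError("corpus must be a JSON list of objects")
    ids, texts = [], []
    for position, item in enumerate(corpus):
        if not isinstance(item, dict) or "id" not in item or "text" not in item:
            raise ValueError(f"entry {position} is missing 'id' or 'text'")
        if not isinstance(item["text"], str) or not item["text"].strip():
            raise ValueError(f"entry {position} has an empty or non-string 'text'")
        ids.append(item["id"])
        texts.append(item["text"])
    return ids, texts


# Illustrative entries only; the real data comes from data/corpus.json
sample = [
    {"id": 1, "text": "Placeholder passage one."},
    {"id": 2, "text": "Placeholder passage two."},
]
sample_ids, sample_texts = validate_corpus(sample)
```

Failing early here tends to be easier to debug than a `KeyError` raised in the middle of a list comprehension.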

## 3️⃣ Encode Corpus with Sentence-BERT

In [None]:
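model = SentenceTransformer('all-MiniLM-L6-v2')  # compact sentence-embedding model
# normalize_embeddings=True returns unit-length vectors
embeddings = model.encode(texts, convert_to_numpy=True, normalize_embeddings=True)
embeddings.shape  # (number of texts, embedding dimension)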
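
`normalize_embeddings=True` asks the model to return unit-length vectors. As a rough illustration of what that implies, the sketch below L2-normalizes a small made-up matrix with NumPy and checks that the dot product of two normalized rows matches their cosine similarity. The helper `l2_normalize` and the toy vectors are assumptions for illustration; the notebook itself relies on the library's built-in normalization.

```python
import numpy as np


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against division by zero)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)


# Toy vectors standing in for real embeddings
raw = np.array([[3.0, 4.0], [1.0, 0.0]], dtype=np.float32)
unit = l2_normalize(raw)

# Cosine similarity computed from the raw vectors...
cosine = float(raw[0] @ raw[1] / (np.linalg.norm(raw[0]) * np.linalg.norm(raw[1])))
# ...matches the plain inner product of the normalized vectors
inner = float(unit[0] @ unit[1])
```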

## 4️⃣ Build FAISS Index

In [None]:
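dimension = embeddings.shape[1]
# Exact (exhaustive) inner-product index; on unit-length embeddings the inner product equals cosine similarity
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)  # FAISS expects a float32 array of shape (n, dimension)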
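
`IndexFlatIP` performs an exact, exhaustive inner-product search, so on a small corpus its results can be cross-checked against plain NumPy. The sketch below is a reference implementation under that assumption; `brute_force_top_k` and the toy arrays are hypothetical and not part of FAISS.

```python
import numpy as np


def brute_force_top_k(corpus_vecs, query_vecs, k):
    """Return (scores, indices) of the k largest inner products per query, best first."""
    scores = query_vecs @ corpus_vecs.T            # shape: (n_queries, n_corpus)
    k = min(k, corpus_vecs.shape[0])               # cannot return more rows than exist
    top = np.argsort(-scores, axis=1)[:, :k]       # indices sorted by descending score
    return np.take_along_axis(scores, top, axis=1), top


# Toy unit-length vectors standing in for real embeddings
toy_corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
toy_query = np.array([[0.0, 1.0]], dtype=np.float32)
toy_scores, toy_indices = brute_force_top_k(toy_corpus, toy_query, k=2)
```

Once a query vector has been encoded, calling this helper on the notebook's `embeddings` should agree with the FAISS index up to floating-point rounding and the ordering of tied scores.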

## 5️⃣ Perform a Search Query

In [None]:
query = "How can I claim health insurance?"
query_vec = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)
D, I = index.search(query_vec, k=3)  # D: similarity scores, I: corpus row indices (top 3)
# FAISS fills missing results with index -1; skip them so they don't wrap around to corpus[-1]
results = [(corpus[i]['text'], float(score)) for score, i in zip(D[0], I[0]) if i != -1]
for text, score in results:
    print(f"Score: {score:.4f} | Text: {text}")
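
For repeated queries, the steps in the cell above can be wrapped in a function. The following is a minimal sketch, not part of the original notebook: the name `semantic_search` and its return format are assumptions, and it expects a `model` and `index` that behave like the ones created earlier.

```python
def semantic_search(query, model, index, corpus, k=3):
    """Encode `query`, search `index`, and return the matching corpus entries with scores.

    Hypothetical helper: `model` needs an `encode` method and `index` a `search`
    method with the same call shapes used in the notebook.
    """
    query_vec = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)
    scores, indices = index.search(query_vec, k)
    hits = []
    for score, i in zip(scores[0], indices[0]):
        if i < 0:  # FAISS uses -1 for empty result slots
            continue
        entry = corpus[int(i)]
        hits.append({"id": entry["id"], "text": entry["text"], "score": float(score)})
    return hits
```

A call such as `semantic_search("How can I claim health insurance?", model, index, corpus)` would then return a list of dictionaries, which also surfaces the `id` field that the loop above does not print.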