# 🧠 VectorDBPipe Demo

Welcome to the official **VectorDBPipe Demo Notebook!**  
This notebook shows how to:

✅ Load text data  
✅ Generate embeddings  
✅ Store and retrieve them from a vector database  
✅ Run the full pipeline end-to-end

Let's get started 🚀

## 🏗️ 1. Installation

You can install the package directly from PyPI once published:
```bash
pip install vectorDBpipe
```

Or if you're running locally for development:
```bash
pip install -e .
```

```python
# ✅ Import the necessary modules
from vectorDBpipe.pipeline.text_pipeline import TextPipeline
from vectorDBpipe.config.config_manager import ConfigManager
from vectorDBpipe.logger.logging import get_logger

logger = get_logger(__name__)

# Load the configuration and build the pipeline from it
config = ConfigManager().get_config()
pipeline = TextPipeline(config)

logger.info("Pipeline initialized successfully!")
```

## 📂 2. Load and View Sample Data

```python
# Example: sample text dataset
sample_docs = [
    "Artificial Intelligence is transforming the world.",
    "Machine Learning enables computers to learn from data.",
    "Vector databases are crucial for similarity search.",
    "Natural Language Processing helps machines understand text."
]

for i, doc in enumerate(sample_docs, 1):
    print(f"Doc {i}: {doc}")
```

## ⚙️ 3. Generate Embeddings

```python
# One embedding vector per document
embeddings = pipeline.embedder.generate_embeddings(sample_docs)
print(f"Generated embeddings shape: {len(embeddings)} x {len(embeddings[0])}")
```
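To make the "documents in, fixed-length vectors out" idea concrete, here is a minimal sketch of a toy embedder. It is not what VectorDBPipe does internally: `toy_embed`, its `dim` parameter, and the hashed bag-of-words technique are all assumptions chosen purely for illustration, and a real embedding model captures meaning far better.

```python
import hashlib
import math


def toy_embed(texts, dim=8):
    """Hypothetical stand-in for an embedder: hashed bag-of-words, L2-normalized."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Use a stable hash so results do not change between Python sessions.
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % dim
            vec[bucket] += 1.0
        # Scale to unit length; fall back to 1.0 for empty texts.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


toy_docs = [
    "Artificial Intelligence is transforming the world.",
    "Machine Learning enables computers to learn from data.",
]
toy_vectors = toy_embed(toy_docs)
print(f"{len(toy_vectors)} x {len(toy_vectors[0])}")  # 2 x 8
```

Every document maps to a vector of the same length regardless of how long the text is, which is what lets a vector database compare them.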

## 🗂️ 4. Store Embeddings in Vector Database (FAISS/Chroma)

```python
# Store each document together with its embedding
pipeline.vector_store.insert_vectors(sample_docs, embeddings)
logger.info("Embeddings successfully stored in vector DB!")
```
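Conceptually, storing embeddings means keeping each text paired with its vector so that a later search can return the text. The sketch below shows that pairing with a hypothetical `InMemoryVectorStore` class. It only borrows the `insert_vectors` name from the call above, and it makes no claim about how the FAISS or Chroma backends behave.

```python
class InMemoryVectorStore:
    """Hypothetical toy store: keeps texts and vectors in parallel lists."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def insert_vectors(self, texts, vectors):
        if len(texts) != len(vectors):
            raise ValueError("texts and vectors must have the same length")
        for text, vector in zip(texts, vectors):
            # All stored vectors must share one dimensionality.
            if self.vectors and len(vector) != len(self.vectors[0]):
                raise ValueError("vector dimension mismatch")
            self.texts.append(text)
            self.vectors.append(list(vector))

    def __len__(self):
        return len(self.texts)


toy_store = InMemoryVectorStore()
toy_store.insert_vectors(["doc a", "doc b"], [[1.0, 0.0], [0.0, 1.0]])
print(len(toy_store))  # 2
```

The dimension check matters in practice: vectors from different embedding models usually have different lengths and cannot share an index.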

## 🔍 5. Query and Retrieve Similar Texts

```python
query = "How does AI impact society?"
results = pipeline.vector_store.search_vectors(query, top_k=2)

print("Query:", query)
print("\nTop Results:")
for idx, res in enumerate(results, 1):
    print(f"{idx}. {res}")
```
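Similarity search ranks stored vectors by how close they are to the query vector and returns the best `top_k`. The sketch below illustrates that with brute-force cosine similarity. The helper names `cosine_similarity` and `top_k_similar` and the two-dimensional vectors are made up for the example, and the query is assumed to be embedded already with the same model as the documents. Real backends typically use optimized or approximate indexes instead of a full scan.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vector, texts, vectors, top_k=2):
    """Return the top_k (score, text) pairs, highest score first."""
    scored = (
        (cosine_similarity(query_vector, vec), text)
        for text, vec in zip(texts, vectors)
    )
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


toy_texts = ["about AI", "about cooking", "about ML"]
toy_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
hits = top_k_similar([1.0, 0.0], toy_texts, toy_vecs, top_k=2)
for score, text in hits:
    print(f"{score:.3f}  {text}")
```

Here "about AI" ranks first and "about ML" second, because their vectors point in nearly the same direction as the query, while "about cooking" is orthogonal to it.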

## 📊 6. Visualize Embedding Space

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the embeddings down to two dimensions for plotting
pca = PCA(n_components=2)
reduced = pca.fit_transform(embeddings)

plt.figure(figsize=(7, 5))
plt.scatter(reduced[:, 0], reduced[:, 1], c='blue')

# Label each point with its document number, offset slightly to the right
for i in range(len(sample_docs)):
    plt.annotate(f"Doc {i+1}", (reduced[i, 0] + 0.02, reduced[i, 1]))

plt.title("2D Visualization of Document Embeddings")
plt.xlabel("PCA-1")
plt.ylabel("PCA-2")
plt.grid(True)
plt.show()
```

## ✅ 7. Full Pipeline Run Example

```python
final_results = pipeline.run(sample_docs, query="What is machine learning?")
print(final_results)
```

## 🧩 8. Summary

🎯 **You just ran an end-to-end vector pipeline!**

- Loaded text data
- Generated embeddings
- Stored and queried via vector DB
- Visualized similarity in 2D space

👉 Try extending this notebook to include:
- Your own documents or datasets
- Custom embedding models (OpenAI, HuggingFace, etc.)
- Integration with LangChain or RAG agents