pip install symbolicai
Run symconfig to init the cache and get the path of the current configuration that you need to set up. For this project, you'll need an EMBEDDING_ENGINE_MODEL. We support openai and local through llamacpp. Read more about the configuration here and local models here.
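Before installing the package, it can help to confirm that the embedding model is actually set. The following is a minimal sketch, assuming the configuration reported by symconfig is a JSON file with a top-level EMBEDDING_ENGINE_MODEL key; the helper name check_embedding_model is hypothetical and not part of symai.

```python
import json
from pathlib import Path


def check_embedding_model(config_path):
    """Return the configured embedding model, or raise if it is missing.

    Assumption: the file at config_path is JSON and stores the model name
    under the top-level key EMBEDDING_ENGINE_MODEL.
    """
    settings = json.loads(Path(config_path).read_text(encoding="utf-8"))
    model = settings.get("EMBEDDING_ENGINE_MODEL")
    if not model:
        raise ValueError(f"EMBEDDING_ENGINE_MODEL is not set in {config_path}")
    return model
```

Call it with the path printed by symconfig; a ValueError means the key is absent or empty.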
Once the setup is done, install this package:
sympkg i ExtensityAI/LightRAG-symai --submodules
Install the LightRAG submodule. It is located in packages, under [get-this-using-symconfig/].symai/packages/ExtensityAI/LightRAG-symai.
cd path/to/LightRAG-symai/LightRAG
pip install -e .
You'll also need PostgreSQL (with the psql client) installed with the pgvector extension. Once installed, create the user and the database, then connect with psql. Make sure to use the same database name, user, and password (if set) as the ones specified in the configuration file.
createuser -s lightrag
createdb rag
psql -U lightrag -d rag
Lastly, enable the pgvector extension:
rag=# CREATE EXTENSION IF NOT EXISTS vector;
Create a config.json file in the root directory of the project:
cd path/to/LightRAG-symai
touch config.json
Then, set up your configuration. Here's an example:
{
    "backend": "postgres",
    "tokenizer_name": "Xenova/gpt-4o",
    "chunker_name": "RecursiveChunker",
    "local": {
        "working_dir": "./lightrag_cache"
    },
    "postgres": {
        "host": "localhost",
        "port": 5432,
        "user": "lightrag",
        "password": "lightrag1234",
        "database": "rag",
        "working_dir": "rag_store",
        "workspace": "default",
        "embedding_batch_num": 8
    },
    "rag_settings": {
        "embedding_dim": 1536,
        "max_token_size": 8191,
        "embedding_cache_enabled": false,
        "embedding_cache_sim_threshold": 0.90,
        "llm_cache_enabled": false,
        "batch_size": 20
    }
}
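Because the database name, user, and password in config.json must match what was created in PostgreSQL, a quick consistency check can save some debugging. The sketch below is only illustrative: postgres_dsn and REQUIRED_POSTGRES_KEYS are hypothetical names, and treating those five keys as required is an assumption, not something the plugin enforces. It builds a standard PostgreSQL connection URI from the "postgres" section, which can be passed to psql to test the credentials.

```python
import json
from urllib.parse import quote

# Assumption: these are the keys needed to reach the database.
REQUIRED_POSTGRES_KEYS = ("host", "port", "user", "password", "database")


def postgres_dsn(config):
    """Build a PostgreSQL connection URI from the "postgres" section."""
    pg = config["postgres"]
    missing = [key for key in REQUIRED_POSTGRES_KEYS if key not in pg]
    if missing:
        raise KeyError(f"postgres section is missing: {', '.join(missing)}")
    # Percent-encode credentials so special characters survive in the URI.
    user = quote(str(pg["user"]), safe="")
    password = quote(str(pg["password"]), safe="")
    return f"postgresql://{user}:{password}@{pg['host']}:{pg['port']}/{pg['database']}"


example = json.loads("""
{
    "postgres": {
        "host": "localhost",
        "port": 5432,
        "user": "lightrag",
        "password": "lightrag1234",
        "database": "rag"
    }
}
""")
print(postgres_dsn(example))
# prints: postgresql://lightrag:lightrag1234@localhost:5432/rag
```

If psql cannot connect with the printed URI, the configuration and the database setup are out of sync.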
Then the plugin can be used as follows:
import asyncio
import logging

from lightrag import QueryParam
from symai import Import
from symai.components import FileReader


async def main():
    # Load configuration
    Config = Import.load_expression("ExtensityAI/LightRAG-symai", "Config")
    config = Config.load("path/to/config.json")

    # Initialize RAGManager with backend
    RAGManager = Import("ExtensityAI/LightRAG-symai", working_dir=config.local.working_dir)
    rag_manager = await RAGManager.init_with_backend(config=config)

    # Optional: Enable detailed logging
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("lightrag")
    logger.setLevel(logging.DEBUG)
    logger.disabled = False

    # Add documents to the RAG system
    reader = FileReader()
    document_contents = [
        reader("path/to/document1.pdf").value[0],
        reader("path/to/document2.pdf").value[0]
    ]
    doc_ids = ["doc1", "doc2"]  # Provide identifiers for the documents

    # Upsert documents (insert or update)
    await rag_manager.chunk_and_upsert(
        document_contents=document_contents[0],  # or `document_paths` or `document_urls`
        document_ids=doc_ids[0],
        workspace="default"  # Optional workspace name
    )

    # Query the RAG system
    query = "your query here"
    query_params = QueryParam(
        mode="naive",
        top_k=5,
        only_need_context=False,
        return_doc_names=True,
        response_type="Long answer with multiple sections and paragraphs; list facts and details."
    )
    result = await rag_manager.query(query, param=query_params, workspace="default")

    # Access results
    if isinstance(result, str):
        print(result)  # Cache hit
    else:
        print(result["doc_names"])  # List of relevant document names
        print(result["response"])  # Generated response based on the documents


if __name__ == "__main__":
    asyncio.run(main())
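The example above upserts only the first document and handles the two result shapes inline. The helpers below are a minimal sketch of how both steps could be wrapped. upsert_all and unpack_result are hypothetical names, not part of the plugin. The sketch assumes chunk_and_upsert takes the keyword arguments shown above and that a non-string result is a dict with "response" and "doc_names" keys. Whether chunk_and_upsert also accepts lists is not assumed, so documents are sent one at a time.

```python
import asyncio


async def upsert_all(rag_manager, document_contents, document_ids, workspace="default"):
    """Upsert each document with its own chunk_and_upsert call."""
    if len(document_contents) != len(document_ids):
        raise ValueError("document_contents and document_ids must have the same length")
    for content, doc_id in zip(document_contents, document_ids):
        await rag_manager.chunk_and_upsert(
            document_contents=content,
            document_ids=doc_id,
            workspace=workspace,
        )


def unpack_result(result):
    """Normalize a query result into a (response_text, doc_names) pair.

    A plain string (cache hit) carries no document names, so an empty
    list is returned for that case.
    """
    if isinstance(result, str):
        return result, []
    return result["response"], list(result.get("doc_names", []))
```

Inside main(), these would be used as `await upsert_all(rag_manager, document_contents, doc_ids)` followed by `response, doc_names = unpack_result(result)`.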
See the example notebook for more detailed usage examples and information on more advanced features, such as tagging: ➡️ Example.ipynb
- Document Insertion & Updates: Support for both inserting new documents and updating existing ones
- Workspace Management: Organize documents in different workspaces
- Tag System: Add tags to documents for organized insertion and querying
- Configurable Query Parameters: Fine-tune search results with customizable parameters
- Detailed Logging: Optional DEBUG level logging for troubleshooting
- Postgres Backend: Robust storage and retrieval using PostgreSQL
- File Format Support: Works with various document formats through the FileReader component