> [!CAUTION]
> This is an experimental project and not ready for production use. Use at your own risk.
# Seamless LLM Lineage for DataHub with LangChain and LangSmith
[Features](#features) • [Installation](#installation) • [Quick Start](#quick-start) • [Usage](#usage) • [Architecture](#architecture) • [Contributing](#contributing) • [License](#license)
A comprehensive observability solution that integrates LangChain and LangSmith workflows into DataHub's metadata platform, providing deep visibility into your LLM operations.
## Features

- 🔄 Real-Time Observation: Live monitoring of LangChain operations
- 📊 Rich Metadata: Detailed tracking of models, prompts, and chains
- 🔍 Deep Insights: Comprehensive metrics and lineage tracking
- 🚀 Multiple Platforms: Support for LangChain, LangSmith, and more
- 🛠 Extensible: Easy to add new platforms and emitters
- 🧪 Debug Mode: Built-in debugging and dry run capabilities
## Installation

```bash
# Clone the repository
git clone <repository-url>
cd langchain-datahub-integration

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env
```
## Quick Start

- **Configure Environment**

  Set the required variables in `.env` (these can also be loaded from Python; see the note after the example below):

  ```bash
  LANGSMITH_API_KEY=ls-...
  LANGCHAIN_TRACING_V2=true
  LANGCHAIN_PROJECT=default
  OPENAI_API_KEY=sk-...
  DATAHUB_GMS_URL=http://localhost:8080
  DATAHUB_TOKEN=your_token_here
  ```
- **Run Basic Example**

  ```python
  from langchain_openai import ChatOpenAI

  from src.config import ObservabilityConfig
  from src.emitters.datahub import DataHubEmitter
  from src.platforms.langchain import LangChainObserver

  # Set up observation
  config = ObservabilityConfig(langchain_verbose=True)
  emitter = DataHubEmitter(gms_server="http://localhost:8080")
  observer = LangChainObserver(config=config, emitter=emitter)

  # Initialize the LLM with the observer attached as a callback
  llm = ChatOpenAI(callbacks=[observer])

  # Run with automatic observation
  response = llm.invoke("Tell me a joke")
  ```
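Each invocation is now captured by the observer and forwarded to DataHub by the emitter, so model, prompt, and response metadata show up without further wiring. If you run the example from a Python process that has not already sourced `.env`, you can load the variables with python-dotenv (an assumption here; check that it appears in `requirements.txt`):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load the variables from .env into the process environment.
load_dotenv()
assert os.environ.get("DATAHUB_GMS_URL"), "DATAHUB_GMS_URL is not set"
```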
## Architecture

The integration consists of three main components:
- **Observers** (`src/platforms/`)
  - Real-time monitoring of LLM operations
  - Metric collection and event tracking
  - Platform-specific adapters
- **Emitters** (`src/emitters/`)
  - DataHub metadata emission
  - Console debugging output
  - JSON file export
- **Collectors** (`src/collectors/`)
  - Historical data collection
  - Batch processing
  - Aggregated metrics
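Conceptually, an observer is a LangChain callback handler that forwards run lifecycle events to an emitter. The sketch below shows that pattern using LangChain's real `BaseCallbackHandler`; the `MinimalObserver` class and the emitter's `emit` method are illustrative assumptions, not the project's actual classes:

```python
from langchain_core.callbacks import BaseCallbackHandler


class MinimalObserver(BaseCallbackHandler):
    """Illustrative only: forwards LLM lifecycle events to an emitter."""

    def __init__(self, emitter):
        self.emitter = emitter  # any object with an emit(dict) method (assumed)

    def on_llm_start(self, serialized, prompts, **kwargs):
        # Fired when an LLM run begins; captures the prompts being sent.
        self.emitter.emit({"event": "llm_start", "prompts": prompts})

    def on_llm_end(self, response, **kwargs):
        # Fired when the run completes; llm_output carries token usage, etc.
        self.emitter.emit({"event": "llm_end", "llm_output": response.llm_output})
```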
## Usage

```python
# examples/langchain_basic.py
from langchain_openai import ChatOpenAI

from src.platforms.langchain import LangChainObserver

# config and emitter are created as in the Quick Start
observer = LangChainObserver(config=config, emitter=emitter)
llm = ChatOpenAI(callbacks=[observer])
```
```python
# examples/langchain_rag.py
from langchain.chains import RetrievalQA

from src.utils.metrics import MetricsAggregator  # available for aggregating run metrics

# llm and observer come from the basic example; vectorstore is any LangChain vector store
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    callbacks=[observer],
)
```
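The RAG snippet assumes an existing `vectorstore`. Purely as an illustration, a tiny in-memory store can be built with FAISS and OpenAI embeddings (this assumes the `faiss-cpu` and `langchain-community` packages, which are not mentioned in this README):

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Illustrative only: a minimal in-memory vector store for the example above.
vectorstore = FAISS.from_texts(
    ["DataHub stores metadata.", "LangChain orchestrates LLM calls."],
    embedding=OpenAIEmbeddings(),
)
```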
```python
# examples/langsmith_ingest.py
from src.cli.ingest import ingest_logic

ingest_logic(
    days=7,
    platform="langsmith",
    debug=True,
    save_debug_data=True,
)
```
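Going by the parameter names, this ingests the last seven days of LangSmith run history as a batch; `save_debug_data=True` additionally persists the raw payloads for inspection.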
## Customization

The integration is highly customizable through:

- **Configuration** (`src/config.py`): environment and platform settings
- **Custom Emitters**: implement `LLMMetadataEmitter` for new destinations (see the sketch below)
- **Platform Extensions**: add new platforms by implementing `LLMPlatformConnector`
- **Metrics Collection**: extend `MetricsAggregator` for custom metrics
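As a rough illustration of the Custom Emitters extension point, here is a minimal sketch. `LLMMetadataEmitter` is named in this README, but its import path and the `emit` method signature are assumptions, not the project's confirmed interface:

```python
import json

from src.emitters.base import LLMMetadataEmitter  # assumed import path


class JSONLinesEmitter(LLMMetadataEmitter):
    """Illustrative emitter that appends each metadata event to a JSONL file."""

    def __init__(self, path: str = "llm_events.jsonl"):
        self.path = path

    def emit(self, event: dict) -> None:  # assumed interface method
        # One JSON object per line; default=str handles non-serializable values.
        with open(self.path, "a") as f:
            f.write(json.dumps(event, default=str) + "\n")
```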
## Contributing

- Fork the repository
- Create a feature branch
- Run tests and linting:

  ```bash
  make test
  make lint
  ```

- Submit a pull request
## License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.
Made with ❤️ by Vincent Koc