From 5da89abe210b65e0a01f003eaa001531947d888d Mon Sep 17 00:00:00 2001 From: Cyber MacGeddon Date: Mon, 6 Oct 2025 12:00:56 +0100 Subject: [PATCH 1/3] Update CHANGELOG for 1.4 (and added missing 1.3) --- .github/workflows/ci.yml | 2 +- .github/workflows/pages.yml | 2 +- community/changelog/trustgraph.md | 92 +++++++++++++++++++++++++++++++ 3 files changed, 94 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ab1f9b6..e05042a 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -2,7 +2,7 @@ name: CI on: push: - branches: ["main"] + branches: ["master"] pull_request: jobs: diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml index c8c4569..bda7042 100644 --- a/.github/workflows/pages.yml +++ b/.github/workflows/pages.yml @@ -8,7 +8,7 @@ name: Deploy Jekyll site to Pages on: push: - branches: ["main"] + branches: ["master"] # Allows you to run this workflow manually from the Actions tab workflow_dispatch: diff --git a/community/changelog/trustgraph.md b/community/changelog/trustgraph.md index c2be38b..4b9fb3e 100644 --- a/community/changelog/trustgraph.md +++ b/community/changelog/trustgraph.md @@ -8,6 +8,98 @@ grand_parent: TrustGraph Documentation # Changelog +## v1.4.0 (2025-10-06) + +### New Features +- **Flow Configurable Parameters** (#526, #530, #531, #532, #533, #541): + Major enhancements to flow parameter system: + - Flow configurable parameters with dynamic settings + - LLM dynamic settings using llm-model and llm-rag-model parameters + - Temperature parameter support for all LLMs + - Flow creation uses parameter defaults in API and CLI + - Advanced parameter mode with controlled-by relationships + - New CLI tools: tg-show-parameter-types + - Dynamic chunking parameters +- **Structured Data Diagnosis Service** (#518, #519): + - New structured data diagnosis service plumbed into API gateway + - Added XML, JSON, CSV detection capabilities + - Type detector with schema 
selection +- **Enhanced Collection Management** (#520, #522, #542, #544): + - Collection metadata management and deletion capabilities + - Librarian services integrated with collection manager + - Collection tracking across all processors + - Explicit collection creation/deletion (removed implicit creation) + - Fixed collection management synchronization issues +- **User/Collection Isolation** (#509, #510): + - Neo4j user/collection separation + - Memgraph user/collection processing + +### Improvements +- **Cassandra Performance** (#521): + - Refactored Cassandra knowledge graph for single table + - Multi-table implementation for performance enhancement + - Added Cassandra collection table +- **GraphRAG Optimizations** (#527): Implemented GraphRAG optimizations with + updated tests +- **Vector Store Enhancements** (#512): Vector stores now create collections + on query +- **Build System** (#515): Parallel container builds +- **Logging** (#528, #543): Reduced excessive request/response logging and + log spam + +### Bug Fixes +- **Collection Management** (#544): Fixed collection management + synchronization problems +- **Metrics** (#539, #540): Fixed label names and label issues in metrics +- **WebSocket** (#517): Fixed async websocket closure handling +- **CLI** (#529): Fixed CLI typo +- **Tests** (#534, #535): Fixed failing tests and improved LLM parameter + test coverage +- **Object Writer** (#544): Fixed object writer management issues +- **Milvus** (#544): Updated Milvus to use ANN correctly + +### API Changes +- **Gateway** (#514): Return empty embeddings list as empty list through + gateway. 
+- **Parameters**: Changed `parameters` to `parameter-types` for consistency + +--- + +## v1.3.0 + +### New Features +- **Structured Data Enhancements** (#492, #493, #496, #498, #500): Major improvements to structured data handling: + - NLP query to GraphQL service for natural language database queries + - Structured query tool integration with agent framework + - Enhanced structured query support and object batching + - Structured data loader CLI with auto mode functionality + - Object import capabilities with batch processing +- **Collection Management** (#503, #511): + - Extended use of user + collection fields throughout the system + - Stores automatically create collections on query +- **Tool Groups** (#484): Added tool grouping functionality for better organization + +### Improvements +- **GraphQL Enhancements** (#486, #489): + - Added GraphQL table query support + - Removed redundant GraphQL collection parameter +- **Cassandra Configuration Standardization** (#483, #488, #490): + - Made Cassandra options (user, password, host) consistent across all utilities + - Consolidated Cassandra configuration for better consistency + - Refactored Cassandra operations to use common helper functions +- **API Improvements** (#513): Return empty embeddings list as empty list through gateway + +### Bug Fixes +- **Vector Store Fixes** (#507): Fixed Milvus vector store integration issues +- **Document Processing** (#506): Fixed document RAG processing issues +- **Monitoring** (#502): Fixed Prometheus incorrect metric names +- **API Consistency** (#481): Fixed trustgraph-base chunks/documents confusion in the API +- **System Integration** (#494): Resolved various system integration issues +- **Import/Export** (#476): Fixed graceful shutdown for import/export operations +- **Knowledge Loading** (#472): Use collection field from request when loading knowledge core + +--- + ## v1.2.17 ### New Features From da820cf1db2a8d405d2b8ada75008fa5d86064ef Mon Sep 17 00:00:00 2001 From: 
Cyber MacGeddon Date: Mon, 6 Oct 2025 12:30:32 +0100 Subject: [PATCH 2/3] Doc updates for 1.4, collections and flow params --- reference/apis/api-collection.md | 382 +++++++++++++++++ reference/cli/tg-delete-collection.md | 203 +++++++++ reference/cli/tg-list-collections.md | 137 ++++++ reference/cli/tg-set-collection.md | 196 +++++++++ reference/cli/tg-show-flow-classes.md | 38 +- reference/cli/tg-show-flows.md | 75 +++- reference/cli/tg-show-parameter-types.md | 248 +++++++++++ reference/cli/tg-start-flow.md | 83 +++- reference/configuration/parameters.md | 518 +++++++++++++++++++++++ 9 files changed, 1847 insertions(+), 33 deletions(-) create mode 100644 reference/apis/api-collection.md create mode 100644 reference/cli/tg-delete-collection.md create mode 100644 reference/cli/tg-list-collections.md create mode 100644 reference/cli/tg-set-collection.md create mode 100644 reference/cli/tg-show-parameter-types.md create mode 100644 reference/configuration/parameters.md diff --git a/reference/apis/api-collection.md b/reference/apis/api-collection.md new file mode 100644 index 0000000..4511c1b --- /dev/null +++ b/reference/apis/api-collection.md @@ -0,0 +1,382 @@ +--- +title: Collection Management API +layout: default +parent: APIs +--- + +# TrustGraph Collection Management API + +This API provides collection management for TrustGraph, allowing users to create, list, update, and delete collections. Collections are used to organize and isolate data within TrustGraph, enabling multi-tenancy and project separation. + +## Overview + +Collections provide a mechanism for organizing TrustGraph data by user and project. Each collection can store documents, embeddings, knowledge graph data, and structured objects in isolation from other collections. 
+ +## Request/response + +### Request + +The request contains the following fields: +- `operation`: The operation to perform (see operations below) +- `user`: User ID who owns the collection +- `collection`: Collection identifier +- `name`: Human-readable collection name (optional) +- `description`: Detailed description of collection purpose (optional) +- `tags`: Array of tags for categorization (optional) +- `tag_filter`: Array of tags for filtering list operations (optional) +- `limit`: Maximum number of results to return (optional) + +### Response + +The response contains the following fields: +- `collections`: Array of collection metadata objects +- `timestamp`: Response timestamp +- `error`: Error information if operation fails + +### Collection Metadata + +Each collection object contains: +- `user`: User ID who owns the collection +- `collection`: Unique collection identifier +- `name`: Human-readable name +- `description`: Detailed description +- `tags`: Array of tags +- `created_at`: ISO timestamp when created +- `updated_at`: ISO timestamp when last updated + +## Operations + +### LIST-COLLECTIONS - List Collections for User + +List all collections for a specific user, with optional tag filtering. 
+ +Request: +```json +{ + "operation": "list-collections", + "user": "alice" +} +``` + +Response: +```json +{ + "collections": [ + { + "user": "alice", + "collection": "research", + "name": "Research Project", + "description": "Medical research document analysis", + "tags": ["research", "medical"], + "created_at": "2024-01-15T10:30:00Z", + "updated_at": "2024-01-20T14:45:00Z" + }, + { + "user": "alice", + "collection": "archive", + "name": "Archive", + "description": "Archived documents", + "tags": ["archive"], + "created_at": "2024-01-01T00:00:00Z", + "updated_at": "2024-01-01T00:00:00Z" + } + ], + "timestamp": "2024-01-20T15:00:00Z" +} +``` + +#### With Tag Filter + +Request: +```json +{ + "operation": "list-collections", + "user": "alice", + "tag_filter": ["research"] +} +``` + +Response returns only collections with matching tags. + +### UPDATE-COLLECTION - Create or Update Collection + +Create a new collection or update metadata of an existing collection. + +Request: +```json +{ + "operation": "update-collection", + "user": "alice", + "collection": "research", + "name": "Research Project", + "description": "Medical research document analysis", + "tags": ["research", "medical", "priority"] +} +``` + +Response: +```json +{ + "collections": [ + { + "user": "alice", + "collection": "research", + "name": "Research Project", + "description": "Medical research document analysis", + "tags": ["research", "medical", "priority"], + "created_at": "2024-01-15T10:30:00Z", + "updated_at": "2024-01-20T15:05:00Z" + } + ], + "timestamp": "2024-01-20T15:05:00Z" +} +``` + +**Notes:** +- If collection doesn't exist, it will be created +- Only specified fields are updated; others remain unchanged +- All metadata fields (name, description, tags) are optional + +### DELETE-COLLECTION - Delete Collection and Data + +Permanently delete a collection and all associated data. 
+ +Request: +```json +{ + "operation": "delete-collection", + "user": "alice", + "collection": "old-research" +} +``` + +Response: +```json +{ + "timestamp": "2024-01-20T15:10:00Z" +} +``` + +**Warning**: This operation is irreversible. All data in the collection will be permanently deleted, including: +- Collection metadata +- Documents and embeddings +- Knowledge graph triples +- Structured data objects +- Processing history + +## REST Service + +The REST service is available at `/api/v1/collection-management` and accepts the above request formats. + +## Websocket + +Requests have a `request` object containing the operation fields. +Responses have a `response` object containing the response fields. + +Request: +```json +{ + "id": "unique-request-id", + "service": "collection-management", + "request": { + "operation": "list-collections", + "user": "alice" + } +} +``` + +Response: +```json +{ + "id": "unique-request-id", + "response": { + "collections": [ + { + "user": "alice", + "collection": "research", + "name": "Research Project", + "description": "Medical research document analysis", + "tags": ["research", "medical"], + "created_at": "2024-01-15T10:30:00Z", + "updated_at": "2024-01-20T14:45:00Z" + } + ], + "timestamp": "2024-01-20T15:00:00Z" + }, + "complete": true +} +``` + +## Pulsar + +The Pulsar schema for the Collection Management API is defined in Python code here: + +https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/services/collection.py + +Default request queue: +`non-persistent://tg/request/collection` + +Default response queue: +`non-persistent://tg/response/collection` + +Request schema: +`trustgraph.schema.CollectionManagementRequest` + +Response schema: +`trustgraph.schema.CollectionManagementResponse` + +### Request Schema Fields + +```python +class CollectionManagementRequest(Record): + operation = String() # Operation to perform + user = String() # User ID + collection = String() # Collection ID + 
timestamp = String() # Request timestamp (ISO) + name = String() # Collection name + description = String() # Collection description + tags = Array(String()) # Collection tags + created_at = String() # Created timestamp (ISO) + updated_at = String() # Updated timestamp (ISO) + tag_filter = Array(String()) # Tag filter for list operations + limit = Integer() # Result limit +``` + +### Response Schema Fields + +```python +class CollectionManagementResponse(Record): + error = Error() # Error information + timestamp = String() # Response timestamp (ISO) + collections = Array(CollectionMetadata()) # Collection metadata array +``` + +### Collection Metadata Schema + +```python +class CollectionMetadata(Record): + user = String() # User ID + collection = String() # Collection ID + name = String() # Collection name + description = String() # Collection description + tags = Array(String()) # Collection tags + created_at = String() # Created timestamp (ISO) + updated_at = String() # Updated timestamp (ISO) +``` + +## Python SDK + +The Python SDK provides convenient access to the Collection Management API: + +```python +from trustgraph.api import Api + +api = Api("http://localhost:8088/") +collection_api = api.collection() + +# List collections +collections = collection_api.list_collections(user="alice") +for collection in collections: + print(f"{collection.collection}: {collection.name}") + +# List collections with tag filter +research_collections = collection_api.list_collections( + user="alice", + tag_filter=["research"] +) + +# Create or update collection +collection_api.update_collection( + user="alice", + collection="research", + name="Research Project", + description="Medical research document analysis", + tags=["research", "medical"] +) + +# Delete collection +collection_api.delete_collection(user="alice", collection="old-research") +``` + +## Features + +- **Multi-Tenancy**: Separate collections per user +- **Organization**: Tag-based categorization and filtering +- 
**Metadata Management**: Rich metadata with names and descriptions +- **Data Isolation**: Complete isolation between collections +- **Lifecycle Management**: Create, update, and delete operations +- **Atomic Operations**: Collection operations are atomic + +## Use Cases + +### Project Organization +```python +# Create collections for different projects +api.collection().update_collection( + user="research-team", + collection="project-alpha", + name="Project Alpha", + description="Alpha project knowledge base", + tags=["research", "active", "2024"] +) +``` + +### Multi-Tenant Applications +```python +# Set up collections for different customers +for customer_id in customer_ids: + api.collection().update_collection( + user=customer_id, + collection="main", + name=f"{customer_id} Main Collection", + description=f"Primary collection for {customer_id}", + tags=["customer", "production"] + ) +``` + +### Data Lifecycle Management +```python +# Archive old collections +old_collections = api.collection().list_collections( + user="archive-team", + tag_filter=["archive"] +) +for collection in old_collections: + # Export data, then delete + api.collection().delete_collection( + user="archive-team", + collection=collection.collection + ) +``` + +## Error Handling + +Errors are returned in the response `error` field: + +```json +{ + "error": { + "type": "CollectionNotFound", + "message": "Collection 'invalid-name' not found" + }, + "timestamp": "2024-01-20T15:00:00Z" +} +``` + +Common error types: +- `CollectionNotFound`: Collection doesn't exist +- `PermissionDenied`: User lacks permissions +- `InvalidRequest`: Malformed request +- `InternalError`: Server-side error + +## Related APIs + +- [Flow API](api-flow) - Manage processing flows +- [Knowledge API](api-knowledge) - Knowledge graph operations +- [Librarian API](api-librarian) - Document management + +## CLI Tools + +- [`tg-list-collections`](../cli/tg-list-collections) - List collections +- 
[`tg-set-collection`](../cli/tg-set-collection) - Create/update collections +- [`tg-delete-collection`](../cli/tg-delete-collection) - Delete collections diff --git a/reference/cli/tg-delete-collection.md b/reference/cli/tg-delete-collection.md new file mode 100644 index 0000000..02641f8 --- /dev/null +++ b/reference/cli/tg-delete-collection.md @@ -0,0 +1,203 @@ +--- +title: tg-delete-collection +layout: default +parent: CLI +--- + +# tg-delete-collection + +Delete a collection and all its data. + +## Synopsis + +```bash +tg-delete-collection COLLECTION [options] +``` + +## Description + +The `tg-delete-collection` command permanently deletes a collection and all associated data from TrustGraph. This includes documents, embeddings, knowledge graph triples, and any other data stored within the collection. + +**Warning**: This operation is irreversible. All data in the collection will be permanently lost. + +## Arguments + +### Required Arguments + +- `COLLECTION`: Collection ID to delete + +### Optional Arguments + +- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) +- `-U, --user USER`: User ID (default: `trustgraph`) +- `-y, --yes`: Skip confirmation prompt + +## Examples + +### Delete Collection with Confirmation +```bash +tg-delete-collection old-research +``` + +The command will prompt for confirmation: +``` +Are you sure you want to delete collection 'old-research' and all its data? (y/N): +``` + +### Delete Collection Without Confirmation +```bash +tg-delete-collection old-research -y +``` + +### Delete Collection for Specific User +```bash +tg-delete-collection customer-data -U alice -y +``` + +### Using Custom API URL +```bash +tg-delete-collection temp-data \ + -u http://production:8088/ \ + -y +``` + +## Output Format + +### Successful Deletion +``` +Collection 'old-research' deleted successfully. +``` + +### Cancelled Operation +``` +Operation cancelled. 
+``` + +## Interactive Confirmation + +By default, the command prompts for confirmation before deletion: + +``` +Are you sure you want to delete collection 'old-research' and all its data? (y/N): +``` + +Valid confirmation responses: +- `y` or `yes` (case-insensitive) - Proceed with deletion +- Any other input - Cancel operation + +Use the `-y/--yes` flag to skip this prompt for automated scripts. + +## What Gets Deleted + +When you delete a collection, the following data is permanently removed: + +- **Collection Metadata**: Name, description, tags, timestamps +- **Documents**: All documents loaded into the collection +- **Embeddings**: Document and graph embeddings +- **Knowledge Graph Data**: Triples, entities, relationships +- **Structured Data**: Any objects stored in the collection +- **Processing History**: All processing logs and metadata + +## Use Cases + +### Cleanup Development Collections +```bash +# Delete temporary testing collection +tg-delete-collection dev-testing -y +``` + +### Remove Completed Projects +```bash +# Archive and delete completed research project +# (assuming data has been backed up externally) +tg-delete-collection research-2023 -y +``` + +### Multi-Tenant Management +```bash +# Remove customer collection after contract end +tg-delete-collection customer-acme -U customer-acme -y +``` + +### Automated Cleanup Scripts +```bash +#!/bin/bash +# Delete all collections with a specific tag. +# Only table data rows (first field is "|") are read, so the closing +# border line of the table does not yield an empty collection ID. +for collection in $(tg-list-collections -t temporary | tail -n +4 | awk '$1 == "|" {print $2}'); do + tg-delete-collection "$collection" -y +done +``` + +## Environment Variables + +- `TRUSTGRAPH_URL`: Default API URL + +## Error Handling + +### Collection Not Found +```bash +Exception: Collection 'invalid-name' not found +``` +**Solution**: Verify the collection ID with `tg-list-collections`. + +### Permission Errors +```bash +Exception: Permission denied +``` +**Solution**: Ensure you're deleting a collection owned by your user. 
+ +### Connection Errors +```bash +Exception: Connection refused +``` +**Solution**: Verify the API URL and ensure TrustGraph is running. + +## Safety Considerations + +### Before Deletion + +1. **Verify Collection**: Use `tg-list-collections` to confirm the collection ID +2. **Check Contents**: Review what data will be lost +3. **Backup Data**: Export important data before deletion +4. **Consider Alternatives**: Consider archiving instead of deleting +5. **Coordinate with Team**: Ensure no one is using the collection + +### Backup Options + +```bash +# Export knowledge graph data before deletion +tg-get-kg-core -U alice -c research > research-backup.ttl + +# Export document embeddings +tg-save-doc-embeds -U alice -c research -f research-embeddings.json +``` + +## Related Commands + +- [`tg-list-collections`](tg-list-collections) - List collections to verify before deletion +- [`tg-set-collection`](tg-set-collection) - Create or update collection metadata +- [`tg-get-kg-core`](tg-get-kg-core) - Export knowledge graph data before deletion +- [`tg-save-doc-embeds`](tg-save-doc-embeds) - Export document embeddings before deletion + +## API Integration + +This command uses the [Collection Management API](../apis/api-collection) with the `delete-collection` operation. + +## Notes + +- Deletion is permanent and cannot be undone +- The operation deletes all data across all TrustGraph storage systems +- Collection metadata is removed from the system +- Other users' collections are not affected +- The deletion is atomic - either all data is deleted or none + +## Best Practices + +1. **Always Backup**: Export important data before deletion +2. **Use Confirmation**: Avoid using `-y` flag unless in automated scripts +3. **Verify First**: Double-check collection ID with `tg-list-collections` +4. **Document Deletion**: Keep records of what was deleted and when +5. **Test in Development**: Test deletion scripts in development first +6. 
**Coordinate**: Notify team members before deleting shared collections +7. **Consider Archiving**: For historical data, consider archiving over deletion +8. **Audit Trail**: Maintain logs of collection deletions for compliance diff --git a/reference/cli/tg-list-collections.md b/reference/cli/tg-list-collections.md new file mode 100644 index 0000000..f7c405f --- /dev/null +++ b/reference/cli/tg-list-collections.md @@ -0,0 +1,137 @@ +--- +title: tg-list-collections +layout: default +parent: CLI +--- + +# tg-list-collections + +List collections for a user with their metadata. + +## Synopsis + +```bash +tg-list-collections [options] +``` + +## Description + +The `tg-list-collections` command displays all collections associated with a user, showing their metadata including names, descriptions, tags, and timestamps. Collections are used to organize and isolate data within TrustGraph, allowing multiple users and projects to maintain separate data spaces. + +## Options + +- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) +- `-U, --user USER`: User ID (default: `trustgraph`) +- `-t, --tag-filter TAG`: Filter by tags (can be specified multiple times) + +## Examples + +### List All Collections for Default User +```bash +tg-list-collections +``` + +### List Collections for Specific User +```bash +tg-list-collections -U alice +``` + +### Filter Collections by Tag +```bash +tg-list-collections -t research +``` + +### Filter by Multiple Tags +```bash +tg-list-collections -t research -t medical +``` + +### Using Custom API URL +```bash +tg-list-collections -u http://production:8088/ -U production-user +``` + +## Output Format + +The command displays collections in a formatted table: + +``` ++------------+------------------+------------------------+------------+---------------------+---------------------+ +| Collection | Name | Description | Tags | Created | Updated | 
++------------+------------------+------------------------+------------+---------------------+---------------------+ +| research | Research Project | Medical research docs | research | 2024-01-15 10:30:00 | 2024-01-20 14:45:00 | +| default | Default | Default collection | default | 2024-01-01 00:00:00 | 2024-01-01 00:00:00 | ++------------+------------------+------------------------+------------+---------------------+---------------------+ +``` + +### No Collections Available +```bash +No collections found. +``` + +## Collection Fields + +Each collection displays the following information: + +- **Collection**: Unique collection identifier +- **Name**: Human-readable name +- **Description**: Detailed description of the collection's purpose +- **Tags**: Comma-separated list of tags for categorization +- **Created**: Timestamp when collection was created +- **Updated**: Timestamp of last update + +## Use Cases + +### Project Management +```bash +# List all research collections +tg-list-collections -t research + +# Check collections for a specific team +tg-list-collections -U data-science-team +``` + +### Multi-Tenant Environments +```bash +# List collections for each customer +tg-list-collections -U customer-a +tg-list-collections -U customer-b +``` + +### Collection Discovery +```bash +# Find all collections tagged as production +tg-list-collections -t production + +# List collections for audit +tg-list-collections -U admin > collections-audit.txt +``` + +## Environment Variables + +- `TRUSTGRAPH_URL`: Default API URL + +## Related Commands + +- [`tg-set-collection`](tg-set-collection) - Create or update collection metadata +- [`tg-delete-collection`](tg-delete-collection) - Delete a collection and its data +- [`tg-load-knowledge`](tg-load-knowledge) - Load data into a specific collection + +## API Integration + +This command uses the [Collection Management API](../apis/api-collection) with the `list-collections` operation. 
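
For scripts that need the same data without parsing the table, the `list-collections` operation can be called directly. The following is a minimal Python sketch, not the CLI's actual implementation. It assumes that the REST endpoint `/api/v1/collection-management` named in the Collection Management API page accepts the request object as a JSON `POST` body; the HTTP method and the helper names `build_list_request` and `list_collections` are illustrative assumptions.

```python
import json
import urllib.request


def build_list_request(user="trustgraph", tags=None, limit=None):
    """Build a list-collections request body using the documented field names."""
    body = {"operation": "list-collections", "user": user}
    if tags:
        # tag_filter is optional; omit it to list every collection for the user
        body["tag_filter"] = list(tags)
    if limit is not None:
        body["limit"] = limit
    return body


def list_collections(api_url, user="trustgraph", tags=None):
    """Send the request and return the `collections` array from the response.

    Assumption: the gateway accepts a JSON POST at this path.
    """
    url = api_url.rstrip("/") + "/api/v1/collection-management"
    request = urllib.request.Request(
        url,
        data=json.dumps(build_list_request(user, tags)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Errors are reported in the response `error` field
    if reply.get("error"):
        raise RuntimeError(reply["error"].get("message", "unknown error"))
    return reply.get("collections", [])
```

Keeping the body construction in its own function makes it possible to check the request shape without a running TrustGraph instance; for example, `build_list_request("alice", ["research"])` mirrors the tag-filter request shown in the API reference.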
+ +## Notes + +- Collections are user-scoped; each user has their own set of collections +- Tag filtering uses AND logic when multiple tags are specified +- Timestamps are displayed in ISO format +- The default user is "trustgraph" if not specified + +## Best Practices + +1. **Use Descriptive Names**: Assign meaningful names to collections for easy identification +2. **Tag Consistently**: Use consistent tagging schemes across your organization +3. **Regular Audits**: Periodically review collections to identify unused ones +4. **Document Purpose**: Use clear descriptions to explain collection purposes +5. **User Separation**: Use different users for different teams or projects diff --git a/reference/cli/tg-set-collection.md b/reference/cli/tg-set-collection.md new file mode 100644 index 0000000..007528a --- /dev/null +++ b/reference/cli/tg-set-collection.md @@ -0,0 +1,196 @@ +--- +title: tg-set-collection +layout: default +parent: CLI +--- + +# tg-set-collection + +Create or update collection metadata. + +## Synopsis + +```bash +tg-set-collection COLLECTION [options] +``` + +## Description + +The `tg-set-collection` command creates a new collection or updates the metadata of an existing collection. Collections are used to organize and isolate data within TrustGraph, allowing multiple users and projects to maintain separate data spaces. + +If the collection doesn't exist, it will be created. If it exists, the specified metadata fields will be updated. 
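
The create-or-update behaviour described above can be modelled in a few lines. This is an illustrative in-memory sketch, not TrustGraph's implementation: the function name `apply_set_collection` and the dictionary store are hypothetical, and it assumes that a supplied tag list replaces the stored one rather than being appended to it.

```python
from datetime import datetime, timezone


def apply_set_collection(store, user, collection,
                         name=None, description=None, tags=None):
    """Create the collection if missing, then update only the supplied fields."""
    now = datetime.now(timezone.utc).isoformat()
    key = (user, collection)  # collections are scoped per user
    record = store.get(key)
    if record is None:
        # Collection does not exist yet: create it with empty metadata
        record = {
            "user": user,
            "collection": collection,
            "name": "",
            "description": "",
            "tags": [],
            "created_at": now,
        }
        store[key] = record
    # Fields left as None were not specified and stay unchanged
    if name is not None:
        record["name"] = name
    if description is not None:
        record["description"] = description
    if tags is not None:
        record["tags"] = list(tags)  # assumption: replaces the previous tags
    record["updated_at"] = now
    return record
```

Calling the function twice for the same user and collection, first with a name and then with only a description, leaves the name from the first call in place, which is the behaviour the command documents for existing collections.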
+ +## Arguments + +### Required Arguments + +- `COLLECTION`: Collection ID to create or update + +### Optional Arguments + +- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) +- `-U, --user USER`: User ID (default: `trustgraph`) +- `-n, --name NAME`: Human-readable collection name +- `-d, --description DESCRIPTION`: Detailed description of the collection +- `-t, --tag TAG`: Collection tag (can be specified multiple times) + +## Examples + +### Create New Collection with Full Metadata +```bash +tg-set-collection research \ + -n "Research Project" \ + -d "Medical research document analysis" \ + -t research \ + -t medical +``` + +### Update Existing Collection Description +```bash +tg-set-collection research \ + -d "Updated: Medical and climate research documents" +``` + +### Add Tags to Collection +```bash +tg-set-collection research \ + -t research \ + -t medical \ + -t priority +``` + +### Create Collection for Specific User +```bash +tg-set-collection customer-data \ + -U alice \ + -n "Alice's Data" \ + -d "Customer-specific data collection" +``` + +### Minimal Collection Creation +```bash +tg-set-collection myproject +``` + +## Output Format + +On successful operation, the command displays a confirmation message and metadata table: + +``` +Collection 'research' set successfully. 
++--------------+---------------------------------------+ +| Collection | research | +| Name | Research Project | +| Description | Medical research document analysis | +| Tags | research, medical | +| Updated | 2024-01-20 14:45:00 | ++--------------+---------------------------------------+ +``` + +## Use Cases + +### Project Setup +```bash +# Create collection for new research project +tg-set-collection climate-2024 \ + -n "Climate Research 2024" \ + -d "Climate change research and analysis" \ + -t research \ + -t climate \ + -t 2024 +``` + +### Multi-Tenant Configuration +```bash +# Set up collections for different customers +tg-set-collection acme-corp \ + -U customer-acme \ + -n "ACME Corporation" \ + -d "ACME Corp knowledge base" + +tg-set-collection widgets-inc \ + -U customer-widgets \ + -n "Widgets Inc" \ + -d "Widgets Inc documentation" +``` + +### Collection Organization +```bash +# Organize collections by environment +tg-set-collection prod-main \ + -n "Production Main" \ + -d "Primary production data" \ + -t production + +tg-set-collection dev-testing \ + -n "Development Testing" \ + -d "Development and testing data" \ + -t development +``` + +### Metadata Updates +```bash +# Add new tags to existing collection +tg-set-collection research \ + -t archive \ + -t completed + +# Update description +tg-set-collection research \ + -d "Archived: Medical research document analysis (completed 2024)" +``` + +## Collection Metadata + +Collection metadata includes: + +- **Collection ID**: Unique identifier (specified as argument) +- **Name**: Human-readable name for display +- **Description**: Detailed explanation of collection purpose +- **Tags**: List of tags for categorization and filtering +- **Timestamps**: Created and updated timestamps (managed automatically) + +## Environment Variables + +- `TRUSTGRAPH_URL`: Default API URL + +## Error Handling + +### Connection Errors +```bash +Exception: Connection refused +``` +**Solution**: Verify the API URL and ensure 
TrustGraph is running. + +### Invalid Collection ID +```bash +Exception: Invalid collection ID format +``` +**Solution**: Collection IDs should contain only alphanumeric characters, hyphens, and underscores. + +## Related Commands + +- [`tg-list-collections`](tg-list-collections) - List collections and their metadata +- [`tg-delete-collection`](tg-delete-collection) - Delete a collection and its data +- [`tg-load-knowledge`](tg-load-knowledge) - Load data into a specific collection + +## API Integration + +This command uses the [Collection Management API](../apis/api-collection) with the `update-collection` operation. + +## Notes + +- Collections are user-scoped; each user has their own namespace +- Metadata is optional but recommended for organization +- Tags can be used for filtering with `tg-list-collections` +- If a collection exists, only specified fields are updated; others remain unchanged +- The command creates collections implicitly if they don't exist + +## Best Practices + +1. **Use Descriptive IDs**: Choose meaningful collection IDs that indicate purpose +2. **Provide Clear Names**: Use human-readable names for better usability +3. **Document Purpose**: Always include descriptions explaining collection usage +4. **Tag Consistently**: Use consistent tagging schemes across your organization +5. **Plan Hierarchy**: Consider using prefixes for related collections (e.g., `prod-*`, `dev-*`) +6. **Review Regularly**: Update metadata as collection purposes evolve diff --git a/reference/cli/tg-show-flow-classes.md b/reference/cli/tg-show-flow-classes.md index 591c24e..832d4b3 100644 --- a/reference/cli/tg-show-flow-classes.md +++ b/reference/cli/tg-show-flow-classes.md @@ -20,6 +20,8 @@ The `tg-show-flow-classes` command displays a formatted table of all flow class Flow classes are templates that define the structure and services available for creating flow instances. This command helps you understand what flow classes are available for use. 
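
From v1.4 the output of this command includes one line per configurable parameter, in the form `param-name: Description [param-type (default: value)]`. When that text needs to be consumed by a script, a regular expression is usually enough. The sketch below is an assumption-laden illustration: `parse_parameter` is a hypothetical helper, it expects the cell text with the table borders already removed, and it returns the default as a string because the output carries no type information.

```python
import re

# name: Description [type (default: value)]
PARAM_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s*(?P<description>.*?)\s*"
    r"\[(?P<type>\S+)\s+\(default:\s*(?P<default>.*?)\)\]\s*$"
)


def parse_parameter(line):
    """Split one parameter line into its parts, or return None if it does not match."""
    match = PARAM_LINE.match(line)
    return match.groupdict() if match else None
```

For the sample line `model: LLM model [llm-model (default: gpt-4)]` this yields the name `model`, the description `LLM model`, the type `llm-model` and the default `gpt-4`; lines that do not follow the pattern, such as `No flow classes.`, return `None`.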
+**New in v1.4**: The command now displays configurable parameters for each flow class, including their types and default values. + ## Options ### Optional Arguments @@ -35,14 +37,20 @@ tg-show-flow-classes Output: ``` -+-----------------+----------------------------------+----------------------+ -| flow class | description | tags | -+-----------------+----------------------------------+----------------------+ -| document-proc | Document processing pipeline | production, nlp | -| data-analysis | Data analysis and visualization | analytics, dev | -| web-scraper | Web content extraction flow | scraping, batch | -| chat-assistant | Conversational AI assistant | ai, interactive | -+-----------------+----------------------------------+----------------------+ ++-------------------+----------------------------------+ +| name | document-proc | +| description | Document processing pipeline | +| tags | production, nlp | +| parameters | model: LLM model [llm-model (default: gpt-4)] | +| | temperature: Response creativity [temperature (default: 0.7)] | ++-------------------+----------------------------------+ + ++-------------------+----------------------------------+ +| name | data-analysis | +| description | Data analysis and visualization | +| tags | analytics, dev | +| parameters | chunk-size: Text chunking size [chunk-size (default: 1000)] | ++-------------------+----------------------------------+ ``` ### Using Custom API URL @@ -66,14 +74,24 @@ tg-show-flow-classes | grep -E "(document|text|nlp)" The command displays results in a formatted table with columns: -- **flow class**: The unique name/identifier of the flow class +- **name**: The unique name/identifier of the flow class - **description**: Human-readable description of the flow class purpose - **tags**: Comma-separated list of categorization tags +- **parameters**: Configurable parameters with types and defaults (new in v1.4) + +### Parameter
Information (New in v1.4) + +Parameters are displayed with: +- Parameter name and description +- Parameter type reference +- Default value from parameter type definition + +Format: ` param-name: Description [param-type (default: value)]` ### Empty Results If no flow classes exist: ``` -No flows. +No flow classes. ``` ## Use Cases diff --git a/reference/cli/tg-show-flows.md b/reference/cli/tg-show-flows.md index bc2bcac..31173af 100644 --- a/reference/cli/tg-show-flows.md +++ b/reference/cli/tg-show-flows.md @@ -20,6 +20,8 @@ The `tg-show-flows` command displays all currently configured flow instances, in This command is essential for understanding what flows are available, discovering service endpoints, and finding Pulsar queue names for direct API integration. +**New in v1.4**: The command now displays flow parameter settings with human-readable descriptions. + ## Options - `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) @@ -41,25 +43,30 @@ tg-show-flows -u http://production:8088/ The command displays each flow in a formatted table with the following information: ``` -+-------+---------------------------+ -| id | research-flow | -| class | document-rag+graph-rag | -| desc | Research document pipeline | -| queue | agent request: non-persistent://tg/request/agent:default | -| | agent response: non-persistent://tg/request/agent:default | -| | graph-rag request: non-persistent://tg/request/graph-rag:document-rag+graph-rag | -| | graph-rag response: non-persistent://tg/request/graph-rag:document-rag+graph-rag | -| | text-load: persistent://tg/flow/text-document-load:default | -+-------+---------------------------+ - -+-------+---------------------------+ -| id | medical-analysis | -| class | medical-nlp | -| desc | Medical document analysis | -| queue | embeddings request: non-persistent://tg/request/embeddings:medical-nlp | -| | embeddings response: non-persistent://tg/request/embeddings:medical-nlp | -| | 
document-load: persistent://tg/flow/document-load:medical-analysis | -+-------+---------------------------+ ++-------------+---------------------------+ +| id | research-flow | +| class | document-rag+graph-rag | +| desc | Research document pipeline | +| parameters | • LLM model: GPT-4 | +| | • Temperature: 0.7 | +| | • Chunk size: 1000 | +| queue | agent request: non-persistent://tg/request/agent:default | +| | agent response: non-persistent://tg/request/agent:default | +| | graph-rag request: non-persistent://tg/request/graph-rag:document-rag+graph-rag | +| | graph-rag response: non-persistent://tg/request/graph-rag:document-rag+graph-rag | +| | text-load: persistent://tg/flow/text-document-load:default | ++-------------+---------------------------+ + ++-------------+---------------------------+ +| id | medical-analysis | +| class | medical-nlp | +| desc | Medical document analysis | +| parameters | • LLM model: Claude 3 Opus | +| | • Temperature: 0.5 | +| queue | embeddings request: non-persistent://tg/request/embeddings:medical-nlp | +| | embeddings response: non-persistent://tg/request/embeddings:medical-nlp | +| | document-load: persistent://tg/flow/document-load:medical-analysis | ++-------------+---------------------------+ ``` ### No Flows Available @@ -137,11 +144,36 @@ Exception: Unauthorized - `TRUSTGRAPH_URL`: Default API URL +## Flow Parameters (New in v1.4) + +The command now displays flow parameter settings for each flow instance: + +### Parameter Display + +Parameters are shown with human-readable descriptions from parameter type definitions: +- **Enum values**: Display descriptions instead of IDs (e.g., "GPT-4" instead of "gpt-4") +- **Sorted order**: Parameters appear in their configured order +- **Controlled parameters**: Shows inheritance relationships when parameters are controlled by others + +### Example Parameter Output + +``` +| parameters | • LLM model: GPT-4 (Most capable OpenAI model) | +| | • RAG model: GPT-4 (controlled by LLM model) 
| +| | • Temperature: 0.7 | +| | • Chunk size: 1000 | +``` + +### When No Parameters + +If a flow has no configurable parameters, the parameters field is omitted from the output. + ## Related Commands -- [`tg-start-flow`](tg-start-flow) - Start a new flow instance +- [`tg-start-flow`](tg-start-flow) - Start a new flow instance with parameters - [`tg-stop-flow`](tg-stop-flow) - Stop a running flow -- [`tg-show-flow-classes`](tg-show-flow-classes) - List available flow classes +- [`tg-show-flow-classes`](tg-show-flow-classes) - List available flow classes and their parameters +- [`tg-show-parameter-types`](tg-show-parameter-types) - View parameter type definitions - [`tg-show-flow-state`](tg-show-flow-state) - Show detailed flow status - [`tg-show-config`](tg-show-config) - Show complete system configuration @@ -194,6 +226,7 @@ tg-show-flows | grep "graph-rag request" - **id**: Unique flow instance identifier - **class**: Flow class name used to create the instance - **desc**: Human-readable flow description +- **parameters**: Configured parameter values with descriptions (new in v1.4) - **queue**: Service interfaces and their Pulsar queue names ### Queue Names diff --git a/reference/cli/tg-show-parameter-types.md b/reference/cli/tg-show-parameter-types.md new file mode 100644 index 0000000..c7254df --- /dev/null +++ b/reference/cli/tg-show-parameter-types.md @@ -0,0 +1,248 @@ +--- +title: tg-show-parameter-types +layout: default +parent: CLI +--- + +# tg-show-parameter-types + +Shows all defined parameter types used in flow classes. + +## Synopsis + +```bash +tg-show-parameter-types [options] +``` + +## Description + +The `tg-show-parameter-types` command displays all parameter type definitions configured in TrustGraph. Parameter types define the schema and constraints for parameters that can be used in flow class definitions, including data types, default values, valid enums, and validation rules. 
+ +Parameter types provide a centralized way to define reusable parameter schemas that ensure consistency across flow classes. + +## Options + +- `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) +- `-t, --type TYPE`: Show only the specified parameter type + +## Examples + +### Show All Parameter Types +```bash +tg-show-parameter-types +``` + +### Show Specific Parameter Type +```bash +tg-show-parameter-types -t llm-model +``` + +### Using Custom API URL +```bash +tg-show-parameter-types -u http://production:8088/ +``` + +## Output Format + +The command displays each parameter type in a formatted table: + +``` ++----------------+-----------------------------------------------------+ +| name | llm-model | +| description | LLM model selection | +| type | string | +| default | gpt-4 | +| valid values | • gpt-4 (GPT-4 model) | +| | • gpt-3.5-turbo (GPT-3.5 Turbo model) | +| | • claude-3-opus (Claude 3 Opus model) | +| constraints | required | ++----------------+-----------------------------------------------------+ + ++----------------+-----------------------------------------------------+ +| name | temperature | +| description | LLM temperature parameter for response randomness | +| type | number | +| default | 0.7 | +| constraints | min: 0.0, max: 2.0 | ++----------------+-----------------------------------------------------+ + ++----------------+-----------------------------------------------------+ +| name | chunk-size | +| description | Maximum size of text chunks in characters | +| type | integer | +| default | 1000 | +| constraints | min: 100, max: 10000 | ++----------------+-----------------------------------------------------+ +``` + +### No Parameter Types Defined +```bash +No parameter types defined. 
+``` + +## Parameter Type Fields + +Each parameter type includes: + +- **name**: Unique identifier for the parameter type +- **description**: Human-readable explanation of the parameter's purpose +- **type**: Data type (string, number, integer, boolean, array, object) +- **default**: Default value used when parameter is not specified +- **valid values**: Enum of allowed values (for enum types) +- **constraints**: Validation rules (min, max, minLength, maxLength, pattern, required) + +## Parameter Type Components + +### Data Types + +Supported parameter types: +- **string**: Text values +- **number**: Numeric values (floating-point) +- **integer**: Whole numbers +- **boolean**: true/false values +- **array**: Lists of values +- **object**: Structured data + +### Enum Values + +Parameters can define enums with descriptive labels: + +``` +valid values | • gpt-4 (GPT-4 model) + | • claude-3-opus (Claude 3 Opus model) + | • mistral-large (Mistral Large model) +``` + +### Constraints + +Common validation constraints: +- **min / max**: Numeric range limits +- **minLength / maxLength**: String length limits +- **pattern**: Regular expression validation +- **required**: Must be provided (no default) + +## Use Cases + +### Discover Available Parameters +```bash +# See what parameters can be configured +tg-show-parameter-types +``` + +### Check Parameter Defaults +```bash +# View default LLM model +tg-show-parameter-types -t llm-model +``` + +### Validate Flow Configuration +```bash +# Check valid values before configuring flow +tg-show-parameter-types -t embedding-model +``` + +### Documentation and Reference +```bash +# Generate parameter documentation +tg-show-parameter-types > parameter-reference.txt +``` + +## Parameter Type Definition + +Parameter types are stored in the configuration system with type `parameter-types`. 
They follow this schema: + +```json +{ + "type": "string", + "description": "LLM model selection", + "default": "gpt-4", + "enum": [ + { + "id": "gpt-4", + "description": "GPT-4 model" + }, + { + "id": "gpt-3.5-turbo", + "description": "GPT-3.5 Turbo model" + }, + { + "id": "claude-3-opus", + "description": "Claude 3 Opus model" + } + ], + "required": true +} +``` + +## Environment Variables + +- `TRUSTGRAPH_URL`: Default API URL + +## Related Commands + +- [`tg-show-flow-classes`](tg-show-flow-classes) - Show flow classes and their parameters +- [`tg-start-flow`](tg-start-flow) - Start a flow with parameter values +- [`tg-show-flows`](tg-show-flows) - Show active flows and their parameter settings +- [`tg-put-config-item`](tg-put-config-item) - Create or update parameter type definitions +- [`tg-list-config-items`](tg-list-config-items) - List all parameter types + +## Configuration Management + +Parameter types are managed through the configuration API: + +```bash +# List all parameter types +tg-list-config-items -t parameter-types + +# Get a specific parameter type +tg-get-config-item -t parameter-types -k llm-model + +# Create or update parameter type +tg-put-config-item -t parameter-types -k custom-param -v '{"type": "string", "default": "value"}' +``` + +## Parameter Usage in Flow Classes + +Flow classes reference parameter types in their definitions: + +```json +{ + "description": "Document processing flow", + "parameters": { + "model": { + "type": "llm-model", + "description": "LLM model to use for processing", + "order": 1 + }, + "temperature": { + "type": "temperature", + "description": "Response randomness", + "order": 2 + } + } +} +``` + +When starting a flow, users can override defaults: + +```bash +tg-start-flow -n document-processor -i my-flow -d "Processing" \ + --param model=claude-3-opus \ + --param temperature=0.5 +``` + +## Best Practices + +1. **Consistent Types**: Define parameter types centrally for reuse across flows +2. 
**Clear Descriptions**: Provide detailed descriptions for each parameter +3. **Sensible Defaults**: Set appropriate default values for common use cases +4. **Validation Rules**: Use constraints to prevent invalid configurations +5. **Enum Documentation**: Include descriptions for enum values to guide users +6. **Version Control**: Track parameter type changes over time +7. **Documentation**: Document parameter types for team reference + +## See Also + +- [Parameter Configuration Reference](../configuration/parameters) - Detailed parameter type schema +- [Flow Class Configuration](../configuration/flow-classes) - Using parameters in flow classes +- [Config API](../apis/api-config) - Managing parameter type definitions diff --git a/reference/cli/tg-start-flow.md b/reference/cli/tg-start-flow.md index d15156f..bff9d9d 100644 --- a/reference/cli/tg-start-flow.md +++ b/reference/cli/tg-start-flow.md @@ -20,6 +20,8 @@ The `tg-start-flow` command creates and starts a new processing flow instance ba Once started, a flow provides endpoints for document processing, knowledge queries, and other TrustGraph services through its configured interfaces. +**New in v1.4**: Flows can now be customized with configurable parameters that control LLM models, chunking behavior, and other processing settings. + ## Options ### Required Arguments @@ -32,6 +34,16 @@ Once started, a flow provides endpoints for document processing, knowledge queri - `-u, --api-url URL`: TrustGraph API URL (default: `$TRUSTGRAPH_URL` or `http://localhost:8088/`) +### Flow Parameters (New in v1.4) + +Parameters can be provided in three ways: + +- `-p, --parameters JSON`: Flow parameters as JSON string (e.g., `'{"model": "gpt-4", "temperature": "0.7"}'`) +- `--parameters-file FILE`: Path to JSON file containing flow parameters +- `--param KEY=VALUE`: Individual parameter as key=value pair (can be used multiple times) + +**Note**: All parameter values are stored as strings internally, regardless of their input format.
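
The three input forms and the string-storage note above can be pictured with a short sketch. This is not the `tg-start-flow` implementation: the function name `merge_parameters`, the precedence order (file, then JSON string, then `--param` pairs), and the JSON encoding of non-string values are all assumptions made for illustration.

```python
import json


def merge_parameters(json_text=None, file_text=None, pairs=()):
    """Combine the three parameter sources into one dict of strings.

    Hypothetical helper: the precedence (file < JSON string < key=value
    pairs) is an assumption, not documented tg-start-flow behaviour.
    """
    merged = {}

    # Sources applied later override earlier ones.
    for source in (file_text, json_text):
        if source:
            merged.update(json.loads(source))

    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        merged[key] = value

    # Store every value as a string; non-strings are JSON-encoded so that
    # 0.7 becomes "0.7" and True becomes "true" (an assumed convention).
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in merged.items()
    }


params = merge_parameters(
    json_text='{"model": "gpt-4", "temperature": 0.7}',
    pairs=["model=claude-3-opus", "chunk-size=2000"],
)
print(params)
```

Under these assumptions the `--param` pair for `model` replaces the value from the JSON string, and every value in the result is a string.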
+ ## Examples ### Start Basic Document Processing Flow @@ -59,6 +71,48 @@ tg-start-flow \ -u http://production:8088/ ``` +### Start Flow with Parameters (New in v1.4) + +#### Using Key=Value Pairs +```bash +tg-start-flow \ + -n "document-rag+graph-rag" \ + -i "custom-flow" \ + -d "Customized processing with Claude" \ + --param model=claude-3-opus \ + --param temperature=0.5 \ + --param chunk-size=2000 +``` + +#### Using JSON String +```bash +tg-start-flow \ + -n "document-rag+graph-rag" \ + -i "custom-flow" \ + -d "Customized processing" \ + -p '{"model": "gpt-4", "temperature": "0.7", "chunk-size": "1500"}' +``` + +#### Using JSON File +```bash +# Create parameters file +cat > flow-params.json < Date: Mon, 6 Oct 2025 12:38:48 +0100 Subject: [PATCH 3/3] Update remaining docs --- reference/apis/api-flow.md | 34 +++- reference/apis/pulsar.md | 90 +++++++++ reference/configuration/flow-classes.md | 248 +++++++++++++++++++++++- 3 files changed, 370 insertions(+), 2 deletions(-) diff --git a/reference/apis/api-flow.md b/reference/apis/api-flow.md index 23a4387..ba7a162 100644 --- a/reference/apis/api-flow.md +++ b/reference/apis/api-flow.md @@ -20,6 +20,7 @@ The request contains the following fields: - `class-definition`: Flow class definition JSON (for put-class) - `description`: Flow description (for start-flow) - `flow-id`: Flow instance ID (for flow instance operations) +- `parameters`: Map of parameter name to value (for start-flow, new in v1.4) ### Response @@ -29,6 +30,7 @@ The response contains the following fields: - `class-definition`: Flow class definition JSON (returned by get-class) - `flow`: Flow instance JSON (returned by get-flow) - `description`: Flow description (returned by get-flow) +- `parameters`: Map of parameter name to value (returned by get-flow, new in v1.4) - `error`: Error information if operation fails ## Operations @@ -131,7 +133,11 @@ Response: ```json { "flow": "{\"interfaces\": {\"text-completion\": {\"request\": 
\"persistent://tg/request/text-completion-flow-123\", \"response\": \"persistent://tg/response/text-completion-flow-123\"}}}", - "description": "PDF processing workflow instance" + "description": "PDF processing workflow instance", + "parameters": { + "model": "gpt-4", + "temperature": "0.7" + } } ``` @@ -147,6 +153,23 @@ Request: } ``` +**New in v1.4**: Request with parameters: +```json +{ + "operation": "start-flow", + "class-name": "pdf-processor", + "flow-id": "flow-123", + "description": "Processing document batch 1", + "parameters": { + "model": "gpt-4", + "temperature": "0.7", + "chunk-size": "1500" + } +} +``` + +All parameter values are stored as strings. Parameters not specified will use defaults from parameter type definitions. + Response: ```json {} @@ -234,6 +257,14 @@ definition = await client.get_class("pdf-processor") # Start a flow instance await client.start_flow("pdf-processor", "flow-123", "Processing batch 1") +# Start a flow instance with parameters (new in v1.4) +await client.start_flow( + "pdf-processor", + "flow-123", + "Processing batch 1", + parameters={"model": "gpt-4", "temperature": "0.7"} +) + # List active flows flows = await client.list_flows() @@ -245,6 +276,7 @@ await client.stop_flow("flow-123") - **Flow Classes**: Templates that define workflow structure and interfaces - **Flow Instances**: Active running workflows based on flow classes +- **Configurable Parameters**: Customize flow behavior with parameters (new in v1.4) - **Dynamic Management**: Flows can be started/stopped dynamically - **Template Processing**: Uses template replacement for customizing flow instances - **Integration**: Works with TrustGraph ecosystem for data processing pipelines diff --git a/reference/apis/pulsar.md b/reference/apis/pulsar.md index 7c1e4f0..bf94abf 100644 --- a/reference/apis/pulsar.md +++ b/reference/apis/pulsar.md @@ -28,6 +28,10 @@ These services run independently and have fixed Pulsar queue names: ### Flow API - **Request Queue**: 
`non-persistent://tg/request/flow` - **Response Queue**: `non-persistent://tg/response/flow` +- **Request Schema**: `trustgraph.schema.FlowRequest` +- **Response Schema**: `trustgraph.schema.FlowResponse` + +**New in v1.4**: The `FlowRequest` schema includes a `parameters` field for configuring flow instances. See [Flow Parameters](#flow-parameters) below. ### Knowledge API - **Request Queue**: `non-persistent://tg/request/knowledge` @@ -221,6 +225,91 @@ config_request = ConfigRequest( # Then connect to the appropriate queues for the service you need ``` +## Flow Parameters + +**New in v1.4**: Flow instances can be configured with parameters that customize their behavior. Parameters are passed when starting flows and stored with the flow instance. + +### FlowRequest Schema + +The `trustgraph.schema.FlowRequest` schema includes a `parameters` field: + +```python +class FlowRequest(Record): + operation = String() # Operation to perform (e.g., "start-flow") + class_name = String() # Flow class name + flow_id = String() # Flow instance ID + description = String() # Flow description + parameters = Map(String()) # Parameter name -> value map (new in v1.4) + class_definition = String() # Flow class definition JSON +``` + +### Parameters Field + +The `parameters` field is a `Map(String())` in the Pulsar schema: +- **Type**: Map with string keys and string values +- **Keys**: Parameter names (e.g., `"model"`, `"temperature"`, `"chunk-size"`) +- **Values**: Parameter values as strings (e.g., `"gpt-4"`, `"0.7"`, `"1500"`) + +All parameter values are stored as strings internally, regardless of the parameter type. Processors are responsible for converting string values to appropriate types based on parameter type definitions. 
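
As a rough illustration of that conversion step, a processor might coerce each string using the `type` field of the matching parameter type definition. The helper name `coerce_parameter` and the exact conversion rules are assumptions; only the type names (string, number, integer, boolean, array, object) come from the parameter type documentation.

```python
import json


def coerce_parameter(value: str, type_name: str):
    """Convert a stored string into the Python type named by type_name.

    Hypothetical helper; TrustGraph processors may convert differently.
    """
    if type_name == "string":
        return value
    if type_name == "integer":
        return int(value)
    if type_name == "number":
        return float(value)
    if type_name == "boolean":
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a boolean: {value!r}")
    if type_name in ("array", "object"):
        # Assumes structured values were stored as JSON text.
        return json.loads(value)
    raise ValueError(f"unknown parameter type: {type_name!r}")


# String values as they would arrive in FlowRequest.parameters
stored = {"model": "gpt-4", "temperature": "0.7", "chunk-size": "1500"}
# Type names taken from the corresponding parameter type definitions
types = {"model": "string", "temperature": "number", "chunk-size": "integer"}

typed = {
    name: coerce_parameter(value, types[name])
    for name, value in stored.items()
}
print(typed)
```

Raising on an unrecognised boolean or type name, rather than guessing, keeps a misconfigured flow from starting with a silently wrong setting.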
+ +### Example with Parameters + +Starting a flow with parameters via Pulsar: + +```python +import pulsar +from trustgraph.schema import FlowRequest, FlowResponse + +# Connect to Pulsar +client = pulsar.Client('pulsar://localhost:6650') + +# Create producer for flow requests +producer = client.create_producer( + 'non-persistent://tg/request/flow', + schema=pulsar.schema.AvroSchema(FlowRequest) +) + +# Start flow with parameters +request = FlowRequest( + operation='start-flow', + class_name='document-rag+graph-rag', + flow_id='my-custom-flow', + description='Custom processing flow', + parameters={ + 'model': 'claude-3-opus', + 'temperature': '0.5', + 'chunk-size': '2000' + } +) + +producer.send(request) +``` + +### FlowResponse Schema + +The `trustgraph.schema.FlowResponse` schema includes parameters in flow information: + +```python +class FlowResponse(Record): + class_names = Array(String()) # List of flow class names + flow_ids = Array(String()) # List of flow instance IDs + class_definition = String() # Flow class definition JSON + flow = String() # Flow instance JSON + description = String() # Flow description + parameters = Map(String()) # Parameter settings (new in v1.4) + error = String() # Error information +``` + +When querying for flow information with `get-flow`, the response includes the `parameters` map showing the current parameter settings for that flow instance. + +### Parameter Documentation + +For more information about flow parameters: +- [Parameter Types Configuration](../configuration/parameters) - Parameter type definitions +- [Flow Class Configuration](../configuration/flow-classes) - Using parameters in flow classes +- [Flow API](api-flow) - Flow management API including parameters +- [tg-start-flow](../cli/tg-start-flow) - Starting flows with parameters via CLI + ## Best Practices 1. **Query Flow Configuration**: Always query the current flow configuration to get accurate queue names @@ -228,6 +317,7 @@ config_request = ConfigRequest( 3. 
**Choose Appropriate Persistence**: Use persistent queues for critical data, non-persistent for performance 4. **Schema Validation**: Use the appropriate Pulsar schema for each service 5. **Error Handling**: Implement proper error handling for queue connection and message failures +6. **Parameter Values**: Remember that all parameter values are strings in the Pulsar schema ## Security Considerations diff --git a/reference/configuration/flow-classes.md b/reference/configuration/flow-classes.md index 5f75906..a35431a 100644 --- a/reference/configuration/flow-classes.md +++ b/reference/configuration/flow-classes.md @@ -123,6 +123,249 @@ Additional information about the flow class: } ``` +## Parameters + +**New in v1.4**: Flow classes can define configurable parameters that allow customization of flow behavior without modifying the flow class definition. Parameters enable users to select different LLM models, adjust processing settings, and control flow behavior when starting flow instances. + +### Parameter Definition Schema + +Parameters are defined in the flow class definition using this structure: + +```json +{ + "description": "Flow class description", + "tags": ["tag1", "tag2"], + "parameters": { + "param-name": { + "type": "parameter-type-ref", + "description": "Human-readable description", + "order": 1, + "controlled-by": "other-param-name" + } + }, + "class": { ... }, + "flow": { ... }, + "interfaces": { ... } +} +``` + +### Parameter Fields + +#### type (required) + +Reference to a parameter type definition stored in the configuration system. Parameter types define the schema, validation rules, default values, and allowed values. + +**Example:** +```json +"parameters": { + "model": { + "type": "llm-model" + } +} +``` + +The `llm-model` type is looked up in the parameter type configuration, which defines valid models, defaults, and constraints. 
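
A minimal sketch of that lookup, assuming the parameter type definitions have already been fetched into a dictionary keyed by type name. The sample data mirrors the `llm-model` definition shown in the `tg-show-parameter-types` documentation; the names `PARAMETER_TYPES`, `FLOW_PARAMETERS`, and `resolve_parameter` are hypothetical and not part of TrustGraph.

```python
# Parameter type definitions keyed by name (sample data for illustration).
PARAMETER_TYPES = {
    "llm-model": {
        "type": "string",
        "default": "gpt-4",
        "enum": [
            {"id": "gpt-4", "description": "GPT-4 model"},
            {"id": "claude-3-opus", "description": "Claude 3 Opus model"},
        ],
    },
}

# The "parameters" section of a flow class definition.
FLOW_PARAMETERS = {"model": {"type": "llm-model"}}


def resolve_parameter(name, user_values):
    """Return the value for a flow parameter, falling back to the type default."""
    type_ref = FLOW_PARAMETERS[name]["type"]
    definition = PARAMETER_TYPES[type_ref]

    value = user_values.get(name, definition.get("default"))

    # Reject values outside the enum, if the type declares one.
    allowed = [entry["id"] for entry in definition.get("enum", [])]
    if allowed and value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return value


print(resolve_parameter("model", {}))
print(resolve_parameter("model", {"model": "claude-3-opus"}))
```

The flow class only names the type; the default and the list of valid models live in the type definition, so changing them there affects every flow class that references `llm-model`.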
+ +#### description (optional) + +Human-readable description of what this parameter controls in the context of this flow. Overrides or supplements the parameter type's description. + +**Example:** +```json +"parameters": { + "model": { + "type": "llm-model", + "description": "LLM model for document analysis and extraction" + } +} +``` + +#### order (optional) + +Display order for the parameter in user interfaces and CLI output. Parameters are shown in ascending order. + +**Example:** +```json +"parameters": { + "model": { + "type": "llm-model", + "order": 1 + }, + "temperature": { + "type": "temperature", + "order": 2 + }, + "chunk-size": { + "type": "chunk-size", + "order": 3 + } +} +``` + +#### controlled-by (optional) + +Indicates that this parameter's value is automatically inherited from another parameter. Used when multiple services in a flow should use the same setting. + +**Example:** +```json +"parameters": { + "llm-model": { + "type": "llm-model", + "description": "Primary LLM model", + "order": 1 + }, + "rag-model": { + "type": "llm-model", + "description": "Model for RAG queries", + "order": 2, + "controlled-by": "llm-model" + } +} +``` + +When `controlled-by` is specified: +- The parameter inherits the value from the controlling parameter +- Users can optionally override the inherited value +- UI can display the inheritance relationship + +### Complete Parameter Example + +```json +{ + "description": "Customizable RAG pipeline with LLM selection", + "tags": ["rag", "configurable"], + "parameters": { + "llm-model": { + "type": "llm-model", + "description": "Primary language model for processing", + "order": 1 + }, + "rag-model": { + "type": "llm-model", + "description": "Model for RAG query generation", + "order": 2, + "controlled-by": "llm-model" + }, + "temperature": { + "type": "temperature", + "description": "Response randomness (0.0 = deterministic, 2.0 = very random)", + "order": 3 + }, + "chunk-size": { + "type": "chunk-size", + "description": 
"Maximum text chunk size for processing", + "order": 4 + }, + "embedding-model": { + "type": "embedding-model", + "description": "Model for generating document embeddings", + "order": 5 + } + }, + "class": { ... }, + "flow": { ... }, + "interfaces": { ... } +} +``` + +### Parameter Types + +Parameter types are centrally defined in the configuration system with type `parameter-types`. Each parameter type specifies: + +- **Data type**: string, number, integer, boolean, array, object +- **Default value**: Value used when not specified by user +- **Enum values**: List of allowed values with descriptions +- **Constraints**: Validation rules (min/max, length, pattern, required) + +Common parameter types include: + +| Type | Description | Example Values | +|------|-------------|----------------| +| `llm-model` | LLM model selection | `gpt-4`, `claude-3-opus`, `mistral-large` | +| `temperature` | LLM temperature | `0.0` to `2.0` (default: `0.7`) | +| `chunk-size` | Text chunking size | `100` to `10000` (default: `1000`) | +| `embedding-model` | Embedding model | `text-embedding-ada-002`, `text-embedding-3-large` | + +See [Parameter Types](parameters) for complete parameter type documentation. + +### Parameter Resolution + +When a flow instance is started, parameters are resolved in this order: + +1. **User-provided values**: Explicit values from `tg-start-flow --param` or API +2. **Default values**: From parameter type definitions +3. **Controlled-by relationships**: Inherited from controlling parameters +4. 
**Required validation**: Error if required parameters are missing + +**Example:** + +Given parameter definitions with defaults: +- `llm-model`: default `gpt-4` +- `temperature`: default `0.7` +- `chunk-size`: default `1000` + +Starting a flow with: +```bash +tg-start-flow -n my-flow -i flow1 -d "Test" --param llm-model=claude-3-opus +``` + +Results in: +- `llm-model`: `claude-3-opus` (user-provided) +- `temperature`: `0.7` (default) +- `chunk-size`: `1000` (default) + +### Using Parameters in Flow Definitions + +Parameters can be referenced in flow class definitions using the `{param:name}` syntax. This allows queue names, processor configurations, and other settings to be parameterized. + +**Example:** +```json +{ + "parameters": { + "model": { + "type": "llm-model", + "order": 1 + } + }, + "class": { + "text-completion:{class}": { + "request": "non-persistent://tg/request/text-completion:{class}", + "response": "non-persistent://tg/response/text-completion:{class}", + "config": { + "model": "{param:model}" + } + } + } +} +``` + +When the flow is started with `--param model=gpt-4`, the configuration becomes: +```json +{ + "config": { + "model": "gpt-4" + } +} +``` + +### Parameter Storage + +All parameter values are stored as strings internally, regardless of their input format. When starting flows: + +- Numbers: `--param temperature=0.7` → stored as `"0.7"` +- Booleans: `--param enabled=true` → stored as `"true"` +- Strings: `--param model=gpt-4` → stored as `"gpt-4"` + +Processors are responsible for converting string values to appropriate types based on parameter type definitions. + +### Benefits of Parameters + +1. **Flexibility**: Customize flow behavior without modifying flow classes +2. **Reusability**: Single flow class supports multiple configurations +3. **Consistency**: Centralized parameter type definitions ensure validation +4. **Discoverability**: Users can see available parameters with `tg-show-flow-classes` +5. 
**Documentation**: Parameter types include descriptions and constraints + ## Template Variables Flow class definitions use template variables that are replaced when flow instances are created: @@ -321,6 +564,9 @@ All processors (both `{id}` and `{class}`) work together as a cohesive dataflow - [tg-put-flow-class](../cli/tg-put-flow-class) - Create or update flow classes - [tg-get-flow-class](../cli/tg-get-flow-class) - Retrieve flow class definitions -- [tg-show-flow-classes](../cli/tg-show-flow-classes) - List available flow classes +- [tg-show-flow-classes](../cli/tg-show-flow-classes) - List available flow classes and parameters +- [tg-start-flow](../cli/tg-start-flow) - Start flows with parameter values +- [tg-show-parameter-types](../cli/tg-show-parameter-types) - View parameter type definitions +- [Parameter Types](parameters) - Parameter type configuration reference - [Flow Processor Reference](../extending/flow-processor) - Building custom processors - [Pulsar Configuration](pulsar) - Message queue configuration \ No newline at end of file