Gigapi Metadata provides a high-performance indexing system for managing metadata about data files (typically Parquet files) organized in time-partitioned structures. It supports efficient querying, merging operations, and provides both local JSON file storage and distributed Redis storage backends.
- Dual Storage Backends: JSON file-based storage for local deployments and Redis for distributed systems
- Time-Partitioned Data: Optimized for date/hour partitioned data structures
- Merge Planning: Intelligent merge planning for data consolidation across different layers
- Async Operations: Promise-based asynchronous operations for better performance
- Efficient Querying: Time-range and folder-based querying capabilities
```shell
go get github.com/gigapi/metadata
```
The fundamental data structure representing metadata about a single data file:
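The definition below is a sketch whose field names and types are inferred from the usage examples in this README; the authoritative declaration lives in the `metadata` package and may differ.

```go
// IndexEntry sketch: fields inferred from the examples in this README.
type IndexEntry struct {
	Database  string // owning database
	Table     string // owning table
	Path      string // file path relative to the table root
	SizeBytes int64  // file size in bytes
	MinTime   int64  // earliest timestamp in the file, Unix nanoseconds
	MaxTime   int64  // latest timestamp in the file, Unix nanoseconds
}
```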
For local file-based storage, suitable for single-node deployments:
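For example, using the `NewJSONIndex` constructor shown in this README's quick start:

```go
// Arguments: root data directory, database name, table name.
tableIndex := metadata.NewJSONIndex("/data/root", "my_database", "my_table")
```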
For distributed deployments with Redis backend:
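A minimal sketch; `NewRedisIndex` and its signature are assumptions for illustration, since this README does not show the actual constructor:

```go
// Hypothetical constructor; check the package documentation for the real name.
tableIndex := metadata.NewRedisIndex("redis://localhost:6379/0", "my_database", "my_table")
```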
Before using the library, initialize merge configurations, which define merge behavior across different iterations:
Example configuration:
```go
import "github.com/gigapi/metadata"

// Configure merge settings: [timeout_sec, max_size_bytes, iteration_id]
metadata.MergeConfigurations = [][3]int64{
	{10, 10 * 1024 * 1024, 1}, // 10s timeout, 10MB max size, iteration 1
	{30, 50 * 1024 * 1024, 2}, // 30s timeout, 50MB max size, iteration 2
}
```
```go
// Create a JSON-based table index
tableIndex := metadata.NewJSONIndex("/data/root", "my_database", "my_table")

// Add metadata entries
entries := []*metadata.IndexEntry{
	{
		Database:  "my_database",
		Table:     "my_table",
		Path:      "date=2024-01-15/hour=14/file1.parquet",
		SizeBytes: 1000000,
		MinTime:   1705327200000000000, // nanoseconds
		MaxTime:   1705327800000000000,
	},
}

// Batch operation (async)
promise := tableIndex.Batch(entries, nil)
result, err := promise.Get()
```
Redis Index Usage
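The sketch below assumes a Redis-backed index exposes the same operations as the JSON one; the constructor name `NewRedisIndex` is a guess for illustration, while the `Batch`/`Get` calls mirror the JSON quick start above.

```go
// Hypothetical constructor; the remaining calls follow the JSON example.
tableIndex := metadata.NewRedisIndex("redis://localhost:6379/0", "my_database", "my_table")

promise := tableIndex.Batch(entries, nil)
result, err := promise.Get()
```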
```go
// Query with time range
options := metadata.QueryOptions{
	After:  time.Now().Add(-24 * time.Hour),
	Before: time.Now(),
}
entries, err := tableIndex.GetQuerier().Query(options)
```
```go
// Get merge plan
planner := tableIndex.GetMergePlanner()
plan, err := planner.GetMergePlan("layer1", 1)
if plan != nil {
	// Execute merge (external process)
	// ...

	// Mark merge as complete
	err = planner.EndMerge(plan)
}
```
The main interface for table-level operations:
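The method set below is inferred from the examples in this README; the signatures and helper types (`Promise`, `Querier`, `MergePlanner`) are assumptions, not the package's actual declarations.

```go
// Sketch of the table-level interface as used in this README's examples.
type TableIndex interface {
	// Batch asynchronously records metadata entries; the second argument's
	// type is not shown in this README (the examples pass nil).
	Batch(entries []*IndexEntry, opts any) Promise
	GetQuerier() Querier           // time-range / folder queries
	GetMergePlanner() MergePlanner // merge planning and completion
}
```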
For database-level operations:
The system expects data organized in the following structure:
```
/root/
├── database1/
│   ├── table1/
│   │   ├── date=2024-01-15/
│   │   │   ├── hour=00/
│   │   │   ├── hour=01/
│   │   │   └── ...
│   │   └── date=2024-01-16/
│   └── table2/
└── database2/
```
For Redis backend, use standard Redis connection URLs:
```
redis://localhost:6379/0         # Standard Redis
rediss://user:pass@host:6380/1   # Redis with TLS
```
All operations return errors through the Promise interface or standard Go error handling. The library uses async operations for better performance in high-throughput scenarios.
Both JSON and Redis implementations are thread-safe and can be used concurrently across multiple goroutines.
Run tests with a local Redis instance:
```shell
# Start Redis
docker run -d -p 6379:6379 redis:alpine

# Run tests
go test ./...
```
This project is licensed under the Apache License 2.0.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
- The library is optimized for time-series data workloads with frequent writes and time-range queries
- Redis backend is recommended for distributed deployments and high-throughput scenarios
- JSON backend is suitable for single-node deployments and development environments
- Merge operations are designed to be executed by external processes, with the library managing the planning and coordination
- All time values are stored as Unix nanoseconds for high precision temporal operations