A distributed SQLite system with a configurable replication factor, where up to N-R nodes can fail without data loss (N = total nodes, R = replication factor); for example, a 4-node cluster with RF=3 tolerates one node failure.
- Write Model: Write-local-first with async replication to RF-1 other nodes
- Read Model: Majority consensus reads for consistency
- Deployment: Kubernetes StatefulSet with persistent volumes
- Communication: HTTP-based inter-node coordination
- Storage: SQLite with native JSON support
- ✅ Configurable replication factor
- ✅ Fault tolerance (N-R node failures)
- ✅ Majority consensus reads
- ✅ Kubernetes-native deployment
- ✅ Automated integration testing
- ✅ Multi-cluster size validation
- Docker
- Kubernetes cluster (kind, minikube, or full cluster)
- kubectl configured
- Go 1.21+
Build the container image:
docker build -t distributed-sqlite-node:latest .
Deploy to Kubernetes:
kubectl apply -f k8s/
Wait for pods to be ready:
kubectl wait --for=condition=ready pod -l app=distributed-sqlite-node --timeout=120s
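To confirm the rollout, you can list the pods with plain kubectl (same label selector as above):
kubectl get pods -l app=distributed-sqlite-node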
The system includes comprehensive integration tests that validate distributed operations across different cluster configurations.
To test against the default 3-node cluster:
# Setup port forwards
kubectl port-forward distributed-sqlite-nodes-0 8080:8080 &
kubectl port-forward distributed-sqlite-nodes-1 8081:8080 &
kubectl port-forward distributed-sqlite-nodes-2 8082:8080 &
# Run integration tests
go test ./test/... -v -timeout=120s

For comprehensive testing across different cluster sizes with namespace isolation:
# Deploy isolated test cluster
./test/deploy_test_cluster.sh 3 3
# Run tests against this cluster
NAMESPACE=test-$(date +%Y-%m-%d)-3n-3r BASE_PORT=9303 CLUSTER_SIZE=3 ./test/run_namespace_tests.sh
# Cleanup
kubectl delete namespace test-$(date +%Y-%m-%d)-3n-3r

# Deploy isolated test cluster
./test/deploy_test_cluster.sh 4 3
# Run tests against this cluster
NAMESPACE=test-$(date +%Y-%m-%d)-4n-3r BASE_PORT=9403 CLUSTER_SIZE=4 ./test/run_namespace_tests.sh
# Cleanup
kubectl delete namespace test-$(date +%Y-%m-%d)-4n-3r

# Deploy isolated test cluster
./test/deploy_test_cluster.sh 5 4
# Run tests against this cluster
NAMESPACE=test-$(date +%Y-%m-%d)-5n-4r BASE_PORT=9504 CLUSTER_SIZE=5 ./test/run_namespace_tests.sh
# Cleanup
kubectl delete namespace test-$(date +%Y-%m-%d)-5n-4r

The repository includes several test automation scripts:
- test/deploy_test_cluster.sh: Deploy clusters in isolated namespaces
- test/run_namespace_tests.sh: Run tests against specific namespaced clusters
- test/run_cluster_tests.sh: Comprehensive multi-cluster testing (all sizes)
# Test all supported configurations automatically
./test/run_cluster_tests.sh

This will test:
- 3-node cluster with RF=3
- 4-node cluster with RF=3
- 5-node cluster with RF=4
The integration tests validate:
- Health Checks: All nodes respond to health endpoints
- Write Replication: Data replicates to all nodes according to RF
- Majority Reads: Consistent reads across multiple nodes
- Fault Tolerance: System remains operational with node failures
- Concurrent Operations: Behavior under concurrent load
✅ 3-node cluster (RF=3):
- Full replication to all 3 nodes
- Fault tolerance with 1 node failure
- Majority consensus reads
✅ 4-node cluster (RF=3):
- Replication to 3 out of 4 nodes
- Fault tolerance with 1 node failure
- Majority consensus reads
✅ 5-node cluster (RF=4):
- Replication to 4 out of 5 nodes
- Fault tolerance with 1 node failure
- Majority consensus reads
- SQLite Concurrency: Heavy concurrent write loads can run into SQLite's database-level locking
- Planned mitigation: serialize writes through an operation queue (see the sketch below)
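As a rough sketch of what such an operation queue could look like, the example below serializes all writes through a single goroutine; the op type, startWriter, and channel sizing are hypothetical, not the project's planned design:

```go
package main

import "fmt"

// op is a single queued write; callers block on done to observe the result.
type op struct {
	key, value string
	done       chan error
}

// startWriter serializes all writes through one goroutine so that only one
// database write is in flight at a time, avoiding lock contention.
func startWriter(apply func(key, value string) error) chan<- op {
	ops := make(chan op, 128)
	go func() {
		for o := range ops {
			o.done <- apply(o.key, o.value)
		}
	}()
	return ops
}

func main() {
	// A trivial in-memory apply function stands in for the SQLite write.
	store := map[string]string{}
	ops := startWriter(func(k, v string) error {
		store[k] = v
		return nil
	})

	o := op{key: "greeting", value: "hello", done: make(chan error, 1)}
	ops <- o
	fmt.Println(<-o.done, store["greeting"])
}
```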
Set these environment variables in your deployment:
- NODE_ID: Unique identifier for the node
- PORT: HTTP server port (default: 8080)
- REPLICATION_FACTOR: Number of replicas (required)
- CLUSTER_SIZE: Total number of nodes (required)
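For illustration, a loader for these variables might look like the sketch below; the Config struct and loadConfig function are assumptions, not the repository's actual code:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// Config mirrors the environment variables listed above (struct is illustrative).
type Config struct {
	NodeID            string
	Port              int
	ReplicationFactor int
	ClusterSize       int
}

func loadConfig() (Config, error) {
	cfg := Config{NodeID: os.Getenv("NODE_ID"), Port: 8080} // PORT defaults to 8080
	if p := os.Getenv("PORT"); p != "" {
		port, err := strconv.Atoi(p)
		if err != nil {
			return cfg, fmt.Errorf("invalid PORT: %w", err)
		}
		cfg.Port = port
	}
	var err error
	if cfg.ReplicationFactor, err = strconv.Atoi(os.Getenv("REPLICATION_FACTOR")); err != nil {
		return cfg, fmt.Errorf("REPLICATION_FACTOR is required: %w", err)
	}
	if cfg.ClusterSize, err = strconv.Atoi(os.Getenv("CLUSTER_SIZE")); err != nil {
		return cfg, fmt.Errorf("CLUSTER_SIZE is required: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```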
- GET /health - Health check
- POST /set - Store key-value pair
- GET /get?key=<key> - Retrieve value
- DELETE /delete?key=<key> - Delete key
- POST /internal/set - Replication endpoint
- GET /internal/get?key=<key> - Internal read endpoint
- DELETE /internal/delete?key=<key> - Internal delete endpoint
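A minimal Go client against the public endpoints could look like this; note the JSON body shape for /set is an assumption here, so check the handler for the actual request schema:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	base := "http://localhost:8080" // e.g. via the port-forward from Quick Start

	// Store a key-value pair. The request body shape is assumed here.
	body := bytes.NewBufferString(`{"key": "greeting", "value": "hello"}`)
	resp, err := http.Post(base+"/set", "application/json", body)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()

	// Retrieve the value; the server answers with a majority consensus read.
	resp, err = http.Get(base + "/get?key=greeting")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```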
This project follows TDD principles:
# Run unit tests
go test ./internal/... -v
# Run integration tests
go test ./test/... -v
# Run all tests
go test ./... -v

The TDD cycle:
- Write failing tests first
- Implement minimal code to pass tests
- Refactor while keeping tests green
- Validate with integration tests
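As an illustration of the first step, a test is written against the desired behavior before the code exists; the store package, its import path, and its New/Set/Get API are hypothetical:

```go
package store_test

import (
	"testing"

	"example.com/distributed-sqlite/internal/store" // hypothetical import path
)

// TestSetThenGet is written first and fails until the store package
// implements the minimal Set/Get behavior it describes.
func TestSetThenGet(t *testing.T) {
	s := store.New()
	if err := s.Set("k", "v"); err != nil {
		t.Fatalf("Set failed: %v", err)
	}
	got, err := s.Get("k")
	if err != nil {
		t.Fatalf("Get failed: %v", err)
	}
	if got != "v" {
		t.Errorf("Get(%q) = %q, want %q", "k", got, "v")
	}
}
```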
- Write-local-first: Writes succeed locally immediately
- Async replication: Background replication to RF-1 other nodes
- Best-effort delivery: Log failures but don't block writes
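A minimal sketch of this write path, assuming an in-memory stand-in for the SQLite store and the documented /internal/set endpoint (the Node type and the JSON body shape are assumptions):

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"sync"
)

// Node is a minimal stand-in for a cluster node.
type Node struct {
	mu    sync.Mutex
	data  map[string]string
	peers []string // base URLs of the RF-1 replication targets
}

// Set commits locally first, then replicates asynchronously, best effort.
func (n *Node) Set(key, value string) error {
	n.mu.Lock()
	n.data[key] = value // local write succeeds immediately
	n.mu.Unlock()

	for _, peer := range n.peers {
		go func(peer string) { // async replication to RF-1 other nodes
			body := bytes.NewBufferString(fmt.Sprintf(`{"key":%q,"value":%q}`, key, value))
			resp, err := http.Post(peer+"/internal/set", "application/json", body)
			if err != nil {
				log.Printf("replication to %s failed: %v", peer, err) // log, don't block
				return
			}
			resp.Body.Close()
		}(peer)
	}
	return nil
}

func main() {
	// In a real server the process keeps running, so the goroutines complete.
	n := &Node{data: map[string]string{}, peers: []string{"http://localhost:8081"}}
	_ = n.Set("greeting", "hello")
}
```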
- Concurrent reads: Query all available nodes simultaneously
- Majority consensus: Return value when majority of responding nodes agree
- Fast response: Return as soon as majority threshold is reached
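A sketch of the read path, assuming /internal/get returns the raw value; for simplicity this version requires agreement from a majority of all nodes, a slightly stricter rule than majority-of-responding-nodes:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// majorityRead queries all nodes concurrently and returns as soon as a
// majority of the cluster agrees on a value.
func majorityRead(nodes []string, key string) (string, error) {
	majority := len(nodes)/2 + 1
	results := make(chan string, len(nodes))

	for _, node := range nodes {
		go func(node string) {
			resp, err := http.Get(node + "/internal/get?key=" + url.QueryEscape(key))
			if err != nil {
				return // an unreachable node simply contributes no vote
			}
			defer resp.Body.Close()
			if b, err := io.ReadAll(resp.Body); err == nil {
				results <- string(b)
			}
		}(node)
	}

	counts := map[string]int{}
	deadline := time.After(2 * time.Second)
	for i := 0; i < len(nodes); i++ {
		select {
		case v := <-results:
			counts[v]++
			if counts[v] >= majority {
				return v, nil // fast response: stop at the majority threshold
			}
		case <-deadline:
			return "", errors.New("no majority before timeout")
		}
	}
	return "", errors.New("responding nodes did not reach a majority")
}

func main() {
	nodes := []string{"http://localhost:8080", "http://localhost:8081", "http://localhost:8082"}
	v, err := majorityRead(nodes, "greeting")
	fmt.Println(v, err)
}
```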
- N-R failures tolerated: up to N-R nodes can fail, where N = cluster size and R = replication factor
- Graceful degradation: System continues operating with reduced capacity
- Automatic recovery: Nodes rejoin cluster when healthy
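One possible shape for the recovery detection, as a sketch polling the documented /health endpoint (the polling interval and data structures are illustrative):

```go
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

// monitorPeers polls each peer's /health endpoint; a node is treated as
// rejoined as soon as its health check succeeds again.
func monitorPeers(peers []string, alive map[string]bool, mu *sync.Mutex) {
	for {
		for _, peer := range peers {
			resp, err := http.Get(peer + "/health")
			ok := err == nil && resp.StatusCode == http.StatusOK
			if err == nil {
				resp.Body.Close()
			}
			mu.Lock()
			if alive[peer] != ok {
				log.Printf("peer %s alive=%v", peer, ok)
			}
			alive[peer] = ok
			mu.Unlock()
		}
		time.Sleep(5 * time.Second)
	}
}

func main() {
	var mu sync.Mutex
	alive := map[string]bool{}
	monitorPeers([]string{"http://localhost:8081", "http://localhost:8082"}, alive, &mu)
}
```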
- Follow TDD approach
- Ensure all tests pass
- Update documentation
- Add integration tests for new features
[Add your license here]