Navflux is a real-time taxi tracking system that uses big data technologies. The system ingests synthetic GPS events, processes them via stream processing, stores historical data, and serves real-time locations through REST and WebSocket APIs.
Architecture: Kafka (streaming) → Spark Streaming (processing) → HBase (storage) + Redis (cache) → FastAPI (web UI)
- Real-time Event Streaming: Fault-tolerant event delivery via Kafka (KRaft mode)
- Stream Processing: Spark Structured Streaming for real-time location updates and trip aggregations
- Dual Storage: HBase for historical analytics, Redis for hot data with 5-minute TTL
- Async Web API: FastAPI with REST endpoints and WebSocket support for live updates
- Data Generation: Synthetic taxi GPS data with realistic movement patterns using Faker and geopy
- Container-Ready: Docker Compose for local development, Kubernetes manifests for production
- Quality Assurance: Comprehensive test coverage with pytest, Ruff linting, Snyk security scanning
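To make the "hot data with 5-minute TTL" idea concrete, here is a minimal plain-Python sketch of the caching pattern the system delegates to Redis (where a `SETEX`-style write attaches the expiry). The key format and values are illustrative assumptions, not the project's actual schema.

```python
# Plain-Python sketch of a hot cache with a 5-minute TTL, mimicking what the
# system does with Redis. Illustrative only; key names are assumptions.
import time

TTL_SECONDS = 300  # 5 minutes, matching the Redis TTL described above


class HotCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + TTL_SECONDS)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        item = self._store.get(key)
        if item is None or now >= item[1]:
            self._store.pop(key, None)  # expired or missing: evict and miss
            return None
        return item[0]


cache = HotCache()
cache.set("taxi:42:location", {"lat": 40.75, "lon": -73.98}, now=0)
print(cache.get("taxi:42:location", now=100))  # within TTL -> the location
print(cache.get("taxi:42:location", now=301))  # past TTL -> None
```

In the real system the expiry is enforced server-side by Redis, so readers never see stale locations older than five minutes.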
Follow these steps to set up Navflux on your local machine.
- Python: 3.11+ (managed via uv)
- Docker: 20.10+ and Docker Compose 2.0+
- Kubernetes: 1.28+ (optional, for K8s deployment)
- System Resources: 8GB RAM minimum, 16GB recommended
- Clone the repository
git clone https://github.com/vijethph/Navflux.git
cd Navflux
- Install uv package manager (if not installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
- Install Python dependencies
uv sync
- Create environment configuration
cp .env.example .env
# Edit .env with your configuration
- Start infrastructure with Docker Compose
docker compose up -d
- Verify services are healthy
docker compose ps
# All services should show status: Up (healthy)
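The `.env` file created in the setup steps holds service settings as `KEY=value` lines. A minimal sketch of how such a file could be parsed follows; the variable names shown (`KAFKA_BOOTSTRAP_SERVERS`, `REDIS_HOST`) are illustrative assumptions — see `.env.example` for the project's actual settings.

```python
# Sketch: parse KEY=value pairs from a .env-style file (stdlib only).
# Variable names below are illustrative assumptions.
import os
import tempfile
from pathlib import Path


def load_env(path: str) -> dict[str, str]:
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Demonstrate with an inline sample instead of a real .env file:
sample = "KAFKA_BOOTSTRAP_SERVERS=localhost:9092\n# comment\nREDIS_HOST=localhost\n"
sample_path = os.path.join(tempfile.gettempdir(), "sample.env")
Path(sample_path).write_text(sample)
config = load_env(sample_path)
print(config)  # {'KAFKA_BOOTSTRAP_SERVERS': 'localhost:9092', 'REDIS_HOST': 'localhost'}
```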
Start all services:
docker compose up -d
Generate synthetic taxi data:
uv run python -m src.data_generator.generator
Publish events to Kafka:
uv run python -m src.kafka_producer.producer
Start Spark Streaming job:
spark-submit \
--master local[*] \
--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 \
src/spark_streaming/streaming_job.py
Launch FastAPI server:
uv run uvicorn src.web_ui.app:app --host 0.0.0.0 --port 8000
Access Web UI:
http://localhost:8000
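The events flowing through the steps above look roughly like the sketch below. Field names here are illustrative assumptions, not the project's actual schema; the real generator uses Faker and geopy for realistic movement patterns, while this stdlib-only version just emits random points in an example bounding box.

```python
# Sketch of a synthetic GPS event such as the data generator might emit and
# the Kafka producer publish. Field names are assumptions for illustration.
import json
import random
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class TaxiEvent:
    taxi_id: str
    latitude: float
    longitude: float
    speed_kmh: float
    timestamp: str


def make_event(taxi_id: str) -> TaxiEvent:
    # Random point inside an example (Manhattan-ish) bounding box.
    return TaxiEvent(
        taxi_id=taxi_id,
        latitude=round(random.uniform(40.70, 40.80), 6),
        longitude=round(random.uniform(-74.02, -73.93), 6),
        speed_kmh=round(random.uniform(0.0, 80.0), 1),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


payload = json.dumps(asdict(make_event("taxi-0001")))
print(payload)  # JSON string ready to publish to a Kafka topic
```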
Health Check:
curl http://localhost:8000/health
Get All Taxis:
curl http://localhost:8000/api/taxis
Get Taxi Location:
curl http://localhost:8000/api/taxis/{taxi_id}
Get Trip History:
curl "http://localhost:8000/api/trips?start_date=2024-01-01&end_date=2024-01-31"
WebSocket Live Updates:
const ws = new WebSocket("ws://localhost:8000/ws/taxis");
ws.onmessage = (event) => console.log(JSON.parse(event.data));
Production deployment:
docker compose -f docker-compose.yml up -d
View logs:
docker compose logs -f spark-streaming
Scale services:
docker compose up -d --scale spark-worker=3
Apply manifests:
kubectl apply -f kubernetes/
Check deployment status:
kubectl get pods -n navflux
Access FastAPI service:
kubectl port-forward svc/navflux-web-ui 8000:8000 -n navflux
Management scripts:
# Deploy all services
./scripts/k8s-manager.sh deploy-all
# Access web UI
./scripts/k8s-manager.sh access-ui
# View Spark logs
./scripts/k8s-manager.sh logs-spark
Navflux/
├── src/
│ ├── data_generator/ # Synthetic GPS data generation
│ ├── kafka_producer/ # Event streaming to Kafka
│ ├── spark_streaming/ # Real-time processing jobs
│ ├── hbase_connector/ # Historical storage client
│ ├── redis_cache/ # Hot cache management
│ └── web_ui/ # FastAPI web gateway
├── tests/ # pytest test suite
├── config/ # Configuration management
├── docker/ # Dockerfiles (multi-stage builds)
├── kubernetes/ # K8s manifests
├── scripts/ # Deployment and management scripts
├── data/ # Data directories (checkpoints, raw, processed)
├── logs/ # Application logs
├── docker-compose.yml # Docker Compose configuration
├── pyproject.toml # Python dependencies (uv)
└── README.md # This file
Contributions are welcome! Please follow these guidelines:
- Fork the Project
- Create your Feature Branch
git checkout -b feature/AmazingFeature
- Make your changes following code standards:
- PEP 8 compliance
- Type hints for all functions
- reStructuredText docstrings
- Run tests and quality checks:
uv run ruff check src/ tests/
uv run pytest --cov
snyk code test
- Commit your Changes
git commit -m 'feat: Add AmazingFeature'
- Push to the Branch
git push origin feature/AmazingFeature
- Open a Pull Request
Distributed under the Apache 2.0 License. See LICENSE for more information.
Vijeth P H - @vijethph
Project Link: https://github.com/vijethph/Navflux
- Apache Kafka - Distributed event streaming platform
- Apache Spark - Unified analytics engine
- Apache HBase - Distributed, scalable big data store
- Redis - In-memory data structure store
- FastAPI - Modern web framework for Python
- Faker - Synthetic data generation library
- structlog - Structured logging for Python
- Best-README-Template - README template

