This folder contains a lightweight data pipeline implemented using Bash and Cron. The pipeline performs the following tasks:
- Export partitions older than 21 days from QuestDB.
- Convert the partitions to Parquet format.
- Upload the Parquet files to S3.
- Delete the local partition folder after upload.
This approach provides a minimal way to schedule ETL jobs without requiring a full-fledged orchestration tool like Airflow or Dagster. Note that this workflow provides neither error handling nor backfilling, so it is less robust than a proper orchestrator.
To run this script, ensure that you have the following installed:
- Linux or macOS (or Windows WSL)
- QuestDB running locally (http://localhost:9000), or change the script to point to your installation
- AWS CLI configured (aws configure)
- Cron (for scheduling)
git clone https://github.com/questdb/data-orchestration-and-scheduling-samples.git
cd data-orchestration-and-scheduling-samples/bash
Edit the script drop_partitions_older_than_21_days.sh to match your setup:
ROOT_DIRECTORY="/path/to/questdb/db"
TABLE_NAME="YOUR_TABLE"
S3_BUCKET="YOUR-BUCKET"
S3_KEY_PREFIX="questdb-archive/your-table"
Manually execute the script:
bash drop_partitions_older_than_21_days.sh
If everything is configured correctly, this will:
- Convert and detach partitions older than 21 days in QuestDB.
- Compress the generated Parquet files.
- Upload them to S3.
- Delete the local partition folder.
To schedule the script to run daily at midnight, add the following line to your crontab:
0 0 * * * /bin/bash /path/to/drop_partitions_older_than_21_days.sh >> /var/log/questdb_partition_cleanup.log 2>&1
To edit your crontab:
crontab -e
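If you prefer to install the entry without opening an editor, the following is a minimal sketch of one way to do it. The helper name with_cron_entry is hypothetical and not part of the repository; it only filters text, so it can be tried safely before piping its output back into crontab.

```shell
# Hypothetical helper (not part of the repository script).
# Reads the current crontab text from stdin and prints it back,
# appending the given entry only if an identical line is not already present.
with_cron_entry() {
  local entry="$1" current
  current="$(cat)"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  # -F: fixed string (so "*" is literal), -x: match the whole line, -q: quiet
  if ! printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}
```

Assuming your cron implementation accepts a crontab on standard input (crontab -), usage could look like:
crontab -l 2>/dev/null | with_cron_entry "0 0 * * * /bin/bash /path/to/drop_partitions_older_than_21_days.sh >> /var/log/questdb_partition_cleanup.log 2>&1" | crontab -
Because the helper skips an entry that already exists, running it twice should not create a duplicate line.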
This script demonstrates how to integrate with QuestDB:
The script interacts with QuestDB via its HTTP REST API at http://localhost:9000/exec.
Example query execution:
curl -G "http://localhost:9000/exec" --data-urlencode "query=ALTER TABLE $TABLE_NAME CONVERT PARTITION TO PARQUET WHERE ts < '$OLDER_THAN_DATE'"
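The request above relies on TABLE_NAME and OLDER_THAN_DATE being set. As a rough sketch of how the cutoff date and the statement could be assembled, the helpers below are hypothetical (cutoff_date, build_query, run_query and the QUESTDB_URL variable are illustrative names, not taken from the repository script), and they assume the timestamp column is named ts as in the example.

```shell
# Hypothetical helpers; names are illustrative, not taken from the repository script.

# Print the UTC date N days ago as YYYY-MM-DD.
# GNU date (Linux, WSL) and BSD date (macOS) use different flags, so detect which one is present.
cutoff_date() {
  local days="$1"
  if date --version >/dev/null 2>&1; then
    date -u -d "${days} days ago" +%Y-%m-%d    # GNU date
  else
    date -u -v-"${days}"d +%Y-%m-%d            # BSD date
  fi
}

# Build the ALTER TABLE statement for a table name and a cutoff date.
# Assumes the timestamp column is called "ts".
build_query() {
  local table="$1" older_than="$2"
  printf "ALTER TABLE %s CONVERT PARTITION TO PARQUET WHERE ts < '%s'" "$table" "$older_than"
}

# Send a statement to the /exec endpoint.
# --fail makes curl exit non-zero when the server answers with an HTTP error status.
run_query() {
  local query="$1"
  curl --silent --show-error --fail -G "${QUESTDB_URL:-http://localhost:9000}/exec" \
    --data-urlencode "query=${query}"
}
```

With these in place, a call might look like:
run_query "$(build_query "$TABLE_NAME" "$(cutoff_date 21)")"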
The script accesses QuestDB’s local data directory (ROOT_DIRECTORY) to directly manipulate detached partition folders.
Example directory search:
find "$ROOT_DIRECTORY" -type d -name "*.detached"
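To illustrate how the directory search could feed the upload and cleanup steps, here is a minimal sketch. It is not the repository's implementation: the function name archive_detached_partitions and the S3 key layout (prefix followed by the partition folder name) are assumptions, and the compression step is left out. A folder is removed locally only when its upload succeeded.

```shell
# Hypothetical sketch (not the repository script): upload every detached
# partition folder under a root directory to S3, then delete it locally
# only if the upload command returned success.
archive_detached_partitions() {
  local root="$1" bucket="$2" prefix="$3" dir name
  # -prune stops find from descending into a matched folder, which may
  # already have been deleted by the time find would look inside it.
  find "$root" -type d -name "*.detached" -prune | while IFS= read -r dir; do
    name="$(basename "$dir")"
    # Assumed key layout: s3://<bucket>/<prefix>/<partition folder name>/
    if aws s3 cp --recursive "$dir" "s3://${bucket}/${prefix}/${name}/"; then
      rm -rf "$dir"
    else
      echo "upload failed, keeping ${dir}" >&2
    fi
  done
}
```

Using the variables from the configuration step, a call might look like:
archive_detached_partitions "$ROOT_DIRECTORY" "$S3_BUCKET" "$S3_KEY_PREFIX"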
- Enhance logging and error handling.
- Extend the script to perform additional data processing.
- Use a proper orchestrator like the Dagster and Airflow examples provided.
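As a starting point for the first item, the sketch below shows one possible shape for logging and error handling. The helpers log and run_step are hypothetical names and are not part of the repository script.

```shell
# Hypothetical helpers for the logging and error-handling improvement.

# Write a UTC-timestamped message to stderr.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&2
}

# Run a command, log whether it succeeded, and return its exit status
# so the caller can decide whether to continue.
run_step() {
  local name="$1" status
  shift
  log "START ${name}"
  if "$@"; then
    status=0
    log "OK ${name}"
  else
    status=$?
    log "FAIL ${name} (exit ${status})"
  fi
  return "$status"
}
```

Each stage of the pipeline could then be wrapped so that a failure stops the run before anything is deleted, for example (with dir standing in for a detached partition folder):
run_step "upload" aws s3 cp --recursive "$dir" "s3://$S3_BUCKET/$S3_KEY_PREFIX/" || exit 1
Since the cron entry already redirects stderr to the log file, these messages would end up in /var/log/questdb_partition_cleanup.log.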