Compacts Parquet files in an S3 location using an AWS Glue job. This repo accompanies the Medium post that I wrote here.
After cloning the repo, follow the steps below.
- Review the CloudFormation stack parameters under Job Parameters of the manager.sh file.
- Create the CloudFormation stack.
./manager.sh create-stack
- Run the compaction job.
./manager.sh run-compaction s3://PATH_WITH_MULTIPLE_FILES s3://READ_OPTIMIZED_STORAGE_PATH
Note: The S3 paths shouldn't end with a trailing slash.
- After changing any CloudFormation stack parameters, update the stack using the command below.
./manager.sh update-stack
- Delete the CloudFormation stack.
./manager.sh delete-stack
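
Since the compaction step expects S3 paths without a trailing slash, it can help to normalize the paths before passing them to manager.sh. The snippet below is a minimal sketch of that idea; `strip_trailing_slash` is a hypothetical helper that is not part of this repo.

```shell
# Hypothetical helper (not part of manager.sh): remove every trailing
# slash from a path so it is safe to pass to run-compaction.
strip_trailing_slash() {
  p=$1
  # Keep removing one trailing "/" until none is left.
  while [ "${p%/}" != "$p" ]; do
    p=${p%/}
  done
  printf '%s\n' "$p"
}
```

Assuming the source and target paths are held in shell variables named SRC and DST (also hypothetical), the compaction job could then be started with `./manager.sh run-compaction "$(strip_trailing_slash "$SRC")" "$(strip_trailing_slash "$DST")"`.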