The Data Direct service enables the sharing of comma-separated values (CSV) files. The objective of this assignment is to enhance the service by implementing a sanitization pipeline that removes personally identifiable information (PII).
Implement a sanitization pipeline that removes email addresses.
- Docker runtime environment (e.g., Rancher Desktop or Docker Desktop)
- Docker Compose version 2
You are provided with two S3 buckets: `input` and `output`. The `input` bucket stores uploaded files, while the `output` bucket is for sanitized files (objects).
The sanitization pipeline should scan each object in the `input` bucket. If an email address is detected, the object should be moved to the `blocked` prefix within the `input` bucket. Otherwise, if no PII is detected, the object should be moved to the `output` bucket.
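The routing rule above can be sketched as a small helper. This is a minimal illustration, not a reference solution: the function names, the `blocked/` key layout (prefix with a trailing slash), and the `s3` parameter are assumptions. The `s3` argument is expected to behave like a boto3 S3 client, of which only `copy_object` and `delete_object` are used. S3 has no native move operation, so a move is expressed as a copy followed by a delete.

```python
from typing import Any, Tuple

INPUT_BUCKET = "input"
OUTPUT_BUCKET = "output"
BLOCKED_PREFIX = "blocked/"  # assumption: the prefix is stored with a trailing slash


def destination_for(key: str, has_email: bool) -> Tuple[str, str]:
    """Return the (bucket, key) an object from the input bucket should move to."""
    if has_email:
        # Blocked objects stay in the input bucket, under the blocked prefix.
        return INPUT_BUCKET, BLOCKED_PREFIX + key
    return OUTPUT_BUCKET, key


def move_object(s3: Any, key: str, has_email: bool) -> Tuple[str, str]:
    """Move one object out of the input bucket: copy it, then delete the source."""
    dest_bucket, dest_key = destination_for(key, has_email)
    s3.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": INPUT_BUCKET, "Key": key},
    )
    s3.delete_object(Bucket=INPUT_BUCKET, Key=key)
    return dest_bucket, dest_key
```

Because blocked objects remain in the `input` bucket, a scanning loop built on this sketch would likely need to skip keys that already start with the blocked prefix so they are not processed twice.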
- Each email address occupies its own dedicated value/column in the CSV
- No uploaded file is larger than 64 MB
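Given these two assumptions, detection can be sketched as a cell-by-cell check over the parsed CSV. The following is one possible approach, not a prescribed one: the function names and the regular expression are hypothetical, and the pattern is a deliberately loose heuristic rather than a full RFC 5322 validator. Since an email address is assumed to fill an entire cell, the whole cell is matched instead of searching inside free text. Since files are capped at 64 MB, decoding the object body in memory is assumed to be acceptable.

```python
import csv
import io
import re
from typing import Iterable

# Heuristic pattern (an assumption): something@something.something, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def contains_email(lines: Iterable[str]) -> bool:
    """Return True as soon as any CSV cell consists entirely of an email address."""
    for row in csv.reader(lines):
        for cell in row:
            if EMAIL_RE.fullmatch(cell.strip()):
                return True
    return False


def object_has_email(body: bytes, encoding: str = "utf-8") -> bool:
    """Decode an object body held in memory and scan it for email cells."""
    return contains_email(io.StringIO(body.decode(encoding), newline=""))
```

For example, `object_has_email(b"name,email\nAda,ada@example.com\n")` reports a match, while a cell that merely mentions an address inside a longer sentence does not, which follows from the dedicated-column assumption.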
- Store your implementation in the `my-solution` Git branch
- Use the `src` directory for the implementation
- All status checks must pass before submitting the assignment
- You may add your own workflows but do not modify existing steps
- Do not modify the `tests` directory
- We leverage LocalStack; see LocalStack AWS services supportability for more information