This project implements a secure file upload system using a combination of AWS services, orchestrated with Terraform and AWS SAM. Users can upload files via a REST API, which are then stored in S3, with metadata recorded in DynamoDB. A subsequent processing workflow is triggered asynchronously using DynamoDB Streams, EventBridge Pipes, and Step Functions.
The system is composed of two main parts: the core infrastructure managed by Terraform, and the serverless application managed by AWS SAM.
- A client authenticates with Amazon Cognito to get a JWT token.
- The client sends a `POST` request to an API Gateway endpoint (`/upload`) with the JWT token and a JSON payload containing the file data (Base64-encoded).
- The Cognito Authorizer validates the token.
- API Gateway invokes the File Upload Lambda function.
- The Lambda function decodes the file, saves it to an S3 Bucket, and registers metadata (file path, user ID, etc.) in a DynamoDB Table (a rough sketch of this handler follows this list).
- The `INSERT` event in DynamoDB creates a record in its DynamoDB Stream.
- An EventBridge Pipe polls the stream for new records.
- The Pipe filters for `INSERT` events and invokes a Step Functions state machine.
- The Step Functions workflow invokes a second Processing Lambda function with the event data for downstream tasks.
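The actual handler lives in the SAM application; purely to illustrate the upload step, a minimal FastAPI sketch might look like the following. The environment variable names, metadata attributes, and S3 key layout are assumptions for this sketch, not the project's actual code.

```python
# Hypothetical sketch of the upload handler (not the project's actual code).
# Assumes BUCKET_NAME and TABLE_NAME are provided as environment variables.
import base64
import os
import uuid
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])


class UploadRequest(BaseModel):
    file_name: str
    comment: str = ""
    file_data: str  # Base64-encoded file content


@app.post("/upload")
def upload(body: UploadRequest):
    # Decode the Base64 payload and store the raw bytes in S3.
    file_bytes = base64.b64decode(body.file_data)
    s3_key = f"uploads/{uuid.uuid4()}/{body.file_name}"
    s3.put_object(Bucket=os.environ["BUCKET_NAME"], Key=s3_key, Body=file_bytes)

    # Record metadata in DynamoDB; this INSERT lands on the table's stream,
    # which is what kicks off the Pipe / Step Functions processing.
    # (Extracting the user ID from the validated JWT claims is omitted here.)
    table.put_item(
        Item={
            "id": str(uuid.uuid4()),
            "file_name": body.file_name,
            "comment": body.comment,
            "s3_path": f"s3://{os.environ['BUCKET_NAME']}/{s3_key}",
            "uploaded_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return {"message": "File uploaded successfully.", "s3_path": s3_key}
```

Run behind the Lambda Web Adapter (for example with uvicorn as the container entrypoint), a FastAPI app along these lines can serve the HTTP API route without Lambda-specific handler code.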
- Infrastructure as Code:
- Terraform: For core infrastructure (S3, DynamoDB, Cognito, ECR, Pipes, SFN, IAM Roles).
- AWS SAM: For the serverless application (API Gateway, Lambda Functions).
- Compute:
- AWS Lambda: Two Python functions running in containers.
- Lambda Web Adapter: To run a FastAPI application on Lambda.
- Application Backend:
- FastAPI: A modern, fast Python web framework for the upload API.
- Storage & Database:
- Amazon S3: For durable file storage.
- Amazon DynamoDB: For storing file metadata.
- Authentication & Authorization:
- Amazon Cognito: For user authentication and API authorization.
- Integration & Orchestration:
- Amazon API Gateway (HTTP API): To expose the REST endpoint.
- Amazon DynamoDB Streams: To capture data modification events.
- Amazon EventBridge Pipes: To connect the stream to the workflow.
- AWS Step Functions: To orchestrate the post-upload processing (a sketch of the processing Lambda it invokes follows this list).
- CI/CD:
- GitHub Actions: For automated code quality checks and deployment.
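As a rough sketch of that post-upload processing function (the second Lambda), the handler might look like this. The event shape assumed here, a batch of DynamoDB stream records passed through unchanged by the Pipe and state machine, and the attribute names are assumptions, not the project's actual code.

```python
# Hypothetical sketch of the processing Lambda (not the project's actual code).
# Assumes the state machine passes the batch of DynamoDB stream records through
# unchanged, so each record carries the new item under dynamodb.NewImage in
# DynamoDB JSON form ({"S": "..."} etc.).
import json


def handler(event, context):
    # EventBridge Pipes delivers stream records as a list; the Pipe's filter
    # has already limited these to INSERT events.
    records = event if isinstance(event, list) else [event]

    for record in records:
        new_image = record.get("dynamodb", {}).get("NewImage", {})
        s3_path = new_image.get("s3_path", {}).get("S")
        file_name = new_image.get("file_name", {}).get("S")

        # Downstream work (scanning, thumbnailing, indexing, ...) would go here.
        print(json.dumps({"processing": file_name, "s3_path": s3_path}))

    return {"processed": len(records)}
```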
- AWS Account & AWS CLI configured
- Terraform CLI
- AWS SAM CLI
- Docker
- A GitHub repository with this code.
- **Navigate to the infrastructure directory:**
  ```bash
  cd infra
  ```
- **Initialize Terraform:**
  ```bash
  terraform init
  ```
- **Deploy the resources:** Review the plan and apply it. Enter `yes` when prompted.
  ```bash
  terraform apply
  ```
- **Get the outputs:** After the deployment is complete, Terraform will output several values, which are needed for the next step. You can also retrieve them at any time with:
  ```bash
  terraform output
  ```
  You will get values for `s3_bucket_name`, `dynamodb_table_name`, `cognito_user_pool_id`, and `cognito_app_client_id` (a small helper for reading these programmatically follows this list).
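If you want to capture these outputs programmatically, for example to feed the repository secrets in the next step, a small helper along these lines works. It assumes only the standard `terraform output -json` command and the output names listed above.

```python
# Helper sketch: read the Terraform outputs as a dict (run from the infra/ directory).
import json
import subprocess


def terraform_outputs() -> dict:
    # `terraform output -json` prints {"name": {"value": ..., ...}, ...}
    raw = subprocess.run(
        ["terraform", "output", "-json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return {name: data["value"] for name, data in json.loads(raw).items()}


if __name__ == "__main__":
    outputs = terraform_outputs()
    print(outputs["s3_bucket_name"], outputs["cognito_user_pool_id"])
```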
The CD workflow (`.github/workflows/cd.yml`) requires several secrets to be set in your GitHub repository's settings.
- Go to `Settings > Secrets and variables > Actions` in your GitHub repo.
- Create an IAM Role in your AWS account that GitHub Actions can assume. It needs permissions to deploy SAM applications (e.g., `AdministratorAccess` for simplicity, but a more restricted policy is recommended for production). The trust policy should allow the GitHub OIDC provider (a scripted sketch of this role follows this list).
- Add the following repository secrets:
  - `AWS_ROLE_TO_ASSUME`: The ARN of the IAM role you created for deployment. This role needs permissions to deploy SAM applications and also the `states:UpdateStateMachine` permission.
  - `AWS_REGION`: The AWS region where you deployed the resources (e.g., `ap-northeast-1`).
  - `PROJECT_NAME`: The project name used in `variables.tf` (default: `s3-dynamo-pipe-app`).
  - `S3_BUCKET_NAME`: The `s3_bucket_name` value from the Terraform output.
  - `DYNAMODB_TABLE_NAME`: The `dynamodb_table_name` value from the Terraform output.
  - `COGNITO_USER_POOL_ID`: The `cognito_user_pool_id` value from the Terraform output.
  - `COGNITO_APP_CLIENT_ID`: The `cognito_app_client_id` value from the Terraform output.
  - `STATE_MACHINE_ARN`: The `sfn_state_machine_arn` value from the Terraform output.
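If you prefer to create the deployment role with a script rather than in the console, a boto3 sketch is shown below. The role name, account ID, and repository path are placeholders, it assumes the GitHub OIDC provider is already registered in IAM, and attaching `AdministratorAccess` mirrors the "simple" option above; scope it down for production.

```python
# Hypothetical sketch: create the GitHub Actions deployment role with an OIDC trust policy.
# Replace ACCOUNT_ID and GITHUB_REPO with your own values.
import json

import boto3

ACCOUNT_ID = "123456789012"          # placeholder
GITHUB_REPO = "your-org/your-repo"   # placeholder: owner/repo allowed to assume the role

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
                },
                "StringLike": {
                    "token.actions.githubusercontent.com:sub": f"repo:{GITHUB_REPO}:*"
                },
            },
        }
    ],
}

iam = boto3.client("iam")
role = iam.create_role(
    RoleName="github-actions-sam-deploy",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Broad for simplicity, as noted above; use a restricted policy for production.
iam.attach_role_policy(
    RoleName="github-actions-sam-deploy",
    PolicyArn="arn:aws:iam::aws:policy/AdministratorAccess",
)
print(role["Role"]["Arn"])  # use this value for AWS_ROLE_TO_ASSUME
```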
Pushing your code to the main branch will automatically trigger the GitHub Actions workflow.
- Commit and push all the code to your `main` branch.
  ```bash
  git add .
  git commit -m "Initial project setup"
  git push origin main
  ```
- Go to the "Actions" tab in your GitHub repository to monitor the
Deploy SAM Applicationworkflow. It will build the container images, push them to ECR, and deploy the SAM stack.
- **Create a user:** Since user registration is not enabled in the app, create a user manually in the AWS Console. Go to your Cognito User Pool and, under the "Users" tab, create a new user. Make sure to set a temporary password.
- **Confirm the user:** When you first sign in, you will be required to change the password. Use the AWS CLI to do this.
  ```bash
  aws cognito-idp admin-initiate-auth \
    --user-pool-id <YOUR_COGNITO_USER_POOL_ID> \
    --client-id <YOUR_COGNITO_APP_CLIENT_ID> \
    --auth-flow ADMIN_USER_PASSWORD_AUTH \
    --auth-parameters USERNAME=<username>,PASSWORD=<temporary_password>
  ```
  You will receive a `ChallengeName: 'NEW_PASSWORD_REQUIRED'` response.
- **Set the final password:**
  ```bash
  aws cognito-idp admin-respond-to-auth-challenge \
    --user-pool-id <YOUR_COGNITO_USER_POOL_ID> \
    --client-id <YOUR_COGNITO_APP_CLIENT_ID> \
    --challenge-name NEW_PASSWORD_REQUIRED \
    --challenge-responses USERNAME=<username>,NEW_PASSWORD=<your_new_strong_password> \
    --session <session_from_previous_command>
  ```
  This command will return an `IdToken`.
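If you prefer to script this sign-in flow, the same two steps can be driven with boto3. This is a sketch using the same placeholder values as the CLI commands above.

```python
# Hypothetical sketch of the same sign-in flow with boto3 (values are placeholders).
import boto3

cognito = boto3.client("cognito-idp")

USER_POOL_ID = "<YOUR_COGNITO_USER_POOL_ID>"
CLIENT_ID = "<YOUR_COGNITO_APP_CLIENT_ID>"

# Step 1: initiate auth with the temporary password.
resp = cognito.admin_initiate_auth(
    UserPoolId=USER_POOL_ID,
    ClientId=CLIENT_ID,
    AuthFlow="ADMIN_USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "<username>", "PASSWORD": "<temporary_password>"},
)

# Step 2: answer the NEW_PASSWORD_REQUIRED challenge to set the final password.
if resp.get("ChallengeName") == "NEW_PASSWORD_REQUIRED":
    resp = cognito.admin_respond_to_auth_challenge(
        UserPoolId=USER_POOL_ID,
        ClientId=CLIENT_ID,
        ChallengeName="NEW_PASSWORD_REQUIRED",
        ChallengeResponses={
            "USERNAME": "<username>",
            "NEW_PASSWORD": "<your_new_strong_password>",
        },
        Session=resp["Session"],
    )

id_token = resp["AuthenticationResult"]["IdToken"]
print(id_token)
```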
Use the `IdToken` from the previous step to call the `/upload` endpoint.
- **Encode a file in Base64:** On macOS or Linux:
  ```bash
  BASE64_CONTENT=$(base64 -i my-test-file.txt)
  ```
- **Send the request:** Replace the placeholders with your actual values.
  ```bash
  API_ENDPOINT=$(aws cloudformation describe-stacks \
    --stack-name s3-dynamo-pipe-app-sam-app \
    --query "Stacks[0].Outputs[?OutputKey=='ApiEndpoint'].OutputValue" \
    --output text)
  ID_TOKEN="..." # Paste the IdToken here

  curl -X POST "${API_ENDPOINT}/upload" \
    -H "Authorization: ${ID_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{
      "file_name": "my-test-file.txt",
      "comment": "This is a test file.",
      "file_data": "'"${BASE64_CONTENT}"'"
    }'
  ```
If successful, you will receive a `{"message":"File uploaded successfully.","s3_path":"..."}` response. You can then check your S3 bucket and DynamoDB table to verify the results. The processing Lambda's logs in CloudWatch will show the event being processed.
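For scripted testing, the same request can be sent from Python. This sketch mirrors the curl call above; the endpoint URL and token are placeholders you must fill in from the previous steps.

```python
# Hypothetical sketch: the same upload request from Python instead of curl.
import base64

import requests  # third-party: pip install requests

API_ENDPOINT = "https://<api-id>.execute-api.<region>.amazonaws.com"  # ApiEndpoint stack output
ID_TOKEN = "..."  # the IdToken from the Cognito step

with open("my-test-file.txt", "rb") as f:
    file_data = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{API_ENDPOINT}/upload",
    headers={"Authorization": ID_TOKEN, "Content-Type": "application/json"},
    json={
        "file_name": "my-test-file.txt",
        "comment": "This is a test file.",
        "file_data": file_data,
    },
)
print(resp.status_code, resp.json())
```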