IDP Invoice Automation using Amazon Bedrock Data Automation

This project implements an Intelligent Document Processing (IDP) solution for invoice automation using Amazon Bedrock Data Automation.

The system processes invoices uploaded to an S3 bucket, extracts relevant information using Bedrock Data Automation, and provides annotated results. It leverages AWS services such as Lambda, S3, SQS, and EventBridge to create a scalable and efficient document processing pipeline.

Usage Instructions

Installation

Prerequisites:

Python 3.12
AWS CDK CLI
AWS CLI configured with appropriate credentials

Steps:

Clone the repository
Navigate to the project directory
Create a virtual environment:
```
python -m venv .venv
```
Activate the virtual environment:
- On Windows: .venv\Scripts\activate.bat
- On Unix or MacOS: source .venv/bin/activate
Install dependencies:
```
pip install -r requirements.txt
```
Synthesize the CloudFormation template for this project:
```
cdk synth
```

Deployment

Configure the project by editing project_config.json
Deploy the stack:
```
cdk deploy
```

Processing Invoices

Upload the sample invoice documents from the sample_invoices folder to the invoices/ directory of the S3 Input Bucket created by the stack
The system will automatically process the invoices using Bedrock Data Automation
Annotated results and extracted data will be available in the output S3 bucket
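The upload step can also be scripted. The following is a minimal sketch, not part of the project: `upload_invoices` is a hypothetical helper, and the bucket name must be taken from your own stack outputs. It only relies on the standard boto3 S3 client method `upload_file`.

```python
from pathlib import Path


def upload_invoices(s3_client, bucket: str, folder: str = "sample_invoices",
                    prefix: str = "invoices/") -> list[str]:
    """Upload every file in `folder` under `prefix` and return the keys used."""
    keys = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        key = f"{prefix}{path.name}"
        # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

With AWS credentials configured, a call such as `upload_invoices(boto3.client("s3"), "<input-bucket-name>")` would place each sample file under invoices/, which is the prefix that triggers processing.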

Data Flow

Invoice Upload:
- A user uploads an invoice file (PDF or other format) to the Input S3 Bucket (invoices/).
S3 Event Notification:
- This triggers the process_input_files Lambda function.
- The Lambda function processes the uploaded file and sends an event message to the InvoicesBDA Queue (SQS).
Message Handling:
- The SQS queue delivers the message to the process_invoices_bda Lambda function, which starts the invoice processing workflow.
Invoice Processing Workflow:
- The process_invoices_bda Lambda performs the following actions:
  - Convert PDF to PNG: The invoice is converted into PNG format and stored in the Staging Bucket (/staging_bda).
  - Retrieve Invoices Blueprint ARN: It fetches the ARN of the blueprint required for data automation from the Systems Manager Parameter Store (/my-demo/invoices_blueprint).
  - Invoke Bedrock Data Automation: The Lambda function invokes Amazon Bedrock Data Automation as an asynchronous job to process the invoice.
EventBridge Notification:
- Once the Bedrock Data Automation job is completed, it emits an event notification via Amazon EventBridge.
Job Completion Rule:
- A predefined Job Completion Rule in EventBridge triggers the draw_bboxes_invoices Lambda function.
Annotating Results:
- The draw_bboxes_invoices Lambda:
  - Annotates the processed invoice with bounding boxes (for detected fields or objects) and confidence scores.
  - Saves the annotated results to the Output S3 Bucket with the following paths:
    - Extracted Data: /bda_json/invoices/invoice1.json
    - Annotated Image: /bda_bbox_img/invoices/invoice1.png
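The first two stages of this flow (S3 event notification, then a message on the SQS queue) can be sketched as below. This is an illustration under assumptions, not the code of process_input_files: the function names and the message shape (`bucket` and `key` fields) are hypothetical. The S3 event structure and the `send_message` call are the standard ones.

```python
import json
from urllib.parse import unquote_plus


def build_messages(event: dict, prefix: str = "invoices/") -> list[dict]:
    """Turn an S3 event notification into one message per uploaded invoice."""
    messages = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        # Ignore objects outside the prefix and "folder" placeholder keys.
        if key.startswith(prefix) and not key.endswith("/"):
            messages.append({"bucket": bucket, "key": key})
    return messages


def forward_to_queue(event: dict, sqs_client, queue_url: str) -> int:
    """Send each message to SQS and return how many were sent."""
    messages = build_messages(event)
    for message in messages:
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(message))
    return len(messages)
```

Keeping the event parsing separate from the SQS call makes the parsing logic easy to unit test without AWS access.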

Architecture Diagram

CDK Custom Resource

The first diagram illustrates the CDK custom resource which triggers a Lambda function to create the BDA blueprint during the initial setup. The Lambda function:

Checks if a blueprint exists for the document type (e.g., invoices).
If not, creates the blueprint dynamically for demonstration purposes.
Stores the created blueprint's ARN in AWS Systems Manager Parameter Store for downstream usage.
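The three steps above can be sketched as a single function. This is a hedged illustration rather than the code of create_blueprint_cr: `ensure_blueprint` is a hypothetical name, and the call parameters reflect my understanding of the boto3 `bedrock-data-automation` client (`list_blueprints`, `create_blueprint`) and the SSM client (`put_parameter`), so check them against the current boto3 documentation. The parameter name is the one mentioned in the Data Flow section.

```python
def ensure_blueprint(bda_client, ssm_client, name: str, schema_json: str,
                     parameter_name: str = "/my-demo/invoices_blueprint") -> str:
    """Return the ARN of the blueprint called `name`, creating it if needed."""
    arn = None
    # Only the first page of results is inspected here; a real handler
    # would follow `nextToken` until the list is exhausted.
    listing = bda_client.list_blueprints(resourceOwner="ACCOUNT")
    for item in listing.get("blueprints", []):
        if item.get("blueprintName") == name:
            arn = item["blueprintArn"]
            break
    if arn is None:
        created = bda_client.create_blueprint(
            blueprintName=name, type="DOCUMENT", schema=schema_json
        )
        arn = created["blueprint"]["blueprintArn"]
    # Store the ARN so downstream Lambda functions can look it up.
    ssm_client.put_parameter(Name=parameter_name, Value=arn,
                             Type="String", Overwrite=True)
    return arn
```

The clients are passed in as arguments, for example `boto3.client("bedrock-data-automation")` and `boto3.client("ssm")`, which keeps the function testable with stand-in objects.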

Invoice Processing Pipeline

This architecture automates invoice processing using Amazon Bedrock Data Automation (BDA), integrating AWS services for seamless handling, processing, and annotation of uploaded invoices. It is designed to extract structured data from invoices and store annotated results in a structured format.

The system leverages multiple AWS services to automate the entire invoice processing pipeline. Below is a high-level overview of the architecture:

Input S3 Bucket: invoices/
- Receives uploaded invoices and triggers processing.
Lambda Functions:
- process_input_files: Handles S3 event notifications and sends data to the SQS queue.
- process_invoices_bda: Converts files, invokes Bedrock Data Automation, and manages staging.
- draw_bboxes_invoices: Annotates images with bounding boxes and saves results.
Amazon SQS:
- Used as a messaging layer between the initial processing Lambda and the Bedrock processing Lambda.
Amazon Bedrock Data Automation:
- Performs advanced AI-based data extraction from invoices using blueprints.
EventBridge:
- Captures completion events from Bedrock Data Automation and triggers post-processing.
Output S3 Bucket: /bda_json/ and /bda_bbox_img/
- Stores extracted data and annotated invoice images for access.
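To make the hand-off between process_invoices_bda, Parameter Store, and Bedrock Data Automation more concrete, here is a minimal sketch. It is an assumption-laden illustration, not the project's code: `start_invoice_job` and its arguments are hypothetical, and the request fields reflect my understanding of `invoke_data_automation_async` on the boto3 `bedrock-data-automation-runtime` client. Whether the project passes a data automation profile ARN is not stated in this README.

```python
def start_invoice_job(ssm_client, bda_runtime_client, input_uri: str,
                      output_uri: str, profile_arn: str,
                      parameter_name: str = "/my-demo/invoices_blueprint") -> str:
    """Look up the blueprint ARN and start an asynchronous BDA job."""
    blueprint_arn = ssm_client.get_parameter(Name=parameter_name)["Parameter"]["Value"]
    response = bda_runtime_client.invoke_data_automation_async(
        inputConfiguration={"s3Uri": input_uri},
        outputConfiguration={"s3Uri": output_uri},
        blueprints=[{"blueprintArn": blueprint_arn}],
        dataAutomationProfileArn=profile_arn,  # assumed to be required
        # Ask BDA to publish job status events to EventBridge, which is
        # what lets a rule trigger the annotation Lambda on completion.
        notificationConfiguration={
            "eventBridgeConfiguration": {"eventBridgeEnabled": True}
        },
    )
    return response["invocationArn"]
```

Because the job is asynchronous, the function returns as soon as the invocation ARN is available; the results are picked up later by the EventBridge-triggered step.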

The architecture is fully serverless and event-driven. Its modular design separates file conversion, blueprint-based extraction with Amazon Bedrock Data Automation, and annotation into independent steps. Uploaded invoices are processed automatically: structured data is extracted, images are annotated with bounding boxes and confidence scores, and the results are stored in S3. This minimizes manual intervention and makes it straightforward to integrate additional features or services.
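Annotating an image with bounding boxes requires mapping each detected field's box onto pixel coordinates. The sketch below assumes, without confirmation from this README, that each box is given as normalized `left`, `top`, `width`, and `height` values between 0 and 1; the function names are hypothetical.

```python
def to_pixel_box(bounding_box: dict, image_width: int,
                 image_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    left = bounding_box["left"] * image_width
    top = bounding_box["top"] * image_height
    right = left + bounding_box["width"] * image_width
    bottom = top + bounding_box["height"] * image_height
    return tuple(round(v) for v in (left, top, right, bottom))


def label_for(field_name: str, confidence: float) -> str:
    """Build the text drawn next to a box, e.g. the field name and its score."""
    return f"{field_name} ({confidence:.0%})"
```

The resulting four-tuple matches the `[x0, y0, x1, y1]` form accepted by Pillow's `ImageDraw.rectangle`, so it could be passed to that call directly when drawing on the PNG.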

Infrastructure

The project uses AWS CDK to define and deploy the following infrastructure:

VPC:
- Public and private subnets
- S3 VPC Endpoint
S3 Buckets:
- Access logs bucket
- Input bucket
- Staging bucket
- Output bucket
Lambda Functions:
- create_blueprint_cr
- draw_bboxes_invoices
- process_input_files
- process_invoices_bda
Lambda Layers:
- langchain-core-layer
- pypdfium2-layer
- pillow-layer
- boto3-layer
SQS Queues:
- InvoicesBDAQueue (with KMS encryption)
- InvoicesBDADLQ (Dead Letter Queue)
KMS Key for SQS encryption
EventBridge rule to trigger the downstream Lambda function
IAM roles and policies for Lambda functions and other resources

Baseline performance expectations:

Invoice processing time: ~30 seconds per page
End-to-end pipeline execution: <1 minute per invoice

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

