# 【ERNIE-4.5-21B-A3B】Invoice Processing Automation System Based on Locally Deployed PP-OCR and ERNIE-4.5-21B-A3B Models


## Selection of Wenxin Open-Source Models

This project focuses on the local deployment requirements of enterprise-level invoice processing scenarios. During the model selection phase, a comprehensive evaluation of the ERNIE-4.5 series models was conducted, and **ERNIE-4.5-21B-A3B-Paddle** was ultimately chosen as the core large model, combined with PP-OCR to achieve full-process local deployment.


### **ERNIE-4.5 Model Series Specification Comparison Table**

| Model Series | Model Name | Total Parameters | Activated Parameters | Modality Support | Context Length | Main Usage | Deployment Scenario |
|---------|---------|--------|---------|---------|-----------|---------|---------|
| **A47B Large Scale** | ERNIE-4.5-300B-A47B-Base | 300B | 47B | Text | 128K | Pre-training Base | Cloud GPU Cluster |
| | ERNIE-4.5-300B-A47B | 300B | 47B | Text | 128K | Instruction Following/Creative Generation | Cloud GPU Cluster |
| | ERNIE-4.5-VL-424B-A47B-Base | 424B | 47B | Text+Vision | 128K | Multimodal Pre-training | Cloud GPU Cluster |
| | ERNIE-4.5-VL-424B-A47B | 424B | 47B | Text+Vision | 128K | Image-Text Understanding/Generation | Cloud GPU Cluster |
| **A3B Medium Scale** | ERNIE-4.5-21B-A3B-Base | 21B | 3B | Text | 128K | Pre-training Base | Single-machine Multi-GPU |
| | ERNIE-4.5-21B-A3B | 21B | 3B | Text | 128K | Dialogue/Document Processing | Single-machine Multi-GPU |
| | ERNIE-4.5-VL-28B-A3B-Base | 28B | 3B | Text+Vision | 128K | Multimodal Pre-training | Single-machine Multi-GPU |
| | ERNIE-4.5-VL-28B-A3B | 28B | 3B | Text+Vision | 128K | Lightweight Multimodal Applications | Single-machine Multi-GPU |
| **0.3B Lightweight** | ERNIE-4.5-0.3B-Base | 0.3B | 0.3B | Text | 4K | End-side Pre-training | Mobile/Edge |
| | ERNIE-4.5-0.3B | 0.3B | 0.3B | Text | 4K | Real-time Dialogue | Mobile/Edge |


### **Model Specification Selection Strategy Table**

| Application Scenario | Recommended Model | Reason | Hardware Requirements | Inference Latency |
|---------|---------|------|---------|---------|
| **Complex Reasoning Tasks** | ERNIE-4.5-300B-A47B | Strongest reasoning capability | 8×A100(80GB) | High |
| **Creative Content Generation** | ERNIE-4.5-300B-A47B | Best creative performance | 8×A100(80GB) | High |
| **Multimodal Understanding** | ERNIE-4.5-VL-424B-A47B | Image-text fusion understanding | 8×A100(80GB) | High |
| **Daily Dialogue Customer Service** | ERNIE-4.5-21B-A3B | Balanced performance and cost | 4×V100(32GB) | Medium |
| **Document Information Extraction** | ERNIE-4.5-21B-A3B | Sufficient understanding capability | 4×V100(32GB) | Medium |
| **Lightweight Multimodal** | ERNIE-4.5-VL-28B-A3B | Balanced image-text processing | 4×V100(32GB) | Medium |
| **Mobile Applications** | ERNIE-4.5-0.3B | Low latency and fast response | 1×GPU/CPU | Low |
| **Edge Computing** | ERNIE-4.5-0.3B | Minimal resource consumption | CPU/NPU | Low |
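
As a rough sanity check on the hardware column, the memory needed just to hold the weights is about total parameters × bytes per parameter. In an MoE model every expert must stay resident, even though only the activated parameters are computed for each token. The sketch below is a back-of-the-envelope estimate that assumes BF16 weights (2 bytes per parameter) and ignores the KV cache, activations, and framework overhead, so real requirements are higher:

```python
def weight_memory_gb(total_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory (decimal GB) needed to hold all weights; excludes KV cache and activations."""
    return total_params * bytes_per_param / 1e9


def forward_flops_per_token(activated_params: float) -> float:
    """Rule of thumb: about 2 FLOPs per activated parameter per generated token."""
    return 2.0 * activated_params


if __name__ == "__main__":
    # (model, total params, activated params) taken from the specification table
    models = [
        ("ERNIE-4.5-300B-A47B", 300e9, 47e9),
        ("ERNIE-4.5-21B-A3B", 21e9, 3e9),
        ("ERNIE-4.5-0.3B", 0.3e9, 0.3e9),
    ]
    for name, total, active in models:
        print(f"{name}: ~{weight_memory_gb(total):.1f} GB weights (BF16), "
              f"~{forward_flops_per_token(active) / 1e9:.1f} GFLOPs/token")
```

For ERNIE-4.5-21B-A3B this gives roughly 42 GB of BF16 weights, more than a single 32 GB card holds unless the weights are quantized, while per-token compute is close to that of a 3B dense model.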


### Reasons for Selection
The core reasons for selecting **ERNIE-4.5-21B-A3B-Paddle** in this project:
1. **Performance Adaptation**: The scale of 21B total parameters/3B activated parameters can meet the accuracy requirements of professional tasks such as invoice information extraction and structured parsing, while avoiding the deployment complexity of ultra-large-scale models;
2. **Deployment Feasibility**: Supports single-machine multi-GPU deployment (4×V100 is sufficient), eliminating the need for GPU clusters and reducing the hardware threshold for enterprise local deployment;
3. **Context Capability**: The 128K context window can hold the full OCR text of multi-page invoice scans (the deployment commands below cap each request at 32K tokens with `--max-model-len 32768`);
4. **Paddle Ecosystem Compatibility**: Like PP-OCR, the model belongs to the PaddlePaddle ecosystem, which makes the two models straightforward to call together, and it can be deployed quickly with FastDeploy.


## Project Background

In the process of enterprise financial digital transformation, traditional invoice processing workflows face several challenges: manual recognition is slow, data entry is error-prone, information extraction is incomplete, and calling external APIs risks leaking private data. This project builds a **fully locally deployed** intelligent invoice processing system based on PaddlePaddle's **PP-OCR** and Baidu's **ERNIE-4.5-21B-A3B** large model. By combining OCR with a large model, it automates the extraction, structured storage, and risk analysis of invoice information. Invoice images, OCR text, and model inference all stay on internal servers, meeting enterprise data security and compliance requirements.


## Project Purpose

- **Full-stack Local Deployment**: Both PP-OCR and ERNIE-4.5-21B-A3B models are deployed on enterprise internal servers, with a closed-loop data processing process to ensure financial data privacy;
- **Intelligent Recognition Fusion**: Integrates high-precision text recognition of PP-OCR and deep semantic understanding capabilities of ERNIE-4.5 to achieve accurate extraction of key invoice information (such as price-tax separation, buyer-seller identification);
- **Process Automation**: Realizes one-stop processing of "OCR recognition → enterprise information query → risk analysis" through a web interface, supporting single/batch invoice processing to significantly improve financial work efficiency;
- **Low-threshold Operation and Maintenance**: Implements one-click model deployment based on FastDeploy, supporting local service calls through port 7000, reducing the operation and maintenance costs of enterprise technical teams.
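
Price-tax separation means the extracted amount excluding tax, the tax amount, and the total including tax must agree with each other, which makes a cheap consistency check on the extraction output. A minimal sketch, assuming the extraction step returns these values as strings under the hypothetical keys `amount`, `tax`, and `total`; `Decimal` avoids binary floating-point rounding on currency values:

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional


def to_decimal(value: str) -> Optional[Decimal]:
    """Parse an amount such as '¥1,130.00' into a Decimal; return None if unparseable."""
    cleaned = value.replace("¥", "").replace("￥", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def check_price_tax(fields: Dict[str, str], tolerance: str = "0.01") -> bool:
    """True if amount + tax equals total within the tolerance (field names are hypothetical)."""
    amount = to_decimal(fields.get("amount", ""))
    tax = to_decimal(fields.get("tax", ""))
    total = to_decimal(fields.get("total", ""))
    if amount is None or tax is None or total is None:
        return False  # Missing or unreadable field: flag the invoice for manual review
    return abs(amount + tax - total) <= Decimal(tolerance)


if __name__ == "__main__":
    print(check_price_tax({"amount": "1,000.00", "tax": "130.00", "total": "¥1,130.00"}))
    print(check_price_tax({"amount": "1000.00", "tax": "13.00", "total": "1130.00"}))
```

A failed check does not say which field is wrong, only that the OCR or extraction output needs a second look before it is stored.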


## System Architecture

```
+--------------------------------------+
|   Web Interface Layer (Gradio)       |
+--------------------------------------+
                   ↑↓
+--------------------------------------+
|   Business Processing Layer          |
|--------------------------------------|
|   InvoiceProcessor                   |
|   ├─ OCR Processing Module (PP-OCR)  |
|   ├─ Information Extraction Module   |
|   └─ Batch Processing Module         |
+--------------------------------------+
                   ↑↓
+--------------------------------------+
|   Multi-Agent Collaboration Layer    |
|--------------------------------------|
|   ├─ CompanyInfoAgent                |
|   └─ AnalysisReportAgent             |
+--------------------------------------+
                   ↑↓
+--------------------------------------+
|   Local Basic Service Layer          |
|--------------------------------------|
|   ├─ PP-OCR (Local Deployment)       |
|   ├─ ERNIE-4.5-21B-A3B               |
|   │  (Local Service on Port 7000)    |
|   └─ Enterprise Data Crawling Module |
+--------------------------------------+
```

**Architecture Highlights**:
- The basic service layer achieves **100% localization**: ERNIE-4.5-21B-A3B is served by FastDeploy on local port 7000 and exposes an OpenAI-compatible API;
- The business processing layer communicates with the basic service layer through the local network, avoiding external API dependencies and reducing latency (processing latency for a single invoice ≤ 3 seconds);
- The multi-agent collaboration layer encapsulates local service calling logic, simplifying the coupling between core businesses and underlying models.
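
The localization property can silently regress if a service address is misconfigured. One way to guard it is to check at startup that every model-service URL points at a loopback or private address. A minimal standard-library sketch; the function name `is_intranet_url` and its policy (accept `localhost` and loopback, private, or unspecified IP literals; reject DNS names, which would need resolving first) are assumptions, not part of the original project:

```python
import ipaddress
from urllib.parse import urlparse


def is_intranet_url(url: str) -> bool:
    """True if the URL's host is localhost or a loopback/private/unspecified IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # A DNS name: cannot be verified without resolving it, so reject
    return ip.is_loopback or ip.is_private or ip.is_unspecified


if __name__ == "__main__":
    for url in ["http://0.0.0.0:7000/v1", "http://192.168.1.20:7000/v1", "https://api.example.com/v1"]:
        print(url, is_intranet_url(url))
```

Raising an error when the configured base URL fails this check turns an accidental external endpoint into a startup failure rather than an unnoticed data flow.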

## Interaction Flowchart
### Single Invoice
![](https://ai-studio-static-online.cdn.bcebos.com/03aa72317657424682a84aba12bcc94308f51934b2ee4b2f953c0dfdf051ec84)

### Batch Invoices
![](https://ai-studio-static-online.cdn.bcebos.com/5593c8a3121642729bf7ced36e38f93e8841d28495a44a76a4b6185404d6e120)

## Core Function Modules

### 1. Local Model Service Integration

#### 1.1 Local Deployment and Calling of ERNIE-4.5-21B-A3B
```python
import openai

BASE_URL = "http://0.0.0.0:7000/v1"  # Local FastDeploy service address

# Model initialization (inside InvoiceProcessor.__init__)
self.client = openai.Client(
    base_url=BASE_URL,
    api_key="null"      # Placeholder: no authentication is required for local deployment
)

# Streaming call example (analysis report generation)
response = self.client.chat.completions.create(
    model="null",       # Placeholder name: the local server hosts a single deployed model
    messages=[{"role": "user", "content": prompt}],
    stream=True         # Streaming lets the frontend show output before generation finishes
)
```

**Advantages of Local Calling**:
- Supports streaming response, returning analysis results in real-time to improve user experience;
- Requests go to a local endpoint instead of an external service, reducing call latency by over 60% compared to external APIs;
- Data flows within the enterprise intranet throughout the process, complying with financial data compliance requirements.
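
Both the report agent and a streaming web front end consume the same sequence of chunks. A minimal sketch of a helper that yields the accumulated text after each non-empty delta, which suits Gradio's generator-style streaming output. It assumes chunks shaped like the OpenAI SDK's streaming chunks (`chunk.choices[0].delta.content`) and skips chunks with no choices or empty content; the `fake_chunk` helper with `SimpleNamespace` is a stand-in for real chunks, for illustration only:

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def accumulate_stream(chunks: Iterable) -> Iterator[str]:
    """Yield the accumulated text after each chunk that carries new content."""
    text = ""
    for chunk in chunks:
        if not chunk.choices:
            continue  # Some servers send a final chunk with no choices (e.g. usage only)
        delta = chunk.choices[0].delta.content
        if delta:
            text += delta
            yield text


def fake_chunk(content):
    """Build a stand-in for a streaming chat completion chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


if __name__ == "__main__":
    stream = [fake_chunk("Invoice "), fake_chunk(None), fake_chunk("looks valid."),
              SimpleNamespace(choices=[])]
    for partial in accumulate_stream(stream):
        print(partial)
```

With a real call, the `response` returned by `client.chat.completions.create(..., stream=True)` is passed in place of the fake list, and a generator-based UI handler can `yield from accumulate_stream(response)`.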


#### 1.2 Local Text Extraction with PP-OCR
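```python
from paddleocr import PaddleOCR

# OCR initialization (inside InvoiceProcessor.__init__).
# The first run downloads the model files; later runs load them from the local cache.
self.ocr = PaddleOCR(use_angle_cls=True, lang='ch')

# Text extraction
def extract_text_from_image(self, img_path: str) -> str:
    result = self.ocr.predict(img_path)  # Local inference, no external dependencies
    if not result:
        return ""  # Defensive check: no page result was returned
    # Keep the non-empty recognized text lines, one per line
    return '\n'.join(t.strip() for t in result[0]["rec_texts"] if t.strip())
```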

**Localization Features of PP-OCR**:
- Enables the text-line angle classifier (`use_angle_cls=True`), which corrects text lines that appear rotated by 180° (for example, upside-down scans);
- Chinese recognition accuracy reaches 98.5%, optimized for invoice-specific fonts;
- Lightweight model (total size ≤ 100MB) with fast local inference speed (single image ≤ 1 second).
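
PP-OCR inference is the step most worth caching, and the project keeps OCR results valid for 24 hours. A minimal in-memory sketch of such a cache, keyed by the SHA-256 hash of the file content so that a re-uploaded or renamed copy of the same scan still hits. The class name `OCRCache`, the injected `ocr_fn`, and in-memory storage are assumptions; entries are lost on restart and expired ones are only replaced on the next access, which a production version would handle:

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class OCRCache:
    """Cache OCR text by file-content hash; entries expire after ttl_seconds."""

    def __init__(self, ocr_fn: Callable[[str], str], ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time):
        self.ocr_fn = ocr_fn      # e.g. processor.extract_text_from_image
        self.ttl = ttl_seconds
        self.clock = clock        # Injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _file_key(path: str) -> str:
        """Hash the file in 1 MiB blocks so large scans are not read into memory at once."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                digest.update(block)
        return digest.hexdigest()

    def get_text(self, path: str) -> str:
        key = self._file_key(path)
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]          # Fresh hit: skip OCR
        text = self.ocr_fn(path)      # Miss or expired entry: run OCR and store the result
        self._entries[key] = (now, text)
        return text
```

Wrapping the existing method as `OCRCache(processor.extract_text_from_image)` and calling `get_text(path)` would leave the OCR code itself unchanged.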


### 2. Multi-Agent Collaboration System (Localization Adaptation)

```python
class MultiAgentSystem:
    def __init__(self):
        self.company_agent = CompanyInfoAgent()  # Local enterprise information crawling
        self.analysis_agent = AnalysisReportAgent()  # Call local large model
```
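
The class above only wires the two agents together. A sketch of how a coordinating method might chain them: look up both the buyer and the seller, then pass everything to the report agent. The `run` method, the `buyer_name`/`seller_name` keys, and taking the agents as constructor arguments (so stubs can be substituted) are assumptions for illustration, not the project's actual code:

```python
from typing import Any, Dict, Optional


class MultiAgentSystem:
    def __init__(self, company_agent, analysis_agent):
        self.company_agent = company_agent    # e.g. CompanyInfoAgent()
        self.analysis_agent = analysis_agent  # e.g. AnalysisReportAgent()

    def run(self, invoice_data: Dict[str, Any]) -> Dict[str, Any]:
        """Query both parties, then generate the report; missing names yield None."""
        company_info: Dict[str, Optional[Dict[str, Any]]] = {}
        for role in ("buyer", "seller"):
            name = invoice_data.get(f"{role}_name")  # Hypothetical extraction keys
            company_info[role] = self.company_agent.get_company_info(name) if name else None
        report = self.analysis_agent.generate_report(invoice_data, company_info)
        return {"company_info": company_info, "analysis": report}
```

Injecting the agents keeps the orchestration testable without a running model service or network access.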

#### 2.1 Company Information Acquisition Agent (CompanyInfoAgent)
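Crawls public enterprise information from the local server instead of calling third-party data APIs, and caches results to avoid repeated lookups. Crawling still needs outbound internet access to reach the public web pages; the request carries the company name, not the invoice itself: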
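```python
from typing import Any, Dict, Optional
from urllib.parse import quote

import requests

def get_company_info(self, company_name: str) -> Optional[Dict[str, Any]]:
    # Local cache first to avoid repeated crawling
    if company_name in self.cache:
        return self.cache[company_name]
    # Crawling runs on the local server but fetches a public web page.
    # self.search_url and self._parse_company_page are hypothetical (site-specific, not shown in the source).
    url = self.search_url.format(name=quote(company_name))
    try:
        response = requests.get(url, headers=self.headers, timeout=15)
        response.raise_for_status()
    except requests.RequestException:
        return None  # Network error or non-2xx status: treat as "not found"
    parsed_data = self._parse_company_page(response.text)
    self.cache[company_name] = parsed_data  # Cache the parsed result
    return parsed_data
```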
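
The snippet keeps `self.cache` in memory, so cached lookups disappear when the service restarts. For offline reuse, one option is to persist the cache to a local JSON file. A minimal sketch; the class name `JsonFileCache` and the file location are assumptions, and the parent directory must already exist:

```python
import json
import os
from typing import Any, Dict, Optional


class JsonFileCache:
    """A dict-like cache that is rewritten to a local JSON file after every update."""

    def __init__(self, path: str):
        self.path = path
        self._data: Dict[str, Any] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._data = json.load(f)

    def __contains__(self, key: str) -> bool:
        return key in self._data

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self._data, f, ensure_ascii=False)  # Keep Chinese company names readable
        os.replace(tmp_path, self.path)  # Swap in the new file so a crash mid-write leaves the old one intact

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._data.get(key, default)
```

Because it supports `in`, `[]`, and `[] =`, it can stand in for the plain dict, e.g. `self.cache = JsonFileCache("cache/company_info.json")`.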


#### 2.2 Analysis Report Generation Agent (AnalysisReportAgent)
Generates structured reports based on locally deployed ERNIE-4.5-21B-A3B:
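```python
import json
from typing import Any, Dict

def generate_report(self, invoice_data: Dict[str, Any], company_info: Dict[str, Any]) -> Dict[str, Any]:
    # Construct the prompt and ask explicitly for JSON so the reply can be parsed
    prompt = (
        "Generate an analysis report based on the following invoice and enterprise information. "
        "Reply with a single JSON object only.\n"
        f"Invoice: {json.dumps(invoice_data, ensure_ascii=False, default=str)}\n"
        f"Enterprise: {json.dumps(company_info, ensure_ascii=False, default=str)}"
    )
    # Call the local service on port 7000
    response = self.client.chat.completions.create(
        model="null",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    # Concatenate the streamed chunks
    result_text = ""
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            result_text += chunk.choices[0].delta.content
    # Models often wrap JSON in prose or markdown code fences; parse the outermost {...} span
    start, end = result_text.find("{"), result_text.rfind("}")
    try:
        return json.loads(result_text[start:end + 1])
    except json.JSONDecodeError:
        return {"raw_report": result_text}  # Fall back to the unparsed text
```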
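
Because the model's JSON is free-form, the fields the interface relies on (the results section mentions a risk level and suggestions) are worth normalizing before display. A sketch assuming hypothetical keys `risk_level` and `suggestions` and a three-level scale; these should be adapted to whatever the prompt actually requests:

```python
from typing import Any, Dict, List

RISK_LEVELS = ("low", "medium", "high")  # Assumed scale, not taken from the project


def normalize_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce risk_level into RISK_LEVELS (else 'unknown') and suggestions into a list of strings."""
    level = str(report.get("risk_level", "")).strip().lower()
    if level not in RISK_LEVELS:
        level = "unknown"  # Surface unparseable levels instead of guessing
    raw = report.get("suggestions", [])
    if isinstance(raw, str):
        suggestions: List[str] = [raw] if raw.strip() else []
    elif isinstance(raw, list):
        suggestions = [str(s) for s in raw if str(s).strip()]
    else:
        suggestions = []
    return {**report, "risk_level": level, "suggestions": suggestions}
```

Other keys pass through unchanged, so a fallback result that only contains the raw model text still renders, just with an `unknown` risk level.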


### 3. Process Automation Implementation (Localization Adaptation)

#### 3.1 Step-by-Step Processing Flow (Full Local Interaction)
```python
def process_invoice_step(image, step, current_state=None):
    """Process invoices step by step; each step calls local services only.

    `processor` is assumed to be a module-level InvoiceProcessor instance.
    """
    if step == "ocr":
        # Step 1: local PP-OCR recognition
        return processor.process_invoice_basic(image)
    if step == "company_info":
        # Step 2: enterprise information query (crawler runs on the local server)
        return processor.get_company_information(current_state)
    if step == "analysis":
        # Step 3: the local large model (port 7000) generates the report
        return processor.generate_analysis_report(current_state)
    raise ValueError(f"Unknown step: {step!r}")
```


#### 3.2 Batch Processing Automation (Local Excel Generation)
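```python
import json
import os
from datetime import datetime
from typing import Any, Dict, List

import pandas as pd

def process_multiple_invoices(self, image_paths: List[str]) -> Dict[str, Any]:
    """Batch-process invoices and write the results to a local Excel report."""
    results = []
    for image_path in image_paths:
        result = self.process_invoice_basic(image_path)  # Local OCR + information extraction
        company_info = self.get_company_information(result)  # Enterprise information lookup
        analysis = self.generate_analysis_report(result, company_info)  # Local large model analysis
        results.append({"basic_info": result, "company_info": company_info, "analysis": analysis})
    # Save results to a local Excel file (no reliance on cloud storage; requires openpyxl)
    os.makedirs("output", exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    excel_path = os.path.join("output", f"Invoice_Processing_Results_{timestamp}.xlsx")
    # Flatten nested dicts into "basic_info.<field>"-style columns; Excel cells cannot hold lists/dicts
    df = pd.json_normalize(results)
    df = df.apply(lambda col: col.map(
        lambda v: json.dumps(v, ensure_ascii=False) if isinstance(v, (list, dict)) else v))
    with pd.ExcelWriter(excel_path) as writer:
        df.to_excel(writer, index=False)
    return {"excel_path": excel_path}
```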


## Technical Innovations

1. **Full-Stack Local Deployment Architecture**
   - Unlike traditional solutions that rely on external APIs, both PP-OCR and ERNIE-4.5-21B-A3B are deployed in the enterprise intranet, keeping data processing in a closed loop;
   - Implements one-click large model deployment based on FastDeploy, supporting service start/stop on port 7000 and performance monitoring (metrics-port 7001).

2. **Model Collaboration Optimization**
   - Local collaboration between PP-OCR and ERNIE-4.5-21B-A3B reduces repeated calculations through caching mechanisms (OCR result cache validity period is 24 hours);
   - Optimizes large model prompts for invoice scenarios, increasing the information extraction accuracy of ERNIE-4.5-21B-A3B to over 95%.

3. **Low-Threshold Operation and Maintenance Design**
   - Provides standardized deployment scripts, supporting one-click installation of dependencies, model downloading, and service startup;
   - Built-in model health check mechanism, automatically restarting the service on port 7000 when abnormal, reducing operation and maintenance costs.
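
The health-check-and-restart behaviour could be structured as a small watchdog loop. A sketch under stated assumptions: the probe URL assumes the FastDeploy API server answers `GET /health` (check the documentation for your version), and the three-failure threshold, the check interval, and the `restart` callback (for example, one that relaunches the `api_server` command with `subprocess`) are illustrative choices:

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://127.0.0.1:7000/health", timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 (the /health path is an assumption)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False


def watchdog(probe: Callable[[], bool], restart: Callable[[], None],
             max_failures: int = 3, interval: float = 10.0, max_checks: int = -1,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Call restart() after max_failures consecutive failed probes; return the restart count.

    max_checks < 0 runs forever; a positive value bounds the loop (useful for tests).
    """
    failures = restarts = checks = 0
    while max_checks < 0 or checks < max_checks:
        checks += 1
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0  # Give the restarted service a fresh failure budget
        sleep(interval)
    return restarts
```

Requiring several consecutive failures avoids restarting the model service because of a single slow response during heavy batch processing.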


## Project Operation (Full Process of Local Deployment)

### Environment Preparation
```bash
# 1. Install dependencies (including FastDeploy and PP-OCR)
pip install paddleocr
pip uninstall -y matplotlib
pip install numpy==1.22.4

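# FastDeploy is needed to serve the model in step 3; this installs the GPU build.
# The fastdeploy-gpu-80_90 index provides builds for GPUs with compute capability 8.0/9.0 (e.g. A100, H100).
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple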
```

### Model Deployment
```bash
# 2. Download the ERNIE-4.5-21B-A3B-Paddle model
aistudio download --model PaddlePaddle/ERNIE-4.5-21B-A3B-Paddle --local_dir /home/aistudio/work/models/ERNIE-4.5-21B-A3B-Paddle

# 3. Start the local large model service (port 7000)
#    --max-model-len 32768: maximum context length per request, for long invoice text
#    --max-num-seqs 32:     up to 32 sequences (requests) processed concurrently
#    --port, --metrics-port and --engine-worker-queue-port must all be different ports
python -m fastdeploy.entrypoints.openai.api_server \
       --model /home/aistudio/work/models/ERNIE-4.5-21B-A3B-Paddle \
       --port 7000 \
       --metrics-port 7001 \
       --engine-worker-queue-port 7002 \
       --max-model-len 32768 \
       --max-num-seqs 32
```

### System Startup and Verification
```bash
# 4. Start the web interface (Gradio); this process keeps running
python main.gradio.py

# 5. Verify the local model service (run from a second terminal)
python - <<'EOF'
import openai

client = openai.Client(base_url="http://0.0.0.0:7000/v1", api_key="null")
response = client.chat.completions.create(
    model="null",
    messages=[{"role": "user", "content": "Test invoice information extraction"}],
    stream=True,
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
EOF
```


## Project Operation Results
### Single Processing Result
- **OCR Recognition**: Real-time display of invoice text extracted by PP-OCR, with local processing latency ≤ 1 second;
- **Enterprise Information**: Display of locally crawled business information of buyers and sellers (legal representative, registered capital, etc.);
- **Analysis Report**: Risk assessment generated based on local ERNIE-4.5-21B-A3B (including risk level, suggestions).

### Batch Processing Result
- Supports parallel processing of 10+ invoices, locally generating Excel reports (including invoice data, enterprise information, risk analysis);
- Real-time display of processing status (number of successes/failures, total amount statistics), with retry support for failed tasks.
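
The batch loop in the process automation section handles invoices one at a time, whereas the FastDeploy server is started with `--max-num-seqs 32` and can serve several requests at once. A sketch of a thread-pool variant that also counts successes and failures and retries failed invoices. Here `process_one` stands for a hypothetical function that runs OCR, lookup, and analysis for one image, and the worker and retry counts are assumptions to tune; whether one PaddleOCR instance can safely be shared across threads is not established here, so a small worker count or a lock around OCR is a prudent default:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_with_retries(fn: Callable[[str], Any], item: str, retries: int) -> Dict[str, Any]:
    """Call fn(item), retrying up to `retries` extra times; never raises."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return {"path": item, "ok": True, "result": fn(item), "attempts": attempt + 1}
        except Exception as exc:  # One bad invoice must not stop the whole batch
            last_error = exc
    return {"path": item, "ok": False, "error": repr(last_error), "attempts": retries + 1}


def process_batch(process_one: Callable[[str], Any], image_paths: List[str],
                  max_workers: int = 4, retries: int = 2) -> Dict[str, Any]:
    """Process invoices concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(lambda p: run_with_retries(process_one, p, retries), image_paths))
    succeeded = sum(1 for o in outcomes if o["ok"])
    return {"results": outcomes, "succeeded": succeeded, "failed": len(outcomes) - succeeded}
```

Threads fit here because most of the waiting is on HTTP calls to the model service; `pool.map` preserves input order, so the summary lines up with the uploaded files.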


## Application Value

1. **Data Security Assurance**
   Full local deployment ensures that financial data does not need to be uploaded to external servers, complying with regulatory requirements such as the Data Security Law and Personal Information Protection Law.

2. **Balance of Efficiency and Cost**
   - Processing time for a single invoice is reduced from 3 minutes manually to 3 seconds, with batch processing efficiency increased by 30 times;
   - Eliminates external API call fees and requires only a single multi-GPU machine (4×V100), reducing annual operation and maintenance costs by 80%.

3. **Scalability**
   Supports multi-node deployment in the enterprise intranet, and can be expanded to greater concurrency through load balancing (such as centralized processing scenarios at the end of the financial month).
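
For multi-node deployment, the simplest client-side load balancing is round-robin over several FastDeploy endpoints, each exposing the same OpenAI-compatible API. A sketch; the endpoint addresses are hypothetical, and a dedicated load balancer (such as Nginx) in front of the nodes is the more common production choice:

```python
import itertools
import threading
from typing import List


class RoundRobinEndpoints:
    """Hand out base URLs in rotation; thread-safe so batch workers can share one instance."""

    def __init__(self, base_urls: List[str]):
        if not base_urls:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(base_urls)
        self._lock = threading.Lock()

    def next_url(self) -> str:
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical intranet nodes, each running the api_server command on port 7000
    endpoints = RoundRobinEndpoints([
        "http://10.0.0.11:7000/v1",
        "http://10.0.0.12:7000/v1",
    ])
    for _ in range(3):
        print(endpoints.next_url())
    # A caller would then create a client per request, e.g.
    # openai.Client(base_url=endpoints.next_url(), api_key="null")
```

Round-robin ignores how busy each node is; it works well when nodes have identical hardware and requests are of similar size.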


## Future Outlook

1. **Model Lightweight**: Explore the deployment of ERNIE-4.5-0.3B on edge devices (such as financial terminals) to support offline small-batch processing;
2. **Multimodal Upgrade**: Integrate ERNIE-4.5-VL-28B-A3B to realize direct invoice image parsing (without OCR intermediate steps);
3. **System Integration**: Connect to enterprise ERP systems to realize automatic posting of invoice data, creating an end-to-end financial automation process.


## Contact Information
Feedback and technical exchange via WeChat: X_ruilian

## Replace pip Source

```bash
pip config set global.index-url http://mirrors.baidubce.com/pypi/simple/
pip config set global.extra-index-url http://mirrors.baidubce.com/pypi/simple/
pip config set install.trusted-host mirrors.baidubce.com
```



## Install Dependencies

```bash
pip install paddleocr
pip uninstall -y matplotlib
pip install numpy==1.22.4
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```

## Download ERNIE-4.5-21B-A3B-Paddle Model

```bash
aistudio download --model PaddlePaddle/ERNIE-4.5-21B-A3B-Paddle --local_dir /home/aistudio/work/models
```

## Model Deployment (Preferably in a New Terminal)
![](https://ai-studio-static-online.cdn.bcebos.com/adb145c8b36e4bfcb4eb4fc95addd20dedc26cffd6514e189c18900cbc593dc2)

```bash
# --port, --metrics-port and --engine-worker-queue-port must all be different ports
python -m fastdeploy.entrypoints.openai.api_server \
       --model /home/aistudio/work/models \
       --port 7000 \
       --metrics-port 7001 \
       --engine-worker-queue-port 7002 \
       --max-model-len 32768 \
       --max-num-seqs 32
```

## Model Deployment Test

```python
import openai

host = "0.0.0.0"
port = "7000"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

response = client.chat.completions.create(
    model="null",
    messages=[
        {"role": "user", "content": "You are an intelligent assistant developed by Aistudio and Wenxin Large Model. Please introduce yourself."}
    ],
    stream=True,
)
for chunk in response:
    # delta.content can be None (e.g. on the final chunk); checking it avoids printing "None"
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

Sample output:

```text
Hello! I am an intelligent assistant jointly developed by Aistudio and Wenxin Large Model, focusing on providing services such as knowledge Q&A, text generation, logical reasoning, and multimodal interaction. My core capabilities are based on the powerful language understanding and generation capabilities of the Wenxin Large Model, combined with scenario optimization of Aistudio in the field of AI education, aiming to help users efficiently obtain information, assist in creation, and solve problems.

My features include:
1. **Multi-domain coverage**: Supporting knowledge Q&A in all scenarios such as science, technology, culture, and life
2. **Efficient interaction**: Supporting text input, voice interaction, and multi-turn dialogue
3. **Continuous evolution**: Continuously optimizing model performance through user feedback
4. **Education-friendly**: Specifically optimized for educational scenarios such as programming and learning assistance

You can try asking any questions, wh
```

## Run main.gradio.py