Paddle Auto Percept

Learn Autonomous Driving Perception from Scratch – No Complex Dependencies Required

Learn Autonomous Driving Perception Models from Scratch

🚗 PaddleAutoPercept (also known as paddle-auto-percept or PAP) is an open-source, beginner-friendly project that implements core autonomous driving perception algorithms from scratch using the PaddlePaddle framework.

🚙 This project covers a range of algorithms, from 2D object detection with DETR to 3D surround-view perception with BEVFormer, systematically showcasing their evolution and core concepts.

🚌 Each algorithm is implemented entirely in PaddlePaddle, eliminating complex dependencies and deeply nested framework structures. The code is designed to be clear and intuitive, helping developers grasp the core logic and implementation details of each algorithm.

At least one model’s inference results have been aligned with the official implementation's accuracy, verifying the reliability of the implementations.

Challenges in Learning Autonomous Driving Perception

- High code complexity – Official implementations often rely on complex framework designs and deeply nested class structures, making the code difficult to read and debug.
- Hard-to-understand implementation details – Key operations like data processing and feature extraction are often poorly documented, making them hard for beginners to follow.
- Inconsistent concepts across models – Similar concepts are implemented differently from one model to the next, making it difficult for learners to form a systematic understanding.
- Gap between theory and practice – The descriptions in research papers often differ significantly from the actual code, and few beginner-friendly materials exist to bridge that gap.
- High hardware and environment requirements – Many implementations require high-end hardware and complex dependency configurations, making it difficult for learners to experiment and run inference quickly.

Project Highlights

✅ Comprehensive Coverage
- This project implements core perception algorithms from scratch, ranging from DETR to BEVFormer, gradually demonstrating the evolution from 2D object detection to 3D surround-view perception. Each algorithm is independently implemented, free from the complex dependencies and framework nesting found in official repositories, helping learners deeply understand the core logic of each algorithm.
🔹 Learn from Scratch
- The project adopts a simplified design, avoiding complex framework structures and interface hierarchies. The code is clean and well-commented, allowing learners to quickly get started and gradually master every aspect of algorithm implementation.
⚡ Minimal Dependencies
- No complicated configurations or external dependencies. The inference pipeline runs on CPU, making it accessible even on low-resource devices like MacBooks. This significantly lowers the hardware barrier and makes learning and experimentation easy.
📐 Unified Code Structure
- All models follow a consistent coding style and structure, making it easier for learners to compare models and understand their similarities and differences. This helps learners grasp the overall architecture and evolution of autonomous driving perception algorithms.
🛠 Independent Implementations
- Data preprocessing is implemented independently, with a simple design, making it easier to understand data flow and input-output structures.

The project is not just a code repository; it also serves as a detailed learning guide to help every participant master the core logic of autonomous driving perception algorithms.

Models

DETR (https://arxiv.org/abs/2005.12872)
DeformableDETR (https://arxiv.org/abs/2010.04159)
DETR3D (https://arxiv.org/abs/2110.06922)
BEVFormer (https://arxiv.org/abs/2203.17270)

Installation

Ensure you have PaddlePaddle >= 2.6 (https://www.paddlepaddle.org.cn/) installed:

- CPU: pip install paddlepaddle==2.6.2
- GPU: pip install paddlepaddle-gpu==2.6.2
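
After installing, it can help to confirm that the installed version meets the stated minimum before running any model. The following is a minimal sketch, not part of the project: the helper names `parse_version`, `meets_minimum`, and `check_paddle` are hypothetical, and the check only compares the major and minor version numbers against 2.6.

```python
import importlib.util
import re

MIN_VERSION = (2, 6)  # minimum PaddlePaddle version stated in this README


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) pair of a version string such as '2.6.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple = MIN_VERSION) -> bool:
    """True if the version's (major, minor) pair is at least the minimum."""
    return parse_version(version) >= minimum


def check_paddle() -> str:
    """Report whether a usable PaddlePaddle install is present (hypothetical helper)."""
    if importlib.util.find_spec("paddle") is None:
        return "PaddlePaddle is not installed"
    import paddle

    if not meets_minimum(paddle.__version__):
        return f"PaddlePaddle {paddle.__version__} is older than 2.6"
    # The README states that inference runs on CPU, so select the CPU device.
    paddle.set_device("cpu")
    return f"PaddlePaddle {paddle.__version__} is ready on CPU"


if __name__ == "__main__":
    print(check_paddle())
```

For a more thorough check of the installation itself, PaddlePaddle also provides `paddle.utils.run_check()`.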

Quick Start

Clone the repository:

git clone https://github.com/xperzy/paddle-auto-percept.git
cd paddle-auto-percept
cd detr

Download model weights:
- DETR: https://huggingface.co/xperzy/detr-r50-paddle
- DeformableDETR: https://huggingface.co/xperzy/deformable-detr-r50-paddle
- DETR3D: https://huggingface.co/xperzy/detr3d-r50-paddle
- BEVFormer: https://huggingface.co/xperzy/bevformer-r101-paddle
Run inference:
- Copy the downloaded model weights to the project directory (or modify the weight path in main.py). Then execute:
```
python main.py
```
- (For DETR3D and BEVFormer) Preprocess the nuScenes dataset:
  - The nuScenes dataset must be preprocessed before running DETR3D and BEVFormer. For now, please refer to the GitHub repositories of the original papers.
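
The weight download step can also be scripted. The sketch below is one possible approach and not something the project provides: it assumes the separate `huggingface_hub` package is installed (`pip install huggingface_hub`), and the helper names `repo_id_from_url` and `download_weights` are hypothetical. The dictionary keys match the model folder names in the repository. The weight file names inside each Hugging Face repository are not listed here, so check the downloaded folder and adjust the weight path in main.py as needed.

```python
from pathlib import Path
from urllib.parse import urlparse

# Weight pages listed in this README, keyed by the model folder name.
WEIGHT_PAGES = {
    "detr": "https://huggingface.co/xperzy/detr-r50-paddle",
    "deformable_detr": "https://huggingface.co/xperzy/deformable-detr-r50-paddle",
    "detr3d": "https://huggingface.co/xperzy/detr3d-r50-paddle",
    "bevformer": "https://huggingface.co/xperzy/bevformer-r101-paddle",
}


def repo_id_from_url(url: str) -> str:
    """Turn 'https://huggingface.co/owner/name' into the repo id 'owner/name'."""
    return urlparse(url).path.strip("/")


def download_weights(model: str, project_root: str = ".") -> Path:
    """Download all files of a model's weight repo into its folder (hypothetical helper)."""
    # Assumption: huggingface_hub is an extra dependency, not required by the project.
    from huggingface_hub import snapshot_download

    target = Path(project_root) / model
    snapshot_download(repo_id=repo_id_from_url(WEIGHT_PAGES[model]), local_dir=str(target))
    return target


if __name__ == "__main__":
    # Print the repo ids without touching the network.
    for name, url in WEIGHT_PAGES.items():
        print(name, "->", repo_id_from_url(url))
```

Calling `download_weights("detr")` from the repository root would place the files under the detr folder, after which `python main.py` can be run from that folder as described above.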
