Learn Autonomous Driving Perception from Scratch – No Complex Dependencies Required
🚗 PaddleAutoPercept (also known as paddle-auto-percept or PAP) is an open-source, beginner-friendly project that implements core autonomous driving perception algorithms from scratch using the PaddlePaddle framework.
🚙 This project covers a range of algorithms, from 2D object detection with DETR to 3D surround-view perception with BEVFormer, systematically showcasing their evolution and core concepts.
🚌 Each algorithm is implemented entirely in PaddlePaddle, eliminating complex dependencies and deeply nested framework structures. The code is designed to be clear and intuitive, helping developers grasp the core logic and implementation details of each algorithm.
At least one model’s inference results have been aligned with the official implementation's accuracy, verifying the reliability of the implementations.
- High code complexity – Official implementations often rely on complex framework designs and deeply nested class structures, making the code difficult to read and debug.
- Hard-to-understand implementation details – Key operations like data processing and feature extraction are often poorly documented, making it challenging for beginners.
- Inconsistent concepts across models – Similar concepts are implemented differently across different models, making it difficult for learners to form a systematic understanding.
- Gap between theory and practice – There is often a significant gap between the descriptions in research papers and the actual code implementations, with a lack of beginner-friendly materials to bridge this gap.
- High hardware and environment requirements – Many implementations require high-end hardware and complex dependency configurations, making it difficult for learners to experiment and run inference quickly.
- ✅ Comprehensive Coverage
  - This project implements core perception algorithms from scratch, ranging from DETR to BEVFormer, gradually demonstrating the evolution from 2D object detection to 3D surround-view perception. Each algorithm is independently implemented, free from the complex dependencies and framework nesting found in official repositories, helping learners deeply understand the core logic of each algorithm.
- 🔹 Learn from Scratch
  - The project adopts a simplified design, avoiding complex framework structures and interface hierarchies. The code is clean and well-commented, allowing learners to quickly get started and gradually master every aspect of algorithm implementation.
- ⚡ Minimal Dependencies
  - There are no complicated configurations or external dependencies. The inference pipeline runs on CPU, making it accessible even on low-resource devices like MacBooks. This significantly lowers the hardware barrier and makes learning and experimentation easy.
- 📐 Unified Code Structure
  - All models follow a consistent coding style and structure, making it easier for learners to compare models and understand their similarities and differences. This helps learners grasp the overall architecture and evolution of autonomous driving perception algorithms.
- 🛠 Independent Implementations
  - Data preprocessing is implemented independently, with a simple design, making it easier to understand data flow and input-output structures.
The project is more than a code repository: it serves as a detailed learning guide to help every participant master the core logic of autonomous driving perception algorithms.
- DETR (https://arxiv.org/abs/2005.12872)
- DeformableDETR (https://arxiv.org/abs/2010.04159)
- DETR3D (https://arxiv.org/abs/2110.06922)
- BEVFormer (https://arxiv.org/abs/2203.17270)
Ensure you have PaddlePaddle >= 2.6 installed:
- PaddlePaddle >= 2.6 (https://www.paddlepaddle.org.cn/)
  - CPU:
    pip install paddlepaddle==2.6.2
  - GPU:
    pip install paddlepaddle-gpu==2.6.2
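To confirm that the installed version satisfies the >= 2.6 requirement, a small check along these lines may help. This is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_paddle` are hypothetical, and only the major and minor version numbers are compared.

```python
import importlib.util

# Minimum version stated in this README.
MIN_VERSION = (2, 6)


def parse_version(version: str) -> tuple:
    """Return (major, minor) as ints from a string such as '2.6.2'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def meets_minimum(version: str, minimum: tuple = MIN_VERSION) -> bool:
    """True if the (major, minor) part of version is at least minimum."""
    return parse_version(version) >= minimum


def check_paddle() -> str:
    """Describe whether a suitable PaddlePaddle is importable (hypothetical helper)."""
    if importlib.util.find_spec("paddle") is None:
        return "PaddlePaddle is not installed"
    import paddle  # imported lazily so the check itself never crashes

    if not meets_minimum(paddle.__version__):
        return f"PaddlePaddle {paddle.__version__} is older than 2.6"
    return f"PaddlePaddle {paddle.__version__} looks fine"


if __name__ == "__main__":
    print(check_paddle())
```

PaddlePaddle also ships its own installation self-test, `paddle.utils.run_check()`, which can be run once the import succeeds.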
- Clone the repository:
  git clone https://github.com/xperzy/paddle-auto-percept.git
  cd paddle-auto-percept
  cd detr
- Download model weights:
- Run the inference:
  - Copy the downloaded model weights to the project directory (or modify the weight path in main.py). Then execute:
    python main.py
- (For DETR3D and BEVFormer) Preprocess the nuScenes dataset:
  - The nuScenes dataset must be preprocessed before running DETR3D and BEVFormer. (For now, please refer to the GitHub repositories of the original papers.)
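Before running main.py, it can be handy to confirm that the weights actually landed in the project directory. The sketch below is only an illustration: `find_weight_files` is a hypothetical helper, and the `.pdparams` suffix is an assumption based on the common PaddlePaddle convention. The weight files used by this project may be named differently, so adjust the suffix as needed.

```python
from pathlib import Path


def find_weight_files(project_dir: str, suffix: str = ".pdparams") -> list:
    """List files with the given suffix directly inside project_dir, sorted by name.

    The default suffix is an assumption, not something this README specifies.
    """
    root = Path(project_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.suffix == suffix
    )


if __name__ == "__main__":
    found = find_weight_files(".")
    if found:
        print("Found weight files:", ", ".join(found))
    else:
        print("No weight files found here; copy them in or edit the path in main.py")
```

Run it from the model directory (for example, detr) so that "." refers to the same place main.py looks for its weights.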
