I am an undergraduate student in the 2024 AI Advanced Technology Leader Class at the Harbin Institute of Technology. My research interests are rooted in Computer Vision and Multi-modal AI. I am deeply passionate about Low-Light Image Enhancement (LLIE) and Vision-Language Models (VLMs), and I am actively exploring Vision-Language-Action (VLA) architectures to bridge perception with embodied intelligence.
- CVPR 2026 NTIRE Workshop: Co-authored the official technical report for the NTIRE 2026 Efficient Low-Light Image Enhancement (LLIE) Challenge.
- NTIRE 2026 Challenge Results: Developed MobileIE-6Ch, an ultra-lightweight Retinex-style model with only 101.9K parameters, achieving excellent efficiency-performance trade-offs:
  - Rank 7 in the Main Technical-Report Table
  - Rank 9 in the Full Final-Testing Table
- Preprint: [Efficient Low-Light Image Enhancement for NTIRE 2026]
- Low-Level Vision (LLIE): Focusing on extreme model compression, lightweight CNNs/ViTs, and real-time image restoration.
- Vision-Language Models (VLMs): Investigating multi-modal understanding, representation alignment, and generative AI via CLIP, LLaVA, and BLIP.
- Vision-Language-Action (VLA): Turning multi-modal perception into actionable intelligence for next-generation embodied AI systems.
- Frameworks & Deep Learning: PyTorch, torchvision, Transformers, Hugging Face
- Core Architectures: MobileIE-6Ch, Retinexformer, LLaVA, BLIP
- AI-Assisted Development: Highly proficient in "Vibecoding": using AI agents and LLMs to rapidly prototype complex neural network modules, implement experimental pipelines, and accelerate research iterations.
Always open to discussing CV research, MLLMs, Embodied AI (VLA), and potential collaborations! 🚀