Wenhao Wu whwu95

Hi, I'm Wenhao Wu 👋

I am Wenhao Wu (吴文灏🇨🇳), a Ph.D. student in the School of Computer Science at The University of Sydney, supervised by Prof. Wanli Ouyang. I collaborate closely with the Department of Computer Vision Technology (VIS) at Baidu, led by Dr. Jingdong Wang (IEEE Fellow). I received my M.S.E. degree from the Multimedia Laboratory (MMLab@SIAT), University of Chinese Academy of Sciences, supervised by Prof. Shifeng Chen and Prof. Yu Qiao. I was also fortunate to work as an intern or research assistant at MMLab@CUHK, Baidu, iQIYI, SenseTime, Samsung Research, and the Chinese Academy of Sciences. I am honored to have been awarded the 11th Baidu PhD Fellowship (2023).

My current research interests include Cross-Modal Learning and Video Understanding. I have published 20+ papers at top international CV/AI conferences and journals such as CVPR/ICCV/ECCV/AAAI/IJCAI/ACMMM/TPAMI/IJCV.

🔭 Research Interest

My research interests broadly lie in the areas of Computer Vision and Deep Learning, including:

Cross-Modal Learning (2022-Present): Video-Language Matching, Multimodal Large Language Model (MLLM)
Video Foundation Model (2017-Present): Video Recognition, Efficient Video Tuning
Video-related Applications (2017-2022): Video Sampler, Temporal Action Detection, Anomaly Detection in Video
Self-supervised Learning (2021-2022): Contrastive Video Learning, Masked Video Modeling
Low-level Vision (2021-2022): Image Colorization, Style Transfer, Image Rescaling

🔥 News

2024.05: The extension of Cap4Video has been accepted by TPAMI.
2024.01: I am honored to receive the 11th 🎖Baidu Scholarship🎖, a prestigious fellowship awarding 200,000 RMB (about $30,000) to 10 PhD students worldwide in Artificial Intelligence, selected from thousands of applicants.
2023.11: We release GPT4Vis, which provides a Quantitative Evaluation of GPT-4 for Visual Understanding across images, videos and point clouds, spanning 16 popular datasets.
2023.11: We release Side4Video, a Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning, which significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
2023.08: The extension of Text4Vis has been accepted by IJCV.
2023.07: Two first-author papers (Temporal Modeling: ATM, Cross-Modal Retrieval: UA) are accepted by ICCV 2023.
2023.02: Two first-author papers for video understanding (BIKE, Cap4Video) are accepted by CVPR 2023. Cap4Video, which uses GPT to enhance text-video learning, is selected as a 🎉Highlight paper🎉 (Top 2.5%).
2022.11: Two papers (Video Recognition: Text4Vis, Style Transfer: AdaCM) are accepted by AAAI 2023.
2022.07: Three papers (Video Sampling: NSNet, TSQNet; Cross-Modal Learning: CODER) are accepted by ECCV 2022.
2022.06: Our MaMiCo, a new video self-supervised learning work, is accepted by ACMMM 2022 (🎉Oral Presentation🎉).
