Dr. Xiaoshuai Hao | Multimodal | Best Researcher Award
Researcher at Beijing Academy of Artificial Intelligence, China 📖
Xiaoshuai Hao is an AI researcher specializing in multimodal learning, large-scale model pretraining, and cross-modal retrieval. He earned his Ph.D. in Information Engineering from the University of Chinese Academy of Sciences, focusing on text-video retrieval and multimodal AI. With professional experience spanning leading AI institutions, he has worked as a researcher at the Beijing Academy of Artificial Intelligence, a senior AI researcher at Samsung Research China, and an applied scientist at Amazon AWS AI Lab. His contributions include innovations in embodied intelligence, robust autonomous driving perception, and high-precision mapping, with multiple patents to his name.
Xiaoshuai has published in top-tier AI conferences such as CVPR, ICCV, and ICRA and serves as a reviewer for premier journals and conferences, including IEEE TCSVT, IEEE TMM, CVPR, AAAI, and IJCAI. He has achieved top rankings in international AI competitions, including 1st place at EPIC-KITCHENS-100 (CVPR 2021) and multiple podium finishes in OOD-CV (ICCV 2023) and The RoboDrive Challenge (ICRA 2024). Recognized for his excellence, he has received the Samsung Research China Outstanding Employee Award and multiple academic honors.
Profile
Education Background🎓
- Ph.D. in Information Engineering, University of Chinese Academy of Sciences, China (2017–2023)
- Research Focus: Text-video cross-modal retrieval, multimodal learning, large model pretraining
- B.Eng. in Network Engineering, Shandong University of Science and Technology, China (2013–2017)
- National Scholarship, Outstanding Student of Shandong Province
Professional Experience🌱
- Beijing Academy of Artificial Intelligence (2024–Present) – Researcher in Embodied Multimodal Large Models
- Samsung Research China (2023–2024) – Senior AI Researcher in robust autonomous driving perception and BEV-based multimodal fusion
- Amazon AWS AI Lab (2021–2022) – Applied Scientist (Intern), working on large-scale multimodal pretraining and MixGen data augmentation for vision-language learning
Research Interests
- Multimodal AI (vision, language, and embodied intelligence)
- Large-scale model pretraining and fine-tuning
- Autonomous driving and high-precision mapping
- Cross-modal retrieval and knowledge fusion
Research Contributions
- First author of multiple patents on multimodal mapping, visual-language navigation, and robust perception
- Published in top-tier AI conferences (CVPR, ICCV, ICRA)
- Reviewer for CVPR, AAAI, IJCAI, ACM MM, IEEE TCSVT, and IEEE TMM
- Notable Competitions:
- 1st place: EPIC-KITCHENS-100 2021 Multi-Instance Retrieval (CVPR 2021)
- 3rd place: The RoboDrive Challenge (ICRA 2024), EPIC-KITCHENS-100 2022, OOD-CV (ICCV 2023), and EPIC-Sounds 2023 (CVPR 2023)
Awards & Honors
- Samsung Research China Outstanding Employee Award (2023)
- University of Chinese Academy of Sciences Outstanding Student & Student Leader (2021–2022, 2017–2018)
Selected Publications
1. MixGen: A New Multi-Modal Data Augmentation
- Authors: X. Hao, Y. Zhu, S. Appalaraju, A. Zhang, W. Zhang, B. Li, M. Li
- Conference: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
- Citations: 108
- Summary: Proposes MixGen, a multimodal data augmentation method for vision-language representation learning, improving data efficiency through semantic-based synthetic data generation.
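The core idea of MixGen is compact: a new image-text training pair is formed by linearly interpolating two images and concatenating their captions, so the synthetic pair preserves the semantics of both sources. Below is a minimal NumPy sketch of that idea for illustration only; the function name and toy arrays are ours, not from the paper, and the original operates on real image tensors inside a vision-language training pipeline.

```python
import numpy as np

def mixgen(image_a, image_b, text_a, text_b, lam=0.5):
    """MixGen-style augmentation (sketch): interpolate the two images
    with weight `lam` and concatenate the two captions, yielding a
    new synthetic image-text pair."""
    mixed_image = lam * image_a + (1.0 - lam) * image_b
    mixed_text = text_a + " " + text_b
    return mixed_image, mixed_text

# Toy usage: two 2x2 RGB "images" and their captions.
img_black = np.zeros((2, 2, 3))
img_white = np.ones((2, 2, 3))
new_img, new_txt = mixgen(img_black, img_white,
                          "a black square", "a white square")
```

With the default `lam=0.5`, the mixed image is the pixel-wise average of the two inputs, and the mixed caption describes both.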
2. The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
- Authors: L. Kong, S. Xie, H. Hu, Y. Niu, W.T. Ooi, B.R. Cottereau, L.X. Ng, Y. Ma, W. Zhang, X. Hao, et al.
- Conference: ICRA 2024 Technical Report
- Citations: 23
- Summary: Addresses robustness in autonomous driving through a large-scale benchmark evaluating real-world conditions for perception models.
3. Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval
- Authors: X. Hao, W. Zhang, D. Wu, F. Zhu, B. Li
- Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
- Citations: 21
- Summary: Introduces a domain adaptation framework for video-text retrieval, aligning multimodal representations across different datasets.
4. The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020)
- Authors: S. Albanie, Y. Liu, A. Nagrani, A. Miech, E. Coto, I. Laptev, R. Sukthankar, X. Hao, et al.
- Platform: arXiv preprint arXiv:2008.00744, 2020
- Citations: 15
- Summary: A benchmarking challenge for evaluating video understanding models across multiple tasks.
5. Is Your HD Map Constructor Reliable Under Sensor Corruptions?
- Authors: X. Hao, M. Wei, Y. Yang, H. Zhao, H. Zhang, Y. Zhou, Q. Wang, W. Li, L. Kong, et al.
- Conference: NeurIPS 2024
- Citations: 13
- Summary: Examines the robustness of high-definition map construction models against real-world sensor corruptions.
Conclusion
Dr. Xiaoshuai Hao is a highly deserving candidate for the Best Researcher Award in the field of Multimodal AI. His pioneering research, strong industry-academic footprint, and leadership in AI competitions make him an exceptional candidate. While his research is already internationally recognized, further industry collaborations, engagement with AI policy, and expansion into broader application areas could extend his influence even further.