Wenliang Zhao

I am a fifth-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Jiwen Lu and Prof. Jie Zhou. In 2020, I obtained my B.Eng. from the Department of Automation, Tsinghua University.

I am broadly interested in computer vision and deep learning. My current research focuses on model architectures and generative models.

Email / Google Scholar / GitHub

News

2024-07: DC-Solver is accepted to ECCV 2024.

2023-09: UniPC is accepted to NeurIPS 2023.

2023-07: VPD is accepted to ICCV 2023.

2022-09: HorNet is accepted to NeurIPS 2022.

2022-03: Check out our work at CVPR 2022 on language-guided dense prediction (DenseCLIP).

2021-09: GFNet and DynamicViT are accepted to NeurIPS 2021.

2021-07: Two papers on video understanding and interpretable metric learning are accepted to ICCV 2021.

Publications

* equal contribution † project leader

	DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
	Wenliang Zhao, Haolin Wang, Jie Zhou, Jiwen Lu
	European Conference on Computer Vision (ECCV), 2024
	[arXiv] [Code]
	DC-Solver is designed to improve alignment in predictor-corrector diffusion samplers, and it is also applicable to predictor-only samplers. With negligible search costs, DC-Solver can sample with as few as 5 steps (NFE).

	FlowIE: Efficient Image Enhancement via Rectified Flow
	Yixuan Zhu, Wenliang Zhao*†, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu
	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, Oral Presentation
	[arXiv] [Code]
	FlowIE is the first flow-based image enhancement framework that supports various tasks and is efficient in both training (simulation-free) and inference (<5 sampling steps).

	UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
	Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, Jiwen Lu
	Conference on Neural Information Processing Systems (NeurIPS), 2023
	[arXiv] [Code] [Project Page]
	UniPC is a training-free framework for fast sampling of diffusion models. It consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders.

	Unleashing Text-to-Image Diffusion Models for Visual Perception
	Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu
	IEEE International Conference on Computer Vision (ICCV), 2023
	[arXiv] [Code] [Project Page] [Ranked 1st on NYUv2 Depth Estimation]
	VPD (Visual Perception with Pre-trained Diffusion Models) is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.

	HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
	Yongming Rao, Wenliang Zhao*, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu
	Conference on Neural Information Processing Systems (NeurIPS), 2022
	[arXiv] [Code] [Project Page] [Explainer in Chinese]
	HorNet is a family of generic vision backbones that perform explicit high-order spatial interactions based on Recursive Gated Convolution.

	DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
	Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Jie Zhou, Jiwen Lu
	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
	[arXiv] [Code] [Project Page] [Explainer in Chinese]
	DenseCLIP is a new framework for dense prediction that implicitly and explicitly leverages the pre-trained knowledge from CLIP.

	Global Filter Networks for Image Classification
	Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou
	Conference on Neural Information Processing Systems (NeurIPS), 2021
	[arXiv] [Code] [Project Page] [Explainer in Chinese (by HappyAIWalker)]
	Global Filter Networks is a transformer-style architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.

	DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
	Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
	Conference on Neural Information Processing Systems (NeurIPS), 2021
	[arXiv] [Code] [Project Page] [Zhihu]
	We present a dynamic token sparsification framework to prune redundant tokens in vision transformers progressively and dynamically based on the input.

	Towards Interpretable Deep Metric Learning with Structural Matching
	Wenliang Zhao, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou
	IEEE International Conference on Computer Vision (ICCV), 2021
	[arXiv] [Code]
	We present a framework (DIML) to add interpretability to metric learning and improve the performance of deep metric learning models.

	Group-aware Contrastive Regression for Action Quality Assessment
	Xumin Yu, Yongming Rao, Wenliang Zhao, Jiwen Lu, Jie Zhou
	IEEE International Conference on Computer Vision (ICCV), 2021
	We propose a new contrastive regression (CoRe) framework to learn the relative scores by pair-wise comparison, which highlights the differences between videos and guides the models to learn the key hints for assessment.