Wenliang Zhao

I am a fourth-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Jiwen Lu and Prof. Jie Zhou. In 2020, I obtained my B.Eng. from the Department of Automation, Tsinghua University.

I am broadly interested in computer vision and deep learning. My current research focuses on model architectures and generative models.

Email  /  Google Scholar  /  GitHub

  • 2023-09: UniPC is accepted to NeurIPS 2023.
  • 2023-07: VPD is accepted to ICCV 2023.
  • 2022-09: HorNet is accepted to NeurIPS 2022.
  • 2022-03: Check out our work at CVPR 2022 on language-guided dense prediction (DenseCLIP).
  • 2021-09: GFNet and DynamicViT are accepted to NeurIPS 2021.
  • 2021-07: 2 papers on video understanding and interpretable metric learning are accepted to ICCV 2021.
    Publications

    * equal contribution     project leader

    FlowIE: Efficient Image Enhancement via Rectified Flow
    Yixuan Zhu*, Wenliang Zhao*, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    Oral Presentation
    [arXiv] [Code]

    FlowIE is the first flow-based image enhancement framework that supports various tasks and is efficient in both training (simulation-free) and inference (<5 sampling steps).

    UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
    Wenliang Zhao*, Lujia Bai*, Yongming Rao, Jie Zhou, Jiwen Lu
    Conference on Neural Information Processing Systems (NeurIPS), 2023
    [arXiv] [Code] [Project Page]

    UniPC is a training-free framework designed for the fast sampling of diffusion models, which consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders.

    Unleashing Text-to-Image Diffusion Models for Visual Perception
    Wenliang Zhao*, Yongming Rao*, Zuyan Liu*, Benlin Liu, Jie Zhou, Jiwen Lu
    IEEE International Conference on Computer Vision (ICCV), 2023
    [arXiv] [Code] [Project Page] [Rank 1st on NYUv2 Depth Estimation]

    VPD (Visual Perception with Pre-trained Diffusion Models) is a framework that applies the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.

    HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
    Yongming Rao*, Wenliang Zhao*, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu
    Conference on Neural Information Processing Systems (NeurIPS), 2022
    [arXiv] [Code] [Project Page] [Chinese-language explainer]

    HorNet is a family of generic vision backbones that perform explicit high-order spatial interactions based on Recursive Gated Convolution.

    DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
    Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    [arXiv] [Code] [Project Page] [Chinese-language explainer]

    DenseCLIP is a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.

    Global Filter Networks for Image Classification
    Yongming Rao*, Wenliang Zhao*, Zheng Zhu, Jiwen Lu, Jie Zhou
    Conference on Neural Information Processing Systems (NeurIPS), 2021
    [arXiv] [Code] [Project Page] [Chinese-language explainer (by HappyAIWalker)]

    Global Filter Networks is a transformer-style architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.

    DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
    Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
    Conference on Neural Information Processing Systems (NeurIPS), 2021
    [arXiv] [Code] [Project Page] [Zhihu]

    We present a dynamic token sparsification framework to prune redundant tokens in vision transformers progressively and dynamically based on the input.

    Towards Interpretable Deep Metric Learning with Structural Matching
    Wenliang Zhao*, Yongming Rao*, Ziyi Wang, Jiwen Lu, Jie Zhou
    IEEE International Conference on Computer Vision (ICCV), 2021
    [arXiv] [Code]

    We present a framework (DIML) to add interpretability to metric learning and improve the performance of deep metric learning models.

    Group-aware Contrastive Regression for Action Quality Assessment
    Xumin Yu*, Yongming Rao*, Wenliang Zhao, Jiwen Lu, Jie Zhou
    IEEE International Conference on Computer Vision (ICCV), 2021

    We propose a new contrastive regression (CoRe) framework to learn the relative scores by pair-wise comparison, which highlights the differences between videos and guides the models to learn the key hints for assessment.

    Honors and Awards

  • 2020 Outstanding Undergraduate, Tsinghua University
  • 2018 Tang Lixin Scholarship, Tsinghua University
  • 2019 Tsinghua Presidential Award Nomination, Tsinghua University
  • 2018 Zheng Weimin Scholarship, Tsinghua University
  • 2018 Jiang Nanxiang Scholarship, Tsinghua University
  • 2018 1st prize in 36th Challenge Cup, Tsinghua University
  • 2017 Qualcomm Scholarship
  • 2017 National Scholarship, Tsinghua University
    Academic Services

  • Conference Reviewer: CVPR 2022, ECCV 2022, ICME 2022, ICCV 2023, NeurIPS 2023, ICLR 2024
  • Journal Reviewer: T-IP

    © Wenliang Zhao | Last updated: August 3, 2021