Chunyuan Li

I am a senior researcher at Microsoft Research, Redmond. My recent research focuses on large-scale pre-training in computer vision and natural language processing. Some recent works include:

  • Vision-and-language pre-training [1, 2, 3]
  • Self-supervised visual representation learning [1]
  • Deep generative models at scale [1, 2, 3]

I completed my PhD in machine learning at Duke University, advised by Prof. Lawrence Carin. My PhD research studied probabilistic deep learning.


Dec 7, 2021 A vision-language approach to visual recognition:
  • [Florence]: A new backbone learner demonstrates the power of unified language-image-label contrast, offering superior performance over CLIP
  • [GLIP]: An object-level language-image model for object detection and phrase grounding
Nov 27, 2021 Our generative model Lafite achieves SoTA text-to-image synthesis performance: on par with DALL-E at only 1% of the model size.
Sep 28, 2021 Our Focal Transformer is accepted to NeurIPS 2021 as a Spotlight!
Jun 17, 2021 We are releasing EsViT, a much more efficient self-supervised visual learning pipeline that reaches a new SoTA on the ImageNet-1k linear probe!

recent publications

  1. EsViT
    Efficient Self-supervised Vision Transformers for Representation Learning
    Li, Chunyuan, Yang, Jianwei, Zhang, Pengchuan, Gao, Mei, Xiao, Bin, Dai, Xiyang, Yuan, Lu, and Gao, Jianfeng
    arXiv 2021
  2. Focal Transformer
    Focal self-attention for local-global interactions in vision transformers
    Yang, Jianwei, Li, Chunyuan, Zhang, Pengchuan, Dai, Xiyang, Xiao, Bin, Yuan, Lu, and Gao, Jianfeng
    NeurIPS (Spotlight Presentation) 2021