About Me

I am a research scientist at Carnegie Mellon University (CMU) and MBZUAI. I currently co-lead the Causal Learning and Reasoning (CLeaR) Group with Prof. Kun Zhang. Prior to that, I received both my Ph.D. and B.S. degrees from Tsinghua University, advised by Prof. Jie Zhou and Prof. Jiwen Lu. My research interests include causality, representation learning, and computer vision. A central focus of my current work is to develop principled and practical methods for learning meaningful visual representations that support recognition, understanding, generation, and reasoning.

I’m currently on the academic job market! If you’re interested in my research and believe I could be a strong addition to your department, I’d be glad to connect. Please don’t hesitate to reach out by e-mail at guangyichen1994(at)gmail.com.

News

  • 2026-02: 1 paper on causal representation learning for compositional generalization is accepted by CVPR'2026.
  • 2026-01: 4 papers on causal representation learning, reasoning, causal discovery, and generative models are accepted by ICLR'2026. We propose SR2, a framework that explains recurrent reasoning models (e.g., HRM) through a selection mechanism: [Paper] and [Code].
  • 2026-01: I gave a tutorial at AAAI 2026 on representation learning for LLMs. The slides and paper list are available at [tutorials]. Many thanks to Hanqi and Jonathan for their contributions to the success of the tutorial.
  • 2025-11: I had a wonderful visit to Hong Kong and gave talks on causal representation learning at HKU, CUHK, HKUST, and HKBU. Thanks so much to Prof. Yin Guosheng, Prof. Farzan Farnia, Prof. Raymond Wong, and Prof. Bo Han for hosting me.
  • 2025-10: Excited to release CausalVerse, a NeurIPS 2025 Spotlight! 🎉 It’s the first comprehensive benchmark for causal representation learning with controllable, high-fidelity simulations. Join us in making CRL practical and powerful: Arxiv, Project page, Datasets, and Code.
  • 2025-09: 5 papers on causal representation learning, vision-language models, and generative models are accepted by NeurIPS'2025.
  • 2025-08: I gave a talk at UCSD on causal representation learning and trustworthy AI [slides]. Thanks to Biwei for the invitation!
  • 2025-08: 1 paper on causal representation learning is accepted by TPAMI.
  • 2025-06: I gave a talk at City University of Hong Kong. Thanks to Prof. Fenglei Fan for the invitation!
  • 2025-05: 3 papers demonstrating causal representation learning and a causal lens on diffusion models, LLMs, and domain adaptation are accepted by ICML'2025.
  • 2025-04: I gave talks at NTU and NUS on causal representation in visual understanding [slides]. Thanks to Prof. Dacheng Tao and Prof. Xiaokui Xiao for hosting me!
  • 2025-03: SmartCLIP is selected as a Highlight by CVPR'2025.
  • 2025-02: IDOL is selected as an Oral by ICLR'2025.
  • 2025-02: 1 paper on causal representation learning and the CLIP model is accepted by CVPR'2025.
  • 2025-01: 4 papers on causal representation learning are accepted by ICLR'2025.
  • 2025-01: 1 paper on causal representation learning is accepted by WWW'2025.

Research Topics

    Tem-adapter

    Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

    We present Tem-Adapter, a method for video question answering that leverages image-based knowledge and introduces temporal and semantic alignment.
    Causal MAE

    Understanding Masked Autoencoders via Hierarchical Latent Variable Models

    We provide a causal perspective on masked autoencoders and analyze how latent variables can be identified in hierarchical generative models.
    GAIN

    GAIN: On the Generalization of Instructional Action Understanding

    We introduce GAIN, a benchmark for analyzing the generalization ability of instructional action understanding models.
    MID

    Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion

    We formulate trajectory prediction as the reverse process of motion indeterminacy diffusion and develop a generative framework for stochastic forecasting.
    Spatial-temporal attention

    Spatial-Temporal Attention-aware Learning for Video-based Person Re-identification

    We develop a spatial-temporal attention framework to discover salient visual clues across both spatial and temporal dimensions.
    Unintentional action localization

    Probabilistic Temporal Modeling for Unintentional Action Localization

    We propose a probabilistic framework with dense temporal predictions to model annotation uncertainty in unintentional action localization.
    Counterfactual examples

    Unintentional Action Localization via Counterfactual Examples

    We disentangle content and intention effects by constructing counterfactual video examples and learning hidden causal processes contrastively.
    Counterfactual trajectory prediction

    Human Trajectory Prediction via Counterfactual Analysis

    We propose a counterfactual framework for human trajectory prediction to investigate causal effects and mitigate environmental bias.
    Counterfactual attention learning

    Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

    We propose a counterfactual attention learning framework that evaluates attention quality and provides an effective supervisory signal for representation learning.
    Attention pyramid

    Person Re-identification via Attention Pyramid

    We propose attention pyramid networks based on a split-attend-merge-stack design to learn multi-scale attention-based representations.
    CaRiNG

    CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process

    We propose CaRiNG to learn temporal causal representations under non-invertible generation processes, enabling more identifiable and structured representation learning from sequential observations.
    Temporal coherence

    Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?

    We show that temporal coherence is more critical than temporal motion for video-based person re-identification and introduce adversarial feature augmentation to emphasize it.
    Self-critical attention learning

    Self-Critical Attention Learning for Person Re-Identification

    We present a self-critical attention learning method in which a critic module examines and supervises attention-based representation learning.
    PLOT

    Prompt Learning with Optimal Transport for Vision-Language Models

    We learn multiple comprehensive prompts with optimal transport to better adapt pretrained vision-language models.
    DenseCLIP

    DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

    DenseCLIP adapts pretrained CLIP knowledge, both implicitly and explicitly, for dense prediction tasks.
    CausalVerse

    CausalVerse: A Comprehensive Benchmark for Causal Representation Learning with Controllable High-Fidelity Simulations

    CausalVerse provides a comprehensive benchmark for causal representation learning with controllable, high-fidelity simulations enabling systematic evaluation.
    Self-Refinement

    Towards Self-Refinement of Vision-Language Models with Triangular Consistency

    We propose a self-refinement framework for vision-language models that improves instruction tuning through synthetic generation and triangular consistency filtering.
    Learning Vision and Language Concepts for Controllable Image Generation

    We study how to learn vision and language concepts for controllable image generation, enabling more interpretable and flexible control over generated visual content.
    Learning Discrete Concepts in Latent Hierarchical Models

    We study discrete concept learning in latent hierarchical models, providing a structured framework for controllable and interpretable generation.
    Temporally disentangled representation learning

    Temporally Disentangled Representation Learning

    We propose a framework to recover time-delayed latent causal variables and identify their relations from sequential data under stationary environments and distribution shifts.
    LLCP

    LLCP: Learning Latent Causal Processes for Reasoning-based Video Question Answering

    We propose LLCP to learn latent causal processes for reasoning-based video question answering, enabling more structured temporal understanding and causal reasoning in videos.
    Controllable Video Generation with Provable Disentanglement

    We study controllable video generation with provable disentanglement, enabling more interpretable control over dynamic visual content.
    On Causal Discovery in the Presence of Deterministic Relations

    We study causal discovery in the presence of deterministic relations and develop a principled framework for identifying causal structure under such dependencies.
    Structural Estimation of Partially Observed Linear Non-Gaussian Acyclic Model

    Structural Estimation of Partially Observed Linear Non-Gaussian Acyclic Model: A Practical Approach with Identifiability

    We propose a practical and identifiable approach for structural estimation in partially observed linear non-Gaussian acyclic models.

Publications

Selected papers are shown by default. Please see Google Scholar for the complete publication list.

Honors and Awards

  • World’s Top 2% Scientists, Stanford–Elsevier, 2025
  • Jiang Zhen Scholarship, Tsinghua University, 2020
  • 2nd place in Semi-Supervised Recognition Challenge at FGVC7, CVPR, 2020
  • Samsung Scholarship, 2019
  • Academic Excellence Scholarship, Tsinghua University, 2015
  • Tsinghua Scholarship, Tsinghua University, 2014
  • National Encouragement Scholarship, Ministry of Education of P.R. China, 2014
  • National Encouragement Scholarship, Ministry of Education of P.R. China, 2013

Teaching

  • TA Analog Electronic Technology Foundation, Tsinghua University, 2018.
  • TA Analog Electronic Technology Foundation, Tsinghua University, 2019.
  • TA Numerical Analysis and Algorithm, Tsinghua University, 2019.
  • TA Probabilistic and Statistical Inference (ML703), MBZUAI, 2021 (Fall).
  • TA Probabilistic and Statistical Inference (ML703), MBZUAI, 2022 (Spring, Fall).
  • TA Advanced Probabilistic and Statistical Inference (ML803), MBZUAI, 2023 (Spring, Fall).
  • TA Advanced Probabilistic and Statistical Inference (ML803), MBZUAI, 2024 (Spring).

Academic Services

  • Publicity Chair: CLeaR 2023.
  • Area Chair: ICLR 2026, ICML 2026.
  • Co-organizer:
    • The Third Workshop on Human Identification in Multimedia at ICME 2019 – [Website]
    • The Workshop on Causal Representation Learning at NeurIPS 2024 – [Website]
    • The Workshop on Causal Representation Learning at ICDM 2024 – [Website]
  • Conference Reviewer / Program Committee Member: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, AAAI, and others.
  • Journal Reviewer: TPAMI, TIP, IJCV, TNNLS, TMM, TCSVT, and others.