Sizhe Chen (陈思哲)

Biography

Hi! I am a CS Ph.D. candidate at UC Berkeley in Berkeley AI Research (BAIR), working with Prof. David Wagner. My research is supported by funding from the NVIDIA Fellowship, Meta FAIR, Google DeepMind, and UCB EECS.

I study AI security in real-world applications. Currently, I am defending against prompt injection attacks, the top threat to AI agents. Prompt injection has caused actual harm in multiple AI systems from Google, OpenAI, Anthropic, Slack, etc. To enable broader use of LLMs in agents, I develop principled, general, and practical prompt injection defenses. Our SoTA training recipe yields the Meta-SecAlign LLMs (downloaded 10K times in 3 months), which achieve an order of magnitude lower attack success rates against various prompt injections. Meta-SecAlign-70B retains commercial-grade utility after our defensive fine-tuning and is ready for commercial use.

I am extremely fortunate to have

Previously, I received my M.Eng. and B.Eng. from Shanghai Jiao Tong University, working with Xiaolin Huang, where I also received support from Cihang Xie, Yanzhi Wang, Kun Zhang, and Haotian Tang.

Feel free to contact me by email, though emails in non-English or with attachments tend to be misclassified as spam by Gmail. I accept approximations of my name’s pronunciation, but people’s creativity has extended to the spelling, e.g., “Size”, “Shizhe”, “Sizche”, etc. :)

Invited Talks

  • Securing LLMs Against Prompt Injection for Agentic Applications
    UC Berkeley: Berkeley NLP Group Seminar 2026
    UC Berkeley: Guest Lecture at Computer Security 2026
    Shanghai AI Lab: Xinghe Talk 2026
    Penn State University: Guest Lecture at Threats and Cybersecurity 2025
    UC San Diego: Earlence’s Lab Group Seminar 2025
    Cornell University (Cornell-Tech Campus): Guest Lecture at Trustworthy AI 2025
    Google DeepMind: Adversarial Machine Learning Seminar 2025
    Duke University: Guest Lecture at Generative AI: Foundations, Applications, and Safety 2025
    UC Berkeley: Security Seminar 2024
    Hong Kong Baptist University: TMLR Young Scientist Seminar 2024
    Shanghai Jiao Tong University: PAMI Group Seminar 2024
  • On the Learning Preference of Deep Neural Networks
    ICLR Oral Track 2023
    AI Time Youth Ph.D. Talk 2023
  • Subspace Adversarial Training
    CVPR Oral Track 2022
  • Adversarial Attacks and Defenses
    Northeastern University: Security Seminar 2022

Selected Publications

  • Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
    Sizhe Chen*, Arman Zharmagambetov, David Wagner, Chuan Guo*
    Meta-SecAlign-70B is the first fully open-source commercial-grade LLM with built-in prompt injection defense, comparable to gpt-5 and gemini-3-pro in agentic (tool/web) security. Our SoTA training recipe incurs no noticeable drop in various utility scores based on the most comprehensive evaluations to date.
  • Defending Against Prompt Injection with DataFilter
    Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner
    DataFilter is a test-time model-agnostic defense that removes injected instructions from the data before it reaches the LLM.
  • SecAlign: Defending Against Prompt Injection with Preference Optimization
    Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo
    SecAlign aims to produce a prompt-injection-robust LLM that prefers (and thus outputs) the secure response over the insecure one.
  • StruQ: Defending Against Prompt Injection with Structured Queries
    Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner
    StruQ is a general framework for prompt injection defense by separating the prompt (user instruction) and data into two channels.
  • Defending Against Prompt Injection with a Few DefensiveTokens
    Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner
  • One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks
    Shutong Wu*, Sizhe Chen*, Cihang Xie, Xiaolin Huang
  • Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet
    Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang
  • Subspace Adversarial Training
    Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

Services

  • Reviewer: CCS 2024/2025/2026, SaTML 2025/2026, AsiaCCS 2027, NeurIPS 2023/2025, ICML 2024/2025, ICLR 2023/2024/2025/2026, CVPR 2023/2024/2025, ICCV 2023, ECCV 2022/2024, IEEE TPAMI, Machine Learning, Pattern Recognition
  • UC Berkeley EECS Student Reviewer: Faculty Hiring Committee 2024, Ph.D. Admission Committee 2024, Equal Access to Application Assistance 2024

Awards

  • Research Funding: NVIDIA Fellowship 2026-2027 (10/1000+), Meta-BAIR Commons 2024-2026, Google-BAIR Commons 2024-2026, UC Berkeley EECS Departmental Fellowship 2023, NeurIPS 2022 and ICLR 2023 Travel Support
  • Degree Awards: SJTU Best Bachelor’s Thesis (1%) 2020, SJTU Outstanding Graduate 2022/2023
  • Scholarship: China National Scholarship (0.2%) 2021/2022, Kwang-Hua Scholarship 2019, Arawana Scholarship 2017

Misc

  • I practice neatness and minimalism. I am a typical ISTJ in MBTI.
  • I love to sing, attend concerts, photograph, hike, ski, and play badminton and table tennis.
  • I write blogs (in Mandarin for now) about my thoughts and experiences.
  • My Erdős number is 3 due to my collaboration with Chuan Guo.