Sizhe Chen (陈思哲)

Biography

Hi! I am a CS Ph.D. candidate at UC Berkeley in Berkeley AI Research (BAIR), working with Prof. David Wagner. My research is supported by funding from the NVIDIA Fellowship, Meta FAIR, Google DeepMind, and UCB EECS.

I study AI security in real-world applications. Currently, I am defending against prompt injection attacks, the top threat to AI agents. Prompt injection has caused actual harm in multiple AI systems from Google, OpenAI, Anthropic, Slack, etc. To enable broader use of LLMs in agents, I develop principled, general, and practical prompt injection defenses. Our SoTA training recipe yields the Meta-SecAlign LLMs (downloaded 10K times in 3 months), which achieve an order of magnitude lower attack success rates against various prompt injections. Meta-SecAlign-70B retains commercial-grade utility after our defensive fine-tuning and is ready for commercial use.

I am extremely fortunate to have

Previously, I received my M.Eng. and B.Eng. from Shanghai Jiao Tong University, working with Xiaolin Huang, where I also received support from Cihang Xie, Yanzhi Wang, Kun Zhang, and Haotian Tang.

Feel free to contact me by email, though emails in non-English or with attachments tend to be misclassified as spam by Gmail. I accept approximations of my name’s pronunciation, but people’s creativity has extended to the spelling, e.g., “Size”, “Shizhe”, “Sizche”, etc. :)

Invited Talks

  • Securing LLMs Against Prompt Injection for Agentic Applications
    UC Berkeley: Berkeley NLP Group Seminar 2026
    UC Berkeley: Guest Lecture at Computer Security 2026
    Shanghai AI Lab: Xinghe Talk 2026
    Penn State University: Guest Lecture at Threats and Cybersecurity 2025
    UC San Diego: Earlence’s Lab Group Seminar 2025
    Cornell University (Cornell-Tech Campus): Guest Lecture at Trustworthy AI 2025
    Google DeepMind: Adversarial Machine Learning Seminar 2025
    Duke University: Guest Lecture at Generative AI: Foundations, Applications, and Safety 2025
    UC Berkeley: Security Seminar 2024
    Hong Kong Baptist University: TMLR Young Scientist Seminar 2024
    Shanghai Jiao Tong University: PAMI Group Seminar 2024
  • On the Learning Preference of Deep Neural Networks
    ICLR Oral Track 2023
    AI Time Youth Ph.D. Talk 2023
  • Subspace Adversarial Training
    CVPR Oral Track 2022
  • Adversarial Attacks and Defenses
    Northeastern University: Security Seminar 2022

Selected Publications

  • Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
    Sizhe Chen*, Arman Zharmagambetov, David Wagner, Chuan Guo*
    Meta-SecAlign-70B is the first fully open-source commercial-grade LLM with built-in prompt injection defense, comparable to gpt-5 and gemini-3-pro in agentic (tool/web) security. Our SoTA training recipe incurs no noticeable drop in various utility scores based on the most comprehensive evaluations to date.
  • Defending Against Prompt Injection with DataFilter
    Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner
    DataFilter is a test-time model-agnostic defense that removes injected instructions from the data before it reaches the LLM.
  • SecAlign: Defending Against Prompt Injection with Preference Optimization
    Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo
    SecAlign aims to produce a prompt-injection-robust LLM that prefers (and thus outputs) the secure response over the insecure one.
  • StruQ: Defending Against Prompt Injection with Structured Queries
    Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner
    StruQ is a general framework for prompt injection defense by separating the prompt (user instruction) and data into two channels.
  • Defending Against Prompt Injection with a Few DefensiveTokens
    Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner
  • One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks
    Shutong Wu*, Sizhe Chen*, Cihang Xie, Xiaolin Huang
  • Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet
    Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang
  • Subspace Adversarial Training
    Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

Services

  • Reviewer: CCS 2024/2025/2026, SaTML 2025/2026, AsiaCCS 2027, NeurIPS 2023/2025, ICML 2024/2025, ICLR 2023/2024/2025/2026, CVPR 2023/2024/2025, ICCV 2023, ECCV 2022/2024, IEEE TPAMI, Machine Learning, Pattern Recognition
  • UC Berkeley EECS Student Reviewer: Faculty Hiring Committee 2024, Ph.D. Admission Committee 2024, Equal Access to Application Assistance 2024

Awards

  • Research Funding: NVIDIA Fellowship 2026-2027 (10/1000+), Meta-BAIR Commons 2024-2026, Google-BAIR Commons 2024-2026, UC Berkeley EECS Departmental Fellowship 2023, NeurIPS 2022 and ICLR 2023 Travel Support
  • Degree Awards: SJTU Best Bachelor’s Thesis (1%) 2020, SJTU Outstanding Graduate 2022/2023
  • Scholarship: China National Scholarship (0.2%) 2021/2022, Kwang-Hua Scholarship 2019, Arawana Scholarship 2017

Misc

  • I practice neatness and minimalism. I am a typical ISTJ in MBTI.
  • I love to sing, attend concerts, photograph, hike, ski, and play badminton and table tennis.
  • I write blogs (in Mandarin for now) about my thoughts and experiences.
  • My Erdős number is 3 due to my collaboration with Chuan Guo.