Ray Yang Academic Website

Ruixin (Ray) Yang 杨瑞欣

I am an MSCS student at Georgia Tech, working with Prof. Alan Ritter. I received my BSc degree in Computer Science and Statistics from the University of British Columbia in beautiful Vancouver, Canada.

My research focuses on improving foundation models and agentic systems to make them more trustworthy and adaptive, enabling safe and effective human-AI collaboration. Currently, I am interested in:

(1) auditing, evaluating, and post-training for reliable agentic AI, especially in high-stakes, specialized domains (e.g., scientific and medical applications) and long-horizon tasks (where verification or oversight is difficult), and in scenarios requiring multimodal interaction or multi-agent coordination.

This includes improving both epistemic reliability (e.g., confidence calibration, recognizing knowledge gaps, handling ambiguity) and behavioral reliability (e.g., mitigating risks of privacy leakage, tool misuse, and alignment failures).

(2) building user-adaptive and collaborative AI systems that can align with diverse user goals, preferences, and values in real-world interaction.

This includes personalized and pluralistic alignment, as well as improving models' ability to proactively seek information, infer user goals and intents from interaction, and adapt their responses and actions responsibly.

During Summer 2025, I was a Research Engineer Intern at the Center for AI Safety. Previously, I was a research assistant at Dartmouth College, where I worked with Dr. Ruibo Liu and Prof. Soroush Vosoughi on value alignment for LLMs.

Email / GitHub / Google Scholar / LinkedIn / X

Research

Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das, Wei Xu, Alan Ritter
ICLR 2026
OpenReview / arXiv / code & data
A benchmark and set of analyses for evaluating whether vision-language models respect contextual integrity in location disclosure for image geolocation, revealing that violations of contextual norms may result in privacy harms, characterized by over-disclosure of sensitive locations, poor privacy-utility tradeoffs, and misalignment with human privacy expectations.

Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang
ICLR 2024 Workshop on Reliable and Responsible Foundation Models
OpenReview / arXiv / code
We propose Collaborative Calibration, a collaborative approach to elicit, calibrate, and rationalize the prediction confidence of LLMs.

Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Diyi Yang, Soroush Vosoughi
ICLR 2024
OpenReview / arXiv / code & data
Alignment training with data from multi-LLM simulated social interactions, as an efficient, effective, and stable alternative to RLHF.

Visual Analytics for Generative Transformer Models
Raymond Li, Ruixin Yang, Wen Xiao, Ahmed AbuRa'ed, Gabriel Murray, Giuseppe Carenini
paper / arXiv / code & data
In this work, we present a novel visual analytical framework to support the analysis of transformer-based generative models.

Generalizing Morphological Inflection Systems to Unseen Lemmas
Changbing Yang, Ruixin Yang, Garrett Nicolai, Miikka Silfverberg
SIGMORPHON 2022
paper
Competed in Shared Task 0: Generalization and Typologically Diverse Morphological Inflection and achieved the highest performance among all submissions in both small and large training conditions.

Misc

I come from Nanjing, a beautiful and historic city that served as the capital of six ancient Chinese dynasties over the past two thousand years.

I like listening to rock 'n' roll, ranging from progressive rock to Britpop and pop rock.

I've also been known to (awkwardly) hoop, smash, and stroke. (Style borrowed here from Prof. Schmidt)

Credits to Jon Barron's website: source code.