Luxi (Lucy) He (@LuxiHeLucy) / X

@LuxiHeLucy

Princeton CS PhD @PrincetonPLI. Previously @Harvard '23 CS & Math.

Princeton, NJ

Joined October 2022

Luxi (Lucy) He
@LuxiHeLucy
Jun 25, 2024
Why do image-generation models often show Mario for "video-game plumber"? 🧐 How can we identify such keywords? How can we improve upon common mitigation strategies to protect copyrighted characters? See our CopyCat 🐱 paper to find out more! copycat-eval.github.io (1/8)
Luxi (Lucy) He
@LuxiHeLucy
Apr 4, 2024
Fine-tuning on benign data (e.g. Alpaca) can jailbreak models unexpectedly. We study this problem through a data-centric perspective and find that some seemingly benign data could be more harmful than explicitly malicious data! ⚠️🚨‼️ Paper: arxiv.org/pdf/2404.01099… [1/n]
Luxi (Lucy) He
@LuxiHeLucy
Oct 5, 2024
I'm attending @COLM_conf next week! Excited to meet folks and chat about alignment, safety, reasoning, LM evaluations, and more! Please feel free to reach out anytime :) @xiamengzhou and I will present our work on data selection + safety on Tuesday afternoon, come chat with us!
Luxi (Lucy) He
@LuxiHeLucy
May 6, 2024
Excited to be attending #ICLR2024 this week! I will be giving an oral presentation of our work, which was designated Best Paper at the Data Problems for Foundation Models (DPFM) Workshop! Come say hi at ICLR. I would love to chat about LLMs, alignment, safety, copyright, and more!
Luxi (Lucy) He
@LuxiHeLucy
Jul 26, 2024
[𝐒𝐩𝐨𝐭𝐥𝐢𝐠𝐡𝐭 @genlawcenter '24] Fantastic Copyrighted Beasts and How (Not) to Generate Them. We'll have a spotlight talk at the ICML 24 GenLaw Workshop, and please feel free to reach out and chat more!
Luxi (Lucy) He
@LuxiHeLucy
Oct 15, 2024
Join us today at 3 pm ET for a discussion on AI safety and alignment with @DavidSKrueger 🤩 Submit your questions in advance at the link in the post!
Princeton PLI
@PrincetonPLI
Oct 14, 2024
PASS seminar tomorrow, 10/15 at 3pm ET! Speaker: @DavidSKrueger from @Cambridge_Uni Live: youtube.com/@PrincetonPLI/… Submit questions: tinyurl.com/pass-question Recordings later at: youtube.com/@PrincetonPLI
Luxi (Lucy) He
@LuxiHeLucy
Nov 4, 2024
Excited for the talk today at 2pm ET! YouTube link here youtube.com/@PrincetonPLI and submit your questions via forms.gle/7GQXAr9aonfvy1… 🤩
Nathan Lambert
@natolambert
Nov 4, 2024
Giving a talk in 4 hours for Princeton AI Alignment and Safety Seminar on new ways we're pushing the frontier of open recipes for fine-tuning. Lots of good details on recipes, datasets, and models we'll be releasing soon. We put a lot of effort into this one. Link for more info:
youtube.com
Princeton Language & Intelligence
Luxi (Lucy) He
@LuxiHeLucy
Oct 30, 2024
Tune in to our PASS Seminar with @natolambert next Monday! Submit your question via the link in thread :)
Princeton PLI
@PrincetonPLI
Oct 30, 2024
UPCOMING PASS SEMINAR, 11/4 at 2pm ET! Speaker: @natolambert from @allen_ai Live: youtube.com/@PrincetonPLI/… Recordings later at: youtube.com/@PrincetonPLI
Luxi (Lucy) He
@LuxiHeLucy
Nov 19, 2024
Happening today! You can submit your questions for Gillian here:
Princeton PLI
@PrincetonPLI
Nov 13, 2024
UPCOMING PASS SEMINAR, 11/19 at 1pm ET! Speaker: @ghadfield from @UofT Live: youtube.com/@PrincetonPLI/… Recordings later at: youtube.com/@PrincetonPLI
docs.google.com
PASS Question Submission
Submit your question for the speaker at Princeton AI Alignment and Safety Seminar (PASS)! We will moderate the questions and ask the speaker during the discussion period. Upcoming Talk: Apr 2 2025,...
Luxi (Lucy) He
@LuxiHeLucy
Jun 27, 2024
Wondering why your user experience with many MLLMs doesn't quite align with their high performance on existing benchmarks? 🤨 Our human-curated benchmark CharXiv shows flaws in MLLM chart understanding, as well as gaps between open- and closed-source models!
Zirui "Colin" Wang
@zwcolin
Jun 27, 2024
🤨 Are Multimodal Large Language Models really as 𝐠𝐨𝐨𝐝 at 𝐜𝐡𝐚𝐫𝐭 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 as existing benchmarks such as ChartQA suggest? 🚫 Our ℂ𝕙𝕒𝕣𝕏𝕚𝕧 benchmark suggests NO! 🥇Humans achieve ✨𝟖𝟎+% correctness. 🥈Sonnet 3.5 outperforms GPT-4o by 10+ points,
Luxi (Lucy) He
@LuxiHeLucy
Dec 13, 2023
I’m attending @NeurIPSConf 2023! I will be presenting our Spotlight Paper “Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions”. Excited to learn more about ML privacy, fairness, LLM safety, and more! #NeurIPS
Luxi (Lucy) He
@LuxiHeLucy
Jun 25, 2024
Replying to @LuxiHeLucy
Preventing the generation of copyrighted characters (e.g., Mario, Batman) is important for image and video generation models. We build the CopyCat evaluation suite, with diverse copyrighted characters and an evaluation pipeline that measures character detection and input consistency. (2/8)
Luxi (Lucy) He
@LuxiHeLucy
Apr 4, 2024
Replying to @LuxiHeLucy
Both approaches are effective in identifying such benign subsets that break safety. The gradient-based method is more consistent across datasets. Data selected using Llama-7b-chat as the base model also successfully attacks the Llama-13b-chat model. [4/N]
Luxi (Lucy) He
@LuxiHeLucy
Apr 4, 2024
Replying to @LuxiHeLucy
Such seemingly benign but effectively harmful data further raise awareness of safety vulnerabilities in fine-tuning. This type of approach could help identify optimal safety-utility data mixtures or provide a mechanism for data-centric debugging of safety degradation. [6/N]