Luxi (Lucy) He
105 posts
Luxi (Lucy) He
@LuxiHeLucy
Princeton CS PhD @PrincetonPLI. Previously @Harvard ‘23 CS & Math.
Princeton, NJ
lumos23.github.io
Joined October 2022
545 Following
1,298 Followers
  • Luxi (Lucy) He
    @LuxiHeLucy
    Jun 25, 2024
    Why do image-generation models often show Mario for "video-game plumber"? 🧐 How can we identify such keywords? How can we improve upon common mitigation strategies to protect copyrighted characters? See our CopyCat 🐱 paper to find out more! copycat-eval.github.io (1/8)
    73K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Apr 4, 2024
    Fine-tuning on benign data (e.g. Alpaca) can jailbreak models unexpectedly. We study this problem through a data-centric perspective and find that some seemingly benign data could be more harmful than explicitly malicious data! ⚠️🚨‼️ Paper: arxiv.org/pdf/2404.01099… [1/n]
    42K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Oct 5, 2024
    I'm attending @COLM_conf next week! Excited to meet folks and chat about alignment, safety, reasoning, LM evaluations, and more! Please feel free to reach out anytime :) @xiamengzhou and I will present our work on data selection + safety on Tuesday afternoon, come chat with us!
    16K
  • Luxi (Lucy) He
    @LuxiHeLucy
    May 6, 2024
    Excited to be attending #ICLR2024 this week! I will be giving an oral presentation of our work, which was designated Best Paper at the Data Problems for Foundation Models (DPFM) Workshop! Come say hi at ICLR. I would love to chat about LLMs, alignment, safety, copyright, and more!
    Luxi (Lucy) He
    @LuxiHeLucy
    Apr 4, 2024
    Fine-tuning on benign data (e.g. Alpaca) can jailbreak models unexpectedly. We study this problem through a data-centric perspective and find that some seemingly benign data could be more harmful than explicitly malicious data! ⚠️🚨‼️ Paper: arxiv.org/pdf/2404.01099… [1/n]
    18K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Jul 26, 2024
    [𝐒𝐩𝐨𝐭𝐥𝐢𝐠𝐡𝐭 @genlawcenter '24] Fantastic Copyrighted Beasts and How (Not) to Generate Them. We'll have a spotlight talk at the ICML 24 GenLaw Workshop, and please feel free to reach out and chat more!
    Luxi (Lucy) He
    @LuxiHeLucy
    Jun 25, 2024
    Why do image-generation models often show Mario for "video-game plumber"? 🧐 How can we identify such keywords? How can we improve upon common mitigation strategies to protect copyrighted characters? See our CopyCat 🐱 paper to find out more! copycat-eval.github.io (1/8)
    12K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Oct 15, 2024
    Join us today at 3 pm ET for a discussion on AI safety and alignment with @DavidSKrueger 🤩 Submit your questions in advance at the link in the post!
    Princeton PLI
    @PrincetonPLI
    Oct 14, 2024
    PASS seminar tomorrow, 10/15 at 3pm ET! Speaker: @DavidSKrueger from @Cambridge_Uni Live: youtube.com/@PrincetonPLI/… Submit questions: tinyurl.com/pass-question Recordings later at: youtube.com/@PrincetonPLI
    Banner reads: Princeton AI Alignment and Safety Seminar. David Krueger, University of Cambridge. Tuesday, October 15, 3:00-4:00 pm (ET).
    5.5K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Nov 4, 2024
    Excited for the talk today at 2pm ET! YouTube link here youtube.com/@PrincetonPLI and submit your questions via forms.gle/7GQXAr9aonfvy1… 🤩
    Nathan Lambert
    @natolambert
    Nov 4, 2024
    Giving a talk in 4 hours for Princeton AI Alignment and Safety Seminar on new ways we're pushing the frontier of open recipes for fine-tuning. Lots of good details on recipes, datasets, and models we'll be releasing soon. We put a lot of effort into this one. Link for more info:
    youtube.com
    Princeton Language & Intelligence
    5K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Oct 30, 2024
    Tune in to our PASS Seminar with @natolambert next Monday! Submit your question via the link in thread :)
    Princeton PLI
    @PrincetonPLI
    Oct 30, 2024
    UPCOMING PASS SEMINAR, 11/4 at 2pm ET! Speaker: @natolambert from @allen_ai Live: youtube.com/@PrincetonPLI/… Recordings later at: youtube.com/@PrincetonPLI
    Banner promoting PASS seminar with Nathan Lambert from Allen Institute for AI. Talk title: Tulu 3: Exploring Frontiers in Open Language Model Post-Training. On Nov 4, 2 pm ET. Livestream will be available on the PLI YouTube.
    3.2K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Nov 19, 2024
    Happening today! You can submit your questions for Gillian here:
    Princeton PLI
    @PrincetonPLI
    Nov 13, 2024
    UPCOMING PASS SEMINAR, 11/19 at 1pm ET! Speaker: @ghadfield from @UofT Live: youtube.com/@PrincetonPLI/… Recordings later at: youtube.com/@PrincetonPLI
    Banner promoting the Princeton AI Alignment and Safety Seminar, presenting Gillian Hadfield. Talk title: Normativity and the AI Alignment Problem. Talk on Nov 19 at 1 pm ET. Livestream on the PLI YouTube.
    docs.google.com
    PASS Question Submission
    Submit your question for the speaker at Princeton AI Alignment and Safety Seminar (PASS)! We will moderate the questions and ask the speaker during the discussion period. Upcoming Talk: Apr 2 2025,...
    3.4K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Jun 27, 2024
    Wondering why your user experience with many MLLMs doesn't quite align with their high performance on existing benchmarks? 🤨 Our human-curated benchmark CharXiv shows flaws in MLLM chart understanding, as well as gaps between open- and closed-source models!
    Zirui "Colin" Wang
    @zwcolin
    Jun 27, 2024
    🤨 Are Multimodal Large Language Models really as 𝐠𝐨𝐨𝐝 at 𝐜𝐡𝐚𝐫𝐭 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 as existing benchmarks such as ChartQA suggest? 🚫 Our ℂ𝕙𝕒𝕣𝕏𝕚𝕧 benchmark suggests NO! 🥇Humans achieve ✨𝟖𝟎+% correctness. 🥈Sonnet 3.5 outperforms GPT-4o by 10+ points,
    1.6K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Dec 13, 2023
    I’m attending @NeurIPSConf 2023! I will be presenting our Spotlight Paper “Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions”. Excited to learn more about ML privacy, fairness, LLM safety, and more! #NeurIPS
    1.1K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Jun 25, 2024
    Replying to @LuxiHeLucy
    Preventing the generation of copyrighted characters (e.g. Mario, Batman) is important for image and video generation models. We build the CopyCat evaluation suite, with diverse copyrighted characters and an evaluation pipeline that measures character detection and input consistency. (2/8)
    1.5K
  • Luxi (Lucy) He
    @LuxiHeLucy
    Apr 4, 2024
    Replying to @LuxiHeLucy
    Both approaches are effective in identifying such benign subsets that break safety. The gradient-based method is more consistent across datasets. Data selected using Llama-7b-chat as the base model also successfully attacks the Llama-13b-chat model. [4/N]
    535
  • Luxi (Lucy) He
    @LuxiHeLucy
    Apr 4, 2024
    Replying to @LuxiHeLucy
    Such seemingly benign but effectively harmful data further raise awareness of safety vulnerabilities when fine-tuning. This type of approach could help identify optimal safety-utility data mixtures or provide a mechanism for data-centric debugging of safety degradation. [6/N]
    1.2K
