<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ypdu.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ypdu.github.io/" rel="alternate" type="text/html" /><updated>2026-04-06T16:28:57-07:00</updated><id>https://ypdu.github.io/feed.xml</id><title type="html">Yupei Du</title><subtitle>PhD student working on NLP and ML - Attribution, Memorization, and Language Models</subtitle><author><name>Yupei Du</name></author><entry><title type="html">Welcome to My Blog!</title><link href="https://ypdu.github.io/posts/2025/01/welcome-to-my-blog/" rel="alternate" type="text/html" title="Welcome to My Blog!" /><published>2025-01-15T00:00:00-08:00</published><updated>2025-01-15T00:00:00-08:00</updated><id>https://ypdu.github.io/posts/2025/01/welcome-to-my-blog</id><content type="html" xml:base="https://ypdu.github.io/posts/2025/01/welcome-to-my-blog/"><![CDATA[<p>Welcome to my new blog! I’ve finally decided to start writing about my research journey, thoughts on NLP and machine learning, and various technical topics that interest me.</p>

<h2 id="what-to-expect">What to Expect</h2>

<p>You can expect posts about:</p>

<ul>
  <li><strong>Research insights</strong>: Sharing lessons learned from my work on attribution, memorization, and language models</li>
  <li><strong>Technical tutorials</strong>: How-to guides on various NLP and ML techniques</li>
  <li><strong>Paper reviews</strong>: My thoughts on interesting papers in the field</li>
  <li><strong>Academic life</strong>: Tips and experiences from graduate school and research</li>
</ul>

<h2 id="why-start-a-blog">Why Start a Blog?</h2>

<p>As a PhD student working on NLP and ML, I often come across interesting ideas, debugging stories, and “aha!” moments that I think could be valuable to share. This blog will serve as both a personal record and hopefully a resource for others in the field.</p>

<p>I’m particularly excited to write about:</p>
<ul>
  <li>Attribution methods for language models</li>
  <li>The fascinating relationship between memorization and generalization</li>
  <li>Practical tips for ML research and experimentation</li>
</ul>

<h2 id="stay-tuned">Stay Tuned</h2>

<p>I’ll be posting regularly about my research adventures. Feel free to reach out if you have any questions or topics you’d like me to cover!</p>

<p><img src="https://media.giphy.com/media/l0HlBO7eyXzSZkJri/giphy.gif" alt="Research GIF" /></p>

<p><em>Happy researching!</em></p>]]></content><author><name>Yupei Du</name></author><category term="welcome" /><category term="personal" /><category term="research" /><summary type="html"><![CDATA[Welcome to my new blog! I’ve finally decided to start writing about my research journey, thoughts on NLP and machine learning, and various technical topics that interest me.]]></summary></entry><entry><title type="html">Understanding Attribution in Language Models: A Research Overview</title><link href="https://ypdu.github.io/posts/2025/01/understanding-attribution/" rel="alternate" type="text/html" title="Understanding Attribution in Language Models: A Research Overview" /><published>2025-01-10T00:00:00-08:00</published><updated>2025-01-10T00:00:00-08:00</updated><id>https://ypdu.github.io/posts/2025/01/understanding-attribution</id><content type="html" xml:base="https://ypdu.github.io/posts/2025/01/understanding-attribution/"><![CDATA[<p>Attribution has become one of the most crucial topics in making language models more transparent and trustworthy. In this post, I’ll share some insights from my research on attribution methods and why they matter for building safe AI systems.</p>

<h2 id="what-is-attribution">What is Attribution?</h2>

<p>Attribution, in the context of language models, refers to the process of identifying which parts of the training data, model components, or input contributed most to a particular prediction. Think of it as asking: “Why did the model produce this specific output?”</p>

<h2 id="types-of-attribution">Types of Attribution</h2>

<p>There are several flavors of attribution that researchers work on:</p>

<h3 id="1-data-attribution">1. Data Attribution</h3>
<ul>
  <li><strong>Question</strong>: Which training examples influenced this prediction?</li>
  <li><strong>Methods</strong>: Influence functions, TracIn, gradient-based methods</li>
  <li><strong>Applications</strong>: Data selection, debugging, privacy</li>
</ul>
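<p>To make the gradient-based flavor concrete, here is a minimal single-checkpoint sketch in the spirit of TracIn: each training example is scored by the dot product between its loss gradient and the test example's loss gradient. The logistic-regression setup, toy data, and function names are my own illustration for this post, not code from any of these papers:</p>

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of the logistic loss w.r.t. weights w for one example (x, y)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * x

def tracin_influence(w, train_examples, test_example, lr=0.1):
    """Score each training example by the dot product of its loss gradient
    with the test example's gradient (a single-checkpoint TracIn sketch)."""
    g_test = logistic_grad(w, *test_example)
    return [lr * np.dot(logistic_grad(w, x, y), g_test)
            for x, y in train_examples]

# Toy data: the first training example points the same way as the test
# example, so it should receive the higher influence score.
w = np.zeros(2)
train = [(np.array([1.0, 0.0]), 1), (np.array([-1.0, 0.0]), 1)]
test = (np.array([2.0, 0.0]), 1)
scores = tracin_influence(w, train, test)
```

<p>In the toy run, the example whose gradient aligns with the test gradient gets a positive score and its mirror image a negative one, which is exactly the signal these methods exploit at scale.</p>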

<h3 id="2-feature-attribution">2. Feature Attribution</h3>
<ul>
  <li><strong>Question</strong>: Which input tokens/features matter most?</li>
  <li><strong>Methods</strong>: Gradients, attention weights (though their faithfulness as explanations is debated), LIME, SHAP</li>
  <li><strong>Applications</strong>: Model interpretation, bias detection</li>
</ul>
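<p>A simple way to build intuition for feature attribution, without any gradients at all, is occlusion: mask each input token in turn and measure how much the model's score drops. The scorer and tokens below are a toy illustration I made up for this sketch, standing in for a real model:</p>

```python
def occlusion_attribution(score_fn, tokens, mask_token="[MASK]"):
    """Attribute score_fn's output to each token by masking it out and
    measuring how much the score drops (leave-one-out attribution)."""
    base = score_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(base - score_fn(masked))
    return scores

# Toy scorer: counts occurrences of "good", so masking "good" drops the
# score by 1 and every other token gets zero attribution.
score_fn = lambda toks: float(toks.count("good"))
attr = occlusion_attribution(score_fn, ["the", "movie", "was", "good"])
```

<p>Real implementations apply the same loop to a model's output probability; the cost is one forward pass per token, which is why gradient methods are often preferred for long inputs.</p>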

<h3 id="3-component-attribution">3. Component Attribution</h3>
<ul>
  <li><strong>Question</strong>: Which model parameters/layers are responsible?</li>
  <li><strong>Methods</strong>: Probing, circuit analysis, mechanistic interpretability</li>
  <li><strong>Applications</strong>: Model understanding, targeted editing</li>
</ul>
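<p>Component attribution can be sketched the same way with ablation: zero out one hidden unit at a time and measure how much the output changes. The tiny two-layer network below is a made-up toy, not a real transformer circuit, but the recipe is the same one used in circuit analysis:</p>

```python
import numpy as np

def mlp_forward(x, W1, W2, ablate_unit=None):
    """Two-layer ReLU MLP; optionally zero out one hidden unit."""
    h = np.maximum(0.0, W1 @ x)
    if ablate_unit is not None:
        h[ablate_unit] = 0.0
    return W2 @ h

def component_attribution(x, W1, W2):
    """Score each hidden unit by how much ablating it changes the output."""
    base = mlp_forward(x, W1, W2)
    return [abs(base - mlp_forward(x, W1, W2, ablate_unit=i))
            for i in range(W1.shape[0])]

# Toy network where only the first hidden unit carries the signal.
x = np.array([1.0, 1.0])
W1 = np.array([[1.0, 1.0], [0.0, 0.0]])   # unit 0 active, unit 1 dead
W2 = np.array([3.0, 5.0])
scores = component_attribution(x, W1, W2)
```

<p>The dead unit gets an ablation score of zero and the active one absorbs all the credit; targeted model editing builds on exactly this kind of localization.</p>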

<h2 id="why-attribution-matters">Why Attribution Matters</h2>

<p>From my research experience, I’ve found that attribution is crucial for:</p>

<ol>
  <li><strong>Building Trust</strong>: Users need to understand why models make certain decisions</li>
  <li><strong>Debugging Models</strong>: Finding and fixing problematic behaviors</li>
  <li><strong>Data Quality</strong>: Identifying low-quality or biased training data</li>
  <li><strong>Regulatory Compliance</strong>: Many domains require explainable AI</li>
</ol>

<h2 id="challenges-in-attribution">Challenges in Attribution</h2>

<p>Working on attribution research has taught me that there are several fundamental challenges:</p>

<ul>
  <li><strong>Ground Truth</strong>: How do we know if our attributions are “correct”?</li>
  <li><strong>Scalability</strong>: Many methods don’t scale to modern large language models</li>
  <li><strong>Faithfulness</strong>: Do the attributions actually reflect the model’s reasoning?</li>
</ul>
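<p>The faithfulness question can at least be probed empirically with a deletion test: mask tokens from most- to least-attributed and check that the score falls faster than under the reverse ordering. The scorer and attributions below are toy stand-ins I chose so the effect is obvious:</p>

```python
def deletion_curve(score_fn, tokens, order, mask="[MASK]"):
    """Mask tokens one by one in the given order, recording the score after
    each step; a faster-falling curve means the ordering found the
    important tokens sooner."""
    toks = list(tokens)
    curve = [score_fn(toks)]
    for i in order:
        toks[i] = mask
        curve.append(score_fn(toks))
    return curve

# Toy scorer and attributions: "good" is the only token that matters.
score_fn = lambda toks: float(toks.count("good"))
tokens = ["the", "movie", "was", "good"]
attributions = [0.0, 0.0, 0.0, 1.0]
best_first = sorted(range(4), key=lambda i: -attributions[i])
worst_first = best_first[::-1]
faithful = deletion_curve(score_fn, tokens, best_first)
unfaithful = deletion_curve(score_fn, tokens, worst_first)
```

<p>A large gap between the two curves is evidence (though not proof) that the attributions track what the model actually uses.</p>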

<h2 id="current-research-directions">Current Research Directions</h2>

<p>The field is rapidly evolving with exciting developments in:</p>
<ul>
  <li><strong>Mechanistic Interpretability</strong>: Understanding the circuits within transformers</li>
  <li><strong>Efficient Attribution</strong>: Methods that work with billion-parameter models</li>
  <li><strong>Multimodal Attribution</strong>: Extending attribution to vision-language models</li>
</ul>

<h2 id="example-simple-gradient-based-attribution">Example: Simple Gradient-Based Attribution</h2>

<p>Here’s a quick example of how you might compute simple gradient-based attribution:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch

def compute_input_attribution(model, input_ids, target_token_id):
    """
    Compute gradient-based attribution for input tokens.
    """
    # Embed the input, then detach so the embeddings become a leaf
    # tensor -- otherwise backward() never populates embeddings.grad
    embeddings = model.get_input_embeddings()(input_ids)
    embeddings = embeddings.detach().requires_grad_(True)

    # Forward pass using the embeddings directly
    outputs = model(inputs_embeds=embeddings)
    logits = outputs.logits

    # Probability of the target token at the last position
    target_prob = torch.softmax(logits[0, -1], dim=-1)[target_token_id]

    # Backward pass populates embeddings.grad
    target_prob.backward()

    # Attribution is the gradient magnitude per token
    attribution = embeddings.grad.norm(dim=-1)

    return attribution.detach()
</code></pre></div></div>

<p>This is just scratching the surface, but it gives you an idea of how we can start understanding what drives model predictions.</p>

<h2 id="looking-forward">Looking Forward</h2>

<p>As language models become more powerful and ubiquitous, the need for robust attribution methods will only grow. I’m excited to continue working on making these models more interpretable and trustworthy.</p>

<p>What aspects of attribution are you most interested in? Feel free to reach out if you’d like to discuss any of these topics further!</p>

<hr />

<p><em>This post is based on insights from my ongoing research on attribution methods. Stay tuned for more technical deep-dives!</em></p>]]></content><author><name>Yupei Du</name></author><category term="attribution" /><category term="language-models" /><category term="interpretability" /><category term="research" /><summary type="html"><![CDATA[Attribution has become one of the most crucial topics in making language models more transparent and trustworthy. In this post, I’ll share some insights from my research on attribution methods and why they matter for building safe AI systems.]]></summary></entry></feed>