Universal Transaction Categorization: How Plaid unified four ML systems into one
By Wen Yao, Melody Zhao, Kevin Supakkul, Christine Zhou, Han Yu, Nick Sundin, Ozgur Seckin, Raghu Chetlapalli
One model change. No product code touched. Accuracy up by as much as 13–23% across every downstream product simultaneously.
That’s the payoff of UXC (Universal Transaction Categorization), Plaid’s unified categorization system. Here’s how we built it, and why unification turned out to be as much an organizational win as a technical one.
The problem: Four systems doing the same job
Plaid processes data associated with hundreds of millions of financial transactions daily. A raw description like ctlpquality inn debit hold needs to become something meaningful (in this case, a vending machine purchase at Quality Inn) before it can power a consumer’s budgeting apps, credit decisions, or fraud detection.
To serve the distinct needs of different consumers and customers, Plaid developed four specialized categorization systems:
- Personal Finance Categories (PFC): 16 primary / 104 detailed categories, designed for budgeting and financial wellness apps.
- Credit Categories (CC): 25 primary / 95 detailed categories, tailored for credit underwriting.
- Income categories: 13 income-related categories supporting Credit’s Income Insights.
- V1 categories (legacy): 600+ categories powering legacy products, never retrained.
Each maintained its own ML model, rule engine, labeling pipeline, and monitoring infrastructure. This made sense as each product evolved. But as Plaid’s transaction intelligence matured, we saw a bigger opportunity: improvements to the underlying categorization model that could benefit every product at once. That led us to UXC.
Designing the UXC taxonomy
A unified system requires a unified label space, but it also has to work for product teams that already have their own taxonomic languages. Our solution was the shim layer: a deterministic, product-owned mapping from UXC labels to each downstream taxonomy. This meant product teams could keep speaking their own language while the underlying model was upgraded without their involvement.
With that architecture in place, we established four criteria for every UXC category:
- Unambiguous definition. Each category must have a clear, singular meaning. For example, a Starbucks transaction is FOOD_AND_DRINK_COFFEE, not FOOD_AND_DRINK_FAST_FOOD or FOOD_AND_DRINK_RESTAURANT. The boundary is explicit, accompanied by a clear description and representative examples. This matters because ambiguous categories produce inconsistent labels, which corrupt training data.
- Maximum granularity. UXC operates at a finer level of detail than any downstream taxonomy, enabling lightweight many-to-one mappings. For example, UXC preserves seven LOAN_DISBURSEMENTS subcategories (LOAN_DISBURSEMENTS_AUTO, LOAN_DISBURSEMENTS_CASH_ADVANCES, LOAN_DISBURSEMENTS_EWA, LOAN_DISBURSEMENTS_MORTGAGE, LOAN_DISBURSEMENTS_PERSONAL, LOAN_DISBURSEMENTS_STUDENT, LOAN_DISBURSEMENTS_OTHER); in the PFC taxonomy, these collapse into a single TRANSFER_IN_CASH_ADVANCES_AND_LOANS category.
- MECE (Mutually Exclusive, Collectively Exhaustive). Every transaction maps to exactly one detailed category. If no meaningful category applies, it falls into an explicit “Other” bucket.
- Backward compatible by design. Each downstream product implements a shim layer: a deterministic mapping from UXC labels to its own taxonomy. This is where product-specific logic lives.
The resulting UXC taxonomy contains ~130 detailed categories, built from the union of the existing taxonomies (PFC and CC already overlapped by more than 80%).
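The shim idea can be sketched in a few lines of Python. This is a minimal illustration, not Plaid's implementation: `UXC_TO_PFC`, `to_pfc`, and `check_shim_coverage` are hypothetical names, and only the loan-disbursement rows come from the example above. A plain dictionary already gives each UXC label at most one target; the coverage check adds the "at least one" half, so every UXC label lands in exactly one downstream category.

```python
from typing import Iterable, Mapping

# Hypothetical slice of a PFC shim (UXC detailed label -> PFC detailed label).
# Only the LOAN_DISBURSEMENTS_* rows mirror the many-to-one example in the text;
# a production shim would list every UXC category.
LOAN_SUBTYPES = ("AUTO", "CASH_ADVANCES", "EWA", "MORTGAGE",
                 "PERSONAL", "STUDENT", "OTHER")
UXC_TO_PFC: dict[str, str] = {
    f"LOAN_DISBURSEMENTS_{sub}": "TRANSFER_IN_CASH_ADVANCES_AND_LOANS"
    for sub in LOAN_SUBTYPES
}


def check_shim_coverage(uxc_labels: Iterable[str], shim: Mapping[str, str]) -> None:
    """Fail fast if any UXC label has no downstream mapping.

    A dict guarantees at most one target per label; this check guarantees
    at least one, so each label maps to exactly one downstream category.
    """
    missing = sorted(set(uxc_labels) - set(shim))
    if missing:
        raise ValueError(f"shim is missing mappings for: {missing}")


def to_pfc(uxc_label: str, shim: Mapping[str, str] = UXC_TO_PFC) -> str:
    """Deterministic lookup: the same UXC label always yields the same PFC label."""
    return shim[uxc_label]
```

Because the mapping is a pure lookup, a model upgrade that only changes which UXC label gets predicted never requires touching this table.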
Bootstrapping labels: AI annotation at scale
Building a new taxonomy creates an immediate cold-start problem: where do the training labels come from? With ~130 categories and transaction descriptions that are often cryptic (AMZN MKTP US*AB123, DIR DEP ACME CORP PAYR, POS DEBIT CHKFILA 333222121 NY NY), even expert human labelers struggle with edge cases.
We built an AI annotation pipeline to solve this. The system takes a transaction as input, including a normalized description, posted date, and amount, and assigns a UXC label through two stages.
First, an LLM scans the transaction description and extracts key descriptors such as merchant name, income source, payment type, and general location. For unfamiliar entities, the system performs targeted web searches to gather context. A cryptic description like VIDRINE AUTO PRT gets resolved to an auto parts store through search results, which then informs the categorization.
Second, a label assignment LLM receives the transaction metadata and any enriched context, along with the full UXC taxonomy definitions, and assigns the most appropriate UXC label.
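A rough sketch of the two-stage flow, assuming a generic `llm(prompt) -> str` callable and a `search(query) -> str` callable (both hypothetical stand-ins, not a particular vendor API). The prompt wording, the JSON fields, and the `is_familiar` flag used to decide when to search are illustrative assumptions; the production pipeline is agentic and certainly richer.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces, not a specific vendor SDK:
# llm(prompt) -> completion text, search(query) -> snippet of web results.
LLM = Callable[[str], str]
Search = Callable[[str], str]


@dataclass
class Transaction:
    description: str  # normalized description
    posted_date: str
    amount: float


def extract_descriptors(txn: Transaction, llm: LLM, search: Search) -> dict:
    """Stage 1: extract key descriptors, then add web context for unfamiliar entities."""
    raw = llm(
        "Extract merchant_name, income_source, payment_type, location and "
        "is_familiar (true/false) as a JSON object from: " + txn.description
    )
    descriptors = json.loads(raw)
    if not descriptors.get("is_familiar", True) and descriptors.get("merchant_name"):
        descriptors["web_context"] = search(descriptors["merchant_name"])
    return descriptors


def assign_label(txn: Transaction, descriptors: dict,
                 taxonomy: dict[str, str], llm: LLM) -> Optional[str]:
    """Stage 2: pick one UXC label from metadata, enriched context and definitions."""
    definitions = "\n".join(f"{label}: {desc}" for label, desc in taxonomy.items())
    answer = llm(
        f"UXC taxonomy:\n{definitions}\n\n"
        f"Transaction: {txn.description} | posted {txn.posted_date} | amount {txn.amount}\n"
        f"Context: {json.dumps(descriptors)}\n"
        "Reply with exactly one label from the taxonomy."
    ).strip()
    # Reject anything outside the label space instead of trusting free text.
    return answer if answer in taxonomy else None
```

Rejecting answers that are not in the taxonomy keeps free-form LLM output from leaking invalid labels into the training set.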
We validated quality through iterative evaluation rounds against a human-labeled holdout set: running annotation, identifying disagreements, analyzing error patterns, and refining prompts until AI and human labels agreed more than 90% of the time.
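Each evaluation round boils down to two outputs: the agreement rate and the most frequent disagreements. A minimal sketch (the function name and the `top_k` cutoff are illustrative):

```python
from collections import Counter
from typing import Sequence


def agreement_report(human: Sequence[str], ai: Sequence[str], top_k: int = 5):
    """Return the AI-human agreement rate and the most common disagreement pairs."""
    if len(human) != len(ai) or not human:
        raise ValueError("need two aligned, non-empty label sequences")
    pairs = list(zip(human, ai))
    agreement = sum(h == a for h, a in pairs) / len(pairs)
    # (human_label, ai_label) pairs point at the error patterns worth a prompt fix.
    confusions = Counter((h, a) for h, a in pairs if h != a)
    return agreement, confusions.most_common(top_k)
```

Each round, the top confusion pairs guide the next prompt revision, and the loop stops once the agreement rate clears 0.9.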
With that quality bar met, we generated ~1 million labeled transactions. To avoid over-representing common merchants, we used embedding-based stratified sampling: embed a large transaction sample, cluster by semantic similarity, sample proportionally from each cluster, and supplement with high-volume transactions. This balanced head-of-distribution coverage with long-tail diversity.
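The sampling recipe can be sketched with NumPy. The post does not name a clustering algorithm, so this sketch assumes plain k-means; the cluster count, the "at least one row per cluster" quota rule, and the `volumes` array used to pick head transactions are also assumptions.

```python
import numpy as np


def kmeans_assign(emb: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means on embeddings; returns one cluster id per row."""
    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(len(emb), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = emb[assign == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def stratified_sample(emb, volumes, k, budget, n_head, seed=0):
    """Proportional per-cluster sample (long tail) plus top-volume rows (head)."""
    rng = np.random.default_rng(seed)
    assign = kmeans_assign(emb, k, seed=seed)
    chosen = set()
    for j in range(k):
        members = np.flatnonzero(assign == j)
        if len(members) == 0:
            continue
        # Quota proportional to cluster size, but every cluster contributes a row.
        quota = min(len(members), max(1, round(budget * len(members) / len(emb))))
        chosen.update(rng.choice(members, size=quota, replace=False).tolist())
    # Supplement with the highest-volume transactions.
    head = np.argsort(-np.asarray(volumes), kind="stable")[:n_head]
    chosen.update(head.tolist())
    return np.array(sorted(chosen))
```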
Model architecture: From BERT to a domain-specific foundation model
With the taxonomy and training data in place, we took a deliberate two-phase approach to the ML model: ship a reliable V1 quickly, then invest in a more powerful V2.
UXC V1: A BERT-based classifier
The first model fine-tuned a BERT encoder on our AI-annotated training data for multi-class classification, following the same approach described in our earlier posts on transaction categorization. It takes transaction descriptions and metadata as input and outputs a probability distribution over UXC labels.
V1 validated the core thesis: the unified taxonomy worked, downstream shims mapped cleanly, and accuracy already improved over the fragmented systems it replaced. It shipped to production within weeks.
UXC V2: Fine-tuning a transaction foundation model with CLERT
With the taxonomy validated, we turned to improving the model’s representations. The V1 BERT base encoder had no prior understanding of the cryptic, domain-specific language of bank transactions. UXC V2 replaces that generic encoder with CLERT (Contrastive Learning-enhanced Encoder Representations of Transactions), a domain-specific foundation model built by Plaid’s Data Foundations and AI team.
How CLERT works
A standard language model treats DIR DEP ACME CORP PAYR as a meaningless string of tokens, because it has never encountered the truncated merchant names, abbreviations, and formatting conventions found in bank transaction descriptions. CLERT solves this by learning transaction-specific representations through contrastive learning before being fine-tuned on UXC labels:
1. Transaction interpretation. An agentic interpretation pipeline translates raw transactions into plain-English explanations. For each transaction, the system generates two correct “positive” interpretations and one plausible but incorrect “hard negative”. The figure below illustrates this, with “positive” examples as correct interpretations and “hard negatives” as plausible but incorrect ones.
2. Contrastive pretraining. Using a Multilingual-E5-Large encoder as its backbone (chosen for its strong performance on semantic similarity benchmarks and cross-task generalization), CLERT is trained on ~1M transaction-interpretation pairs. The result is a model that maps cryptic transaction strings and their plain-English meanings into a shared embedding space, so that semantically similar transactions end up close together regardless of how they’re formatted or abbreviated. This helps CLERT learn the language of financial transactions, not just their surface tokens.
A qualitative view of the learned embedding space is shown in the figure below. Before contrastive learning, the E5 encoder places transaction descriptions (blue) and their interpretations (orange) in separate regions, with the orange points forming tight clusters far from the blue cloud. After contrastive learning, the two sets of points are tightly aligned, indicating that the encoder has learned to recognize the semantic equivalence between a cryptic transaction string and its plain-English explanation.
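The post does not spell out the training objective. A common choice for E5-style encoders is an InfoNCE loss over cosine similarities, where each transaction must pick out its correct interpretation from a pool containing its own hard negative plus every other interpretation in the batch. The NumPy sketch below computes that loss for one batch under this assumption; the function names and the temperature of 0.05 are illustrative, and a real training loop would use an autodiff framework to backpropagate through the encoder.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def info_nce_with_hard_negatives(txn, pos, hard_neg, temperature=0.05):
    """InfoNCE over cosine similarities (assumed objective, forward pass only).

    txn, pos, hard_neg: (batch, dim) embeddings. Row i of `pos` is a correct
    interpretation of transaction i and row i of `hard_neg` a plausible but
    wrong one; all other rows of `pos` and `hard_neg` act as in-batch negatives.
    """
    q = l2_normalize(txn)
    cands = l2_normalize(np.concatenate([pos, hard_neg], axis=0))  # (2B, dim)
    logits = q @ cands.T / temperature                              # (B, 2B)
    logits = logits - logits.max(axis=1, keepdims=True)             # stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    rows = np.arange(len(q))
    # The correct candidate for transaction i is column i (its own positive).
    return float(-log_probs[rows, rows].mean())
```

With two positives per transaction, each (transaction, positive) pair can simply be fed as its own row.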
3. Fine-tuning for UXC. A single linear classification layer is attached to the pretrained CLERT encoder and fine-tuned on the AI-annotated UXC training set described earlier. The pretrained representations give the model a massive head start on understanding transaction semantics.
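The classification head itself is small. The sketch below is a simplification: it trains only a softmax layer on fixed embeddings (a linear probe), whereas the text describes fine-tuning with the head attached to the encoder. `LinearHead` is a hypothetical name, and in practice the embeddings would come from CLERT rather than synthetic arrays.

```python
import numpy as np


class LinearHead:
    """Single linear layer + softmax over UXC labels (hypothetical sketch)."""

    def __init__(self, dim: int, n_labels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(dim, n_labels))
        self.b = np.zeros(n_labels)

    def predict_proba(self, emb: np.ndarray) -> np.ndarray:
        """Probability distribution over labels for each embedding row."""
        logits = emb @ self.W + self.b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def step(self, emb: np.ndarray, labels: np.ndarray, lr: float = 0.1) -> float:
        """One gradient-descent step on mean cross-entropy; returns the pre-step loss."""
        p = self.predict_proba(emb)
        n = len(labels)
        loss = -np.log(p[np.arange(n), labels] + 1e-12).mean()
        grad = p.copy()
        grad[np.arange(n), labels] -= 1.0  # d(loss)/d(logits) = (p - onehot) / n
        grad /= n
        self.W -= lr * emb.T @ grad
        self.b -= lr * grad.sum(axis=0)
        return float(loss)
```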
Why a foundation model matters
CLERT’s pretrained representations unlock three practical benefits:
- Data-efficient fine-tuning. Because CLERT already understands the language of transactions, downstream tasks require a fraction of the labeled data that a generic encoder would need to reach the same performance.
- Fast adaptation to new domains. The same data efficiency makes expansion to new markets practical. Adapting to a different country’s transaction formats only requires a small set of local examples because the model already understands the structure of financial transactions; it just needs to learn local merchants and conventions.
- A foundation for multiple tasks. Categorization is just one application. The same pretrained CLERT encoder can be fine-tuned for entirely different tasks like merchant name extraction with minimal additional data. Invest once in pretraining, then adapt cheaply to many downstream problems.
Results
UXC V2 delivers up to 13% higher accuracy on primary categories and 23% higher accuracy on detailed subcategories, with F1 gains of 10–30% on key categories like credit card payments, wages, and loan payments.
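Per-category F1 is the harmonic mean of precision and recall for one label against all others, which simplifies to 2*TP / (2*TP + FP + FN). A dependency-free sketch for comparing two models on the same holdout set (the function name is illustrative):

```python
from collections import Counter
from typing import Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> dict[str, float]:
    """One-vs-rest F1 for every label that appears in either sequence."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[t] += 1  # true label gets a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores
```

Running it on V1 and V2 predictions for the same holdout labels gives the per-category comparison directly.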
The V1-to-V2 upgrade required no changes to shim layers or downstream products. The model swap was fully contained within UXC: one fix, one deployment, universal impact.
Serving a 560M-parameter model at Plaid’s scale required care on the infrastructure side. FP16 quantization via ONNX Runtime produced no accuracy degradation versus FP32 and halved our GPU hosting footprint, keeping cost and latency within budget despite a ~5x parameter increase over V1.
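Two back-of-the-envelope checks make the serving trade-off concrete. The first is weight memory: at 4 bytes per FP32 parameter versus 2 bytes per FP16 parameter, a 560M-parameter model needs about 2.24 GB versus 1.12 GB for its weights alone. The second is a regression test that FP16 does not flip predictions. The NumPy sketch below runs a linear head in both precisions on the same inputs; it is a toy stand-in for comparing FP32 and FP16 ONNX Runtime outputs, not ONNX Runtime itself, and the function names are hypothetical.

```python
import numpy as np


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Raw weight storage in GB (10^9 bytes); ignores activations and runtime overhead."""
    return n_params * bytes_per_param / 1e9


def fp16_prediction_agreement(emb: np.ndarray, W: np.ndarray, b: np.ndarray) -> float:
    """Share of rows whose argmax label is unchanged when the head runs in FP16."""
    logits32 = emb.astype(np.float32) @ W.astype(np.float32) + b.astype(np.float32)
    logits16 = emb.astype(np.float16) @ W.astype(np.float16) + b.astype(np.float16)
    return float((logits32.argmax(axis=1) == logits16.argmax(axis=1)).mean())
```

Assuming the V1 BERT base encoder has roughly 110M parameters, `weight_memory_gb(110e6, 4)` gives about 0.44 GB, consistent with the ~5x increase quoted above.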
The real payoff: Shared vocabulary and faster iteration
The most immediate benefit of unification is the ability to improve the underlying model without requiring changes from downstream product teams. We proved this with the V1 → V2 upgrade: CLERT replaced the generic BERT encoder, every downstream taxonomy saw accuracy improvements, and no product team changed a line of code. One model upgrade, one deployment, universal impact.
But the technical gains were only part of the story. Aligning taxonomies across PFC, Credit, and Income forced teams to debate definitions, reconcile edge cases, and agree on what transaction labels should actually mean. The result was not just a shared model architecture but also a shared vocabulary.
What’s next
Merchant name parsing and normalization. We’re applying the same pretrained CLERT encoder to parse and normalize merchant names from raw transaction descriptions. CLERT’s understanding of transaction structure means this requires significantly less labeled data than a standalone Named Entity Recognition model would.
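One common way to frame merchant extraction is token tagging: a per-token head on the encoder labels each token as the beginning (B), inside (I), or outside (O) of a merchant span, and a small decoder turns those tags into a string. The post does not say CLERT uses this framing, so treat the BIO tag scheme, `decode_merchant`, `normalize_merchant`, and the alias table as hypothetical.

```python
from typing import Mapping, Optional, Sequence


def decode_merchant(tokens: Sequence[str], tags: Sequence[str]) -> Optional[str]:
    """Return the first B-/I-MERCHANT span predicted by a token-tagging head."""
    span: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-MERCHANT":
            if span:  # a second span starts; keep only the first
                break
            span = [token]
        elif tag == "I-MERCHANT" and span:
            span.append(token)
        elif span:  # span ended
            break
    return " ".join(span) if span else None


def normalize_merchant(raw: str, aliases: Mapping[str, str]) -> str:
    """Map a raw span to a canonical name via a (hypothetical) alias table."""
    return aliases.get(raw.upper(), raw.title())
```

For DIR DEP ACME CORP PAYR tagged as O O B I O, the decoder yields "ACME CORP", which the normalizer title-cases to "Acme Corp" unless the alias table has a canonical entry.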
Sequential foundation models. CLERT understands individual transactions. The next frontier is understanding sequences over time, like regular paychecks followed by rent payments, or rapid fund cycling that may indicate fraud. This behavioral understanding will power the next generation of Plaid’s risk and insights products.
Conclusion
UXC taught us that the hardest part of unifying systems isn’t the model — it’s getting teams to agree on what words mean. In the end, the organizational alignment turned out to be as durable as the technical architecture.
If you’re interested in working on problems like this, we’re hiring!
