Can LLMs improve product recommendations with re-ranking? Fascinating new paper from Meta on applying LLMs to recommendation systems. The domain discussed in the paper is content re-ranking, but I don't see why this couldn't be applied to ads.

Re-ranking takes a ranked list of candidate items (following retrieval and, sometimes, pre-ranking) and updates the ordering to better optimize some objective function (e.g., purchase). The authors describe how they use an LLM to re-rank candidates, with several innovations:

- Instead of building the LLM vocabulary from item embeddings, which would likely be too large to be useful, they decompose each item embedding into a sequence of "tokens" produced by a K-stage quantization process (with RQ-VAE). This process takes the item embedding at k=1, computes the residual vector from the nearest of C learned centroids for that stage (the "codebook"), and passes that residual to k=2, and so on through K. This produces a Semantic ID (SID) of length K.

- Because re-ranking must be done quickly, it requires a smaller LLM (the paper uses 8B parameters). So in training, they prompt a large model (Qwen-32B) with the user's history, the candidate items produced in ranking, the SIDs, and instructions to reason through its re-ranking process. That model produces a reasoning trace and a re-ranked list. The authors use rejection sampling to retain only the outputs (reasoning traces + rankings) in which the ground-truth item is ranked sufficiently high. The 8B student model is then fine-tuned via SFT on that distribution, learning P(reasoning trace + ranking | prompt).

- Finally, the authors fine-tune this model with RL on the outcome, using the ground truth's position in the list as the reward. This aligns the model's policy with the reward, enabling more thorough comparison across candidates (versus reasoning collapse).
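The K-stage quantization step above can be sketched in a few lines. This is a minimal illustration assuming the per-stage codebooks have already been learned (in the paper they come from an RQ-VAE training process); the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def semantic_id(embedding, codebooks):
    """Map an item embedding to a length-K Semantic ID via residual quantization.

    codebooks: list of K arrays, each of shape (C, d) — one learned codebook
    per stage. Returns the list of K nearest-centroid indices (the SID).
    """
    residual = np.asarray(embedding, dtype=float)
    sid = []
    for codebook in codebooks:                       # stages k = 1..K
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest centroid this stage
        sid.append(idx)
        residual = residual - codebook[idx]          # pass residual to stage k+1
    return sid
```

Each stage quantizes only what the previous stages failed to capture, so K small codebooks of size C cover a space of C^K combinations without a C^K-sized vocabulary.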
The authors make the point that LLMs can introduce additional product context, scalability, and "world knowledge" to RecSys, turning ranking into a structured reasoning task rather than a pure scoring task. The paper is quite dense but worth reading in full; link below.
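As a rough sketch of the RL step: the post above says only that the reward is derived from the ground-truth item's position in the re-ranked list, so the reciprocal-rank shaping below is one plausible instantiation, not the authors' exact formula:

```python
def rank_reward(ranked_sids, ground_truth_sid):
    """Reward the policy by where the ground-truth item lands in its re-ranked list.

    Reciprocal-rank shaping: 1.0 at position 1, decaying below, 0.0 if the
    ground-truth item is missing. An assumed stand-in for the paper's reward.
    """
    try:
        pos = ranked_sids.index(ground_truth_sid)   # 0-based position in the list
    except ValueError:
        return 0.0                                  # item absent: no reward
    return 1.0 / (pos + 1)
```

A dense, position-sensitive reward like this gives the policy gradient signal even when the ground-truth item is merely close to the top, rather than only when it is ranked first.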
Explainable AI Tools
-
Can your AI system explain why it rejected benefits claims? If not, it's not going into production.

Multiple central government departments now require new AI systems, especially those used in decision-making such as benefit fraud detection, to pass a formal AI Explainability gate before production rollout. This elevates technical risk and transparency from a compliance checklist to a crucial delivery step.

The change responds to both the EU AI Act's ripple effects and local GenAI pilot failures that exposed the risks of deploying black-box systems in public-facing services.

This is the governance maturity the sector needs. Building AI systems is straightforward. Explaining how they reach decisions is significantly harder.

The timing matters. Too many pilots have failed not because the technology didn't work, but because nobody could justify the decisions it made.

What this means for delivery:
→ Explainability must be designed in from day one
→ Technical teams need to document decision logic in plain language for non-technical colleagues
→ Procurement specifications must now include explainability requirements upfront

The departments getting this right treat explainability as a core architectural requirement, not a final compliance hurdle.

This gate will slow some projects initially. But it prevents the far costlier problem of deploying AI systems that make decisions nobody can defend.

How is your organisation building explainability into AI systems from the start?

#AI #PublicSector #AIGovernance
-
Exciting Research Alert: LLM-powered Agents Transforming Recommender Systems!

Just came across a fascinating survey paper on how Large Language Model (LLM)-powered agents are revolutionizing recommender systems. This comprehensive review by researchers from Tianjin University and Du Xiaoman Financial Technology identifies three key paradigms reshaping the field:

1. Recommender-oriented approaches - These leverage intelligent agents with enhanced planning, reasoning, and memory capabilities to generate strategic recommendations directly from user historical behaviors.

2. Interaction-oriented methods - Enabling natural language conversations and providing interpretable recommendations through human-like dialogues that explain the reasoning behind suggestions.

3. Simulation-oriented methods - Creating authentic replications of user behaviors through sophisticated simulation techniques that model realistic user responses to recommendations.

The paper introduces a unified architectural framework with four essential modules:
- Profile Module: Constructs dynamic user/item representations by analyzing behavioral patterns
- Memory Module: Manages historical interactions and contextual information for more informed decisions
- Planning Module: Designs multi-step action plans balancing immediate satisfaction with long-term engagement
- Action Module: Transforms decisions into concrete recommendations through systematic execution

What's particularly valuable is the comprehensive analysis of datasets (Amazon, MovieLens, Steam, etc.) and evaluation methodologies, ranging from standard metrics like NDCG@K to custom indicators for conversational efficiency.

The authors highlight promising future directions, including architectural optimization, evaluation framework refinement, and security enhancement for recommender systems.
This research demonstrates how LLM agents can understand complex user preferences, facilitate multi-turn conversations, and revolutionize user behavior simulation - addressing key limitations of traditional recommendation approaches.
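For readers unfamiliar with NDCG@K, one of the standard evaluation metrics mentioned above, here is a minimal reference implementation of the common log2-discount formulation (function name and inputs are illustrative):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@K for a ranked list of graded relevance scores.

    relevances: relevance of each item in ranked order (position 0 = top).
    Normalizes DCG@K by the DCG of the ideal (descending) ordering.
    """
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list scores 1.0; pushing relevant items down the list discounts their contribution logarithmically by position.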
-
AI ≠ GenAI. And in HR, that distinction isn’t academic. It’s how you manage risk.

Over the past few weeks at various HR events, I’ve noticed a worrying trend: people keep using “AI” like it’s a single, new category. It’s not. This isn’t semantics. It’s about whether you’re using something safe, or something you can’t control.

👉 Traditional AI (like machine learning models) is structured, explainable, and transparent.
👉 GenAI is a content engine, trained on massive public datasets, with unpredictable outputs and no audit trail.

And when it comes to decisions that affect people’s futures (hiring, promotion, redundancy), that difference is everything.

Just look at the hallucination rates:
📉 GPT-3.5: ~40%
📉 GPT-4: ~29%
📉 Google Bard: 91%
(Source: Stanford University & Princeton University)

Even with retrieval-augmented GenAI tools, some well-known legal-tech platforms still hallucinate at 17% ‼️ That’s not a bug. That’s a systemic use-case flaw. You simply can’t use GenAI outputs to drive high-risk decisions like those we face daily in HR.

At Clu, because we're compliance nerds, we took a different path. We built a small language model trained solely on first-party, verified skills data. No scraping. No black boxes. No risks we can’t explain. Just good old-fashioned AI.

That means every insight we deliver is:
✅ Fully traceable
✅ Auditable
✅ Fit for regulatory environments

And most importantly, it is safe to use in the one function that touches the most sensitive data in any company: your people.

We take huge pride in telling our clients that, whatever way legislation around AI goes in the coming months, they face no downstream risk or cost implication from working with us.

➡️ If your current providers can't give you that assurance,
➡️ If they can’t explain how their tech reaches a decision, or
➡️ If it’s powered by GenAI and you don’t know what that means,

You shouldn’t be using it. Full stop. It’s time to get a Clu.
#HRTech #ResponsibleAI #SkillsIntelligence #WorkforcePlanning #FutureOfWork #AICompliance #GetAClu ID: The lovely Cayelan (mixed-race male with short brown hair) stands at a busy London street crossing holding a cardboard sign above his head that reads “AI ≠ GenAI”. He’s wearing a grey hoodie with a small blue logo and stands in front of a classic red double-decker bus as pedestrians walk by.
-
🚨 LLMs Could Describe Complex Internal Processes that Drive Their Decisions

Determinism plus interpretability: that is the real foundation of trustworthy AI.

This new paper shows something remarkable: with the right fine-tuning, LLMs can accurately describe the internal weights and processes they use when making complex decisions. Not just outputs, but the actual quantitative preferences driving those outputs. Even more, this “self-interpretability” improves with training and generalizes beyond the tasks it was trained on.

Why it matters:
- It moves beyond black-box probing or neuron-level reverse engineering.
- It suggests that models have privileged access to their own internal processes, and can be trained to report them.
- It could open a new path for interpretability, control, and safety, complementing the determinism breakthroughs we saw with Thinking Machines.

Caveats:
- Explanations may still drift toward plausible narratives rather than ground truth.
- The cost of fine-tuning and the limits of generalization need more evidence.
- Self-reports remain a proxy, not direct transparency.

Still, this is a step forward. Deterministic outputs are essential, but equally essential is knowing why a model chose what it did. Self-interpretability could be the missing bridge.

You can read the full paper here: https://lnkd.in/dY94qq4H

#AI #ArtificialIntelligence #GenerativeAI #LLM #LargeLanguageModels #MachineLearning #DeepLearning #AIinBanking #AIinFinance #FinTech #BankingInnovation
-
When AI Bites Back: Lessons from Anthropic’s Legal Misstep

In a striking twist, Anthropic, the AI company championing safety and reliability, found itself entangled in a legal controversy involving its own AI model, Claude. The situation escalated when a court filing submitted by Anthropic's legal team included a citation generated by Claude that contained incorrect metadata, such as the wrong title and author names, despite linking to a valid source.

Anthropic's attorney acknowledged the error, attributing it to Claude's formatting process and a missed manual review. The attorney emphasized that the mistake was "embarrassing and unintentional," not a deliberate fabrication. This incident underscores the challenges of integrating AI tools into legal workflows, especially when the tools themselves are under scrutiny. Although this is an AI fabrication problem, it is also, at its core, a human-in-the-loop problem: in this case, no one verified the information generated by the AI.

Key Takeaways:
✔️ AI Hallucinations Are Real and Risky: Even advanced AI models like Claude can produce plausible but inaccurate information, known as "hallucinations." In legal contexts, such errors can have serious consequences.
✔️ Human Oversight Is Crucial: Relying solely on AI for tasks like citation formatting without thorough human review can lead to mistakes that undermine credibility.
✔️ Transparency Builds Trust: Openly acknowledging and correcting errors, as Anthropic did, is essential for maintaining trust in both legal proceedings and AI technologies.
✔️ Develop Robust Verification Processes: Implementing multiple levels of review can help catch AI-generated errors before they become public issues.
✔️ Understand AI's Limitations: Recognizing that AI tools have limitations and can make mistakes is vital for their effective and responsible use.

This case serves as a cautionary tale for all sectors integrating AI into their operations.
As the Arabic proverb goes, "He died by the poison he made." It's a reminder that the tools we create can have unintended consequences if not used wisely. #AI #LegalTech #Anthropic #Claude #ArtificialIntelligence #EthicsInAI #LegalInnovation
-
Managing explanations: how regulators can address #AI explainability.

Financial institutions increasingly use complex AI models for their core businesses, but these can be difficult to explain. Limited model explainability makes managing model risks challenging.

Global standard-setting bodies have issued – mostly high-level – model risk management (MRM) requirements. However, only a few national financial authorities have issued specific guidance, and they tend to focus on models used for regulatory purposes. Many of these existing guidelines may not have been developed with advanced AI models in mind and do not explicitly mention the concept of model explainability. Rather, the concept is implicit in the provisions relating to governance, model development, documentation, validation, deployment, monitoring and independent review. It would be challenging for complex AI models to comply with these provisions, and the use of third-party AI models would exacerbate these challenges.

As financial institutions expand their use of AI models into their critical business areas, it is imperative that financial authorities seek to foster sound MRM practices that are relevant in the context of AI. Ultimately, there may be a need to recognise trade-offs between explainability and model performance, so long as risks are properly assessed and effectively managed. Allowing the use of complex AI models with limited explainability but superior performance could enable financial institutions to better manage risks and enhance client experiences, provided adequate safeguards are introduced.

Source: Bank for International Settlements (BIS)

For regulatory capital use cases, complex AI models may be restricted to certain risk categories and exposures, or subject to output floors. Regulators must also invest in upskilling staff to evaluate AI models effectively, ensuring that financial institutions can harness AI's potential without compromising regulatory objectives.
-
China’s recent court ruling on AI “hallucinations” is quietly significant.

In a 2025 Hangzhou Internet Court case, the first publicly reported civil decision on AI hallucination liability, a Chinese court declined to impose automatic responsibility on an AI provider for false outputs. Instead, it applied familiar tort principles: fault, causation, attribution, and demonstrable harm. No breach of duty, no attributable fault, and no proven loss meant no liability.

Crucially, the court refused to treat the model’s statements as legally binding expressions of the provider’s will. Generative AI was framed as a probabilistic service, not an agent capable of intent, and not a defective product subject to strict liability. At the same time, providers remain subject to strict duties in relation to illegal and harmful content. This is not deregulation, but calibrated restraint: ordinary inaccuracies are treated differently from governance failures.

The ruling also suggests that “hallucination harms” are not uniform. Harmless errors, economic reliance losses, reputational damage, and safety-related harms raise different liability questions, and may justify different standards of care.

Contrast this with Europe’s precautionary model under the AI Act and revised Product Liability Directive, which pushes towards heavier ex ante compliance. The proposed AI Liability Directive has been withdrawn, but its risk-shifting logic remains influential. The United States remains more fragmented, relying on existing doctrines and regulatory guidance, for now. Singapore focuses on governance frameworks and shared responsibility rather than punitive exposure.

What is emerging is not a single global AI liability regime, but three broad approaches: China’s fault-based pragmatism, Europe’s precautionary regulation, and the US-Singapore model of adaptive governance.

The real question is not whether AI will make mistakes. It will.
The question is how societies choose to allocate responsibility when it does. As AI becomes embedded infrastructure, that choice will shape innovation and institutional trust alike.
-
What happens when a judge includes fake, AI-hallucinated cases in an actual opinion?

Here’s an interesting AI hallucination issue that legal tech companies may soon need to confront. In Williams v. Capital One Bank, N.A., 2025 U.S. Dist. LEXIS 49256, U.S. District Judge Rudolph Contreras included AI-generated fake cases in his opinion, not as legal authority, but as cautionary examples of what happens when litigants rely on unverified AI outputs.

Here's the excerpt:

"Courts have recently seen increasing reliance on artificial intelligence in legal proceedings, leading to the use of nonexistent citations in court documents. . . . For example, "Pettway v. American Savings & Loan Association, 197 F. Supp. 489 (N.D. Ala. 1961)" is not a case that exists. Id. at 4. While Williams v. Equifax Information Services, LLC is a case that exists, "560 F. Supp. 2d 903 (E.D. Va. 2008)" is the incorrect citation. . . ."

Here’s the crux: by appearing in a published federal opinion, those hallucinated cases are now part of the official case opinion, and included in legal research databases (albeit not hyperlinked or independently accessible via traditional search). But if a lawyer uses Westlaw CoCounsel, Lexis Protege, or another AI-powered research assistant, is it possible those fake citations could be surfaced as legitimate due to their inclusion in the opinion?

Lexis seems to have taken a first step by adding this disclaimer: “Notice: This decision contains references to invalid citations in the original text of the opinion. They are relevant to the decision and therefore have not been editorially corrected. Linking has been removed from those citations.”

But when using RAG-powered legal research tools, how will they prevent these ghost citations from being misinterpreted as legitimate?