Daniel Kang · May 27 · Accelerating Analytical Joins on Unstructured Data · Semantic joins over unstructured data have become essential to modern data analytics. Today, e-commerce business analysts track…
Daniel Kang · May 19 · SODIUM: From Open Web Data to Queryable Databases · In research workflows using public data, answering a single analytical question requires collecting and organizing data from many different…
Daniel Kang · Feb 24 · Launching the CVE-Bench Leaderboard: A Public Arena of AI for Cybersecurity · Last year, we introduced CVE-Bench, a rigorous benchmark with real-world web vulnerabilities to evaluate the cyberoffensive capabilities of…
Daniel Kang · Dec 16, 2025 · Claude 4.5 Opus Solves CORE-Bench — But Not REPRO-Bench · In our ACL 2025 paper, we introduced REPRO-Bench (GitHub), a benchmark designed to evaluate whether AI agents can accurately assess the…
Daniel Kang · Nov 10, 2025 · SafeSearch: Teaching LLM Search Agents to Be Both Smart and Safe · LLMs are rapidly expanding their built-in knowledge from training. However, they still suffer from hallucinations and lack access to…
Daniel Kang · Nov 5, 2025 · When Your Home Robot Turns Against You: BEATing Vision-Language Agents with Visual Backdoors · Household humanoid robots promise to assist everyone in daily life, with several exciting demos released recently (NEO, Figure 03, Tesla…
Daniel Kang · Nov 3, 2025 · DRAMA: Enabling AI Agents to Collect Data to Support Data Science Workflows · Data science workflows generally include two major phases: data retrieval and data analysis. In practice, analysts (especially in the…
Daniel Kang · Oct 30, 2025 · CVE-Bench v2.0: Making Evaluation More Rigorous with ABC · This is the third post in the Agentic Benchmark Checklist (ABC) blog series. Written by Yuxuan Zhu, Antony Kellermann, and Daniel Kang.
Daniel Kang · Oct 5, 2025 · No, RL does not get “1 bit of information” per rollout · Dwarkesh is one of the biggest podcasters in the AI space. He’s recently (and repeatedly) made the claim that reinforcement learning gives…
Daniel Kang · Aug 11, 2025 · Human Data is (Probably) More Expensive Than Compute for Training Frontier LLMs · This blog post is written by Yuxuan Zhu and Daniel Kang.