Agentset

Software Development

Build Frontier RAG Apps

About us

The RAG platform built for scale, with built-in citations, deep research, 22+ supported file formats, partitions, an MCP server, and more.

Website
https://agentset.ai
Industry
Software Development
Company size
2-10 employees
Type
Privately Held
Founded
2025

Updates

  • ZeroEntropy recently released zembed-1, and we added it to our embedding leaderboard. Key takeaways:
    – Now #1 on our leaderboard
    – Wins 55–80% of head-to-head matchups across 16 models (OpenAI, Voyage, Cohere, etc.)
    – Strongest on general document retrieval and multilingual queries
    Impressive work by the ZeroEntropy (YC W25) team! Read the full breakdown in the post below.

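A head-to-head win rate like the one above can be computed by scoring both models per query and counting which one comes out ahead. A minimal sketch, assuming each model has already produced a per-query retrieval-quality score (e.g. NDCG); the function name and inputs are hypothetical, not Agentset's actual evaluation code:

```python
def win_rate(scores_a, scores_b):
    """Fraction of queries where model A strictly beats model B (ties excluded).

    scores_a, scores_b: per-query quality scores for the two models,
    aligned by query index.
    """
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(b > a for a, b in zip(scores_a, scores_b))
    decided = wins + losses
    return wins / decided if decided else 0.0

# Toy usage: model A wins 2 of 3 decided queries.
rate = win_rate([0.90, 0.80, 0.70], [0.85, 0.90, 0.60])
```

Excluding ties keeps the metric symmetric: a 55–80% win rate then reads directly as "A beat B on that share of queries where they differed."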
  • We looked into how to detect hallucinations in RAG. We tested LLM judges, atomic claim verification, and encoder-based NLI. Even with correct retrieval, models can still produce confident but unsupported answers. Each approach trades off accuracy, latency, and cost. Encoder-based NLI turned out to be the most practical option for production, with some important caveats. Full write-up in the link below:

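The encoder-based NLI approach above boils down to: extract atomic claims from the answer, score each (retrieved context, claim) pair with an NLI model, and flag claims the context does not entail. A minimal sketch of the aggregation step, assuming per-claim entailment probabilities have already been produced by some NLI model; the function, field names, and threshold are hypothetical illustrations, not Agentset's implementation:

```python
def flag_unsupported(claims, nli_scores, entail_threshold=0.5):
    """Return claims whose entailment probability falls below the threshold.

    claims: atomic claim strings extracted from the model's answer.
    nli_scores: one dict per claim, e.g.
        {"entailment": 0.9, "neutral": 0.08, "contradiction": 0.02},
    scored against the retrieved context by an external NLI model.
    """
    unsupported = []
    for claim, scores in zip(claims, nli_scores):
        if scores["entailment"] < entail_threshold:
            unsupported.append((claim, scores))
    return unsupported

# Toy usage: the second claim is confident but not supported by the context.
flagged = flag_unsupported(
    ["Paris is the capital of France.", "Paris has 10 million residents."],
    [
        {"entailment": 0.95, "neutral": 0.04, "contradiction": 0.01},
        {"entailment": 0.20, "neutral": 0.70, "contradiction": 0.10},
    ],
)
```

The latency/cost advantage comes from the scorer being a small encoder run once per claim, rather than an LLM judge generating a verdict; the caveats mostly live in claim extraction and threshold tuning.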
  • A lot of RAG issues we keep seeing (hallucinations, weak citations, unclear answers) often come down to prompt constraints. So we collected a set of RAG Prompt Templates that reflect common patterns for working through these problems. Templates are copy-pasteable, with upvote/downvote so what works surfaces over time. You can also contribute prompts that have worked well for you. Link in the comments 👇

  • We curated a list of awesome rerankers. It includes reranking models, libraries, benchmarks, and integrations (plus some more useful resources). When we started working on reranking (and honestly, still today) the information was scattered across docs, papers, and blog posts. To save others some time, we pulled what we found into one place. Check out the link to the list below 🔗 👇

  • We benchmarked Cohere’s new Rerank 4 (Pro + Fast) against v3.5 and our top rerankers. Key takeaways:
    – Pro jumped from the lower half of the stack to #2 overall (right behind zerank-2)
    – It’s especially strong on business reports and finance Q&A
    – Pro stays under 1s but is ~2× slower than zerank-2
    – Pro improved everywhere; Fast regressed on argumentation and web search
    Read the full breakdown in the post below.
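Latency comparisons like "under 1s but ~2× slower" are typically made by timing each reranker over the same query set and comparing a robust summary statistic such as the median. A minimal sketch using only the standard library; `rerank_fn` is a hypothetical stand-in for whatever reranker call is being measured, not a real client API:

```python
import time
import statistics

def median_latency(rerank_fn, queries, docs):
    """Median wall-clock latency (seconds) of rerank_fn over a query set.

    rerank_fn: callable taking (query, docs); its return value is ignored here.
    """
    latencies = []
    for query in queries:
        start = time.perf_counter()
        rerank_fn(query, docs)
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)

# Toy usage with a stub reranker that just sorts the candidate list.
lat = median_latency(
    lambda q, d: sorted(d),
    ["query one", "query two", "query three"],
    ["doc a", "doc b", "doc c"],
)
```

Using the median rather than the mean keeps a few slow outlier calls (cold starts, network hiccups) from distorting the comparison.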

  • We plugged the newly released GPT-5.2 into our LLM RAG leaderboard next to GPT-5.1, Claude, Grok, Gemini, GLM, and the strongest open-source models. Here’s what stood out:
    • ~70% fewer tokens per answer
    • #1 on scientific claim verification
    • Much more stable performance across workloads
    Full write-up and plots in the comments.

