Inspiration
Large language models have revolutionized a number of key industries, but a few key drawbacks still limit their market penetration. First, while these models are knowledgeable about general topics, they do not adapt well to specific sectors without deliberate fine-tuning. Second, their tendency to hallucinate and output incorrect (yet convincing) information limits the scope of this technology. Finally, because many of these models are so large, running them usually means sending data to cloud servers, often operated by a third party. Our group, Crossentropy, was curious whether there was a better way to get value out of this technology in a few key sectors, mainly insurance and finance.
In these industries, the information these models produce must be precise, fast, and actionable. These models also often need to run on premises to address security concerns. We've come up with a way for these businesses to quickly and easily parse through unstructured corpora of data.
What it does
Our platform offers two key user modes. First, users can search with natural language through any of their company documents. A reasoning AI agent then performs the search, analyzes the results, and continues searching until it finds the most relevant information. Second, the system proactively monitors saved topics, sending real-time alerts when new, relevant documents are added, empowering users with immediate, actionable insights.
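As a rough illustration of the second mode, here is a minimal sketch of how new documents could be matched against saved topics; the sentence-transformers model name and the similarity threshold are illustrative assumptions, not necessarily our production choices.

```python
# Minimal sketch of the topic-alert mode: embed each saved topic, score every
# newly ingested document against those topics, and raise an alert when the
# cosine similarity clears a threshold. Model name and threshold are assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
ALERT_THRESHOLD = 0.45  # assumed cutoff, tuned per deployment

def topics_to_alert(doc_text: str, saved_topics: list[str]) -> list[str]:
    """Return the saved topics that a newly added document should trigger alerts for."""
    doc_vec = embedder.encode(doc_text, convert_to_tensor=True)
    topic_vecs = embedder.encode(saved_topics, convert_to_tensor=True)
    scores = util.cos_sim(doc_vec, topic_vecs)[0]
    return [topic for topic, score in zip(saved_topics, scores)
            if float(score) >= ALERT_THRESHOLD]

# Example: alert on "storm damage claim" when a new adjuster email is ingested.
alerts = topics_to_alert("Adjuster notes: roof damage reported after the hurricane...",
                         ["storm damage claim", "loan default"])
```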
Unlike traditional document searches, which often miss the context behind queries, our solution combines the strengths of embedding search and language model reasoning. By understanding the deeper semantic meaning of both the query and the data, our system delivers results that are far more accurate and intelligent than simply using an embedding model or reasoning model alone.
Additionally, our approach of fine-tuning a reasoning model to call an embedding model as a tool (rather than relying on prompt engineering alone) lets a single AI agent do the work of multiple agents. That means more efficient inference, so the model can be run locally.
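To make that single-agent tool loop concrete, here is a hedged sketch of the host-side dispatch we have in mind; the JSON action schema and helper names are our own simplifications, not the exact production interface.

```python
# Hedged sketch of dispatching the unified agent's actions. The agent emits
# reasoning tokens followed by one JSON action; schema and names are assumptions.
import json

def parse_action(raw_output: str) -> dict:
    """Drop the R1-style reasoning prefix and parse the trailing JSON action."""
    payload = raw_output.split("</think>")[-1]
    return json.loads(payload[payload.index("{"): payload.rindex("}") + 1])

def dispatch(action: dict, embedding_search) -> dict | None:
    """Run the requested tool, or return None when the agent gives its final answer."""
    if action["tool"] == "embedding_search":
        hits = embedding_search(action["arguments"]["query"],
                                top_k=action["arguments"].get("top_k", 5))
        return {"role": "tool", "content": json.dumps(hits)}
    if action["tool"] == "final_answer":
        return None
    raise ValueError(f"Unknown tool: {action['tool']}")
```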
Use case for insurance: Imagine that Assurant takes in hundreds of documents per week (emails, messages, insurance filings, etc.) for each customer. Say a major storm just hit Florida, where customer-A lives. If an employee wants to know whether this customer was impacted, they could use Crossentropy’s platform and import all of this customer’s documents. Instead of manually searching or trying to guess the correct keywords, they can simply search for related information like “storm damage claim.” Our platform will find all relevant documents, such as insurance filings or any conversations customer-A had with support, saving staff hours and allowing Assurant to address this customer’s needs faster.
Use case for finance: Capital One is mainly a consumer-facing bank. When deciding whom to give loans or credit cards to, the bank must parse through thousands of documents and data points generated for each customer. Using our platform, Capital One can search for any information that might help make that decision: “did this customer ever default on a loan?” Any relevant documents will be returned. In addition, risk changes over time for each customer. As new documents are processed, Capital One can be notified whenever new information is relevant to a topic they care about (these important topics can be set by the user as tags).
How we built it
We fine-tuned DeepSeek's R1 8B model for enhanced tool use. By using frontier methods like GRPO (Group Relative Policy Optimization) instead of SFT or DPO, we were able to preserve the reasoning capabilities of our model. We curated a dataset of search queries, ideal responses, and a diverse set of tools the LLM could call, then trained the model on it. From there, the model learned to use its reasoning abilities to interpret search results and figure out the optimal next action to reach the final answer! The best part: training took only ~3 hours on a Google Colab A100 and just a few hundred high-quality examples before the model could answer our dataset's questions with 90% accuracy. Had we used a technique like SFT, this training would have stopped the model from outputting the thinking tokens that lead to its final answer, and it would have required many more examples.
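For reference, here is a minimal sketch of what such a GRPO fine-tune could look like with Hugging Face TRL's GRPOTrainer; the checkpoint name, dataset fields, reward function, and hyperparameters below are illustrative assumptions rather than our exact training script.

```python
# Hedged GRPO fine-tuning sketch using TRL. The reward function, dataset columns
# ("prompt" plus an assumed "ideal_tool_call" field), and hyperparameters are
# simplified stand-ins for what we actually used.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def tool_call_reward(completions, ideal_tool_call, **kwargs):
    """Reward 1.0 when a completion contains the expected tool call, else 0.0."""
    return [1.0 if expected in completion else 0.0
            for completion, expected in zip(completions, ideal_tool_call)]

dataset = load_dataset("json", data_files="tool_use_queries.jsonl", split="train")

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",  # the distilled R1 8B checkpoint
    reward_funcs=tool_call_reward,
    args=GRPOConfig(
        output_dir="r1-tool-use",
        num_generations=8,               # group size for relative advantages
        max_completion_length=1024,
        per_device_train_batch_size=8,
    ),
    train_dataset=dataset,
)
trainer.train()
```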
Being able to train our reasoner lets us take the agents you would traditionally need for chain-of-thought processing, tool use, and reasoning, and combine them into one unified R1-based search agent. On this foundation, we can further train the model to generate code, file claims, or, as in this case, perform searches. By transforming a corpus of data into a vector database and having the reasoning agent drive an embedding model, our system issues iteratively better searches that retrieve the most accurate results for the user. After the user submits a query, the unified agent uses the embedding model to search the vector database, evaluates the returned results, and repeats the process until the results match what the user was looking for.
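Below is a sketch of that loop under a few assumptions: a FAISS index over sentence-transformers chunk embeddings, and an `agent_step()` callable that runs the fine-tuned R1 agent and returns its parsed JSON action (as in the dispatch sketch above). Names and defaults are illustrative.

```python
# Sketch of the iterative retrieval loop: the agent refines its embedding-search
# query each round until it decides the results answer the user's question.
import json
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def build_index(chunks: list[str]) -> faiss.IndexFlatIP:
    """Turn a pre-chunked document corpus into a cosine-similarity vector index."""
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vectors.shape[1])  # inner product on unit vectors = cosine
    index.add(vectors)
    return index

def agentic_search(user_query: str, agent_step, index, chunks, max_rounds: int = 5) -> str:
    history = [{"role": "user", "content": user_query}]
    for _ in range(max_rounds):
        action = agent_step(history)  # reasoning tokens + parsed JSON action
        if action["tool"] == "final_answer":
            return action["arguments"]["answer"]
        query_vec = embedder.encode([action["arguments"]["query"]],
                                    normalize_embeddings=True)
        _, ids = index.search(query_vec, action["arguments"].get("top_k", 5))
        history.append({"role": "tool",
                        "content": json.dumps([chunks[i] for i in ids[0]])})
    return "No confident answer found within the search budget."
```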
This hybrid method stands apart from stand-alone embedding search or LLM use, which can either lack context (and sometimes depth of search) or risk generating misleading outputs. It's also more reliable and compute-efficient than traditional multi-agent setups, because the unified agent follows JSON output formats more faithfully, reasons better, and requires loading only one model. Our iterative process ensures that the final output is both contextually rich and highly reliable, since the results come directly from the data provided. Furthermore, pairing an agentic AI with an embedding model allows the system to scale with the vast amounts of data in insurance and finance.
Challenges we ran into
Managing our time with the variety of hackathon events, networking opportunities, and sponsor engagements was our biggest challenge. This made us choose what to prioritize and forced us to work efficiently.
Accomplishments that we're proud of
This project took on a very ambitious scope for a hackathon. It is unusual for a team to try to fine-tune such a large model in such a short timeframe, which meant a big risk for us. Additionally, tool use is a very new area of focus in AI research, so implementing it correctly was a huge milestone for us.
What we learned
We did not anticipate that reasoning would work so well with embedding search!
What's next for Crossentropy
Some team members are already integrating these innovations into their startup, where this experience will drive the development of even better search models, further expanding the technology's impact in critical sectors.