What DocumentDB Means for Open Source
There are at least three reasons why the open source community is paying attention to DocumentDB. The first is that it combines the strengths of two popular databases: MongoDB (DocumentDB is essentially an open source, MongoDB-compatible document database) and PostgreSQL. A PostgreSQL extension makes MongoDB’s document functionality available to Postgres; a gateway translates MongoDB’s API to PostgreSQL’s API.
Secondly, the schemaless document store is completely free and available under the MIT license. The database uses the core of Microsoft Azure Cosmos DB for MongoDB, which has been deployed in numerous production settings over the years. Microsoft donated DocumentDB to the Linux Foundation in August. A DocumentDB Kubernetes Operator, enabling the solution to run in the cloud, at the edge, or on premises, was announced at KubeCon + CloudNativeCon NA in November.
Thirdly, DocumentDB reinforces a number of vital use cases for generative models, intelligent agents and multiagent instances. These applications entail using the database for session history for agents, conversational history for chatbots and semantic caching for vector stores.
According to Karthik Ranganathan, CEO of Yugabyte, which is on the steering committee for the DocumentDB project, these and other uses of the document store benefit greatly from its schema-free design. “Mongo gives this core database functionality, what the engine can do,” Ranganathan said. “And then there’s these languages on top that give the developer the flexibility to model these things.”
Free From Schema Restrictions
The coupling of MongoDB’s technology with PostgreSQL’s is noteworthy because it effectively combines the relational capabilities of the latter, which Ranganathan termed “semi-schematic,” with the lack of schema concerns characterizing the former. The freedom to support the aforementioned agent-based and generative model use cases without schema limitations is imperative for maximizing the value of these applications. With DocumentDB, users can avail themselves of this advantage at the foundational database layer.
“As everything is going agentic, it’s important to give this capability in the places where you’d be building those applications, as opposed to having a separate way of doing it,” Ranganathan said. For example, an engineer building a user profile for an application benefits from the lack of schema: fields for a mobile number, office number, fax number and anything else that comes to mind while coding can be added freely. “You don’t want a strict schema for that,” Ranganathan said. “You want to just build those fields on the fly.”
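The user profile example can be sketched in a few lines of Python. This is only an illustration: plain dictionaries stand in for documents in a collection, nothing here talks to DocumentDB, and the field names and the `add_contact_field` helper are hypothetical.

```python
# Plain dicts stand in for documents; no database is involved.
# Field names and the helper below are hypothetical illustrations.

def add_contact_field(profile: dict, name: str, value: str) -> dict:
    """Return a copy of the profile with one more contact field."""
    contacts = {**profile.get("contacts", {}), name: value}
    return {**profile, "contacts": contacts}

profiles = [
    {"user": "ana", "contacts": {"mobile": "555-0100"}},
    {"user": "raj"},  # no contact fields yet; still a valid document
]

# Fields are added as they come up, with no schema change or migration step.
profiles[0] = add_contact_field(profiles[0], "office", "555-0101")
profiles[0] = add_contact_field(profiles[0], "fax", "555-0102")
profiles[1] = add_contact_field(profiles[1], "mobile", "555-0103")

for profile in profiles:
    print(profile)
```

The two documents end up with different sets of fields, which is the situation a strict relational schema would have to anticipate in advance.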
Multiagent Deployments
The lack of schema and general adaptability of the document format are particularly useful when agents are collaborating. For these applications, DocumentDB can provide session history for the various actions and interactions taking place between agents and resources, and among the agents themselves.
“It’s super, super important for any agent, or any sequence of operations that you work with an agent to accomplish, for the agent to remember what it did,” Ranganathan said. Each of the operations agents perform individually or collectively can be stored in DocumentDB to serve as the memory for agents.
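As a rough sketch of what agent memory might look like, the Python below keeps a session history as a list of step documents and uses it to decide where to resume. The `SessionHistory` class, the `next_action` helper and all field names are assumptions made for illustration; an in-memory list stands in for the database.

```python
from dataclasses import dataclass, field

@dataclass
class SessionHistory:
    """In-memory stand-in for a per-session collection of step documents."""
    session_id: str
    steps: list = field(default_factory=list)

    def record(self, agent: str, action: str, result: str) -> None:
        # Each step is a free-form document; more fields could be added later.
        self.steps.append({
            "seq": len(self.steps) + 1,
            "agent": agent,
            "action": action,
            "result": result,
        })

    def completed_actions(self) -> set:
        return {s["action"] for s in self.steps if s["result"] == "ok"}

def next_action(plan: list, history: SessionHistory):
    """Return the first planned action that has not yet succeeded."""
    done = history.completed_actions()
    for action in plan:
        if action not in done:
            return action
    return None

plan = ["fetch_report", "summarize", "send_email"]
history = SessionHistory("session-1")
history.record("researcher", "fetch_report", "ok")
history.record("writer", "summarize", "failed")

resume_at = next_action(plan, history)
print(resume_at)
```

Because the history records that fetching succeeded and summarizing did not, the agents pick up at the summarizing step instead of starting over.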
Without such a framework, agents would be constantly restarting their tasks. According to German Eichberger, principal software engineering manager at Microsoft, DocumentDB’s viability for this use case extends beyond memory. “As things progress, we’ll have multiple agents working together on transactions,” Eichberger said. “And they will not agree on something, so they’ll have rollbacks. We feel that doing this in a document will be better because they can all work on the same document and when they are happy, commit it.” This is similar to the way people collaborate in Google Docs.
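The pattern Eichberger describes, in which several agents edit one document, roll back on disagreement and commit once everyone is satisfied, can be illustrated with a toy Python class. This is not how DocumentDB implements transactions; `SharedDocument` and its methods are hypothetical names, and the approval rule (any new proposal resets approvals) is an assumption chosen to keep the sketch small.

```python
import copy

class SharedDocument:
    """Toy shared draft: agents edit a working copy and commit by consensus."""

    def __init__(self, committed: dict):
        self.committed = committed
        self.draft = copy.deepcopy(committed)
        self.approvals = set()

    def propose(self, agent: str, changes: dict) -> None:
        # A new proposal invalidates earlier approvals; the proposer approves it.
        self.draft.update(changes)
        self.approvals = {agent}

    def approve(self, agent: str) -> None:
        self.approvals.add(agent)

    def rollback(self) -> None:
        # Discard the draft and return to the last committed state.
        self.draft = copy.deepcopy(self.committed)
        self.approvals.clear()

    def commit(self, agents: list) -> bool:
        # Commit only when every participating agent has approved the draft.
        if set(agents) <= self.approvals:
            self.committed = copy.deepcopy(self.draft)
            return True
        return False

agents = ["buyer", "seller"]
doc = SharedDocument({"item": "widget", "price": 100})

doc.propose("buyer", {"price": 90})
first_try = doc.commit(agents)   # seller has not approved
doc.rollback()                   # the agents disagree, so the draft is discarded

doc.propose("seller", {"price": 95})
doc.approve("buyer")
second_try = doc.commit(agents)  # both agents are satisfied
print(first_try, second_try, doc.committed)
```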
Chatbots and Semantic Caching
There are multiple ways in which DocumentDB underpins other applications of generative models, including Retrieval-Augmented Generation (RAG), vector database deployments and chatbots. For these use cases, the document store can also supply a centralized form of memory for bots conversing with employees or customers. That way, developers of these systems can avoid the situation Ranganathan described: “If you forget everything we just talked about and just answer the next question, it’s completely out of context and meaningless.”
DocumentDB can also provide a semantic caching layer that preserves the underlying meaning of jargon, pronouns and other facets of episodic memory so intelligent bots can quickly retrieve this information for timelier, more savvy responses. With DocumentDB, such semantic understanding and memory capabilities are baked into the primary resource engineers rely on: the database.
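To make the idea of a semantic cache concrete, here is a deliberately simplified Python sketch. It looks up a cached answer by similarity to earlier questions instead of by exact match. The word-count `embed` function is a crude stand-in for a learned embedding model, the linear scan is a stand-in for a vector index, and the `SemanticCache` class and its 0.8 threshold are assumptions for illustration only.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude stand-in for an embedding: a bag of lowercase words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, answer) pairs

    def put(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))

    def get(self, question: str):
        """Return the answer of the most similar cached question, if close enough."""
        query = embed(question)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("how do i reset my password", "Use the reset link on the sign-in page.")

hit = cache.get("how do i reset my password please")   # near-duplicate wording
miss = cache.get("what is the refund policy")          # unrelated question
print(hit, miss)
```

A production system would replace both stand-ins, but the control flow is the same: embed the question, find the nearest stored entry, and reuse its answer only when the similarity clears a threshold.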
“The history of what we talked about, that becomes extremely important,” Ranganathan said. “There’s different ways to solve it, but it must be in the context of the developer ecosystem. So, rather than give one way to solve it and ask everyone to integrate it that way, just give the way the person expects to build the AI application.”
What Developers Expect
With DocumentDB, developers get the overall flexibility to build applications the way they’d like. The document store is available through PostgreSQL, which is highly extensible and supports an array of workloads, including those involving vector databases and other frameworks for implementing generative models.
Moreover, they’re not constrained by any schema limitations, which spurs creativity and a developer-centric means of building applications. Lastly, DocumentDB provides a reliable mechanism for agents to collaborate with each other, retain a history of the actions taken to perform a task and come to a consensus before completing it.
The fact that DocumentDB is free and at the disposal of the open source community for these applications of intelligent agents and more could further the scope of these deployments. “With AI, the growth is going to be exponential, but you’re not going to get there in one hop,” Ranganathan said. “You’ll get there in a series of rapid iterations. The mathematical way to represent it, it’s like 1.1 to the power of 365. This is a 10% improvement every day, which is like 10 raised to the 17th power, a huge number.”
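The compounding Ranganathan describes is easy to check. The short Python snippet below computes 1.1 to the power of 365 and its base-10 exponent; the variable names are ours. The result is roughly 1.3 × 10^15, somewhat below the figure in the quote, though the point about compounding stands.

```python
import math

daily_gain = 1.10   # a 10% improvement each day
days = 365

growth = daily_gain ** days
exponent = math.log10(growth)   # power of ten that growth corresponds to

print(f"growth after {days} days: {growth:.2e}")
print(f"equivalent power of ten: {exponent:.2f}")
```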
DocumentDB may not be solely responsible for such advancements in statistical AI, but it may have contributed to the day’s improvement in this technology.