DZone
DZone Spotlight

Monday, March 30
The Self-Healing Endpoint: Why Automation Alone No Longer Cuts It


By Anton Lucanus
Most organizations have poured heavy capital into endpoint automation. That investment has yielded partial results at best. IT teams frequently find themselves trapped maintaining the very scripts designed to save them time. Recent data from the Automox 2026 State of Endpoint Management report reveals that only 6% of organizations consider themselves fully automated. Meanwhile, 57% operate as partially automated using custom workflows. This setup still depends too heavily on people stepping in and undermines the whole point of automation in the first place.

That's why the industry is moving toward autonomous endpoint management systems that can enforce policies, catch configuration drift, and fix issues on their own without someone having to manually kick things off.

The Partial Automation Trap

Current automation efforts fall short of enterprise requirements. Traditional endpoint tools fail to match the pace of hybrid work and escalating compliance demands. When environments change, hardcoded scripts break. When key staff resign, organizations lose the undocumented knowledge required to maintain those workflows. Rigid systems cannot adapt to novel conditions. Teams still rely heavily on scripts and manual work, with patching and visibility tools seen as the biggest automation wins.

Data highlights this maturity plateau. While 50% of IT teams automate OS patching in some capacity, this targeted approach ignores visibility gaps across diverse platforms. The Automox report shows 57% of teams rely heavily on custom scripts for recurring tasks. These act as helpful stopgaps but struggle to scale. Another 37% execute manual procedures based on written documentation. Only 23% have fully automated their recurring software deployments, leaving the vast majority exposed.

Partial automation is merely a temporary plateau. It reduces manual effort but proves insufficient for closing exposure windows across distributed IT infrastructures.
The Trust Barrier to Scaling Automation

Even when organizations recognize the necessity of scaling their capabilities, deep-seated hesitation stalls progress. The barrier is not a failure to understand the value. The issue is risk amplification.

"It's one thing to be wrong. It's a whole other thing to be wrong at scale," notes Jason Kikta, Chief Technology Officer at Automox. "If I'm wrong on an individual computer, that's a problem. If I'm wrong on the entire network, I might get fired. If I'm wrong for a day on a backup, that's not good. If I'm wrong for three months, that might end the company. And so that's where people's fears take them."

This fear is entirely rational. Automation applied across thousands of assets amplifies both operational benefits and potential errors. The Automox report quantifies these concerns regarding autonomous adoption. Data privacy and security implications worry 46% of IT leaders. The risk of incorrect or unauthorized system changes holds back another 44%. Decision-makers also cite limited trust in AI-driven recommendations (36%), an inability to clearly see what automated systems are doing in real time (36%), and reliance on algorithmic decisions that often feel like a black box (34%).

Organizations need to address these issues. They must show their IT teams that automated changes will remain controlled, transparent, and never allowed to run unchecked.

Guardrails Enable Scale

Organizations overcome adoption hesitation by implementing strict operational boundaries. Guardrails act as the primary enabler for scale — not an obstacle to speed. Industry best practices from Datto emphasize testing patches before deployment, using phased rollouts, and maintaining rollback capabilities.
With these mechanisms, organizations can expand automation confidently because they know they can intervene, verify, and recover immediately. IT leaders demand these safeguards before ceding control. Automox's data shows that requested protections include automatic rollback (43%), the ability to pause or override anytime (42%), role-based access controls and audit logs (42%), and approval workflows for critical assets (41%). Control over when agent updates apply is highly important to 74% of respondents, while another 46% expressed strong concern regarding unauthorized device actions.

The operating philosophy shifts to a pragmatic baseline: trust but verify. Even when automation works perfectly, you check in on it.

What Autonomous Endpoint Management Actually Delivers

Autonomous endpoint management (AEM) represents the convergence of visibility, policy enforcement, and adaptive response. Rather than replacing human judgment, it removes technicians from repetitive decision loops where raw speed dictates security outcomes. AEM platforms deliver continuous monitoring, AI-assisted insight, and integrated operations workflows that translate telemetry into timely decisions.

These systems monitor environments around the clock. A simple way to think about it is as a self-healing endpoint defense layer for your organization. The platform identifies vulnerabilities and pushes out the required fixes automatically, so IT teams don't have to manually trigger every response. Policy-driven automation doesn't sideline human oversight; it gives IT personnel the speed to make decisive moves.

Automox asked teams which single task they would automate today. Patch installation led the pack at 39%, followed by automating rollbacks (21%) and managing approvals (20%). AEM delivers these exact capabilities.

The Automation Ceiling Is Real; Autonomy Breaks Through It

Partial automation serves as a temporary stopping point rather than a permanent end state.
Organizations stuck at the script-and-schedule level face the same exposure risks as those with zero automation in place; they simply manage a higher degree of infrastructure complexity. AEM represents the definitive next stage of maturity for IT operations. These policy-driven systems continuously maintain the desired security state across distributed assets without requiring constant human oversight, transforming reactive defense into sustainable operational resilience.

When Kubernetes Says "All Green" But Your System Is Already Failing

By Shamsher Khan
It's not a theoretical scenario. The cluster health checks all come back "green." Node status shows Ready across the board. Your monitoring stack reports nominal CPU and memory utilization. And somewhere in a utilities namespace, a container has restarted 24,069 times over the past 68 days — every five minutes, quietly, without triggering a single critical alert.

That number — 24,069 restarts — came from a real non-production cluster scan run last week with an open-source Kubernetes scanner that operates with read-only permissions. It can see the state of the cluster, but it cannot and did not change a single thing; the failures we found were entirely of the cluster's own making. The namespace the container lived in showed green in every dashboard the team monitored. No alert had fired. No ticket had been created. The workload had essentially been broken for over two months, and the cluster's observability layer had communicated exactly nothing about it.

This is not a tooling failure. It is an architectural characteristic of how Kubernetes surfaces health information — and understanding that characteristic is what separates reactive incident response from operational awareness.

The Illusion of Cluster Health

Kubernetes communicates health through a layered abstraction. At the top of that abstraction — the level most teams observe — are node status, pod phase, and deployment availability. These signals are accurate and fast. They answer one question well: Is the cluster currently able to run workloads? What they do not answer is whether the workloads running on it are actually functioning.

A pod in CrashLoopBackOff is, from Kubernetes' perspective, operating normally. The controller is doing exactly what it was designed to do: restarting the failed container on an exponential backoff schedule. The pod exists. The namespace exists. The deployment reports its desired replica count.
If your alerting threshold for restart counts is set to a reasonable number — say, 50 or 100 restarts — a workload that has been failing continuously for months will eventually coast past that threshold and simply become background noise.

This is not an edge case. In the scan that produced the 24,069-restart finding, there were fourteen additional containers in CrashLoopBackOff state across multiple namespaces, with restart counts ranging from 817 to 23,990. All of them were in a non-production environment. All of them had been failing for between three and sixty-eight days. The cluster health summary: nominal.

Why Control Plane Signals Lag Runtime Reality

The control plane knows what state it has requested. It reconciles against that desired state continuously. What it cannot observe — by design — is whether the application inside a running container is doing what it is supposed to do.

This creates a specific and predictable gap. Kubernetes will tell you a pod is Running. It will not tell you that the running pod is connected to a database that stopped accepting connections six hours ago. It will tell you that a container restarted 24,000 times. It will not tell you whether that matters to anyone, or whether the failure has been silently swallowing work since December.

The second failure type from the same scan illustrates a different dimension of this gap: a networking component, unschedulable for four days. The control plane recorded the scheduling failure accurately. The cluster health dashboard showed the node pool as healthy because the nodes themselves were healthy. The pod simply could not land on any of them. Whether the existing running replica of this component was operating at reduced capacity, or whether the failure to schedule a replacement had any operational consequence, was not surfaced anywhere in the standard observability layer.
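Separating long-lived failure conditions from fresh ones is mostly a matter of tracking when a condition began, not how large a counter has grown. The sketch below is a hypothetical helper (the function and field names are illustrative, not taken from opscart-k8s-watcher) that classifies a simplified view of container statuses by the age of the failure condition:

```python
from datetime import datetime, timedelta, timezone

def classify_crashloops(container_statuses, sustained_after=timedelta(days=1)):
    """Split CrashLoopBackOff findings into transient vs. sustained.

    `container_statuses` is a list of dicts with keys:
      name, restarts, first_failure (tz-aware datetime) -- a simplified
      view of what a read-only scanner extracts from pod status.
    """
    now = datetime.now(timezone.utc)
    transient, sustained = [], []
    for cs in container_statuses:
        age = now - cs["first_failure"]
        # A high restart count alone is ambiguous; the age of the failure
        # condition is what separates noise from a two-month-old outage.
        if age >= sustained_after:
            sustained.append(cs)
        else:
            transient.append(cs)
    return transient, sustained
```

An alert built on this split can fire once for a fresh crash loop and escalate separately when the same condition persists past the threshold, rather than treating restart 24,069 the same as restart 50.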
(Diagram: Control Plane Signal Timeline — from failure event to alert visibility across CrashLoopBackOff, OOMKill, and Unschedulable scenarios)

The OOMKill Signal You Almost Miss

Among the fifteen critical findings in the scan was a single OOMKill event in a system namespace:

[kube-system]/[security-monitoring-pod]
 └─ Status: OOMKilled | Restarts: 1 | Age: 10h
 └─ Container killed due to out of memory

One restart. Ten hours old. Easy to overlook next to containers with five-digit restart counts. But the significance is different: this is a system-level component — a security monitoring agent — that was killed because it ran out of memory. One restart means it recovered. It also means there was a period, however brief, during which security event collection from those nodes was interrupted.

In a compliance-sensitive environment, that gap matters. Not because the sky fell, but because the gap exists and is not logged anywhere that post-incident reviewers would typically look. The restart count is 1. The container is Running. The audit trail of what happened on those nodes during the gap is incomplete.

This is precisely why OOMKill events deserve separate attention from CrashLoopBackOff events in incident analysis. The failure mode is different, the cause is different, and the window of exposure is bounded and often short, which makes it easy to dismiss and hard to account for later.

The Resource Allocation Gap

The resource picture from the same cluster adds a different dimension to the health illusion. The cluster reports 237 CPU cores and 1,877 GB of memory available. Requested allocation sits at 63% of CPU and 15% of memory.

Cluster Capacity:  237.1 CPU cores             1877.5 GB memory
Total Requested:   149.6 CPU cores (63.1%)     293.7 GB memory (15.6%)

The memory figure is the more interesting one: 15.6% of available memory is requested across the entire cluster, while multiple namespaces carry an OVER-PROV flag.
The over-provisioned namespaces are not requesting too little — they are requesting CPU allocations that suggest the workloads were sized for a traffic profile that no longer exists, or never existed. The scheduler sees requests as the unit of resource accounting. A pod requesting 2.1 CPU cores holds 2.1 cores of schedulable capacity regardless of whether it is actually using 0.3.

This matters during incidents specifically because resource headroom feels like a safety margin. A cluster at 63% CPU requested feels like it has room to absorb load spikes. But if the workloads consuming that 63% are predominantly over-provisioned, the actual utilization is substantially lower, and the resource accounting is misleading when you are trying to understand whether a performance problem is capacity-related or configuration-related.

(Diagram: Requested vs Actual Resource Utilization — showing the gap between scheduled reservation and real consumption, and how that gap obscures diagnosis during load incidents)

What This Breaks in Post-Incident Analysis

The consequences of these observability gaps are most visible after incidents, not during them. When a post-incident review asks "how long was this broken?", the answer depends on what signals were recorded and when. A container that has restarted 24,069 times over 68 days was broken on a specific day. Identifying that day requires correlating restart count history, deployment event timestamps, and application logs — none of which are surfaced in standard cluster health views. The cluster remembers the current state. It does not easily tell you when the current state began.

For teams using AI-assisted or automated remediation, this gap becomes a reliability problem. Automated systems that trigger on pod status or restart thresholds will respond to symptoms rather than causes. A restart count of 24,069 looks the same to an automation rule as a restart count of 50.
The automation cannot distinguish between a container that has been in a known-broken state for months and one that just started failing. Acting on the high-restart pod without understanding its history risks masking a dependency failure, triggering unnecessary rollbacks, or creating the appearance of remediation without actually fixing anything.

The deeper issue is causal history. Kubernetes convergence is stateless in a useful sense: the system drives toward the desired state without preserving a record of how it got there. That property is what makes Kubernetes resilient. It is also what makes it difficult to reconstruct a failure timeline after the fact. The cluster that auto-recovered from an OOMKill ten hours ago left no evidence trail that most teams would find without specifically looking for it.

What Platform Teams Should Institutionalize

The gap described here is not closeable by any single tool. It is a structural property of how cluster health is defined and communicated. But it is manageable if teams build the right habits around it.

Restart count history needs a retention policy and a query pattern. A container at 24,069 restarts did not arrive there overnight. Most teams have the data in their metrics store — they simply do not have a standing query or alert that surfaces sustained CrashLoopBackOff conditions as distinct from transient ones. An alert that fires at 100 restarts and resolves when the pod recovers is different from a signal that tracks cumulative restart velocity over a 24-hour window.

OOMKill events in system namespaces warrant dedicated alerting. A security agent being OOMKilled is not the same severity event as an application container being OOMKilled, but it is not ignorable. System namespace OOMKills should route to a different channel than application health alerts.

Resource allocation audits should be treated as operational hygiene, not optimization exercises.
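A minimal form of such an audit compares what each namespace requests against what it actually uses. This is a hypothetical sketch (the function name, input shape, and threshold are illustrative assumptions, not part of any scanner's API):

```python
def overprovision_report(namespaces, threshold=0.5):
    """Flag namespaces whose actual CPU usage is far below what they request.

    `namespaces` maps name -> {"cpu_requested": cores, "cpu_used": cores}.
    A namespace is flagged as over-provisioned when it uses less than
    `threshold` of the capacity it reserves from the scheduler.
    """
    flagged = {}
    for name, res in namespaces.items():
        requested = res["cpu_requested"]
        if requested == 0:
            continue  # nothing reserved, nothing to audit
        utilization = res["cpu_used"] / requested
        if utilization < threshold:
            flagged[name] = round(utilization, 2)
    return flagged
```

Run against metrics-store data on a schedule, a report like this turns "63% requested" from a comforting number into a diagnosable one: it shows which share of the reservation is real load and which is stale sizing.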
The 63%/15% split between CPU and memory requests on this cluster is not a cost problem — it is a diagnostic problem. When requests do not reflect actual usage, resource-based reasoning during incidents becomes unreliable.

Finally, the question "how long has this been broken?" should have a fast answer. If it takes more than five minutes to determine when a CrashLoopBackOff condition started, the observability tooling is not configured to support incident response effectively. That question should be answerable from a single dashboard panel or query without log archaeology.

The Honest Question for Your Cluster

Every cluster of meaningful age and complexity carries some version of what this scan revealed. The combination of sustained crash loops, scheduling failures, and request/utilization gaps is not unusual — it is the natural state of a cluster that has been operated without systematic health archaeology.

The question worth asking of your own environment is not whether these conditions exist. They almost certainly do. The question is whether your current observability layer would surface them before they became incident preconditions — or whether you would find them the same way they were found here: by looking specifically and deliberately, rather than by being alerted. If the answer is the latter, that is where the work is — and it starts with picking a namespace and looking deliberately. The 24,069-restart container in your cluster is waiting to be found.

The scan data in this article was collected from a real non-production Azure Kubernetes Service cluster. All namespace and resource names have been anonymized. Findings were produced using opscart-k8s-watcher, a read-only open-source Kubernetes scanner that observes cluster state without making changes. No cluster state was modified during the investigation.

Connect: Blog: https://opscart.com | GitHub: https://github.com/opscart | LinkedIn: linkedin.com/in/shamsherkhan

Trend Report

Generative AI

Generative AI has become a default feature expectation, pushing engineering teams to treat models like production dependencies that are governed, measured, and operated with the same rigor as any other critical system in the stack. Model behavior and quality have to be measurable, failures must be diagnosable, data access needs to be controlled, and costs have to stay within budget as usage inevitably climbs. Operationalizing AI capabilities responsibly, not just having access to powerful models, is the differentiator for organizations today.

This report examines how organizations are integrating AI into real-world systems with capabilities like RAG and vector search patterns, agentic frameworks and workflows, multimodal models, and advanced automation. We also explore how teams manage context and data pipelines, enforce security and compliance practices, and design AI-aware architectures that can scale efficiently without turning into operational debt.


Refcard #388

Threat Modeling Core Practices

By Apostolos Giannakidis

Refcard #401

Getting Started With Agentic AI

By Lahiru Fernando

More Articles

Taming the JVM Latency Monster

An Architect's Guide to 100GB+ Heaps in the Era of Agency

In the "Chat Phase" of AI, we could afford a few seconds of lag while a model hallucinated a response. But as we transition into the Integration Renaissance — an era defined by autonomous agents that must Plan -> Execute -> Reflect — latency is no longer just a performance metric; it is a governance failure. When your autonomous agent mesh is responsible for settling a €5M intercompany invoice or triggering a supply chain move, a multi-second "Stop-the-World" (STW) garbage collection (GC) pause doesn't just slow down the application; it breaks the deterministic orchestration required for enterprise trust.

For an integrator operating on modern Java virtual machines (JVMs), the challenge is clear: how do we manage mountains of data without the latency spikes that torpedo agentic workflows? The answer lies in the current triumvirate of advanced OpenJDK garbage collectors: G1, Shenandoah, and ZGC.

The Stop-the-World Crisis: Why Throughput Isn't Enough

Garbage collection is the process of automatically reclaiming memory, but as our heaps grow beyond 50 GB to handle AI inference pipelines and massive event streams, traditional collectors can cause devastating latency spikes. In high-stakes environments, the predictability of pause times is just as critical as raw throughput. To achieve sub-millisecond or single-digit millisecond pauses on terabyte-scale heaps, we have moved beyond the "one-size-fits-all" approach.

1. G1: The Balanced Heavyweight (The Reliable Default)

The Garbage-First (G1) collector, introduced in Java 7, was designed to handle large heaps with more predictability than its predecessors. It is now the default for most HotSpot-based JVMs because it self-tunes remarkably well for both stable and dynamic workloads.

Architectural Mechanics

Region-based heap: Instead of a single monolithic space, G1 divides the heap into fixed-size regions (typically 1 MB to 32 MB).
These regions are logically categorized into Young, Old, and Humongous regions (for objects exceeding 50% of the region size).
Garbage-first priority: G1 identifies regions with the most reclaimable "garbage" and collects them first, using a cost-benefit analysis to meet user-defined pause-time goals (set via -XX:MaxGCPauseMillis).
Incremental compaction: By compacting memory incrementally during "mixed collections," G1 reduces the memory fragmentation that leads to catastrophic Full GC events.

Best for: Most enterprise applications that require a balance of good throughput and predictable, manageable pause times.

2. Shenandoah: The Ultra-Low Pause Specialist

When single-digit millisecond latency is the non-negotiable requirement, Shenandoah is the surgical tool of choice. Its primary differentiator is that it performs heap compaction concurrently with your application threads, unlike traditional collectors that pause the application to move objects.

Architectural Mechanics

Forwarding pointers and barriers: Shenandoah uses "forwarding pointers" to redirect object references to their new memory locations while they are being moved. It relies on specialized read and write barriers to intercept memory access and ensure the application always sees the correct location of an object.
Concurrent evacuation: Most GCs pause the world to "evacuate" live objects from a region being reclaimed. Shenandoah performs this evacuation while the application is still running, keeping pauses typically under 10 milliseconds regardless of heap size.
No generational model: Traditionally, Shenandoah treated the heap as a single space without dividing it into young and old generations, which simplifies implementation and avoids generational GC complexities.

Best for: Near-real-time systems where a 100ms pause is a "service down" event.

3. ZGC: Taming Terabytes at Hyperscale

The Z Garbage Collector (ZGC) is the "deep iron" solution for the most massive IT estates.
It is engineered to handle heaps up to 16 TB while maintaining pause times under 1 millisecond.

Architectural Mechanics

Pointer coloring: ZGC uses 64-bit object pointers to encode metadata directly into the pointer itself. This metadata includes the marking state (tracking live objects), relocation state (tracking moved objects), and generational state (identifying object age in JDK 21+).
ZPages: The heap is divided into memory regions called ZPages, which come in three sizes: small (2 MB) for regular objects, medium (32 MB) for larger allocations, and large (1 GB) for humongous objects. This allows ZGC to manage memory with extreme efficiency at scale.
Load barriers: Every memory read is intercepted by a "load barrier" that checks the "colored pointer" to ensure the application interacts only with valid, up-to-date references.
Generational ZGC (JDK 21+): The latest evolution partitions the heap into young and old generations, optimizing reclamation for short-lived objects and significantly improving overall throughput.

Best for: Hyperscale applications and AI orchestration layers that require sub-millisecond latency on massive datasets.

The Architect's Decision Matrix

Collector     Max Heap Support    Typical Pause Goal    Key Strategy
G1            64 GB+              200ms - 500ms         Region-based, incremental compaction
Shenandoah    100 GB+             < 10ms                Concurrent evacuation using forwarding pointers
ZGC           Up to 16 TB         < 1ms                 Pointer coloring and concurrent compaction

The "Agentic Strangler" Pattern and Memory Management

As an integrator, I often advocate for the Agentic Strangler Fig strategy: wrapping legacy monoliths in AI agents using the Model Context Protocol rather than attempting a "Big Bang" rewrite. However, this "facade" approach creates a new performance bottleneck. If your "Agent Facade" is running on a JVM with untuned garbage collection, the latency of your modernization layer will exceed the latency of the legacy system it is trying to strangle.
Using ZGC or Shenandoah in your integration layer ensures that your modern "facade" remains invisible to the user, providing the low-latency "Doing" engine required for the Integration Renaissance.

Tuning for the Real World: The "Player-Coach" Playbook

As someone who has resolved critical production outages for Global 50 logistics providers through JVM heap dump analysis and GC tuning, I can tell you: the default settings are rarely enough for mission-critical loads.

Fix your heap size. Resizing a heap is a high-latency operation. Set your initial heap size (-Xms) equal to your maximum heap size (-Xmx) to ensure predictable allocation from the start.
Monitor distributions, not averages. Averages are a lie. A "10ms average" can hide a 2-second spike that kills your API gateway. Track frequency histograms and maximum pause times to understand the true "tail latency" of your system.
Use realistic workloads. Synthetic benchmarks are "security theater" for performance. Test your GC strategy under real-world application pressure, accounting for the messy, unoptimized event streams that characterize the Integration Renaissance.
Hardware-rooted trust. In high-security environments, remember that identity is the perimeter. Ensure your GC strategy isn't creating side-channel vulnerabilities. Leverage Hardware Roots of Trust (like IBM z16) to ensure your memory-intensive AI agents are governed in a secure "Citadel."

Conclusion

We can no longer treat garbage collection as a "set-and-forget" background task. In the era of autonomous agents and the Integration Renaissance, your choice of GC defines the reliability of your entire digital workforce. Whether you are balancing throughput with G1, chasing ultra-low latency with Shenandoah, or scaling to the stars with ZGC, the goal is the same: move from systems that merely "Show Me" data to systems that can reliably "Do It For Me" across mission-critical enterprise systems.
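To put "monitor distributions, not averages" into practice, one approach is to extract pause durations from the JVM's unified GC log (enabled with -Xlog:gc) and report max and tail values rather than a mean. A minimal sketch in Python, assuming the common unified-logging shape where each pause line ends with a duration in milliseconds (the regex and function name are illustrative, not a standard tool):

```python
import re

# Matches the trailing pause duration on JDK unified-logging GC pause lines,
# e.g. "[1.2s][info][gc] GC(7) Pause Young (Normal) ... 3.456ms".
PAUSE_RE = re.compile(r"Pause.*?(\d+\.\d+)ms\s*$")

def pause_stats(log_lines):
    """Return (max_ms, p99_ms, count) for GC pauses found in the log.

    Lines without "Pause" (e.g. concurrent phases) are ignored, since
    concurrent work does not stop application threads.
    """
    pauses = sorted(
        float(m.group(1))
        for line in log_lines
        if (m := PAUSE_RE.search(line))
    )
    if not pauses:
        return (0.0, 0.0, 0)
    p99_index = max(0, int(len(pauses) * 0.99) - 1)
    return (pauses[-1], pauses[p99_index], len(pauses))
```

Reading the max and p99 side by side makes the "10ms average hiding a 2-second spike" failure mode immediately visible in the numbers.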

By Theo Ezell
Stateful AI: Streaming Long-Term Agent Memory With Amazon Kinesis

As autonomous agents evolve from simple chatbots into complex workflow orchestrators, the "context window" has become the most significant bottleneck in AI engineering. While models like GPT-4o or Claude 3.5 Sonnet offer massive context windows, relying solely on short-term memory is computationally expensive and architecturally fragile. To build truly intelligent systems, we must decouple memory from the model, creating a persistent, streaming state layer.

This article explores the architecture of streaming long-term memory (SLTM) using Amazon Kinesis. We will dive deep into how to transform transient agent interactions into a permanent, queryable knowledge base using real-time streaming, vector embeddings, and serverless processing.

The Memory Challenge in Agentic Workflows

Standard large language models (LLMs) are stateless. Every request is a clean slate. While large context windows (LCW) allow us to pass thousands of previous tokens, they suffer from two major flaws:

Recall degradation: Often referred to as "Lost in the Middle," LLMs tend to forget information buried in the center of a massive context window.
Linear cost scaling: Costs scale linearly (or worse) with context length. Passing 100k tokens for a simple follow-up question is economically unfeasible at scale.

Long-term memory solves this by using retrieval-augmented generation (RAG). However, traditional RAG is often "pull-based" or batch-processed. For an agent that needs to learn from its current conversation and apply those lessons immediately in the next step, we need a push-based, streaming architecture.

Architecture Overview: The Streaming Memory Pipeline

To implement streaming memory, we treat every agent interaction — input, output, and tool call — as a data event. These events are pushed to Amazon Kinesis, processed in real-time, and indexed into a vector database.
System Interaction Flow

The following sequence diagram illustrates how an agent interaction is captured and persisted without blocking the user response.

Why Amazon Kinesis for Agent Memory?

Amazon Kinesis Data Streams serves as the nervous system of this architecture. Unlike a standard message queue (like SQS), Kinesis allows multiple consumers to read the same data stream, enabling us to build complex memory ecosystems where one consumer handles vector indexing, another handles audit logging, and a third performs real-time sentiment analysis.

Kinesis vs. Traditional Approaches

Feature        Kinesis Data Streams            Standard SQS                   Batch Processing (S3+Glue)
Ordering       Guaranteed per partition key    Best effort (except FIFO)      Not applicable
Latency        Sub-second (real-time)          Milliseconds                   Minutes to hours
Persistence    Up to 365 days                  Deleted after consumption      Permanent (S3)
Throughput     Provisioned/on-demand shards    Virtually unlimited            High throughput (batch)
Concurrency    Multiple concurrent consumers   Single consumer per message    Distributed processing

Deep Dive: Implementing the Producer

The "Producer" is your agent application (running on AWS Lambda, Fargate, or EC2). It must capture the raw interaction and a set of metadata (session ID, user ID, timestamp) to ensure the memory remains contextual.

Partition Key Strategy

In Kinesis, the partition key determines which shard a record is sent to. For agent memory, the SessionID or AgentID is the ideal partition key. This ensures that all interactions for a specific user session are processed in strict chronological order, which is vital when updating a state machine or a conversation summary.
Python Implementation (Boto3)

Here is how you push an interaction to the stream using Python:

```python
import json
from datetime import datetime

import boto3

kinesis_client = boto3.client('kinesis', region_name='us-east-1')

def stream_agent_interaction(session_id, user_query, agent_response):
    # Prepare the payload
    payload = {
        'session_id': session_id,
        'timestamp': datetime.utcnow().isoformat(),
        'interaction': {
            'user': user_query,
            'assistant': agent_response
        },
        'metadata': {
            'version': '1.0',
            'type': 'conversation_step'
        }
    }

    try:
        response = kinesis_client.put_record(
            StreamName='AgentMemoryStream',
            Data=json.dumps(payload),
            PartitionKey=session_id  # Ensures ordering for this session
        )
        return response['SequenceNumber']
    except Exception as e:
        print(f"Error streaming to Kinesis: {e}")
        raise e
```

The Memory Consumer: Transforming Data into Knowledge

The consumer is where the “learning” happens. Simply storing raw text isn’t enough; we need to perform memory consolidation. This involves:

- Cleaning: Removing noise, sensitive PII, or redundant system prompts.
- Summarization: Condensing long dialogues into key facts.
- Embedding: Converting the summary into a high-dimensional vector.

The Lambda Consumer Pattern

Using AWS Lambda with Kinesis allows for seamless scaling. When the volume of agent interactions spikes, Kinesis increases the number of active shards (if in On-Demand mode), and Lambda scales its concurrent executions to match.

```python
import base64
import json

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection

# Clients (OpenSearch client setup omitted for brevity)
bedrock = boto3.client('bedrock-runtime')

def lambda_handler(event, context):
    for record in event['Records']:
        # Kinesis data is base64 encoded
        raw_data = base64.b64decode(record['kinesis']['data'])
        data = json.loads(raw_data)

        text_to_embed = (
            f"User: {data['interaction']['user']} "
            f"Assistant: {data['interaction']['assistant']}"
        )

        # 1. Generate embedding using Amazon Bedrock (Titan G1 - Text)
        body = json.dumps({"inputText": text_to_embed})
        response = bedrock.invoke_model(
            body=body,
            modelId='amazon.titan-embed-text-v1',
            accept='application/json',
            contentType='application/json'
        )
        embedding = json.loads(response.get('body').read())['embedding']

        # 2. Store in OpenSearch Serverless (vector store)
        # (Logic to upsert into your vector index goes here)
        index_memory(data['session_id'], embedding, text_to_embed,
                     data['timestamp'])

    return {'statusCode': 200, 'body': 'Successfully processed records.'}
```

Managing Memory State: The Lifecycle

Memory isn’t binary (present vs. absent). Effective agents use a tiered approach similar to human cognition: working memory, short-term memory, and long-term memory.

Tiered Memory Logic

- Working memory: The current conversation turn (stored in-memory or in Redis).
- Short-term memory: The last 5-10 interactions, retrieved from a fast cache.
- Long-term memory: Semantic history retrieved from the vector database using Kinesis-driven updates.

Advanced Concept: Real-Time Summarization Sharding

A common issue with long-term memory is vector drift. Over thousands of interactions, the vector space becomes crowded, and retrieval accuracy drops (O(n) search time, though optimized by HNSW/ANN algorithms, still suffers from noise). To solve this, use a "Summarizer Consumer" on the same Kinesis stream. This consumer aggregates interactions within a window (e.g., every 50 messages) and creates a "Consolidated Memory" record. This reduces the number of vectors the agent must search through while preserving high-level context.
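The windowing logic of such a Summarizer Consumer can be sketched as a pure function. Here, `summarize` is a placeholder for an LLM summarization call (an assumption for illustration; it defaults to a trivial join so the logic stays self-contained):

```python
def consolidate(interactions, window_size=50, summarize=None):
    """Aggregate raw interaction records into consolidated memory records.

    `summarize` stands in for an LLM summarization call; the default
    simply joins the texts so the windowing logic is testable offline.
    """
    summarize = summarize or (lambda texts: " | ".join(texts))
    consolidated = []
    for start in range(0, len(interactions), window_size):
        window = interactions[start:start + window_size]
        consolidated.append({
            "type": "consolidated_memory",
            "source_count": len(window),
            "summary": summarize([i["text"] for i in window]),
        })
    return consolidated
```

Each consolidated record replaces up to `window_size` raw vectors in the index, which is what keeps retrieval fast as history grows.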
Comparative Analysis: Memory Storage Strategies

| Strategy | Storage Engine | Best For | Complexity |
|---|---|---|---|
| Flat Vector RAG | OpenSearch Serverless | General semantic search | Low |
| Graph-Linked Memory | Amazon Neptune | Relationship and entity mapping | High |
| Time-Decayed Memory | Pinecone / Redis VL | Recency-biased retrieval | Medium |
| Hierarchical Summary | DynamoDB + S3 | Large-scale longitudinal history | Medium |
| Hybrid (Search + Graph) | OpenSearch + Neptune | Context-aware, relational agents | Very High |

Handling Scale and Backpressure

When building a streaming memory system, you must design for failures. Kinesis provides a robust platform, but you must handle your consumers gracefully.

- Dead letter queues (DLQ): If the Lambda consumer fails to embed a record (e.g., a Bedrock API timeout), send the record to an SQS DLQ. This prevents the Kinesis shard from blocking.
- Batch size optimization: In your Lambda trigger, set a BatchSize. A batch size of 100 is often the sweet spot between latency and cost-efficiency.
- Checkpointing: Kinesis tracks which records have been processed. If your consumer crashes, it resumes from the last successful sequence number, ensuring no memory loss.

Data Flow Logic: The Consolidation Algorithm

How do we decide what is worth remembering? Not every "Hello" needs to be vectorized. We can implement filtering logic in our Kinesis consumer.

Performance and Scaling Considerations

When calculating the performance of your memory system, focus on the time-to-consistency (TTC). This is the duration between an agent finishing a sentence and that knowledge being available for retrieval in the next turn. With Kinesis and Lambda, the TTC typically looks like this:

- Kinesis ingestion: 20-50 ms
- Lambda trigger overhead: 10-100 ms
- Bedrock embedding (Titan): 200-400 ms
- OpenSearch indexing: 50-150 ms

Total TTC: ~300 ms to 700 ms. Since human users typically take 1–2 seconds to read a response and type a follow-up, a TTC under 700 ms is effectively "instant" for the next turn in the conversation.
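The filtering step mentioned under the consolidation algorithm above could look like the following sketch. The specific heuristics (a greeting list and a minimum length) are illustrative assumptions, not the article's prescribed rules:

```python
# Minimal "is this worth remembering?" filter for the Kinesis consumer.
# The greeting set and length threshold are illustrative assumptions.
GREETINGS = {"hello", "hi", "hey", "thanks", "thank you", "ok", "bye"}

def should_remember(interaction: dict, min_chars: int = 20) -> bool:
    """Decide whether an interaction is worth embedding and storing."""
    user_text = interaction.get("user", "").strip()
    # Skip pure greetings and acknowledgements
    if user_text.lower().rstrip("!.") in GREETINGS:
        return False
    # Skip very short exchanges that carry little semantic signal
    if len(user_text) < min_chars:
        return False
    return True
```

In the Lambda consumer, records that fail this check can be dropped before the (comparatively expensive) embedding call, saving both Bedrock invocations and index space.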
Complexity Metrics

In terms of search complexity, vector retrieval typically operates at O(log n) using Hierarchical Navigable Small World (HNSW) graphs. By streaming data into these structures in real-time, we maintain high performance even as the memory grows to millions of records.

Security and Privacy in Streaming Memory

Streaming agent memory involves sensitive data. You must implement the following:

- Encryption at rest: Enable KMS encryption on the Kinesis stream and the OpenSearch index.
- Identity isolation: Use AWS IAM roles with the principle of least privilege. The agent should only have kinesis:PutRecord permissions, while the consumer has kinesis:GetRecords and bedrock:InvokeModel permissions.
- PII redaction: Integrate Amazon Comprehend into your Kinesis consumer to automatically mask personally identifiable information before it reaches the long-term vector store.

Conclusion

Building a long-term memory system with Amazon Kinesis transforms your AI agents from simple stateless functions into intelligent entities with a persistent "life history." By decoupling memory from the LLM and treating it as a real-time data stream, you achieve a system that is scalable, cost-effective, and deeply contextual. This architecture isn't just about storage; it's about building a foundation for agents that can truly learn and adapt over time, providing a superior user experience and unlocking new use cases in enterprise automation.

Further Reading and Resources

- Amazon Kinesis Data Streams Developer Guide
- Building Vector Search Applications on AWS
- Amazon Bedrock Documentation
- Design Patterns for LLM-Based Agents
- Scaling Laws for Neural Language Models

By Jubin Abhishek Soni DZone Core CORE
MCP vs Skills vs Agents With Scripts: Which One Should You Pick?

I have been writing and building in the AI space for a while now. From writing about MCP when Anthropic first announced it in late 2024 to publishing a three-part series on AI infrastructure for agents and LLMs on DZone, one question keeps coming up in comments, DMs, and community calls: What is the right tool for the job when building with AI?

For a long time, the answer felt obvious. You pick an agent framework, write some Python, and ship it. But the ecosystem has moved fast. We now have MCP servers connecting AI to the real world, Skills encoding domain know-how as simple markdown files, and agent scripts that can orchestrate entire workflows end to end. The options are better than ever. The confusion around them is too.

I have seen teams spend weeks building a full agent setup for something a 50-line SKILL.md would have solved in an afternoon. I have also seen people reach for Skills when their agent actually needed live data from a real system. And I have watched MCP get used where a plain API call would have been simpler and faster.

The problem is not a lack of options. The problem is that most content out there treats MCP, Skills, and Agent scripts as competing choices. They are not. They are different layers of the same stack, and knowing when to use each one is what separates a good AI system from a messy one. In this article, I want to give you a clear, practical breakdown of all three. Not theory. Not slides. Just the kind of thinking you need to make the right call on your next build.

Quick Definitions First

MCP (Model Context Protocol) is an open protocol that lets AI models connect to external tools, APIs, and data sources through a common interface. Think of it as USB-C for AI. Plug anything in, and the model knows how to use it.

Skills are reusable instruction files (usually markdown) that tell an AI agent how to handle a specific type of task. SKILL.md files, best practices, step-by-step guides. They are not code. They are context.
Agents with Scripts are programs (Python, Node.js, Bash) where the AI drives a loop: think, act, observe, repeat. The script owns the execution from start to finish.

The Big Comparison

| Dimension | MCP | Skills | Agents with Scripts |
|---|---|---|---|
| Primary purpose | Tool and API connectivity | Task-specific guidance | End-to-end task execution |
| Who defines the logic | Server builder | Prompt engineer | Developer + LLM |
| How it runs | Protocol-driven calls | Context injection | Agent loop |
| State management | Stateless | Stateless | Can hold state |
| Setup effort | Medium (needs a server) | Low (just a markdown file) | High (code + infra) |
| Reusability | High | High | Low to Medium |
| Debugging | Network and protocol level | Prompt inspection | Code + LLM traces |
| Best for | Live external data | Consistent output quality | Complex, multi-step work |

When to Use MCP

Use MCP when your AI needs to talk to something outside itself: a database, an API, a file system, a calendar, a CRM. When Anthropic announced MCP in December 2024, the pitch was simple: instead of every team writing custom connectors for every data source, you build one server that speaks a common protocol, and any model can plug into it. That framing still holds.

Good fit when:

- You need live data that is not in the model's training or context
- Multiple AI clients need to hit the same tool in the same way
- You want access control at the protocol level
- You are building a platform where tools need to be easy to swap out

Real example: You are building a support agent that pulls order status from Shopify and creates tickets in Jira. Set both up as MCP servers. Your agent calls the tool and gets back what it needs. It does not need to know anything about the underlying APIs.
Pros:

- Common, vendor-neutral interface
- Keeps AI logic separate from API integration code
- Works across different models and clients

Cons:

- You need a running MCP server, which means infra to manage
- Adds a bit of latency on every tool call
- Too much for simple or one-off integrations

When to Use Skills

Use Skills when the problem is not about getting data. It is about getting good output. Skills carry the know-how. They are like a senior teammate sitting next to the model, saying: "For this task, here is how we do it."

Good fit when:

- You want repeatable, consistent results across many sessions
- The task has nuance that basic prompting does not catch
- Teams need outputs that all follow the same structure and tone
- You want to document a process without writing any code

Real example: Your team writes Word docs all the time: proposals, reports, SOWs. Without a Skill, every output looks different. Add a SKILL.md that defines the structure, tone, and formatting rules, and suddenly every doc comes out clean and consistent.

Pros:

- No infra needed, just a markdown file
- Easy to read, version, and update
- Works in the background without extra setup

Cons:

- Cannot actually do things, it only guides how they are done
- Only as useful as the instructions inside it
- No replacement for real tool access

When to Use Agents With Scripts

Use Agents with Scripts when the task has multiple steps and needs real decisions along the way. These are your power tools. I have written before about AI agent architectures on DZone, and the recurring theme is the same: the think-act-observe loop is powerful, but it needs structure, or it gets expensive and hard to debug fast. Scripts give you that structure. You control the flow. The LLM handles the reasoning inside each step.
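The think-act-observe loop described above can be sketched in a few lines of Python. Here `llm` and `tools` are injected stand-ins (assumptions for illustration) for a real model client and real tool calls, which is also what makes the loop easy to test:

```python
def run_agent(goal, llm, tools, max_steps=10):
    """Minimal think-act-observe loop: the LLM picks an action,
    the script executes it, and the observation feeds the next turn."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = llm(history)                 # think
        if decision["action"] == "finish":
            return decision["result"]
        tool = tools[decision["action"]]        # act
        observation = tool(**decision.get("args", {}))
        history.append(f"{decision['action']} -> {observation}")  # observe
    raise RuntimeError("Agent exceeded max_steps without finishing")
```

Note that the script, not the model, owns the loop: it caps the step count, decides when to stop, and is the natural place to add retries, logging, and error handling.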
Good fit when:

- The workflow has if/else logic based on what happens at runtime
- You need to chain multiple tools in a specific order
- The job runs in the background without anyone watching
- You need retry logic, error handling, or a way to track progress

Real example: A nightly agent that pulls your GitHub PRs, runs a quick check, posts a summary to Slack, updates a Notion tracker, and sends an alert email for anything critical. That is not a single tool call. It is a full workflow. Skills guide it. MCP servers feed it data. The agent script ties it all together.

If you have built process monitoring scripts before, this loop will feel familiar. The same principles from Linux process monitoring apply here: watch what is running, handle failures, and log everything.

Pros:

- Total flexibility
- Handles complex if/else logic well
- Can hold state, retry failed steps, and recover from errors
- Can use Skills and MCP together as building blocks

Cons:

- Most effort to set up and maintain
- LLM responses mid-loop can be unpredictable
- Harder to debug (you are tracking prompts, code, and external services at once)
- Token costs add up fast in long loops

They Work Together, Not Against Each Other

Here is the thing most people miss: MCP, Skills, and Agents are not competing options. They are layers in a stack. A solid AI system uses all three together:

- MCP connects to tools and data sources
- Skills tell the model how to use them well
- Agent scripts run the whole show

Simple way to think about it: MCP is the hands, Skills are the know-how, and the Agent script is what decides what to do next.

Effort vs. Power at a Glance

| | Skills | MCP | Agents |
|---|---|---|---|
| Setup effort | Low | Medium | High |
| Output control | Medium | Medium | High |
| Infra needed | None | Server | Full stack |
| Autonomy | Guided | Tool-driven | Fully autonomous |
| Learning curve | Easy | Moderate | Steep |

Conclusion

If there is one thing I have learned from years of building and writing about AI systems, it is that complexity is easy to add and hard to remove.
Every team I talk to wants to jump straight to agents. That makes sense. Agents feel like the real thing. But many of the problems they are trying to solve do not require an agent. They need better guidance baked into the model, or a clean interface to a tool.

Start with Skills. They cost nothing, take an hour to write, and make your AI smarter right away. Then bring in MCP when your agent needs to reach outside itself and connect to real systems. Use Agent scripts when you have a genuine multi-step workflow that needs to run on its own and handle failures gracefully.

This is not a new idea either. Look at how automation has evolved in infrastructure work. In my recent piece on how IaC evolved to power AI workloads, the pattern is identical: you start simple, layer in tooling as complexity grows, and resist the urge to over-engineer from day one. The same thinking applies here.

If you want to go deeper on the tooling side, my complete guide to modern AI developer tools covers the broader ecosystem, and the AI infrastructure series on DZone goes into how all of these layers fit together at scale. The AI tooling space is moving fast, but the principles are stable. Pick the right layer for the job. Keep your stack simple until it needs to grow. And build things you can actually maintain six months from now.

By Vidyasagar (Sarath Chandra) Machupalli FBCS DZone Core CORE
Building Centralized Master Data Hub: Architecture, APIs, and Governance

Many enterprises operating with a large legacy application landscape struggle with fragmented master data. Core entities such as country, location, product, broker, or security are often duplicated across multiple application databases. Over time, this results in data inconsistencies, redundant implementations, and high maintenance costs. This article outlines a Master Data Hub (MDH) architecture, inspired by real-world enterprise transformation programs, and explains how to centralize master data using canonical schemas, API-first access, and strong governance.

Fragmented Master Data

In a typical legacy environment, applications manage master data differently:

- Each application owns and maintains its own master tables
- Master definitions diverge over time
- Functional logic is reimplemented repeatedly
- Changes require coordination across multiple systems

The absence of a single source of truth increases operational risk and slows down innovation.

Centralized Master Data Hub

The proposed solution is to establish a centralized master data hub that acts as the single source of truth for enterprise-level master data.

Key principles of a centralized master data hub:

- Central ownership of master data
- Canonical, version-controlled schemas
- API-only access to master data
- Strong governance and auditability

Benefits of a centralized master data hub:

- Single, authoritative source of master data
- Reduced duplication and inconsistencies
- Lower maintenance and implementation costs
- Improved data quality and governance
- Medium-risk migration strategy

The following risks and challenges were identified while building a centralized master data hub. These risks can be mitigated through caching, high-availability design, and phased adoption.
- Synchronization between local and centralized masters
- Increased dependency on hub availability
- Careful change management required
- Performance considerations for high-volume consumers

Commonly, creating a schema from fragmented data is a challenge (migrating from a multi-schema to a single-schema model). To address schema standardization challenges, the MDH is implemented as a centralized database platform with two logical tracks.

Track 1: Canonical Schema Creation

This track hosts enterprise-wide common masters, such as country, state, currency, branch, and location, with the following characteristics:

- Canonical, normalized schema
- Exposed via entity-level APIs
- Lightweight master data UI
- Bulk upload support

These masters are designed for reuse across the enterprise.

Track 2: As-Is Configurable Schemas

This track supports application-specific masters that cannot be immediately standardized, with the following characteristics:

- Per-application schemas
- Minimal or no schema changes initially
- Existing stored procedures can continue
- Gradual migration to API-based access

This dual-track approach minimizes migration risk while supporting long-term standardization.

Master data migration plays a critical role in transitioning from fragmented data sources to a centralized master data hub. A well-defined, structured migration process enables enterprise teams to establish the hub effectively and ensures seamless adoption by consuming applications. The migration flow outlines the high-level activities involved in consolidating and standardizing master data.

Once the MDH is established, enterprise-wide access to master data becomes essential. APIs built on top of canonical master entities are consumed by applications across the organization. All consumers access master data exclusively through APIs, eliminating direct database access. This model decouples applications from underlying data structures and enables controlled, scalable evolution of master data.
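The API-only access model described above can be sketched as a thin client. The endpoint shape, base URL, and field names here are hypothetical assumptions for illustration; the point is that consumers call the hub's API and never touch its database directly:

```python
import json
from urllib.request import urlopen

# Hypothetical MDH endpoint; real deployments would use their own base URL
MDH_BASE_URL = "https://mdh.example.com/api/v1"

def get_master_record(entity: str, code: str, fetch=None):
    """Fetch a canonical master record (e.g., a country) via the MDH API.

    `fetch` is injectable so the client can be tested without a network
    and so caching can be layered in to mitigate hub-availability risk.
    """
    fetch = fetch or (lambda url: urlopen(url).read())
    raw = fetch(f"{MDH_BASE_URL}/{entity}/{code}")
    return json.loads(raw)
```

Because the fetch step is injectable, the same client can be wrapped with a local cache, which is one of the mitigations mentioned earlier for the hub-availability and performance risks.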
For any enterprise transformation initiative, strong governance is essential to manage master data effectively. The following governance use cases and processes help ensure consistency, control, and long-term success of the MDH.

Change in Data: Adding a new country or location in master data

- Raise a service request or ticket
- Review and approval by the MDH team
- Update via the master data UI
- Automatic synchronization to consuming applications

Change in Schema: Adding a new attribute to an existing master

- Change request initiation
- Impact analysis by MDH administrators
- Stakeholder review and approval
- Schema, API, and UI enhancements
- Testing (MDH and impacted systems)
- Deployment and closure

Change in API: Modifying an existing master API follows a process that mirrors schema changes, with additional focus on backward compatibility and consumer impact.

Conclusion

Implementing a centralized master data hub based on canonical schemas, API-driven access, and strong governance provides a scalable and maintainable approach to enterprise master data consistency. When paired with a pragmatic migration strategy, this model enables modernization without disrupting existing application ecosystems. The approach effectively balances standardization and flexibility, making it suitable for complex enterprise environments with multiple consuming applications.

By Ravi Kiran Mallidi DZone Core CORE
Data-Driven API Testing in Java With REST Assured and TestNG: Part 4

APIs are at the heart of almost every application, and even small issues can have a big impact. Data-driven API testing with JSON files using REST Assured and TestNG makes it easier to validate multiple scenarios without rewriting the same tests again and again. By separating test logic from test data, we can build cleaner, more flexible, and more scalable automation suites. In this article, we’ll walk through a practical, beginner-friendly approach to writing API automation tests with REST Assured and TestNG using JSON files as the data provider.

Data-Driven API Testing With JSON Files and TestNG’s @DataProvider

The setup and configuration remain the same as discussed in the earlier tutorial. Additionally, the following dependency for the Google Gson library should be added to the pom.xml to handle the JSON files:

```xml
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.2</version>
    <scope>compile</scope>
</dependency>
```

For this demonstration, we will use the POST /addOrder API from the RESTful e-commerce demo application. The API schema is shown below for reference:

```json
[
  {
    "user_id": "string",
    "product_id": "string",
    "product_name": "string",
    "product_amount": 0,
    "qty": 0,
    "tax_amt": 0,
    "total_amt": 0
  }
]
```

The following are two approaches for handling the JSON file as a data provider:

- POJO-based (Object-Mapping) approach
- Map-based (Dynamic Parsing) approach

POJO-Based (Object-Mapping) Approach

In the POJO-based approach, JSON data is mapped directly to custom Java classes that represent the structure of the API request or response. Each field in the JSON corresponds to a variable in the POJO, making the data easy to read, access, and maintain. This approach is useful for stable APIs where the data format does not change frequently.
Creating the POJO Class

The following POJO class should be created to map the JSON file fields to the data provider:

```java
@Getter
@Setter
@AllArgsConstructor
@ToString
public class Order {

    private String user_id;
    private String product_id;
    private String product_name;
    private double product_amount;
    private int qty;
    private double tax_amt;
    private double total_amt;
}
```

The Order class maps each field of the JSON file to the request body of the POST /addOrder API. The annotations @Getter and @Setter provided by the Lombok library automatically generate getter and setter methods for all fields at compile time, helping to reduce boilerplate code. The @AllArgsConstructor annotation generates a constructor that accepts all class fields as parameters, making it easy to create a fully initialized order object. Each variable in the class corresponds to a field in the JSON data, such as user_id, product_id, product_name, product_amount, and so on. The JSON data can be automatically mapped to this class using the Google Gson library. The @ToString annotation automatically generates a toString() method. This is required so that the values provided in the Order object are printed correctly after test execution.

Creating a Utility to Read JSON Files

We need to create a utility method that reads and parses the JSON file, and finally returns the required data for testing.
```java
public class JsonReader {

    public static List<Order> getOrderData(String fileName) {
        InputStream inputStream = JsonReader.class.getClassLoader()
            .getResourceAsStream(fileName);
        if (inputStream == null) {
            throw new RuntimeException("File not found: " + fileName);
        }
        try (Reader reader = new InputStreamReader(inputStream)) {
            Type listType = new TypeToken<List<Order>>() { }.getType();
            return new Gson().fromJson(reader, listType);
        } catch (IOException e) {
            throw new RuntimeException("Error reading the file: " + fileName);
        }
    }
}
```

Code Walkthrough

The getOrderData() method is a utility that accepts the file name as a parameter. It searches for the specified file in the src\test\resources folder. If the file is not found, it throws a RuntimeException with the human-readable message “File not found.” The file is initially loaded as an InputStream and then converted into a Reader using try-with-resources to read the data. The try-with-resources ensures that the Reader is automatically closed after use. The Google Gson library needs type information to convert JSON into generic objects. This is done using the TypeToken class, which tells Gson that the target type is List<Order>. Finally, the fromJson() method reads the JSON data from the file, converts it into a List<Order>, and returns it.

Creating a DataProvider Method

The following data provider method returns the test data from the JSON file as Iterator<Object[]>, which is further consumed by the test.

```java
@DataProvider(name = "orderData")
public Iterator<Object[]> getOrderData() {
    List<Order> orderList = JsonReader.getOrderData("orders_data.json");
    List<Object[]> data = new ArrayList<>();
    for (Order order : orderList) {
        data.add(new Object[] { order });
    }
    return data.iterator();
}
```

Code Walkthrough

A TestNG @DataProvider named “orderData” is defined using this code that supplies test data to test methods.
It reads a list of Order objects from the orders_data.json file using the JsonReader.getOrderData() method. Each Order is wrapped inside an Object[] and added to a list. Finally, it returns an Iterator<Object[]> so that each test execution receives one Order object at a time.

JSON File With Test Data

The following JSON file is used for testing the POST /addOrder API:

```json
[
  {
    "user_id": "1",
    "product_id": "1",
    "product_name": "iPhone",
    "product_amount": 500.00,
    "qty": 1,
    "tax_amt": 5.99,
    "total_amt": 505.99
  },
  {
    "user_id": "1",
    "product_id": "2",
    "product_name": "iPad",
    "product_amount": 699.00,
    "qty": 1,
    "tax_amt": 7.99,
    "total_amt": 706.99
  },
  {
    "user_id": "2",
    "product_id": "2",
    "product_name": "iPhone 15 PRO",
    "product_amount": 999.00,
    "qty": 2,
    "tax_amt": 9.99,
    "total_amt": 1088.99
  },
  {
    "user_id": "3",
    "product_id": "3",
    "product_name": "Samsung S24 Ultra",
    "product_amount": 4300.00,
    "qty": 1,
    "tax_amt": 5.99,
    "total_amt": 4305.99
  }
]
```

Writing the API Automation Test

Let’s write the test for the POST /addOrder API that creates orders using the test data supplied from the JSON file via the data provider:

```java
@Test(dataProvider = "orderData")
public void testCreateOrder(Order order) {
    List<Order> orderData = List.of(order);

    given().contentType(ContentType.JSON)
        .when()
        .log()
        .all()
        .body(orderData)
        .post("http://localhost:3004/addOrder")
        .then()
        .log()
        .all()
        .statusCode(201)
        .assertThat()
        .body("message", equalTo("Orders added successfully!"));
}
```

Code Walkthrough

The testCreateOrder() method uses the orderData DataProvider to run the test repeatedly, using a different Order object from the JSON file each time. Before sending the POST request, each order is wrapped in a list with List.of(order) because the POST /addOrder API expects a list of orders in the request body. The test then checks the response by ensuring the status code is 201 and that the success message “Orders added successfully!” is returned.
Test Execution

When the test runs, TestNG automatically runs the testCreateOrder() method multiple times, each time using a different set of data pulled from the JSON file via the orderData DataProvider.

Map-Based (Dynamic Parsing) Approach

The POJO-based approach is good when the JSON is stable and well-defined. However, it requires continuous updates and maintenance whenever the JSON structure changes, which increases maintenance time and effort. This makes it less suitable for dynamic or frequently evolving JSON files, where even minor changes can break parsing and tests. In such situations, the Map-based approach comes in handy, where we do not need to maintain POJOs for the JSON. It can handle changing or unknown fields dynamically without requiring code changes.

Creating the JSON Reader Utility With Java Map

Let’s create a new utility method to parse the JSON files dynamically using a Java Map.

```java
public static List<Map<String, Object>> getOrderData(String fileName) {
    InputStream inputStream = JsonReader.class.getClassLoader()
        .getResourceAsStream(fileName);
    if (inputStream == null) {
        throw new RuntimeException("File not found: " + fileName);
    }
    try (Reader reader = new InputStreamReader(inputStream)) {
        Type listType = new TypeToken<List<Map<String, Object>>>() { }.getType();
        return new Gson().fromJson(reader, listType);
    } catch (IOException e) {
        throw new RuntimeException("Error reading the file: " + fileName);
    }
}
```

Code Walkthrough

The getOrderData() method reads a JSON file and converts it to a list of maps using the Google Gson library.

Return type: It returns a List<Map<String, Object>>, where:

- Each Map represents one JSON object.
- Keys are JSON field names.
- Values are their corresponding values.

Loading the JSON file: The file is read from the src\test\resources folder and returned as an InputStream. If the file is not found, the inputStream object will be null.
In that case, the program throws a RuntimeException with the message “File not found.”

Parsing the JSON file: A try-with-resources block is used to safely read the JSON file using a Java Reader, ensuring the stream is closed automatically. It defines the target type as List<Map<String, Object>> using TypeToken and then uses the fromJson() method of the Google Gson library to convert the JSON data into this dynamic structure. If any file-reading error occurs, it throws a runtime exception with the message “Error reading the file” along with the file name.

JSON File With Test Data

The following JSON file is used for testing the POST /addOrder API:

```json
[
  {
    "user_id": "1",
    "product_id": "1",
    "product_name": "iPhone",
    "product_amount": 500.00,
    "qty": 1,
    "tax_amt": 5.99,
    "total_amt": 505.99
  },
  {
    "user_id": "1",
    "product_id": "2",
    "product_name": "iPad",
    "product_amount": 699.00,
    "qty": 1,
    "tax_amt": 7.99,
    "total_amt": 706.99
  },
  {
    "user_id": "2",
    "product_id": "2",
    "product_name": "iPhone 15 PRO",
    "product_amount": 999.00,
    "qty": 2,
    "tax_amt": 9.99,
    "total_amt": 1088.99
  },
  {
    "user_id": "3",
    "product_id": "3",
    "product_name": "Samsung S24 Ultra",
    "product_amount": 4300.00,
    "qty": 1,
    "tax_amt": 5.99,
    "total_amt": 4305.99
  }
]
```

Creating a DataProvider Method

The following data provider method retrieves test data from the JSON file in Iterator<Object[]> format, which is then used by the test method for execution.

```java
@DataProvider(name = "orderData")
public Iterator<Object[]> getOrderData() {
    List<Map<String, Object>> orderList = JsonReader.getOrderData("orders_data.json");
    List<Object[]> data = new ArrayList<>();
    for (Map<String, Object> order : orderList) {
        data.add(new Object[] { order });
    }
    return data.iterator();
}
```

Code Walkthrough

The getOrderData() DataProvider method reads order data from a JSON file and stores it as a List<Map<String, Object>>. It then converts each map into an Object[] and adds it to a list, which is returned as an iterator.
This allows TestNG to run the test multiple times, using a different set of order data supplied from the JSON file each time.

Writing the API Automation Test

Let's write a test for the POST /addOrder API that creates orders using the test data from the JSON file through the data provider.

Java

@Test(dataProvider = "orderData")
public void testCreateOrder(Map<String, Object> order) {
    List<Map<String, Object>> orderData = List.of(order);
    given().contentType(ContentType.JSON)
        .when()
        .log()
        .all()
        .body(orderData)
        .post("http://localhost:3004/addOrder")
        .then()
        .log()
        .all()
        .statusCode(201)
        .assertThat()
        .body("message", equalTo("Orders added successfully!"));
}

Code Walkthrough

The Map<String, Object> order parameter of the testCreateOrder() method represents a single order read dynamically from the JSON file. It is wrapped inside a List<Map<String, Object>> because the API expects an array of orders in the request body, not just a single object. This approach allows the test to stay flexible and work with dynamic JSON data without relying on fixed POJO classes. The test then logs the request and response, verifies that the status code is 201, and verifies the response message confirming that the order was created successfully.

Test Execution

The following is a screenshot of the test executed using IntelliJ IDE. It shows that the same test was run multiple times using the test data from a JSON file. Note that when we ran the tests using the POJO-based approach, the test data appeared with the POJO name: testCreateOrder[Order(userId…)]. With the Map-based dynamic approach, the test data appears directly with the field names as provided in the JSON file.

Summary

Data-driven API testing with JSON files, REST Assured, and TestNG allows running the same test multiple times using JSON files as input, making tests more reusable and comprehensive.
When parsing JSON, POJO-based approaches provide type safety and clear structure but require frequent updates whenever the JSON changes, making them less flexible. In contrast, Map-based (dynamic) parsing is more flexible and low-maintenance, as it can handle unknown or changing fields without modifying code, though it offers less type safety. Choosing between them depends on the API’s stability: use POJOs for fixed structures and Maps for dynamic or evolving JSON data. Happy testing!

By Faisal Khatri DZone Core CORE
From Stream to Strategy: How TOON Enhances Real-Time Kafka Processing for AI

AI agents increasingly require real-time stream processing because the environments in which they make decisions are dynamic, fast-changing, and event-driven. Unlike batch processing, which is how traditional data warehouses and BI tools work, real-time streaming enables AI agents to analyze events as they happen, responding instantly to fraud, system anomalies, customer behavior shifts, or operational changes. In competitive and automated environments, a matter of seconds can make the difference between an accurate decision and one that is off by miles, a risk few organizations are willing to take. Continuous data streams are also key to enabling AI agents to adjust to emerging patterns, observe trends in real time, and refine predictions on the fly rather than making decisions based on stale snapshots. As with other automation systems that rely on increasingly intelligent agents (usually AI/ML) over time, real-time stream processing ensures that AIs remain responsive and context-aware, enabling them to make timely, high-impact decisions.

What the TOON Data Format Is and Why It Is Better for AI Agents

TOON stands for Token-Oriented Object Notation, a lightweight, line-oriented data format. It is human-readable like JSON (more so than binary formats), but more compact and structured than raw text. TOON is built to be very simple to parse: each line or "entry" begins with a token header (uppercase letters or digits), then uses pipe separators (|) for fields. Because it targets streaming environments, it is line-oriented, and we do not need to build a full in-memory parse tree (unlike JSON), which makes it suitable for low-memory contexts, embedded systems, or logs. The TOON data format specifically targets the propagation of structured, validated, and semantically consistent data, reducing ambiguity during real-time decision-making, with AI agents in mind.
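The exact TOON grammar is beyond the scope of this article, but the line-oriented, pipe-separated layout described above can be sketched in a few lines of Python. Note that the "TXN" token header, the field order, and the helper names below are illustrative assumptions, not part of any official TOON specification:

```python
# Minimal sketch of a line-oriented, pipe-separated TOON-style record.
# The "TXN" header and the field order are illustrative assumptions,
# not an official TOON schema.

def encode_line(header, values):
    """Serialize one record as HEADER|field1|field2|..."""
    return "|".join([header] + [str(v) for v in values])

def decode_line(line):
    """Parse a single line back into (header, fields). Each line is
    self-contained, so no in-memory tree of the whole document is needed."""
    header, *fields = line.split("|")
    return header, fields

line = encode_line("TXN", ["tx-001", "2026-03-30T12:00:00Z", 505.99])
print(line)  # TXN|tx-001|2026-03-30T12:00:00Z|505.99
```

Because every record occupies one line, a consumer can process a stream record by record, which is what makes this layout attractive for low-memory and streaming contexts.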
AI agents rely on well-defined features, correctly typed data, and accurate metadata (like timestamps, event types, or correlation IDs) to yield precise outputs. TOON requires disciplined deserialization and enforces schema discipline with version control, so incoming data stays predictable no matter how the systems themselves evolve. This reduces preprocessing complexity, minimizes runtime errors, and avoids potential misinterpretation of features, which is a common risk. Additionally, the traceability and model explainability that are critical in AI-driven environments are significantly improved by TOON's standardized organization of payload and contextual information. TOON enhances the reliability, performance, and scalability of AI agents operating in real-time systems by delivering clean, governed, and machine-friendly data.

Where Apache Kafka Fits In

In a stream-processing platform where millions of events per second must be handled with low latency, ingestion systems such as Apache Kafka can provide continuous input (data/events) for AI agents that work with real-time data. Apache Kafka can serve as a scalable, reliable, and fast streaming backbone that feeds fast-moving event streams to AI agents so that they can deliver sound decisions. Kafka's distributed architecture ensures durability, horizontal scalability, and high throughput even under heavy loads. By using the publish–subscribe model of Apache Kafka, multiple AI models can consume the same real-time data independently, enabling parallel decision-making, model comparison, and continuous learning without disrupting upstream systems.
Besides, Kafka's replay capabilities and message ordering within partitions enable AI agents to retrain, backtest, or recover from errors using saved streams from streaming databases. By acting as a robust event pipeline, Kafka guarantees that AI-driven systems can respond quickly, learn continuously, and scale without interruption in dynamic, data-intensive contexts.

Why TOON in Apache Kafka

When building a platform where processed streaming data eventually feeds into AI systems such as TensorFlow, TOON provides several key advantages over JSON, especially for large language models (LLMs), for which JSON is considered heavyweight for data exchange because it spends thousands of tokens on quotes, braces, colons, and repeated keys. With TOON, we can use 30–50% fewer tokens for uniform data sets, and there is less syntactic clutter, which makes the data easier for LLMs to consume. Besides, TOON can be nested, similar to JSON: it can represent a simple object, an array of values, an array of objects, and an array of objects with nested fields. In the case of an array of objects with nested fields, TOON remains highly understandable while being much smaller than the equivalent JSON. TOON is a token-efficient serialization format that is primarily designed for streaming, low-memory environments, and LLM contexts. The following are the advantages of formatting Apache Kafka messages in TOON instead of plain JSON:
JSON often mixes metadata and payload without enforced separation, but in TOON, we can clearly separate headers and business payload.
TOON can support compact binary encoding that reduces message size compared to verbose JSON.
The TOON specification follows semantic versioning to ensure compatibility.
Data validation is better, as TOON enforces strict field validation rules.
TOON declares the schema once, then provides just the values in a clean tabular format.
Interoperability across enterprise systems can be improved because of a consistent structure across microservices.
Because of stronger type safety in TOON, enforced data types reduce runtime errors.

Wrapping Up

TOON is a new data serialization format designed to minimize the number of tokens when exchanging structured data, primarily with language models. Although primarily beneficial in LLM-specific pipelines, we can use it to ingest stream data into an Apache Kafka topic, as it's a compact, token-efficient serialization format. TOON is not Kafka-native and is still relatively young compared to JSON, Avro, or Protobuf. As TOON is not widely supported yet, we may need to write custom serializers/deserializers when integrating with existing message producers and consumers for downstream applications or components across the stream-processing platform. If we are especially concerned with efficient parsing and minimizing overhead, then TOON could be a very well-suited message payload format for Apache Kafka. Together, TOON and Kafka provide a powerful foundation for building responsive, resilient, and future-ready streaming solutions. Thank you for reading! If you found this article valuable, please consider liking and sharing it.

By Gautam Goswami DZone Core CORE
Stop Leap-Second AI Drift in IoT Streams With PySpark

Fintech and enterprise platforms ingest massive volumes of timestamped data (big data) from IoT devices such as payment terminals, wearables, and mobile apps. Accurate timing is essential for fraud detection, risk scoring, and customer analytics. Yet a subtle irregularity called the leap second can corrupt timestamps and trigger AI drift, gradually degrading model performance in production. In this article, I will explain what the drift types are and how they can be prevented, based on my research paper. Details can be found here. Let's start.

What Is AI Drift?

AI drift (also known as model drift) occurs when a deployed machine learning model loses accuracy because live data no longer matches the training data distribution. In fintech IoT pipelines, this leads to more false-positive fraud alerts, inaccurate risk scores, and lost revenue. Four key types of drift are relevant:

1. Data Drift (Covariate Shift)

The statistical distribution of input features changes while the relationship to the target stays the same. Fintech example: A fraud model trained on average transaction amounts of $50–$200 suddenly sees many $1–$10 micro-payments from new IoT wearables. The feature distribution shifts, causing excessive false positives.

2. Concept Drift

The underlying relationship between inputs and the target evolves. Fintech example: Fraudsters switch from large one-time charges to repeated small "card-testing" transactions across IoT devices. The model's learned fraud patterns become outdated.

3. Label Drift (Prior Probability Shift)

The overall proportion of target classes changes. Fintech example: During economic stability, the fraud rate drops from 2% to 0.2%. A model calibrated on the old rate over-predicts fraud and floods teams with alerts.

4. Temporal Drift

Timestamp inconsistencies corrupt time-based features (often grouped under data drift). Fintech example: Leap seconds create duplicate timestamps or negative deltas.
Features such as "seconds since last transaction" or velocity checks break, distorting every downstream score. These drift types frequently compound: temporal drift from leap seconds can cascade into data, concept, or label drift if timestamps are not cleaned in real time.

Verified Historical Leap-Second Incidents

2012 (June 30): Reddit, LinkedIn, and other major services suffered outages. A Linux kernel timing bug caused 100% CPU spikes and lockups when the extra second was inserted.
2015 (June 30): Major exchanges took precautionary measures. The Intercontinental Exchange (ICE), which operates NYSE platforms, paused certain operations for 61 minutes, and other venues shortened after-hours sessions to avoid timestamp-related failures.
2017 (January 1): Cloudflare experienced a partial global DNS outage. A negative time delta in their Go-based resolver caused a random-number generator to panic and crash.

These documented events show why real-time leap-second handling is essential in financial systems.

The Solution: PySpark Structured Streaming Pipeline

The framework published in the original research uses Apache Spark Structured Streaming to detect and correct leap-second anomalies in real time, enforce temporal order, and deliver clean monotonic timestamps to AI/ML pipelines.

Figure 1.
PySpark Leap-Second Data Processing Pipeline

Complete PySpark Implementation

Python

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lag, regexp_replace, to_timestamp, unix_timestamp, when
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
from pyspark.sql.window import Window
import os

# Initialize Spark session
spark = SparkSession.builder \
    .appName("LeapSecondsStreaming") \
    .master("local[*]") \
    .getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

# Create input directory for streaming files
input_dir = "input_data"
os.makedirs(input_dir, exist_ok=True)

# Define input schema
schema = StructType([
    StructField("transaction_id", StringType(), True),
    StructField("timestamp", StringType(), True),  # format: yyyy-MM-dd HH:mm:ss
    StructField("amount", DoubleType(), True)
])

# Read streaming CSV data (easily replaceable with Kafka)
raw_df = spark.readStream \
    .schema(schema) \
    .option("header", True) \
    .option("maxFilesPerTrigger", 1) \
    .csv(input_dir)

# Leap-second cleaning: rewrite invalid ":60" seconds as ":59"
cleaned_df = raw_df.withColumn(
    "cleaned_ts",
    regexp_replace(col("timestamp"), r":60$", ":59")
)

# Parse timestamps and convert to Unix epoch
parsed_df = cleaned_df \
    .withColumn("event_time", to_timestamp(col("cleaned_ts"), "yyyy-MM-dd HH:mm:ss")) \
    .withColumn("unix_ts", unix_timestamp(col("event_time"))) \
    .filter(col("event_time").isNotNull())

# Real-time temporal validation in micro-batches
def process_batch(df, epoch_id):
    print(f"\n=== Processing micro-batch {epoch_id} ===")
    window_spec = Window.orderBy("event_time")
    df = df.withColumn("prev_unix_ts", lag("unix_ts").over(window_spec)) \
        .withColumn("time_diff_sec", col("unix_ts") - col("prev_unix_ts")) \
        .withColumn("anomaly_flag",
                    when((col("time_diff_sec") == 0)
                         | (col("time_diff_sec") > 2)
                         | col("time_diff_sec").isNull(), "LEAP_SECOND_OR_GAP")
                    .otherwise("OK"))
    df.select("transaction_id", "event_time", "amount", "time_diff_sec", "anomaly_flag") \
        .orderBy("event_time") \
        .show(truncate=False)
    # Write cleaned data to Delta Lake / feature store here

# Start the streaming query
query = parsed_df.writeStream \
    .foreachBatch(process_batch) \
    .outputMode("append") \
    .option("checkpointLocation", "checkpoint_leapsecond") \
    .start()

query.awaitTermination()

PySpark Logic Explanation

The PySpark Structured Streaming pipeline processes fintech IoT data in real time by first initializing a SparkSession and reading incoming CSV files (or Kafka topics) as a continuous streaming DataFrame using a predefined schema for transaction_id, timestamp, and amount. The leap-second correction is applied immediately via regexp_replace, converting any invalid :60 second to :59, followed by to_timestamp parsing and conversion to Unix epoch seconds (unix_timestamp) for numerical stability. In every micro-batch processed by foreachBatch, a window function ordered by event_time computes the previous timestamp using lag and derives time_diff_sec; any zero difference, null, or excessively large gap is flagged as a leap-second anomaly in an anomaly_flag column. Cleaned, monotonic timestamps and validated time differences are then passed downstream for aggregations and feature engineering, ensuring temporal consistency before data reaches AI/ML pipelines.

Why This Matters for All Types of Drift

The leap-second cleaning and temporal validation steps directly eliminate temporal drift — the root cause in fintech IoT streams. By making timestamps monotonic and gap-free, the pipeline ensures that all derived features (time deltas, velocity checks, rolling windows, and event ordering) remain accurate. This single fix prevents temporal drift from cascading into the other three drift types. This pipeline achieved 100% detection and correction of injected leap-second anomalies in the paper's controlled experiment (1,000 synthetic transactions with 10 anomalies) at an average batch latency of only 0.8 seconds.
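The anomaly rule applied inside process_batch can also be unit-tested outside Spark. The following plain-Python sketch replicates the same logic (flag a record when the delta to the previous event is zero, larger than the allowed gap, or undefined); the function name is ours, and the 2-second threshold mirrors the pipeline above:

```python
def flag_anomalies(unix_ts, max_gap=2):
    """Replicate the micro-batch rule: flag a record when the delta to the
    previous event is 0, greater than max_gap seconds, or undefined
    (the first record in a batch)."""
    flags, prev = [], None
    for ts in unix_ts:
        diff = None if prev is None else ts - prev
        if diff is None or diff == 0 or diff > max_gap:
            flags.append("LEAP_SECOND_OR_GAP")
        else:
            flags.append("OK")
        prev = ts
    return flags

# A duplicate timestamp (101, 101) and a 4-second gap (101 -> 105) are flagged,
# as is the first record, whose delta is undefined.
print(flag_anomalies([100, 101, 101, 105, 106]))
```

Keeping the rule in a pure function like this makes it easy to pin the behavior down with tests before wiring it into the streaming job.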
How the Pipeline Prevents All Types of Drift

Data drift is eliminated by producing consistent Unix epoch timestamps and valid time deltas.
Concept drift is avoided because accurate temporal sequences preserve true fraud and risk patterns.
Label drift is controlled by reliable time windows that do not artificially inflate or deflate class balances.

Figure 2. Types of AI/ML Drift

Figure 3. Complete Layered PySpark Architecture – How Leap-Second Anomalies Are Detected and Fixed

Conclusion

Handling temporal anomalies such as leap seconds is often overlooked in large-scale data systems, yet it plays a critical role in ensuring the reliability of time-sensitive applications, especially in fintech and IoT environments. By leveraging PySpark and designing resilient data pipelines, organizations can proactively mitigate AI drift and maintain the integrity of predictive models operating at scale. As real-world data continues to grow in complexity, engineering systems that are both time-aware and fault-tolerant becomes essential. The approaches discussed here provide a foundation for building robust, production-grade data processing systems that handle such edge cases effectively.

References

[1] Ram Ghadiyaram, Durga Krishnamoorthy, Vamshidhar Morusu, Jaya Eripilla, "Addressing AI Drift in Fintech IoT Data Processing: Handling Leap Seconds with PySpark for Robust Predictive Analytics," International Journal of Computer Trends and Technology, vol. 73, no. 5, 2025. https://doi.org/10.14445/22312803/IJCTT-V73I5P101

By Ram Ghadiyaram DZone Core CORE
Tokens and Transactions With AI

Based on NVIDIA CEO Jensen Huang's commentary on the role of databases for the agentic era in his GTC 2026 keynote. The diagram below is a readable version of Jensen's "Best Slide"; the content was generated with an LLM from the talk's transcript and then edited.

Summary of the Talk [wrt Databases]

For a database audience, the keynote underscores a fundamental shift: Data is no longer just stored and queried — it is continuously activated to power agentic systems. The talk highlights that the center of gravity is moving from traditional transactional and analytical databases toward AI-driven data platforms that unify structured, unstructured, and real-time data streams into a single operational fabric. Massive growth in AI infrastructure — driven by data center expansion and trillion-dollar-scale compute demand — signals that data systems must scale not just for queries, but for continuous inference and agent workflows. A key theme is the rise of "agentic architectures," where data is embedded, indexed, and retrieved dynamically to support reasoning systems, making vector search, hybrid retrieval, and multimodal indexing first-class primitives. The keynote also implicitly reframes databases as part of an end-to-end AI stack — tightly coupled with compute, models, and orchestration layers — rather than standalone systems. This creates pressure on databases to evolve toward low-latency, context-aware, and semantically rich data access patterns. Finally, the emerging paradigm suggests that competitive advantage will hinge on how effectively platforms can turn raw data into actionable intelligence for autonomous agents, blending transactions, analytics, and AI into a unified system of execution.

Intro

Back in the early 90s, when databases were ruling the world, client-server architecture was the mainframe modernization approach. Oracle, Informix, Sybase, and SQL Server(!) were ruling the database world.
The big war between the big three was TPC-C and TPC-H – T for Transactions. Today, it's T for Tokens! The goal of every enterprise application is to get the job done for customers and consumers. Consumers have embraced token computing, also known as generative AI. One of the big themes of Jensen Huang's 2026 GTC keynote is that the five-layer cake of computing exists to enable applications — both consumer and enterprise applications. Jensen said that behind all these applications are modern databases and the unreasonably effective SQL. While SQL has been extended for semi-structured data (with SQL++), SQL and the databases have to go beyond their roots and handle unstructured data - pure text, PDFs, images, and more. That will truly help enterprises bring the power of tokens to consumers. In essence, Jensen calls for databases to be natural! It's now early in the second quarter of the 21st century. Time for databases to be reinvented. Again.

Abstract

Enterprise computing has long been built around transactions — Boolean algebra over ACID. In other words, deterministic units of work executed over structured data. Recent advances in AI, however, introduce a complementary unit of computation: tokens. Tokens represent language, context, and reasoning, enabling systems to interpret unstructured data and perform multi-step inference. This article examines the emerging duality of tokens and transactions, based on Jensen Huang's 2026 NVIDIA GTC keynote. He argues that databases, data systems, and data platforms are central to this shift: they must evolve from systems of record to systems of cognition that actively participate in reasoning, retrieval, and agent execution loops, all to enable intelligent consumer and enterprise applications. That's really the job to be done. We explore architectural implications, evolving database responsibilities, and key design challenges for building enterprise systems that integrate deterministic transactions with probabilistic token-based computation.
Introduction: A Shift in the Unit of Compute

Enterprise systems have historically been defined by transactions and analytics (via data warehousing). A transaction encapsulates a unit of business logic — atomic, consistent, isolated, and durable. Operational databases are optimized for executing and managing such transactions efficiently and reliably. Analytical databases (data warehousing engines) are optimized for analyzing large volumes of data to discover business insights. Both operational and analytical databases are relational; both support transactions; both support SQL. And yet, modern AI systems operate on a fundamentally different unit: tokens. Tokens are the atomic units of language models. They represent fragments of meaning — words, subwords, or symbols — that collectively encode information, context, and reasoning. Every AI interaction — prompt ingestion, retrieval, reasoning, and response generation — is expressed in tokens. Recent developments suggest a structural shift:

AI systems increasingly perform multi-step reasoning, not single-step queries.
Workloads are iterative and probabilistic, not deterministic.
Data consumption extends beyond structured tables to multimodal data, e.g., data in PDFs, spreadsheets, images, etc.

How do enterprise systems reconcile transaction-based computation with token-based reasoning? How do you reconcile the deterministic computing we have been used to with the probabilistic computing of the generative AI era?

The Duality: Transactions and Tokens

Transactions remain the backbone of enterprise systems, enforcing data integrity and consistency, supporting auditability and compliance, and representing the authoritative record of business truth. They underpin core operations such as updating account balances, recording orders, and maintaining inventory — activities that demand precision and reliability. Tokens introduce a fundamentally different paradigm in computing.
Rather than representing state, they capture context and meaning, enabling systems to perform approximate reasoning instead of exact computation. They make applications human. Tokens operate over unstructured and multimodal data, allowing AI systems to interpret natural language, summarize documents, generate insights, and plan and execute actions. As recent industry developments suggest, AI workloads have reached an "inference inflection," where real-time usage increasingly dominates over training. A critical misconception is that tokens will replace transactions. In reality, transactions define ground truth, while tokens interpret and act on that truth. Together, they form a complementary foundation for a new class of systems — one that integrates deterministic execution with probabilistic reasoning, enabling enterprises to both preserve correctness and unlock intelligence.

The Changing Role of Databases

The shift to token-based computation is fundamentally redefining the role of databases — from systems that answer queries to systems that construct context. Traditionally, databases were optimized to retrieve precise results, such as returning revenue for a given region. In AI-driven systems, however, the objective is no longer just row retrieval but context assembly: gathering relevant documents, retrieving semantically similar records, and combining structured and unstructured data into a coherent input for reasoning. These requirements call for databases to evolve by adding vector indexing, hybrid query execution that blends structured predicates with semantic search, and pipelines that dynamically assemble context for downstream AI processing. At the same time, databases are evolving from query-answer systems into participants in a broader question-answer system. The answer does not necessarily come from database systems and their Boolean logic.
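As a toy illustration of the hybrid query execution described above, the following pure-Python sketch combines a structured predicate (a region filter, standing in for a SQL WHERE clause) with semantic ranking by cosine similarity (standing in for a vector index). The records, region values, and 2-D embedding vectors are invented for illustration:

```python
import math

# Invented toy corpus: structured fields plus a (tiny) embedding vector.
records = [
    {"id": 1, "region": "EMEA", "text": "refund policy", "vec": [0.9, 0.1]},
    {"id": 2, "region": "APAC", "text": "refund policy", "vec": [0.9, 0.2]},
    {"id": 3, "region": "EMEA", "text": "shipping times", "vec": [0.1, 0.9]},
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(query_vec, region, top_k=2):
    # 1. Structured predicate, as a SQL WHERE clause would apply it.
    candidates = [r for r in records if r["region"] == region]
    # 2. Semantic ranking, as a vector index would score it.
    candidates.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["id"] for r in candidates[:top_k]]

print(hybrid_search([1.0, 0.0], "EMEA"))  # [1, 3]: record 1 ranks above 3
```

Real systems push both steps into the engine (a filtered vector index scan), but the division of labor is the same: exact predicates narrow the candidate set, and similarity scores order what remains.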
AI systems introduce multiple layers of memory, including durable storage (traditional databases), semantic memory (embeddings and vector indexes), and short-term memory (model context such as the KV cache). Agentic workloads place intense pressure on these layers due to frequent context reuse and expansion — a phenomenon often described as "memory pounding." As a result, databases must support low-latency access across memory tiers, integrate with context caching mechanisms, and manage stateful sessions and reasoning histories that persist across interactions. Perhaps most significantly, databases are transitioning from passive data stores to active participants in execution workflows. In traditional architectures, applications query databases and receive results. In agentic systems, AI agents orchestrate workflows, invoke databases as tools, and embed data access directly within multi-step reasoning loops. Increasingly, database interactions are expected to be performed by agents rather than humans, which requires databases to expose programmatic, agent-friendly interfaces, support semantic abstractions beyond SQL, and enforce policy-aware access controls. In this new paradigm, the database is no longer just a backend component — it becomes an integral part of the reasoning and execution fabric of enterprise systems.

Architectural Implications

The integration of tokens and transactions leads to a layered architecture that separates truth, retrieval, and reasoning into distinct but tightly coupled components. At the foundation lies the ground truth layer, consisting of OLTP databases, data warehouses, and dataframes, whose primary responsibility is to maintain correctness, ensure consistency, and provide authoritative data. This layer continues to anchor enterprise systems, serving as the definitive source of record against which all higher-level reasoning must be validated.
Above this foundation sits the semantic retrieval layer, which enables systems to bridge structured and unstructured data. This layer includes vector databases, hybrid indexing systems, and multimodal data pipelines that support semantic search and retrieval-augmented generation. Its role is to unify disparate data forms and assemble relevant context for downstream processing. As the scale and complexity of data grow, technologies such as GPU-accelerated data processing — exemplified by libraries like cuDF and cuVS — are emerging to address the performance and freshness challenges inherent in this layer. At the top of the stack is the reasoning and agent layer, where intelligence is applied and actions are executed. This layer comprises language models, orchestration frameworks, and tool-using agents that interpret user intent, perform multi-step reasoning, and drive decisions and workflows. Correspondingly, the data flow within enterprise systems evolves from a simple query-response model into a continuous loop — context is assembled, retrieved, reasoned over, acted upon, and fed back into the system. This shift introduces iterative execution patterns, dynamic query generation, and cross-system orchestration, fundamentally transforming how applications interact with data and how value is created.

Data Evolution: From Structured to Multimodal

A significant shift in modern data systems is the elevation of unstructured data from passive archives to active participants in computation. Historically, structured data was queryable and operationally central, while unstructured data — documents, images, videos, and logs — was largely stored and seldom analyzed at scale. Today, AI systems can parse, embed, and index this unstructured data, transforming it into searchable and usable information.
As a result, the vast majority of enterprise data — often estimated at 80–90% — can now be incorporated into query and reasoning workflows, fundamentally expanding the scope of what databases must manage. This transformation requires databases to evolve beyond traditional tabular models and support multimodal data processing. Systems must now integrate document storage, embedding pipelines, multimodal indexing to unify structured and unstructured data, and, importantly, the right SQL language extensions and optimizer improvements to execute queries on this data efficiently. The database is no longer just a repository of records; it becomes a platform for representing meaning across diverse data types, enabling richer and more comprehensive insights.

Querying in the Agentic Era

Query interfaces are undergoing a fundamental transition from declarative queries to intent-driven interactions, while still being declarative. While SQL remains essential, users increasingly interact with systems through natural language, expecting not just answers but outcomes. Natural language interfaces can translate requests into SQL or workflows, but simple NL-to-SQL translation is insufficient for complex enterprise scenarios that require reasoning, iteration, and orchestration. In this new paradigm, query execution is increasingly mediated by agents rather than direct user input. Agents decompose tasks, select appropriate tools, execute multiple queries, and synthesize results into actionable outputs. For example, instead of executing a single query to retrieve revenue, an agent might detect anomalies, gather relevant data, correlate with external signals, and generate recommendations. This shifts the role of querying from a single-step operation to a multi-step reasoning process embedded within workflows. Supporting this evolution requires a new approach to query planning and optimization.
Future systems must operate across structured predicates, vector similarity search, ranking models, and the cost of token generation. This introduces challenges such as cost-based optimization across heterogeneous modalities, dynamic query planning that adapts to context, and execution strategies that balance performance, accuracy, and cost in real time.

Performance and Cost Considerations

"There are three things important in the database world: performance, performance, and performance" – Bruce Lindsay

Token-driven systems introduce new dimensions of performance that extend beyond traditional database metrics. One of the central trade-offs is between latency and context: larger context windows improve reasoning quality but increase both latency and computational cost. As a result, system design (including embedding generation, vector index options, hybrid indexes, hybrid search, quantization methods, and similarity search) must carefully balance the depth of context with effectiveness, efficiency, and responsiveness. At the same time, new throughput metrics are emerging that redefine how performance is measured. Metrics such as tokens per second, time to first token, and tokens per watt reflect the operational realities of AI-driven systems. The notion of the data center as a "token factory" captures this shift, emphasizing the production and delivery of intelligence rather than just query execution. These workloads also place unprecedented pressure on storage and indexing systems. Frequent context retrieval, large working memory requirements, and repeated access patterns stress storage architectures in ways that differ from traditional workloads. Additionally, vector indexes must be continuously maintained — rebuilt, incrementally updated, and aligned with evolving datasets — making freshness and maintenance efficiency as critical as query performance.
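For concreteness, the throughput metrics mentioned above can be computed from a request start time and per-token completion timestamps. The helper below is a hedged sketch; the function name and the timestamp convention are ours, not a standard API:

```python
def throughput_metrics(request_start, token_times):
    """Compute time-to-first-token (TTFT) and tokens/sec from a request
    start time and per-token completion timestamps, all in seconds on the
    same (ideally monotonic) clock."""
    ttft = token_times[0] - request_start
    elapsed = token_times[-1] - request_start
    tps = len(token_times) / elapsed if elapsed > 0 else float("inf")
    return {"ttft_sec": ttft, "tokens_per_sec": tps}

# Four tokens finishing over 2 seconds, first token arriving after 0.5 s:
print(throughput_metrics(0.0, [0.5, 1.0, 1.5, 2.0]))
# {'ttft_sec': 0.5, 'tokens_per_sec': 2.0}
```

TTFT captures perceived responsiveness (how long the user waits for anything), while tokens/sec captures sustained generation throughput; optimizing one often trades off against the other.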
Security and Governance

Agent-driven systems introduce new categories of risk that require a rethinking of security and governance. Autonomous data access, cross-system interactions, and external communication expand the attack surface and increase the complexity of enforcing policies. As a result, security can no longer rely solely on perimeter-based approaches.

Instead, security must move closer to the data itself, becoming context-aware and policy-driven, with guardrails. This includes row-level and field-level access control, as well as policies that govern what data can be accessed, under what conditions, and by which agents. Systems must enforce these constraints dynamically, ensuring that data access remains compliant even as workflows become more autonomous and distributed.

In parallel, hardware-level protections are emerging to strengthen data security. Confidential computing technologies enable data to remain encrypted even during processing, ensuring that sensitive information is protected not only from external threats but also from infrastructure operators. Together, these advances redefine the database as a critical enforcement point for both security and governance in AI-driven systems.

Open Challenges and Research Directions

The convergence of tokens and transactions introduces a range of open challenges that span systems design, optimization, and governance. One of the most pressing issues is the development of unified cost models that can optimize across SQL queries, vector search, and model inference. Traditional query optimizers are not equipped to handle the interplay between data retrieval and token generation, necessitating new approaches that consider both data and compute costs holistically.

Another critical challenge lies in defining coherent memory abstractions. Systems must unify session memory, persistent storage, and reasoning context without compromising performance or isolation.
This requires new models for managing state across interactions, ensuring that context can be reused efficiently while maintaining consistency and security.

Additional challenges include indexing for multimodal data and governing agent behavior. Maintaining freshness, efficiency, and hybrid queryability across rapidly evolving datasets is non-trivial, particularly when combining structured and unstructured data. At the same time, ensuring safe execution, traceability, and compliance for agent-driven workflows demands new frameworks for governance and observability.

Conclusion: Toward Agentic Data Systems

The emergence of tokens as a unit of computation does not diminish the importance of transactions; rather, it creates a new synthesis. Transactions anchor truth, providing the deterministic foundation upon which enterprises operate, while tokens enable reasoning, allowing systems to interpret, infer, and act. Databases must evolve to support both, bridging the gap between correctness and intelligence.

The future enterprise data platform will integrate multiple capabilities into a unified system. It will combine structured and unstructured data, support hybrid retrieval and reasoning, participate actively in agent execution loops, and enforce governance and policy at every level. This represents a significant evolution from traditional architectures, requiring databases to operate as both data stores and intelligent substrates.

In this context, databases are transitioning from systems of record to systems of cognition. The challenge — and opportunity — for database systems is to become the foundation on which intelligent systems reason, act, and learn. Those who successfully make this transition will define the next generation of enterprise computing.

Closing Reflection

The key architectural question is no longer how to query data efficiently, but how to make data continuously usable for reasoning, action, and autonomous systems.
This shift reflects a broader transformation in enterprise computing, where the value of data is no longer defined solely by retrieval speed or storage efficiency, but by how effectively it can be interpreted, contextualized, and acted upon in real time.

The answer lies in embracing both tokens and transactions — not as competing paradigms, but as complementary foundations. Transactions provide the deterministic backbone that ensures correctness and trust, while tokens enable the flexible, probabilistic reasoning required for intelligence and adaptation. Together, they form the basis of systems that can both preserve truth and generate insight.

The future of enterprise systems will be defined by how well this integration is achieved. Organizations that successfully unify transactional integrity with token-driven reasoning will be able to build platforms that are not only data-driven, but intelligence-driven — capable of supporting autonomous systems that continuously learn, adapt, and act.

References

  • Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First
  • Jensen Huang's GTC 2026 keynote
  • Why Should Databases Go Natural?
  • The Unreasonable Effectiveness of SQL

By Keshav Murthy DZone Core CORE
2026 Developer Research Report

Hello, our dearest DZone Community! Last year, we asked you for your thoughts on emerging and evolving software development trends, your day-to-day as devs, and the workflows that work best — all to shape our 2026 Community Research Report. The goal is simple: to better understand our community and provide the right content and resources developers need to support their career journeys.

After crunching some numbers and piecing the puzzle together, at last, it is in (and we have to warn you, it's quite a handful)! This report summarizes the survey responses we collected from December 9, 2025, to January 27 of this year, and includes an overview of the DZone community, the stacks developers are currently using, the rising trend in AI adoption, year-over-year highlights, and so much more.

Here are a few takeaways worth mentioning:

  • AI use climbs this year, with 67.3% of readers now adopting it in their workflows.
  • While most use multiple languages in their developer stacks, Python takes the top spot.
  • Readers visit DZone primarily for practical learning and problem-solving.

This is just a small glimpse of what's waiting in our report, made possible by you. You can read the rest of it in the free 2026 Community Research Report.

We really appreciate you lending your time to help us improve your experience and nourish DZone into a better go-to resource every day. Here's to new learnings and even newer ideas!

— Your DZone Content and Community team

By Carisse Dumaua
Agent-of-Agents Pattern: Enhancing Software Testing

The Pre-Production Bottleneck

A pull request (PR) gets merged, code review is complete, unit tests are green, and the feature looks good. But then comes the familiar question: Is this actually ready for production?

Most engineering teams have a checklist: regression tests, security scans, performance validation, and integration checks. The problem is that executing all of this takes significant time. A full regression suite might take one to two hours. For a feature that touched a few files, running everything feels wasteful. But manually picking tests? That's how bugs slip into production.

Both extremes have drawbacks. Run everything, and engineers wait while 800 tests execute — 90% of which are irrelevant to the changes made. Pick tests manually, and there's a real risk of missing that one edge case that breaks the checkout flow during peak traffic. Neither approach scales when deploying multiple times daily.

Multi-agent AI systems offer a different approach to pre-production validation. Instead of one AI attempting to handle code review, security analysis, and QA simultaneously, specialized agents collaborate on the task. One analyzes code changes to assess risk. Another determines which tests are actually relevant. A third handles security scanning. An orchestrator coordinates everything and makes the final deployment decision.

How Multi-Agent Validation Works

The architecture centers on an orchestrator managing the workflow, with specialist agents handling specific validation tasks. When a PR merges, the orchestrator examines what changed and builds a testing strategy dynamically.

Figure 1: Multi-agent system overview

The orchestrator acts as a coordinator rather than executing tests itself. It analyzes the diff, identifies risky areas, and delegates work appropriately. If the authentication logic has changed, then security validation gets prioritized. If database queries were modified, then performance testing becomes essential.
If frontend components are updated, visual regression testing is triggered.

Each specialist agent has a focused responsibility. The code analysis agent reviews diffs and identifies risk areas. The regression selector chooses relevant tests instead of running the entire suite. The security agent scans for vulnerabilities specific to the changes. The performance agent validates that modifications don't introduce latency issues.

Agents operate independently and communicate through messages. The orchestrator sends work items, agents process them asynchronously, and results flow back. This enables parallel execution: security scanning can happen simultaneously with regression test selection. This concurrent approach eliminates sequential bottlenecks, and the orchestrator only waits for the results it needs to make a deployment decision.

The Agent Roles

A functional multi-agent system uses specialist agents plus the orchestrator. Each addresses a specific validation concern.

The orchestrator serves as the entry point. Triggered by the CI/CD pipeline after a PR merge, it parses the diff, identifies affected services, creates a validation plan, and distributes work. After collecting results, it makes the deployment decision and reports back.

The code analysis agent performs static analysis on changed files. It identifies which parts of the codebase are affected. This risk assessment guides all downstream validation.

The regression selector agent addresses test suite efficiency. Rather than running an 800-test suite completely, this agent analyzes changes and selects only the relevant tests. For example, modifications to checkout logic trigger checkout, payment, and order confirmation tests, but skip unrelated user profile tests. This approach reduces test execution time.

The security agent provides context-aware security validation. It checks whether specific changes introduce security risks. The focus stays narrow and relevant.
The performance agent validates that changes don't degrade performance. It runs focused checks rather than full load tests. These quick validations catch obvious performance issues.

The integration agent runs smoke tests on critical user flows. It validates that changes work correctly with the rest of the system. This catches scenarios where modifying one service creates unexpected breakage elsewhere.

Real-World Application

For instance, when validating a promotional code feature that modified payment and checkout logic, the multi-agent system identified a security vulnerability where single-use codes could be reused through rapid order submissions. The orchestrator flagged the PR as high-risk, the code analysis agent detected the payment complexity, the regression selector chose 47 relevant tests from an 800-test suite, and the security agent caught the vulnerability. The system blocked deployment and provided specific remediation guidance. After the fix was applied, validation passed, and the feature was deployed successfully, preventing what would have been a costly production incident.

Deploying With Docker

Each agent runs in its own container, providing isolation and straightforward scaling through Docker.

Figure 2: Docker container architecture

The orchestrator exposes a webhook endpoint that receives triggers from GitHub after PR merges. When validation requests arrive, the orchestrator publishes tasks to Redis. Agents subscribe to queues, process messages, and publish results back.

Agents maintain no state. They receive work, process it, return results, and reset. This simplifies scaling and debugging. If an agent crashes, the orchestrator detects the failure and retries.

When This Pattern Applies

Multi-agent validation works best for teams deploying frequently with large test suites, where intelligent test selection can significantly reduce validation time. The pattern fits complex applications.
It's less suitable for simple applications or teams deploying infrequently. Consider risk tolerance: systems handling financial transactions, healthcare data, or critical infrastructure may benefit from the extra validation.

Implementation Approach

Building a multi-agent system works best with an incremental approach. Starting with everything simultaneously leads to overwhelm and, often, abandonment. Begin with an orchestrator and two agents — code analysis and regression selection provide immediate value. These two components alone deliver measurable benefits. Once the message-passing pattern and agent coordination feel comfortable, additional agents can be introduced.

For LLM inference, local models can be used to eliminate external API dependencies and control costs. Configure agents to use the chosen model provider for code understanding and test selection.

Integrate with CI/CD pipelines incrementally. Start by triggering validation manually on selected PRs. Gather feedback and build confidence in the results. Once trust is established, add it as a GitHub Action. Configure it as a required check that blocks merges when critical issues surface.

Conclusion

Engineers no longer wait extended periods for validation when using the multi-agent validation system. Real issues get caught before reaching users. The targeted test selection alone provided sufficient value to justify the investment.

The most significant benefit isn't purely technical; it's the shift in thinking about testing strategy: understanding that different changes carry different risk levels, and that the validation strategy should reflect those risks. The multi-agent system codifies this thinking into the deployment process.

For teams dealing with frequent deployments, large test suites, and complex applications where quality directly impacts users, this pattern merits exploration.
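As a rough illustration of the orchestration pattern this article describes, the sketch below simulates risk-based task routing with the standard library's `queue.Queue` standing in for the Redis queues; the agent names, risk rules, and the `plan_validation` helper are hypothetical, not the article's actual implementation:

```python
import queue

# Per-agent work queues (in-memory stand-ins for Redis lists).
AGENT_QUEUES = {name: queue.Queue() for name in
                ("code_analysis", "regression_selector", "security", "performance")}

def plan_validation(changed_files):
    """Toy risk rules: always analyze and select tests, then add
    specialist agents based on what the diff touched."""
    tasks = ["code_analysis", "regression_selector"]
    if any("auth" in f or "payment" in f for f in changed_files):
        tasks.append("security")       # risky area: prioritize security checks
    if any(f.endswith(".sql") for f in changed_files):
        tasks.append("performance")    # query changes: validate latency
    return tasks

def orchestrate(changed_files):
    # The orchestrator publishes one work item per selected agent...
    for agent in plan_validation(changed_files):
        AGENT_QUEUES[agent].put({"files": changed_files})
    # ...and each stateless agent drains its queue and reports a verdict.
    results = {}
    for agent, q in AGENT_QUEUES.items():
        while not q.empty():
            q.get()
            results[agent] = "pass"
    return results

print(orchestrate(["checkout/payment.py", "queries/orders.sql"]))
```

A real deployment would replace the in-memory queues with Redis and run each agent loop in its own container, but the routing logic stays the same shape.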

By Vineet Bhatkoti
