DZone Spotlight, Wednesday, March 18

From SAST to “Shift Everywhere”: Rethinking Code Security in 2026

By Alex Vakulov
Several structural shifts have changed how source code security is approached. Software teams now deploy continuously, build on cloud-native architectures, and often depend on third-party and open-source components. As a result, security vulnerabilities propagate faster and across wider blast radii. Security expectations have shifted as well. Customers assess vendors not only on features but also on how reliably they manage source code risk throughout the whole software lifecycle. This pushes security considerations beyond isolated code scans into architecture, development practices, and operational processes.

Modern development environments evolve faster than traditional security controls. Rapid release cycles, ephemeral infrastructure, large dependency graphs, and AI-assisted coding all increase the impact of design and tooling decisions. This article examines how source code security breaks down in modern development environments. It highlights the limits of secure coding practices, the decisive role of architecture and threat modeling, and the practical strengths and weaknesses of modern code analysis tools. It also addresses the operational risks introduced by open-source and third-party components across the software lifecycle.

Secure Source Code: Reality or Fiction?

There is no such thing as absolutely secure source code. There is only “insufficiently studied” code. No matter how many specialists are involved or what tools they use, sooner or later, a new vulnerability will be found. Threats always arise once code enters operation. At that point, a threat model emerges, potential attackers appear, and risks become salient. As a result, the security of code strongly depends on the company’s risk model and maturity level. This is clearly illustrated in practice. At early deployment stages, most security issues are often absent because developers follow secure coding practices.
Vulnerabilities usually surface later, when the software is exposed to real-world operational conditions. At the same time, within a specific company or product, it is still possible to define what “secure” means in practical terms. Code can be considered secure when it meets security quality requirements that reflect the organization’s threat model and the attackers it expects to face. Those requirements naturally vary based on how and where the product is used.

Security assessments are typically performed within an established regulatory or methodological framework. These frameworks allow organizations to judge whether an application meets a defined level of security, even if absolute security is unattainable. The exact level depends on the methodology applied. For example, teams may use OWASP testing methodologies, NIST guidance, or sector-specific security standards.

Why Correct Code Can Still Be Insecure

In principle, all developers aim to write secure code. In practice, only those with sufficient knowledge and experience consistently avoid critical mistakes. Security assessment always depends on system design. If a flaw is introduced at the architectural or design stage, even perfectly implemented code cannot ensure the product’s security. When evaluating code security, it is essential to consider the threat model and attacker profile, as well as the environment in which the software operates. Systems are often compromised not through direct code exploitation, but through data leakage, abuse of trust relationships, or misconfiguration of supporting services or components such as routing, fax services, or APIs.

The widespread use of AI-assisted coding tools amplifies this challenge. AI-generated code often appears correct and well-structured, but it inherits assumptions, patterns, and design decisions from its training data. When architectural choices are flawed, AI-assisted development tends to scale those flaws rather than eliminate them.
As a result, even perfectly written code can still lead to an insecure system. Code security alone is not the end goal.

Assessing the Risk of Using Insecure Code

Security risks are often assessed quantitatively, for example, by estimating potential recovery and remediation costs. However, qualitative factors are equally important, including reputational damage, regulatory exposure, and loss of customer trust. Effective risk mitigation requires evaluating the entire software lifecycle, starting at the design stage. Even during development, organizations must assess the criticality of potential risks, define the types of data being processed, and determine acceptable security levels. Based on these decisions, appropriate security controls are built into the code and surrounding systems.

Risk assessments commonly consider the effort required for an attacker to exploit a vulnerability or access sensitive data. This approach assumes that attacks requiring excessive time, expertise, or resources may be economically unattractive to adversaries. AI-assisted development may change this calculation: a single insecure pattern introduced by an AI tool can be replicated across many services, components, or repositories before it is detected. As a result, modern risk assessments must account not only for impact and likelihood, but also for the speed and scale at which vulnerabilities can propagate.

Verifying Code Security Across the SDLC

The software development lifecycle involves multiple parties in code security assessment, including vendors, customers, and partners. Responsibility for achieving secure code is distributed differently in each case. While most organizations rely on internal development, weak governance in these teams can itself become a significant source of risk. It is also important to distinguish between open-source and closed-source software.
In open-source scenarios, customers retain significant responsibility for security outcomes because they determine how the code is reviewed, integrated, patched, and maintained. Closed-source software requires a different approach, including clearly defined interaction and disclosure processes with the vendor. In these cases, vendors primarily bear reputational risk, whereas customers bear most of the technical and operational risk.

For vendors, customer security is a critical concern, especially when serving large or regulated clients. Vendors must clearly understand, in advance, which events are unacceptable from the customer’s perspective. This enables structured evaluation of attack scenarios and the development of appropriate security processes. For customers working with external developers, responsibilities must be explicitly defined in technical specifications and contracts. Wherever possible, vulnerability remediation and bug fixes should remain the contractor’s responsibility. When development follows a time-and-materials model, customers must require that the vendor adhere to secure development and operational practices.

Finally, security does not end with the DevSecOps cycle of development and deployment. Software continues to change throughout its operational life, so security testing must be continuous and extend across the entire application lifecycle.

Architecting a Code Security Testing Stack

Choosing tools for source code security testing is always complex. Some organizations use the Building Security In Maturity Model (BSIMM), which describes a wide range of practices involved in building a mature, secure development process. Many organizations follow the “shift left” principle, placing security controls early in the SDLC. In practice, this often generates an unmanageable volume of checks and alerts, overwhelming development teams. The more recent “shift everywhere” approach aims to address this limitation.
Security testing is performed whenever sufficient artifacts are available, at any stage of the SDLC. This allows security practices to be applied where they provide the most value. In this model, developers gain visibility into how a product is assembled, which components were used, and when changes were made. They can choose when and how to fix issues and receive actionable recommendations from security teams. Alongside traditional SAST, DAST, and dependency analysis tools, AI-based analysis is increasingly used to prioritize findings and reduce noise. These systems are most effective when they enrich context and assist decision-making rather than replacing deterministic checks.

Establishing Code Security Practices

Organizations should begin by selecting code testing and verification practices that best fit their structure and risk profile. A poor choice can lead to missed threats or excessive false positives. Most implementations start with static analysis tools. These tools are mature and widely adopted. Additional tools are added gradually, based on how well they integrate with existing workflows. Partial overlap between tools helps reduce blind spots.

Executive support is critical. Without leadership commitment to investing resources and enforcing security controls during release cycles, security efforts remain ineffective. Reports may be generated and risks formally accepted, but insecure software still reaches production. Developer education is equally important. Developers are generally willing to write secure code, but may lack sufficient knowledge. When faced with long lists of issues discovered late in the process, motivation to remediate often declines.

Final Thoughts

In recent years, numerous weaknesses and vulnerabilities have been identified in modern software systems. Hacker groups actively target vulnerabilities in new releases, particularly in open-source components.
Attackers manipulate authentication data, deploy destructive payloads, and embed malware in third-party libraries. This makes thorough source code inspection increasingly critical. At the same time, security assessment is increasingly a core property of software systems, driven by the adoption of secure-by-design and zero-trust principles. Security operations teams are expected to focus more on complex vulnerability scenarios while delegating routine cases to automated analysis tools, making security work more analytical and context-driven.

Developer communities will need to expand and more actively share security practices. Over time, trusted repositories of verifiably secure software may emerge, supported by both vendors and independent developers. Organizations will also need to raise overall security literacy among developers. For more mature companies, this will likely involve shifting away from traditional individual certifications toward accrediting secure build and release infrastructures.
Swift Concurrency, Part 1: Tasks, Executors, and Priority Escalation

By Nikita Vasilev
Swift 6 makes Swift’s modern concurrency model strict by default. In this article, we will explore the problems this model aims to solve, explain how it works under the hood, compare it with the previous threading model, and take a closer look at the Actor model. In the upcoming parts, we will also break down executors, schedulers, structured concurrency, different types of executors, implement our own executor, and more.

Swift Concurrency Overview: Problems and Solutions

Concurrency has long been one of the most challenging aspects of software development. Writing code that runs tasks simultaneously can improve performance and responsiveness, but it often introduces complexity and subtle bugs such as race conditions, deadlocks, and thread-safety issues. Swift concurrency, introduced in Swift 5.5 and enforced strictly in Swift 6, aims to simplify concurrent programming by providing a clear, safe, and efficient model for handling asynchronous tasks. It helps developers avoid common pitfalls by enforcing strict rules around data access and execution order.

If you use Swift 5.x and plan to migrate to Swift 6, you can enable Swift concurrency checks in your project settings. This allows you to gradually adopt the new concurrency model while maintaining compatibility with existing code. Enabling these checks helps catch potential concurrency issues early, making the transition smoother and safer. As you update your codebase, you can start integrating async/await syntax and other Swift concurrency features incrementally without a full rewrite.
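One common incremental step is bridging an existing completion-handler API into async/await with withCheckedContinuation, so new call sites can await it while old call sites keep working unchanged. A minimal sketch, assuming a hypothetical legacy function loadUserName and its "Alice" payload:

```swift
import Foundation

// Hypothetical legacy API that reports its result via a completion handler.
func loadUserName(completion: @escaping (String) -> Void) {
    DispatchQueue.global().async {
        completion("Alice")
    }
}

// Async wrapper: new code can `await` it while the old API stays untouched.
// The continuation must be resumed exactly once.
func loadUserName() async -> String {
    await withCheckedContinuation { continuation in
        loadUserName { name in
            continuation.resume(returning: name)
        }
    }
}
```

Because the wrapper is just an overload, both styles coexist during the migration, and the completion-handler version can be deleted once no callers remain.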
Some of the key problems Swift concurrency addresses include:

- Race conditions: Preventing simultaneous access to shared mutable state that can cause unpredictable behavior.
- Callback hell: Simplifying asynchronous code that used to rely heavily on nested callbacks or completion handlers, making code easier to read and maintain.
- Thread management complexity: Abstracting away low-level thread creation and synchronization, allowing developers to focus on the logic rather than thread handling.
- Coordinating concurrent tasks: Structured concurrency enables clear hierarchies of tasks with proper cancellation and error propagation.

By leveraging new language features like async/await, actors, and structured concurrency, Swift provides a more intuitive and robust way to write concurrent code, improving both developer productivity and app stability.

Multitasking

Modern operating systems and runtimes use multitasking to execute units of work concurrently. Swift concurrency adopts cooperative multitasking, which differs fundamentally from the preemptive multitasking model used by OS-level threads. Understanding the difference is key to writing performant and safe asynchronous Swift code.

Preemptive Multitasking

Preemptive multitasking is the model used by operating systems to manage threads and processes. In this model, a system-level scheduler can forcibly interrupt any thread at virtually any moment to perform a context switch and allocate CPU time to another thread. This ensures fairness across the system and allows for responsive applications, especially when handling multiple user-driven or time-sensitive tasks. Preemptive multitasking enables true parallelism across multiple CPU cores and prevents misbehaving or long-running threads from monopolizing system resources. However, this flexibility comes at a cost.
Because threads can be interrupted at any point in their execution — even in the middle of a critical operation — developers must use synchronization primitives such as mutexes, semaphores, or atomic operations to protect shared mutable state. Failing to do so may result in data races, crashes, or subtle bugs that are often difficult to detect and reproduce. This model offers greater control and raw concurrency, but it also places a significantly higher burden on developers. Ensuring thread safety in a preemptive environment is error-prone and may lead to non-deterministic behavior — behavior that varies from run to run — which is notoriously difficult to reason about or reliably test.

From a technical perspective, preemptive multitasking relies on the operating system to handle thread execution. The OS can interrupt a thread at almost any point — even in the middle of a function — and switch to another. To do this, the system must perform a context switch, which involves saving the entire execution state of the current thread (such as CPU registers, the instruction pointer, and stack pointer) and restoring the previously saved state of another thread. This process may also require flushing CPU caches, invalidating the Translation Lookaside Buffer (TLB), and transitioning between user mode and kernel mode.

These operations introduce significant runtime overhead. Each context switch takes time and consumes system resources — especially when context switches are frequent or when many threads compete for limited CPU cores. Additionally, preemptive multitasking forces developers to write thread-safe code by default, increasing overall complexity and the risk of concurrency bugs. While this model provides maximum flexibility and true parallelism, it’s often excessive for asynchronous workflows, where tasks typically spend most of their time waiting for I/O, user input, or network responses rather than actively using the CPU.
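To make that cost concrete, here is a minimal sketch of the manual synchronization preemptive threading forces on you: a shared counter guarded by NSLock. The Counter type and the iteration counts are illustrative. Remove the lock and the result becomes unpredictable, because increments from different threads can interleave mid-update.

```swift
import Foundation

// Under preemptive multitasking a thread can be interrupted mid-update,
// so shared mutable state needs explicit protection (here, NSLock).
// `@unchecked Sendable` is justified because every access goes through the lock.
final class Counter: @unchecked Sendable {
    private var value = 0
    private let lock = NSLock()

    func increment() {
        lock.lock()
        value += 1 // read-modify-write: unsafe without the lock
        lock.unlock()
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let counter = Counter()
let group = DispatchGroup()
for _ in 0..<4 {
    DispatchQueue.global().async(group: group) {
        for _ in 0..<10_000 { counter.increment() }
    }
}
group.wait()
print(counter.current) // 40000 with the lock; unpredictable without it
```

Every one of these locking decisions is the developer’s responsibility, which is exactly the burden the cooperative model described next tries to remove.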
Cooperative Multitasking

In contrast, Swift’s concurrency runtime uses cooperative multitasking. In this model, a task runs until it voluntarily yields control — typically at an await point or via an explicit call to Task.yield(). Unlike traditional threads, cooperative tasks are never forcibly preempted. This results in predictable execution: context switches occur only at clearly defined suspension points.

Swift’s cooperative tasks are scheduled onto a lightweight, runtime-managed cooperative thread pool — separate from Grand Central Dispatch queues. Tasks running in this pool are expected to be “good citizens,” yielding control when appropriate, especially during long-running or CPU-intensive work. To support this, Swift provides Task.yield() as a manual suspension point, ensuring other tasks have a chance to execute. However, cooperative multitasking comes with a caveat: if a task never suspends, it can monopolize the thread it’s running on, delaying or starving other tasks in the system. Therefore, it is the developer’s responsibility to ensure that long-running operations include suspension points.

In cooperative multitasking, the fundamental unit of execution is not a thread, but a chunk of work, often called a continuation. A continuation is a suspended segment of an asynchronous function. When an async function suspends at an await, the Swift runtime captures the current execution state into a heap-allocated continuation. This continuation represents a resumption point and is enqueued for future execution. Instead of associating a thread with a long-running task, the Swift runtime treats a thread as a pipeline of continuations. Each thread executes one continuation after another. When a continuation finishes or suspends again, the thread picks up the next ready continuation from the queue. As mentioned above, this model avoids traditional OS-level context switches.
There is no need to save and restore CPU registers or thread stacks; the runtime simply invokes the next closure-like continuation. This makes task switching very fast and lightweight, though it involves increased heap allocations to store the suspended async state. The key trade-off: you use a bit more memory but gain dramatically lower overhead for task management. Cooperative scheduling gives tight control over when suspensions happen, which improves predictability and makes concurrency easier to reason about.

Introducing Task

In Swift concurrency, a Task represents a unit of asynchronous work. Unlike simply calling an async function, a Task is a managed object that runs concurrently with other tasks in a cooperative thread pool. The cooperative thread pool manages concurrency efficiently by allowing tasks to yield the CPU while waiting for asynchronous operations to complete. This is achieved through the use of async functions and tasks, which are the fundamental units of concurrency in Swift. Tasks can be created to run concurrently, and they can also be awaited or canceled. They provide fine-grained control over asynchronous behavior and are an integral part of structured concurrency in Swift.

Creating a Task

A task can be created using the Task initializer, which immediately launches the provided asynchronous operation:

```swift
Task(priority: .userInitiated) {
    await fetchData()
}
```

When you create a Task using the standard initializer (i.e., not detached), it inherits the surrounding actor context, priority, and task-local values. This behavior is crucial for structured concurrency and safety in concurrent code.

Swift 6.2 introduces a significant change to how concurrency is handled: by default, all code runs on the main actor. To run code in the background, Swift 6.2 adds a new attribute, @concurrent. You can also use nonisolated if the code doesn’t require access to the main actor.
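As a sketch of opting out of main-actor isolation, the hypothetical Dashboard type below keeps its mutable state on the main actor while exposing a pure helper as nonisolated, so any thread can call it synchronously. (Under Swift 6.2’s default-main-actor mode you could additionally mark async functions @concurrent to request the concurrent executor; that attribute is assumed from the Swift 6.2 toolchain and is not used here.)

```swift
import Foundation

@MainActor
final class Dashboard {
    var title: String // main-actor-isolated mutable state

    // A `nonisolated` init that only assigns stored properties is allowed,
    // so the type can be constructed off the main actor too.
    nonisolated init() {
        self.title = "Loading"
    }

    // Pure computation that never touches isolated state: marking it
    // `nonisolated` lets callers on any thread invoke it without a hop.
    nonisolated func checksum(_ bytes: [UInt8]) -> Int {
        bytes.reduce(0) { $0 &+ Int($1) }
    }
}
```

Calling checksum never hops to the main actor, while reading or writing title still requires main-actor isolation.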
(See the WWDC session “Embracing Swift Concurrency.”)

Under the hood, in earlier versions of Swift concurrency, the Swift runtime used an internal mechanism called @_inheritActorContext to track which actor a task was associated with. Although this property wasn’t part of the public API, it played a key role in ensuring that tasks created inside actor-isolated code would execute on the same actor, preserving data race safety. With advancements in Swift, the runtime has started transitioning from @_inheritActorContext to a new mechanism known as sending, which is now more explicitly handled by the compiler and runtime.

What is sending?

sending is a new keyword introduced in Swift 6 as part of the language’s move toward safer and more explicit concurrency. It’s used to mark function parameters and return values that are moved across concurrency boundaries. It works especially well with noncopyable types, ensuring memory safety and preventing use-after-move errors. When a parameter is marked with sending, the compiler enforces that the original instance is no longer accessed after the transfer.

```swift
func process(_ data: sending MyNonCopyableType) async {
    // `data` is moved here and can’t be used elsewhere after the call
}
```

When you launch a Task.detached, you must ensure that any values captured by the task are Sendable. The compiler now enforces this at compile time using the Sendable protocol and the @Sendable function type. Failing to conform to Sendable may result in a compile-time error, particularly in strict concurrency mode. Tasks also support priorities, similar to how Grand Central Dispatch queues handle them.

Task vs. Task.detached

When working with Swift concurrency, it’s important to understand the difference between Task and Task.detached, as they define how and where asynchronous work is executed.

Task

Task inherits the current actor context (such as MainActor or any custom actor) and priority.
It’s commonly used when you want to spawn a new asynchronous operation that still respects the current structured concurrency tree or actor isolation. This is especially useful for UI updates or working inside specific concurrency domains.

```swift
Task {
    await updateUI()
}
```

In the example above, if called from the main actor, the Task will also run on the main actor unless explicitly moved elsewhere.

Task.detached

Task.detached creates a completely independent task. It doesn’t inherit the current actor context or priority. This means it starts in a global concurrent context and requires you to manage safety manually, especially when accessing shared data.

```swift
Task.detached {
    await performBackgroundWork()
}
```

Use Task.detached when you need to run background operations outside the current structured context, such as long-running computations or escaping an actor’s isolation.

Cooperative Thread Pool

A cooperative thread pool in Swift concurrency is a mechanism that manages the execution of asynchronous tasks by scheduling them onto a limited number of threads, typically matching the number of CPU cores. Swift concurrency operates using a cooperative thread pool designed for efficient scheduling and minimal thread overhead. Unlike the traditional thread-per-task execution model, Swift’s approach emphasizes structured concurrency and resource-aware scheduling. A common oversimplification is to say that Swift concurrency uses one thread per core, which aligns with its goal of reducing context switching and maximizing CPU utilization. While not strictly false, this view omits important nuances about quality-of-service buckets, task priorities, and Darwin scheduler behavior.

Thread Count: Not So Simple

On a 16-core Mac, it’s possible to observe up to 64 threads managed by Swift concurrency alone, without GCD involvement. This is because Swift’s cooperative thread pool maps not just per core, but per core per QoS bucket.
Formally:

Max threads = (CPU cores) × (dedicated quality-of-service buckets)

Thus, on a 16-core system:

16 cores × 4 QoS buckets = 64 threads

Each QoS bucket is essentially a dedicated thread lane for a group of tasks sharing similar execution priority. These lanes are managed internally by Darwin’s thread scheduling mechanism and are not the same as GCD queues.

QoS Buckets and Task Priority

Although TaskPriority exposes six constants, some of them are aliases:

- .userInitiated → .high
- .utility → .low
- .default → already mapped to .medium

From the kernel’s perspective, this simplifies to four core priority levels, each mapped to a QoS bucket, which influences thread allocation in the cooperative thread pool.

When Does Overcommit Happen?

Under normal load, Swift concurrency respects the cooperative pool limits. However, under contention (e.g., high-priority tasks waiting on low-priority ones), the system may overcommit threads to preserve responsiveness. This dynamic adjustment ensures that time-sensitive tasks aren’t blocked indefinitely behind lower-priority work. This behavior is managed by the Darwin kernel via Mach scheduling policies and high-priority pthread lanes — not something controlled explicitly by your code.

Task Priority

Swift provides a priority system for tasks, similar to Grand Central Dispatch (GCD), but more semantically integrated into the structured concurrency model. You can set a task’s priority via the Task initializer:

```swift
Task(priority: .userInitiated) {
    await loadUserData()
}
```

The available priorities are defined by the TaskPriority type:

- .high / .userInitiated: For tasks initiated by user interaction that require immediate feedback.
- .medium: For tasks that the user is not actively waiting for.
- .low / .utility: For long-running tasks that don’t require immediate results, such as copying files or importing data.
- .background: For background tasks that the user is not directly aware of.
Primarily used for work the user cannot see.

Creating Tasks With Different Priorities

When you create a Task inside another task (default .medium priority), you can explicitly set a different priority for each nested task. Here, one child task is .low and the other is .high, which demonstrates that priorities can be set individually regardless of the parent.

```swift
Task { // .medium by default
    Task(priority: .low) {
        print("1, thread: \(Thread.current), priority: \(Task.currentPriority)")
    }
    Task(priority: .high) {
        print("2, thread: \(Thread.current), priority: \(Task.currentPriority)")
    }
}
// 1, thread: <_NSMainThread: 0x6000017040c0>{number = 1, name = main}, priority: TaskPriority.low
// 2, thread: <_NSMainThread: 0x6000017040c0>{number = 1, name = main}, priority: TaskPriority.high
```

Task Priorities Can Be Inherited

If you don’t explicitly set a priority for a nested task, it inherits the priority of its immediate parent.

```swift
Task {
    Task(priority: .high) {
        Task {
            print("1, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }
    }
    Task(priority: .low) {
        print("2, thread: \(Thread.current), priority: \(Task.currentPriority)")
        Task {
            print("3, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }
        Task(priority: .medium) {
            print("4, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }
    }
}
// 2, thread: <_NSMainThread: 0x600001708040>{number = 1, name = main}, priority: TaskPriority.low
// 1, thread: <_NSMainThread: 0x600001708040>{number = 1, name = main}, priority: TaskPriority.high
// 3, thread: <_NSMainThread: 0x600001708040>{number = 1, name = main}, priority: TaskPriority.low
// 4, thread: <_NSMainThread: 0x600001708040>{number = 1, name = main}, priority: TaskPriority.medium
```
In this example, the anonymous tasks inside the .high and .low blocks inherit those respective priorities unless overridden.

Task Priority Escalation

```swift
Task(priority: .high) {
    Task {
        print("1, thread: \(Thread.current), priority: \(Task.currentPriority)")
    }
    await Task(priority: .low) {
        print("2, thread: \(Thread.current), priority: \(Task.currentPriority)")
        await Task {
            print("3, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }.value
        Task(priority: .medium) {
            print("4, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }
    }.value
}
// 1, thread: <_NSMainThread: 0x6000017000c0>{number = 1, name = main}, priority: TaskPriority.high
// 2, thread: <_NSMainThread: 0x6000017000c0>{number = 1, name = main}, priority: TaskPriority.high
// 3, thread: <_NSMainThread: 0x6000017000c0>{number = 1, name = main}, priority: TaskPriority.high
// 4, thread: <_NSMainThread: 0x6000017000c0>{number = 1, name = main}, priority: TaskPriority.medium
```

This mechanism is called priority escalation: when a task is awaited by a higher-priority task, the system may temporarily raise its priority to avoid bottlenecks and ensure responsiveness. As a result:

- Task 2, which is .low, is escalated to .high while being awaited.
- Task 3, which doesn’t have an explicit priority, inherits the escalated priority from its parent (Task 2) and is also executed with .high priority.
- Task 4 explicitly sets its priority to .medium, so it is not affected by escalation.

Task.detached Does Not Inherit Priority

Detached tasks (Task.detached) run independently and do not inherit the priority of their parent task. They behave like global tasks with their own scheduling. This is useful for isolating background work, but can also lead to unexpected priority mismatches if the priority is not set manually.
```swift
Task(priority: .high) {
    Task.detached {
        print("1, thread: \(Thread.current), priority: \(Task.currentPriority)")
    }
    Task(priority: .low) {
        print("2, thread: \(Thread.current), priority: \(Task.currentPriority)")
        Task.detached {
            print("3, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }
        Task(priority: .medium) {
            print("4, thread: \(Thread.current), priority: \(Task.currentPriority)")
        }
    }
}
// 1, thread: <NSThread: 0x60000174dec0>{number = 4, name = (null)}, priority: TaskPriority.medium
// 2, thread: <_NSMainThread: 0x600001708180>{number = 1, name = main}, priority: TaskPriority.low
// 3, thread: <NSThread: 0x60000174dec0>{number = 4, name = (null)}, priority: TaskPriority.medium
// 4, thread: <_NSMainThread: 0x600001708180>{number = 1, name = main}, priority: TaskPriority.medium
```

Suspension Points and How Swift Manages Async Execution

In Swift, any call to an async function using await is a potential suspension point — a place in the function where execution might pause and resume later. This is a transformation that involves saving the state of the function so it can be resumed after the awaited operation completes. Here’s an example:

```swift
func fetchData() async -> String {
    let result = await networkClient.load()
    return result
}
```

In this case, await networkClient.load() is a suspension point. When the function reaches this line, it may pause execution, yield control to the system, and later resume once load() finishes. Behind the scenes, the compiler transforms this function into a state machine that tracks its progress and internal variables.

Under the Hood: Continuations and State Machines

Every async function in Swift is compiled into a state machine. Each await marks a transition point.
Before reaching an await, Swift:

- Saves the current state of the function, including local variables and the current instruction pointer.
- Suspends execution and schedules a continuation.
- Once the async operation completes, resumes the function from where it left off.

This is similar to the continuation-passing style (CPS) used in many functional programming systems. In Swift’s concurrency model, this is orchestrated by internal types like PartialAsyncTask and the concurrency runtime scheduler.

Suspension != Blocking

When you await something in Swift, the current thread is not blocked. Instead:

- The current task yields control back to the executor.
- Other tasks can run while waiting.
- When the awaited operation completes, the suspended task resumes on the appropriate executor.

This makes async/await fundamentally more efficient and scalable than thread-based blocking operations like DispatchQueue.sync.

Task.yield: Letting Other Tasks Run

Task.yield() is a static method provided by Swift’s concurrency system that voluntarily suspends the current task, giving the system an opportunity to run other enqueued tasks. It’s especially useful in long-running asynchronous operations or tight loops that don’t naturally contain suspension points.

Swift
func processLargeBatch() async {
    for i in 0..<1_000_000 {
        if i % 10_000 == 0 {
            await Task.yield()
        }
    }
}

Without the yield, this loop would monopolize the executor. By inserting await Task.yield() periodically, you’re cooperating with Swift’s concurrency runtime, allowing it to maintain responsiveness and fairness.

Under the Hood

Calling await Task.yield() suspends the current task and re-enqueues it at the end of the queue for its current executor (e.g., the main actor or a global concurrent executor). This allows other ready-to-run tasks to take their turn. It’s part of Swift’s cooperative multitasking model: tasks run to the next suspension point and are expected to yield fairly.
Unlike preemptive systems (e.g., threads), Swift tasks don’t get forcibly interrupted — they must voluntarily yield control.

Summary

Swift 6 marks a significant step forward in how concurrency is handled, offering developers more control, predictability, and safety. While the learning curve may be steep at first, understanding these concepts opens the door to building highly responsive and robust applications. As we continue exploring the deeper aspects of Swift’s new concurrency model, it’s clear that these changes lay the groundwork for the future of safe and scalable app development.

Trend Report

Database Systems

Every organization is now in the business of data, but they must keep up as database capabilities and the purposes they serve continue to evolve. Systems once defined by rows and tables now span regions and clouds, requiring a balance between transactional speed and analytical depth, as well as integration of relational, document, and vector models into a single, multi-model design. At the same time, AI has become both a consumer and a partner that embeds meaning into queries while optimizing the very systems that execute them. These transformations blur the lines between transactional and analytical, centralized and distributed, human-driven and machine-assisted. Amidst all this change, databases must still meet what are now considered baseline expectations: scalability, flexibility, security and compliance, observability, and automation. With the stakes higher than ever, it is clear that for organizations to adapt and grow successfully, databases must be hardened for resilience, performance, and intelligence. In the 2025 Database Systems Trend Report, DZone takes a pulse check on database adoption and innovation, ecosystem trends, tool usage, strategies, and more — all with the goal for practitioners and leaders alike to reorient our collective understanding of how old models and new paradigms are converging to define what’s next for data management and storage.


Refcard #388

Threat Modeling Core Practices

By Apostolos Giannakidis

Refcard #401

Getting Started With Agentic AI

By Lahiru Fernando

More Articles

2026 Developer Research Report

Hello, our dearest DZone Community! Last year, we asked you for your thoughts on emerging and evolving software development trends, your day-to-day as devs, and workflows that work best — all to shape our 2026 Community Research Report. The goal is simple: to better understand our community and provide the right content and resources developers need to support their career journeys. After crunching some numbers and piecing the puzzle together, at last, it is in (and we have to warn you, it's quite a handful)! This report summarizes the survey responses we collected from December 9, 2025, to January 27 of this year, and includes an overview of the DZone community, the stacks developers are currently using, the rising trend in AI adoption, year-over-year highlights, and so much more. Here are a few takeaways worth mentioning:

- AI use climbs this year, with 67.3% of readers now adopting it in their workflows.
- While most use multiple languages in their developer stacks, Python takes the top spot.
- Readers visit DZone primarily for practical learning and problem-solving.

This is just a small glimpse of what's waiting in our report, made possible by you. You can read the rest of it below.

2026 Community Research Report: Read the Free Report

We really appreciate you lending your time to help us improve your experience and nourish DZone into a better go-to resource every day. Here's to new learnings and even newer ideas! — Your DZone Content and Community team

By Carisse Dumaua
When Similarity Isn’t Accuracy in GenAI: Vector RAG vs GraphRAG

Retrieval-augmented generation (RAG) applications are being developed in large numbers with the advent of large language models (LLMs). We are observing numerous use cases evolving around RAG and similar mechanisms, where we provide enterprise context to LLMs to answer enterprise-specific questions. Today, most enterprises have developed, or are in the process of developing, a knowledge base built on the plethora of documents and content they have accumulated over the years. Billions of documents are going through parsing, chunking, and tokenization, and finally, vector embeddings are generated and stored in vector stores. Enterprises typically start with internal chat applications for employees or internal customers, providing natural language chat interfaces to query their enterprise knowledge, with a disclaimer that the output is AI-generated and may not be 100% accurate. This knowledge base is generated using vector RAG from various technical documents, compliance and regulatory documents, and policy documents. Enterprises are even trying to take it to external customers as well. However, these customer-facing use cases are quality-sensitive; generated responses can’t go wrong, as correctness and completeness are most important. This is where most use cases don’t pass production-grade evaluation, and they get stuck in a continuous loop of accuracy improvements through more chunks, better embeddings, reranking, and improved prompt engineering. None of these fully resolves a certain level of incorrect answers. This is the point when you realise that semantic similarity is not always accuracy. In this article, I explain when similarity-based retrieval breaks, why that failure is architectural rather than incremental, and how it leads to the incorporation of graph-enhanced retrieval.
Notes

I’ve attached snapshots of my proof-of-concept for this analysis, which include the tools referenced, but this is not intended to promote any specific tool or framework. The sole purpose of this article is to clarify the architectural trade-offs while making your decision for the RAG system. For this analysis, I’ve used a document with entries for 100 students and their details about where they graduated from (25 unique universities) and where they work now (25 unique organisations). For example:

Plain Text
Student1 graduated from University11. Student1 now works at Company19.
Student2 graduated from University16. Student2 now works at Company15.
.
.
Student28 graduated from University10. Student28 now works at Company19.
Student29 graduated from University9. Student29 now works at Company22.

The scope of this analysis includes two variations of RAG:

- The first uses standard vector embedding-based RAG.
- The second uses a hybrid RAG, which combines both vector RAG and GraphRAG.

Let’s Understand Vector RAG and Its Limitations

Vector RAG works by:

- Chunking documents
- Generating embeddings and storing them in vector databases
- Retrieving the most semantically similar chunks by using nearest-neighbour search in the vector store
- Passing them to an LLM to generate the answer in the context of the semantically similar content

The following snapshot shows the vector embeddings created for the above data, reduced to two dimensions for visualization and used for similarity search with the nearest-neighbour algorithm.

Snapshot 1: Vector embeddings reduced to two dimensions

This captures semantic similarity but does not model structured logic. As the snapshot suggests, nearest-neighbour search retrieves what is close, not what is logically related and required to answer the question accurately. The implicit assumption is that semantically similar content is factually complete.
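To make this failure mode concrete, here is a minimal, self-contained sketch. The toy two-dimensional vectors stand in for real embeddings, and the chunk texts and numbers are invented for illustration; this is not the article's FAISS setup.

```python
from math import sqrt

# Toy 2-D "embeddings" standing in for real model output (values invented).
chunks = {
    "Student1 works at Company19.": (0.9, 0.1),
    "Student35 works at Company8.": (0.2, 0.95),
    "Student35 graduated from University3.": (0.25, 0.9),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def retrieve(query_vec, k=2):
    # Rank chunks by cosine similarity to the query and keep the top k --
    # exactly the "what is close, not what is logically related" behaviour.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
    return ranked[:k]

# A query mentioning both students may embed closer to Student35's chunks,
# so top-k retrieval silently drops Student1's chunk.
print(retrieve((0.3, 0.9), k=2))
```

With this (invented) query vector, both retrieved chunks concern Student35 and the Student1 fact never reaches the LLM, which is the partial-context problem described above.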
This assumption fails as soon as a question requires reasoning across multiple entities spread across multiple documents that are not semantically similar but share an entity relationship. For example, suppose that in a simple text document, we have multiple lines spread across multiple sections/pages like the following:

Plain Text
Alice graduated from MIT.
. . .
Bob graduated from Stanford.
. . .
Alice works at OpenAI.
. . .
Bob works at Google.
. . .

If I ask the question "Where do both Alice and Bob work?", it may retrieve the following:

Plain Text
Alice works at OpenAI. Bob works at Google.

Or

Alice works at OpenAI.

Or

Bob works at Google.

The retriever has no notion that two entities of type Person and two entities of type Organisation are involved, or that a relationship comparison is required to answer the question. Hence, partial context is common, and accuracy becomes probabilistic rather than deterministic. I developed the following FAISS vector-store-based retriever and set the retrieval config to return the two most relevant chunks.

Python
def get_vector_retriever():
    if not os.path.exists(VECTOR_FOLDER):
        raise ValueError("Vector index not found. Run ingestion first.")
    embeddings = GoogleGenerativeAIEmbeddings(model="text-embedding-004")
    vector_store = FAISS.load_local(VECTOR_FOLDER, embeddings, allow_dangerous_deserialization=True)
    return vector_store.as_retriever(search_kwargs={"k": 2})

Using this retriever, I developed the following vector chain method to answer the question based on the context retrieved. I used a simple prompt to request the LLM to generate the answer.
Python
def get_vector_chain():
    retriever = get_vector_retriever()
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
    template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
    prompt = ChatPromptTemplate.from_template(template)
    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    return chain

I exposed this as a simple FastAPI endpoint and conducted a couple of evaluations.

Python
@app.post("/query/vector")
async def query_vector(request: QueryRequest):
    try:
        chain = get_vector_chain()
        response = chain.invoke(request.question)
        return {"answer": response}
    except ValueError as e:
        raise HTTPException(status_code=500, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Internal Server Error: {str(e)}")

I conducted my first test, in which details of two different students were asked. I wanted to validate whether my VectorStoreRetriever retrieves both entities when two entities are being questioned. You can see this in the following LangSmith tracing snapshots, where the user's query is "Where do both Student1 and Student35 work?"

Observation: The VectorStoreRetriever fetches the content that is nearer to Student35, and it completely misses Student1.

Snapshot 2: LangSmith traces showing partial context fetched from VectorStore (Student1 details are missing)

Based on this context, the vector chain invokes the LLM.

Observation: The answer generated by the LLM is neither complete nor completely accurate. It answers correctly for Student35, but not for Student1.

Snapshot 3: LangSmith traces showing a partial answer generated based on a partial context

I conducted another test involving multiple entities that need to be fetched based on their relationships. In this test, the question is, "Who all Persons have graduated from the same university from which Student1 has graduated?"
As you can see, first the university of Student1 has to be fetched, then all students who have graduated from the same university. This test will not work using semantically similar content alone. However, let’s observe the behaviour.

Observation: The VectorStoreRetriever has not been able to fetch all the correct associated entities, as the following snapshot shows. It couldn’t establish the relationship between the entities.

Snapshot 4: LangSmith traces showing the wrong context fetched from VectorStore

Now, the vector chain invokes the LLM with this context; the LLM says it does not have information about Student1's university, hence it can’t determine who graduated from the same university. You can see this in the output section of the following snapshot.

Snapshot 5: LangSmith traces showing the LLM's wrong response saying "there is no information on Student1’s university"

This is where GraphRAG (graph-enhanced RAG or knowledge graph RAG) becomes valuable: it enhances accuracy with a knowledge graph built from the same content. This provides relational reasoning and fact consistency, especially for queries that involve relationships or multi-step reasoning.

Why Semantic Similarity Is Not Enough

Vector embeddings capture nearness or proximity of meaning, as you can see from Snapshot 1. This vector space does not model factual relationships between entities. Nearest-neighbour search finds vector embeddings close to the embedding of the question, but it cannot search for the facts that are required to answer the question completely. This matters when a question involves multiple entities, comparison queries, joins (both, same, and different), or enterprise data with structure.

What GraphRAG Does Differently Than Vector RAG

GraphRAG enhances RAG with a knowledge graph that has entities and relationships.
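The relational reasoning that a knowledge graph provides can be sketched with a toy in-memory graph. The data below is hypothetical and merely mirrors the GRADUATED_FROM relationship from the article's schema; a real system would run this traversal in a graph database.

```python
# Hypothetical stand-in for the GRADUATED_FROM edges of a knowledge graph.
graduated_from = {
    "Student1": "University11",
    "Student23": "University11",
    "Student51": "University11",
    "Student2": "University16",
}

def same_university_as(person):
    # Hop 1: resolve the person's university.
    uni = graduated_from[person]
    # Hop 2: traverse the relationship in reverse to find the peers.
    return sorted(p for p, u in graduated_from.items() if u == uni and p != person)

# Deterministic, entity-grounded answer -- no similarity ranking involved.
print(same_university_as("Student1"))  # ['Student23', 'Student51']
```

Unlike nearest-neighbour retrieval, the answer here is complete by construction: every entity reachable through the relationship is returned, or nothing is.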
Like in the above example, I’ve modelled the entities and relationships as follows:

Entities: Person, Organization, University
Relationships: WORKS_AT, GRADUATED_FROM

The following Neo4j snapshot shows these entities and relationships:

Snapshot 6: Neo4j entities and relationship schema (for this use case)

You can see the following relationships above:

(Person)-[:WORKS_AT]->(Organization)
(Person)-[:GRADUATED_FROM]->(University)

Based on these entities and relationships, the following knowledge graph is created from the same text document.

Snapshot 7: Neo4j entities and relationships knowledge graph (data used for this use case)

Using these entities and relationships, we can now answer questions with factual proof using entity-grounded queries instead of relying completely on semantically similar context. In order to implement this, I first created a method that produces the correct Cypher query using the provided graph schema. The following snippet shows this method.

Python
def get_graph_chain_base():
    graph = get_neo4j_graph()
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
    schema_text = """
Graph Schema:
- (Person)-[:WORKS_AT]->(Organization)
- (Person)-[:GRADUATED_FROM]->(University)
"""
    cypher_prompt = PromptTemplate(
        input_variables=["question"],
        template="""
You are an expert Neo4j Cypher developer.
Translate the question into a Cypher query using the provided graph schema.

Rules:
- Always return both the person name and the company name.
- If multiple people are mentioned, use a WHERE ... IN [...] clause.
- Do NOT collapse or aggregate results.
- The output must preserve which Organization or University belongs to which person.

Graph Schema:
- (Person)-[:WORKS_AT]->(Organization)
- (Person)-[:GRADUATED_FROM]->(University)

Example:
Question: Where does Alice work?
Cypher:
MATCH (p:Person)-[:WORKS_AT]->(c:Organization)
WHERE p.name = "Alice"
RETURN p.name AS person, c.name AS company

Question: {question}
Cypher:
"""
    )

    # GraphCypherQAChain is usually used for Natural Language -> Cypher -> Result -> Answer.
    # We can use it as a component.
    chain = GraphCypherQAChain.from_llm(
        llm=llm,
        graph=graph,
        cypher_prompt=cypher_prompt,
        verbose=True,
        allow_dangerous_requests=True
    )
    return chain

As you can see, in this method, I’ve provided a prompt specific to Neo4j Cypher query generation with the graph schema created above, along with an example. There is a rule in the prompt to generate the Cypher query, which is specific to the current use case. Finally, I created a hybrid RAG using this graph chain and the earlier vector retriever.

Python
def get_hybrid_chain():
    # Hybrid strategy:
    # 1. Retrieve docs from the vector store
    # 2. Retrieve info from the graph (GraphCypherQAChain)
    # 3. Combine contexts
    # 4. Apply a guarded final-answer prompt
    vector_retriever = get_vector_retriever()
    graph_chain = get_graph_chain_base()
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)

    def retrieve_hybrid(question):
        # 1. Vector retrieval
        vector_docs = vector_retriever.invoke(question)
        vector_context = "\n".join(
            d.page_content for d in vector_docs
        ) if vector_docs else "No relevant information from vector database."

        # 2. Graph retrieval
        try:
            graph_response = graph_chain.invoke({"query": question})
            graph_context = graph_response.get("result", "")
            if not graph_context:
                graph_context = "No relevant information from knowledge graph."
        except Exception as e:
            print(f"Graph Chain Error: {e}")
            graph_context = "No relevant information from knowledge graph."

        return {
            "vector_context": vector_context,
            "graph_context": graph_context,
            "question": question
        }

    # Guarded final-answer prompt
    final_prompt = ChatPromptTemplate.from_template(
        """
You are answering a factual question using the provided context.
Use ONLY the information explicitly present in the context.
Guardrails:
- If the question involves multiple entities, verify that information for ALL entities is present.
- If any entity is missing required information, explicitly state what is missing.
- Do NOT guess, infer, or assume facts not stated.
- If the answer cannot be determined, say so clearly.

Vector Database Context:
{vector_context}

Knowledge Graph Context:
{graph_context}

Question: {question}
Answer:
"""
    )

    chain = (
        retrieve_hybrid
        | final_prompt
        | llm
        | StrOutputParser()
    )
    return chain

In this hybrid chain method, I have a hybrid retriever, which retrieves docs from the vector retriever and entities and relationships using the graph chain. Once the combined context is generated, it creates the final prompt with this combined context and the user’s question. I’ve added the following guardrails:

- Validate that all entities are present in the context.
- State what is missing, if any entity is missing.
- Do not guess, infer, or assume facts if they are not stated.
- If the answer can’t be determined from the context, say so.

I wrapped this HybridRAG chain in another simple FastAPI endpoint, as shown in the following snippet.

Python
@app.post("/query/hybrid")
async def query_hybrid(request: QueryRequest):
    try:
        chain = get_hybrid_chain()
        response = chain.invoke(request.question)
        return {"answer": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Internal Server Error: {str(e)}")

Using this API, I conducted the same set of tests. First, with the user's query "Where do both Student1 and Student35 work?", I performed a manual query in Neo4j including both Student entities. The following snapshot shows the filtered entities and their relationships from the knowledge graph.

Snapshot 8: Neo4j entities and relationships for Student1 and Student35

Now, my expectation is that GraphRAG understands both Student entities and pulls the required entities with relationships.
My assumption is validated by the traces in the following snapshots. This snapshot shows how the user query has been used to form the Cypher query with all entities involved.

Snapshot 9: LangSmith snapshot showing the correct Cypher query generated for the user question

We can see in the output section of the above trace that it creates an IN clause with both entities, Student1 and Student35. As part of the HybridRAG chain, the LLM gets invoked using the final prompt, and the answer generated is complete and accurate: "Student35 works at Company8, and Student1 works at Company19." You can see this in the output section of the following trace snapshot.

Snapshot 10: LangSmith snapshot showing traces of HybridRAG with partial vector context and complete knowledge graph context, and correct output

Let’s review how the HybridRAG performed in the second test. The user's query is "Who all Persons have graduated from the same university from where Student1 has?" In order to answer this correctly, the Cypher query generated by the graph chain should find the university of Student1 using the GRADUATED_FROM relationship and use this university entity to find all Student entities with the same relationship. This is possible using a join query on this graph schema.

Observation: The output section of the following LangSmith trace snapshot shows the generation of the join Cypher query.

Snapshot 11: LangSmith snapshot showing traces of join Cypher query generation with correct entities and relationships

GraphCypherQAChain executes this query on the graph database and gets the result set. It further invokes the LLM to generate the answer using this result. The generated answer is: "Student51, Student79, and Student23 have graduated from the same university as Student1." This can be seen in the following trace snapshot.
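For reference, a join query of this kind might look like the following sketch. This is a hypothetical reconstruction based only on the schema above; the actual LLM-generated query appears only in the LangSmith screenshot.

```cypher
// Hop 1: resolve Student1's university; Hop 2: find everyone else with the
// same GRADUATED_FROM relationship to that university node.
MATCH (s:Person {name: "Student1"})-[:GRADUATED_FROM]->(u:University)
MATCH (p:Person)-[:GRADUATED_FROM]->(u)
WHERE p.name <> "Student1"
RETURN p.name AS person, u.name AS university
```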
Snapshot 12: LangSmith snapshot showing traces of correct output based on the correct graph context fetched from the above Cypher query

Observation: We can see that the question now becomes an entity-grounded query. It identifies all entities, traverses the relationships, and returns results once all entities and their relationships are resolved. The retrieval step enforces completeness before the generation step is invoked in the chain. The guardrail restricts the LLM from guessing; instead, it must answer from the complete context. This shift is not about better answer generation by the LLM; it is about changing retrieval to guarantee complete and accurate context.

Challenges With GraphRAG

A HybridRAG, which combines vector RAG and GraphRAG, does not automatically improve accuracy. The LLM can still hallucinate and produce incomplete answers. This happens when the graph context is also not entity-bounded. In this case, the LLM still has partial information and is forced to guess. This means GraphRAG can improve accuracy only when entity bindings are preserved end-to-end. The real fix is entity grounding plus the application of guardrails. To make GraphRAG reliable, the following two things are required:

- Preserve entity bindings in graph queries; don't collapse results, and always return the explicit mappings.
- Add reasoning guardrails at answer generation, which must explicitly enforce completeness for all entities in the question (as shown in Snapshot 10).

This makes the system a "trustworthy reasoning system" rather than a "best-guess generator." However, GraphRAG is not a universal fix. There are many trade-offs that must be evaluated before applying it to your solution.

Graph construction cost: It adds an additional ingestion activity, along with building and maintaining graphs, which adds extra expense. Cypher query generation is an LLM call, which adds extra token consumption and cost to the overall system.
Latency: You must evaluate the latency caused by the extra steps of Cypher query generation and execution of relationship queries. This can add measurable latency, which can cause experience issues for chat or voice-based interactive systems.

Entity extraction: Extracting entities from a large set of unstructured content is a big challenge. The complexity grows when establishing the right relationships between all entities, and grows further when a single relationship involves more than two entities, e.g., if a Person worked for Company1 from 2001 to 2005, then for another company from 2005 to 2020, and so on. On top of this, the schema for classic domains is stable, but evolving domains need schema changes quite often, which adds another level of difficulty and maintenance overhead. This also poses a big challenge for provenance, as the meaning of the context keeps evolving over time.

Operational overhead: Graph database management, monitoring, quality, and observability are additional operational overheads.

These trade-offs must be weighed while making architectural decisions for your RAG system.

A Simple Capability Analysis

The following table shows the capability comparison of vector RAG and GraphRAG. We can refer to this while making the decision for a RAG system.

Capability                  Vector RAG    GraphRAG
Single-fact retrieval       Yes           Yes
Multi-entity recall         No            Yes
Relationship reasoning      No            Yes
Joins                       No            Yes
Explainability              Low           High
Hallucination resistance    Low           High

Conclusion

Based on my POC, I conclude that vector RAG fits use cases involving unstructured text, semantic search, and fuzzy recall. This approach is effective for synonyms, paraphrases, and variations in natural language. It is resilient to misspellings or noisy text and suits exploratory questions over unstructured documents like PDFs, emails, or notes. However, the shift towards GraphRAG or HybridRAG (combining vector RAG and GraphRAG) is quite important for trust, completeness, and accuracy.
When we implement HybridRAG with GraphRAG serving as the source of truth for relationships, the completeness and accuracy of the system improve.

By Birendra Kumar
Observability in AI Pipelines: Why “The System Is Up” Means Nothing

Monitoring vs Observability

Observability is a term used widely in current systems, but it is often confused with monitoring. Monitoring tells developers whether something is not working or a flow is broken, whereas observability explains why a particular component within the pipeline is failing or malfunctioning. In most traditional applications, developers monitor and track metrics around uptime, latency, error rates, CPU usage, and memory. If the application API responds within the expected time and error rates stay within limits, the application or system is considered healthy. If there is any deviation from the acceptable limits for any of these metrics, an email is triggered to the concerned team. Such a setup works for most systems. Observability goes slightly deeper than these monitoring metrics. With observability, when something unusual happens, developers can examine system data to understand the cause of the odd behavior. They can trace a request, see where it slowed or failed, and reconstruct the sequence of events. In simple terms, observability answers this question: “When something feels off, can the team explain what happened?”

Why AI Systems Need Job-Level Observability

In traditional systems, problems are easily traceable: if the database is slow, latency goes up; if a service is down, requests fail; and so on. But in AI systems, things are different. AI systems can show as fully up and still behave in ways that are expensive, inconsistent, or incorrect. For example, the server may be responding to requests, the API may be returning a 200 status code, and the dashboard may show green status, but underneath, retries may be multiplying token usage and cost, embeddings may be regenerating unnecessarily, and logical jobs may be running twice. Traditional observability checks whether the application infrastructure is running. AI observability must tell whether the logical work is behaving correctly.
Consider a typical AI enrichment pipeline inside a SaaS platform. A job enters the system. The system invokes an LLM. The result is written to a database. An event is emitted downstream. From an infrastructure perspective, everything might be fine. Requests are being served. CPU is stable. No crashes/errors are reported. But what if that single logical job triggered three retries? What if the LLM was called twice due to a timeout? What if embeddings were regenerated unnecessarily? What if the cost per job doubled during peak hours? None of these appear in the basic uptime dashboard. That’s why AI observability must begin at the job level. Instead of observing “requests,” observe logical jobs. A job in a production AI system should leave behind a structured, readable trail. Let’s understand this with the help of the code.

Python
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class JobMetrics:
    job_id: str
    tenant_id: str
    stage: str
    attempts: int
    input_tokens: int
    output_tokens: int
    model_cost: float
    latency_ms: int
    status: str
    timestamp: float

def log_job_metrics(metrics: JobMetrics):
    print({
        "job_id": metrics.job_id,
        "tenant_id": metrics.tenant_id,
        "stage": metrics.stage,
        "attempts": metrics.attempts,
        "input_tokens": metrics.input_tokens,
        "output_tokens": metrics.output_tokens,
        "model_cost": metrics.model_cost,
        "latency_ms": metrics.latency_ms,
        "status": metrics.status,
        "timestamp": metrics.timestamp
    })

In this code, logging is not for debugging; it is for operational clarity.
Now, imagine every LLM call wraps inference with structured measurement, as shown in this code:

Python
import time

def call_model_with_metrics(job_id, tenant_id, payload, pricing, attempts=1):
    start = time.time()
    response = call_model(payload)
    end = time.time()

    input_tokens = response["usage"]["input_tokens"]
    output_tokens = response["usage"]["output_tokens"]
    cost = (
        (input_tokens / 1000) * pricing["input_per_1k"]
        + (output_tokens / 1000) * pricing["output_per_1k"]
    )

    metrics = JobMetrics(
        job_id=job_id,
        tenant_id=tenant_id,
        stage="LLM_CALL",
        attempts=attempts,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        model_cost=cost,
        latency_ms=int((end - start) * 1000),
        status="COMPLETED",
        timestamp=time.time()
    )
    log_job_metrics(metrics)
    return response

Tracking Costs, Retries, and Performance in AI Pipelines

Using this code, the team not only knows whether a request has succeeded, but also knows:

- How many tokens were used
- How much the single job cost
- How much time a job takes
- What the trigger source of the job is
- How many attempts were required to achieve the results

It helps the team understand why the cost per job increased by 18% this week or why retry attempts are higher for a particular source trigger. In AI systems, retries are major cost drivers. The team should be able to answer questions such as: What is the average number of attempts per job? Which stage has the highest retry rate? How many logical jobs resulted in more than one inference call? A simple modification to the code makes this visible:

Python
def process_job_with_retry(job_id, tenant_id, payload, policy):
    for attempt in range(1, policy.max_attempts + 1):
        try:
            # Pass the attempt count through so the logged metrics reflect it.
            return call_model_with_metrics(job_id, tenant_id, payload, policy.pricing, attempts=attempt)
        except RetryableError:
            if attempt == policy.max_attempts:
                raise

This modification makes attempts observable.
In a scenario where the average attempts per job increase from 1.1 to 1.8 during a traffic spike, the team can investigate and identify the root cause. The root cause can range from upstream throttling to network instability. The main point is that the system now exposes the behavior rather than hiding it behind a simple success or failure status. Observability in AI systems is about understanding patterns over time rather than logging more data. In multi-tenant AI systems, this visibility becomes critical. A single tenant's misconfigured settings can increase token usage or trigger multiple retries. Without tenant-level metrics, dashboards might show a system-wide cost increase, whereas the actual root cause might be isolated to one source. Structured job metrics allow teams to isolate the behavior precisely and respond effectively. Another common issue in AI pipelines is silent degradation. The system does not crash. APIs do not fail. But output quality shifts, cost drifts, or retries increase gradually. These changes are slow and difficult to notice without proper visibility. By the time the team investigates, the financial impact has already accumulated. This is why observability in AI pipelines must go beyond infrastructure health. In AI systems, predictability protects reliability, cost efficiency, and user trust. If observability focuses only on uptime and error rates, teams rely solely on basic signals that are not enough to identify root causes. True AI observability begins when the team can explain how each logical job behaved, how much it cost, and whether that behavior is changing over time.
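The tenant-level isolation described above can be sketched with a small aggregation. The helper names here are illustrative, not from the article; the idea is simply that per-tenant cost totals let a system-wide increase be traced back to the tenant responsible for it:

```python
from collections import defaultdict

def cost_by_tenant(records):
    """Sum model cost per tenant so a system-wide cost increase can be
    attributed to the tenant actually driving it."""
    totals = defaultdict(float)
    for r in records:
        totals[r["tenant_id"]] += r["model_cost"]
    return dict(totals)

def flag_outlier_tenants(totals, factor=1.5):
    """Flag tenants whose spend exceeds `factor` times the mean spend.
    A real system would use a proper baseline, not a fixed factor."""
    mean = sum(totals.values()) / len(totals)
    return [t for t, cost in totals.items() if cost > factor * mean]

records = [
    {"tenant_id": "a", "model_cost": 0.01},
    {"tenant_id": "a", "model_cost": 0.01},
    {"tenant_id": "b", "model_cost": 0.50},  # e.g., a misconfigured tenant
]
totals = cost_by_tenant(records)
print(totals, flag_outlier_tenants(totals))
```

The same grouping applied week over week also surfaces the silent degradation the article describes: cost drift shows up as a slowly moving per-tenant total long before it shows up in an error rate.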

By Aditya Gupta
Stranger Things in Java: Enum Types

This article is part of the series “Stranger Things in Java,” dedicated to language deep dives that will help us master even the strangest scenarios that can arise when we program. All articles are inspired by content from the book “Java for Aliens” (in English), the book “Il nuovo Java”, and the book “Programmazione Java.” This article is a short tutorial on enumeration types, also called enumerations or enums. They are one of the fundamental constructs of the Java language, alongside classes, interfaces, annotations, and records. They are particularly useful to represent sets of known and unchangeable values, such as the days of the week or the cardinal directions.

What Is an Enum?

An enum is declared with the enum keyword and typically contains a list of values, called the elements (or values, or also constants) of the enumeration. Let’s consider, for example:

Java

public enum CardinalDirection {
    NORTH, SOUTH, WEST, EAST;
}

Here, we defined an enum named CardinalDirection, with four elements: NORTH, SOUTH, WEST and EAST. The elements defined in the enumeration are the only possible instances of type CardinalDirection, and it is not possible to instantiate other objects of the same type. Therefore, if we tried to instantiate an object from the CardinalDirection enumeration, we would get a compilation error:

Java

var d = new CardinalDirection(); // ERROR: you cannot create new instances

Elements of an Enumeration

Using an enumeration, therefore, mainly means using its elements.
For example, the following method returns true if the direction parameter matches the NORTH element of CardinalDirection:

Java

static boolean isNorth(CardinalDirection direction) {
    return direction == CardinalDirection.NORTH;
}

In the following example, instead, we assign references to the elements of CardinalDirection:

Java

CardinalDirection d1 = CardinalDirection.SOUTH;
System.out.println(d1 == CardinalDirection.SOUTH); // true
var d2 = CardinalDirection.EAST;
System.out.println(d2 == CardinalDirection.WEST); // false

Each element is implicitly declared public, static and final. In fact, from these examples we can observe that:

To use the elements of an enumeration, you must always refer to them via the name of the enumeration (for example CardinalDirection.SOUTH).
We can compare elements directly with the == operator because they are implicitly final and unique.
The names of enumeration elements follow the naming conventions for constants.

During compilation, the CardinalDirection enumeration is transformed into a class similar to the following*:

Java

public class CardinalDirection {
    public static final CardinalDirection NORTH = new CardinalDirection();
    public static final CardinalDirection SOUTH = new CardinalDirection();
    public static final CardinalDirection WEST = new CardinalDirection();
    public static final CardinalDirection EAST = new CardinalDirection();
}

The compiler ensures that no other elements can be instantiated besides those declared in the enumeration.

* Backward compatibility is a fundamental feature of Java that ensures code written for earlier versions of the platform continues to work on more recent versions of the JVM, without requiring changes. Backward compatibility is one of the main reasons why Java is widely used in enterprise environments and long-lived systems.

Why Use Enumerations?

One of the main advantages of enumerations is the ability to represent a limited set of values in a safe way.
Without an enum, there is a risk of using strings or “magic” numbers, which can introduce errors that are difficult to detect. Let us consider the following example:

Java

public class Compass {

    public void move(String direction) {
        String message;
        if (direction.equals("NORTH")) {
            message = "You move north";
        } else if (direction.equals("SOUTH")) {
            message = "You move south";
        } else if (direction.equals("WEST")) {
            message = "You move west";
        } else if (direction.equals("EAST")) {
            message = "You move east";
        } else {
            message = "Invalid direction: " + direction;
        }
        System.out.println(message);
    }
}

With this approach, it is possible to pass any string, even an invalid one. For example, the value of the direction parameter could be "north" or "North", but it should be "NORTH" in order for the method to work correctly. The compiler cannot help us prevent such errors. In the following code, we use the CardinalDirection enumeration to completely eliminate arbitrary values and delegate to the compiler the validation of the allowed values:

Java

public class Compass {

    public void move(CardinalDirection direction) {
        String message = switch (direction) {
            case NORTH -> "You move north";
            case SOUTH -> "You move south";
            case WEST -> "You move west";
            case EAST -> "You move east";
        };
        System.out.println(message);
    }
}

In this way:

The direction parameter can only take the values defined in the enumeration.
It is not possible to specify an invalid value: such an error would be detected at compile time.
The switch expression must be exhaustive**, therefore the compiler requires that all alternatives are handled in order to compile without errors.

Enumerations make code safer and more readable, because they avoid the use of “magic” values or arbitrary strings to represent concepts that have a limited number of alternatives.
** To learn about the concept of exhaustiveness related to switch expressions, introduced in Java 14, you can read the article entitled “The new switch.”

Enumerations and Inheritance

We have seen that the compiler transforms the CardinalDirection enum into a class whose elements are implicitly declared public, static, and final. However, we have not yet said that such a class:

Is itself declared final. This implies that enumerations cannot be extended.
Extends the generic class java.lang.Enum. Consequently, it cannot extend other classes (but it can still implement interfaces).

In practice, the declaration of the CardinalDirection class will be similar to the following:

Java

public final class CardinalDirection extends Enum<CardinalDirection> {
    // rest of the code omitted
}

Therefore, we cannot create hierarchies of enumerations in the same way we do with classes. Moreover, all enumerations inherit:

The methods declared in the Enum class.
The methods and properties of the Serializable and Comparable interfaces, which are implemented by Enum.
The methods from the Object class.

However, in the last paragraph of this tutorial, we will see how it is possible, in some sense, to extend an enumeration.

Methods Inherited From the Enum Class

By extending Enum, enumerations inherit several methods:

name: returns the name of the element as a string (it cannot be overridden because it is declared final).
toString: returns the same value as name, but it can be overridden.
ordinal: returns the position of the element in the enumeration starting from index 0 (it is declared final).
valueOf: a static method that takes a String as input and returns the enumeration element corresponding to the name.
values: a static method not actually present in java.lang.Enum, but generated by the compiler for each enumeration. It returns an array containing all enumeration elements in the order in which they are declared.
For example, the name method is defined to return the element name, so:

Java

System.out.println(CardinalDirection.SOUTH.name()); // prints SOUTH

will print the string "SOUTH". The equivalent toString method also returns the enum name, so the instruction:

Java

System.out.println(CardinalDirection.SOUTH); // prints SOUTH

produces exactly the same result (since println calls toString on the input object). The difference is that the name method cannot be overridden because it is declared final, while the equivalent toString method can always be overridden. Enum also declares a method complementary to toString: the static method valueOf. It takes a String as input and returns the corresponding enumeration value. For example:

Java

CardinalDirection direction = CardinalDirection.valueOf("NORTH");
System.out.println(direction == CardinalDirection.NORTH); // prints true

The special static values method returns an array containing all enumeration elements in the order in which they were declared. You can use this method to iterate over the values of an enumeration you do not know. For example, using an enhanced for loop, we can print the contents of the CardinalDirection enumeration, also introducing the ordinal method:

Java

for (CardinalDirection cd : CardinalDirection.values()) {
    System.out.println(cd + "\t is at position " + cd.ordinal());
}

Note that the ordinal method (also declared final) returns the position of an element within the array returned by values. The output of the previous example is therefore:

NORTH    is at position 0
SOUTH    is at position 1
WEST     is at position 2
EAST     is at position 3

For completeness, the Enum class actually declares two other, less interesting methods:

getDeclaringClass: returns the Class object associated with the enum type to which the value on which the method is invoked belongs.
describeConstable: a method introduced in Java 12 to support advanced constant descriptions for low-level APIs.
This is a specialized API that is not used in traditional application development.

Methods and Properties Inherited From the Serializable and Comparable Interfaces

The Enum class implements the Serializable and Comparable interfaces, and consequently, enumeration objects have the properties of being serializable and comparable. While the marker interface Serializable does not contain methods, the functional interface Comparable makes the natural ordering of the elements of an enumeration coincide with the order in which the elements are defined within the enumeration itself. This means that the only abstract method compareTo of the Comparable interface determines the ordering of two enumeration objects based on the position of the objects within the enumeration. Note that the compareTo method is declared final in the Enum class, and therefore it cannot be overridden in our enumerations.

Methods Inherited From the Object Class

Enumerations inherit all 11 methods of the Object class. In particular, as already mentioned, we can override the toString method. The other methods we usually override, such as equals, hashCode, and clone, are instead declared final and therefore cannot be overridden. In fact, to compare two enumeration instances, it is sufficient to use the == operator, since enumeration values are constants, and therefore there is no need to redefine the equals and hashCode methods. Moreover, enumerations cannot be cloned, since their elements must be the only possible instances. In this case as well, the java.lang.Enum class declares the clone method inherited from Object as final. Since it is also declared protected, it is not even visible outside the java.lang package.

Customizing an Enumeration

Since it is transformed into a class, an enumeration can declare everything that can be declared in a class (methods, variables, nested types, initializers, and so on), with some constraints on constructors.
In fact, constructors are implicitly considered private and can only be used by the enumeration elements through a special syntax. Compilation will instead fail if we try to create new instances using the new operator. For example, the following code redefines the CardinalDirection enumeration:

Java

public enum CardinalDirection {

    NORTH("north"), // invokes constructor 2
    SOUTH("south"), // invokes constructor 2
    WEST("west"),   // invokes constructor 2
    EAST;           // invokes constructor 1, equivalent to EAST()

    // instance variable
    private String description;

    // constructor number 1
    private CardinalDirection() {
        this("east"); // calls constructor 2
    }

    // constructor number 2 (implicitly private)
    CardinalDirection(String direction) {
        setDescription("direction " + direction);
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return "We are pointing " + description;
    }
}

We can observe that:

We declared two constructors in the enumeration: one explicitly private and the other implicitly private. It is not possible to declare constructors with public or protected visibility. Apart from this, the same rules that apply to class constructors also apply here. As with classes, if we do not declare any constructors, the compiler will add a no-argument constructor for us (the default constructor). Also, as with classes, the default constructor will not be generated if we explicitly declare at least one constructor, as in the previous example.
The enumeration elements invoke the declared constructors using a special syntax. In the declaration of the NORTH, SOUTH, and WEST elements, we added a pair of parentheses and passed a string parameter. This ensures that constructor number 2, which takes a String parameter, is invoked when these instances are created. The EAST element, instead, does not use parentheses and therefore invokes the no-argument constructor.
Note that we could also have added an empty pair of parentheses to obtain the same result. The declaration of the enumeration elements must always precede all other declarations. If we placed any declaration before the element list, compilation would fail. Note that the semicolon after the element list is optional if no other members are declared.

Static Nested Enumerations and Static Imports

It is not uncommon to create nested enumerations, which, unlike nested classes, are always static. For example, suppose we want to create an enumeration that defines the possible account types (for example "standard" and "premium") for customers of an online shop. Since the account type is strictly related to the concept of an account, it makes sense to declare the Type enumeration nested within the Account class:

Java

package com.online.shop.data;

public class Account {

    public enum Type {STANDARD, PREMIUM} // static nested enum

    // other code omitted...

    public static void main(String[] args) {
        System.out.println(Type.PREMIUM); // access to the static enumeration
    }
}

If instead we want to print an enumeration element from outside the Account class, we can use the following syntax:

Java

System.out.println(Account.Type.PREMIUM);

Of course, we can also use static import when appropriate, for example:

Java

import static com.online.shop.data.Account.Type;
// ...
System.out.println(Type.PREMIUM);

Or even:

Java

import static com.online.shop.data.Account.Type.PREMIUM;
// ...
System.out.println(PREMIUM);

Enumerations and static import were both introduced in Java 5. static import, in fact, allows us to reduce verbosity when using enumerations.

Extending an Enumeration

We know that we cannot extend an enumeration; however, it is possible to use the anonymous class syntax for each element, redefining the methods declared in the enumeration. We can define methods in the enumeration and override them in its elements.
Let us rewrite the CardinalDirection enumeration once again:

Java

public enum CardinalDirection {

    NORTH {
        @Override
        public void test() {
            System.out.println("method of NORTH");
        }
    },
    SOUTH, WEST, EAST;

    public void test() {
        System.out.println("method of the enum");
    }
}

Here, we defined a method called test that prints the string "method of the enum". The NORTH element, however, using a syntax similar to that of anonymous classes, also declares the same method, overriding it. In fact, the compiler will turn the NORTH element into an instance of an anonymous class that extends CardinalDirection. Therefore, the statement:

Java

CardinalDirection.NORTH.test();

will print:

method of NORTH

while the statement:

Java

CardinalDirection.SOUTH.test();

will print:

method of the enum

because SOUTH does not override test. The same output will be produced when invoking the test method on the EAST and WEST elements as well.

Enumerations and Polymorphism

After examining the relationship between enumerations and inheritance, we can now use enumerations by exploiting polymorphism in a more advanced way.
For example, let us consider the following Operation interface:

Java

public interface Operation {
    boolean execute(int a, int b);
}

We can implement this interface within an enumeration Comparison, customizing the implementation of the execute method for each element:

Java

public enum Comparison implements Operation {

    GREATER {
        public boolean execute(int a, int b) {
            return a > b;
        }
    },
    LESS {
        public boolean execute(int a, int b) {
            return a < b;
        }
    },
    EQUAL {
        public boolean execute(int a, int b) {
            return a == b;
        }
    };
}

With this structure, we can write code such as the following:

Java

boolean result = Comparison.GREATER.execute(10, 5);
System.out.println("10 greater than 5 = " + result);

result = Comparison.LESS.execute(10, 5);
System.out.println("10 less than 5 = " + result);

result = Comparison.EQUAL.execute(10, 5);
System.out.println("10 equal to 5 = " + result);

Which will produce the following output:

10 greater than 5 = true
10 less than 5 = false
10 equal to 5 = false

Implementing an interface in an enumeration allows you to associate a behavior with each enum value and exploit polymorphism, making the code more extensible, readable, and robust.

Conclusion

Enumerations are particularly useful when:

The domain of values is closed and known in advance, such as cardinal directions, object states, days of the week, priority levels, and so on.
You want to make code safer by eliminating arbitrary strings or “magic” numbers.
Each element must be able to have specific properties or methods, while maintaining clarity and readability.

They are less suitable when:

The elements can vary dynamically over time, for example, if they come from a database or external configurations.
You want to model an extensible hierarchy of types, for which classes and interfaces remain more flexible solutions.
In this article, we have seen that enumerations are not simply lists of constants, but real classes with predefined instances, methods inherited from Enum, the ability to implement interfaces, and even the possibility to redefine behavior for individual values through anonymous classes. These aspects make enum a surprisingly powerful and, in some cases, unexpected tool: a perfect example of stranger things in Java. Author’s Note This article is based on some paragraphs from chapters 4 and 7 of my book “Programmazione Java” and from my English book “Java for Aliens.”

By Claudio De Sio Cesari
8 Core LLM Development Skills Every Enterprise AI Team Must Master

When organizations talk about adopting large language models, the conversation usually starts with model choice. GPT versus Claude. Open source versus proprietary. Bigger versus cheaper. In real enterprise systems, that focus is misplaced. Production success with LLMs depends far more on architecture discipline than on the model itself. What separates a fragile demo from a resilient, governable system is mastery of a small set of core engineering skills. These skills shape how models are instructed, grounded, deployed, observed, and evolved over time. In this article, I am going to discuss eight such skills from the perspective of building real systems, not experimenting in notebooks. Each section explains why the skill matters, when it should be applied, and how it fits into a clean enterprise architecture.

Prompt Engineering

Prompt engineering is the foundation layer of any LLM system. It translates human intent into precise, structured instructions that a model can execute reliably. In production environments, prompts are not handwritten strings. They are assembled programmatically using templates, roles, constraints, examples, and safety rules. Strong prompt engineering reduces hallucinations, improves consistency, and often delays the need for more complex approaches such as fine-tuning or agents. Poor prompts, on the other hand, amplify variability and force teams to compensate with brittle downstream logic. In mature systems, prompts are versioned, tested, and reviewed just like application code. This discipline is what allows teams to change models without rewriting business logic.

Context Engineering

Context engineering determines what information the model sees at inference time. Instead of overloading a single prompt with everything, systems dynamically assemble relevant context from memory stores, structured databases, documents, and APIs. This is where enterprise reliability truly begins. Context engineering is deterministic and auditable.
You can explain why a model responded the way it did because you know exactly what data it was given. Teams that skip this step often rely on the model to infer missing information. That approach may work in demos but fails under regulatory scrutiny or operational scale. Context engineering turns LLMs from probabilistic guessers into controlled reasoning components.

Fine-Tuning

Fine-tuning modifies the model itself so that the desired behavior is internalized rather than instructed repeatedly. This approach is most effective when the same task repeats at scale, such as classification, extraction, or domain-specific reasoning. The tradeoff is flexibility. Fine-tuned models are harder to change and require disciplined data governance. Training data must be curated, versioned, and reviewed for bias and drift. In enterprise settings, fine-tuning should be a deliberate optimization step, not the default starting point. Many teams fine-tune prematurely when prompt and context engineering would have been sufficient.

Retrieval-Augmented Generation

Retrieval-augmented generation, or RAG, grounds model outputs in external knowledge. Instead of trusting what the model remembers, the system retrieves relevant information at runtime and injects it into the prompt. This pattern dominates enterprise adoption because it balances accuracy, freshness, and explainability. Knowledge can be updated without retraining models, and responses can be traced back to source documents. Well-designed RAG systems treat retrieval as a first-class concern. Chunking strategy, embedding choice, ranking logic, and context assembly all materially affect outcome quality.

Agents

Agents introduce autonomy. An agent does not simply respond to input. It reasons, plans, calls tools, evaluates results, and iterates until a goal is achieved. This capability is powerful and dangerous if misapplied. Agents are best suited for workflows such as multi-step analysis, orchestration, and decision support.
They are poorly suited for factual retrieval or compliance-sensitive outputs. Common failure modes include infinite loops, tool hallucination, runaway cost, and unpredictable behavior. In enterprise systems, agents must be constrained with explicit goals, step limits, tool allow lists, and strong observability. Autonomy without guardrails is not intelligence. It is a risk.

LLM Deployment

Deployment turns models into dependable services. This layer handles routing, scalability, authentication, authorization, and versioning. A clean deployment architecture allows teams to swap models without forcing application changes. In enterprise environments, deployment also defines security boundaries. It determines where data flows, how requests are logged, and how failures are isolated. Treating LLMs as just another API dependency is a mistake. They are probabilistic systems that require careful exposure and lifecycle management.

LLM Optimization

Optimization ensures performance and cost efficiency at scale. This includes caching frequent responses, compressing context, routing requests to different models, and applying techniques such as quantization. Optimization is often invisible to end users but critical to sustainability. Without it, even well-designed systems become prohibitively expensive as usage grows. Teams should treat optimization as an ongoing discipline rather than a one-time exercise. Usage patterns evolve, and so should optimization strategies.

LLM Observability

Observability provides visibility into prompts, responses, latency, cost, and failure modes. Without it, LLM systems are effectively ungovernable. In regulated industries, observability is not optional. Teams must be able to trace outputs, audit decisions, and detect drift or misuse. Effective observability combines tracing, metrics, and structured logging. It allows teams to debug behavior, enforce policy, and continuously improve system quality.
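The response-caching idea mentioned under optimization can be sketched without any external dependency. The class below is a hypothetical in-memory cache keyed by a hash of the normalized prompt; a real deployment would typically use a shared store such as Redis, with TTLs and invalidation:

```python
import hashlib

class ResponseCache:
    """Minimal in-memory response cache keyed by normalized prompt text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        # Normalize whitespace and case so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_compute(self, prompt: str, compute):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt)  # only call the model on a miss
        self._store[key] = result
        return result

cache = ResponseCache()
answer1 = cache.get_or_compute("What is RAG?", lambda p: "stub model answer")
answer2 = cache.get_or_compute("what is  rag?", lambda p: "stub model answer")
print(cache.hits, cache.misses)  # → 1 1
```

Tracking hits and misses on the cache itself ties the optimization skill back to the observability skill: the hit rate is a metric worth dashboarding.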
The following is an end-to-end reference architecture, explained layer by layer.

Reference Architecture Explanation

1. Prompt Engineering

The foundation of the system is prompt engineering, where user intent is transformed into structured instructions. In production, prompts are assembled programmatically using templates, system roles, constraints, and examples. Tools like LangChain and LlamaIndex allow for modular, reusable prompt templates, improving determinism and reducing hallucinations.

2. Context Engineering

Context engineering ensures the model sees the right information at inference time. Instead of embedding all data in a single prompt, the system dynamically assembles relevant context from multiple sources. This includes memory databases such as Redis or DynamoDB, document stores such as S3 or Blob Storage, structured enterprise data such as Postgres or Snowflake, and vector stores such as Pinecone, Weaviate, or FAISS. The context builder ranks and filters data to provide deterministic, auditable input to the model.

3. Fine-Tuning

Fine-tuning customizes models to internalize behaviors for repeated tasks. This is essential for domain-specific tasks such as classification, extraction, or reasoning at scale. Fine-tuned models are implemented using platforms like HuggingFace or SageMaker and provide consistency at the cost of flexibility.

4. Retrieval-Augmented Generation

RAG ensures outputs are grounded in external knowledge rather than relying solely on what the model remembers. The system retrieves and embeds relevant information from vector stores, document repositories, enterprise databases, and memory layers into prompts. This balances accuracy, freshness, and explainability, forming a critical part of enterprise reliability.

5. Agents and Tooling

Agents orchestrate autonomous reasoning and task execution. They decide, iterate, and call external tools to achieve a goal. Enterprise tools include search, SQL queries, Python scripts, or APIs.
Agents provide a structured workflow layer above raw inference, enabling complex multi-step operations while keeping control and auditability intact.

6. Model and Inference

This layer manages the execution of foundation and fine-tuned models. A model router selects the appropriate model based on cost, latency, or other criteria. Foundation models such as GPT 4.x, Claude, or Gemini handle general tasks, while fine-tuned models execute domain-specific operations. This layer turns the model into a dependable service that can scale and evolve without changing application logic.

7. Optimization

Optimization ensures performance and cost efficiency. Techniques include response caching with Redis, context compression using summarization and chunking, and model quantization with INT8 or INT4 representations. These optimizations are invisible to users but crucial for sustainability at scale.

8. Observability and Governance

The final layer provides visibility, traceability, and monitoring. Tools like OpenTelemetry and LangSmith trace prompt and model activity. Metrics and cost tracking are handled via Prometheus or Datadog. Logs from prompts and model outputs are collected in ELK or CloudWatch, and dashboards such as Grafana provide a comprehensive view for engineers and decision makers. Observability enables governance, auditing, and operational reliability.

Conclusion

When you understand these eight skills and how they compose, you stop thinking in terms of models and start thinking in systems. That shift is what turns LLM adoption from experimentation into real engineering leadership.

By Ram Ghadiyaram
Google Cloud AI Agents With Gemini 3: Building Multi-Agent Systems That Actually Work

The transition from large language models (LLMs) as simple chat interfaces to autonomous AI agents represents the most significant shift in enterprise software since the move to microservices. With the release of Gemini 3, Google Cloud has provided the foundational model capable of long-context reasoning and low-latency decision-making required for sophisticated multi-agent systems (MAS). However, building an agent that "actually works" — one that is reliable, observable, and capable of handling edge cases — requires more than a prompt and an API key. It requires a robust architectural framework, a deep understanding of tool use, and a structured approach to agent orchestration.

The Architecture of a Modern AI Agent

At its core, an AI agent is a loop. Unlike a standard LLM call, which is a single input-output transaction, an agent uses the model's reasoning capabilities to interact with its environment. In the context of Gemini 3 on Google Cloud, this environment is managed through Vertex AI Agent Builder.

The Agentic Loop: Perception, Reasoning, and Action

Perception: The agent receives a goal from the user and context from its internal memory or external data sources.
Reasoning: Using Gemini 3's advanced reasoning capabilities (such as Chain of Thought or ReAct), the agent breaks the goal into sub-tasks.
Action: The agent selects a tool (a function call, an API, or a search) to execute a sub-task.
Observation: The agent evaluates the output of the action and decides whether to continue or finish.

System Architecture

To build a multi-agent system, we must move away from a monolithic agent. Instead, we use a modular approach where a "Manager" or "Orchestrator" agent delegates tasks to specialized "Worker" agents. In this architecture, the Manager Orchestrator serves as the brain. It uses Gemini 3's high-reasoning threshold to determine which worker agent is best suited for the current task.
This delegation prevents "token bloat" in worker agents, as they only receive the context necessary for their specific domain.

Why Gemini 3 for Multi-Agent Systems?

Gemini 3 introduces several key advantages for agentic workflows that weren't present in previous iterations:

Native function calling: Gemini 3 is fine-tuned to generate structured JSON tool calls with higher accuracy, reducing the "hallucination" rate during API interactions.
Expanded context window: With a massive context window, Gemini 3 can retain the entire history of a multi-turn, multi-agent conversation without needing complex vector database retrieval for every step.
Multimodal reasoning: Agents can now "see" and "hear," allowing them to process UI screenshots or audio logs as part of their reasoning loop.

Feature Comparison: Gemini 1.5 vs. Gemini 3 for Agents

| Feature | Gemini 1.5 Pro | Gemini 3 (Agentic) |
| --- | --- | --- |
| Tool Call Accuracy | ~85% | >98% |
| Reasoning Latency | Moderate | Optimized Low-Latency |
| Native Memory Management | Limited | Integrated Session State |
| Multimodal Throughput | Standard | High-Speed Stream Processing |
| Task Decomposition | Manual Prompting | Native Agentic Reasoning |

Building a Multi-Agent System: Technical Implementation

Let's walk through the implementation of a multi-agent system designed for a financial analysis use case. We will use the Vertex AI Python SDK to define our agents and tools.

Step 1: Defining Tools

Tools are the "hands" of the agent. In Gemini 3, tools are defined as Python functions with clear docstrings, which the model uses to understand when and how to call them.
Python

import vertexai
from vertexai.generative_models import GenerativeModel, Tool, FunctionDeclaration

# Initialize Vertex AI
vertexai.init(project="my-project-id", location="us-central1")

# Define a tool for fetching stock data
get_stock_price_declaration = FunctionDeclaration(
    name="get_stock_price",
    description="Fetch the current stock price for a given ticker symbol.",
    parameters={
        "type": "object",
        "properties": {
            "ticker": {"type": "string", "description": "The stock ticker (e.g., GOOG)"}
        },
        "required": ["ticker"]
    },
)

stock_tool = Tool(
    function_declarations=[get_stock_price_declaration],
)

Step 2: The Worker Agent

A worker agent is specialized. Below is an example of a "Data Agent" that uses the stock tool.

Python

model = GenerativeModel("gemini-3-pro")
chat = model.start_chat(tools=[stock_tool])

def run_data_agent(prompt):
    """Handoff logic for the data worker agent"""
    response = chat.send_message(prompt)
    # Handle function calling logic
    if response.candidates[0].content.parts[0].function_call:
        function_call = response.candidates[0].content.parts[0].function_call
        # In a real scenario, you would execute the function here
        # and send the result back to the model.
        return f"Agent wants to call: {function_call.name}"

Step 3: The Orchestration Flow

In a complex system, the data flow must be managed to ensure that Agent A's output is correctly passed to Agent B. We use a sequence diagram to visualize this interaction.

Advanced Pattern: State Management and Memory

One of the biggest challenges in multi-agent systems is "state drift," where agents lose track of the original goal during long interactions. Gemini 3 addresses this with native session state management in Vertex AI. Instead of passing the entire conversation history back and forth (which increases cost and latency), we can use context caching. This allows the model to "freeze" the initial instructions and background data, only processing the new delta in the conversation.
Code Example: Context Caching for Efficiency

Python

from vertexai.preview import generative_models

# Large technical manual context
long_context = "... thousands of lines of documentation ..."

# Create a cache (valid for a specific TTL)
cache = generative_models.Caching.create(
    model_name="gemini-3-pro",
    content=long_context,
    ttl_seconds=3600
)

# Initialize agent with the cached context
agent = GenerativeModel(model_name="gemini-3-pro")
# The agent now has 'memory' of the documentation without re-sending it

Challenges in Multi-Agent Systems

Building these systems isn't without hurdles. Here are the three most common technical challenges and how to solve them:

1. The "Infinite Loop" Problem
Agents can sometimes get stuck in a loop, repeatedly calling the same tool or asking the same question.
Solution: Implement a max_iterations counter in your Python controller and use an "Observer" pattern where a separate model monitors the agentic loop for redundancy.

2. Tool Output Ambiguity
If a tool returns an error or unexpected JSON, the agent might hallucinate a solution.
Solution: Use strict Pydantic models for function outputs and feed the validation error back into the agent's context, allowing it to self-correct.

3. Context Overflow
Despite Gemini 3's large window, multi-agent systems can produce massive amounts of logs.
Solution: Use an "Information Bottleneck" strategy. The Orchestrator should summarize the output of each worker before passing it to the next agent, ensuring only high-signal data moves forward.

Testing and Evaluation (LLM-as-a-Judge)

Traditional unit tests are insufficient for agents. You must evaluate the reasoning path. Google Cloud's Vertex AI Rapid Evaluation allows you to use Gemini 3 as a judge to grade the performance of your agents based on criteria like:

Helpfulness: Did the agent fulfill the intent?
Tool efficiency: Did it use the minimum number of tool calls?
Safety: Did it adhere to the defined system instructions?
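Returning to challenge 1 above, the loop guard can be sketched as a redundancy check on the tool-call history. This is plain-Python illustration only; `is_looping` and the `(tool, args)` tuple format are invented for the sketch, and the escalation step would hand control back to the orchestrator in a real system.

```python
from collections import Counter

def is_looping(call_history, max_repeats=3):
    """Flag when the same (tool, args) pair has been issued too many times."""
    counts = Counter(call_history)
    return any(n >= max_repeats for n in counts.values())

history = []
for step in range(10):
    call = ("get_stock_price", ("GOOG",))   # a stuck agent keeps issuing the same call
    if is_looping(history + [call]):
        print(f"Loop detected at step {step}; escalating to the orchestrator")
        break
    history.append(call)
```

A hard `max_iterations` cap plus a check like this catches both "too many steps overall" and "the same step over and over."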
| Evaluation Metric | Description | Target Score |
| --- | --- | --- |
| Faithfulness | How well the agent sticks to retrieved data. | > 0.90 |
| Task Completion | Success rate of complex multi-step goals. | > 0.85 |
| Latency per Step | Time taken for a single reasoning loop. | < 2.0s |

Conclusion

Gemini 3 and Vertex AI Agent Builder have fundamentally changed the barrier to entry for building intelligent, autonomous systems. By utilizing a modular multi-agent architecture, leveraging native function calling, and implementing rigorous evaluation cycles, developers can move past the prototype stage and build production-ready AI systems. The key to success lies not in the size of the prompt, but in the elegance of the orchestration and the reliability of the tools provided to the agents. As we move into the era of agentic software, the role of the developer shifts from writing logic to designing ecosystems where agents can collaborate effectively.

By Jubin Abhishek Soni, DZone Core
Engineering an AI Agent Skill for Enterprise UI Generation

Large language models have recently made it possible to generate UI code from natural language descriptions or design mockups. However, applying this idea in real development environments often requires more than simply prompting a model. Generated code must conform to framework conventions, use the correct components, and pass basic structural validation. In this article, we describe how we built an Agent Skill called zul-writer that generates UI pages and controller templates for applications built with the ZK framework. For readers unfamiliar with it, ZK is a Java-based web UI framework, and ZUL is its XML-based language used to define user interfaces. A typical page is written in ZUL and connected to a Java controller that handles the application logic. The goal of this agent skill is to transform textual descriptions or UI mockups into ZUL pages and a Java controller scaffold, while validating the output to ensure it conforms to ZK's syntax and component model. This article focuses on the technical design of the agent, including prompt design, validation steps, and how we guide the model to generate framework-specific UI code.

Architecting the Agent: Guiding the LLM Toward Valid UI Code

When building tools for enterprise developers, free-form LLM generation is a liability. LLMs often invent non-existent tags, use unsupported properties, and mix architectural patterns. The solution is to strictly architect the agent's constraints.

The prompt constraints (SKILL.md): Instead of writing a prompt that "teaches" the LLM how to write ZUL, we use Markdown frontmatter and structured sections inside SKILL.md to establish ironclad constraints. These constraints bind the LLM to a strict 4-step process, effectively removing its freedom to improvise outside of our defined architecture.
Structuring the context (RAG in practice): To prevent the LLM from guessing components, we feed it an exact UI-to-component mapping (references/ui-to-component-mapping.md) and base XML templates (assets/). By providing these reference assets directly within the skill, we minimize the LLM's chance of making up invalid UI tags or layout structures. It doesn't need to guess how an MVVM ViewModel should look; it just follows mvvm-pattern-structure.zul.

Designing a Deterministic Workflow for LLMs (The 4-Step Process)

Why does free-form prompting fail for complex UI generation? Because generating a full UI requires multiple context switches: understanding the layout, mapping the components, writing the XML, validating the schema, and finally wiring the backend controller. To handle this, zul-writer uses Dual Input Modes (text vs. image), natively supporting both descriptive text requirements and direct image inputs (like mockups or screenshots). Here is the deterministic workflow the skill enforces:

1. Requirement gathering and visual analysis: If an image is provided, the agent performs a visual analysis to identify layouts, tables, and buttons. It then asks necessary clarifying questions: Target ZK version (9 or 10)? MVC or MVVM? Layout preferences?
2. Context-aware generation: The agent generates the ZUL using the exact component mappings and base XML templates provided in the assets/ directory.
3. Local validation: (Covered in the next section.)
4. Controller generation: Ensuring the Java code (Composer or ViewModel) is generated to match the IDs and bindings of the generated ZUL perfectly.

Trust, But Verify: Validating AI Output

In a professional engineering workflow, you cannot blindly trust AI-generated code. XML-based languages are particularly prone to LLMs inventing invalid attributes or placing valid attributes on the wrong tags, e.g., putting an iconSclass on a textbox.

Why local script validation?
(cost and efficiency): You might think: "Why not just ask the LLM to validate its own code against the XSD?" Validating against massive XSD schemas via LLM prompts consumes huge amounts of tokens, takes too long, and might be prone to "sycophancy" (the LLM telling you it looks fine when it doesn't). Offloading this to a local Python script is deterministic, vastly cheaper, and significantly faster. The zul-writer skill employs a local Python validation script (validate-zul.py) featuring a 4-layer validation strategy:

Layer 1: XML well-formedness.
Layer 2: XSD schema validation.
Layer 3: Attribute placement checks (catching context-specific errors).
Layer 4: ZK version-specific compatibility checks.

The agentic loop: If the local script throws an error, the agent intercepts the stack trace, understands what went wrong, and self-corrects the ZUL file before presenting the final code to the developers.

Test-Driven AI Development

Building an AI workflow requires applying traditional software engineering practices — specifically, testing.

Testing the agent with Google Stitch and human-in-the-loop: To test zul-writer, I used Google Stitch to rapidly generate diverse UI screenshots to serve as test inputs. The iteration loop looks like this:

1. Feed the Stitch-generated image into zul-writer.
2. Manually review the generated ZUL output for layout accuracy and component misuse.
3. Identify the AI's "bad habits" and write explicit rules/constraints into SKILL.md to prevent future recurrences. (This is Prompt Optimization in action.)

Codebase testing: The repository includes a test/ directory with known good and bad ZUL files to independently verify the Python validation script. Furthermore, a zulwriter-showcase/ gallery serves as a runnable integration test to prove that the AI-generated UIs (like enterprise Kanban boards and Feedback dashboards) actually render perfectly.

Developer pro-tip: During the development of the zul-writer skill, managing file changes can be tedious.
Instead of repeatedly copy-pasting the skill directory into Claude Code's skill folder every time you make a change, use a macOS symbolic link to point ~/.claude/skills/zul-writer directly to your actual local project directory. This single trick saves endless context switching and allows for instant testing during development!

The ZUL-Writer Showcase

The screenshot generated by Stitch:

The ZUL page generated by ZUL-writer:

As you can see, the generated result is very similar to the mockup. But what makes the result particularly useful is that the generated page is not just a generic HTML layout. The agent understands the ZK component ecosystem and generates the interface using ZK components and icon libraries. As a result, the generated page is usually very close to what a developer would write manually. Layouts, components, and event hooks are already structured correctly for a typical ZK application. Developers typically only need to:

Adjust minor UI details
Refine component properties
Implement the actual business logic inside the generated composer

In practice, this reduces a large portion of repetitive scaffolding work and allows developers to focus on application logic rather than UI boilerplate.

Conclusion and Key Takeaways

Large language models are becoming increasingly capable of generating code, but producing reliable results in real development environments usually requires additional structure. In this article, we explored how an agent skill can guide the model to generate framework-specific UI code and validate the output through simple checks such as XML and XSD validation. While this example focuses on generating ZUL pages and Java controller templates, the same approach can be applied to many other libraries and technologies. By combining LLM prompts, domain knowledge, and lightweight validation, developers can build agent skills that automate repetitive scaffolding tasks.
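The first and third validation layers described earlier (well-formedness and attribute placement) can be sketched with Python's standard library. This is a simplified stand-in for validate-zul.py: the tag/attribute rules below are illustrative, not ZK's real schema, and real XSD validation (layer 2) would need a library such as lxml.

```python
import xml.etree.ElementTree as ET

# Illustrative rule set: which attributes are allowed on which tags.
# (Hypothetical subset; the real skill validates against ZK's XSD.)
ALLOWED_ATTRS = {
    "button": {"label", "iconSclass", "onClick"},
    "textbox": {"value", "placeholder", "constraint"},
}

def validate_zul(source: str) -> list[str]:
    """Return a list of validation errors (an empty list means the snippet passed)."""
    # Layer 1: XML well-formedness
    try:
        root = ET.fromstring(source)
    except ET.ParseError as e:
        return [f"Not well-formed XML: {e}"]
    # Layer 3: attribute placement checks
    errors = []
    for elem in root.iter():
        allowed = ALLOWED_ATTRS.get(elem.tag)
        if allowed is None:
            continue  # no rules for this tag in the illustrative subset
        for attr in elem.attrib:
            if attr not in allowed:
                errors.append(f"Attribute '{attr}' is not valid on <{elem.tag}>")
    return errors

# A textbox carrying iconSclass should be flagged (the example from the article).
print(validate_zul('<zk><textbox iconSclass="z-icon-user"/></zk>'))
```

The agentic loop then feeds any returned error strings back into the model's context so it can self-correct.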
Hopefully, this article provides some ideas and inspiration for building similar agent skills for the frameworks and tools you use in your own projects. Also, if you are interested in trying out the ZUL-writer, it is available on GitHub.

By Hawk Chen, DZone Core
Essential Techniques for Production Vector Search Systems, Part 4: Multi-Vector Search

After implementing vector search systems at multiple companies, I wanted to document efficient techniques that can be very helpful for successful production deployments of vector search systems. I want to present these techniques by showcasing when to apply each one, how they complement each other, and the trade-offs they introduce. This will be a multi-part series that introduces the techniques one by one in each article. I have also included code snippets to quickly test each technique. Before we get into the real details, let us look at the prerequisites and setup. For ease of understanding and use, I am using the free cloud tier from Qdrant for all of the demonstrations below.

Steps to Set Up Qdrant Cloud

Step 1: Get a Free Qdrant Cloud Cluster

1. Sign up at https://cloud.qdrant.io.
2. Create a free cluster: Click "Create Cluster," select Free Tier, choose a region closest to you, and wait for the cluster to be provisioned.
3. Capture your credentials:
   - Cluster URL: https://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east.aws.cloud.qdrant.io:6333.
   - API Key: Click "API Keys" → "Generate" → Copy the key.

Step 2: Install Python Dependencies

PowerShell

pip install qdrant-client fastembed numpy python-dotenv

Recommended versions:

qdrant-client >= 1.7.0
fastembed >= 0.2.0
numpy >= 1.24.0
python-dotenv >= 1.0.0

Step 3: Set Environment Variables or Create a .env File

PowerShell

# Add to your ~/.bashrc or ~/.zshrc
export QDRANT_URL="https://your-cluster-url.cloud.qdrant.io:6333"
export QDRANT_API_KEY="your-api-key-here"

Create a .env file in the project directory with the following content. Remember to add .env to your .gitignore to avoid committing credentials.

PowerShell

# .env file
QDRANT_URL=https://your-cluster-url.cloud.qdrant.io:6333
QDRANT_API_KEY=your-api-key-here

Step 4: Verify Connection

We can verify the connection to the Qdrant cluster with the following script. From this point onward, I am assuming the .env setup is complete.
Python

from qdrant_client import QdrantClient
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Initialize client
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

# Test connection
try:
    collections = client.get_collections()
    print("Connected successfully!")
    print(f"Current collections: {len(collections.collections)}")
except Exception as e:
    print(f"Connection failed: {e}")
    print("Check your .env file has QDRANT_URL and QDRANT_API_KEY")

Expected output:

Plain Text

python verify-connection.py
Connected successfully!
Current collections: 2

Now that we have the setup out of the way, we can get into the meat of the article. Before the deep dive into multi-vector search, let us look at a high-level overview of the techniques we have covered so far or are about to cover in this multi-part series.

| Technique | Problems Solved | Performance Impact | Complexity |
| --- | --- | --- | --- |
| Hybrid Search | Purely semantic search misses exact matches. | Huge increase in accuracy, close to 16% | Medium |
| Binary Quantization | Memory costs scale linearly with data. | 40X memory reduction, 15% faster | Low |
| Filterable HNSW | Post-filtering wastes computation. | 5X faster filtered queries | Medium |
| Multi-Vector Search | A single embedding cannot capture the importance of various fields. | Handles queries across multiple fields (e.g., title vs. description); requires two times more storage. | Medium |
| Reranking | Vector search is optimized for speed over precision. | Deeper semantic understanding, 15-20% ranking improvement | High |

Keep in mind that production systems typically combine two to four of these techniques. For example, a typical e-commerce website might use hybrid search, binary quantization, and filterable HNSW. We covered Hybrid Search in the first part of the series, Binary Quantization in the second part, and filterable HNSW in the third part. In this part, we will cover multi-vector search.
Multi-Vector Search

Before we get into multi-vector search, we should understand that single-vector search treats all the text fields equally. The problem with this approach is that there is a high chance of missing the structural importance of the various fields. For example, a product titled "Engine Oxygen Sensor" is more important for keyword matching than a detailed description mentioning "sensor" buried in specifications.

High-Level Conceptual Flow Diagram for Multi-Vector Search

Let us look at how the query vector is used with multiple fields and related vectors to arrive at a fusion score as an output. Let us now take a look at it in more detail with the code below.

Python

"""
Example usage of the multi_vector module.

This demonstrates Named Vectors (Multi-Field Vector Search) with Qdrant,
and shows concrete value: when title-only or description-only search misses
relevant results, and how multi-vector fixes it.
"""
from multi_vector import (
    multi_vector_search,
    single_vector_search,
    display_multi_vector_results,
    get_qdrant_client,
    get_collection_vector_names,
    create_demo_collection,
    cleanup_demo_collection,
)
from dotenv import load_dotenv

load_dotenv()
client = get_qdrant_client()

EXISTING_COLLECTION_NAME = "automotive_parts"
DEMO_COLLECTION_NAME = "multi_vector_demo"

# --- Collection setup ---
available_vectors = get_collection_vector_names(EXISTING_COLLECTION_NAME, client)
print("=" * 80)
print("COLLECTION CONFIGURATION CHECK")
print("=" * 80)

use_demo_collection = False
if available_vectors:
    print(f"✓ Found named vectors in '{EXISTING_COLLECTION_NAME}': {', '.join(available_vectors)}")
    COLLECTION_NAME = EXISTING_COLLECTION_NAME
    vector_names = available_vectors[:2]
    if len(vector_names) == 1:
        vector_names = [vector_names[0], vector_names[0]]
    weights = {name: 1.0 / len(vector_names) for name in vector_names}
else:
    print(f"⚠️ No named vectors in '{EXISTING_COLLECTION_NAME}'. Using demo collection.")
    print()
    if create_demo_collection(DEMO_COLLECTION_NAME, client, force_recreate=False):
        COLLECTION_NAME = DEMO_COLLECTION_NAME
        vector_names = ["title", "description"]
        weights = {"title": 0.6, "description": 0.4}
        use_demo_collection = True
        print("Demo collection ready. Running value demonstrations.\n")
    else:
        print("Failed to create demo collection. Exiting.")
        exit(1)

LIMIT = 5

def _name(payload):
    return payload.get("part_name", payload.get("name", "Unknown"))

def run_value_demo(query: str, title_hint: str, description_hint: str):
    """Run title-only, description-only, and multi-vector search; show misses and value."""
    title_results = single_vector_search(
        COLLECTION_NAME, query, vector_name="title", client=client, limit=LIMIT
    )
    desc_results = single_vector_search(
        COLLECTION_NAME, query, vector_name="description", client=client, limit=LIMIT
    )
    multi_results = multi_vector_search(
        COLLECTION_NAME, query, vector_names=vector_names, weights=weights,
        client=client, limit=LIMIT
    )
    title_ids = {r["id"] for r in title_results}
    desc_ids = {r["id"] for r in desc_results}
    # Items in description top but not in title top → "missed by title-only"
    missed_by_title = [r for r in desc_results if r["id"] not in title_ids]
    # Items in title top but not in description top → "missed by description-only"
    missed_by_desc = [r for r in title_results if r["id"] not in desc_ids]
    # Items in multi top 3 that weren't in both single-field top 3
    title_top3_ids = {r["id"] for r in title_results[:3]}
    desc_top3_ids = {r["id"] for r in desc_results[:3]}
    both_top3 = title_top3_ids & desc_top3_ids
    multi_only_top = [r for r in multi_results[:3] if r["id"] not in both_top3]

    print(f"Query: \"{query}\"")
    print("-" * 80)
    print("Title-only (top {}):".format(LIMIT))
    for i, r in enumerate(title_results, 1):
        print(f"  {i}. {_name(r['payload'])} (score: {r['score']:.4f})")
    print()
    print("Description-only (top {}):".format(LIMIT))
    for i, r in enumerate(desc_results, 1):
        print(f"  {i}. {_name(r['payload'])} (score: {r['score']:.4f})")
    print()
    print("Multi-vector / fused (top {}):".format(LIMIT))
    for i, r in enumerate(multi_results, 1):
        print(f"  {i}. {_name(r['payload'])} (fused: {r['score']:.4f})")
    print()
    if missed_by_title:
        print("Value of multi-vector:")
        print(f"  • Found by DESCRIPTION but not in title-only top {LIMIT}:")
        for r in missed_by_title[:3]:
            print(f"    - {_name(r['payload'])} (description score: {r['score']:.4f})")
        print(f"    → {title_hint}")
    if missed_by_desc:
        print(f"  • Found by TITLE but not in description-only top {LIMIT}:")
        for r in missed_by_desc[:3]:
            print(f"    - {_name(r['payload'])} (title score: {r['score']:.4f})")
        print(f"    → {description_hint}")
    if not missed_by_title and not missed_by_desc and multi_only_top:
        print("Value of multi-vector:")
        print("  • Multi-vector ranking surfaces the best overall match even when single-field rankings differ.")
    print()

# --- Example 1: Query where DESCRIPTION field shines ---
print("=" * 80)
print("EXAMPLE 1: Query Where DESCRIPTION Field Adds Value")
print("=" * 80)
print("Query: \"device that monitors exhaust gases\"")
print("Many relevant items describe exhaust monitoring in the description but not in the short title.")
print("Title-only search can miss them; description-only and multi-vector find them.\n")
run_value_demo(
    "device that monitors exhaust gases",
    title_hint="Title-only misses these because 'exhaust' is in the description, not the title.",
    description_hint="Description-only finds these; multi-vector combines both signals.",
)

# --- Example 2: Query where TITLE field shines ---
print("=" * 80)
print("EXAMPLE 2: Query Where TITLE Field Adds Value")
print("=" * 80)
print("Query: \"brake pad\"")
print("Products with 'brake pad' in the title are easy for title search; description may be generic.")
print("Title-only finds them; multi-vector keeps them at the top.\n")
run_value_demo(
    "brake pad",
    title_hint="Description-only may rank these lower; title has the exact phrase.",
    description_hint="Multi-vector keeps title matches at the top while still using description signal.",
)

# --- Example 3: Query that needs DESCRIPTION (short title) ---
print("=" * 80)
print("EXAMPLE 3: Query That Matches DESCRIPTION, Not Title")
print("=" * 80)
print("Query: \"device that measures air flow\"")
print("MAF Sensor has a short title ('MAF Sensor'); the phrase 'air flow' is in the description.")
print("Description-only finds it; title-only may miss or rank it lower.\n")
run_value_demo(
    "device that measures air flow",
    title_hint="Title 'MAF Sensor' doesn't contain 'air flow'; description does.",
    description_hint="Multi-vector finds this item by combining title and description.",
)

# --- Example 4: Multi-vector search only (summary) ---
print("=" * 80)
print("EXAMPLE 4: Multi-Vector Search (Single View)")
print("=" * 80)
print("One query, multi-vector result: combines title and description for best relevance.\n")
multi_only = multi_vector_search(
    collection_name=COLLECTION_NAME,
    query="engine sensor",
    vector_names=vector_names,
    weights=weights,
    client=client,
    limit=5,
)
display_multi_vector_results(
    multi_only,
    "engine sensor",
    show_fields=["part_name", "part_id", "category", "description"],
)

# --- Summary ---
print("\n" + "=" * 80)
print("SUMMARY: When Multi-Vector Search Adds Value")
print("=" * 80)
print("""
• Example 1: "exhaust monitoring" → Description field finds O2/exhaust items
  that title-only misses. Multi-vector includes them and ranks well.
• Example 2: "brake pad" → Title field finds brake pads; multi-vector keeps
  them at top while still using description.
• Example 3: "measures air flow" → Description finds MAF/air flow sensor
  (title is just "MAF Sensor"). Multi-vector combines both.
• Takeaway: Users search in different ways. Single-vector (title OR description)
  can miss relevant results. Multi-vector (title + description, fused) covers
  both short/keyword and detailed/contextual queries.
""")

if use_demo_collection:
    print("=" * 80)
    print("DEMO COLLECTION")
    print("=" * 80)
    print(f"Collection '{DEMO_COLLECTION_NAME}' is available for further runs.")
    print(f"To delete: cleanup_demo_collection('{DEMO_COLLECTION_NAME}', client)")
    print(f"To force refresh data: create_demo_collection('{DEMO_COLLECTION_NAME}', client, force_recreate=True)")

Now let us look at the output of the multi-vector search.

Plain Text

================================================================================
EXAMPLE 1: Query Where DESCRIPTION Field Adds Value
================================================================================
Query: "device that monitors exhaust gases"
Many relevant items describe exhaust monitoring in the description but not in the short title.
Title-only search can miss them; description-only and multi-vector find them.

Query: "device that monitors exhaust gases"
--------------------------------------------------------------------------------
Title-only (top 5):
  1. Air Flow Meter (score: 0.4396)
  2. Engine Oxygen Sensor (score: 0.4032)
  3. Knock Sensor (score: 0.4024)
  4. Coolant Temperature Sensor (score: 0.3895)
  5. O2 Sensor (score: 0.3571)

Description-only (top 5):
  1. O2 Sensor (score: 0.8199)
  2. Exhaust Oxygen Sensor (score: 0.6804)
  3. Catalytic Converter (score: 0.6628)
  4. Engine Oxygen Sensor (score: 0.6308)
  5. Air Flow Meter (score: 0.6086)

Multi-vector / fused (top 5):
  1. O2 Sensor (fused: 0.5422)
  2. Air Flow Meter (fused: 0.5072)
  3. Engine Oxygen Sensor (fused: 0.4943)
  4. Exhaust Oxygen Sensor (fused: 0.4761)
  5. Coolant Temperature Sensor (fused: 0.4602)

Value of multi-vector:
  • Found by DESCRIPTION but not in title-only top 5:
    - Exhaust Oxygen Sensor (description score: 0.6804)
    - Catalytic Converter (description score: 0.6628)
    → Title-only misses these because 'exhaust' is in the description, not the title.
  • Found by TITLE but not in description-only top 5:
    - Knock Sensor (title score: 0.4024)
    - Coolant Temperature Sensor (title score: 0.3895)
    → Description-only finds these; multi-vector combines both signals.

================================================================================
EXAMPLE 2: Query Where TITLE Field Adds Value
================================================================================
Query: "brake pad"
Products with 'brake pad' in the title are easy for title search; description may be generic.
Title-only finds them; multi-vector keeps them at the top.

Query: "brake pad"
--------------------------------------------------------------------------------
Title-only (top 5):
  1. Performance Brake Pads (score: 0.6525)
  2. Brake Pad Set (score: 0.6431)
  3. Brake Rotor (score: 0.5653)
  4. Shock Absorber (score: 0.2641)
  5. Radiator (score: 0.1838)

Description-only (top 5):
  1. Brake Rotor (score: 0.3998)
  2. Performance Brake Pads (score: 0.3483)
  3. Shock Absorber (score: 0.3471)
  4. Brake Pad Set (score: 0.3052)
  5. Catalytic Converter (score: 0.1464)

Multi-vector / fused (top 5):
  1. Performance Brake Pads (fused: 0.5308)
  2. Brake Pad Set (fused: 0.5080)
  3. Brake Rotor (fused: 0.4991)
  4. Shock Absorber (fused: 0.2973)
  5. Radiator (fused: 0.1560)

Value of multi-vector:
  • Found by DESCRIPTION but not in title-only top 5:
    - Catalytic Converter (description score: 0.1464)
    → Description-only may rank these lower; title has the exact phrase.
  • Found by TITLE but not in description-only top 5:
    - Radiator (title score: 0.1838)
    → Multi-vector keeps title matches at the top while still using description signal.

================================================================================
EXAMPLE 3: Query That Matches DESCRIPTION, Not Title
================================================================================
Query: "device that measures air flow"
MAF Sensor has a short title ('MAF Sensor'); the phrase 'air flow' is in the description.
Description-only finds it; title-only may miss or rank it lower.

Query: "device that measures air flow"
--------------------------------------------------------------------------------
Title-only (top 5):
  1. Air Flow Meter (score: 0.5513)
  2. O2 Sensor (score: 0.3919)
  3. Mass Air Flow Sensor (score: 0.3841)
  4. Engine Oxygen Sensor (score: 0.3820)
  5. Exhaust Oxygen Sensor (score: 0.3720)

Description-only (top 5):
  1. Air Flow Meter (score: 0.6539)
  2. O2 Sensor (score: 0.6490)
  3. Exhaust Oxygen Sensor (score: 0.5451)
  4. Mass Air Flow Sensor (score: 0.5413)
  5. Engine Oxygen Sensor (score: 0.4760)

Multi-vector / fused (top 5):
  1. Air Flow Meter (fused: 0.5923)
  2. O2 Sensor (fused: 0.4947)
  3. Mass Air Flow Sensor (fused: 0.4470)
  4. Exhaust Oxygen Sensor (fused: 0.4412)
  5. Engine Oxygen Sensor (fused: 0.4196)

Value of multi-vector:
  • Multi-vector ranking surfaces the best overall match even when single-field rankings differ.

================================================================================
EXAMPLE 4: Multi-Vector Search (Single View)
================================================================================
One query, multi-vector result: combines title and description for best relevance.

Multi-Vector Search Results for: 'engine sensor'
================================================================================
Found 5 results using multi-vector search (weighted fusion)

1. Oxygen Sensor for Engine
   Part_name: Engine Oxygen Sensor
   Part_id: DEMO-007
   Category: Engine Components
   Description: High-precision oxygen sensor for engine exhaust monitoring. Detects oxygen levels in exhaust gases t...
   Fused Score: 0.7479
--------------------------------------------------------------------------------
2. Engine Knock Sensor
   Part_name: Knock Sensor
   Part_id: DEMO-009
   Category: Engine Components
   Description: Piezoelectric sensor that detects engine knock or detonation. Protects engine by adjusting ignition ...
   Fused Score: 0.6744
--------------------------------------------------------------------------------
3. Engine Coolant Temperature Sensor
   Part_name: Coolant Temperature Sensor
   Part_id: DEMO-008
   Category: Engine Components
   Description: Thermistor sensor that monitors coolant temperature. Sends signal to ECU for fuel and ignition tunin...
   Fused Score: 0.6651
--------------------------------------------------------------------------------
4. O2 Sensor
   Part_name: O2 Sensor
   Part_id: DEMO-003
   Category: Engine Components
   Description: Device that monitors oxygen levels in exhaust gases. Critical for fuel mixture control and emission ...
   Fused Score: 0.5716
--------------------------------------------------------------------------------
5. Wideband Oxygen Sensor
   Part_name: Exhaust Oxygen Sensor
   Part_id: DEMO-004
   Category: Engine Components
   Description: Precision sensor for monitoring exhaust gas composition. Used for engine tuning and emission diagnos...
   Fused Score: 0.5532
--------------------------------------------------------------------------------

================================================================================
SUMMARY: When Multi-Vector Search Adds Value
================================================================================
• Example 1: "exhaust monitoring" → Description field finds O2/exhaust items
  that title-only misses. Multi-vector includes them and ranks well.
• Example 2: "brake pad" → Title field finds brake pads; multi-vector keeps
  them at top while still using description.
• Example 3: "measures air flow" → Description finds MAF/air flow sensor
  (title is just "MAF Sensor"). Multi-vector combines both.
• Takeaway: Users search in different ways. Single-vector (title OR description)
  can miss relevant results. Multi-vector (title + description, fused) covers
  both short/keyword and detailed/contextual queries.

Benefits

As you can clearly see from the results, multi-vector search handles different query styles seamlessly.
Some queries work well with title-only vectors while description-only struggles; for others, the title adds little and the description vector is the strongest signal. Fused scoring lets multi-vector search adapt to the query style automatically and helps prevent missed results regardless of how users phrase their searches.

Costs

The biggest cost driver for multi-vector search is the roughly 2x storage overhead: in the results shown above, the title vector and the description vector must be stored separately. If, for example, the vector data for one million parts takes 1.5 GB with a single field, the same catalog needs about 3 GB with two. Indexing time also grows with each additional embedded field, and query latency increases considerably because of the dual vector searches and the fusion-scoring logic. Fusion scoring also adds complexity to the search logic, and tuning it depends on the specific use case.

When to Use

- You have structured product data with many distinct fields.
- Search behavior differs across users.
- Relevant product information is spread across more than one field, and that detail is needed.
- Latency and complexity are manageable, and storage is not a concern.

When Not to Use

- There is no structural distinction between fields.
- Most fields are semantically similar.
- Latency or storage constraints are tight.
- A single vector already does the job.

Efficiency Comparison (From the Results)

Let us quickly compare the efficiency based on the results.
Query Type | Title Only | Description Only | Multi-Vector | Coverage
"brake pad" | 0.6525 | 0.3483 | 0.5308 | Preserves title quality
"monitor exhaust gases" | 0.4396 | 0.8199 | 0.5422 | Uses description signal
Coverage | 60% | 70% | 90% | Found items the others missed

Performance Characteristics

Based on the results, the performance characteristics are as follows:

Metric | Title Only | Description Only | Multi-Vector | Evidence From the Data
Short-query accuracy | Excellent | Poor | Excellent | "brake pad" search
Long-query accuracy | Poor | Excellent | Excellent | "monitors exhaust" search
Query latency | Low | Low | ~2x | Dual search plus fusion adds latency
Adaptability | Fixed | Fixed | Automatic | Adjusts to query style

Conclusion

Multi-vector search is primarily driven by the needs of the use case. It surfaces the value of individual fields: as the results show, title-only search completely missed matches that multi-vector search captured. If the application can accept roughly 2x the storage and the added query latency in exchange for comprehensive coverage across all query styles, multi-vector search is the way to go. In the next and final part of the series, we will look at reranking and recap all of the techniques and their applications.
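As a closing illustration, the weighted fusion behind the fused scores in these examples can be sketched in a few lines. This is an illustrative sketch only: the cosine-similarity metric, the equal 0.5 weights, and the plain in-memory vectors are assumptions, not the article's exact pipeline.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fused_score(query_vec, title_vec, desc_vec, w_title=0.5, w_desc=0.5):
    """Weighted fusion of per-field similarities into one ranking score."""
    return (w_title * cosine(query_vec, title_vec)
            + w_desc * cosine(query_vec, desc_vec))

# Toy 2-D vectors: the query aligns better with the description embedding.
q, title, desc = [1.0, 0.0], [0.6, 0.8], [1.0, 0.0]
print(round(fused_score(q, title, desc), 2))  # 0.8 = 0.5*0.6 + 0.5*1.0
```

In production, the two similarities would come from two separate vector-index queries (one per field), with the weights tuned per use case.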

By Pavan Vemuri
Standards as the Invisible Infrastructure of Software

Standards are often treated as bureaucracy — something slow, heavy, and occasionally disconnected from “real engineering.” Yet if we look at history with a bit more rigor, that narrative collapses quickly. Software development is not exempt from the forces that shaped civilization. It is, in fact, one of the latest chapters in a very old story: the story of standardization as a multiplier of human capability. The word “standard” itself comes from the Old French estandart — a rallying flag, a visible sign under which people gather. That origin is revealing. A standard is not merely documentation; it is a coordination mechanism. It allows independent actors to align around shared expectations. Without that alignment, scaling knowledge becomes nearly impossible. Human progress has always depended on standards. Writing systems transformed ephemeral speech into persistent knowledge. Once symbols were formalized, ideas could survive generations. When Isaac Newton famously wrote that he stood “on the shoulders of giants,” he was acknowledging accumulated knowledge preserved through standardized language, notation, and scholarly norms. Mathematics itself works only because its symbols and rules are shared and stable. The Industrial Revolution amplified this principle. Mass production did not emerge merely from better machines; it required interchangeable parts. That meant tolerances, measurements, and repeatable specifications. A bolt had to fit a nut regardless of where it was manufactured. Standards turned craftsmanship into scalable industry. Why should software be different? In the browser you are using right now, a silent agreement is in effect: the World Wide Web Consortium defines the specifications for HTML, CSS, and related technologies. Browser vendors interpret and implement these documents independently, yet the web works because they converge on shared behavior. Imperfectly? Sometimes. 
But without a common reference, the web would fragment into incompatible silos. A typical software standard has multiple structural elements. First, there is the specification — the formal description of APIs, semantics, and expected behavior. A specification is not an implementation. It defines what must happen, not how it must be achieved. Second, there is the committee or expert group responsible for evolving that specification. This governance layer is often misunderstood. Its purpose is not control; it is consensus-building. Diverse stakeholders — vendors, independent experts, and community representatives — debate trade-offs so that no single company dictates the ecosystem. Third, there are vendors or implementers. These are the runtime engines, frameworks, or tools that bring the specification to life. Multiple implementations introduce competition, innovation, and resilience. Finally, there is verification. In some ecosystems, partial conformance is tolerated. In others, compliance is binary. In the Java ecosystem, the Technology Compatibility Kit (TCK) model enforces strict alignment: you either pass all required tests, or you cannot claim compatibility. This binary approach dramatically reduces ambiguity and vendor fragmentation. To understand the power of standards in modern software, consider the Java platform. Java is not defined solely by a single company’s runtime. It is defined by specifications maintained through the Java Community Process. Multiple vendors implement the language and platform — yet all must adhere to the same specification and compatibility requirements. This is what makes Java portable in practice rather than merely in marketing. Without a standard, “Java” would become a brand attached to incompatible dialects. With a standard, it becomes a contract. The same architectural logic applies to Jakarta EE, stewarded by the Eclipse Foundation. Jakarta EE is not just a collection of APIs. 
It is a coordinated effort: specifications, open governance, multiple compatible implementations, and TCK validation. This structure enables innovation at the implementation layer while protecting portability at the application layer. Standards are not limited to languages and platforms. They shape architectural thinking itself. Design patterns, for example, became influential not because they were mandated, but because they were documented, named, and conceptually standardized. A shared vocabulary — “Factory,” “Strategy,” “Observer” — allows teams across continents to collaborate efficiently. In architecture, REST became dominant because Roy Fielding formalized its constraints. Naming and formalization are acts of standardization. At this point, it is worth distinguishing between related but distinct concepts: open source and open standards. Open source refers to software whose source code is publicly available, typically under licenses that allow inspection, modification, and redistribution. It is an implementation model. An open standard, on the other hand, refers to a publicly accessible specification developed through a transparent and inclusive process. It is a governance model. You can have open source without an open standard — a single project, fully transparent, but controlled by one vendor without formal specification processes. You can also have an open standard implemented by proprietary software. The most resilient ecosystems combine both: open governance for the specification and open implementations that compete on quality and performance. When these two forces align, something powerful happens. Developers gain portability. Organizations reduce vendor lock-in. Innovation accelerates because differentiation shifts from reinventing interfaces to optimizing implementations. Standards also provide long-term stability. In a world obsessed with rapid iteration, it is tempting to view standards as a form of friction. 
But from an engineering perspective, stability is not the enemy of innovation; it is its enabler. When APIs remain predictable, teams can invest in higher-level abstractions, tooling, and architecture without fear of constant foundational shifts. Critics often argue that standards slow progress. The more precise question is: slow progress compared to what? A proprietary ecosystem may move faster initially, but it risks fragmentation, lock-in, and incompatibility. Standards introduce negotiation and consensus, which can feel slower, yet they produce ecosystems that endure for decades. Consider the analogy with manufacturing. Interchangeable parts may have required additional upfront coordination, but they enabled exponential scaling. Software standards function similarly. They are coordination overhead that unlocks systemic scale. For software engineers, standards are not merely abstract governance constructs. They are daily tools. Every HTTP request relies on standardized semantics. Every SQL query depends on decades of evolving agreement. Every JVM-based application trusts compatibility guarantees defined outside its codebase. If we step back, the skeptical question becomes unavoidable: what would modern software look like without standards? Likely a patchwork of incompatible protocols, isolated frameworks, and fragile integrations. Standards are not glamorous. They rarely trend on social media. But they are the invisible infrastructure that allows distributed collaboration at a planetary scale. They enable thousands of engineers, across companies and continents, to build systems that interoperate reliably. In the end, standards are not about restriction; they are about shared foundations. They embody the same principle that enabled writing, science, and industrialization: codify knowledge, align expectations, and allow independent actors to build upon a stable base. Software is no different from any other engineering discipline. 
If history teaches us anything, it is this: progress scales when agreement scales. Standards are how agreement becomes durable.

By Otavio Santana
Understanding Custom Authorization Mechanisms in Amazon API Gateway and AWS AppSync

AWS provides Lambda-based authorization capabilities for both API Gateway and AppSync, each designed to secure a different API paradigm; their roles are complementary rather than competing. Amazon API Gateway positions Lambda authorizers as a security checkpoint between incoming requests and backend integrations — whether Lambda functions or HTTP endpoints. The authorizer validates credentials, executes custom authentication workflows, and produces IAM policy documents that explicitly grant or deny access. These policies guide API Gateway’s decision to forward or reject requests to backend services. In contrast, AppSync integrates Lambda authorizers directly into the GraphQL request lifecycle, intercepting operations before resolver execution. The authorizer examines request credentials (tokens, headers, or other authentication artifacts) and returns an identity context object upon successful validation. This context propagates through the resolver chain, enabling fine-grained, context-aware authorization decisions at the data access layer.

Common Characteristics

The shared attributes of Lambda authorizers in AppSync and API Gateway emphasize their core security capabilities:

- Security Enforcement: Both validate incoming requests and enforce access-control decisions, ensuring requesters possess sufficient privileges to access protected resources.
- Serverless Execution Model: Authorization logic runs within AWS Lambda functions in both cases, providing a consistent serverless compute foundation for custom authentication workflows.
- Flexible Authorization Logic: Developers can implement bespoke authentication and authorization rules within the Lambda function to meet diverse security requirements and business needs.
- AWS Service Integration: Both authorizer types integrate seamlessly with the broader AWS ecosystem. AppSync can leverage DynamoDB or Lambda as data sources, while API Gateway connects to Lambda functions and HTTP endpoints.
- IAM-Based Access Decisions: Both services use IAM policy frameworks to govern access permissions. The authorizer generates policy documents that explicitly define allowed or denied actions for authenticated principals.

Key Distinctions

These distinctions clarify the operational scope, timing, and architecture of Lambda authorizers in AppSync versus API Gateway:

- Functional Scope: AppSync authorizers operate at the GraphQL operation layer, making access decisions before resolver invocation. API Gateway authorizers function at the API route level, protecting endpoints before backend integration occurs.
- Execution Timing and Context Flow: AppSync authorizers produce an identity context object that propagates through the resolver chain. API Gateway triggers authorizers before backend integration, generating IAM policies that determine request disposition.
- Architectural Integration: AppSync authorizers integrate directly into the GraphQL resolver pipeline within GraphQL-specific contexts. API Gateway authorizers function as infrastructure components alongside endpoints, deployment stages, and API configurations.
- Data Format Expectations: AppSync authorizers process GraphQL requests, accessing operation types, field selections, and input arguments. API Gateway authorizers handle requests in formats dictated by integration types — JSON, form data, or others.
- Authorization Output Patterns: AppSync authorizers return identity context objects, enabling resolvers to make granular decisions. API Gateway authorizers generate explicit IAM policy documents defining precise access permissions for API resources.

These architectural differences influence implementation strategies and should guide service selection based on API paradigms, use cases, and authorization needs.
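The contrast in output patterns can be made concrete with two minimal response skeletons. This is an illustrative sketch: the helper function names and placeholder values are mine, but the field names follow the documented authorizer contracts for each service.

```python
# API Gateway Lambda authorizer output: an explicit IAM policy document.
def api_gateway_response(principal_id: str, method_arn: str) -> dict:
    return {
        "principalId": principal_id,  # identifies the caller for auditing
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow",        # or "Deny"
                "Resource": method_arn,   # the API route being protected
            }],
        },
    }

# AppSync Lambda authorizer output: a boolean decision plus resolver context.
def appsync_response(user_id: str) -> dict:
    return {
        "isAuthorized": True,                    # the access decision itself
        "resolverContext": {"userId": user_id},  # propagated to resolvers
    }
```

The first shape tells API Gateway what the caller may invoke; the second tells AppSync whether to proceed and what identity data the resolvers should see.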
Amazon API Gateway Authorization Flow

1. The client initiates a request to the API Gateway endpoint.
2. The Lambda authorizer intercepts the request, executing authentication and authorization validation.
3. Upon successful authorization, the authorizer returns an IAM policy document defining the access scope.
4. API Gateway evaluates the policy, allowing or denying the request.
5. Approved requests are forwarded to the backend integration (Lambda, HTTP endpoint, etc.).
6. The backend processes the request and generates a response payload.
7. API Gateway returns the backend response to the client.

Lambda Authorizer Contract for API Gateway

Input parameters:

- event: Request metadata object containing HTTP headers, method, path parameters, and request context.
- context: Lambda execution context providing AWS account ID, request ID, and function metadata.

Output parameters:

- principalId: String identifier representing the authenticated principal for tracking and auditing.
- policyDocument: IAM policy document defining allowed or denied actions on API resources.
- context: Optional key-value object for passing additional metadata downstream.

AWS AppSync Authorization Flow

1. The client submits a GraphQL request to the AppSync API endpoint.
2. The Lambda authorizer intercepts the request, performing authentication and authorization validation.
3. Successful authorization produces an identity context object.
4. The request proceeds to the appropriate GraphQL resolver with the identity context.
5. The resolver executes data source operations (DynamoDB queries, Lambda invocations, etc.).
6. The resolver returns a GraphQL response to the client.

Lambda Authorizer Contract for AppSync

Input parameters:

- event: GraphQL request object containing operation payload, HTTP headers, and request metadata.
- context: Lambda execution context providing request ID, function version, and runtime information.
Output parameters:

- isAuthorized: Boolean flag indicating the authorization decision (true/false).
- context: Identity context object propagated to downstream resolvers.
- resolverConfig: Optional object for customizing resolver behavior, including caching strategies and field-level authorization rules.

Use Case Scenarios

Lambda authorizers address diverse authentication and authorization requirements across both services. The following scenarios illustrate practical applications for each implementation.

API Gateway Lambda authorizer applications:

- Token-Based RESTful API Security: When building RESTful APIs that require route-specific protection, Lambda authorizers validate authentication tokens and generate IAM policies that control resource access. This pattern suits applications needing granular endpoint-level authorization with explicit access control definitions.
- External Authentication Provider Integration: Organizations that leverage OAuth 2.0, OpenID Connect, or other third-party identity providers can implement Lambda authorizers to validate external tokens, perform claims-based authorization, and map external identities to AWS IAM policies for seamless integration.
- Complex Business Rule Authorization: Applications that require authorization beyond standard IAM capabilities benefit from Lambda authorizers that can invoke external services, evaluate business rules, query databases, and apply multidimensional access control policies before granting API access.

AppSync Lambda authorizer applications:

- Field-Level GraphQL Authorization: GraphQL APIs requiring fine-grained data access control leverage AppSync Lambda authorizers to enforce schema-specific authorization rules. The authorizer validates credentials and generates identity contexts that enable resolvers to make field-level and type-level access decisions based on user attributes.
- Multi-Factor Authentication Workflows: GraphQL applications implementing MFA can use Lambda authorizers to orchestrate multistep verification processes, integrating with external MFA providers, validating time-based tokens, or enforcing adaptive authentication policies based on request context and risk assessment.
- Federated Identity Resolution: Complex identity scenarios involving multiple authentication sources, user attribute aggregation, or cross-system identity federation benefit from AppSync Lambda authorizers that can resolve identities across disparate systems, enrich user contexts, and provide unified identity information to GraphQL resolvers.

API Gateway authorizers excel in RESTful API protection, third-party authentication integration, and policy-based access control. AppSync authorizers specialize in GraphQL-specific authorization patterns, context-aware data access control, and identity-enriched resolver workflows.
Quick Comparison

Feature | API Gateway | AppSync
API Type | RESTful APIs | GraphQL APIs
Authorization Level | Route/endpoint level | Operation/field level
Output Format | IAM policy document | Identity context object
Caching | Built-in (TTL: 0-3600s) | Not built-in
Timeout | 30 seconds max | 10 seconds max
Use Case | Token validation, OAuth | Fine-grained data access
Integration Point | Before backend | Before resolver
Policy Generation | Explicit IAM policies | Context-based decisions

Implementation Examples

API Gateway Lambda Authorizer (Python)

Python
import jwt
from typing import Dict, Any


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """API Gateway Lambda authorizer for JWT token validation."""
    try:
        # Extract token from the Authorization header
        token = event['authorizationToken'].replace('Bearer ', '')

        # Validate the JWT (simplified - use proper key management in production)
        decoded = jwt.decode(token, 'your-secret-key', algorithms=['HS256'])

        # Extract user information
        principal_id = decoded['sub']

        # Generate the IAM policy
        policy = generate_policy(principal_id, 'Allow', event['methodArn'])

        # Add context data (available to the backend integration)
        policy['context'] = {
            'userId': decoded['sub'],
            'email': decoded.get('email', ''),
            'role': decoded.get('role', 'user')
        }
        return policy

    except jwt.ExpiredSignatureError:
        raise Exception('Unauthorized: Token expired')
    except jwt.InvalidTokenError:
        raise Exception('Unauthorized: Invalid token')
    except Exception as e:
        raise Exception(f'Unauthorized: {str(e)}')


def generate_policy(principal_id: str, effect: str, resource: str) -> Dict[str, Any]:
    """Generate an IAM policy document."""
    return {
        'principalId': principal_id,
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [
                {
                    'Action': 'execute-api:Invoke',
                    'Effect': effect,
                    'Resource': resource
                }
            ]
        }
    }

AppSync Lambda Authorizer (Node.js)

JavaScript
const jwt = require('jsonwebtoken');

exports.handler = async (event) => {
  try {
    // Extract token from request headers
    const token = event.authorizationToken.replace('Bearer ', '');

    // Validate the JWT
    const decoded = jwt.verify(token, process.env.JWT_SECRET);

    // Return the authorization response with identity context
    return {
      isAuthorized: true,
      resolverContext: {
        userId: decoded.sub,
        email: decoded.email,
        roles: decoded.roles || [],
        permissions: decoded.permissions || []
      },
      deniedFields: [],   // Optional: fields to deny access to
      ttlOverride: 300    // Optional: cache TTL in seconds
    };
  } catch (error) {
    console.error('Authorization failed:', error.message);

    // Return an unauthorized response
    return {
      isAuthorized: false,
      resolverContext: {},
      deniedFields: ['*'] // Deny all fields
    };
  }
};

Performance and Cost Considerations

Caching Strategies

- API Gateway: Built-in authorizer result caching (TTL: 0-3600 seconds) significantly reduces Lambda invocations and latency.
- AppSync: No built-in caching; implement custom caching in the authorizer using ElastiCache or DynamoDB.
- Recommendation: Enable caching for high-traffic APIs to reduce costs and improve response times.

Cold Start Impact

- Lambda cold starts add 100-500 ms of latency to authorization.
- Mitigation: Use provisioned concurrency for critical APIs or implement connection pooling.
- Consider keeping authorizer functions warm with scheduled invocations.

Cost Optimization

- API Gateway caching reduces Lambda invocations by 90%+ for repeated requests.
- Typical cost: $0.20 per million authorizer invocations.
- An 80% cache hit rate can cut authorization costs by about $0.16 per million requests.

Security Best Practices

Token Validation

- Always validate token signatures using proper cryptographic libraries.
- Verify token expiration (exp claim) and not-before (nbf claim).
- Check the token issuer (iss) and audience (aud) claims.
- Implement token revocation checking for critical operations.

Secret Management

- Store JWT secrets and API keys in AWS Secrets Manager or Parameter Store.
- Rotate secrets regularly using automated rotation policies.
- Never hardcode secrets in Lambda function code.
- Use IAM roles for Lambda to access secrets securely.

Error Handling

- Return generic error messages to clients (avoid leaking security details).
- Log detailed error information to CloudWatch for debugging.
- Implement rate limiting to prevent brute-force attacks.
- Use AWS WAF for additional protection against common attacks.

Example: Secure Secret Retrieval

Python
import boto3
import json
from functools import lru_cache

@lru_cache(maxsize=1)
def get_jwt_secret():
    """Retrieve and cache the JWT secret from Secrets Manager."""
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId='jwt-secret')
    return json.loads(response['SecretString'])['secret']

Error Handling and Responses

API Gateway Error Responses

When authorization fails, API Gateway returns:

- 401 Unauthorized: Invalid or missing token
- 403 Forbidden: Valid token but insufficient permissions
- 500 Internal Server Error: Authorizer function error

AppSync Error Responses

AppSync returns GraphQL errors:

JSON
{
  "errors": [
    {
      "message": "Unauthorized",
      "errorType": "Unauthorized",
      "locations": [
        { "line": 1, "column": 1 }
      ]
    }
  ]
}

Custom Error Handling

Python
from typing import Dict, Any

def handle_authorization_error(error_type: str) -> Dict[str, Any]:
    """Raise a generic error after logging the specific failure reason."""
    error_messages = {
        'expired': 'Token has expired',
        'invalid': 'Invalid token format',
        'missing': 'Authorization token required',
        'insufficient': 'Insufficient permissions'
    }
    # Log the detailed error for monitoring
    print(f"Authorization error: {error_messages.get(error_type, error_type)}")
    # Return a generic error to the client
    raise Exception('Unauthorized')

Limitations and Constraints

API Gateway Authorizer Limits

- Timeout: Maximum 30-second execution time
- Payload: Request/response limited to 10 KB
- Cache TTL: 0-3600 seconds (1 hour maximum)
- Concurrent executions: Subject to Lambda account limits

AppSync Authorizer Limits

- Timeout: Maximum 10-second execution time
- Payload: Request limited to 1 MB
- No built-in caching: Must implement custom caching
- Resolver context: Limited to 5 KB

Best Practices for Limits

- Keep authorizer logic lightweight and fast.
- Implement connection pooling for external service calls.
- Use caching aggressively to avoid timeout issues.
- Monitor CloudWatch metrics for timeout and error rates.

Monitoring and Debugging

Key Metrics to Monitor

- Invocation count: Track authorization request volume.
- Error rate: Monitor failed authorizations.
- Duration: Track authorization latency (target: <100 ms).
- Cache hit rate: For API Gateway authorizers.
- Throttles: Identify capacity issues.

CloudWatch Logs

Python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info(f"Authorization request: {event['methodArn']}")
    # ... authorization logic producing principal_id ...
    logger.info(f"Authorization granted for user: {principal_id}")

Debugging Tips

- Enable detailed CloudWatch logging for authorizer functions.
- Use AWS X-Ray for distributed tracing.
- Test authorizers locally using the SAM CLI.
- Implement structured logging for easier analysis.

Summary

Lambda authorizers serve distinct yet complementary roles across AWS's API service portfolio. The selection between AppSync and API Gateway authorizers hinges on your API architecture, authorization complexity, and integration requirements. API Gateway's Lambda authorizer implementation excels in RESTful API scenarios, providing robust route-level protection through IAM policy generation. The authorizer validates credentials before backend integration occurs, producing explicit policy documents that govern resource access.
This approach proves remarkably effective for third-party authentication integration (e.g., OAuth, OpenID Connect) and for custom authorization workflows that extend beyond standard IAM capabilities. Built-in caching capabilities make it cost-effective for high-traffic applications. AppSync's Lambda authorizer operates within the GraphQL paradigm, intercepting requests at the operation level before resolver execution. Rather than generating explicit IAM policies, it produces identity context objects that flow through the resolver pipeline, enabling granular, field-level authorization decisions. This model suits GraphQL APIs requiring fine-grained data access control and complex identity resolution workflows. The architectural distinctions between these implementations are substantial. AppSync authorizers integrate directly into the GraphQL resolver chain, processing GraphQL-formatted requests and returning context objects for downstream authorization logic. API Gateway authorizers function as infrastructure-level components, handling various request formats (JSON, form data) and producing IAM policies that determine the disposition of requests. Your implementation choice should align with your API paradigm: GraphQL applications benefit from AppSync's context-aware authorization model, while RESTful APIs leverage API Gateway's policy-based approach. Consider factors including authentication provider integration requirements, authorization granularity needs, caching requirements, timeout constraints, and the level of customization your security model demands. Both services provide serverless, scalable authorization mechanisms — the optimal choice depends on your specific architectural context and security requirements. When implementing Lambda authorizers, prioritize security best practices, including proper token validation, secret management, error handling, and monitoring. 
Leverage caching strategies to optimize performance and costs, and ensure your authorizer logic remains lightweight to avoid timeout issues. Regular monitoring of authorization metrics helps identify potential security issues and performance bottlenecks before they impact production systems.

References

AWS Official Documentation

API Gateway Lambda authorizers:
- Use API Gateway Lambda authorizers
- Lambda authorizer input and output format

AWS AppSync authorization:
- AWS AppSync Lambda authorizers
- AppSync authorization use cases

AWS Lambda:
- Lambda function handler in Python
- Lambda function handler in Node.js
- Best practices for working with AWS Lambda functions

Security and best practices:
- AWS Secrets Manager
- IAM JSON policy reference
- Security best practices in AWS Lambda

Monitoring and debugging:
- Monitoring REST APIs with CloudWatch
- Monitoring and logging for AWS AppSync
- Using AWS X-Ray with API Gateway

Additional Resources

- AWS API Gateway Pricing
- AWS AppSync Pricing
- AWS Lambda Pricing
- JWT.io - JSON Web Token Introduction
- OAuth 2.0 Authorization Framework

By Leslie Daniel Raj
