Language Resources

DZone's Featured Languages Resources

A Practical Guide to Building Generative AI in Java

By Xavier Portilla Edo

CORE

Building generative AI applications in Java used to be a complex, boilerplate-heavy endeavor. You’d wrestle with raw HTTP clients, hand-craft JSON payloads, parse streaming responses, manage API keys, and stitch together observability, all before writing a single line of actual AI logic. Those days are over. Genkit Java is an open-source framework that makes building AI-powered applications in Java as straightforward as defining a function. Pair it with Google’s Gemini models and Google Cloud Run, and you can go from zero to a production-deployed generative AI service in minutes, not days. This is a complete, working example. Clone it, set your API key, and run. Why Genkit Java? If you’re a Java developer, you’ve probably watched the GenAI revolution unfold mostly in Python and TypeScript. The tooling, frameworks, and tutorials are all skewed toward those ecosystems. Java developers were left to either build everything from scratch or use verbose, low-level SDKs. Genkit Java changes that. Here’s what makes it different: FeatureWithout GenkitWith GenkitCall GeminiManual HTTP client, JSON parsing, error handlinggenkit.generate(...), one method callExpose as APISet up Spring Boot, write controllers, handle serializationgenkit.defineFlow(...), auto-exposed as HTTP endpointStructured outputParse raw JSON strings, deserialize manuallyoutputClass(MyClass.class), Gemini returns typed Java objectsTool callingParse function call responses, execute tools, re-submitDefine tools with genkit.defineTool(...), automatic executionObservabilityManual OpenTelemetry setup, custom spans, metricsBuilt-in tracing, metrics, and latency tracking, zero configDev/test your flowscURL, Postman, write test harnessesGenkit DevUI, visual, interactive, built-in What We’re Building A Java application with a translation AI flow powered by Gemini via Genkit, showcasing: Typed flow inputs – TranslateRequest class with @JsonProperty annotations as the flow inputStructured LLM output – Gemini returns a TranslateResponse Java object directly (no manual JSON parsing)Typed flow outputs – The flow returns a fully typed TranslateResponse to the caller All of this in a single Java file + two model classes. No Spring Boot. No annotations soup. No XML configuration. Just clean, readable, type-safe code. Prerequisites Java 21+ (Eclipse Temurin recommended)Maven 3.6+Node.js 18+ (for the Genkit CLI)A Google GenAI API key (free from Google AI Studio)Google Cloud SDK (only for Cloud Run deployment) Install the Genkit CLI The Genkit CLI is your command-line companion for developing and testing AI flows. Install it globally: JavaScript npm install -g genkit Verify the installation: JavaScript genkit --version The CLI is what powers the DevUI and provides a seamless development experience, more on that below. Project Structure Plain Text genkit-java-getting-started/ ├── src/ │ └── main/ │ ├── java/ │ │ └── com/example/ │ │ ├── App.java # ← The main application │ │ ├── TranslateRequest.java # ← Typed flow input │ │ └── TranslateResponse.java # ← Typed flow + LLM output │ └── resources/ │ └── logback.xml # Logging configuration ├── pom.xml # Maven config with Genkit + Jib ├── run.sh # Quick-start script └── README.md # This article Getting Started 1. Clone and Set Your API Key Java git clone https://github.com/xavidop/genkit-java-getting-started.git cd genkit-java-getting-started 2. Run With the Genkit DevUI (Recommended) JavaScript genkit start -- mvn compile exec:java That’s it. Two commands. Your AI-powered Java server is running on http://localhost:8080, and the Genkit DevUI is available at http://localhost:4000. 3. Or Run Directly (Without DevUI) JavaScript mvn compile exec:java The Code, It’s Stupidly Simple Step 1: Define Typed Input/Output Classes Instead of using raw Map or String, define proper Java classes with Jackson annotations. Genkit uses these annotations to generate JSON schemas that tell Gemini exactly what structure to return. TranslateRequest.java, the flow input: Java import com.fasterxml.jackson.annotation.JsonProperty; import com.fasterxml.jackson.annotation.JsonPropertyDescription; /** * Input for the translate flow. */ public class TranslateRequest { @JsonProperty(required = true) @JsonPropertyDescription("The text to translate") private String text; @JsonProperty(required = true) @JsonPropertyDescription("The target language (e.g., Spanish, French, Japanese)") private String language; public TranslateRequest() {} public TranslateRequest(String text, String language) { this.text = text; this.language = language; } public String getText() { return text; } public void setText(String text) { this.text = text; } public String getLanguage() { return language; } public void setLanguage(String language) { this.language = language; } @Override public String toString() { return String.format("TranslateRequest{text='%s', language='%s'}", text, language); } TranslateResponse.java, the flow output, and the LLM structured output: Java import com.fasterxml.jackson.annotation.JsonProperty; import com.fasterxml.jackson.annotation.JsonPropertyDescription; /** * Structured output for the translate flow. */ public class TranslateResponse { @JsonProperty(required = true) @JsonPropertyDescription("The original text that was translated") private String originalText; @JsonProperty(required = true) @JsonPropertyDescription("The translated text") private String translatedText; @JsonProperty(required = true) @JsonPropertyDescription("The target language") private String language; public TranslateResponse() {} public TranslateResponse(String originalText, String translatedText, String language) { this.originalText = originalText; this.translatedText = translatedText; this.language = language; } public String getOriginalText() { return originalText; } public void setOriginalText(String originalText) { this.originalText = originalText; } public String getTranslatedText() { return translatedText; } public void setTranslatedText(String translatedText) { this.translatedText = translatedText; } public String getLanguage() { return language; } public void setLanguage(String language) { this.language = language; } @Override public String toString() { return String.format( "TranslateResponse{originalText='%s', translatedText='%s', language='%s'}", originalText, translatedText, language); } } The @JsonPropertyDescription annotations are key, Genkit passes them to Gemini as part of the JSON schema, so the model knows exactly what each field means. Step 2: Initialize Genkit Java Genkit genkit = Genkit.builder() .options(GenkitOptions.builder() .devMode(true) .reflectionPort(3100) .build()) .plugin(GoogleGenAIPlugin.create()) .plugin(jetty) That’s the entire setup. The GoogleGenAIPlugin reads your GOOGLE_API_KEY automatically. The JettyPlugin handles HTTP. Genkit wires everything together. Step 3: Define a Flow With Typed Classes and Structured Output Java genkit.defineFlow( "translate", TranslateRequest.class, // ← typed input TranslateResponse.class, // ← typed output (ctx, request) -> { String prompt = String.format( "Translate the following text to %s.\n\nText: %s", request.getLanguage(), request.getText() ); return genkit.generate( GenerateOptions.<TranslateResponse>builder() .model("googleai/gemini-3-flash-preview") .prompt(prompt) .outputClass(TranslateResponse.class) // ← Gemini returns a typed object! .config(GenerationConfig.builder() .temperature(0.1) .build()) .build() ); } ); Look at what’s happening here: TranslateRequest.class as the flow input, Genkit automatically deserializes incoming JSON into a TranslateRequest object. No Map.get() casting.TranslateResponse.class as the flow output, the flow returns a typed object, serialized automatically to JSON for the HTTP response.outputClass(TranslateResponse.class) on the generate call, this is the magic. Genkit sends the JSON schema derived from TranslateResponse to Gemini, and Gemini returns structured JSON that Genkit deserializes into a TranslateResponse object. No response.getText() + manual parsing. That single defineFlow call: Registers the flow in Genkit’s internal registryExposes it as a POST /api/flows/translate HTTP endpointMakes it visible in the DevUIAdds full OpenTelemetry tracing automaticallyTracks token usage, latency, and error rates Compare that to writing a Spring Boot controller + service + DTO + config + exception handler for the same functionality. The Genkit DevUI: Your AI Playground This is where Genkit truly shines for development. When you run with genkit start, the CLI launches a visual DevUI at http://localhost:4000. What Can You Do in the DevUI? Browse all flows. See every flow you’ve registered, like translate, with its typed input/output schemas.Run flows interactively. Fill in a TranslateRequest JSON, click “Run”, see the TranslateResponse instantly. No cURL needed.Inspect traces. Every flow execution is traced. See exactly which model was called, what the input/output was, how long it took, and how many tokens were used.View registered models and tools. See all available Gemini models and any tools you’ve defined.Test tool calling. Watch Gemini decide to call your tools in real-time.Manage datasets and evaluations. Create test datasets and evaluate your AI outputs. Deploying to Google Cloud Run The project uses Jib to build and push container images directly from Maven, no Dockerfile and no Docker daemon required. Jib is configured in the pom.xml and builds optimized, layered container images. Step-by-Step Deployment Shell # Set your GCP project export PROJECT_ID=$(gcloud config get-value project) export REGION=us-central1 # Build the container image and push it to Google Container Registry # No Docker needed, Jib does it all from Maven! mvn compile jib:build -Djib.to.image=gcr.io/$PROJECT_ID/genkit-java-app # Deploy to Cloud Run gcloud run deploy genkit-java-app \ --image gcr.io/$PROJECT_ID/genkit-java-app \ --region $REGION \ --platform managed \ --allow-unauthenticated \ --set-env-vars "GOOGLE_API_KEY=$GOOGLE_API_KEY" \ --memory 512Mi \ Two commands. No Docker. Your Java GenAI application is now live on a globally-distributed, auto-scaling, serverless platform. Why Jib? No Dockerfile – Container image is built directly from your Maven projectNo Docker daemon – Doesn’t require Docker installed or running on your machineFast rebuilds – Separates dependencies, classes, and resources into layers, so only changed layers are rebuiltReproducible – Builds are deterministic and don’t depend on the local Docker environmentDirect push – Sends the image straight to GCR/Artifact Registry without a local docker push You can also build a local Docker image (requires Docker running) with: Shell mvn compile jib:dockerBuild -Djib.to.image=genkit-java-app Available Flows and API Examples Once the server is running, test the translate flow: Translate Text Send a TranslateRequest JSON object and receive a structured TranslateResponse: Shell curl -X POST http://localhost:8080/api/flows/translate \ -H 'Content-Type: application/json' \ -d '{"text": "Building AI applications has never been easier", "language": "Spanish"}' Example response (a TranslateResponse object): JSON { "originalText": "Building AI applications has never been easier", "translatedText": "Construir aplicaciones de IA nunca ha sido tan fácil", "language": "Spanish" Try other languages: Shell # French curl -X POST http://localhost:8080/api/flows/translate \ -H 'Content-Type: application/json' \ -d '{"text": "Genkit makes Java AI development simple", "language": "French"}' # Japanese curl -X POST http://localhost:8080/api/flows/translate \ -H 'Content-Type: application/json' \ -d '{"text": "Hello world", "language": "Japanese"}' Notice how the response is always a structured JSON object, not a raw string. That’s the power of outputClass(TranslateResponse.class). Gemini returns structured data that Genkit deserializes into your Java class automatically. What Genkit Gives You for Free When you use Genkit, you’re not just getting a wrapper around API calls. You get a production-grade framework: Observability (Zero Config) Every flow execution is automatically traced with OpenTelemetry: Latency tracking per flow, per model callToken usage (input/output/thinking tokens)Error rates and failure trackingSpan hierarchy showing the full execution path Plugin Ecosystem Need to swap Gemini for another model? Change one line: Java // Switch from Gemini to OpenAI .plugin(OpenAIPlugin.create()) // Or use Anthropic Claude .plugin(AnthropicPlugin.create()) // Or run locally with Ollama Genkit supports 10+ model providers, vector databases (Pinecone, Weaviate, PostgreSQL), Firebase integration, and more. Type Safety This is where Genkit really shines for Java developers. Flows, generate calls, and even LLM responses are fully typed: Java // The flow takes a TranslateRequest and returns a TranslateResponse genkit.defineFlow("translate", TranslateRequest.class, TranslateResponse.class, ...); // The LLM returns a TranslateResponse directly, no string parsing genkit.generate( GenerateOptions.<TranslateResponse>builder() .outputClass(TranslateResponse.class) .build() ); Genkit derives JSON schemas from your @JsonProperty and @JsonPropertyDescription annotations and sends them to Gemini, so the model returns structured data that maps directly to your Java classes. No Object casting, no response.getText() + objectMapper.readValue(), no runtime surprises. What’s Next? This getting-started project covers the fundamentals. Genkit Java can do much more: RAG – Retrieval-augmented generation with vector stores (Firestore, Pinecone, pgvector, Weaviate)Multi-agent orchestration – Coordinate multiple AI agentsChat sessions – Multi-turn conversations with session persistenceEvaluations – RAGAS-style metrics to measure your AI output qualityMCP integration – Connect to Model Context Protocol serversSpring Boot – Use the Spring plugin instead of Jetty for existing Spring appsFirebase – Deploy as Cloud Functions with Firestore vector search Explore the full Genkit Java documentation and the samples directory to dive deeper. Conclusion As you can see, it is very easy to use Genkit Java and Gemini to build powerful generative AI applications with minimal code. The combination of typed inputs/outputs, structured LLM responses, built-in observability, and seamless deployment makes Genkit Java the best way to build GenAI features in Java. You can find the full code of this example in the GitHub repository. Happy coding! More

Cutting P99 Latency From ~3.2s To ~650ms in a Policy‑Driven Authorization API (Python + MongoDB)

By Rohit Muthyala

Modern authorization endpoints often do more than approve a request. They evaluate complex policies, compute rolling aggregates, call third‑party risk services, and enforce company/card limits, all under a hard latency budget. If you miss it, the transaction fails, and the failure is customer-visible. This post walks through a practical approach to take a Python authorization API from roughly ~3.2s P99 down to ~650ms P99, using a sequence of changes that compound: query/index correctness, deterministic query planning, connection pooling and warmup, and parallelizing third‑party I/O. 1. Baseline First: Measure Where The Tail Comes From Before optimizing, capture: End‑to‑end p50/p95/p99A dependency breakdown (DB vs application vs third‑party)Production-like load and dataset scale Minimal timing middleware (FastAPI): Python import time from fastapi import FastAPI, Request app = FastAPI() @app.middleware("http") async def timing_middleware(request: Request, call_next): start = time.perf_counter() resp = await call_next(request) elapsed_ms = (time.perf_counter() - start) * 1000 resp.headers["Server-Timing"] = f"app;dur={elapsed_ms:.2f}" return resp Structured spans around dependencies: Python import time import logging log = logging.getLogger(__name__) class Span: def __init__(self, name: str): self.name = name def __enter__(self): self.t0 = time.perf_counter() return self def __exit__(self, exc_type, exc, tb): dt_ms = (time.perf_counter() - self.t0) * 1000 log.info("span=%s duration_ms=%.2f", self.name, dt_ms) def authorize(req): with Span("db.rollup"): rollup = compute_rollup(req) with Span("third_party.risk"): risk = call_risk(req) with Span("policy.eval"): decision = eval_policy(req, rollup, risk) return decision This tells you whether P99 is dominated by slow queries, connection setup, third‑party calls, or CPU. 2. Fix Index Usage: Composite Indexes Work Left to Right A common P99 killer: an index exists, but the query shape prevents using it efficiently. Assume a transaction document: JSON { "company": "acme", "role": "employee_123", "card": "card_456", "transaction_date": "2026-01-25T12:00:00Z", "amount_cents": 1299 } And a composite index: Python from pymongo import ASCENDING db.transactions.create_index([ ("company", ASCENDING), ("role", ASCENDING), ("card", ASCENDING), ("transaction_date", ASCENDING), ]) The Pitfall: Missing the Left-Most Field(s) Bad (omits company): Python query = { "role": role_id, "card": card_id, "transaction_date": {"$gte": start, "$lt": end}, } Better: Python query = { "company": company_id, "role": role_id, "card": card_id, "transaction_date": {"$gte": start, "$lt": end}, } Verify With explain(). Python plan = db.transactions.find(query).explain() print(plan["queryPlanner"]["winningPlan"]) If you see COLLSCAN or a plan not using the intended index, that’s usually a direct line to tail latency. 3. Make Query Planning Deterministic (When You Must) Even with correct indexes, intermittent P99 spikes can come from the query planner choosing a different index based on changing stats or data shape. Detect plan variance. Python def plan_and_run(coll, query, projection=None, limit=0): plan = coll.find(query, projection=projection).limit(limit).explain() winning = plan["queryPlanner"]["winningPlan"] rows = list(coll.find(query, projection=projection).limit(limit)) return winning, rows If slow traces correlate with a different winning plan, you can: Adjust indexes so the correct plan is always best, orApply a selective hint for the most latency-sensitive queries Hint example (MongoDB): Python idx_name = "company_1_role_1_card_1_transaction_date_1" cursor = db.transactions.find(query).hint(idx_name) Use hints carefully: they can become wrong as the data distribution evolves. If you hint, add monitoring (e.g., periodic explain() in staging with production-like data). 4. Stop Paying Connection Setup on the Request Path: Pool + Warm Connection setup costs can be surprisingly high (TCP/TLS/auth/discovery), and cold pods can drag P99. Use a single global pooled client per process. Python from pymongo import MongoClient mongo = MongoClient( MONGO_URI, maxPoolSize=200, minPoolSize=20, serverSelectionTimeoutMS=250, connectTimeoutMS=250, socketTimeoutMS=900, ) db = mongo["payments"] Warmup on startup (avoid cold-start penalties). Python def warmup(): db.command("ping") db.transactions.find_one({"company": "__warmup__"}) db.limits.find_one({"company": "__warmup__"}) warmup() In Kubernetes, run warmup before the service reports readiness so traffic only hits warmed instances. 5. Parallelize Third‑Party I/O With Local/DB Work Third‑party risk/fraud calls often dominate the tail. If you call synchronously, you idle while waiting. A better pattern is to start the call first, do independent work, then join with a strict timeout. Threaded overlap pattern (requests + ThreadPoolExecutor). Python import requests from concurrent.futures import ThreadPoolExecutor, TimeoutError SESSION = requests.Session() EXEC = ThreadPoolExecutor(max_workers=64) def risk_call(payload, timeout_s: float = 1.1): resp = SESSION.post( "https://risk.example.com/score", json=payload, timeout=(0.2, timeout_s), # connect, read ) resp.raise_for_status() return resp.json() def authorize(req): fut = EXEC.submit(risk_call, {"company": req.company, "amount": req.amount_cents}) # Independent work while the network call is in flight rollup = compute_rollup(req) # DB aggregation / cached rollups base = eval_local_policy(req, rollup) # pure CPU / local logic # Join with timeout and safe fallback behavior try: risk = fut.result(timeout=1.1) except TimeoutError: risk = {"status": "timeout", "score": None} except Exception: risk = {"status": "error", "score": None} return finalize_decision(base, risk) This preserves a hard cap while reclaiming idle time. It’s one of the most reliable ways to reduce P99 when a dependency is volatile. 6. Prevent Regression: Make Performance A Testable Contract Once you’ve improved P99, keep it. Add a repeatable performance test that: Runs the hot path thousands of timesAsserts p95/p99Uses a realistic dataset scale and index configuration Python import time import statistics def run_trials(n=3000): times = [] for _ in range(n): t0 = time.perf_counter() authorize(sample_request()) times.append((time.perf_counter() - t0) * 1000) times.sort() return { "p50": statistics.median(times), "p95": times[int(0.95 * len(times)) - 1], "p99": times[int(0.99 * len(times)) - 1], "max": max(times), } print(run_trials()) Takeaways If you’re trying to move tail latency (not just average latency), these tend to be the biggest levers: Query shape + index correctness (especially composite indexes)Stabilizing query planning (index tuning or selective hinting)Connection pooling and warmup (remove cold-start penalties)Overlapping third‑party I/O with useful work (reduce idle wait) Apply them in that order, validate each change with P99 measurements, and you’ll get repeatable gains instead of “optimizations” that don’t show up where it matters. More

How to Configure JDK 25 for GitHub Copilot Coding Agent

By Bruno Borges

Data Driven API Testing in Java with Rest-Assured and TestNG: Part 1

By Faisal Khatri

CORE

Building a Sentiment Analysis Pipeline With Apache Camel and Deep Java Library (DJL)

By Vignesh Durai

Hurley: A High-Performance HTTP Client and Load Testing Tool Engineered in Rust

Introduction and Motivation This article examines the technical architecture, capabilities, and use cases of hurley, a project developed in Rust that functions as both a general-purpose HTTP client and a performance testing tool. It explores the efficiency advantages gained by managing API testing and performance analysis through a unified tool within software development processes. With the proliferation of microservices architectures and distributed systems, communication via the HTTP protocol has become the lifeblood of the software ecosystem. In this context, developers face two fundamental needs: (1) A flexible HTTP client to verify the functional correctness of API endpoints, and (2) Performance testing tools to analyze system behavior under load. Typically, distinct toolsets are employed for these two requirements (specialized HTTP clients vs. load testing tools like `wrk` or `Apache Benchmark`). hurley aims to minimize context switching in development and testing processes and offer a unified testing experience by consolidating these two functions into a single command-line interface (CLI). Core Capabilities and HTTP Client Mode hurley features a client mode that demonstrates full compliance with modern HTTP standards. It supports all fundamental operations required by RESTful architectures. 1. Protocol Support and Request Structure The tool supports all standard HTTP methods (GET, POST, PUT, DELETE, PATCH, HEAD). Request configuration can be flexibly structured via command-line arguments: Header management: Defining custom HTTP headers using the -H parameter.Payload management: Inline data submission via the -d parameter or file-based data streaming via the -f parameter.Redirect policies: Automatic tracking of HTTP 3xx series responses using the -L parameter. Shell # Example: POST request containing custom headers and payload hurley -X POST https://api.example.com/v1/resource \\ -H "Content-Type: application/json" \\ -H "X-Client-ID: system-a" \\ -d '{"key": "value", "timestamp": 1678900000}' Performance Testing and Load Simulation A distinguishing feature of the tool is its ability to instantaneously transform existing HTTP requests into a load test without requiring external configuration. 1. Concurrency Model hurley is built upon Rust’s `Tokio` asynchronous runtime. This architecture allows for the management of a high number of concurrent connections while utilizing system resources (CPU and Memory) at minimal levels. The intensity of the test scenario is determined by the -c (concurrency) and -n (total requests) parameters. 2. Dataset-Based Stochastic Testing To simulate real-world traffic patterns, hurley supports non-deterministic test scenarios. Through a dataset defined in JSON format, requests with different endpoints, methods, and payloads can be distributed randomly or sequentially. This approach is critical for eliminating the misleading effects of cache mechanisms (cache warming bias) and measuring the general stability of the system. JSON /* Example Dataset Schema */ [ { "method": "GET", "path": "/api/users/101" }, { "method": "POST", "path": "/api/orders", "body": { "id": 55, "item": "A-1" } } ] Performance Metrics and Statistical Analysis In reporting test results, the tool goes beyond average values to present statistical distribution analyses. Percentile metrics are vital for detecting tail latency. The fundamental metrics reported include: Throughput: The number of requests processed per second (RPS).Latency distribution: P50 (Median): The completion time for 50% of requests.P95 and P99: System performance in the slowest 5% and 1% segments. These values are critical indicators for Service Level Agreement (SLA) compliance.Jitter: The standard deviation and range of variation in response times. Shell Statistical Summary Total Requests: 1000 Error Rate: 0.00% Requests/sec: 450.25 Latency Distribution (Percentiles) p50 (Median): 45.12 ms p95: 120.45 ms p99: 210.88 ms Technical Architecture The project is constructed upon the performance and reliability-oriented libraries of the Rust ecosystem: Asynchronous I/O: Non-blocking I/O operations via `tokio` and `reqwest` libraries.Statistical computation: High dynamic range histogram analysis via the `hdrhistogram` library.Error management: Deterministic management of runtime errors via a strong type system and the `thiserror` library. These architectural choices ensure that the tool delivers performance at the C/C++ level without compromising memory safety. Installation hurley can be installed via Cargo, Rust’s package manager, or built from source. Via Cargo (Recommended): Shell cargo install hurley From Source: Shell git clone https://github.com/dursunkoc/hurley.git cd hurley cargo build --release The binary will be available at `target/release/hurley.` Conclusion The project continues to be developed as open source, with plans to add features such as distributed testing capabilities and HTTP/3 support to the roadmap. Project source code: https://github.com/dursunkoc/hurley

By Dursun Koç

CORE

A Unified Defense Against MITRE’s Top Injection Attacks

This is how I created a Go library to address 41 actively exploited vulnerabilities. The Problem That Keeps Security Teams Up at Night On December 11, 2025, MITRE released its annual 2025 CWE Top 25 Most Dangerous Software Weaknesses list, analyzing 39,080 CVE records from the past year. The results should concern every developer. Four injection vulnerabilities alone account for 41 Known Exploited Vulnerabilities (KEVs) — these aren’t theoretical risks. They’re being actively exploited by attackers right now: Rank Vulnerability KEVs Score#1Cross-site scripting (XSS)760.38#2SQL injection428.72#6Path traversal108.99#9OS command injection207.85 Look at that last row: 20 KEVs for command injection. That’s 20 vulnerabilities in production systems that attackers are actively using to compromise organizations. As a Go developer building web applications and APIs, I went looking for a unified library that would help me defend against all of these. Here’s what I found: bluemonday: Excellent for HTML/XSS sanitization, but nothing elsego-sanitize: General purpose, but incomplete coveragesqlx/database/sql: Parameterized queries help, but don’t validate identifiersStandard library: Requires knowing exactly what to call and when No unified solution existed. So I built one. Introducing go-safeinput go-safeinput is a context-aware input sanitization library for Go that addresses the top injection vulnerabilities in the MITRE CWE Top 25. It’s built with three core principles: 1. Context-Aware Sanitization Different output contexts require different sanitization strategies. What’s safe for HTML isn’t safe for SQL, and what’s safe for SQL isn’t safe for file paths. Go package main import ( "fmt" "github.com/ravisastryk/go-safeinput" ) func main() { s := safeinput.Default() userInput := "<script>alert('xss')</script>Hello!" // HTML context: strip dangerous tags htmlSafe, _ := s.Sanitize(userInput, safeinput.HTMLBody) fmt.Println(htmlSafe) // Output: Hello! // File path context: prevent traversal pathInput := "../../etc/passwd" _, err := s.Sanitize(pathInput, safeinput.FilePath) fmt.Println(err) // Output: path traversal detected // SQL identifier context: validate pattern sqlInput := "users; drop tbl--" _, err = s.Sanitize(sqlInput, safeinput.SQLIdentifier) fmt.Println(err) // Output: invalid SQL identifier } 2. Defense-in-Depth Even when using prepared statements (and you should always use prepared statements for values), there are cases where you need to dynamically construct SQL — like table names or column names. These can’t be parameterized. Go // The problem: table names can't be parameterized tableName := r.URL.Query().Get("table") db.Query("SELECT * FROM " + tableName + " WHERE id = ?", id) // Dangerous! // The solution: validate the identifier first s := safeinput.Default() // Defense layer 1: Validate identifier pattern tableName, err := s.Sanitize(userInput, safeinput.SQLIdentifier) if err != nil { return fmt.Errorf("invalid table name: %w", err) } // Defense layer 2: Parameterized query for values db.Query("SELECT * FROM " + tableName + " WHERE id = ?", id) // Safe! 3. Zero External Dependencies The entire library uses only Go’s standard library. No transitive dependencies, no supply chain risks, no version conflicts. Go module github.com/ravisastryk/go-safeinput go 1.24 That’s the entire go.mod. Your security library shouldn't add attack surface. Implementation Deep Dive XSS Prevention (CWE-79) Cross-site scripting has held the #1 spot for two consecutive years with a score of 60.38 — more than double the #2 vulnerability. The HTML sanitizer strips dangerous elements while preserving safe content: Go // html/sanitizer.go var ( scriptPattern = regexp.MustCompile(`(?i)<script[\s\S]*?</script>`) stylePattern = regexp.MustCompile(`(?i)<style[\s\S]*?</style>`) iframePattern = regexp.MustCompile(`(?i)<iframe[\s\S]*?</iframe>`) eventPattern = regexp.MustCompile(`(?i)\s+on\w+\s*=\s*["'][^"']*["']`) ) func (s *Sanitizer) SanitizeBody(input string) string { result := scriptPattern.ReplaceAllString(input, "") result = stylePattern.ReplaceAllString(result, "") result = iframePattern.ReplaceAllString(result, "") result = eventPattern.ReplaceAllString(result, "") // ... more patterns return strings.TrimSpace(result) } For attribute values, we escape rather than strip: Go func (s *Sanitizer) SanitizeAttribute(input string) string { return html.EscapeString(input) // Uses Go's standard library } Path Traversal Prevention (CWE-22) With 10 KEVs, path traversal is one of the most actively exploited vulnerabilities. The path sanitizer blocks multiple attack vectors: Go // path/sanitizer.go var blockedSequences = []string{ "..", "../", "..\\", // Basic traversal "..%2f", "..%5c", "%2e%2e", // URL-encoded "..%252f", "..%255c", // Double-encoded ".%2e", "%2e.", // Mixed encoding } func (s *Sanitizer) Sanitize(input string) (string, error) { // Block null bytes (truncation attacks) if strings.ContainsRune(input, 0) { return "", ErrInvalidCharacter } // Check for traversal sequences lower := strings.ToLower(input) for _, seq := range blockedSequences { if strings.Contains(lower, seq) { return "", ErrPathTraversal } } // Ensure path stays within base directory if s.basePath != "" { absResult, _ := filepath.Abs(filepath.Join(s.basePath, cleaned)) if !strings.HasPrefix(absResult, absBase) { return "", ErrOutsideBasePath } } return cleaned, nil } SQL Injection Prevention (CWE-89) SQL injection moved up to #2 this year. While parameterized queries handle values, identifiers need pattern validation: Go // sql/sanitizer.go var dangerousPatterns = []*regexp.Regexp{ regexp.MustCompile(`(?i)(\bor\b|\band\b)\s*[\d'"]+\s*=\s*[\d'"]+`), regexp.MustCompile(`(?i)--`), regexp.MustCompile(`(?i)/\*`), regexp.MustCompile(`(?i)\bunion\b.*\bselect\b`), regexp.MustCompile(`(?i)\b(benchmark|sleep|waitfor)\b`), } var validIdentifier = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`) func (s *Sanitizer) SanitizeIdentifier(input string) (string, error) { if !validIdentifier.MatchString(input) { return "", ErrInvalidIdentifier } if reservedWords[strings.ToLower(input)] { return "", ErrReservedWord } return input, nil } Command Injection Prevention (CWE-78) With a staggering 20 KEVs, OS command injection is the most actively exploited vulnerability on this list. The shell argument sanitizer uses an allowlist approach: Go // safeinput.go func SanitizeShellArg(input string) string { var b strings.Builder for _, r := range input { if isAllowedShellChar(r) { b.WriteRune(r) } } return b.String() } func isAllowedShellChar(r rune) bool { return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') || r == '-' || r == '_' || r == '.' || r == '/' } Production-Ready Features CI/CD The repository includes GitHub Actions workflows for: golangci-lint: 20+ linters including gosec, staticcheck, and revivegosec: Security-focused static analysis with SARIF outputgovulncheck: Go vulnerability database checkingCodeQL: GitHub’s semantic security analysisTrivy: Container vulnerability scanning Docker Support Multi-stage Dockerfile for minimal production images: Dockerfile FROM golang:1.24-alpine AS builder # Build and test FROM scratch AS production COPY --from=builder /app/bin/safeinput /safeinput USER 65534:65534 ENTRYPOINT ["/safeinput"] Test Coverage The library maintains >90% test coverage with tests for: Normal inputsEdge casesKnown attack vectorsURL-encoded bypassesDouble-encoding attemptsNull byte injections Getting Started Installation: Go go get github.com/ravisastryk/go-safeinput Basic usage: Go package main import ( "log" "github.com/ravisastryk/go-safeinput" ) func main() { // Create sanitizer with secure defaults s := safeinput.Default() // Or customize configuration s = safeinput.New(safeinput.Config{ MaxInputLength: 5000, StrictMode: true, StripNullBytes: true, BasePath: "/var/www/uploads", }) // Sanitize based on context safe, err := s.Sanitize(userInput, safeinput.HTMLBody) if err != nil { log.Printf("Invalid input: %v", err) return } // Use the sanitized value renderHTML(safe) } What’s Next The library is actively maintained with plans for: CWE-94: Code injection prevention for template enginesCWE-918: SSRF prevention helpersFramework middleware: Drop-in integration for Gin, Echo, and Chi Conclusion MITRE’s 2025 CWE Top 25 makes one thing clear: injection vulnerabilities aren’t going away. They’ve been at the top of these lists for years, and attackers continue to exploit them successfully. As developers, we have a responsibility to build secure software. Libraries like go-safeinput won’t solve every security problem, but they can eliminate entire classes of vulnerabilities when used correctly. The code is open source, the tests are comprehensive, and the API is simple. Give it a try: GitHub: github.com/ravisastryk/go-safeinputDocumentation: pkg.go.dev/github.com/ravisastryk/go-safeinput References MITRE 2025 CWE Top 25 Most Dangerous Software WeaknessesCISA Announcement: 2025 CWE Top 25MITRE Unveils 2025 List of Top 25 Most Dangerous Software WeaknessesCISA Secure by Design Ravi Sastry Kadali is a security-focused software developer. Connect on LinkedIn or follow on GitHub.

By Ravi Sastry Kadali

Production-Ready Observability for Analytics Agents: An Open Telemetry Blueprint Across Retrieval, SQL, Redaction, and Tool Calls

An analytics agent works great in demos: ask a question, and it fetches context, runs SQL queries, and summarizes the results. Then the real incident happens: a VP challenges a number, the security team asks whether restricted fields were exposed, or an auditor requests to see how the answer was produced and which controls were applied. Most teams can’t answer confidently because their observability was built for latency and debugging — not governance. They either: log everything such as prompts, retrieved chunks, tool transcripts, and accidentally create a shadow warehouse in the logging system, orlog too little and have no traceability when something goes wrong (this exact failure is repeatedly called out in security postmortems that have no audit trail). This article gives you a practical blueprint: OpenTelemetry semantic conventions for agents — a trace spine that connects policy decisions, retrieval provenance, SQL execution evidence, verification, redaction, and every tool call. If your org or team already uses OTel for microservices or Kubernetes, this is the missing layer that makes agents production-grade: measurable, debuggable, and audit-ready. The Enterprise Gap: Agents Need Traceability, Not Just Logs For analytics agents specifically, failures are often silent: SQL runs successfully, but the answer is wrong (wrong join path, wrong grain, missing filter).The agent “checked policy” but still leaked data via summaries.A prompt injection shifts tool behavior, and your logs become the exfiltration channel. So the correct framing is that observability is a governance control surface. Architecture at a Glance: The Agent Trace Spine One user request → one trace with a consistent set of spans: agent.request: request envelope and routingpolicy.evaluate: decision and controls appliedretrieval.*: provenance (vector / graph / semantic layer)db.query + verification.checks: SQL evidence and faithfulness checksai.generate: model call metrics (no raw prompt)redaction.apply: output sanitization evidencetool.call: any evidence-producing action (catalog, ticketing, feature store, etc.) You can implement this in any stack, but the point is standardization: the same span names and attributes across teams, services, and tools. Optimization 1: Make Observability a Cross-Cutting Advisor, Not Scattered Code Create an Agent Telemetry Advisor that wraps retrieval calls, tool calls, SQL execution, redaction, and policy checks, and emits spans and events in a consistent way. What this buys you: Instrumentation doesn’t get forgotten in new tools.Policy and redaction become observable by default.You can centrally enforce “no raw payloads in telemetry.” Advisor responsibilities: Start and propagate trace context (W3C trace context).Emit standardized spans for each stage.Scrub or hash sensitive attributes before export.Attach stable IDs such as request_id, tenant_id, policy_version, and dataset IDs. Optimization 2: Define Governance-First Semantic Conventions A. Root Span: agent.request Purpose: correlate everything; support multi-turn sessions Recommended attributes: agent.request_idagent.session_idagent.channelagent.purposeenduser.id_hash (salted hash; no raw email)ai.pipeline_version B. Policy Span: policy.evaluate Attributes: policy.enginepolicy.bundle_versionpolicy.decisionpolicy.reason_codespolicy.controls_applied (row_filter, column_mask, semantic_layer_required)policy.risk A common failure this catches is policy checked but not enforced. You’ll see missing controls or a mismatch between policy intent and downstream SQL enforcement. C. Retrieval Spans retrieval.vector / retrieval.graph / retrieval.semantic_layer Attributes: retrieval.top_kretrieval.items_countretrieval.index_nameretrieval.query_typeretrieval.source_types Events: retrieval.item_hashretrieval.source_idretrieval.source_version Common failure caught here: stale definitions or wrong sources (e.g., a metric definition was updated, but retrieval pulled an older version). D. SQL Span: db.query and verification span: verification.checks Use standard OTel DB fields where possible, plus governed analytics fields such as: db.systemdb.operationsql.interfacesql.fingerprintsql.datasets_touchedsql.row_filter_enforcedsql.columns.classification_countssql.result_rowcount_bucketsql.plan_hash or sql.query_id Verification attributes: verify.checksverify.statusverify.failure_code Common SQL failures caught: bypassing the semantic layer, runaway scans, and joins touching restricted datasets. Verification spans turn “plausible but wrong” answers into explicit signals. E. Model Span: ai.generate Attributes: ai.model, ai.providerai.input_tokens, ai.output_tokensai.latency_ms, ai.prompt_hashai.cost_bucket F. Redaction Span: redaction.apply Attributes: redaction.appliedredaction.typesredaction.countsredaction.ruleset_version Common failures caught: secrets or PII in output, and redaction-disabled regressions. G. Tool Span: tool.call Attributes: tool.name, tool.operationtool.statustool.retriestool.latency_mstool.error_code Optimization 3: Add Cost and Control Signals Useful attributes to add: agent.reasoning_steps (bucketed: 1, 2–3, 4–5, 6+)agent.tool_fanoutagent.retry_countagent.fallback_usedagent.abstained Then build dashboards such as fanout vs. latency, fanout vs. token usage, policy denies by tenant, semantic-layer usage rate, and verification failure rate. This turns tracing into an operational guardrail — not just a recorder. Optimization 4: Make It Audit-Ready Without Turning Telemetry into a Data Leak Practical rules: Hash content and identifiers.Store classifications and counts, not raw values.Prefer dataset IDs and policy versions over human-readable names if sensitive. Split retention tiers: Short retention for verbose debug tracesLonger retention for MVE-style governance traces (policy, provenance hashes, SQL fingerprints) What a Good Trace Answers in 30 Seconds With these conventions, you can answer: Was it allowed? → policy.evaluate decision, reason, and controlsWhat influenced the answer? → retrieval source_id, item_hash, versionsWhat data was touched? → SQL datasets, classifications, enforcement flagsWas it faithful? → verification checks and statusDid we sanitize output? → redaction span evidenceWhy did it cost so much? → tool fanout, retries, token counts Sample Example JSON { "span.name": "policy.evaluate", "agent.request_id": "b7c1-…", "agent.tenant_id": "t-42", "enduser.id_hash": "u:9ad3-…", "policy.engine": "OPA", "policy.bundle_version": "2026-01-15.3", "policy.decision": "allow_with_redaction", "policy.reason_codes": ["ROW_FILTER_APPLIED", "MASK_SENSITIVE_FIELDS"], "policy.controls_applied": ["ROW_FILTER", "COLUMN_MASK", "SEMANTIC_LAYER_REQUIRED"], "policy.risk": "medium" } Conclusion Production-ready GenAI systems don’t win because they prompt better. They win because they make correctness, compliance, and cost measurable and enforceable. Standardizing agent traces with OpenTelemetry semantic conventions is one of the fastest ways to get there. It gives engineers faster debugging, security teams a safer evidence trail, and auditors a consistent chain — from request to policy to retrieval to SQL to redaction to response — without dumping sensitive payloads into your logging stack.

By Anusha Kovi

10 Go Best Practices Every Backend Developer Should Know

Go has become a cornerstone language for building scalable backend services, cloud-native applications, and DevOps tooling. While Go’s simplicity is one of its greatest strengths, writing production-ready Go code requires more than just knowing the syntax. This guide distills practical best practices that teams can adopt to improve code quality, consistency, and maintainability. If you're new to Go, start with the official Go documentation and Effective Go. This article builds on those foundations with team-specific patterns that work well in enterprise environments. 1. Tooling: Your First Line of Defense a. Formatting and Linting Consistency is king in collaborative codebases. Go provides excellent built-in tooling: gofmt: Always use gofmt to ensure consistent formatting across your codebase. This eliminates bikeshedding about code style.golangci-lint: Use golangci-lint for comprehensive static analysis. Make sure your CI build fails on lint errors — this catches issues before they reach code review. b. Mock Generation Use go:generate with mockgen to facilitate mock generation. Your Makefile should include a target to generate and update mocks, making it easy for developers to keep mocks in sync with interfaces. 2. Code Organization: Structure for Clarity a. Repository Structure Follow the Standard Go Project Layout. This convention makes it easy for new team members to navigate your codebase and sets clear expectations about where different types of code live. b. Import Statement Ordering Import statements should be grouped and ordered consistently: Standard libraryThird-party librariesOrganization/company librariesCurrent project packages ❌ Bad example: Go import ( "context" "github.com/sirupsen/logrus" "github.company.com/org/repo/gapi" "github.company.com/org/repo/snow" "github.company.com/org/repo/internal/clients/slack" gdocs "google.golang.org/api/docs/v1" "time" ) ✅ Good example: Go import ( "context" "time" "github.com/sirupsen/logrus" gdocs "google.golang.org/api/docs/v1" "github.company.com/org/repo/gapi" "github.company.com/org/repo/snow" "github.company.com/org/repo/internal/clients/slack" ) c. File Layout Within a Package After import statements, organize your Go code in this order: Constants and variables (const, var)Structs and interfacesExported (public) methodsNon-exported (private) methodsHelper methods This predictable structure makes code easier to navigate and review. 4. Naming: Keep It Simple and Non-Redundant Avoid names that create redundancy. The package name already provides context. ❌ Bad: Go type DeploymentTransformerHandlerIntf interface {} type DeploymentTransformerHandler struct {} ✅ Good: Go type DeploymentTransformerIntf interface {} type DeploymentTransformer struct {} The word Handler adds no value here; it’s just noise that makes the code harder to read. 5. Context Objects: Pass Them, Don’t Store Them When a context object is needed, pass it through each function call rather than storing it in a struct. This pattern ensures proper context propagation and cancellation handling. ❌ Bad: Go type Client struct { ctx context.Context } func (c *Client) DoSomething(foo string) { // use c.ctx - Don't do this! } ✅ Good: Go func (c *Client) DoSomething(ctx context.Context, foo string) { // use ctx } Rule of thumb: define your methods to accept a context.Context as the first parameter. 6. Method Signatures: Embrace Functional Options Avoid methods with long parameter lists. Instead, leverage Go’s functional options pattern for flexible, readable configuration. ❌ Bad: Go svr := server.New("localhost", 8080, time.Minute, 120) What do time.Minute and 120 mean? Without reading the function signature, it's impossible to tell. ✅ Good: Go svr := server.New( server.WithHost("localhost"), server.WithPort(8080), server.WithTimeout(time.Minute), server.WithMaxConn(120), ) This pattern is self-documenting, flexible for future additions, and makes optional parameters trivial to implement. 7. Error Handling: Context Is Everything a. Use Stack Traces Use yerrors or an equivalent library to provide stack frames with your errors. Wrap errors with additional context as needed. Why? Errors need enough context for engineers to debug. Explicitly specifying method names in error messages is fragile and has high maintenance cost. Stack traces solve this elegantly. b. Security Considerations Stack frames should be written to logs but never returned in API responses. Leaking implementation details to callers is a security risk. c. Log Levels Matter Error level: Reserve for internal server errors onlyWarning level: Use for user or client errors This distinction enables cleaner monitoring and alerting. Don’t Log at Every Level With proper error wrapping and stack frames, you don’t need to log errors at each level of code execution. Propagate the error upward and log it once at the top level. ❌ Bad: Go docSvc, err := docs.NewService(context, option.WithCredentialsJSON([]byte(gSASecret))) if err != nil { log.Errorf("ERROR :: Google Docs :: NewGoogleApiHandler :: Unable to create Google Docs service :: %v", err) return nil, err } docsHandler, err := gdocs.NewGoogleDocsHandler(docSvc, mc, l) if err != nil { log.Errorf("Error :: Google Docs :: NewGoogleApiHandler :: Unable to create Google Docs handler :: %v", err) return nil, err } ✅ Good: Go docSvc, err := docs.NewService(context, option.WithCredentialsJSON([]byte(gSASecret))) if err != nil { return nil, yerrors.Errorf("failed to create google docs service: %w", err) } docsHandler, err := gdocs.NewGoogleDocsHandler(docSvc, mc, l) if err != nil { return nil, yerrors.Errorf("failed to create google docs handler: %w", err) } The second approach shown is cleaner, more consistent, and the stack trace provides all the context you need. 8. Logging: Structure Over Strings Use Logrus for logging. Enable JSON format for non-local environments to make logs machine-parseable. a. Use Structured Logging ❌ Bad: Go h.Logger.Infof("Got Deployment Number: %+v", deployment) h.Logger.Errorf("Failed to write AI-generated summary. Error from WriteToRCADocument: %s", err) ✅ Good: Go log.WithFields(log.Fields{ "event": event, "topic": topic, "key": key, }).Infof("Processing new event") Structured logging makes it easy to search, filter, and aggregate logs in production. b. Handling Known Errors Gracefully When you’ve identified an error that can be safely ignored, it’s acceptable to log at debug level and continue. Go user, err := client.AddUser(context, userID) if err != nil { if err.Error() == "user_exists" { log.WithError(err).Debug("user exists, continuing") continue } return nil, yerrors.Errorf("unable to add user: %s, %w", userID, err) } c. Request-Scoped Logging with Transaction IDs For low-level code, expose a customizable LoggerFn to enable request-scoped logging with transaction IDs. Go package health type Handler struct { LoggerFn func(ctx context.Context) *logrus.Entry } type HandlerIntf interface { CheckHealth(ctx context.Context) } func (h *Handler) CheckHealth(ctx context.Context) { logger := h.LoggerFn(ctx) logger.Info("log something here") } Callers can then inject their own logger that extracts the transaction ID from context: Go healthHandler := health.Handler{ LoggerFn: middleware.RequestIDLogger, } This pattern enables consistent tracing across your entire request lifecycle. 9. HTTP: Prefer Context Timeouts Over Client Timeouts Use context-based timeouts instead of HTTP client timeouts. This provides more fine-grained control and integrates better with Go’s cancellation model. Go ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) defer cancel() req, _ := http.NewRequestWithContext(ctx, "GET", url, nil) resp, err := client.Do(req) 0. Unit Testing: Table-Driven Tests for the Win a. Use Testify Use Testify for easy assertions: assert.EqualValues: For maps, slices, or similar objects where ordering shouldn’t matterassert.EqualError: For simple error comparisons b. Table-Driven Test Structure Table-driven tests are the preferred pattern in Go. Structure your tests as follows: Test cases definition: Define all test cases with their inputs, expected outputs, and mocksTest driver code: Loop through and execute each test case Key principles: Test cases should be self-contained, including mocks and expectationsThe test driver should only call the code under test and perform assertions — no custom logicWhen using gomock, avoid Any() for parameters, return values, and call counts Example structure: Go func TestMyFunction(t *testing.T) { tests := []struct { name string input string expected string wantErr bool }{ { name: "valid input", input: "hello", expected: "HELLO", wantErr: false, }, { name: "empty input", input: "", expected: "", wantErr: true, }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { result, err := MyFunction(tt.input) if tt.wantErr { assert.Error(t, err) return } assert.NoError(t, err) assert.Equal(t, tt.expected, result) }) } } Conclusion Writing production-ready Go code isn’t just about making things work — it’s about making them maintainable, debuggable, and consistent. By adopting these practices, you’ll: Reduce cognitive load through consistent formatting and organizationImprove debuggability with proper error handling and structured loggingEnable better testing through table-driven tests and proper mockingCreate self-documenting code with functional options and clear naming These aren’t arbitrary rules. Each practice solves a real problem teams encounter at scale. Start with the practices that address your biggest pain points, and gradually adopt the rest as your team matures. Have questions or additional best practices to share? Drop a comment below!

By Akshay Pratinav

Why “At-Least-Once” Is a Lie: Lessons from Java Event Systems at Global Scale

At-least-once delivery is treated like a safety net in Java event systems. Nothing gets lost. Retries handle failures. Duplicates are “a consumer problem.” It sounds practical, even mature. That assumption doesn’t survive production. At-least-once delivery is not useless. It is just not a correctness guarantee, and treating it like one is where systems get into trouble. Once you’re operating at scale, especially in regulated environments where correctness and auditability matter, at-least-once delivery stops being a comfort. It becomes a liability. Not because duplication exists, but because duplication collides with real software, real workflows, and real constraints, as the slogan never mentions. The myth survives because it fits a clean mental model. If something fails, retry. If a message appears twice, make the consumer idempotent. The pipeline keeps moving, and teams can claim reliability without confronting the harder question. Does the system preserve truth when reality gets messy? The First Crack: Semantic Duplication Two identical messages are rarely identical in effect. A retry can write a row twice, advance a workflow twice, emit a second downstream event, or trigger business logic that was never designed to be replayed. Engineers often talk about idempotency as if it were a single checkbox. In production, it is a boundary problem. Where does the “one true effect” begin and end? Getting the same message twice usually isn’t a crisis. The pain starts when that message is driving a workflow. Think of a simple chain like received, validated, and posted. The handler creates a row, updates the status, and fires an event downstream. Then a retry hits. Suddenly, the workflow can step forward again, and the next system sees a second “valid” transition. No alarms go off. Logs look normal. But you’ve just created a business error that’s hard to spot and even harder to unwind. The Second Crack: Temporal Corruption Event systems don’t only move data. They move history. Retries and replays bring old facts back into the present, where they can collide with newer states. If you have ever seen a reconciliation mismatch appear with no clear root cause, you’ve seen this failure mode. In regulated systems, the damage is not limited to incorrect numbers. The system loses its ability to explain itself. When you cannot prove how the state was derived, your “high availability” system turns into a high-availability generator of ambiguity. Replays get especially ugly when time matters. A reporting pipeline might rebuild a month-end report by replaying events from storage. If late-arriving events exist, or if the replay happens against a newer reference dataset, you can regenerate a report that does not match the report you generated last month, even with “the same events.” At that point, you do not have a reporting system. You have a narrative generator. In many environments, “we can’t reproduce last month” is not an inconvenience. It is a compliance incident. The Third Crack: False Confidence At-least-once delivery becomes a psychological shortcut. Teams stop reasoning deeply about correctness because delivery “guarantees” feel like safety. Monitoring focuses on throughput and consumer lag, not semantic accuracy. Errors don’t show up as crashes. They show up later, during audits, reconciliations, or customer disputes, after the trail is cold and the retention window has rolled forward. Recovery stops being engineering and becomes forensics. Why Java Makes This Worse Java makes these problems easier to create and harder to diagnose. Transaction boundaries rarely align with acknowledgment boundaries. Side effects leak through abstractions. ORMs flush state implicitly. Retries reenter code paths that were never designed to be replay-safe. Even the comforting presence of transactional annotations can encourage the wrong intuition, because database atomicity is not the same thing as system truth. The classic failure pattern looks like this: You process a message, you commit a database transaction, and then the acknowledgment fails because the network hiccups or the consumer process restarts. The broker redelivers. Your database now contains the effect, but your consumer sees the message again and replays the effect. Teams often believe transactional annotations save them here. They don’t. They only make the writing atomic, not the world. If you cannot detect that you have already applied the business effect, retries turn into silent duplication. What Actually Works at Scale At this point, teams often reach for exactly once semantics, hoping to buy certainty. That impulse is understandable, but it usually replaces one illusion with another. Exactly-once is not a magic property you turn on. It is a coordination problem, and at scale, it is expensive, leaky, and full of edge cases. The better framing is simpler and more honest. Reliable systems do not worship delivery guarantees. They control side effects. Controlling side effects changes how systems are designed. It pushes architects to model business state explicitly instead of letting it emerge from message flow. It forces a distinction between receiving an event and applying its meaning. In practice, this often leads to designs where messages describe intent, while durable state changes are validated, versioned, and recorded separately. This shift also changes how failures are handled. Instead of retrying blindly, systems become capable of answering a more important question. Has this effect already been applied, and under what conditions? When that answer is explicit, retries are no longer dangerous. They become boring. And boring is exactly what you want when correctness matters. This starts with treating replay as a normal operating condition, not an exceptional bug. Consumers should be designed to tolerate reprocessing without changing the business outcome. That means binding identifiers to business meaning, not transport mechanics. It means making state transitions explicit and versioned, so “apply again” can be proven to be safe. It means building an audit trail that explains what happened, when it happened, and why repeating it would not change the outcome. There is also a cost model here that teams tend to ignore. Every at-least-once system eventually pays an interest rate on duplicates. Early on, that interest is small and hidden. Later, it becomes engineering time spent writing compensations, building reconciliation jobs, cleaning up drift, and explaining mismatches to stakeholders. The cost is not the broker's. It’s the complexity tax you pay forever because you never defined what “the same effect” means. Closing Thoughts At-least-once delivery persists because it is convenient. It lowers the cost of getting started and hides complexity early. In small systems, the cost of being wrong is often tolerable. In large-scale, regulated, or globally distributed systems, costs accumulate quietly until they become intolerable. Java engineers building event-driven architectures should stop asking how often a message is delivered and start asking a harder question. Can the system ever prove what actually happened? Reliability at scale is not about delivery. It is about truth.

By Krishna Kandi

Beyond Ingestion: Teaching Your NiFi Flows to Think

If you are working with data pipelines, chances are you have crossed paths with Apache NiFi. For years, it's been the go-to way for getting data from point A to point B (and often C, D, and E). Its visual interface makes building complex routing, transformation, and delivery flows surprisingly easy, handling everything from simple log collection to intricate IoT data streams across countless organizations. It's powerful, it's flexible, and honestly, it just works really well for shuffling bits around reliably. We set up our sources, connect our processors, define our destinations, and watch the data flow — job done, right? AI Opportunity Well, mostly. While Apache NiFi is fantastic at the logistics of data movement, I started wondering: what if we could make the data smarter while it's still in motion? We hear about AI everywhere, crunching massive datasets after they've landed in a data lake or warehouse. But what about adding that intelligence during ingestion? Imagine enriching events, making routing decisions based on predictions, or flagging anomalies before the data even hits its final storage. That got me thinking about integrating AI directly within a NiFi flow. Sure, we can use processors InvokeHTTP to call out to external AI APIs, and that definitely has its place. But I couldn't find many hands-on examples showing how to embed and run an AI or machine learning model inside a custom NiFi processor using Java, while leveraging NiFi's scalability and data handling capabilities for the AI component as well. It felt like a gap, a missed opportunity to truly combine the strengths of both worlds right there in the pipeline. So, I decided to roll up my sleeves and figure out how to do it. Code In this article, I want to share what I learned. We will walk through building a custom NiFi processor in Java that loads and runs a real machine learning model (using the Deep Java Library, or DJL) to perform analysis directly on the FlowFile data as it passes through. No external calls needed for the core AI task! Let's dive into the code. You can refer to the full working code ( which generates NAR for NiFi) in my GitHub here. Below is the Java code, which is the main function where I have shown how to call PyTorch and DJL. This is a very simple use case where input is classified as positive, negative, or neutral based on incoming text. Example text: Plain Text I am very happy with the results, it exceeded my expectations --> positive This film was terribly boring and poorly acted. --> negative This also shows the score from the given model. Java public void loadModel(final ProcessContext context) { getLogger().info("Loading command line classification model..."); try { // Define criteria to load a text classification model. // *** IMPORTANT: Replace with a model fine-tuned if possible. *** // Using a generic BERT for sequence classification as a placeholder. Criteria<String, Classifications> criteria = Criteria.builder() //.optApplication(Application.NLP.TEXT_CLASSIFICATION) .setTypes(String.class, Classifications.class) // Input text, Output classification .optEngine("PyTorch") .optModelUrls("djl://ai.djl.huggingface.pytorch/distilbert-base-uncased-finetuned-sst-2-english") // If using a local model: .optModelPath(Paths.get("/path/to/your/model")) .optProgress(new ProgressBar()) .build(); this.model = criteria.loadModel(); this.predictor = model.newPredictor(); getLogger().info("Command line classification model loaded successfully."); } catch (Exception e) { getLogger().error("Failed to load command line classification model.", e); this.predictor = null; // Ensure predictor is null on failure // Throwing here will prevent the processor from starting if the model fails to load throw new RuntimeException("Failed to initialize AI model", e); } } Results: For simplicity, I have used the results as attributes of the flow file itself. Java code can be changed to add these enrichments to the flow file content, and at the end, the flow file can be routed to final outputs such as a data lakehouse or warehouse. Key Considerations Consider making our own models based on requirements. For example, if we want to find out whether the incoming URL is malicious or benign, it would be better to prepare our own model based on the organization's needs. Generally, the NiFi memory/CPU spikes won't be high if we code the NiFi customer processors in the right way, even when an AI/ML model is incorporated in them. But when we prepare our own AI model, it's better to consider these aspects of how it behaves for larger incoming flowfiles. I have used the PyTorch AI engine here, but there are other libraries to explore as well, such as TensorFlow, Apache MXNet, and ONNX Runtime.URLs can also be passed directly by the NiFi processor rather than embedded in the code. But we have to make sure all dependencies in pom.xml are already in place, else it will throw an error. Conclusion Adding AI smarts directly into NiFi with a custom processor isn't just theory; as we have seen, it's practically achievable using tools like DJL within Java. This approach lets you leverage NiFi's robust data handling while performing sophisticated analysis right in the flow. It moves AI processing closer to the data source, enabling immediate enrichment and smarter routing decisions. Give it a try — you might be surprised how much intelligence you can pack directly into your data pipelines. References https://djl.ai/ https://docs.djl.ai/master/index.htmlhttps://github.com/deepjavalibrary/djl/blob/master/docs/model-zoo.mdhttps://nifi.apache.org/docs/nifi-docs/

By Madhusudhan Dasari

Breaking the Vendor Lock in Network Automation: A Pure Python Architecture

In the world of Infrastructure as Code (IaC), servers are a solved problem. We spin up thousands of VMs with a single script. But the network layer? That often remains a manual bottleneck. The reason is the “Multi-Vendor Trap.” Enterprise networks are rarely homogeneous. They are a patchwork of routers, switches, and load balancers from different vendors (Cisco, Juniper, F5), each with its own proprietary CLI syntax. This fragmentation makes standard automation difficult, leading to long lead times (often weeks) just to open a VLAN or update a firewall rule. Based on recent case studies in large-scale financial systems, this article outlines a proven architecture for multi-vendor network automation. By abstracting vendor specifics into standardized Python functions, engineering teams can reduce operational lead times by 50% and cut manual effort by 80%. The Problem: The “Human Middleware” In traditional network operations, the engineer acts as the middleware. Request: A developer asks for a new subnet.Translation: The engineer translates this generic request into vendor-specific commands (e.g., Cisco IOS vs. Junos OS).Execution: The engineer SSHs into each device and pastes the commands.Validation: The engineer runs show commands to verify the state. This process is slow, error-prone, and heavily dependent on individual expertise. If the only engineer who knows the F5 syntax is on vacation, the deployment stalls. The Architecture: Abstraction via Python (Nornir) The solution is to decouple the intent (what we want to do) from the implementation (how we do it on specific hardware). While tools like Ansible use YAML, a pure Python approach using Nornir allows for more robust debugging, type checking, and complex logic handling. Layer 1: The Standardized Input (Intent) Instead of vague ticket descriptions, we define the desired state as a structured Python object (dictionary or data class). This acts as the “contract” for the automation. request_payload.py Python # The Intent: Vendor-Agnostic network_request = { "type": "vlan_addition", "params": { "vlan_id": 100, "name": "app_tier_01" }, "targets": ["switch_core_01", "switch_dist_01"] } Layer 2: The “Translator” Function This is the core architectural pattern. We write a Python task that acts as a translator. The script detects the device platform at runtime and injects the correct syntax. automation_engine.py Python from nornir import InitNornir from nornir_netmiko.tasks import netmiko_send_config def configure_vlan(task, vlan_id, vlan_name): """ The Abstraction Layer: Translates generic intent into vendor-specific commands. """ config_cmds = [] # Logic for Cisco IOS if task.host.platform == "cisco_ios": config_cmds = [ f"vlan {vlan_id}", f"name {vlan_name}" ] # Logic for Juniper Junos elif task.host.platform == "junos": config_cmds = [ f"set vlans {vlan_name} vlan-id {vlan_id}" ] # Execution if config_cmds: task.run(task=netmiko_send_config, config_commands=config_cmds) # 1. Initialize Inventory nr = InitNornir(config_file="config.yaml") # 2. Filter Targets based on Request targets = nr.filter(name__in=network_request["targets"]) # 3. Execute the Task across all targets in parallel result = targets.run( task=configure_vlan, vlan_id=network_request["params"]["vlan_id"], vlan_name=network_request["params"]["name"] ) This abstraction allows you to run the same script across a mixed fleet. If you swap a Cisco switch for an Arista one, you simply update the inventory platform variable; the automation logic handles the rest. Validation Design: The “Health Check” Pattern Automating the change is only half the battle. The biggest risk in network automation is breaking the “ecosystem.” A firewall change might succeed technically but break the application heartbeat. We implement a pre-check / post-check pattern using Python’s native comparison capabilities. Snapshot: Before the change, run Python scripts to capture the current state of routing tables.Execution: Apply the configuration change.Validation: Run the capture again and diff the results programmatically. Python validation logic: Python def validate_routes(pre_change_routes, post_change_routes): """ Compares routing tables sets to ensure no routes were lost. """ missing_routes = pre_change_routes - post_change_routes if missing_routes: raise Exception(f"CRITICAL: Routes lost after change: {missing_routes}") else: print("SUCCESS: Network state is healthy.") The Workflow: From Ticket to Trunk The final architecture integrates this logic into a CI/CD pipeline (e.g., Azure DevOps or Jenkins). Results and ROI Implementing this multi-vendor abstraction strategy yields significant operational improvements: Lead time reduction: Network delivery time dropped by 50% (from 10 days to 5 days).Effort reduction: Manual engineering hours reduced by 86%.Standardization: The codebase becomes the documentation. “Tribal knowledge” is converted into Git repositories, eliminating dependency on specific individuals. Conclusion Network automation in a multi-vendor environment is not about writing a separate script for every device. It is about architectural abstraction. By treating network changes as software (Python functions) and validation as unit tests, you can break vendor lock-in and build a network that moves as fast as your servers.

By Dippu Kumar Singh

The Death of the CSS Selector: Architecting Resilient, AI-Powered Web Scrapers

Introduction: The High Cost of Fragile Data Pipelines For over a decade, web scraping has been a game of cat and mouse. You write a script to scrape a job board, targeting specific DOM elements like div.job-title or span#salary. It works perfectly for a month. Then, the website deploys a frontend update. The class names change to random hashes (common in React/Next.js apps), your selectors fail, and your data pipeline crashes. The hidden cost of web scraping isn't the compute; it's the engineering maintenance hours spent debugging and fixing broken selectors. With the rise of large language models (LLMs), we have reached an inflection point. We no longer need to tell the scraper where the data is (the x-path or selector); we only need to tell it what the data is. In this article, we will analyze an architectural pattern for building agentic scrapers. These are systems that use visual rendering and semantic understanding to extract structured data from any website, regardless of its underlying HTML structure. The Architecture: The "Semantic Scraper" Stack To build a scraper that mimics human understanding, we need three distinct layers: The rendering layer (Playwright): To handle dynamic JavaScript and single-page applications (SPAs).The reasoning layer (AWS Bedrock and LangChain): To interpret the raw HTML and extract semantic meaning.The validation layer (Pydantic): To force the non-deterministic LLM to output strictly typed, API-ready JSON. Let's dissect how these layers interact to solve the "brittle scraper" problem. 1. The Validation Layer: Contract-First Development In traditional scraping, you write the extraction logic first. In AI-driven scraping, you define the data contract first. We use Pydantic to define exactly what a "Job Posting" looks like. This schema serves two purposes: it validates data quality in Python and, crucially, generates formatting instructions for the LLM. Python from typing import Union, List from pydantic import BaseModel, Field class JobInformationSchema(BaseModel): job_title: Union[str, None] = Field(description="The official title of the role") company_name: Union[str, None] = Field(description="Name of the hiring company") location_type: Union[str, None] = Field(description="Must be 'remote', 'onsite', or 'hybrid'") location: Union[List[str], None] = Field(description="List of physical locations") commitment: Union[str, None] = Field(description="e.g., 'full-time', 'contract'") description_summary: Union[str, None] = Field(description="A concise summary of the role, stripping HTML tags") By defining location_type and commitment with specific descriptions, we are essentially "programming" the LLM to normalize data automatically (e.g., converting "Work from home" in the HTML to "remote" in the JSON). 2. The Reasoning Layer: Cost vs. Intelligence The biggest argument against using LLMs for scraping is cost. Sending raw HTML to GPT-4 is prohibitively expensive for high-volume scraping. However, the economics have changed with the release of smaller, highly efficient models like Anthropic’s Claude 3 Haiku via AWS Bedrock. For extraction tasks, we don't need "reasoning" capability; we need "comprehension" capability. We use LangChain to orchestrate the prompt. The key technique here is injecting the Pydantic schema into the prompt instructions: Python async def extract_job_information(html_document, apply_url): # Setup the parser based on our Pydantic schema parser = PydanticOutputParser(pydantic_object=JobInformationSchema) prompt = PromptTemplate( template=""" You are a data extraction agent. Analyze the following HTML snippet. Extract the job title, company, and location details. Strictly follow these format instructions: {format_instructions} HTML Content: {html_document} """, input_variables=['html_document'], partial_variables={'format_instructions': parser.get_format_instructions()}, ) # Use Bedrock with Claude 3 Haiku for speed and low cost bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1') llm = ChatBedrock( model_id='anthropic.claude-3-haiku-20240307-v1:0', client=bedrock_client, model_kwargs={"temperature": 0.0} # Temperature 0 ensures deterministic output ) chain = prompt | llm | parser return await chain.ainvoke({"html_document": html_document}) Architectural note: Setting temperature: 0.0 is critical. We want the LLM to act as a deterministic extraction engine, not a creative writer. 3. The Rendering Layer: Handling the Modern Web BeautifulSoup and Requests are no longer sufficient for the modern web. Most job boards (Greenhouse, Lever, Workday) are React applications that hydrate content via JavaScript. If you just GET the URL, you receive an empty generic HTML shell. We use Playwright in async mode to launch a headless browser. This allows the page to load, execute JavaScript, and render the full DOM before we attempt extraction. Python async def scrape_dynamic_content(url: str): async with async_playwright() as p: # Launch Chromium in headless mode browser = await p.chromium.launch(headless=True, args=['--no-sandbox']) page = await browser.new_page() # Wait for the DOM to settle (network idle) await page.goto(url, wait_until="domcontentloaded") # Extract the raw HTML content content = await page.content() await browser.close() return content 4. Integration: The FastAPI Microservice To make this architecture usable in a production environment, we wrap it in FastAPI. This allows the scraper to be deployed as a scalable microservice (e.g., on AWS Fargate or Lambda). The async nature of FastAPI pairs perfectly with Playwright's async API, allowing the server to handle multiple scraping requests concurrently without blocking the event loop. Python @app.post("/extract-jobs", status_code=200) async def extract_jobs_endpoint(target_url: URL): # 1. Scrape raw HTML (Browser Layer) raw_html = await scraper.scrape(target_url.url) # 2. Clean HTML to save tokens (Optimization) cleaned_html = remove_script_tags(raw_html) # 3. Extract structured data (Reasoning Layer) structured_data = await extract_job_information(cleaned_html) return {"status": "success", "data": structured_data} Optimization Strategies: Making it Production-Ready While this pattern solves the brittleness problem, it introduces new challenges: latency and token costs. Here are three strategies to optimize this architecture: HTML cleaning: LLMs have context window limits and charge per token. Raw HTML is full of noise (<script>, <style>, SVG paths). Before sending HTML to Bedrock, use a regex or a lightweight parser to strip all non-content tags. This can reduce token usage by 60-80%.Pagination strategy: Do not feed an entire paginated list to the LLM at once. Use Playwright to detect pagination buttons, iterate through the pages to collect raw URLs first, and then process individual job pages in parallel batches.Hybrid approach: You don't always need AI. You can use this architecture to generate selectors. Use the LLM once to identify that the job title is in h2.css-1234, and then use standard scraping for the next 1,000 pages. If the selector fails, trigger the LLM again to "heal" the scraper by finding the new selector. Conclusion The era of maintaining regex patterns and CSS selectors is ending. By treating web pages as unstructured text and using LLMs to apply semantic structure, we can build data pipelines that are remarkably resilient. While the compute cost is higher than traditional scraping, the reduction in engineering maintenance and the reliability of the data stream make the semantic scraper pattern the superior choice for modern data engineering teams.

By Iyanuoluwa Ajao

A Guide to Parallax and Scroll-Based Animations

Parallax animation can transform static web pages into immersive, interactive experiences. While traditional parallax relies on simple background image movement and tons of JavaScript code, scroll-based CSS animation opens up a world of creative possibilities with no JavaScript at all. In this guide, we’ll explore two distinct approaches: SVG block animation: Creating movement using SVG graphics for unique, customizable effects.Multi-image parallax background: Stacking and animating multiple image layers for a classic parallax illusion. We'll walk through each technique step by step, compare their strengths and limitations, and offer practical tips for responsive design. Part 1: SVG Block Animation Scalable vector graphics (SVGs) are perfect for sharp, resolution-independent visuals. With CSS and JavaScript, you can animate SVG elements in response to user scrolling, creating effects like walking figures, moving clouds, or shifting landscapes. Step 1: Create Your Scene Start by designing SVG. For our example, I picked the stock layered SVG and cut it into the layers, creating single-layer SVGs. Let's set up the scene first. Place layers and create static CSS.Codepen - static content setup. HTML <section class="content"> <div class="layout-container layout-block"> <p> some content there </p> </div> </section> <section class="hero"> <div class="hero-block"> <div class="hero-background"> <div class="layer layer1"> svg1 there </div> <div class="layer layer2"> svg2 there </div> <div class="layer layer3"> svg3 there </div> <div class="layer layer4"> svg4 there </div> <div class="layer layer5"> svg5 there </div> </div> <div class="hero-content"> <span class="emoji">✅</span> <h1>markup SVG</h1> <h2>images animation</h2> </div> </div> </section> <section> <div class="layout-container layout-block"> <p>some more content</p> </div> </section> Tip: Use SVG editors or design tools like Figma to create more complex shapes. As I had quite big SVGs, I moved their declarations to a separate file. In real life, you could use the assets folder instead. Step 2: Style the Container and SVG CSS .layout-block { position: relative; height: 50vh; min-height: 300px; } .hero-block { position: relative; height: 100vh; min-height: 400px; max-height: 70vw; } .hero { position: relative; background-color: white; animation: parallax linear; } .hero-background { top:0; left:0; width: 100%; height: 100%; } .hero-content { text-align: center; position: absolute; width: 100%; top: 25%; } .layer { display: flex; z-index: 0; position: absolute; height: 100%; width: 100%; bottom: 0; left: 0; } Important: Avoid overflow: hidden on the container, as it can cut off animated SVG elements when they move outside the bounds. Step 3: Add Scroll-Based Animation Now, let’s move the SVG blocks horizontally as the user scrolls. Our goal looks approximately like this: The old-fashioned way would look like the code below: JavaScript // DON'T DO THIS ❌ window.addEventListener('scroll', () => { const scrollY = window.scrollY; document.querySelector('.block').style.transform = `translateX(${scrollY * 0.5}px)`; }); Now we're going to do the magic using CSS scroll-driven animation, which provides two new functions view() and scroll(). Let’s add an animation to the background pattern within each hero section to modify the background position using scroll() (see documentation). CSS .hero { animation: parallax linear; animation-timeline: scroll(); } @keyframes parallax { from { top: 0; } to { top: -40%; } } Add an animation layer by moving the title down with view() (documentation). CSS .hero-content { top: 25%; animation: float-25-50 linear; animation-timeline: view(-100px); } @keyframes float-0-25 { from { top: 0; } to { top: 25%; } } Actually, there we can stop, and it will cover most cases for the parallax background. But let's step a bit further and play with the animation of layers. CSS .layer1 { opacity: 0.6; animation: parallax linear; animation-timeline: view(); animation-range: 40vh 120%; } .layer2 { animation: parallax2 linear; animation-timeline: view(); animation-range: 60vh 100%; } .layer3 { animation: parallax-bottom linear; animation-timeline: view(); animation-range: 60vh 100%; } .layer4 { animation: float-right; animation-timeline: view(); animation-range: 50vh 100%; /* will not work as max-width is set for parent*/ } .layer5 { animation: float-left; animation-timeline: view(); animation-range: 50vh 100%; } @keyframes parallax { from { top: 0; } to { top: -40%; } } @keyframes parallax2 { from { top: 0; } to { top: -20%; } } @keyframes parallax-bottom { from { top: 0; } to { top: 30%; } } @keyframes float-left { 0% { left: 0; } 100% { left: -40%; } } @keyframes float-right { 0% { left: 0; } 100% { left: 40%; } } Add a rotate animation for the icon. CSS .emoji { z-index: 0; display: inline-block; font-size: 50px; animation: rotate linear, orbit-out ease; animation-timeline: view(); animation-range: 0% 80%, 80% 110%; } @keyframes rotate { 0% { transform: rotate(200deg); } 100% { transform: rotate(0deg); } } Add some extra styles. Play a bit. Here we are! Check out the live demo on CodePen. Step 4: Make It Responsive SVGs can scale, but their container needs responsive handling: Use width: 100vw; or max-width: 100%; for the container.Adjust viewBox and SVG dimensions.Use @media queries to tweak height or layout on different devices. Step 5: Limitations Overflow: If you need content clipped, consider a different animation method, as SVG transform animations are restricted by overflow: hidden. Repeatability: SVGs can’t be easily repeated as backgrounds like raster images.Performance: SVGs are efficient for simple shapes, but complex scenes may slow down rendering, especially on mobile. Part 2: Multi-Image Parallax Background There I decided to experiment wth background images. The idea is the same. Create a multilayer structure, and move the background in different directions to create a "walk" effect. Step 1: Prepare Your Layers Create separate images for each depth layer (background, midground, foreground). PNGs with transparency work well. HTML <section class="hero bg-night"> <div class="hero-block"> <div class="hero-background"> <div class="bg bg1"> </div> <div class="bg bg2"> </div> <div class="bg bg3"> </div> <div class="bg bg4"> </div> </div> <div class="hero-content"> <span class="emoji">✅</span> <h1>Background </h1> <h2>images animation</h2> </div> </div> </section> Step 2: Style the Layers CSS .bg { display: flex; z-index: 0; position: absolute; height: 100%; width: 100%; bottom: 0; left: 0; background-position: 0; } .bg1 { opacity: 0.6; background-image: url("https://raw.githubusercontent.com/h-labushkina/ccs-parallax-animation/609b4b41529f3d2aaf3d7e8be223e7376f793b23/svg/1.svg"); } .bg2 { background-image: url("https://raw.githubusercontent.com/h-labushkina/ccs-parallax-animation/609b4b41529f3d2aaf3d7e8be223e7376f793b23/svg/2.svg"); } .bg3 { background-image: url("https://raw.githubusercontent.com/h-labushkina/ccs-parallax-animation/609b4b41529f3d2aaf3d7e8be223e7376f793b23/svg/3.svg"); } .bg4 { background-image: url("https://raw.githubusercontent.com/h-labushkina/ccs-parallax-animation/609b4b41529f3d2aaf3d7e8be223e7376f793b23/svg/5.svg"); } Step 3: Add Animation Now let's make it alive! There we use animation-range to control animation speed from one side and for behavior, so mountains on the back start moving first, then in 20 more vh we add the next layer. Play with this range to make your animation perfect. CSS .bg1 { animation: parallax-bg linear; /* moves top */ animation-timeline: view(); animation-range: 40vh 120%; } .bg2 { animation: parallax2-bg linear; /* moves top */ animation-timeline: view(); animation-range: 60vh 100%; } .bg3 { animation: parallax-bottom-bg linear; animation-timeline: view(); animation-range: 60vh 100%; } .bg4 { animation: float-left-bg; /* moves left */ animation-timeline: view(); animation-range: 20vh 120%; } @keyframes parallax-bg { from { background-position: 0; } to { background-position: 0 100%; } } @keyframes parallax2-bg { from { background-position: 0; } to { background-position: 0 70%; } } @keyframes parallax-bottom-bg { from { background-position: 0; } to { background-position: 0 -10%; } } @keyframes float-left-bg { 0% { background-position: 0; } 100% { background-position: 60% 0; } } Each layer moves at a different rate, creating depth. Step 4: Responsive Design Use @media queries to adjust the background size for smaller screens. Don't use background-size: cover; as it will prevent vertical animation. CSS @media screen and (min-width: 1024px) and (max-width: 2024px) { .bg { background-size: 2024px; } } @media screen and (min-width: 768px) and (max-width: 1024px) { .bg { background-size: 1024px; } } @media screen and (max-width: 768px) { .bg { background-size: 70vh 70vh; } Step 5: Troubleshooting and Limitations Image sizing: Large images can slow down loading. Compress and optimize all assets.Repeating: Image backgrounds can be repeated if desired, with background-repeat: repeat;.Responsiveness: Different screen ratios might crop or stretch images; test thoroughly.Don't use background-size: cover; as it will prevent vertical animation. Comparison: SVG vs. Multi-Image Parallax FeatureSVG Block AnimationMulti-Image BackgroundCustom VisualsHigh (any shape, path, or style)Limited to static imagesRepeatabilityNot supportedSupported with background-repeatOverflow HandlingLimited (no overflow: hidden)Not an issueResponsivenessRequires careful container scalingNeeds media queriesPerformanceGreat for simple SVGsDepends on image size/countAnimation ControlFine-grained (per element)Layer-basedBrowser SupportExcellentExcellent Limited availability. Browseranimation-timeline / Scroll AnimationsChrome✅ (from 115)Edge✅ (from 115)Safari✅ (from 17.4)Firefox❌ Final Thoughts We’ve just taken a look at how the new CSS view() and scroll() functions work with animation-timeline to bring scroll-based animations to life — all without needing JavaScript. Instead of writing event listeners and math to track scroll position, you can now describe these effects right in your CSS, making things simpler and cleaner. The example we built shows how easy it is to get smooth, responsive animations that react as you scroll down the page. It’s a great way to keep your code tidy and take advantage of what modern browsers can do. If you’re ready to try out some scroll magic in your projects, definitely give these new CSS features a shot. Explore further: GitHub Source CodeLive Demo on CodePen

By Hanna Labushkina

Backing Up Azure Infrastructure with Python and Aztfexport

In an ideal DevOps world, every cloud resource is spawned from Terraform or Bicep. In the real world, we deal with “ClickOps.” An engineer manually tweaks a Network Security Group (NSG) to fix a production outage, or a legacy resource group exists with no code definition at all. When a disaster strikes — such as the accidental deletion of a resource group — you can’t just “re-run the pipeline” if the pipeline doesn’t match reality. To solve this, we need a Configuration Backup Engine. While Azure creates backups for data, it does not natively back up infrastructure state. This article outlines a solution using Python to orchestrate aztfexport (Microsoft’s Azure Export for Terraform). By wrapping this tool in a Python script, we can dynamically discover resources, reverse-engineer them into Terraform code, and ship them to immutable storage. The Problem: The “State Gap” Infrastructure as Code (IaC) is usually uni-directional: Code → Cloud The problem arises when the cloud changes independently of the code. Drift: The live environment diverges from the repositoryLegacy: Resources created years ago have no IaC definitionAudit: You need a snapshot of exactly how the firewall looked last Tuesday What we need is a workflow that goes: Cloud → Code → Backup The Solution: A Python Automation Controller Instead of relying on rigid CI/CD YAML files, we use Python. This allows us to dynamically loop through subscriptions, handle authentication errors gracefully, and interface directly with Azure Blob Storage for archiving. The Architecture The workflow runs as a scheduled task (a CRON job or CI stage). The Implementation: From Cloud to ZIP We use the subprocess module to control the aztfexport CLI tool and the Azure SDK for Python to handle storage. Prerequisites aztfexport installed on the runnerazure-identity and azure-storage-blob libraries backup_infra.py Python import os import subprocess import shutil import datetime from azure.identity import DefaultAzureCredential from azure.storage.blob import BlobServiceClient # Configuration SUBSCRIPTION_ID = os.getenv("AZURE_SUBSCRIPTION_ID") TARGET_RG = "mission-critical-rg" BACKUP_CONTAINER = "infra-backups" STORAGE_ACCOUNT_URL = "https://mybackupvault.blob.core.windows.net" def run_export(resource_group): """ Runs aztfexport to reverse-engineer the Azure Resource Group into Terraform configuration files. """ timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S") output_dir = f"./exports/{resource_group}/{timestamp}" print(f" Starting export for {resource_group}...") # Construct the command # --non-interactive: Auto-accepts defaults # --hcl-only: Generates .tf files (you can omit this to get state file as well) cmd = [ "aztfexport", "resource-group", resource_group, "--non-interactive", "--output-dir", output_dir ] try: # Run the export tool result = subprocess.run( cmd, check=True, capture_output=True, text=True ) print(f" Export successful for {resource_group}") return output_dir except subprocess.CalledProcessError as e: print(f" Export failed: {e.stderr}") return None def archive_and_upload(source_dir, resource_group): """ Compresses the Terraform files and uploads to Azure Blob Storage """ # 1. Create Zip Archive zip_name = f"{resource_group}-backup" shutil.make_archive(zip_name, 'zip', source_dir) full_zip_path = f"{zip_name}.zip" # 2. Upload to Blob Storage print(f" Uploading {full_zip_path} to immutable storage...") credential = DefaultAzureCredential() blob_service_client = BlobServiceClient(account_url=STORAGE_ACCOUNT_URL, credential=credential) container_client = blob_service_client.get_container_client(container=BACKUP_CONTAINER) with open(full_zip_path, "rb") as data: container_client.upload_blob(name=full_zip_path, data=data, overwrite=True) print(f" Backup secured: {full_zip_path}") # Cleanup local files os.remove(full_zip_path) shutil.rmtree(source_dir) if __name__ == "__main__": # Orchestration Logic export_path = run_export(TARGET_RG) if export_path: archive_and_upload(export_path, TARGET_RG) Why Python Over Bash or YAML? Error handling: If aztfexport fails partially (for example, a locked resource), Python can catch the specific error code, log it, and decide whether to retry or skip, rather than crashing the whole pipeline.Dynamic discovery: You can easily add a function to query Azure Resource Graph (az graph query) to list all resource groups tagged Backup=True) and iterate through them. YAML pipelines struggle with this level of dynamic looping.SDK integration: Direct integration with Azure Key Vault and Blob Storage is more secure and robust via the Python SDK than using CLI commands in a script. The Restore Strategy: The “Review and Apply” Gate Automating backups is useless if you can’t restore them. Because the output is standard Terraform code, restoration follows familiar IaC practices. When a disaster occurs (for example, a subnet is deleted): Fetch: Download the latest ZIP from Blob StorageReview: Inspect the .tf files — aztfexport generates a mapping file that connects the code to the resource IDsApply: Run terraform apply. Terraform will detect that the resource ID is missing in the cloud and recreate it according to the configuration defined in the backup Conclusion Infrastructure should be treated with the same data-protection rigor as databases. By implementing a Reverse-IaC pattern using Python and aztfexport, you create a self-documenting, self-backing-up environment. This approach transforms the “black box” of legacy Azure resources into transparent, versioned code — providing both a safety net for disaster recovery and a foundation for future modernization.

By Dippu Kumar Singh

Languages

DZone's Featured Languages Resources

Top Languages Experts

The Latest Languages Topics