<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[DoiT - Medium]]></title>
        <description><![CDATA[Forward-Deployed Engineers frequently share solutions to the challenges they encounter. Read more about what we do at doit.com/cloudops - Medium]]></description>
        <link>https://engineering.doit.com?source=rss----b5de5190d27c---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>DoiT - Medium</title>
            <link>https://engineering.doit.com?source=rss----b5de5190d27c---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 06 Apr 2026 12:32:15 GMT</lastBuildDate>
        <atom:link href="https://engineering.doit.com/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Stop Node Hunting: How Kubernetes DRA Simplifies GPU Scheduling for AI Workloads]]></title>
            <link>https://engineering.doit.com/stop-node-hunting-how-kubernetes-dra-simplifies-gpu-scheduling-for-ai-workloads-8f9e1a128864?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/8f9e1a128864</guid>
            <category><![CDATA[google]]></category>
            <category><![CDATA[containers]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Chimbu Chinnadurai]]></dc:creator>
            <pubDate>Thu, 02 Apr 2026 09:01:03 GMT</pubDate>
            <atom:updated>2026-04-02T09:01:02.873Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*l1dfyumEKBlm4jvB374HBw.png" /><figcaption>Source: NotebookLM</figcaption></figure><p>AI is no longer a side project. Teams everywhere are running large language models (LLMs), training pipelines, and inference servers on Kubernetes. And when your workload needs a GPU, things get complicated fast.</p><p>GPUs and TPUs are expensive and hard to obtain, and for a long time Kubernetes used the <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/">Device Plugin framework </a>to manage them. This framework was originally developed during a period when Kubernetes workloads were relatively simple, primarily demanding only “a GPU,” which the system would then supply. However, modern AI workloads are significantly more complex. As soon as you introduce mixed GPU types, NVLink topologies, or specific VRAM requirements, Device Plugins fall apart, requiring extensive manual configuration.</p><p><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/"><strong>Dynamic Resource Allocation (DRA)</strong></a> is Kubernetes’ answer to these shortcomings. Rather than the old model where nodes advertised fixed hardware counts and the scheduler blindly claimed them, DRA introduces a request-based allocation model. Workloads describe <em>what</em> they need, and DRA’s control plane figures out how to satisfy that claim across the cluster. This shift moves hardware awareness out of individual node agents and into a centralized, expressive API.</p><p>At KubeCon Europe 2026, NVIDIA donated its <a href="https://blogs.nvidia.com/blog/nvidia-at-kubecon-2026/">DRA GPU driver</a> to CNCF, and Google announced the open-source release of the <a href="https://cloud.google.com/blog/products/containers-kubernetes/gke-and-oss-innovation-at-kubecon-eu-2026">DRA TPU driver</a>. 
These weren’t just community goodwill gestures; they show that the two leading AI hardware vendors have fully adopted DRA as the standard interface for managing hardware in Kubernetes. For platform teams, this means you no longer need to depend on vendor-specific workarounds or proprietary scheduling logic. The same DRA primitives now operate reliably whether you’re using NVIDIA GPUs, Google TPUs, or both in the same cluster.</p><h3>The Old Way: Device Plugins and Their Pain Points</h3><p>Before DRA, if you wanted your pod to use a GPU, you’d add something like this to your pod spec:</p><pre>resources:<br>  limits:<br>    nvidia.com/gpu: 1</pre><p>This simple integer request created three massive pain points:</p><p><strong>1. Lack of Attribute-Based Selection and Native Fractional Support</strong></p><p>Device Plugins natively supported only basic integer counting (e.g., “1 GPU”), with no support for fractional GPUs. While there are workarounds like <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html">NVIDIA’s Time-Slicing</a> or <a href="https://www.nvidia.com/en-gb/technologies/multi-instance-gpu/">Multi-Instance GPU (MIG)</a> to split hardware resources, these remain external “hacks” that the Kubernetes scheduler doesn’t truly understand. The framework also lacked the ability to request resources based on specific attributes, like “I need one with at least 40 GB of VRAM” or “I need one from a specific architecture.” This often resulted in workloads being assigned to underpowered or incompatible hardware.</p><p><strong>2. Manual Orchestration Overhead</strong></p><p>Device Plugins provided the Kubernetes scheduler with no useful information about the hardware. Because the scheduler lacked granular hardware awareness, administrators were forced to manually map workloads to specific nodes using hard-coded labels. 
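</p><p>Concretely, that workaround looked something like the snippet below. This is an illustrative sketch, not from the original article: the gpu-type label value and the trainer image are invented names that a cluster admin would have to maintain by hand.</p>

```yaml
# Pre-DRA workaround: pin the pod to hand-labelled nodes.
apiVersion: v1
kind: Pod
metadata:
  name: training-job          # hypothetical workload
spec:
  nodeSelector:
    gpu-type: a100-40gb       # hand-maintained label; breaks when hardware changes
  containers:
  - name: trainer
    image: my-trainer:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```

<p>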
This approach is not scalable in large clusters, as it requires constant manual updates whenever hardware is added, replaced, or decommissioned.</p><p><strong>3. Static Provisioning Constraints</strong></p><p>Device Plugins required hardware to be pre-configured and available before a task was initiated. There was no mechanism for dynamic, “just-in-time” resource allocation, nor for the system to search for and initialize hardware in response to a pending request.</p><h3>Enter Dynamic Resource Allocation (DRA): The New Standard for AI Hardware</h3><p>Dynamic Resource Allocation (DRA) is the new Kubernetes standard for managing specialized hardware. The primary objective of DRA is to decouple resource management from the core Kubernetes scheduler. Instead of having the user identify specific nodes or manually tag hardware, DRA allows the workload to define its requirements. The system then dynamically identifies, claims, and prepares the optimal hardware across the entire cluster.</p><p>DRA introduces three key concepts that make this work.</p><h4>1. DeviceClass — Abstractions for Platform Teams</h4><p><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#deviceclass">DeviceClass</a> is a blueprint defined by platform or cluster admins. Instead of requiring developers to know hardware specifics, admins can create named classes like high-memory-gpu or low-latency-fpga. Developers just request a class by name, and the scheduler handles the rest.</p><h4>2. ResourceSlice — What’s Available</h4><p>Think of a <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceslice">ResourceSlice</a> as a hardware inventory report that represents one or more <a href="https://kubernetes.io/docs/reference/glossary/?all=true#term-device">devices</a> in a pool. DRA drivers (like the NVIDIA GPU driver or Google&#39;s TPU driver) publish detailed information about the devices on each node. 
Not just &quot;this node has 4 GPUs,&quot; but rich details like:</p><ul><li>Total GPU memory (VRAM)</li><li>Architecture and hardware model</li><li>Which PCIe root complex or NUMA node the device sits on</li><li>Number of compute cores</li></ul><p>This is the key shift: Hardware details that used to be hidden are now fully visible to the scheduler.</p><h4>3. ResourceClaim — What You Need</h4><p>A <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaims-templates">ResourceClaim</a> is how you describe your workload&#39;s requirements. This is where DRA gets powerful. Instead of asking for &quot;1 GPU,&quot; you can now say things like:</p><ul><li>“I need a GPU with at least 40 GB of VRAM”</li><li>“I need a GPU and a high-speed NIC that are on the same NUMA node”</li><li>“I need an accelerator that matches the high-memory-gpu class”</li></ul><p>The scheduler reads this claim, looks at all the ResourceSlices published across the cluster, and finds the best match automatically.</p><h3>Real-World Example: Running vLLM with DRA</h3><p>Let’s say you’re running a large language model inference server using vLLM. You need a GPU with plenty of VRAM, and you want scheduling to be automatic, with no manual node pinning. 
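</p><p>Conceptually, the matching step described above is simple: filter the advertised inventory by the claim’s attributes and pick the best fit. Here is a toy Python sketch (illustrative only; the field names are invented, and this is not the real DRA API or scheduler code):</p>

```python
# Toy sketch of attribute-based device selection (invented field names,
# not the real DRA API or scheduler logic).
def pick_device(devices, min_vram_gib):
    """Return the smallest device that satisfies the VRAM requirement, or None."""
    eligible = [d for d in devices if d["vram_gib"] >= min_vram_gib]
    # Prefer the smallest fit so larger accelerators stay free for bigger claims.
    return min(eligible, key=lambda d: d["vram_gib"]) if eligible else None

# What ResourceSlices might advertise, conceptually.
inventory = [
    {"node": "node-a", "model": "l4", "vram_gib": 24},
    {"node": "node-b", "model": "a100-40g", "vram_gib": 40},
    {"node": "node-c", "model": "h100", "vram_gib": 80},
]
print(pick_device(inventory, 40)["node"])  # node-b
```

<p>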
With DRA, your setup might look something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7Aui9YTJ1rTNzAhCj33ddQ.png" /><figcaption>Source: Gemini + Nano Banana</figcaption></figure><p><strong>Step 1: The cluster admin creates a DeviceClass</strong></p><p>This class filters for any GPU with more than 40GB of memory using a Common Expression Language (CEL) filter.</p><pre>---<br>apiVersion: resource.k8s.io/v1<br>kind: DeviceClass<br>metadata:<br>  name: high-memory-gpu<br>spec:<br>  selectors:<br>    - cel:<br>        expression: device.capacity[&quot;memory&quot;].isGreaterThan(quantity(&quot;40Gi&quot;))</pre><p><strong>Step 2: You create a ResourceClaim for your pod spec</strong></p><p>The user requests one device from that specific class.</p><pre>apiVersion: resource.k8s.io/v1<br>kind: ResourceClaim<br>metadata:<br>  name: vllm-gpu-claim<br>spec:<br>  devices:<br>    requests:<br>    - name: gpu<br>      deviceClassName: high-memory-gpu<br>      count: 1</pre><p><strong>Step 3: Reference the claim in your pod</strong></p><p>Finally, the Pod simply points to the claim. No nodeSelector or complex affinity rules required.</p><pre>apiVersion: v1<br>kind: Pod<br>metadata:<br>  name: vllm-inference<br>spec:<br>  resourceClaims:<br>  - name: gpu-claim<br>    resourceClaimName: vllm-gpu-claim<br>  containers:<br>  - name: vllm<br>    image: vllm/vllm-openai:latest<br>    resources:<br>      claims:<br>      - name: gpu-claim</pre><p>That’s it. You described what you need. Kubernetes — now armed with full visibility into every node’s hardware via ResourceSlices — finds a suitable node and schedules your pod there. No nodeSelector. 
No hunting through kubectl get nodes.</p><h3>The Strategic Value of DRA</h3><p>DRA ensures you get full performance from your hardware by eliminating manual configuration errors and hardware bottlenecks.</p><ul><li><strong>For developers:</strong> Stop the manual “node hunting.” Define the hardware requirements your code needs, and let Kubernetes handle the discovery and attachment.</li><li><strong>For platform teams:</strong> You can create hardware “tiers” using DeviceClasses and expose them without giving everyone raw access to node labels and hardware specs.</li><li><strong>For operations:</strong> Resource utilization improves. The scheduler has better information, which means better placement decisions, less fragmentation, fewer idle GPUs, and better ROI on expensive hardware.</li><li><strong>For AI at scale:</strong> DRA has already become the foundation of the <a href="https://github.com/cncf/k8s-ai-conformance">Kubernetes AI Conformance program</a>. It is no longer an optional feature; it has become the industry standard.</li></ul><h3>Wrapping Up</h3><p>Kubernetes has evolved from simply hosting web servers and microservices to becoming the preferred platform for some of the most intensive AI workloads. However, this exciting progress has also brought new infrastructure challenges, particularly in managing specialized hardware such as GPUs, TPUs, and high-speed networking devices.</p><p>The Device Plugin framework served its purpose, but it was designed for a simpler time. 
DRA is built for our current environment: clusters with diverse, costly hardware, workloads with specific and complex needs, and teams that need to move quickly without becoming hardware topology experts.</p><p><strong>Optimize Your AI Infrastructure with DoiT</strong></p><p>If you are currently running AI workloads or planning to implement DRA, <strong>DoiT</strong> can accelerate your journey. Our team of over 100 cloud experts specializes in tailored solutions to optimize your infrastructure, ensure compliance, and maximize your hardware ROI.</p><p><a href="https://www.doit.com/contact/"><strong>Contact us today</strong></a> to transition your cluster to the new Kubernetes AI standard.</p><p>If you want to dig in further, here are the best places to start:</p><ul><li><a href="https://discuss.google.dev/t/running-inference-on-vllm-with-dynamic-resource-allocation-and-custom-compute-classes/342730">DRA in action with Custom ComputeClass</a></li><li><a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/about-dynamic-resource-allocation">DRA on GKE Documentation</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8f9e1a128864" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/stop-node-hunting-how-kubernetes-dra-simplifies-gpu-scheduling-for-ai-workloads-8f9e1a128864">Stop Node Hunting: How Kubernetes DRA Simplifies GPU Scheduling for AI Workloads</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GKE Native Support for Custom Metrics: Smarter Autoscaling Beyond CPU and Memory]]></title>
            <link>https://engineering.doit.com/gke-native-support-for-custom-metrics-smarter-autoscaling-beyond-cpu-and-memory-0f7370486c84?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/0f7370486c84</guid>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[containers]]></category>
            <category><![CDATA[gcp]]></category>
            <category><![CDATA[autoscaling]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Chimbu Chinnadurai]]></dc:creator>
            <pubDate>Thu, 19 Mar 2026 10:01:01 GMT</pubDate>
            <atom:updated>2026-03-19T10:01:01.278Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q4UE_6raxBg9d3ZREp7CrQ.jpeg" /><figcaption>Source: Gemini Nano Banana Pro</figcaption></figure><p>Modern cloud-native applications rarely scale perfectly using CPU or memory metrics alone. Many workloads are driven by signals like request rate, queue depth, GPU usage, or application latency. Traditional autoscaling approaches struggle to capture these signals, which often leads to inefficient scaling decisions.</p><p>Google Kubernetes Engine (GKE) recently introduced <a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/expose-custom-metrics-autoscaling"><strong>native support for custom metrics</strong></a>, simplifying how applications expose metrics and enabling more intelligent autoscaling decisions without complex adapters or additional infrastructure.</p><p>In this article, we will explore:</p><ul><li>The challenges with traditional autoscaling approaches</li><li>How GKE’s native custom metrics support addresses those challenges</li><li>How the new feature works internally</li><li>A practical example demonstrating custom-metric–based autoscaling</li></ul><blockquote>⚠️ <strong>Note:</strong> This feature is currently in <strong>Preview</strong>, available on GKE <strong>1.35.1-gke.1396000 or later</strong> in the <strong>Rapid channel</strong> only. Check the <a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/expose-custom-metrics-autoscaling">official docs</a> for the latest GA status before adopting in production.</blockquote><h3>The Problem with Traditional Autoscaling</h3><p>Autoscaling in Kubernetes typically relies on the <a href="https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/"><strong>Horizontal Pod Autoscaler (HPA)</strong></a>. By default, HPA scales workloads based on resource utilization, such as CPU or memory. 
While this works well for many workloads, it doesn’t always reflect the real demand placed on an application.</p><p>For example:</p><ul><li>A web API may experience heavy request traffic without high CPU utilization.</li><li>A queue-processing service may need more workers when the backlog grows.</li><li>AI inference workloads may depend more on GPU usage than on CPU.</li><li>Streaming services may need scaling based on requests per second.</li></ul><p>These scenarios require scaling based on <strong>application-level metrics</strong>, not infrastructure metrics.</p><p>Kubernetes supports autoscaling based on custom metrics, but historically implementing this required additional components such as:</p><ul><li><strong>Prometheus adapters</strong> — a separate deployment that bridges Prometheus metrics to the Kubernetes metrics API</li><li><strong>Custom metric pipelines</strong> — ingestion, storage, and query layers that sit between your app and the HPA</li><li><strong>Complex IAM and service account configuration</strong> — especially in managed environments like GKE</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iGMt6GAqCmddwomu4_VvZQ.jpeg" /><figcaption>Source: Gemini Nano Banana Pro</figcaption></figure><p>The operational burden of these components was high, as teams managed adapter compatibility across Kubernetes versions, debugged multi-hop metric pipelines, and handled latency from infrastructure layers. This friction slowed adoption, leading many teams to revert to CPU-based scaling.</p><h3>Native Custom Metrics Support in GKE</h3><p>GKE now provides native integration for exposing custom metrics, simplifying how applications share metrics with the autoscaling system. 
Instead of routing metrics through external adapters and monitoring systems, custom metrics are now collected directly from pods and fed into the HPA.</p><p>This enables the autoscaling system to react to actual application behavior, such as request throughput or resource utilization. This significantly simplifies the process of integrating application metrics into scaling strategies.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qFsl5IIKxgQlt2d1epF2YQ.jpeg" /><figcaption>Source: Gemini Nano Banana Pro</figcaption></figure><h3>How the Solution Works</h3><p>The new capability works through a resource called <a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/expose-custom-metrics-autoscaling#expose-metrics"><strong>AutoscalingMetric</strong></a>.</p><p>This resource defines:</p><ul><li>Which pods expose the metrics (via label selectors)</li><li>Where the metrics endpoint exists (port and path)</li><li>Which specific metric should be collected</li><li>How the metric should be exported to the autoscaling system</li></ul><p>Once defined, GKE collects these metrics and makes them available to components such as the load balancer or autoscaler.</p><p>Key requirements include:</p><ul><li>Metrics must be available via an <strong>HTTP endpoint</strong></li><li>The format should follow <strong>Prometheus standards</strong></li><li>Only <strong>gauge</strong> metrics are supported</li><li>A maximum of <strong>20 unique metrics</strong> can be exposed per cluster</li></ul><blockquote><strong>Gauge vs. other metric types:</strong> A gauge represents a value that can go up or down at any point in time (e.g., current queue length = 45). Counters only go up (e.g., total requests processed = 10,432) and are not suitable for direct autoscaling targets. 
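</blockquote><p>Deriving such a gauge from a counter is straightforward. Here is a minimal Python sketch of a per-second rate computed from successive counter samples (a hypothetical helper, not part of GKE or any client library):</p>

```python
class RateGauge:
    """Derives a per-second rate gauge from a monotonically increasing counter."""

    def __init__(self):
        self._last_value = None
        self._last_time = None

    def observe(self, counter_value, timestamp):
        """Record a counter sample; return the per-second rate since the last one."""
        rate = 0.0
        if self._last_value is not None:
            elapsed = timestamp - self._last_time
            delta = counter_value - self._last_value
            if elapsed > 0 and delta >= 0:  # a negative delta means the counter reset
                rate = delta / elapsed
        self._last_value, self._last_time = counter_value, timestamp
        return rate

g = RateGauge()
g.observe(10_000, timestamp=0.0)          # first sample: no rate yet
print(g.observe(10_600, timestamp=60.0))  # 10.0 (requests per second)
```

<blockquote>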
If your application currently exposes counters, you will need to derive a gauge (e.g., a request rate computed over a time window) before using it with this feature.</blockquote><p>Once the metric is registered, GKE continuously reads it and feeds the value into scaling logic.</p><h3>Example: Autoscaling Based on Queue Length</h3><p>Let’s walk through a complete, self-contained example.</p><p><strong>Scenario:</strong> A background worker service processes jobs from a queue. We want to scale the number of worker pods based on the current queue length — not CPU usage.</p><h4>Step 1: Expose the Metric from the Application</h4><p>Deploy an application that exposes a Prometheus-format gauge metric at its /metrics endpoint:</p><pre># HELP job_queue_length Current number of jobs waiting in the queue<br># TYPE job_queue_length gauge<br>job_queue_length 45</pre><p>Let’s assume the metrics are available at http://worker-service:9090/metrics</p><h4>Step 2: Create an AutoscalingMetric Resource</h4><p>Define an AutoscalingMetric resource that tells GKE where to find and how to export the metric:</p><pre>apiVersion: autoscaling.gke.io/v1beta1<br>kind: AutoscalingMetric<br>metadata:<br>  name: worker-queue-metric<br>  namespace: default<br>spec:<br>  selector:<br>    matchLabels:<br>      app: job-worker # The label name and value matching the Pods<br>  endpoints:<br>  - port: 9090 # The metrics port number<br>    path: /metrics # The path to the metric<br>    metrics:<br>    - gauge:<br>      name: job_queue_length # The name of the metric that you are exposing<br>      prometheusMetricName: job_queue_length # optional: the Prometheus metric name as exposed by the Pod</pre><p>Once applied, the metric becomes available to the autoscaling system.</p><h4>Step 3: Configure the Horizontal Pod Autoscaler</h4><p>Now we create an HPA that uses the custom metric.</p><pre>apiVersion: autoscaling/v2<br>kind: HorizontalPodAutoscaler<br>metadata:<br>  name: worker-hpa<br>spec:<br>  scaleTargetRef:<br>    
apiVersion: apps/v1<br>    kind: Deployment<br>    name: job-worker<br>  minReplicas: 2<br>  maxReplicas: 10<br>  metrics:<br>  - type: Pods<br>    pods:<br>      metric:<br>        name: autoscaling.gke.io|worker-queue-metric|job_queue_length<br>      target:<br>        type: AverageValue<br>        averageValue: 20</pre><p>This configuration means:</p><ul><li>If the average queue size exceeds 20 jobs per pod, the system scales up.</li><li>If the queue shrinks, the system scales down.</li></ul><h3>Benefits of Native Custom Metrics</h3><p>This feature provides several key advantages.</p><ul><li><strong>Application-Aware Scaling: </strong>Scaling decisions reflect real demand, not infrastructure proxies.</li><li><strong>Reduced Operational Complexity: </strong>No need for external adapters or complicated metric pipelines.</li><li><strong>Improved Performance: </strong>Applications scale more accurately and respond faster to workload spikes.</li><li><strong>Better Resource Utilization: </strong>Infrastructure costs decrease because scaling aligns with actual demand.</li><li><strong>Flexible Autoscaling Strategies:</strong> Teams can design policies around any gauge metric their application exposes, such as queue depth, active sessions, pending renders, and more.</li></ul><h3>Conclusion</h3><p>GKE’s native custom metrics support removes a major obstacle that has traditionally made application-aware autoscaling difficult to implement. By removing the need for external adapters and Prometheus pipelines, teams can now connect scaling directly to the metrics that matter—queue depth, request rate, GPU saturation, or any gauge metric their application exposes.</p><p>If your workloads frequently encounter CPU or memory scaling blind spots, this feature is worth considering. If you’re already exploring a proof of concept or want to learn more about this feature, <a href="https://www.doit.com/">DoiT</a> can help. 
Our team of 100+ experts specializes in tailored cloud solutions and is ready to guide you through the process and optimize your infrastructure to ensure compliance and meet future demands. <a href="https://www.doit.com/expertise/">Contact us today.</a></p><p><strong>Useful Links:</strong></p><ul><li><a href="https://cloud.google.com/blog/products/containers-kubernetes/gke-now-supports-custom-metrics-natively">https://cloud.google.com/blog/products/containers-kubernetes/gke-now-supports-custom-metrics-natively</a></li><li><a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/expose-custom-metrics-autoscaling">https://docs.cloud.google.com/kubernetes-engine/docs/how-to/expose-custom-metrics-autoscaling</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=0f7370486c84" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/gke-native-support-for-custom-metrics-smarter-autoscaling-beyond-cpu-and-memory-0f7370486c84">GKE Native Support for Custom Metrics: Smarter Autoscaling Beyond CPU and Memory</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Upgrading Your Database to an Iceberg Data Lake (Part 1)]]></title>
            <link>https://engineering.doit.com/upgrading-your-database-to-an-iceberg-data-lake-part-1-b65ed783a65b?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/b65ed783a65b</guid>
            <category><![CDATA[apache-iceberg]]></category>
            <category><![CDATA[big-data]]></category>
            <category><![CDATA[postgresql]]></category>
            <dc:creator><![CDATA[Sayle Matthews]]></dc:creator>
            <pubDate>Mon, 02 Feb 2026 10:01:02 GMT</pubDate>
            <atom:updated>2026-02-02T10:01:02.715Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8J9ASRTLsuTLryJN-vUP3A.png" /></figure><h3>The Series</h3><p>This is a series of articles about replicating data from your traditional database (SQL or NoSQL) into a platform-agnostic data lake (or data lakehouse, depending on who you ask) for your more analytics-based workloads. Throughout it, I will be showing the theoretical architecture, building up the framework for it, and lastly giving a real-world example of implementing this from a Postgres database.</p><p>The idea for this series of articles came about while helping a couple of customers in succession get data from their Postgres databases into Iceberg tables to be queried by their BI tooling. After doing multiple sessions covering the same topic, I decided this needs to be written down as I couldn’t find a good step-by-step on it.</p><p>As a bonus, I am going to be using strictly open-source software here and off-the-shelf services from the cloud vendors, so it will be as cloud platform agnostic as possible.</p><p>Now let’s get this show on the road and dive in!</p><h3>The Scenario</h3><p>You are starting to reach critical mass where your transactional data is living in one or more traditional databases, and your data team is starting to have issues doing their jobs with the data being in multiple places and not transformed for their needs. 
In addition, they are looking at using a data warehouse for their analytical workloads, one that won’t work with the existing databases.</p><p>Your data team is also showing you that growth is going to be picking up, further exacerbating this issue, so the time is now to start transforming your data and storing it in a data warehouse.</p><p>So you are seeing the big names out there like Google BigQuery, Snowflake, Databricks, DuckDB, ClickHouse, and Redshift, but you are completely overwhelmed by the choices and want to make the decision that will allow you to be platform agnostic.</p><p>If this sounds familiar, then have I got some news for you: carry on reading!</p><p>If you aren’t in this boat, but want to learn more about creating a platform-agnostic data warehouse (or data lake, technically), then carry on reading as well.</p><h3>The (Abstract) Architecture</h3><p>In this series, I will be showing a reference architecture and a sample implementation of getting data from a traditional RDBMS database into a data lake on a cloud provider’s BLOB storage in Apache Iceberg format, which a data warehouse system can then utilize for day-to-day operations.</p><p>The reference architecture is very simple and looks something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/597/1*DMcG5iVVS7lKpvW5diiOnw.png" /></figure><p>As can be (and typically is) predicted, there is going to be a lot more to it than this, but this very simple diagram shows the power of this process and also shows the platform agnosticism that can be achieved by it.</p><p>This is also going to be the super simple version, replicating raw data to an Iceberg table without transforms. This is by design for the simplicity of this article. 
I will put notes on how and where to implement further enhancements to add the T in ETL in later parts of this series, but for this article’s sake, let’s <a href="https://en.wikipedia.org/wiki/KISS_principle">Keep It Simple, Stupid</a>.</p><h3>The Pieces of This Puzzle</h3><p>I am going to go over each piece of the example system I am showing here, define those pieces, and also pin down the definitions I will be using for some of the modern buzzwords that float around.</p><p><strong><em>Traditional Database</em></strong></p><p>This will be your database system that your applications use. It will generally be for things such as transactions (think purchases, clicks on a page, audit entries, etc.) or for storing application data.</p><p>If you are coming from the traditional relational database world, this will probably fall under a class of database called Online Transaction Processing (OLTP), which, at its core, is a fancy way of saying it handles transactions very well. Some examples of these are MySQL, Microsoft SQL Server, PostgreSQL, MariaDB, Oracle Database, and IBM DB2. There are countless others out there, but these are the main ones that are still around today.</p><p>If you are coming from the non-relational, or NoSQL, world, then there can be a few different options. As long as it is supported by Debezium (the change data capture tool this series uses for replication), then it will work. As of the time of this writing, that list is MongoDB, Cassandra, Google Spanner, and anything that has a compatibility layer for utilizing them as a data source.</p><p><strong><em>Data Warehouse</em></strong></p><p>This will be the product that performs the querying and analytics processing for you. This will probably fall under a class of database called Online Analytical Processing (OLAP), which, at its core, is a fancy way of saying it handles analytics workloads and processes very well. 
Examples of these are Google BigQuery, Snowflake, Databricks, DuckDB, ClickHouse, Amazon Redshift, and Microsoft Azure Synapse.</p><p>Note: In the context of this article, I am only using the data warehouse for the compute side of things, not for storage. Hence, I didn’t go into detail on that.</p><p><strong><em>Data Lake</em></strong></p><p>This is the location where you store all of your raw data for consumption and where you write your results for transformations that will be read for your analytics processes.</p><p>Many times this is a hodge-podge of file formats, but I <strong>HIGHLY</strong> recommend standardizing on a single format such as Apache Iceberg. Standardizing data lake storage formats is one of the main reasons that Iceberg and its brethren were created.</p><p>A data lake often lives on a class of storage devices called BLOB (stands for <em>B</em>inary <em>L</em>arge <em>OB</em>ject) storage. In simpler terms, this is an almost infinite-sized hard drive that lives in a cloud environment (or on some on-premises data center systems).</p><p>Some of the most common examples of this are AWS Simple Storage Service (S3), Google Cloud Storage, Azure Blob Storage, Oracle Object Storage, or Digital Ocean Spaces Object Store.</p><p><strong><em>Apache Iceberg</em></strong></p><p>Apache Iceberg is a data storage format that works very well for storing slow-changing data (read as data warehouse-style data) on BLOB storage devices, allowing analytical databases to query and interact with it seamlessly as if it were natively stored in that database’s storage format. It acts as a common format for data lakes to achieve data platform agnosticism, or at least as close to it as you can get right now.</p><p>It should be noted that Iceberg is great for analytical queries, but I would HIGHLY advise against using it for transactional workloads. 
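</p><p>The underlying reason is the row-versus-columnar storage divide. Here is a toy Python sketch (illustrative only, with made-up records) of why analytical scans favor columnar layouts such as the Parquet files Iceberg typically manages:</p>

```python
# Toy illustration: the same three records stored row-wise and column-wise.
rows = [
    {"id": 1, "amount": 10.00, "country": "DE"},
    {"id": 2, "amount": 25.50, "country": "US"},
    {"id": 3, "amount": 7.25, "country": "US"},
]

# Columnar layout: each column is one contiguous array.
columns = {
    "id": [1, 2, 3],
    "amount": [10.00, 25.50, 7.25],
    "country": ["DE", "US", "US"],
}

# Analytical query: total amount. The row store must touch every field of every
# record; the column store reads only the single array it needs, which is why
# OLAP engines scan far less data per analytical query.
total_from_rows = sum(r["amount"] for r in rows)
total_from_columns = sum(columns["amount"])
print(total_from_rows, total_from_columns)  # 42.75 42.75
```

<p>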
Fivetran has this <a href="https://www.fivetran.com/learn/columnar-database-vs-row-database">article</a>, which covers the basics of row versus columnar databases and explains the why behind this.</p><p>It’s a lot more complicated than that, but this is the simplest explanation for it. If you wish to know the full feature list and more, I recommend reading the official docs <a href="https://iceberg.apache.org/">here</a>.</p><h3>The Why?</h3><p>In this day and age, there are MANY choices on the data warehousing front, just to list a few: BigQuery, ClickHouse, Snowflake, Databricks, DuckDB, Redshift, Firebolt, and Synapse Analytics.</p><p>Each of these data warehouses, by default, uses its own proprietary format for storing the data, which means you are locked into it, and some are exclusive to a single cloud. This also means that this data is siloed and cannot be read by other competing data warehouses or other tools that aren’t tied to that platform.</p><p>This means it is to your benefit to store data in a common format that can be read by multiple platforms or tools.</p><p>This scenario is not hypothetical; it is based on a true story from a few years ago. If tomorrow vendor X raises their prices by 2x or more and gives you 90 days until those costs are fully realized, then one of the largest migration steps has already been done for you: storage. You would only need to worry about the compute aspect of your data warehouse instead of the entire ecosystem.</p><p>The other unspoken benefit of using a common format is reduced data team costs. It’s something that isn’t brought up enough, but if your data teams have a single data format to work with in a single environment, that greatly reduces complexity and their time to implementation.
It all comes back to cost, so in short, the faster the team can implement, the lower the overall team cost.</p><h3>The How?</h3><p>I will be showing an example of how to implement this in Part 2, but here is a rough overview of how this looks.</p><p>Instead of storing data in the native format of your data warehouse, you store it in a common format (in this case, Apache Iceberg) in a common storage bucket that is broken up into a logical partitioning scheme. This gives you a single source of truth in a single location.</p><p>As the single source of truth, it is directly queried by data warehouse tools, and changes are reflected in new tables in this same format. It’s a cycle that keeps things simple and prevents a lot of the complexities that stem from proprietary solutions.</p><p>Since there is a single location and a single format for this source of truth, a lot of the complexities are simply abstracted away to the infrastructure, allowing you to focus on the data instead of the platform.</p><h3>Coming Up in Part 2</h3><p>In Part 2 of this series, I will be diving into how to implement the replication of data from a traditional database into an Iceberg table as the basis of a data warehouse.</p><h3>How We DoiT</h3><p>Here at <a href="http://www.doit.com">DoiT International</a>, we tackle problems like this all of the time, and I am always asked about ways to help save money while implementing data projects.</p><p>Helping our customers implement projects in the most effective and cost-effective way is part of our mission.
Handling everything from cases like these to providing the best FinOps solutions for our customers is what we do, and I may be a little biased in saying this, but we do it VERY well.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b65ed783a65b" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/upgrading-your-database-to-an-iceberg-data-lake-part-1-b65ed783a65b">Upgrading Your Database to an Iceberg Data Lake (Part 1)</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[When to Use AlloyDB Instead of Cloud SQL for PostgreSQL]]></title>
            <link>https://engineering.doit.com/when-to-use-alloydb-instead-of-cloud-sql-for-postgresql-601caa1110b8?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/601caa1110b8</guid>
            <category><![CDATA[cloud-sql-enterprise-plus]]></category>
            <category><![CDATA[cloud-sql]]></category>
            <category><![CDATA[postgresql]]></category>
            <category><![CDATA[alloydb]]></category>
            <category><![CDATA[alloydb-ai]]></category>
            <dc:creator><![CDATA[Aamir Haroon]]></dc:creator>
            <pubDate>Fri, 30 Jan 2026 10:01:04 GMT</pubDate>
            <atom:updated>2026-01-30T10:01:03.671Z</atom:updated>
            <content:encoded><![CDATA[<p>A data-backed comparison featuring performance benchmarks, pricing breakpoints, and architectural trade-offs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5JtdQBTp0JWTtcXvWdFTNw.png" /><figcaption>Generated by Gemini</figcaption></figure><p>Google Cloud now offers multiple managed PostgreSQL options, each tailored for specific performance, availability, cost, and <strong>AI-readiness</strong> requirements. What used to be a simple binary choice between standard Cloud SQL and AlloyDB has evolved with the introduction of <strong>Cloud SQL Enterprise</strong> and <strong>Cloud SQL Enterprise Plus</strong> editions.</p><p>While this expanded portfolio offers better solutions for specific needs, it also adds complexity to the decision-making process. By the end of this article, you’ll understand exactly which PostgreSQL service fits your performance, cost, and AI strategy.</p><h3>🤔 What is AlloyDB?</h3><p>AlloyDB for PostgreSQL is Google Cloud’s next-generation, PostgreSQL-compatible database service, purpose-built for <strong>high-performance, mission-critical workloads</strong>. Featuring a cloud-native architecture, it separates compute and storage, enabling each to scale independently. AlloyDB further stands out with <strong>AlloyDB AI</strong> for advanced vector search and a native <strong>columnar engine</strong> that accelerates analytical queries directly on transactional data.</p><h4><strong>Essential Extensions &amp; Compatibility</strong></h4><p>A frequent question from architects is whether AlloyDB is compatible with the standard PostgreSQL ecosystem.
AlloyDB offers full PostgreSQL compatibility, supporting major open-source extensions and introducing powerful native features:</p><ul><li><strong>Standard Extensions:</strong> Fully supports PostGIS (geospatial), pg_cron (job scheduling), pgaudit (compliance logging), and pg_stat_statements (monitoring).</li><li><strong>AlloyDB-Specific Extensions:<br>* </strong>google_columnar_engine: Automatically accelerates analytical queries (HTAP — Hybrid Transactional and Analytical Processing). <br>* vector: An optimized version of pgvector for faster AI similarity search. <br>* alloydb_ai: Native integration with Vertex AI for calling ML models directly from SQL.</li></ul><p>Compared to traditional PostgreSQL deployments, AlloyDB delivers significantly higher throughput, lower latency, and faster analytical performance, while remaining fully managed and PostgreSQL compatible.</p><h3>🧐 Cloud SQL Editions Explained</h3><p>Before comparing all options, it’s important to understand the Cloud SQL editions:</p><ul><li><strong>Cloud SQL Enterprise</strong>: The foundational managed PostgreSQL tier designed for general-purpose and business-critical workloads. Supports up to 96 vCPUs and 624 GB RAM with 99.95% availability SLA.</li><li><strong>Cloud SQL Enterprise Plus</strong>: Built for higher-scale and higher-availability needs, with performance-optimized machine types supporting up to 128 vCPUs and 864 GB RAM. 
Key enhancements include:<br>- <strong>Near-zero downtime maintenance:</strong> &lt;1 second connectivity loss during maintenance.<br>- <strong>Data Cache:</strong> Delivers up to 4x improved read performance<br>- <strong>Enhanced Performance:</strong> Up to 2x write latency improvement.<br>- <strong>Superior Availability:</strong> 99.99% SLA (inclusive of maintenance) with eligibility for up to 100% financial credit.</li></ul><p>Both editions feature a traditional PostgreSQL architecture, making them the ideal choice for <strong>“lift-and-shift” migrations</strong> where you need managed PostgreSQL without refactoring your application.</p><h3>📊 Cloud SQL Enterprise vs Enterprise Plus vs AlloyDB</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FaDCyyWwQEhKiX9lqOeVbw.png" /><figcaption>Table: Cloud SQL Enterprise vs Enterprise Plus vs AlloyDB (<a href="https://gist.github.com/aamir814/e971ff899add7a88aa996cacf4a2e33e">View raw data</a>)</figcaption></figure><h3>🏋️‍♂️ Performance Benchmarks</h3><p>To provide concrete performance insights, I conducted comprehensive benchmarks<a href="#b930">²</a> across all three PostgreSQL offerings using identical <strong>4-vCPU, 32GB RAM</strong> configurations.</p><p>The testing methodology utilized a dual approach:</p><ul><li><strong>Standardized Baselines:</strong> Using pgbench to measure raw transactional throughput (TPS) and latency.</li><li><strong>Real-World Simulation:</strong> A custom e-commerce workload involving 100,000 transactions, 10,000 users, and 1,000 products to model complex application patterns.</li></ul><h4>OLTP (Transactional) Performance</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rns_hBUTedesYTlw__6oEg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*u3WOfjGtTu5PfJr0Hk3M4w.png" /><figcaption>Table: OLTP (Transactional) Performance (<a href="https://gist.github.com/aamir814/9fe872ca706dd0704d40c6f7c2297d08">View raw 
data</a>)</figcaption></figure><h4><strong>Key findings:</strong></h4><ul><li><strong>Cloud SQL Enterprise Plus</strong> delivers the highest overall transactional throughput (48% faster than Enterprise)</li><li><strong>AlloyDB</strong> excels at SELECT operations with 2.7x better performance than Enterprise Plus</li><li>Performance gaps are significant enough to impact application scalability</li></ul><p><strong>Note:</strong> AlloyDB’s disaggregated architecture incurs a slight network overhead for transaction management. In smaller instances (4 vCPU), the raw CPU power of the monolithic Cloud SQL Enterprise Plus edges it out. However, AlloyDB’s superior scalability typically reverses this trend on larger instances (16+ vCPUs).</p><h4>OLAP (Analytical) Performance</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5vUDgiJDb64KBKHXMElToA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_M31xPT1UsX9REpGpBRUjQ.png" /><figcaption>Table: OLAP (Analytical) Performance (<a href="https://gist.github.com/aamir814/f9831d9c1bed2550e94988eba83a3e20">View raw data</a>)</figcaption></figure><h4><strong>Key findings:</strong></h4><ul><li><strong>Cloud SQL Enterprise Plus</strong> shows 42% faster complex aggregations than Enterprise</li><li>AlloyDB’s columnar engine provides the fastest simple analytical queries</li><li>Enterprise Plus delivers the most consistent analytical performance across query types</li></ul><h4>Mixed Workload (HTAP) Performance</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Otexg-7jr9WsMAVwh92bPQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fWG4wVRNjPUfcP2b9qU7sQ.png" /><figcaption>Table: Mixed Workload (HTAP) Performance (<a href="https://gist.github.com/aamir814/99fb88494b1a7d860e57def7332d806c">View raw data</a>)</figcaption></figure><h4><strong>Key findings:</strong></h4><ul><li><strong>AlloyDB</strong> handles mixed workloads best 
with 48% higher OLTP performance than Enterprise</li><li><strong>Enterprise Plus</strong> excels at concurrent analytical queries</li><li>Both advanced options significantly outperform Enterprise in mixed scenarios</li></ul><h3>📋 Quick Decision Guide</h3><p>Based on the performance benchmarks above, here’s a quick reference for choosing the right PostgreSQL service:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-bvKJegbPb96fuO8n2TAvQ.png" /><figcaption>Table: Quick Decision Guide (<a href="https://gist.github.com/aamir814/343b83103a9e506686abd4fde8b18255">View raw data</a>)</figcaption></figure><h3>❓ When to Use AlloyDB</h3><p>AlloyDB is not a replacement for all PostgreSQL workloads. It is best suited for scenarios where performance, availability, and scale are primary concerns. For deeper insights into each scenario, here are the detailed use cases:</p><h4><strong>1. High-Performance Transactional Workloads</strong></h4><p>AlloyDB excels at workloads that demand consistently high throughput and low latency. My benchmarks show AlloyDB delivering 867 TPS with exceptional SELECT performance (2,148 ops/sec), making it ideal for:</p><ul><li>Large-scale <strong>e-commerce platforms</strong> with heavy read traffic</li><li><strong>Financial services</strong> and payment processing systems requiring fast data retrieval</li><li><strong>Gaming platforms</strong> with real-time state and leaderboard updates</li></ul><p><strong>Performance insight:</strong> While Cloud SQL Enterprise Plus achieved higher overall TPS (943), AlloyDB’s 3.6x faster SELECT operations make it superior for read-heavy transactional workloads.</p><h4><strong>2. Hybrid Transactional and Analytical Processing (HTAP)</strong></h4><p>AlloyDB enables transactional and analytical queries to run on the same data without offloading to a separate analytics system. 
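In practice, opting a table into the columnar engine is done with SQL. As a rough, hedged sketch (the table and column names are illustrative, the instance is assumed to have the columnar engine flag enabled, and the exact function names should be verified against the AlloyDB documentation):

```sql
-- Illustrative sketch only: the table and columns are made up, and the
-- google_columnar_engine flag is assumed to be enabled on the instance.
CREATE EXTENSION IF NOT EXISTS google_columnar_engine;

-- Add a hot transactional table to the columnar store.
SELECT google_columnar_engine_add('orders');

-- Analytical query over live OLTP data; selecting only the needed
-- columns (rather than SELECT *) lets the columnar engine do the work.
SELECT product_id, SUM(amount) AS revenue
FROM orders
WHERE created_at >= now() - interval '1 hour'
GROUP BY product_id
ORDER BY revenue DESC;
```

The same table keeps serving transactional reads and writes while the columnar copy serves the scans.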
My benchmarks show AlloyDB handling mixed workloads with 839 concurrent OLTP ops/sec, making it ideal for:</p><ul><li><strong>Real-time fraud detection </strong>requiring immediate analysis of transaction patterns</li><li>Operational dashboards on live production data</li><li>Embedded analytics in SaaS platforms</li></ul><p><strong>Performance insight:</strong> AlloyDB demonstrated the best mixed workload performance, handling 48% more concurrent OLTP operations than Cloud SQL Enterprise while maintaining strong analytical query performance.</p><p><strong>Important design consideration:</strong> When using AlloyDB for analytical workloads, teams need to think differently than with traditional row-based RDBMS systems. The columnar engine is optimized for scanning specific columns rather than full rows. As a result:</p><ul><li>Analytical queries typically <strong>do not rely on indexes</strong> in the traditional sense</li><li>Queries perform best when they <strong>select specific columns</strong> rather than SELECT *.</li><li>Schemas and queries should be designed with <strong>column-level access patterns</strong> in mind</li></ul><p>Embracing this mindset is crucial for unlocking the full potential of AlloyDB’s analytical capabilities.</p><h4><strong>3. Mission-Critical Applications with High Availability</strong></h4><p>AlloyDB provides a <strong>99.99% SLA, including maintenance</strong>, fast failover, and minimal operational overhead. Cloud SQL Enterprise Plus also offers 99.99% SLA with sub-second maintenance downtime. Both are well-suited for:</p><ul><li>Healthcare systems requiring continuous availability</li><li>Trading and financial platforms</li><li>Global ERP and core business systems</li></ul><p><strong>Performance insight:</strong> Enterprise Plus delivers &lt;1 second maintenance downtime compared to ~30 seconds for Enterprise, while AlloyDB offers near-zero downtime for all operations.</p><h4>4. 
AI/ML and Data-Intensive Workloads</h4><p>AlloyDB integrates well with Google Cloud’s AI and data ecosystem and supports high-performance access patterns for:</p><ul><li>Personalization and recommendation engines</li><li>IoT and telemetry ingestion</li><li>AI-driven applications that require fast access to fresh operational data</li></ul><h4>5. Vector Search and AI Applications</h4><p>AlloyDB provides optimized vector search capabilities that significantly outperform standard PostgreSQL implementations. With pgvector optimizations including IVFFlat and HNSW algorithms, AlloyDB delivers up to 10x faster vector queries compared to standard pgvector implementations. This makes it ideal for:</p><ul><li><strong>Semantic search applications</strong> requiring fast similarity matching</li><li><strong>Recommendation systems</strong> using embedding-based filtering</li><li><strong>RAG (Retrieval-Augmented Generation)</strong> applications needing rapid vector lookups</li><li><strong>AI-powered chatbots</strong> with large knowledge bases</li></ul><p><strong>Performance advantage:</strong> AlloyDB’s model endpoint integration allows direct embedding generation within the database, eliminating the need for external API calls and reducing latency for AI workloads.</p><h4>6. Cost Efficiency at Scale</h4><p>At scale, AlloyDB can be cost-effective, especially for high-availability and read-heavy workloads. In Cloud SQL Enterprise and Enterprise Plus, HA configurations and read replicas each require separate storage, increasing total storage costs (Primary + Standby + Replica). 
AlloyDB, by contrast, charges for storage only once, as HA nodes and read pools share the same underlying storage layer.</p><p><strong>Cost comparison for identical 4-vCPU configurations with HA + 1 read replica (3 total instances) at 1.75 TiB storage:</strong></p><ul><li>Cloud SQL Enterprise: $2,066/month</li><li>Cloud SQL Enterprise Plus: $2,141/month</li><li>AlloyDB: $2,064/month</li></ul><p><strong>The storage advantage:</strong> When you need high availability plus read replicas (3 total instances), AlloyDB becomes increasingly cost-effective as your storage grows beyond 1.75 TiB<a href="#cd06">¹</a>. This is because <strong>Cloud SQL stores three full copies of the data</strong> (primary + HA + replica), while <strong>AlloyDB stores one</strong> shared copy across all nodes.</p><p><strong>Bottom line:</strong> AlloyDB’s superior performance comes at a cost premium for smaller deployments, but achieves cost parity with Enterprise as storage scales beyond 1.75 TiB, making it increasingly attractive for data-intensive applications.</p><h3>🧾 Conclusion</h3><p>Google Cloud now offers <strong>three strong managed PostgreSQL paths</strong>, each serving a distinct purpose:</p><ul><li>Cloud SQL Enterprise for reliable, general-purpose workloads (635 TPS baseline performance)</li><li>Cloud SQL Enterprise Plus for higher-scale applications requiring enhanced availability and performance (943 TPS with 42% faster analytics)</li><li>AlloyDB for mission-critical systems demanding maximum performance, scalability, and built-in analytics (867 TPS with 3.6x faster SELECT operations)</li></ul><p><strong>Key insight:</strong> AlloyDB becomes more cost-effective as storage increases. With 1.75 TiB of storage and an HA plus read replica setup, AlloyDB and Cloud SQL Enterprise have nearly identical monthly costs ($2,064 vs $2,066). 
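To make the parity claim concrete, the breakpoint arithmetic can be sketched as follows, using the effective per-GiB storage rates from the References section; the derived compute figures are back-of-the-envelope estimates, not official pricing:

```python
# Sketch of the cost-parity arithmetic. Storage rates come from the
# References section ($0.59/GiB effective for Cloud SQL Enterprise with
# HA + 1 replica, $0.40/GiB for AlloyDB's shared storage); monthly totals
# are the quoted figures above. Derived compute costs are estimates.
CS_RATE, ADB_RATE = 0.59, 0.40    # $/GiB/month effective storage rates
PARITY_GIB = 1.75 * 1024          # 1.75 TiB expressed in GiB
CS_TOTAL, ADB_TOTAL = 2066, 2064  # quoted monthly totals at that size

# Back out the fixed compute portion of each monthly bill.
cs_compute = CS_TOTAL - CS_RATE * PARITY_GIB
adb_compute = ADB_TOTAL - ADB_RATE * PARITY_GIB
compute_premium = adb_compute - cs_compute  # AlloyDB's extra compute cost

# Parity is reached when per-GiB storage savings cover that premium.
breakpoint_tib = compute_premium / (CS_RATE - ADB_RATE) / 1024
print(f"Estimated AlloyDB compute premium: ~${compute_premium:.0f}/month")
print(f"Estimated cost parity at ~{breakpoint_tib:.2f} TiB")
```

With the quoted figures this lands at roughly 1.74 TiB, in line with the ~1.75 TiB breakpoint cited here.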
For smaller storage volumes, however, Cloud SQL Enterprise remains the better value because AlloyDB’s compute costs are higher.</p><h4><strong>Performance-based recommendations:</strong></h4><ul><li>Choose Enterprise Plus for pure OLTP workloads requiring maximum transactional throughput</li><li>Choose AlloyDB for read-heavy applications, mixed HTAP workloads, or when SELECT performance is critical</li><li>Choose Enterprise for cost-sensitive applications with moderate performance requirements</li></ul><p>There is no single right answer, only the right choice for your workload. If you’re evaluating these options and need guidance on performance, cost, or architectural considerations, the experts at <strong>DoiT</strong> can help you make a confident, data-driven decision. Connect with us to craft the ideal PostgreSQL strategy for your cloud journey.</p><h3>References</h3><p>¹ <strong>Storage Cost Calculation:</strong> Based on Google Cloud Pricing Calculator (December 2025), Cloud SQL storage costs $0.17/GiB/month per instance. For the HA + 1 read replica configuration, you need 3× storage allocation (primary + HA + replica) plus backup storage (approx. $0.08/GiB/month) for the primary instance only. 
AlloyDB storage costs $0.30/GiB/month + $0.10/GiB backup = $0.40/GiB total, but uses shared storage across all nodes.</p><h4><strong>Breakpoint analysis:</strong></h4><ul><li>Cloud SQL (3 instances): (3 × $0.17) + $0.08 = $0.59/GiB effective rate</li><li>AlloyDB (shared): $0.40/GiB total rate</li><li>AlloyDB’s storage is always more cost-effective, but total cost parity occurs around 1.75 TiB, where storage savings offset AlloyDB’s higher compute costs.</li></ul><h4><strong>Pricing Calculator Links:</strong></h4><ul><li><a href="https://cloud.google.com/products/calculator?hl=en&amp;dl=CjhDaVJsWTJFeE1tWTNaaTB4WVRCa0xUUTVNekV0T0dJMVpTMHlPRFV5TUdJMFpEaGtaakFRQVE9PRAHGiQ3NEFDRkJFOC04MDI2LTQ4RjUtQkZCMC1ERUM3MDUxNzA2Njk">Cloud SQL Enterprise</a></li><li><a href="https://cloud.google.com/products/calculator?hl=en&amp;dl=CjhDaVJpTVdZek5tRTROeTB3WkRJMExUUmhNelF0T0dOa09DMDVaREl5WkRKaE1URTBNVGdRQVE9PRAHGiRGQzkwMzcwQy1CMTBGLTQyQzMtQTRFQi0wNzYyMDVBNDY1MEQ">Cloud SQL Enterprise Plus</a></li><li><a href="https://cloud.google.com/products/calculator?hl=en&amp;dl=CjhDaVF5T1RGalpUbGpOUzFoWVRSbUxUUXlaV0l0WW1RMU9TMHpNVFpoWVRRM05ERTNNamdRQVE9PRACGiREQzFCNzY0NS0zNzZGLTREM0UtOEIxRC0zNDk2MDc1MzM4QkU">AlloyDB for PostgreSQL</a></li></ul><p>² <strong>Performance Test Methodology:</strong> Complete test suite and methodology available at: <a href="https://github.com/aamir814/gcp-postgres-benchmarks">https://github.com/aamir814/gcp-postgres-benchmarks</a>. 
Tests include OLTP (pgbench + custom transactions), OLAP (complex analytical queries), and mixed HTAP workloads across identical infrastructure configurations.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=601caa1110b8" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/when-to-use-alloydb-instead-of-cloud-sql-for-postgresql-601caa1110b8">When to Use AlloyDB Instead of Cloud SQL for PostgreSQL</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[First look at Google Cloud N4A VMs: benchmarked against N4, C4A and AWS M8g.]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://engineering.doit.com/first-look-at-google-cloud-n4a-vms-benchmarked-against-n4-c4a-and-aws-m8g-aba8017ea927?source=rss----b5de5190d27c---4"><img src="https://cdn-images-1.medium.com/max/1792/1*I561Lz9sosdGpsgP4c5KRg.png" width="1792"></a></p><p class="medium-feed-snippet">I&#x2019;ve tested the newest N4A instance family offering from Google Cloud so you don&#x2019;t have to.</p><p class="medium-feed-link"><a href="https://engineering.doit.com/first-look-at-google-cloud-n4a-vms-benchmarked-against-n4-c4a-and-aws-m8g-aba8017ea927?source=rss----b5de5190d27c---4">Continue reading on DoiT »</a></p></div>]]></description>
            <link>https://engineering.doit.com/first-look-at-google-cloud-n4a-vms-benchmarked-against-n4-c4a-and-aws-m8g-aba8017ea927?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/aba8017ea927</guid>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[arm]]></category>
            <category><![CDATA[axion]]></category>
            <category><![CDATA[cost-optimization]]></category>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Alex Gkiouros]]></dc:creator>
            <pubDate>Wed, 28 Jan 2026 20:50:27 GMT</pubDate>
            <atom:updated>2026-01-29T15:06:07.070Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[Common Cloud Mistakes Early-Stage Startups Make and How to Avoid Them]]></title>
            <link>https://engineering.doit.com/common-cloud-mistakes-early-stage-startups-make-and-how-to-avoid-them-384d01463a7f?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/384d01463a7f</guid>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[cloud-mistakes]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[early-stage-startup]]></category>
            <category><![CDATA[risk-management]]></category>
            <dc:creator><![CDATA[Avi Keinan]]></dc:creator>
            <pubDate>Mon, 26 Jan 2026 10:22:54 GMT</pubDate>
            <atom:updated>2026-03-14T07:38:39.738Z</atom:updated>
            <content:encoded><![CDATA[<p>In my role at <a href="https://www.doit.com">DoiT</a>, I work daily with startups that are just beginning their cloud journey.</p><p>And I keep seeing the same mistakes, over and over again.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GaSJ3-GZweQtLj2gWblexg.png" /></figure><p>I’m sharing a few of them here, hoping you can avoid them before they become expensive/painful/hard to fix:</p><h3>1. The AWS Account Is Opened Under a Founder’s Personal Email</h3><p>Very often, the first AWS account is created using a founder’s private Gmail address.</p><p>In this situation, the account legally and operationally belongs to a person, not to the company.</p><p>If the founder leaves, there is a conflict, or (worst case) something happens to them, ownership becomes a serious business risk.</p><p>Beyond that, personal email is usually less secure than a corporate mailbox: no enforced MFA, no centralized auditing, and weaker identity governance, all of which become a major obstacle when performing an audit for certification.</p><p>A compromise of the founder’s email often means full control over the company’s AWS environment.</p><p>This also creates a <strong>single point of failure</strong>.</p><p><strong>Best practice:</strong></p><p>Create a shared group mailbox, for example: aws@example.com</p><p>Add at least two team members (someone is always sick, on vacation, or at a conference), and use plus-addressing to separate environments in different AWS accounts:</p><ul><li>aws+production@example.com</li><li>aws+organization@example.com</li><li>aws+development@example.com</li></ul><h3>2. Creating resources in the Management Account</h3><p>A startup begins with a single AWS account that hosts everything: production, development, demos, and a developer&#39;s playground.</p><p>As the company grows, it becomes necessary to separate environments.
The easiest way seems to be to turn this single account into the <strong>Management Account</strong> in AWS Organizations and create new member accounts via the organization.</p><p>The problem: the management account is the only account that <strong>cannot be governed</strong> by organization-wide controls.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/623/1*aXqheuA8HlWDYIwIs3TOyw.png" /><figcaption>AWS Organization <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html">Service Control Policies documentation</a></figcaption></figure><p>Typical policies you want to enforce:</p><ul><li>All EC2 EBS volumes must be encrypted.</li><li>Resources cannot be created in unauthorized regions.</li><li>Expensive instance types (GPU) are restricted.</li><li>IAM users are forbidden (only roles are allowed).</li></ul><p>None of these applies to the Management Account, which, in this case, is also the production account.</p><p>Worse, AWS does not allow converting a Management Account into a Member Account.</p><p>Once the production account is also the management account, the only solution is a full organizational migration: rebuilding the organization, recreating policies, reconfiguring permissions, and reonboarding every employee to a new Single Sign On (SSO) endpoint.</p><p>If you are currently in a stage where everything runs in a single AWS account and want to start separating environments, <strong>create a new standalone AWS account</strong>, set it as the Management Account via AWS Organizations, and invite the existing Production account to become a member account in the new organization.</p><h3>3. 
Postponing Compliance</h3><p>If you plan to sell to enterprise customers, you will eventually be asked for compliance:</p><ul><li><strong>SOC 2 / ISO 27001</strong> for B2B SaaS</li><li><strong>HIPAA</strong> for healthcare</li><li><strong>PCI DSS</strong> for fintech and payments</li></ul><p>Retrofitting an existing product and organization to new compliance requirements is extremely hard.</p><p>It affects not only cloud architecture, but also development workflows, access control, and even employment contracts (IP ownership, confidentiality, device management, etc.).</p><p>Fixing cloud infrastructure is challenging but achievable. Changing contracts and organizational processes later is far more difficult. Therefore, engaging a compliance advisory partner from the very first lines of your product’s code will significantly simplify the journey.</p><h3>4. All Eggs in One Basket</h3><p>AWS allows you to manage domain registration, renewal and DNS inside Route 53; this is convenient and dangerous.</p><p>If the AWS account is suspended (due to a billing issue, security incident, or leaked IAM key), your domain may be affected as well.</p><p>When email stops working, it becomes very difficult to communicate with AWS Support to resolve the issue.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/705/1*128KGEqTQ6s8Q9LMq3-HgA.png" /><figcaption>posts from r/aws in Reddit.com</figcaption></figure><p><strong>Best practice:</strong></p><p>Keep the company’s primary domain outside the main production AWS account, even in a separate provider like GoDaddy or NameCheap or in a dedicated AWS account. If you are locked out of your production account, you can still maintain the DNS records with the other provider/account.</p><h3>5. 
Maintain an expiration calendar so you don’t miss critical renewal dates</h3><p>Even in the first year, a startup quickly accumulates:</p><ul><li>Contracts</li><li>API keys</li><li>Credit cards</li></ul><p>What they all have in common is an expiration date.</p><p>The unwanted outcomes are usually:</p><ul><li>Blocked AWS accounts due to non-payment.</li><li>Unexpected service outages due to an expired API key.</li><li>Rushed contract renewal negotiations due to missed or imminent renewal dates.</li></ul><p>A shared calendar with automated reminders (“Credit card 3344 expires in 30 days”, “MAP Provider contract renewal in 45 days”) helps prevent these issues.</p><h3>Final Thought</h3><p>There are many challenges to deal with. I would strongly recommend finding a mentor who has been a founder before and has already led several startups. Experienced advice is truly worth its weight in gold.</p><p>If you want to build your cloud the right way from day one, <a href="http://doit.com">DoiT</a> brings the technology, the expertise, and years of experience helping both startups and global enterprises avoid these exact mistakes — and many others.</p><p>Let’s make your cloud a growth engine, not a risk factor.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=384d01463a7f" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/common-cloud-mistakes-early-stage-startups-make-and-how-to-avoid-them-384d01463a7f">Common Cloud Mistakes Early-Stage Startups Make and How to Avoid Them</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[MCP Toolbox for Databases with AlloyDB: A Hands-on Exploration]]></title>
            <link>https://engineering.doit.com/mcp-toolbox-for-databases-with-alloydb-a-hands-on-exploration-0c5a874427fe?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/0c5a874427fe</guid>
            <category><![CDATA[ai-tools]]></category>
            <category><![CDATA[database]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[alloydb]]></category>
            <category><![CDATA[ai-agent]]></category>
            <dc:creator><![CDATA[Joseph Bharath Reddy Allam]]></dc:creator>
            <pubDate>Fri, 23 Jan 2026 10:02:05 GMT</pubDate>
            <atom:updated>2026-01-23T10:02:04.490Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9Dya4jjyTVKt23IqiZP4Lg.png" /></figure><p>Generative AI is increasingly becoming an integral part of how teams operate, reason about systems, and query data. What started with code completion and chat-based assistants is now expanding into infrastructure, operations, and data systems. Databases are no exception. Teams are increasingly experimenting with natural language interfaces for querying data, exploring schemas, and assisting with analysis.</p><p>At the same time, databases remain some of the most sensitive and operationally critical components in any system. Exposing them directly to free-form AI prompts raises real concerns around correctness, security, and control. As AI capabilities move closer to production systems, the way models interact with databases needs more structure than simple prompt-based SQL generation.</p><p>MCP Toolbox for Databases is an open source project from Google that addresses this problem. It provides a structured, protocol-based way for AI tools to interact with databases through well defined operations, rather than raw text prompts. MCP Toolbox provides a database-agnostic framework for AI tool interaction, while individual databases are supported through database-specific MCP tools. These tools can be prebuilt integrations or custom implementations and can be used directly with Gemini CLI extensions. In this post, the focus is on AlloyDB as a PostgreSQL-compatible example. The source code and project documentation are available on GitHub: <a href="https://github.com/googleapis/genai-toolbox">https://github.com/googleapis/genai-toolbox</a></p><h3>Introduction: Why MCP Toolbox for Databases Matters</h3><p>Large language models are increasingly used around databases. Developers ask them to write SQL, explain schemas, or answer analytical questions. 
In practice, many of these workflows still rely on simple text-to-SQL generation, where the model produces a query based on a prompt and hopes it matches the actual schema and data.</p><p>This approach works for small demos, but it quickly shows limitations. The model often has incomplete knowledge of the schema, no visibility into permissions, and no awareness of how a query is executed. From a database perspective, this makes it hard to trust the results or safely integrate these tools into real environments.</p><p>The Model Context Protocol, or MCP, takes a different approach. Instead of asking a model to guess, MCP allows AI tools to interact with systems through explicit, well-defined operations. A database can expose capabilities such as listing tables, describing schemas, executing queries, or retrieving query plans as tools. The AI client calls those tools and works with real outputs, not assumptions.</p><p>MCP Toolbox for Databases is an open source implementation of this idea from Google. Each database operation is handled as a concrete tool invocation, which makes interactions more predictable, observable, and easier to control.</p><p>In this post, I walk through a hands-on setup using AlloyDB for PostgreSQL and the Gemini CLI. The focus is on what this looks like in practice. We explore a database schema without writing SQL, answer business questions using natural language backed by real queries, and inspect how those queries are executed. 
The goal is not to promote automation, but to show how MCP changes the way AI tools interact with databases in a way that database engineers can reason about and trust.</p><h3>Architecture Overview: Gemini CLI, MCP Toolbox, and AlloyDB</h3><figure><img alt="High-level architecture illustrating how Gemini CLI uses MCP Toolbox to execute structured database operations against AlloyDB." src="https://cdn-images-1.medium.com/max/1024/1*RTf482gf8Lymj0gJxThGcg.png" /></figure><p>Before looking at individual queries and examples, it helps to understand how the components in this setup fit together. Although there are several moving parts, the overall architecture is simple and intentionally layered.</p><p>At a high level, the Gemini CLI acts as the user-facing interface. It is where prompts are entered and responses are displayed. When database-related prompts are issued, Gemini does not interact with AlloyDB directly. Instead, it relies on MCP to discover and invoke database capabilities in a structured way.</p><p>The AlloyDB integration for Gemini CLI bundles MCP Toolbox for Databases. In this setup, MCP Toolbox runs implicitly as part of the Gemini CLI extension. There is no separate MCP server process to install or manage when using Gemini CLI with the AlloyDB extension. When connecting from other IDEs or MCP clients, MCP Toolbox is typically run as a standalone server. When Gemini CLI starts, it loads the AlloyDB extension, which registers MCP servers and exposes a set of database tools.</p><p>These tools fall into two broad categories. One set covers administrative and infrastructure operations, such as listing clusters or instances. The other focuses on database-level operations, including schema introspection, query execution, and performance-related metadata. Each operation is exposed as a discrete tool with a well defined input and output.</p><p>When a prompt requires database interaction, Gemini selects the appropriate MCP tool and invokes it. 
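</p><p>Concretely, each invocation is a JSON-RPC 2.0 <code>tools/call</code> request, as defined by the Model Context Protocol. A minimal sketch of what such a message can look like; the tool name <code>execute_sql</code> and its arguments are illustrative examples, not necessarily the exact tools the AlloyDB extension registers:</p>

```python
import json

# Illustrative MCP "tools/call" request (JSON-RPC 2.0), roughly what an MCP
# client such as Gemini CLI sends to the server. The tool name and arguments
# below are hypothetical examples for this sketch.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "execute_sql",  # hypothetical tool name
        "arguments": {"sql": "SELECT count(*) FROM orders"},
    },
}
print(json.dumps(request, indent=2))
```

<p>The server answers with a structured result object rather than free text, which is what lets the client ground its response in real database output.</p><p>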
The MCP server executes the request against AlloyDB and returns structured results. Gemini then uses those results to produce a natural language response. The model is not guessing schema details or fabricating results. It is working with live data returned by explicit tool calls.</p><p>Authentication and authorization are handled outside the prompt flow. Access to Google Cloud APIs uses Application Default Credentials, which map the local user identity to IAM permissions. Database access uses standard PostgreSQL credentials. AlloyDB itself remains a standard PostgreSQL-compatible system. MCP does not change how the database works. It changes how AI tools interact with it.</p><p>The setup shown in this post follows the AlloyDB integration approach documented by Google, which describes how Gemini CLI uses MCP Toolbox to connect to AlloyDB instances: <a href="https://docs.cloud.google.com/alloydb/docs/connect-ide-using-mcp-toolbox">https://docs.cloud.google.com/alloydb/docs/connect-ide-using-mcp-toolbox</a>.</p><h3>Secure Authentication and Access Control with ADC</h3><p>Authentication is a critical part of this setup, even though it is mostly invisible once configured. Gemini CLI and MCP Toolbox authenticate to Google Cloud using Application Default Credentials, commonly referred to as ADC: a standard way for local tools and applications to obtain Google Cloud credentials based on the user’s identity and configured environment.</p><p>With ADC, local tools authenticate to Google Cloud APIs using the developer’s own identity. There are no API keys or embedded service account files. Permissions are controlled through standard IAM roles, which determine what infrastructure-level operations MCP tools are allowed to perform.</p><p>Database access is handled separately. SQL execution uses standard PostgreSQL authentication with a database user. 
This separation allows cloud permissions and database permissions to be managed independently, which aligns with how most teams already operate.</p><p>This model avoids embedding secrets in prompts or configuration files. It also makes actions auditable using existing cloud and database logging. For AI-assisted database workflows, this separation of identity, authorization, and execution is essential.</p><h3>Database Introspection Without Writing SQL</h3><p>With the environment in place, the first useful capability to test is database introspection. Instead of relying on inferred knowledge, MCP allows the database to expose metadata directly through tools.</p><p>A simple prompt requesting an overview of the database triggers the MCP database_overview operation. This returns live information such as engine version, uptime, and connection statistics.</p><figure><img alt="Database overview returned by AlloyDB via MCP, showing PostgreSQL version, uptime, and active connections." src="https://cdn-images-1.medium.com/max/1024/1*5eAWPNvCco5rI6Yn0ktYKw.png" /></figure><p>The next step is schema discovery. Asking to list all tables causes Gemini to invoke the appropriate MCP tool, which queries the system catalogues directly.</p><figure><img alt="List of database tables retrieved using MCP tools in Gemini CLI without writing SQL." src="https://cdn-images-1.medium.com/max/1024/1*okckMSRyueKcJJpfJUz-iQ.png" /></figure><p>No SQL is written by the user, and no assumptions are made by the model. The interaction is grounded entirely in explicit database operations.</p><h3>Schema Understanding and Relationships</h3><p>Once basic schema discovery is available, understanding relationships between tables becomes critical. This is where many natural language approaches struggle if they lack direct access to metadata.</p><p>Using MCP, Gemini can describe a table by querying the database schema directly. 
In this example, the orders table is described, including its columns, primary key, and foreign key relationships.</p><figure><img alt="Schema details for the orders table, including columns, primary key, and foreign key relationships." src="https://cdn-images-1.medium.com/max/1024/1*UL4Cy4kdfAcrppYKN3B6JQ.png" /></figure><p>Because this information comes directly from the database, it remains accurate as the schema evolves. This level of schema awareness is a prerequisite for generating correct analytical queries.</p><h3>Natural Language to SQL Analytics</h3><p>With schema awareness in place, higher-level analytical questions can be answered using natural language. In this setup, natural language queries are translated into explicit SQL operations executed through MCP tools.</p><p>The first example identifies the top five customers by total order value. Gemini generates a SQL query that joins customers, orders, and order details, aggregates revenue, and sorts the results. The SQL is executed directly against AlloyDB.</p><figure><img alt="Query results showing top customers by total order value, generated and executed via MCP." src="https://cdn-images-1.medium.com/max/1024/1*WhErG7kzL_-wLrhHv32zEA.png" /></figure><p>A second example introduces a more complex, join-heavy query that calculates revenue by product and supplier.</p><figure><img alt="Revenue by product and supplier calculated using a SQL query executed through MCP." src="https://cdn-images-1.medium.com/max/1024/1*LOvN9-Q8ie1PxlUdcVfC4Q.png" /></figure><p>The key point is transparency. The generated SQL is visible, the execution is real, and the results come directly from the database.</p><h3>Query Execution Plans and Performance Insight</h3><p>Beyond query results, understanding how a query is executed is essential for real systems. 
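</p><p>The plan such a tool retrieves is the same plan any PostgreSQL client gets from <code>EXPLAIN</code>. As a self-contained illustration of the analytics-plus-plan workflow, here is a sketch using Python’s built-in <code>sqlite3</code> module as a stand-in engine (AlloyDB speaks PostgreSQL; the schema and data below are invented for the demo):</p>

```python
import sqlite3

# Stand-in for the "top customers by total order value" query described
# above. SQLite ships with Python, so this runs anywhere; against AlloyDB
# the same SQL shape would run through an MCP tool instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

top_customers_sql = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY total DESC
    LIMIT 5
"""
print(conn.execute(top_customers_sql).fetchall())
# → [('Acme', 200.0), ('Globex', 50.0)]

# SQLite's analogue of a query plan: which scans and joins will run.
for row in conn.execute("EXPLAIN QUERY PLAN " + top_customers_sql):
    print(row)
```

<p>On PostgreSQL-compatible engines the equivalent step is <code>EXPLAIN</code> (or <code>EXPLAIN ANALYZE</code>), which is what the plan-inspection tooling surfaces.</p><p>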
MCP can surface query execution plans and related metadata.</p><p>When asked to explain how a query is executed, Gemini invokes an MCP tool that retrieves the query plan from AlloyDB. The database generates the plan using its normal PostgreSQL logic, and Gemini explains it in natural language.</p><figure><img alt="Query execution plan from AlloyDB with an explanation generated using MCP tools." src="https://cdn-images-1.medium.com/max/1024/1*KrVslMA9fu7MapAa4nrIWQ.png" /></figure><p>On small demo datasets, sequential scans and hash joins are expected. On larger datasets, the same workflow can highlight index usage, parallel execution, and tuning opportunities. MCP does not optimize queries automatically. It helps interpret what the database is already doing.</p><h3>Why MCP Toolbox Is Different from Prompt-Based SQL</h3><p>Traditional prompt-based SQL relies on the model guessing schema details and query structure. MCP replaces this with explicit, tool-backed operations.</p><p>Because schema information, query execution, and performance metadata come directly from the database, results are easier to validate and reason about. This makes MCP-based workflows more suitable for real environments where correctness and control matter.</p><h3>When to Use MCP Toolbox</h3><p>MCP Toolbox is well suited for schema exploration, analytics, developer productivity, and assisted query analysis. It is not a replacement for transactional application paths or unrestricted automation. Its strength lies in making AI-assisted database interactions observable and grounded in real operations.</p><p>These examples focus on exploration and analysis workflows. In production environments, access controls, permissions, and operational safeguards should be applied in the same way as any other database interaction.</p><h3>Conclusion</h3><p>MCP Toolbox for Databases represents a practical step forward in how generative AI can interact with production databases. 
Rather than relying on prompt-based guesswork, it introduces a structured, tool-driven model where AI works with real schema metadata, executes real queries, and surfaces real execution plans. Combined with AlloyDB and Gemini CLI, this approach makes AI-assisted database exploration and analysis more transparent, auditable, and aligned with how database teams already operate.</p><p>As organizations begin to experiment with natural language interfaces for data access, the underlying interaction model matters. MCP provides a foundation that prioritizes correctness, control, and observability, which are essential when AI moves closer to critical data systems.</p><p>If you are exploring how to safely introduce AI-assisted workflows into your database environment, or evaluating MCP-based integrations on Google Cloud, <a href="https://www.doit.com/">DoiT</a> can help. Our team of cloud architects and data specialists works with organizations worldwide to design, validate, and optimize modern data platforms, from proof of concept through production.</p><p>Let’s discuss how MCP, AlloyDB, and generative AI can fit into your data strategy, and how to do it in a way that remains secure, reliable, and aligned with your operational goals.</p><p>MCP does not abstract databases away; it exposes them in a way that AI tools can work with safely. For teams exploring how generative AI fits into data workflows, this approach offers a practical path forward.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=0c5a874427fe" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/mcp-toolbox-for-databases-with-alloydb-a-hands-on-exploration-0c5a874427fe">MCP Toolbox for Databases with AlloyDB: A Hands-on Exploration</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding Google Cloud's New Spend-Based CUD Model]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://engineering.doit.com/understanding-google-clouds-new-spend-based-cud-model-311dea4a4a79?source=rss----b5de5190d27c---4"><img src="https://cdn-images-1.medium.com/max/1000/1*C9D5afR9cGFpQT210jUiIw.png" width="1000"></a></p><p class="medium-feed-snippet">January 2026 brings simplified billing, expanded discounts, same savings</p><p class="medium-feed-link"><a href="https://engineering.doit.com/understanding-google-clouds-new-spend-based-cud-model-311dea4a4a79?source=rss----b5de5190d27c---4">Continue reading on DoiT »</a></p></div>]]></description>
            <link>https://engineering.doit.com/understanding-google-clouds-new-spend-based-cud-model-311dea4a4a79?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/311dea4a4a79</guid>
            <category><![CDATA[committed-use-discounts]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[finops]]></category>
            <category><![CDATA[google-cloud-billing]]></category>
            <dc:creator><![CDATA[Alex Gkiouros]]></dc:creator>
            <pubDate>Thu, 22 Jan 2026 17:36:26 GMT</pubDate>
            <atom:updated>2026-01-23T11:22:07.656Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[AWS European Sovereign Cloud: What It Is and Why It Matters]]></title>
            <link>https://engineering.doit.com/aws-european-sovereign-cloud-what-it-is-and-why-it-matters-e1a2fb2e6753?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/e1a2fb2e6753</guid>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[gdpr]]></category>
            <category><![CDATA[compliance]]></category>
            <category><![CDATA[digital-sovereignty]]></category>
            <category><![CDATA[sovereign-cloud]]></category>
            <dc:creator><![CDATA[Kate Gawron]]></dc:creator>
            <pubDate>Fri, 16 Jan 2026 13:13:30 GMT</pubDate>
            <atom:updated>2026-01-16T13:13:29.066Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NkMsKrgIRlaWMndut-bv8Q.png" /></figure><p>Amazon Web Services (AWS) has officially launched the <a href="https://aws.amazon.com/blogs/aws/opening-the-aws-european-sovereign-cloud/"><strong>AWS European Sovereign Cloud (ESC)</strong></a>, a dedicated cloud infrastructure built <em>entirely within the European Union (EU)</em> and designed to meet the most stringent data residency, operational autonomy, and sovereignty requirements for European organisations. This marks a major step in addressing regulatory concerns around cloud sovereignty, particularly for highly regulated sectors such as government, healthcare, finance, defence, and telecommunications.</p><h3>What Is the AWS European Sovereign Cloud?</h3><p>The ESC is an independent AWS cloud environment physically and logically isolated from AWS’s global Regions. It was created to give European organisations the <strong>highest levels of data sovereignty, operational control, compliance, and governance</strong>, while still offering AWS’s broad cloud and AI capabilities.</p><p>Key features of this sovereign cloud include:</p><ul><li><strong>Fully EU‑based infrastructure</strong>: All core infrastructure, data centres, networking, and operational tooling are physically located within the EU.</li><li><strong>Physical and logical separation</strong>: It is distinct from AWS’s existing global Regions, with controls to prevent access or dependencies from outside the EU.</li><li><strong>European governance and personnel</strong>: Operations, technical support, customer service, and governance structures are headed by EU residents under European legal entities.</li><li><strong>Compliance and sovereignty framework</strong>: AWS provides a <em>Sovereignty Reference Framework</em> and third-party audit reports to demonstrate compliance with stringent sovereignty and regulatory standards.</li></ul><h3>Technical and 
Personnel Setup: How It’s Built Differently</h3><p>The main aim of the ESC is to sever any legal connection to non-EU entities and personnel, so that data access requests from outside the EU cannot be fulfilled, in line with GDPR. The ESC has been built to ensure that only someone located within the EU can access it, including AWS staff. It also runs exclusively on Nitro-based instances, which cannot be accessed at any time by anyone, making it physically impossible for data to be taken from an instance without the customer&#39;s direct involvement.</p><figure><img alt="An architectural diagram illustrates the separation between “Existing AWS Regions” and the “European Sovereign Cloud.” On the left, a box labeled “Existing AWS Regions” lists “Europe (Stockholm) AWS Region,” “Europe (Ireland) AWS Region,” and “Europe (Paris) AWS Region,” with icons below for Billing, Usage, and IAM. On the right, a box labeled “European Sovereign Cloud” contains “Brandenburg, Germany ESC AWS Region,” also with icons for Billing, Usage, and IAM." 
src="https://cdn-images-1.medium.com/max/1024/1*2Cbv4hbgHzGbz2qVbSX53w.png" /></figure><h4>Sovereign Infrastructure and Controls</h4><p>Unlike traditional cloud deployments, the ESC includes several technical constructs to enforce sovereignty:</p><ul><li><strong>Dedicated partition and Region naming</strong>: The sovereign cloud uses its own AWS partition and Region identifiers, isolating it from global AWS Regions.</li><li><strong>Dedicated European trust services</strong>: A sovereign European Certificate Authority and local Route 53 DNS infrastructure operate entirely within the EU.</li><li><strong>Network isolation</strong>: Dedicated connectivity and networking prevent cross-Region traffic from leaving the EU boundary unless explicitly configured otherwise.</li></ul><h4>Personnel and Governance</h4><p>Operational control is crucial to sovereignty; otherwise, the risk from bad actors outside of the EU increases:</p><ul><li><strong>EU-resident operators</strong>: Day-to-day operations, security, and support for the sovereign cloud are carried out exclusively by AWS employees residing in the EU.</li><li><strong>Legal entities under EU law</strong>: The cloud is governed by European legal entities (e.g., AWS subsidiaries in Germany), overseen by a board with independent European members.</li><li><strong>European Security Operations Center (SOC)</strong>: Security monitoring and incident response are performed by a dedicated European SOC that mirrors AWS’s global security practices.</li></ul><p>This combination of EU-based personnel and governance ensures that sensitive operational decisions and access control adhere strictly to European legal frameworks.</p><h3>Day One Services: What’s Available at Launch</h3><p>At launch, the AWS ESC is available in Brandenburg, Germany, with access granted to AWS Outposts or Local Zones within the EU only. The next expected region is Portugal, with further expansion across Europe subject to demand. 
The ESC supports a broad set of core AWS services across key categories, making it suitable for many enterprise workloads. However, some areas are not yet represented (code pipeline or data manipulation tooling, for example), which could make it impossible for larger projects to move across:</p><p><strong>Compute</strong>:</p><ul><li>Amazon EC2</li><li>AWS Lambda</li></ul><p><strong>Containers &amp; Orchestration</strong>:</p><ul><li>Amazon EKS (Kubernetes)</li><li>Amazon ECS</li></ul><p><strong>Databases</strong>:</p><ul><li>Amazon RDS</li><li>Amazon DynamoDB</li><li>Amazon Aurora</li></ul><p><strong>Storage &amp; Networking</strong>:</p><ul><li>Amazon S3</li><li>Amazon EBS</li><li>Amazon VPC</li><li>Amazon Route 53</li></ul><p><strong>Security &amp; Identity</strong>:</p><ul><li>AWS KMS (Key Management Service)</li><li>AWS Private Certificate Authority</li></ul><p><strong>AI &amp; ML</strong>:</p><ul><li>Amazon SageMaker</li><li>Amazon Bedrock</li></ul><p>This lineup is designed to support full application stacks, from modern AI/ML workloads to traditional enterprise systems, but you can easily see that many commonly used features are missing, which makes it harder for complex or niche architectures to migrate. One of the most interesting services here is Route 53. 
AWS has encountered several global outages due to reliance on Route 53 services based in us-east-1, so decoupling the ESC’s DNS from us-east-1 should, in theory, improve its reliability.</p><h3>How It Protects Data (and What Makes It Unique)</h3><p>The ESC combines several technical and organisational controls to protect customer data:</p><h4>Sovereign Data Residency</h4><p>Unlike merely hosting data in a geographic region, sovereign clouds enforce policy mechanisms ensuring:</p><ul><li>All <em>content and metadata</em> (roles, configurations, identifiers) remain within EU boundaries.</li><li>Data does not leave the sovereign cloud unless the customer explicitly chooses to transfer it.</li></ul><h4>Isolation and Security</h4><p>Let’s talk about <a href="https://docs.aws.amazon.com/whitepapers/latest/overview-aws-european-sovereign-cloud/introduction.html">AWS Nitro</a> and what makes it special:</p><ul><li><strong>Minimal trusted computing base</strong>: The Nitro System removes the traditional hypervisor OS, reducing the attack surface by eliminating components such as SSH and shell access. That’s right, no SSH, sysadmins!</li><li><strong>Hardware-based isolation</strong>: Functions such as networking and storage are offloaded to dedicated Nitro cards, separating them from compute resources and enhancing security boundaries.</li><li><strong>Strong tenant isolation</strong>: Each EC2 instance runs in its own isolated environment, using either the lightweight Nitro Hypervisor or bare metal with no shared resources.</li><li><strong>Secure boot and root of trust</strong>: Nitro uses cryptographic validation at every boot stage to ensure the system has not been tampered with.</li><li><strong>No operator access to instance memory or storage</strong>: Even AWS personnel cannot access customer workloads, ensuring tenant isolation by design. 
Even if someone gained physical access to an AWS data center and attempted to access the server, they would still be unable to get anything useful from it, as even the memory is encrypted.</li></ul><p>The Nitro System is what has made the ESC technically possible.</p><h4>Operational Separation</h4><p>No operational dependencies exist outside the EU. Tools, access logs, and control planes are all EU-centric, insulating operations from foreign jurisdictional access.</p><p>In contrast to other clouds that might simply offer European data centres, AWS’s sovereign cloud is built to provide certifiable operational autonomy aligning with EU digital sovereignty goals.</p><h3>Migrating to the ESC</h3><p>Migrating to the ESC is much more complex than migrating to a different Region. The ESC has no connectivity to other Regions and is entirely isolated. Organizations must treat it as a distinct cloud environment with a separate partition, which may require reconfiguring or duplicating tools, services, and automation scripts. Networking presents one of the most significant migration challenges, as traditional cross-Region VPC peering, Transit Gateways, or shared services patterns are not supported across the sovereign and standard AWS environments. This demands rearchitecting for intra-Region networking only and deploying duplicate networking stacks, such as firewalls, NAT gateways, and DNS resolvers, within the new environment.</p><p>Additionally, data transfer mechanisms such as AWS Snowball or secure API-based replication may be needed, since direct pipelines or peering to existing AWS environments are restricted. 
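</p><p>Because the sovereign cloud lives in its own partition with its own Region identifiers (<strong>aws-eusc</strong> and <strong>eusc-de-east-1</strong>), any hard-coded ARNs, endpoints, or policies must be remapped during migration. A minimal sketch of the difference; the bucket name and helper function are illustrative, not part of any AWS SDK:</p>

```python
# ARNs embed the partition, so strings hard-coded for "arn:aws:..." will not
# match resources in the sovereign partition. Partition and Region names
# follow the article; the bucket name and this helper are illustrative only.
def s3_bucket_arn(partition: str, bucket: str) -> str:
    # S3 bucket ARNs leave the region and account fields empty.
    return f"arn:{partition}:s3:::{bucket}"

standard = s3_bucket_arn("aws", "my-bucket")
sovereign = s3_bucket_arn("aws-eusc", "my-bucket")
print(standard)   # arn:aws:s3:::my-bucket
print(sovereign)  # arn:aws-eusc:s3:::my-bucket
```

<p>The dedicated partition is deliberate: it is what isolates the ESC from global AWS Regions at the naming and policy level.</p><p>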
Organizations should also prepare for strict identity and access management (IAM) remapping, endpoint reconfiguration, and potential latency shifts due to isolated infrastructure.</p><h3>Trade-offs and Considerations</h3><p>While the ESC brings significant benefits, there are trade-offs:</p><h4>Complexity of Setup</h4><ul><li>Organisations may need to adjust IAM roles, accounts, and tooling to align with a separate sovereign partition.</li><li>Existing global accounts don’t automatically extend into the sovereign cloud.</li></ul><h4>Potential Feature Lag</h4><ul><li>Not every new AWS service will be available immediately in the sovereign cloud. AWS will prioritise critical services first.</li><li>Potential for the latest patches and versions being delayed, which could increase security risk.</li></ul><h4>Cost and Operational Overhead</h4><ul><li>Sovereign deployments often have different pricing structures and currency considerations (e.g., billing in EUR).</li></ul><h4>Regulatory Uncertainty</h4><ul><li>As EU digital sovereignty frameworks evolve (e.g., data governance laws), compliance standards may change, requiring ongoing adjustments.</li></ul><h3>Use Cases and Target Markets</h3><p>The European Sovereign Cloud is especially relevant for organisations with strict sovereignty needs:</p><h4>Highly Regulated Industries</h4><ul><li><strong>Government</strong>: Public sector agencies with national security data requirements.</li><li><strong>Healthcare</strong>: Systems handling patient data subject to GDPR and local data protection laws.</li><li><strong>Financial Services</strong>: Banking and insurance platforms needing strict operational controls.</li><li><strong>Telecommunications &amp; Energy</strong>: Critical infrastructure operators managing sensitive operational data.</li></ul><h4>Emerging Tech and AI Workloads</h4><p>AI workloads that process sensitive datasets (e.g., healthcare analytics, industrial AI) benefit from sovereign controls while leveraging 
AWS’s advanced AI services.</p><h4>Cross-border EU Organisations</h4><p>Multinational EU enterprises that must comply with diverse national laws can centralise sovereignty compliance within a unified EU cloud footprint.</p><h3>Unknowns and Future Questions</h3><p>Despite the significant advancement, some aspects remain unclear:</p><ul><li><strong>Service Roadmap Timing</strong>: How quickly newer AWS services will be certified for the sovereign cloud environment.</li><li><strong>Third-party Ecosystem Support</strong>: How soon partner tools and ISV solutions will become fully supported.</li><li><strong>Long-term Regulatory Alignment</strong>: How evolving EU digital policies might impact cloud sovereignty requirements and compliance obligations.</li></ul><h3>Conclusion</h3><p>The AWS European Sovereign Cloud introduces a new era of data autonomy for EU-based organisations, offering strict residency, operational independence, and compliance frameworks. Built with physically isolated infrastructure and managed exclusively by EU personnel, it’s tailored for highly regulated industries and public sector needs. Migrating requires a fresh approach, especially in networking, IAM, and tooling. And if you’re building apps or automation, <strong>get used to using aws-eusc and eusc-de-east-1 in API calls from now on.</strong> This isn’t just another AWS Region; it’s a sovereign ecosystem.</p><p>If you are considering a migration to <strong>the ESC</strong>, you are not alone. DoiT International is here to help you assess, plan and migrate with a strong focus on your business outcomes. With over 130 senior cloud experts specializing in crafting customized cloud solutions, our team is ready to help you navigate this process smoothly and optimize your infrastructure to ensure compliance and meet future demands efficiently.</p><p>Our experts are ready to provide you with strategic guidance and technical expertise every step of the way. 
The ESC will likely complicate your FinOps too. DoiT specializes in helping customers improve their cost visibility, management and savings in even the most complex cloud setups.</p><p>Let’s discuss what makes the most sense for your company during this policy enforcement phase, ensuring your cloud infrastructure is robust, compliant, and optimized for success. <a href="https://www.doit.com/expertise/">Contact us today.</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e1a2fb2e6753" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/aws-european-sovereign-cloud-what-it-is-and-why-it-matters-e1a2fb2e6753">AWS European Sovereign Cloud: What It Is and Why It Matters</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Streamline Cloud Management via DoiT CLI]]></title>
            <link>https://engineering.doit.com/streamline-cloud-management-via-doit-cli-d698fa7737e3?source=rss----b5de5190d27c---4</link>
            <guid isPermaLink="false">https://medium.com/p/d698fa7737e3</guid>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[automation]]></category>
            <category><![CDATA[finops]]></category>
            <dc:creator><![CDATA[Luca Serpietri]]></dc:creator>
            <pubDate>Fri, 16 Jan 2026 10:03:08 GMT</pubDate>
            <atom:updated>2026-01-16T10:03:07.050Z</atom:updated>
            <content:encoded><![CDATA[<p>How the DoiT Cloud Intelligence CLI unlocks superior cloud management</p><p>We have just released the <a href="https://changelog.doit.com/introducing-the-doit-cloud-intelligence-tm-cli-1VRGaA">DoiT Cloud Intelligence CLI</a> and you wouldn’t imagine the smile this announcement put on my face. Maybe I’m dating myself here, but I grew up in a world where bash scripting felt like a superpower and those who were fluent in grep were revered as we now revere the most astute data analysts. But I’m digressing.</p><p>Immediately, I jumped at the opportunity to take our newly released CLI for a spin, and from the get-go I learned something. We used <a href="https://github.com/rest-sh/restish">restish</a> as the bedrock for its implementation: as long as your API is based on OpenAPI (and DoiT’s is), it will handle schema discovery and authentication.</p><h3>The Setup</h3><p>To get started, <a href="https://help.doit.com/docs/cli">I followed our documentation</a>, installed restish via brew and configured it to work with the DoiT APIs. 
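</p><p>For reference, the configuration step boils down to a single command. A sketch (the base URL and the dci profile name here are assumptions on my part; the linked documentation has the authoritative values):</p>

```shell
# Register a "dci" profile for the DoiT API (base URL is an assumption);
# restish discovers the OpenAPI schema and walks you through auth interactively.
restish api configure dci https://api.doit.com
```

<p>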
The default profile allows for call customizations, which I didn’t use at the beginning but am showing here as some of you might find them interesting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NGoPm6BWYBJH_frc5HTOpg.png" /></figure><p>It’s now time to verify everything is working correctly, so I validated my login using the dedicated command restish dci validate, which triggered the OAuth workflow (shout-out to <a href="https://github.com/will-stone/browserosaurus">Browserosaurus</a>, an amazing browser selector I cannot live without).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M7MHxAgrM3l61mVe6QScjQ.png" /></figure><p>After logging in to my DoiT Cloud Intelligence tenant, all looked good: I’m ready to script my way through superior cloud management.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-e9vSmJ8snURk7FfetMFuA.png" /></figure><h3>The Goal</h3><p>For my first use case I’ve blatantly copied what our customers have struggled with for years before partnering with DoiT: <em>cost accountability.</em></p><p>I wanted to ensure a smooth, simple way for customers to:</p><ul><li>Invite a new user on DoiT Cloud Intelligence;</li><li>Create an Allocation based on their username that tracks the costs they generate;</li><li>Create a Budget tracking the newly created Allocation that sends notifications if a specific threshold is met.</li></ul><p>This will allow any new user to have their own personal feedback loop on cloud costs, with minimal friction. 
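</p><p>One detail worth flagging before we start: Budget start dates are passed as milliseconds since epoch, so it helps to have a one-liner that produces them. A sketch using GNU date syntax (Linux; the macOS/BSD -v variant appears in the budget configuration further down):</p>

```shell
# First day of the current month at 00:00 UTC, as milliseconds since epoch.
# GNU date: -u works in UTC, -d parses the assembled YYYY-MM-01 string,
# +%s prints epoch seconds, and the outer arithmetic converts to ms.
START_MS=$(( $(date -u -d "$(date -u +%Y-%m-01)" +%s) * 1000 ))
echo "$START_MS"
```

<p>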
So let’s get started.</p><h3>The Implementation</h3><h4>Inviting the user</h4><p>This is as simple as it gets; the command is:</p><pre>restish dci invite-user email:&lt;user_email&gt;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lvL2oT9aJY0hbajDSWiU_w.png" /><figcaption>you have e-mail</figcaption></figure><h4>Creating a user-dedicated Allocation</h4><p>If you’re new around here and you don’t know about Allocations, <a href="https://help.doit.com/docs/operate/allocations">I highly recommend you get familiar with them</a>.</p><p>The TL;DR is: Allocations allow users to logically group cloud costs based on any available data dimension. In our case, I’d like to logically group all the cloud costs generated by our newly created user, so they can easily monitor their spend.</p><p>The command is</p><pre>restish dci create-allocation &lt; allocation_conf.json</pre><p>but we need to pass a few parameters as configuration. I’m going to slightly cheat and assume that there’s a Tag key available in our billing data called member, where each value corresponds to a cloud engineer (in our case, luca+test).</p><p>The allocation_conf.json file is going to be as follows:</p><pre>{<br>    &quot;name&quot;: &quot;luca+test generated costs&quot;,<br>    &quot;description&quot;: &quot;All costs generated by luca+test&quot;,<br>    &quot;rule&quot;: {<br>        &quot;formula&quot;: &quot;A&quot;,<br>        &quot;components&quot;: [<br>            {<br>                &quot;key&quot;: &quot;member&quot;,<br>                &quot;type&quot;: &quot;label&quot;,<br>                &quot;mode&quot;: &quot;is&quot;,<br>                &quot;values&quot;: [<br>                    &quot;luca+test&quot;<br>                ]<br>            }<br>        ]<br>    }<br>}</pre><p>Running the command provides the Allocation id, which will come in handy shortly.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EnQWOr2n1q3XQhIL4fDq-Q.png" /></figure><h4>Creating a 
user-dedicated Budget</h4><p>The sound of the word “budget” instills fear into most of the engineers I talk to: it’s a synonym for slowing down innovation through restrictions and barriers.</p><p>While I acknowledge this might have happened in the past, the way Budgets are implemented in DoiT Cloud Intelligence creates a productive feedback loop between cloud engineers and their costs, where bad surprises are practically extinct and awareness is the key mantra.</p><p>If you want to learn more, please <a href="https://help.doit.com/docs/governance/budgets">head over to our documentation</a>, where we dive deeper into all their functionality.</p><p>In my use case, I want to keep it simple: a monthly budget of $1000 that makes sure newly created users know their spend and are proactively alerted when the spend we identified through the Allocation hits 80% of the budget.</p><p>Once more, it’s pretty straightforward:</p><pre>restish dci create-budget &lt; budget_conf.json</pre><p>and the budget_conf.json looks as follows:</p><pre>{<br>    &quot;name&quot;: &quot;luca+test - Monthly Budget&quot;,<br>    &quot;scope&quot;: [&quot;&lt;ALLOCATION_ID&gt;&quot;],<br>    &quot;amount&quot;: 1000,<br>    &quot;type&quot;: &quot;recurring&quot;,<br>    &quot;timeInterval&quot;: &quot;month&quot;,<br>    &quot;startPeriod&quot;: $(( $(date -u -v1d -v0H -v0M -v0S +%s) * 1000 )),<br>    &quot;currency&quot;: &quot;USD&quot;,<br>    &quot;alerts&quot;: [{&quot;percentage&quot;: 80}],<br>    &quot;collaborators&quot;: [<br>        {<br>            &quot;email&quot;: &quot;&lt;user_email&gt;&quot;,<br>            &quot;role&quot;: &quot;owner&quot;<br>        }<br>    ]<br>}</pre><p>A couple of things worth noting here:</p><ul><li>The startPeriod value should be expressed in milliseconds since epoch. 
I’ve included the shell command that computes it (first day of the current month at midnight UTC, times 1000) in the configuration file.</li><li>You can have multiple alerts set at different percentage thresholds; in this case, we’ve limited ourselves to one.</li></ul><p>Running the command results in a successful response.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h-ketwUPT2MbGtzgTDgbRw.png" /><figcaption>no more surprises!</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*B9g_u7z-8FYmwaatYYgIhg.png" /><figcaption>The WebUI shows the tracked data and the corresponding thresholds</figcaption></figure><p>This concludes my initial exploration of the DoiT Cloud Intelligence CLI, but the sky is the limit. Imagine combining these steps into an onboarding script!</p><p>Maybe that’s a great idea for another blog. In the meantime, get the <a href="https://help.doit.com/docs/cli">DoiT Cloud Intelligence CLI installed</a> and do not hesitate to reach out to us for any support.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d698fa7737e3" width="1" height="1" alt=""><hr><p><a href="https://engineering.doit.com/streamline-cloud-management-via-doit-cli-d698fa7737e3">Streamline Cloud Management via DoiT CLI</a> was originally published in <a href="https://engineering.doit.com">DoiT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>