Stories by Chris St. John on Medium

AI Chat Coding Essentials — Adding Tools (AI Agent Coding Series #3)

Chris St. John — Sun, 25 Jan 2026 22:41:34 GMT

A single-turn example of adding tools and API calls to your AI chat requests, covering the complete tool-calling flow.

So far, we have done a quick review of how the OpenAI Chat Completions API works.

We started there to make sure the fundamentals were in place before building on them.

Now we start getting into the fun stuff that leads to understanding AI Agents.

Function calling (also called “tools”) is how LLMs interact with the real world.

Function calling is a mechanism where the LLM can request to execute external functions by returning structured data (function name + arguments) instead of just text, enabling interaction with APIs, databases, and external systems.

We are still using the OpenAI Chat Completions SDK, but the examples are getting more advanced.

We’re following this path to get a better understanding of how AI agents work. The same principles apply to other, more sophisticated agent SDKs; this series uses the widely adopted OpenAI API.

The principles are the same.

Function calling is the foundation of all AI agents.
This tutorial helps you understand the request/response cycle, which gets more complex as we build toward AI agents.
It reviews and demonstrates practical async API usage.
By the end, you will have a couple of basic examples to work from.

Let’s look at the steps for getting this working.

Project setup.
Tool schema definition.
Adding multiple tools.
Sending tools and parsing response.
Response structure when tool is called.
Parsing tool calls (OpenAI SDK v6.x).
Executing functions with error handling.
Sending tool results back — single-turn tool loop.
Single-turn template — simple example (mock data).
Single-turn template — actual API call example.

🛠️ Get more tips like this at https://www.systemsarchitect.io and follow new articles in the series here. 🚀Also follow the SystemsArchitect X account: https://x.com/systemsarch — we follow back!

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

If it feels a bit slow or tedious to learn this, a few KEY points:

We are building on past lessons, which makes this easier to learn and will make your understanding more complete. For example, you can copy-paste even the setup below from the last lesson.
I developed this code in an afternoon, so it should be even quicker for you to copy-paste what I have provided here.
It’s all logical once you understand the steps and tool calling loop.
We build on this for future lessons.

Are we using the advanced code class version from Article #2?

No. We are going to circle back to that one in a couple articles, but right now I want us to get the function calling for the tools down.

Previous articles in the series:

AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)

AI Chat Coding Essentials — Advanced version (AI Agent Coding Series #2)

Single-turn vs. Multi-turn in Agentic AI

To be clear on the terminology, here is what single-turn and multi-turn mean:

Single-turn: The entire tool resolution happens within one user message. The program/agent handles the back-and-forth internally (LLM → tool → LLM) and returns the final answer without asking the user anything else.
Multi-turn: The conversation requires additional user messages at some point during tool use. For example, when a user is booking a hotel, the agent returns the hotel options, asks the user which one they want, and then continues.

This tutorial is for Single-turn AI agent workflows:

Basic Tool Request/Response Flow

User message →
LLM →
Tool calls →
Execute →
Tool result →
LLM →
Final answer

1. Project setup

To save time and space here I am not going too deep into the initial setup — I covered that here: AI Chat Coding Essentials Advanced version (AI Agent Coding Series #2)

I copied and pasted my package.json and tsconfig.json from the last project we did (url above). That works!

But if you do not have that, you need to have these dependencies in package.json:

{
  "dependencies": {
    "dotenv": "^17.2.3",
    "openai": "^6.15.0"
  },
  "devDependencies": {
    "@types/node": "^25.0.3",
    "ts-node": "^10.9.2",
    "typescript": "^5.9.3"
  }
}

and a tsconfig.json to configure ts-node for CommonJS modules (this can vary but usually has this):

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "CommonJS",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "outDir": "./dist"
  },
  "ts-node": {
    "transpileOnly": true
  }
}

Store your API key in a .env file at your project root:

OPENROUTER_API_KEY=your_api_key_here

Here is the start of your script. You can call it single-turn-tool.ts or something similar.

Single-turn here means the code runs the tool-calling loop once for a single user message (LLM → tool → LLM), rather than repeating it.

import dotenv from "dotenv";
import path from "path";

// Load .env from project root (adjust path as needed)
dotenv.config({ path: path.resolve(__dirname, "../../.env") });

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Use model: "openai/gpt-4o" for OpenAI models via OpenRouter

I am using OpenRouter, as I mentioned in previous articles. Check the first one in the series if you are not sure what this means.

That is why we have baseURL: "https://openrouter.ai/api/v1".

If you want to call OpenAI directly use their API key (from their API portal/site) and remove the baseURL.
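To make the two setups concrete, here is a minimal sketch of picking client options per provider. The Provider type, the buildClientOptions and modelFor helpers, and the OPENAI_API_KEY variable name are my own assumptions for illustration; only the OpenRouter base URL and OPENROUTER_API_KEY come from the setup above.

```typescript
// Hypothetical helper: builds the options object you would pass to `new OpenAI(...)`.
type Provider = "openrouter" | "openai";

interface ClientOptions {
  apiKey: string;
  baseURL?: string; // left out for OpenAI so the SDK uses its default endpoint
}

function buildClientOptions(
  provider: Provider,
  env: Record<string, string | undefined>
): ClientOptions {
  // OPENAI_API_KEY is an assumed name; use whatever your .env file defines.
  const keyName = provider === "openrouter" ? "OPENROUTER_API_KEY" : "OPENAI_API_KEY";
  const apiKey = env[keyName];
  if (!apiKey) {
    throw new Error(`Missing ${keyName} in environment`);
  }
  return provider === "openrouter"
    ? { apiKey, baseURL: "https://openrouter.ai/api/v1" }
    : { apiKey };
}

// OpenRouter prefixes model IDs with the vendor; OpenAI's own API does not.
function modelFor(provider: Provider): string {
  return provider === "openrouter" ? "openai/gpt-4o" : "gpt-4o";
}
```

In a real script you would call it as new OpenAI(buildClientOptions("openai", process.env)) and pass modelFor("openai") as the model.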

2. Tool schema definition

Tools are the bridge between natural language and executable code.

When you define a tool schema, you’re telling the LLM: “Here’s a function you can request, here’s what it does, and here’s exactly what arguments it needs.”

The LLM reads your schema to understand:

What the tool does (from the description).
What inputs it needs (from the parameters). All parameters are optional by default; list a parameter's name in the required array to make it mandatory.
What values are valid (from types, enums, and required fields). Use enum to restrict a parameter to a fixed set of allowed values.

The parameter names location and unit are just examples. Choose names that match your own API.

// A tool is defined with: name, description, parameters (JSON Schema)
const tools: OpenAI.Chat.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. 'San Francisco, CA'"
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "Temperature unit"
          }
        },
        required: ["location"]
      }
    }
  }
];
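To see what required and enum mean in practice, here is a small sketch that checks parsed arguments against a trimmed copy of the schema above. The ParamSpec and ParamSchema types and the validateArgs helper are hypothetical names of mine, and the check covers only those two rules, not full JSON Schema validation.

```typescript
// Minimal, hypothetical validator covering only `required` and `enum`.
interface ParamSpec {
  type: string;
  enum?: string[];
}

interface ParamSchema {
  properties: Record<string, ParamSpec>;
  required?: string[];
}

// Trimmed copy of the get_weather parameters from the schema above
const weatherParams: ParamSchema = {
  properties: {
    location: { type: "string" },
    unit: { type: "string", enum: ["celsius", "fahrenheit"] },
  },
  required: ["location"],
};

function validateArgs(schema: ParamSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  // Every name listed in `required` must be present
  for (const name of schema.required ?? []) {
    if (args[name] === undefined) {
      errors.push(`Missing required parameter: ${name}`);
    }
  }
  // Any supplied value whose spec has an `enum` must be one of the listed values
  for (const [name, value] of Object.entries(args)) {
    const spec = schema.properties[name];
    if (spec?.enum && !spec.enum.includes(String(value))) {
      errors.push(`Invalid value for ${name}: ${String(value)}`);
    }
  }
  return errors;
}

console.log(validateArgs(weatherParams, { location: "Tokyo" })); // []
console.log(validateArgs(weatherParams, { unit: "kelvin" }));
// [ 'Missing required parameter: location', 'Invalid value for unit: kelvin' ]
```

The model usually follows the schema, but a check like this is a cheap safeguard before you pass arguments to real code.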

3. Adding multiple tools (optional)

You can also add multiple tools that can be called by the LLM.

Here is an example (up to you if you want to add this, it’s optional):

const tools: OpenAI.Chat.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" }
        },
        required: ["location"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "search_database",
      description: "Search for records in the database",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "Search query" },
          limit: { type: "number", description: "Max results to return" }
        },
        required: ["query"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "send_email",
      description: "Send an email to a recipient",
      parameters: {
        type: "object",
        properties: {
          to: { type: "string", description: "Recipient email" },
          subject: { type: "string", description: "Email subject" },
          body: { type: "string", description: "Email body" }
        },
        required: ["to", "subject", "body"]
      }
    }
  }
];
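The wrapper fields repeat for every tool, so one option is a small factory. This is only a sketch: FunctionTool is a local stand-in I wrote for the function-tool shape used above (not the SDK's own type), and defineTool is a hypothetical helper name.

```typescript
// Local stand-in for the function-tool shape used in the arrays above.
interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
}

// Hypothetical helper that fills in the repeated wrapper fields.
function defineTool(
  name: string,
  description: string,
  properties: Record<string, { type: string; description: string }>,
  required: string[]
): FunctionTool {
  // Catch typos early: every required name must exist in properties
  for (const key of required) {
    if (!(key in properties)) {
      throw new Error(`Required parameter "${key}" is not defined on ${name}`);
    }
  }
  return {
    type: "function",
    function: {
      name,
      description,
      parameters: { type: "object", properties, required },
    },
  };
}

const toolList: FunctionTool[] = [
  defineTool(
    "get_weather",
    "Get current weather for a location",
    { location: { type: "string", description: "City name" } },
    ["location"]
  ),
  defineTool(
    "search_database",
    "Search for records in the database",
    {
      query: { type: "string", description: "Search query" },
      limit: { type: "number", description: "Max results to return" },
    },
    ["query"]
  ),
];

console.log(toolList.map((tool) => tool.function.name));
// [ 'get_weather', 'search_database' ]
```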

4. Sending tools and parsing response

Once you’ve defined your tools, the next step is sending them to the API and understanding what comes back.

This is where most developers trip up — the response structure has several gotchas that can cause runtime errors if you’re not prepared.

⚠️ Gotcha: This is where a lot of errors happen initially. ⚠️

When the model decides to call a tool, it typically DOES NOT return text.
Instead, it returns a structured tool_calls array containing the function name and arguments as a JSON string (NOT an object).
You must parse this string and handle the response correctly.

Here is what the request looks like:

import dotenv from "dotenv";
import path from "path";

dotenv.config({ path: path.resolve(__dirname, "../../.env") });

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

async function callWithTools() {
  const tools: OpenAI.Chat.ChatCompletionTool[] = [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" }
          },
          required: ["location"]
        }
      }
    }
  ];

  const response = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" }
    ],
    tools,
    tool_choice: "auto"  // "auto" | "none" | "required" | { type: "function", function: { name: "..." } }
  });

  console.log(response.choices[0].message);
  console.log("Finish reason:", response.choices[0].finish_reason);
}

5. Response structure when tool is called

This is the structure of the first response when a tool is called:

// When the model decides to call a tool, you get:
{
  role: "assistant",
  content: null,  // Often null when calling tools
  tool_calls: [
    {
      id: "call_abc123",           // Unique ID - you need this later
      type: "function",
      function: {
        name: "get_weather",       // Which function to call
        arguments: "{\"location\": \"Tokyo\"}"  // JSON string - must parse!
      }
    }
  ]
}
// finish_reason will be "tool_calls"
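Here is a minimal sketch of reading that shape without touching the network. The ToolCall and AssistantMessage interfaces are local stand-ins trimmed to the fields shown above, and readMessage and Outcome are hypothetical names of mine, so treat this as an illustration rather than the SDK's own types.

```typescript
// Local stand-ins for the message shape shown above (function tool calls only).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

type Outcome =
  | { kind: "tools"; calls: { id: string; name: string; args: unknown }[] }
  | { kind: "text"; text: string };

function readMessage(message: AssistantMessage): Outcome {
  if (message.tool_calls && message.tool_calls.length > 0) {
    return {
      kind: "tools",
      calls: message.tool_calls.map((call) => ({
        id: call.id,
        name: call.function.name,
        args: JSON.parse(call.function.arguments), // JSON string -> object
      })),
    };
  }
  // No tool calls: fall back to the text content (may be null)
  return { kind: "text", text: message.content ?? "" };
}

// Same data as the example response above
const sample: AssistantMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    {
      id: "call_abc123",
      type: "function",
      function: { name: "get_weather", arguments: '{"location": "Tokyo"}' },
    },
  ],
};

console.log(readMessage(sample).kind); // "tools"
```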

6. Parsing tool calls (OpenAI SDK v6.x)

This is how you parse the tool calls. It is worth learning carefully, because many errors come from this part of the process.

Important: In OpenAI SDK v6.x, tool_calls is a union type that includes both function-type and custom-type tool calls.

To parse: const args = JSON.parse(toolCall.function.arguments)
finish_reason should be "tool_calls" when the model requests a tool. If it is "stop", the model answered with plain text and requested no tools.
You must add a type guard before accessing the function property, because the SDK v6.x union type also includes custom tool calls. Check toolCall.type first so you know toolCall.function is available.

async function parseToolCalls() {
  const response = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
    tools
  });

  const message = response.choices[0].message;

  // Check if model wants to call tools
  if (message.tool_calls) {
    for (const toolCall of message.tool_calls) {
      // Type guard required for OpenAI SDK v6.x
      if (toolCall.type !== 'function') continue;

      const functionName = toolCall.function.name;
      const args = JSON.parse(toolCall.function.arguments);  // IMPORTANT: Parse JSON string
      const callId = toolCall.id;

      console.log(`Tool: ${functionName}`);
      console.log(`Args:`, args);
      console.log(`ID: ${callId}`);
    }
  } else {
    // Model responded with text, no tool call
    console.log("Response:", message.content);
  }
}

When would you use tool_choice: "required" vs "auto"?

Use "required" when you know the user's request needs a tool (e.g., "get weather" must call weather API). Use "auto" for general queries where the model should decide.
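As a rough illustration of that choice, here is a sketch that picks a tool_choice value per request. The ToolChoice type is a local stand-in for the four forms listed in the comment of the earlier request, and the keyword check in pickToolChoice is a deliberately naive, hypothetical routing rule, not a recommendation.

```typescript
// Local stand-in for the values accepted by `tool_choice` in the request shown earlier.
type ToolChoice =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } };

// Hypothetical rule: force the weather tool when the prompt clearly asks about weather,
// otherwise let the model decide.
function pickToolChoice(userMessage: string): ToolChoice {
  if (/\bweather\b/i.test(userMessage)) {
    return { type: "function", function: { name: "get_weather" } };
  }
  return "auto";
}

console.log(pickToolChoice("What's the weather in Tokyo?"));
// { type: 'function', function: { name: 'get_weather' } }
console.log(pickToolChoice("Tell me a joke")); // "auto"
```

The object form forces one specific function, while "required" only forces the model to call some tool from the list.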

7. Executing functions with error handling

The LLM doesn’t execute code — it only requests that you do. When you receive a tool_calls response, your code is responsible for:

Mapping the function name to an actual implementation.
Passing the parsed arguments to that function.
Returning a stringified result.

The key point in this section is the function registry pattern: a clean way to organize your tool implementations and dispatch calls dynamically.

// Define your actual functions
function getWeather(location: string): string {
  // In real code, call a weather API
  // This is to simulate the response
  return JSON.stringify({
    location,
    temperature: 72,
    unit: "fahrenheit",
    conditions: "sunny"
  });
}

function searchDatabase(query: string, limit: number = 10): string {
  // Simulate database search
  return JSON.stringify({
    query,
    results: [
      { id: 1, title: "Result 1" },
      { id: 2, title: "Result 2" }
    ],
    total: 2
  });
}

// Function registry - map names to implementations
const functionRegistry: Record<string, (args: any) => string> = {
  get_weather: (args: { location: string }) => getWeather(args.location),
  search_database: (args: { query: string; limit?: number }) =>
    searchDatabase(args.query, args.limit)
};

// Execute a tool call
function executeTool(name: string, args: Record<string, unknown>): string {
  const fn = functionRegistry[name];
  if (!fn) {
    return JSON.stringify({ error: `Unknown function: ${name}` });
  }
  return fn(args);
}

To add error handling, use the version below in place of executeTool above:

function safeExecuteTool(name: string, argsJson: string): string {
  try {
    const args = JSON.parse(argsJson);
    const fn = functionRegistry[name];

    if (!fn) {
      return JSON.stringify({ error: `Unknown function: ${name}` });
    }

    const result = fn(args);
    return typeof result === "string" ? result : JSON.stringify(result);
  } catch (error) {
    return JSON.stringify({
      error: `Execution failed: ${error instanceof Error ? error.message : "Unknown error"}`
    });
  }
}

8. Sending tool results back — single-turn tool loop

This is where everything comes together. After executing your functions, you need to send the results back to the LLM so it can generate a final response.

⚠️ Required: each tool result must be linked to its original request via the tool_call_id.

Flow:

User message → LLM → tool_calls → Execute → Tool result → LLM → Final answer
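The linking step can be sketched on its own, before the full template. ToolCall and ToolMessage are local stand-ins for the shapes used in this article, and toToolMessages plus the fake runner are hypothetical names of mine.

```typescript
// Local stand-ins for the shapes used in this article (function tool calls only).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Turn each tool call into a tool message that carries the same id.
function toToolMessages(
  toolCalls: ToolCall[],
  run: (name: string, argsJson: string) => string
): ToolMessage[] {
  return toolCalls.map((call): ToolMessage => ({
    role: "tool",
    tool_call_id: call.id, // must match the id from the assistant's request
    content: run(call.function.name, call.function.arguments),
  }));
}

// Fake runner for the demo: echoes what it was asked to do
const fakeRun = (name: string, argsJson: string): string =>
  JSON.stringify({ tool: name, received: JSON.parse(argsJson) });

const demoCalls: ToolCall[] = [
  {
    id: "call_abc123",
    type: "function",
    function: { name: "get_weather", arguments: '{"location": "Tokyo"}' },
  },
];

console.log(toToolMessages(demoCalls, fakeRun));
```

Each returned message is pushed onto the messages array after the assistant message that requested it.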

9. Single-turn tool template — example (mock data).

You can run this code below to try the process first with mock data.

Create single-turn-tool.ts


import dotenv from "dotenv";
import path from "path";

dotenv.config({ path: path.resolve(__dirname, "../../.env") });

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// 1. Define one tool
const tools: OpenAI.Chat.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" }
        },
        required: ["city"]
      }
    }
  }
];

// 2. Implement the function
function getWeather(city: string): string {
  // Simulated - replace with real API in production
  const data = { city, temp: 72, conditions: "sunny" };
  return JSON.stringify(data);
}

// 3. The complete single-turn loop
async function chat(userMessage: string): Promise<string> {
  // Build initial messages
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: userMessage }
  ];

  // First LLM call
  const response = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages,
    tools
  });

  const assistantMessage = response.choices[0].message;

  // Check if tools were called
  if (!assistantMessage.tool_calls) {
    // No tools - return direct response
    return assistantMessage.content || "";
  }

  // Tools were called - execute them
  messages.push(assistantMessage); // Add assistant's tool request

  for (const toolCall of assistantMessage.tool_calls) {
    // Type guard required for OpenAI SDK v6.x
    if (toolCall.type !== 'function') continue;

    const args = JSON.parse(toolCall.function.arguments);
    const result = getWeather(args.city); // Execute

    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: result
    });
  }

  // Second LLM call with tool results
  const finalResponse = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages,
    tools
  });

  return finalResponse.choices[0].message.content || "";
}

// Test it
async function main() {
  console.log(await chat("What's the weather in Paris?"));
}
main().catch(console.error);

Reminders:

Type guard for SDK v6.x — Check toolCall.type !== 'function' before accessing toolCall.function
tool_call_id is required — Must match the id from the tool_call
Arguments are JSON string — Always JSON.parse(toolCall.function.arguments)
Tool results are strings — Always stringify your function output
Message order matters — assistant (with tool_calls) -> tool responses -> next call
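The last two reminders can be checked mechanically. Below is a small sketch that scans a message list and reports tool results with no earlier assistant request. Msg is a simplified local type of mine, not the SDK's ChatCompletionMessageParam, and findOrphanToolResults is a hypothetical helper.

```typescript
// Simplified local message type; the real SDK type has more fields.
type Msg =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: { id: string }[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Returns the tool_call_ids of tool messages that no earlier assistant message requested.
function findOrphanToolResults(messages: Msg[]): string[] {
  const requested = new Set<string>();
  const orphans: string[] = [];
  for (const msg of messages) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) {
        requested.add(call.id);
      }
    } else if (msg.role === "tool" && !requested.has(msg.tool_call_id)) {
      orphans.push(msg.tool_call_id);
    }
  }
  return orphans;
}

const goodOrder: Msg[] = [
  { role: "user", content: "What's the weather in Paris?" },
  { role: "assistant", content: null, tool_calls: [{ id: "call_abc123" }] },
  { role: "tool", tool_call_id: "call_abc123", content: '{"temp":72}' },
];

// Forgot to push the assistant message before the tool result
const badOrder: Msg[] = [
  { role: "user", content: "What's the weather in Paris?" },
  { role: "tool", tool_call_id: "call_abc123", content: '{"temp":72}' },
];

console.log(findOrphanToolResults(goodOrder)); // []
console.log(findOrphanToolResults(badOrder));  // [ 'call_abc123' ]
```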

10. Single-turn template — actual API call example.

Here we expand that example to call a real API.

We use two free API endpoints, both from Open-Meteo. You should not have to sign up; no API key is required.

api.open-meteo.com — for weather info.
geocoding-api.open-meteo.com — geocoding city to coordinates.

Summary of steps in the code:

Setup: Import dotenv, OpenAI SDK, validate API key exists
Define tools array: Each tool has type: "function", name, description, parameters (JSON Schema)
Implement tool functions: Async functions that return objects (real APIs) or sync for local operations
Create function registry: Record mapping names to implementations
Build executeTool(): Parse args JSON, lookup function, await result, stringify output, handle errors
Initialize messages: System prompt + user message in ChatCompletionMessageParam[]
First LLM call: client.chat.completions.create() with messages + tools
Check for tool_calls: If none, return message.content directly
Execute tools: Push assistant message, loop through tool_calls with type guard, await each, push tool results
Second LLM call: Same endpoint with updated messages, return final message.content

import dotenv from "dotenv";
import path from "path";

dotenv.config({ path: path.resolve(__dirname, "../../.env") });

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const tools: OpenAI.Chat.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, e.g. 'Paris' or 'New York'" }
        },
        required: ["city"]
      }
    }
  }
];

// REAL API implementation (Open-Meteo - free, no API key required)
async function getWeather(city: string): Promise<string> {
  try {
    // Step 1: Geocode city name to coordinates
    const geoUrl = `https://geocoding-api.open-meteo.com/v1/search?name=${encodeURIComponent(city)}&count=1`;
    const geoResponse = await fetch(geoUrl);
    const geoData = await geoResponse.json();

    if (!geoData.results || geoData.results.length === 0) {
      return JSON.stringify({ error: `City not found: ${city}` });
    }

    const { latitude, longitude, name, country } = geoData.results[0];

    // Step 2: Get current weather
    const weatherUrl = `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&longitude=${longitude}&current=temperature_2m,relative_humidity_2m,weather_code,wind_speed_10m&temperature_unit=fahrenheit&wind_speed_unit=mph`;
    const weatherResponse = await fetch(weatherUrl);
    const weatherData = await weatherResponse.json();

    const current = weatherData.current;

    // Weather code mapping (simplified)
    const weatherCodes: Record<number, string> = {
      0: "Clear sky",
      1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast",
      45: "Foggy", 48: "Depositing rime fog",
      51: "Light drizzle", 53: "Moderate drizzle", 55: "Dense drizzle",
      61: "Slight rain", 63: "Moderate rain", 65: "Heavy rain",
      71: "Slight snow", 73: "Moderate snow", 75: "Heavy snow",
      95: "Thunderstorm"
    };

    return JSON.stringify({
      city: name,
      country,
      temperature: current.temperature_2m,
      unit: "fahrenheit",
      humidity: current.relative_humidity_2m,
      conditions: weatherCodes[current.weather_code] || "Unknown",
      wind_speed_mph: current.wind_speed_10m
    });
  } catch (error) {
    return JSON.stringify({
      error: `Failed to fetch weather: ${error instanceof Error ? error.message : "Unknown error"}`
    });
  }
}

// The tool loop - now with async tool execution
async function chat(userMessage: string): Promise<string> {
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: "system", content: "You are a helpful weather assistant." },
    { role: "user", content: userMessage }
  ];

  const response = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages,
    tools
  });

  const assistantMessage = response.choices[0].message;

  if (!assistantMessage.tool_calls) {
    return assistantMessage.content || "";
  }

  messages.push(assistantMessage);

  for (const toolCall of assistantMessage.tool_calls) {
    if (toolCall.type !== 'function') continue;

    const args = JSON.parse(toolCall.function.arguments);
    const result = await getWeather(args.city);  // await for async tool!
    console.log(`[Tool] get_weather("${args.city}") =>`, result);

    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: result
    });
  }

  const finalResponse = await client.chat.completions.create({
    model: "openai/gpt-4o",
    messages,
    tools
  });

  return finalResponse.choices[0].message.content || "";
}

// Test with real data
async function main() {
  console.log(await chat("What's the weather in Paris?"));
}
main().catch(console.error);

Loads environment variables from a .env file located two directories up.
Creates an OpenAI client pointed at OpenRouter (not official OpenAI endpoint).
Defines one tool → get_weather function with city parameter.
Implements the real get_weather function using Open-Meteo free API.
First fetches latitude/longitude of the city using geocoding API.
Then requests current weather (temp °F, humidity, wind, weather code).
Converts WMO weather code to human-readable condition string.
Returns weather info as JSON string (or error message).
In chat(): sends user message → model may call tool → executes real getWeather → feeds result back → gets final answer.
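One optional variation: build the forecast URL with URLSearchParams instead of a long template string, so each query parameter sits on its own line and is easy to check. buildForecastUrl is a hypothetical helper of mine. The wind_speed_unit parameter is my addition, on the understanding that Open-Meteo reports wind in km/h unless a unit is requested, so that the wind_speed_mph label is accurate.

```typescript
// Hypothetical helper: assembles the Open-Meteo forecast URL from named parameters.
function buildForecastUrl(latitude: number, longitude: number): string {
  const params = new URLSearchParams({
    latitude: String(latitude),
    longitude: String(longitude),
    current: "temperature_2m,relative_humidity_2m,weather_code,wind_speed_10m",
    temperature_unit: "fahrenheit",
    wind_speed_unit: "mph", // added so the wind value matches the "mph" label
  });
  // URLSearchParams percent-encodes the commas; standard query parsing decodes them
  return `https://api.open-meteo.com/v1/forecast?${params.toString()}`;
}

console.log(buildForecastUrl(48.85, 2.35));
```

Inside getWeather you would then call fetch(buildForecastUrl(latitude, longitude)).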

🚀 Looking forward!

That was awesome. We covered a lot with this one, and it’s a great base case for single-turn AI function calling.

In future articles, we’ll explore more advanced enhancements:

Parallel tool calls — Handle multiple tool_calls in a single LLM response
Multi-turn agent loop — Loop until tool_calls is empty (not fixed 2-call pattern)
Agentic loop pattern — while(true) → call LLM → execute tools → repeat
Safety guardrails — maxIterations, maxToolCalls, and token limits
Parallel async execution — Use Promise.all() to execute tools concurrently
Tool result batching — Collect ALL tool results before next API call
Error handling — Return errors as JSON, let LLM decide recovery

Previous articles in the series:

AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)

AI Chat Coding Essentials — Advanced version (AI Agent Coding Series #2)

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

🚀 My current project I am working on is SystemsArchitect.io (in Beta testing) which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books and also code solutions and tutorials integrated using multiple AIs and other cloud tools. Check it out: https://systemsarchitect.io

SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Savings: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

I’ve worked 20+ years in software development, both in an enterprise setting such as NIKE and the original MP3.com, as well as startups like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.

My experience ranges from cloud ecommerce, API design/implementation, serverless, multiple AI integration for development, content management, frontend UI/UX architecture and login/authentication. I give tech talks, tutorials and share documentation for architecting software. Also previously held AWS Solutions Architect certification.

AI Chat Coding Essentials — Adding Tools (AI Agent Coding Series #3) was originally published in AI Dev Tips on Medium, where people are continuing the conversation by highlighting and responding to this story.

AI Chat Coding Essentials — Advanced version (AI Agent Coding Series #2)

Chris St. John — Thu, 08 Jan 2026 14:37:12 GMT

An advanced TypeScript option for AI chat completions, using a class-based approach with AsyncGenerator.

I like to share what I learn, as I build — and I build every day.

Many of these articles I write because I have a related project or interest and I am simply documenting it so I can share and remember it!

While writing the last article (#1 in the series), I left out some more advanced TypeScript code patterns. I didn’t want to confuse those getting up to speed initially. And, we wanted to see immediate results.

That article uses functional programming only, which has the advantage of simplicity, speed and less boilerplate for a solution.

Now, let’s look at a more advanced style of coding often used in bigger companies, enterprise-level, where there may be more complexity and collaboration involved. Also, you may get asked about this in an interview.

We can switch between simpler functions and this class-based style.

In this article, our code is class-based and uses dependency injection to make it easy to re-use the same logic with various AI LLM models, even simultaneously for a variety of users. The class is designed to be flexible enough to be used with singleton or factory patterns when needed.

I think it’s a fun and legitimate code example to learn more advanced real-world techniques, especially if you are not familiar with this style of coding.

This code is OPTIONAL, you can still stick with the functional approach of the last article if you want. But it shines as being more extensible and usable in a professional enterprise environment, for modularity, extensibility, collaboration and testing.

This kind of advanced refactoring is what really puts the “hands-on” into our self-description of “hands-on” engineers.

We’re not afraid to dig in deep, get in the code and understand how to take basic code to expert status.

We do an advanced topic walkthrough for this article, to explain what this code means and key benefits. Is it difficult? Yes, it will be a challenge for some people. At the very least we’ll both learn more about other options.

Another thing I want to emphasize here is: you do NOT need to memorize all this. Quite honestly, a big part of the reason I’m even doing this article is so I can remember how/why we would use this modular and more advanced version 😅: the class-based approach with AsyncGenerator. Even if you are using an AI coding assistant, you will now know the advantages, and can then specify a prompt for this type of code or refactor existing code.

What we are covering:

Why a Class-Based Approach?
The ChatClient Architecture (Basics)
Why Use AsyncGenerator for Streaming?
Dependency Injection — needed?
Chat History Management.
Extended Enterprise Version — FINAL File.
Example Usage of the Class — RUN IT!

✅ Goal: I am starting with AI Chat completion APIs so users have a good base of knowledge for more Agent coding. There are also some good no/low-code solutions for agents (we may cover some later), but the idea is to set a solid foundation of understanding for future custom coding. 🚀

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

1. Why a Class-Based Approach?

The simple function-based examples in the previous (#1) article work great for learning and simple application, but real enterprise applications need:

State Management: Track conversation history across multiple calls.
Configuration: Different settings for different use cases (coding assistant vs creative writer).
Encapsulation: Private fields prevent external mutation; methods like getHistory() return safe copies.
Reusability: Create multiple instances with different configurations.
Testability: Easier to mock and test class instances with different dependency injections (for example, configs).
Extensibility: Add features like retry logic, logging, caching without changing the core API.
Token estimates: Keeps token-counting/trimming logic tightly coupled to the history it manages.
Multi-conversation support: Spawn multiple instances with different models/prompts simultaneously.

2. The ChatClient Architecture (Basics)

Below is the initial code we’ll use for our class-based AI chat architecture.

This is an “initial” sketch, so there may be some minor type errors if you run this first code block now.

We’ll create the full completed code near the end of the article.

If you are not familiar with class-based TypeScript — perhaps do a brief review of W3 Schools TS Classes

import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config({ path: '../../.env' });

// Configuration interface with optional properties
// This allows partial configuration - only override what you need
interface ChatConfig {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

// Default configuration - sensible defaults for most use cases
const defaultConfig = {
  model: 'openai/gpt-4-turbo',
  temperature: 0.7,
  maxTokens: 2000,
  systemPrompt: 'You are a helpful assistant.',
};

// Discussed throughout article why to use classes, go with it for now
class ChatClient {
  // Private properties - encapsulation prevents external mutation
  private client: OpenAI;
  private config: Required<ChatConfig>;  // Required<T> makes all properties non-optional
  private conversationHistory: OpenAI.Chat.ChatCompletionMessageParam[];

  constructor(config: ChatConfig = {}) {
    // Initialize the OpenAI client once, reuse for all requests
    this.client = new OpenAI({
      baseURL: 'https://openrouter.ai/api/v1',
      apiKey: process.env.OPENROUTER_API_KEY,
    });

    // Merge user config with defaults using spread operator
    // User values override defaults
    this.config = { ...defaultConfig, ...config };

    // Initialize history with system prompt
    this.conversationHistory = [
      { role: 'system', content: this.config.systemPrompt }
    ];
  }


  getHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {
    return [...this.conversationHistory];  // Return a copy, not the original
  }

  // Reset conversation while keeping configuration
  reset(): void {
    this.conversationHistory = [
      { role: 'system', content: this.config.systemPrompt }
    ];
  }

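  // Trim history to manage token usage (advanced)
  trimHistory(keepLastN: number): void {
    const [systemMessage, ...rest] = this.conversationHistory;
    if (!systemMessage) return;
    // Slice from the non-system messages so the system prompt is never duplicated;
    // slice(-0) would return everything, so treat keepLastN <= 0 as "keep none".
    const recentMessages = keepLastN > 0 ? rest.slice(-keepLastN) : [];
    this.conversationHistory = [systemMessage, ...recentMessages];
  }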

  // Non-streaming chat - simple request/response
  async chat(userMessage: string): Promise<string> {
    // Add user message to history BEFORE the API call
    this.conversationHistory.push({ role: 'user', content: userMessage });

    const response = await this.client.chat.completions.create({
      model: this.config.model,
      messages: this.conversationHistory,
      temperature: this.config.temperature,
      max_tokens: this.config.maxTokens,
    });

    const assistantMessage = response.choices[0].message.content || '';

    // Add assistant response to history AFTER receiving it
    this.conversationHistory.push({ role: 'assistant', content: assistantMessage });

    return assistantMessage;
  }

  // Streaming chat - returns an AsyncGenerator
  async *streamChat(userMessage: string): AsyncGenerator<string, string, unknown> {
    this.conversationHistory.push({ role: 'user', content: userMessage });

    const stream = await this.client.chat.completions.create({
      model: this.config.model,
      messages: this.conversationHistory,
      temperature: this.config.temperature,
      max_tokens: this.config.maxTokens,
      stream: true,
    });

    let fullResponse = '';

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      fullResponse += content;
      yield content;  // Yield each chunk as it arrives
    }

    this.conversationHistory.push({ role: 'assistant', content: fullResponse });
    return fullResponse;  // Return the complete response at the end
  }
}

Key points of this code:

streamChat uses an AsyncGenerator. A generator is a function that can pause execution and resume later.
Private properties: Prevents external code from corrupting internal state.
Required<ChatConfig>: All config values exist after merging with defaults.
History managed internally: Simplifies API — callers don’t need to track messages
Separate chat and streamChat: Clear distinction between sync-like and streaming patterns
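
To make the Required<ChatConfig> point concrete, here is a minimal sketch of the defaults merge on its own, outside the class. The resolveConfig helper name is my own for illustration; it is not part of the ChatClient code above.

```typescript
// Same shape as the article's ChatConfig: every field is optional for callers.
interface ChatConfig {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

// Required<ChatConfig> turns every optional field into a mandatory one.
const defaultConfig: Required<ChatConfig> = {
  model: 'openai/gpt-4-turbo',
  temperature: 0.7,
  maxTokens: 2000,
  systemPrompt: 'You are a helpful assistant.',
};

// Hypothetical helper (not in the article's class): defaults first, overrides second.
function resolveConfig(overrides: ChatConfig = {}): Required<ChatConfig> {
  return { ...defaultConfig, ...overrides };
}

const coding = resolveConfig({ temperature: 0.2 });
console.log(coding.temperature); // 0.2 (overridden)
console.log(coding.maxTokens);   // 2000 (from defaults)
```

One caveat worth knowing: the spread copies keys that are explicitly set to undefined, so passing { temperature: undefined } would overwrite the default at runtime even though the type says the value is always a number.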

3. Why Use AsyncGenerator for Streaming?

The AsyncGenerator is different from what you may be used to. Strictly speaking, it’s not required for an AI chat, but it does present some benefits in an enterprise setting.

Memory efficient: Process chunks as they arrive; the consumer doesn’t have to wait for, or buffer, the entire response.
Real-time: UI can show text as it’s generated.
Backpressure control: Consumer controls the pace.
Composable: Can transform, filter, or combine streams easily.

How it works:

async *streamChat(userMessage: string): AsyncGenerator<string, string, unknown> {
  // AsyncGenerator<YieldType, ReturnType, NextType>
  //   YieldType: string - each chunk we yield
  //   ReturnType: string - the final return value
  //   NextType: unknown - can be used to pass values to the generator

// ....continue with any other setup code here

Let’s look at the parts of this:

async *streamChat(userMessage: string): AsyncGenerator<string, string, unknown>

async: can use await, returns promises
*streamChat: the function name, marked with * as a generator, which means it can yield multiple values.
AsyncGenerator<string, string, unknown>: the return type, with its three type parameters YieldType, ReturnType and NextType.

The actual streaming part of the code is:

for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    fullResponse += content;

    yield content;  // <-- Pause here, send chunk to consumer
                    //     Resume when consumer asks for next chunk
  }
  return fullResponse;  // <-- Final return value (arrives on the last .next() result, where done is true)
}

The loop above is the producer side of the generator. The most common way to consume it is a for await...of loop, as the example usage later in this article does.

A different way of consuming it is to call .next() manually:

// Method 2: Manual iteration (more control)
const generator = assistant.streamChat('Tell me a story');
let result = await generator.next();
while (!result.done) {
  console.log('Chunk:', result.value);
  result = await generator.next();
}
console.log('Final:', result.value);  // The return value
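
Here is a small side-by-side sketch of the two consumption styles, using a fake generator so it runs without an API key. fakeStreamChat is a hypothetical stand-in for assistant.streamChat(). The detail to notice is that for await...of discards the generator's return value, while manual .next() exposes it on the final result.

```typescript
// Hypothetical stand-in for assistant.streamChat(): yields chunks, returns the full text.
async function* fakeStreamChat(): AsyncGenerator<string, string, unknown> {
  let full = '';
  for (const chunk of ['Once ', 'upon ', 'a time']) {
    full += chunk;
    yield chunk;
  }
  return full;
}

// Method 1: for await...of. Simple, but the generator's return value is discarded.
async function consumeWithForAwait(): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of fakeStreamChat()) {
    chunks.push(chunk);
  }
  return chunks;
}

// Method 2: manual .next(). The last result (done === true) carries the return value.
async function consumeWithNext(): Promise<{ chunks: string[]; final: string }> {
  const generator = fakeStreamChat();
  const chunks: string[] = [];
  let result = await generator.next();
  while (!result.done) {
    chunks.push(result.value);
    result = await generator.next();
  }
  return { chunks, final: result.value };
}

consumeWithNext().then(({ chunks, final }) => {
  console.log(chunks.length, 'chunks; final:', final); // 3 chunks; final: Once upon a time
});
```

If you only need the text as it streams, Method 1 is enough. Reach for Method 2 when you also want the value the generator returns at the end.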

4. Dependency Injection — needed?

A top advantage of dependency injection is testability in more of an enterprise setting. We can easily inject various configs into the code, and it’s kept very neat and modular.

Again, to emphasize, like with this whole article, you can still do AI Chat without dependency injection, it just will be less modular and less enterprise-ready. The advanced version may help in a job interview, or to take the lead on a project.

See here that we inject config from outside the class through the constructor, and it is merged over defaultConfig. This makes it more modular and testable.

// Configuration injected, testable, customizable

const defaultConfig = {
  model: 'openai/gpt-4-turbo',
  temperature: 0.7,
  maxTokens: 2000,
  systemPrompt: 'You are a helpful assistant.',
};

class ChatClient {
  constructor(config: ChatConfig = {}) {
    this.config = { ...defaultConfig, ...config };
    // ...
  }
}
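
To show why injection helps with testing, here is a sketch that takes the idea one step further than the article's class: the client itself is passed in, so a test can swap in a fake. This is an assumption-heavy illustration. CompletionClient, InjectableChatClient and fakeClient are names I invented, and the interface is deliberately much smaller than the real OpenAI SDK.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical minimal interface: the only thing the chat class needs from a client.
interface CompletionClient {
  complete(messages: Message[]): Promise<string>;
}

class InjectableChatClient {
  private client: CompletionClient;
  private history: Message[];

  // Both the client and the system prompt come from outside, so tests can swap them.
  constructor(client: CompletionClient, systemPrompt: string = 'You are a helpful assistant.') {
    this.client = client;
    this.history = [{ role: 'system', content: systemPrompt }];
  }

  async chat(userMessage: string): Promise<string> {
    this.history.push({ role: 'user', content: userMessage });
    const reply = await this.client.complete(this.history);
    this.history.push({ role: 'assistant', content: reply });
    return reply;
  }

  getHistory(): Message[] {
    return [...this.history]; // copy, not the original
  }
}

// A fake client for tests: no network, deterministic output.
const fakeClient: CompletionClient = {
  async complete(messages) {
    const last = messages[messages.length - 1];
    return `echo: ${last ? last.content : ''}`;
  },
};

const demoClient = new InjectableChatClient(fakeClient, 'You are a test double.');
demoClient.chat('ping').then(reply => console.log(reply)); // echo: ping
```

In a real project, a thin adapter around the OpenAI client would implement the same interface, and the class would not need to know which one it was given.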

5. Chat History Management

The ChatClient maintains conversation history automatically. This enables multi-turn conversations where the AI remembers context.

If you look in the code you will see for both non-streaming and streaming we add to the conversationHistory array:

this.conversationHistory.push({ role: 'user', content: userMessage });

Real world scenario:

const assistant = new ChatClient();

await assistant.chat('My name is Chris');
// History: [system, user: "My name is Chris", assistant: "Nice to meet you, Chris!"]

await assistant.chat('What is my name?');
// History: [system, user: "My name is Chris", assistant: "Nice to meet you...",
//           user: "What is my name?", assistant: "Your name is Chris."]

This method will give you the history:

getHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {
    return [...this.conversationHistory];  // Return a copy, not the original
  }

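I also added a trimHistory method to the script:

  // Trim history to manage token usage (advanced)
  trimHistory(keepLastN: number): void {
    const [systemMessage, ...rest] = this.conversationHistory;
    if (!systemMessage) return;
    // Slice from the non-system messages so the system prompt is never duplicated;
    // slice(-0) would return everything, so treat keepLastN <= 0 as "keep none".
    const recentMessages = keepLastN > 0 ? rest.slice(-keepLastN) : [];
    this.conversationHistory = [systemMessage, ...recentMessages];
  }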
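
If you want to experiment with trimming outside the class, here is a standalone sketch as a pure function. trimMessages is a hypothetical helper, not part of the class. It guards two edge cases that are easy to miss: slice(-0) returns the whole array, and asking for more messages than exist must not pull the system message in twice.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical pure helper: keeps the first (system) message plus the last N others.
function trimMessages(history: Message[], keepLastN: number): Message[] {
  const [systemMessage, ...rest] = history;
  if (!systemMessage) return [];
  // slice(-0) would return the whole array, so handle N <= 0 explicitly.
  const recent = keepLastN > 0 ? rest.slice(-keepLastN) : [];
  return [systemMessage, ...recent];
}

const history: Message[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'My name is Chris' },
  { role: 'assistant', content: 'Nice to meet you, Chris!' },
  { role: 'user', content: 'What is my name?' },
  { role: 'assistant', content: 'Your name is Chris.' },
];

// Keeps the system prompt plus the last question and answer.
console.log(trimMessages(history, 2).map(m => m.content));
```

Note that trimming by message count is only a rough proxy for trimming by tokens. One long message can still use more context than ten short ones.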

What are some other related methods we could add?

You can add a lot of other methods. I do not write these from memory; I have some boilerplate code I developed, and you can also ask a coding assistant tool which other methods would be appropriate.

Examples:

// Get history without the system prompt (most common use case for display)
  getUserAssistantHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {
    return this.conversationHistory.slice(1); // skip system message
  }

  // Get only the last N messages (useful for UI "show more" or token limiting)
  getRecentHistory(count: number): OpenAI.Chat.ChatCompletionMessageParam[] {
    const start = Math.max(1, this.conversationHistory.length - count);
    return this.conversationHistory.slice(start);
  }

// Add external / previous messages (very useful for resuming conversations)
  loadHistory(messages: OpenAI.Chat.ChatCompletionMessageParam[]): void {
    // Optional: validate roles?
    this.conversationHistory = [
      { role: 'system', content: this.config.systemPrompt },
      ...messages
    ];
  }
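
The "validate roles?" comment in loadHistory leaves the question open. One possible answer, sketched here as a standalone function, is to drop any system messages from the saved transcript so the current system prompt stays the only one. prepareLoadedHistory is a hypothetical name, and this is just one reasonable policy, not the only one.

```typescript
type Role = 'system' | 'user' | 'assistant' | 'tool';
type Message = { role: Role; content: string };

// Hypothetical helper: rebuilds a history from saved messages under the current prompt.
function prepareLoadedHistory(systemPrompt: string, messages: Message[]): Message[] {
  // Drop system messages from the saved transcript so ours stays the only one.
  const withoutSystem = messages.filter(m => m.role !== 'system');
  return [{ role: 'system', content: systemPrompt }, ...withoutSystem];
}

const saved: Message[] = [
  { role: 'system', content: 'Old prompt from a previous session.' },
  { role: 'user', content: 'My name is Chris' },
  { role: 'assistant', content: 'Nice to meet you, Chris!' },
];

const restored = prepareLoadedHistory('You are a helpful assistant.', saved);
console.log(restored.map(m => m.role)); // system, user, assistant
```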

6. Extended Enterprise Version — Final File

We can extend this further. Here are some additions you could use:

Retry with backoff: Networks fail, graceful recovery to keep your app running
Token tracking: Budget management, usage monitoring, cost alerts
History trimming: Prevents context overflow, manages memory
Error callbacks: Integrate with logging/monitoring systems (Sentry, DataDog)
Usage callbacks: Real-time cost tracking, quota management
Fork method: A/B testing responses, exploring different conversation paths
Structured results: Access to metadata enables better debugging and monitoring
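
Before it shows up inside the class, the first item in that list (retry with backoff) can be sketched in isolation. This is a simplified standalone version under my own assumptions: backoffDelayMs and this free-function withRetry are illustrative names, and it retries on every error, without any special handling for auth errors.

```typescript
// Delay schedule for a 1-based attempt number: base, 2x base, 4x base, ...
function backoffDelayMs(baseMs: number, attempt: number): number {
  return baseMs * Math.pow(2, attempt - 1);
}

// Hypothetical standalone retry wrapper (simplified: retries on every error).
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  baseDelayMs: number
): Promise<T> {
  let lastError: unknown = new Error('withRetry: maxRetries must be at least 1');
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break; // no delay after the final attempt
      const delay = backoffDelayMs(baseDelayMs, attempt);
      await new Promise<void>(resolve => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: an operation that fails twice, then succeeds on the third attempt.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
};

withRetry(flaky, 3, 10).then(value => console.log(value, 'after', calls, 'calls'));
```

With a base delay of 1000 ms the schedule is 1000, 2000, 4000 ms. Production code often adds random jitter on top, which this sketch leaves out.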

For example let’s say we added some extra variables to the config.

Final file: enterpriseChatClient.ts

Create enterpriseChatClient.ts:

// OpenAI SDK works with OpenRouter by changing the baseURL
import OpenAI from 'openai';
import dotenv from 'dotenv';

// Load environment variables from project root
dotenv.config({ path: '../../.env' });

// Pricing per 1000 tokens - varies by model, update based on your provider's rates
interface ModelPricing {
  promptCostPer1k: number;      // Cost per 1K input tokens
  completionCostPer1k: number;  // Cost per 1K output tokens
}

// Extended configuration with more options
interface ChatConfig {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
  // Enterprise additions
  maxRetries?: number;
  retryDelayMs?: number;
  maxHistoryLength?: number;
  pricing?: ModelPricing;  // Custom pricing for cost estimation
  onError?: ((error: Error) => void) | undefined;
  onTokenUsage?: ((usage: TokenUsage) => void) | undefined;
}

// Token usage tracking - essential for monitoring costs and staying within budget
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimatedCost: number;  // Calculated based on model pricing
}

// Structured response object - provides metadata alongside the AI response
// This is more informative than just returning a string
interface ChatResult {
  content: string;                  // The actual AI response text
  usage: TokenUsage | undefined;    // Token counts and cost (if available)
  model: string;                    // Which model processed the request
  finishReason: string;             // Why generation stopped: 'stop', 'length', etc.
}

// ResolvedConfig has all properties required (non-optional) after merging with defaults.
// This ensures type safety when accessing config values without null checks throughout the class.
interface ResolvedConfig {
  model: string;
  temperature: number;
  maxTokens: number;
  systemPrompt: string;
  maxRetries: number;
  retryDelayMs: number;
  maxHistoryLength: number;
  pricing: ModelPricing;
  onError: ((error: Error) => void) | undefined;
  onTokenUsage: ((usage: TokenUsage) => void) | undefined;
}

// Default pricing for GPT-4 Turbo - override in config for other models
const DEFAULT_PRICING: ModelPricing = {
  promptCostPer1k: 0.01,        // $0.01 per 1K input tokens
  completionCostPer1k: 0.03,    // $0.03 per 1K output tokens
};

// Decent defaults - users only need to override what they want to change
const defaultConfig: ResolvedConfig = {
  model: 'openai/gpt-4-turbo',
  temperature: 0.7,             // Balanced between creative and focused
  maxTokens: 2000,              // Reasonable limit for most responses
  systemPrompt: 'You are a helpful assistant.',
  maxRetries: 3,                // Handles transient failures gracefully
  retryDelayMs: 1000,           // Base delay for exponential backoff
  maxHistoryLength: 50,         // Prevents context overflow in long conversations
  pricing: DEFAULT_PRICING,     // Override for different models/providers
  onError: undefined,
  onTokenUsage: undefined,
};

export class EnterpriseChatClient {
  // Private properties ensure internal state can't be corrupted externally
  private client: OpenAI;                                                    // Reused for all API calls
  private config: ResolvedConfig;                                            // Merged user + default config
  private conversationHistory: OpenAI.Chat.ChatCompletionMessageParam[];     // Full conversation context
  private totalTokensUsed: number = 0;                                       // Cumulative token counter

  // Constructor uses dependency injection pattern - config is passed in, not hardcoded
  constructor(config: ChatConfig = {}) {
    this.client = new OpenAI({
      baseURL: 'https://openrouter.ai/api/v1',  // OpenRouter endpoint
      apiKey: process.env.OPENROUTER_API_KEY,
    });

    // Spread operator merges configs: defaults first, then user overrides
    this.config = { ...defaultConfig, ...config };

    // Initialize with system prompt - this sets the AI's behavior/personality
    this.conversationHistory = [
      { role: 'system', content: this.config.systemPrompt }
    ];
  }

  // Retry wrapper with exponential backoff
  private async withRetry<T>(
    operation: () => Promise<T>,
    context: string
  ): Promise<T> {
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= this.config.maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error as Error;

        // Don't retry on auth errors
        if (error instanceof OpenAI.APIError && error.status === 401) {
          throw new Error('Invalid API key. Check your configuration.');
        }

        // Don't retry on the last attempt
        if (attempt === this.config.maxRetries) {
          break;
        }

        // Exponential backoff: 1s, 2s, 4s...
        const delay = this.config.retryDelayMs * Math.pow(2, attempt - 1);
        console.warn(`${context}: Attempt ${attempt} failed, retrying in ${delay}ms...`);
        await this.delay(delay);
      }
    }

    // Call error handler if provided
    if (this.config.onError && lastError) {
      this.config.onError(lastError);
    }

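    // lastError is still null if maxRetries < 1 and the loop never ran
    throw lastError ?? new Error(`${context}: no attempts were made (maxRetries < 1)`);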
  }

  // Promise-based delay utility - cleaner than nested setTimeout callbacks
  private delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  // Calculate token usage and cost based on configured pricing
  private calculateUsage(usage: OpenAI.Completions.CompletionUsage | undefined): TokenUsage | undefined {
    if (!usage) return undefined;

    // Use pricing from config - allows different rates for different models
    const { promptCostPer1k, completionCostPer1k } = this.config.pricing;

    const tokenUsage: TokenUsage = {
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      totalTokens: usage.total_tokens,
      estimatedCost:
        (usage.prompt_tokens / 1000) * promptCostPer1k +
        (usage.completion_tokens / 1000) * completionCostPer1k,
    };

    this.totalTokensUsed += tokenUsage.totalTokens;

    // Call usage callback if provided
    if (this.config.onTokenUsage) {
      this.config.onTokenUsage(tokenUsage);
    }

    return tokenUsage;
  }

  // Manage history length to prevent token overflow
  private trimHistoryIfNeeded(): void {
    if (this.conversationHistory.length > this.config.maxHistoryLength) {
      const systemMessage = this.conversationHistory[0];
      if (!systemMessage) return;
      // Keep the system message and the most recent messages
      const keepCount = this.config.maxHistoryLength - 1;
      const recentMessages = this.conversationHistory.slice(-keepCount);
      this.conversationHistory = [systemMessage, ...recentMessages];
    }
  }

  // Enhanced chat with full result object
  async chat(userMessage: string): Promise<ChatResult> {
    this.conversationHistory.push({ role: 'user', content: userMessage });
    this.trimHistoryIfNeeded();

    const response = await this.withRetry(
      () => this.client.chat.completions.create({
        model: this.config.model,
        messages: this.conversationHistory,
        temperature: this.config.temperature,
        max_tokens: this.config.maxTokens,
      }),
      'chat'
    );

    const choice = response.choices[0];
    if (!choice) {
      throw new Error('No response choice returned from API');
    }

    const assistantMessage = choice.message.content || '';
    this.conversationHistory.push({ role: 'assistant', content: assistantMessage });

    return {
      content: assistantMessage,
      usage: this.calculateUsage(response.usage),
      model: response.model,
      finishReason: choice.finish_reason || 'unknown',
    };
  }

  // Simple chat that just returns the string (convenience method)
  async quickChat(userMessage: string): Promise<string> {
    const result = await this.chat(userMessage);
    return result.content;
  }

  // Streaming chat using AsyncGenerator - yields chunks as they arrive
  // The async* syntax creates a generator that can await and yield asynchronously
  // Consumer uses: for await (const chunk of streamChat(...)) { ... }
  async *streamChat(
    userMessage: string,
    onProgress?: (chunk: string, accumulated: string) => void
  ): AsyncGenerator<string, string, unknown> {
    this.conversationHistory.push({ role: 'user', content: userMessage });
    this.trimHistoryIfNeeded();

    const stream = await this.withRetry(
      () => this.client.chat.completions.create({
        model: this.config.model,
        messages: this.conversationHistory,
        temperature: this.config.temperature,
        max_tokens: this.config.maxTokens,
        stream: true,
      }),
      'streamChat'
    );

    let fullResponse = '';

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      fullResponse += content;

      // Call progress callback if provided
      if (onProgress) {
        onProgress(content, fullResponse);
      }

      yield content;
    }

    this.conversationHistory.push({ role: 'assistant', content: fullResponse });
    return fullResponse;
  }

  // ==================== Utility Methods ====================
  // These provide controlled access to internal state

  // Clear conversation but keep the same configuration and system prompt
  reset(): void {
    this.conversationHistory = [
      { role: 'system', content: this.config.systemPrompt }
    ];
  }

  // Return a copy of history (not the original) to prevent external mutation
  getHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {
    return [...this.conversationHistory];
  }

  getHistoryLength(): number {
    return this.conversationHistory.length;
  }

  // Useful for monitoring costs across multiple conversations
  getTotalTokensUsed(): number {
    return this.totalTokensUsed;
  }

  // Update system prompt mid-conversation
  updateSystemPrompt(newPrompt: string): void {
    this.config.systemPrompt = newPrompt;
    if (this.conversationHistory.length > 0) {
      this.conversationHistory[0] = { role: 'system', content: newPrompt };
    }
  }

  // Fork creates a branch of the conversation - useful for A/B testing responses
  // or exploring different conversation paths without affecting the original
  fork(newConfig?: ChatConfig): EnterpriseChatClient {
    const forked = new EnterpriseChatClient({ ...this.config, ...newConfig });
    forked.conversationHistory = [...this.conversationHistory];  // Copy, not reference
    return forked;
  }
}

// ==================== Example Usage ====================
// Demonstrates both non-streaming and streaming patterns with callbacks

async function main() {
  // Create client with custom config - only override what you need
  const assistant = new EnterpriseChatClient({
    systemPrompt: 'You are an expert programmer.',
    temperature: 0.2,       // Low for consistent, precise code responses
    maxRetries: 3,
    // Custom pricing - update based on your model/provider
    // See: https://openrouter.ai/docs#models for current rates
    pricing: {
      promptCostPer1k: 0.01,      // GPT-4 Turbo input
      completionCostPer1k: 0.03,  // GPT-4 Turbo output
    },
    // Callbacks integrate with your logging/monitoring systems
    onTokenUsage: (usage) => {
      console.log(`Tokens: ${usage.totalTokens}, Cost: $${usage.estimatedCost.toFixed(4)}`);
    },
    onError: (error) => {
      console.error('Chat error:', error.message);
    },
  });

  // Non-streaming with full result
  const result = await assistant.chat('What is a closure in JavaScript?');
  console.log('Response:', result.content);
  console.log('Finish reason:', result.finishReason);

  // Streaming (no progress callback passed in this call)
  console.log('\nStreaming response:');
  for await (const chunk of assistant.streamChat('Give me an example')) {
    process.stdout.write(chunk);
  }

  console.log('\n\nTotal tokens used:', assistant.getTotalTokensUsed());
  console.log('Conversation length:', assistant.getHistoryLength(), 'messages');
}

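// Run the demo only when this file is executed directly (assumes CommonJS, the ts-node default),
// so importing EnterpriseChatClient from another script does not also trigger it.
if (require.main === module) {
  main().catch(console.error);
}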

As I was writing this article, I made several iterations on the code to improve it.

For example, I originally had pricing approximations in the class, but I moved those out so you can easily inject those.

The idea with this class-based approach is to abstract so your model is a template where you can inject variables if required — for testing or when new models are being used.
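
As a quick worked example of why injecting pricing is handy, here is the same cost arithmetic the class uses, pulled out into a pure function. estimateCost is a hypothetical helper name, and the second set of rates is made up purely to show a swap.

```typescript
interface ModelPricing {
  promptCostPer1k: number;      // Cost per 1K input tokens
  completionCostPer1k: number;  // Cost per 1K output tokens
}

// Hypothetical pure helper: same formula as calculateUsage, without side effects.
function estimateCost(
  promptTokens: number,
  completionTokens: number,
  pricing: ModelPricing
): number {
  return (
    (promptTokens / 1000) * pricing.promptCostPer1k +
    (completionTokens / 1000) * pricing.completionCostPer1k
  );
}

// The article's default rates.
const gpt4Turbo: ModelPricing = { promptCostPer1k: 0.01, completionCostPer1k: 0.03 };

// 1,500 prompt tokens and 500 completion tokens:
// (1.5 * 0.01) + (0.5 * 0.03) = 0.015 + 0.015 = 0.03
console.log(estimateCost(1500, 500, gpt4Turbo).toFixed(4)); // 0.0300

// Made-up rates for illustration only: swapping models is just a different object.
const cheaperModel: ModelPricing = { promptCostPer1k: 0.001, completionCostPer1k: 0.002 };
console.log(estimateCost(1500, 500, cheaperModel).toFixed(4)); // 0.0025
```

Since the function only depends on its arguments, a unit test can check it with any rates you like, with no API call involved.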

Other ideas:

Add a getApproximateTokenCount method, so we can better track or estimate the approx. token count:

getApproximateTokenCount(): number {
    let total = 0;
    for (const msg of this.conversationHistory) {
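      // Very rough approximation: ~4 chars per token + some overhead.
      // Only string content is counted; array or null content counts as 0 characters.
      const text = typeof msg.content === 'string' ? msg.content : '';
      total += Math.ceil(text.length / 4) + 4; // +4 for role + overhead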
    }
    return total;
  }

We will be using tools later, but just to give you an early heads up, we could add a tools response here:

addToolResponse(toolCallId: string, content: string): void {
    this.conversationHistory.push({
      role: 'tool',
      tool_call_id: toolCallId,
      content
    });
  }

Undo the last turn

undoLastTurn(): boolean {
    if (this.conversationHistory.length < 3) return false; // need at least user + assistant

    // Remove last assistant
    if (this.conversationHistory.at(-1)?.role === 'assistant') {
      this.conversationHistory.pop();
    }
    // Remove last user
    if (this.conversationHistory.at(-1)?.role === 'user') {
      this.conversationHistory.pop();
      return true;
    }
    return false;
  }

Okay, that’s good for now. I’ve given you a solid baseline to illustrate the concept. You can come up with a lot more examples. Feel free to research with an AI coding assistant tool, they can help you learn this too.

Now let’s put it all together with a script to run the class.

7. Example Usage of the Class Code — Run It!

Create the file: runEnterpriseChatClient.ts

Here we will import the class and use it in our script:

import { EnterpriseChatClient } from "./enterpriseChatClient";

async function main() {
  const assistant = new EnterpriseChatClient({
    systemPrompt: 'You are an expert programmer.',
    temperature: 0.2,
    maxRetries: 3,
    // Custom pricing - update based on your model/provider
    // See: https://openrouter.ai/docs#models for current rates
    pricing: {
      promptCostPer1k: 0.01,      // GPT-4 Turbo input cost per 1K tokens
      completionCostPer1k: 0.03,  // GPT-4 Turbo output cost per 1K tokens
    },
    onTokenUsage: (usage) => {
      console.log(`Tokens: ${usage.totalTokens}, Cost: $${usage.estimatedCost.toFixed(4)}`);
    },
    onError: (error) => {
      console.error('Chat error:', error.message);
      // Could send to error tracking service here
    },
  });

  // Non-streaming with full result
  const result = await assistant.chat('What is a closure in JavaScript?');
  console.log('Response:', result.content);
  console.log('Finish reason:', result.finishReason);

  // Streaming with progress callback
  console.log('\nStreaming response:');
  for await (const chunk of assistant.streamChat(
    'Give me an example',
    (chunk, accumulated) => {
      // Could update a progress indicator here
    }
  )) {
    process.stdout.write(chunk);
  }

  console.log('\n\nTotal tokens used:', assistant.getTotalTokensUsed());
  console.log('Conversation length:', assistant.getHistoryLength(), 'messages');

  // Fork for a different conversation branch
  const forked = assistant.fork({ temperature: 0.9 });
  await forked.chat('Now explain it more creatively');
}

main().catch(console.error);

Creates an EnterpriseChatClient instance configured as an expert programmer with very low temperature (0.2), custom GPT-4-Turbo pricing, token usage logging, and error handling
Performs a non-streaming chat completion asking “What is a closure in JavaScript?” and logs the full response + finish reason
Runs a streaming chat completion for “Give me an example”, printing each token chunk in real-time as it arrives
After both calls, shows total tokens used across the whole session and current conversation history length
Demonstrates conversation forking: creates a new independent branch with higher creativity (temperature 0.9) and continues the conversation in that forked context

As mentioned above, I made several revisions of the code while writing this article and when I added the pricing to the class, I also had to add it to the config in this script — reason: remember that we are using dependency injection.

Run the whole thing with this:

npx ts-node ./runEnterpriseChatClient.ts

Keep this code.

In future articles we will probably be alternating between functional approaches to illustrate basic principles, and building more advanced interfaces on this class-based logic!

I believe the next article will be about calling tools.

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Savings: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

AI Chat Coding Essentials — Advanced version(AI Agent Coding Series #2) was originally published in AI Dev Tips on Medium, where people are continuing the conversation by highlighting and responding to this story.

Amazon EKS (K8s) Media Cluster: Part 5 — Node Autoscaling with Karpenter + Spot instances

Chris St. John — Mon, 05 Jan 2026 01:37:43 GMT

Amazon EKS (K8s) Media Cluster: Part 5 — Node Autoscaling with Karpenter + Spot instances

Karpenter node autoscaling and bin packing, Helm charts, HPA, automatic Spot instances, k6 load testing.

✅ “I need my nodes to scale automatically with pod demand and be able to handle overcapacity with Spot instances— no Pending pods!”

In the last article we succeeded in autoscaling pods to handle traffic spikes, but then we encountered a new problem: pods often get stuck or delayed in Pending when the nodes are maxed out at capacity. If new nodes arrive slowly, the extra pods cannot start in time to absorb the spike.

Let’s solve this by using node optimization/autoscaling with the tool Karpenter. Later we can also better optimize our workload to use Spot instances which can also save a lot of money.

What we are doing in this article:

1. Learn about what we can do with Karpenter.
2. Rebuild our infra, if you destroyed it previously.
3. Prepare Karpenter (IAM).
4. Install Karpenter w/Terraform and Helm (K8s config tool).
5. Create NodePool (rules) and EC2NodeClass (AMI, subnets, SGs).
6. Test rapid node provisioning (30–60 second spin-up).
7. Combined load test: HPA + Karpenter working together.
8. Notes: Configure Spot instances for cost savings. Observe automatic node consolidation.
9. Cleanup, destroy resources

Goals to achieve:

Scale deployment up to 20 pods. Karpenter: new nodes in real-time.
Nodes appear in 30–60 seconds (vs 3–5 minutes w/Cluster Autoscaler)
Watch Karpenter select the most cost-effective instance types.
Node consolidation (Karpenter removes underutilized nodes).
Run workloads on Spot instances with automatic On-Demand fallback.
Do not have to manually manage node groups again.

Skills we’ll learn in this article:

Karpenter architecture and components.
How Helm works with K8s for configuration.
NodePool and EC2NodeClass resource definitions.
Spot instance management and interruption handling.
Bin packing and resource optimization.
Node consolidation strategies.
Mixed capacity types (Spot + On-Demand).
Helm chart installation via Terraform.

This should take about 1 hour+ including about 15 minutes each building and destroying resources.

Files we will create/modify in this article:

environments/dev/karpenter-iam.tf … IAM roles, policies, SQS queue
environments/dev/karpenter.tf … Karpenter Helm installation
environments/dev/providers.tf … Add Helm provider
environments/dev/main.tf … Add Karpenter discovery tags
k8s/karpenter-nodepool.yaml … NodePool & EC2NodeClass

🛠️ Get more articles like this at https://www.systemsarchitect.io and follow new articles in the series here. 🚀Also follow the SystemsArchitect X account: https://x.com/systemsarch — we follow back!

Prerequisites for Amazon EKS (K8s) — Part 5

Previous articles in this series you should do first:

✅ PART 1 Amazon EKS (K8s) Part 1 — Initial Setup/Roadmap

✅ PART 2 Amazon EKS (K8s) Part 2 — Deploy Initial Terraform Multi-AZ EKS Cluster

✅ PART 3 Amazon EKS (K8s) Part 3 — Self-Healing Video Pods

✅ PART 4 Amazon EKS (K8s) Part 4 — Pod Auto-Scaling (HPA) and CDN

I recommend doing all these because we are building on existing prior lessons.

1. What can we do with Karpenter

Karpenter specializes in creating “just-in-time” nodes.

“Karpenter simplifies Kubernetes infrastructure with the right nodes at the right time. Karpenter automatically launches just the right compute resources to handle your cluster’s applications. It is designed to let you take full advantage of the cloud with fast and simple compute provisioning for Kubernetes clusters.” — official website.

https://karpenter.sh/

Just-in-time provisioning with right-sizing. Karpenter directly observes unschedulable pods, evaluates their resource requests and provisions optimally sized nodes on-demand.
Rapid scaling and deprovisioning. It launches nodes quickly to minimize pod scheduling latency and terminates them when no longer needed.
Workload consolidation for optimization. Karpenter consolidates underutilized workloads by bin-packing pods onto fewer, more efficient nodes or replacing expensive/over-provisioned nodes with better-fitting ones.
Flexible instance selection and cost savings. It supports a wide range of instance types, including Spot instances, On-Demand, and Reserved, with options like weightings, requirements, and limits in NodePools.
Native Kubernetes integration and simplicity. Managed via declarative CRDs like NodePools and NodeClasses, it uses Kubernetes-native APIs for constraints.
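
To make the bin-packing idea concrete, here is an illustrative sketch. It is not Karpenter's actual algorithm. It assumes a hypothetical helper, pack_pods, that reads one pod CPU request in millicores per line and packs the pods onto identical nodes using a simple first-fit-decreasing strategy. Karpenter's real scheduling also weighs memory, many instance types, and pricing, so treat this only as intuition for why packing reduces the node count.

```shell
# Hypothetical helper (illustration only, not how Karpenter is implemented).
# Usage: pack_pods NODE_CAPACITY_MILLICORES < requests.txt
# Reads one CPU request (millicores) per line and prints how many
# identical nodes a first-fit-decreasing packing would need.
pack_pods() {
  # Largest pods first, then place each pod on the first node with room.
  sort -rn | awk -v cap="$1" '
    {
      placed = 0
      for (i = 1; i <= n; i++) {
        if (free[i] >= $1) { free[i] -= $1; placed = 1; break }
      }
      # No existing node has room: open a new node for this pod.
      if (!placed) { n++; free[n] = cap - $1 }
    }
    END { print n + 0 }
  '
}

# Five pods totalling 3750m fit on two 2000m nodes:
#   printf '500\n500\n1000\n1500\n250\n' | pack_pods 2000
```

Placing the same pods one per node would need five nodes, which is the waste that consolidation is meant to remove.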

As you can see this is a powerful tool we can use to automate the optimization of our workloads.

Strengths of Karpenter:

Speed: 30–60 seconds in most cases vs. minutes in other solutions.
Optimized: Chooses instance types and sizes to fit each workload’s resource requests.
Cost: Easily add Spot instances for cost savings.
Efficient: Uses bin packing for better optimization.

What about alternatives to Karpenter?

There are a couple of alternatives we can discuss up front:

AWS ECS: Many teams start with AWS ECS for container orchestration, and it can also scale containers. ECS does have a binpack task placement strategy, but on EC2 it scales capacity through Auto Scaling Groups (ASGs) with predefined instance types, so it cannot pick a right-sized instance per workload the way Karpenter does. Node consolidation takes more manual work in ECS, and autoscaling is typically slower. Overall, ECS works for simpler, smaller projects that need less granular optimization, but it does not have the options and granularity of Karpenter.

source: https://docs.aws.amazon.com/whitepapers/latest/overview-deployment-options/amazon-elastic-container-service.html

Cluster Autoscaler: Cluster Autoscaler is the traditional Kubernetes component that automatically adjusts nodes in your cluster. It monitors for unschedulable pods, then scales up by adding nodes to existing Auto Scaling Groups (ASGs) (or Managed Node Groups in EKS), and scales down by removing underutilized nodes.

The problem is that it works at the level of predefined node groups with fixed instance types, which often leads to slower scaling.

source: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

2. Rebuild Cluster (If Destroyed)

At the end of the last article we destroyed our cluster with Terraform to avoid charges. When you rebuild remember to destroy all resources again and check on it in the AWS console or you will incur fees.

If you destroyed your cluster, follow these instructions to rebuild it.

For the AWS console, log in with the root email that you set up earlier for your EKS admin. I recommend doing this in an incognito window to prevent conflicts with other sessions.

The rebuild code is similar to the prior article (Part 4), with one adjustment: we add a count value to the CloudFront distribution, because without it the first apply gives an error.

We have to modify 2 files: cloudfront.tf and variables.tf

In cloudfront.tf, update the start of the resource under # CloudFront Distribution to this:

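resource "aws_cloudfront_distribution" "video_app" {
  # Only create the distribution once the flag is turned on
  count = var.enable_cloudfront ? 1 : 0
  # ... rest of the existing resource block stays unchanged ...
}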

Under the locals block (higher in the same file):

locals {
  # Get LoadBalancer hostname (empty string if service not deployed)
  lb_hostname = try(
    data.kubernetes_service.video_app.status[0].load_balancer[0].ingress[0].hostname,
    ""
  )
  # ... any other existing locals stay unchanged ...
}

And then the output (lower in that file):

output "cloudfront_url" {
  description = "Full CloudFront URL"
  value       = var.enable_cloudfront ? "https://${aws_cloudfront_distribution.video_app[0].domain_name}" : "Set enable_cloudfront=true after deploying video-app service"
}

Then at the end of variables.tf add this:


# -----------------------------------------------------------------------------
# CloudFront Configuration
# -----------------------------------------------------------------------------

variable "enable_cloudfront" {
  description = "Enable CloudFront distribution (requires video-app K8s service to be deployed first)"
  type        = bool
  default     = false
}

Now verify that your AWS CLI is using the right profile:

# Verify AWS CLI profile
export AWS_PROFILE=terraform-eks-admin
aws sts get-caller-identity

Then run Terraform and redeploy the app:

# From project root
cd environments/dev
terraform plan
terraform apply

aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster
kubectl get nodes
# this should show 3 nodes

# Run TWICE: kubectl apply -f ../../k8s/
kubectl apply -f ../../k8s/
kubectl apply -f ../../k8s/

# confirm
kubectl get pods -n video-app
kubectl get svc -n video-app

# Then to enable CloudFront (after service has LoadBalancer).
# You should still be in environments/dev; if not, cd there from the project root first.
terraform apply -var="enable_cloudfront=true"

# Apply Metrics server (needed later for HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

After you run these commands you should be up and running.

✅ If you stop the article at any time, and destroy resources, then you must do the above again to get back to the correct state.

⚠️ REMINDER: AWS charges you for these resources for as long as they exist. You must (1) run terraform destroy AND (2) check the AWS console to be sure all resources are removed so you won’t get charged. (Recall from the previous article that an error sometimes leaves resources undeleted, so you could still be charged even after running the command.) ⚠️ Run terraform destroy TWICE, double-check the console, and manually remove anything that is left.

🚨Warning: I did notice once my ELB did not get destroyed, even though I used terraform destroy, which then also caused the Internet Gateway to not be destroyed — so double-check.
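
If you prefer the CLI for that double-check, here is a minimal sketch. The function name check_leftovers is hypothetical, and it assumes the us-east-1 region and the AWS_PROFILE export shown above. It only lists resources; it does not delete anything. An empty list for clusters and load balancers is what you want. If your account has a default VPC, expect that VPC's own Internet Gateway to show up, which is normal.

```shell
# Hypothetical helper: list resource types that sometimes survive
# `terraform destroy`. Assumes AWS_PROFILE is already exported.
# Usage: check_leftovers [region]
check_leftovers() {
  region="${1:-us-east-1}"

  echo "== EKS clusters =="
  aws eks list-clusters --region "$region" \
    --query 'clusters' --output text

  echo "== Classic load balancers =="
  aws elb describe-load-balancers --region "$region" \
    --query 'LoadBalancerDescriptions[].LoadBalancerName' --output text

  echo "== Application/Network load balancers =="
  aws elbv2 describe-load-balancers --region "$region" \
    --query 'LoadBalancers[].LoadBalancerName' --output text

  echo "== Internet gateways =="
  aws ec2 describe-internet-gateways --region "$region" \
    --query 'InternetGateways[].InternetGatewayId' --output text
}
```

The AWS console remains the final check, since this sketch covers only the resource types mentioned in the warning above.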

3. Prepare for Karpenter

In this section we create the Terraform configuration for the things Karpenter needs:

IAM Role: To launch/terminate EC2 instances
Instance Profile: For nodes it creates
Subnet Tags: To know where to launch nodes
Security Group Tags: To attach to nodes
OIDC Provider: For IAM authentication (already have from Part 2)

Create Karpenter IAM Resources

Create file: environments/dev/karpenter-iam.tf

# =============================================================================
# KARPENTER IAM CONFIGURATION
# =============================================================================
# IAM roles and policies required for Karpenter to provision EC2 instances.
# =============================================================================

# -----------------------------------------------------------------------------
# Data Sources
# -----------------------------------------------------------------------------
data "aws_iam_policy_document" "karpenter_controller_assume_role" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }
  }
}

# -----------------------------------------------------------------------------
# Karpenter Controller IAM Role
# -----------------------------------------------------------------------------
resource "aws_iam_role" "karpenter_controller" {
  name               = "${var.cluster_name}-karpenter-controller"
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role.json

  tags = local.tags
}

# -----------------------------------------------------------------------------
# Karpenter Controller Policy
# -----------------------------------------------------------------------------
resource "aws_iam_policy" "karpenter_controller" {
  name        = "${var.cluster_name}-karpenter-controller"
  description = "Policy for Karpenter controller"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Karpenter"
        Effect = "Allow"
        Action = [
          "ec2:CreateFleet",
          "ec2:CreateLaunchTemplate",
          "ec2:CreateTags",
          "ec2:DeleteLaunchTemplate",
          "ec2:DescribeAvailabilityZones",
          "ec2:DescribeImages",
          "ec2:DescribeInstances",
          "ec2:DescribeInstanceTypeOfferings",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeLaunchTemplates",
          "ec2:DescribeSecurityGroups",
          "ec2:DescribeSpotPriceHistory",
          "ec2:DescribeSubnets",
          "ec2:RunInstances",
          # ec2:TerminateInstances and iam:PassRole are intentionally not listed
          # here; the scoped statements below grant them with tighter limits.
          "iam:AddRoleToInstanceProfile",
          "iam:CreateInstanceProfile",
          "iam:DeleteInstanceProfile",
          "iam:GetInstanceProfile",
          "iam:RemoveRoleFromInstanceProfile",
          "iam:TagInstanceProfile",
          "pricing:GetProducts",
          "ssm:GetParameter"
        ]
        Resource = "*"
      },
      {
        Sid    = "ConditionalEC2Termination"
        Effect = "Allow"
        Action = "ec2:TerminateInstances"
        Resource = "*"
        Condition = {
          StringLike = {
            "ec2:ResourceTag/karpenter.sh/nodepool" = "*"
          }
        }
      },
      {
        Sid    = "PassNodeIAMRole"
        Effect = "Allow"
        Action = "iam:PassRole"
        Resource = aws_iam_role.karpenter_node.arn
      },
      {
        Sid    = "EKSClusterEndpointLookup"
        Effect = "Allow"
        Action = "eks:DescribeCluster"
        Resource = module.eks.cluster_arn
      },
      {
        Sid    = "SQSInterruptionQueue"
        Effect = "Allow"
        Action = [
          "sqs:DeleteMessage",
          "sqs:GetQueueAttributes",
          "sqs:GetQueueUrl",
          "sqs:ReceiveMessage"
        ]
        Resource = aws_sqs_queue.karpenter_interruption.arn
      }
    ]
  })

  tags = local.tags
}

resource "aws_iam_role_policy_attachment" "karpenter_controller" {
  role       = aws_iam_role.karpenter_controller.name
  policy_arn = aws_iam_policy.karpenter_controller.arn
}

# -----------------------------------------------------------------------------
# Karpenter Node IAM Role (for nodes Karpenter creates)
# -----------------------------------------------------------------------------
resource "aws_iam_role" "karpenter_node" {
  name = "${var.cluster_name}-karpenter-node"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })

  tags = local.tags
}

# Attach required policies for EKS worker nodes
resource "aws_iam_role_policy_attachment" "karpenter_node_worker" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_cni" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_ecr" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_ssm" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance profile for Karpenter nodes
resource "aws_iam_instance_profile" "karpenter_node" {
  name = "${var.cluster_name}-karpenter-node"
  role = aws_iam_role.karpenter_node.name

  tags = local.tags
}

# -----------------------------------------------------------------------------
# SQS Queue for Spot Interruption Handling
# -----------------------------------------------------------------------------
resource "aws_sqs_queue" "karpenter_interruption" {
  name                      = "${var.cluster_name}-karpenter-interruption"
  message_retention_seconds = 300
  sqs_managed_sse_enabled   = true

  tags = local.tags
}

resource "aws_sqs_queue_policy" "karpenter_interruption" {
  queue_url = aws_sqs_queue.karpenter_interruption.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = [
            "events.amazonaws.com",
            "sqs.amazonaws.com"
          ]
        }
        Action   = "sqs:SendMessage"
        Resource = aws_sqs_queue.karpenter_interruption.arn
      }
    ]
  })
}

# EventBridge rules for Spot interruption events
resource "aws_cloudwatch_event_rule" "karpenter_spot_interruption" {
  name        = "${var.cluster_name}-karpenter-spot-interruption"
  description = "Spot instance interruption warnings for Karpenter"

  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Spot Instance Interruption Warning"]
  })

  tags = local.tags
}

resource "aws_cloudwatch_event_target" "karpenter_spot_interruption" {
  rule      = aws_cloudwatch_event_rule.karpenter_spot_interruption.name
  target_id = "KarpenterInterruptionQueue"
  arn       = aws_sqs_queue.karpenter_interruption.arn
}

# EventBridge rule for instance rebalance recommendations
resource "aws_cloudwatch_event_rule" "karpenter_rebalance" {
  name        = "${var.cluster_name}-karpenter-rebalance"
  description = "EC2 instance rebalance recommendations for Karpenter"

  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Instance Rebalance Recommendation"]
  })

  tags = local.tags
}

resource "aws_cloudwatch_event_target" "karpenter_rebalance" {
  rule      = aws_cloudwatch_event_rule.karpenter_rebalance.name
  target_id = "KarpenterInterruptionQueue"
  arn       = aws_sqs_queue.karpenter_interruption.arn
}

# -----------------------------------------------------------------------------
# Outputs
# -----------------------------------------------------------------------------
output "karpenter_controller_role_arn" {
  description = "ARN of the Karpenter controller IAM role"
  value       = aws_iam_role.karpenter_controller.arn
}

output "karpenter_node_role_arn" {
  description = "ARN of the Karpenter node IAM role"
  value       = aws_iam_role.karpenter_node.arn
}

output "karpenter_instance_profile_name" {
  description = "Name of the Karpenter node instance profile"
  value       = aws_iam_instance_profile.karpenter_node.name
}

output "karpenter_interruption_queue_name" {
  description = "Name of the SQS queue for Spot interruption handling"
  value       = aws_sqs_queue.karpenter_interruption.name
}

# -----------------------------------------------------------------------------
# Authorize Karpenter Node Role in aws-auth ConfigMap
# -----------------------------------------------------------------------------
# This allows EC2 instances launched by Karpenter to join the EKS cluster.
# Without this, nodes will get "Unauthorized" when trying to register.
# -----------------------------------------------------------------------------
resource "kubernetes_config_map_v1_data" "aws_auth_karpenter" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode([
      {
        rolearn  = module.eks.eks_managed_node_groups["primary"].iam_role_arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      },
      {
        rolearn  = aws_iam_role.karpenter_node.arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      }
    ])
  }

  force = true

  depends_on = [module.eks]
}

⚠️ Make sure you copy the file above in full. I initially forgot the entire last section, which caused an error:

Authorize Karpenter Node Role in aws-auth ConfigMap

If you get errors, check this IAM config file first, because Karpenter will not work if any of these permissions are missing!
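
Once this file has been applied, one way to confirm the aws-auth mapping is in place is sketched below. The helper name has_role_mapping is hypothetical. It simply checks whether a role ARN appears in the ConfigMap text it reads from stdin. The output name karpenter_node_role_arn comes from the Outputs section of the file above.

```shell
# Hypothetical helper: reads aws-auth YAML on stdin and reports whether
# the given role ARN appears in it. Returns non-zero if it is missing.
has_role_mapping() {
  if grep -qF "$1"; then
    echo "mapped: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

# Usage (run from environments/dev so `terraform output` can read the state):
#   kubectl get configmap aws-auth -n kube-system -o yaml \
#     | has_role_mapping "$(terraform output -raw karpenter_node_role_arn)"
```

If it prints MISSING, nodes launched by Karpenter would likely fail to join the cluster, which matches the "Unauthorized" symptom described in the file's comments.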

Tag subnets for Karpenter discovery

Update environments/dev/main.tf - add Karpenter discovery tags to subnets:

# In the VPC module, update private_subnet_tags:
private_subnet_tags = {
  "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  "kubernetes.io/role/internal-elb"           = "1"
  "karpenter.sh/discovery"                    = var.cluster_name  # ADD THIS
  "Tier"                                      = "private"
}

Also, I found out later in the article that we need the block below as well. I believe you can add it now, and I will mention it again when we launch Karpenter. Add it in main.tf, inside the EKS module, as its own block below the tags:

  # ---------------------------------------------------------------------------
  # Karpenter Discovery Tag for Security Groups
  # ---------------------------------------------------------------------------
  node_security_group_tags = {
    "karpenter.sh/discovery" = var.cluster_name
  }

Apply the Terraform changes above:

cd environments/dev
terraform plan
terraform apply
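
After the apply, you can check that the discovery tags actually landed, since Karpenter finds subnets and security groups only through them. This is a sketch under assumptions: check_discovery_tags is a hypothetical helper name, and the defaults assume the eks-video-cluster name and us-east-1 region used earlier in this series.

```shell
# Hypothetical helper: list subnets and security groups that carry the
# karpenter.sh/discovery tag for a cluster.
# Usage: check_discovery_tags [cluster-name] [region]
check_discovery_tags() {
  cluster="${1:-eks-video-cluster}"
  region="${2:-us-east-1}"

  echo "== Tagged subnets =="
  aws ec2 describe-subnets --region "$region" \
    --filters "Name=tag:karpenter.sh/discovery,Values=$cluster" \
    --query 'Subnets[].SubnetId' --output text

  echo "== Tagged security groups =="
  aws ec2 describe-security-groups --region "$region" \
    --filters "Name=tag:karpenter.sh/discovery,Values=$cluster" \
    --query 'SecurityGroups[].GroupId' --output text
}
```

If either list comes back empty, the matching selector in the EC2NodeClass will find nothing, so it is worth fixing the tags before moving on.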

4. Install Karpenter with Terraform/Helm

Next we are going to add the Helm provider to Terraform and use it to install Karpenter.

Helm is the package manager for Kubernetes — similar to how npm works for Node.js, apt/yum for Linux, or Homebrew for macOS.

Technically, you can install Karpenter without Helm, but I’m showing you the standard approach, and we can use Helm for other things later.

Helm simplifies how we define, install, configure, update, and manage even complex applications on Kubernetes clusters.

The main reasons we use Helm:

Without Helm there are a lot of YAML files to manage. With Helm we install everything with one command: helm install
Easily handle updates with helm upgrade
Easy rollbacks with helm rollback

Helm core concepts:

Chart: Package definition (like package.json), templates, default values, and metadata.
Values: Configuration overrides (like .env files).
Release: Installed instance of a chart.
Repository: Where charts are stored (like npm registry). For example Karpenter uses: oci://public.ecr.aws/karpenter

Normal flow of how to use Helm with Karpenter:

Chart contains YAML templates with {{ .Values.xxx }} placeholders.
You provide values (via values.yaml or --set flags). Since we are using Terraform, we pass these values in the Terraform configuration instead.
Helm renders templates, produces valid Kubernetes YAML.
Helm applies the YAML to your cluster (like kubectl apply).
Helm tracks the “release” so you can upgrade/rollback later!

Keep in mind that because we are using Terraform, the flow varies slightly. Without Terraform, you create a YAML values file and run the Helm install command yourself. In our case, Terraform drives Helm through the Helm provider.
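
For comparison, the plain-Helm route might look roughly like the sketch below. It is for illustration only: do not run it alongside the Terraform install, because both would try to own the same release. The wrapper name install_karpenter_with_helm is hypothetical. The cluster and queue names assume the eks-video-cluster naming used in this series, and the version is the chart version this article installs.

```shell
# Hypothetical wrapper: the plain-Helm equivalent of installing Karpenter.
# Usage: install_karpenter_with_helm <karpenter-controller-role-arn>
install_karpenter_with_helm() {
  role_arn="$1"   # ARN of the Karpenter controller IAM role

  # The dots in the annotation key are escaped so Helm treats
  # "eks.amazonaws.com/role-arn" as a single key, not a nested path.
  helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
    --version "1.6.0" \
    --namespace karpenter --create-namespace \
    --set "settings.clusterName=eks-video-cluster" \
    --set "settings.interruptionQueue=eks-video-cluster-karpenter-interruption" \
    --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=$role_arn" \
    --wait
}
```

Each --set flag here corresponds to a key that Terraform passes through its values block, which is why the two approaches end up configuring the same chart.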

Configure Helm provider

Update environments/dev/providers.tf:

# Add to required_providers block:
helm = {
  source  = "hashicorp/helm"
  version = "~> 2.12"
}

# Add Helm provider configuration:
provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args = [
        "eks",
        "get-token",
        "--cluster-name",
        var.cluster_name,
        "--region",
        var.aws_region
      ]
    }
  }
}

Create Karpenter Helm Installation

These are the instructions to use Helm to install Karpenter.
Create file: environments/dev/karpenter.tf

# =============================================================================
# KARPENTER INSTALLATION
# =============================================================================
# Installs Karpenter using Helm chart via Terraform.
# =============================================================================

# -----------------------------------------------------------------------------
# Karpenter Namespace
# -----------------------------------------------------------------------------
resource "kubernetes_namespace" "karpenter" {
  metadata {
    name = "karpenter"

    labels = {
      name = "karpenter"
    }
  }

  depends_on = [module.eks]
}

# -----------------------------------------------------------------------------
# Karpenter Helm Release
# -----------------------------------------------------------------------------
resource "helm_release" "karpenter" {
  namespace  = kubernetes_namespace.karpenter.metadata[0].name
  name       = "karpenter"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "1.6.0"  # Supports K8s 1.34 - can use 1.8.x

  # Wait for CRDs to be ready
  wait    = true
  timeout = 600  # 10 minutes (I added this later because of a timeout)

  # Force upgrade if stuck (I added this later because of timeout)
  force_update  = true
  recreate_pods = true

  values = [
    yamlencode({
      settings = {
        clusterName       = module.eks.cluster_name
        clusterEndpoint   = module.eks.cluster_endpoint
        interruptionQueue = aws_sqs_queue.karpenter_interruption.name
      }

      serviceAccount = {
        annotations = {
          "eks.amazonaws.com/role-arn" = aws_iam_role.karpenter_controller.arn
        }
      }

      controller = {
        resources = {
          requests = {
            cpu    = "200m"
            memory = "256Mi"
          }
          limits = {
            cpu    = "1"
            memory = "1Gi"
          }
        }
      }

      # Enable logging
      logLevel = "info"
    })
  ]

  depends_on = [
    kubernetes_namespace.karpenter,
    aws_iam_role_policy_attachment.karpenter_controller,
    module.eks
  ]
}

# -----------------------------------------------------------------------------
# Outputs
# -----------------------------------------------------------------------------
output "karpenter_namespace" {
  description = "Karpenter namespace"
  value       = kubernetes_namespace.karpenter.metadata[0].name
}

output "karpenter_chart_version" {
  description = "Karpenter Helm chart version"
  value       = helm_release.karpenter.version
}

What this means:

Creates a dedicated namespace. A karpenter namespace is created in your EKS cluster to isolate Karpenter’s components from other workloads.
CRD = Custom Resource Definition. Once you install a CRD, Kubernetes recognizes a new kind of object. Karpenter is built entirely around CRDs.
Installs Karpenter via Helm. Pulls the official Karpenter chart (version 1.6.0 in the code above) from AWS’s public ECR registry (oci://public.ecr.aws/karpenter) and deploys it to your cluster.
Configures cluster connection. Passes your EKS cluster name, API endpoint, and SQS interruption queue so Karpenter knows which cluster to manage and can handle Spot instance interruptions.
Sets up IAM authentication. Annotates the Karpenter service account with the IAM role ARN, enabling IRSA (IAM Roles for Service Accounts) so Karpenter can provision EC2 instances.
Defines resource limits. Allocates 200m CPU / 256Mi memory (requests) up to 1 CPU / 1Gi memory (limits) for the Karpenter controller pod to ensure stable operation.

Apply Karpenter installation

cd environments/dev
terraform init  # Needed for new Helm provider <------- ATTENTION
terraform plan
terraform apply

# Verify Karpenter is running
kubectl get pods -n karpenter
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=20

# see leader node
kubectl get lease -n karpenter

Results:

Leader pod: Actively provisions/terminates nodes.
Standby pod: Ready to take over immediately if the leader fails.

Karpenter keeps working even if one pod crashes or gets evicted during node scaling. Only one pod is actively making decisions at any time (leader election), but having two means little to no downtime for the autoscaler itself.

Karpenter runs as a Deployment and uses the standard Kubernetes built-in leader election mechanism based on Leases (from the coordination.k8s.io API group). When the leader fails, one of the standby pods detects that the lease has expired and acquires it, usually within seconds.
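
To see which pod currently holds the lease, a small sketch like the one below can help. The function name show_karpenter_leader is hypothetical. It relies on the holderIdentity field that the Lease resource exposes in its spec.

```shell
# Hypothetical helper: show each Lease in the karpenter namespace
# together with the identity of its current holder.
show_karpenter_leader() {
  kubectl get lease -n karpenter \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity'
}
```

If you delete the pod named in the HOLDER column and run this again a little later, you should see the holder change to the other pod.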

Logs results:

Note: the logs show a couple of minor errors. These are normal startup messages rather than real errors, and I believe they resolve automatically. The rest of the install looks normal.

5. Create NodePool & EC2NodeClass

Now we need to define some of the basic EC2 instance settings.

Create Karpenter resources

Create file: k8s/karpenter-nodepool.yaml

# =============================================================================
# KARPENTER NODEPOOL
# =============================================================================
# Defines what instances Karpenter can provision and constraints.
#
# ECS Comparison:
# - NodePool is similar to ASG Launch Template + Scaling Policy
# - But more flexible: Karpenter chooses optimal instance per workload
# =============================================================================
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  # ---------------------------------------------------------------------------
  # Template: What nodes look like
  # ---------------------------------------------------------------------------
  template:
    metadata:
      labels:
        managed-by: karpenter
        environment: dev
    spec:
      # Reference to EC2NodeClass (AWS-specific settings)
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # Instance requirements
      requirements:
        # Instance category (general purpose is cost-effective)
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["t", "m", "c"]  # t, m, and c categories (e.g. t3, m5, c5)

        # Instance size (small to large for flexibility)
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["small", "medium", "large"]

        # Capacity type: Prefer Spot, fallback to On-Demand
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        # Architecture
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

        # Operating system
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

  # ---------------------------------------------------------------------------
  # Disruption: When Karpenter can remove/replace nodes
  # ---------------------------------------------------------------------------
  disruption:
    # Consolidation policy: Remove underutilized nodes
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

    # Budget: How many nodes can be disrupted at once
    budgets:
      - nodes: "20%"

  # ---------------------------------------------------------------------------
  # Limits: Maximum resources Karpenter can provision
  # ---------------------------------------------------------------------------
  limits:
    cpu: "100"        # Max 100 vCPUs total
    memory: "200Gi"   # Max 200 GB memory total

  # ---------------------------------------------------------------------------
  # Weight: Priority when multiple NodePools exist
  # ---------------------------------------------------------------------------
  weight: 100

---
# =============================================================================
# EC2NODECLASS
# =============================================================================
# AWS-specific configuration: AMI, subnets, security groups, etc.
# =============================================================================
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # ---------------------------------------------------------------------------
  # AMI Configuration
  # ---------------------------------------------------------------------------
  # Use EKS-optimized AMI (Amazon Linux 2023)
  amiSelectorTerms:
    - alias: al2023@latest

  # ---------------------------------------------------------------------------
  # Network Configuration
  # ---------------------------------------------------------------------------
  # Discover subnets by tag (set in Terraform)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: eks-video-cluster

  # Discover security groups by tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: eks-video-cluster

  # ---------------------------------------------------------------------------
  # IAM Configuration
  # ---------------------------------------------------------------------------
  # Instance profile for nodes (created in Terraform)
  instanceProfile: eks-video-cluster-karpenter-node

  # ---------------------------------------------------------------------------
  # Storage Configuration
  # ---------------------------------------------------------------------------
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 30Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        deleteOnTermination: true
        encrypted: true

  # ---------------------------------------------------------------------------
  # Metadata Options
  # ---------------------------------------------------------------------------
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required  # IMDSv2 required (security best practice)

  # ---------------------------------------------------------------------------
  # Tags for nodes Karpenter creates
  # ---------------------------------------------------------------------------
  tags:
    Project: eks-video-tutorial
    Environment: dev
    ManagedBy: karpenter

Make sure you understand what this code does!

What this NodePool configuration does

Defines allowed instance types. Karpenter can provision instances from the t, m, and c categories (for example t3, m5, c5) in small/medium/large sizes, preferring Spot instances (often around 70% cheaper) with On-Demand fallback for reliability.
Sets resource limits. Caps total provisioned capacity at 100 vCPUs and 200Gi memory to prevent runaway costs from autoscaling.
Enables automatic consolidation. Removes underutilized or empty nodes after 30 seconds, bin-packing workloads onto fewer nodes to save money.
Configures AWS-specific settings (EC2NodeClass). Uses EKS-optimized AL2023 AMI, discovers subnets/security groups by tag, provisions 30Gi gp3 encrypted volumes.
Enforces security. Requires IMDSv2 (httpTokens: required) to protect against SSRF attacks on instance metadata.
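
Later, once Karpenter has launched nodes, you can see what this configuration actually chose. The sketch below uses a hypothetical helper name, show_karpenter_nodes. It selects nodes by the managed-by=karpenter label set in the NodePool template and adds columns for capacity type, instance type, and zone. Until a workload triggers scaling, it will simply report that no resources were found.

```shell
# Hypothetical helper: list nodes created from the NodePool above,
# with extra columns showing what Karpenter picked for each one.
show_karpenter_nodes() {
  kubectl get nodes -l managed-by=karpenter \
    -L karpenter.sh/capacity-type \
    -L node.kubernetes.io/instance-type \
    -L topology.kubernetes.io/zone
}
```

The capacity-type column is the quickest way to confirm whether a node came up as Spot or On-Demand.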

Tag Security Groups for Karpenter

Add to environments/dev/main.tf in the EKS module:

# Add tags to node security group
node_security_group_tags = {
  "karpenter.sh/discovery" = var.cluster_name
}

Apply changes

# Apply Terraform changes (security group tags)
cd environments/dev
terraform apply

# Apply Karpenter NodePool and EC2NodeClass
kubectl apply -f ../../k8s/karpenter-nodepool.yaml

# Verify resources created
kubectl get nodepool
kubectl get ec2nodeclass

Results:

nodepool.karpenter.sh/default created
ec2nodeclass.karpenter.k8s.aws/default created

and then… whoops… I see an issue/error:

$ kubectl get nodepool
NAME      NODECLASS   NODES   READY   AGE
default   default     0       False   30s

$ kubectl get ec2nodeclass
NAME      READY   AGE
default   False   41s

I am leaving this in so you can see what happens when there is an error. The READY column says False for both resources, and I saw some errors in the logs.

You can also see the errors in the logs with:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50 | grep -i "error\|nodeclass"
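
When something shows READY False, it also helps to look at the resource itself and not only the controller logs. Below is a minimal sketch with a hypothetical helper, not_ready, that reads kubectl's table output and prints the names of rows that are not ready. You can then run kubectl describe on those names to read their status conditions and events.

```shell
# Hypothetical helper: reads `kubectl get ...` table output on stdin and
# prints the NAME of every row whose READY column is not "True".
not_ready() {
  awk '
    # Header row: remember which column is READY.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
    # Data rows: print the name when READY is anything other than True.
    col && $col != "True" { print $1 }
  '
}

# Usage:
#   kubectl get nodepool | not_ready
#   kubectl get ec2nodeclass | not_ready
# Then inspect anything it prints, for example:
#   kubectl describe ec2nodeclass default
#   kubectl describe nodepool default
```

In the describe output, the Conditions section is usually the fastest way to see which part (for example subnets or security groups) could not be resolved.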

To fix this, I changed the Karpenter chart version in karpenter.tf (the file shown earlier already includes this change):

version    = "1.6.0"  # Supports K8s 1.34

Also in main.tf added at the end of the file:

# ---------------------------------------------------------------------------
# Karpenter Discovery Tag for Security Groups
# ---------------------------------------------------------------------------
node_security_group_tags = {
  "karpenter.sh/discovery" = var.cluster_name
}

Re-run commands:

# Apply Terraform changes (adds security group tag + upgrades Karpenter)
terraform plan
terraform apply

# Re-apply the NodePool (CRDs might have changed with new version)
kubectl delete -f ../../k8s/karpenter-nodepool.yaml
kubectl apply -f ../../k8s/karpenter-nodepool.yaml

# Check status
kubectl get ec2nodeclass
kubectl get nodepool

Results:

$ kubectl get nodepool
NAME      NODECLASS   NODES   READY   AGE
default   default     0       True    67m

$ kubectl get ec2nodeclass
NAME      READY   AGE
default   True    67m

I also got some more errors the first time so had to re-apply using the below… this is good to hold onto if you need to reset it.

⚠️Re-apply Karpenter install if any errors

If you need to re-apply at any time, then you can reset the Karpenter install with:

  
  # if you have helm installed locally such as with:  brew install helm
  helm uninstall karpenter -n karpenter --wait

  # Delete the Karpenter deployment directly
  kubectl delete deployment -n karpenter --all

  # Delete the namespace (this removes everything in it)
  kubectl delete namespace karpenter

  # Remove the Helm release from Terraform state so it can recreate
  cd environments/dev
  terraform state rm helm_release.karpenter
  terraform state rm kubernetes_namespace.karpenter

  # Re-apply
  terraform apply
  
  # Re-apply kubectl
  kubectl apply -f ../../k8s/
  kubectl apply -f ../../k8s/ # second time fixed the namespace

  kubectl get ec2nodeclass
  kubectl get nodepool

Fixed! Just make sure it says “True” under ready

$ kubectl get ec2nodeclass

NAME      READY   AGE
default   True    113s

$ kubectl get nodepool

NAME      NODECLASS   NODES   READY   AGE
default   default     0       True    2m3s

6. Test Node Provisioning

In this section, we’ll migrate workloads from the managed node group to Karpenter-provisioned nodes. This demonstrates Karpenter’s ability to automatically provision nodes when pods are pending.

⚠️This section can be a little tricky. I had to redo it twice, because I ran into some errors such as with the service role below, but I believe this should work fine. Stick with it. You are learning.

Karpenter must stay running during migration. It needs to be active to detect pending pods and provision replacement nodes.

6.1 AWS Spot role

Before Karpenter can launch Spot instances, AWS requires a service-linked role. This is a one-time setup per AWS account:

# Create the Spot service-linked role
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

# If you see "Role already exists", that's fine - you're good to proceed

AWS uses this role to manage Spot instance lifecycle events.

Without it, Karpenter’s Spot instance requests will fail with: “AuthFailure.ServiceLinkedRoleCreationNotPermitted”.

6.2 Verify Karpenter is Running

# Check Karpenter pods (should show 2 replicas for HA)
kubectl get pods -n karpenter

# Expected output:
NAME                         READY   STATUS    RESTARTS   AGE
karpenter-859bfc7db7-h4wf6   1/1     Running   0          38m
karpenter-859bfc7db7-zr6d2   1/1     Running   0          62m

# Verify NodePool and EC2NodeClass are ready
kubectl get nodepool
kubectl get ec2nodeclass

# Expected output:
NAME      NODECLASS   NODES   READY   AGE
default   default     0       True    10m

NAME      READY   AGE
default   True    10m

6.3 Remove PodDisruptionBudgets (PDBs)

PDBs protect pods from eviction but will block our drain operation. Delete them temporarily — they’ll be recreated automatically:

# Check existing PDBs
kubectl get pdb -A

# Delete Karpenter PDB
kubectl delete pdb -n karpenter --all

# Delete CoreDNS PDB
kubectl delete pdb -n kube-system coredns

⚠️ If it says no resources found, that is fine. We only need to delete the PDBs if they exist, so just continue.

Why we do this: PDBs enforce minimum availability during voluntary disruptions. During migration, they can create a deadlock where pods can’t be evicted because there’s nowhere for them to go, but Karpenter won’t provision nodes because there are no pending pods.

6.4 Phased Node Migration

6.4.1. Get node names

kubectl get nodes -l eks.amazonaws.com/nodegroup=primary

# Example output:
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-239.ec2.internal   Ready    <none>   81m   v1.34.2-eks-ecaa3a6
ip-10-0-2-106.ec2.internal   Ready    <none>   81m   v1.34.2-eks-ecaa3a6
ip-10-0-3-62.ec2.internal    Ready    <none>   81m   v1.34.2-eks-ecaa3a6

6.4.2. Cordon 2 of 3 nodes (keep one for Karpenter)

Cordoning marks nodes as unschedulable — no new pods will be placed on those nodes.

⚠️ Node names look like ip-10-0-1-239.ec2.internal. Yours will be different; these are from my output above.

# Keep the FIRST node available for Karpenter
# Cordon the other two nodes
kubectl cordon ip-10-0-2-106.ec2.internal
kubectl cordon ip-10-0-3-62.ec2.internal

# Verify status
kubectl get nodes

# Expected output - two nodes show SchedulingDisabled:
NAME                         STATUS                     ROLES    AGE   VERSION
ip-10-0-1-239.ec2.internal   Ready                      <none>   45m   v1.34.2-eks-ecaa3a6
ip-10-0-2-106.ec2.internal   Ready,SchedulingDisabled   <none>   45m   v1.34.2-eks-ecaa3a6
ip-10-0-3-62.ec2.internal    Ready,SchedulingDisabled   <none>   45m   v1.34.2-eks-ecaa3a6

6.4.3. Drain the cordoned nodes

Draining evicts all pods from the nodes.

Karpenter will detect pending pods and provision new nodes.

# Drain the two cordoned nodes
kubectl drain ip-10-0-2-106.ec2.internal --ignore-daemonsets --delete-emptydir-data
kubectl drain ip-10-0-3-62.ec2.internal --ignore-daemonsets --delete-emptydir-data

# Expected output:
node/ip-10-0-2-106.ec2.internal already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-2ttcv, kube-system/eks-pod-identity-agent-9mkmt, kube-system/kube-proxy-tcfd8
evicting pod karpenter/karpenter-859bfc7db7-h4wf6
pod/karpenter-859bfc7db7-h4wf6 evicted
node/ip-10-0-2-106.ec2.internal drained
node/ip-10-0-3-62.ec2.internal already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-gtzxk, kube-system/eks-pod-identity-agent-qdsqp, kube-system/kube-proxy-crhmx
node/ip-10-0-3-62.ec2.internal drained

6.5 Deployment to Trigger Karpenter

With only one node available and multiple pods, we need to create demand that exceeds capacity:

# Scale video-app to create pending pods (increase to 20 if no pods go Pending)
kubectl scale deployment video-app -n video-app --replicas=10

# Watch for new nodes (Ctrl+C to exit)
kubectl get nodes -w

# You should see some pending pods with this, it's scaling up
kubectl get pods -n video-app

NAME                         READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-29rgx   0/1     Pending   0          3m26s
video-app-6498b5dd57-7778w   1/1     Running   0          50m
video-app-6498b5dd57-b9t6h   1/1     Running   0          75m
video-app-6498b5dd57-jzsbq   1/1     Running   0          3m26s
video-app-6498b5dd57-n87vs   1/1     Running   0          75m
video-app-6498b5dd57-nsq57   0/1     Pending   0          3m25s
video-app-6498b5dd57-ppdbk   0/1     Pending   0          3m25s
video-app-6498b5dd57-xbzj5   1/1     Running   0          3m26s

You can also check in the logs.

$ kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=20

# You will see some entries like 
{
  "level": "INFO",
  "time": "2026-01-02T22:33:03.370Z",
  "logger": "controller",
  "message": "computed new nodeclaim(s) to fit pod(s)",
  "commit": "c8c45c1",
  "controller": "provisioner",
  "namespace": "",
  "name": "",
  "reconcileID": "eb5c68a5-3da2-4dd4-bf22-d7b1067ac4f8",
  "nodeclaims": 2,
  "pods": 3
}

Behind the scenes:

Timeline:
0s   - Scale to 10 replicas requested
5s   - Some pods are Pending (insufficient capacity on single node)
10s  - Karpenter detects Pending pods
15s  - Karpenter calculates optimal instance type
20s  - Karpenter calls EC2 CreateFleet API
45s  - New EC2 instance launches and joins cluster
50s  - Pending pods scheduled on new node
60s  - All pods Running!

6.6 Verify Karpenter Provisioned Nodes


  # List nodes managed by Karpenter (should include new ones)
  kubectl get nodes -l karpenter.sh/nodepool=default

  # Expected output - new nodes with private DNS names:
  # NAME                         STATUS   ROLES    AGE   VERSION
  # ip-10-0-2-221.ec2.internal   Ready    <none>   60s   v1.34.2-eks-ecaa3a6
  # ip-10-0-3-185.ec2.internal   Ready    <none>   60s   v1.34.2-eks-ecaa3a6

  # Check capacity type (Spot vs On-Demand) - look at CAPACITY-TYPE column
  kubectl get nodes -L karpenter.sh/capacity-type

  # Expected output - Karpenter nodes show "spot":
  # NAME                         STATUS                     ROLES    AGE    CAPACITY-TYPE
  # ip-10-0-1-239.ec2.internal   Ready                      <none>   129m
  # ip-10-0-2-106.ec2.internal   Ready,SchedulingDisabled   <none>   129m
  # ip-10-0-2-221.ec2.internal   Ready                      <none>   73s    spot
  # ip-10-0-3-185.ec2.internal   Ready                      <none>   73s    spot
  # ip-10-0-3-62.ec2.internal    Ready,SchedulingDisabled   <none>   129m

  # Verify all video-app pods are running
  kubectl get pods -n video-app

  # Expected output - all pods Running:
  # NAME                         READY   STATUS    RESTARTS   AGE
  # video-app-6498b5dd57-xxxxx   1/1     Running   0          5m
  # video-app-6498b5dd57-xxxxx   1/1     Running   0          5m
  # ... (8-10 pods total)

  # Troubleshooting: Nodeclaims Stuck in Unknown
  # If you see nodeclaims stuck with READY: Unknown:
  # this can sometimes happen due to various errors

  # Check nodeclaims status
  kubectl get nodeclaims

  # If stuck in Unknown, delete and let Karpenter retry
  kubectl delete nodeclaims --all --force --grace-period=0

  # Verify deleted
  kubectl get nodeclaims

  # Watch for new nodes (should appear within 60 seconds)
  kubectl get nodes -w

🚀 Hopefully you got all that!

If you run into any issues, stick with it. I ran through this tutorial 3 times while writing it and hit various errors, mostly related to IAM (permissions preventing EC2 instances from joining the cluster). I have updated the code above to resolve that, namely the Terraform IAM for Karpenter.

But the point is, there are a lot of details here, and just be diligent and consider some errors as part of the learning process.

But this is a more advanced tutorial, so keep going!

🔥 It is quite an accomplishment to even get this far!

7. Combined Load Test (HPA + Karpenter)

The goal of this section

Load -> HPA scales pods -> Pods Pending ->
Karpenter provisions nodes -> Pods Running

Open Multiple Terminals

Terminal 1 — HPA Watch:

kubectl get hpa -n video-app --watch

Terminal 2 — Pod Watch:

kubectl get pods -n video-app --watch

Terminal 3 — Node Watch:

kubectl get nodes --watch

Terminal 4 — Karpenter Logs:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

Terminal 5 — Generate Load:

# Get Load Balancer URL
LB_URL=$(kubectl get service video-app -n video-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Load Balancer URL: $LB_URL"

# Verify URL works
curl -s http://$LB_URL/health

# output

{
  "status": "healthy",
  "timestamp": "2026-01-03T14:57:37.672Z",
  "hostname": "video-app-6498b5dd57-dgzmk",
  "uptime": 295.935155309
}

Generate Load with k6

In Terminal 5, run the load test with k6.

First run this in your terminal, this creates a script for us to run:

cat > /tmp/loadtest.js << 'SCRIPT'
import http from 'k6/http';
import { check, sleep } from 'k6';

export default function () {
  const res = http.get(__ENV.TARGET_URL);
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(0.1);
}
SCRIPT

Then run this command (for more load try 400 vus):

k6 run --vus 200 --duration 5m -e TARGET_URL=http://$LB_URL/api/info /tmp/loadtest.js

Expected Scaling Sequence

Watch the terminals as the load test runs:

Time    HPA Replicas    Nodes    Events
─────────────────────────────────────────────────────
0:00    3               2        Baseline (Karpenter nodes from Section 6)
0:30    3 → 6           2        HPA detects CPU > 70%, scales up
1:00    6 → 9           2        HPA continues scaling
1:15    9 → 12          2        Pods Pending (insufficient capacity)
1:20    12              2        Karpenter detects Pending pods
1:45    12              3        New Spot node joins cluster (~30 sec)
2:00    12              3        All pods Running
2:30    12 → 15         3        HPA continues if load persists
3:00    15              4        Another node added if needed
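
The replica jumps in a sequence like this follow the scaling rule described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the HPA's min and max. Below is a minimal Python sketch of that arithmetic. The function name and the sample CPU readings are assumptions for illustration, the min/max defaults of 3 and 10 mirror the sample HPA output in this article, and the sketch ignores the HPA's tolerance band and stabilization windows.

```python
import math


def desired_replicas(current: int, current_cpu_pct: float, target_cpu_pct: float,
                     min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Approximate the HPA rule: ceil(current * currentMetric / targetMetric)."""
    raw = math.ceil(current * current_cpu_pct / target_cpu_pct)
    # Clamp to the configured bounds, as the HPA does.
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(3, 128, 70))   # 3 * 128 / 70 = 5.49 -> 6
print(desired_replicas(10, 300, 70))  # raw value 43, capped at max_replicas -> 10
print(desired_replicas(10, 3, 70))    # raw value 1, raised to min_replicas -> 3
```

This is why a sustained reading well above the 70% target roughly doubles the pod count in one step, and why the count stops growing once it hits the configured maximum.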

Verify Mixed Capacity Types

After scaling, check the node distribution:

# See Spot vs On-Demand and instance types
kubectl get nodes -L karpenter.sh/capacity-type -L node.kubernetes.io/instance-type

# Example output:
# NAME                         STATUS   CAPACITY-TYPE   INSTANCE-TYPE
# ip-10-0-1-50.ec2.internal    Ready    spot            t3a.small
# ip-10-0-2-75.ec2.internal    Ready    spot            t2.small
# ip-10-0-3-100.ec2.internal   Ready    spot            t3.medium

Count nodes

# Count nodes by capacity type
kubectl get nodes -l karpenter.sh/capacity-type=spot --no-headers | wc -l
kubectl get nodes -l karpenter.sh/capacity-type=on-demand --no-headers | wc -l

Verify Pod Distribution

Check that pods are spread across nodes:

# See which node each pod is running on
kubectl get pods -n video-app -o wide

# Count pods per node
kubectl get pods -n video-app -o wide --no-headers | awk '{print $7}' | sort | uniq -c

Stop the Load Test

Press Ctrl+C in Terminal 5 to stop the load test or wait for it to end.

Watch Scale-Down and Consolidation

After stopping the load, watch the automatic scale-down:

# Watch HPA scale down
kubectl get hpa -n video-app --watch

# Watch nodes consolidate (Karpenter removes underutilized nodes)
kubectl get nodes --watch

# Watch Karpenter logs for consolidation
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -i consolidat

Expected scale-down sequence:

Time    Pods    Nodes    Events
─────────────────────────────────────────────────────
0:00    15      4        Load test ended
0:30    15      4        CPU drops below 70%
1:00    12      4        HPA scales down
2:00    9       4        HPA continues scaling down
3:00    6       4        Karpenter detects underutilization
3:30    6       3        Node cordoned, pods moved
4:00    3       3        HPA reaches minReplicas
5:00    3       2        Karpenter consolidates to fewer nodes

Final Verification

# Check final state
kubectl get nodes -L karpenter.sh/capacity-type
kubectl get pods -n video-app
kubectl get hpa -n video-app

# Check Karpenter-managed nodes
kubectl get nodes -l karpenter.sh/nodepool=default

Example results after several minutes, running 1000 vus (virtual users):

$ kubectl get hpa -n video-app --watch

video-app-hpa   Deployment/video-app   cpu: 73%/70%   3         10        10         36m
video-app-hpa   Deployment/video-app   cpu: 74%/70%   3         10        10         36m
video-app-hpa   Deployment/video-app   cpu: 73%/70%   3         10        10         37m
video-app-hpa   Deployment/video-app   cpu: 74%/70%   3         10        10         37m
video-app-hpa   Deployment/video-app   cpu: 50%/70%   3         10        10         37m
video-app-hpa   Deployment/video-app   cpu: 21%/70%   3         10        10         37m
video-app-hpa   Deployment/video-app   cpu: 64%/70%   3         10        10         38m
video-app-hpa   Deployment/video-app   cpu: 128%/70%   3         10        10         38m
video-app-hpa   Deployment/video-app   cpu: 130%/70%   3         10        10         38m
video-app-hpa   Deployment/video-app   cpu: 126%/70%   3         10        10         38m
video-app-hpa   Deployment/video-app   cpu: 46%/70%    3         10        10         39m
video-app-hpa   Deployment/video-app   cpu: 57%/70%    3         10        10         39m
video-app-hpa   Deployment/video-app   cpu: 182%/70%   3         10        10         39m
video-app-hpa   Deployment/video-app   cpu: 293%/70%   3         10        10         39m
video-app-hpa   Deployment/video-app   cpu: 303%/70%   3         10        10         40m
video-app-hpa   Deployment/video-app   cpu: 298%/70%   3         10        10         40m
video-app-hpa   Deployment/video-app   cpu: 304%/70%   3         10        10         40m
video-app-hpa   Deployment/video-app   cpu: 300%/70%   3         10        10         40m
video-app-hpa   Deployment/video-app   cpu: 304%/70%   3         10        10         41m
video-app-hpa   Deployment/video-app   cpu: 114%/70%   3         10        10         41m
video-app-hpa   Deployment/video-app   cpu: 3%/70%     3         10        10         41m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        10         41m

Successful Test!

The complete autoscaling stack has been observed.

Scale-Up:

Load test generated traffic
HPA detected high CPU usage (>70%)
HPA scaled pods: 3 → 6 → 9 → 12+
Some pods went Pending (no capacity)
Karpenter detected Pending pods
Karpenter provisioned Spot instances (~30 seconds)
New nodes joined the cluster
Pods scheduled on new nodes
All requests served successfully!

Scale-Down:

Load test ended
HPA detected low CPU usage
HPA scaled down pods
Karpenter detected underutilized nodes
Karpenter consolidated workloads
Empty nodes terminated
Cost savings achieved!

8. Node Consolidation, Configure Spot instances (extra advanced notes about this)

Understanding Spot Instances

Spot instances are spare AWS capacity, typically offered at a 60–90% discount. The trade-off: AWS can reclaim them with a 2-minute warning. See more info at https://aws.amazon.com/ec2/spot/instance-advisor/
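
To get a feel for what that discount range means on a monthly bill, here is a small Python sketch of the arithmetic. The hourly price (0.04) and node count are hypothetical placeholders, not real AWS prices, and 730 is just an approximate number of hours in a month.

```python
def spot_rate(on_demand_hourly: float, discount_pct: float) -> float:
    """Hourly Spot price implied by a percentage discount off On-Demand."""
    return on_demand_hourly * (1 - discount_pct / 100)


def monthly_cost(hourly_rate: float, nodes: int, hours: float = 730) -> float:
    """Cost of running `nodes` instances for roughly one month."""
    return hourly_rate * nodes * hours


ON_DEMAND_HOURLY = 0.04  # hypothetical price, for illustration only
NODES = 3

print(f"on-demand: {monthly_cost(ON_DEMAND_HOURLY, NODES):.2f}")
for discount in (60, 70, 90):
    cost = monthly_cost(spot_rate(ON_DEMAND_HOURLY, discount), NODES)
    print(f"spot at {discount}% off: {cost:.2f}")
```

Plug in the current prices for your instance types and region to see the real difference.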

Spot Interruption Handling

Karpenter handles Spot interruptions automatically via the SQS queue we configured in the Terraform:

Spot Interruption Flow:
1. AWS sends 2-minute warning to SQS queue
2. Karpenter receives the message
3. Karpenter cordons the node (no new pods)
4. Karpenter provisions replacement node
5. Pods are rescheduled to new node
6. Interrupted node terminates

Result: Near-zero downtime despite Spot reclamation!

View interruption events:

# Watch for interruption handling in logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -i interrupt

# Check SQS queue for interruption messages
aws sqs get-queue-attributes \
  --queue-url $(aws sqs get-queue-url --queue-name eks-video-cluster-karpenter-interruption --query 'QueueUrl' --output text) \
  --attribute-names ApproximateNumberOfMessages

Create Spot-Only NodePool (Optional)

For workloads that can tolerate interruptions, create a dedicated Spot NodePool:

# k8s/karpenter-nodepool-spot.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-only
spec:
  template:
    metadata:
      labels:
        capacity-type: spot
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["t", "m", "c"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["small", "medium", "large"]
        # Force Spot only
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: "50"
    memory: "100Gi"
  weight: 50  # Karpenter prefers higher weights; keep this below the default pool's weight so the default pool wins

Target specific workloads to Spot nodes:

# In your deployment spec
spec:
  template:
    spec:
      nodeSelector:
        capacity-type: spot  # Matches the label in Spot-only NodePool

Trigger Node Consolidation:

# Stop the load test (Ctrl+C in Terminal 5)

# Scale down deployment
kubectl scale deployment video-app -n video-app --replicas=3

# Watch nodes get consolidated
kubectl get nodes --watch

# Watch Karpenter logs for consolidation events
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -i consolidat

Karpenter automatically:

Detects underutilized nodes.
Moves pods to other nodes.
Terminates empty nodes.
Saves costs!

Consolidation Settings

disruption:
  # When to consolidate
  consolidationPolicy: WhenEmptyOrUnderutilized

  # How long to wait before consolidating
  consolidateAfter: 30s

  # Budget: Limit simultaneous disruptions
  budgets:
    - nodes: "20%"  # Max 20% of nodes disrupted at once
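
To see what a percentage budget means in practice, here is a minimal Python sketch. It assumes the formula given in the Karpenter disruption docs, allowed = roundup(total × percentage) minus nodes already deleting or NotReady; please verify that against the docs for your Karpenter version. The function and parameter names are mine, for illustration only.

```python
import math


def allowed_disruptions(total_nodes: int, budget_pct: float,
                        deleting: int = 0, not_ready: int = 0) -> int:
    """Nodes Karpenter may disrupt at once under a percentage budget (assumed formula)."""
    allowed = math.ceil(total_nodes * budget_pct / 100) - deleting - not_ready
    return max(0, allowed)  # never report a negative allowance


print(allowed_disruptions(4, 20))               # ceil(0.8) = 1
print(allowed_disruptions(10, 20, deleting=1))  # 2 - 1 = 1
```

With a small cluster like the one in this tutorial, a 20% budget effectively means one node at a time.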

9. Cleanup

Remove Karpenter Resources

# Delete NodePool and EC2NodeClass
kubectl delete -f k8s/karpenter-nodepool.yaml

# Wait for Karpenter nodes to be terminated
kubectl get nodes --watch

Destroy All Infrastructure

cd environments/dev

terraform destroy

I recommend running this TWICE, then checking the AWS console, especially the VPC area for internet gateways and the EC2 area for load balancers.

Do not assume all were destroyed automatically.

Check AWS Console (especially these):

EKS cluster deleted
EC2 instances terminated (including Karpenter nodes)
Load balancers removed
IAM roles deleted
SQS queue deleted

AWESOME!

After Parts 4 and 5, we have:

Pods scaling automatically (HPA).
Nodes scaling automatically (Karpenter).
Spot instances for cost savings.
Helm for installs.
Karpenter for Spot instances, cost savings.

By now, we are getting pretty experienced with this.

Is it starting to make sense? I hope so.

Coming up!

I have not finalized the next article, and I will let some days go by for people to catch up… but the below is vaguely what I am thinking about.

These topics may be included in the next article, Part 6:

Kubernetes Dashboard for cluster overview
Prometheus installation for metrics collection
Grafana dashboards for visualization
Kubecost (or other) for detailed cost analysis
CloudWatch Container Insights integration
Custom dashboards for Karpenter metrics
HPA scaling history visualization
Cost comparison dashboards (Spot vs On-Demand)
Alert configuration for anomalies

🛠️ Get more articles and tips like this at https://www.systemsarchitect.io and follow new articles in the series here.

🚀Also follow the SystemsArchitect X account: https://x.com/systemsarch — we follow back!

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Savings: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)

Chris St. John — Fri, 26 Dec 2025 15:55:29 GMT

Creating OpenAI chat messages in code, handling streaming vs non-streaming responses, temperature/top_p control and other factors and variables. This is a review of essentials for the new series.

It’s been a while since I added new articles to AI Dev Tips, but this seems like a great time for a review article and new AI agent series.

With new tech coming out daily it’s good to set a baseline with this “getting started” #1 in the series.

Getting Started: AI Agent Coding Series #1

First I want to make sure we’re on the same page about typical AI Chat coding and standards, as well as terms and new SDKs being introduced (I will cover more on that in later articles).

We will be using OpenRouter and OpenAI SDK/APIs initially for this series but may branch into some others as well. OpenAI’s API format is used by many other LLMs (and OpenRouter) as a standardized API format, so it is good to learn that. OpenRouter allows us to use an interface for many different AI LLMs.

In this first article of the OpenAI Agent Coding Series, we’ll cover the fundamentals and essentials of working with the OpenAI Chat Completions API.

I think getting these chat completion fundamentals down is really important. It’s accessible to most people at very little or no cost. And of course, this is just a start and a foundation to build on.

In later articles, we’ll continue with more advanced OpenAI coding, potentially with other SDKs such as the OpenAI Agents SDK.

Some upfront notes about this series:

I am going to be using OpenRouter for this project. If you want to slightly modify the code, you can use OpenAI directly. Some features may come up later that are not included in OpenRouter’s OpenAI interface; if so, we’ll discuss options.
We’re going to use TypeScript and Python.
Why both? For one, in fullstack application coding, I am mainly a TypeScript developer with some Python when necessary. Our focus though is pragmatic, so we will use each when it makes sense!
A lot of frontend and backend code for many types of apps, especially UIs, MVPs and prototypes, is written in JavaScript/TypeScript with React, Node.js and related tooling. For example, Next.js deployed on Vercel is a popular option.
However, no doubt, AI agents, and many AI tools in general, are often developed in Python — so we need to be familiar with Python too.

We’ll be covering AI Chat coding basics for TypeScript and Python:

Setup and Introduction
Message Roles and Structure + Example
Non-Streaming Responses
Streaming Responses
Temperature and Top_p Control
Other Important Parameters
Error Handling Best Practices

1. Setup and Introduction

OpenRouter Setup

OpenRouter provides a unified API that’s compatible with the OpenAI SDK, allowing you to access multiple LLM providers through a single interface. This makes it perfect for experimentation and comparing models.

Sign up at openrouter.ai
Generate an API key from your dashboard
Create a .env file in your project root:

OPENROUTER_API_KEY=your-api-key-here

Important: Add .env to your .gitignore to keep your API key secure and out of version control.

TypeScript Setup

# Initialize a new project
npm init -y

# Install dependencies
npm install openai dotenv
npm install -D typescript @types/node ts-node

# Create tsconfig.json
npx tsc --init

The last command creates tsconfig.json — we need to edit that to add the following to the config:

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "outDir": "./dist",
    // ... other pre-existing options should be OK to leave in place.
    // "verbatimModuleSyntax": true,
    // Keep verbatimModuleSyntax disabled (commented out); it enforces strict module syntax.
  }
}

note: Disable verbatimModuleSyntax. Also, I left in some of the existing options and removed some of the boilerplate that was originally created with init command above. Just make sure you have the above in there.

Python Setup

We’re not going to do everything in Python but I’ll create a separate file for streaming so you can use it if you want.

python -m venv venv

Understanding the Chat Completions API

The Chat Completions API is the primary way to interact with GPT models.
At its core, you send a list of messages and receive a generated response.
The API is stateless — it doesn’t remember previous conversations, so you must send the full conversation history with each request.
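
Because the API is stateless, the "memory" of a chat lives in your own code. Here is a minimal Python sketch of that idea, with no network calls: a small helper that keeps the history and returns the full message list to send on each turn. The Conversation class and its method names are hypothetical, made up for this illustration (message roles are covered in section 2).

```python
from typing import Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full message history client-side, since the API is stateless."""

    def __init__(self, system_prompt: str) -> None:
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def add_user(self, text: str) -> List[Message]:
        """Record a user turn and return the complete payload for the next request."""
        self.messages.append({"role": "user", "content": text})
        return list(self.messages)  # copy, so callers cannot mutate the history

    def add_assistant(self, text: str) -> None:
        """Record the model's reply so it is included in later requests."""
        self.messages.append({"role": "assistant", "content": text})


convo = Conversation("You are a helpful coding assistant.")
payload = convo.add_user("What is a closure?")
convo.add_assistant("A closure is a function that captures variables from its scope.")
payload = convo.add_user("Show me an example.")
print(len(payload))  # 4 messages: system, user, assistant, user
```

The second request carries all four messages, which is how the model "remembers" the first question.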

Basic Flow

Below is a basic conceptual flow at the highest level.

Your application makes an API request →
API request (messages + parameters) →
LLM provider (OpenAI, Anthropic, Grok etc.) →
API response (completion + metadata) →
Your application (process/output)

Key Concepts:

Messages: An array of message objects representing the conversation
Model: Which LLM to use (gpt-4, gpt-3.5-turbo, claude-3-opus)
Parameters: Control behavior like creativity, length, and format
Tokens: The basic unit of text processing (~4 characters in English)
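
As a rough illustration of the "~4 characters per token" rule of thumb, here is a tiny Python estimator. This is only a back-of-the-envelope sketch: real tokenizers vary by model and language, and the function name is my own. Use the provider's tokenizer or the usage data in the API response when you need exact counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("a" * 40))  # 40 / 4 = 10
print(estimate_tokens("abcde"))   # 5 / 4 = 1.25 -> 2
```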

More Complex Flow

The basic flow shows only the highest-level steps. The flow below adds other practical steps that I often use in integrations.

Your application makes an API request →
Pre-process and authenticate →
API request (messages + parameters) →
Queue, orchestrate if multiple calls →
LLM provider (OpenAI, Anthropic, Grok etc.) →
Validate, moderation →
Cache and logging →
API response (completion + metadata) →
Post-Process (format output, RAG injection, vector database) →
Your application (process/output)
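
To make a few of those steps concrete, here is a simplified Python sketch covering only pre-processing, validation, caching, the provider call and post-processing. Everything in it is hypothetical: fake_provider stands in for a real LLM call (no network access), and run_pipeline is a name I made up for this illustration. Authentication, queuing, moderation and logging are left out.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def fake_provider(messages: List[Message]) -> str:
    """Stand-in for a real LLM call (hypothetical; no network access)."""
    return "echo: " + messages[-1]["content"]


def run_pipeline(user_text: str,
                 provider: Callable[[List[Message]], str],
                 cache: Dict[str, str]) -> str:
    text = user_text.strip()                 # pre-process
    if not text:
        raise ValueError("empty input")      # validate
    if text in cache:                        # cache hit: skip the provider call
        return cache[text]
    messages = [{"role": "user", "content": text}]
    raw = provider(messages)                 # provider call
    result = raw.strip()                     # post-process
    cache[text] = result                     # store for next time
    return result


cache: Dict[str, str] = {}
print(run_pipeline("  hello  ", fake_provider, cache))  # echo: hello
```

Passing the provider in as a function keeps the pipeline testable without an API key, and the same shape works when you swap in a real client call.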

2. Message Roles and Structure + Example

Every message in the Chat Completions API has a role and content. Understanding roles is fundamental to building effective AI applications.

system: Sets behavior, personality, and instructions. Example: "You are a helpful coding assistant".
user: Represents human input. Example: "How do I sort an array in Python?".
assistant: Represents AI responses, including previous AI responses in the conversation.

import OpenAI from 'openai';

// Define message types for clarity
type ChatMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

const messages: ChatMessage[] = [
  {
    role: 'system',
    content: 'You are a senior software engineer. Provide concise, practical answers with code examples when appropriate.'
  },
  {
    role: 'user',
    content: 'What is the difference between let and const in JavaScript?'
  },
  {
    role: 'assistant',
    content: 'let allows reassignment, const does not. Both are block-scoped.'
  },
  {
    role: 'user',
    content: 'Can you show me an example?'
  }
];

In Python the syntax is slightly different because there is no explicit type annotation:

from openai import OpenAI

messages = [
    {
        "role": "system",
        "content": "You are a senior software engineer. Provide concise, practical answers with code examples when appropriate."
    },
    {
        "role": "user",
        "content": "What is the difference between let and const in JavaScript?"
    },
    {
        "role": "assistant",
        "content": "let allows reassignment, const does not. Both are block-scoped."
    },
    {
        "role": "user",
        "content": "Can you show me an example?"
    }
]

Best Practices for Messages

System messages first: Always place your system message at the beginning
Be specific in system prompts: Vague instructions lead to inconsistent results
Include context: For multi-turn conversations, include relevant history (previous user + assistant messages) to maintain coherence.
Trim when needed: Long conversations can be truncated (oldest first) to stay within token limits
Leverage few-shot examples effectively: Include 2–5 high-quality input/output examples inside the system/developer message (or as alternating user/assistant pairs). Use clear delimiters (XML tags, Markdown sections, or numbered lists) to separate examples.
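
The "trim when needed" tip can be sketched in a few lines. This is a minimal illustration, not an official SDK feature: the helper names estimate_tokens and trim_history are hypothetical, and the token count uses the rough "~4 characters per token" rule of thumb instead of a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep system messages, then drop the oldest other messages until the budget fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Drop from the front (oldest first), always keeping the latest message.
    while len(rest) > 1 and total(system + rest) > max_tokens:
        rest.pop(0)
    return system + rest


history = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "x" * 400},       # stand-in for a long old question
    {"role": "assistant", "content": "y" * 400},  # stand-in for a long old answer
    {"role": "user", "content": "Can you show me an example?"},
]

trimmed = trim_history(history, max_tokens=50)
print([m["role"] for m in trimmed])
```

In a real application you would likely swap the estimate for a tokenizer that matches your model, and possibly summarize dropped turns instead of discarding them.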

3. Non-Streaming Responses

Non-streaming (also called “blocking” or “synchronous”) responses wait until the entire response is generated before returning.

This is simpler to implement but provides a worse user experience for longer responses.

When to Use Non-Streaming

Short, quick responses
Background processing where latency doesn’t matter
When you need the complete response before processing.

import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config({ path: '../../.env' });

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

interface ChatCompletionResult {
  content: string;
  response: OpenAI.Chat.Completions.ChatCompletion;
}

async function getChatCompletion(userMessage: string): Promise<ChatCompletionResult> {
  const response = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: 'You are a helpful senior developer.'
      },
      {
        role: 'user',
        content: userMessage
      }
    ],
    // Non-streaming is the default (stream: false)
  });

  // Extract the response content
  const content = response.choices[0]?.message?.content;

  if (!content) {
    throw new Error('No content in response');
  }

  return { content, response };
}

// Usage
async function main() {
  try {
    const { content, response } = await getChatCompletion('Explain async/await in one paragraph');
    console.log('Content:', content);
    console.log('Model used:', response.model);
    console.log('Usage:', response.usage);
    console.log('Message:', JSON.stringify(response?.choices?.[0]?.message ?? null, null, 2));
    console.log('Full response:', response);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

Run this from the directory containing the script (you may need to adjust the path in dotenv.config({ path: '../../.env' })):

npx ts-node non-streaming.ts

Response (the message is logged expanded first; in the full response it is collapsed to [Object]):


// Message: 

{
  "role": "assistant",
  "content": "Async/await is a syntax in modern programming languages such as JavaScript designed to make asynchronous programming simpler and more readable. The `async` keyword is used when declaring a function, indicating that the function will handle asynchronous operations and will return a promise. Within an `async` function, the `await` keyword is used before a function call to pause the execution of the async function until the promise is resolved or rejected. This allows the code to be written in a more synchronous manner, which helps in reducing the complexity of code managing multiple concurrent operations and error handling, without blocking the main thread. Essentially, async/await helps manage asynchronous code, making it easier to follow and maintain by writing code that avoids deep nesting of callbacks and complex chains of promise handlers.",
  "refusal": null,
  "reasoning": null
}

// Full response: 

{
  id: 'gen-1766607459-DshFs0qxxxxxxxxxxxx',
  provider: 'OpenAI',
  model: 'openai/gpt-4-turbo',
  object: 'chat.completion',
  created: 1766607459,
  choices: [
    {
      logprobs: null,
      finish_reason: 'stop',
      native_finish_reason: 'stop',
      index: 0,
      message: [Object]
    }
  ],
  system_fingerprint: 'fp_de235xxxxx',
  usage: {
    prompt_tokens: 26,
    completion_tokens: 155,
    total_tokens: 181,
    cost: 0.00491,
    is_byok: false,
    prompt_tokens_details: { cached_tokens: 0, audio_tokens: 0, video_tokens: 0 },
    cost_details: {
      upstream_inference_cost: null,
      upstream_inference_prompt_cost: 0.00026,
      upstream_inference_completions_cost: 0.00465
    },
    completion_tokens_details: { reasoning_tokens: 0, image_tokens: 0 }
  }
}

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

4. Streaming Responses

Streaming returns the response incrementally as it’s generated. This provides a much better user experience for longer responses — users see content appearing in real-time rather than waiting for the complete response.

When to Use Streaming

Chat interfaces where users expect real-time feedback
Long-form content generation
When perceived latency matters more than simplicity
Interactive applications

import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config({ path: '../../.env' });

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

interface StreamingResult {
  content: string;
  chunks: string[];
}

async function streamChatCompletion(userMessage: string): Promise<StreamingResult> {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: 'You are a helpful senior developer.'
      },
      {
        role: 'user',
        content: userMessage
      }
    ],
    stream: true,  // Enable streaming
    //  temperature: 0.2,  // Low temperature for consistent code
  });

  let fullResponse = '';
  const chunks: string[] = [];

  // Process chunks as they arrive
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;

    if (content) {
      process.stdout.write(content);  // Print without newline
      fullResponse += content;
      chunks.push(content);
    }
  }

  console.log();  // Final newline
  return { content: fullResponse, chunks };
}

// Usage
async function main() {
  try {
    console.log('Streaming response:\n');
    const { content, chunks } = await streamChatCompletion(
      'Write a short poem about coding'
    );
    console.log('\n--- Complete response captured ---');
    console.log('Length:', content.length, 'characters');
    console.log('Chunks received:', chunks.length);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

Run the script:

npx ts-node streaming.ts

You should see the output appear in the console incrementally as it streams in.

Python version (the functionality is the same):

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

def stream_chat_completion(user_message: str) -> str:
    """Get a streaming chat completion."""
    stream = client.chat.completions.create(
        model="openai/gpt-4-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful senior developer."
            },
            {
                "role": "user",
                "content": user_message
            }
        ],
        stream=True,  # Enable streaming
    )

    full_response = ""

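    for chunk in stream:
        # Some chunks (for example a final usage chunk) can arrive with no choices
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content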

        if content:
            print(content, end="", flush=True)
            full_response += content

    print()  # Final newline
    return full_response

# Usage
if __name__ == "__main__":
    try:
        print("Streaming response:\n")
        full_text = stream_chat_completion("Write a short poem about coding")
        print("\n--- Complete response captured ---")
        print(f"Length: {len(full_text)} characters")
    except Exception as e:
        print(f"Error: {e}")

Make sure to install the libraries before running it.

# install libraries
pip install openai python-dotenv

# run it
python ./streaming.py

This should stream the results.

In the next few sections we talk about parameters that you can add to the request that may affect the response.

5. Temperature and Top_p Control

These two parameters control the “creativity” or randomness of the model’s output. Understanding them is crucial for getting consistent, appropriate responses for your use case.

⚠️ OpenAI recommends altering either temperature or top_p, but not both. They serve similar purposes, and adjusting both can lead to unpredictable results.

temperature

Temperature controls randomness. Lower values make output more focused and deterministic; higher values make it more creative and varied.

Range: 0.0 to 2.0 (default: 1.0)

0.0–0.3 … Very focused, deterministic
Code generation, factual Q&A, data extraction.

0.4–0.7 … Balanced
General conversation, explanations.

0.8–1.2 … Creative, varied
Creative writing, brainstorming.

You add temperature to the request options in the streaming code from earlier:

const stream = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: 'You are a helpful senior developer.'
      },
      {
        role: 'user',
        content: userMessage
      }
    ],
    stream: true,  // Enable streaming
    temperature: 0.5,  // Balance results
  });
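
To build intuition for what temperature does, here is a small sketch of temperature-scaled softmax. It is a simplified model of sampling, not the provider's actual implementation: the logits are made-up numbers for three candidate tokens, and the function name softmax_with_temperature is hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # assumed scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the low setting nearly all probability lands on the top token, while the high setting spreads it across the alternatives, which is why low values suit code and high values suit brainstorming.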

top_p (Nucleus Sampling)

Top_p considers only the smallest set of most-likely tokens whose cumulative probability reaches p. A value of 0.1 means only the tokens making up the top 10% of probability mass are considered.

Range: 0.0 to 1.0 (default: 1.0)

0.1 … Very focused, limited vocabulary
0.9 … High diversity, most tokens considered
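
Nucleus sampling is easier to see with a toy example. The sketch below assumes a made-up four-token distribution and a hypothetical helper named nucleus_filter; real models apply the same idea over a vocabulary of many thousands of tokens.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: list[tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens sum to 1 before sampling.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}  # assumed values
print(nucleus_filter(probs, 0.1))  # very focused
print(nucleus_filter(probs, 0.9))  # most tokens survive
```

With top_p of 0.1 only the single most likely token survives; with 0.9 the unlikely tail ("zebra" here) is still cut off, which is the main practical effect.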

6. Other Important Parameters

Beyond temperature and top_p, several other parameters help you control the API behavior.

max_tokens (or max_completion_tokens)

Maximum length of the generated response. Important for cost control and preventing runaway responses.

⚠️ Very important for cost control.
max_completion_tokens is the newer name for the same limit; OpenAI’s reasoning models expect it in place of max_tokens.

const response = await client.chat.completions.create({
  model: 'openai/gpt-4-turbo',
  messages: [...],
  max_tokens: 500,  // Limit response length
});

n

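This will give you multiple completions to choose from if you set n to a value greater than 1 (the default is 1). Note that you are billed for the output tokens of every completion.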

reasoning (newer reasoning models)

This is on some newer reasoning models, to allow you to adjust how much reasoning is applied.

stop

Specify sequences where the model should stop generating. Useful for structured output. Let’s say I am getting some content from somewhere else and I add a keyword such as ENDHERE: generation will stop at that keyword if I add it to the stop array.

const response = await client.chat.completions.create({
  model: 'openai/gpt-4-turbo',
  messages: [...],
  stop: ['\n\n', 'END', '---'],  // Stop at any of these
});
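
To make the effect concrete, the sketch below mimics stop-sequence behaviour locally. It is only an illustration with a hypothetical apply_stop helper: the real cut-off happens on the provider side during generation, and the returned text does not include the stop sequence itself.

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence (sequence excluded)."""
    cut = len(text)
    for seq in stop:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "Step 1\nStep 2 END this part is never returned"
print(repr(apply_stop(raw, ["\n\n", "END", "---"])))
```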

user

This identifies which end user is placing requests. It is commonly used for security purposes, such as detecting when somebody is abusing the system.

More Complete List of Parameters

These were just a few to get you started.

Our next article covers some of the most important ones: tools, tool_choice, and response_format. These matter for AI agents because they allow a lot of extra customization.

The less common parameters and advanced sampling, honestly, I have not really used too much yet, but we may delve into these more later. For now do not worry about them too much as they are for edge case scenarios.

interface ChatCompletionParams {
  // ── Core (always required) ───────────────────────────────────────────────
  model: string;                    // Required: e.g. "gpt-4o", "o1", "o3-mini"
  messages: ChatCompletionMessage[]; // Required: array of message objects

  // ── Generation control ───────────────────────────────────────────────────
  max_tokens?: number;              // Classic name (still widely supported)
  max_completion_tokens?: number;   // Preferred on newer 2025+ models (especially reasoning ones)
  temperature?: number;             // 0–2 (often ignored/fixed on o1/o3/o4 reasoning models)
  top_p?: number;                   // 0–1 (nucleus sampling)
  n?: number;                       // Number of completions to generate (usually 1)
  stream?: boolean;                 // true → returns a stream of chunks
  stop?: string | string[];         // Stop generation on these sequences

  // ── Repetition & diversity penalties ─────────────────────────────────────
  presence_penalty?: number;        // -2.0 to 2.0
  frequency_penalty?: number;       // -2.0 to 2.0

  // ── Reasoning (provider-specific; check your provider's docs) ────────────
  reasoning_effort?: "low" | "medium" | "high"; // OpenAI Chat Completions field for reasoning models
  reasoning?: {                                 // OpenRouter's unified equivalent
    effort?: "low" | "medium" | "high";         // Controls thinking budget
  };

  // ── Advanced sampling & reproducibility ──────────────────────────────────
  seed?: number;                    // For reproducible outputs (when possible)
  top_logprobs?: number;            // 0–20 (return top log probabilities)
  logprobs?: boolean;               // Return logprobs (many new models ignore this)

  // ── Structured output & tool calling ─────────────────────────────────────
  tools?: ChatCompletionTool[];     // Function calling / tool definitions
  tool_choice?: "auto" | "none" | "required" | { type: "function"; function: { name: string } };
  response_format?: 
    | { type: "text" }
    | { type: "json_object" }
    | { type: "json_schema"; json_schema: { name: string; strict: boolean; schema: object } };

  // ── Metadata & tracking ──────────────────────────────────────────────────
  user?: string;                    // End-user identifier (for abuse monitoring)
  store?: boolean;                  // Whether to store this request for later distillation/evals (new 2025)

  // ── Less common / enterprise / preview features (2025) ───────────────────
  service_tier?: "auto" | "default" | "flex" | "priority"; // Processing tier (availability varies by account and model)
  stream_options?: { include_usage?: boolean }; // Include token usage in final chunk (streaming)
  parallel_tool_calls?: boolean;    // Allow model to call multiple tools at once
}

7. Error Handling Best Practices

Before ending this I did want to mention error handling and validation. This is important for user experience.

Robust error handling is essential for production applications.

⚠️ The OpenAI SDK throws specific error types that you can catch and handle appropriately.

Below is an example of how you can handle errors and retries around a non-streaming request like the one shown earlier:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

async function safeChatCompletion(
  messages: OpenAI.Chat.ChatCompletionMessageParam[],
  retries: number = 3
): Promise<string> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: 'openai/gpt-4-turbo',
        messages,
      });

      const content = response.choices[0]?.message?.content;

      if (!content) {
        throw new Error('Empty response from API');
      }

      return content;

    } catch (error) {
      if (error instanceof OpenAI.APIError) {
        const { status, message } = error;

        if (status === 401) {
          throw new Error('Invalid API key. Check your configuration.');
        }

        if (status === 429) {
          // Rate limited - wait and retry
          const waitTime = Math.pow(2, attempt) * 1000;
          console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);
          await new Promise(resolve => setTimeout(resolve, waitTime));
          continue;
        }

        if (status === 500 || status === 503) {
          if (attempt < retries) {
            const waitTime = Math.pow(2, attempt) * 1000;
            console.log(`Server error. Retrying in ${waitTime}ms...`);
            await new Promise(resolve => setTimeout(resolve, waitTime));
            continue;
          }
        }

        throw new Error(`API Error (${status}): ${message}`);
      }

      throw error;
    }
  }

  throw new Error('Max retries exceeded');
}

This gives the ability to catch errors and do retries.
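
The same retry idea in Python, with random jitter added to the exponential backoff so that many clients do not retry at the same instant. This is a minimal sketch under stated assumptions: backoff_delay and call_with_retries are hypothetical helper names, the failing call is simulated, and the sleep function is injectable so the demo runs instantly. Unlike the TypeScript version above, it does not wait after the final failed attempt.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Random delay in seconds between 0 and min(cap, base * 2^attempt)."""
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0, ceiling)


def call_with_retries(call, retries=3, is_retryable=lambda exc: True, sleep=time.sleep):
    """Run call(); on a retryable error, wait and try again up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that should not be retried.
            if attempt == retries or not is_retryable(exc):
                raise
            sleep(backoff_delay(attempt))


# Demo: a fake call that fails twice (simulating rate limits), then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, attempts["count"])
```

In a real integration, is_retryable would check the SDK's error status (retry on 429 and 5xx, fail fast on 401), mirroring the branches in the TypeScript example.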

In the next article of the series we’ll look at a more advanced class-based approach for this and explain it.

This is a good way to advance your coding skills.

Look for it soon!

Thanks

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Savings: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1) was originally published in AI Dev Tips on Medium, where people are continuing the conversation by highlighting and responding to this story.

✅ Cloud Storage I/O Performance Checklist

Chris St. John — Tue, 23 Dec 2025 16:04:33 GMT

Cloud storage I/O performance strategies, tips and pitfalls for professional cloud engineers and startups. (CC #3)

This latest article in our Cloud Checklist series continues the focus on Performance from the last 2 articles.

We break down the most common cloud storage I/O performance strategies, tips and pitfalls and how to avoid storage I/O problems before they impact production systems.

Rather than focusing on theory, this is a practical checklist that helps identify where storage latency, throughput limits, and IOPS constraints typically appear in real workloads.

✅ Cloud Storage I/O Performance Checklist

Choose local NVMe SSD
Provision required IOPS
Distribute load across volumes
Separate data and logs
Monitor queue depth
Pre-warm new volumes
Use high-performance tiers
Avoid IOPS/network throttling
Enable multi-attach when supported
Stripe volumes for bandwidth

As always in our cloud checklists, I have intentionally limited this to what I consider the 10 most important points for the topic.

There are many other I/O related tips if you dig deeper and depending on your setup, but these are the “headline” Cloud Storage I/O Performance factors that you need to consider as best practice.

🛠️ Get more tips like this at https://www.systemsarchitect.io and follow the Cloud Checklists series here. 🚀Also follow the SystemsArchitect X account: https://x.com/systemsarch — we follow back!

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next checklist that I put out!

1. Choose local NVMe SSD

Local NVMe SSDs deliver the lowest latency and highest raw IOPS/throughput available on cloud instances, typically greatly outperforming network-attached storage.

NVMe stands for Non-Volatile Memory Express, a communication protocol and interface spec designed for solid-state storage (not for spinning hard drives, which are slower).

⚡️IOPS = Input/Output operations per second. It’s a measure of how many read or write requests your storage can handle each second.

⚡️Throughput = the amount of data transferred per second (for example MB/s). It is not the same as IOPS.

Use them for latency-sensitive workloads like databases, caches, or real-time analytics. Here are some factors I like to look at first:

⚡️ For performance:

IOPS: Determine whether your workload is read-heavy, write-heavy, or mixed. Random read/write IOPS matter most for databases and virtualization, while sequential throughput matters for analytics and media streaming.
Latency: Look for low and predictable latency (often measured at 99th or 99.9th percentile), especially for latency-sensitive applications like real-time analytics or transaction processing.
Throughput (MB/s or MiB/s): Sequential data transfer rate

⚡️ For durability:

DWPD (Drive Writes Per Day): Write-intensive workloads (logging, caching, databases) need drives rated for 3+ DWPD, while read-heavy workloads can use lower-endurance drives (0.3–1 DWPD).
TBW (Terabytes Written): Ensure the drive’s total write endurance matches your expected lifespan and write volume.
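
DWPD and TBW describe the same endurance budget from two angles, and converting between them is simple arithmetic. The sketch below uses the common definition (TBW = DWPD × capacity × 365 × warranty years); the drive size, warranty period, and daily write volume are assumed example values, and the function names are hypothetical.

```python
def tbw_from_dwpd(dwpd: float, capacity_tb: float, warranty_years: float = 5) -> float:
    """Total terabytes written over the warranty period implied by a DWPD rating."""
    return dwpd * capacity_tb * 365 * warranty_years


def years_until_worn(tbw: float, daily_writes_tb: float) -> float:
    """How long a TBW budget lasts at a given daily write volume."""
    return tbw / (daily_writes_tb * 365)


tbw = tbw_from_dwpd(dwpd=1, capacity_tb=1.92, warranty_years=5)
print(f"TBW budget: {tbw:.0f} TB")
print(f"Lifetime at 4 TB/day: {years_until_worn(tbw, 4):.1f} years")
```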

Not prioritizing local SSDs? You risk unnecessary network overhead and performance bottlenecks.

On AWS EC2, select instance types with instance store NVMe SSDs (i4i, m5d, r5d series) for the highest ephemeral performance, ideal for caches or temporary data.

AWS docs: “The data on NVMe instance storage is encrypted using an XTS-AES-256 block cipher implemented in a hardware module on the instance. The encryption keys are generated using the hardware module and are unique to each NVMe instance storage device.”

Linux: Monitor with iostat -x or iotop to confirm low latency; use fio for benchmarking NVMe performance.
GCP: To benchmark Persistent Disk performance on Linux, use Flexible I/O tester (fio) instead of other disk benchmarking tools such as dd. By default, dd uses a very low I/O queue depth and might not accurately test disk performance.

⚠️ Gotcha: Local/instance store volumes are ephemeral, which means data is lost on instance stop, termination, or hardware failure, so never use them for persistent data without replication.

Related Tools (useful throughout this article):

sysstat: Open source performance monitoring suite including iostat and pidstat for monitoring NVMe latency and IOPS on Linux (iotop is a separate package). https://github.com/sysstat/sysstat

sysstat main features: https://sysstat.github.io/features.html

fio: Open source tool for benchmarking NVMe SSD performance, measuring IOPS, latency, and throughput in cloud environments. https://github.com/axboe/fio

ezFIO: Open source NVMe-specific benchmarking tool for testing SSD performance parameters like queue depth and bandwidth. https://github.com/earlephilhower/ezfio

Iometer: Open source tool for SSD benchmarking, including NVMe, to simulate workloads and measure IOPS/throughput. https://sourceforge.net/projects/iometer/

2. Provision required IOPS

Provisioning the correct IOPS upfront allows you to sustain the peak demands of your workload without throttling or queue buildup.

Under-provisioning leads to unpredictable performance degradation during spikes, while over-provisioning wastes money.

Modern cloud volumes allow independent IOPS scaling, making it essential to right-size based on benchmarks rather than defaults.

Burst vs. baseline: Many cloud disks offer burst IOPS that deplete over time. Provision for sustained needs, not burst.
Disk size often determines IOPS: On several volume types across AWS EBS, GCP PD, and Azure, larger disks get more baseline IOPS.
VM limits: The VM type caps total IOPS regardless of disk capability.
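
Two of these points reduce to quick arithmetic. The sketch below models a generic credit-bucket burst scheme and the VM-level cap. All numbers are illustrative assumptions (a 5.4 million credit bucket, 3,000 burst IOPS, 300 baseline IOPS, a 20,000 IOPS VM cap), and the function names are hypothetical; check your provider's current documentation for real limits.

```python
def burst_duration_seconds(credit_balance: float, burst_iops: float, baseline_iops: float) -> float:
    """How long a full credit bucket sustains burst IOPS while refilling at baseline."""
    drain_per_second = burst_iops - baseline_iops
    if drain_per_second <= 0:
        return float("inf")  # baseline already covers the burst rate
    return credit_balance / drain_per_second


def effective_iops(volume_iops: list[float], vm_iops_cap: float) -> float:
    """Attached volumes add up, but the VM type caps the total."""
    return min(sum(volume_iops), vm_iops_cap)


seconds = burst_duration_seconds(5_400_000, burst_iops=3000, baseline_iops=300)
print(f"Burst lasts about {seconds / 60:.0f} minutes")
print("Usable IOPS:", effective_iops([16000, 16000], vm_iops_cap=20000))
```

The takeaway matches the list above: a volume that looks fast in a short benchmark can drop to baseline once credits run out, and adding volumes stops helping once the VM cap is reached.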

Optimize block/I/O size: Aligning your I/O request size to your workload pattern (e.g., 256KB+ for sequential, smaller for random) has a major impact on throughput and IOPS efficiency.

Many performance issues stem from mismatched I/O sizes.
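
The relationship behind this is throughput = IOPS × I/O size. A short sketch, with hypothetical function names and example numbers chosen only for illustration:

```python
def throughput_mib_s(iops: float, block_size_kib: float) -> float:
    """Throughput in MiB/s delivered by a given IOPS rate at a given I/O size."""
    return iops * block_size_kib / 1024


def iops_needed(target_mib_s: float, block_size_kib: float) -> float:
    """IOPS required to reach a target throughput at a given I/O size."""
    return target_mib_s * 1024 / block_size_kib


print(throughput_mib_s(16000, 8))    # small random I/O
print(throughput_mib_s(16000, 256))  # large sequential I/O
print(iops_needed(500, 64))
```

The same 16,000 IOPS moves 125 MiB/s at 8 KiB but would need 4,000 MiB/s of bandwidth at 256 KiB, so with large blocks the volume's throughput limit is usually hit long before its IOPS limit.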

(Note: the IOPS limits below may change; providers update them frequently.)

AWS: Use io2 or gp3 volumes and provision IOPS independently (up to 256,000 IOPS per volume with io2 Block Express).
Azure: Choose Premium SSD v2 or Ultra Disks to provision IOPS directly. Premium SSD v2: maximum of 80,000 IOPS. Ultra Disk: maximum of 400,000 IOPS.
Google Cloud: Select Extreme Persistent Disk and provision IOPS separately (up to 120,000).
Linux tip: Use iostat -x 1 to monitor delivered vs. provisioned IOPS.

Example:

# iostat - shows IOPS per device (r/s = reads, w/s = writes)
iostat -x 1 10

# Key columns to watch:
# r/s     - read IOPS
# w/s     - write IOPS
# r_await / w_await - average read/write latency in ms (older sysstat versions also show a combined await)
# %util   - device saturation

# iotop - shows IOPS by process
sudo iotop -aoP

Simulate with fio:

fio --name=mixed_workload \
    --ioengine=libaio \
    --rw=randrw \
    --rwmixread=70 \
    --bs=8k \
    --iodepth=64 \
    --numjobs=4 \
    --size=10G \
    --runtime=60 \
    --time_based \
    --group_reporting \
    --direct=1

https://github.com/axboe/fio

3. Distribute load across volumes

Distribute I/O load across multiple volumes to parallelize requests and avoid hitting per-volume IOPS/throughput ceilings.

A single volume, no matter how highly provisioned, has hard limits that can become a bottleneck under heavy load.

Proper distribution maximizes overall instance performance and provides better scalability.

⚡️ Striping: Splits data into chunks and writes them across multiple disks simultaneously. ⚠️ If any disk in a striped set fails, you typically lose ALL your data (because pieces are scattered across all disks).
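
To show how chunks are scattered, here is a minimal sketch of the address mapping for a simple RAID 0 style layout. The 64 KiB stripe size and 3 disks are assumed example values, and locate is a hypothetical helper; real volume managers add metadata offsets on top of this.

```python
def locate(offset: int, stripe_size: int, num_disks: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk = offset // stripe_size   # which chunk of the logical volume
    disk = chunk % num_disks        # chunks rotate across disks round-robin
    row = chunk // num_disks        # how many full stripes came before
    return disk, row * stripe_size + offset % stripe_size


STRIPE = 64 * 1024
for off in (0, STRIPE, 2 * STRIPE, 3 * STRIPE + 100):
    print(off, locate(off, STRIPE, 3))
```

Consecutive chunks land on different disks, which is where the parallelism comes from, and also why losing one disk leaves holes throughout every large file.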

⚡️ LVM (Logical Volume Manager): a “virtual disk manager” that lets you combine, resize, and organize storage without being locked to physical disk boundaries. It can also create striped volumes.

Physical Disks (/dev/sdb, /dev/sdc, etc.)
↓
Physical Volumes (PVs): LVM’s view of the disks
↓
Volume Group (VG) : Pool of storage from multiple PVs
↓
Logical Volumes (LVs): Your “virtual disks” that you format and mount

mdadm (Multiple Device Administrator): Linux’s software RAID tool. It combines multiple physical disks into RAID arrays directly at the kernel level, creating a single device like /dev/md0.

Spread I/O across multiple volumes to avoid single-volume limits and improve parallelism.
AWS/Azure/GCP: Attach multiple block volumes and balance application load (e.g., via database sharding).
Linux tip: Use LVM or mdadm to create striped logical volumes if needed, but prefer application-level distribution.
Monitor per-device stats with iostat -dx 1 to identify hot volumes.

Example to create striped logical volume:

# Create a striped logical volume across 3 disks
lvcreate --type raid0 -L 500G --stripes 3 --stripesize 64k -n mydata_lv my_vg

⚠️ Gotcha: Adding more volumes increases management complexity and may require application-level or filesystem changes to balance load.

Tools:

mdadm: Open source Linux tool for managing RAID arrays to stripe and distribute load across volumes for better parallelism. https://github.com/neilbrown/mdadm

4. Separate data and logs

Separating data and transaction logs onto different volumes allows us to optimize each for its access pattern, random for data, sequential for logs, while isolating the intense write activity of logs from data reads.

This separation can significantly improve overall database performance and recovery characteristics.

Mixing them on one volume often causes contention and reduces effective throughput.

⚠️ Gotcha: Log volumes usually need higher write endurance and lower latency; skimping here can become the hidden bottleneck.

Transaction logs typically have 2–10x write amplification (each logical write becomes multiple physical writes)

Place database transaction logs on dedicated high-IOPS volumes for sequential write optimization.
AWS: Use separate io2 volumes for logs; enable EBS-optimized instances.
Azure: Separate logs to Ultra Disks for sub-millisecond latency.
GCP: Use Hyperdisk or Extreme PD for logs.
Linux tip: Mount logs with noatime and use XFS/ext4 with appropriate stride for RAID/LVM; monitor with pidstat -d to see per-process I/O.

noatime (No Access Time): A Linux mount option that stops the filesystem from writing the “last accessed time” every time a file is read. Every file read normally triggers a write to update metadata which reduces performance. (note: Some backup tools and mail programs check access times, but most modern systems use relatime by default)

Example usage:

# In /etc/fstab, add noatime to mount options
/dev/sdb1  /var/log/database  ext4  defaults,noatime  0  2

pidstat -d and other troubleshooting examples:

# Monitor all processes with I/O activity, update every 2 seconds
pidstat -d 2

# Monitor specific processes (e.g., PostgreSQL); pgrep -d, joins PIDs with commas, as pidstat -p expects
pidstat -d -p "$(pgrep -d, postgres)" 2

# See which device is busy
iostat -dx 1

# See both together
watch -n 1 'pidstat -d | head -20; iostat -dx | grep -v loop'

For a lot more info, see the iostat(1) — Linux manual page

5. Monitor queue depth

Queue depth reflects how many I/O requests are waiting to be processed; monitoring it detects saturation before latency spikes get bad.

⚡️ Queue Depth (aqu-sz / avgqu-sz with iostat -x 1): The average queue length of requests that were issued to the device. Note: it counts operations that were either queued OR being serviced, not just those waiting in the queue!

NVMe SSDs: an aqu-sz of 32–256 is optimal in iostat -x 1, because NVMe handles deep queues efficiently through parallel processing. High aqu-sz + high await = storage saturation. High %util but low aqu-sz = short bursts of activity.

⚡️ How it works: Application → I/O Scheduler Queue → Device Driver Queue → Physical Disk

High queue depth indicates your storage or instance is overwhelmed, while chronically low depth suggests over-provisioning.
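
Queue depth, IOPS, and latency are tied together by Little's Law (items in the system = arrival rate × time in the system), which gives a quick sanity check for what iostat reports. The sketch below uses assumed example numbers and hypothetical function names.

```python
def expected_queue_depth(iops: float, latency_ms: float) -> float:
    """Average outstanding I/Os implied by a given IOPS rate and per-I/O latency."""
    return iops * (latency_ms / 1000)


def max_iops(queue_depth: float, latency_ms: float) -> float:
    """IOPS ceiling for a given number of outstanding I/Os and per-I/O latency."""
    return queue_depth / (latency_ms / 1000)


print(round(expected_queue_depth(20000, 0.5), 2))  # outstanding I/Os at 20k IOPS, 0.5 ms
print(round(max_iops(32, 0.2)))                    # IOPS ceiling at depth 32, 0.2 ms
print(round(max_iops(1, 0.2)))                     # IOPS ceiling at depth 1, 0.2 ms
```

This is also why a queue depth of 1 can never saturate a fast device: at 0.2 ms per I/O, a single outstanding request tops out around 5,000 IOPS no matter what is provisioned.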

Proactive monitoring allows timely scaling or tuning.

⚠️ Gotcha: Optimal queue depth varies by device. NVMe likes deeper queues (100+), while traditional spinning disks perform best with shallow queues.

Linux commands: iostat -x (look at aqu-sz or avgqu-sz); cat /proc/diskstats for raw stats; iotop for interactive view.

AWS: Use CloudWatch metrics like VolumeQueueLength.
Azure/GCP: Monitor via their portals or export to Prometheus.
Tune with echo 128 > /sys/block/sdX/queue/nr_requests or NVMe-specific settings.

6. Pre-warm new volumes

Pre-warming ensures that restored volumes from snapshots achieve full performance immediately by populating the underlying storage infrastructure.

Without it, the first read of each block incurs a significant latency penalty as data is lazily fetched from the snapshot’s backing storage (Amazon S3 on AWS).

This step gives us consistent performance after scaling or recovery events.

⚠️ Gotcha: Pre-warming is only for volumes restored from snapshots. New empty volumes on modern provisioned types (AWS gp3/io2) deliver full performance immediately and do not need it.

AWS: For snapshot restores, read every block once to avoid the first-access latency penalty (e.g., sudo dd if=/dev/xvdf of=/dev/null bs=1M, or fio).

Example for AWS (Azure and GCP have similar variations):

# 1. Create volume from snapshot
aws ec2 create-volume --snapshot-id snap-abc123 ...

# 2. Attach to temporary instance
aws ec2 attach-volume --instance-id i-temp123 ...

# 3. Pre-warm
sudo fio --filename=/dev/xvdf --name=init --rw=read --bs=128k \
  --iodepth=32 --ioengine=libaio --direct=1

# 4. Detach and attach to production
aws ec2 detach-volume --volume-id vol-abc123
aws ec2 attach-volume --instance-id i-prod456 ...

# Result: Production sees consistent performance from day 1

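And for Fast Snapshot Restore (FSR), which lets volumes created from the snapshot deliver full performance without a manual read pass: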

# Enable FSR for snapshot in specific AZs
aws ec2 enable-fast-snapshot-restores \
  --availability-zones us-east-1a us-east-1b \
  --source-snapshot-ids snap-abc123

Azure/GCP: Similar initialization for restored disks.
Linux tip for efficient read-based warming:

fio --name=prewarm --filename=/dev/sdX --rw=read --bs=128k --iodepth=32 --ioengine=libaio --direct=1

Note: Modern volumes often perform well without manual warming due to lazy loading improvements. For example, GCP snapshots can be used immediately with no performance impact according to some users.
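Before kicking off a read-based pre-warm, it helps to estimate how long it will take. A rough sketch, assuming the read pass is limited by a sustained sequential throughput you already know; the 500 GiB size and 250 MiB/s rate are hypothetical numbers, not provider guarantees.

```python
def prewarm_seconds(volume_gib: float, read_mib_per_s: float) -> float:
    """Time to read every block once at a sustained sequential rate."""
    if read_mib_per_s <= 0:
        raise ValueError("read_mib_per_s must be positive")
    return volume_gib * 1024 / read_mib_per_s


# Hypothetical volume: 500 GiB restored from a snapshot, read at 250 MiB/s.
seconds = prewarm_seconds(volume_gib=500, read_mib_per_s=250)
print(f"{seconds / 60:.0f} minutes")  # 34 minutes
```

The first pass over a restored volume is usually slower than the volume's rated throughput, so treat the result as a lower bound.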

7. Use high-performance tiers

Selecting the right high-performance storage tier provides guaranteed low latency and high throughput tailored to demanding workloads.

Lower tiers may suffice for cold data but will throttle critical applications.

Matching tier to workload characteristics optimizes both performance and cost.

Use cases:

Latency-sensitive transactional databases: OLTP workloads (e-commerce order processing, financial trading systems, payment gateways) where sub-millisecond read/write latency is required.
Real-time analytics and data warehousing: Platforms like Snowflake, BigQuery, Redshift, or ClickHouse running interactive queries on hot datasets.
High-performance applications and caches: In-memory databases (Redis, Memcached), session stores, or application tiers needing ultra-low latency block storage for fast random reads/writes, such as gaming backends or ad-tech bidding systems.
AI/ML training and inference serving: Model training with frequent checkpointing or inference endpoints requiring extreme throughput.

⚠️ Gotcha: High-performance tiers are significantly more expensive — always validate with real workload benchmarks before committing.

AWS: io2 Block Express or gp3 for cost-effective high performance. For object storage use Intelligent-Tiering on S3 and/or lifecycle policies.
Azure: Premium SSD v2 or Ultra Disks for top-tier latency/IOPS.
GCP: Hyperdisk Extreme for ultra-high IOPS, or Hyperdisk ML for extreme throughput (ML workloads).
SaaS tip: For object storage (S3, Blob, GCS), use intelligent tiering or frequent-access tiers for hot data.
Monitor tier effectiveness with provider dashboards.

⚡️ Best Overall Price/Performance (estimate mid-2025):

AWS: gp3 (cheaper than io2 for most workloads)
Azure: Premium SSD v2 (better than most competitors)
GCP: Hyperdisk Balanced

⚡️ Lowest Latency Mission-Critical (estimate mid-2025):

AWS: io2 Block Express (<500μs)
Azure: Ultra Disk (sub-millisecond)
GCP: Hyperdisk Extreme (sub-millisecond)

Tools:

Amazon S3 Intelligent-Tiering: SaaS storage class for automatic tiering to optimize performance and cost for hot/cold data. https://aws.amazon.com/s3/storage-classes/intelligent-tiering/

8. Avoid IOPS/network throttling

Throttling occurs when you exceed provisioned or instance limits, causing sudden performance drops that are hard to diagnose.

Avoiding it ensures predictable, sustained performance under load.

Carefully plan instance type, volume configuration, and workload patterns.

EBS-optimized instances: AWS EC2 instances with dedicated bandwidth for storage traffic, preventing network and EBS I/O from competing for the same throughput (critical: instance limits can throttle even high-IOPS volumes).

⚠️ Match instance network bandwidth to storage needs. For network-attached storage (EBS, Persistent Disk, etc.), your VM’s network throughput can become the bottleneck. EBS-optimized instances or properly sized VMs are essential.

Burst credits: Performance tokens consumed when storage (AWS gp2, st1, sc1) or compute exceeds baseline capacity; once exhausted, performance drops to baseline, often causing a major slowdown.

⚠️ Gotcha: Burst-credit systems (AWS gp2, st1) can hide throttling until credits are exhausted, leading to unexpected cliffs in performance.

Also account for snapshot overhead, as snapshots can degrade I/O performance temporarily, especially on first read after creation.

Stay within instance limits (AWS EBS bandwidth per instance type) and volume caps.
Use EBS-optimized instances (AWS) or accelerated networking (Azure/GCP).
Avoid burst-dependent types (such as gp2) for steady workloads; prefer provisioned.
Linux tip: Benchmark with fio to test sustained vs. burst; watch CloudWatch for VolumeReadOps/WriteOps throttling.
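The burst-credit cliff is easy to estimate ahead of time. The sketch below uses the gp2 numbers as I understand them from the AWS documentation (a 5.4 million credit bucket, a 3,000 IOPS burst ceiling, and a baseline of 3 IOPS per GiB with a floor of 100); treat the constants as assumptions and re-check them against current docs before relying on them.

```python
GP2_BUCKET_CREDITS = 5_400_000  # full gp2 credit bucket (I/O credits), assumed
GP2_BURST_IOPS = 3_000          # gp2 burst ceiling, assumed
GP2_IOPS_PER_GIB = 3            # gp2 baseline rate, assumed
GP2_MIN_BASELINE_IOPS = 100     # gp2 baseline floor, assumed


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS a gp2 volume of this size earns continuously."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_IOPS_PER_GIB * size_gib)


def seconds_until_credits_exhausted(size_gib: int, demand_iops: int) -> float:
    """Seconds a full bucket lasts; inf if demand never exceeds baseline."""
    baseline = gp2_baseline_iops(size_gib)
    demand = min(demand_iops, GP2_BURST_IOPS)
    if demand <= baseline:
        return float("inf")
    # Credits drain at (demand - baseline) per second while bursting.
    return GP2_BUCKET_CREDITS / (demand - baseline)


# A 100 GiB gp2 volume driven at the full 3,000 IOPS burst rate.
print(seconds_until_credits_exhausted(100, 3000) / 60)  # about 33 minutes
```

A steady workload that needs 3,000 IOPS on that volume would look healthy for roughly half an hour and then drop to 300 IOPS, which is exactly the cliff the gotcha above describes.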

Look into Accelerated Networking (SR-IOV): Single Root I/O Virtualization technology that bypasses the virtual switch to give VMs direct access to physical network hardware. It’s available on major cloud providers.

Tools:

Prometheus: Open source monitoring system for tracking IOPS, throughput, and alerting on throttling in cloud storage. https://prometheus.io/
Grafana: Open source visualization tool paired with Prometheus for dashboards on storage metrics to prevent throttling. https://grafana.com/

9. Enable multi-attach when supported

Multi-attach allows a single volume to be shared across multiple instances, enabling clustered applications and high-availability setups without replication overhead.

This is great for traditional clustered databases or filesystems needing shared storage in the cloud.

However, it requires careful coordination to prevent data corruption.

Use cases:

Clustered databases requiring high availability: Ideal for failover cluster instances (SQL Server FCI on Azure/AWS, or Oracle RAC-like setups) where multiple nodes need simultaneous read/write access to the same data for fast failover without data replication overhead.
Traditional on-premises lift-and-shift migrations: When migrating legacy clustered applications that rely on SAN-like shared block storage to the cloud while preserving existing architectures.
Active-active or active-standby HA configurations: In scenarios demanding low-latency shared storage for business intelligence platforms, data-intensive analytics, or applications managing concurrent writes with cluster managers

⚠️ Gotcha: Multi-attach typically supports only limited concurrent writers and requires cluster-aware filesystems — misconfiguration can cause severe data loss.

⚠️ Gotcha: Check cloud platform docs for limitations. For example, on AWS, Multi-Attach enabled volumes can’t be created as boot volumes.

AWS: io1/io2 volumes support multi-attach for shared storage (e.g., clustered apps).
Azure: Premium SSDs support shared disks.
GCP: Regional Persistent Disks for multi-writer.
Requires filesystem like OCFS2 or GFS2; use carefully to avoid corruption.
Linux tip: Mount with cluster-aware options; monitor with blkid and shared queue management.

sudo blkid /dev/sdb1
# Output: /dev/sdb1: LABEL="myshareddata" UUID="..." TYPE="ocfs2"

Tools:

OCFS2 (Oracle Cluster File System 2): Open source cluster-aware filesystem for shared multi-attach volumes to prevent data corruption. https://oss.oracle.com/projects/
GFS2 (Global File System 2): Open source clustered filesystem for multi-writer access on shared cloud volumes. https://www.kernel.org/doc/html/v6.0/filesystems/gfs2.html

10. Stripe volumes for bandwidth

Striping multiple volumes together aggregates their throughput and IOPS, allowing you to exceed the limits of any single volume.

This works great for workloads requiring multi-GB/s bandwidth, such as large-scale data processing or media streaming.

Without striping, we’re artificially capped by per-volume max.

Use cases:

High-throughput databases: Heavily I/O-intensive databases (e.g., MySQL, PostgreSQL, SQL Server) where a single volume’s bandwidth limit becomes a bottleneck that striping across multiple provisioned volumes can lift.
Video editing, rendering, and live streaming: Workloads involving large sequential file transfers, such as 4K/8K video processing, real-time encoding, or caching live streams.
Big data processing and machine learning training: In ETL pipelines, Hadoop/Spark jobs, or ML model training with large datasets, striping enables faster data ingestion, shuffling, and checkpointing by parallelizing I/O across volumes.
High-performance computing (HPC) and scientific simulations: Applications like genomics, fluid dynamics, or financial modeling.
Use RAID0/LVM striping across multiple volumes to exceed single-volume throughput limits.
AWS: Common with gp3/io2; e.g., mdadm or LVM stripe for >1 GB/s.
Azure/GCP: Storage Spaces or logical volume striping.
Linux tip:

lvcreate --stripes N --stripesize 64K

# align filesystem (e.g., mkfs.xfs -d su=64k,sw=N).

Benchmark the striped setup with fio --rw=randread --bs=1M --numjobs=16.

⚠️ Gotcha: Striping (RAID-0) offers no redundancy — if one volume fails, the entire striped set is lost, so combine with backups or application-level resilience.
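One thing to check before adding more volumes to a stripe: the aggregate is still capped by the instance's own storage bandwidth (see the throttling section above). A small sketch of that arithmetic; every number here is hypothetical, so plug in the limits for your own volume and instance types.

```python
def striped_throughput_mib_s(volumes: int,
                             per_volume_mib_s: float,
                             instance_limit_mib_s: float) -> float:
    """Usable throughput of a RAID-0/LVM stripe, capped by the instance."""
    return min(volumes * per_volume_mib_s, instance_limit_mib_s)


# Hypothetical: volumes rated at 250 MiB/s on an instance capped at 1,200 MiB/s.
for n in (2, 4, 8):
    print(n, striped_throughput_mib_s(n, 250, 1200))
# 2 500
# 4 1000
# 8 1200
```

Going from 4 to 8 volumes in this example buys only 200 MiB/s more, because the instance limit becomes the bottleneck.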


About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.


✅ Cloud Storage I/O Performance Checklist was originally published in Cloud Checklists on Medium, where people are continuing the conversation by highlighting and responding to this story.

Amazon EKS (K8s) Media Cluster: Part 4— Pod Auto-Scaling (HPA) and CDN

Chris St. John — Sat, 20 Dec 2025 20:56:57 GMT

🚀 Amazon EKS + CloudFront CDN + Horizontal Pod Autoscaler (HPA), load testing with k6

✅ “Scale pods automatically, test loads, use a CDN, handle traffic spikes, deliver content fast — and have the ability to build infrastructure from scratch using Terraform”

In Part 3, we deployed our video app with 3 replicas for high availability. This was a great demo of pod self-healing, but did not have auto-scaling fully implemented or a CDN.

What happens when traffic spikes in a real-world situation? We’ll experiment with some load testing to see what happens!


What we’ll work on in this article:

1. CloudFront CDN distribution integrated with your application.
2. Metrics Server installation (required for HPA).
3. Horizontal Pod Autoscaler (HPA) configured for 3–10 pods.
4. Load testing setup using k6 tool.
5. CPU-based autoscaling triggers (70% threshold).
6. Real-time monitoring of pod scaling behavior.
7. Observe Node capacity limits, pending delays.

Goals to achieve:

Generate loads, watch HPA automatically scale pods 3 -> 6 -> 9 -> 10 under a variety of conditions.
Learn how to load test with k6.
See videos delivered faster via CloudFront edge locations worldwide.
Discuss “Pending pods” problem when nodes run out of capacity.
Relationship between pod scaling and node capacity.
Monitor resource utilization in real-time with kubectl top
Identify exactly when and why scaling hits limits.

Skills we’ll flex in this article:

Using eksctl for Amazon EKS.
CloudFront CDN configuration and origin setup.
Metrics Server installation and configuration.
HPA manifest creation with resource targets.
Load testing techniques and tools (k6).
Understanding CPU/memory metrics in Kubernetes.
Troubleshooting Pending pods and resource constraints.
Resource requests and limits concepts.
Node capacity planning calculations.

1. Prerequisites for Part 4

Previous articles in this series you should do first:

✅ PART 1 Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap

✅ PART 2 Amazon EKS (K8s) Media Cluster: Part 2 — Deploy Initial Terraform Multi-AZ EKS Cluster

✅ PART 3 Amazon EKS (K8s) Media Cluster: Part 3 — Self-Healing Video Pods

This is where we are at with infra and what we worked on last article:


2. Rebuild Cluster (If Destroyed)

We destroyed our cluster with Terraform to avoid charges. If you did this already then follow these instructions to rebuild it.

# Verify AWS CLI profile
export AWS_PROFILE=terraform-eks-admin
aws sts get-caller-identity

# Rebuild infrastructure (~15-20 min)
cd environments/dev
terraform plan
terraform apply

# Reconnect kubectl
aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster

# Verify nodes
kubectl get nodes

# Re-apply the Part 3 manifests (a freshly rebuilt cluster does not have them yet)
# Run the cd from the directory that contains k8s/
cd ./k8s/
kubectl apply -f namespace.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Verify video app is running (from Part 3)
kubectl get pods -n video-app
kubectl get svc -n video-app

After you run these commands you should be up and running.

⚠️ Just remember that you are being charged by Amazon AWS for the resources. You must (1) use terraform destroy AND (2) check the AWS console to be sure all resources are removed so you won’t get charged. (recall from previous article, sometimes an error causes resources to not be deleted, so you could still be charged even after running the command.)

🚨Warning: I did notice once my ELB did not get destroyed, even though I used terraform destroy, which then also caused the Internet Gateway to not be destroyed — so double-check.

3. Install eksctl and Metrics Server

3.1 Install eksctl (optional)

Why do we need eksctl? Strictly speaking, we do not need it. However, it is Amazon-context aware (EKS/AWS resources) so it does give us some advantages for installing EKS addons and scaling nodes manually (if needed).

It is also worth getting familiar with eksctl as an option throughout these articles. The practical difference: eksctl installs components as EKS-managed addons (AWS handles updates), while kubectl installs raw manifests (you manage updates).

The eksctl addon approach requires your cluster to have an OIDC provider configured. We set that up earlier in this series, but keep the requirement in mind if you are working on another project.

# macOS
brew install eksctl

# Verify
eksctl version

# my output:
0.220.0-dev+3f73c725c.2025-12-01T08:05:49Z

# DO NOT RUN THIS NOW, it is just an example:
# Creating IAM role for a service account
# kubectl approach: 5+ steps (create OIDC provider, IAM role, policy, annotate SA...)
# eksctl approach: 1 command
eksctl create iamserviceaccount \
  --cluster eks-video-cluster \
  --name my-app \
  --namespace default \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

I am also going to show you how to use it to install Metrics Server, just for practice.

However, much of the time in this article I will stick with kubectl unless it makes a lot of sense to use eksctl instead!

3.2 Install Metrics Server

Why do we need Metrics Server? HPA needs real-time CPU/memory metrics. Metrics Server collects these from kubelets and exposes them via the Kubernetes API.

Without Metrics Server (or an equivalent implementation), HPA cannot fetch CPU/memory metrics, so it will show unknown target values or fail to scale on resource metrics.

Flow:

Each node’s kubelet (via /metrics/resource endpoint) collects raw container resource usage data.
Metrics Server scrapes this data from kubelets across the cluster.
It aggregates and exposes the metrics through the Kubernetes Resource Metrics API.
The HPA controller queries this API to compute current utilization (compared to pod resource requests) and adjust replica counts accordingly.
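The last step of that flow is worth seeing as arithmetic: the percentage HPA reports is total CPU usage divided by total CPU requests, not usage relative to node capacity. A minimal sketch, assuming every pod has the same request; the usage values are made up, and the function names are my own, not part of any Kubernetes client.

```python
def parse_millicores(cpu: str) -> int:
    """Convert a Kubernetes CPU quantity such as '117m' or '0.5' to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return round(float(cpu) * 1000)


def average_utilization_percent(usage: list[str], request: str) -> int:
    """CPU usage across pods as a whole-number percentage of their requests."""
    request_m = parse_millicores(request)
    total_used = sum(parse_millicores(u) for u in usage)
    # Integer division: utilization is reported as a whole percentage.
    return 100 * total_used // (request_m * len(usage))


# Hypothetical `kubectl top pods` readings for three pods requesting 100m each.
print(average_utilization_percent(["120m", "115m", "116m"], "100m"))  # 117
```

This is why utilization can go above 100%: a pod requesting 100m but limited at 500m can legitimately use more than it requested.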

Use Option 1 install if you want to try the more universal way of doing it.

Use Option 2 install if you want more Amazon-managed options.


# Option 1: Install Metrics Server (kubectl)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Option 2: Install Metrics Server (eksctl)
eksctl create addon \
  --cluster eks-video-cluster \
  --name metrics-server \
  --region us-east-1

# -------------------------------

# output

serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

# Wait for it to be ready
kubectl get deployment metrics-server -n kube-system

# output at 22s

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   0/1     1            0           22s

# output at 43s

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           43s

# Verify metrics are available (may take 1-2 minutes)
kubectl top nodes

# output
NAME                         CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)   
ip-10-0-1-65.ec2.internal    29m          1%       462Mi           32%         
ip-10-0-2-201.ec2.internal   50m          2%       466Mi           32%         
ip-10-0-3-164.ec2.internal   32m          1%       569Mi           39%

kubectl top pods -n video-app

3.3 Troubleshooting

# Check Metrics Server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# output

I1217 20:26:15.541402       1 configmap_cafile_content.go:205] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1217 20:26:15.541423       1 shared_informer.go:350] "Waiting for caches to sync" controller="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1217 20:26:15.541452       1 configmap_cafile_content.go:205] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1217 20:26:15.541469       1 shared_informer.go:350] "Waiting for caches to sync" controller="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1217 20:26:15.542411       1 secure_serving.go:211] Serving securely on [::]:10250
I1217 20:26:15.542544       1 dynamic_serving_content.go:135] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I1217 20:26:15.542657       1 tlsconfig.go:243] "Starting DynamicServingCertificateController"
I1217 20:26:15.642199       1 shared_informer.go:357] "Caches are synced" controller="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1217 20:26:15.642251       1 shared_informer.go:357] "Caches are synced" controller="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1217 20:26:15.642218       1 shared_informer.go:357] "Caches are synced" controller="RequestHeaderAuthRequestController"

# Common fix: Add --kubelet-insecure-tls flag
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

4. Create CloudFront Distribution

Amazon CloudFront is a Content Delivery Network (CDN) that caches and delivers content from edge locations worldwide.

Instead of every user request hitting your EKS cluster directly, CloudFront intercepts requests and serves cached content from the nearest edge location.

This is how the flow works without CloudFront, or on a cache miss:

User → Internet → EKS Load Balancer (us-east-1) → Pod
Latency: ~200–300ms

This is how the flow works with CloudFront cache hit:

User → CloudFront Edge → Cached response
Latency: ~20–50ms
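To get a feel for what those two paths mean on average, you can weight them by the cache hit ratio. This is a simplified sketch: the hit ratio and the midpoint latencies are illustrative picks from the ranges above, and it ignores the extra hop a cache miss takes through the edge.

```python
def expected_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Average latency when a fraction of requests is served from the edge."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * origin_ms


# Illustrative: 75% hits at ~35 ms, 25% misses going to the origin at ~250 ms.
print(expected_latency_ms(0.75, 35, 250))  # 88.75
```

The takeaway is that the hit ratio dominates: the same numbers at a 25% hit ratio give an average close to 200 ms.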

4.1 Terraform Configuration

Create file: environments/dev/cloudfront.tf

# =============================================================================
# CLOUDFRONT CDN DISTRIBUTION
# =============================================================================
# CloudFront delivers video content from edge locations worldwide,
# reducing latency and offloading 70-80% of requests from your EKS cluster.
# 
# NOTE: This requires the video-app Kubernetes service to be deployed first!
# =============================================================================

# -----------------------------------------------------------------------------
# Data: Get Load Balancer Hostname
# -----------------------------------------------------------------------------
data "kubernetes_service" "video_app" {
  metadata {
    name      = "video-app"
    namespace = "video-app"
  }

  depends_on = [module.eks]
}

# -----------------------------------------------------------------------------
# Local Variables
# -----------------------------------------------------------------------------
locals {
  # Check if LoadBalancer has been assigned
  lb_hostname = try(
    data.kubernetes_service.video_app.status[0].load_balancer[0].ingress[0].hostname,
    null
  )
  
  # Only create CloudFront if LoadBalancer exists
  create_cloudfront = local.lb_hostname != null
}

# -----------------------------------------------------------------------------
# CloudFront Distribution
# -----------------------------------------------------------------------------
resource "aws_cloudfront_distribution" "video_app" {
  count = local.create_cloudfront ? 1 : 0

  enabled             = true
  is_ipv6_enabled     = true
  comment             = "CDN for EKS Video App - ${var.environment}"
  default_root_object = ""
  price_class         = "PriceClass_100"  # US, Canada, Europe only (cost savings)

  # Origin: Your EKS Load Balancer
  origin {
    domain_name = local.lb_hostname
    origin_id   = "eks-video-app"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"  # LB is HTTP, CloudFront handles HTTPS
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  # Default cache behavior
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "eks-video-app"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600      # 1 hour
    max_ttl                = 86400     # 24 hours
    compress               = true
  }

  # Cache behavior for video files (longer cache)
  ordered_cache_behavior {
    path_pattern     = "/videos/*"
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "eks-video-app"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 86400     # 24 hours for videos
    max_ttl                = 604800    # 7 days
    compress               = false     # Videos are already compressed
  }

  # Cache behavior for API (no cache)
  ordered_cache_behavior {
    path_pattern     = "/api/*"
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "eks-video-app"

    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 0         # No caching for API
    max_ttl                = 0
  }

  # Geo restrictions (none)
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  # SSL certificate (default CloudFront cert)
  viewer_certificate {
    cloudfront_default_certificate = true
  }

  tags = local.tags
}

# -----------------------------------------------------------------------------
# Outputs
# -----------------------------------------------------------------------------
output "cloudfront_distribution_id" {
  description = "CloudFront distribution ID"
  value       = try(aws_cloudfront_distribution.video_app[0].id, null)
}

output "cloudfront_domain_name" {
  description = "CloudFront domain name (use this URL!)"
  value       = try(aws_cloudfront_distribution.video_app[0].domain_name, null)
}

output "cloudfront_url" {
  description = "Full CloudFront URL"
  value       = local.create_cloudfront ? "https://${aws_cloudfront_distribution.video_app[0].domain_name}" : "Deploy video-app service first, then re-run terraform apply"
}

output "loadbalancer_hostname" {
  description = "LoadBalancer hostname (if available)"
  value       = local.lb_hostname
}

4.2 Apply CloudFront

cd environments/dev
terraform plan
terraform apply

# Get CloudFront URL
CF_URL=$(terraform output -raw cloudfront_url)
echo "CloudFront URL: $CF_URL"

# output
CloudFront URL: https://xxxxxxxxxxx.cloudfront.net

# Test via CloudFront (may take 5-10 min to deploy)
curl -I $CF_URL/health

#output
HTTP/2 200 
content-type: application/json; charset=utf-8
content-length: 122
x-powered-by: Express
etag: W/"7a-yQMmSaT6LIG0mMdnzDkwYGgreec"
date: Wed, 17 Dec 2025 20:44:45 GMT
x-cache: Miss from cloudfront
via: 1.1 xxxxxxxxx.cloudfront.net (CloudFront)
x-amz-cf-pop: DEN53-P5
x-amz-cf-id: 6K4QNRkctTMnl3YZrKDtCl8wzcqlBr5XHX9ucSXtjnGBuZ4u4sMrkg==

Test with the CloudFront url that you get.

Estimated savings on latency and requests:

Video latency in the US should decrease by ~50%, and in the EU probably by more.
Bandwidth costs and origin requests should also decrease.

5. Configure HPA (Horizontal Pod Autoscaler)

5.1 Update Deployment Resource Requests

First, ensure your deployment has proper resource requests (HPA needs these):

Update k8s/deployment.yaml resources section:

resources:
  requests:
    cpu: "100m"      # 0.1 CPU - HPA uses this for calculations
    memory: "128Mi"
  limits:
    cpu: "500m"      # 0.5 CPU max
    memory: "256Mi"

Apply IF changed (you may have the above correct already):

kubectl apply -f k8s/deployment.yaml

5.2 Create HPA Manifest

Create file: k8s/hpa.yaml

# =============================================================================
# HORIZONTAL POD AUTOSCALER (HPA)
# =============================================================================
# Automatically scales pods based on CPU utilization.
#
# ECS Comparison:
# - HPA = ECS Service Auto Scaling with Target Tracking
# - Both scale based on metrics (CPU, memory, custom)
# - Key difference: HPA is declarative YAML (GitOps-friendly)
# =============================================================================
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: video-app-hpa
  namespace: video-app
  labels:
    app: video-app
spec:
  # ---------------------------------------------------------------------------
  # Target Deployment
  # ---------------------------------------------------------------------------
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: video-app

  # ---------------------------------------------------------------------------
  # Scaling Bounds
  # ---------------------------------------------------------------------------
  minReplicas: 3   # Match Part 3 baseline (HA)
  maxReplicas: 10  # Allow 3x scaling for traffic spikes

  # ---------------------------------------------------------------------------
  # Scaling Metrics
  # ---------------------------------------------------------------------------
  metrics:
    # Scale based on CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Target: average CPU at 70% of the pods' CPU request

  # ---------------------------------------------------------------------------
  # Scaling Behavior (optional, fine-tuning)
  # ---------------------------------------------------------------------------
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # Scale up immediately
      policies:
        - type: Percent
          value: 100                   # Double pods if needed
          periodSeconds: 15
        - type: Pods
          value: 4                     # Or add up to 4 pods
          periodSeconds: 15
      selectPolicy: Max                # Use whichever adds more pods

    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 50                    # Remove up to 50% of pods
          periodSeconds: 60

Automatic pod scaling based on CPU load — Monitors your video-app and automatically adds/removes pods when average CPU crosses 70%, keeping performance consistent during traffic spikes.
Maintains 3–10 pod range — Minimum 3 pods (one per AZ for HA), maximum 10 pods during peak traffic, providing 3x capacity increase while preventing runaway scaling.
Aggressive scale-up for fast response — When CPU hits 70%, immediately doubles capacity or adds up to 4 pods in 15 seconds, preventing overwhelm during sudden viral traffic.
Conservative scale-down prevents flapping — Waits 5 minutes before removing pods and only removes 50% max at a time, avoiding wasteful yo-yo scaling that disrupts service.
Demonstrates node capacity limits — HPA scales pods 3 → 6 → 9 → 10, but around pod 8–9 you’ll hit capacity on your 3 t3.small nodes, causing “Pending” pods — this proves why you need Karpenter to increase nodes (Part 5).
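The scaling decision itself boils down to one formula from the Kubernetes docs: desired = ceil(current × currentMetric / targetMetric). Here is a simplified sketch of it together with the scale-up policy from the manifest above. It ignores the stabilization window, pod readiness, and missing metrics, and the 117% reading is just an illustrative value.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Core HPA calculation: ceil(current * currentMetric / targetMetric)."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target, no change
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


def scale_up_ceiling(current: int, percent: int, pods: int) -> int:
    """Most replicas allowed after one period with selectPolicy: Max."""
    by_percent = math.ceil(current * (1 + percent / 100))
    by_pods = current + pods
    return max(by_percent, by_pods)


# Illustrative reading: 3 replicas averaging 117% CPU against the 70% target.
wanted = desired_replicas(3, 117, 70, min_replicas=3, max_replicas=10)
allowed = scale_up_ceiling(3, percent=100, pods=4)
print(min(wanted, allowed))  # 6
```

Note that at idle (say 1% CPU) the formula asks for a single replica, and it is minReplicas that keeps us at 3.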

5.3 Apply HPA

# Apply HPA
kubectl apply -f k8s/hpa.yaml

# output 
horizontalpodautoscaler.autoscaling/video-app-hpa created

# Verify HPA is created
kubectl get hpa -n video-app

# output
NAME            REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
video-app-hpa   Deployment/video-app   cpu: 1%/70%   3         10        3          12m

# Watch HPA status
kubectl get hpa video-app-hpa -n video-app --watch

6. Load Testing & Observe Scaling

I switched to a new terminal window. Remember to run the following every time you open a new terminal, in case you later want to run Terraform or AWS CLI commands:

export AWS_PROFILE=terraform-eks-admin
aws sts get-caller-identity

6.1 Install Load Testing Tool — k6

I originally planned to use the hey tool for this, but it is being deprecated, and k6 is more modern, actively updated, and maintained by Grafana Labs.

# macOS
brew install k6

# Linux
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

# Verify
k6 version

# output
k6 v1.4.2 (commit/devel, go1.25.4,

6.2 Get Load Balancer URL

LB_URL=$(kubectl get service video-app -n video-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Load Balancer URL: http://$LB_URL"

6.3 Open Multiple Terminals

Terminal 1 — HPA Watch:

kubectl get hpa video-app-hpa -n video-app --watch

Terminal 2 — Pod Watch:

kubectl get pods -n video-app --watch

# output
$ kubectl get pods -n video-app --watch
NAME                         READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-g82n9   1/1     Running   0          31m
video-app-6498b5dd57-njqcr   1/1     Running   0          31m
video-app-6498b5dd57-v54wx   1/1     Running   0          31m

Terminal 3 — Resource Usage:

brew install watch # macOS

exec $SHELL

watch -n 2 'kubectl top pods -n video-app'

Terminal 4 — Create Load script and Generate Load:

# I create a script in this directory
k8s/load/load-test.js

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Light load (test first)
  // vus: 10,
  // duration: '30s',

  // Heavy load (trigger scaling)
  vus: 100,
  duration: '5m',
};

export default function () {
  const res = http.get(`${__ENV.LB_URL}/api/info`);
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(0.1);
}
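To get a feel for how much traffic a given VU count produces, here is a rough estimate for a closed-loop test like the script above, where each virtual user sends a request, waits for the response, sleeps, and repeats. The function name and the 100 ms average response time are assumptions for illustration. Take the real figure from the http_req_duration line in your own k6 summary.

```javascript
// Rough upper bound on requests per second for a closed-loop k6 test.
// Each VU loops: request -> wait for response -> sleep -> repeat.
// avgResponseMs is a hypothetical input; measure it from your own run.
function estimateRequestsPerSecond(vus, sleepMs, avgResponseMs) {
  const iterationMs = sleepMs + avgResponseMs;
  if (vus <= 0 || iterationMs <= 0) {
    throw new RangeError('vus and iteration time must be positive');
  }
  return (vus * 1000) / iterationMs;
}

// The script sleeps 0.1s (100 ms) per iteration. Assuming ~100 ms responses:
console.log(estimateRequestsPerSecond(10, 100, 100));  // light load: 50
console.log(estimateRequestsPerSecond(100, 100, 100)); // heavy load: 500
```

As the pods saturate and responses slow down, the achieved request rate drops even though the VU count stays the same, so treat these numbers as a ceiling.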

Light load

Execute load script

# Set your load balancer URL
export LB_URL=http://$(kubectl get svc video-app -n video-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Light load first (test)
k6 run --vus 10 --duration 30s -e LB_URL=$LB_URL k8s/load/load-test.js

Results

Below is the load test execution, watching the pods and the load results.

Moderate Load

# Moderate load (trigger scaling)
k6 run --vus 100 --duration 5m -e LB_URL=$LB_URL k8s/load/load-test.js

Output (this run did not hit the scaling wall, but it does show HPA working as intended):

video-app-hpa   Deployment/video-app   cpu: 117%/70%   3         10        3          36m
video-app-hpa   Deployment/video-app   cpu: 109%/70%   3         10        6          37m
video-app-hpa   Deployment/video-app   cpu: 60%/70%    3         10        6          37m
video-app-hpa   Deployment/video-app   cpu: 53%/70%    3         10        6          37m
video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          37m
video-app-hpa   Deployment/video-app   cpu: 49%/70%    3         10        6          38m
video-app-hpa   Deployment/video-app   cpu: 48%/70%    3         10        6          38m
video-app-hpa   Deployment/video-app   cpu: 52%/70%    3         10        6          38m
video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          38m
video-app-hpa   Deployment/video-app   cpu: 53%/70%    3         10        6          39m
video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          39m
video-app-hpa   Deployment/video-app   cpu: 54%/70%    3         10        6          39m
video-app-hpa   Deployment/video-app   cpu: 51%/70%    3         10        6          39m
video-app-hpa   Deployment/video-app   cpu: 49%/70%    3         10        6          40m
video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          40m
video-app-hpa   Deployment/video-app   cpu: 52%/70%    3         10        6          40m
video-app-hpa   Deployment/video-app   cpu: 51%/70%    3         10        6          40m
video-app-hpa   Deployment/video-app   cpu: 49%/70%    3         10        6          41m
video-app-hpa   Deployment/video-app   cpu: 7%/70%     3         10        6          41m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        6          41m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        6          42m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        5          42m

Results:

Scaling up and down.

Heavier Load

Let’s try one more time so we hit the scaling wall:

k6 run --vus 300 --duration 3m -e LB_URL=$LB_URL k8s/load/load-test.js

Results

kubectl get hpa video-app-hpa -n video-app --watch

NAME            REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
video-app-hpa   Deployment/video-app   cpu: 1%/70%   3         10        8          59m
video-app-hpa   Deployment/video-app   cpu: 45%/70%   3         10        8          60m
video-app-hpa   Deployment/video-app   cpu: 72%/70%   3         10        8          60m
video-app-hpa   Deployment/video-app   cpu: 102%/70%   3         10        8          61m
video-app-hpa   Deployment/video-app   cpu: 106%/70%   3         10        10         61m
video-app-hpa   Deployment/video-app   cpu: 101%/70%   3         10        10         61m
video-app-hpa   Deployment/video-app   cpu: 88%/70%    3         10        10         61m
video-app-hpa   Deployment/video-app   cpu: 86%/70%    3         10        10         62m
video-app-hpa   Deployment/video-app   cpu: 88%/70%    3         10        10         62m
video-app-hpa   Deployment/video-app   cpu: 84%/70%    3         10        10         62m
video-app-hpa   Deployment/video-app   cpu: 88%/70%    3         10        10         63m
video-app-hpa   Deployment/video-app   cpu: 85%/70%    3         10        10         63m
video-app-hpa   Deployment/video-app   cpu: 92%/70%    3         10        10         63m
video-app-hpa   Deployment/video-app   cpu: 87%/70%    3         10        10         63m
video-app-hpa   Deployment/video-app   cpu: 86%/70%    3         10        10         64m
video-app-hpa   Deployment/video-app   cpu: 84%/70%    3         10        10         64m

And

kubectl get pods -n video-app --watch
NAME                         READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-2jzr5   1/1     Running   0          5m57s
video-app-6498b5dd57-2n9dc   1/1     Running   0          5m43s
video-app-6498b5dd57-9w8bv   1/1     Running   0          5m58s
video-app-6498b5dd57-g82n9   1/1     Running   0          71m
video-app-6498b5dd57-njqcr   1/1     Running   0          71m
video-app-6498b5dd57-qqcqj   1/1     Running   0          2m42s
video-app-6498b5dd57-v54wx   1/1     Running   0          71m
video-app-6498b5dd57-xl28k   1/1     Running   0          5m57s
video-app-6498b5dd57-bfmf6   0/1     Pending   0          0s
video-app-6498b5dd57-bfmf6   0/1     Pending   0          0s
video-app-6498b5dd57-xfsf7   0/1     Pending   0          1s
video-app-6498b5dd57-bfmf6   0/1     ContainerCreating   0          1s
video-app-6498b5dd57-xfsf7   0/1     Pending             0          1s
video-app-6498b5dd57-xfsf7   0/1     Running             0          2s
video-app-6498b5dd57-bfmf6   0/1     Running             0          2s
video-app-6498b5dd57-xfsf7   1/1     Running             0          8s
video-app-6498b5dd57-bfmf6   1/1     Running             0          8s

When HPA tries to scale to 10 pods, you’ll see the new pods sit in the “Pending” state for a short time before they are scheduled. Even once all 10 are running, the deployment can still be over its CPU target (visible in the HPA output above), which means the system is overloaded.

What does this mean?

✅ HPA hit maximum capacity: Scaled to 10 pods (the configured max)
⚠️ CPU still high at 84–92%: Even with 10 pods, system is struggling.
⚠️ Pods were briefly Pending: Pods 9 and 10 showed Pending for a few seconds.
✅ Eventually scheduled: But only because we barely had enough capacity.

One more time — at a higher load!

k6 run --vus 400 --duration 3m -e LB_URL=$LB_URL k8s/load/load-test.js

Let’s see if we can make it stall out.

Check why a pod is stalled

kubectl describe pod video-app-6498b5dd57-xfsf7 -n video-app

Example:

# Check why pod is pending
kubectl describe pod video-app-6498b5dd57-xfsf7 -n video-app

# Look for Events section:
Events:
  Type     Reason            Age   Message
  ----     ------            ----  -------
  Warning  FailedScheduling  30s   0/3 nodes are available:
                                   3 Insufficient cpu,
                                   3 Insufficient memory

Also look at allocatable vs used

# See allocatable vs used
kubectl describe nodes | grep -A 5 "Allocated resources"

# Example output:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                1800m (90%) 4500m (225%)
  memory             1536Mi (79%) 2304Mi (119%)
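The scheduler places pods based on resource requests, and a pod has to fit on every resource dimension at once. The following is a minimal sketch of that check for a single node. The node figures are loosely derived from the example output above (1800m at 90%, 1536Mi at 79%), and the pod request sizes (250m CPU, 256Mi memory) are hypothetical, so substitute the values from your own Deployment.

```javascript
// How many MORE pods of a given size fit on one node?
// Scheduling is based on REQUESTS (not limits, not live usage).
function additionalPodsThatFit(node, podRequest) {
  const byCpu = Math.floor(
    (node.allocatableCpuM - node.requestedCpuM) / podRequest.cpuM
  );
  const byMem = Math.floor(
    (node.allocatableMemMi - node.requestedMemMi) / podRequest.memMi
  );
  // The tighter of the two resources decides; never report a negative count.
  return Math.max(0, Math.min(byCpu, byMem));
}

// Hypothetical pod size; check your Deployment's resources.requests.
const pod = { cpuM: 250, memMi: 256 };

// Node that is nearly full (approximated from the example output).
const fullNode = {
  allocatableCpuM: 2000, requestedCpuM: 1800,
  allocatableMemMi: 1944, requestedMemMi: 1536,
};

// Node with more headroom (made-up numbers for comparison).
const emptierNode = {
  allocatableCpuM: 2000, requestedCpuM: 1000,
  allocatableMemMi: 1944, requestedMemMi: 768,
};

console.log(additionalPodsThatFit(fullNode, pod));    // 0: memory has room, CPU does not
console.log(additionalPodsThatFit(emptierNode, pod)); // 4
```

When every node reports 0, the next pod HPA creates stays Pending with a FailedScheduling event like the one shown earlier.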

The Breaking Point: When HPA isn’t enough

NAME            REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
video-app-hpa   Deployment/video-app   cpu: 1%/70%   3         10        5          70m
video-app-hpa   Deployment/video-app   cpu: 73%/70%   3         10        5          70m
video-app-hpa   Deployment/video-app   cpu: 102%/70%   3         10        5          71m
video-app-hpa   Deployment/video-app   cpu: 137%/70%   3         10        8          71m
video-app-hpa   Deployment/video-app   cpu: 163%/70%   3         10        10         71m
video-app-hpa   Deployment/video-app   cpu: 119%/70%   3         10        10         71m
video-app-hpa   Deployment/video-app   cpu: 113%/70%   3         10        10         72m
video-app-hpa   Deployment/video-app   cpu: 112%/70%   3         10        10         72m
video-app-hpa   Deployment/video-app   cpu: 109%/70%   3         10        10         72m
video-app-hpa   Deployment/video-app   cpu: 106%/70%   3         10        10         73m
video-app-hpa   Deployment/video-app   cpu: 101%/70%   3         10        10         73m
video-app-hpa   Deployment/video-app   cpu: 84%/70%    3         10        10         73m
video-app-hpa   Deployment/video-app   cpu: 24%/70%    3         10        10         73m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        10         74m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        10         78m
video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        5          78m

I ran one more load test, this time with 400 virtual users (the command above), to really push the system:

The Sequence:

1. CPU spiked from 73% → 102% → 137% → 163%

2. 🚨 HPA rapidly scaled: 5 → 8 → 10 pods

3. But CPU stayed above 100% for 3 full minutes

4. System maxed out with no way to scale further

What 163% CPU Means: The math is simple: 163% ÷ 70% = 2.33x overcapacity.

My system needed ~16 pods to properly handle the load.
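The ratio above is the same one HPA uses internally. Here is a minimal sketch of the replica calculation described in the Kubernetes documentation: desired = ceil(currentReplicas × currentMetric ÷ targetMetric), clamped to the min and max, with a default tolerance of 0.1 around the target. The function and parameter names are my own, and the sketch ignores the scale-up and scale-down rate policies.

```javascript
// Core HPA replica calculation (simplified sketch):
//   desired = ceil(currentReplicas * currentMetric / targetMetric)
// clamped to [min, max]. Within the tolerance band, nothing changes.
function desiredReplicas({ current, metric, target, min, max, tolerance = 0.1 }) {
  const ratio = metric / target;
  if (Math.abs(ratio - 1) <= tolerance) return current; // close enough to target
  const raw = Math.ceil(current * ratio);
  return Math.min(max, Math.max(min, raw));
}

// Settings from this article's HPA: 70% CPU target, 3 to 10 pods.
const hpa = { target: 70, min: 3, max: 10 };

console.log(desiredReplicas({ ...hpa, current: 3, metric: 117 }));  // 6: the 3 -> 6 jump in the moderate test
console.log(desiredReplicas({ ...hpa, current: 8, metric: 102 }));  // 10: raw value is 12, capped by the max
console.log(desiredReplicas({ ...hpa, current: 10, metric: 163 })); // 10: already at the cap
console.log(desiredReplicas({ ...hpa, max: 100, current: 10, metric: 163 })); // 24 with the cap lifted
```

The last line assumes the 163% reading was averaged over 10 pods and that the goal is the full 70% target. Simply getting back to 100% of the CPU request takes fewer pods (10 × 1.63 ≈ 16), so any single number here is an estimate that depends on when the metric was sampled.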

In the future if you need to save logs you can do:

kubectl get hpa -n video-app -o yaml > hpa-max-capacity-demo.yaml
kubectl top nodes > node-capacity-max.txt
kubectl get pods -n video-app -o wide > pods-during-spike.txt

Why are we doing this?

The bottleneck? Not enough nodes.

We got HPA going, and HPA scales pods, but we are limited by how many pods fit on each node.

You can manually increase node count in AWS Console or Terraform:

# Update terraform.tfvars
node_desired_size = 5
node_min_size = 3
node_max_size = 6

# Apply
terraform apply

But we want to do it automatically!

In Part 5, we’ll solve this with Karpenter, which will:

Detect pods that can’t be scheduled.
Provision new nodes in 30–60 seconds.
Allow HPA to scale to 15, 20, or even 30 pods.
Automatically remove nodes when traffic drops.
All without manual intervention.

Here is what we’ll work on:

Karpenter installation and configuration
NodePool definitions for Spot and On-Demand instances
Full autoscaling: HPA scales pods, Karpenter scales nodes
Cost optimization with Spot instances (70% savings)
Automatic node consolidation when load drops

8. Cleanup

8.1 Remove HPA and CloudFront

# Delete HPA
kubectl delete -f k8s/hpa.yaml

# Scale back to 3 replicas
kubectl scale deployment video-app -n video-app --replicas=3

8.2 Destroy All Infrastructure

cd environments/dev
terraform destroy

⚠️ Critical Reminder for Removing Resources (Avoid Extra Charges)

Double-check in AWS Console:

EKS cluster deleted.
EC2 instances terminated.
Load balancers removed (⚠️ these are sometimes left behind).
CloudFront distribution deleted (may take 15–30 min).
NAT Gateway deleted.
Internet Gateway deleted.
Anything else I’m not thinking of 😁

9. Conclusion and Looking Ahead

What we accomplished in this article:

What’s next?

Below is a tentative look at what is next!

Now we are starting to get more advanced!

1. Remove managed node groups (Karpenter replaces them)
2. Install Karpenter using Terraform and Helm
3. Create NodePool resource (defines provisioning rules)
4. Create EC2NodeClass (AMI, subnets, security groups)
5. Test rapid node provisioning (30–60 second spin-up)
6. Combined load test: HPA + Karpenter working together
7. Configure Spot instances for 70% cost savings
8. Observe automatic node consolidation

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next checklist that I put out!

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

🚀 My current project I am working on is SystemsArchitect.io (in Beta testing) which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. Check it out: https://systemsarchitect.io

SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Savings: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

Amazon EKS (K8s) Media Cluster: Part 3— Self-Healing Video Pods

Chris St. John — Wed, 17 Dec 2025 02:28:51 GMT

🚀 Amazon EKS + AWS ECR + Docker with self healing multi-AZ pods and kubectl diagnostics

✅ “I need to deploy my video app and Docker image on self-healing Amazon EKS pods with kubectl diagnostics”

In Part 3 of this Amazon EKS series, we deploy the video-serving part of the application: the app itself, its container, and the cluster resources that run it. We will also validate that the high availability measures work as expected across different availability zones, including when we lose one.

Review of the last 2 articles which you should do first (we build on them):

In Part 1, the first article of this series, we got many of the prerequisites done for our isolated IAM account, AWS CLI and Terraform.

✅ PART 1 Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap

Then, in Part 2, the last article, we focused on setting up the VPC and basic Kubernetes resources with Terraform and kubectl so we could connect to our cluster.

✅ PART 2 Amazon EKS (K8s) Media Cluster: Part 2 — Deploy Initial Terraform Multi-AZ EKS Cluster

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

Roadmap for this article:

1. Introduction: What we’re building, prerequisites (5m)
2. Rebuild Cluster: Quick terraform apply to restore (15–20m)
3. Create ECR Repository: Terraform for private container registry (5m)
4. Build the App: Node.js server + HTML5 video player (10m)
5. Docker Build & Push: Containerize and push to ECR (10m)
6. Kubernetes Manifests: Deployment + Service with HA (10m)
7. Deploy to EKS: kubectl apply, verify pods (5m)
8. Access the App: Open browser, watch video! (5m)
9. Explore & Verify HA: Test replicas, health checks (10m)
10. Cleanup: Destroy resources

Time estimate: 45m-60m (plus 15–20m if you need to rebuild the cluster)

Note: We are moving from more basic examples on AWS EKS and K8s to more advanced examples with each article.

This is a simplified video/player server example.
In a higher-volume scenario we would decouple the video serving, and we’ll do that in later articles! This one is a building block for future parts in the series.

1. Introduction: Goals, Roadmap, Costs

Goals:

ECR Repository: Private container registry for Node.js Docker image
Application: Simple web server in Node.js with HTML5 video player. We’ll create an image with Docker.
Kubernetes Deployment: Replicas spread across availability zones
LoadBalancer Service: Public URL to access the application
Destroy the resources at the end so we do not get charged a lot!

Prerequisites

Completed Articles Part 1 and 2
EKS cluster running (or ready to rebuild with terraform apply). Setup is covered in Part 2 of this series, and the first step below rebuilds it.
Docker installed locally (for building the image) — https://docs.docker.com/
Estimated budget of roughly $1–$2 or less for this session (may vary slightly) ⚠️ if you complete it within the expected 1–2 hours and destroy all AWS resources afterward.
⚠️ IMPORTANT: There will be ongoing charges if you do not remove AWS resources built here. Also charges are higher if you use a legacy version of k8s.
I have isolated my account (see series Part 1 setup) so it’s easy for me to track. As stated in earlier parts of this series use terraform destroy and make sure to double check that all resources were destroyed in

To set up cost guardrails and AWS Budgets alerts see my articles:

This is what we are aiming for here. We will also experiment with deleting a pod and making sure it’s healed (rebuilt).

Video player

Updated diagram

2. Rebuild the Cluster (If Destroyed)

If you destroyed your cluster after Article 2 (good job saving money!), let’s rebuild it:


# log in to the IAM user for the CLI

# if you cannot remember the name of your AWS CLI profile
aws configure list-profiles

# use your tutorial creds setup earlier (I used profile "terraform-eks-admin")
export AWS_PROFILE=terraform-eks-admin

# verify what account you are in - if you have issues, see the Part 1 article.
aws sts get-caller-identity

# output

{
    "UserId": "xxxxxxxxxxxx",
    "Account": "xxxxxxxxxxxxx",
    "Arn": "arn:aws:iam::xxxxxxxxxxxxx:user/terraform-eks-admin"
}

# cd from your root project dir.
cd environments/dev

# preview our build to make sure it did not change from before
terraform plan

# Rebuild infrastructure (~15-20 minutes)
terraform apply

Make sure you are in environments/dev before running terraform apply.
The apply takes about 15–20 minutes.

Once complete, reconnect kubectl (change region if yours is different):

aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster

# output
Updated context ...

Verify nodes are ready:

kubectl get nodes

# output
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-110.ec2.internal   Ready    <none>   10m   v1.34.2-eks-ecaa3a6
ip-10-0-2-6.ec2.internal     Ready    <none>   10m   v1.34.2-eks-ecaa3a6
ip-10-0-3-7.ec2.internal     Ready    <none>   10m   v1.34.2-eks-ecaa3a6

Perfect, let’s continue!

3. Create ECR Repository

Now we’ll add the ECR repository to our Terraform configuration.

Create new file: environments/dev/ecr.tf

# =============================================================================
# ELASTIC CONTAINER REGISTRY (ECR)
# =============================================================================
# ECR is AWS's private Docker registry. We'll store our video app image here.
#
# ECS Comparison:
# - ECR works exactly the same for both ECS and EKS
# - You push images to ECR, then reference them in task definitions (ECS)
#   or pod specs (EKS)
# =============================================================================

# -----------------------------------------------------------------------------
# ECR Repository for Video App
# -----------------------------------------------------------------------------
resource "aws_ecr_repository" "video_app" {
  name                 = "${var.project_name}-video-app"
  image_tag_mutability = "MUTABLE"  # Allows overwriting tags like "latest"

  # Scan images for vulnerabilities on push
  image_scanning_configuration {
    scan_on_push = true
  }

  # Encryption at rest
  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = local.tags
}

# -----------------------------------------------------------------------------
# ECR Lifecycle Policy
# -----------------------------------------------------------------------------
# Automatically clean up old images to save storage costs
resource "aws_ecr_lifecycle_policy" "video_app" {
  repository = aws_ecr_repository.video_app.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep only 10 most recent images"
        selection = {
          tagStatus   = "any"
          countType   = "imageCountMoreThan"
          countNumber = 10
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}

# -----------------------------------------------------------------------------
# Outputs
# -----------------------------------------------------------------------------
output "ecr_repository_url" {
  description = "URL of the ECR repository"
  value       = aws_ecr_repository.video_app.repository_url
}

output "ecr_repository_name" {
  description = "Name of the ECR repository"
  value       = aws_ecr_repository.video_app.name
}

output "ecr_login_command" {
  description = "Command to authenticate Docker with ECR"
  value       = "aws ecr get-login-password --region ${var.aws_region} | docker login --username AWS --password-stdin ${data.aws_caller_identity.current.account_id}.dkr.ecr.${var.aws_region}.amazonaws.com"
}

output "docker_build_push_commands" {
  description = "Commands to build and push the video app image"
  value       = <<-EOT

    # Navigate to app directory
    cd ~/eks-video-tutorial/app

    # Build the image
    docker build -t ${aws_ecr_repository.video_app.repository_url}:latest .

    # Login to ECR
    aws ecr get-login-password --region ${var.aws_region} | docker login --username AWS --password-stdin ${data.aws_caller_identity.current.account_id}.dkr.ecr.${var.aws_region}.amazonaws.com

    # Push the image
    docker push ${aws_ecr_repository.video_app.repository_url}:latest

  EOT
}

ECR is AWS’s private Docker image registry (like Docker Hub, but a private one for your images)
Stores container images that ECS/EKS pull when running containers
scan_on_push = auto-scans images for security vulnerabilities
Lifecycle: Auto-deletes old images to save storage costs
Keeps only the 10 most recent images; older ones expire
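To illustrate what the lifecycle rule does, here is a small simulation of the imageCountMoreThan selection: sort images by push time, keep the newest N, and expire the rest. This is only a sketch of the rule's effect. The image objects, tags, and function name are hypothetical, and ECR applies the real policy on its own schedule.

```javascript
// Sketch of the "imageCountMoreThan" rule: keep the N most recently
// pushed images and expire the rest. Image data here is made up.
function imagesToExpire(images, keepCount = 10) {
  return [...images]                          // copy so the input is not mutated
    .sort((a, b) => b.pushedAt - a.pushedAt)  // newest first
    .slice(keepCount)                         // everything past the first N
    .map((img) => img.tag);
}

// 12 fake images: v1 was pushed first, v12 was pushed last.
const images = Array.from({ length: 12 }, (_, i) => ({
  tag: `v${i + 1}`,
  pushedAt: new Date(2025, 0, i + 1).getTime(),
}));

console.log(imagesToExpire(images)); // [ 'v2', 'v1' ]
```

With tagStatus set to "any", untagged images count toward the limit as well, so frequent pushes of "latest" can push older tagged builds out sooner than you might expect.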

Typical workflow of ECR

Build Docker image locally
Authenticate Docker to ECR (login command)
Push image to ECR
ECS/EKS pulls image from ECR URL when deploying

Apply to create ECR repository:

cd environments/dev
terraform plan

# output
Plan: 2 to add, 0 to change, 0 to destroy.

# apply the changes - yes
terraform apply

# output
ecr_repository_url = ...
ecr_repository_name = "eks-video-tutorial-video-app"

After you run the apply, you should see Terraform output in your console like the following (it will vary based on your account numbers and IDs).

⚠️ Keep this text. We will need it later.

🚨⚠️ CRITICAL: Remember to run terraform destroy to delete these resources when you are done, or you will be charged. This could accrue to $15–20 per day in charges.


# Navigate to app directory
cd ~/eks-video-tutorial/app

# Build the image
docker build -t xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest .

# Login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com

# Push the image
docker push xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest


EOT
ecr_login_command = "aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com"
ecr_repository_name = "eks-video-tutorial-video-app"
ecr_repository_url = "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app"
get_nodes_command = "kubectl get nodes -o wide"
get_pods_command = "kubectl get pods -A"
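The repository URL in this output follows a fixed pattern: account ID, then dkr.ecr, then region, then amazonaws.com, followed by the repository name. If you script your builds, a small helper like the one below can assemble the full image reference. It is a hypothetical helper for illustration only, and the account ID used is a placeholder.

```javascript
// Build the ECR image URI used by `docker build -t` and `docker push`.
// Hypothetical helper; not part of the tutorial's code.
function ecrImageUri({ accountId, region, repository, tag = 'latest' }) {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error('accountId must be a 12-digit AWS account ID');
  }
  return `${accountId}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}

// 123456789012 is a placeholder account ID, not a real one.
console.log(
  ecrImageUri({
    accountId: '123456789012',
    region: 'us-east-1',
    repository: 'eks-video-tutorial-video-app',
  })
);
// 123456789012.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest
```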

4. Build the Node.js Application

Let’s build the Node.js app for the video streaming player.

This app will be a simple Express server that serves video content with an HTML5 player, and it will include health checks, static file serving, and basic streaming.

# from root directory for your project
mkdir -p app/public/videos
cd app

4.1 Package.json

Create app/package.json

{
  "name": "eks-video-app",
  "version": "1.0.0",
  "description": "Simple video streaming app for EKS tutorial",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.21.0"
  },
  "engines": {
    "node": ">=20.0.0"
  }
}

4.2 Server.js (Main Application)

Create file: app/server.js

// =============================================================================
// EKS VIDEO STREAMING APP
// =============================================================================
// A simple Express server that serves video content with an HTML5 player.
// Demonstrates: health checks, static file serving, and basic streaming.
// =============================================================================

const express = require('express');
const path = require('path');
const os = require('os');

const app = express();
const PORT = process.env.PORT || 3000;

// -----------------------------------------------------------------------------
// Middleware
// -----------------------------------------------------------------------------

// Serve static files from 'public' directory
app.use(express.static(path.join(__dirname, 'public')));

// Request logging
app.use((req, res, next) => {
  const timestamp = new Date().toISOString();
  console.log(`[${timestamp}] ${req.method} ${req.path} - ${req.ip}`);
  next();
});

// -----------------------------------------------------------------------------
// Health Check Endpoints
// -----------------------------------------------------------------------------
// These are CRITICAL for Kubernetes!
// - /health: Used by both liveness and readiness probes
// - /ready: Could be used for more complex readiness logic

// Basic health check - Kubernetes uses this to know if the pod is alive
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    hostname: os.hostname(),
    uptime: process.uptime()
  });
});

// Readiness check - could include dependency checks in production
app.get('/ready', (req, res) => {
  // In production, you might check:
  // - Database connectivity
  // - External service availability
  // - Required files exist
  res.status(200).json({
    status: 'ready',
    timestamp: new Date().toISOString()
  });
});

// -----------------------------------------------------------------------------
// API Endpoints
// -----------------------------------------------------------------------------

// Server info endpoint - useful for debugging which pod you're hitting
app.get('/api/info', (req, res) => {
  res.json({
    app: 'eks-video-app',
    version: '1.0.0',
    hostname: os.hostname(),
    platform: os.platform(),
    nodeVersion: process.version,
    environment: process.env.NODE_ENV || 'development',
    podName: process.env.POD_NAME || os.hostname(),
    nodeName: process.env.NODE_NAME || 'unknown',
    timestamp: new Date().toISOString()
  });
});

// List available videos
app.get('/api/videos', (req, res) => {
  res.json({
    videos: [
      {
        id: 'sample',
        title: 'Sample Video',
        description: 'A sample video for testing the EKS video streaming platform',
        url: '/videos/sample.mp4',
        thumbnail: '/images/thumbnail.png'
      }
    ]
  });
});

// -----------------------------------------------------------------------------
// Main Page
// -----------------------------------------------------------------------------

app.get('/', (req, res) => {
  res.sendFile(path.join(__dirname, 'public', 'index.html'));
});

// -----------------------------------------------------------------------------
// Error Handling
// -----------------------------------------------------------------------------

// 404 handler
app.use((req, res) => {
  res.status(404).json({
    error: 'Not Found',
    path: req.path
  });
});

// Error handler
app.use((err, req, res, next) => {
  console.error('Error:', err);
  res.status(500).json({
    error: 'Internal Server Error',
    message: err.message
  });
});

// -----------------------------------------------------------------------------
// Start Server
// -----------------------------------------------------------------------------

// Keep a reference to the server so we can close it on shutdown
const server = app.listen(PORT, '0.0.0.0', () => {
  console.log('='.repeat(60));
  console.log('EKS VIDEO STREAMING APP');
  console.log('='.repeat(60));
  console.log(`Server running on port ${PORT}`);
  console.log(`Hostname: ${os.hostname()}`);
  console.log(`Node.js: ${process.version}`);
  console.log(`Started: ${new Date().toISOString()}`);
  console.log('='.repeat(60));
  console.log('Endpoints:');
  console.log(`  - GET /          : Video player UI`);
  console.log(`  - GET /health    : Health check (liveness)`);
  console.log(`  - GET /ready     : Readiness check`);
  console.log(`  - GET /api/info  : Server info`);
  console.log(`  - GET /api/videos: List videos`);
  console.log('='.repeat(60));
});

// Graceful shutdown: stop accepting new connections, let in-flight
// requests finish, then exit. Force an exit if that takes too long.
function shutdown(signal) {
  console.log(`${signal} received, shutting down gracefully...`);
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 10000).unref();
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
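The /ready handler above always returns 200. As a minimal sketch of the "Required files exist" idea mentioned in its comments, the helper below reports 503 when something the app needs is missing, which would make Kubernetes stop routing traffic to that pod until it recovers. The function name, the file list, and the wiring shown in the comments are assumptions for illustration, not part of the tutorial's server.js.

```javascript
const fs = require('fs');

// Returns the HTTP status and JSON body a /ready handler could send.
// `exists` is injectable so the check can be exercised without touching disk.
function readinessStatus(requiredFiles, exists = fs.existsSync) {
  const missing = requiredFiles.filter((file) => !exists(file));
  const ready = missing.length === 0;
  return {
    status: ready ? 200 : 503,
    body: {
      status: ready ? 'ready' : 'not ready',
      missing,
      timestamp: new Date().toISOString(),
    },
  };
}

// Hypothetical wiring inside server.js:
//   const required = [path.join(__dirname, 'public', 'index.html')];
//   app.get('/ready', (req, res) => {
//     const result = readinessStatus(required);
//     res.status(result.status).json(result.body);
//   });

// Stubbed example: pretend only index.html exists.
const fakeExists = (file) => file.endsWith('index.html');

console.log(readinessStatus(['public/index.html'], fakeExists).status); // 200
console.log(
  readinessStatus(['public/index.html', 'public/videos/sample.mp4'], fakeExists).status
); // 503
```

Keeping /health simple and putting dependency checks only in /ready avoids restarting a pod that is alive but temporarily unable to serve.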

4.3 HTML Video Player

Create file: app/public/index.html




    
    
EKS Video Streaming Platform

🎬 EKS Video Streaming
A Kubernetes-powered video platform running on Amazon EKS
Article 3 - First Deployment

Sample Video HA Enabled
Your browser does not support the video tag.

🖥️ Server Info (Pod Details)
Loading server info...

🔷 High Availability
This app runs with 3 replicas spread across multiple availability zones. Refresh the page multiple times and watch the hostname change - you're being load balanced between pods!

❤️ Health Checks
Kubernetes monitors this app using liveness and readiness probes. If this pod becomes unhealthy, Kubernetes automatically restarts it or stops sending traffic.

📊 ECS Comparison
In ECS, you'd use a Task Definition + Service. Here, we use a Deployment (combines both concepts) and a Service for load balancing. Same outcome, K8s vocabulary!

Built with ❤️ for the EKS Video Tutorial Series
Running on Amazon EKS • Node.js • Kubernetes

4.4 Download a Sample Video

Go to the public video directory we made and download a sample video. This URL works at the time of writing.

Source: https://samplelib.com/sample-mp4.html, which states: “Below are sample videos available for download with no license restrictions.” You can also use one of your own videos.

cd app/public/videos

# download sample video or use your own
curl -L -o sample.mp4 "https://download.samplelib.com/mp4/sample-5s.mp4"

You may also need a placeholder image:

# run from the project root
mkdir -p app/public/images

# I created an image with https://placehold.co/600x400 and saved it as:
# app/public/images/placeholder.png

5. Dockerfile and Build

Now we will do a multi-stage Docker build for a smaller, more secure image.

Create file: app/Dockerfile

# =============================================================================
# DOCKERFILE - EKS Video Streaming App
# =============================================================================
# Multi-stage build for a smaller, more secure image
# =============================================================================

# -----------------------------------------------------------------------------
# Stage 1: Dependencies
# -----------------------------------------------------------------------------
FROM node:22-alpine AS deps

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install production dependencies only
RUN npm ci --omit=dev

# -----------------------------------------------------------------------------
# Stage 2: Production
# -----------------------------------------------------------------------------
FROM node:22-alpine AS production

# Add labels for image metadata
LABEL maintainer="EKS Tutorial"
LABEL description="Video streaming app for EKS tutorial"
LABEL version="1.0.0"

# Create non-root user for security
RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup

WORKDIR /app

# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy application code
COPY --chown=appuser:appgroup . .

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Start the application
CMD ["node", "server.js"]

Key points:

Multi-stage build: uses two stages (deps → production) to keep the final image small by copying only what's needed
Alpine base: uses node:22-alpine. Alpine itself is only about 5MB, so the Node image built on it is far smaller than the full node image (~900MB)
Dependency isolation: stage 1 installs only production dependencies (RUN npm ci --omit=dev), excluding dev dependencies
Non-root user: creates appuser instead of running as root, which limits the damage if the container is compromised
Health check: Docker calls /health every 30s and marks the container unhealthy after 3 failed attempts. Kubernetes ignores the Dockerfile HEALTHCHECK and relies on the liveness and readiness probes defined in the Deployment below
.dockerignore: excludes unnecessary files (node_modules, .git, docs) from the build context, making builds faster and images smaller
EXPOSE 3000 + CMD: documents the port and starts the Node.js server

5.2 Dockerignore

We also need a .dockerignore file at:

app/.dockerignore

node_modules
npm-debug.log
Dockerfile
.dockerignore
.git
.gitignore
README.md
.env
*.md

5.3 Build and Push to ECR

# from the project root
cd environments/dev

# get ecr url

ECR_URL=$(terraform output -raw ecr_repository_url)
echo "ECR URL: $ECR_URL"

# Output:
ECR URL: xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app

cd ../../app/

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_URL

⚠️ Note: At this point I got an error saying my Docker client was slightly outdated. It was being confused with another version I had installed previously, so I removed the old one and installed the latest Docker.

docker --version

# output
Docker version 24.0.7

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_URL

# output
Login Succeeded

# Build the image (from app/)
docker build -t $ECR_URL:latest -t $ECR_URL:v1.0.0 .

# ONLY if the build fails (for example, npm ci needs a package-lock.json): run this from app/, then rebuild
npm i

# Push the image
docker push $ECR_URL:latest
docker push $ECR_URL:v1.0.0

# Expected build output (abridged)
[+] Building 15.3s (12/12) FINISHED
 => [internal] load build definition from Dockerfile
 => [internal] load .dockerignore
 => [deps 1/3] FROM docker.io/library/node:22-alpine
 ...

This should have pushed your image to ECR and now we can use it with our AWS EKS cluster 🚀


6. Kubernetes Manifests

Now we need to create the Kubernetes manifests. Make a k8s directory at the top level of the project (the same level as the app directory).

# make this k8s dir at the top level of our app

mkdir -p k8s
cd k8s

6.1 Namespace

Create file: k8s/namespace.yaml

# =============================================================================
# NAMESPACE
# =============================================================================
# Namespaces provide isolation and organization for Kubernetes resources.
# Similar to ECS clusters providing logical grouping.
# =============================================================================
apiVersion: v1
kind: Namespace
metadata:
  name: video-app
  labels:
    app: video-app
    environment: dev
    project: eks-video-tutorial

6.2 Namespace Resource Management (Optional)

Namespaces provide isolation, but without limits, one application can consume all cluster resources. ResourceQuota and LimitRange prevent this.

I missed this subtopic when I first wrote the article and added it while reviewing, so treat it as purely optional.
For this part of the tutorial, ResourceQuota and LimitRange are optional: our video-app Deployment already specifies its own resource requests and limits. However, there are team and collaboration benefits to setting them at the namespace level.

Problems to solve:

Pod w/o limits consumes all node memory (LimitRange sets defaults)
One namespace monopolizes cluster (ResourceQuota caps total usage)
Container OOMKilled unexpectedly (LimitRange enforces min memory)
Cost runaway from too many pods (ResourceQuota limits pod count)

ResourceQuota

Limits total resources a namespace can consume. Think of it as a budget for the namespace.

Create file: k8s/resourcequota.yaml

# =============================================================================
# RESOURCEQUOTA
# =============================================================================
# Sets hard limits on total resources consumed by all pods in a namespace.
#
# ECS Comparison:
# - Similar to Service Quotas or capacity provider limits
# - Prevents one service from monopolizing cluster capacity
# =============================================================================
apiVersion: v1
kind: ResourceQuota
metadata:
  name: video-app-quota
  namespace: video-app
spec:
  hard:
    # Compute limits
    requests.cpu: "2"           # Total CPU requests across all pods
    requests.memory: 2Gi        # Total memory requests
    limits.cpu: "4"             # Total CPU limits
    limits.memory: 4Gi          # Total memory limits

    # Object count limits
    pods: "20"                  # Max pods in namespace
    services: "5"               # Max services
    persistentvolumeclaims: "5" # Max PVCs
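
To see which quota line binds first, the arithmetic can be worked through with the per-pod requests and limits from the Deployment in section 6.3 (100m/128Mi requested, 500m/256Mi limit). The sketch below is illustrative only: `cpuToMillicores`, `memoryToMiB`, and `maxPodsUnderQuota` are hypothetical helpers, and the parsers handle only the quantity formats that appear in this article.

```javascript
// Convert a Kubernetes CPU quantity ("500m" or "2") to millicores.
function cpuToMillicores(quantity) {
  const s = String(quantity);
  return s.endsWith('m') ? Number(s.slice(0, -1)) : Number(s) * 1000;
}

// Convert a memory quantity to MiB. Only the Mi and Gi suffixes are handled.
function memoryToMiB(quantity) {
  const s = String(quantity);
  if (s.endsWith('Gi')) return Number(s.slice(0, -2)) * 1024;
  if (s.endsWith('Mi')) return Number(s.slice(0, -2));
  throw new Error(`unsupported memory quantity: ${s}`);
}

// For each quota line, how many identical pods fit? The smallest answer wins.
function maxPodsUnderQuota(hard, pod) {
  const fits = {
    'requests.cpu': Math.floor(cpuToMillicores(hard['requests.cpu']) / cpuToMillicores(pod.requests.cpu)),
    'requests.memory': Math.floor(memoryToMiB(hard['requests.memory']) / memoryToMiB(pod.requests.memory)),
    'limits.cpu': Math.floor(cpuToMillicores(hard['limits.cpu']) / cpuToMillicores(pod.limits.cpu)),
    'limits.memory': Math.floor(memoryToMiB(hard['limits.memory']) / memoryToMiB(pod.limits.memory)),
    pods: Number(hard.pods),
  };
  const limitedBy = Object.keys(fits).reduce((a, b) => (fits[b] < fits[a] ? b : a));
  return { fits, limitedBy, maxPods: fits[limitedBy] };
}

// Values copied from resourcequota.yaml and deployment.yaml.
const quota = { 'requests.cpu': '2', 'requests.memory': '2Gi', 'limits.cpu': '4', 'limits.memory': '4Gi', pods: '20' };
const videoAppPod = { requests: { cpu: '100m', memory: '128Mi' }, limits: { cpu: '500m', memory: '256Mi' } };

const result = maxPodsUnderQuota(quota, videoAppPod);
console.log(`${result.maxPods} pods, limited by ${result.limitedBy}`); // → 8 pods, limited by limits.cpu
```

With these numbers the `limits.cpu` line is the tightest: 4 CPUs of total limit divided by 500m per pod allows 8 pods, well below the `pods: "20"` cap.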

LimitRange

Sets default and per-container limits. Ensures every container has resource constraints.

Create file: k8s/limitrange.yaml

# =============================================================================
# LIMITRANGE
# =============================================================================
# Sets default resource requests/limits for containers that don't specify them.
# Also enforces min/max constraints per container.
#
# ECS Comparison:
# - Similar to Task Definition CPU/memory settings
# - LimitRange provides defaults; ResourceQuota caps the total
# =============================================================================
apiVersion: v1
kind: LimitRange
metadata:
  name: video-app-limits
  namespace: video-app
spec:
  limits:
    - type: Container
      # Defaults applied when container doesn't specify
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      # Hard constraints per container
      max:
        cpu: "2"
        memory: 2Gi
      min:
        cpu: 50m
        memory: 64Mi
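
The way these defaults and bounds combine can be sketched as a small function. This is a simplified model of the defaulting rules as I understand them, not the API server's actual admission code: `applyLimitRange` is a hypothetical name, CPU is in millicores and memory in MiB to keep the arithmetic plain, and the values mirror limitrange.yaml.

```javascript
// Values from limitrange.yaml (CPU in millicores, memory in MiB).
const containerRange = {
  default: { cpu: 500, memory: 512 },        // default limits
  defaultRequest: { cpu: 100, memory: 128 }, // default requests
  max: { cpu: 2000, memory: 2048 },
  min: { cpu: 50, memory: 64 },
};

// Simplified model: fill in missing values, then enforce min/max for one container.
function applyLimitRange(resources, range) {
  const requests = { ...(resources.requests || {}) };
  const limits = { ...(resources.limits || {}) };

  for (const name of ['cpu', 'memory']) {
    const hasOwnLimit = resources.limits?.[name] !== undefined;
    if (!hasOwnLimit) limits[name] = range.default[name];
    if (requests[name] === undefined) {
      // A container that set only a limit gets request = limit;
      // one that set neither gets the default request.
      requests[name] = hasOwnLimit ? limits[name] : range.defaultRequest[name];
    }
    if (requests[name] < range.min[name]) throw new Error(`${name} request is below the minimum`);
    if (limits[name] > range.max[name]) throw new Error(`${name} limit is above the maximum`);
    if (requests[name] > limits[name]) throw new Error(`${name} request exceeds its limit`);
  }
  return { requests, limits };
}

// A container with no resources block picks up both defaults.
console.log(applyLimitRange({}, containerRange));
```

The video-app container sets all four values itself, so under this model the LimitRange only validates it against min and max and changes nothing.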

Apply Resource Controls

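# Apply quota and limits (the namespace must exist first; it is applied in section 7)
# Run from the project root, or drop the k8s/ prefix if you are already inside k8s/
kubectl apply -f k8s/resourcequota.yaml
kubectl apply -f k8s/limitrange.yaml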

# Verify
kubectl describe resourcequota video-app-quota -n video-app
kubectl describe limitrange video-app-limits -n video-app

6.3 Deployment

Create file: k8s/deployment.yaml

# =============================================================================
# DEPLOYMENT
# =============================================================================
# A Deployment manages ReplicaSets and provides declarative updates for Pods.
#
# ECS Comparison:
# - Deployment ≈ ECS Service + Task Definition combined
# - replicas ≈ desired count in ECS Service
# - template ≈ Task Definition (container specs)
# - Kubernetes handles rolling updates automatically (like ECS deployments)
# =============================================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: video-app
  namespace: video-app
  labels:
    app: video-app
    version: v1.0.0
spec:
  # ---------------------------------------------------------------------------
  # Replica Configuration
  # ---------------------------------------------------------------------------
  # HA Lesson: Always run at least 2-3 replicas in production!
  # A single pod is a single point of failure.
  replicas: 3

  # ---------------------------------------------------------------------------
  # Update Strategy
  # ---------------------------------------------------------------------------
  # RollingUpdate ensures zero-downtime deployments
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod during update
      maxUnavailable: 0  # Never reduce below desired count

  # ---------------------------------------------------------------------------
  # Pod Selector
  # ---------------------------------------------------------------------------
  selector:
    matchLabels:
      app: video-app

  # ---------------------------------------------------------------------------
  # Pod Template
  # ---------------------------------------------------------------------------
  template:
    metadata:
      labels:
        app: video-app
        version: v1.0.0
      annotations:
        # kubectl rollout restart sets this annotation to force new pods (optional, left empty here)
        kubectl.kubernetes.io/restartedAt: ""
    spec:
      # -----------------------------------------------------------------------
      # Topology Spread Constraints (HA)
      # -----------------------------------------------------------------------
      # Spread pods across availability zones for high availability
      # If one AZ fails, pods in other AZs continue serving traffic
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: video-app

      # -----------------------------------------------------------------------
      # Containers
      # -----------------------------------------------------------------------
      containers:
        - name: video-app
          # IMPORTANT: Replace with your ECR URL!
          # Run: terraform output ecr_repository_url
          image: REPLACE_WITH_ECR_URL:latest
          imagePullPolicy: Always

          # Port configuration
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP

          # -------------------------------------------------------------------
          # Environment Variables
          # -------------------------------------------------------------------
          env:
            - name: NODE_ENV
              value: "production"
            - name: PORT
              value: "3000"
            # Inject pod name for debugging (see /api/info endpoint)
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # Inject node name to see which node the pod runs on
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName

          # -------------------------------------------------------------------
          # Resource Limits
          # -------------------------------------------------------------------
          # Always set resource requests and limits!
          # Requests: Guaranteed resources for scheduling
          # Limits: Maximum resources the container can use
          resources:
            requests:
              cpu: "100m"      # 0.1 CPU cores
              memory: "128Mi"  # 128 MB RAM
            limits:
              cpu: "500m"      # 0.5 CPU cores max
              memory: "256Mi"  # 256 MB RAM max

          # -------------------------------------------------------------------
          # Health Checks (CRITICAL for HA!)
          # -------------------------------------------------------------------
          
          # Readiness Probe: Is the pod ready to receive traffic?
          # Kubernetes only sends traffic to pods that pass this check
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5   # Wait 5s before first check
            periodSeconds: 5         # Check every 5s
            timeoutSeconds: 3        # Timeout after 3s
            successThreshold: 1      # 1 success = ready
            failureThreshold: 3      # 3 failures = not ready

          # Liveness Probe: Is the pod alive and functioning?
          # Kubernetes restarts pods that fail this check
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15  # Wait 15s before first check
            periodSeconds: 20        # Check every 20s
            timeoutSeconds: 3        # Timeout after 3s
            successThreshold: 1      # 1 success = alive
            failureThreshold: 3      # 3 failures = restart pod

          # -------------------------------------------------------------------
          # Security Context
          # -------------------------------------------------------------------
          securityContext:
            readOnlyRootFilesystem: false  # App needs to write logs
            runAsNonRoot: true
            runAsUser: 1001
            allowPrivilegeEscalation: false

      # -----------------------------------------------------------------------
      # Pod-level Settings
      # -----------------------------------------------------------------------
      
      # Termination grace period - time to finish requests before killing pod
      terminationGracePeriodSeconds: 30

      # Restart policy (always restart failed containers)
      restartPolicy: Always

      # DNS policy
      dnsPolicy: ClusterFirst
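
The `maxSurge: 1` / `maxUnavailable: 0` pair in the strategy block translates into concrete pod counts during a rollout. A small sketch of that arithmetic follows. `rollingUpdateBounds` and `resolveCount` are hypothetical helper names; the percentage handling (surge rounds up, unavailable rounds down) reflects the Kubernetes documentation as I understand it.

```javascript
// Resolve an absolute number or a percentage string against the replica count.
function resolveCount(value, replicas, roundUp) {
  if (typeof value === 'string' && value.endsWith('%')) {
    const exact = (replicas * Number(value.slice(0, -1))) / 100;
    return roundUp ? Math.ceil(exact) : Math.floor(exact);
  }
  return Number(value);
}

// Upper bound on total pods and lower bound on available pods during a rollout.
function rollingUpdateBounds(replicas, maxSurge, maxUnavailable) {
  const surge = resolveCount(maxSurge, replicas, true);              // percentages round up
  const unavailable = resolveCount(maxUnavailable, replicas, false); // percentages round down
  return { maxTotalPods: replicas + surge, minAvailablePods: replicas - unavailable };
}

// Settings from deployment.yaml: replicas 3, maxSurge 1, maxUnavailable 0.
console.log(rollingUpdateBounds(3, 1, 0)); // → { maxTotalPods: 4, minAvailablePods: 3 }
```

So with this manifest a rollout briefly runs a fourth pod and never drops below three available ones, which is why the nodes need headroom for one extra pod.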

6.4 Service (LoadBalancer)

Create file: k8s/service.yaml

# =============================================================================
# SERVICE (LoadBalancer)
# =============================================================================
# A Service provides a stable endpoint for accessing pods.
# LoadBalancer type creates an AWS Network Load Balancer (NLB) automatically.
#
# ECS Comparison:
# - Service ≈ ECS Service with ALB/NLB integration
# - LoadBalancer type ≈ Attaching a load balancer to ECS Service
# - Kubernetes automatically registers/deregisters pod IPs
# =============================================================================
apiVersion: v1
kind: Service
metadata:
  name: video-app
  namespace: video-app
  labels:
    app: video-app
  annotations:
    # AWS-specific annotations for NLB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer

  # Traffic policy - Cluster (the default) spreads traffic across pods on every node,
  # at the cost of a possible extra hop. Local would keep traffic on the receiving
  # node and preserve the client IP.
  externalTrafficPolicy: Cluster

  # Port configuration
  ports:
    - name: http
      port: 80           # External port (what users access)
      targetPort: 3000   # Container port (where app listens)
      protocol: TCP

  # Pod selector - matches pods with these labels
  selector:
    app: video-app

6.5 Update Deployment with Your ECR URL

Before deploying, update the image URL in deployment.yaml:

# Get your ECR URL
cd ../environments/dev
ECR_URL=$(terraform output -raw ecr_repository_url)
echo "Your ECR URL: $ECR_URL"

# output
Your ECR URL:  xxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app

# Update the deployment file
cd ../../k8s

# Manually edit deployment.yaml and replace
# REPLACE_WITH_ECR_URL with your actual ECR URL from above.
# If you keep a .bak copy of the file, replace it there too.

# Your deployment.yaml (and .bak file, if you have one) should then say something like:
      containers:
        - name: video-app
          # IMPORTANT: Replace with your ECR URL!
          # Run: terraform output ecr_repository_url
          image: xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest
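
The manual edit can also be scripted. Here is a minimal Node.js sketch, assuming `ECR_URL` is exported in your shell and you run it from the k8s directory. `fillImagePlaceholder` and `updateManifestFile` are hypothetical helpers, not part of the tutorial repository.

```javascript
const fs = require('node:fs');

// Replace every REPLACE_WITH_ECR_URL occurrence in the manifest text.
function fillImagePlaceholder(manifestText, ecrUrl) {
  const placeholder = 'REPLACE_WITH_ECR_URL';
  if (!manifestText.includes(placeholder)) {
    throw new Error(`${placeholder} not found; was the file already edited?`);
  }
  return manifestText.split(placeholder).join(ecrUrl);
}

// Read a manifest, fill in the image URL, and write it back in place.
function updateManifestFile(path, ecrUrl) {
  const original = fs.readFileSync(path, 'utf8');
  fs.writeFileSync(path, fillImagePlaceholder(original, ecrUrl));
}

// Usage from the k8s directory (ECR_URL must be exported, not just set):
// updateManifestFile('deployment.yaml', process.env.ECR_URL);
```

Throwing when the placeholder is missing is a deliberate choice here: it makes a second run on an already edited file fail loudly instead of silently doing nothing.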

7. Deploy to EKS

You should still be in the k8s directory. If not, cd into it.


# Apply namespace first
kubectl apply -f namespace.yaml

# output
namespace/video-app created

# Apply deployment
kubectl apply -f deployment.yaml

# output
deployment.apps/video-app created

# Apply service
kubectl apply -f service.yaml

# output
service/video-app created

7.1 Verify Deployment

Check pods:

kubectl get pods -n video-app -o wide

# output
NAME                         READY   STATUS    RESTARTS   AGE   IP           NODE                         NOMINATED NODE   READINESS GATES
video-app-6498b5dd57-ckvwn   1/1     Running   0          16s   10.0.2.176   ip-10-0-2-6.ec2.internal     <none>           <none>
video-app-6498b5dd57-q6jrj   1/1     Running   0          16s   10.0.1.112   ip-10-0-1-110.ec2.internal   <none>           <none>

Check pods are in different AZs:

kubectl get pods -n video-app -o wide --show-labels | grep -E "NAME|video-app"

# output

NAME                         READY   STATUS    RESTARTS   AGE     IP           NODE                         NOMINATED NODE   READINESS GATES   LABELS
video-app-6498b5dd57-ckvwn   1/1     Running   0          4m47s   10.0.2.176   ip-10-0-2-6.ec2.internal     <none>           <none>            app=video-app,pod-template-hash=6498b5dd57,version=v1.0.0
video-app-6498b5dd57-g5b8t   0/1     Running   0          8s      10.0.3.145   ip-10-0-3-7.ec2.internal     <none>           <none>            app=video-app,pod-template-hash=6498b5dd57,version=v1.0.0
video-app-6498b5dd57-q6jrj   1/1     Running   0          4m47s   10.0.1.112   ip-10-0-1-110.ec2.internal   <none>           <none>            app=video-app,pod-template-hash=6498b5dd57,version=v1.0.0
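
In the output above, the 8-second-old pod shows 0/1 in the READY column: its container is running but has not passed the readiness probe yet. The column is simply "ready containers / total containers". As a rough sketch, the same check can be done on the JSON form of the pod list. `readyColumn` and `podsNotReady` are hypothetical helpers, and the sample data is trimmed down by hand rather than captured from a cluster.

```javascript
// Build the READY cell (for example "0/1") from one pod object, shaped like
// an item of `kubectl get pods -n video-app -o json`.
function readyColumn(pod) {
  const statuses = (pod.status && pod.status.containerStatuses) || [];
  const ready = statuses.filter((status) => status.ready).length;
  return `${ready}/${pod.spec.containers.length}`;
}

// Names of pods whose containers are not all ready yet.
function podsNotReady(podList) {
  return podList.items
    .filter((pod) => {
      const [ready, total] = readyColumn(pod).split('/');
      return ready !== total;
    })
    .map((pod) => pod.metadata.name);
}

// Hand-trimmed sample mirroring the output above (real pod objects carry many more fields).
const makePod = (name, ready) => ({
  metadata: { name },
  spec: { containers: [{ name: 'video-app' }] },
  status: { phase: 'Running', containerStatuses: [{ name: 'video-app', ready }] },
});
const samplePods = {
  items: [
    makePod('video-app-6498b5dd57-ckvwn', true),
    makePod('video-app-6498b5dd57-g5b8t', false),
    makePod('video-app-6498b5dd57-q6jrj', true),
  ],
};

console.log(podsNotReady(samplePods)); // → [ 'video-app-6498b5dd57-g5b8t' ]
```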

Check service:

kubectl get service -n video-app

# output
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP                                                                     PORT(S)        AGE
video-app   LoadBalancer   172.20.xxx.xx   xxxxxxxxxxxx.elb.us-east-1.amazonaws.com   80:30774/TCP   5m9s

You can do watch mode (Press Ctrl+C to stop watching):

kubectl get service -n video-app -w

8. Access Your Application!

8.1 Test Load Balancing

Get the Load Balancer URL:

kubectl get service video-app -n video-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

⚠️ Error Fix: only needed if your image was broken. If not, skip this.

⚠️ At this point I ran into an error. You may or may not, depending on where you built the Docker image.

Although the pods were running, the image had not been built correctly: my MacOS Apple Silicon machine builds arm64 images by default, while these EKS nodes run amd64.

The fix was simple: rebuild the image for linux/amd64 and push it again.

cd ../app

# Get your ECR URL
ECR_URL=$(terraform -chdir=../environments/dev output -raw ecr_repository_url)

# Login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_URL

# Build for AMD64 (Linux x86_64) - this is what EKS nodes run
docker buildx build --platform linux/amd64 -t $ECR_URL:latest -t $ECR_URL:v1.0.0 --push .

Note: The --push flag at the end of the last line builds and pushes in one command.
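
The root cause is a mismatch between the build machine's CPU architecture and the target platform. As an illustration only, a build script could detect that mismatch before calling Docker. `platformFlags` and `NODE_TO_DOCKER_ARCH` are hypothetical names, and the mapping covers just the two architectures discussed here.

```javascript
// Map Node's process.arch values to Docker platform architecture names.
const NODE_TO_DOCKER_ARCH = { x64: 'amd64', arm64: 'arm64' };

// Return the extra `docker buildx build` flags needed for the target platform,
// or an empty list when the host already matches it.
function platformFlags(hostArch, targetPlatform = 'linux/amd64') {
  const hostDockerArch = NODE_TO_DOCKER_ARCH[hostArch];
  const targetArch = targetPlatform.split('/')[1];
  return hostDockerArch === targetArch ? [] : ['--platform', targetPlatform];
}

// On an Apple Silicon Mac, process.arch is 'arm64', so the flag is required.
console.log(platformFlags(process.arch));
```

Passing `--platform linux/amd64` unconditionally, as the command above does, is also fine. The check just makes the reason for the flag explicit.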

# Delete existing pods (deployment will recreate them)
kubectl delete pods -n video-app -l app=video-app

pod "video-app-6498b5dd57-ckvwn" deleted from video-app namespace
pod "video-app-6498b5dd57-g5b8t" deleted from video-app namespace
pod "video-app-6498b5dd57-q6jrj" deleted from video-app namespace

# Watch new pods start
kubectl get pods -n video-app -w

Verify:

# Check pods are running
kubectl get pods -n video-app

video-app-6498b5dd57-c8k97   1/1     Running   0          22s
video-app-6498b5dd57-nx65r   1/1     Running   0          23s
video-app-6498b5dd57-s4tj4   1/1     Running   0          23s

# Check logs - should see the startup message now and/or GET /health
kubectl logs -n video-app -l app=video-app

# Test with port-forward
POD_NAME=$(kubectl get pods -n video-app -o jsonpath='{.items[0].metadata.name}')

kubectl port-forward -n video-app $POD_NAME 8080:3000

Note: the last command blocks that terminal window until you press Ctrl+C. Leave it running.

Then open a new terminal window:

curl http://localhost:8080/health

# output
{"status":"healthy","timestamp":"2025-12-14T21:22:58.782Z","hostname":"video-app-6498b5dd57-c8k97","uptime":141.854259452}

9. Verify High Availability

Part of the reason we are doing this tutorial series is to demonstrate high availability cloud engineering skills, and we will be doing a lot more soon, now that we have basic infra.

Let’s first verify our load balancer is set up properly:

# Set the variable
LB_URL=$(kubectl get service video-app -n video-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Verify it's set
echo "Load Balancer URL: $LB_URL"
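
Once the URL is set, one way to see the load balancing is to call /api/info several times and count which pod answered. Here is a minimal sketch, assuming Node 18+ (for the global `fetch`) and an exported `LB_URL`. `tallyByHostname` and `sampleLoadBalancer` are hypothetical names.

```javascript
// Count how many responses each pod served. Each item is a parsed /api/info body.
function tallyByHostname(infos) {
  const counts = {};
  for (const info of infos) {
    counts[info.hostname] = (counts[info.hostname] || 0) + 1;
  }
  return counts;
}

// Call /api/info `n` times in sequence and tally the hostnames.
async function sampleLoadBalancer(baseUrl, n = 20) {
  const infos = [];
  for (let i = 0; i < n; i += 1) {
    const res = await fetch(`${baseUrl}/api/info`);
    infos.push(await res.json());
  }
  return tallyByHostname(infos);
}

// Usage, with LB_URL exported from the shell step above:
// sampleLoadBalancer(`http://${process.env.LB_URL}`).then(console.log);
```

Expect uneven counts. A Network Load Balancer distributes connections rather than individual requests, and Node may reuse one connection for several calls, so a lopsided tally does not necessarily mean something is wrong.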

9.1 Check Pod Distribution

Now let’s make sure we’re spread across several Availability Zones (AZs), so that if one AZ has an outage, traffic can still be served from another.

# Show which AZ each node is in (a pod runs in the AZ of its node)
kubectl get nodes --show-labels | grep topology.kubernetes.io/zone

# This outputs verbose label info, with the AZ at the end of each entry
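
The command above lists zones per node. To get pods per zone, the node list has to be joined with the pod list. A rough sketch follows, working on the parsed output of `kubectl get nodes -o json` and `kubectl get pods -n video-app -o json`. `podsPerZone` and `zoneSkew` are hypothetical helpers, and the zone names in the sample are assumptions, not values taken from the article's cluster.

```javascript
const ZONE_LABEL = 'topology.kubernetes.io/zone';

// Count pods per availability zone by looking up each pod's node.
function podsPerZone(nodeList, podList) {
  const zoneOfNode = {};
  for (const node of nodeList.items) {
    zoneOfNode[node.metadata.name] = node.metadata.labels[ZONE_LABEL];
  }
  const counts = {};
  for (const zone of Object.values(zoneOfNode)) counts[zone] = 0; // include empty zones
  for (const pod of podList.items) {
    const zone = zoneOfNode[pod.spec.nodeName];
    if (!zone) continue; // pod not scheduled yet
    counts[zone] += 1;
  }
  return counts;
}

// Spread imbalance: pods in the busiest zone minus pods in the quietest zone.
function zoneSkew(counts) {
  const values = Object.values(counts);
  return Math.max(...values) - Math.min(...values);
}

// Hand-written sample: node names from the article, zone names assumed.
const node = (name, zone) => ({ metadata: { name, labels: { [ZONE_LABEL]: zone } } });
const pod = (name, nodeName) => ({ metadata: { name }, spec: { nodeName } });
const sampleNodes = { items: [
  node('ip-10-0-1-110.ec2.internal', 'us-east-1a'),
  node('ip-10-0-2-6.ec2.internal', 'us-east-1b'),
  node('ip-10-0-3-7.ec2.internal', 'us-east-1c'),
] };
const samplePodList = { items: [
  pod('video-app-a', 'ip-10-0-1-110.ec2.internal'),
  pod('video-app-b', 'ip-10-0-2-6.ec2.internal'),
  pod('video-app-c', 'ip-10-0-3-7.ec2.internal'),
] };

const perZone = podsPerZone(sampleNodes, samplePodList);
console.log(perZone, 'skew:', zoneSkew(perZone));
```

A skew of 0 or 1 is what the `maxSkew: 1` constraint in deployment.yaml aims for. Because the manifest uses `whenUnsatisfiable: ScheduleAnyway`, a larger skew is possible when the scheduler has no better option.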

9.2 Check Health Endpoints

Next, we need to have proper health checks, or else how do we know the AZ went down?

# Health check
curl http://$LB_URL/health

# output
{"status":"healthy","timestamp":"2025-12-14T21:33:06.048Z","hostname":"video-app-6498b5dd57-nx65r","uptime":749.358247182}

# Readiness check
curl http://$LB_URL/ready

# output
{"status":"ready","timestamp":"2025-12-14T21:33:21.120Z"}

# Server info
curl http://$LB_URL/api/info

# output
{"app":"eks-video-app","version":"1.0.0","hostname":"video-app-6498b5dd57-c8k97","platform":"linux","nodeVersion":"v22.21.1","environment":"production","podName":"video-app-6498b5dd57-c8k97","nodeName":"ip-10-0-1-110.ec2.internal","timestamp":"2025-12-14T21:33:32.040Z"}
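
The /health endpoint checked above is the same one the Deployment's probes call, so the probe settings in deployment.yaml determine how quickly Kubernetes reacts to a failing pod. A rough estimate can be computed as below. `secondsToDetectFailure` is a hypothetical helper, and the result ignores `timeoutSeconds` and where in the period the failure starts, so treat it as an approximation.

```javascript
// Approximate time from "app starts failing" to "probe gives up":
// failureThreshold consecutive failures, one probe every periodSeconds.
function secondsToDetectFailure(probe) {
  return probe.periodSeconds * probe.failureThreshold;
}

// Values from deployment.yaml.
const readinessProbe = { periodSeconds: 5, failureThreshold: 3 };
const livenessProbe = { periodSeconds: 20, failureThreshold: 3 };

console.log(`stops receiving traffic after about ${secondsToDetectFailure(readinessProbe)}s`); // about 15s
console.log(`container restarted after about ${secondsToDetectFailure(livenessProbe)}s`);      // about 60s
```

The readiness probe reacts faster than the liveness probe on purpose: taking a pod out of rotation is cheap and reversible, while a restart is more disruptive.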

9.3 Test Pod Recovery

Finally, we are ready for a basic HA test!

Delete one pod and watch Kubernetes recreate it:

# Get pod names
kubectl get pods -n video-app

NAME                       READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-c8k97   1/1     Running   0          13m
video-app-6498b5dd57-nx65r   1/1     Running   0          13m
video-app-6498b5dd57-s4tj4   1/1     Running   0          13m

9.4 Delete one pod (replace with actual pod name)

kubectl delete pod <pod-name> -n video-app

pod "video-app-6498b5dd57-c8k97" deleted from video-app namespace

9.5 Watch it get recreated immediately:

kubectl get pods -n video-app -w

NAME                       READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-nx65r   1/1     Running   0          14m
video-app-6498b5dd57-s4tj4   1/1     Running   0          14m
video-app-6498b5dd57-w8l9m   1/1     Running   0          22s

9.6 Kubernetes’ automatic self-healing:

I deleted pod video-app-6498b5dd57-c8k97
The ReplicaSet controller (managed by the Deployment) noticed the replica count dropped from 3 to 2
Kubernetes immediately created a new pod video-app-6498b5dd57-w8l9m to maintain the desired state of 3 replicas
Total time: ~22 seconds from deletion to new pod running
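
The idea behind this self-healing is a reconcile loop: compare the desired state with what is actually running, then act on the difference. Below is a toy model of a single pass. `reconcile` and `makeName` are hypothetical names, and the real controller has its own rules for which pods to delete when scaling down, so taking the last ones in the list here is arbitrary.

```javascript
// One pass of a toy reconcile loop: return the actions needed to reach the desired count.
function reconcile(desiredReplicas, runningPods, makeName) {
  const diff = desiredReplicas - runningPods.length;
  if (diff > 0) {
    // Too few pods: create the missing ones.
    return Array.from({ length: diff }, () => ({ action: 'create', pod: makeName() }));
  }
  if (diff < 0) {
    // Too many pods: remove the surplus (slice with a negative index takes the last ones).
    return runningPods.slice(diff).map((pod) => ({ action: 'delete', pod }));
  }
  return []; // already at the desired state
}

// The situation from this section: 3 desired, 2 left after the manual delete.
const actions = reconcile(
  3,
  ['video-app-6498b5dd57-nx65r', 'video-app-6498b5dd57-s4tj4'],
  () => 'video-app-6498b5dd57-w8l9m'
);
console.log(actions); // → [ { action: 'create', pod: 'video-app-6498b5dd57-w8l9m' } ]
```

The same comparison explains the scaling tests: changing the desired count to 5 yields two create actions, and changing it back to 3 yields two delete actions.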

9.7 View the video


cd ../environments/dev/

LB_URL=$(kubectl get service video-app -n video-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

echo "Open in browser: http://$LB_URL"

Open in browser: http://xxxxxxxxxxxx.elb.us-east-1.amazonaws.com

9.8 Interactive Scaling Tests

# Scale up/down (watch self-healing)
kubectl scale deployment video-app -n video-app --replicas=5
kubectl get pods -n video-app -w

# output
NAME                         READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-m2l5f   0/1     Running   0          7s
video-app-6498b5dd57-nx65r   1/1     Running   0          28m
video-app-6498b5dd57-qz95b   0/1     Running   0          7s
video-app-6498b5dd57-s4tj4   1/1     Running   0          28m
video-app-6498b5dd57-w8l9m   1/1     Running   0          14m
video-app-6498b5dd57-m2l5f   1/1     Running   0          7s
video-app-6498b5dd57-qz95b   1/1     Running   0          8s

# Scale back down
kubectl scale deployment video-app -n video-app --replicas=3

# output (from kubectl get pods -n video-app)
NAME                         READY   STATUS    RESTARTS   AGE
video-app-6498b5dd57-nx65r   1/1     Running   0          29m
video-app-6498b5dd57-s4tj4   1/1     Running   0          29m
video-app-6498b5dd57-w8l9m   1/1     Running   0          15m

9.9 Additional kubectl Commands

🔍 Cluster-Wide Diagnostics

# See all nodes with resource usage
kubectl top nodes

# Get detailed node info (capacity, allocatable resources, conditions)
kubectl describe nodes

# View all namespaces and what's running
kubectl get all --all-namespaces

# Check cluster events (shows recent issues/warnings)
kubectl get events --all-namespaces --sort-by='.lastTimestamp'

📦 Pod Diagnostics

# See resource usage for your pods
kubectl top pods -n video-app

# Get detailed pod info (events, conditions, resource limits)
kubectl describe pod <pod-name> -n video-app

# View pod logs (live tail)
kubectl logs -f <pod-name> -n video-app

# Get logs from all pods with a label
kubectl logs -l app=video-app -n video-app --tail=50

# Previous container logs (if pod restarted)
kubectl logs <pod-name> -n video-app --previous

# Execute commands inside a pod (interactive shell)
kubectl exec -it <pod-name> -n video-app -- /bin/sh

🌐 Service & Networking

# See all services and their endpoints
kubectl get svc --all-namespaces

# Check which pods are behind your service
kubectl get endpoints video-app -n video-app

# Describe your LoadBalancer service (shows events, selectors)
kubectl describe svc video-app -n video-app

# Port-forward to test a pod directly (bypasses LoadBalancer)
kubectl port-forward <pod-name> -n video-app 8080:3000

🎯 Deployment & ReplicaSet

# See deployment status and history
kubectl rollout status deployment/video-app -n video-app
kubectl rollout history deployment/video-app -n video-app

# See the ReplicaSet managing your pods
kubectl get rs -n video-app

# Check HPA (Horizontal Pod Autoscaler) if you had one
kubectl get hpa -n video-app

🔐 RBAC & Security

# Check what permissions you have
kubectl auth can-i --list -n video-app

# View secrets (names only for security)
kubectl get secrets -n video-app

📊 Resource Quotas & Limits

# Check if there are resource quotas
kubectl get resourcequota -n video-app

# See limit ranges
kubectl get limitrange -n video-app

# View pod resource requests and limits
kubectl describe pod <pod-name> -n video-app | grep -A 5 "Limits:\|Requests:"

10. Cleanup

# Delete the video app
kubectl delete -f ~/eks-video-tutorial/k8s/

# Destroy all infrastructure
# cd into environments/dev if you are not already there
cd environments/dev

terraform destroy

🚨⚠️ CRITICAL: After running terraform destroy, double-check in the AWS console that all resources are really gone (especially the EKS cluster, ELB, and EC2 instances). There are cases where people saw no error, the resources were not destroyed, and a big bill arrived later, so make sure! If you neglect to destroy this, there will be ongoing charges, which could be as much as $10-$20 per day.

Looking Ahead…

Part 4: ✅ “I need my Amazon EKS cluster to handle traffic spikes and variable traffic loads”

In Article 4, we’ll continue with more measures for improved AWS EKS cluster admin and video delivery with high availability:

CloudFront CDN distribution integrated with your application
Metrics Server installation (required for HPA)
Horizontal Pod Autoscaler (HPA) configured for 3–10 pods
Load testing setup using `hey` tool
CPU-based autoscaling triggers (70% threshold)
Real-time monitoring of pod scaling behavior
Experience the “Pending pods” problem (node capacity limits)


🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

⚡️ Quick promo message ⚡️

If you would like to beta test my new cloud engineering app, SystemsArchitect.io, check it out and feel free to send me any comments. You are early, so your input counts!
The free login (Google/GitHub or email link signup) gives you access to substantial cloud engineering content, and I’ll be offering Pro plan discounts to testers later. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback, and beta testers who send it may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

🚀 My current project I am working on is SystemsArchitect.io (in Beta testing) which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. Check it out: https://systemsarchitect.io

SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Savings: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

Amazon EKS (K8s) Media Cluster: Part 2— Deploy Initial Terraform Multi-AZ EKS Cluster

Chris St. John — Mon, 15 Dec 2025 23:34:58 GMT

🚀Amazon EKS + Terraform, Kubernetes high availability prep for serving media from scalable clusters

✅ “I need to deploy my basic Terraform state/structure and an Amazon EKS platform so we can prepare for a high availability media cluster”

This is Part 2 of a fun ongoing project to advance with Amazon EKS skills as a Cloud Engineer pro:

Master skills for production-grade EKS at scale (the #1 way companies run Kubernetes going into 2026)
Deep Infrastructure as Code with Terraform (serious companies use it)
Learn multi-AZ high availability the right way (VPC, subnets, load balancers, node placement)

Possible future article topics (I’m still preparing these):

Understand EKS-optimized node groups, Karpenter vs Cluster Autoscaler trade-offs
Work with EKS Pod Identity (Agent) — critical for secure media apps talking to S3, DynamoDB, CloudFront, etc.
Build muscle memory: kubectl, helm, kustomize, eksctl, AWS CLI daily
Set up monitoring & logging foundations (CloudWatch, Prometheus)
Get comfortable with ALB/NLB Ingress, cert-manager, external-dns — exactly what media sites need
We may look later at GPU workloads (video encoding/transcoding nodes) and high-IOPS storage (EBS gp3, EFS for shared media)
If we have time, we may even try some other things like Amazon EKS Capabilities, “a layered set of fully managed cluster features that help accelerate developer velocity”.

If you missed Part 1, you need to complete the instructions in that article before you start this one. I will refer to that config multiple times.

✅ PART 1 Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap

It includes setup of AWS CLI, Terraform, kubectl (K8s), Docker and ECS info, and some testing.

A view from later in the article after we deployed

🪧Roadmap

In this article, Article 2, we’ll build real Amazon EKS infrastructure with Terraform:

Project Setup. Create folder structure (10m)
Terraform backend. S3 bucket + DynamoDB for state locking (10m)
Build a multi-AZ VPC. Subnets across 3 availability zones (10m)
Provision EKS cluster. Control plane + managed node group (10m)
Deploy first node group. 3 × t3.small instances spread across AZs. (5m)
Connect kubectl to cluster. Run kubectl get nodes and see nodes! (5m)
Cleanup. Destroy all resources to avoid charges. (5m)

Keep in mind that we are still in the early stages; we will get more advanced as the series continues.

🔥 What you can do after this part:

Run kubectl get nodes and see your 3 worker nodes
Understand Multi-AZ high availability architecture
Deploy resources to your cluster using kubectl
Rebuild entire infrastructure in minutes with terraform apply

⚡️Technical skills gained:

Terraform state management
VPC networking (CIDR blocks, subnets, routing)
EKS cluster configuration
Node group management
kubectl authentication setup

Estimated time: 45–60 minutes hands-on

💰Estimated cost: as of writing this article, about $1 or less if you complete the build and destroy it right away, or ~$1–2 if you keep it up for a few hours. Amazon EKS worker nodes are standard Amazon EC2 instances, so you pay hourly for those and for any load balancer, plus the hourly fee for the EKS managed service.

⚠️ IMPORTANT: Cost variables to be aware of

To set up cost guardrails and AWS Budgets alerts see my articles:

1. I initially used an earlier version of Kubernetes that was in “extended support” on AWS, which costs more (~$0.60/hour). I have since updated the code in this article to Kubernetes 1.34, the newest version as I write this, which is charged at approximately $0.10/hour and should keep costs lower.

2. The longer you keep the build up, the more it costs, and the charges keep accruing until you destroy it. It may cost as much as $10–20 per day, so do not leave it running. Use terraform destroy and confirm in the AWS console.

3. It’s your responsibility, not automatic, to remove resources, so double-check in the AWS console after you run terraform destroy. I recently had a case where the ELBs were not deleted on any of 3 destroy cycles, and it cost me a couple of dollars extra by the time I noticed a couple of days later.

4. 🚨At the end of this article we will remind you to destroy resources when you’re done for the day! Always remember to do this!
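
To make the hourly numbers above concrete, here is a small shell sketch of the arithmetic. The function name eks_control_plane_cost is my own illustration, and the rates are the approximate ones quoted above (~$0.10/hour for a current Kubernetes version, ~$0.60/hour for extended support). It covers only the EKS control plane fee and does not include EC2 nodes, the NAT gateway, or load balancers.

```shell
# Hypothetical helper: estimate the EKS control plane fee for a session.
# Usage: eks_control_plane_cost HOURS HOURLY_RATE
eks_control_plane_cost() {
  local hours="$1" rate="$2"
  # awk handles the floating-point math that plain shell arithmetic cannot
  awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * r }'
}

# Example: a 3-hour session at each rate quoted in this article
# eks_control_plane_cost 3 0.10
# eks_control_plane_cost 3 0.60
```

Running it with 24 hours at the extended support rate shows how quickly a forgotten cluster adds up, before any node or networking charges.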

1. Project Setup & AWS CLI: Create folder structure, configure Terraform backend, AWS CLI

We’re going to keep a project structure based on this roadmap I’m starting out with.

Familiarize yourself with these files and filenames; they serve various purposes and keep things organized.

eks-video-tutorial/
├── .gitignore
├── environments/
│   └── dev/
│       ├── main.tf           # Root module - calls other modules
│       ├── variables.tf      # Input variables
│       ├── outputs.tf        # Output values
│       ├── providers.tf      # AWS provider configuration
│       ├── backend.tf        # S3 backend configuration
│       └── terraform.tfvars  # Variable values (not committed to git)
├── modules/                  # (Future use - Article 3+)
└── README.md

Git. Before going too far, I recommend you create a GitHub repo (or other git repo) to track your changes. You will want to keep this code.
You may also want to make a branch for each article so you can easily return to your place and keep things organized.

This is the .gitignore file I am using (some extra ones in there!):

# ==========================
# Terraform
# ==========================
.terraform/
*.tfstate
*.tfstate.*
crash.log
crash.*.log
*.tfvars
*.tfvars.json
override.tf
override.tf.json
*_override.tf
*_override.tf.json
.terraformrc
terraform.rc
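# .terraform.lock.hcl  (Terraform recommends committing the lock file; uncomment this line only if you want to ignore it)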

# ==========================
# AWS
# ==========================
.aws/
*.pem
.aws-sam/
samconfig.toml

# ==========================
# Node.js
# ==========================
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.npm
.yarn-integrity
dist/
build/
.cache/

# ==========================
# Environment & Secrets
# ==========================
.env
.env.*
!.env.example
*.key
*.crt
secrets.json

# ==========================
# IDE & Editors
# ==========================
.idea/
.vscode/
*.swp
*.swo
*~

# ==========================
# OS Files
# ==========================
.DS_Store
Thumbs.db

# ==========================
# Logs & Coverage
# ==========================
logs/
*.log
coverage/
.nyc_output/

Put that at the top level in .gitignore so you do not check in those files.

Architecture diagram of what we are building:

What we are building in this article with Terraform and Amazon EKS (initial build-out)

2. Terraform Backend: S3 bucket + DynamoDB for state locking

The Terraform state file is a record of the resources Terraform has created, modified, or destroyed in your infrastructure.

Without the state file, Terraform would have no context about existing infrastructure, leading to potential duplication or errors.

2.1 Why Remote State?

We are going to set up a remote (cloud) state file in S3, with DynamoDB for locking.

This is something I always do, and it is an enterprise best practice.

Using local state (the default behavior, where the state is stored in a terraform.tfstate file on your machine) works fine for solo development or small projects, but it is not recommended for team projects.

I think you should set up the remote state I show here; it only takes a few minutes and is closer to how professional teams work.

Downsides of a local TF state file:

Team. Multiple engineers might need to work on the same infrastructure. Local state means each person has their own copy, which can lead to conflicts, out-of-sync states, or accidental overwrites.
Security. Local state files often contain sensitive information, such as AWS resource IDs, secrets, or connection details.
Backup. Cloud backups of the state file are important.

What we are going to do with the remote state file:

Use S3 to store the state file. S3 provides durable, versioned, and encrypted storage.
DynamoDB for state locking. This is primarily used for making sure there are no conflicts between engineers simultaneously making changes.

These are foundations for Amazon EKS and for more complex Terraform setups later, so do not skip this step.

2.3 Create state file bootstrap directory

We have to run this one time, before the main deploy we are doing later.

mkdir -p backend-bootstrap
cd backend-bootstrap

# If you did not already do this in Article 1: remove the old test folder
# just to prevent any confusion

rm -Rf eks-tutorial-test

2.4 Create Backend Resources

We need to create a .tf file that creates the resources required for the remote state file. Hopefully we only need to run this once.

Create backend-bootstrap/main.tf

🚨⚠️ The code below uses my region, confirm/change to match yours.


# =============================================================================
# TERRAFORM BACKEND BOOTSTRAP
# =============================================================================
# This configuration creates the S3 bucket and DynamoDB table needed to store
# Terraform state for our main EKS project.
#
# Run this ONCE, then use these resources as the backend for all other configs.
# =============================================================================

# -----------------------------------------------------------------------------
# Terraform Configuration
# -----------------------------------------------------------------------------
terraform {
  required_version = ">= 1.10.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Note: This bootstrap uses LOCAL state intentionally!
  # We can't use remote state to create the remote state bucket.
}

# -----------------------------------------------------------------------------
# AWS Provider
#
# ⚠️ CHANGE the region to yours
# -----------------------------------------------------------------------------
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Project     = "eks-video-tutorial"
      Environment = "dev"
      ManagedBy   = "terraform"
      Purpose     = "terraform-backend"
    }
  }
}

# -----------------------------------------------------------------------------
# Data Sources
# -----------------------------------------------------------------------------

# Get current AWS account ID for unique bucket naming
data "aws_caller_identity" "current" {}

# Get current region
data "aws_region" "current" {}

# -----------------------------------------------------------------------------
# Local Variables
# -----------------------------------------------------------------------------
locals {
  account_id  = data.aws_caller_identity.current.account_id
  region      = data.aws_region.current.name
  bucket_name = "eks-tutorial-tfstate-${local.account_id}"
  table_name  = "eks-tutorial-terraform-locks"
}

# -----------------------------------------------------------------------------
# S3 Bucket for Terraform State
# -----------------------------------------------------------------------------

# Create the S3 bucket
resource "aws_s3_bucket" "terraform_state" {
  bucket = local.bucket_name

  # Prevent accidental deletion of this bucket
  lifecycle {
    prevent_destroy = false  # Set to true in production!
  }

  tags = {
    Name        = local.bucket_name
    Description = "Terraform state storage for EKS tutorial"
  }
}

# Enable versioning - allows recovery from bad applies or accidental deletions
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Enable server-side encryption by default
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
    bucket_key_enabled = true
  }
}

# Block ALL public access - state files should never be public!
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Bucket policy to enforce SSL/TLS connections only
resource "aws_s3_bucket_policy" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "EnforceTLS"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}

# -----------------------------------------------------------------------------
# DynamoDB Table for State Locking
# -----------------------------------------------------------------------------

# This table prevents concurrent Terraform runs from corrupting state
resource "aws_dynamodb_table" "terraform_locks" {
  name         = local.table_name
  billing_mode = "PAY_PER_REQUEST"  # Only pay for what you use (cheaper for low usage)
  hash_key     = "LockID"           # Required by Terraform - do not change!

  attribute {
    name = "LockID"
    type = "S"  # String type
  }

  tags = {
    Name        = local.table_name
    Description = "Terraform state locking for EKS tutorial"
  }
}

# -----------------------------------------------------------------------------
# Outputs
# -----------------------------------------------------------------------------

output "s3_bucket_name" {
  description = "Name of the S3 bucket for Terraform state"
  value       = aws_s3_bucket.terraform_state.id
}

output "s3_bucket_arn" {
  description = "ARN of the S3 bucket"
  value       = aws_s3_bucket.terraform_state.arn
}

output "s3_bucket_region" {
  description = "Region of the S3 bucket"
  value       = local.region
}

output "dynamodb_table_name" {
  description = "Name of the DynamoDB table for state locking"
  value       = aws_dynamodb_table.terraform_locks.name
}

output "dynamodb_table_arn" {
  description = "ARN of the DynamoDB table"
  value       = aws_dynamodb_table.terraform_locks.arn
}

# Output the backend configuration block to copy into other projects
output "backend_config" {
  description = "Backend configuration to use in other Terraform projects"
  value       = <<-EOT

    # Copy this into your backend.tf file:
    terraform {
      backend "s3" {
        bucket         = "${aws_s3_bucket.terraform_state.id}"
        key            = "environments/dev/terraform.tfstate"
        region         = "${local.region}"
        dynamodb_table = "${aws_dynamodb_table.terraform_locks.name}"
        encrypt        = true
      }
    }

  EOT
}

⚠️ The code uses my region, confirm/change code to match yours.

2.5 Make sure your AWS CLI is configured

Terraform uses your AWS CLI credentials to create resources.

Therefore, you need to make sure you are logged in correctly.

# log in to the IAM user for the CLI

# if you cannot remember the name of yours
aws configure list-profiles

# use your tutorial creds setup earlier (I used profile "terraform-eks-admin")
export AWS_PROFILE=terraform-eks-admin

# verify what account you are in - if issues see Part 1 article.
aws sts get-caller-identity

# output

{
    "UserId": "xxxxxxxxxxxx",
    "Account": "xxxxxxxxxxxxx",
    "Arn": "arn:aws:iam::xxxxxxxxxxxxx:user/terraform-eks-admin"
}
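
If you switch between accounts often, the manual check above can be turned into a guard that fails loudly. This is only a sketch: the function name assert_aws_account is hypothetical, and the expected account ID is something you pass in yourself. It relies on the standard --query and --output options of the AWS CLI.

```shell
# Hypothetical guard: stop early if the CLI is pointed at the wrong account.
# Usage: assert_aws_account EXPECTED_ACCOUNT_ID
assert_aws_account() {
  local expected="$1" actual
  # Ask the AWS CLI for just the account ID as plain text
  actual="$(aws sts get-caller-identity --query Account --output text)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "Wrong account: expected $expected, got $actual" >&2
    return 1
  fi
  echo "OK: using account $actual"
}

# Example (replace with your own 12-digit account ID):
# assert_aws_account 111122223333 && terraform apply
```

Chaining it in front of terraform apply means the apply only runs when the account matches.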

2.6 Apply Backend Bootstrap

In this step we run terraform init and terraform apply, review the expected output, and note the bucket name for the next step.


# Reminder of what Terraform I am using
terraform -v
Terraform v1.14.1

# Make sure you are in the backend-bootstrap directory
cd ./backend-bootstrap/

# Initialize Terraform (do this again, we deleted the one from last article) 
terraform init

# Output

Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.100.0...
- Installed hashicorp/aws v5.100.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

Next we need to run terraform plan, which evaluates our code and acts as a “preview”: it shows what Terraform would create and surfaces any errors in the code before anything is changed.

Remember that you need to be in the same backend-bootstrap directory still!

You will see some output like this:

Near the top

At the bottom of the output
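
As an optional variation on the plan step, you can save the plan to a file and print it again later. The wrapper names save_plan and show_plan and the default file name tfplan are my own choices for illustration; terraform plan -out and terraform show are the real commands underneath.

```shell
# Hypothetical wrappers around two real Terraform commands.
# save_plan writes the plan to a file; show_plan prints that saved plan again.
save_plan() {
  local plan_file="${1:-tfplan}"
  terraform plan -out="$plan_file"
}

show_plan() {
  local plan_file="${1:-tfplan}"
  terraform show "$plan_file"
}

# Example (run inside the backend-bootstrap directory):
# save_plan && show_plan
```

A saved plan file can contain sensitive values, so keep it out of git if you use this approach. Also note that passing a saved plan file to terraform apply skips the interactive yes prompt, which differs from the plain terraform apply used in this article.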

Next we will use terraform apply which is required to actually create the resources.

terraform apply

# when prompted input yes to continue

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

# Output

# You will see all the resources being created and at the end 

Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

Did you get an error? A funny thing happened to me while writing this: I had several CLI sessions open for various aspects of this project, and one of them had not been switched to the new EKS admin account we created in the previous article, so it launched the resources in another account!

Check your account: aws sts get-caller-identity

Good news! It’s super easy to fix this mistake if you made it too: run terraform destroy from the same CLI session that used the wrong account, and it will tear down the resources that were created there.

Then switch to a CLI session in the correct account and re-run terraform apply

More error troubleshooting is below.

2.7 Error Troubleshooting

If you saw an error at any point, it could be due to the following:

Make sure your AWS CLI is using the correct profile. That is the most common issue, because Terraform uses that profile. Logged into the incorrect account? Change AWS CLI profiles (see the last article in this series).
If checking the AWS Console UI for resources, remember to be in the correct region for S3 and DynamoDB.
Resources created in wrong region. The code above uses my region; you may need to change it to match yours.
No valid credential sources found for AWS Provider. Run aws configure to set up credentials.
ExpiredToken or InvalidClientTokenId. Your AWS credentials have expired (common with SSO/temporary credentials) — re-login.
creating S3 Bucket: BucketAlreadyExists. The account ID suffix usually prevents this, but if it happens, modify local.bucket_name to add a unique identifier.
AccessDenied or UnauthorizedAccess. Your IAM user/role needs access to S3. Make sure your AWS CLI is using the correct profile.
Unsupported Terraform Core version. Check terraform -v; the code above requires a recent version of Terraform (the bootstrap config sets required_version >= 1.10.0).
Error acquiring the state lock. Happens when another Terraform process is already running against the same state.

2.8 Review Resource in AWS Console

Let’s quickly check in the console to make sure we see our resources.

Make sure you are looking in the correct region in the UI!

this is in us-east-1 for me

⚠️Verify billing mode is “On-demand”

Estimated costs for this (may vary based on factors like region):

S3: ~$0.023/GB/month for storage + $0.0004 per 1,000 requests
DynamoDB (On-Demand): ~$0.25 per million writes, $0.25 per million reads

And let’s see S3

Good to go, let’s continue!
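
If you prefer the CLI to the console, the same two checks can be sketched in shell. The function name verify_backend is hypothetical; it uses aws s3api get-bucket-versioning and aws dynamodb describe-table, and assumes the bucket and table names from the bootstrap config above.

```shell
# Hypothetical check: confirm the bootstrap resources look as expected.
# Usage: verify_backend BUCKET_NAME TABLE_NAME REGION
verify_backend() {
  local bucket="$1" table="$2" region="$3" versioning billing
  versioning="$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --region "$region" --query Status --output text)" || return 1
  billing="$(aws dynamodb describe-table --table-name "$table" \
    --region "$region" --query 'Table.BillingModeSummary.BillingMode' \
    --output text)" || return 1
  echo "S3 versioning: $versioning"
  echo "DynamoDB billing mode: $billing"
  # PAY_PER_REQUEST is what the console shows as "On-demand"
  [ "$versioning" = "Enabled" ] && [ "$billing" = "PAY_PER_REQUEST" ]
}

# Example (substitute your own account ID in the bucket name):
# verify_backend eks-tutorial-tfstate-111122223333 eks-tutorial-terraform-locks us-east-1
```

The function returns a non-zero status if versioning is not enabled or the table is not on-demand, so it can be used in a script as a quick sanity check.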

2.9 Copy Backend Config

You may have seen in the output of apply that we print the backend config. We will be needing that; you can get it again here:


terraform output backend_config

# Copy this into your backend.tf file:
terraform {
  backend "s3" {
    bucket         = "eks-tutorial-tfstate-[your account id]"
    key            = "environments/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "eks-tutorial-terraform-locks"
    encrypt        = true
  }
}

Also as another check let’s make sure you still have a local state file

# Check that the state file exists
ls -la ~/eks-video-tutorial/backend-bootstrap/

# You should see this in the backend-bootstrap dir:

-rw-r--r--  main.tf
-rw-r--r--  terraform.tfstate
-rw-r--r--  terraform.tfstate.backup
drwxr-xr-x  .terraform/
-rw-r--r--  .terraform.lock.hcl

Now we will set up the dev env for our main Terraform resources for our Multi-AZ VPC….

3. VPC Infrastructure: Multi-AZ VPC with public/private subnets

3.1 Create the Project Directory Structure

# Create the environments/dev directory - FROM your project root!

# switch to your project root and then do this:
mkdir -p environments/dev
cd environments/dev

3.2 Backend Configuration

Next I want to configure Terraform to use the S3 backend we just created.

Confirm your CLI is still in the correct account: aws sts get-caller-identity

File to create: environments/dev/backend.tf

You need to copy in the info you got above from the output of the previous apply:

# =============================================================================
# TERRAFORM BACKEND CONFIGURATION
# =============================================================================
# This configures Terraform to store state in S3 with DynamoDB locking.
# The S3 bucket and DynamoDB table were created by the backend-bootstrap config.
#
# IMPORTANT: Update the bucket name to match YOUR account ID!
# =============================================================================

terraform {
  backend "s3" {
    # S3 bucket name - replace XXXXXXXXXXXX with your AWS account ID
    # Or copy the exact bucket name from your backend-bootstrap output
    bucket = "eks-tutorial-tfstate-XXXXXXXXXXXX"

    # Path within the bucket for this environment's state file
    key = "environments/dev/terraform.tfstate"

    # Region where the bucket exists
    region = "us-east-1"

    # DynamoDB table for state locking
    dynamodb_table = "eks-tutorial-terraform-locks"

    # Encrypt state file at rest
    encrypt = true
  }
}
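
If you no longer have the bootstrap output handy, the bucket name can be rebuilt from your account ID, because the bootstrap config names the bucket eks-tutorial-tfstate- followed by the account ID. A minimal sketch, with backend_bucket_name as a hypothetical helper name:

```shell
# Hypothetical helper: print the state bucket name the bootstrap config creates.
# It mirrors local.bucket_name = "eks-tutorial-tfstate-${local.account_id}".
backend_bucket_name() {
  local account_id
  account_id="$(aws sts get-caller-identity --query Account --output text)" || return 1
  echo "eks-tutorial-tfstate-${account_id}"
}

# Example: print the value to paste into the bucket line of backend.tf
# backend_bucket_name
```

This only works if the CLI is using the same account that ran the bootstrap, which is another reason to confirm the account first.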

When that is done continue.

3.3 Provider Configuration

Create a file for all the providers we need for deploying EKS (AWS, Kubernetes, TLS and Time): environments/dev/providers.tf

# =============================================================================
# TERRAFORM AND PROVIDER CONFIGURATION
# =============================================================================
# Configures Terraform version requirements and all providers needed for
# deploying EKS infrastructure.
# =============================================================================

# -----------------------------------------------------------------------------
# Terraform Settings
# -----------------------------------------------------------------------------
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    # AWS Provider - for all AWS resources
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }

    # Kubernetes Provider - for K8s resources (used later)
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }

    # TLS Provider - for certificate handling
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }

    # Time Provider - for adding delays when needed
    time = {
      source  = "hashicorp/time"
      version = "~> 0.9"
    }
  }
}

# -----------------------------------------------------------------------------
# AWS Provider
# -----------------------------------------------------------------------------
provider "aws" {
  region = var.aws_region

  # Default tags applied to ALL resources created by this configuration
  # This is a best practice for cost tracking and resource management
  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
      Repository  = "eks-video-tutorial"
    }
  }
}

# -----------------------------------------------------------------------------
# Kubernetes Provider
# -----------------------------------------------------------------------------
# This provider is configured to authenticate with our EKS cluster.
# It uses the AWS CLI to get a token for authentication.
#
# NOTE: This will show a warning during the first run because the cluster
# doesn't exist yet. This is normal and expected.
# -----------------------------------------------------------------------------
provider "kubernetes" {
  # Only configure if the cluster exists
  host                   = try(module.eks.cluster_endpoint, null)
  cluster_ca_certificate = try(base64decode(module.eks.cluster_certificate_authority_data), null)

  # Use AWS CLI to get authentication token
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = [
      "eks",
      "get-token",
      "--cluster-name",
      var.cluster_name,
      "--region",
      var.aws_region
    ]
  }
}

3.4 Variables Definition

This file defines all input variables for our infrastructure.

Create the file environments/dev/variables.tf

# =============================================================================
# INPUT VARIABLES
# =============================================================================
# These variables allow customization of the infrastructure.
# Default values are set for the tutorial, but can be overridden in
# terraform.tfvars or via command line.
# =============================================================================

# -----------------------------------------------------------------------------
# General Configuration
# -----------------------------------------------------------------------------

variable "aws_region" {
  description = "AWS region to deploy resources"
  type        = string
  default     = "us-east-1"
}

variable "project_name" {
  description = "Name of the project - used for resource naming and tagging"
  type        = string
  default     = "eks-video-tutorial"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  default     = "dev"
}

# -----------------------------------------------------------------------------
# VPC Configuration
# -----------------------------------------------------------------------------

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "VPC CIDR must be a valid IPv4 CIDR block."
  }
}

variable "availability_zones" {
  description = "List of availability zones to use for subnets"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]

  validation {
    condition     = length(var.availability_zones) >= 2
    error_message = "At least 2 availability zones are required for high availability."
  }
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for private subnets (one per AZ)"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

variable "public_subnet_cidrs" {
  description = "CIDR blocks for public subnets (one per AZ)"
  type        = list(string)
  default     = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
}

variable "enable_nat_gateway" {
  description = "Enable NAT Gateway for private subnet internet access"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use a single NAT Gateway (cost savings for dev, not HA)"
  type        = bool
  default     = true  # Set to false in production for HA
}

# -----------------------------------------------------------------------------
# EKS Cluster Configuration
# -----------------------------------------------------------------------------

variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "eks-video-cluster"

  validation {
    condition     = can(regex("^[a-zA-Z][a-zA-Z0-9-]*$", var.cluster_name))
    error_message = "Cluster name must start with a letter and contain only alphanumeric characters and hyphens."
  }
}

variable "cluster_version" {
  description = "Kubernetes version for the EKS cluster"
  type        = string
  default     = "1.33" # use a recent one, older legacy versions on AWS get charged more "extended support"

  validation {
    condition     = can(regex("^1\\.(2[89]|3[0-5])$", var.cluster_version)) # if later version make sure to update regex
    error_message = "Cluster version must be a supported EKS version"
  }
}

variable "cluster_endpoint_public_access" {
  description = "Enable public access to the EKS API endpoint"
  type        = bool
  default     = true  # Required for kubectl access from your machine
}

variable "cluster_endpoint_private_access" {
  description = "Enable private access to the EKS API endpoint"
  type        = bool
  default     = true  # Allows nodes to communicate with control plane privately
}

# -----------------------------------------------------------------------------
# EKS Node Group Configuration
# -----------------------------------------------------------------------------

variable "node_instance_types" {
  description = "List of EC2 instance types for the node group"
  type        = list(string)
  default     = ["t3.small"]

  # t3.small: 2 vCPU, 2 GB RAM - minimum recommended for EKS
  # t3.micro is too small for EKS system pods!
}

variable "node_capacity_type" {
  description = "Capacity type for nodes: ON_DEMAND or SPOT"
  type        = string
  default     = "ON_DEMAND"

  validation {
    condition     = contains(["ON_DEMAND", "SPOT"], var.node_capacity_type)
    error_message = "Capacity type must be either ON_DEMAND or SPOT."
  }
}

variable "node_desired_size" {
  description = "Desired number of nodes in the node group"
  type        = number
  default     = 3

  validation {
    condition     = var.node_desired_size >= 1
    error_message = "Desired size must be at least 1."
  }
}

variable "node_min_size" {
  description = "Minimum number of nodes in the node group"
  type        = number
  default     = 3 # for high availability

  validation {
    condition     = var.node_min_size >= 1
    error_message = "Minimum size must be at least 1."
  }
}

variable "node_max_size" {
  description = "Maximum number of nodes in the node group"
  type        = number
  default     = 4

  validation {
    condition     = var.node_max_size >= 1
    error_message = "Maximum size must be at least 1."
  }
}

variable "node_disk_size" {
  description = "Disk size in GB for worker nodes"
  type        = number
  default     = 20  # 20 GB is sufficient for learning

  validation {
    condition     = var.node_disk_size >= 20
    error_message = "Disk size must be at least 20 GB."
  }
}

# -----------------------------------------------------------------------------
# Additional Tags
# -----------------------------------------------------------------------------

variable "additional_tags" {
  description = "Additional tags to apply to all resources"
  type        = map(string)
  default     = {}
}
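
The cluster_version validation above accepts only versions 1.28 through 1.35. To see which values pass before Terraform complains, here is a small shell sketch of the same pattern. The function name is_allowed_cluster_version is hypothetical; the regex is the one from variables.tf, written with a single backslash because it is no longer inside an HCL string.

```shell
# Hypothetical pre-check mirroring the cluster_version validation regex.
# Usage: is_allowed_cluster_version VERSION   (exit status 0 means allowed)
is_allowed_cluster_version() {
  printf '%s\n' "$1" | grep -Eq '^1\.(2[89]|3[0-5])$'
}

# Example:
# is_allowed_cluster_version 1.34 && echo "1.34 passes validation"
# is_allowed_cluster_version 1.27 || echo "1.27 is rejected"
```

When AWS releases a version newer than 1.35, both this pattern and the one in variables.tf would need to be widened.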

3.5 Variable Values (tfvars)

This file sets the actual values for our variables. Most use defaults, but we include it for clarity and future customization.

Note: I added this into the .gitignore because sometimes you could have secrets in tfvars files, it’s just good practice to .gitignore this kind of file.

Create the file: environments/dev/terraform.tfvars

# =============================================================================
# TERRAFORM VARIABLE VALUES
# =============================================================================
# This file contains the actual values for our infrastructure.
# These values override the defaults defined in variables.tf
#
# NOTE: This file should NOT be committed to git if it contains secrets!
#       For this tutorial, it's safe since we're only using non-sensitive values.
# =============================================================================

# -----------------------------------------------------------------------------
# General Configuration
# -----------------------------------------------------------------------------
aws_region   = "us-east-1"
project_name = "eks-video-tutorial"
environment  = "dev"

# -----------------------------------------------------------------------------
# VPC Configuration
# -----------------------------------------------------------------------------
vpc_cidr = "10.0.0.0/16"

availability_zones = [
  "us-east-1a",
  "us-east-1b",
  "us-east-1c"
]

# Private subnets - where EKS nodes will run
private_subnet_cidrs = [
  "10.0.1.0/24",   # AZ-1a: 251 usable IPs
  "10.0.2.0/24",   # AZ-1b: 251 usable IPs
  "10.0.3.0/24"    # AZ-1c: 251 usable IPs
]

# Public subnets - for load balancers and NAT gateway
public_subnet_cidrs = [
  "10.0.101.0/24", # AZ-1a
  "10.0.102.0/24", # AZ-1b
  "10.0.103.0/24"  # AZ-1c
]

# NAT Gateway settings
enable_nat_gateway = true
single_nat_gateway = true  # Use one NAT GW to save costs (not HA!)

# -----------------------------------------------------------------------------
# EKS Configuration
# -----------------------------------------------------------------------------
cluster_name    = "eks-video-cluster"
cluster_version = "1.34"

# Access settings
cluster_endpoint_public_access  = true   # Allow kubectl from your machine
cluster_endpoint_private_access = true   # Allow node-to-control-plane communication

# -----------------------------------------------------------------------------
# Node Group Configuration
# -----------------------------------------------------------------------------
node_instance_types = ["t3.small"]  # 2 vCPU, 2 GB RAM
node_capacity_type  = "ON_DEMAND"   # Use SPOT for cost savings (less stable)
node_desired_size   = 3             # Start with 3 nodes
node_min_size       = 3             # Never go below 3 (HA)
node_max_size       = 4             # Allow scaling up to 4
node_disk_size      = 20            # 20 GB per node

# -----------------------------------------------------------------------------
# Additional Tags
# -----------------------------------------------------------------------------
additional_tags = {
  Owner       = "tutorial"
  CostCenter  = "learning"
  DeleteAfter = "2025-12-31"
}
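
A quick aside on the “251 usable IPs” comments in the subnet lists: a /24 holds 256 addresses, and AWS reserves 5 in every subnet (the first four and the last one). Here is a minimal shell sketch of that arithmetic. The variable names are just for illustration and are not part of the project.

```shell
# Usable IPs in an AWS subnet = 2^(32 - prefix) - 5 reserved addresses.
prefix=24
total=$(( 1 << (32 - prefix) ))   # 256 addresses in a /24
usable=$(( total - 5 ))           # AWS reserves 5 per subnet
echo "/${prefix} subnet: ${total} addresses, ${usable} usable"
```

Change prefix to try other subnet sizes before you settle on your own CIDR plan.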

Why are we using t3.small?

I believe t3.micro, with 1 GB of RAM, is not going to give us enough memory. I’m not sure we could even run what we need on it.

After some research, I settled on t3.small (2 GB RAM) as the minimum. It is a good start for learning and keeps costs down.

In general, though, I believe t3.medium would be the minimum in some use cases, and many companies run bigger instances than that. It depends on your workload.

High Availability tip: Running a single node is a single point of failure. If that node fails, all your pods go down. With 3 nodes spread across 3 AZs, your application survives node failures, so that’s what we set up here as a demo of what you can do.

⚠️ Also note: the first time I ran this, before publishing this article, I accidentally set node_desired_size and node_min_size to 2. When I checked with kubectl (further below), it showed only 2 nodes, so after troubleshooting I updated the code (now correct above) to:

variable "node_desired_size" {
  description = "Desired number of nodes in the node group"
  type        = number
  default     = 3

  validation {
    condition     = var.node_desired_size >= 1
    error_message = "Desired size must be at least 1."
  }
}

variable "node_min_size" {
  description = "Minimum number of nodes in the node group"
  type        = number
  default     = 3 # Minimum 3 for high availability

  validation {
    condition     = var.node_min_size >= 1
    error_message = "Minimum size must be at least 1."
  }
}

and in the terraform.tfvars file:

node_desired_size   = 3             # Start with 3 nodes
node_min_size       = 3             # Never go below 3 (HA)

Notice some differences with ECS:

The EKS setup is more granular on:

Terraform module with VPC integration
Creating a managed Node Group
K8s-specific subnet tags
Node Role, EKS Pod Identity (Agent) for pods (not task role)
We customize to our K8s version

Take some time to just look through this so you understand the code we are creating and all the resources.

⚠️ Remember: when we run terraform apply shortly, these resources will cost money until you destroy them at the end! It is your responsibility to do that to avoid ongoing costs.

4. EKS Cluster: Control plane + managed node group setup in Terraform

In this section, we’ll build a production-grade Amazon EKS cluster using Terraform.

We’ll create a highly available setup that spans multiple Availability Zones (AZs), with a secure network architecture separating public and private subnets, and a fully managed control plane backed by AWS.

This foundation emphasizes resilience: even if one AZ experiences an outage, your cluster’s control plane and worker nodes in the remaining AZs will continue to operate.

4.1 Review key components

Let’s review the resources that we will initially build for our Amazon EKS demo:

3 Availability Zones for high availability.

Multi-AZ VPC: Our VPC spans 3 availability zones. If one AZ has an outage, the other two continue operating. This is foundational High Availability (HA).

Public Subnets: For load balancers and NAT Gateway
Private Subnets: For EKS worker nodes (more secure)

By passing all 3 private subnet IDs to the node group, EKS automatically distributes nodes across AZs. If one AZ fails, nodes in other AZs continue running your workloads.

NAT Gateway: Allows private subnets to access the internet (for pulling images). ⚠️ Note that this is a primary cost for this project. It is billed at an hourly rate that adds up, so make sure it is destroyed when you are done.

⚠️ Also note that an HA config would use more than one NAT Gateway. We are not doing that in dev, to save cost, but in prod you would want additional gateways to avoid a single point of failure.

EKS Control Plane: Fully managed by AWS and automatically HA across 3 AZs. It includes the API server, etcd, Controller Manager, and Scheduler.

AWS automatically runs the EKS control plane (API server, etcd, controllers) across 3 AZs. You get this High Availability for free — no configuration needed.
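
To put the NAT Gateway cost warning above in numbers, here is a rough sketch of the monthly bill. The hourly rate of $0.045 is an assumption on my part (it reproduces the ~$32/month figure in the main.tf comments further down), so check current AWS pricing for your region. Data processing charges come on top of this.

```shell
# Rough monthly cost of NAT Gateways, hourly charge only.
hourly_rate="0.045"    # ASSUMPTION: USD per NAT Gateway hour, verify against current pricing
hours_per_month=730    # average number of hours in a month

nat_monthly_cost() {
  # $1 = number of NAT Gateways
  awk -v n="$1" -v r="$hourly_rate" -v h="$hours_per_month" \
    'BEGIN { printf "%.2f\n", n * r * h }'
}

echo "Single NAT Gateway (our dev setup): \$$(nat_monthly_cost 1)/month"
echo "One per AZ across 3 AZs (prod HA):  \$$(nat_monthly_cost 3)/month"
```

This is why we keep a single NAT Gateway in dev, and why forgetting to destroy it gets expensive over time.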

What we are building architecture diagram

4.2 Main Configuration File

Study this file carefully. I’ve tried to comment and document key parts but spend some time reviewing this.

Create the file environments/dev/main.tf

# =============================================================================
# MAIN INFRASTRUCTURE CONFIGURATION
# =============================================================================
# This file defines the core infrastructure for our EKS video streaming platform:
# - VPC with public and private subnets across 3 AZs
# - EKS cluster with managed node group
#
# ECS Comparison:
# - In ECS, cluster creation is simpler (just a cluster resource)
# - EKS requires more explicit VPC configuration and subnet tagging
# - But EKS gives you full Kubernetes, not just container orchestration
# =============================================================================

# -----------------------------------------------------------------------------
# Local Variables
# -----------------------------------------------------------------------------
locals {
  # Common name prefix for resources
  name_prefix = "${var.project_name}-${var.environment}"

  # Common tags to apply to all resources
  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
  }

  # Merge common tags with any additional tags
  tags = merge(local.common_tags, var.additional_tags)
}

# -----------------------------------------------------------------------------
# Data Sources
# -----------------------------------------------------------------------------

# Get current AWS account ID
data "aws_caller_identity" "current" {}

# Get available AZs in the region (validates our AZ choices)
data "aws_availability_zones" "available" {
  state = "available"

  # Exclude Local Zones and Wavelength Zones
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

# =============================================================================
# VPC MODULE
# =============================================================================
# We use the terraform-aws-modules VPC module (community-maintained, widely
# used) - it's battle-tested and handles all the complexity of subnets,
# route tables, NAT gateways, etc.
#
# Documentation: https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws
# =============================================================================

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  # ---------------------------------------------------------------------------
  # Basic VPC Configuration
  # ---------------------------------------------------------------------------
  name = "${local.name_prefix}-vpc"
  cidr = var.vpc_cidr

  # Use 3 AZs for high availability
  azs = var.availability_zones

  # ---------------------------------------------------------------------------
  # Subnet Configuration
  # ---------------------------------------------------------------------------
  # Private subnets - EKS nodes will run here (more secure)
  private_subnets = var.private_subnet_cidrs

  # Public subnets - for load balancers and NAT gateway
  public_subnets = var.public_subnet_cidrs

  # ---------------------------------------------------------------------------
  # NAT Gateway Configuration
  # ---------------------------------------------------------------------------
  # NAT Gateway allows private subnets to access the internet
  # (needed for pulling container images, etc.)
  enable_nat_gateway = var.enable_nat_gateway

  # Single NAT Gateway saves costs (~$32/month) but is not HA
  # For production, set this to false (one NAT GW per AZ)
  single_nat_gateway = var.single_nat_gateway

  # No per-AZ NAT Gateways; while single_nat_gateway = true, one NAT GW is created in the first public subnet
  one_nat_gateway_per_az = false

  # ---------------------------------------------------------------------------
  # DNS Configuration
  # ---------------------------------------------------------------------------
  # Required for EKS and service discovery
  enable_dns_hostnames = true
  enable_dns_support   = true

  # ---------------------------------------------------------------------------
  # VPC Flow Logs (Optional - disabled for cost savings in dev)
  # ---------------------------------------------------------------------------
  enable_flow_log                      = false
  create_flow_log_cloudwatch_log_group = false
  create_flow_log_cloudwatch_iam_role  = false

  # ---------------------------------------------------------------------------
  # Subnet Tags for EKS Auto-Discovery
  # ---------------------------------------------------------------------------
  # These tags are REQUIRED for EKS to discover and use the subnets correctly!

  # Tags applied to all resources created by this module (subnets included)
  tags = merge(local.tags, {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  })

  # Tags for public subnets - tells AWS LB Controller to use these for internet-facing LBs
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
    "Tier"                                      = "public"
  }

  # Tags for private subnets - tells AWS LB Controller to use these for internal LBs
  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
    "Tier"                                      = "private"
  }
}

# =============================================================================
# EKS CLUSTER MODULE
# =============================================================================
# We use the terraform-aws-modules EKS module (community-maintained) - it handles the complexity of:
# - IAM roles and policies
# - Security groups
# - OIDC provider
# - Add-ons (CoreDNS, kube-proxy, vpc-cni)
# - Managed node groups
#
# Documentation: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
#
# ECS Comparison:
# - ECS Cluster = just a logical grouping
# - EKS Cluster = full Kubernetes control plane with API server, etcd, etc.
# - EKS automatically runs the control plane across 3 AZs (HA built-in)
# =============================================================================

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  # ---------------------------------------------------------------------------
  # Cluster Configuration
  # ---------------------------------------------------------------------------
  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  # ---------------------------------------------------------------------------
  # Network Configuration
  # ---------------------------------------------------------------------------
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Control plane subnets (can be different from node subnets)
  control_plane_subnet_ids = module.vpc.private_subnets

  # ---------------------------------------------------------------------------
  # Cluster Endpoint Access
  # ---------------------------------------------------------------------------
  # Public access - allows kubectl from your local machine
  cluster_endpoint_public_access = var.cluster_endpoint_public_access

  # Private access - allows nodes to communicate with control plane via VPC
  cluster_endpoint_private_access = var.cluster_endpoint_private_access

  # Restrict public access to specific IPs (optional, more secure)
  # cluster_endpoint_public_access_cidrs = ["YOUR_IP/32"]

  # ---------------------------------------------------------------------------
  # Cluster Add-ons
  # ---------------------------------------------------------------------------
  # These are essential Kubernetes components managed by AWS
  cluster_addons = {
    # CoreDNS - DNS server for Kubernetes service discovery
    coredns = {
      most_recent = true
      configuration_values = jsonencode({
        # Ensure CoreDNS runs on different nodes for HA
        affinity = {
          podAntiAffinity = {
            preferredDuringSchedulingIgnoredDuringExecution = [
              {
                weight = 100
                podAffinityTerm = {
                  labelSelector = {
                    matchExpressions = [
                      {
                        key      = "k8s-app"
                        operator = "In"
                        values   = ["kube-dns"]
                      }
                    ]
                  }
                  topologyKey = "kubernetes.io/hostname"
                }
              }
            ]
          }
        }
      })
    }

    # kube-proxy - Network proxy that runs on each node
    kube-proxy = {
      most_recent = true
    }

    # vpc-cni - AWS VPC CNI plugin for pod networking
    vpc-cni = {
      most_recent = true
      configuration_values = jsonencode({
        # Enable prefix delegation for more IPs per node
        env = {
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }

    # EKS Pod Identity Agent - IAM
    eks-pod-identity-agent = {
      most_recent = true
    }
  }

  # ---------------------------------------------------------------------------
  # IAM / Access Configuration
  # ---------------------------------------------------------------------------
  # Allow the Terraform user to administer the cluster
  enable_cluster_creator_admin_permissions = true

  # Access entry configuration (EKS v1.30+ authentication mode)
  authentication_mode = "API_AND_CONFIG_MAP"

  # ---------------------------------------------------------------------------
  # Managed Node Group
  # ---------------------------------------------------------------------------
  # ECS Comparison:
  # - ECS Capacity Provider = similar concept
  # - Both manage a pool of EC2 instances for running containers
  # - EKS nodes run kubelet and join the K8s cluster automatically
  # ---------------------------------------------------------------------------
  eks_managed_node_groups = {
    # Primary node group
    primary = {
      # Use shorter names to avoid AWS IAM role name length limits (38 chars max)
      name            = "primary"
      use_name_prefix = false

      # Override IAM role name to be shorter
      iam_role_name            = "${var.cluster_name}-ng-role"
      iam_role_use_name_prefix = false

      # Instance configuration
      instance_types = var.node_instance_types
      capacity_type  = var.node_capacity_type

      # Scaling configuration
      min_size     = var.node_min_size
      max_size     = var.node_max_size
      desired_size = var.node_desired_size

      # Disk configuration
      disk_size = var.node_disk_size

      # Subnet placement - spread across all private subnets (AZs)
      subnet_ids = module.vpc.private_subnets

      # Labels applied to all nodes in this group
      labels = {
        Environment = var.environment
        NodeGroup   = "primary"
        Project     = var.project_name
      }

      # Tags for the node group and EC2 instances
      tags = merge(local.tags, {
        Name      = "${local.name_prefix}-node"
        NodeGroup = "primary"
      })

      # AMI type - Amazon Linux 2023 (AL2023) is the latest
      ami_type = "AL2023_x86_64_STANDARD"

      # Update configuration - how many nodes can be unavailable during updates
      update_config = {
        max_unavailable_percentage = 50
      }

      # IAM role additional policies
      iam_role_additional_policies = {
        AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
      }
    }
  }

  # ---------------------------------------------------------------------------
  # Node Security Group Additional Rules
  # ---------------------------------------------------------------------------
  node_security_group_additional_rules = {
    # Allow nodes to communicate with each other on all ports
    ingress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }

    # Allow outbound traffic to all destinations
    egress_all = {
      description      = "Node all egress"
      protocol         = "-1"
      from_port        = 0
      to_port          = 0
      type             = "egress"
      cidr_blocks      = ["0.0.0.0/0"]
      ipv6_cidr_blocks = ["::/0"]
    }
  }

  # ---------------------------------------------------------------------------
  # Tags
  # ---------------------------------------------------------------------------
  tags = local.tags
}

# =============================================================================
# ADDITIONAL RESOURCES
# =============================================================================

# -----------------------------------------------------------------------------
# Wait for cluster to be ready
# -----------------------------------------------------------------------------
# This ensures the cluster is fully ready before we try to use it
resource "time_sleep" "wait_for_cluster" {
  depends_on = [module.eks]

  create_duration = "30s"
}

# -----------------------------------------------------------------------------
# Null resource to update kubeconfig
# -----------------------------------------------------------------------------
# This is optional - provides a command to run after apply
resource "null_resource" "update_kubeconfig" {
  depends_on = [time_sleep.wait_for_cluster]

  provisioner "local-exec" {
    command = "aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}"
  }

  # Only run this when the cluster ARN changes (i.e., on initial creation)
  triggers = {
    cluster_arn = module.eks.cluster_arn
  }
}

Note #1: I believe EKS should auto-provision the EKS Pod Identity Agent by default, which means we do not need the OIDC provider. However, I am keeping it in my main.tf for now because we may need it later: I believe the older IRSA (IAM Roles for Service Accounts) approach and load balancers will need it. It has no impact on our deploy.

Note #2: We have one extra add-on setting for better availability here (copied from the code above).

We configured CoreDNS with pod anti-affinity to prefer running DNS on different nodes. This prevents DNS from being a single point of failure.

The TF code relating to this is the coredns entry inside the cluster_addons block of main.tf above.

You can typically make fine-grained adjustments using sections like this in the TF code.

4.3 Output file

Create file: environments/dev/outputs.tf

This will make it easy to see the status and other values after applying.

# =============================================================================
# TERRAFORM OUTPUTS
# =============================================================================
# These outputs display important information after terraform apply and can be
# referenced by other Terraform configurations or scripts.
# =============================================================================

# -----------------------------------------------------------------------------
# VPC Outputs
# -----------------------------------------------------------------------------

output "vpc_id" {
  description = "ID of the VPC"
  value       = module.vpc.vpc_id
}

output "vpc_cidr_block" {
  description = "CIDR block of the VPC"
  value       = module.vpc.vpc_cidr_block
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = module.vpc.private_subnets
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = module.vpc.public_subnets
}

output "nat_gateway_ids" {
  description = "List of NAT Gateway IDs"
  value       = module.vpc.natgw_ids
}

output "availability_zones" {
  description = "List of availability zones used"
  value       = module.vpc.azs
}

# -----------------------------------------------------------------------------
# EKS Cluster Outputs
# -----------------------------------------------------------------------------

output "cluster_name" {
  description = "Name of the EKS cluster"
  value       = module.eks.cluster_name
}

output "cluster_arn" {
  description = "ARN of the EKS cluster"
  value       = module.eks.cluster_arn
}

output "cluster_endpoint" {
  description = "Endpoint URL for the EKS cluster API server"
  value       = module.eks.cluster_endpoint
}

output "cluster_version" {
  description = "Kubernetes version of the EKS cluster"
  value       = module.eks.cluster_version
}

output "cluster_certificate_authority_data" {
  description = "Base64 encoded certificate data for cluster authentication"
  value       = module.eks.cluster_certificate_authority_data
  sensitive   = true
}

output "cluster_oidc_issuer_url" {
  description = "OIDC issuer URL for the cluster"
  value       = module.eks.cluster_oidc_issuer_url
}

output "cluster_oidc_provider_arn" {
  description = "ARN of the OIDC provider"
  value       = module.eks.oidc_provider_arn
}

# -----------------------------------------------------------------------------
# EKS Node Group Outputs
# -----------------------------------------------------------------------------

output "node_group_name" {
  description = "Name of the primary node group"
  value       = try(module.eks.eks_managed_node_groups["primary"].node_group_id, "")
}

output "node_group_arn" {
  description = "ARN of the primary node group"
  value       = try(module.eks.eks_managed_node_groups["primary"].node_group_arn, "")
}

output "node_group_status" {
  description = "Status of the primary node group"
  value       = try(module.eks.eks_managed_node_groups["primary"].node_group_status, "")
}

output "node_security_group_id" {
  description = "Security group ID attached to the EKS nodes"
  value       = module.eks.node_security_group_id
}

# -----------------------------------------------------------------------------
# IAM Outputs
# -----------------------------------------------------------------------------

output "cluster_iam_role_arn" {
  description = "IAM role ARN of the EKS cluster"
  value       = module.eks.cluster_iam_role_arn
}

output "node_iam_role_arn" {
  description = "IAM role ARN of the EKS node group"
  value       = try(module.eks.eks_managed_node_groups["primary"].iam_role_arn, "")
}

# -----------------------------------------------------------------------------
# Useful Commands
# -----------------------------------------------------------------------------

output "configure_kubectl" {
  description = "Command to configure kubectl for this cluster"
  value       = "aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}"
}

output "get_nodes_command" {
  description = "Command to list cluster nodes"
  value       = "kubectl get nodes -o wide"
}

output "get_pods_command" {
  description = "Command to list all pods in all namespaces"
  value       = "kubectl get pods -A"
}

# -----------------------------------------------------------------------------
# Summary Output
# -----------------------------------------------------------------------------

output "summary" {
  description = "Summary of created infrastructure"
  value       = <<-EOT

    ============================================================
    EKS CLUSTER DEPLOYMENT COMPLETE!
    ============================================================

    Cluster Name:     ${module.eks.cluster_name}
    Cluster Version:  ${module.eks.cluster_version}
    Cluster Endpoint: ${module.eks.cluster_endpoint}

    VPC ID:           ${module.vpc.vpc_id}
    VPC CIDR:         ${module.vpc.vpc_cidr_block}

    Availability Zones: ${join(", ", module.vpc.azs)}

    Private Subnets:
      - ${join("\n      - ", module.vpc.private_subnets)}

    Public Subnets:
      - ${join("\n      - ", module.vpc.public_subnets)}

    ------------------------------------------------------------
    NEXT STEPS:
    ------------------------------------------------------------

    1. Configure kubectl:
       aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}

    2. Verify nodes are ready:
       kubectl get nodes

    3. Check system pods:
       kubectl get pods -n kube-system

    ============================================================

  EOT
}

# -----------------------------------------------------------------------------
# HA Verification Output
# -----------------------------------------------------------------------------

output "ha_status" {
  description = "High availability configuration status"
  value       = <<-EOT

    ============================================================
    HIGH AVAILABILITY STATUS
    ============================================================

    ✅ VPC spans ${length(module.vpc.azs)} Availability Zones
    ✅ Private subnets in ${length(module.vpc.private_subnets)} AZs (for nodes)
    ✅ Public subnets in ${length(module.vpc.public_subnets)} AZs (for load balancers)
    ✅ EKS control plane: Managed by AWS across 3 AZs (automatic)
    ✅ Node group: Configured to spread across all private subnets

    ${var.single_nat_gateway ? "⚠️  Single NAT Gateway: NOT highly available (cost savings for dev)" : "✅ NAT Gateway per AZ: Highly available"}

    Node Configuration:
    - Min nodes: ${var.node_min_size}
    - Max nodes: ${var.node_max_size}
    - Desired:   ${var.node_desired_size}

    ============================================================

  EOT
}

These outputs give you diagnostic information to help fix any errors, plus guidance on next steps.

4.4 Verify Files

ls -la environments/dev/

Expected:

-rw-r--r--  backend.tf
-rw-r--r--  main.tf
-rw-r--r--  outputs.tf
-rw-r--r--  providers.tf
-rw-r--r--  terraform.tfvars
-rw-r--r--  variables.tf
-rw-r--r--  .gitignore # (or may be in higher dir.)

These are all the files we should have.
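
If you prefer a scripted check over eyeballing the listing, here is a minimal sketch. The function name check_tf_files is my own, and the file list simply mirrors the expected listing above (.gitignore is left out because it may live in a higher directory).

```shell
# Report any expected Terraform file that is missing from a directory.
# Returns the number of missing files (0 means everything is there).
check_tf_files() {
  dir="$1"
  missing=0
  for f in backend.tf main.tf outputs.tf providers.tf terraform.tfvars variables.tf; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Run from the project root.
if check_tf_files environments/dev; then
  echo "all Terraform files present"
fi
```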

5. Deploy AWS Resources with Terraform

Now that all of our configuration files are in place, we can deploy. First, we run a few checks to make sure everything is set up as it should be.

5.1 Verify account

Let’s just check we are in the right account before we launch this thing….

aws sts get-caller-identity

# Returns the account you are in
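
To make that check harder to skip, you could compare the active account against the one you expect. This is a sketch only: EXPECTED_ACCOUNT is a placeholder you must replace with your own 12-digit account ID, and same_account is a helper name I made up. The --query and --output options are standard AWS CLI options.

```shell
# Compare the active AWS account with the one you expect to deploy into.
# ASSUMPTION: 123456789012 is a placeholder, not a real account.
EXPECTED_ACCOUNT="${EXPECTED_ACCOUNT:-123456789012}"

same_account() {
  # $1 = actual account ID, $2 = expected account ID
  [ -n "$1" ] && [ "$1" = "$2" ]
}

if command -v aws >/dev/null 2>&1; then
  actual="$(aws sts get-caller-identity --query Account --output text 2>/dev/null || true)"
  if same_account "$actual" "$EXPECTED_ACCOUNT"; then
    echo "OK: deploying into account $actual"
  else
    echo "STOP: active account is '$actual', expected '$EXPECTED_ACCOUNT'"
  fi
fi
```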

5.2 Update Backend Configuration

Validate that your backend state bucket name is correct (we pasted this in much earlier, but check again or we’ll have issues).


# cd from the root of your project

cd backend-bootstrap
terraform output s3_bucket_name

Copy that bucket name and, if you have not done so yet, update environments/dev/backend.tf:

cd ../environments/dev
cat backend.tf
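
The same comparison can be scripted. This sketch assumes the backend-bootstrap and environments/dev layout used in this article and that you run it from the project root. backend_uses_bucket is a helper name of my own, while -chdir and output -raw are standard Terraform CLI options.

```shell
# True when the given file contains the bucket name as a quoted string.
backend_uses_bucket() {
  # $1 = path to backend.tf, $2 = bucket name
  grep -qF "\"$2\"" "$1"
}

# Only runs when terraform is installed and we are in the project root.
if command -v terraform >/dev/null 2>&1 && [ -d backend-bootstrap ]; then
  bucket="$(terraform -chdir=backend-bootstrap output -raw s3_bucket_name)"
  if backend_uses_bucket environments/dev/backend.tf "$bucket"; then
    echo "OK: backend.tf points at $bucket"
  else
    echo "MISMATCH: backend.tf does not mention $bucket"
  fi
fi
```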

5.3 Initialize Terraform

Now let’s initialize Terraform with the remote backend.

⚠️ terraform init downloads the provider plugins and modules into a local .terraform directory, so it can initially take up a decent amount of hard drive space. Keep an eye on that.

Inside environments/dev

terraform init

# Output - you should see it installing updated libraries needed

terraform plan

# Output - this will confirm our syntax is correct and do a preview dry run

# If the plan looks good, run terraform apply

terraform apply

That is it, you are deploying!!!!

Pray it works…. I think it will 😅. Worked on my machine 🤣

Troubleshooting: Terraform, Amazon EKS and associated libraries change frequently. If you get any errors, it may be due to a small syntax change that happens between versions.

Research any errors and feel free to update in the comments.

# Output

============================================================
EKS CLUSTER DEPLOYMENT COMPLETE!
============================================================

Cluster Name:     eks-video-cluster
Cluster Version:  1.34
Cluster Endpoint: https://913C8xxxxxxxx.gr7.us-east-1.eks.amazonaws.com

VPC ID:           vpc-05d295xxxxxxxxxx
VPC CIDR:         10.0.0.0/16

Availability Zones: us-east-1a, us-east-1b, us-east-1c

Private Subnets:
  - subnet-04fd3xxxxxxxxxxx
  - subnet-0936400xxxxxxxxxxx
  - subnet-0807b3xxxxxxxxxxx

Public Subnets:
  - subnet-0612axxxxxxxxxxx
  - subnet-01a1xxxxxxxxxxx
  - subnet-01cxxxxxxxxxxx

🚀 Let’s go! You’re doing it!

6. Validate Amazon EKS Launch: Examine what was created, run basic commands with kubectl

Let’s go to the AWS console and check this out:

VPCs: 2 … We created 1 and there is 1 by default.
Subnets: 12 … 6 default VPC subnets + 6 we created (3 public + 3 private)
Route Tables: 4 … 1 public route table + 3 private route tables (one/AZ)
Internet Gateways: 1 default + 1 we created
NAT Gateways: 1 … Single NAT Gateway (only 1 for cost savings; use one per AZ for prod HA)
Security Groups: 5 … Default + cluster + nodes + additional
Running Instances: 3 … 3 EKS worker nodes!

Amazon EKS Clusters in AWS Console

Note: I initially rolled this out with version 1.31, but changed the article to use version 1.34 because at the time of writing it cost less (1.31 is in legacy “extended support”).

This was the first run with 1.31. See where it says “Extended support”: this costs more per hour, so I launched a second time to avoid paying the legacy rates.

Rerun for K8s 1.34: This is where I re-ran it later with 1.34. Notice how it says “Standard support”. (Watch this: if you run this a year from now, 1.34 may be legacy too.)

Minor IAM issues in AWS Console Amazon EKS

⚠️ I noticed a few minor permissions issues in the AWS Console. We do not need to fix them yet, as we are just validating the basic health, but we will fix them in later articles.

These are just minor IAM issues that can be quickly fixed but I do not want to distract from the core of this lesson.

Connect kubectl: Configure local kubectl, verify nodes

Without running the command below (or a similar one), kubectl has no idea where your cluster is or how to authenticate to it. The command does the following:

Tells kubectl the API server endpoint of your EKS cluster.
Writes (or updates) a cluster config entry in your ~/.kube/config file.
Sets up AWS IAM-based authentication to use your credentials.
Switches kubectl to this cluster automatically, making it the active context.

aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster

# Output

updated context arn:aws:eks:us-east-1:xxxxxxxxxxxx:cluster/eks-video-cluster in /Users/me/.kube/config

You need to re-run this whenever:

You create a brand-new cluster for first time access.
You switch between multiple EKS clusters.
You work on a new laptop/machine (no kubeconfig yet).
~/.kube/config got deleted or corrupted.
You switched to a different AWS CLI profile or region.

You can also alias it like (optional):

aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster --alias video

# then you can do instead:

kubectl config use-context video
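If you are ever unsure which cluster kubectl is pointed at, kubectl config current-context prints the active context. As a small sketch, the helper below (cluster_from_context is a name I made up for illustration) pulls the cluster name out of an EKS context ARN like the one in the output above. If you used an alias such as video, the context has no slash in it and comes back unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the cluster name from an EKS context ARN.
# Assumes the ARN ends in ".../cluster/<name>", as in the sample output above.
cluster_from_context() {
  local arn="$1"
  # Strip everything up to and including the last "/".
  printf '%s\n' "${arn##*/}"
}

# Typical use once kubectl is installed and configured:
#   cluster_from_context "$(kubectl config current-context)"
```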

Verify connection is working for kubectl

Let’s make sure it is working

kubectl cluster-info

# Output

Kubernetes control plane is running at https://xxxxxxxxx.gr7.us-east-1.eks.amazonaws.com

CoreDNS is running at https://xxxxxxxxx.gr7.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

Check Nodes

Let’s make sure our worker nodes are there.


kubectl get nodes

# Output:

NAME                         STATUS   ROLES    AGE     VERSION
ip-10-0-1-110.ec2.internal   Ready    <none>   6m26s   v1.34.2-eks-xxxxxx
ip-10-0-2-6.ec2.internal     Ready    <none>   6m44s   v1.34.2-eks-xxxxxx
ip-10-0-3-7.ec2.internal     Ready    <none>   6m45s   v1.34.2-eks-xxxxxx

We run this because it is the most basic health check once kubectl is connected.

When it succeeds, we know:

Our AWS credentials are correct
We have permission to talk to EKS API
kubectl can reach the EKS control plane over the internet
The cluster is alive and has worker nodes joined
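If you want to script this check instead of eyeballing it, here is a minimal sketch. The function name count_ready_nodes is my own invention; it just counts rows whose STATUS column is exactly Ready, so a node showing something like Ready,SchedulingDisabled would not be counted. It relies on the --no-headers flag of kubectl get to drop the header row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
count_ready_nodes() {
  # Column 2 is STATUS; "n + 0" prints 0 instead of an empty string.
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Typical use against a live cluster:
#   kubectl get nodes --no-headers | count_ready_nodes
```

For the cluster in this article you would expect the count to match the number of worker nodes listed above.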

Verify Nodes are in different AZs (HA Check)

kubectl get nodes -L topology.kubernetes.io/zone

# Output - notice now (far right) it shows the AZs

NAME                         STATUS   ROLES    AGE     VERSION              ZONE
ip-10-0-1-110.ec2.internal   Ready    <none>   6m48s   v1.34.2-eks-xxxxxx   us-east-1a
ip-10-0-2-6.ec2.internal     Ready    <none>   7m6s    v1.34.2-eks-xxxxxx   us-east-1b
ip-10-0-3-7.ec2.internal     Ready    <none>   7m7s    v1.34.2-eks-xxxxxx   us-east-1c
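The HA check boils down to "how many distinct zones show up in the last column?". A rough sketch of that, assuming the ZONE column is the last field on each row (count_zones is a hypothetical helper name, not a kubectl feature):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct Availability Zones.
# Reads `kubectl get nodes -L topology.kubernetes.io/zone --no-headers` on stdin
# and assumes the zone is the last whitespace-separated field of each row.
count_zones() {
  awk 'NF { print $NF }' | sort -u | wc -l | tr -d ' '
}

# Typical use against a live cluster:
#   kubectl get nodes -L topology.kubernetes.io/zone --no-headers | count_zones
```

Anything greater than 1 means the nodes are spread across more than one AZ.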

Check System Pods

The output below is what you want to see. It means our brand-new EKS cluster is healthy at the core level.

kubectl get pods -n kube-system

# Output

NAME                           READY   STATUS    RESTARTS   AGE
aws-node-8dds9                 2/2     Running   0          6m21s
aws-node-l95q6                 2/2     Running   0          6m32s
aws-node-vdhm4                 2/2     Running   0          6m10s
coredns-975b7d678-f5ghx        1/1     Running   0          6m32s
coredns-975b7d678-lnq69        1/1     Running   0          6m32s
eks-pod-identity-agent-2wwnk   1/1     Running   0          6m32s
eks-pod-identity-agent-7gxt9   1/1     Running   0          6m32s
eks-pod-identity-agent-v5fg7   1/1     Running   0          6m32s
kube-proxy-2tjxp               1/1     Running   0          6m32s
kube-proxy-c2r86               1/1     Running   0          6m24s
kube-proxy-ckt9w               1/1     Running   0          6m28s

aws-node: The Amazon VPC CNI plugin (gives pods IPs from your VPC, ENI support, security groups for pods, etc.)

coredns: CoreDNS, the cluster’s internal DNS server. 2 replicas for high availability. Both are healthy.

eks-pod-identity-agent: EKS Pod Identity Agent, the recommended modern alternative to IRSA on newer clusters (older EKS clusters, or teams that prefer it, may still use IRSA). It lets pods assume IAM roles securely without putting keys in secrets. Runs one per node.

kube-proxy: maintains the network rules on each node that make Kubernetes Services (ClusterIP, NodePort, LoadBalancer) actually work. Runs one per node; without it, Services would not route traffic.

Taken together, this tells us:

The control plane is healthy
Worker nodes successfully joined the cluster
Networking (VPC CNI) is working
DNS is working
IAM roles for service accounts / Pod Identity will work
Kubernetes Services will work
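To automate the "everything is Running and fully ready" check, something like the sketch below could work. pods_not_ready is a hypothetical helper I am introducing here; it prints the name of any pod whose STATUS is not Running or whose READY count is not x/x. Note that it would also flag Completed pods from Jobs, which is fine for a fresh kube-system namespace but may be too strict elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods that are not fully ready.
# Reads `kubectl get pods -n kube-system --no-headers` output on stdin.
pods_not_ready() {
  awk 'NF {
    # Column 2 is READY in "ready/total" form, column 3 is STATUS.
    split($2, r, "/")
    if ($3 != "Running" || r[1] != r[2]) print $1
  }'
}

# Typical use against a live cluster (no output means all pods look healthy):
#   kubectl get pods -n kube-system --no-headers | pods_not_ready
```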

Summary of what we did today

So to summarize for your resume and knowledge:

✅ Created Terraform state file cloud backend.
✅ Created Terraform templates for Amazon EKS HA deployment.
✅ Learned about differences in container orchestration.
✅ VPC spans 3 Availability Zones for high availability
✅ Worker nodes distributed across multiple AZs.
✅ EKS control plane runs across 3 AZs (AWS managed).
✅ Diagnostic commands with kubectl for validation
✅ CoreDNS replicas on different nodes.

Ok are you ready to destroy this now?

⚠️ We need to save money 😁 Until the next article… We will ride again!

We have Terraform so now we can build all this infra again easily with terraform apply

7. Cleanup. Destroy all resources to avoid charges

🚨 IMPORTANT: Your cluster (once deployed, near the end of the article) costs approximately $0.50–$0.75/hour while running (as of writing; this could vary by region and Kubernetes version). Always destroy resources when you’re done for the day! Also, do not just run the destroy command and assume it worked. Double-check in the AWS console that the resources were actually removed.
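To make the hourly rate concrete, here is a back-of-the-envelope sketch of the arithmetic (rate times hours). estimate_cost is a made-up helper name, and the rates are just the approximate range quoted above, not official AWS pricing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough cost in USD for a given hourly rate and duration.
# Usage: estimate_cost <usd_per_hour> <hours>
estimate_cost() {
  local rate="$1" hours="$2"
  awk -v r="$rate" -v h="$hours" 'BEGIN { printf "%.2f\n", r * h }'
}

estimate_cost 0.50 2    # low end, a 2-hour session      -> 1.00
estimate_cost 0.75 24   # high end, left running a day   -> 18.00
```

The second line is the one to remember: forgetting to destroy the stack overnight costs far more than the lesson itself.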

Make sure you are in environments/dev where your main.tf is located

$ terraform destroy

That’s it, it will destroy your whole infra. Confirm all resources were destroyed.
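One way to sanity-check the destroy from the terminal, as a sketch: terraform state list prints every resource Terraform still tracks, so after a clean destroy it should print nothing. The helper name count_leftover is my own. Keep in mind that an empty state only shows what Terraform thinks; the console check is still the real proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count non-empty lines on stdin.
# Intended for `terraform state list` output, one resource address per line.
count_leftover() {
  awk 'NF { n++ } END { print n + 0 }'
}

# Typical use from environments/dev after `terraform destroy`:
#   terraform state list | count_leftover      # expect 0
# You can also ask AWS directly whether any EKS clusters remain in the region:
#   aws eks list-clusters --region us-east-1 --query 'clusters' --output text
```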

To set up cost guardrails and AWS Budgets alerts see my articles:

🔥What a project! (ongoing with more series articles scheduled)

🚀 Props and respect to you for making it through this!

You certainly are a professional to stick with it.

Later you can rebuild it with terraform apply when you want to, just remember what I said about watching costs— you’ll get charged by the hour for several resources.

Looking Ahead…

Part 3: ✅ “I need to deploy my video app and Docker image on self-healing Amazon EKS nodes/pods with kubectl diagnostics”

Keep in mind, we are still in the early stages, we will get more advanced as the series continues.


🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article that I put out!

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

🚀 My current project I am working on is SystemsArchitect.io (in Beta testing) which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. Check it out: https://systemsarchitect.io

Also follow the SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Saving: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

AWS EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap

Chris St. John — Mon, 15 Dec 2025 14:31:29 GMT

Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap

Set up the AWS subaccount/admin, Terraform, Kubernetes, Docker/ECS, AWS CLI and all the prerequisites we need, and look ahead on the roadmap for this project!

Although I’ve mentioned Amazon EKS (AWS platform for Kubernetes, K8s) in some past writings, people have been asking for a while about me doing some more in-depth deep-dive tutorials on the K8s ecosystem here….

Now is the time! This is for intermediate-advanced cloud engineers, or even new devs who have the drive and aspiration to enhance their skills to jump to the next level.

I’m super-excited for this K8s/Amazon EKS series…. Let’s do it!

🛠️ Is it worth the effort? Yes, definitely!!! 🚀

Imagine your sense of accomplishment and confidence after completing the below list of highlights….

This is the best project series you can do for:

Mastering production-grade EKS at scale.
Deep Infrastructure as Code (IaC) with Terraform.
Learn multi-AZ high availability the right way.
Understand EKS-optimized node groups, Karpenter.
Work with EKS Pod Identity (Agent).
Prepare for GPU workloads and high-IOPS storage.
Build skills with kubectl, helm, kustomize, eksctl, AWS CLI daily.
Set up monitoring & logging foundations.
Get comfortable with ALB/NLB Ingress, cert-manager, external-dns.
With a focus on high availability (HA) and container management.

Details may vary since I am still writing this; more will come in Part 2 and beyond.

Intro — How we got here.

This will be a practical, hands-on tutorial series.

Go from zero to a fully functional, scalable video-streaming platform running on Amazon EKS.
All built with Terraform and real-world AWS best practices
Special attention to high availability and media delivery.
And comparison to other container solutions like ECS.

When I was at the original MP3.com, back in the early dotcom days, our cracked Engineering infra team (accomplished devs and upcoming talent) handled massive scale media delivery — but they didn’t have Docker, Kubernetes or fancy orchestration tools.

Back then infrastructure code was mostly orchestrated with custom scripts copied with Linux automation and CVS (Concurrent Versions System, a precursor to git)— and that did work to run thousands of colo bare-metal servers!

But a lot of media delivery then was ad-hoc, on-the-fly, seat-of-the-pants learning… 20 years later, a lot of these learnings in the industry have evolved into a new modern cloud constellation of tools.

Now? It’s almost the other extreme. We have so many more sophisticated and elaborate choices to integrate with platforms like AWS and their many services…

And that is good AND bad. Good for standardization, docs and integrating tools/controls in more fully-featured management/deployment solutions.

But ironically, that can make it all more complex, and getting devs up to speed takes extra time, which can be tricky. It’s almost a full-time job, on top of knowing the features of all the other cloud SaaS, architecture and service varieties out there!

Looking ahead over the first 3 articles:

Basic INITIAL architecture diagram — Part 2 Terraform initial implementation

Testing high availability video hosting page (to demo Amazon EKS) — Part 3

Initially we will host the video on individual pods, but later we will try different configurations such as with S3 and other ideas. This is an experimental series where we will try different scenarios!

Containers, Docker and Kubernetes (K8s)

You probably already know why we use containers…. they package an application with its exact runtime, libraries, and dependencies into a single, lightweight, immutable unit that runs identically on a developer’s laptop, CI pipeline, staging, production on AWS, GCP, on-prem, or even an edge device.

Initially, Docker containers became the go-to. Then, services like Amazon ECS simplified some basic orchestration of Docker containers.

But running dozens or hundreds of Docker containers at scale was still too complex for many companies to pull off, especially when integrating AWS, Azure or GCP.

Kubernetes (K8s) sought to address this… and has become more popular every year, filling that gap especially for larger enterprises and now even a growing number of small and medium-sized tech businesses.

K8s is an advanced and more granular platform for container orchestration, management, and scaling. We have containers, Docker, pods, a lot of fancy controllers, YAML, kube-xxxx, CLIs and on AWS, for example, serverless options for running containers without managing underlying servers too (if using Fargate).

While a container service like Amazon ECS is great for small numbers of Docker-based containers and AWS-native simplicity, K8s (via EKS) offers a lot of other advantages: portability, lower-latency scalability and many other benefits discussed below for larger-scale solutions.

🚀 For many, it’s intimidating. Let’s get over that, together.

Roadmap of this tutorial series

Over several focused articles, we’ll progressively construct a production-like environment while deliberately comparing it to other possible solutions, such as the Amazon ECS equivalents, so you can confidently decide which orchestration tool fits your use case. We’ll also look at related tools in the K8s ecosystem.

Brief summary of our plans over the series:

You’ll begin by creating an isolated AWS subaccount, spinning up a minimal EKS cluster, and installing CLIs and Terraform. Isolated environments in IAM and IaC (infrastructure as code) support the Operational Excellence pillar of the AWS Well-Architected Framework.
Next, quickly deploy a simple Node.js app that serves video, giving you that satisfying “it works!” moment by the second article. Something is happening; it’s not all configs for the sake of configs.
From there? We’ll layer on tools for high availability (HA), like S3 storage (designed for 99.99% availability and 11 9s of durability), CloudFront for low-latency delivery, autoscaling, proper load balancing, and k8s extras, and you’ll be feeling a sense of accomplishment!!!
Then we’ll drill down further with monitoring with Kubecost (and related tools) and the Kubernetes Dashboard, and finally CI/CD with ArgoCD, turning the project into something you’d be proud to show as a portfolio project.

By the end of this series you’ll not only have a cost-aware, auto-scaling video platform you can build or tear down in minutes, but also a deep, comparative understanding of K8s, EKS versus ECS, strong Terraform muscle memory, and exposure to the modern tools real teams use every day.

Whether you’re preparing for cloud certifications, system-design interviews, or leveling up your Kubernetes game for your job, follow along, code along, and let’s make this Amazon EKS series project grow into an impressive, resume-worthy architecture.

🥰 Thanks for reading! … 🔥 please clap and share this article, thanks! 🚀

Preview of what is in this article, which is mostly setup for phase 1:

Why are we using Amazon EKS?
AWS Account with Organizations. Isolate tutorial resources.
AWS CLI setup.
Terraform CLI v1.5+. Infrastructure as Code for all AWS resources.
Install kubectl. Command-line tool to interact with Kubernetes.
Docker basics and comparison to ECS.
Test Terraform + AWS
Clean up test directory
What’s Next Preview

Appendix: Troubleshooting

Estimated time: 30–60 minutes hands-on

Estimated cost: ~$0 in this article, creating only local setup and IAM, not other AWS resources. ⚠️ Note that later articles incur some small fees — assuming you use Terraform destroy to remove AWS resources at the end of the lesson within a couple hours. If you do not remove the AWS resources when you are done there will be ongoing charges.

⚠️ In my experience, the stacks we build in Parts 2 and 3 should each cost ~$1-$2 (or less) for about 2 hours per article. You can keep costs down by removing the stack sooner and using the latest version of K8s.

⚠️ Important: Use a recent version of K8s — AWS charges more per hour for “extended support”. We use 1.34 (current, “standard support” as of Dec. 2025) in our code to keep prices lower, but if you read this months or a year from now, you may want a newer version. Also note: prices may change when you read this, so exact cost could vary, keep an eye on it with AWS Cost Explorer.

1. Why use Amazon EKS and K8s.

Kubernetes is an open-source cloud tool that is separate from AWS. So why are we using Amazon EKS (Amazon’s Kubernetes platform)?

We’re using Amazon EKS because it gives us a fully managed Kubernetes control plane with zero downtime upgrades, easy integration with AWS services (ALB, CloudWatch, IAM, EBS/EFS), and the same production-grade experience used by Netflix, Expedia, and Snap.

True portability and multi-cloud. Kubernetes is the industry standard orchestration platform (used by Google, Netflix, Spotify, Airbnb, etc.). Once you know EKS, you can run the exact same manifests on GKE, AKS, DigitalOcean, on-prem, or even move to self-hosted k8s.
Rich ecosystems and tools. Examples: Horizontal Pod Autoscaling + Cluster Autoscaler, Ingress controllers (ALB, NGINX, Traefik) with real Content-Based routing, canary deployments, etc. Unlike ECS, there are more options for Custom Controller logic with K8s on Amazon EKS.
Best-in-class deployment and observability stack. Cost estimate tools, Prometheus/Grafana, K9s/Lens, ArgoCD GitOps: all native and battle-tested. Native support for canary, blue/green, and other custom delivery methods vs. Amazon ECS. Some who have migrated indicate observability is much better and more efficient.
Rich media workload primitives. Depending how far you want to take this demo you could learn a lot implementing Jobs/CronJobs for transcoding, HPA + Cluster Autoscaler for viral spikes, DaemonSets for logging/monitoring agents.
Advanced networking & scaling control. We can get more granular on pods and containers and use for scalability and high availability solutions. Some anecdotal discussions from larger companies indicate scaling is much faster with Amazon EKS than ECS. Snapchat has reported up to 50% decrease in scaling latency, after migrating from ECS to EKS, for example.
In-demand job skill. A lot of jobs require some knowledge of AWS, k8s and this demonstrates the ability to take on advanced tasks. Amazon EKS + Terraform + ArgoCD, for example, is the stack that many modern startups and mid-size companies actually are implementing for media, gaming, and SaaS in 2026.

2. AWS Management Acct. & Organizations Setup

We’re going to create an isolated AWS subaccount, role and user specifically for this tutorial series.

Why am I doing that first?

Is it really necessary? Yes.

You could skip the account isolation, but I do not recommend it.

⚡️There are several key benefits:

Cost isolation. Easily track tutorial spending separately.
Security isolation. No risk to production resources.
Easy cleanup. Delete the infra and entire account when done. Deleting infra is super-important to reduce costs, as some costs are charged for hourly usage. So when we are done in each section, we want to delete infra so we do not get charged.
Since we are using Terraform (TF) for the app part, we can easily tear down and rebuild where we were at (delete, recreate).

If you have not used Terraform, it’s like a blueprint for your cloud infra: you run it locally, check the templates into Git, and can automatically create and destroy all the pieces easily and quickly. When completely done with everything we will delete the entire account to be sure.

Realistic practice. Multi-account is an AWS best practice and commonly used in Enterprise projects.

2.1 Be Aware of Potential Costs

🚨⚠️ Possible AWS Cost Alert: Be aware that there may be some small costs involved in this project, using the AWS platform and Internet Gateway Interface — as of now, I am estimating it could be only $5-$10 USD for the full project (multiple articles), if you do each section promptly and follow my instructions to destroy infra when complete. We will be using some free tier services, but not all are free. Keep tabs on billing.

If you are unsure or worried about it, then set up AWS Cost Explorer and Budget alerts for the account.

🚨 I will show you how to destroy the infra with Terraform (to reduce costs) and rebuild it quickly.

However, keep in mind, it is 100% your responsibility to confirm that all resources you created are also destroyed. If you do not, they will keep running and you will be charged on your AWS bill.
So confirm you destroyed them at the end of each session. I will put some warnings top/bottom of each article to remind you.
Cost awareness is part of being a cloud architect and engineer! If you do not feel up to that, then read up on it before implementing this project.

2.2 Initial AWS Organizations Setup for Subaccount

1. Sign in to the AWS Console, available after you sign up for AWS.
https://us-east-1.console.aws.amazon.com/console/home or whichever region you use
⚠️ It is recommended to NOT use your Root account if possible!
⚠️ But if you do need to use Root, OR you previously set things up so your Org is only managed by Root, then you will have to set up another admin user to switch into this new subaccount. I give instructions for that further below, but it takes about 10 min. extra.

2. Enable AWS Organizations if it is not enabled.

3. Click Create an organization (with all features)

4. In the AWS Organizations console, click Add an AWS account.

5. Select Create an AWS account

AWS account name: eks-tutorial-dev
Email address: Use a unique email for this account if possible.
Tip: If you use Gmail, you can use email aliases. For example, if your email is yourname@gmail.com, use yourname+eks-tutorial@gmail.com. All emails will still go to your main inbox.
IAM role name: Leave as default (OrganizationAccountAccessRole). This is the role you will assume from the management account to access the new account.
Tags: Tagging is a best practice. An enterprise may have its own scheme, but here we will use these key/value pairs: Project=eks-video-tutorial, Environment=learning, Owner=eks-tutorial-dev, CostCenter=personal-learning, ManagedBy=terraform, DeleteAfter=20251230

You can change “Chris” or leave it in honor of me 🤣

6. Click Create AWS account and wait a few minutes to complete.

7. Note the Account ID of your new subaccount (a 12-digit number like 987654321098). You will need that soon.

2.3 Access the Subaccount via Role Assumption

Now we have to switch to the subaccount using the role.

⚠️ IMPORTANT — You cannot use your Root mgmt. AWS login to switch to a subaccount. AWS console does not allow you to switch to subaccounts from Root account.

So how to fix this? We will create a special admin account eks-admin for this project. Then you can switch from that!!!

In the top-right corner of the AWS Console, click on your account name/number. Do you have a “Switch Role” link? If so, click that and you are good. If not, continue below…
There are a couple of ways to do this next part. Some configs/AWS changes can leave you without the “Switch Role” link in the upper right, so it may be faster to just use the URL directly below. If that works then you are good; if not, continue below.
✅ Prepare a URL like this and paste it in your browser (one line), replacing 123456789012 in this sample with the new account ID (12 digits) you were given:
https://signin.aws.amazon.com/switchrole?account=123456789012&roleName=OrganizationAccountAccessRole&displayName=eks-tutorial-dev
Confirm the following:

Account: Your new subaccount ID (e.g., 123456789012)
Role: OrganizationAccountAccessRole
Display Name: EKS-Tutorial (for easy identification)
Color: Choose any color (helps identify which account you’re in)

Now click the Switch button. You should now be in the subaccount.
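The switch-role URL above is just three query parameters glued together, so it is easy to generate instead of hand-editing. A minimal sketch, assuming the same role and display name used in this article (switch_role_url is a helper name I made up; it also rejects anything that is not a 12-digit account ID):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the console switch-role URL for a subaccount.
# Usage: switch_role_url <12-digit-account-id> [role-name] [display-name]
switch_role_url() {
  local account_id="$1"
  local role="${2:-OrganizationAccountAccessRole}"
  local display="${3:-eks-tutorial-dev}"
  # AWS account IDs are exactly 12 digits.
  if ! printf '%s' "$account_id" | grep -Eq '^[0-9]{12}$'; then
    echo "account ID must be exactly 12 digits" >&2
    return 1
  fi
  printf 'https://signin.aws.amazon.com/switchrole?account=%s&roleName=%s&displayName=%s\n' \
    "$account_id" "$role" "$display"
}

# Example with the placeholder account ID from this article:
#   switch_role_url 123456789012
```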

⚠️ If you get a permissions error, read on: we will create a new admin user, and if that does not resolve the issue, you may need to add extra permissions in your mgmt. account (see the Troubleshooting Appendix).

2.4 Create admin user for this project

We will now create a special admin user for this project, called eks-admin. You then log in as that user and switch to the subaccount from there.

In your management account console, go to IAM: https://console.aws.amazon.com/iam/home#/users
Click Create user.
User name: eks-admin
Check Provide user access to the AWS Management Console.
Console password: Auto-generated (let AWS create one; you’ll reset it next).
Uncheck “Require password reset” (optional, but easier).

Permissions

Click Attach policies directly.
Search for and select AdministratorAccess, as that will be needed for this project.
Click Next.
Click Create user.

Re-login as the new admin user

Open a new incognito/private tab (to avoid root session conflicts).
Go to https://aws.amazon.com/console/.
Select IAM user sign in.

Account ID: the 12-digit ID of your management account, where this new user lives (not the new subaccount number from earlier). You can find it in the user’s ARN on its IAM page, something like arn:aws:iam::69019161xxxx:user/eks-admin (it is the number right after iam::).
IAM username: eks-admin.
Password: The one you copied.

Now do the Switch.

Use the URL below. You are using this new user to switch to the Organizations account.

Therefore, in the URL below you should now be using:

The account number of the subaccount we created in Organizations earlier (the first one, not the account this new user lives in)
The display name for that account, such as eks-tutorial-dev if you used what I suggested
Url:

https://signin.aws.amazon.com/switchrole?account=123456789012&roleName=OrganizationAccountAccessRole&displayName=eks-tutorial-dev

If that still does not work, go to the Appendix: Troubleshooting Subaccount Switch at the bottom of the article.

2.5 Create an IAM User in the Subaccount for CLI Access

We need a user for CLI/Terraform (TF) for ops work:

1. Navigate to IAM > Users > Create user

2. User details:

User name: terraform-eks-admin
Click Next

3. Permissions

Select Attach policies directly
Search for and check AdministratorAccess

Note: For now, so I can demonstrate many things in this demo, we are using admin access in this subaccount, but ideally this should be narrowed to least privilege for security. We will revisit that.

Click Next

4. Review and create:

Click Create user

5. Create access keys for CLI:

Click on the newly created user terraform-eks-admin
Go to Security credentials tab
Scroll to Access keys > Click Create access key
Select Command Line Interface (CLI)
Check the confirmation box
Click Next > Create access key
Important: Download the CSV and/or copy text to your secrets/password manager
Access key ID
Secret access key
You will need to configure these in your AWS CLI setup below.
Click Done

Note: In an enterprise setup you will probably be using SSO (IAM Identity Center), but this will work for demo purposes for now. Just make sure to delete the keys when you are done, to prevent any security issues.

3. Install and Configure AWS CLI & Terraform

I have written a whole article on this for MacOS and Windows at:

AWS CLI/Terraform Setup for MacOS/Windows

⚠️ When you are done following the IAM setup article above “AWS CLI/Terraform Setup for MacOS/Windows” then continue the below instructions which assume you did that already.

Verify your identity when you are logged into the CLI


aws sts get-caller-identity

# or to check that the profile exists 
aws sts get-caller-identity --profile terraform-eks-admin

To avoid having to keep passing --profile, you can set:

export AWS_PROFILE=terraform-eks-admin

Or make it the default in your shell startup file, something like this (substitute the profile name you configured):

# For bash
echo 'export AWS_PROFILE=eks-tutorial' >> ~/.bashrc
source ~/.bashrc

# For zsh (macOS default)
echo 'export AWS_PROFILE=eks-tutorial' >> ~/.zshrc
source ~/.zshrc

Now do

aws sts get-caller-identity

# should give you the correct account and username

aws s3 ls

# no error; if incorrect login, will have an error
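Since the whole point of the subaccount is isolation, it can be worth scripting a guard that the CLI really is pointed at it before running anything that creates resources. Here is a rough sketch; check_account is a hypothetical helper, and the account ID in the usage line is the placeholder from earlier in this article. The --query Account --output text options make aws sts get-caller-identity print only the account ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the expected account ID with the one the CLI reports.
# Usage: check_account <expected-account-id> <actual-account-id>
check_account() {
  local expected="$1" actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK: CLI is using account $actual"
  else
    echo "WRONG ACCOUNT: expected $expected, got $actual" >&2
    return 1
  fi
}

# Typical use (replace 123456789012 with your subaccount ID):
#   check_account 123456789012 "$(aws sts get-caller-identity --query Account --output text)"
```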

⚠️ Enable Cost Explorer for the Subaccount

While switched into the subaccount in the Console, go to Billing and Cost Management
In the left sidebar, click Cost Explorer
Click Enable Cost Explorer (takes 24 hours to populate data)

4. Install Terraform locally if you do not have it

Same article as above but I’ll put it here again:

AWS CLI/Terraform Setup for MacOS/Windows

Make sure to get Terraform setup, we’ll be using that.

For this article I am using

terraform -v

Terraform v1.14.1


5. Install kubectl (k8s)

kubectl is the official command-line tool for interacting with Kubernetes clusters. Use it locally on your dev machine.

It allows you to deploy applications, inspect and manage cluster resources, view logs, execute commands in pods, trigger scaling or rollouts with a single command, and more. It’s essential for interacting with our resources.

On MacOS:

brew install kubectl

# Output: 

==> Fetching downloads for: kubernetes-cli
✔︎ Bottle Manifest kubernetes-cli (1.34.3)                                                                [Downloaded    7.5KB/  7.5KB]
✔︎ Bottle kubernetes-cli (1.34.3)

On Windows

choco install kubernetes-cli

Linux (may vary slightly):

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

chmod +x kubectl

sudo mv kubectl /usr/local/bin/
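Optionally, on Linux you can verify the downloaded binary against its published checksum before the chmod and mv steps. The sketch below wraps that comparison in a small function (verify_sha256 is my own helper name). It assumes GNU sha256sum is available and that the checksum file contains only the hash, which is how the .sha256 file next to the kubectl release binary is laid out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a file against a checksum file that holds only the hash.
# Usage: verify_sha256 <file> <file-containing-sha256-hash>
verify_sha256() {
  local file="$1" sumfile="$2"
  # sha256sum -c expects lines of the form "<hash>  <filename>".
  echo "$(cat "$sumfile")  $file" | sha256sum -c - >/dev/null
}

# Typical use, run in the directory where kubectl was downloaded:
#   curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
#   verify_sha256 kubectl kubectl.sha256 && echo "kubectl checksum OK"
```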

Validate kubectl CLI response:

$ kubectl version --client
Client Version: v1.34.3
Kustomize Version: v5.7.1

Enable kubectl Autocompletion (Highly Recommended) — Bash example:

echo 'source <(kubectl completion bash)' >> ~/.bashrc
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc

Now you can type k get pods instead of kubectl get pods. Note that if you run it right now you will get a connection error, because we have not connected kubectl to a cluster yet!

$ k get pods

# Output:
E1210 15:28:11.689728   32330 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E1210 15:28:11.691210   32330 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"

6. Docker Basics (Quick Review)

You don’t need Docker installed locally for this article, but you should have it for later (Part 3, I believe). You should also understand these core concepts:

Image: Packaged app with all dependencies (same for ECS and EKS)
Container: Running instance of an image. (In ECS, containers run inside a Task.)
Dockerfile: Instructions for building a container image.
Registry: Storage for images (in AWS ECR, Docker Hub) — used in both ECS and EKS
Tag: Version label to keep track of versioning.

Differences between ECS and EKS (sample):

“Pod Spec” (EKS), “Task Definition” (ECS): container config.
“Pod” (EKS), “Task” (ECS): Running container instance/s
EKS Pod Identity (EKS), “Task Role” (ECS): IAM Permissions
Replicas (EKS), Desired Count (ECS): How many instances to run
Ingress + Service (EKS), ALB + Target Group (ECS): routes external traffic.
Node Group / Karpenter (EKS), Capacity Provider (ECS): Managing EC2s
HorizontalPodAutoscaler (HPA) (EKS), Auto Scaling (ECS)
ArgoCD / Flux (EKS), CodeDeploy (ECS): GitOps/pipelines

As you can see, there are similarities and some differences on the container side between ECS and EKS. Quick note: you can build container images with other, non-Docker tools. For this series we’ll use Docker since it’s the most common.

Sample Dockerfile (plain text, not YAML)

FROM node:18-alpine
# FROM node:18-alpine - Node.js version 18 on Alpine Linux as our base. 

# Set working directory inside container
WORKDIR /app

# Copy dependency files first (for better caching)
COPY package*.json ./

# Install production dependencies only
RUN npm ci --omit=dev
# npm ci installs exactly what package-lock.json specifies (repeatable, usually faster); RUN executes during image build

# Copy application code
COPY . .

# Expose the port our app listens on
EXPOSE 3000

# Command to run when the container starts (CMD runs at start, not at build)
CMD ["node", "server.js"]

A Dockerfile is a text file: a list of instructions that Docker reads and executes from top to bottom to build a Docker image. As you can see, the basics are easy to follow.

Optional: Install Docker Desktop

We’ll need it later to create and test containers locally:

macOS/Windows: Download Docker Desktop from https://www.docker.com/products/docker-desktop/

docker --version

Docker is optional right now. We will use it in article 3, so you do not need to install it yet, but it is good to get familiar with it.

7. Test Terraform + AWS Integration

Before we build real infrastructure, let’s verify Terraform can communicate with your AWS subaccount correctly.

Create a test directory (change path as you like):

mkdir ~/eks-tutorial-test
cd ~/eks-tutorial-test

Create a Test Configuration File:

We need a main.tf file for Terraform. It only confirms that everything works so far and that we are on the right track for the next article.

# Terraform configuration
terraform {
  required_version = ">= 1.5.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# AWS Provider - uses your eks-tutorial profile
provider "aws" {
  region = "us-east-1"
  
  # Best practice: Default tags for all resources
  default_tags {
    tags = {
      Project     = "eks-video-tutorial"
      Environment = "learning"
      ManagedBy   = "terraform"
    }
  }
}

# Data source - queries AWS for current identity
data "aws_caller_identity" "current" {}

# Data source - gets current region
data "aws_region" "current" {}

# Outputs - displayed after terraform apply
output "account_id" {
  description = "AWS Account ID (should be your subaccount)"
  value       = data.aws_caller_identity.current.account_id
}

output "user_arn" {
  description = "Current IAM user ARN"
  value       = data.aws_caller_identity.current.arn
}

output "region" {
  description = "AWS Region"
  value       = data.aws_region.current.name
}

output "status" {
  value = "Terraform successfully connected to AWS account ${data.aws_caller_identity.current.account_id} in ${data.aws_region.current.name}"
}

Initialize Terraform:

$ terraform init

# Output:

Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.100.0...
- Installed hashicorp/aws v5.100.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

# Note: Here if you did not have all current libraries needed by TF
# there will be some downloads as shown here

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Run terraform apply (“111111111111” below will be your own account number):

$ terraform apply

# Output

data.aws_caller_identity.current: Reading...
data.aws_region.current: Reading...
data.aws_region.current: Read complete after 0s [id=us-east-1]
data.aws_caller_identity.current: Read complete after 0s [id=111111111111]

# and so on... there will be a lot of info about what is going to occur.


Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

Outputs:

account_id = "111111111111"
region = "us-east-1"
status = "Terraform successfully connected to AWS account 111111111111 in us-east-1"
user_arn = "arn:aws:iam::111111111111:user/terraform-eks-admin"

And you can also check the output with this:

$ terraform output

# this will output the same info
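
If you would rather check these values from a script than by eye, here is a minimal Python sketch. It assumes you have captured the result of terraform output -json as a string; the function name check_outputs and the sample JSON below are illustrative and not part of Terraform itself.

```python
import json

# Sample shaped like `terraform output -json` for the outputs defined in main.tf.
# The account number is the placeholder used in this article.
SAMPLE_OUTPUT_JSON = """
{
  "account_id": {"sensitive": false, "type": "string", "value": "111111111111"},
  "region": {"sensitive": false, "type": "string", "value": "us-east-1"},
  "user_arn": {"sensitive": false, "type": "string",
               "value": "arn:aws:iam::111111111111:user/terraform-eks-admin"}
}
"""


def check_outputs(output_json: str, expected_account_id: str) -> list[str]:
    """Return a list of problems; an empty list means the outputs look right."""
    outputs = json.loads(output_json)
    problems = []

    account_id = outputs.get("account_id", {}).get("value")
    if account_id != expected_account_id:
        problems.append(f"account_id is {account_id}, expected {expected_account_id}")

    # The ARN embeds the account number between colons.
    user_arn = outputs.get("user_arn", {}).get("value", "")
    if f":{expected_account_id}:" not in user_arn:
        problems.append(f"user_arn {user_arn!r} is not in account {expected_account_id}")

    return problems


problems = check_outputs(SAMPLE_OUTPUT_JSON, "111111111111")
print("OK" if not problems else problems)
```

In practice you would replace the sample string with the real command output and pass your own subaccount number as expected_account_id.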

That’s it!

8. Cleanup

Once you have confirmed the outputs above, you can delete the test folder.

cd ~
rm -rf ~/eks-tutorial-test

NOTE: We do NOT need to run terraform destroy (the command that normally deletes Terraform-managed AWS resources), because this example only produced outputs and created no actual resources in AWS.

9. What’s Next: Article 2 Preview

You’ve completed all the prerequisites! Your AWS subaccount is ready, tools are installed, and Terraform can communicate with AWS.

In Article 2, we’ll build real infrastructure:

Create the Terraform backend. S3 bucket + DynamoDB for state locking
Build a multi-AZ VPC. Subnets across 3 availability zones
Provision your EKS cluster. Using the terraform-aws-modules/eks module
Deploy your first node group. 2 × t3.small instances spread across AZs
Connect kubectl to your cluster. Run kubectl get nodes and see real nodes!

Estimated time: 45–60 minutes hands-on

Estimated cost: ~$1–2 if you complete and destroy within a few hours (note: costs may vary or change by the time you read this, so double-check, and keep close track)

Looking Ahead…

Here are the upcoming article topics:

Master production-grade EKS at scale (the #1 way companies run Kubernetes going into 2026)
Set up the basics of a multi-AZ cluster with Kubernetes.
Deep Infrastructure as Code with Terraform (serious companies use it)
Learn multi-AZ high availability the right way (VPC, subnets, load balancers, node placement)

Possible future article topics (I’m still preparing these):

Understand EKS-optimized node groups, Karpenter vs Cluster Autoscaler trade-offs
Work with EKS Pod Identity (Agent) — critical for secure media apps talking to S3, DynamoDB, CloudFront, etc.
Build muscle memory: kubectl, helm, kustomize, eksctl, AWS CLI daily
Set up monitoring & logging foundations (CloudWatch, Prometheus)
Get comfortable with ALB/NLB Ingress, cert-manager, external-dns — exactly what media sites need

If we have time in the series, we may even try some other things, like Amazon EKS Capabilities: “a layered set of fully managed cluster features that help accelerate developer velocity”.

Basic INITIAL architecture diagram — Part 2 Terraform initial implementation

⚠️ APPENDIX: Troubleshooting Subaccounts, Org Permissions and Switching between them

Reminder: You cannot switch to a subaccount from the Root account. If you are trying to do that it will not work.

The initial management account you logged into to create the Organization and subaccount must have permissions and/or policy attached for AWSOrganizationsFullAccess. If you get a permissions error, go to IAM > Users > the management account (used to create the above subaccount initially) and add AWSOrganizationsFullAccess policy.

I have hit a couple of errors related to this in the past. This time I used a different management account and had to add that policy, so this part may require some extra steps depending on your configuration.

⚠️ Troubleshooting step #2: I also got this error. If it is still not working while you are logged into your mgmt account (the one that created the subaccount), then you need to add the OrganizationAccountAccessRole role.

Go to IAM > Roles > Create Role (while logged into your mgmt account)
Choose “AWS account” → Choose “Another AWS account”
In the Account ID box, type the 12-digit ID of your eks-tutorial-dev subaccount
Click Next
On the Add Permissions page, tick the box next to AdministratorAccess > Next
Then enter the Role Name (type it exactly): OrganizationAccountAccessRole
Description: “Management account access role”
And click “Create role” button.

Now try the same url we used above.

⚠️ Troubleshooting step #3: If that still does not work, the next step is to log in directly to the subaccount we created.

To log in, sign out of your other session, then sign in from the console using the “login as root” option with your subaccount email address. So if you used myemail+eks+tutorial@gmail.com (the pattern suggested above), log in as that. Since you have not set up a password yet, click “forgot password”; you will be sent a verification code to enter, after which you can set a new password and log in with it.

Once logged into the subaccount: Go to IAM > Roles > OrganizationAccountAccessRole > Trust relationships > Edit trust policy

🛠️ Get more like this at https://www.systemsarchitect.io and follow the SystemsArchitect X account: https://x.com/systemsarch

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next article.

⚡️ Quick promo message ⚡️

If you would like to beta test and get involved with my new app SystemsArchitect.io for cloud engineering check it out, and feel free to send me any comments. You are early, your input counts!
The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content, and I’ll be giving some good Pro discounts for testers later for the Pro plan. It’s a slow rollout because there is a lot to test!

https://www.systemsarchitect.io/

I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.

About me

I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.

I’m open to discussing projects, for both enterprise and startups. If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.

🚀 My current project I am working on is SystemsArchitect.io (in Beta testing) which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. Check it out: https://systemsarchitect.io

Also follow the SystemsArchitect X account: https://x.com/systemsarch

My latest articles on Medium: https://medium.com/@csjcode

Cloud Cost Saving: https://medium.com/cloud-cost-savings

Cloud Architect Review: https://medium.com/cloud-architect-review

AI Dev Tips: https://medium.com/ai-dev-tips

Solana Dev Tips: https://medium.com/solana-dev-tips

Chris St. John - Medium

✅ Cloud Memory Optimization Checklist (CC #2)

Chris St. John — Wed, 03 Dec 2025 15:47:22 GMT

Right-size RAM allocations, leverage caching, tune JVMs/containers, offload cold data for max efficiency

This is the second checklist in the Cloud Checklists series — focusing entirely on memory optimization across AWS, Azure, GCP, and cloud-native environments, and I am putting a bit more focus on Linux-based servers.

Cloud Memory Optimization Checklist (CC #2)

Why memory deserves its own checklist:

Memory is often the most expensive resource in cloud VMs, so we need to right-size it. Most people remember to right-size CPU but, ironically, forget memory!
Poor memory management causes OOM (out of memory) issues, garbage collection problems, and cascading performance failures more often (in my opinion) than CPU or I/O bottlenecks do.
Most cloud cost overruns I see in real environments (50%+) come from chronically over-provisioned or poorly tuned memory rather than CPU waste.
This is a good one to use to catch issues others missed!

Here is what I am going to cover for Memory Optimization on cloud projects:

✅ Cloud Memory Optimization Checklist

Monitor memory utilization
Select memory-optimized instances
Add in-memory caching
Use managed cache services
Compress inactive data
Tune application heap size
Fix memory leaks fast
Scale vertically before horizontally
Use swap only as last resort
Offload to object storage

🥰 Thanks for reading! 🔥 Please clap, share, and follow for the next checklist that I put out!

1. Monitor memory utilization

True memory pressure is hidden from most default cloud dashboards. For example, when Linux reports “used” memory, it can include file caches that are reclaimed almost instantly under pressure. A system showing 90% memory “used” could still have plenty of headroom.

Enable every detailed memory metric your platform offers (check its docs); they give you more clues.
Track out-of-memory events in container runtimes.

Essential metrics from /proc/meminfo:

MemAvailable: An estimate of memory available for new applications without swapping. This is more useful than MemFree because it accounts for reclaimable caches.
MemFree: Truly unused memory (often misleadingly low on healthy systems)
Cached: Memory used for file caches (mostly reclaimable)
Buffers: Memory used for block device buffers
SwapFree / SwapTotal: Swap usage (high swap activity indicates memory pressure)
AnonPages: Memory allocated by applications (heap, stack) that isn’t backed by files — this is your actual application memory footprint
Active vs Inactive: Memory recently accessed vs memory that can be reclaimed
Set alerts on MemAvailable < 10% of MemTotal: a reliable indicator of real memory pressure across Linux distributions, though PSI (Pressure Stall Information) is the preferred signal on newer kernels.
Use unified observability from providers and tools such as Datadog Memory Deep Dive, New Relic APM, or Prometheus node_exporter + cAdvisor for containers.
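
To make the MemFree vs MemAvailable difference concrete, here is a minimal Python sketch that parses /proc/meminfo text and applies the 10% rule. The function names and the sample numbers are my own assumptions for illustration; on a real Linux host you would read the file instead of using SAMPLE_MEMINFO.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def available_percent(fields: dict[str, int]) -> float:
    """MemAvailable as a percentage of MemTotal."""
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def under_pressure(fields: dict[str, int], threshold_percent: float = 10.0) -> bool:
    """True when MemAvailable drops below the alert threshold."""
    return available_percent(fields) < threshold_percent


# Made-up sample: only 5% is "free", yet 40% is actually available.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:          800000 kB
MemAvailable:    6400000 kB
Cached:          5200000 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(f"free: {100.0 * info['MemFree'] / info['MemTotal']:.0f}%")
print(f"available: {available_percent(info):.0f}%")
print(f"alert: {under_pressure(info)}")
```

On a live server, parse_meminfo(open("/proc/meminfo").read()) gives you the same dictionary from real data.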

🚀 Tip: Pressure Stall Information (PSI) is available on Linux 4.20+ and is the modern, authoritative way to detect memory pressure.

CLI examples:

# Snapshot of memory state
free -h

# Detailed breakdown
cat /proc/meminfo

# Memory pressure (Linux 4.20+)
cat /proc/pressure/memory

# Real-time stats with swap activity
vmstat 1

# Per-process memory usage
ps aux --sort=-%mem | head -20

# OOM killer logs
dmesg | grep -i "killed process"
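
The output of cat /proc/pressure/memory has two lines, "some" and "full", each with avg10, avg60, avg300, and total fields. Here is a minimal Python sketch for turning that into numbers you can alert on. The threshold of 10.0 and the sample values are assumptions for illustration, not recommended defaults.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse /proc/pressure/memory text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def memory_stalled(psi: dict[str, dict[str, float]], avg10_threshold: float = 10.0) -> bool:
    """True when tasks spent more than the threshold % of the last 10s stalled on memory."""
    return psi["some"]["avg10"] > avg10_threshold


# Made-up sample in the same format the kernel prints.
SAMPLE_PSI = """\
some avg10=1.50 avg60=0.75 avg300=0.25 total=123456
full avg10=0.50 avg60=0.25 avg300=0.00 total=45678
"""

psi = parse_psi(SAMPLE_PSI)
print(psi["some"]["avg10"], memory_stalled(psi))
```

"some" means at least one task was stalled on memory; "full" means all non-idle tasks were stalled at the same time, which is the more serious signal.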

AWS: EC2 does not publish memory metrics to CloudWatch by default; install the CloudWatch agent, either directly or through Systems Manager.
Azure: Enable VM Insights + Guest OS diagnostics for Percentage Used Memory and Available MB
GCP: Install Ops Agent and enable memory metrics in Cloud Monitoring
Cloud Native: Use Prometheus + node_exporter node_memory_MemAvailable_bytes with Grafana dashboards

More info on setting up CloudWatch

2. Select memory-optimized instances

Choose instance families specifically designed for high memory-to-vCPU ratios when memory is the workload’s dominant requirement.

Example reasons you need to focus on memory-optimized instances:

Your app performance scales directly with available RAM
Data is required to reside entirely in memory for acceptable latency
You’re over-provisioning vCPUs just to get enough memory
Swapping or paging to disk is degrading performance

🚀 Tip: Benchmark your actual workload on a small memory-optimized instance first — many “memory-heavy” apps actually become CPU-bound once given enough RAM.
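
One quick way to spot the "over-provisioning vCPUs just to get enough memory" case is to look at the GiB-per-vCPU ratio your workload actually needs. Here is a minimal Python sketch; the 2/4/8 split mirrors the typical ratios of compute, general-purpose, and memory-optimized families (for example AWS C, M, and R), and the cutoffs are my own assumption, so adjust them for your provider.

```python
def gib_per_vcpu(memory_gib: float, vcpus: float) -> float:
    """Memory-to-vCPU ratio the workload needs at peak."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return memory_gib / vcpus


def suggest_family(peak_memory_gib: float, peak_vcpus: float) -> str:
    """Suggest an instance class from the needed ratio (assumed cutoffs)."""
    ratio = gib_per_vcpu(peak_memory_gib, peak_vcpus)
    if ratio > 4:
        # A general-purpose family (~4 GiB/vCPU) would force you to buy idle vCPUs.
        return "memory-optimized"
    if ratio > 2:
        return "general-purpose"
    return "compute-optimized"


# Hypothetical workload: needs 64 GiB at peak but only keeps 4 vCPUs busy.
print(gib_per_vcpu(64, 4), suggest_family(64, 4))
```

A workload at 16 GiB per vCPU on a 4 GiB-per-vCPU family needs four times the vCPUs it can use, which is exactly the waste described above.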

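Platforms (examples I looked up; check the provider docs for current options):

AWS: R6i/R7g (Intel/Graviton), X2gd (Graviton with local NVMe), or High Memory u-*.metal
Azure: Easv5 (AMD), Epsv5 (Ampere Arm), M-series (very large memory, e.g. SAP HANA); Dasv5 is the general-purpose baseline
GCP: M1/M2/M3 (memory-optimized) or the highmem shapes of general-purpose families such as N2 and C3; the Ampere-based Tau family is T2A
Cloud Native: Karpenter or Cluster Autoscaler to provision memory-optimized node types on demand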

3. Add in-memory caching

Move “hot” data from disk or database into RAM using application-level caches (Redis, Memcached, Caffeine, Guava, etc.).

Why this is needed: Every time someone views a product page, your app queries the database to fetch product details, reviews, and recommendations. If that page gets 5,000 views per hour, you’re hitting the database 5,000 times for data that perhaps rarely changes.

Database hits add up: they cost performance and money.
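
Here is a minimal Python sketch of the product-page example using a cache-aside pattern with a time-to-live. It is an in-process dictionary, not Redis or Memcached, and the names TTLCache and fetch_product_from_db are hypothetical stand-ins for your own cache client and database call.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-process cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database call
        value = loader(key)  # cache miss: load once, then remember
        self._entries[key] = (now + self._ttl, value)
        return value


db_queries = 0


def fetch_product_from_db(key: str) -> dict:
    """Stand-in for the real (slow) database query."""
    global db_queries
    db_queries += 1
    return {"key": key, "name": "Example product"}


cache = TTLCache(ttl_seconds=300)
for _ in range(5000):  # 5,000 page views in the hour
    product = cache.get_or_load("product:42", fetch_product_from_db)

print(f"page views: 5000, database queries: {db_queries}")
```

With a shared cache such as Redis the idea is the same; the dictionary is replaced by network calls to the cache, and every app server benefits from the same entries.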

Speed estimates (~ballpark):

RAM access: ~100 nanoseconds
SSD access: 1,000x slower
Database query over network: 100,000x+ slower

I am not going into every caching strategy; there are many, and it would make for an insanely long article. When I was at NIKE I did an innovation sprint report on several strategies; it ran 20–30 slides (1 hr), and I went over the time limit 🤣. Just be aware that they exist, and research your use case when it comes up.

Cache data that is:

Read frequently: product catalogs, user profiles, configuration settings.
Expensive to compute: analytics dashboards, report aggregations.
Slow to fetch: data from external APIs or complex database joins.
Relatively stable: data that doesn’t change every second.

🚀 Tip: Pre-warm caches on deployment using a “cache primer” or Lambda that reads the top 10K keys

Well known cache software to explore (instance-based but I also gave cloud versions):

Redis (popular): Stores key-value pairs in memory. Multiple application servers can share the same cache. Supports complex data types, persistence, and pub/sub. Great for distributed systems. For cloud: AWS ElastiCache for Redis, Azure Cache for Redis, or GCP Memorystore for Redis as fully managed services. AWS also offers Valkey, an open-source fork of Redis, at lower cost.

Memcached: Similar to Redis but simpler, with fewer bells and whistles. Used in many high-volume stacks. Very fast. For cloud: AWS ElastiCache for Memcached or GCP Memorystore for Memcached.

Caffeine (Java): An in-process cache library. Data lives inside your application’s memory. Fast but not shared between servers. For cloud: Can run on various cloud compute services for each platform.

Guava Cache (Java): Google’s library, similar to Caffeine but older. Still widely used in existing codebases.

4. Use managed cache services

Offload cache operations, eviction, replication, backups, and scaling to fully managed services instead of self-hosted Redis.

These are the most popular managed options (repeated from the previous section, where they are covered in more detail):

Redis (popular): AWS ElastiCache for Redis, Azure Cache for Redis, or GCP Memorystore for Redis as fully managed services. AWS also offers Valkey, an open-source fork of Redis, at lower cost.

Memcached: AWS ElastiCache for Memcached or GCP Memorystore for Memcached. Azure Cache for Redis has no Memcached tier, so on Azure the usual choice is Redis.

Also some people use other services like Upstash, which is popular for Vercel apps.

Managed cache services are quite popular, and in my experience they work well for many use cases:

Zero-ops, automatic failover, encryption at rest/transit, and built-in monitoring
Pay only for provisioned memory + requests (often cheaper than running 24/7 standalone)

🚀 Tip: Enable “cluster mode disabled” + Multi-AZ for Redis if you only need one shard (has limits but is fast).

5. Compress inactive data

Compress cold or infrequently accessed objects in memory or before persisting to disk.

Lower memory usage = smaller instances or more capacity on existing ones
More active data fits in memory when inactive data is compressed
Approaches include app-level compression, time-based compression, access-based compression (driven by access patterns), and columnar compression (similar data types stored together)

Popular algorithms for app-level compression:

Gzip: High compression ratio and universal support, but slower
LZ4: Fast compression/decompression, moderate ratio
Snappy: Very fast, lower ratio
Zstandard: Good balance of speed and ratio

Tools with caching services and software:

Redis: No transparent compression of stored values; compress large values in the client before writing them
Memcached: Can store compressed data manually
In-memory DBs: Often have native compression
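
Here is a minimal Python sketch of app-level compression before a value goes into a cache. It uses zlib from the standard library (the same DEFLATE algorithm gzip uses) as a stand-in; LZ4, Snappy, and Zstandard need third-party packages. The helper names and the sample data are assumptions for illustration.

```python
import json
import zlib


def compress_value(obj) -> bytes:
    """Serialize to compact JSON, then compress for storage in a cache."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def decompress_value(blob: bytes):
    """Reverse of compress_value."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Made-up, repetitive payload: the kind of cold data that compresses well.
reviews = [
    {"product_id": 42, "rating": 5, "text": "Great product, works as described."}
    for _ in range(200)
]

raw_size = len(json.dumps(reviews, separators=(",", ":")).encode("utf-8"))
packed = compress_value(reviews)

print(f"raw: {raw_size} bytes, compressed: {len(packed)} bytes")
assert decompress_value(packed) == reviews  # round trip is lossless
```

The trade-off is CPU time on every read and write, so this pays off mainly for large or rarely read values, not for small hot keys.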

6. Tune application heap size

Over-provisioned JVM and Node.js heaps waste huge amounts of RAM and lengthen GC pauses; under-provisioned heaps trigger excessive GC.

The heap is the chunk of RAM your application uses to store objects and data while running.

In the JVM (-Xms/-Xmx) and Node.js (--max-old-space-size) you can configure how much memory the heap can use: the maximum amount of RAM allocated for your application’s runtime data (NOT including code, stack, or native memory). CPython has no equivalent heap flag, so Python memory is usually capped with container or cgroup limits instead.

😱 So if you don’t have enough heap allocated, you’re screwed. But you should not have too much either.

What happens if you have…

Too much (heap allocated):

Wastes RAM that other apps could use.
Garbage collection (GC) pauses become longer.
Waste money on too much RAM. If you’re running 10 containers, each with 10GB heap when they only need 2GB, you’re paying for 80GB of wasted RAM.

Too little:

Garbage collection “thrashing” (constant).
Out of memory errors.
Application crashes.

Recommended:

1.5–2x your actual working memory needs.

Example with Node.js

# Default: depends on Node.js version and available memory (~1.4GB on older 64-bit releases)
node app.js

# Tuned: Set max old space (heap) size
node --max-old-space-size=2048 app.js  # 2GB heap
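
The sizing rule and the wasted-RAM example above are simple arithmetic. Here is a minimal Python sketch of both; the function names are mine, and the working-set figure is something you would measure from your own app, not a value from this article.

```python
import math


def recommended_heap_mb(working_set_mb: float, headroom: float = 1.5) -> int:
    """Heap size at 1.5 to 2x the measured working set."""
    if not 1.5 <= headroom <= 2.0:
        raise ValueError("headroom should be between 1.5 and 2.0")
    return math.ceil(working_set_mb * headroom)


def wasted_gb(containers: int, allocated_gb: float, needed_gb: float) -> float:
    """RAM paid for but not needed across all containers."""
    return containers * max(allocated_gb - needed_gb, 0)


# The example from the text: 10 containers, 10GB heap each, 2GB needed.
print(wasted_gb(10, 10, 2))

# Hypothetical app with a 1000 MB working set.
heap_mb = recommended_heap_mb(1000, headroom=2.0)
print(f"node --max-old-space-size={heap_mb} app.js")
```

Measure the working set under realistic load first; sizing from an idle process gives you a number that is too low.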

7. Fix memory leaks fast

A memory leak is when an app allocates memory but fails to release it, causing gradual memory consumption growth until the system runs out of RAM or crashes.

Even tiny leaks (a few MB/hour) become multi-GB problems in long-running cloud services.

Implement continuous memory profiling in production
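Don’t wait for OOM (Out of Memory) errors before setting up alerts; do it preemptively.
Set alerts when memory usage increases >5% per hour over baseline
Common leak sources in Python and JavaScript are references that are never released: unbounded caches, forgotten event listeners and timers, and long-lived closures (plain circular references are reclaimed by both garbage collectors)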
Enable memory leak detection in CI/CD pipelines

Platforms:

AWS: Use CloudWatch Container Insights with anomaly detection. Automatic alerts on memory trend deviations

Azure: Application Insights Live Metrics + Snapshot Debugger

GCP: Cloud Profiler continuous profiling

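Kubernetes: Set memory requests and limits so a leaking container is OOM-killed and restarted instead of starving the node; VerticalPodAutoscaler can recommend right-sized requests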

8. Scale vertically before horizontally

Adding more RAM to a single instance is almost always cheaper and lower latency than distributing across many small instances.

Network hops, serialization, and sharding overhead often dominate once you go beyond 4–8 nodes
Vertical scaling avoids split-brain, quorum, and consistency headaches

🚀 Tip: Use AWS EC2 High Memory (u-*.metal) or Azure Mv2 for multi-terabyte single-node Redis, which is hard to match horizontally at the same price/performance

9. Use swap only as last resort

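Swap kills performance, and on spot/preemptible or burstable nodes a swapping instance can slow down enough to fail health checks and be replaced.

The Linux default of swappiness=60 is far too high for most cloud workloads; set it to 1 or 0

# Avoid in cloud: pinning the default of 60
# echo 'vm.swappiness=60' >> /etc/sysctl.conf   # ← BAD
# Prefer: a low value in a drop-in file, then reload (run as root)
echo 'vm.swappiness=1'  >> /etc/sysctl.d/99-low-swappiness.conf  # ← GOOD
sysctl --system

Providers do not terminate instances for swapping as such, but heavy swapping on burstable nodes burns CPU and disk I/O credits, which makes the slowdown worse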

10. Offload to object storage

Move large blobs, logs, backups, and static assets out of instance memory/disks into S3/GCS/Azure Blob.

Use S3 Intelligent-Tiering or GCS Nearline for infrequently accessed data

🚀 Tip: Use AWS S3 + Cache-Control headers + CloudFront so that even large 50GB+ datasets are served from edge caches with <50 ms first-byte latency

AWS S3 / Glacier Instant Retrieval
Azure Blob Storage Hot/Cool tiers
GCP Cloud Storage Nearline + Cloud CDN

Chris St. John - Medium

✅ Cloud Memory Optimization Checklist (CC #2) was originally published in Cloud Checklists on Medium, where people are continuing the conversation by highlighting and responding to this story.