<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Chris St. John on Medium]]></title>
        <description><![CDATA[Stories by Chris St. John on Medium]]></description>
        <link>https://medium.com/@csjcode?source=rss-649f4282ab20------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*Rn1EIBjMHVzwp0HzfIR6pg.jpeg</url>
            <title>Stories by Chris St. John on Medium</title>
            <link>https://medium.com/@csjcode?source=rss-649f4282ab20------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 28 May 2026 16:41:58 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@csjcode/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[AI Chat Coding Essentials — Adding Tools (AI Agent Coding Series #3)]]></title>
            <link>https://medium.com/ai-dev-tips/ai-chat-coding-essentials-adding-tools-ai-agent-coding-series-3-41c3a1ba21b9?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/41c3a1ba21b9</guid>
            <category><![CDATA[openai]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[llm-agent]]></category>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[web-development]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Sun, 25 Jan 2026 22:41:34 GMT</pubDate>
            <atom:updated>2026-03-20T17:14:44.977Z</atom:updated>
            <content:encoded><![CDATA[<h3>AI Chat Coding Essentials - Adding Tools (AI Agent Coding Series #3)</h3><h4>A single-turn example of adding tools and API calls to your AI chat requests, with the complete tool-calling flow.</h4><p>So far, we have done a quick review of <a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-advanced-version-ai-agent-coding-series-2-b2abfb050528">how the OpenAI Chat Completions API works.</a></p><p>We started there to make sure we knew the fundamentals before building on them.</p><p>Now we start getting into the fun stuff that leads to understanding AI agents.</p><p><strong>Function calling, also known as “tools”, is how LLMs interact with the real world.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*omvpRSAfi7sL6We1-t1yYQ.jpeg" /></figure><p><strong>Function calling</strong> is a mechanism where the LLM can request the execution of external functions by <strong>returning structured data</strong> (function name + arguments) instead of just text, <strong>enabling interaction with APIs, databases, and external systems.</strong></p><p><strong>We are still using the OpenAI Chat Completions SDK</strong>, but the examples are getting progressively more advanced.</p><p>We’re following this path to get a better understanding of how AI agents work. 
These principles also apply to other, more sophisticated agent SDKs, but this tutorial uses the popular OpenAI API.</p><p><strong>The principles are the same.</strong></p><ul><li><strong>Function calling is </strong>the foundation of all AI agents.</li><li>This tutorial helps you understand the request/response cycle, which gets more complex as we build on it for AI agents.</li><li>It reviews and demonstrates practical async API knowledge.</li><li>By the end, you will have a couple of basic examples of how to use it.</li></ul><p>Let’s look at the steps for getting this working.</p><ol><li><strong>Project setup.</strong></li><li><strong>Tool schema definition.</strong></li><li><strong>Adding multiple tools.</strong></li><li><strong>Sending tools and parsing response.</strong></li><li><strong>Response structure when tool is called.</strong></li><li><strong>Parsing tool calls (OpenAI SDK v6.x).</strong></li><li><strong>Executing functions with error handling.</strong></li><li><strong>Sending tool results back — single-turn tool loop.</strong></li><li><strong>Single-turn template — simple example (mock data).</strong></li><li><strong>Single-turn template — actual API call example.</strong></li></ol><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong>If this feels a bit slow or tedious to learn, here are a few KEY points:</strong></p><ol><li><strong>We are building on past lessons,</strong> which makes it easier to learn and will make your understanding more complete. 
For example, you can copy-paste even the setup below from the last lesson.</li><li>I developed this code in an afternoon;<strong> it should be even quicker for you to copy-paste </strong>what I provided here.</li><li><strong>It’s all logical </strong>once you understand the <strong>steps and tool-calling loop</strong>.</li><li><strong>We build on this for future lessons.</strong></li></ol><h4>Are we using the advanced code class version from Article #2?</h4><p>No. We are going to circle back to that one in a couple of articles, but right now I want us to get function calling for tools down.</p><p><strong>Previous articles in the series:</strong></p><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-with-openai-ai-agent-coding-series-1-6ac06b8080b4">AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)</a></p><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-advanced-version-ai-agent-coding-series-2-b2abfb050528">AI Chat Coding Essentials — Advanced version (AI Agent Coding Series #2)</a></p><h4>Single-turn vs. Multi-turn in Agentic AI</h4><p>To be clear about what single-turn and multi-turn mean:</p><ul><li><strong>Single-turn: </strong>The entire tool resolution happens within <strong>one user message</strong>. The program/agent handles the back-and-forth internally (LLM → tool → LLM) and returns the final answer without asking the user anything else.</li><li><strong>Multi-turn: </strong>The conversation requires <strong>additional user messages</strong> at some point during tool use. 
For example, when a user is booking a hotel, the agent returns the hotel options, asks which one the user wants, and then continues.</li></ul><p><strong>This tutorial covers single-turn AI agent workflows.</strong></p><h4>Basic Tool Request/Response Flow</h4><p>User message → <br>LLM → <br>Tool calls →<br>Execute →<br>Tool result →<br>LLM →<br>Final answer</p><h3><strong>1. Project setup</strong></h3><p>To save time and space, I am not going too deep into the initial setup here. I covered that in: <a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-advanced-version-ai-agent-coding-series-2-b2abfb050528">AI Chat Coding Essentials Advanced version (AI Agent Coding Series #2)</a></p><p>I copied and pasted my package.json and tsconfig.json from the last project we did (URL above). That works!</p><p>But if you do not have them, you need these dependencies in package.json:</p><pre>{<br>  &quot;dependencies&quot;: {<br>    &quot;dotenv&quot;: &quot;^17.2.3&quot;,<br>    &quot;openai&quot;: &quot;^6.15.0&quot;<br>  },<br>  &quot;devDependencies&quot;: {<br>    &quot;@types/node&quot;: &quot;^25.0.3&quot;,<br>    &quot;ts-node&quot;: &quot;^10.9.2&quot;,<br>    &quot;typescript&quot;: &quot;^5.9.3&quot;<br>  }<br>}</pre><p>You also need a tsconfig.json that configures ts-node for CommonJS modules (the details can vary, but it usually looks like this):</p><pre>{<br>  &quot;compilerOptions&quot;: {<br>    &quot;target&quot;: &quot;ES2020&quot;,<br>    &quot;module&quot;: &quot;CommonJS&quot;,<br>    &quot;moduleResolution&quot;: &quot;node&quot;,<br>    &quot;esModuleInterop&quot;: true,<br>    &quot;strict&quot;: true,<br>    &quot;skipLibCheck&quot;: true,<br>    &quot;outDir&quot;: &quot;./dist&quot;<br>  },<br>  &quot;ts-node&quot;: {<br>    &quot;transpileOnly&quot;: true<br>  }<br>}</pre><p>Store your API key in a .env file at your project root:</p><pre>OPENROUTER_API_KEY=your_api_key_here</pre><p>Next, create your script; you can call it 
<strong>single-turn-tool.ts</strong> or something similar.</p><p>Single-turn here means this code runs one tool-call loop (request, execute, respond) rather than repeated rounds of tool calls.</p><pre>import dotenv from &quot;dotenv&quot;;<br>import path from &quot;path&quot;;<br><br>// Load .env from project root (adjust path as needed)<br>dotenv.config({ path: path.resolve(__dirname, &quot;../../.env&quot;) });<br><br>import OpenAI from &quot;openai&quot;;<br><br>const client = new OpenAI({<br>  baseURL: &quot;https://openrouter.ai/api/v1&quot;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>// Use model: &quot;openai/gpt-4o&quot; for OpenAI models via OpenRouter</pre><p>I am using <strong>OpenRouter</strong>, as I mentioned in previous articles; check the first one in the series if you are not sure what this means.</p><p>That is why we have baseURL: &quot;https://openrouter.ai/api/v1&quot;.</p><p>If you want to call OpenAI directly, use their API key (from their API portal/site), remove the baseURL, and drop the openai/ prefix from the model name (for example, gpt-4o).</p><h3><strong>2. Tool schema definition</strong></h3><p>Tools are the bridge between natural language and executable code.</p><p>When you define a tool schema, you’re telling the LLM: “Here’s a function you can request, here’s what it does, and here’s exactly what arguments it needs.”</p><p>The LLM reads your schema to understand:</p><ul><li><strong>What the tool does</strong> (from the description).</li><li><strong>What inputs it needs</strong> (from the parameters). All parameters are optional by default; add a parameter’s name to the required array to make it mandatory.</li><li><strong>What values are valid</strong> (from types, enums, and required fields). 
An enum restricts a parameter to a fixed list of allowed values.</li></ul><p>The parameter names location and unit are just examples; you choose names that match your API.</p><pre>// A tool is defined with: name, description, parameters (JSON Schema)<br>const tools: OpenAI.Chat.ChatCompletionTool[] = [<br>  {<br>    type: &quot;function&quot;,<br>    function: {<br>      name: &quot;get_weather&quot;,<br>      description: &quot;Get current weather for a location&quot;,<br>      parameters: {<br>        type: &quot;object&quot;,<br>        properties: {<br>          location: {<br>            type: &quot;string&quot;,<br>            description: &quot;City name, e.g. &#39;San Francisco, CA&#39;&quot;<br>          },<br>          unit: {<br>            type: &quot;string&quot;,<br>            enum: [&quot;celsius&quot;, &quot;fahrenheit&quot;],<br>            description: &quot;Temperature unit&quot;<br>          }<br>        },<br>        required: [&quot;location&quot;]<br>      }<br>    }<br>  }<br>];</pre><h3><strong>3. 
Adding multiple tools (optional)</strong></h3><p>You can also add multiple tools that can be called by the LLM.</p><p><strong>Here is an example </strong>(up to you if you want to add this, it’s optional):</p><pre>const tools: OpenAI.Chat.ChatCompletionTool[] = [<br>  {<br>    type: &quot;function&quot;,<br>    function: {<br>      name: &quot;get_weather&quot;,<br>      description: &quot;Get current weather for a location&quot;,<br>      parameters: {<br>        type: &quot;object&quot;,<br>        properties: {<br>          location: { type: &quot;string&quot;, description: &quot;City name&quot; }<br>        },<br>        required: [&quot;location&quot;]<br>      }<br>    }<br>  },<br>  {<br>    type: &quot;function&quot;,<br>    function: {<br>      name: &quot;search_database&quot;,<br>      description: &quot;Search for records in the database&quot;,<br>      parameters: {<br>        type: &quot;object&quot;,<br>        properties: {<br>          query: { type: &quot;string&quot;, description: &quot;Search query&quot; },<br>          limit: { type: &quot;number&quot;, description: &quot;Max results to return&quot; }<br>        },<br>        required: [&quot;query&quot;]<br>      }<br>    }<br>  },<br>  {<br>    type: &quot;function&quot;,<br>    function: {<br>      name: &quot;send_email&quot;,<br>      description: &quot;Send an email to a recipient&quot;,<br>      parameters: {<br>        type: &quot;object&quot;,<br>        properties: {<br>          to: { type: &quot;string&quot;, description: &quot;Recipient email&quot; },<br>          subject: { type: &quot;string&quot;, description: &quot;Email subject&quot; },<br>          body: { type: &quot;string&quot;, description: &quot;Email body&quot; }<br>        },<br>        required: [&quot;to&quot;, &quot;subject&quot;, &quot;body&quot;]<br>      }<br>    }<br>  }<br>];</pre>
<h3><strong>4. Sending tools and parsing response</strong></h3><p>Once you’ve defined your tools, the next step is sending them to the API and understanding what comes back.</p><p>This is where most developers trip up — the response structure has several gotchas that can cause runtime errors if you’re not prepared.</p><p><strong>⚠️ Gotcha: This is where a lot of errors happen initially. ⚠️</strong></p><ul><li>When the model decides to call a tool, <strong>it DOES NOT return text.</strong></li><li>Instead, it returns a <strong>structured </strong><strong>tool_calls array</strong> containing the <strong>function name</strong> and <strong>arguments</strong> as a <strong>JSON string</strong> (NOT an object).</li><li><strong>You must parse this string</strong> and handle the response correctly.</li></ul><p><strong>This is what this part looks like below:</strong></p><pre>import dotenv from &quot;dotenv&quot;;<br>import path from &quot;path&quot;;<br><br>dotenv.config({ path: path.resolve(__dirname, &quot;../../.env&quot;) });<br><br>import OpenAI from &quot;openai&quot;;<br><br>const client = new OpenAI({<br>  baseURL: &quot;https://openrouter.ai/api/v1&quot;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>async function callWithTools() {<br>  const tools: OpenAI.Chat.ChatCompletionTool[] = [<br>    {<br>      type: &quot;function&quot;,<br>      function: {<br>        name: &quot;get_weather&quot;,<br>        description: &quot;Get current weather for a location&quot;,<br>        parameters: {<br>          type: &quot;object&quot;,<br>          properties: {<br>            location: { type: &quot;string&quot;, description: &quot;City name&quot; }<br>          },<br>          required: [&quot;location&quot;]<br>        }<br>      }<br>    }<br>  ];<br><br>  const response = await client.chat.completions.create({<br>    model: &quot;openai/gpt-4o&quot;,<br>    messages: [<br>      { role: &quot;user&quot;, content: 
&quot;What&#39;s the weather in Tokyo?&quot; }<br>    ],<br>    tools,<br>    tool_choice: &quot;auto&quot;  // &quot;auto&quot; | &quot;none&quot; | &quot;required&quot; | { type: &quot;function&quot;, function: { name: &quot;...&quot; } }<br>  });<br><br>  console.log(response.choices[0].message);<br>  console.log(&quot;Finish reason:&quot;, response.choices[0].finish_reason);<br>}<br><br>callWithTools().catch(console.error);</pre><h3><strong>5. Response structure when tool is called</strong></h3><p>This is the structure of the first response when a tool is called:</p><pre>// When the model decides to call a tool, you get:<br>{<br>  role: &quot;assistant&quot;,<br>  content: null,  // Often null when calling tools<br>  tool_calls: [<br>    {<br>      id: &quot;call_abc123&quot;,           // Unique ID - you need this later<br>      type: &quot;function&quot;,<br>      function: {<br>        name: &quot;get_weather&quot;,       // Which function to call<br>        arguments: &quot;{\&quot;location\&quot;: \&quot;Tokyo\&quot;}&quot;  // JSON string - must parse!<br>      }<br>    }<br>  ]<br>}<br>// finish_reason will be &quot;tool_calls&quot;</pre><h3><strong>6. Parsing tool calls (OpenAI SDK v6.x)</strong></h3><p>This is how you parse the tool calls. It is worth learning well, because many errors come up in this part of the process.</p><p><strong>Important:</strong> In OpenAI SDK v6.x, each entry in tool_calls is a union type that can be either a function-type or a custom-type tool call.</p><ul><li>To parse: const args = JSON.parse(toolCall.function.arguments)</li><li>When a tool is called, finish_reason is &quot;tool_calls&quot;. If it is &quot;stop&quot;, the model answered with plain text and did not call a tool.</li><li><strong>You must add a type guard </strong>to access the function property.</li><li>You need the type guard because OpenAI SDK v6.x introduced a union type that includes custom tool calls. 
So we need to check to make sure we got back toolCall.function</li></ul><pre>async function parseToolCalls() {<br>  const response = await client.chat.completions.create({<br>    model: &quot;openai/gpt-4o&quot;,<br>    messages: [{ role: &quot;user&quot;, content: &quot;What&#39;s the weather in Tokyo?&quot; }],<br>    tools<br>  });<br><br>  const message = response.choices[0].message;<br><br>  // Check if model wants to call tools<br>  if (message.tool_calls) {<br>    for (const toolCall of message.tool_calls) {<br>      // Type guard required for OpenAI SDK v6.x<br>      if (toolCall.type !== &#39;function&#39;) continue;<br><br>      const functionName = toolCall.function.name;<br>      const args = JSON.parse(toolCall.function.arguments);  // IMPORTANT: Parse JSON string<br>      const callId = toolCall.id;<br><br>      console.log(`Tool: ${functionName}`);<br>      console.log(`Args:`, args);<br>      console.log(`ID: ${callId}`);<br>    }<br>  } else {<br>    // Model responded with text, no tool call<br>    console.log(&quot;Response:&quot;, message.content);<br>  }<br>}</pre><p><strong>When would you use </strong><strong>tool_choice: &quot;required&quot; vs </strong><strong>&quot;auto&quot;?</strong></p><p>Use &quot;required&quot; when you know the user&#39;s request needs a tool (e.g., &quot;get weather&quot; must call weather API). Use &quot;auto&quot; for general queries where the model should decide.</p><h3><strong>7. Executing functions with error handling</strong></h3><p>The LLM doesn’t execute code — it only requests that you do. 
When you receive a tool_calls response, your code is responsible for:</p><ol><li>Mapping the function name to an actual implementation.</li><li>Passing the parsed arguments to that function.</li><li>Returning a stringified result.</li></ol><p>The key point in this section is the <strong>function registry pattern</strong>, a clean way to organize your tool implementations and dispatch calls dynamically.</p><pre>// Define your actual functions<br>function getWeather(location: string): string {<br>  // In real code, call a weather API<br>  // This is to simulate the response<br>  return JSON.stringify({<br>    location,<br>    temperature: 72,<br>    unit: &quot;fahrenheit&quot;,<br>    conditions: &quot;sunny&quot;<br>  });<br>}<br><br>function searchDatabase(query: string, limit: number = 10): string {<br>  // Simulate database search<br>  return JSON.stringify({<br>    query,<br>    results: [<br>      { id: 1, title: &quot;Result 1&quot; },<br>      { id: 2, title: &quot;Result 2&quot; }<br>    ],<br>    total: 2<br>  });<br>}<br><br>// Function registry - map names to implementations<br>const functionRegistry: Record&lt;string, Function&gt; = {<br>  get_weather: (args: { location: string }) =&gt; getWeather(args.location),<br>  search_database: (args: { query: string; limit?: number }) =&gt;<br>    searchDatabase(args.query, args.limit)<br>};<br><br>// Execute a tool call<br>function executeTool(name: string, args: Record&lt;string, any&gt;): string {<br>  const fn = functionRegistry[name];<br>  if (!fn) {<br>    return JSON.stringify({ error: `Unknown function: ${name}` });<br>  }<br>  return fn(args);<br>}</pre><p>To add error handling, use the version below in place of executeTool above:</p><pre>function safeExecuteTool(name: string, argsJson: string): string {<br>  try {<br>    const args = JSON.parse(argsJson);<br>    const fn = functionRegistry[name];<br><br>    if (!fn) {<br>      return JSON.stringify({ error: `Unknown function: ${name}` });<br>    }<br><br>    
const result = fn(args);<br>    return typeof result === &quot;string&quot; ? result : JSON.stringify(result);<br>  } catch (error) {<br>    return JSON.stringify({<br>      error: `Execution failed: ${error instanceof Error ? error.message : &quot;Unknown error&quot;}`<br>    });<br>  }<br>}</pre><h3><strong>8. Sending tool results back — single-turn tool loop</strong></h3><p>This is where everything comes together. After executing your functions, you need to send the results back to the LLM so it can generate a final response.</p><p><strong>⚠️ Required: </strong>each tool result must be linked to its original request via the tool_call_id.</p><h4>Flow:</h4><p>User message → LLM → tool_calls → Execute → Tool result → LLM → <br>Final answer</p><h3><strong>9. Single-turn tool template — example (mock data).</strong></h3><p>You can run this code below to try the process first with mock data.</p><ul><li><strong>Create </strong><strong>single-turn-tool.ts</strong></li></ul><pre><br>import dotenv from &quot;dotenv&quot;;<br>import path from &quot;path&quot;;<br><br>dotenv.config({ path: path.resolve(__dirname, &quot;../../.env&quot;) });<br><br>import OpenAI from &quot;openai&quot;;<br><br>const client = new OpenAI({<br>  baseURL: &quot;https://openrouter.ai/api/v1&quot;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>// 1. Define one tool<br>const tools: OpenAI.Chat.ChatCompletionTool[] = [<br>  {<br>    type: &quot;function&quot;,<br>    function: {<br>      name: &quot;get_weather&quot;,<br>      description: &quot;Get weather for a city&quot;,<br>      parameters: {<br>        type: &quot;object&quot;,<br>        properties: {<br>          city: { type: &quot;string&quot;, description: &quot;City name&quot; }<br>        },<br>        required: [&quot;city&quot;]<br>      }<br>    }<br>  }<br>];<br><br>// 2. 
Implement the function<br>function getWeather(city: string): string {<br>  // Simulated - replace with real API in production<br>  const data = { city, temp: 72, conditions: &quot;sunny&quot; };<br>  return JSON.stringify(data);<br>}<br><br>// 3. The complete single-turn loop<br>async function chat(userMessage: string): Promise&lt;string&gt; {<br>  // Build initial messages<br>  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [<br>    { role: &quot;system&quot;, content: &quot;You are a helpful assistant.&quot; },<br>    { role: &quot;user&quot;, content: userMessage }<br>  ];<br><br>  // First LLM call<br>  const response = await client.chat.completions.create({<br>    model: &quot;openai/gpt-4o&quot;,<br>    messages,<br>    tools<br>  });<br><br>  const assistantMessage = response.choices[0].message;<br><br>  // Check if tools were called<br>  if (!assistantMessage.tool_calls) {<br>    // No tools - return direct response<br>    return assistantMessage.content || &quot;&quot;;<br>  }<br><br>  // Tools were called - execute them<br>  messages.push(assistantMessage); // Add assistant&#39;s tool request<br><br>  for (const toolCall of assistantMessage.tool_calls) {<br>    // Type guard required for OpenAI SDK v6.x<br>    if (toolCall.type !== &#39;function&#39;) continue;<br><br>    const args = JSON.parse(toolCall.function.arguments);<br>    const result = getWeather(args.city); // Execute<br><br>    messages.push({<br>      role: &quot;tool&quot;,<br>      tool_call_id: toolCall.id,<br>      content: result<br>    });<br>  }<br><br>  // Second LLM call with tool results<br>  const finalResponse = await client.chat.completions.create({<br>    model: &quot;openai/gpt-4o&quot;,<br>    messages,<br>    tools<br>  });<br><br>  return finalResponse.choices[0].message.content || &quot;&quot;;<br>}<br><br>// Test it<br>async function main() {<br>  console.log(await chat(&quot;What&#39;s the weather in 
Paris?&quot;));<br>}<br>main().catch(console.error);</pre><p><strong>Reminders:</strong></p><ol><li><strong>Type guard for SDK v6.x</strong> — Skip any tool call where toolCall.type !== &#39;function&#39; before accessing toolCall.function</li><li><strong>tool_call_id is required</strong> — Must match the id from the tool_call</li><li><strong>Arguments are a JSON string</strong> — Always JSON.parse(toolCall.function.arguments)</li><li><strong>Tool results are strings</strong> — Always stringify your function output</li><li><strong>Message order matters</strong> — assistant (with tool_calls) -&gt; tool responses -&gt; next call</li></ol><h3><strong>10. Single-turn template — actual API call example.</strong></h3><p>In this section, we expand that example to call a real API.</p><p>We are using a couple of free API endpoints, both from Open-Meteo. You should not have to sign up; it just works.</p><ul><li>api.open-meteo.com — for weather info.</li><li>geocoding-api.open-meteo.com — geocoding city to coordinates.</li></ul><p>Summary of steps in the code:</p><ol><li><strong>Setup</strong>: Import dotenv and the OpenAI SDK, then create the client</li><li><strong>Define tools array</strong>: Each tool has type: &quot;function&quot;, name, description, parameters (JSON Schema)</li><li><strong>Implement the tool function</strong>: An async function that calls the real API and returns a JSON string</li><li><strong>Handle errors inside the tool</strong>: getWeather() catches failures and returns them as a JSON error string. With only one tool, the code calls it directly instead of going through the function registry from section 7</li><li><strong>Initialize messages</strong>: System prompt + user message in ChatCompletionMessageParam[]</li><li><strong>First LLM call</strong>: client.chat.completions.create() with messages + tools</li><li><strong>Check for tool_calls</strong>: If none, return message.content directly</li><li><strong>Execute tools</strong>: Push assistant message, loop through tool_calls with type guard, await 
each, push tool results</li><li><strong>Second LLM call</strong>: Same endpoint with updated messages, return final message.content</li></ol><pre>import dotenv from &quot;dotenv&quot;;<br>import path from &quot;path&quot;;<br><br>dotenv.config({ path: path.resolve(__dirname, &quot;../../.env&quot;) });<br><br>import OpenAI from &quot;openai&quot;;<br><br>const client = new OpenAI({<br>  baseURL: &quot;https://openrouter.ai/api/v1&quot;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>const tools: OpenAI.Chat.ChatCompletionTool[] = [<br>  {<br>    type: &quot;function&quot;,<br>    function: {<br>      name: &quot;get_weather&quot;,<br>      description: &quot;Get current weather for a city&quot;,<br>      parameters: {<br>        type: &quot;object&quot;,<br>        properties: {<br>          city: { type: &quot;string&quot;, description: &quot;City name, e.g. &#39;Paris&#39; or &#39;New York&#39;&quot; }<br>        },<br>        required: [&quot;city&quot;]<br>      }<br>    }<br>  }<br>];<br><br>// REAL API implementation (Open-Meteo - free, no API key required)<br>async function getWeather(city: string): Promise&lt;string&gt; {<br>  try {<br>    // Step 1: Geocode city name to coordinates<br>    const geoUrl = `https://geocoding-api.open-meteo.com/v1/search?name=${encodeURIComponent(city)}&amp;count=1`;<br>    const geoResponse = await fetch(geoUrl);<br>    const geoData = await geoResponse.json();<br><br>    if (!geoData.results || geoData.results.length === 0) {<br>      return JSON.stringify({ error: `City not found: ${city}` });<br>    }<br><br>    const { latitude, longitude, name, country } = geoData.results[0];<br><br>    // Step 2: Get current weather (wind_speed_unit=mph so the value matches the wind_speed_mph field)<br>    const weatherUrl = `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&amp;longitude=${longitude}&amp;current=temperature_2m,relative_humidity_2m,weather_code,wind_speed_10m&amp;temperature_unit=fahrenheit&amp;wind_speed_unit=mph`;<br>    const weatherResponse = await fetch(weatherUrl);<br>    const 
weatherData = await weatherResponse.json();<br><br>    const current = weatherData.current;<br><br>    // Weather code mapping (simplified)<br>    const weatherCodes: Record&lt;number, string&gt; = {<br>      0: &quot;Clear sky&quot;,<br>      1: &quot;Mainly clear&quot;, 2: &quot;Partly cloudy&quot;, 3: &quot;Overcast&quot;,<br>      45: &quot;Foggy&quot;, 48: &quot;Depositing rime fog&quot;,<br>      51: &quot;Light drizzle&quot;, 53: &quot;Moderate drizzle&quot;, 55: &quot;Dense drizzle&quot;,<br>      61: &quot;Slight rain&quot;, 63: &quot;Moderate rain&quot;, 65: &quot;Heavy rain&quot;,<br>      71: &quot;Slight snow&quot;, 73: &quot;Moderate snow&quot;, 75: &quot;Heavy snow&quot;,<br>      95: &quot;Thunderstorm&quot;<br>    };<br><br>    return JSON.stringify({<br>      city: name,<br>      country,<br>      temperature: current.temperature_2m,<br>      unit: &quot;fahrenheit&quot;,<br>      humidity: current.relative_humidity_2m,<br>      conditions: weatherCodes[current.weather_code] || &quot;Unknown&quot;,<br>      wind_speed_mph: current.wind_speed_10m<br>    });<br>  } catch (error) {<br>    return JSON.stringify({<br>      error: `Failed to fetch weather: ${error instanceof Error ? 
error.message : &quot;Unknown error&quot;}`<br>    });<br>  }<br>}<br><br>// The tool loop - now with async tool execution<br>async function chat(userMessage: string): Promise&lt;string&gt; {<br>  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [<br>    { role: &quot;system&quot;, content: &quot;You are a helpful weather assistant.&quot; },<br>    { role: &quot;user&quot;, content: userMessage }<br>  ];<br><br>  const response = await client.chat.completions.create({<br>    model: &quot;openai/gpt-4o&quot;,<br>    messages,<br>    tools<br>  });<br><br>  const assistantMessage = response.choices[0].message;<br><br>  if (!assistantMessage.tool_calls) {<br>    return assistantMessage.content || &quot;&quot;;<br>  }<br><br>  messages.push(assistantMessage);<br><br>  for (const toolCall of assistantMessage.tool_calls) {<br>    if (toolCall.type !== &#39;function&#39;) continue;<br><br>    const args = JSON.parse(toolCall.function.arguments);<br>    const result = await getWeather(args.city);  // await for async tool!<br>    console.log(`[Tool] get_weather(&quot;${args.city}&quot;) =&gt;`, result);<br><br>    messages.push({<br>      role: &quot;tool&quot;,<br>      tool_call_id: toolCall.id,<br>      content: result<br>    });<br>  }<br><br>  const finalResponse = await client.chat.completions.create({<br>    model: &quot;openai/gpt-4o&quot;,<br>    messages,<br>    tools<br>  });<br><br>  return finalResponse.choices[0].message.content || &quot;&quot;;<br>}<br><br>// Test with real data<br>async function main() {<br>  console.log(await chat(&quot;What&#39;s the weather in Paris?&quot;));<br>}<br>main().catch(console.error);</pre><ol><li><strong>Loads environment variables </strong>from a .env file located two directories up.</li><li><strong>Creates an OpenAI client</strong> pointed at <strong>OpenRouter</strong> (not official OpenAI endpoint).</li><li><strong>Defines one tool → get_weather function</strong> with city parameter.</li><li><strong>Implements 
the real get_weather function</strong> using the free Open-Meteo API.</li><li><strong>First fetches latitude/longitude</strong> of the city using the geocoding API.</li><li><strong>Then requests current weather</strong> (temp °F, humidity, wind, weather code).</li><li><strong>Converts the WMO weather code</strong> to a human-readable condition string.</li><li><strong>Returns weather info as a JSON string</strong> (or an error message).</li><li><strong>In chat(): </strong>sends user message → model may call tool → executes real getWeather → feeds result back → gets final answer.</li></ol><h3>🚀 Looking forward!</h3><p>That was awesome! We covered a lot with this one, and it’s a great base case for single-turn AI function calling.</p><p>In future articles, we’ll explore more advanced enhancements:</p><ul><li><strong>Parallel tool calls</strong> — Handle multiple tool_calls in a single LLM response</li><li><strong>Multi-turn agent loop</strong> — Loop until tool_calls is empty (not fixed 2-call pattern)</li><li><strong>Agentic loop pattern</strong> — while(true) → call LLM → execute tools → repeat</li><li><strong>Safety guardrails</strong> — maxIterations, maxToolCalls, and token limits</li><li><strong>Parallel async execution</strong> — Use Promise.all() to execute tools concurrently</li><li><strong>Tool result batching</strong> — Collect ALL tool results before next API call</li><li><strong>Error handling</strong> — Return errors as JSON, let LLM decide recovery</li></ul><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 
🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong>Previous articles in the series:</strong></p><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-with-openai-ai-agent-coding-series-1-6ac06b8080b4">AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)</a></p><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-advanced-version-ai-agent-coding-series-2-b2abfb050528">AI Chat Coding Essentials — Advanced version (AI Agent Coding Series #2)</a></p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback, and beta testers who share it may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprises and startups.</strong> If you have an idea, an opportunity to discuss, or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in beta testing)</strong>, my site/app for cloud engineers (cloud architects, devs and DevOps). It combines <strong>years of research and writing I have done</strong> on cloud best practices with my prior cloud books, plus <strong>code solutions and tutorials integrated using multiple AIs</strong> and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p><strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. 
I also previously held the AWS Solutions Architect certification.</p><hr><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-adding-tools-ai-agent-coding-series-3-41c3a1ba21b9">AI Chat Coding Essentials — Adding Tools (AI Agent Coding Series #3)</a> was originally published in <a href="https://medium.com/ai-dev-tips">AI Dev Tips</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI Chat Coding Essentials — Advanced version(AI Agent Coding Series #2)]]></title>
            <link>https://medium.com/ai-dev-tips/ai-chat-coding-essentials-advanced-version-ai-agent-coding-series-2-b2abfb050528?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/b2abfb050528</guid>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Thu, 08 Jan 2026 14:37:12 GMT</pubDate>
            <atom:updated>2026-01-08T19:57:36.032Z</atom:updated>
            <content:encoded><![CDATA[<h3>AI Chat Coding Essentials — Advanced version (AI Agent Coding Series #2)</h3><h4>An advanced TypeScript option for AI chat completions: a class-based approach with AsyncGenerator</h4><p><strong>I like to share what I learn as I build, and I build every day.</strong></p><p>I write many of these articles because I have a related project or interest, and<strong> I am simply documenting it so I can share and remember it!</strong></p><p>While writing the <a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-with-openai-ai-agent-coding-series-1-6ac06b8080b4">last article (#1 in the series)</a>, <strong>I left out some more advanced TypeScript code patterns.</strong> I didn’t want to confuse those just getting up to speed, and we wanted to see<strong> immediate results.</strong></p><p>That <a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-with-openai-ai-agent-coding-series-1-6ac06b8080b4">article</a> uses a <strong>function-based</strong> style only, which has <strong>the advantage of simplicity, speed </strong>and<strong> less boilerplate for a solution.</strong></p><p>Now, let’s look at <strong>a more advanced style of coding often used at larger, enterprise-level companies, where there may be more complexity and collaboration involved. </strong>Also, you may get asked about this in an interview.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*ryJAwCYGAuZ_-8dJlx7ulA.jpeg" /></figure><p>We can switch between simpler functions and this class-based style.</p><p>In this article, our code is <strong>class-based </strong>and uses <strong>dependency injection</strong> to make it easy to reuse the same logic with various LLM models, even simultaneously for a variety of users. 
The class is designed to be flexible enough to be used with singleton or factory patterns when needed.</p><p><strong>I think it’s a fun and </strong>legitimate<strong> code example </strong>for learning more <strong>advanced real-world techniques, </strong>especially if you are not familiar with this style of coding.</p><p><strong>This code is OPTIONAL. </strong>You can still stick with the<strong> function-based approach</strong> of the last article if you want. But it shines in <strong>a professional enterprise environment, where modularity, extensibility, collaboration </strong>and<strong> testing matter.</strong></p><p>This kind of advanced refactoring is what really puts the <strong>“hands-on”</strong> into our self-description as “hands-on” engineers.</p><p>We’re not afraid to dig in deep, get into the code and<strong> understand </strong>how to take<strong> basic code to expert status.</strong></p><p>This article is an <strong>advanced topic walkthrough </strong>that explains what this code means and its <strong>key benefits. </strong>Is it difficult? Yes, it will be a challenge for some people. At the very least, we’ll both learn more about other options.</p><p><strong>Another thing I want to emphasize here is: you do NOT need to memorize all this. 
</strong>Quite honestly, a big part of the reason I’m even doing this article is so <em>I</em> can remember how and why we would use this modular and more advanced version 😅: the class-based approach with AsyncGenerator.<strong> Even if you are using an AI coding assistant, </strong>you will now know the <strong>advantages</strong>, and can then write a prompt for this type of code or refactor existing code.</p><h4>What we are covering:</h4><ol><li><strong>Why a Class-Based Approach?</strong></li><li><strong>The ChatClient Architecture (Basics)</strong></li><li><strong>Why Use AsyncGenerator for Streaming?</strong></li><li><strong>Dependency Injection — needed?</strong></li><li><strong>Chat History Management</strong></li><li><strong>Extended Enterprise Version — FINAL File.</strong></li><li><strong>Example Usage of the Class — RUN IT!</strong></li></ol><p>✅ <strong>Goal: I am starting with AI Chat completion APIs so readers have a good base of knowledge for more agent coding.</strong> There are also some good no/low-code solutions for agents (we may cover some later), but the idea is to set a solid foundation of understanding for future custom coding. 🚀</p><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><h3>1. 
Why a Class-Based Approach?</h3><p>The simple function-based examples in the previous (#1) article work great for learning and simple applications, but real enterprise applications need:</p><ul><li><strong>State Management</strong>: Track conversation history across multiple calls.</li><li><strong>Configuration</strong>: Different settings for different use cases (coding assistant vs creative writer).</li><li><strong>Encapsulation</strong>: Private fields prevent external mutation; methods like getHistory() return safe copies.</li><li><strong>Reusability</strong>: Create multiple instances with different configurations.</li><li><strong>Testability</strong>: Easier to mock and test class instances with different injected dependencies (for example, configs).</li><li><strong>Extensibility</strong>: Add features like retry logic, logging, caching without changing the core API.</li><li><strong>Token estimates</strong>: Keeps token-counting/trimming logic next to the history it manages.</li><li><strong>Multi-conversation support</strong>: Spawn multiple instances with different models/prompts simultaneously.</li></ul><h3>2. 
The ChatClient Architecture (Basics)</h3><p>Below is the initial code we’ll use for our class-based AI chat architecture.</p><p><strong>This is an “initial” sketch, so there may be some minor type errors </strong>if you run this first code block now<strong>.</strong></p><p><strong>We’ll create the full, completed code near </strong>the end of the article.</p><p>If you are not familiar with<strong> class-based TypeScript</strong>, consider a brief review of <a href="https://www.w3schools.com/typescript/typescript_classes.php">W3Schools TS Classes</a>.</p><pre>import OpenAI from &#39;openai&#39;;<br>import dotenv from &#39;dotenv&#39;;<br><br>dotenv.config({ path: &#39;../../.env&#39; });<br><br>// Configuration interface with optional properties<br>// This allows partial configuration - only override what you need<br>interface ChatConfig {<br>  model?: string;<br>  temperature?: number;<br>  maxTokens?: number;<br>  systemPrompt?: string;<br>}<br><br>// Default configuration - sensible defaults for most use cases<br>const defaultConfig = {<br>  model: &#39;openai/gpt-4-turbo&#39;,<br>  temperature: 0.7,<br>  maxTokens: 2000,<br>  systemPrompt: &#39;You are a helpful assistant.&#39;,<br>};<br><br>// Why a class? Discussed throughout the article; go with it for now<br>class ChatClient {<br>  // Private properties - encapsulation prevents external mutation<br>  private client: OpenAI;<br>  private config: Required&lt;ChatConfig&gt;;  // Required&lt;T&gt; makes all properties non-optional<br>  private conversationHistory: OpenAI.Chat.ChatCompletionMessageParam[];<br><br>  constructor(config: ChatConfig = {}) {<br>    // Initialize the OpenAI client once, reuse for all requests<br>    this.client = new OpenAI({<br>      baseURL: &#39;https://openrouter.ai/api/v1&#39;,<br>      apiKey: process.env.OPENROUTER_API_KEY,<br>    });<br><br>    // Merge user config with defaults using spread operator<br>    // User values override defaults<br>    this.config = { 
...defaultConfig, ...config };<br><br>    // Initialize history with system prompt<br>    this.conversationHistory = [<br>      { role: &#39;system&#39;, content: this.config.systemPrompt }<br>    ];<br>  }<br><br>  getHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {<br>    return [...this.conversationHistory];  // Return a copy, not the original<br>  }<br><br>  // Reset conversation while keeping configuration<br>  reset(): void {<br>    this.conversationHistory = [<br>      { role: &#39;system&#39;, content: this.config.systemPrompt }<br>    ];<br>  }<br><br>  // Trim history to manage token usage (advanced)<br>  trimHistory(keepLastN: number): void {<br>    const systemMessage = this.conversationHistory[0];<br>    // Skip index 0 so the system message is never duplicated<br>    const rest = this.conversationHistory.slice(1);<br>    const recentMessages = keepLastN &gt; 0 ? rest.slice(-keepLastN) : [];<br>    this.conversationHistory = [systemMessage, ...recentMessages];<br>  }<br><br>  // Non-streaming chat - simple request/response<br>  async chat(userMessage: string): Promise&lt;string&gt; {<br>    // Add user message to history BEFORE the API call<br>    this.conversationHistory.push({ role: &#39;user&#39;, content: userMessage });<br><br>    const response = await this.client.chat.completions.create({<br>      model: this.config.model,<br>      messages: this.conversationHistory,<br>      temperature: this.config.temperature,<br>      max_tokens: this.config.maxTokens,<br>    });<br><br>    const assistantMessage = response.choices[0].message.content || &#39;&#39;;<br><br>    // Add assistant response to history AFTER receiving it<br>    this.conversationHistory.push({ role: &#39;assistant&#39;, content: assistantMessage });<br><br>    return assistantMessage;<br>  }<br><br>  // Streaming chat - returns an AsyncGenerator<br>  async *streamChat(userMessage: string): AsyncGenerator&lt;string, string, unknown&gt; {<br>    this.conversationHistory.push({ role: &#39;user&#39;, content: userMessage });<br><br>    const stream = await this.client.chat.completions.create({<br>    
  model: this.config.model,<br>      messages: this.conversationHistory,<br>      temperature: this.config.temperature,<br>      max_tokens: this.config.maxTokens,<br>      stream: true,<br>    });<br><br>    let fullResponse = &#39;&#39;;<br><br>    for await (const chunk of stream) {<br>      const content = chunk.choices[0]?.delta?.content || &#39;&#39;;<br>      fullResponse += content;<br>      yield content;  // Yield each chunk as it arrives<br>    }<br><br>    this.conversationHistory.push({ role: &#39;assistant&#39;, content: fullResponse });<br>    return fullResponse;  // Return the complete response at the end<br>  }<br>}</pre><h4><strong>Key points of this code:</strong></h4><ul><li><strong>streamChat </strong>uses an AsyncGenerator. A generator is a function that can pause execution and resume later.</li><li><strong>Private properties: </strong>Prevent external code from corrupting internal state.</li><li><strong>Required&lt;ChatConfig&gt;</strong>: All config values exist after merging with defaults.</li><li><strong>History managed internally: </strong>Simplifies the API, since callers don’t need to track messages.</li><li><strong>Separate chat and streamChat: </strong>Clear distinction between request/response and streaming patterns.</li></ul><h3>3. Why Use AsyncGenerator for Streaming?</h3><p>The AsyncGenerator is different from what you may be used to. 
Strictly speaking, it’s not required for an AI chat, but it does present some benefits in an enterprise setting.</p><ul><li><strong>Memory efficient: </strong>The consumer can process chunks as they arrive instead of waiting for the entire response.</li><li><strong>Real-time: </strong>UI can show text as it’s generated.</li><li><strong>Backpressure control: </strong>Consumer controls the pace.</li><li><strong>Composable: </strong>Can transform, filter, or combine streams easily.</li></ul><p><strong>How it works:</strong></p><pre>async *streamChat(userMessage: string): AsyncGenerator&lt;string, string, unknown&gt; {<br>  // AsyncGenerator&lt;YieldType, ReturnType, NextType&gt;<br>  //   YieldType: string - each chunk we yield<br>  //   ReturnType: string - the final return value<br>  //   NextType: unknown - can be used to pass values to the generator<br><br>// ....continue with any other setup code here</pre><p>Let’s look at the parts of this:</p><pre>async *streamChat(userMessage: string): AsyncGenerator&lt;string, string, unknown&gt;</pre><ul><li><strong>async: </strong>the method can use await, and each step returns a promise.</li><li><strong>*streamChat:</strong> the method name; the * marks it as a generator, which means it can yield multiple values.</li><li><strong>AsyncGenerator&lt;string, string, unknown&gt;: </strong>the return type, parameterized by YieldType, ReturnType and NextType.</li></ul><p><strong>The actual streaming part of the code is:</strong></p><pre>for await (const chunk of stream) {<br>    const content = chunk.choices[0]?.delta?.content || &#39;&#39;;<br>    fullResponse += content;<br><br>    yield content;  // &lt;-- Pause here, send chunk to consumer<br>                    //     Resume when consumer asks for next chunk<br>  }<br>  return fullResponse;  // &lt;-- Final return (the value of the last .next() result, where done is true)<br>}</pre><p>The above is the producer side: the generator yields each chunk as it arrives. The most common way to consume it (Method 1) is a for await...of loop over assistant.streamChat(...), which receives each yielded chunk but discards the final return value.</p><p>A different way of consuming it is to call .next() manually, which also gives access to the return value:</p><pre>// Method 2: 
Manual iteration (more control)<br>const generator = assistant.streamChat(&#39;Tell me a story&#39;);<br>let result = await generator.next();<br>while (!result.done) {<br>  console.log(&#39;Chunk:&#39;, result.value);<br>  result = await generator.next();<br>}<br>console.log(&#39;Final:&#39;, result.value);  // The return value</pre><h3>4. Dependency Injection — needed?</h3><p>A<strong> top advantage of dependency injection is testability</strong>, especially in an enterprise setting. We can easily inject various configs into the code, and it stays neat and modular.</p><p><strong>Again, as with this whole article, you can still do AI chat without dependency injection;</strong> it will just be less modular and less enterprise-ready. The advanced version may help in a job interview, or when taking the lead on a project.</p><p>Here the caller passes config into the constructor, where it is merged over defaultConfig, which is defined outside the class. This makes the class more modular and testable.</p><pre>// Configuration injected, testable, customizable<br><br>const defaultConfig = {<br>  model: &#39;openai/gpt-4-turbo&#39;,<br>  temperature: 0.7,<br>  maxTokens: 2000,<br>  systemPrompt: &#39;You are a helpful assistant.&#39;,<br>};<br><br>class ChatClient {<br>  constructor(config: ChatConfig = {}) {<br>    this.config = { ...defaultConfig, ...config };<br>    // ...<br>  }<br>}</pre><h3>5. Chat History Management</h3><p>The ChatClient maintains conversation history automatically. 
This enables multi-turn conversations where the AI remembers context.</p><p>If you look at the code, you will see that both the non-streaming and streaming methods add to the conversationHistory array:</p><pre>this.conversationHistory.push({ role: &#39;user&#39;, content: userMessage });</pre><p>Real-world scenario:</p><pre>const assistant = new ChatClient();<br><br>await assistant.chat(&#39;My name is Chris&#39;);<br>// History: [system, user: &quot;My name is Chris&quot;, assistant: &quot;Nice to meet you, Chris!&quot;]<br><br>await assistant.chat(&#39;What is my name?&#39;);<br>// History: [system, user: &quot;My name is Chris&quot;, assistant: &quot;Nice to meet you...&quot;,<br>//           user: &quot;What is my name?&quot;, assistant: &quot;Your name is Chris.&quot;]</pre><p>This method returns the history:</p><pre>getHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {<br>    return [...this.conversationHistory];  // Return a copy, not the original<br>  }</pre><p>I also added a trimHistory method to the script:</p><pre>  // Trim history to manage token usage (advanced)<br>  trimHistory(keepLastN: number): void {<br>    const systemMessage = this.conversationHistory[0];<br>    // Skip index 0 so the system message is never duplicated<br>    const rest = this.conversationHistory.slice(1);<br>    const recentMessages = keepLastN &gt; 0 ? rest.slice(-keepLastN) : [];<br>    this.conversationHistory = [systemMessage, ...recentMessages];<br>  }</pre><h4>What are some other related methods we could add?</h4><p>You can add a lot of other methods. 
I do not write these from memory. I have some boilerplate code I developed, and you can also ask a coding assistant tool which other methods could be appropriate.</p><p><em>Examples:</em></p><pre>  // Get history without the system prompt (most common use case for display)<br>  getUserAssistantHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {<br>    return this.conversationHistory.slice(1); // skip system message<br>  }<br><br>  // Get only the last N messages (useful for UI &quot;show more&quot; or token limiting)<br>  getRecentHistory(count: number): OpenAI.Chat.ChatCompletionMessageParam[] {<br>    const start = Math.max(1, this.conversationHistory.length - count);<br>    return this.conversationHistory.slice(start);<br>  }<br><br>  // Add external / previous messages (very useful for resuming conversations)<br>  loadHistory(messages: OpenAI.Chat.ChatCompletionMessageParam[]): void {<br>    // Optional: validate roles?<br>    this.conversationHistory = [<br>      { role: &#39;system&#39;, content: this.config.systemPrompt },<br>      ...messages<br>    ];<br>  }</pre><h3>6. 
Extended Enterprise Version — Final File</h3><p>We can extend this further. Here are some additions you could use:</p><ul><li><strong>Retry with backoff: </strong>Networks fail; graceful recovery keeps your app running</li><li><strong>Token tracking: </strong>Budget management, usage monitoring, cost alerts</li><li><strong>History trimming: </strong>Prevents context overflow, manages memory</li><li><strong>Error callbacks: </strong>Integrate with logging/monitoring systems (Sentry, Datadog)</li><li><strong>Usage callbacks: </strong>Real-time cost tracking, quota management</li><li><strong>Fork method: </strong>A/B testing responses, exploring different conversation paths</li><li><strong>Structured results: </strong>Access to metadata enables better debugging and monitoring</li></ul><p>For example, let’s say we add some extra options to the config.</p><h4>Final file: enterpriseChatClient.ts</h4><p>Create enterpriseChatClient.ts:</p><pre>// OpenAI SDK works with OpenRouter by changing the baseURL<br>import OpenAI from &#39;openai&#39;;<br>import dotenv from &#39;dotenv&#39;;<br><br>// Load environment variables from project root<br>dotenv.config({ path: &#39;../../.env&#39; });<br><br>// Pricing per 1000 tokens - varies by model, update based on your provider&#39;s rates<br>interface ModelPricing {<br>  promptCostPer1k: number;      // Cost per 1K input tokens<br>  completionCostPer1k: number;  // Cost per 1K output tokens<br>}<br><br>// Extended configuration with more options<br>interface ChatConfig {<br>  model?: string;<br>  temperature?: number;<br>  maxTokens?: number;<br>  systemPrompt?: string;<br>  // Enterprise additions<br>  maxRetries?: number;<br>  retryDelayMs?: number;<br>  maxHistoryLength?: number;<br>  pricing?: ModelPricing;  // Custom pricing for cost estimation<br>  onError?: ((error: Error) =&gt; void) | undefined;<br>  onTokenUsage?: ((usage: TokenUsage) =&gt; void) | undefined;<br>}<br><br>// Token usage tracking - essential for 
monitoring costs and staying within budget<br>interface TokenUsage {<br>  promptTokens: number;<br>  completionTokens: number;<br>  totalTokens: number;<br>  estimatedCost: number;  // Calculated based on model pricing<br>}<br><br>// Structured response object - provides metadata alongside the AI response<br>// This is more informative than just returning a string<br>interface ChatResult {<br>  content: string;                  // The actual AI response text<br>  usage: TokenUsage | undefined;    // Token counts and cost (if available)<br>  model: string;                    // Which model processed the request<br>  finishReason: string;             // Why generation stopped: &#39;stop&#39;, &#39;length&#39;, etc.<br>}<br><br>// ResolvedConfig has all properties required (non-optional) after merging with defaults.<br>// This ensures type safety when accessing config values without null checks throughout the class.<br>interface ResolvedConfig {<br>  model: string;<br>  temperature: number;<br>  maxTokens: number;<br>  systemPrompt: string;<br>  maxRetries: number;<br>  retryDelayMs: number;<br>  maxHistoryLength: number;<br>  pricing: ModelPricing;<br>  onError: ((error: Error) =&gt; void) | undefined;<br>  onTokenUsage: ((usage: TokenUsage) =&gt; void) | undefined;<br>}<br><br>// Default pricing for GPT-4 Turbo - override in config for other models<br>const DEFAULT_PRICING: ModelPricing = {<br>  promptCostPer1k: 0.01,        // $0.01 per 1K input tokens<br>  completionCostPer1k: 0.03,    // $0.03 per 1K output tokens<br>};<br><br>// Decent defaults - users only need to override what they want to change<br>const defaultConfig: ResolvedConfig = {<br>  model: &#39;openai/gpt-4-turbo&#39;,<br>  temperature: 0.7,             // Balanced between creative and focused<br>  maxTokens: 2000,              // Reasonable limit for most responses<br>  systemPrompt: &#39;You are a helpful assistant.&#39;,<br>  maxRetries: 3,                // Handles transient failures 
gracefully<br>  retryDelayMs: 1000,           // Base delay for exponential backoff<br>  maxHistoryLength: 50,         // Prevents context overflow in long conversations<br>  pricing: DEFAULT_PRICING,     // Override for different models/providers<br>  onError: undefined,<br>  onTokenUsage: undefined,<br>};<br><br>export class EnterpriseChatClient {<br>  // Private properties ensure internal state can&#39;t be corrupted externally<br>  private client: OpenAI;                                                    // Reused for all API calls<br>  private config: ResolvedConfig;                                            // Merged user + default config<br>  private conversationHistory: OpenAI.Chat.ChatCompletionMessageParam[];     // Full conversation context<br>  private totalTokensUsed: number = 0;                                       // Cumulative token counter<br><br>  // Constructor uses dependency injection pattern - config is passed in, not hardcoded<br>  constructor(config: ChatConfig = {}) {<br>    this.client = new OpenAI({<br>      baseURL: &#39;https://openrouter.ai/api/v1&#39;,  // OpenRouter endpoint<br>      apiKey: process.env.OPENROUTER_API_KEY,<br>    });<br><br>    // Spread operator merges configs: defaults first, then user overrides<br>    this.config = { ...defaultConfig, ...config };<br><br>    // Initialize with system prompt - this sets the AI&#39;s behavior/personality<br>    this.conversationHistory = [<br>      { role: &#39;system&#39;, content: this.config.systemPrompt }<br>    ];<br>  }<br><br>  // Retry wrapper with exponential backoff<br>  private async withRetry&lt;T&gt;(<br>    operation: () =&gt; Promise&lt;T&gt;,<br>    context: string<br>  ): Promise&lt;T&gt; {<br>    let lastError: Error | null = null;<br><br>    for (let attempt = 1; attempt &lt;= this.config.maxRetries; attempt++) {<br>      try {<br>        return await operation();<br>      } catch (error) {<br>        lastError = error as Error;<br><br>        // Don&#39;t 
retry on auth errors<br>        if (error instanceof OpenAI.APIError &amp;&amp; error.status === 401) {<br>          throw new Error(&#39;Invalid API key. Check your configuration.&#39;);<br>        }<br><br>        // Don&#39;t retry on the last attempt<br>        if (attempt === this.config.maxRetries) {<br>          break;<br>        }<br><br>        // Exponential backoff: 1s, 2s, 4s...<br>        const delay = this.config.retryDelayMs * Math.pow(2, attempt - 1);<br>        console.warn(`${context}: Attempt ${attempt} failed, retrying in ${delay}ms...`);<br>        await this.delay(delay);<br>      }<br>    }<br><br>    // Call error handler if provided<br>    if (this.config.onError &amp;&amp; lastError) {<br>      this.config.onError(lastError);<br>    }<br><br>    throw lastError;<br>  }<br><br>  // Promise-based delay utility - cleaner than nested setTimeout callbacks<br>  private delay(ms: number): Promise&lt;void&gt; {<br>    return new Promise(resolve =&gt; setTimeout(resolve, ms));<br>  }<br><br>  // Calculate token usage and cost based on configured pricing<br>  private calculateUsage(usage: OpenAI.Completions.CompletionUsage | undefined): TokenUsage | undefined {<br>    if (!usage) return undefined;<br><br>    // Use pricing from config - allows different rates for different models<br>    const { promptCostPer1k, completionCostPer1k } = this.config.pricing;<br><br>    const tokenUsage: TokenUsage = {<br>      promptTokens: usage.prompt_tokens,<br>      completionTokens: usage.completion_tokens,<br>      totalTokens: usage.total_tokens,<br>      estimatedCost:<br>        (usage.prompt_tokens / 1000) * promptCostPer1k +<br>        (usage.completion_tokens / 1000) * completionCostPer1k,<br>    };<br><br>    this.totalTokensUsed += tokenUsage.totalTokens;<br><br>    // Call usage callback if provided<br>    if (this.config.onTokenUsage) {<br>      this.config.onTokenUsage(tokenUsage);<br>    }<br><br>    return tokenUsage;<br>  }<br><br>  // Manage 
history length to prevent token overflow<br>  private trimHistoryIfNeeded(): void {<br>    if (this.conversationHistory.length &gt; this.config.maxHistoryLength) {<br>      const systemMessage = this.conversationHistory[0];<br>      if (!systemMessage) return;<br>      // Keep the system message and the most recent messages<br>      const keepCount = this.config.maxHistoryLength - 1;<br>      const recentMessages = this.conversationHistory.slice(-keepCount);<br>      this.conversationHistory = [systemMessage, ...recentMessages];<br>    }<br>  }<br><br>  // Enhanced chat with full result object<br>  async chat(userMessage: string): Promise&lt;ChatResult&gt; {<br>    this.conversationHistory.push({ role: &#39;user&#39;, content: userMessage });<br>    this.trimHistoryIfNeeded();<br><br>    const response = await this.withRetry(<br>      () =&gt; this.client.chat.completions.create({<br>        model: this.config.model,<br>        messages: this.conversationHistory,<br>        temperature: this.config.temperature,<br>        max_tokens: this.config.maxTokens,<br>      }),<br>      &#39;chat&#39;<br>    );<br><br>    const choice = response.choices[0];<br>    if (!choice) {<br>      throw new Error(&#39;No response choice returned from API&#39;);<br>    }<br><br>    const assistantMessage = choice.message.content || &#39;&#39;;<br>    this.conversationHistory.push({ role: &#39;assistant&#39;, content: assistantMessage });<br><br>    return {<br>      content: assistantMessage,<br>      usage: this.calculateUsage(response.usage),<br>      model: response.model,<br>      finishReason: choice.finish_reason || &#39;unknown&#39;,<br>    };<br>  }<br><br>  // Simple chat that just returns the string (convenience method)<br>  async quickChat(userMessage: string): Promise&lt;string&gt; {<br>    const result = await this.chat(userMessage);<br>    return result.content;<br>  }<br><br>  // Streaming chat using AsyncGenerator - yields chunks as they arrive<br>  // The async* 
syntax creates a generator that can await and yield asynchronously<br>  // Consumer uses: for await (const chunk of streamChat(...)) { ... }<br>  async *streamChat(<br>    userMessage: string,<br>    onProgress?: (chunk: string, accumulated: string) =&gt; void<br>  ): AsyncGenerator&lt;string, string, unknown&gt; {<br>    this.conversationHistory.push({ role: &#39;user&#39;, content: userMessage });<br>    this.trimHistoryIfNeeded();<br><br>    const stream = await this.withRetry(<br>      () =&gt; this.client.chat.completions.create({<br>        model: this.config.model,<br>        messages: this.conversationHistory,<br>        temperature: this.config.temperature,<br>        max_tokens: this.config.maxTokens,<br>        stream: true,<br>      }),<br>      &#39;streamChat&#39;<br>    );<br><br>    let fullResponse = &#39;&#39;;<br><br>    for await (const chunk of stream) {<br>      const content = chunk.choices[0]?.delta?.content || &#39;&#39;;<br>      fullResponse += content;<br><br>      // Call progress callback if provided<br>      if (onProgress) {<br>        onProgress(content, fullResponse);<br>      }<br><br>      yield content;<br>    }<br><br>    this.conversationHistory.push({ role: &#39;assistant&#39;, content: fullResponse });<br>    return fullResponse;<br>  }<br><br>  // ==================== Utility Methods ====================<br>  // These provide controlled access to internal state<br><br>  // Clear conversation but keep the same configuration and system prompt<br>  reset(): void {<br>    this.conversationHistory = [<br>      { role: &#39;system&#39;, content: this.config.systemPrompt }<br>    ];<br>  }<br><br>  // Return a copy of history (not the original) to prevent external mutation<br>  getHistory(): OpenAI.Chat.ChatCompletionMessageParam[] {<br>    return [...this.conversationHistory];<br>  }<br><br>  getHistoryLength(): number {<br>    return this.conversationHistory.length;<br>  }<br><br>  // Useful for monitoring costs across multiple 
conversations<br>  getTotalTokensUsed(): number {<br>    return this.totalTokensUsed;<br>  }<br><br>  // Update system prompt mid-conversation<br>  updateSystemPrompt(newPrompt: string): void {<br>    this.config.systemPrompt = newPrompt;<br>    if (this.conversationHistory.length &gt; 0) {<br>      this.conversationHistory[0] = { role: &#39;system&#39;, content: newPrompt };<br>    }<br>  }<br><br>  // Fork creates a branch of the conversation - useful for A/B testing responses<br>  // or exploring different conversation paths without affecting the original<br>  fork(newConfig?: ChatConfig): EnterpriseChatClient {<br>    const forked = new EnterpriseChatClient({ ...this.config, ...newConfig });<br>    forked.conversationHistory = [...this.conversationHistory];  // Copy, not reference<br>    return forked;<br>  }<br>}<br><br>// ==================== Example Usage ====================<br>// Demonstrates both non-streaming and streaming patterns with callbacks<br><br>async function main() {<br>  // Create client with custom config - only override what you need<br>  const assistant = new EnterpriseChatClient({<br>    systemPrompt: &#39;You are an expert programmer.&#39;,<br>    temperature: 0.2,       // Low for consistent, precise code responses<br>    maxRetries: 3,<br>    // Custom pricing - update based on your model/provider<br>    // See: https://openrouter.ai/docs#models for current rates<br>    pricing: {<br>      promptCostPer1k: 0.01,      // GPT-4 Turbo input<br>      completionCostPer1k: 0.03,  // GPT-4 Turbo output<br>    },<br>    // Callbacks integrate with your logging/monitoring systems<br>    onTokenUsage: (usage) =&gt; {<br>      console.log(`Tokens: ${usage.totalTokens}, Cost: $${usage.estimatedCost.toFixed(4)}`);<br>    },<br>    onError: (error) =&gt; {<br>      console.error(&#39;Chat error:&#39;, error.message);<br>    },<br>  });<br><br>  // Non-streaming with full result<br>  const result = await assistant.chat(&#39;What is a closure in 
JavaScript?&#39;);<br>  console.log(&#39;Response:&#39;, result.content);<br>  console.log(&#39;Finish reason:&#39;, result.finishReason);<br><br>  // Streaming: chunks are written as they arrive (no progress callback is passed here)<br>  console.log(&#39;\nStreaming response:&#39;);<br>  for await (const chunk of assistant.streamChat(&#39;Give me an example&#39;)) {<br>    process.stdout.write(chunk);<br>  }<br><br>  console.log(&#39;\n\nTotal tokens used:&#39;, assistant.getTotalTokensUsed());<br>  console.log(&#39;Conversation length:&#39;, assistant.getHistoryLength(), &#39;messages&#39;);<br>}<br><br>// Run the demo only when this file is executed directly, not when the class is imported.<br>// (Assumes CommonJS, the ts-node default; under ESM use a different entry-point check.)<br>if (require.main === module) {<br>  main().catch(console.error);<br>}</pre><p>As I was writing this article, I iterated on the code several times to improve it.</p><p>For example, I originally hardcoded pricing approximations in the class, but I moved them into the config so you can easily inject your own.</p><p>The idea with this class-based approach is to treat the client as a template: you inject the values that vary (for testing, or when you switch to a new model) instead of editing the class.</p><p><strong>Other ideas:</strong></p><p>Add a <strong>getApproximateTokenCount</strong> method, so we can better track or estimate the approximate 
token count:</p><pre>getApproximateTokenCount(): number {<br>    let total = 0;<br>    for (const msg of this.conversationHistory) {<br>      // content can be a string, an array of parts, or null; only string content is counted here<br>      const text = typeof msg.content === &#39;string&#39; ? msg.content : &#39;&#39;;<br>      // Very rough approximation: ~4 chars per token + some overhead<br>      total += Math.ceil(text.length / 4) + 4; // +4 for role + overhead<br>    }<br>    return total;<br>  }</pre><p>We will be using <strong>tools</strong> later, but just to give you an early heads up, we could add a method that appends a tool response to the history:</p><pre>addToolResponse(toolCallId: string, content: string): void {<br>    this.conversationHistory.push({<br>      role: &#39;tool&#39;,<br>      tool_call_id: toolCallId,<br>      content<br>    });<br>  }</pre><p><strong>Undo the last turn</strong></p><pre>undoLastTurn(): boolean {<br>    if (this.conversationHistory.length &lt; 3) return false; // need at least system + user + assistant<br><br>    // Remove the last assistant message, if there is one<br>    if (this.conversationHistory.at(-1)?.role === &#39;assistant&#39;) {<br>      this.conversationHistory.pop();<br>    }<br>    // Remove the user message that started the turn<br>    if (this.conversationHistory.at(-1)?.role === &#39;user&#39;) {<br>      this.conversationHistory.pop();<br>      return true;<br>    }<br>    return false;<br>  }</pre><p>Okay, that’s good for now. I’ve given you a solid baseline to illustrate the concept, and you can come up with a lot more examples. Feel free to explore further with an AI coding assistant; it can help you learn this too.</p><p>Now let’s put it all together with a script to run the class.</p><h3>7. 
Example Usage of the Class Code — Run It!</h3><p>Create the file: runEnterpriseChatClient.ts</p><p>Here we will import the class and use it in our script:</p><pre>import { EnterpriseChatClient } from &quot;./enterpriseChatClient&quot;;<br><br>async function main() {<br>  const assistant = new EnterpriseChatClient({<br>    systemPrompt: &#39;You are an expert programmer.&#39;,<br>    temperature: 0.2,<br>    maxRetries: 3,<br>    // Custom pricing - update based on your model/provider<br>    // See: https://openrouter.ai/docs#models for current rates<br>    pricing: {<br>      promptCostPer1k: 0.01,      // GPT-4 Turbo input cost per 1K tokens<br>      completionCostPer1k: 0.03,  // GPT-4 Turbo output cost per 1K tokens<br>    },<br>    onTokenUsage: (usage) =&gt; {<br>      console.log(`Tokens: ${usage.totalTokens}, Cost: $${usage.estimatedCost.toFixed(4)}`);<br>    },<br>    onError: (error) =&gt; {<br>      console.error(&#39;Chat error:&#39;, error.message);<br>      // Could send to error tracking service here<br>    },<br>  });<br><br>  // Non-streaming with full result<br>  const result = await assistant.chat(&#39;What is a closure in JavaScript?&#39;);<br>  console.log(&#39;Response:&#39;, result.content);<br>  console.log(&#39;Finish reason:&#39;, result.finishReason);<br><br>  // Streaming with progress callback<br>  console.log(&#39;\nStreaming response:&#39;);<br>  for await (const chunk of assistant.streamChat(<br>    &#39;Give me an example&#39;,<br>    (chunk, accumulated) =&gt; {<br>      // Could update a progress indicator here<br>    }<br>  )) {<br>    process.stdout.write(chunk);<br>  }<br><br>  console.log(&#39;\n\nTotal tokens used:&#39;, assistant.getTotalTokensUsed());<br>  console.log(&#39;Conversation length:&#39;, assistant.getHistoryLength(), &#39;messages&#39;);<br><br>  // Fork for a different conversation branch<br>  const forked = assistant.fork({ temperature: 0.9 });<br>  await forked.chat(&#39;Now explain it more 
creatively&#39;);<br>}<br><br>main().catch(console.error);</pre><ul><li>Creates an <strong>EnterpriseChatClient</strong> instance configured as an expert programmer with a low temperature (0.2), custom GPT-4 Turbo pricing, token usage logging, and error handling</li><li>Performs a <strong>non-streaming</strong> chat completion asking “What is a closure in JavaScript?” and logs the full response and finish reason</li><li>Runs a <strong>streaming</strong> chat completion for “Give me an example”, printing each chunk as it arrives</li><li><strong>After both calls, shows the total tokens tracked so far</strong> and the current conversation history length. Note that only the non-streaming call is counted, because streamChat does not record usage</li><li>Demonstrates <strong>conversation forking</strong>: creates a new independent branch with higher creativity (temperature 0.9) and continues the conversation in that forked context</li></ul><p><strong>As mentioned above, I revised the code several times while writing this article. </strong>When I added <strong>pricing to the class</strong>, I also had to add it to the <strong>config</strong> in this script, because we are using dependency injection: the caller supplies the values.</p><p><strong>Run the whole thing with this:</strong></p><pre>npx ts-node ./runEnterpriseChatClient.ts </pre><p><strong>Keep this code.</strong></p><p><strong>In future articles</strong> we will probably alternate between functional approaches that illustrate basic principles and more advanced interfaces built on this class-based logic!</p><p>I believe the next article will be about calling tools.</p><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 
🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. <em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). 
It consists of <strong>years of research and writing I have done</strong> on cloud best practices and then further integrates that with my prior cloud books and also <strong>code solutions and tutorials integrated using multiple AIs</strong> and other cloud tools. <strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p><strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. 
I also previously held the AWS Solutions Architect certification.</p><hr><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-advanced-version-ai-agent-coding-series-2-b2abfb050528">AI Chat Coding Essentials — Advanced version(AI Agent Coding Series #2)</a> was originally published in <a href="https://medium.com/ai-dev-tips">AI Dev Tips</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon EKS (K8s) Media Cluster: Part 5 — Node Autoscaling with Karpenter + Spot instances]]></title>
            <link>https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-5-node-autoscaling-with-karpenter-spot-instances-de2f7c3334ad?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/de2f7c3334ad</guid>
            <category><![CDATA[aws-eks]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Mon, 05 Jan 2026 01:37:43 GMT</pubDate>
            <atom:updated>2026-01-05T13:41:44.009Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Amazon EKS (K8s) Media Cluster: Part 5 — Node Autoscaling with Karpenter + Spot instances</strong></h3><h4>Karpenter node autoscaling and bin packing, Helm charts, HPA, automatic Spot instances, k6 load testing.</h4><p>✅ <strong>“I need my nodes to scale automatically with pod demand and be able to handle extra capacity with Spot instances, with no Pending pods!”</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*fiLvBQfH4id32zDbRuFGeg.jpeg" /></figure><p><strong>In the last article we succeeded in autoscaling pods</strong> to handle traffic spikes, but then we encountered a new problem: <strong>pods often get stuck or delayed in Pending if the nodes are maxed out at capacity. </strong>If node capacity lags behind, the pod autoscaling we set up is delayed too.</p><p>Let’s solve this by using <strong>node optimization/autoscaling</strong> with the tool <strong>Karpenter</strong>. Later we can also <strong>better optimize our workload</strong> to use <strong>Spot instances</strong>, which can save a lot of money.</p><p>What we are doing in this article:</p><p>1. <strong>Learn what we can do with Karpenter.</strong><br> 2. <strong>Rebuild our infra</strong>, if you destroyed it previously.<br> 3. <strong>Prepare Karpenter (IAM).</strong><br> 4. <strong>Install Karpenter with Terraform and Helm</strong> (the K8s package manager).<br> 5. Create <strong>NodePool</strong> (rules) and <strong>EC2NodeClass</strong> (AMI, subnets, SGs).<br> 6. <strong>Test rapid node provisioning</strong> (30–60 second spin-up).<br> 7. <strong>Combined load test: HPA + Karpenter working together.</strong><br> 8. Notes: <strong>Configure Spot</strong> instances for cost savings. <strong>Observe</strong> automatic node consolidation.<br> 9. <strong>Clean up and destroy resources.</strong></p><h4><strong>Goals to achieve:</strong></h4><ul><li><strong>Scale deployment up to 20 pods. 
</strong>Karpenter provisions new nodes in real time.</li><li><strong>Nodes appear in 30–60 seconds </strong>(vs. 3–5 minutes with Cluster Autoscaler).</li><li><strong>Watch Karpenter select the most cost-effective</strong> instance types.</li><li><strong>Node consolidation </strong>(Karpenter removes underutilized nodes).</li><li><strong>Run workloads on Spot instances</strong> with automatic On-Demand fallback.</li><li><strong>Never manually manage node groups</strong> again.</li></ul><h4><strong>Skills we’ll learn in this article:</strong></h4><ul><li><strong>Karpenter architecture and components.</strong></li><li><strong>How Helm works</strong> with K8s for configuration.</li><li><strong>NodePool</strong> and <strong>EC2NodeClass</strong> resource definitions.</li><li><strong>Spot instance </strong>management and interruption handling.</li><li><strong>Bin packing</strong> and resource optimization.</li><li><strong>Node consolidation strategies.</strong></li><li><strong>Mixed capacity types (Spot + On-Demand).</strong></li><li><strong>Helm</strong> chart installation via <strong>Terraform.</strong></li></ul><p>This should take a little over an hour, including about 15 minutes each for building and destroying resources.</p><p><strong>Files we will create/modify in this article:</strong></p><p>environments/dev/karpenter-iam.tf … IAM roles, policies, SQS queue<br>environments/dev/karpenter.tf … Karpenter Helm installation<br>environments/dev/providers.tf … Add Helm provider<br>environments/dev/main.tf … Add Karpenter discovery tags<br>k8s/karpenter-nodepool.yaml … NodePool &amp; EC2NodeClass</p><p>🛠️ Get more articles like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 
🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><h3>Prerequisites for Amazon EKS (K8s) — Part 5</h3><p>Previous articles in this series you should do first:</p><p><strong>✅ PART 1 </strong><a href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-1-initial-setup-roadmap-176bdb085d32"><strong>Amazon EKS (K8s) Part 1 — Initial Setup/Roadmap</strong></a></p><p><strong>✅ PART 2 </strong><a href="https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-2-deploy-initial-terraform-multi-az-eks-cluster-e1a87efc9925"><strong>Amazon EKS (K8s) Part 2 — Deploy Initial Terraform Multi-AZ EKS Cluster</strong></a></p><p><strong>✅ PART 3 </strong><a href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-3-self-healing-video-pods-e4459ad9ecc0?postPublishedType=repub"><strong>Amazon EKS (K8s) Part 3 — Self-Healing Video Pods</strong></a></p><p><strong>✅ PART 4 </strong><a href="https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-4-pod-auto-scaling-hpa-and-cdn-f1d9a060e20a"><strong>Amazon EKS (K8s) Part 4 — Pod Auto-Scaling (HPA) and CDN</strong></a></p><p>I recommend doing all these because we are building on existing prior lessons.</p><h3>1. What can we do with Karpenter</h3><p>Karpenter specializes in creating “just-in-time” nodes.</p><blockquote>“Karpenter simplifies Kubernetes infrastructure with the right nodes at the right time. Karpenter automatically launches just the right compute resources to handle your cluster’s applications. 
It is designed to let you take full advantage of the cloud with fast and simple compute provisioning for Kubernetes clusters.” — official website.</blockquote><figure><a href="https://karpenter.sh/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4uxVutq-9o_ZqHIBlgTBXA.png" /></a><figcaption><a href="https://karpenter.sh/">https://karpenter.sh/</a></figcaption></figure><ul><li><strong>Just-in-time provisioning with right-sizing. </strong>Karpenter directly observes unschedulable pods, evaluates their resource requests, and provisions optimally sized nodes on demand.</li><li><strong>Rapid scaling and deprovisioning.</strong> It launches nodes quickly to minimize pod scheduling latency and terminates them when no longer needed.</li><li><strong>Workload consolidation for optimization.</strong> Karpenter consolidates underutilized workloads by bin-packing pods onto fewer, more efficient nodes or replacing expensive/over-provisioned nodes with better-fitting ones.</li><li><strong>Flexible instance selection and cost savings.</strong> It supports a wide range of instance types and purchase options (Spot, On-Demand, and Reserved), with options like weights, requirements, and limits in NodePools.</li><li><strong>Native Kubernetes integration and simplicity.</strong> Managed via declarative CRDs like NodePools and NodeClasses, it uses Kubernetes-native APIs for constraints.</li></ul><p>As you can see, this is a powerful tool we can use to automate the optimization of our workloads.</p><h4>Strengths of Karpenter:</h4><ul><li><strong>Speed:</strong> 30–60 seconds in most cases vs. 
minutes in other solutions.</li><li><strong>Optimized:</strong> Picks node sizes that fit each workload’s resource requests.</li><li><strong>Cost:</strong> Easily add Spot instances for cost savings.</li><li><strong>Efficient: </strong>Uses bin packing to fit more pods onto fewer nodes.</li></ul><h4>What about alternatives to Karpenter?</h4><p>There are a couple of alternatives we can discuss upfront:</p><p><strong>AWS ECS: </strong>Many teams start with AWS ECS for container orchestration, and it can also scale containers. ECS does offer a binpack task placement strategy, <strong>but on EC2 its capacity comes from Auto Scaling groups (ASGs) with instance types you define up front</strong>, so it has less flexibility than Karpenter. <strong>ECS also does not repack running tasks onto fewer instances</strong> the way Karpenter consolidates nodes. Scaling out is typically slower as well. Overall, while ECS works for simpler, smaller projects that need less granular optimization, it does not have the options and granularity of Karpenter.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pKrhHsGd2nxXhPeDjYxB0Q.png" /><figcaption>source: <a href="https://docs.aws.amazon.com/whitepapers/latest/overview-deployment-options/amazon-elastic-container-service.html">https://docs.aws.amazon.com/whitepapers/latest/overview-deployment-options/amazon-elastic-container-service.html</a></figcaption></figure><p><strong>Cluster Autoscaler: </strong>Cluster Autoscaler is the traditional Kubernetes component that automatically adjusts the number of nodes in your cluster. 
It monitors for unschedulable pods, then scales up by adding nodes to existing Auto Scaling Groups (ASGs) (or Managed Node Groups in EKS), and scales down by removing underutilized nodes.</p><p>The problem is that it works at the level of predefined node groups with fixed instance types, <strong>which often leads to slower scaling</strong> and less flexible instance selection.</p><p><a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler">autoscaler/cluster-autoscaler at master · kubernetes/autoscaler</a></p><h3>2. 
Rebuild Cluster (If Destroyed)</h3><p>At the end of the last article we destroyed our cluster with Terraform to <strong>avoid charges.</strong> <strong>When you rebuild, remember to destroy all resources again and check the AWS console, or you will incur fees.</strong></p><p><strong>If you destroyed your cluster, follow these instructions to rebuild it.</strong></p><ul><li>For the AWS console, log in with the root email you set up earlier for your EKS admin. <strong>I recommend doing this in an incognito window </strong>to prevent any conflicts with other sessions.</li></ul><p>The rebuild steps are similar to Part 4, with one adjustment: we add a count value to the CloudFront distribution, which otherwise gives an error on the initial apply.</p><p>We have to modify 2 files: cloudfront.tf and variables.tf</p><p>In cloudfront.tf, under the # CloudFront Distribution comment, update the start of the resource block to this:</p><pre>resource &quot;aws_cloudfront_distribution&quot; &quot;video_app&quot; {<br>  count = var.enable_cloudfront ? 1 : 0</pre><p>In the locals block (higher in the same file), wrap the hostname lookup in try():</p><pre>locals {<br>  # Get LoadBalancer hostname (empty string if service not deployed)<br>  lb_hostname = try(<br>    data.kubernetes_service.video_app.status[0].load_balancer[0].ingress[0].hostname,<br>    &quot;&quot;<br>  )</pre><p>Then update the output (lower in the same file):</p><pre>output &quot;cloudfront_url&quot; {<br>  description = &quot;Full CloudFront URL&quot;<br>  value       = var.enable_cloudfront ? 
&quot;https://${aws_cloudfront_distribution.video_app[0].domain_name}&quot; : &quot;Set enable_cloudfront=true after deploying video-app service&quot;<br>}</pre><p>Then at the end of variables.tf add this:</p><pre><br># -----------------------------------------------------------------------------<br># CloudFront Configuration<br># -----------------------------------------------------------------------------<br><br>variable &quot;enable_cloudfront&quot; {<br>  description = &quot;Enable CloudFront distribution (requires video-app K8s service to be deployed first)&quot;<br>  type        = bool<br>  default     = false<br>}</pre><p><strong>Now continue logging in with your AWS CLI</strong></p><pre># Verify AWS CLI profile<br>export AWS_PROFILE=terraform-eks-admin<br>aws sts get-caller-identity</pre><p><strong>And Terraform</strong></p><pre># From project root<br>  cd environments/dev<br>  terraform plan<br>  terraform apply<br><br>  aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster<br>  kubectl get nodes<br>  # this should show 3 nodes<br>  <br>  # Run TWICE: kubectl apply -f ../../k8s/<br>  kubectl apply -f ../../k8s/<br>  kubectl apply -f ../../k8s/<br><br>  # confirm<br>  kubectl get pods -n video-app<br>  kubectl get svc -n video-app<br><br>  # Then to enable CloudFront (after service has LoadBalancer):<br><br>  cd environments/dev # should be here already, but if not<br><br>  terraform apply -var=&quot;enable_cloudfront=true&quot;<br><br>  # Apply Metrics server (needed later for HPA)<br>  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml</pre><p><strong>After you run these commands you should be up and running.</strong></p><p>✅ If you stop the article at any time, and destroy resources, then you must <strong>do the above again to get back to the correct state.</strong></p><p>⚠️ <strong>REMINDER: </strong>Just remember that you are being charged by Amazon AWS for the resources. 
You must (1) run terraform destroy AND (2) check the AWS console to <strong>be sure all resources are removed so you won’t get charged.</strong> (Recall from the previous article that an error sometimes prevents resources from being deleted, so you could still be charged even after running the command.) ⚠️ <strong>Run terraform destroy TWICE, double-check the console, and manually remove anything left over.</strong></p><ul><li>🚨<strong>Warning: </strong>I once noticed that my ELB was not destroyed even though I ran terraform destroy, which in turn kept the Internet Gateway from being destroyed, <strong>so double-check.</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/869/1*ZIrHKPIL-pSbAZiheYU7AA.png" /></figure><h3>3. Prepare for Karpenter</h3><p>In this section we create the Terraform configuration that Karpenter needs:</p><ol><li><strong>IAM Role</strong>: Lets Karpenter launch/terminate EC2 instances</li><li><strong>Instance Profile</strong>: Attached to the nodes Karpenter creates</li><li><strong>Subnet Tags</strong>: Tell Karpenter where to launch nodes</li><li><strong>Security Group Tags</strong>: Tell Karpenter which security groups to attach to nodes</li><li><strong>OIDC Provider</strong>: For IAM authentication (we already have this from Part 2)</li></ol><h4>Create Karpenter IAM Resources</h4><p>Create file: environments/dev/karpenter-iam.tf</p><pre># =============================================================================<br># KARPENTER IAM CONFIGURATION<br># =============================================================================<br># IAM roles and policies required for Karpenter to provision EC2 instances.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Data Sources<br># -----------------------------------------------------------------------------<br>data &quot;aws_iam_policy_document&quot; &quot;karpenter_controller_assume_role&quot; {<br>  statement {<br>    actions = 
[&quot;sts:AssumeRoleWithWebIdentity&quot;]<br>    effect  = &quot;Allow&quot;<br><br>    condition {<br>      test     = &quot;StringEquals&quot;<br>      variable = &quot;${replace(module.eks.cluster_oidc_issuer_url, &quot;https://&quot;, &quot;&quot;)}:sub&quot;<br>      values   = [&quot;system:serviceaccount:karpenter:karpenter&quot;]<br>    }<br><br>    condition {<br>      test     = &quot;StringEquals&quot;<br>      variable = &quot;${replace(module.eks.cluster_oidc_issuer_url, &quot;https://&quot;, &quot;&quot;)}:aud&quot;<br>      values   = [&quot;sts.amazonaws.com&quot;]<br>    }<br><br>    principals {<br>      type        = &quot;Federated&quot;<br>      identifiers = [module.eks.oidc_provider_arn]<br>    }<br>  }<br>}<br><br># -----------------------------------------------------------------------------<br># Karpenter Controller IAM Role<br># -----------------------------------------------------------------------------<br>resource &quot;aws_iam_role&quot; &quot;karpenter_controller&quot; {<br>  name               = &quot;${var.cluster_name}-karpenter-controller&quot;<br>  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role.json<br><br>  tags = local.tags<br>}<br><br># -----------------------------------------------------------------------------<br># Karpenter Controller Policy<br># -----------------------------------------------------------------------------<br>resource &quot;aws_iam_policy&quot; &quot;karpenter_controller&quot; {<br>  name        = &quot;${var.cluster_name}-karpenter-controller&quot;<br>  description = &quot;Policy for Karpenter controller&quot;<br><br>  policy = jsonencode({<br>    Version = &quot;2012-10-17&quot;<br>    Statement = [<br>      {<br>        Sid    = &quot;Karpenter&quot;<br>        Effect = &quot;Allow&quot;<br>        Action = [<br>          &quot;ec2:CreateFleet&quot;,<br>          &quot;ec2:CreateLaunchTemplate&quot;,<br>          &quot;ec2:CreateTags&quot;,<br>          
&quot;ec2:DeleteLaunchTemplate&quot;,<br>          &quot;ec2:DescribeAvailabilityZones&quot;,<br>          &quot;ec2:DescribeImages&quot;,<br>          &quot;ec2:DescribeInstances&quot;,<br>          &quot;ec2:DescribeInstanceTypeOfferings&quot;,<br>          &quot;ec2:DescribeInstanceTypes&quot;,<br>          &quot;ec2:DescribeLaunchTemplates&quot;,<br>          &quot;ec2:DescribeSecurityGroups&quot;,<br>          &quot;ec2:DescribeSpotPriceHistory&quot;,<br>          &quot;ec2:DescribeSubnets&quot;,<br>          &quot;ec2:RunInstances&quot;,<br>          &quot;ec2:TerminateInstances&quot;,<br>          &quot;iam:AddRoleToInstanceProfile&quot;,<br>          &quot;iam:CreateInstanceProfile&quot;,<br>          &quot;iam:DeleteInstanceProfile&quot;,<br>          &quot;iam:GetInstanceProfile&quot;,<br>          &quot;iam:PassRole&quot;,<br>          &quot;iam:RemoveRoleFromInstanceProfile&quot;,<br>          &quot;iam:TagInstanceProfile&quot;,<br>          &quot;pricing:GetProducts&quot;,<br>          &quot;ssm:GetParameter&quot;<br>        ]<br>        Resource = &quot;*&quot;<br>      },<br>      {<br>        Sid    = &quot;ConditionalEC2Termination&quot;<br>        Effect = &quot;Allow&quot;<br>        Action = &quot;ec2:TerminateInstances&quot;<br>        Resource = &quot;*&quot;<br>        Condition = {<br>          StringLike = {<br>            &quot;ec2:ResourceTag/karpenter.sh/nodepool&quot; = &quot;*&quot;<br>          }<br>        }<br>      },<br>      {<br>        Sid    = &quot;PassNodeIAMRole&quot;<br>        Effect = &quot;Allow&quot;<br>        Action = &quot;iam:PassRole&quot;<br>        Resource = aws_iam_role.karpenter_node.arn<br>      },<br>      {<br>        Sid    = &quot;EKSClusterEndpointLookup&quot;<br>        Effect = &quot;Allow&quot;<br>        Action = &quot;eks:DescribeCluster&quot;<br>        Resource = module.eks.cluster_arn<br>      },<br>      {<br>        Sid    = &quot;SQSInterruptionQueue&quot;<br>        Effect = 
&quot;Allow&quot;<br>        Action = [<br>          &quot;sqs:DeleteMessage&quot;,<br>          &quot;sqs:GetQueueAttributes&quot;,<br>          &quot;sqs:GetQueueUrl&quot;,<br>          &quot;sqs:ReceiveMessage&quot;<br>        ]<br>        Resource = aws_sqs_queue.karpenter_interruption.arn<br>      }<br>    ]<br>  })<br><br>  tags = local.tags<br>}<br><br>resource &quot;aws_iam_role_policy_attachment&quot; &quot;karpenter_controller&quot; {<br>  role       = aws_iam_role.karpenter_controller.name<br>  policy_arn = aws_iam_policy.karpenter_controller.arn<br>}<br><br># -----------------------------------------------------------------------------<br># Karpenter Node IAM Role (for nodes Karpenter creates)<br># -----------------------------------------------------------------------------<br>resource &quot;aws_iam_role&quot; &quot;karpenter_node&quot; {<br>  name = &quot;${var.cluster_name}-karpenter-node&quot;<br><br>  assume_role_policy = jsonencode({<br>    Version = &quot;2012-10-17&quot;<br>    Statement = [<br>      {<br>        Effect = &quot;Allow&quot;<br>        Principal = {<br>          Service = &quot;ec2.amazonaws.com&quot;<br>        }<br>        Action = &quot;sts:AssumeRole&quot;<br>      }<br>    ]<br>  })<br><br>  tags = local.tags<br>}<br><br># Attach required policies for EKS worker nodes<br>resource &quot;aws_iam_role_policy_attachment&quot; &quot;karpenter_node_worker&quot; {<br>  role       = aws_iam_role.karpenter_node.name<br>  policy_arn = &quot;arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy&quot;<br>}<br><br>resource &quot;aws_iam_role_policy_attachment&quot; &quot;karpenter_node_cni&quot; {<br>  role       = aws_iam_role.karpenter_node.name<br>  policy_arn = &quot;arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy&quot;<br>}<br><br>resource &quot;aws_iam_role_policy_attachment&quot; &quot;karpenter_node_ecr&quot; {<br>  role       = aws_iam_role.karpenter_node.name<br>  policy_arn = 
&quot;arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly&quot;<br>}<br><br>resource &quot;aws_iam_role_policy_attachment&quot; &quot;karpenter_node_ssm&quot; {<br>  role       = aws_iam_role.karpenter_node.name<br>  policy_arn = &quot;arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore&quot;<br>}<br><br># Instance profile for Karpenter nodes<br>resource &quot;aws_iam_instance_profile&quot; &quot;karpenter_node&quot; {<br>  name = &quot;${var.cluster_name}-karpenter-node&quot;<br>  role = aws_iam_role.karpenter_node.name<br><br>  tags = local.tags<br>}<br><br># -----------------------------------------------------------------------------<br># SQS Queue for Spot Interruption Handling<br># -----------------------------------------------------------------------------<br>resource &quot;aws_sqs_queue&quot; &quot;karpenter_interruption&quot; {<br>  name                      = &quot;${var.cluster_name}-karpenter-interruption&quot;<br>  message_retention_seconds = 300<br>  sqs_managed_sse_enabled   = true<br><br>  tags = local.tags<br>}<br><br>resource &quot;aws_sqs_queue_policy&quot; &quot;karpenter_interruption&quot; {<br>  queue_url = aws_sqs_queue.karpenter_interruption.id<br><br>  policy = jsonencode({<br>    Version = &quot;2012-10-17&quot;<br>    Statement = [<br>      {<br>        Effect = &quot;Allow&quot;<br>        Principal = {<br>          Service = [<br>            &quot;events.amazonaws.com&quot;,<br>            &quot;sqs.amazonaws.com&quot;<br>          ]<br>        }<br>        Action   = &quot;sqs:SendMessage&quot;<br>        Resource = aws_sqs_queue.karpenter_interruption.arn<br>      }<br>    ]<br>  })<br>}<br><br># EventBridge rules for Spot interruption events<br>resource &quot;aws_cloudwatch_event_rule&quot; &quot;karpenter_spot_interruption&quot; {<br>  name        = &quot;${var.cluster_name}-karpenter-spot-interruption&quot;<br>  description = &quot;Spot instance interruption warnings for Karpenter&quot;<br><br>  event_pattern = 
jsonencode({<br>    source      = [&quot;aws.ec2&quot;]<br>    detail-type = [&quot;EC2 Spot Instance Interruption Warning&quot;]<br>  })<br><br>  tags = local.tags<br>}<br><br>resource &quot;aws_cloudwatch_event_target&quot; &quot;karpenter_spot_interruption&quot; {<br>  rule      = aws_cloudwatch_event_rule.karpenter_spot_interruption.name<br>  target_id = &quot;KarpenterInterruptionQueue&quot;<br>  arn       = aws_sqs_queue.karpenter_interruption.arn<br>}<br><br># EventBridge rule for instance rebalance recommendations<br>resource &quot;aws_cloudwatch_event_rule&quot; &quot;karpenter_rebalance&quot; {<br>  name        = &quot;${var.cluster_name}-karpenter-rebalance&quot;<br>  description = &quot;EC2 instance rebalance recommendations for Karpenter&quot;<br><br>  event_pattern = jsonencode({<br>    source      = [&quot;aws.ec2&quot;]<br>    detail-type = [&quot;EC2 Instance Rebalance Recommendation&quot;]<br>  })<br><br>  tags = local.tags<br>}<br><br>resource &quot;aws_cloudwatch_event_target&quot; &quot;karpenter_rebalance&quot; {<br>  rule      = aws_cloudwatch_event_rule.karpenter_rebalance.name<br>  target_id = &quot;KarpenterInterruptionQueue&quot;<br>  arn       = aws_sqs_queue.karpenter_interruption.arn<br>}<br><br># -----------------------------------------------------------------------------<br># Outputs<br># -----------------------------------------------------------------------------<br>output &quot;karpenter_controller_role_arn&quot; {<br>  description = &quot;ARN of the Karpenter controller IAM role&quot;<br>  value       = aws_iam_role.karpenter_controller.arn<br>}<br><br>output &quot;karpenter_node_role_arn&quot; {<br>  description = &quot;ARN of the Karpenter node IAM role&quot;<br>  value       = aws_iam_role.karpenter_node.arn<br>}<br><br>output &quot;karpenter_instance_profile_name&quot; {<br>  description = &quot;Name of the Karpenter node instance profile&quot;<br>  value       = 
aws_iam_instance_profile.karpenter_node.name<br>}<br><br>output &quot;karpenter_interruption_queue_name&quot; {<br>  description = &quot;Name of the SQS queue for Spot interruption handling&quot;<br>  value       = aws_sqs_queue.karpenter_interruption.name<br>}<br><br># -----------------------------------------------------------------------------<br># Authorize Karpenter Node Role in aws-auth ConfigMap<br># -----------------------------------------------------------------------------<br># This allows EC2 instances launched by Karpenter to join the EKS cluster.<br># Without this, nodes will get &quot;Unauthorized&quot; when trying to register.<br># -----------------------------------------------------------------------------<br>resource &quot;kubernetes_config_map_v1_data&quot; &quot;aws_auth_karpenter&quot; {<br>  metadata {<br>    name      = &quot;aws-auth&quot;<br>    namespace = &quot;kube-system&quot;<br>  }<br><br>  data = {<br>    mapRoles = yamlencode([<br>      {<br>        rolearn  = module.eks.eks_managed_node_groups[&quot;primary&quot;].iam_role_arn<br>        username = &quot;system:node:{{EC2PrivateDNSName}}&quot;<br>        groups   = [&quot;system:bootstrappers&quot;, &quot;system:nodes&quot;]<br>      },<br>      {<br>        rolearn  = aws_iam_role.karpenter_node.arn<br>        username = &quot;system:node:{{EC2PrivateDNSName}}&quot;<br>        groups   = [&quot;system:bootstrappers&quot;, &quot;system:nodes&quot;]<br>      }<br>    ])<br>  }<br><br>  force = true<br><br>  depends_on = [module.eks]<br>}</pre><p>⚠️ Make sure you copy/paste the above perfectly. 
I originally left out the entire section named below, which caused an error, and had to add it afterward:</p><pre>Authorize Karpenter Node Role in aws-auth ConfigMap</pre><p>If you get errors, check this IAM config file first, because Karpenter will not work if any permissions are missing!</p><h4>Tag subnets for Karpenter discovery</h4><p>Update environments/dev/main.tf and add the Karpenter discovery tag to the subnets:</p><pre># In the VPC module, update private_subnet_tags:<br>private_subnet_tags = {<br>  &quot;kubernetes.io/cluster/${var.cluster_name}&quot; = &quot;shared&quot;<br>  &quot;kubernetes.io/role/internal-elb&quot;           = &quot;1&quot;<br>  &quot;karpenter.sh/discovery&quot;                    = var.cluster_name  # ADD THIS<br>  &quot;Tier&quot;                                      = &quot;private&quot;<br>}</pre><p>Later in the article I found that we also need the block below. I believe you can <strong>add it now, and I will mention it again when we launch Karpenter. </strong>Add it under the tags, in its own block, in main.tf:</p><pre>  # ---------------------------------------------------------------------------<br>  # Karpenter Discovery Tag for Security Groups<br>  # ---------------------------------------------------------------------------<br>  node_security_group_tags = {<br>    &quot;karpenter.sh/discovery&quot; = var.cluster_name<br>  }</pre><p><strong>Apply the Terraform changes above:</strong></p><pre>cd environments/dev<br>terraform plan<br>terraform apply</pre><h3>4. 
Install Karpenter with Terraform/Helm</h3><p>Next we set up Helm and use it to install Karpenter.</p><p><strong>Helm is the package manager for Kubernetes, </strong>similar to how npm works for Node.js, apt/yum for Linux, or Homebrew for macOS.</p><p>Technically, you can install Karpenter without Helm, but I’m showing you the standard approach, and we can use Helm for other things.</p><p>Helm simplifies how we <strong>define</strong>, <strong>install</strong>, <strong>configure</strong>, <strong>update</strong>, and <strong>manage</strong> even complex applications on Kubernetes clusters.</p><p><strong>The main reasons we use Helm:</strong></p><ul><li>Without Helm there are a lot of YAML files to manage; with Helm we install everything with one command, helm install</li><li>Easily handle updates with helm upgrade</li><li>Easy rollbacks with helm rollback</li></ul><h4><strong>Helm core concepts:</strong></h4><ul><li><strong>Chart: </strong>Package definition (like package.json), templates, default values, and metadata.</li><li><strong>Values: </strong>Configuration overrides (like .env files).</li><li><strong>Release:</strong> Installed instance of a chart.</li><li><strong>Repository: </strong>Where charts are stored (like the npm registry). For example, Karpenter uses: oci://public.ecr.aws/karpenter</li></ul><h4>Normal flow of how to use Helm with Karpenter:</h4><ol><li><strong>Chart contains YAML templates with</strong> {{ .Values.xxx }} placeholders.</li><li><strong>You provide values</strong> (via values.yaml or --set flags). Since we are using Terraform, we set these values in the Terraform code.</li><li><strong>Helm renders the templates,</strong> producing valid Kubernetes YAML.</li><li><strong>Helm applies the YAML to your cluster </strong>(like kubectl apply).</li><li><strong>Helm tracks the “release”</strong> so you can upgrade/rollback later!</li></ol><p><strong>Keep in mind</strong> that since we are using <strong>Terraform</strong> with this, there is some slight variation. 
Without Terraform, you create a YAML values file and run the Helm install command yourself. In this case, Terraform drives Helm through the Helm provider.</p><h4>Configure Helm provider</h4><p>Update environments/dev/providers.tf:</p><pre># Add to required_providers block:<br>helm = {<br>  source  = &quot;hashicorp/helm&quot;<br>  version = &quot;~&gt; 2.12&quot;<br>}<br><br># Add Helm provider configuration:<br>provider &quot;helm&quot; {<br>  kubernetes {<br>    host                   = module.eks.cluster_endpoint<br>    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)<br><br>    exec {<br>      api_version = &quot;client.authentication.k8s.io/v1beta1&quot;<br>      command     = &quot;aws&quot;<br>      args = [<br>        &quot;eks&quot;,<br>        &quot;get-token&quot;,<br>        &quot;--cluster-name&quot;,<br>        var.cluster_name,<br>        &quot;--region&quot;,<br>        var.aws_region<br>      ]<br>    }<br>  }<br>}<br></pre><h4>Create Karpenter Helm Installation</h4><p>This Terraform file uses Helm to install Karpenter.<br>Create file: environments/dev/karpenter.tf</p><pre># =============================================================================<br># KARPENTER INSTALLATION<br># =============================================================================<br># Installs Karpenter using Helm chart via Terraform.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Karpenter Namespace<br># -----------------------------------------------------------------------------<br>resource &quot;kubernetes_namespace&quot; &quot;karpenter&quot; {<br>  metadata {<br>    name = &quot;karpenter&quot;<br><br>    labels = {<br>      name = &quot;karpenter&quot;<br>    }<br>  }<br><br>  depends_on = [module.eks]<br>}<br><br># -----------------------------------------------------------------------------<br># 
Karpenter Helm Release<br># -----------------------------------------------------------------------------<br>resource &quot;helm_release&quot; &quot;karpenter&quot; {<br>  namespace  = kubernetes_namespace.karpenter.metadata[0].name<br>  name       = &quot;karpenter&quot;<br>  repository = &quot;oci://public.ecr.aws/karpenter&quot;<br>  chart      = &quot;karpenter&quot;<br>  version    = &quot;1.6.0&quot;  # Supports K8s 1.34 - can use 1.8.x<br><br>  # Wait for CRDs to be ready<br>  wait    = true<br>  timeout = 600  # 10 minutes (I added this later because of a timeout)<br><br>  # Force upgrade if stuck (I added this later because of timeout)<br>  force_update  = true<br>  recreate_pods = true<br><br>  values = [<br>    yamlencode({<br>      settings = {<br>        clusterName       = module.eks.cluster_name<br>        clusterEndpoint   = module.eks.cluster_endpoint<br>        interruptionQueue = aws_sqs_queue.karpenter_interruption.name<br>      }<br><br>      serviceAccount = {<br>        annotations = {<br>          &quot;eks.amazonaws.com/role-arn&quot; = aws_iam_role.karpenter_controller.arn<br>        }<br>      }<br><br>      controller = {<br>        resources = {<br>          requests = {<br>            cpu    = &quot;200m&quot;<br>            memory = &quot;256Mi&quot;<br>          }<br>          limits = {<br>            cpu    = &quot;1&quot;<br>            memory = &quot;1Gi&quot;<br>          }<br>        }<br>      }<br><br>      # Enable logging<br>      logLevel = &quot;info&quot;<br>    })<br>  ]<br><br>  depends_on = [<br>    kubernetes_namespace.karpenter,<br>    aws_iam_role_policy_attachment.karpenter_controller,<br>    module.eks<br>  ]<br>}<br><br># -----------------------------------------------------------------------------<br># Outputs<br># -----------------------------------------------------------------------------<br>output &quot;karpenter_namespace&quot; {<br>  description = &quot;Karpenter namespace&quot;<br>  value       = 
kubernetes_namespace.karpenter.metadata[0].name<br>}<br><br>output &quot;karpenter_chart_version&quot; {<br>  description = &quot;Karpenter Helm chart version&quot;<br>  value       = helm_release.karpenter.version<br>}</pre><h4>What this means:</h4><ul><li><strong>Creates a dedicated namespace.</strong> A karpenter namespace is created in your EKS cluster to isolate Karpenter’s components from other workloads.</li><li><strong>CRD = Custom Resource Definition. </strong>Once you install a CRD, Kubernetes recognizes a new kind of object. <strong>Karpenter is built entirely around CRDs.</strong></li><li><strong>Installs Karpenter via Helm. </strong>Pulls the official Karpenter chart (v1.6.0, the version set above) from AWS’s public ECR registry (oci://public.ecr.aws/karpenter) and deploys it to your cluster.</li><li><strong>Configures cluster connection. </strong>Passes your EKS cluster name, API endpoint, and SQS interruption queue so Karpenter knows which cluster to manage and can handle Spot instance interruptions.</li><li><strong>Sets up IAM authentication.</strong> Annotates the Karpenter service account with the IAM role ARN, enabling IRSA (IAM Roles for Service Accounts) so Karpenter can provision EC2 instances.</li><li><strong>Defines resource limits. 
</strong>Requests 200m CPU / 256Mi memory for the Karpenter controller pod and caps it at 1 CPU / 1Gi memory (limits) to ensure stable operation.</li></ul><h4>Apply Karpenter installation</h4><pre>cd environments/dev<br>terraform init  # Needed for new Helm provider &lt;------- ATTENTION<br>terraform plan<br>terraform apply<br><br># Verify Karpenter is running<br>kubectl get pods -n karpenter<br>kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=20<br><br># See which pod holds the leader lease<br>kubectl get lease -n karpenter</pre><p>Results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/824/1*4tT-YImVQKlWUCQcBSLVTA.png" /></figure><ol><li><strong>Leader pod: </strong>Actively provisions/terminates nodes.</li><li><strong>Standby pod:</strong> Ready to take over immediately if the leader fails.</li></ol><p><strong>Karpenter keeps working even if one pod crashes or gets evicted during node scaling</strong>. Only one pod is actively making decisions at any time (leader election), but having two means almost no downtime for the autoscaler itself.</p><p>Karpenter runs as a Deployment and uses the standard Kubernetes built-in leader election mechanism based on Leases (from the coordination.k8s.io API group). When the leader fails, the standby pod detects that the lease has expired and acquires it, usually within seconds.</p><p><strong>Logs results:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SNQqVGLfeACMrzckDOLx5A.png" /></figure><p>Note: the logs show a couple of minor errors. <strong>These are normal startup messages,</strong> not real errors, and I believe they resolve automatically. The rest of the install looks normal.</p><h3>5. 
Create NodePool &amp; EC2NodeClass</h3><p>Now we need to define some of the basic EC2 instance settings.</p><h4>Create Karpenter resources</h4><p>Create file: k8s/karpenter-nodepool.yaml</p><pre># =============================================================================<br># KARPENTER NODEPOOL<br># =============================================================================<br># Defines what instances Karpenter can provision and constraints.<br>#<br># ECS Comparison:<br># - NodePool is similar to ASG Launch Template + Scaling Policy<br># - But more flexible: Karpenter chooses optimal instance per workload<br># =============================================================================<br>apiVersion: karpenter.sh/v1<br>kind: NodePool<br>metadata:<br>  name: default<br>spec:<br>  # ---------------------------------------------------------------------------<br>  # Template: What nodes look like<br>  # ---------------------------------------------------------------------------<br>  template:<br>    metadata:<br>      labels:<br>        managed-by: karpenter<br>        environment: dev<br>    spec:<br>      # Reference to EC2NodeClass (AWS-specific settings)<br>      nodeClassRef:<br>        group: karpenter.k8s.aws<br>        kind: EC2NodeClass<br>        name: default<br><br>      # Instance requirements<br>      requirements:<br>        # Instance category (general purpose is cost-effective)<br>        - key: karpenter.k8s.aws/instance-category<br>          operator: In<br>          values: [&quot;t&quot;, &quot;m&quot;, &quot;c&quot;]  # t3, m5, c5 families<br><br>        # Instance size (small to large for flexibility)<br>        - key: karpenter.k8s.aws/instance-size<br>          operator: In<br>          values: [&quot;small&quot;, &quot;medium&quot;, &quot;large&quot;]<br><br>        # Capacity type: Prefer Spot, fallback to On-Demand<br>        - key: karpenter.sh/capacity-type<br>          operator: In<br>          values: [&quot;spot&quot;, 
&quot;on-demand&quot;]<br><br>        # Architecture<br>        - key: kubernetes.io/arch<br>          operator: In<br>          values: [&quot;amd64&quot;]<br><br>        # Operating system<br>        - key: kubernetes.io/os<br>          operator: In<br>          values: [&quot;linux&quot;]<br><br>  # ---------------------------------------------------------------------------<br>  # Disruption: When Karpenter can remove/replace nodes<br>  # ---------------------------------------------------------------------------<br>  disruption:<br>    # Consolidation policy: Remove underutilized nodes<br>    consolidationPolicy: WhenEmptyOrUnderutilized<br>    consolidateAfter: 30s<br><br>    # Budget: How many nodes can be disrupted at once<br>    budgets:<br>      - nodes: &quot;20%&quot;<br><br>  # ---------------------------------------------------------------------------<br>  # Limits: Maximum resources Karpenter can provision<br>  # ---------------------------------------------------------------------------<br>  limits:<br>    cpu: &quot;100&quot;        # Max 100 vCPUs total<br>    memory: &quot;200Gi&quot;   # Max 200 GB memory total<br><br>  # ---------------------------------------------------------------------------<br>  # Weight: Priority when multiple NodePools exist<br>  # ---------------------------------------------------------------------------<br>  weight: 100<br><br>---<br># =============================================================================<br># EC2NODECLASS<br># =============================================================================<br># AWS-specific configuration: AMI, subnets, security groups, etc.<br># =============================================================================<br>apiVersion: karpenter.k8s.aws/v1<br>kind: EC2NodeClass<br>metadata:<br>  name: default<br>spec:<br>  # ---------------------------------------------------------------------------<br>  # AMI Configuration<br>  # 
---------------------------------------------------------------------------<br>  # Use EKS-optimized AMI (Amazon Linux 2023)<br>  amiSelectorTerms:<br>    - alias: al2023@latest<br><br>  # ---------------------------------------------------------------------------<br>  # Network Configuration<br>  # ---------------------------------------------------------------------------<br>  # Discover subnets by tag (set in Terraform)<br>  subnetSelectorTerms:<br>    - tags:<br>        karpenter.sh/discovery: eks-video-cluster<br><br>  # Discover security groups by tag<br>  securityGroupSelectorTerms:<br>    - tags:<br>        karpenter.sh/discovery: eks-video-cluster<br><br>  # ---------------------------------------------------------------------------<br>  # IAM Configuration<br>  # ---------------------------------------------------------------------------<br>  # Instance profile for nodes (created in Terraform)<br>  instanceProfile: eks-video-cluster-karpenter-node<br><br>  # ---------------------------------------------------------------------------<br>  # Storage Configuration<br>  # ---------------------------------------------------------------------------<br>  blockDeviceMappings:<br>    - deviceName: /dev/xvda<br>      ebs:<br>        volumeSize: 30Gi<br>        volumeType: gp3<br>        iops: 3000<br>        throughput: 125<br>        deleteOnTermination: true<br>        encrypted: true<br><br>  # ---------------------------------------------------------------------------<br>  # Metadata Options<br>  # ---------------------------------------------------------------------------<br>  metadataOptions:<br>    httpEndpoint: enabled<br>    httpProtocolIPv6: disabled<br>    httpPutResponseHopLimit: 2<br>    httpTokens: required  # IMDSv2 required (security best practice)<br><br>  # ---------------------------------------------------------------------------<br>  # Tags for nodes Karpenter creates<br>  # 
---------------------------------------------------------------------------<br>  tags:<br>    Project: eks-video-tutorial<br>    Environment: dev<br>    ManagedBy: karpenter</pre><p>Make sure you understand what this code does!</p><h4>What this NodePool configuration does</h4><ol><li><strong>Defines allowed instance types. </strong>Karpenter can provision instances from the t, m, and c categories (for example t3, m5, or c5) in small/medium/large sizes, preferring Spot instances (which can be around 70% cheaper) with On-Demand fallback for reliability.</li><li><strong>Sets resource limits. </strong>Caps total provisioned capacity at 100 vCPUs and 200Gi memory to prevent runaway costs from autoscaling.</li><li><strong>Enables automatic consolidation. </strong>Removes underutilized or empty nodes after 30 seconds, bin-packing workloads onto fewer nodes to save money.</li><li><strong>Configures AWS-specific settings (EC2NodeClass). </strong>Uses the EKS-optimized AL2023 AMI, discovers subnets/security groups by tag, and provisions 30Gi gp3 encrypted volumes.</li><li><strong>Enforces security. 
</strong>Requires IMDSv2 (httpTokens: required) to protect against SSRF attacks on instance metadata.</li></ol><h4>Tag Security Groups for Karpenter</h4><p>Add to environments/dev/main.tf in the EKS module:</p><pre># Add tags to node security group<br>node_security_group_tags = {<br>  &quot;karpenter.sh/discovery&quot; = var.cluster_name<br>}</pre><p>Apply the changes:</p><pre># Apply Terraform changes (security group tags)<br>cd environments/dev<br>terraform apply<br><br># Apply Karpenter NodePool and EC2NodeClass<br>kubectl apply -f ../../k8s/karpenter-nodepool.yaml<br><br># Verify resources created<br>kubectl get nodepool<br>kubectl get ec2nodeclass</pre><p><strong>Results:</strong></p><pre>nodepool.karpenter.sh/default created<br>ec2nodeclass.karpenter.k8s.aws/default created</pre><p>And then…<strong> whoops… I see an issue:</strong></p><pre>$ kubectl get nodepool<br>NAME      NODECLASS   NODES   READY   AGE<br>default   default     0       False   30s<br><br>$ kubectl get ec2nodeclass<br>NAME      READY   AGE<br>default   False   41s</pre><p>I am leaving this in so you can see what happens if there is an error. The READY column says False for both resources. 
I also saw some errors in the logs.</p><p>You can filter the logs for them with:</p><pre>kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50 | grep -i &quot;error\|nodeclass&quot;</pre><p><strong>I changed the Helm chart version in </strong><strong>karpenter.tf:</strong></p><pre>  version    = &quot;1.6.0&quot;  # Supports K8s 1.34</pre><p><strong>Also, in </strong><strong>main.tf I added this at the end of the file:</strong></p><pre>  # ---------------------------------------------------------------------------<br>  # Karpenter Discovery Tag for Security Groups<br>  # ---------------------------------------------------------------------------<br>  node_security_group_tags = {<br>    &quot;karpenter.sh/discovery&quot; = var.cluster_name<br>  }</pre><p>Re-run commands:</p><pre># Apply Terraform changes (adds security group tag + upgrades Karpenter)<br>terraform plan<br>terraform apply<br><br># Re-apply the NodePool (CRDs might have changed with new version)<br>kubectl delete -f ../../k8s/karpenter-nodepool.yaml<br>kubectl apply -f ../../k8s/karpenter-nodepool.yaml<br><br># Check status<br>kubectl get ec2nodeclass<br>kubectl get nodepool</pre><p><strong>Results:</strong></p><pre>$ kubectl get nodepool<br>NAME      NODECLASS   NODES   READY   AGE<br>default   default     0       True    67m<br><br>$ kubectl get ec2nodeclass<br>NAME      READY   AGE<br>default   True    67m</pre><p>I also got some more errors the first time, so I had to re-apply using the steps below. They are good to hold onto if you need to reset the install.</p><h3>⚠️Re-apply Karpenter install if any errors</h3><p>If you 
need to re-apply <strong>at any time</strong>, then you can reset the Karpenter install with:</p><pre>  <br>  # if you have helm installed locally such as with:  brew install helm<br>  helm uninstall karpenter -n karpenter --wait<br><br>  # Delete the Karpenter deployment directly<br>  kubectl delete deployment -n karpenter --all<br><br>  # Delete the namespace (this removes everything in it)<br>  kubectl delete namespace karpenter<br><br>  # Remove the Helm release from Terraform state so it can recreate<br>  cd environments/dev<br>  terraform state rm helm_release.karpenter<br>  terraform state rm kubernetes_namespace.karpenter<br><br>  # Re-apply<br>  terraform apply<br>  <br>  # Re-apply kubectl<br>  kubectl apply -f ../../k8s/<br>  kubectl apply -f ../../k8s/ # second time fixed the namespace<br><br>  kubectl get ec2nodeclass<br>  kubectl get nodepool<br></pre><p><strong>Fixed! Just make sure it says “True” under ready</strong></p><pre>$ kubectl get ec2nodeclass<br><br>NAME      READY   AGE<br>default   True    113s<br><br>$ kubectl get nodepool<br><br>NAME      NODECLASS   NODES   READY   AGE<br>default   default     0       True    2m3s</pre><p>⚠️ <strong>REMINDER: </strong>Just remember that you are being charged by Amazon AWS for the resources. You must (1) use terraform destroy AND (2) check the AWS console to <strong>be sure all resources are removed so you won’t get charged.</strong> (recall from previous article, sometimes an error causes resources to not be deleted, so you could still be charged even after running the command.) ⚠️ <strong>Run terraform destroy TWICE and double-check console and remove anything not removed.</strong></p><h3>6. Test Node Provisioning</h3><p>In this section, we’ll migrate workloads from the managed node group to Karpenter-provisioned nodes. This demonstrates Karpenter’s ability to automatically provision nodes when pods are pending.</p><p>⚠️This section can be a little tricky. 
I had to redo it twice, because I ran into some errors such as with the service role below, but I believe this should work fine. Stick with it. You are learning.</p><p><strong>Karpenter must stay running during migration. </strong>It needs to be active to detect pending pods and <strong>provision</strong> <strong>replacement</strong> nodes.</p><h4>6.1 AWS Spot role</h4><p>Before Karpenter can launch Spot instances, <strong>AWS requires a service-linked role. </strong>This is a <strong>one-time setup</strong> per AWS account:</p><pre># Create the Spot service-linked role<br>aws iam create-service-linked-role --aws-service-name spot.amazonaws.com<br><br># If you see &quot;Role already exists&quot;, that&#39;s fine - you&#39;re good to proceed</pre><p><strong>AWS uses this role to manage Spot instance lifecycle events.</strong></p><p>Without it, Karpenter’s Spot instance requests will fail with: <em>“AuthFailure.ServiceLinkedRoleCreationNotPermitted”.</em></p><h4>6.2 Verify Karpenter is Running</h4><pre># Check Karpenter pods (should show 2 replicas for HA)<br>kubectl get pods -n karpenter<br><br># Expected output:<br>NAME                         READY   STATUS    RESTARTS   AGE<br>karpenter-859bfc7db7-h4wf6   1/1     Running   0          38m<br>karpenter-859bfc7db7-zr6d2   1/1     Running   0          62m<br><br># Verify NodePool and EC2NodeClass are ready<br>kubectl get nodepool<br>kubectl get ec2nodeclass<br><br># Expected output:<br>NAME      NODECLASS   NODES   READY   AGE<br>default   default     0       True    10m<br><br>NAME      READY   AGE<br>default   True    10m</pre><h4>6.3 Remove PodDisruptionBudgets (PDBs)</h4><p>PDBs protect pods from eviction but will block our drain operation. 
Delete them temporarily; they’ll be recreated automatically:</p><pre># Check existing PDBs<br>kubectl get pdb -A<br><br># Delete Karpenter PDB<br>kubectl delete pdb -n karpenter --all<br><br># Delete CoreDNS PDB<br>kubectl delete pdb -n kube-system coredns</pre><p>⚠️ <strong>If it says no resources found, that is fine. We only need to delete a PDB if it exists; if it does not exist, just continue.</strong></p><p><strong>Why we do this:</strong> PDBs enforce minimum availability during voluntary disruptions. During migration, they can create a deadlock where pods can’t be evicted because there’s nowhere for them to go, but Karpenter won’t provision nodes because there are no pending pods.</p><h4>6.4 Phased Node Migration</h4><h4>6.4.1. Get node names</h4><pre>kubectl get nodes -l eks.amazonaws.com/nodegroup=primary<br><br># Example output:<br>NAME                         STATUS   ROLES    AGE   VERSION<br>ip-10-0-1-239.ec2.internal   Ready    &lt;none&gt;   81m   v1.34.2-eks-ecaa3a6<br>ip-10-0-2-106.ec2.internal   Ready    &lt;none&gt;   81m   v1.34.2-eks-ecaa3a6<br>ip-10-0-3-62.ec2.internal    Ready    &lt;none&gt;   81m   v1.34.2-eks-ecaa3a6</pre><h4>6.4.2. 
Cordon 2 of 3 nodes (keep one for Karpenter)</h4><p>Cordoning marks nodes as unschedulable: no new pods will be placed on those nodes.</p><p>⚠️ Node names look like ip-10-0-1-239.ec2.internal. Yours will be different; the names below come from my output above.</p><pre># Keep the FIRST node available for Karpenter<br># Cordon the other two nodes<br>kubectl cordon ip-10-0-2-106.ec2.internal<br>kubectl cordon ip-10-0-3-62.ec2.internal<br><br># Verify status<br>kubectl get nodes<br><br># Expected output - two nodes show SchedulingDisabled:<br>NAME                         STATUS                     ROLES    AGE   VERSION<br>ip-10-0-1-239.ec2.internal   Ready                      &lt;none&gt;   45m   v1.34.2-eks-ecaa3a6<br>ip-10-0-2-106.ec2.internal   Ready,SchedulingDisabled   &lt;none&gt;   45m   v1.34.2-eks-ecaa3a6<br>ip-10-0-3-62.ec2.internal    Ready,SchedulingDisabled   &lt;none&gt;   45m   v1.34.2-eks-ecaa3a6</pre><h4>6.4.3. Drain the cordoned nodes</h4><p>Draining evicts all pods from the nodes, except DaemonSet-managed pods, which the --ignore-daemonsets flag skips.</p><p>Karpenter will detect pending pods and provision new nodes.</p><pre># Drain the two cordoned nodes<br>kubectl drain ip-10-0-2-106.ec2.internal --ignore-daemonsets --delete-emptydir-data<br>kubectl drain ip-10-0-3-62.ec2.internal --ignore-daemonsets --delete-emptydir-data<br><br># Expected output:<br>node/ip-10-0-2-106.ec2.internal already cordoned<br>Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-2ttcv, kube-system/eks-pod-identity-agent-9mkmt, kube-system/kube-proxy-tcfd8<br>evicting pod karpenter/karpenter-859bfc7db7-h4wf6<br>pod/karpenter-859bfc7db7-h4wf6 evicted<br>node/ip-10-0-2-106.ec2.internal drained<br>node/ip-10-0-3-62.ec2.internal already cordoned<br>Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-gtzxk, kube-system/eks-pod-identity-agent-qdsqp, kube-system/kube-proxy-crhmx<br>node/ip-10-0-3-62.ec2.internal drained</pre><h4>6.5 Deployment to Trigger Karpenter</h4><p>With only one node available and multiple pods, we need to 
create demand that exceeds capacity:</p><pre># Scale video-app to create pending pods (or change to 20 if nothing)<br>kubectl scale deployment video-app -n video-app --replicas=10<br><br># Watch for new nodes (Ctrl+C to exit)<br>kubectl get nodes -w<br><br># You should see some pending pods with this, it&#39;s scaling up<br>kubectl get pods -n video-app<br><br>NAME                         READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-29rgx   0/1     Pending   0          3m26s<br>video-app-6498b5dd57-7778w   1/1     Running   0          50m<br>video-app-6498b5dd57-b9t6h   1/1     Running   0          75m<br>video-app-6498b5dd57-jzsbq   1/1     Running   0          3m26s<br>video-app-6498b5dd57-n87vs   1/1     Running   0          75m<br>video-app-6498b5dd57-nsq57   0/1     Pending   0          3m25s<br>video-app-6498b5dd57-ppdbk   0/1     Pending   0          3m25s<br>video-app-6498b5dd57-xbzj5   1/1     Running   0          3m26s<br></pre><p>You can also check in the logs.</p><pre>$ kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=20<br><br># You will see some entries like <br>{<br>  &quot;level&quot;: &quot;INFO&quot;,<br>  &quot;time&quot;: &quot;2026-01-02T22:33:03.370Z&quot;,<br>  &quot;logger&quot;: &quot;controller&quot;,<br>  &quot;message&quot;: &quot;computed new nodeclaim(s) to fit pod(s)&quot;,<br>  &quot;commit&quot;: &quot;c8c45c1&quot;,<br>  &quot;controller&quot;: &quot;provisioner&quot;,<br>  &quot;namespace&quot;: &quot;&quot;,<br>  &quot;name&quot;: &quot;&quot;,<br>  &quot;reconcileID&quot;: &quot;eb5c68a5-3da2-4dd4-bf22-d7b1067ac4f8&quot;,<br>  &quot;nodeclaims&quot;: 2,<br>  &quot;pods&quot;: 3<br>}</pre><p><strong>Behind the scenes:</strong></p><pre>Timeline:<br>0s   - Scale to 10 replicas requested<br>5s   - Some pods are Pending (insufficient capacity on single node)<br>10s  - Karpenter detects Pending pods<br>15s  - Karpenter calculates optimal instance type<br>20s  - Karpenter calls EC2 CreateFleet API<br>45s 
 - New EC2 instance launches and joins cluster<br>50s  - Pending pods scheduled on new node<br>60s  - All pods Running!</pre><h4>6.6 Verify Karpenter Provisioned Nodes</h4><pre><br>  # List nodes managed by Karpenter (should include new ones)<br>  kubectl get nodes -l karpenter.sh/nodepool=default<br><br>  # Expected output - new nodes with private DNS names:<br>  # NAME                         STATUS   ROLES    AGE   VERSION<br>  # ip-10-0-2-221.ec2.internal   Ready    &lt;none&gt;   60s   v1.34.2-eks-ecaa3a6<br>  # ip-10-0-3-185.ec2.internal   Ready    &lt;none&gt;   60s   v1.34.2-eks-ecaa3a6<br><br>  # Check capacity type (Spot vs On-Demand) - look at CAPACITY-TYPE column<br>  kubectl get nodes -L karpenter.sh/capacity-type<br><br>  # Expected output - Karpenter nodes show &quot;spot&quot;:<br>  # NAME                         STATUS                     ROLES    AGE    CAPACITY-TYPE<br>  # ip-10-0-1-239.ec2.internal   Ready                      &lt;none&gt;   129m<br>  # ip-10-0-2-106.ec2.internal   Ready,SchedulingDisabled   &lt;none&gt;   129m<br>  # ip-10-0-2-221.ec2.internal   Ready                      &lt;none&gt;   73s    spot<br>  # ip-10-0-3-185.ec2.internal   Ready                      &lt;none&gt;   73s    spot<br>  # ip-10-0-3-62.ec2.internal    Ready,SchedulingDisabled   &lt;none&gt;   129m<br><br>  # Verify all video-app pods are running<br>  kubectl get pods -n video-app<br><br>  # Expected output - all pods Running:<br>  # NAME                         READY   STATUS    RESTARTS   AGE<br>  # video-app-6498b5dd57-xxxxx   1/1     Running   0          5m<br>  # video-app-6498b5dd57-xxxxx   1/1     Running   0          5m<br>  # ... 
(8-10 pods total)<br><br>  # Troubleshooting: Nodeclaims Stuck in Unknown<br>  # If you see nodeclaims stuck with READY: Unknown:<br>  # this can sometimes happen due to various errors<br><br>  # Check nodeclaims status<br>  kubectl get nodeclaims<br><br>  # If stuck in Unknown, delete and let Karpenter retry<br>  kubectl delete nodeclaims --all --force --grace-period=0<br><br>  # Verify deleted<br>  kubectl get nodeclaims<br><br>  # Watch for new nodes (should appear within 60 seconds)<br>  kubectl get nodes -w<br></pre><h4>🚀 Hopefully you got all that!</h4><p>If you run into any issues, stick with it. I ran through this tutorial 3 times while writing it and hit various errors, mostly related to IAM (permissions preventing EC2 instances from joining the cluster). I have updated the code above to resolve that (namely the Terraform IAM for Karpenter).</p><p>The point is that there are a lot of details here, so be diligent and treat some errors as part of the learning process.</p><p><strong>But this is a more advanced tutorial, so keep going!</strong></p><p><strong>🔥 It is quite an accomplishment to even get this far!</strong></p><h3>7. 
Combined Load Test (HPA + Karpenter)</h3><p>The goal of this section is to watch the full autoscaling chain in action:</p><p>Load -&gt; HPA scales pods -&gt; Pods Pending -&gt; <br>Karpenter provisions nodes -&gt; Pods running</p><h4>Open Multiple Terminals</h4><p><strong>Terminal 1 (HPA Watch):</strong></p><pre>kubectl get hpa -n video-app --watch</pre><p><strong>Terminal 2 (Pod Watch):</strong></p><pre>kubectl get pods -n video-app --watch</pre><p><strong>Terminal 3 (Node Watch):</strong></p><pre>kubectl get nodes --watch</pre><p><strong>Terminal 4 (Karpenter Logs):</strong></p><pre>kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f</pre><p><strong>Terminal 5 (Generate Load):</strong></p><pre># Get Load Balancer URL<br>LB_URL=$(kubectl get service video-app -n video-app -o jsonpath=&#39;{.status.loadBalancer.ingress[0].hostname}&#39;)<br>echo &quot;Load Balancer URL: $LB_URL&quot;<br><br># Verify URL works<br>curl -s http://$LB_URL/health<br><br># output<br><br>{<br>  &quot;status&quot;: &quot;healthy&quot;,<br>  &quot;timestamp&quot;: &quot;2026-01-03T14:57:37.672Z&quot;,<br>  &quot;hostname&quot;: &quot;video-app-6498b5dd57-dgzmk&quot;,<br>  &quot;uptime&quot;: 295.935155309<br>}</pre><h4>Generate Load with k6</h4><p>In Terminal 5, run the load test with k6.</p><p>First, run this in your terminal to create the k6 script:</p><pre>cat &gt; /tmp/loadtest.js &lt;&lt; &#39;SCRIPT&#39;<br>import http from &#39;k6/http&#39;;<br>import { check, sleep } from &#39;k6&#39;;<br><br>export default function () {<br>  const res = http.get(__ENV.TARGET_URL);<br>  check(res, { &#39;status is 200&#39;: (r) =&gt; r.status === 200 });<br>  sleep(0.1);<br>}<br>SCRIPT</pre><p>Then run this command (for more load try 400 vus):</p><pre>k6 run --vus 200 --duration 5m -e TARGET_URL=http://$LB_URL/api/info /tmp/loadtest.js</pre><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*32gtEx3eRqr2E6EoYMgHzw.png" /></figure><h4>Expected Scaling Sequence</h4><p>Watch the terminals as the load test 
runs:</p><pre>Time    HPA Replicas    Nodes    Events<br>─────────────────────────────────────────────────────<br>0:00    3               2        Baseline (Karpenter nodes from Section 6)<br>0:30    3 → 6           2        HPA detects CPU &gt; 70%, scales up<br>1:00    6 → 9           2        HPA continues scaling<br>1:15    9 → 12          2        Pods Pending (insufficient capacity)<br>1:20    12              2        Karpenter detects Pending pods<br>1:45    12              3        New Spot node joins cluster (~30 sec)<br>2:00    12              3        All pods Running<br>2:30    12 → 15         3        HPA continues if load persists<br>3:00    15              4        Another node added if needed</pre><h3>Verify Mixed Capacity Types</h3><p>After scaling, check the node distribution:</p><pre># See Spot vs On-Demand and instance types<br>kubectl get nodes -L karpenter.sh/capacity-type -L node.kubernetes.io/instance-type<br><br># Example output:<br># NAME                         STATUS   CAPACITY-TYPE   INSTANCE-TYPE<br># ip-10-0-1-50.ec2.internal    Ready    spot            t3a.small<br># ip-10-0-2-75.ec2.internal    Ready    spot            t2.small<br># ip-10-0-3-100.ec2.internal   Ready    spot            t3.medium</pre><p>Count nodes</p><pre># Count nodes by capacity type<br>kubectl get nodes -l karpenter.sh/capacity-type=spot --no-headers | wc -l<br>kubectl get nodes -l karpenter.sh/capacity-type=on-demand --no-headers | wc -l</pre><h3>Verify Pod Distribution</h3><p>Check that pods are spread across nodes:</p><pre># See which node each pod is running on<br>kubectl get pods -n video-app -o wide<br><br># Count pods per node<br>kubectl get pods -n video-app -o wide --no-headers | awk &#39;{print $7}&#39; | sort | uniq -c</pre><h4>Stop the Load Test</h4><p>Press Ctrl+C in Terminal 5 to stop the load test or wait for it to end.</p><h4>Watch Scale-Down and Consolidation</h4><p>After stopping the load, watch the automatic scale-down:</p><pre># Watch HPA 
scale down<br>kubectl get hpa -n video-app --watch<br><br># Watch nodes consolidate (Karpenter removes underutilized nodes)<br>kubectl get nodes --watch<br><br># Watch Karpenter logs for consolidation<br>kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -i consolidat</pre><p><strong>Expected scale-down sequence:</strong></p><pre>Time    Pods    Nodes    Events<br>─────────────────────────────────────────────────────<br>0:00    15      4        Load test ended<br>0:30    15      4        CPU drops below 70%<br>1:00    12      4        HPA scales down<br>2:00    9       4        HPA continues scaling down<br>3:00    6       4        Karpenter detects underutilization<br>3:30    6       3        Node cordoned, pods moved<br>4:00    3       3        HPA reaches minReplicas<br>5:00    3       2        Karpenter consolidates to fewer nodes</pre><h4>Final Verification</h4><pre># Check final state<br>kubectl get nodes -L karpenter.sh/capacity-type<br>kubectl get pods -n video-app<br>kubectl get hpa -n video-app<br><br># Check Karpenter-managed nodes<br>kubectl get nodes -l karpenter.sh/nodepool=default</pre><p>Example results after several minutes, running 1000 vus (virtual users):</p><pre>$ kubectl get hpa -n video-app --watch<br><br>video-app-hpa   Deployment/video-app   cpu: 73%/70%   3         10        10         36m<br>video-app-hpa   Deployment/video-app   cpu: 74%/70%   3         10        10         36m<br>video-app-hpa   Deployment/video-app   cpu: 73%/70%   3         10        10         37m<br>video-app-hpa   Deployment/video-app   cpu: 74%/70%   3         10        10         37m<br>video-app-hpa   Deployment/video-app   cpu: 50%/70%   3         10        10         37m<br>video-app-hpa   Deployment/video-app   cpu: 21%/70%   3         10        10         37m<br>video-app-hpa   Deployment/video-app   cpu: 64%/70%   3         10        10         38m<br>video-app-hpa   Deployment/video-app   cpu: 128%/70%   3         10        10         
38m<br>video-app-hpa   Deployment/video-app   cpu: 130%/70%   3         10        10         38m<br>video-app-hpa   Deployment/video-app   cpu: 126%/70%   3         10        10         38m<br>video-app-hpa   Deployment/video-app   cpu: 46%/70%    3         10        10         39m<br>video-app-hpa   Deployment/video-app   cpu: 57%/70%    3         10        10         39m<br>video-app-hpa   Deployment/video-app   cpu: 182%/70%   3         10        10         39m<br>video-app-hpa   Deployment/video-app   cpu: 293%/70%   3         10        10         39m<br>video-app-hpa   Deployment/video-app   cpu: 303%/70%   3         10        10         40m<br>video-app-hpa   Deployment/video-app   cpu: 298%/70%   3         10        10         40m<br>video-app-hpa   Deployment/video-app   cpu: 304%/70%   3         10        10         40m<br>video-app-hpa   Deployment/video-app   cpu: 300%/70%   3         10        10         40m<br>video-app-hpa   Deployment/video-app   cpu: 304%/70%   3         10        10         41m<br>video-app-hpa   Deployment/video-app   cpu: 114%/70%   3         10        10         41m<br>video-app-hpa   Deployment/video-app   cpu: 3%/70%     3         10        10         41m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        10         41m</pre><h3>Successful Test!</h3><p>The complete autoscaling stack has been observed.</p><h4>Scale-Up:</h4><ol><li>Load test generated traffic</li><li>HPA detected high CPU usage (&gt;70%)</li><li>HPA scaled pods: 3 → 6 → 9 → 12+</li><li>Some pods went Pending (no capacity)</li><li>Karpenter detected Pending pods</li><li>Karpenter provisioned Spot instances (~30 seconds)</li><li>New nodes joined the cluster</li><li>Pods scheduled on new nodes</li><li>All requests served successfully!</li></ol><h4>Scale-Down:</h4><ol><li>Load test ended</li><li>HPA detected low CPU usage</li><li>HPA scaled down pods</li><li>Karpenter detected underutilized nodes</li><li>Karpenter 
consolidated workloads</li><li>Empty nodes terminated</li><li>Cost savings achieved!</li></ol><h3>8. Node Consolidation, <strong>Configure Spot</strong> instances (extra advanced notes about this)</h3><h4>Understanding Spot Instances</h4><p>Spot instances are spare AWS capacity, typically offered at a 60–90% discount compared to On-Demand pricing. <strong>The trade-off: </strong>AWS can reclaim them with a 2-minute warning. See more info at <a href="https://aws.amazon.com/ec2/spot/instance-advisor/">https://aws.amazon.com/ec2/spot/instance-advisor/</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4vIHFIPQ4ANxwS8M_4x_tw.png" /><figcaption><a href="https://aws.amazon.com/ec2/spot/instance-advisor/">https://aws.amazon.com/ec2/spot/instance-advisor/</a></figcaption></figure><h4>Spot Interruption Handling</h4><p>Karpenter handles Spot interruptions automatically via the SQS queue we configured in Terraform:</p><pre>Spot Interruption Flow:<br>1. AWS sends 2-minute warning to SQS queue<br>2. Karpenter receives the message<br>3. Karpenter cordons the node (no new pods)<br>4. Karpenter provisions replacement node<br>5. Pods are rescheduled to new node<br>6. 
Interrupted node terminates<br><br>Result: Near-zero downtime despite Spot reclamation!</pre><p><strong>View interruption events:</strong></p><pre># Watch for interruption handling in logs<br>kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -i interrupt<br><br># Check SQS queue for interruption messages<br>aws sqs get-queue-attributes \<br>  --queue-url $(aws sqs get-queue-url --queue-name eks-video-cluster-karpenter-interruption --query &#39;QueueUrl&#39; --output text) \<br>  --attribute-names ApproximateNumberOfMessages</pre><h4>Create Spot-Only NodePool (Optional)</h4><p>For workloads that can tolerate interruptions, create a dedicated Spot NodePool</p><pre># k8s/karpenter-nodepool-spot.yaml<br>apiVersion: karpenter.sh/v1<br>kind: NodePool<br>metadata:<br>  name: spot-only<br>spec:<br>  template:<br>    metadata:<br>      labels:<br>        capacity-type: spot<br>    spec:<br>      nodeClassRef:<br>        group: karpenter.k8s.aws<br>        kind: EC2NodeClass<br>        name: default<br>      requirements:<br>        - key: karpenter.k8s.aws/instance-category<br>          operator: In<br>          values: [&quot;t&quot;, &quot;m&quot;, &quot;c&quot;]<br>        - key: karpenter.k8s.aws/instance-size<br>          operator: In<br>          values: [&quot;small&quot;, &quot;medium&quot;, &quot;large&quot;]<br>        # Force Spot only<br>        - key: karpenter.sh/capacity-type<br>          operator: In<br>          values: [&quot;spot&quot;]<br>        - key: kubernetes.io/arch<br>          operator: In<br>          values: [&quot;amd64&quot;]<br>  disruption:<br>    consolidationPolicy: WhenEmptyOrUnderutilized<br>    consolidateAfter: 30s<br>  limits:<br>    cpu: &quot;50&quot;<br>    memory: &quot;100Gi&quot;<br>  weight: 50  # Lower priority than default pool</pre><p><strong>Target specific workloads to Spot nodes:</strong></p><pre># In your deployment spec<br>spec:<br>  template:<br>    spec:<br>      nodeSelector:<br>        
capacity-type: spot  # Matches the label in Spot-only NodePool</pre><h4><strong>Trigger </strong>Node Consolidation<strong>:</strong></h4><pre># Stop the load test (Ctrl+C in Terminal 5)<br><br># Scale down deployment<br>kubectl scale deployment video-app -n video-app --replicas=3<br><br># Watch nodes get consolidated<br>kubectl get nodes --watch<br><br># Watch Karpenter logs for consolidation events<br>kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -i consolidat</pre><p><strong>Karpenter automatically:</strong></p><ul><li>Detects underutilized nodes.</li><li>Moves pods to other nodes.</li><li>Terminates empty nodes.</li><li>Saves costs!</li></ul><h4>Consolidation Settings</h4><pre>disruption:<br>  # When to consolidate<br>  consolidationPolicy: WhenEmptyOrUnderutilized<br><br>  # How long to wait before consolidating<br>  consolidateAfter: 30s<br><br>  # Budget: Limit simultaneous disruptions<br>  budgets:<br>    - nodes: &quot;20%&quot;  # Max 20% of nodes disrupted at once</pre><h3>9. Cleanup</h3><p>⚠️ <strong>REMINDER: </strong>Just remember that you are being charged by Amazon AWS for the resources. You must (1) use terraform destroy AND (2) check the AWS console to <strong>be sure all resources are removed so you won’t get charged.</strong> (recall from previous article, sometimes an error causes resources to not be deleted, so you could still be charged even after running the command.) 
⚠️ <strong>Run terraform destroy TWICE and double-check console and remove anything not removed.</strong></p><h4>Remove Karpenter Resources</h4><pre># Delete NodePool and EC2NodeClass<br>kubectl delete -f k8s/karpenter-nodepool.yaml<br><br># Wait for Karpenter nodes to be terminated<br>kubectl get nodes --watch</pre><h4>Destroy All Infrastructure</h4><pre>cd environments/dev<br><br>terraform destroy</pre><p><strong>Recommended to do this TWICE,</strong> and check the AWS console especially in the VPC area for internet gateway and EC2 area for load balancers.</p><ul><li><strong>Do not assume all were destroyed automatically.</strong></li></ul><p><strong>Check AWS Console (especially these):</strong></p><ul><li>EKS cluster deleted</li><li>EC2 instances terminated (including Karpenter nodes)</li><li>Load balancers removed</li><li>IAM roles deleted</li><li>SQS queue deleted</li></ul><h3>AWESOME!</h3><p>After Parts 4 and 5, we have:</p><ul><li>Pods scaling automatically (HPA).</li><li>Nodes scaling automatically (Karpenter).</li><li>Spot instances for cost savings.</li><li>Helm for installs.</li><li>Karpenter for Spot instances, cost savings.</li></ul><p>By now, we are getting pretty experienced with this.</p><p>Is it starting to make sense? 
I hope so.</p><h3>Coming up!</h3><p>I have not finalized the next article, and I will let some days go by for people to catch up… but the below is vaguely what I am thinking about.</p><h4>These topics may be included in the next article, Part 6:</h4><ol><li>Kubernetes Dashboard for cluster overview</li><li>Prometheus installation for metrics collection</li><li>Grafana dashboards for visualization</li><li>Kubecost (or other) for detailed cost analysis</li><li>CloudWatch Container Insights integration</li><li>Custom dashboards for Karpenter metrics</li><li>HPA scaling history visualization</li><li>Cost comparison dashboards (Spot vs On-Demand)</li><li>Alert configuration for anomalies</li></ol><p>🛠️ Get more articles and tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here.</p><p>🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of <strong>years of research and writing I have done</strong> on cloud best practices and then further integrates that with my prior cloud books and also <strong>code solutions and tutorials integrated using multiple AIs</strong> and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p><strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. Also previously held AWS Solutions Architect certification.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=de2f7c3334ad" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)]]></title>
            <link>https://medium.com/ai-dev-tips/ai-chat-coding-essentials-with-openai-ai-agent-coding-series-1-6ac06b8080b4?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/6ac06b8080b4</guid>
            <category><![CDATA[openai]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[chatgpt]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Fri, 26 Dec 2025 15:55:29 GMT</pubDate>
            <atom:updated>2025-12-26T16:11:02.540Z</atom:updated>
            <content:encoded><![CDATA[<h4>Creating OpenAI chat messages in code, handling streaming vs non-streaming responses, temperature/top_p control and other factors and variables. This is a review of essentials for the new series.</h4><p><strong>It’s been a while since I added new articles to AI Dev Tips, </strong>but<strong> </strong>this seems like a great time for a review article and new AI agent series.</p><p>With new tech coming out daily it’s good to set a baseline with this “getting started” #1 in the series.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*mMG9RlYBWsoIlfJ5SLyGbA.jpeg" /><figcaption>Getting Started: AI Agent Coding Series #1</figcaption></figure><p>First I want to make sure we’re on the same page about typical <strong>AI Chat coding, standards, as well as terms and new SDKs</strong> being introduced (I will cover more on that in later articles)</p><p><strong>We will be using </strong><a href="https://openrouter.ai/"><strong>OpenRouter</strong></a><strong> and </strong><a href="https://platform.openai.com/docs/overview"><strong>OpenAI SDK/APIs</strong></a><strong> </strong>initially for this series but may branch into some others as well. OpenAI’s API format is used by many other LLMs (and OpenRouter) as a standardized API format, so it is good to learn that. OpenRouter allows us to use an interface for many different AI LLMs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cucCuAl4QJ7w0WoIMBa6Bg.png" /></figure><p>In this first article of the OpenAI Agent Coding Series, we’ll cover the <strong>fundamentals and essentials of working with the OpenAI Chat Completions API</strong>.</p><p><strong>I think getting these chat completion fundamentals down is really important. </strong>It’s accessible to most people at very little or no cost. 
And of course, this is just a start and a foundation to build on.</p><p>In later articles, we’ll continue with more advanced OpenAI coding, potentially with other SDKs like OpenAI’s Agents SDK.</p><p>Some upfront notes about this series:</p><ul><li><strong>I am going to be using </strong><a href="https://openrouter.ai/"><strong>OpenRouter</strong></a><strong> for this project.</strong> If you want to slightly modify the code, you can use <a href="https://platform.openai.com/docs/overview">OpenAI</a> directly. Some later features may not be included in OpenRouter’s OpenAI interface; if so, we’ll discuss options.</li><li><strong>We’re going to use TypeScript </strong><em>and</em><strong> Python.</strong></li><li><strong>Why both?</strong> For one, in fullstack application coding, I am mainly a <strong>TypeScript</strong> developer with some Python when necessary. Our focus though is pragmatic, so we will use each when it makes sense!</li><li>A lot of code on both the <strong>frontend and backend</strong>, for a variety of apps and especially <strong>UIs, MVPs and prototypes</strong>, is written in JavaScript/<strong>TypeScript</strong> with React, Node.js and related tooling. 
For example, the <strong>Next.js</strong> and <strong>Vercel</strong> deploy is a popular option.</li><li><strong>However, no doubt, AI agents, and many AI tools </strong>in general, are often <strong>developed in Python — </strong>so we need to be familiar with Python too.</li></ul><p>We’ll be covering AI Chat coding basics for Typescript and Python:</p><ol><li><strong>Setup and Introduction</strong></li><li><strong>Message Roles and Structure + Example</strong></li><li><strong>Non-Streaming Responses</strong></li><li><strong>Streaming Responses</strong></li><li><strong>Temperature and Top_p Control</strong></li><li><strong>Other Important Parameters</strong></li><li><strong>Error Handling Best Practices</strong></li></ol><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><h3>1. Setup and Introduction</h3><h4>OpenRouter Setup</h4><p>OpenRouter provides a unified API that’s compatible with the OpenAI SDK, allowing you to access multiple LLM providers through a single interface. 
This makes it perfect for experimentation and comparing models.</p><ol><li>Sign up at <a href="https://openrouter.ai">openrouter.ai</a></li><li>Generate an API key from your dashboard</li><li>Create a .env file in your project root:</li></ol><pre>OPENROUTER_API_KEY=your-api-key-here</pre><blockquote><strong><em>Important</em></strong><em>: Add </em><em>.env to your </em><em>.gitignore to keep your API key secure and out of version control.</em></blockquote><h4>TypeScript Setup</h4><pre># Initialize a new project<br>npm init -y<br><br># Install dependencies<br>npm install openai dotenv<br>npm install -D typescript @types/node ts-node<br><br># Create tsconfig.json<br>npx tsc --init</pre><p>The last command creates tsconfig.json — we need to edit that to add the following to the config:</p><pre>{<br>  &quot;compilerOptions&quot;: {<br>    &quot;target&quot;: &quot;ES2020&quot;,<br>    &quot;module&quot;: &quot;commonjs&quot;,<br>    &quot;strict&quot;: true,<br>    &quot;esModuleInterop&quot;: true,<br>    &quot;skipLibCheck&quot;: true,<br>    &quot;outDir&quot;: &quot;./dist&quot;,<br>// ... other options pre-existing in here should be ok to leave there.<br>// &quot;verbatimModuleSyntax&quot;: true,<br>// Disable verbatimModuleSyntax enforcing strict module syntax <br>  }<br>}<br></pre><p><strong>note:</strong> Disable <strong>verbatimModuleSyntax</strong>. Also, I left in some of the existing options and removed some of the boilerplate that was originally created with init command above. 
Just make sure you have the above in there.</p><h4>Python Setup</h4><p>We’re not going to do everything in Python but I’ll create a separate file for streaming so you can use it if you want.</p><pre>python -m venv venv</pre><h4>Understanding the Chat Completions API</h4><ul><li>The <strong>Chat Completions API is the primary way to interact </strong>with GPT models.</li><li>At its core, <strong>you send a list of messages and receive a generated response.</strong></li><li><strong>The API is stateless — it doesn’t remember previous conversations,</strong> so you must send the full conversation history with each request.</li></ul><h4>Basic Flow</h4><p>Below is a basic conceptual flow at the highest level.</p><ol><li><strong>Your application </strong>makes an API request →</li><li><strong>API request</strong> (messages + parameters) →</li><li><strong>LLM provider </strong>(OpenAI, Anthropic, Grok etc.) →</li><li><strong>API response</strong> (completion + metadata) →</li><li><strong>Your application</strong> (process/output)</li></ol><h4>Key Concepts:</h4><ul><li><strong>Messages</strong>: An array of message objects representing the conversation</li><li><strong>Model</strong>: Which LLM to use (gpt-4, gpt-3.5-turbo, claude-3-opus)</li><li><strong>Parameters</strong>: Control behavior like creativity, length, and format</li><li><strong>Tokens</strong>: The basic unit of text processing (roughly 4 characters of English text per token)</li></ul><h4>More Complex Flow</h4><p>The basic flow shows only the highest-level steps. The flow below adds other practical steps that I often use for integrations.</p><ol><li><strong>Your application </strong>makes an API request →</li><li><strong>Pre-process and authenticate</strong> →</li><li><strong>API request</strong> (messages + parameters) →</li><li><strong>Queue, orchestrate </strong>if multiple calls →</li><li><strong>LLM provider </strong>(OpenAI, Anthropic, Grok etc.) 
→</li><li><strong>Validate, moderation</strong> →</li><li><strong>Cache and logging</strong> →</li><li><strong>API response</strong> (completion + metadata) →</li><li><strong>Post-Process</strong> (format output, RAG injection, vector database) →</li><li><strong>Your application</strong> (process/output)</li></ol><h3>2. Message Roles and Structure + Example</h3><p>Every message in the Chat Completions API has a role and content. Understanding roles is fundamental to building effective AI applications.</p><ul><li><strong>system:</strong> Sets behavior, personality, and instructions. Example: &quot;You are a helpful coding assistant&quot;.</li><li><strong>user:</strong> Represents human input &quot;How do I sort an array in Python?&quot;.</li><li><strong>assistant:</strong> Represents AI responses. Previous AI responses in conversation.</li></ul><pre>import OpenAI from &#39;openai&#39;;<br><br>// Define message types for clarity<br>type ChatMessage = {<br>  role: &#39;system&#39; | &#39;user&#39; | &#39;assistant&#39;;<br>  content: string;<br>};<br><br>const messages: ChatMessage[] = [<br>  {<br>    role: &#39;system&#39;,<br>    content: &#39;You are a senior software engineer. Provide concise, practical answers with code examples when appropriate.&#39;<br>  },<br>  {<br>    role: &#39;user&#39;,<br>    content: &#39;What is the difference between let and const in JavaScript?&#39;<br>  },<br>  {<br>    role: &#39;assistant&#39;,<br>    content: &#39;let allows reassignment, const does not. Both are block-scoped.&#39;<br>  },<br>  {<br>    role: &#39;user&#39;,<br>    content: &#39;Can you show me an example?&#39;<br>  }<br>];</pre><p>In Python the syntax is slightly different because it’s without the explicit type:</p><pre>from openai import OpenAI<br><br>messages = [<br>    {<br>        &quot;role&quot;: &quot;system&quot;,<br>        &quot;content&quot;: &quot;You are a senior software engineer. 
Provide concise, practical answers with code examples when appropriate.&quot;<br>    },<br>    {<br>        &quot;role&quot;: &quot;user&quot;,<br>        &quot;content&quot;: &quot;What is the difference between let and const in JavaScript?&quot;<br>    },<br>    {<br>        &quot;role&quot;: &quot;assistant&quot;,<br>        &quot;content&quot;: &quot;let allows reassignment, const does not. Both are block-scoped.&quot;<br>    },<br>    {<br>        &quot;role&quot;: &quot;user&quot;,<br>        &quot;content&quot;: &quot;Can you show me an example?&quot;<br>    }<br>]</pre><h3>Best Practices for Messages</h3><ol><li><strong>System messages first</strong>: Always place your system message at the beginning</li><li><strong>Be specific in system prompts</strong>: Vague instructions lead to inconsistent results</li><li><strong>Include context:</strong> For multi-turn conversations, include relevant history (previous user + assistant messages) to maintain coherence.</li><li><strong>Trim when needed</strong>: Long conversations can be truncated (oldest first) to stay within token limits</li><li><strong>Leverage few-shot examples effectively:</strong> Include 2–5 high-quality input/output examples inside the system/developer message (or as alternating user/assistant pairs). Use clear delimiters (XML tags, Markdown sections, or numbered lists) to separate examples.</li></ol><h3>3. 
Non-Streaming Responses</h3><p>Non-streaming (also called “blocking” or “synchronous”) responses wait until the entire response is generated before returning.</p><p>This is simpler to implement but provides a <strong>worse user experience </strong>for longer responses.</p><h4>When to Use Non-Streaming</h4><ul><li>Short, quick responses</li><li>Background processing where latency doesn’t matter</li><li>When you need the complete response before processing.</li></ul><pre>import OpenAI from &#39;openai&#39;;<br>import dotenv from &#39;dotenv&#39;;<br><br>dotenv.config({ path: &#39;../../.env&#39; });<br><br>const client = new OpenAI({<br>  baseURL: &#39;https://openrouter.ai/api/v1&#39;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>interface ChatCompletionResult {<br>  content: string;<br>  response: OpenAI.Chat.Completions.ChatCompletion;<br>}<br><br>async function getChatCompletion(userMessage: string): Promise&lt;ChatCompletionResult&gt; {<br>  const response = await client.chat.completions.create({<br>    model: &#39;openai/gpt-4-turbo&#39;,<br>    messages: [<br>      {<br>        role: &#39;system&#39;,<br>        content: &#39;You are a helpful senior developer.&#39;<br>      },<br>      {<br>        role: &#39;user&#39;,<br>        content: userMessage<br>      }<br>    ],<br>    // Non-streaming is the default (stream: false)<br>  });<br><br>  // Extract the response content<br>  const content = response.choices[0]?.message?.content;<br><br>  if (!content) {<br>    throw new Error(&#39;No content in response&#39;);<br>  }<br><br>  return { content, response };<br>}<br><br>// Usage<br>async function main() {<br>  try {<br>    const { content, response } = await getChatCompletion(&#39;Explain async/await in one paragraph&#39;);<br>    console.log(&#39;Content:&#39;, content);<br>    console.log(&#39;Model used:&#39;, response.model);<br>    console.log(&#39;Usage:&#39;, response.usage);<br>    console.log(&#39;Message:&#39;, 
JSON.stringify(response?.choices?.[0]?.message ?? null, null, 2));<br>    console.log(&#39;Full response:&#39;, response);<br>  } catch (error) {<br>    console.error(&#39;Error:&#39;, error);<br>  }<br>}<br><br>main();</pre><p>Run the script from the directory that contains it (you may need to adjust the path passed to dotenv.config({ path: &#39;../../.env&#39; }) so that it points at your .env file):</p><p><strong>Run:</strong></p><pre>npx ts-node non-streaming.ts</pre><p><strong>Response (the message object is printed first; in the full response it appears collapsed as [Object]):</strong></p><pre><br>// Message: <br><br>{<br>  &quot;role&quot;: &quot;assistant&quot;,<br>  &quot;content&quot;: &quot;Async/await is a syntax in modern programming languages such as JavaScript designed to make asynchronous programming simpler and more readable. The `async` keyword is used when declaring a function, indicating that the function will handle asynchronous operations and will return a promise. Within an `async` function, the `await` keyword is used before a function call to pause the execution of the async function until the promise is resolved or rejected. This allows the code to be written in a more synchronous manner, which helps in reducing the complexity of code managing multiple concurrent operations and error handling, without blocking the main thread. 
Essentially, async/await helps manage asynchronous code, making it easier to follow and maintain by writing code that avoids deep nesting of callbacks and complex chains of promise handlers.&quot;,<br>  &quot;refusal&quot;: null,<br>  &quot;reasoning&quot;: null<br>}<br><br>// Full response: <br><br>{<br>  id: &#39;gen-1766607459-DshFs0qxxxxxxxxxxxx&#39;,<br>  provider: &#39;OpenAI&#39;,<br>  model: &#39;openai/gpt-4-turbo&#39;,<br>  object: &#39;chat.completion&#39;,<br>  created: 1766607459,<br>  choices: [<br>    {<br>      logprobs: null,<br>      finish_reason: &#39;stop&#39;,<br>      native_finish_reason: &#39;stop&#39;,<br>      index: 0,<br>      message: [Object]<br>    }<br>  ],<br>  system_fingerprint: &#39;fp_de235xxxxx&#39;,<br>  usage: {<br>    prompt_tokens: 26,<br>    completion_tokens: 155,<br>    total_tokens: 181,<br>    cost: 0.00491,<br>    is_byok: false,<br>    prompt_tokens_details: { cached_tokens: 0, audio_tokens: 0, video_tokens: 0 },<br>    cost_details: {<br>      upstream_inference_cost: null,<br>      upstream_inference_prompt_cost: 0.00026,<br>      upstream_inference_completions_cost: 0.00465<br>    },<br>    completion_tokens_details: { reasoning_tokens: 0, image_tokens: 0 }<br>  }<br>}</pre><h3>4. Streaming Responses</h3><p>Streaming returns the response incrementally as it’s generated. 
This provides a much better user experience for longer responses — users see content appearing in real-time rather than waiting for the complete response.</p><h4>When to Use Streaming</h4><ul><li>Chat interfaces where users expect real-time feedback</li><li>Long-form content generation</li><li>When perceived latency matters more than simplicity</li><li>Interactive applications</li></ul><pre>import OpenAI from &#39;openai&#39;;<br>import dotenv from &#39;dotenv&#39;;<br><br>dotenv.config({ path: &#39;../../.env&#39; });<br><br>const client = new OpenAI({<br>  baseURL: &#39;https://openrouter.ai/api/v1&#39;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>interface StreamingResult {<br>  content: string;<br>  chunks: string[];<br>}<br><br>async function streamChatCompletion(userMessage: string): Promise&lt;StreamingResult&gt; {<br>  const stream = await client.chat.completions.create({<br>    model: &#39;openai/gpt-4-turbo&#39;,<br>    messages: [<br>      {<br>        role: &#39;system&#39;,<br>        content: &#39;You are a helpful senior developer.&#39;<br>      },<br>      {<br>        role: &#39;user&#39;,<br>        content: userMessage<br>      }<br>    ],<br>    stream: true,  // Enable streaming<br>    //  temperature: 0.2,  // Low temperature for consistent code<br>  });<br><br>  let fullResponse = &#39;&#39;;<br>  const chunks: string[] = [];<br><br>  // Process chunks as they arrive<br>  for await (const chunk of stream) {<br>    const content = chunk.choices[0]?.delta?.content;<br><br>    if (content) {<br>      process.stdout.write(content);  // Print without newline<br>      fullResponse += content;<br>      chunks.push(content);<br>    }<br>  }<br><br>  console.log();  // Final newline<br>  return { content: fullResponse, chunks };<br>}<br><br>// Usage<br>async function main() {<br>  try {<br>    console.log(&#39;Streaming response:\n&#39;);<br>    const { content, chunks } = await streamChatCompletion(<br>      &#39;Write a short poem 
about coding&#39;<br>    );<br>    console.log(&#39;\n--- Complete response captured ---&#39;);<br>    console.log(&#39;Length:&#39;, content.length, &#39;characters&#39;);<br>    console.log(&#39;Chunks received:&#39;, chunks.length);<br>  } catch (error) {<br>    console.error(&#39;Error:&#39;, error);<br>  }<br>}<br><br>main();</pre><p><strong>Run the script</strong></p><pre>npx ts-node streaming.ts</pre><p>You should see the console.log results appear as streaming.</p><p><strong>The image below is the Python version</strong> but functionality is the same.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/1*iSFKw034RsA9eNiee78PAQ.gif" /></figure><p><strong>Python version:</strong></p><pre>import os<br>from openai import OpenAI<br>from dotenv import load_dotenv<br><br>load_dotenv()<br><br>client = OpenAI(<br>    base_url=&quot;https://openrouter.ai/api/v1&quot;,<br>    api_key=os.getenv(&quot;OPENROUTER_API_KEY&quot;),<br>)<br><br>def stream_chat_completion(user_message: str) -&gt; str:<br>    &quot;&quot;&quot;Get a streaming chat completion.&quot;&quot;&quot;<br>    stream = client.chat.completions.create(<br>        model=&quot;openai/gpt-4-turbo&quot;,<br>        messages=[<br>            {<br>                &quot;role&quot;: &quot;system&quot;,<br>                &quot;content&quot;: &quot;You are a helpful senior developer.&quot;<br>            },<br>            {<br>                &quot;role&quot;: &quot;user&quot;,<br>                &quot;content&quot;: user_message<br>            }<br>        ],<br>        stream=True,  # Enable streaming<br>    )<br><br>    full_response = &quot;&quot;<br><br>    for chunk in stream:<br>        content = chunk.choices[0].delta.content<br><br>        if content:<br>            print(content, end=&quot;&quot;, flush=True)<br>            full_response += content<br><br>    print()  # Final newline<br>    return full_response<br><br># Usage<br>if __name__ == &quot;__main__&quot;:<br>    try:<br>        
print(&quot;Streaming response:\n&quot;)<br>        full_text = stream_chat_completion(&quot;Write a short poem about coding&quot;)<br>        print(&quot;\n--- Complete response captured ---&quot;)<br>        print(f&quot;Length: {len(full_text)} characters&quot;)<br>    except Exception as e:<br>        print(f&quot;Error: {e}&quot;)</pre><p>Make sure to install the libraries before running it.</p><pre># install libraries<br>pip install openai python-dotenv<br><br># run it<br>python ./streaming.py</pre><p>This should stream the results.</p><p><strong>In the next few sections</strong> we talk about <strong>parameters</strong> that you can add to the request that may affect the response.</p><h3>5. Temperature and Top_p Control</h3><p>These <strong>two parameters control the “creativity” or randomness</strong> of the model’s output. Understanding them is crucial for getting consistent, appropriate responses for your use case.</p><p>⚠️ <strong>OpenAI recommends </strong>altering <strong>either</strong> temperature <strong>or</strong> top_p, <strong>but not both.</strong> They serve similar purposes, and adjusting both can lead to unpredictable results.</p><h4>temperature</h4><p>Temperature controls randomness. 
Lower values make output more focused and deterministic; higher values make it more creative and varied.</p><p><strong>Range</strong>: 0.0 to 2.0 (default: 1.0)</p><p><strong>0.0–0.3 … Very focused, deterministic</strong><br>Code generation, factual Q&amp;A, data extraction.</p><p><strong>0.4–0.7 … Balanced</strong><br>General conversation, explanations.</p><p><strong>0.8–1.2 … Creative, varied</strong><br>Creative writing, brainstorming.</p><p>You add temperature to the request options of the streaming code from earlier:</p><pre>const stream = await client.chat.completions.create({<br>    model: &#39;openai/gpt-4-turbo&#39;,<br>    messages: [<br>      {<br>        role: &#39;system&#39;,<br>        content: &#39;You are a helpful senior developer.&#39;<br>      },<br>      {<br>        role: &#39;user&#39;,<br>        content: userMessage<br>      }<br>    ],<br>    stream: true,  // Enable streaming<br>    temperature: 0.5,  // Balanced results<br>  });</pre><h4>top_p (Nucleus Sampling)</h4><p>Top_p restricts sampling to the smallest set of most-likely tokens whose cumulative probability reaches p. A value of 0.1 means only the tokens making up the top 10% of probability mass are considered.</p><p><strong>Range</strong>: 0.0 to 1.0 (default: 1.0)</p><p><strong>0.1 … Very focused,</strong> limited vocabulary<br><strong>0.9 … High diversity,</strong> most tokens considered</p><h3>6. Other Important Parameters</h3><p>Beyond temperature and top_p, several other parameters help you control the API behavior.</p><h4>max_tokens (<strong>or</strong> max_completion_tokens)</h4><p>Maximum length of the generated response. 
Important for cost control and preventing runaway responses.</p><ul><li><strong>⚠️ Very important for cost control.</strong></li><li>max_completion_tokens is the newer parameter name</li></ul><pre>const response = await client.chat.completions.create({<br>  model: &#39;openai/gpt-4-turbo&#39;,<br>  messages: [...],<br>  max_tokens: 500,  // Limit response length<br>});</pre><h4>n</h4><p>Setting n greater than 1 returns multiple completions to choose from (the default is 1).</p><h4>reasoning (newer reasoning models)</h4><p>This is available on some newer reasoning models and lets you adjust how much reasoning is applied.</p><h4>stop</h4><p>Specify sequences where the model should stop generating. Useful for structured output. Let’s say I am getting some content from somewhere else and I add a keyword ENDHERE: generation will stop at my keyword if I add it to the stop array.</p><pre>const response = await client.chat.completions.create({<br>  model: &#39;openai/gpt-4-turbo&#39;,<br>  messages: [...],<br>  stop: [&#39;\n\n&#39;, &#39;END&#39;, &#39;---&#39;],  // Stop at any of these<br>});</pre><h4>user</h4><p>This identifies which end user is placing requests. It is commonly used for security purposes, such as detecting when somebody is abusing the system.</p><h3>More Complete List of Parameters</h3><p>These were just a few to get you started.</p><p>Our next article covers some of the most important ones: tools, tool_choice and response_format. These matter for AI agents because they allow a lot of extra customization.</p><p>Honestly, I have not used the less common parameters and advanced sampling options much yet, but we may delve into these more later. For now do not worry about them too much, as they are for edge-case scenarios.</p><pre>interface ChatCompletionParams {<br>  // ── Core (always required) ───────────────────────────────────────────────<br>  model: string;                    // Required: e.g. 
&quot;gpt-4o&quot;, &quot;o1&quot;, &quot;o3-mini&quot;, &quot;gpt-5o-preview&quot;<br>  messages: ChatCompletionMessage[]; // Required: array of message objects<br><br>  // ── Generation control ───────────────────────────────────────────────────<br>  max_tokens?: number;              // Classic name (still widely supported)<br>  max_completion_tokens?: number;   // Preferred on newer 2025+ models (especially reasoning ones)<br>  temperature?: number;             // 0–2 (often ignored/fixed on o1/o3/o4 reasoning models)<br>  top_p?: number;                   // 0–1 (nucleus sampling)<br>  n?: number;                       // Number of completions to generate (usually 1)<br>  stream?: boolean;                 // true → returns a stream of chunks<br>  stop?: string | string[];         // Stop generation on these sequences<br><br>  // ── Repetition &amp; diversity penalties ─────────────────────────────────────<br>  presence_penalty?: number;        // -2.0 to 2.0<br>  frequency_penalty?: number;       // -2.0 to 2.0<br><br>  // ── Reasoning / o-series specific (recent additions) ─────────────<br>  reasoning?: {<br>    effort?: &quot;low&quot; | &quot;medium&quot; | &quot;high&quot;;  // Controls thinking budget (o1/o3/o4 family)<br>    summary?: &quot;auto&quot; | &quot;concise&quot; | &quot;detailed&quot;; // How much of reasoning to show<br>  };<br><br>  // ── Advanced sampling &amp; reproducibility ──────────────────────────────────<br>  seed?: number;                    // For reproducible outputs (when possible)<br>  top_logprobs?: number;            // 0–20 (return top log probabilities)<br>  logprobs?: boolean;               // Return logprobs (many new models ignore this)<br><br>  // ── Structured output &amp; tool calling ─────────────────────────────────────<br>  tools?: ChatCompletionTool[];     // Function calling / tool definitions<br>  tool_choice?: &quot;auto&quot; | &quot;none&quot; | &quot;required&quot; | { type: &quot;function&quot;; function: { 
name: string } };<br>  response_format?: <br>    | { type: &quot;text&quot; }<br>    | { type: &quot;json_object&quot; }<br>    | { type: &quot;json_schema&quot;; json_schema: { name: string; strict: boolean; schema: object } };<br><br>  // ── Metadata &amp; tracking ──────────────────────────────────────────────────<br>  user?: string;                    // End-user identifier (for abuse monitoring)<br>  store?: boolean;                  // Whether to store this request for later distillation/evals (new 2025)<br><br>  // ── Less common / enterprise / preview features (2025) ───────────────────<br>  service_tier?: &quot;auto&quot; | &quot;default&quot; | &quot;flex&quot; | &quot;priority&quot;; // Latency/scale tier<br>  stream_options?: { include_usage?: boolean }; // Include token usage in final chunk (streaming)<br>  parallel_tool_calls?: boolean;    // Allow model to call multiple tools at once<br>}</pre><h3>7. Error Handling Best Practices</h3><p>Before ending this I did want to mention error handling and validation. 
This is important for user experience.</p><p>Robust error handling is essential for production applications.</p><p>⚠️ The OpenAI SDK throws specific error types that you can catch and handle appropriately.</p><p>Below is an example of how you can <strong>handle errors and retries </strong>by wrapping a non-streaming request (like the one in section 3) in a try/catch inside a retry loop:</p><pre>import OpenAI from &#39;openai&#39;;<br><br>const client = new OpenAI({<br>  baseURL: &#39;https://openrouter.ai/api/v1&#39;,<br>  apiKey: process.env.OPENROUTER_API_KEY,<br>});<br><br>async function safeChatCompletion(<br>  messages: OpenAI.Chat.ChatCompletionMessageParam[],<br>  retries: number = 3<br>): Promise&lt;string&gt; {<br>  for (let attempt = 1; attempt &lt;= retries; attempt++) {<br>    try {<br>      const response = await client.chat.completions.create({<br>        model: &#39;openai/gpt-4-turbo&#39;,<br>        messages,<br>      });<br><br>      const content = response.choices[0]?.message?.content;<br><br>      if (!content) {<br>        throw new Error(&#39;Empty response from API&#39;);<br>      }<br><br>      return content;<br><br>    } catch (error) {<br>      if (error instanceof OpenAI.APIError) {<br>        const { status, message } = error;<br><br>        if (status === 401) {<br>          throw new Error(&#39;Invalid API key. Check your configuration.&#39;);<br>        }<br><br>        if (status === 429) {<br>          // Rate limited - wait and retry<br>          const waitTime = Math.pow(2, attempt) * 1000;<br>          console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);<br>          await new Promise(resolve =&gt; setTimeout(resolve, waitTime));<br>          continue;<br>        }<br><br>        if (status === 500 || status === 503) {<br>          if (attempt &lt; retries) {<br>            const waitTime = Math.pow(2, attempt) * 1000;<br>            console.log(`Server error. 
Retrying in ${waitTime}ms...`);<br>            await new Promise(resolve =&gt; setTimeout(resolve, waitTime));<br>            continue;<br>          }<br>        }<br><br>        throw new Error(`API Error (${status}): ${message}`);<br>      }<br><br>      throw error;<br>    }<br>  }<br><br>  throw new Error(&#39;Max retries exceeded&#39;);<br>}</pre><p>This gives the ability to catch errors and do retries.</p><p>In the next article of the series we’ll look at a more advanced class-based approach for this and explain it.</p><p><strong>This is a good way to advance your coding skills.</strong></p><p>Look for it soon!</p><p>Thanks</p><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow new articles in the series here. 🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of <strong>years of research and writing I have done</strong> on cloud best practices and then further integrates that with my prior cloud books and also <strong>code solutions and tutorials integrated using multiple AIs</strong> and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p><strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. 
Also previously held AWS Solutions Architect certification.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6ac06b8080b4" width="1" height="1" alt=""><hr><p><a href="https://medium.com/ai-dev-tips/ai-chat-coding-essentials-with-openai-ai-agent-coding-series-1-6ac06b8080b4">AI Chat Coding Essentials with OpenAI (AI Agent Coding Series #1)</a> was originally published in <a href="https://medium.com/ai-dev-tips">AI Dev Tips</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[✅ Cloud Storage I/O Performance Checklist]]></title>
            <link>https://cloudchecklists.com/cloud-storage-i-o-performance-checklist-f05fea941780?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/f05fea941780</guid>
            <category><![CDATA[cloud]]></category>
            <category><![CDATA[scalability]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[cloud-storage]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Tue, 23 Dec 2025 16:04:33 GMT</pubDate>
            <atom:updated>2025-12-23T16:06:06.616Z</atom:updated>
            <content:encoded><![CDATA[<h4><strong>Cloud storage I/O performance strategies, tips and pitfalls for professional cloud engineers and startups. </strong>(CC #3)</h4><p>This latest article in our <strong>Cloud Checklist series</strong> continues the focus on performance from the last two articles.</p><p>We break down <strong>the most common cloud storage I/O performance strategies, tips and pitfalls</strong> and how to avoid storage I/O problems before they impact production systems.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*w8fhLOaMRRYfdkfpvgcxLA.jpeg" /></figure><p>Rather than focusing on theory, <strong>this is a practical checklist that helps identify where storage latency, throughput limits, and IOPS constraints typically appear </strong>in real workloads.</p><h4>✅ Cloud Storage I/O Performance Checklist</h4><ol><li>Choose<strong> local NVMe SSD</strong></li><li>Provision required <strong>IOPS</strong></li><li>Distribute <strong>load across volumes</strong></li><li><strong>Separate</strong> data and logs</li><li>Monitor <strong>queue depth</strong></li><li><strong>Pre-warm</strong> new volumes</li><li>Use <strong>high-performance tiers</strong></li><li><strong>Avoid IOPS/network throttling</strong></li><li><strong>Enable multi-attach </strong>when supported</li><li><strong>Stripe volumes for bandwidth</strong></li></ol><p>As always in our cloud checklists, <strong>I have intentionally limited this to what I consider the 10 most important points for the topic.</strong></p><p>There are many other I/O related tips if you dig deeper and depending on your setup, but<strong> these are the “headline” Cloud Storage I/O Performance factors that you need to consider as best practice.</strong></p><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow the Cloud Checklists series <a 
href="https://cloudchecklists.com/">here</a>. 🚀 Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a>. We follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next checklist that I put out!</p><h3>1. Choose<strong> local NVMe SSD</strong></h3><p><strong>Local NVMe SSDs deliver the lowest latency</strong> and <strong>highest raw IOPS/throughput</strong> available on cloud instances, typically far outperforming network-attached storage.</p><p><strong>NVMe</strong> stands for <strong>Non-Volatile Memory Express</strong>, a communication protocol and interface spec designed for solid-state storage (not for spinning hard drives, which are slower).</p><p><strong>⚡️IOPS</strong> = <strong>Input/Output operations per second</strong>. It’s a measure of how many <strong>read or write requests </strong>your storage can handle <strong>each second.</strong></p><p>⚡️<strong>Throughput: </strong>Amount of data transferred per second (not the same as IOPS).</p><p>Use local NVMe SSDs for<strong> latency-sensitive workloads</strong> like <strong>databases</strong>, caches, or real-time analytics. Here are some factors I like to look at first:</p><h4><strong>⚡️ For performance:</strong></h4><ul><li><strong>IOPS: </strong>Determine whether <strong>your workload is read-heavy, write-heavy, or mixed. 
Random read/write IOPS matter most for databases and virtualization,</strong> while <strong>sequential</strong> throughput matters for <strong>analytics</strong> and <strong>media</strong> streaming.</li><li><strong>Latency: </strong>Look for low and <strong>predictable latency</strong> (often measured at the 99th or 99.9th percentile), especially for latency-sensitive applications like real-time analytics or transaction processing.</li><li><strong>Throughput</strong> (MB/s or MiB/s): Sequential data transfer rate.</li></ul><h4>⚡️ For durability:</h4><ul><li><strong>DWPD (Drive Writes Per Day): </strong>Write-intensive workloads (logging, caching, databases) need drives rated for 3+ DWPD, while read-heavy workloads can use lower-endurance drives (0.3–1 DWPD).</li><li><strong>TBW (Terabytes Written): </strong>Ensure the drive’s total write endurance matches your expected lifespan and write volume.</li></ul><p><strong>Not prioritizing local SSDs?</strong> You risk <strong>unnecessary network overhead</strong> and <strong>performance bottlenecks.</strong></p><ul><li><strong>On AWS EC2,</strong> select instance types with instance store NVMe SSDs (i4i, m5d, r5d series) for the highest ephemeral performance, ideal for caches or temporary data.</li></ul><p><strong>AWS docs: </strong>“The data on NVMe instance storage is encrypted using an XTS-AES-256 block cipher implemented in a hardware module on the instance. The encryption keys are generated using the hardware module and are unique to each NVMe instance storage device.”</p><ul><li><strong>Linux: </strong>Monitor with iostat -x or iotop to confirm low latency; use fio for benchmarking NVMe performance.</li><li><strong>To benchmark Persistent Disk performance on Linux</strong>, use <a href="https://github.com/axboe/fio"><strong>Flexible I/O tester (FIO)</strong></a> instead of other disk benchmarking tools such as dd. 
By default, dd uses a very low I/O queue depth, and might not accurately test disk performance.</li></ul><p><strong>⚠️ Gotcha: </strong>Local/instance store volumes are <strong>ephemeral</strong>, which means data is lost on instance stop, termination, or hardware failure, so never use them for persistent data without replication.</p><p><strong>Related Tools </strong>(useful throughout this article)<strong>:</strong></p><p><strong>sysstat:</strong> Open source performance monitoring suite including iostat and pidstat for monitoring device latency and IOPS on Linux (iotop is a separate package). <a href="https://github.com/sysstat/sysstat">https://github.com/sysstat/sysstat</a></p><p><strong>sysstat main features:</strong> <a href="https://sysstat.github.io/features.html">https://sysstat.github.io/features.html</a></p><p><strong>fio: </strong>Open source tool for benchmarking NVMe SSD performance, measuring IOPS, latency, and throughput in cloud environments. <a href="https://github.com/axboe/fio">https://github.com/axboe/fio</a></p><p><strong>ezFIO: </strong>Open source NVMe-specific benchmarking tool for testing SSD performance parameters like queue depth and bandwidth. <a href="https://github.com/earlephilhower/ezfio">https://github.com/earlephilhower/ezfio</a></p><p><strong>Iometer: </strong>Open source tool for SSD benchmarking, including NVMe, to simulate workloads and measure IOPS/throughput. <a href="https://sourceforge.net/projects/iometer/">https://sourceforge.net/projects/iometer/</a></p><h3><strong>2. 
Provision required IOPS</strong></h3><p>Provisioning the <strong>correct IOPS upfront</strong> allows you to sustain the peak demands of your workload without throttling or queue buildup.</p><p><strong>Under-provisioning</strong> leads to <strong>unpredictable performance degradation</strong> during spikes, while over-provisioning wastes money.</p><p>Modern cloud volumes allow independent IOPS scaling, making it essential to right-size based on benchmarks rather than defaults.</p><ul><li><strong>Burst vs. baseline: </strong>Many cloud disks offer burst IOPS that deplete over time. <strong>Provision for sustained needs</strong>, not burst.</li><li><strong>Disk size often determines IOPS: </strong>On AWS EBS, GCP PD, and Azure, larger disks get more baseline IOPS.</li><li><strong>VM limits: </strong>The VM type caps total IOPS regardless of disk capability.</li></ul><p><strong>Optimize block/I/O size: </strong>Aligning your I/O request size to your workload pattern (e.g., 256KB+ for sequential, smaller for random) has a major impact on throughput and IOPS efficiency.</p><p>Many performance issues stem from mismatched I/O sizes.</p><p><em>(note IOPS estimates below may change, they do update frequently)</em></p><ul><li><strong>AWS: </strong>Use io2 or gp3 volumes and provision IOPS independently (up to 256,000 IOPS per volume with io2 Block Express).</li><li><strong>Azure:</strong> Choose Premium SSD v2 or Ultra Disks to provision IOPS directly. Premium SSD v2: Maximum of 80,000 IOPS, Ultra Disk: Maximum of 400,000 IOPS</li><li><strong>Google Cloud: </strong>Select Extreme Persistent Disk and provision IOPS separately (up to 120,000).</li><li><strong>Linux tip:</strong> Use iostat -x 1 to monitor delivered vs. 
provisioned IOPS</li></ul><p>Example:</p><pre># iostat - shows IOPS per device (r/s = reads, w/s = writes)<br>iostat -x 1 10<br><br># Key columns to watch:<br># r/s     - read IOPS<br># w/s     - write IOPS<br># await   - average latency (ms); newer sysstat reports r_await / w_await<br># %util   - device saturation<br><br># iotop - shows I/O by process (-a accumulated, -o only active, -P per process)<br>sudo iotop -aoP</pre><p>Simulate with fio:</p><pre>fio --name=mixed_workload \<br>    --ioengine=libaio \<br>    --rw=randrw \<br>    --rwmixread=70 \<br>    --bs=8k \<br>    --iodepth=64 \<br>    --numjobs=4 \<br>    --size=10G \<br>    --runtime=60 \<br>    --time_based \<br>    --group_reporting \<br>    --direct=1</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hjuk4grbhRiZVJRfAXKEqg.png" /><figcaption><a href="https://github.com/axboe/fio">https://github.com/axboe/fio</a></figcaption></figure><h3><strong>3. Distribute load across volumes</strong></h3><p><strong>Distribute I/O load</strong> across <strong>multiple volumes </strong>to <strong>parallelize requests</strong> and avoid hitting per-volume IOPS/throughput ceilings.</p><p>A single volume, no matter how highly provisioned, has <strong>hard limits</strong> that can become a bottleneck under heavy load.</p><p><strong>Proper distribution maximizes overall instance performance</strong> and provides better scalability.</p><p><strong>⚡️ Striping:</strong> Splits data into chunks and writes them across multiple disks <em>simultaneously</em>. <strong>⚠️ </strong>If any disk in a striped set fails, you typically lose ALL your data (because pieces are scattered across all disks).</p><p><strong>⚡️ LVM (Logical Volume Manager): </strong>a “virtual disk manager” that lets you combine, resize, and organize storage without being locked to physical disk boundaries. 
It can also create striped volumes.</p><p><strong>Physical Disks</strong> (/dev/sdb, /dev/sdc, etc.)<br><strong> ↓</strong><br><strong>Physical Volumes (PVs): </strong>LVM’s view of the disks<br><strong> ↓</strong><br><strong>Volume Group (VG): </strong>Pool of storage from multiple PVs<br><strong> ↓</strong><br><strong>Logical Volumes (LVs): </strong>Your “virtual disks” that you format and mount</p><p><strong>mdadm (Multiple Device Administrator):</strong> Linux’s software RAID tool. It combines multiple physical disks into RAID arrays directly at the kernel level, creating a single device like /dev/md0.</p><ul><li><strong>Spread I/O across multiple volumes </strong>to avoid single-volume limits and improve parallelism.</li><li><strong>AWS/Azure/GCP: </strong>Attach multiple block volumes and balance application load (e.g., via database sharding).</li><li><strong>Linux tip: </strong>Use LVM or mdadm to create striped logical volumes if needed, but prefer application-level distribution.</li><li>Monitor per-device stats with iostat -dx 1 to identify hot volumes.</li></ul><p><em>Example: creating a striped logical volume:</em></p><pre># Create a striped logical volume across 3 disks<br># (assumes volume group my_vg already contains at least 3 physical volumes)<br>lvcreate --type raid0 -L 500G --stripes 3 --stripesize 64k -n mydata_lv my_vg</pre><p><strong>⚠️ Gotcha: </strong>Adding <strong>more volumes increases management complexity </strong>and may require application-level or filesystem changes to balance load.</p><p><strong>Tools:</strong></p><p><strong>mdadm: </strong>Open source Linux tool for managing RAID arrays to stripe and distribute load across volumes for better parallelism. <a href="https://github.com/neilbrown/mdadm">https://github.com/neilbrown/mdadm</a></p><h3><strong>4. 
Separate data and logs</strong></h3><p><strong>Separating data and transaction logs onto different volumes </strong>allows us to optimize each for its <strong>access pattern, random for data, sequential for logs</strong>, while <strong>isolating the intense write activity</strong> of logs from data reads.</p><p>This separation can significantly improve overall database performance and recovery characteristics.</p><p>Mixing them on one volume often causes <strong>contention</strong> and reduces effective throughput.</p><p><strong>⚠️ Gotcha: </strong>Log volumes usually need <strong>higher write endurance and lower latency</strong>; skimping here can become the hidden bottleneck.</p><p><strong>Transaction logs typically have 2–10x write amplification</strong> (each logical write becomes multiple physical writes).</p><ul><li>Place database transaction logs on dedicated high-IOPS volumes for sequential write optimization.</li><li><strong>AWS: </strong>Use separate io2 volumes for logs; enable EBS-optimized instances.</li><li><strong>Azure: </strong>Separate logs to Ultra Disks for sub-millisecond latency.</li><li><strong>GCP:</strong> Use Hyperdisk or Extreme PD for logs.</li><li><strong>Linux tip:</strong> Mount logs with noatime and use XFS/ext4 with appropriate stride for RAID/LVM; monitor with pidstat -d to see per-process I/O.</li></ul><p><strong>noatime (No Access Time): </strong>A Linux mount option that stops the filesystem from writing the “last accessed time” every time a file is read. With strict access-time tracking, every file read triggers a metadata write, which reduces performance. 
(note: Some backup tools and mail programs check access times, but most modern systems use relatime by default)</p><p>Example usage:</p><pre># In /etc/fstab, add noatime to mount options<br>/dev/sdb1  /var/log/database  ext4  defaults,noatime  0  2</pre><p><strong>pidstat -d and other troubleshooting examples:</strong></p><pre># Monitor all processes with I/O activity, update every 2 seconds<br>pidstat -d 2<br><br># Monitor specific processes (e.g., PostgreSQL)<br># pgrep -d, joins multiple PIDs with commas, as pidstat -p expects<br>pidstat -d -p $(pgrep -d, postgres) 2<br><br># See which device is busy<br>iostat -dx 1<br><br># See both together<br>watch -n 1 &#39;pidstat -d | head -20; iostat -dx | grep -v loop&#39;</pre><p>For a lot more info, see the <a href="https://man7.org/linux/man-pages/man1/iostat.1.html">iostat(1) — Linux manual page</a></p><h3><strong>5. Monitor queue depth</strong></h3><p>Queue depth reflects how <strong>many I/O requests are waiting</strong> to be processed; monitoring it <strong>detects saturation</strong> before latency spikes get bad.</p><p><strong>⚡️ Queue Depth (aqu-sz / avgqu-sz with </strong>iostat -x 1<strong>):</strong> The average queue length of requests that were issued to the device. Note: it counts operations that were either<strong> queued OR being serviced</strong>, not just those waiting in the queue!</p><p><strong>NVMe SSDs: 32–256 optimal in </strong>iostat -x 1<strong>. </strong>NVMe handles deep queues efficiently due to parallel processing. High aqu-sz + high await = storage saturation. 
High %util but low aqu-sz = short bursts of activity.</p><p><strong>⚡️ How it works: </strong>Application → I/O Scheduler Queue → Device Driver Queue → Physical Disk</p><p><strong>High queue depth indicates your storage or instance is overwhelmed</strong>, while chronically low depth suggests over-provisioning.</p><p><strong>Proactive monitoring allows timely scaling</strong> or tuning.</p><p><strong>⚠️ Gotcha:</strong> Optimal queue depth varies by device. 
NVMe likes deeper queues (100+), while traditional spinning disks perform best with shallow queues.</p><p>Linux commands: iostat -x (look at aqu-sz or avgqu-sz); cat /proc/diskstats for raw stats; iotop for interactive view.</p><ul><li><strong>AWS: </strong>Use CloudWatch metrics like VolumeQueueLength.</li><li><strong>Azure/GCP: </strong>Monitor via their portals or export to Prometheus.</li><li>Tune with echo 128 &gt; /sys/block/sdX/queue/nr_requests or NVMe-specific settings.</li></ul><h3><strong>6. Pre-warm new volumes</strong></h3><p>Pre-warming ensures that volumes restored from snapshots reach full performance before production traffic hits them, by reading every block once so it is populated on the underlying storage.</p><p>Without it, the first read of each block incurs a significant latency penalty as data is fetched from snapshot storage (Amazon S3 on AWS).</p><p>This step gives us consistent performance after scaling or recovery events.</p><p><strong>⚠️ Gotcha: </strong>Do not pre-warm new empty volumes; it is only for snapshot restores.</p><p><strong>⚠️ Gotcha:</strong> Modern provisioned volumes (AWS gp3/io2) often no longer require manual pre-warming for new empty volumes, but snapshot restores still benefit greatly.</p><ul><li>Primarily needed for volumes restored from snapshots to avoid first-access latency penalty.</li><li><strong>AWS</strong>: New empty volumes no longer need pre-warming; for snapshot restores, read all blocks (e.g., sudo dd if=/dev/xvdf of=/dev/null bs=1M or fio).</li></ul><p>Example (similar variations exist for Azure and GCP):</p><pre># 1. Create volume from snapshot<br>aws ec2 create-volume --snapshot-id snap-abc123 ...<br><br># 2. Attach to temporary instance<br>aws ec2 attach-volume --instance-id i-temp123 ...<br><br># 3. Pre-warm<br>sudo fio --filename=/dev/xvdf --name=init --rw=read --bs=128k \<br>  --iodepth=32 --ioengine=libaio --direct=1<br><br># 4. 
Detach and attach to production<br>aws ec2 detach-volume --volume-id vol-abc123<br>aws ec2 attach-volume --instance-id i-prod456 ...<br><br># Result: Production sees consistent performance from day 1</pre><p>Alternatively, enable Fast Snapshot Restore (FSR):</p><pre># Enable FSR for snapshot in specific AZs<br>aws ec2 enable-fast-snapshot-restores \<br>  --availability-zones us-east-1a us-east-1b \<br>  --source-snapshot-ids snap-abc123</pre><ul><li><strong>Azure/GCP:</strong> Similar initialization for restored disks.</li><li><strong>Linux tip </strong>for efficient read-based warming:</li></ul><pre>fio --name=prewarm --filename=/dev/sdX --rw=read --bs=128k --iodepth=32 --ioengine=libaio --direct=1</pre><p>Note: Modern volumes often perform well without manual warming due to lazy loading improvements. For example, GCP snapshots can be used immediately with no performance impact according to some users.</p><h3><strong>7. Use high-performance tiers</strong></h3><p>Selecting <strong>the right high-performance storage tier </strong>provides consistently low latency and high throughput tailored to demanding workloads.</p><p>Lower tiers may suffice for cold data but will throttle critical applications.</p><p>Matching tier to workload characteristics optimizes both performance and cost.</p><h4><strong>Use cases:</strong></h4><ul><li><strong>Latency-sensitive transactional databases: </strong>OLTP workloads (e-commerce order processing, financial trading systems, payment gateways) where sub-millisecond read/write latency is required.</li><li><strong>Real-time analytics and data warehousing:</strong> Platforms like Snowflake, BigQuery, Redshift, or ClickHouse running interactive queries on hot datasets.</li><li><strong>High-performance applications and caches: </strong>In-memory databases (Redis, Memcached), session stores, or application tiers needing ultra-<strong>low latency block storage</strong> for fast random reads/writes, such as gaming backends or ad-tech bidding 
systems.</li><li><strong>AI/ML training and inference serving: </strong>Model training with frequent checkpointing or inference endpoints requiring extreme throughput</li></ul><p><strong>⚠️ Gotcha:</strong> High-performance tiers are significantly more expensive — always validate with real workload benchmarks before committing.</p><ul><li><strong>AWS: </strong>io2 Block Express or gp3 for cost-effective high performance. For object storage use Intelligent-Tiering on S3 and/or lifecycle policies.</li><li><strong>Azure: </strong>Premium SSD v2 or Ultra Disks for top-tier latency/IOPS.</li><li><strong>GCP: </strong>Hyperdisk Extreme for ultra-high IOPS, or Hyperdisk ML for extreme throughput (ML workloads).</li><li><strong>SaaS tip: </strong>For object storage (S3, Blob, GCS), use intelligent tiering or frequent-access tiers for hot data.</li><li>Monitor tier effectiveness with provider dashboards.</li></ul><p><strong>⚡️ Best Overall Price/Performance</strong> (estimate mid-2025):</p><ul><li><strong>AWS: </strong>gp3 (cheaper than io2 for most workloads)</li><li><strong>Azure: </strong>Premium SSD v2 (better than most competitors)</li><li><strong>GCP: </strong>Hyperdisk Balanced</li></ul><p><strong>⚡️ Lowest Latency Mission-Critical </strong>(estimate mid-2025):</p><ul><li><strong>AWS: </strong>io2 Block Express (&lt;500μs)</li><li><strong>Azure: </strong>Ultra Disk (sub-millisecond)</li><li><strong>GCP: </strong>Hyperdisk Extreme (sub-millisecond)</li></ul><p><strong>Tools:</strong></p><p><strong>Amazon S3 Intelligent-Tiering: </strong>SaaS storage class for automatic tiering to optimize performance and cost for hot/cold data. <a href="https://aws.amazon.com/s3/storage-classes/intelligent-tiering/">https://aws.amazon.com/s3/storage-classes/intelligent-tiering/</a></p><h3>8. 
<strong>Avoid IOPS/network throttling</strong></h3><p>Throttling occurs when you <strong>exceed provisioned or instance limits</strong>, causing sudden performance drops that are hard to diagnose.</p><p>Avoiding it ensures predictable, sustained performance under load.</p><p>Carefully plan instance type, volume configuration, and workload patterns.</p><p><strong>EBS-optimized instances: </strong>AWS EC2 instances with dedicated bandwidth for storage traffic, preventing network and EBS I/O from competing for the same throughput (critical: instance limits can throttle even high-IOPS volumes).</p><p><strong>⚠️ Match instance network bandwidth to storage needs. </strong>For network-attached storage (EBS, Persistent Disk, etc.), your VM’s network throughput can become the bottleneck. EBS-optimized instances or properly sized VMs are essential.</p><p><strong>Burst credits:</strong> Performance tokens consumed when storage (AWS gp2, st1, sc1) or compute exceeds baseline capacity; once exhausted, performance drops to baseline, often causing a major slowdown.</p><p><strong>⚠️ Gotcha:</strong> <strong>Burst-credit systems (AWS gp2, st1)</strong> can hide throttling until credits are exhausted, leading to unexpected cliffs in performance.</p><p>Also <strong>account for snapshot overhead, </strong>as snapshots can degrade I/O performance temporarily, especially on first read after creation.</p><ul><li>Stay within instance limits (AWS EBS bandwidth per instance type) and volume caps.</li><li><strong>Use EBS-optimized instances</strong> (AWS) or accelerated networking (Azure/GCP).</li><li>Avoid burst-dependent types (such as gp2) for steady workloads; prefer provisioned.</li><li><strong>Linux tip: </strong>Benchmark with fio to test sustained vs. 
burst; watch CloudWatch for VolumeReadOps/WriteOps throttling.</li></ul><p>Look into <strong>Accelerated Networking (SR-IOV):</strong> Single Root I/O Virtualization technology that bypasses the virtual switch to give VMs direct access to physical network hardware. It’s available on major cloud providers.</p><p><strong>Tools:</strong></p><ul><li><strong>Prometheus: </strong>Open source monitoring system for tracking IOPS, throughput, and alerting on throttling in cloud storage. <a href="https://prometheus.io/">https://prometheus.io/</a></li><li><strong>Grafana:</strong> Open source visualization tool paired with Prometheus for dashboards on storage metrics to prevent throttling. <a href="https://grafana.com/">https://grafana.com/</a></li></ul><h3><strong>9. Enable multi-attach when supported</strong></h3><p>Multi-attach allows a single volume to be shared across multiple instances, enabling clustered applications and high-availability setups without replication overhead.</p><p>This is great for <strong>traditional clustered databases or filesystems </strong>needing shared storage in the cloud.</p><p>However, it requires careful coordination to prevent data corruption.</p><h4>Use cases:</h4><ul><li><strong>Clustered databases requiring high availability</strong>: Ideal for failover cluster instances (SQL Server FCI on Azure/AWS, or Oracle RAC-like setups) where multiple nodes need simultaneous read/write access to the same data for fast failover without data replication overhead.</li><li><strong>Traditional on-premises lift-and-shift migrations</strong>: When migrating legacy clustered applications that rely on SAN-like shared block storage to the cloud while preserving existing architectures.</li><li><strong>Active-active or active-standby HA configurations:</strong> In scenarios demanding low-latency shared storage for business intelligence platforms, data-intensive analytics, or applications managing concurrent writes with cluster managers</li></ul><p><strong>⚠️ 
Gotcha:</strong> Multi-attach typically supports only limited concurrent writers and requires cluster-aware filesystems — misconfiguration can cause severe data loss.</p><p><strong>⚠️ Gotcha: Check cloud platform docs for limitations. </strong>For example in AWS Multi-Attach enabled volumes can’t be created as boot volumes.</p><ul><li><strong>AWS:</strong> io1/io2 volumes support multi-attach for shared storage (e.g., clustered apps).</li><li><strong>Azure: </strong>Premium SSDs support shared disks.</li><li><strong>GCP: </strong>Regional Persistent Disks for multi-writer.</li><li>Requires filesystem like OCFS2 or GFS2; use carefully to avoid corruption.</li><li>Linux tip: Mount with cluster-aware options; monitor with blkid and shared queue management.</li></ul><pre>sudo blkid /dev/sdb1<br># Output: /dev/sdb1: LABEL=&quot;myshareddata&quot; UUID=&quot;...&quot; TYPE=&quot;ocfs2&quot;</pre><p><strong>Tools:</strong></p><ul><li><strong>OCFS2 (Oracle Cluster File System 2): </strong>Open source cluster-aware filesystem for shared multi-attach volumes to prevent data corruption. <a href="https://oss.oracle.com/projects/">https://oss.oracle.com/projects/</a></li><li><strong>GFS2 (Global File System 2):</strong> Open source clustered filesystem for multi-writer access on shared cloud volumes. <a href="https://www.kernel.org/doc/html/v6.0/filesystems/gfs2.html">https://www.kernel.org/doc/html/v6.0/filesystems/gfs2.html</a></li></ul><h3><strong>10. 
Stripe volumes for bandwidth</strong></h3><p>Striping multiple volumes together aggregates their throughput and IOPS, allowing you to exceed the limits of any single volume.</p><p>This works well for workloads requiring multi-GB/s bandwidth, such as large-scale data processing or media streaming.</p><p>Without striping, we’re artificially capped by the per-volume maximum.</p><h4>Use cases:</h4><ul><li><strong>High-throughput databases:</strong> For heavily I/O-intensive databases (e.g., MySQL, PostgreSQL, SQL Server) where a single volume’s bandwidth limit becomes a bottleneck, striping multiple provisioned volumes raises that ceiling.</li><li><strong>Video editing, rendering, and live streaming: </strong>Workloads involving large sequential file transfers, such as 4K/8K video processing, real-time encoding, or caching live streams.</li><li><strong>Big data processing and machine learning training:</strong> In ETL pipelines, Hadoop/Spark jobs, or ML model training with large datasets, striping enables faster data ingestion, shuffling, and checkpointing by parallelizing I/O across volumes.</li><li><strong>High-performance computing (HPC) and scientific simulations:</strong> Applications like genomics, fluid dynamics, or financial modeling.</li></ul><ul><li>Use RAID0/LVM striping across multiple volumes to exceed single-volume throughput limits.</li><li>AWS: Common with gp3/io2; e.g., mdadm or LVM stripe for &gt;1 GB/s.</li><li>Azure/GCP: Storage Spaces or logical volume striping.</li><li>Linux tip:</li></ul><pre># N = number of striped volumes; SIZE, LV_NAME, and VG_NAME are placeholders<br>lvcreate --stripes N --stripesize 64K -L SIZE -n LV_NAME VG_NAME<br><br># Align the filesystem to the stripe (e.g., mkfs.xfs -d su=64k,sw=N)</pre><ul><li>Benchmark the striped setup with fio, using options such as --rw=randread --bs=1M --numjobs=16.</li></ul><p><strong>⚠️ Gotcha:</strong> Striping (RAID-0) offers no redundancy; if one volume fails, the entire striped set is lost, so combine with backups or application-level resilience.</p><p>🛠️ Get more tips like this at <a 
href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow the Cloud Checklists series <a href="https://cloudchecklists.com/">here</a>. 🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next checklist that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p>Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. 
Also previously held AWS Solutions Architect certification.</p><hr><p><a href="https://cloudchecklists.com/cloud-storage-i-o-performance-checklist-f05fea941780">✅ Cloud Storage I/O Performance Checklist</a> was originally published in <a href="https://cloudchecklists.com">Cloud Checklists</a> on Medium.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon EKS (K8s) Media Cluster: Part 4 — Pod Auto-Scaling (HPA) and CDN]]></title>
            <link>https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-4-pod-auto-scaling-hpa-and-cdn-f1d9a060e20a?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/f1d9a060e20a</guid>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[aws-eks]]></category>
            <category><![CDATA[scaling]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[aws]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Sat, 20 Dec 2025 20:56:57 GMT</pubDate>
            <atom:updated>2025-12-20T23:31:30.475Z</atom:updated>
            <content:encoded><![CDATA[<h4>🚀 Amazon EKS + CloudFront CDN + Horizontal Pod Autoscaler (HPA), load testing with k6</h4><p><strong>✅ “Scale pods automatically, test loads, use a CDN, handle traffic spikes, deliver content fast — and have the ability to build infrastructure from scratch using Terraform”</strong></p><p><strong>In Part 3, </strong>we deployed our video app with 3 replicas for <strong>high availability.</strong> This was a great demo of pod self-healing, but did not have <strong>auto-scaling</strong> fully implemented or a CDN.</p><p><strong>What happens when traffic spikes in a real-world situation?</strong> We’ll experiment with some <strong>load testing</strong> to see what happens!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*eAmECaY-3sx4A3iwwDO8LQ.jpeg" /><figcaption>Amazon EKS (K8s) Media Cluster: Part 4 — Pod Auto-Scaling (HPA) and CDN</figcaption></figure><p><strong>What we’ll work on in this article:</strong></p><p><strong>1. CloudFront CDN distribution</strong> integrated with your application.<br><strong>2. Metrics Server installation </strong>(required for HPA).<br><strong>3. Horizontal Pod Autoscaler (HPA)</strong> configured for 3–10 pods.<br><strong>4. Load testing setup</strong> using k6 tool.<br><strong>5. CPU-based autoscaling </strong>triggers (70% threshold).<br><strong>6. Real-time monitoring </strong>of pod scaling behavior.<br><strong>7. 
Observe node capacity limits and Pending-pod delays.</strong></p><p><strong>Goals to achieve:</strong></p><ul><li>Generate load and <strong>watch HPA automatically scale pods</strong> 3 -&gt; 6 -&gt; 9 -&gt; 10 under a variety of conditions.</li><li><strong>Learn how to load test with k6.</strong></li><li>See videos delivered <strong>faster</strong> via <strong>CloudFront edge locations</strong> worldwide.</li><li>Discuss the<strong> “Pending pods” problem</strong> that appears when nodes run out of capacity.</li><li>Understand the relationship between <strong>pod scaling </strong>and<strong> node capacity.</strong></li><li><strong>Monitor resource utilization</strong> in real time with kubectl top.</li><li>Identify exactly when and why scaling hits limits.</li></ul><p><strong>Skills we’ll flex in this article:</strong></p><ul><li><strong>Using eksctl </strong>for Amazon EKS.</li><li><strong>CloudFront CDN configuration </strong>and origin setup.</li><li><strong>Metrics Server installation</strong> and configuration.</li><li><strong>HPA manifest creation </strong>with resource targets.</li><li><strong>Load testing techniques </strong>and tools (k6).</li><li><strong>Understanding CPU/memory metrics </strong>in Kubernetes.</li><li><strong>Troubleshooting Pending pods </strong>and resource constraints.</li><li><strong>Resource requests</strong> and limits concepts.</li><li><strong>Node capacity planning </strong>calculations.</li></ul><h3>1. 
Prerequisites for Part 4</h3><p>Previous articles in this series you should do first:</p><p><strong>✅ PART 1 </strong><a href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-1-initial-setup-roadmap-176bdb085d32"><strong>Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap</strong></a></p><p><strong>✅ PART 2 </strong><a href="https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-2-deploy-initial-terraform-multi-az-eks-cluster-e1a87efc9925"><strong>Amazon EKS (K8s) Media Cluster: Part 2 — Deploy Initial Terraform Multi-AZ EKS Cluster</strong></a></p><p><strong>✅ PART 3 </strong><a href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-3-self-healing-video-pods-e4459ad9ecc0?postPublishedType=repub"><strong>Amazon EKS (K8s) Media Cluster: Part 3 — Self-Healing Video Pods</strong></a></p><p>This is where we are at with infra and what we worked on last article:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ebgfzC4Qy9sv4UQu.png" /></figure><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> . </strong>🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><h3>2. Rebuild Cluster (If Destroyed)</h3><p>We destroyed our cluster with Terraform to avoid charges. 
If you have already destroyed it, follow these instructions to rebuild it.</p><pre># Verify AWS CLI profile<br>export AWS_PROFILE=terraform-eks-admin<br>aws sts get-caller-identity<br><br># Rebuild infrastructure (~15-20 min)<br>cd environments/dev<br>terraform plan<br>terraform apply<br><br># Reconnect kubectl<br>aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster<br><br># Verify nodes<br>kubectl get nodes<br><br>cd ./k8s/<br><br># Apply the manifests<br>kubectl apply -f namespace.yaml<br>kubectl apply -f deployment.yaml<br>kubectl apply -f service.yaml<br><br># Verify video app is running (from Part 3)<br>kubectl get pods -n video-app<br>kubectl get svc -n video-app</pre><p>After you run these commands you should be up and running.</p><p>⚠️ Just remember that AWS is charging you for these resources. When you are done, you must (1) run terraform destroy AND (2) check the AWS console to <strong>be sure all resources are removed so you won’t get charged.</strong> (Recall from the previous article that an error sometimes prevents resources from being deleted, so you could still be charged even after running the command.)</p><ul><li>🚨<strong>Warning: </strong>I noticed once that my ELB was not destroyed even though I ran terraform destroy, which in turn prevented the Internet Gateway from being destroyed, <strong>so double-check.</strong></li></ul><h3>3. Install eksctl and Metrics Server</h3><h4>3.1 Install eksctl (optional)</h4><p><strong>Why do we need eksctl? </strong>Strictly speaking, we do <em>not</em> need it. However, it is AWS-aware (it understands EKS and related AWS resources), which gives us some advantages for installing EKS addons and scaling nodes manually (if needed).</p><p>You should also familiarize yourself with it as an option used in these articles. With <strong>eksctl</strong>, components install as EKS-managed addons (AWS handles updates). 
kubectl installs raw manifests (you manage updates).</p><ul><li>The eksctl addon approach requires your cluster to have an OIDC provider configured. We set one up earlier, but remember this requirement if you use eksctl on another project.</li></ul><pre># macOS<br>brew install eksctl<br><br># Verify<br>eksctl version<br><br># my output:<br>0.220.0-dev+3f73c725c.2025-12-01T08:05:49Z<br><br># DO NOT RUN NOW, just an example:<br># Creating IAM role for a service account<br># kubectl approach: 5+ steps (create OIDC provider, IAM role, policy, annotate SA...)<br># eksctl approach: 1 command<br>eksctl create iamserviceaccount \<br>  --cluster eks-video-cluster \<br>  --name my-app \<br>  --namespace default \<br>  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess</pre><p>I am <strong>also</strong> going to show you how to use it to install Metrics Server, just for practice.</p><p>However, for most of this article I will stick with kubectl unless it makes a lot of sense to use eksctl instead!</p><h4>3.2 Install Metrics Server</h4><p><strong>Why do we need Metrics Server? </strong>HPA needs real-time CPU/memory metrics. 
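</p><p>To make that concrete, here is a minimal sketch of the replica arithmetic HPA performs with those metrics. It assumes the standard rule from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. The function name and parameters are hypothetical (they are not part of any Kubernetes API), and the real controller also applies a tolerance band, stabilization windows and scaling policies that this sketch ignores. Utilization is measured against each pod’s CPU request, which is why readings above 100% are possible.</p>
<pre>
// Hypothetical helper: a simplified version of the HPA replica calculation.
// Assumption: utilization values are percentages of the pods' CPU requests.
function desiredReplicas(currentReplicas, currentUtilization, targetUtilization, minReplicas, maxReplicas) {
  // Core rule: ceil(currentReplicas * currentUtilization / targetUtilization)
  const raw = Math.ceil(currentReplicas * (currentUtilization / targetUtilization));
  // Clamp the result to the configured replica bounds
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// 3 pods averaging 117% against a 70% target
console.log(desiredReplicas(3, 117, 70, 3, 10)); // prints 6
// 8 pods averaging 102% against a 70% target: the raw result is 12, capped at the max
console.log(desiredReplicas(8, 102, 70, 3, 10)); // prints 10
</pre>
<p>The two sample inputs match readings that show up in the load tests later in this article (117% at 3 pods, 102% at 8 pods), using our 70% target and 3–10 replica range.</p><p>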
Metrics Server collects these from kubelets and exposes them via the Kubernetes API.</p><p>Without Metrics Server (or an equivalent implementation), HPA cannot fetch CPU/memory metrics and will show &lt;unknown&gt; values or fail to scale on resources.</p><p><strong>Flow:</strong></p><ul><li>Each node’s <strong>kubelet</strong> (via /metrics/resource endpoint) collects raw container resource usage data.</li><li><strong>Metrics Server</strong> scrapes this data from kubelets across the cluster.</li><li>It aggregates and exposes the metrics through the Kubernetes <strong>Resource Metrics API</strong></li><li>The HPA controller queries this API to compute current utilization (compared to pod resource requests) and adjust replica counts accordingly.</li></ul><p><strong>Use Option 1 </strong>install if you want to try the more universal way of doing it.</p><p><strong>Use Option 2 </strong>install if you want more Amazon-managed options.</p><pre><br># Option 1: Install Metrics Server (kubectl)<br>kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml<br><br># Option 2: Install Metrics Server (eksctl)<br>eksctl create addon \<br>  --cluster eks-video-cluster \<br>  --name metrics-server \<br>  --region us-east-1<br><br># -------------------------------<br><br># output<br><br>serviceaccount/metrics-server created<br>clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created<br>clusterrole.rbac.authorization.k8s.io/system:metrics-server created<br>rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created<br>clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created<br>clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created<br>service/metrics-server created<br>deployment.apps/metrics-server created<br>apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created<br><br># Wait for it to be ready<br>kubectl get deployment metrics-server 
-n kube-system<br><br># output at 22s<br><br>NAME             READY   UP-TO-DATE   AVAILABLE   AGE<br>metrics-server   0/1     1            0           22s<br><br># output at 43s<br><br>NAME             READY   UP-TO-DATE   AVAILABLE   AGE<br>metrics-server   1/1     1            1           43s<br><br># Verify metrics are available (may take 1-2 minutes)<br>kubectl top nodes<br><br># output<br>NAME                         CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)   <br>ip-10-0-1-65.ec2.internal    29m          1%       462Mi           32%         <br>ip-10-0-2-201.ec2.internal   50m          2%       466Mi           32%         <br>ip-10-0-3-164.ec2.internal   32m          1%       569Mi           39%<br><br>kubectl top pods -n video-app</pre><h4><strong>3.3 Troubleshooting</strong></h4><pre># Check Metrics Server logs<br>kubectl logs -n kube-system -l k8s-app=metrics-server<br><br># output<br><br>I1217 20:26:15.541402       1 configmap_cafile_content.go:205] &quot;Starting controller&quot; name=&quot;client-ca::kube-system::extension-apiserver-authentication::client-ca-file&quot;<br>I1217 20:26:15.541423       1 shared_informer.go:350] &quot;Waiting for caches to sync&quot; controller=&quot;client-ca::kube-system::extension-apiserver-authentication::client-ca-file&quot;<br>I1217 20:26:15.541452       1 configmap_cafile_content.go:205] &quot;Starting controller&quot; name=&quot;client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file&quot;<br>I1217 20:26:15.541469       1 shared_informer.go:350] &quot;Waiting for caches to sync&quot; controller=&quot;client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file&quot;<br>I1217 20:26:15.542411       1 secure_serving.go:211] Serving securely on [::]:10250<br>I1217 20:26:15.542544       1 dynamic_serving_content.go:135] &quot;Starting controller&quot; name=&quot;serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key&quot;<br>I1217 20:26:15.542657       1 
tlsconfig.go:243] &quot;Starting DynamicServingCertificateController&quot;<br>I1217 20:26:15.642199       1 shared_informer.go:357] &quot;Caches are synced&quot; controller=&quot;client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file&quot;<br>I1217 20:26:15.642251       1 shared_informer.go:357] &quot;Caches are synced&quot; controller=&quot;client-ca::kube-system::extension-apiserver-authentication::client-ca-file&quot;<br>I1217 20:26:15.642218       1 shared_informer.go:357] &quot;Caches are synced&quot; controller=&quot;RequestHeaderAuthRequestController&quot;<br><br># Common fix if the logs show kubelet TLS/certificate errors: add the --kubelet-insecure-tls flag<br>kubectl patch deployment metrics-server -n kube-system --type=&#39;json&#39; -p=&#39;[{&quot;op&quot;: &quot;add&quot;, &quot;path&quot;: &quot;/spec/template/spec/containers/0/args/-&quot;, &quot;value&quot;: &quot;--kubelet-insecure-tls&quot;}]&#39;</pre><h3>4. Create CloudFront Distribution</h3><p>Amazon CloudFront is a Content Delivery Network (CDN) that caches and delivers content from edge locations worldwide.</p><p>Instead of every user request hitting your EKS cluster directly, CloudFront intercepts requests and serves cached content from the nearest edge location.</p><p>This is how the flow works <strong>without CloudFront, or on a cache miss:</strong></p><p>User → Internet → EKS Load Balancer (us-east-1) → Pod<br>Latency: ~200–300ms</p><p>This is how the flow works with a <strong>CloudFront cache hit:</strong></p><p>User → CloudFront Edge → Cached response<br>Latency: ~20–50ms</p><h4>4.1 Terraform Configuration</h4><p>Create file: environments/dev/cloudfront.tf</p><pre># =============================================================================<br># CLOUDFRONT CDN DISTRIBUTION<br># =============================================================================<br># CloudFront delivers video content from edge locations worldwide,<br># reducing latency and offloading 70-80% of requests from your 
EKS cluster.<br># <br># NOTE: This requires the video-app Kubernetes service to be deployed first!<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Data: Get Load Balancer Hostname<br># -----------------------------------------------------------------------------<br>data &quot;kubernetes_service&quot; &quot;video_app&quot; {<br>  metadata {<br>    name      = &quot;video-app&quot;<br>    namespace = &quot;video-app&quot;<br>  }<br><br>  depends_on = [module.eks]<br>}<br><br># -----------------------------------------------------------------------------<br># Local Variables<br># -----------------------------------------------------------------------------<br>locals {<br>  # Check if LoadBalancer has been assigned<br>  lb_hostname = try(<br>    data.kubernetes_service.video_app.status[0].load_balancer[0].ingress[0].hostname,<br>    null<br>  )<br>  <br>  # Only create CloudFront if LoadBalancer exists<br>  create_cloudfront = local.lb_hostname != null<br>}<br><br># -----------------------------------------------------------------------------<br># CloudFront Distribution<br># -----------------------------------------------------------------------------<br>resource &quot;aws_cloudfront_distribution&quot; &quot;video_app&quot; {<br>  count = local.create_cloudfront ? 
1 : 0<br><br>  enabled             = true<br>  is_ipv6_enabled     = true<br>  comment             = &quot;CDN for EKS Video App - ${var.environment}&quot;<br>  default_root_object = &quot;&quot;<br>  price_class         = &quot;PriceClass_100&quot;  # US, Canada, Europe only (cost savings)<br><br>  # Origin: Your EKS Load Balancer<br>  origin {<br>    domain_name = local.lb_hostname<br>    origin_id   = &quot;eks-video-app&quot;<br><br>    custom_origin_config {<br>      http_port              = 80<br>      https_port             = 443<br>      origin_protocol_policy = &quot;http-only&quot;  # LB is HTTP, CloudFront handles HTTPS<br>      origin_ssl_protocols   = [&quot;TLSv1.2&quot;]<br>    }<br>  }<br><br>  # Default cache behavior<br>  default_cache_behavior {<br>    allowed_methods  = [&quot;GET&quot;, &quot;HEAD&quot;, &quot;OPTIONS&quot;]<br>    cached_methods   = [&quot;GET&quot;, &quot;HEAD&quot;]<br>    target_origin_id = &quot;eks-video-app&quot;<br><br>    forwarded_values {<br>      query_string = false<br>      cookies {<br>        forward = &quot;none&quot;<br>      }<br>    }<br><br>    viewer_protocol_policy = &quot;redirect-to-https&quot;<br>    min_ttl                = 0<br>    default_ttl            = 3600      # 1 hour<br>    max_ttl                = 86400     # 24 hours<br>    compress               = true<br>  }<br><br>  # Cache behavior for video files (longer cache)<br>  ordered_cache_behavior {<br>    path_pattern     = &quot;/videos/*&quot;<br>    allowed_methods  = [&quot;GET&quot;, &quot;HEAD&quot;]<br>    cached_methods   = [&quot;GET&quot;, &quot;HEAD&quot;]<br>    target_origin_id = &quot;eks-video-app&quot;<br><br>    forwarded_values {<br>      query_string = false<br>      cookies {<br>        forward = &quot;none&quot;<br>      }<br>    }<br><br>    viewer_protocol_policy = &quot;redirect-to-https&quot;<br>    min_ttl                = 0<br>    default_ttl            = 86400     # 24 hours for videos<br>    max_ttl                
= 604800    # 7 days<br>    compress               = false     # Videos are already compressed<br>  }<br><br>  # Cache behavior for API (no cache)<br>  ordered_cache_behavior {<br>    path_pattern     = &quot;/api/*&quot;<br>    allowed_methods  = [&quot;GET&quot;, &quot;HEAD&quot;, &quot;OPTIONS&quot;]<br>    cached_methods   = [&quot;GET&quot;, &quot;HEAD&quot;]<br>    target_origin_id = &quot;eks-video-app&quot;<br><br>    forwarded_values {<br>      query_string = true<br>      cookies {<br>        forward = &quot;none&quot;<br>      }<br>    }<br><br>    viewer_protocol_policy = &quot;redirect-to-https&quot;<br>    min_ttl                = 0<br>    default_ttl            = 0         # No caching for API<br>    max_ttl                = 0<br>  }<br><br>  # Geo restrictions (none)<br>  restrictions {<br>    geo_restriction {<br>      restriction_type = &quot;none&quot;<br>    }<br>  }<br><br>  # SSL certificate (default CloudFront cert)<br>  viewer_certificate {<br>    cloudfront_default_certificate = true<br>  }<br><br>  tags = local.tags<br>}<br><br># -----------------------------------------------------------------------------<br># Outputs<br># -----------------------------------------------------------------------------<br>output &quot;cloudfront_distribution_id&quot; {<br>  description = &quot;CloudFront distribution ID&quot;<br>  value       = try(aws_cloudfront_distribution.video_app[0].id, null)<br>}<br><br>output &quot;cloudfront_domain_name&quot; {<br>  description = &quot;CloudFront domain name (use this URL!)&quot;<br>  value       = try(aws_cloudfront_distribution.video_app[0].domain_name, null)<br>}<br><br>output &quot;cloudfront_url&quot; {<br>  description = &quot;Full CloudFront URL&quot;<br>  value       = local.create_cloudfront ? 
&quot;https://${aws_cloudfront_distribution.video_app[0].domain_name}&quot; : &quot;Deploy video-app service first, then re-run terraform apply&quot;<br>}<br><br>output &quot;loadbalancer_hostname&quot; {<br>  description = &quot;LoadBalancer hostname (if available)&quot;<br>  value       = local.lb_hostname<br>}</pre><h4>4.2 Apply CloudFront</h4><pre>cd environments/dev<br>terraform plan<br>terraform apply<br><br># Get CloudFront URL<br>CF_URL=$(terraform output -raw cloudfront_url)<br>echo &quot;CloudFront URL: $CF_URL&quot;<br><br># output<br>CloudFront URL: https://xxxxxxxxxxx.cloudfront.net<br><br># Test via CloudFront (may take 5-10 min to deploy)<br>curl -I &quot;$CF_URL/health&quot;<br><br># output<br>HTTP/2 200 <br>content-type: application/json; charset=utf-8<br>content-length: 122<br>x-powered-by: Express<br>etag: W/&quot;7a-yQMmSaT6LIG0mMdnzDkwYGgreec&quot;<br>date: Wed, 17 Dec 2025 20:44:45 GMT<br>x-cache: Miss from cloudfront<br>via: 1.1 xxxxxxxxx.cloudfront.net (CloudFront)<br>x-amz-cf-pop: DEN53-P5<br>x-amz-cf-id: 6K4QNRkctTMnl3YZrKDtCl8wzcqlBr5XHX9ucSXtjnGBuZ4u4sMrkg==</pre><ul><li><strong>Test with the CloudFront URL </strong>that you get. The first request reports a cache miss (x-cache: Miss from cloudfront), as in the output above; repeat requests for cacheable paths should be served from the edge.</li></ul><p><strong>Estimated savings on latency and requests:</strong></p><ul><li>Video latency in the US should drop by roughly 50%, and probably by more in the EU.</li><li>Bandwidth costs and requests reaching the cluster should also decrease.</li></ul><h3>5. 
Configure HPA (Horizontal Pod Autoscaler)</h3><h4>5.1 Update Deployment Resource Requests</h4><p>First, ensure your deployment has proper resource requests (HPA needs these):</p><p>Update k8s/deployment.yaml <strong>resources</strong> section:</p><pre>resources:<br>  requests:<br>    cpu: &quot;100m&quot;      # 0.1 CPU - HPA uses this for calculations<br>    memory: &quot;128Mi&quot;<br>  limits:<br>    cpu: &quot;500m&quot;      # 0.5 CPU max<br>    memory: &quot;256Mi&quot;</pre><p>Apply IF changed (<em>you may have the above correct already</em>):</p><pre>kubectl apply -f k8s/deployment.yaml</pre><h4>5.2 Create HPA Manifest</h4><p>Create file: k8s/hpa.yaml</p><pre># =============================================================================<br># HORIZONTAL POD AUTOSCALER (HPA)<br># =============================================================================<br># Automatically scales pods based on CPU utilization.<br>#<br># ECS Comparison:<br># - HPA = ECS Service Auto Scaling with Target Tracking<br># - Both scale based on metrics (CPU, memory, custom)<br># - Key difference: HPA is declarative YAML (GitOps-friendly)<br># =============================================================================<br>apiVersion: autoscaling/v2<br>kind: HorizontalPodAutoscaler<br>metadata:<br>  name: video-app-hpa<br>  namespace: video-app<br>  labels:<br>    app: video-app<br>spec:<br>  # ---------------------------------------------------------------------------<br>  # Target Deployment<br>  # ---------------------------------------------------------------------------<br>  scaleTargetRef:<br>    apiVersion: apps/v1<br>    kind: Deployment<br>    name: video-app<br><br>  # ---------------------------------------------------------------------------<br>  # Scaling Bounds<br>  # ---------------------------------------------------------------------------<br>  minReplicas: 3   # Match Part 3 baseline (HA)<br>  maxReplicas: 10  # Allow 3x scaling for traffic spikes<br><br>  # 
---------------------------------------------------------------------------<br>  # Scaling Metrics<br>  # ---------------------------------------------------------------------------<br>  metrics:<br>    # Scale based on CPU utilization<br>    - type: Resource<br>      resource:<br>        name: cpu<br>        target:<br>          type: Utilization<br>          averageUtilization: 70  # Target: 70% of requested CPU, averaged across pods<br><br>  # ---------------------------------------------------------------------------<br>  # Scaling Behavior (optional, fine-tuning)<br>  # ---------------------------------------------------------------------------<br>  behavior:<br>    scaleUp:<br>      stabilizationWindowSeconds: 0   # Scale up immediately<br>      policies:<br>        - type: Percent<br>          value: 100                   # Double pods if needed<br>          periodSeconds: 15<br>        - type: Pods<br>          value: 4                     # Or add up to 4 pods<br>          periodSeconds: 15<br>      selectPolicy: Max                # Use whichever adds more pods<br><br>    scaleDown:<br>      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down<br>      policies:<br>        - type: Percent<br>          value: 50                    # Remove up to 50% of pods<br>          periodSeconds: 60</pre><ul><li><strong>Automatic pod scaling based on CPU load</strong> — Monitors your video-app and automatically adds/removes pods to keep average CPU utilization near the 70% target, keeping performance consistent during traffic spikes. Utilization is measured against each pod’s CPU request (100m here), so readings above 100% are possible.</li><li><strong>Maintains 3–10 pod range</strong> — Minimum 3 pods (one per AZ for HA), maximum 10 pods during peak traffic, providing roughly a 3x capacity increase while preventing runaway scaling.</li><li><strong>Aggressive scale-up for fast response</strong> — With no stabilization window, HPA can double the pod count or add up to 4 pods (whichever is more) every 15 seconds once average CPU exceeds the target, preventing overwhelm during sudden viral traffic.</li><li><strong>Conservative scale-down prevents flapping</strong> — Waits 5 minutes 
before removing pods and removes at most 50% of them per minute, avoiding wasteful yo-yo scaling that disrupts service.</li><li><strong>Demonstrates node capacity limits</strong> — HPA scales pods 3 → 6 → 9 → 10, but around pod 8–9 you’ll hit capacity on your 3 t3.small nodes, causing “Pending” pods. This shows why you need Karpenter to add nodes (Part 5).</li></ul><h4>5.3 Apply HPA</h4><pre># Apply HPA<br>kubectl apply -f k8s/hpa.yaml<br><br># output <br>horizontalpodautoscaler.autoscaling/video-app-hpa created<br><br># Verify HPA is created<br>kubectl get hpa -n video-app<br><br># output<br>NAME            REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%   3         10        3          12m<br><br># Watch HPA status<br>kubectl get hpa video-app-hpa -n video-app --watch</pre><h3>6. Load Testing &amp; Observe Scaling</h3><p><strong>I switched to a new terminal window.</strong> Remember to run the following every time you open a new terminal window, in case you later want to run Terraform or AWS CLI commands:</p><pre>export AWS_PROFILE=terraform-eks-admin<br>aws sts get-caller-identity</pre><h4>6.1 Install Load Testing Tool — k6</h4><p>I originally planned to use the hey tool for this, but it’s being deprecated, and k6 is more modern, actively updated and maintained by Grafana Labs.</p><pre># macOS<br>brew install k6<br><br># Linux (Debian/Ubuntu)<br>sudo gpg -k<br>sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69<br>echo &quot;deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main&quot; | sudo tee /etc/apt/sources.list.d/k6.list<br>sudo apt-get update<br>sudo apt-get install k6<br><br># Verify<br>k6 version<br><br># output<br>k6 v1.4.2 (commit/devel, go1.25.4, </pre><h4>6.2 Get Load Balancer URL</h4><pre>LB_URL=$(kubectl get service video-app -n 
video-app -o jsonpath=&#39;{.status.loadBalancer.ingress[0].hostname}&#39;)<br>echo &quot;Load Balancer URL: http://$LB_URL&quot;</pre><h4>6.3 Open Multiple Terminals</h4><p><strong>Terminal 1 — HPA Watch:</strong></p><pre>kubectl get hpa video-app-hpa -n video-app --watch</pre><p><strong>Terminal 2 — Pod Watch:</strong></p><pre>kubectl get pods -n video-app --watch<br><br># output<br>$ kubectl get pods -n video-app --watch<br>NAME                         READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-g82n9   1/1     Running   0          31m<br>video-app-6498b5dd57-njqcr   1/1     Running   0          31m<br>video-app-6498b5dd57-v54wx   1/1     Running   0          31m</pre><p><strong>Terminal 3 — Resource Usage:</strong></p><pre>brew install watch # macOS<br><br>exec $SHELL<br><br>watch -n 2 &#39;kubectl top pods -n video-app&#39;</pre><p><strong>Terminal 4 — Create Load Script and Generate Load:</strong></p><pre>// Save this script as k8s/load/load-test.js<br><br>import http from &#39;k6/http&#39;;<br>import { check, sleep } from &#39;k6&#39;;<br><br>export const options = {<br>  // Light load (test first)<br>  // vus: 10,<br>  // duration: &#39;30s&#39;,<br><br>  // Default load (trigger scaling); --vus and --duration on the CLI override these<br>  vus: 100,<br>  duration: &#39;5m&#39;,<br>};<br><br>export default function () {<br>  // Each virtual user requests /api/info on the load balancer<br>  const res = http.get(`${__ENV.LB_URL}/api/info`);<br>  check(res, {<br>    &#39;status is 200&#39;: (r) =&gt; r.status === 200,<br>  });<br>  // Pause 100 ms between iterations<br>  sleep(0.1);<br>}</pre><h4>Light Load</h4><p>Run the load script:</p><pre># Set your load balancer URL<br>export LB_URL=http://$(kubectl get svc video-app -n video-app -o jsonpath=&#39;{.status.loadBalancer.ingress[0].hostname}&#39;)<br><br># Light load first (test)<br>k6 run --vus 10 --duration 30s -e LB_URL=$LB_URL k8s/load/load-test.js<br></pre><p><strong>Results</strong></p><p>Below are the load test run, the pod watch, and the load results.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/697/1*U6yEmA9MGpuPYXVStgJKUw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/672/1*p-o669XSCJiYt0TvvaV7pQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/880/1*0EI9Eb2onHIQkPu8GB89bw.png" /></figure><h4>Moderate Load</h4><pre># Moderate load (trigger scaling)<br>k6 run --vus 100 --duration 5m -e LB_URL=$LB_URL k8s/load/load-test.js</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/1*8zxAbYPvAGt5XKnPnRgpNQ.png" /></figure><p>Output result (did not hit scaling wall) but this does show HPA worked as intended.</p><pre>video-app-hpa   Deployment/video-app   cpu: 117%/70%   3         10        3          36m<br>video-app-hpa   Deployment/video-app   cpu: 109%/70%   3         10        6          37m<br>video-app-hpa   Deployment/video-app   cpu: 60%/70%    3         10        6          37m<br>video-app-hpa   Deployment/video-app   cpu: 53%/70%    3         10        6          37m<br>video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          37m<br>video-app-hpa   Deployment/video-app   cpu: 49%/70%    3         10        6          38m<br>video-app-hpa   Deployment/video-app   cpu: 48%/70%    3         10        6          38m<br>video-app-hpa   Deployment/video-app   cpu: 52%/70%    3         10        6          38m<br>video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          38m<br>video-app-hpa   Deployment/video-app   cpu: 53%/70%    3         10        6          39m<br>video-app-hpa   Deployment/video-app   cpu: 50%/70%    3         10        6          39m<br>video-app-hpa   Deployment/video-app   cpu: 54%/70%    3         10        6          39m<br>video-app-hpa   Deployment/video-app   cpu: 51%/70%    3         10        6          39m<br>video-app-hpa   Deployment/video-app   cpu: 49%/70%    3         10        6          40m<br>video-app-hpa   Deployment/video-app   cpu: 50%/70%    3   
      10        6          40m<br>video-app-hpa   Deployment/video-app   cpu: 52%/70%    3         10        6          40m<br>video-app-hpa   Deployment/video-app   cpu: 51%/70%    3         10        6          40m<br>video-app-hpa   Deployment/video-app   cpu: 49%/70%    3         10        6          41m<br>video-app-hpa   Deployment/video-app   cpu: 7%/70%     3         10        6          41m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        6          41m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        6          42m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        5          42m</pre><p>Results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*X2LIDHq9WkvKITB9jsVQpQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/578/1*WOHMLsDbwZoiI_dtsL1Nhg.png" /></figure><p>Scaling up and down.</p><h4>Heavier Load</h4><p>Let’s try one more time so we hit the scaling wall:</p><pre>k6 run --vus 300 --duration 3m -e LB_URL=$LB_URL k8s/load/load-test.js</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/676/1*cpNIMxLte2OCj3XZR9063A.png" /></figure><p>Results</p><pre>kubectl get hpa video-app-hpa -n video-app --watch<br><br>NAME            REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%   3         10        8          59m<br>video-app-hpa   Deployment/video-app   cpu: 45%/70%   3         10        8          60m<br>video-app-hpa   Deployment/video-app   cpu: 72%/70%   3         10        8          60m<br>video-app-hpa   Deployment/video-app   cpu: 102%/70%   3         10        8          61m<br>video-app-hpa   Deployment/video-app   cpu: 106%/70%   3         10        10         61m<br>video-app-hpa   Deployment/video-app   cpu: 101%/70%   3         10        10         61m<br>video-app-hpa   Deployment/video-app   cpu: 88%/70%    3         10    
    10         61m<br>video-app-hpa   Deployment/video-app   cpu: 86%/70%    3         10        10         62m<br>video-app-hpa   Deployment/video-app   cpu: 88%/70%    3         10        10         62m<br>video-app-hpa   Deployment/video-app   cpu: 84%/70%    3         10        10         62m<br>video-app-hpa   Deployment/video-app   cpu: 88%/70%    3         10        10         63m<br>video-app-hpa   Deployment/video-app   cpu: 85%/70%    3         10        10         63m<br>video-app-hpa   Deployment/video-app   cpu: 92%/70%    3         10        10         63m<br>video-app-hpa   Deployment/video-app   cpu: 87%/70%    3         10        10         63m<br>video-app-hpa   Deployment/video-app   cpu: 86%/70%    3         10        10         64m<br>video-app-hpa   Deployment/video-app   cpu: 84%/70%    3         10        10         64m</pre><p>And</p><pre>kubectl get pods -n video-app --watch<br>NAME                         READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-2jzr5   1/1     Running   0          5m57s<br>video-app-6498b5dd57-2n9dc   1/1     Running   0          5m43s<br>video-app-6498b5dd57-9w8bv   1/1     Running   0          5m58s<br>video-app-6498b5dd57-g82n9   1/1     Running   0          71m<br>video-app-6498b5dd57-njqcr   1/1     Running   0          71m<br>video-app-6498b5dd57-qqcqj   1/1     Running   0          2m42s<br>video-app-6498b5dd57-v54wx   1/1     Running   0          71m<br>video-app-6498b5dd57-xl28k   1/1     Running   0          5m57s<br>video-app-6498b5dd57-bfmf6   0/1     Pending   0          0s<br>video-app-6498b5dd57-bfmf6   0/1     Pending   0          0s<br>video-app-6498b5dd57-xfsf7   0/1     Pending   0          1s<br>video-app-6498b5dd57-bfmf6   0/1     ContainerCreating   0          1s<br>video-app-6498b5dd57-xfsf7   0/1     Pending             0          1s<br>video-app-6498b5dd57-xfsf7   0/1     Running             0          2s<br>video-app-6498b5dd57-bfmf6   0/1     Running             0          
2s<br>video-app-6498b5dd57-xfsf7   1/1     Running             0          8s<br>video-app-6498b5dd57-bfmf6   1/1     Running             0          8s</pre><p>When HPA tries to scale to 10 pods, you’ll see pods stuck in the “Pending” state for a short time. Once all 10 pods are running, the cluster may still be over its CPU capacity (visible in other metrics), which means it is overloaded.</p><p>What does this mean?</p><p>✅ <strong>HPA hit maximum capacity: </strong>Scaled to 10 pods (the configured max)<br> ⚠️ <strong>CPU still high at 84–92%</strong>: Even with 10 pods, the system is struggling.<br> ⚠️ <strong>Pods were briefly Pending: </strong>Pods 9 and 10 showed Pending for a few seconds.<br> ✅ <strong>Eventually scheduled: </strong>But only because we <em>barely</em> had enough capacity.</p><h4>One more time, at a higher load!</h4><pre>k6 run --vus 400 --duration 3m -e LB_URL=$LB_URL k8s/load/load-test.js</pre><p>Let’s see if we can make it stall out.</p><p>Check why a pod is stalled:</p><pre>kubectl describe pod video-app-6498b5dd57-xfsf7 -n video-app</pre><p>Example:</p><pre># Check why pod is pending<br>kubectl describe pod video-app-6498b5dd57-xfsf7 -n video-app<br><br># Look for Events section:<br>Events:<br>  Type     Reason            Age   Message<br>  ----     ------            ----  -------<br>  Warning  FailedScheduling  30s   0/3 nodes are available:<br>                                   3 Insufficient cpu,<br>                                   3 Insufficient memory</pre><h4>Also look at allocatable vs used</h4><pre># See allocatable vs used<br>kubectl describe nodes | grep -A 5 &quot;Allocated resources&quot;<br><br># Example output:<br>Allocated resources:<br>  (Total limits may be over 100 percent, i.e., overcommitted.)<br>  Resource           Requests    Limits<br>  --------           --------    ------<br>  cpu                1800m (90%) 4500m (225%)<br>  memory             1536Mi (79%) 2304Mi (119%)</pre><h4><strong>The Breaking Point: When HPA isn’t 
enough</strong></h4><pre>NAME            REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%   3         10        5          70m<br>video-app-hpa   Deployment/video-app   cpu: 73%/70%   3         10        5          70m<br>video-app-hpa   Deployment/video-app   cpu: 102%/70%   3         10        5          71m<br>video-app-hpa   Deployment/video-app   cpu: 137%/70%   3         10        8          71m<br>video-app-hpa   Deployment/video-app   cpu: 163%/70%   3         10        10         71m<br>video-app-hpa   Deployment/video-app   cpu: 119%/70%   3         10        10         71m<br>video-app-hpa   Deployment/video-app   cpu: 113%/70%   3         10        10         72m<br>video-app-hpa   Deployment/video-app   cpu: 112%/70%   3         10        10         72m<br>video-app-hpa   Deployment/video-app   cpu: 109%/70%   3         10        10         72m<br>video-app-hpa   Deployment/video-app   cpu: 106%/70%   3         10        10         73m<br>video-app-hpa   Deployment/video-app   cpu: 101%/70%   3         10        10         73m<br>video-app-hpa   Deployment/video-app   cpu: 84%/70%    3         10        10         73m<br>video-app-hpa   Deployment/video-app   cpu: 24%/70%    3         10        10         73m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        10         74m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        10         78m<br>video-app-hpa   Deployment/video-app   cpu: 1%/70%     3         10        5          78m</pre><p>I ran this load test with 400 virtual users (the command above) to really push the system:</p><h4><strong>The Sequence:</strong></h4><p>1. CPU spiked from 73% → 102% → 137% → <strong>163%</strong></p><p>2. 🚨 HPA rapidly scaled: 5 → 8 → 10 pods</p><p>3. <strong>But CPU stayed above 100% for 3 full minutes</strong></p><p>4. 
System maxed out with no way to scale further</p><p><strong>What 163% CPU Means: </strong>The math is simple: 163% ÷ 70% ≈ 2.33x the target utilization.</p><p>My system needed ~16 pods to properly handle the load.</p><p>If you need to save this state for later, you can run:</p><pre>kubectl get hpa -n video-app -o yaml &gt; hpa-max-capacity-demo.yaml<br>kubectl top nodes &gt; node-capacity-max.txt<br>kubectl get pods -n video-app -o wide &gt; pods-during-spike.txt</pre><h3>Why are we doing this?</h3><p><strong>The bottleneck? Not enough nodes.</strong></p><p>We got HPA going, and HPA scales pods, but we are limited by how many pods fit on each node.</p><p>You can manually increase the node count in the AWS Console or Terraform:</p><pre># Update terraform.tfvars<br>node_desired_size = 5<br>node_min_size = 3<br>node_max_size = 6<br><br># Apply<br>terraform apply</pre><p><strong>But we want to do it automatically!</strong></p><p><strong>In Part 5,</strong> we’ll solve this with <strong>Karpenter</strong>, which will:</p><ul><li>Detect pods that can’t be scheduled.</li><li>Provision new nodes in 30–60 seconds.</li><li>Allow HPA to scale to 15, 20, or even 30 pods.</li><li>Automatically remove nodes when traffic drops.</li><li><strong>All without manual intervention.</strong></li></ul><p><strong>Here is what we’ll work on:</strong></p><ul><li>Karpenter installation and configuration</li><li>NodePool definitions for Spot and On-Demand instances</li><li>Full autoscaling: HPA scales pods, Karpenter scales nodes</li><li>Cost optimization with Spot instances (up to about 70% savings)</li><li>Automatic node consolidation when load drops</li></ul><h3>8. 
Cleanup</h3><h4>8.1 Remove HPA and CloudFront</h4><pre># Delete HPA<br>kubectl delete -f k8s/hpa.yaml<br><br># Scale back to 3 replicas<br>kubectl scale deployment video-app -n video-app --replicas=3</pre><h4>8.2 Destroy All Infrastructure</h4><pre>cd environments/dev<br>terraform destroy</pre><h3>⚠️ Critical Reminder for Removing Resources (Avoid Extra Charges)</h3><p>Double-check in the AWS Console:</p><ul><li><strong>EKS cluster deleted.</strong></li><li><strong>EC2 instances terminated.</strong></li><li><strong>Load balancers removed </strong>(⚠️ sometimes not removed automatically).</li><li><strong>CloudFront distribution</strong> deleted (may take 15–30 min).</li><li><strong>NAT Gateway deleted.</strong></li><li><strong>Internet Gateway</strong> deleted.</li><li>Anything else I’m not thinking of 😁</li></ul><h3>9. Conclusion and Looking Ahead</h3><p>What we accomplished in this article:</p><p><strong>1. CloudFront CDN distribution</strong> integrated with your application.<br><strong>2. Metrics Server installation </strong>(required for HPA).<br><strong>3. Horizontal Pod Autoscaler (HPA)</strong> configured for 3–10 pods.<br><strong>4. Load testing setup</strong> using the k6 tool.<br><strong>5. CPU-based autoscaling </strong>triggers (70% threshold).<br><strong>6. Real-time monitoring </strong>of pod scaling behavior.<br><strong>7. Observed node capacity limits and Pending-pod delays.</strong></p><h3>What’s next?</h3><p><strong>Below is a tentative look at what is next!</strong></p><p>Now we are starting to get more advanced!</p><p><strong>1. Remove managed node groups</strong> (<strong>Karpenter</strong> replaces them)<br><strong>2. Install Karpenter using Terraform and Helm</strong><br>3. Create <strong>NodePool</strong> resource (defines provisioning rules)<br>4. Create <strong>EC2NodeClass</strong> (AMI, subnets, security groups)<br><strong>5. Test rapid node</strong> provisioning (30–60 second spin-up)<br>6. 
<strong>Combined load test: HPA + Karpenter working together</strong><br>7. <strong>Configure Spot</strong> instances for 70% cost savings<br>8. <strong>Observe</strong> automatic node consolidation</p><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow the Cloud Checklists series here. 🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next checklist that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p><strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. Also previously held AWS Solutions Architect certification.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f1d9a060e20a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon EKS (K8s) Media Cluster: Part 3— Self-Healing Video Pods]]></title>
            <link>https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-3-self-healing-video-pods-e4459ad9ecc0?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/e4459ad9ecc0</guid>
            <category><![CDATA[aws-eks]]></category>
            <category><![CDATA[cloud-architecture]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[aws-certification]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Wed, 17 Dec 2025 02:28:51 GMT</pubDate>
            <atom:updated>2025-12-20T18:54:25.280Z</atom:updated>
            <content:encoded><![CDATA[<h4>🚀 Amazon EKS + AWS ECR + Docker with self-healing multi-AZ pods and kubectl diagnostics</h4><p>✅ “I need to deploy my video app and Docker image on self-healing Amazon EKS pods with kubectl diagnostics”</p><p><strong>In this article, Part 3 of this Amazon EKS series, </strong>we will be <strong>deploying the video-serving part</strong> of the application: the app, its container, and the cluster. We will also validate that <strong>high availability</strong> works as expected across availability zones, including if we lose one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*aGjJfEYdJ7ddsXOATcpfOw.jpeg" /></figure><p><strong>Review of the last two articles, </strong>which you should complete first (we build on them):</p><p><strong>In Part 1, the first article of this series, we got many of the prerequisites done for our isolated IAM account, AWS CLI and Terraform.</strong></p><p><strong>✅ PART 1 </strong><a href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-1-initial-setup-roadmap-176bdb085d32"><strong>Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap</strong></a></p><p><strong>Then, in Part 2, the last article, </strong>we focused on setting up the VPC and basic <strong>Kubernetes</strong> resources with <strong>Terraform</strong> and <strong>kubectl</strong> so we could connect to our cluster.</p><p><strong>✅ PART 2 </strong><a href="https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-2-deploy-initial-terraform-multi-az-eks-cluster-e1a87efc9925"><strong>Amazon EKS (K8s) Media Cluster: Part 2 — Deploy Initial Terraform Multi-AZ EKS Cluster</strong></a></p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><h4>Roadmap for this article:</h4><p><strong>1. Introduction:</strong> What we’re building, prerequisites (5m)<br><strong>2. 
Rebuild Cluster: </strong>Quick terraform apply to restore (15–20m)<br><strong>3. Create ECR Repository:</strong> Terraform for private container registry (5m)<br><strong>4. Build the App: </strong>Node.js server + HTML5 video player (10m)<br><strong>5. Docker Build &amp; Push: </strong>Containerize and push to ECR (10m)<br><strong>6. Kubernetes Manifests:</strong> Deployment + Service with HA (10m)<br><strong>7. Deploy to EKS:</strong> kubectl apply, verify pods (5m)<br><strong>8. Access the App: </strong>Open browser, watch video! (5m)<br><strong>9. Explore &amp; Verify HA:</strong> Test replicas, health checks (10m)<br><strong>10. Cleanup: </strong>Destroy resources</p><p><strong>Time estimate: </strong>45–60m, plus 15–20m if you need to rebuild the cluster</p><p><strong>Note: </strong>We are moving from more basic examples on AWS EKS and K8s to more advanced examples with each article.</p><ul><li>This is a simplified video/player server example.</li><li>In a higher-volume scenario we would decouple the video serving, and we’ll be doing that in later articles! This serves as a building block for future parts in the series.</li></ul><h3>1. Introduction: Goals, Roadmap, Costs</h3><p><strong>Goals:</strong></p><ul><li><strong>ECR Repository: </strong>Private container registry for Node.js Docker image</li><li><strong>Application: </strong>Simple web server in Node.js with HTML5 video player. 
We’ll create an image with Docker.</li><li><strong>Kubernetes Deployment: </strong>Replicas spread across availability zones</li><li><strong>LoadBalancer Service: </strong>Public URL to access the application</li><li><strong>Destroy the resources </strong>at the end so we do not get charged a lot!</li></ul><p><strong>Prerequisites</strong></p><ul><li><strong>Completed Articles Part 1 and 2</strong></li><li><strong>EKS cluster running </strong>(or ready to rebuild with terraform apply). Setup is covered in Part 2 of this series, and we rebuild it in the first step below.</li><li><strong>Docker installed locally</strong> (for building the image): <a href="https://docs.docker.com/">https://docs.docker.com/</a></li><li>Estimated budget for this session: about $1–$2 or less (may vary slightly) ⚠️ if you complete it within the expected 1–2 hours and destroy all AWS resources at that time.</li><li><strong>⚠️ IMPORTANT: There will be ongoing charges if you do not remove AWS resources</strong> built here. Charges are also higher if you use a legacy version of K8s.</li><li>I have isolated my account (see series Part 1 setup) so it’s easy for me to track. 
As stated in earlier parts of this series, use <strong>terraform destroy</strong> and double-check that all resources were destroyed.</li></ul><p><strong>To set up cost guardrails and AWS Budgets alerts, see my articles:</strong></p><ul><li><a href="https://medium.com/cloud-cost-savings/12-aws-cost-alerts-to-use-right-now-a082795f3858"><strong>12 AWS Cost Alerts To Use Right Now</strong></a></li><li><a href="https://medium.com/cloud-cost-savings/aws-cost-savings-playbook-2-reports-tracking-heart-soul-of-cost-control-d7ff2d926e72">AWS Cost Savings Playbook (#2): REPORTS/Tracking, Heart &amp; Soul of Cost Control</a></li><li><a href="https://medium.com/cloud-cost-savings/aws-cost-savings-playbook-4-cost-guardrails-e90407984eda">AWS Cost Savings Playbook (#4): Cost Guardrails: Setting up cost guardrails to prevent unintended spending.</a></li></ul><p>This is what we are aiming for here. We will also experiment with deleting a pod and confirming that it is healed (recreated).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AkxrKNmP1gArIlpF-mMQjA.png" /><figcaption>Video player</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_NRw_iO7QMOe-DiQBKx4Bg.png" /><figcaption>Updated diagram</figcaption></figure><h3>2. 
Rebuild the Cluster (If Destroyed)</h3><p>If you destroyed your cluster after Article 2 (good job saving money!), let’s rebuild it:</p><pre><br># log in as the IAM user for the CLI<br><br># if you cannot remember the name of your AWS CLI profile<br>aws configure list-profiles<br><br># use the tutorial credentials set up earlier (I used profile &quot;terraform-eks-admin&quot;)<br>export AWS_PROFILE=terraform-eks-admin<br><br># verify which account you are in - if you have issues, see the Part 1 article.<br>aws sts get-caller-identity<br><br># output<br><br>{<br>    &quot;UserId&quot;: &quot;xxxxxxxxxxxx&quot;,<br>    &quot;Account&quot;: &quot;xxxxxxxxxxxxx&quot;,<br>    &quot;Arn&quot;: &quot;arn:aws:iam::xxxxxxxxxxxxx:user/terraform-eks-admin&quot;<br>}<br><br># cd from your root project dir.<br>cd environments/dev<br><br># preview our build to make sure it did not change from before<br>terraform plan<br><br># Rebuild infrastructure (~15-20 minutes)<br>terraform apply</pre><ul><li>Make sure you are in environments/dev before running these commands.</li><li>The terraform apply step will take 15–20 min.</li></ul><p>Once complete, reconnect kubectl (change region if yours is different):</p><pre>aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster<br><br># output<br>Updated context ...</pre><p>Verify nodes are ready:</p><pre>kubectl get nodes<br><br># output<br>NAME                             STATUS   ROLES    AGE   VERSION<br>ip-10-0-1-110.ec2.internal   Ready    &lt;none&gt;   10m   v1.34.2-eks-ecaa3a6<br>ip-10-0-2-6.ec2.internal     Ready    &lt;none&gt;   10m   v1.34.2-eks-ecaa3a6<br>ip-10-0-3-7.ec2.internal     Ready    &lt;none&gt;   10m   v1.34.2-eks-ecaa3a6</pre><p>Perfect, let’s continue!</p><h3>3. 
Create ECR Repository</h3><p>Now we’ll add the ECR repository to our Terraform configuration.</p><p>Create new file: environments/dev/ecr.tf</p><pre># =============================================================================<br># ELASTIC CONTAINER REGISTRY (ECR)<br># =============================================================================<br># ECR is AWS&#39;s private Docker registry. We&#39;ll store our video app image here.<br>#<br># ECS Comparison:<br># - ECR works exactly the same for both ECS and EKS<br># - You push images to ECR, then reference them in task definitions (ECS)<br>#   or pod specs (EKS)<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># ECR Repository for Video App<br># -----------------------------------------------------------------------------<br>resource &quot;aws_ecr_repository&quot; &quot;video_app&quot; {<br>  name                 = &quot;${var.project_name}-video-app&quot;<br>  image_tag_mutability = &quot;MUTABLE&quot;  # Allows overwriting tags like &quot;latest&quot;<br><br>  # Scan images for vulnerabilities on push<br>  image_scanning_configuration {<br>    scan_on_push = true<br>  }<br><br>  # Encryption at rest<br>  encryption_configuration {<br>    encryption_type = &quot;AES256&quot;<br>  }<br><br>  tags = local.tags<br>}<br><br># -----------------------------------------------------------------------------<br># ECR Lifecycle Policy<br># -----------------------------------------------------------------------------<br># Automatically clean up old images to save storage costs<br>resource &quot;aws_ecr_lifecycle_policy&quot; &quot;video_app&quot; {<br>  repository = aws_ecr_repository.video_app.name<br><br>  policy = jsonencode({<br>    rules = [<br>      {<br>        rulePriority = 1<br>        description  = &quot;Keep only 10 most recent images&quot;<br>        selection = {<br>          tagStatus  
 = &quot;any&quot;<br>          countType   = &quot;imageCountMoreThan&quot;<br>          countNumber = 10<br>        }<br>        action = {<br>          type = &quot;expire&quot;<br>        }<br>      }<br>    ]<br>  })<br>}<br><br># -----------------------------------------------------------------------------<br># Outputs<br># -----------------------------------------------------------------------------<br>output &quot;ecr_repository_url&quot; {<br>  description = &quot;URL of the ECR repository&quot;<br>  value       = aws_ecr_repository.video_app.repository_url<br>}<br><br>output &quot;ecr_repository_name&quot; {<br>  description = &quot;Name of the ECR repository&quot;<br>  value       = aws_ecr_repository.video_app.name<br>}<br><br>output &quot;ecr_login_command&quot; {<br>  description = &quot;Command to authenticate Docker with ECR&quot;<br>  value       = &quot;aws ecr get-login-password --region ${var.aws_region} | docker login --username AWS --password-stdin ${data.aws_caller_identity.current.account_id}.dkr.ecr.${var.aws_region}.amazonaws.com&quot;<br>}<br><br>output &quot;docker_build_push_commands&quot; {<br>  description = &quot;Commands to build and push the video app image&quot;<br>  value       = &lt;&lt;-EOT<br><br>    # Navigate to app directory<br>    cd ~/eks-video-tutorial/app<br><br>    # Build the image<br>    docker build -t ${aws_ecr_repository.video_app.repository_url}:latest .<br><br>    # Login to ECR<br>    aws ecr get-login-password --region ${var.aws_region} | docker login --username AWS --password-stdin ${data.aws_caller_identity.current.account_id}.dkr.ecr.${var.aws_region}.amazonaws.com<br><br>    # Push the image<br>    docker push ${aws_ecr_repository.video_app.repository_url}:latest<br><br>  EOT<br>}</pre><ul><li><strong>ECR is AWS’s private Docker image registry </strong>(like Docker Hub, but a private one for your images)</li><li>Stores container images that ECS/EKS pull when running containers</li><li>scan_on_push = 
auto-scans images for security vulnerabilities</li><li>Lifecycle: Auto-deletes old images to save storage costs</li><li>Keeps only the 10 most recent images; older ones expire</li></ul><p><strong>Typical workflow of ECR</strong></p><ol><li>Build Docker image locally</li><li>Authenticate Docker to ECR (login command)</li><li>Push image to ECR</li><li>ECS/EKS pulls image from ECR URL when deploying</li></ol><h4>Apply to create ECR repository:</h4><pre>cd environments/dev<br>terraform plan<br><br># output<br>Plan: 2 to add, 0 to change, 0 to destroy.<br><br># apply the changes - yes<br>terraform apply<br><br># output<br>ecr_repository_url = ...<br>ecr_repository_name = &quot;eks-video-tutorial-video-app&quot;</pre><p>After you run terraform apply, the Terraform output in your console should look like the text below (it will vary based on your account numbers and IDs).</p><p><strong>⚠️ Keep this text; we will need it later.</strong></p><p><strong>🚨⚠️ CRITICAL</strong>: <strong>Remember to run </strong><strong>terraform destroy to delete these resources when you are done, or you will be charged, </strong>and this could add up to $15–20 per day in charges.</p><pre><br># Navigate to app directory<br>cd ~/eks-video-tutorial/app<br><br># Build the image<br>docker build -t xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest .<br><br># Login to ECR<br>aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com<br><br># Push the image<br>docker push xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest<br><br><br>EOT<br>ecr_login_command = &quot;aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com&quot;<br>ecr_repository_name = &quot;eks-video-tutorial-video-app&quot;<br>ecr_repository_url = 
&quot;xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app&quot;<br>get_nodes_command = &quot;kubectl get nodes -o wide&quot;<br>get_pods_command = &quot;kubectl get pods -A&quot;</pre><h3>4. Build the Node.js Application</h3><p>Let’s build the Node.js app for the video streaming player.</p><p>This app will be a simple Express server that serves video content with an HTML5 player, and it will include health checks, static file serving, and basic streaming.</p><pre># from root directory for your project<br>mkdir -p app/public/videos<br>cd app</pre><h4>4.1 Package.json</h4><p>Create app/package.json</p><pre>{<br>  &quot;name&quot;: &quot;eks-video-app&quot;,<br>  &quot;version&quot;: &quot;1.0.0&quot;,<br>  &quot;description&quot;: &quot;Simple video streaming app for EKS tutorial&quot;,<br>  &quot;main&quot;: &quot;server.js&quot;,<br>  &quot;scripts&quot;: {<br>    &quot;start&quot;: &quot;node server.js&quot;<br>  },<br>  &quot;dependencies&quot;: {<br>    &quot;express&quot;: &quot;^4.21.0&quot;<br>  },<br>  &quot;engines&quot;: {<br>    &quot;node&quot;: &quot;&gt;=20.0.0&quot;<br>  }<br>}</pre><h4>4.2 Server.js (Main Application)</h4><p>Create file: app/server.js</p><pre>// =============================================================================<br>// EKS VIDEO STREAMING APP<br>// =============================================================================<br>// A simple Express server that serves video content with an HTML5 player.<br>// Demonstrates: health checks, static file serving, and basic streaming.<br>// =============================================================================<br><br>const express = require(&#39;express&#39;);<br>const path = require(&#39;path&#39;);<br>const os = require(&#39;os&#39;);<br><br>const app = express();<br>const PORT = process.env.PORT || 3000;<br><br>// -----------------------------------------------------------------------------<br>// Middleware<br>// 
-----------------------------------------------------------------------------<br><br>// Serve static files from &#39;public&#39; directory<br>app.use(express.static(path.join(__dirname, &#39;public&#39;)));<br><br>// Request logging<br>app.use((req, res, next) =&gt; {<br>  const timestamp = new Date().toISOString();<br>  console.log(`[${timestamp}] ${req.method} ${req.path} - ${req.ip}`);<br>  next();<br>});<br><br>// -----------------------------------------------------------------------------<br>// Health Check Endpoints<br>// -----------------------------------------------------------------------------<br>// These are CRITICAL for Kubernetes!<br>// - /health: Used by both liveness and readiness probes<br>// - /ready: Could be used for more complex readiness logic<br><br>// Basic health check - Kubernetes uses this to know if the pod is alive<br>app.get(&#39;/health&#39;, (req, res) =&gt; {<br>  res.status(200).json({<br>    status: &#39;healthy&#39;,<br>    timestamp: new Date().toISOString(),<br>    hostname: os.hostname(),<br>    uptime: process.uptime()<br>  });<br>});<br><br>// Readiness check - could include dependency checks in production<br>app.get(&#39;/ready&#39;, (req, res) =&gt; {<br>  // In production, you might check:<br>  // - Database connectivity<br>  // - External service availability<br>  // - Required files exist<br>  res.status(200).json({<br>    status: &#39;ready&#39;,<br>    timestamp: new Date().toISOString()<br>  });<br>});<br><br>// -----------------------------------------------------------------------------<br>// API Endpoints<br>// -----------------------------------------------------------------------------<br><br>// Server info endpoint - useful for debugging which pod you&#39;re hitting<br>app.get(&#39;/api/info&#39;, (req, res) =&gt; {<br>  res.json({<br>    app: &#39;eks-video-app&#39;,<br>    version: &#39;1.0.0&#39;,<br>    hostname: os.hostname(),<br>    platform: os.platform(),<br>    nodeVersion: process.version,<br>    
environment: process.env.NODE_ENV || &#39;development&#39;,<br>    podName: process.env.POD_NAME || os.hostname(),<br>    nodeName: process.env.NODE_NAME || &#39;unknown&#39;,<br>    timestamp: new Date().toISOString()<br>  });<br>});<br><br>// List available videos<br>app.get(&#39;/api/videos&#39;, (req, res) =&gt; {<br>  res.json({<br>    videos: [<br>      {<br>        id: &#39;sample&#39;,<br>        title: &#39;Sample Video&#39;,<br>        description: &#39;A sample video for testing the EKS video streaming platform&#39;,<br>        url: &#39;/videos/sample.mp4&#39;,<br>        thumbnail: &#39;/images/thumbnail.png&#39;<br>      }<br>    ]<br>  });<br>});<br><br>// -----------------------------------------------------------------------------<br>// Main Page<br>// -----------------------------------------------------------------------------<br><br>app.get(&#39;/&#39;, (req, res) =&gt; {<br>  res.sendFile(path.join(__dirname, &#39;public&#39;, &#39;index.html&#39;));<br>});<br><br>// -----------------------------------------------------------------------------<br>// Error Handling<br>// -----------------------------------------------------------------------------<br><br>// 404 handler<br>app.use((req, res) =&gt; {<br>  res.status(404).json({<br>    error: &#39;Not Found&#39;,<br>    path: req.path<br>  });<br>});<br><br>// Error handler<br>app.use((err, req, res, next) =&gt; {<br>  console.error(&#39;Error:&#39;, err);<br>  res.status(500).json({<br>    error: &#39;Internal Server Error&#39;,<br>    message: err.message<br>  });<br>});<br><br>// -----------------------------------------------------------------------------<br>// Start Server<br>// -----------------------------------------------------------------------------<br><br>app.listen(PORT, &#39;0.0.0.0&#39;, () =&gt; {<br>  console.log(&#39;=&#39;.repeat(60));<br>  console.log(&#39;EKS VIDEO STREAMING APP&#39;);<br>  console.log(&#39;=&#39;.repeat(60));<br>  console.log(`Server running on port 
${PORT}`);<br>  console.log(`Hostname: ${os.hostname()}`);<br>  console.log(`Node.js: ${process.version}`);<br>  console.log(`Started: ${new Date().toISOString()}`);<br>  console.log(&#39;=&#39;.repeat(60));<br>  console.log(&#39;Endpoints:&#39;);<br>  console.log(`  - GET /          : Video player UI`);<br>  console.log(`  - GET /health    : Health check (liveness)`);<br>  console.log(`  - GET /ready     : Readiness check`);<br>  console.log(`  - GET /api/info  : Server info`);<br>  console.log(`  - GET /api/videos: List videos`);<br>  console.log(&#39;=&#39;.repeat(60));<br>});<br><br>// Graceful shutdown<br>process.on(&#39;SIGTERM&#39;, () =&gt; {<br>  console.log(&#39;SIGTERM received, shutting down gracefully...&#39;);<br>  process.exit(0);<br>});<br><br>process.on(&#39;SIGINT&#39;, () =&gt; {<br>  console.log(&#39;SIGINT received, shutting down gracefully...&#39;);<br>  process.exit(0);<br>});</pre><h4>4.3 HTML Video Player</h4><p>Create file: app/public/index.html</p><pre>&lt;!DOCTYPE html&gt;<br>&lt;html lang=&quot;en&quot;&gt;<br>&lt;head&gt;<br>    &lt;meta charset=&quot;UTF-8&quot;&gt;<br>    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt;<br>    &lt;title&gt;EKS Video Streaming Platform&lt;/title&gt;<br>    &lt;style&gt;<br>        * {<br>            margin: 0;<br>            padding: 0;<br>            box-sizing: border-box;<br>        }<br><br>        body {<br>            font-family: -apple-system, BlinkMacSystemFont, &#39;Segoe UI&#39;, Roboto, Oxygen, Ubuntu, sans-serif;<br>            background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%);<br>            min-height: 100vh;<br>            color: #ffffff;<br>        }<br><br>        .container {<br>            max-width: 1200px;<br>            margin: 0 auto;<br>            padding: 20px;<br>        }<br><br>        header {<br>            text-align: center;<br>            padding: 40px 0;<br>            border-bottom: 1px 
solid rgba(255,255,255,0.1);<br>            margin-bottom: 40px;<br>        }<br><br>        h1 {<br>            font-size: 2.5rem;<br>            margin-bottom: 10px;<br>            background: linear-gradient(90deg, #e94560, #ff6b6b);<br>            -webkit-background-clip: text;<br>            -webkit-text-fill-color: transparent;<br>            background-clip: text;<br>        }<br><br>        .subtitle {<br>            color: #a0a0a0;<br>            font-size: 1.1rem;<br>        }<br><br>        .badge {<br>            display: inline-block;<br>            background: #e94560;<br>            color: white;<br>            padding: 5px 15px;<br>            border-radius: 20px;<br>            font-size: 0.8rem;<br>            margin-top: 15px;<br>        }<br><br>        .video-section {<br>            background: rgba(255,255,255,0.05);<br>            border-radius: 20px;<br>            padding: 30px;<br>            margin-bottom: 30px;<br>            backdrop-filter: blur(10px);<br>            border: 1px solid rgba(255,255,255,0.1);<br>        }<br><br>        .video-container {<br>            position: relative;<br>            width: 100%;<br>            max-width: 800px;<br>            margin: 0 auto;<br>            border-radius: 15px;<br>            overflow: hidden;<br>            box-shadow: 0 20px 60px rgba(0,0,0,0.5);<br>        }<br><br>        video {<br>            width: 100%;<br>            display: block;<br>            background: #000;<br>        }<br><br>        .video-title {<br>            font-size: 1.5rem;<br>            margin-bottom: 20px;<br>            text-align: center;<br>        }<br><br>        .info-section {<br>            display: grid;<br>            grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));<br>            gap: 20px;<br>            margin-top: 40px;<br>        }<br><br>        .info-card {<br>            background: rgba(255,255,255,0.05);<br>            border-radius: 15px;<br>            padding: 25px;<br> 
           border: 1px solid rgba(255,255,255,0.1);<br>        }<br><br>        .info-card h3 {<br>            color: #e94560;<br>            margin-bottom: 15px;<br>            font-size: 1rem;<br>            text-transform: uppercase;<br>            letter-spacing: 1px;<br>        }<br><br>        .info-card p {<br>            color: #a0a0a0;<br>            line-height: 1.6;<br>        }<br><br>        .server-info {<br>            background: rgba(233, 69, 96, 0.1);<br>            border: 1px solid rgba(233, 69, 96, 0.3);<br>        }<br><br>        .server-info h3 {<br>            color: #ff6b6b;<br>        }<br><br>        #serverInfo {<br>            font-family: &#39;Courier New&#39;, monospace;<br>            font-size: 0.9rem;<br>        }<br><br>        #serverInfo div {<br>            padding: 5px 0;<br>            border-bottom: 1px solid rgba(255,255,255,0.05);<br>        }<br><br>        #serverInfo span {<br>            color: #e94560;<br>        }<br><br>        .ha-badge {<br>            display: inline-block;<br>            background: linear-gradient(90deg, #00b894, #00cec9);<br>            color: white;<br>            padding: 3px 10px;<br>            border-radius: 10px;<br>            font-size: 0.75rem;<br>            margin-left: 10px;<br>        }<br><br>        footer {<br>            text-align: center;<br>            padding: 40px 0;<br>            color: #606060;<br>            border-top: 1px solid rgba(255,255,255,0.1);<br>            margin-top: 40px;<br>        }<br><br>        footer a {<br>            color: #e94560;<br>            text-decoration: none;<br>        }<br><br>        .loading {<br>            text-align: center;<br>            padding: 20px;<br>            color: #a0a0a0;<br>        }<br><br>        @media (max-width: 768px) {<br>            h1 {<br>                font-size: 1.8rem;<br>            }<br>            .container {<br>                padding: 10px;<br>            }<br>            .video-section {<br>    
            padding: 15px;<br>            }<br>        }<br>    &lt;/style&gt;<br>&lt;/head&gt;<br>&lt;body&gt;<br>    &lt;div class=&quot;container&quot;&gt;<br>        &lt;header&gt;<br>            &lt;h1&gt;🎬 EKS Video Streaming&lt;/h1&gt;<br>            &lt;p class=&quot;subtitle&quot;&gt;A Kubernetes-powered video platform running on Amazon EKS&lt;/p&gt;<br>            &lt;span class=&quot;badge&quot;&gt;Article 3 - First Deployment&lt;/span&gt;<br>        &lt;/header&gt;<br><br>        &lt;section class=&quot;video-section&quot;&gt;<br>            &lt;h2 class=&quot;video-title&quot;&gt;Sample Video &lt;span class=&quot;ha-badge&quot;&gt;HA Enabled&lt;/span&gt;&lt;/h2&gt;<br>            &lt;div class=&quot;video-container&quot;&gt;<br>                &lt;video controls preload=&quot;metadata&quot; poster=&quot;/images/poster.png&quot;&gt;<br>                    &lt;source src=&quot;/videos/sample.mp4&quot; type=&quot;video/mp4&quot;&gt;<br>                    Your browser does not support the video tag.<br>                &lt;/video&gt;<br>            &lt;/div&gt;<br>        &lt;/section&gt;<br><br>        &lt;section class=&quot;info-section&quot;&gt;<br>            &lt;div class=&quot;info-card server-info&quot;&gt;<br>                &lt;h3&gt;🖥️ Server Info (Pod Details)&lt;/h3&gt;<br>                &lt;div id=&quot;serverInfo&quot; class=&quot;loading&quot;&gt;Loading server info...&lt;/div&gt;<br>            &lt;/div&gt;<br><br>            &lt;div class=&quot;info-card&quot;&gt;<br>                &lt;h3&gt;🔷 High Availability&lt;/h3&gt;<br>                &lt;p&gt;<br>                    This app runs with &lt;strong&gt;3 replicas&lt;/strong&gt; spread across multiple <br>                    availability zones. 
Refresh the page multiple times and watch the <br>                    hostname change - you&#39;re being load balanced between pods!<br>                &lt;/p&gt;<br>            &lt;/div&gt;<br><br>            &lt;div class=&quot;info-card&quot;&gt;<br>                &lt;h3&gt;❤️ Health Checks&lt;/h3&gt;<br>                &lt;p&gt;<br>                    Kubernetes monitors this app using &lt;strong&gt;liveness&lt;/strong&gt; and <br>                    &lt;strong&gt;readiness&lt;/strong&gt; probes. If this pod becomes unhealthy, <br>                    Kubernetes automatically restarts it or stops sending traffic.<br>                &lt;/p&gt;<br>            &lt;/div&gt;<br><br>            &lt;div class=&quot;info-card&quot;&gt;<br>                &lt;h3&gt;📊 ECS Comparison&lt;/h3&gt;<br>                &lt;p&gt;<br>                    In ECS, you&#39;d use a Task Definition + Service. Here, we use a <br>                    &lt;strong&gt;Deployment&lt;/strong&gt; (combines both concepts) and a <br>                    &lt;strong&gt;Service&lt;/strong&gt; for load balancing. 
Same outcome, K8s vocabulary!<br>                &lt;/p&gt;<br>            &lt;/div&gt;<br>        &lt;/section&gt;<br><br>        &lt;footer&gt;<br>            &lt;p&gt;<br>                Built with ❤️ for the <br>                &lt;a href=&quot;#&quot;&gt;EKS Video Tutorial Series&lt;/a&gt;<br>            &lt;/p&gt;<br>            &lt;p style=&quot;margin-top: 10px; font-size: 0.9rem;&quot;&gt;<br>                Running on Amazon EKS • Node.js • Kubernetes<br>            &lt;/p&gt;<br>        &lt;/footer&gt;<br>    &lt;/div&gt;<br><br>    &lt;script&gt;<br>        // Fetch and display server info<br>        async function loadServerInfo() {<br>            try {<br>                const response = await fetch(&#39;/api/info&#39;);<br>                const data = await response.json();<br>                <br>                const serverInfoEl = document.getElementById(&#39;serverInfo&#39;);<br>                serverInfoEl.innerHTML = `<br>                    &lt;div&gt;&lt;span&gt;Pod Name:&lt;/span&gt; ${data.podName}&lt;/div&gt;<br>                    &lt;div&gt;&lt;span&gt;Node Name:&lt;/span&gt; ${data.nodeName}&lt;/div&gt;<br>                    &lt;div&gt;&lt;span&gt;Hostname:&lt;/span&gt; ${data.hostname}&lt;/div&gt;<br>                    &lt;div&gt;&lt;span&gt;Node.js:&lt;/span&gt; ${data.nodeVersion}&lt;/div&gt;<br>                    &lt;div&gt;&lt;span&gt;Uptime:&lt;/span&gt; &lt;span id=&quot;uptime&quot;&gt;Loading...&lt;/span&gt;&lt;/div&gt;<br>                `;<br><br>                // Update uptime every second<br>                updateUptime();<br>                setInterval(updateUptime, 1000);<br><br>            } catch (error) {<br>                document.getElementById(&#39;serverInfo&#39;).innerHTML = <br>                    &#39;&lt;p style=&quot;color: #e94560;&quot;&gt;Error loading server info&lt;/p&gt;&#39;;<br>            }<br>        }<br><br>        async function updateUptime() {<br>            try {<br>                const 
response = await fetch(&#39;/health&#39;);<br>                const data = await response.json();<br>                const uptime = Math.floor(data.uptime);<br>                const hours = Math.floor(uptime / 3600);<br>                const minutes = Math.floor((uptime % 3600) / 60);<br>                const seconds = uptime % 60;<br>                document.getElementById(&#39;uptime&#39;).textContent = <br>                    `${hours}h ${minutes}m ${seconds}s`;<br>            } catch (error) {<br>                // Silently fail<br>            }<br>        }<br><br>        // Load server info on page load<br>        loadServerInfo();<br>    &lt;/script&gt;<br>&lt;/body&gt;<br>&lt;/html&gt;</pre><h4>4.4 Download a Sample Video</h4><p>Go to the public videos directory we created and download a sample video. The URL below worked at the time of writing.</p><p>Source: <a href="https://samplelib.com/sample-mp4.html">https://samplelib.com/sample-mp4.html</a>, which states: “Below are sample videos available for download with no license restrictions.” Alternatively, use one of your own videos.</p><pre># from the project root (if you are already inside app/, use: cd public/videos)<br>cd app/public/videos<br><br># download sample video or use your own<br>curl -L -o sample.mp4 &quot;https://download.samplelib.com/mp4/sample-5s.mp4&quot;</pre><p><strong>You may need a placeholder image</strong></p><pre># from the project root<br>mkdir -p app/public/images<br><br># I created an image with https://placehold.co/600x400 and saved it as:<br># app/public/images/placeholder.png<br>#<br># Note: index.html references /images/poster.png and /api/videos references<br># /images/thumbnail.png, so save copies under those names if you want them to load.<br></pre><h3>5. 
Dockerfile and Build</h3><p>Now we will do a multi-stage docker build for a smaller, more secure image.</p><p>Create file: app/Dockerfile</p><pre># =============================================================================<br># DOCKERFILE - EKS Video Streaming App<br># =============================================================================<br># Multi-stage build for a smaller, more secure image<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Stage 1: Dependencies<br># -----------------------------------------------------------------------------<br>FROM node:22-alpine AS deps<br><br>WORKDIR /app<br><br># Copy package files<br>COPY package*.json ./<br><br># Install production dependencies only<br>RUN npm ci --omit=dev<br><br># -----------------------------------------------------------------------------<br># Stage 2: Production<br># -----------------------------------------------------------------------------<br>FROM node:22-alpine AS production<br><br># Add labels for image metadata<br>LABEL maintainer=&quot;EKS Tutorial&quot;<br>LABEL description=&quot;Video streaming app for EKS tutorial&quot;<br>LABEL version=&quot;1.0.0&quot;<br><br># Create non-root user for security<br>RUN addgroup -g 1001 -S appgroup &amp;&amp; \<br>    adduser -u 1001 -S appuser -G appgroup<br><br>WORKDIR /app<br><br># Copy dependencies from deps stage<br>COPY --from=deps /app/node_modules ./node_modules<br><br># Copy application code<br>COPY --chown=appuser:appgroup . 
.<br><br># Switch to non-root user<br>USER appuser<br><br># Expose port<br>EXPOSE 3000<br><br># Health check<br>HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \<br>    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1<br><br># Start the application<br>CMD [&quot;node&quot;, &quot;server.js&quot;]</pre><p>Key points of this Dockerfile:</p><ul><li><strong>Multi-stage build</strong> — Uses two stages (deps → production) to keep the final image small by only copying what&#39;s needed</li><li><strong>Alpine base</strong> — Uses node:22-alpine, which is built on Alpine Linux, a minimal distro (the Alpine base is roughly 5MB, versus roughly 900MB for the full Node image)</li><li><strong>Dependency isolation</strong> — Stage 1 installs only production deps (RUN npm ci --omit=dev), excluding dev dependencies</li><li><strong>Non-root user</strong> — Creates appuser instead of running as root, which limits the damage if the container is compromised</li><li><strong>Health check</strong> — Docker runs the HEALTHCHECK against /health every 30s and marks the container unhealthy after 3 failed attempts. Note that Kubernetes ignores the Dockerfile HEALTHCHECK and relies on the liveness and readiness probes defined in the Deployment instead</li><li><strong>.dockerignore</strong> — Excludes unnecessary files (node_modules, .git, docs) from the build context, making builds faster and images smaller</li><li><strong>EXPOSE 3000 + </strong><strong>CMD</strong> — Documents the port and starts the Node.js server</li></ul><h4>5.2 Dockerignore</h4><p>We also need a .dockerignore file at:</p><p>app/.dockerignore</p><pre>node_modules<br>npm-debug.log<br>Dockerfile<br>.dockerignore<br>.git<br>.gitignore<br>README.md<br>.env<br>*.md</pre><h4>5.3 Build and Push to ECR</h4><pre># start from the project root<br>cd environments/dev<br><br># get ecr url<br><br>ECR_URL=$(terraform output -raw ecr_repository_url)<br>echo &quot;ECR URL: $ECR_URL&quot;<br><br># Output:<br>ECR URL: xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app<br><br>cd ../../app/<br><br>aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 
$ECR_URL<br><br></pre><p><strong>⚠️ Note: At this point I got an error saying my Docker client was slightly outdated.</strong> It was being confused with another version I had previously installed, so I installed the latest Docker and removed the old one.</p><pre>docker --version<br><br># output<br>Docker version 24.0.7<br><br>aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_URL<br><br># output<br>Login Succeeded<br><br># Build the image (from app/)<br>docker build -t $ECR_URL:latest -t $ECR_URL:v1.0.0 .<br><br># ONLY if the build fails (for example, npm ci fails when package-lock.json<br># is missing): run this from app/ to generate the lockfile, then rebuild<br>npm i<br><br># Push the image<br>docker push $ECR_URL:latest<br>docker push $ECR_URL:v1.0.0<br><br># Expected output from the build step<br>[+] Building 15.3s (12/12) FINISHED<br> =&gt; [internal] load build definition from Dockerfile<br> =&gt; [internal] load .dockerignore<br> =&gt; [deps 1/3] FROM docker.io/library/node:22-alpine<br><br>etc.</pre><p>Your image should now be in ECR, ready to use with our AWS EKS cluster 🚀</p><p>Result:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ohy3TLHQln4Jw0MN_eIYcg.png" /></figure><p>And:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9ziB9Lp4Rej6dV4uY0-1AA.png" /></figure><h3>6. 
Kubernetes Manifests</h3><p>Now we need to create a K8s manifest — make the k8s directory at the top level of our app (same level as app directory)</p><pre># make this k8s dir at the top level of our app<br><br>mkdir -p k8s<br>cd k8s</pre><h4>6.1 Namespace</h4><p>Create file: k8s/namespace.yaml</p><pre># =============================================================================<br># NAMESPACE<br># =============================================================================<br># Namespaces provide isolation and organization for Kubernetes resources.<br># Similar to ECS clusters providing logical grouping.<br># =============================================================================<br>apiVersion: v1<br>kind: Namespace<br>metadata:<br>  name: video-app<br>  labels:<br>    app: video-app<br>    environment: dev<br>    project: eks-video-tutorial</pre><h4>6.2 Namespace Resource Management (Optional)</h4><p>Namespaces provide isolation, but without limits, one application can consume all cluster resources. ResourceQuota and LimitRange prevent this.</p><ul><li>When I was reviewing the full article after originally writing it, I realized this would be a good subtopic that I had missed — so I have added it now, but it is purely <strong>optional</strong>.</li><li>For this tutorial Part 4, ResourceQuota and LimitRange are optional. Our video-app deployment already specifies some resource requests/limits. 
However, there are team/collaboration benefits to breaking these out.</li></ul><p><strong>Problems to solve:</strong></p><ul><li>Pod without limits consumes all node memory (<strong>LimitRange</strong> sets defaults)</li><li>One namespace monopolizes cluster (<strong>ResourceQuota</strong> caps total usage)</li><li>Container OOMKilled unexpectedly (<strong>LimitRange</strong> enforces min memory)</li><li>Cost runaway from too many pods (<strong>ResourceQuota</strong> limits pod count)</li></ul><p><strong>ResourceQuota</strong></p><p>Limits <strong>total</strong> resources a namespace can consume. Think of it as a budget for the namespace.</p><p>Create file: k8s/resourcequota.yaml</p><pre># =============================================================================<br># RESOURCEQUOTA<br># =============================================================================<br># Sets hard limits on total resources consumed by all pods in a namespace.<br>#<br># ECS Comparison:<br># - Similar to Service Quotas or capacity provider limits<br># - Prevents one service from monopolizing cluster capacity<br># =============================================================================<br>apiVersion: v1<br>kind: ResourceQuota<br>metadata:<br>  name: video-app-quota<br>  namespace: video-app<br>spec:<br>  hard:<br>    # Compute limits<br>    requests.cpu: &quot;2&quot;           # Total CPU requests across all pods<br>    requests.memory: 2Gi        # Total memory requests<br>    limits.cpu: &quot;4&quot;             # Total CPU limits<br>    limits.memory: 4Gi          # Total memory limits<br><br>    # Object count limits<br>    pods: &quot;20&quot;                  # Max pods in namespace<br>    services: &quot;5&quot;               # Max services<br>    persistentvolumeclaims: &quot;5&quot; # Max PVCs</pre><p><strong>LimitRange</strong></p><p>Sets <strong>default</strong> and <strong>per-container</strong> limits. Ensures every container has resource constraints.</p><p>Create file: k8s/limitrange.yaml</p><pre># =============================================================================<br># LIMITRANGE<br># =============================================================================<br># Sets default resource requests/limits for containers that don&#39;t specify them.<br># Also enforces min/max constraints per container.<br>#<br># ECS Comparison:<br># - Similar to Task Definition CPU/memory settings<br># - LimitRange provides defaults; ResourceQuota caps the total<br># =============================================================================<br>apiVersion: v1<br>kind: LimitRange<br>metadata:<br>  name: video-app-limits<br>  namespace: video-app<br>spec:<br>  limits:<br>    - type: Container<br>      # Defaults applied when container doesn&#39;t specify<br>      default:<br>        cpu: 500m<br>        memory: 512Mi<br>      defaultRequest:<br>        cpu: 100m<br>        memory: 128Mi<br>      # Hard constraints per container<br>      max:<br>        cpu: &quot;2&quot;<br>        memory: 2Gi<br>      min:<br>        cpu: 50m<br>        memory: 64Mi</pre><p><strong>Apply Resource Controls</strong></p><pre># Apply quota and limits (after the namespace exists).<br># These paths assume the project root; drop the k8s/ prefix if you are inside k8s/<br>kubectl apply -f k8s/resourcequota.yaml<br>kubectl apply -f k8s/limitrange.yaml<br><br># Verify<br>kubectl describe resourcequota video-app-quota -n video-app<br>kubectl describe limitrange video-app-limits -n video-app</pre><h4>6.3 Deployment</h4><p>Create file: k8s/deployment.yaml</p><pre># =============================================================================<br># DEPLOYMENT<br># =============================================================================<br># A Deployment manages ReplicaSets and provides declarative updates for Pods.<br>#<br># ECS Comparison:<br># - Deployment ≈ ECS Service + Task 
Definition combined<br># - replicas ≈ desired count in ECS Service<br># - template ≈ Task Definition (container specs)<br># - Kubernetes handles rolling updates automatically (like ECS deployments)<br># =============================================================================<br>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: video-app<br>  namespace: video-app<br>  labels:<br>    app: video-app<br>    version: v1.0.0<br>spec:<br>  # ---------------------------------------------------------------------------<br>  # Replica Configuration<br>  # ---------------------------------------------------------------------------<br>  # HA Lesson: Always run at least 2-3 replicas in production!<br>  # A single pod is a single point of failure.<br>  replicas: 3<br><br>  # ---------------------------------------------------------------------------<br>  # Update Strategy<br>  # ---------------------------------------------------------------------------<br>  # RollingUpdate ensures zero-downtime deployments<br>  strategy:<br>    type: RollingUpdate<br>    rollingUpdate:<br>      maxSurge: 1        # Allow 1 extra pod during update<br>      maxUnavailable: 0  # Never reduce below desired count<br><br>  # ---------------------------------------------------------------------------<br>  # Pod Selector<br>  # ---------------------------------------------------------------------------<br>  selector:<br>    matchLabels:<br>      app: video-app<br><br>  # ---------------------------------------------------------------------------<br>  # Pod Template<br>  # ---------------------------------------------------------------------------<br>  template:<br>    metadata:<br>      labels:<br>        app: video-app<br>        version: v1.0.0<br>      annotations:<br>        # Force pod restart on config changes (optional)<br>        kubectl.kubernetes.io/restartedAt: &quot;&quot;<br>    spec:<br>      # -----------------------------------------------------------------------<br>   
   # Topology Spread Constraints (HA)<br>      # -----------------------------------------------------------------------<br>      # Spread pods across availability zones for high availability<br>      # If one AZ fails, pods in other AZs continue serving traffic<br>      topologySpreadConstraints:<br>        - maxSkew: 1<br>          topologyKey: topology.kubernetes.io/zone<br>          whenUnsatisfiable: ScheduleAnyway<br>          labelSelector:<br>            matchLabels:<br>              app: video-app<br><br>      # -----------------------------------------------------------------------<br>      # Containers<br>      # -----------------------------------------------------------------------<br>      containers:<br>        - name: video-app<br>          # IMPORTANT: Replace with your ECR URL!<br>          # Run: terraform output ecr_repository_url<br>          image: REPLACE_WITH_ECR_URL:latest<br>          imagePullPolicy: Always<br><br>          # Port configuration<br>          ports:<br>            - name: http<br>              containerPort: 3000<br>              protocol: TCP<br><br>          # -------------------------------------------------------------------<br>          # Environment Variables<br>          # -------------------------------------------------------------------<br>          env:<br>            - name: NODE_ENV<br>              value: &quot;production&quot;<br>            - name: PORT<br>              value: &quot;3000&quot;<br>            # Inject pod name for debugging (see /api/info endpoint)<br>            - name: POD_NAME<br>              valueFrom:<br>                fieldRef:<br>                  fieldPath: metadata.name<br>            # Inject node name to see which node the pod runs on<br>            - name: NODE_NAME<br>              valueFrom:<br>                fieldRef:<br>                  fieldPath: spec.nodeName<br><br>          # -------------------------------------------------------------------<br>          # Resource 
Limits<br>          # -------------------------------------------------------------------<br>          # Always set resource requests and limits!<br>          # Requests: Guaranteed resources for scheduling<br>          # Limits: Maximum resources the container can use<br>          resources:<br>            requests:<br>              cpu: &quot;100m&quot;      # 0.1 CPU cores<br>              memory: &quot;128Mi&quot;  # 128 MB RAM<br>            limits:<br>              cpu: &quot;500m&quot;      # 0.5 CPU cores max<br>              memory: &quot;256Mi&quot;  # 256 MB RAM max<br><br>          # -------------------------------------------------------------------<br>          # Health Checks (CRITICAL for HA!)<br>          # -------------------------------------------------------------------<br>          <br>          # Readiness Probe: Is the pod ready to receive traffic?<br>          # Kubernetes only sends traffic to pods that pass this check<br>          readinessProbe:<br>            httpGet:<br>              path: /health<br>              port: 3000<br>            initialDelaySeconds: 5   # Wait 5s before first check<br>            periodSeconds: 5         # Check every 5s<br>            timeoutSeconds: 3        # Timeout after 3s<br>            successThreshold: 1      # 1 success = ready<br>            failureThreshold: 3      # 3 failures = not ready<br><br>          # Liveness Probe: Is the pod alive and functioning?<br>          # Kubernetes restarts pods that fail this check<br>          livenessProbe:<br>            httpGet:<br>              path: /health<br>              port: 3000<br>            initialDelaySeconds: 15  # Wait 15s before first check<br>            periodSeconds: 20        # Check every 20s<br>            timeoutSeconds: 3        # Timeout after 3s<br>            successThreshold: 1      # 1 success = alive<br>            failureThreshold: 3      # 3 failures = restart pod<br><br>          # 
-------------------------------------------------------------------<br>          # Security Context<br>          # -------------------------------------------------------------------<br>          securityContext:<br>            readOnlyRootFilesystem: false  # App needs to write logs<br>            runAsNonRoot: true<br>            runAsUser: 1001<br>            allowPrivilegeEscalation: false<br><br>      # -----------------------------------------------------------------------<br>      # Pod-level Settings<br>      # -----------------------------------------------------------------------<br>      <br>      # Termination grace period - time to finish requests before killing pod<br>      terminationGracePeriodSeconds: 30<br><br>      # Restart policy (always restart failed containers)<br>      restartPolicy: Always<br><br>      # DNS policy<br>      dnsPolicy: ClusterFirst</pre><h4>6.4 Service (LoadBalancer)</h4><p>Create file: k8s/service.yaml</p><pre># =============================================================================<br># SERVICE (LoadBalancer)<br># =============================================================================<br># A Service provides a stable endpoint for accessing pods.<br># LoadBalancer type creates an AWS Network Load Balancer (NLB) automatically.<br>#<br># ECS Comparison:<br># - Service ≈ ECS Service with ALB/NLB integration<br># - LoadBalancer type ≈ Attaching a load balancer to ECS Service<br># - Kubernetes automatically registers/deregisters pod IPs<br># =============================================================================<br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: video-app<br>  namespace: video-app<br>  labels:<br>    app: video-app<br>  annotations:<br>    # AWS-specific annotations for NLB<br>    service.beta.kubernetes.io/aws-load-balancer-type: &quot;nlb&quot;<br>    service.beta.kubernetes.io/aws-load-balancer-scheme: &quot;internet-facing&quot;<br>    
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: &quot;true&quot;<br>spec:<br>  type: LoadBalancer<br><br>  # Traffic policy - Cluster (the default) spreads traffic across all nodes<br>  # Local would keep traffic on the receiving node and preserve the client IP<br>  externalTrafficPolicy: Cluster<br><br>  # Port configuration<br>  ports:<br>    - name: http<br>      port: 80           # External port (what users access)<br>      targetPort: 3000   # Container port (where app listens)<br>      protocol: TCP<br><br>  # Pod selector - matches pods with these labels<br>  selector:<br>    app: video-app</pre><h4>6.5 Update Deployment with Your ECR URL</h4><p>Before deploying, update the image URL in deployment.yaml:</p><pre># Get your ECR URL<br>cd ../environments/dev<br>ECR_URL=$(terraform output -raw ecr_repository_url)<br>echo &quot;Your ECR URL: $ECR_URL&quot;<br><br># output<br>Your ECR URL:  xxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app<br><br># Update the deployment file<br>cd ../../k8s<br><br># manually edit deployment.yaml and replace <br># REPLACE_WITH_ECR_URL with your actual ECR URL from above.<br># replace in the .bak too<br><br># So your deployment.yaml and .bak files should say something like:<br>      containers:<br>        - name: video-app<br>          # IMPORTANT: Replace with your ECR URL!<br>          # Run: terraform output ecr_repository_url<br>          image: xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/eks-video-tutorial-video-app:latest<br></pre><h3>7. 
Deploy to EKS</h3><p>You should still be in the k8s directory; if not, cd into it now.</p><pre><br># Apply namespace first<br>kubectl apply -f namespace.yaml<br><br># output<br>namespace/video-app created<br><br># Apply deployment<br>kubectl apply -f deployment.yaml<br><br># output<br>deployment.apps/video-app created<br><br># Apply service<br>kubectl apply -f service.yaml<br><br># output<br>service/video-app created</pre><p><strong>7.1 Verify Deployment</strong></p><p>Check pods:</p><pre>kubectl get pods -n video-app -o wide<br><br># output<br>NAME                       READY   STATUS             RESTARTS      AGE   IP           NODE                         NOMINATED NODE   READINESS GATES<br>video-app-6498b5dd57-ckvwn   1/1     Running   0          16s   10.0.2.176   ip-10-0-2-6.ec2.internal     &lt;none&gt;           &lt;none&gt;<br>video-app-6498b5dd57-q6jrj   1/1     Running   0          16s   10.0.1.112   ip-10-0-1-110.ec2.internal   &lt;none&gt;           &lt;none&gt;</pre><p>Check that the pods are in different AZs:</p><pre>kubectl get pods -n video-app -o wide --show-labels | grep -E &quot;NAME|video-app&quot;<br><br># output<br><br>NAME                       READY   STATUS             RESTARTS      AGE    IP           NODE                         NOMINATED NODE   READINESS GATES   LABELS<br>video-app-6498b5dd57-ckvwn   1/1     Running   0          4m47s   10.0.2.176   ip-10-0-2-6.ec2.internal     &lt;none&gt;           &lt;none&gt;            app=video-app,pod-template-hash=6498b5dd57,version=v1.0.0<br>video-app-6498b5dd57-g5b8t   0/1     Running   0          8s      10.0.3.145   ip-10-0-3-7.ec2.internal     &lt;none&gt;           &lt;none&gt;            app=video-app,pod-template-hash=6498b5dd57,version=v1.0.0<br>video-app-6498b5dd57-q6jrj   1/1     Running   0          4m47s   10.0.1.112   ip-10-0-1-110.ec2.internal   &lt;none&gt;           &lt;none&gt;            app=video-app,pod-template-hash=6498b5dd57,version=v1.0.0</pre><p><strong>Check 
service:</strong></p><pre>kubectl get service -n video-app<br><br># output<br>NAME        TYPE           CLUSTER-IP      EXTERNAL-IP                                                                     PORT(S)        AGE<br>video-app   LoadBalancer   172.20.xxx.xx   xxxxxxxxxxxx.elb.us-east-1.amazonaws.com   80:30774/TCP   5m9s</pre><p>You can also run it in watch mode (press Ctrl+C to stop watching):</p><pre>kubectl get service -n video-app -w</pre><h3>8. Access Your Application!</h3><h4>8.1 Test Load Balancing</h4><p><strong>Get the Load Balancer URL:</strong></p><pre>kubectl get service video-app -n video-app -o jsonpath=&#39;{.status.loadBalancer.ingress[0].hostname}&#39;</pre><h4>⚠️Error Fix — Only needed if your image was broken, if not, skip this</h4><p>⚠️ At this point I ran into an error. You may or may not, depending on where you built the local Docker image.</p><p>Although the pods were running, the Docker image had not been built correctly for the EKS nodes, because I built it locally on a macOS Apple Silicon (ARM) chip.</p><p>The fix was easy: rebuild the image for linux/amd64 and push it again:</p><pre>cd ../app<br><br># Get your ECR URL<br>ECR_URL=$(terraform -chdir=../environments/dev output -raw ecr_repository_url)<br><br># Login to ECR<br>aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_URL<br><br># Build for AMD64 (Linux x86_64) - this is what EKS nodes run<br>docker buildx build --platform linux/amd64 -t $ECR_URL:latest -t $ECR_URL:v1.0.0 --push .</pre><p><strong>Note:</strong> The --push flag at the end of the last line builds and pushes in one command.</p><pre># Delete existing pods (deployment will recreate them)<br>kubectl delete pods -n video-app -l app=video-app<br><br>pod &quot;video-app-6498b5dd57-ckvwn&quot; deleted from video-app namespace<br>pod &quot;video-app-6498b5dd57-g5b8t&quot; deleted from video-app namespace<br>pod &quot;video-app-6498b5dd57-q6jrj&quot; deleted from video-app namespace<br><br># Watch new pods start<br>kubectl get pods -n video-app -w</pre><p>Verify:</p><pre># 
Check pods are running<br>kubectl get pods -n video-app<br><br>video-app-6498b5dd57-c8k97   1/1     Running   0          22s<br>video-app-6498b5dd57-nx65r   1/1     Running   0          23s<br>video-app-6498b5dd57-s4tj4   1/1     Running   0          23s<br><br># Check logs - should see the startup message now and/or GET /health<br>kubectl logs -n video-app -l app=video-app<br><br># Test with port-forward<br>POD_NAME=$(kubectl get pods -n video-app -o jsonpath=&#39;{.items[0].metadata.name}&#39;)<br><br>kubectl port-forward -n video-app $POD_NAME 8080:3000</pre><p>Note: the last command will block that terminal window until you press Ctrl+C. Leave it running, because the port-forward only works while that process is alive.</p><p>Then open a new terminal window:</p><pre>curl http://localhost:8080/health<br><br># output<br>{&quot;status&quot;:&quot;healthy&quot;,&quot;timestamp&quot;:&quot;2025-12-14T21:22:58.782Z&quot;,&quot;hostname&quot;:&quot;video-app-6498b5dd57-c8k97&quot;,&quot;uptime&quot;:141.854259452}</pre><h3>9. Verify High Availability</h3><p>Part of the reason we are doing this tutorial series is to demonstrate high availability cloud engineering skills, and we will be doing a lot more soon, now that we have basic infra.</p><p>Let’s first verify our load balancer is set up properly:</p><pre># Set the variable<br>LB_URL=$(kubectl get service video-app -n video-app -o jsonpath=&#39;{.status.loadBalancer.ingress[0].hostname}&#39;)<br><br># Verify it&#39;s set<br>echo &quot;Load Balancer URL: $LB_URL&quot;</pre><h4>9.1 Check Pod Distribution</h4><p>Now let’s make sure we’re spread across several Availability Zones (AZs), so if there is an outage in one AZ, traffic can be diverted to another AZ.</p><pre># Show which AZ each node is in (the pods run on these nodes)<br>kubectl get nodes --show-labels | grep topology.kubernetes.io/zone<br><br># This will output verbose info with the AZ at the end of each entry</pre><h4>9.2 Check Health Endpoints</h4><p>Next, we need to have proper health checks, or else how do we know the AZ went down?</p><pre># Health check<br>curl 
http://$LB_URL/health<br><br># output<br>{&quot;status&quot;:&quot;healthy&quot;,&quot;timestamp&quot;:&quot;2025-12-14T21:33:06.048Z&quot;,&quot;hostname&quot;:&quot;video-app-6498b5dd57-nx65r&quot;,&quot;uptime&quot;:749.358247182}<br><br># Readiness check<br>curl http://$LB_URL/ready<br><br># output<br>{&quot;status&quot;:&quot;ready&quot;,&quot;timestamp&quot;:&quot;2025-12-14T21:33:21.120Z&quot;}<br><br># Server info<br>curl http://$LB_URL/api/info<br><br># output<br>{&quot;app&quot;:&quot;eks-video-app&quot;,&quot;version&quot;:&quot;1.0.0&quot;,&quot;hostname&quot;:&quot;video-app-6498b5dd57-c8k97&quot;,&quot;platform&quot;:&quot;linux&quot;,&quot;nodeVersion&quot;:&quot;v22.21.1&quot;,&quot;environment&quot;:&quot;production&quot;,&quot;podName&quot;:&quot;video-app-6498b5dd57-c8k97&quot;,&quot;nodeName&quot;:&quot;ip-10-0-1-110.ec2.internal&quot;,&quot;timestamp&quot;:&quot;2025-12-14T21:33:32.040Z&quot;}</pre><h4>9.3 Test Pod Recovery</h4><p>Finally, we are ready for a basic HA test!</p><p>Delete one pod and watch Kubernetes recreate it:</p><pre># Get pod names<br>kubectl get pods -n video-app<br><br>NAME                       READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-c8k97   1/1     Running   0          13m<br>video-app-6498b5dd57-nx65r   1/1     Running   0          13m<br>video-app-6498b5dd57-s4tj4   1/1     Running   0          13m<br></pre><h4><strong>9.4 Delete one pod (replace with actual pod name)</strong></h4><pre>kubectl delete pod &lt;POD_NAME&gt; -n video-app<br><br>pod &quot;video-app-6498b5dd57-c8k97&quot; deleted from video-app namespace</pre><h4>9.5 <strong>Watch it get recreated immediately:</strong></h4><pre>kubectl get pods -n video-app -w<br><br>NAME                       READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-nx65r   1/1     Running   0          14m<br>video-app-6498b5dd57-s4tj4   1/1     Running   0          14m<br>video-app-6498b5dd57-w8l9m   1/1     Running   0          22s</pre><h4>9.6 
<strong>Kubernetes’ automatic self-healing</strong>:</h4><ol><li><strong>I deleted</strong> pod video-app-6498b5dd57-c8k97</li><li><strong>The Deployment controller noticed</strong> the replica count dropped from 3 to 2</li><li><strong>Kubernetes immediately created</strong> a new pod video-app-6498b5dd57-w8l9m to maintain the desired state of 3 replicas</li><li><strong>Total time</strong>: ~22 seconds from deletion to new pod running</li></ol><h4><strong>9.7 View the video</strong></h4><pre><br>cd ../environments/dev/<br><br>LB_URL=$(kubectl get service video-app -n video-app -o jsonpath=&#39;{.status.loadBalancer.ingress[0].hostname}&#39;)<br><br>echo &quot;Open in browser: http://$LB_URL&quot;<br><br>Open in browser: http://xxxxxxxxxxxx.elb.us-east-1.amazonaws.com</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AkxrKNmP1gArIlpF-mMQjA.png" /></figure><h4>9.8 Interactive Scaling Tests</h4><pre># Scale up/down (watch self-healing)<br>kubectl scale deployment video-app -n video-app --replicas=5<br>kubectl get pods -n video-app -w<br><br># output<br>NAME                         READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-m2l5f   0/1     Running   0          7s<br>video-app-6498b5dd57-nx65r   1/1     Running   0          28m<br>video-app-6498b5dd57-qz95b   0/1     Running   0          7s<br>video-app-6498b5dd57-s4tj4   1/1     Running   0          28m<br>video-app-6498b5dd57-w8l9m   1/1     Running   0          14m<br>video-app-6498b5dd57-m2l5f   1/1     Running   0          7s<br>video-app-6498b5dd57-qz95b   1/1     Running   0          8s<br><br># Scale back down<br>kubectl scale deployment video-app -n video-app --replicas=3<br><br># output<br>NAME                         READY   STATUS    RESTARTS   AGE<br>video-app-6498b5dd57-nx65r   1/1     Running   0          29m<br>video-app-6498b5dd57-s4tj4   1/1     Running   0          29m<br>video-app-6498b5dd57-w8l9m   1/1     Running   0          15m</pre><h4>9.9 Additional kubectl 
Commands</h4><p>🔍 <strong>Cluster-Wide Diagnostics</strong></p><pre># See all nodes with resource usage<br>kubectl top nodes<br><br># Get detailed node info (capacity, allocatable resources, conditions)<br>kubectl describe nodes<br><br># View all namespaces and what&#39;s running<br>kubectl get all --all-namespaces<br><br># Check cluster events (shows recent issues/warnings)<br>kubectl get events --all-namespaces --sort-by=&#39;.lastTimestamp&#39;</pre><p><strong>📦 Pod Diagnostics</strong></p><pre># See resource usage for your pods<br>kubectl top pods -n video-app<br><br># Get detailed pod info (events, conditions, resource limits)<br>kubectl describe pod &lt;POD_NAME&gt; -n video-app<br><br># View pod logs (live tail)<br>kubectl logs -f &lt;POD_NAME&gt; -n video-app<br><br># Get logs from all pods with a label<br>kubectl logs -l app=video-app -n video-app --tail=50<br><br># Previous container logs (if pod restarted)<br>kubectl logs &lt;POD_NAME&gt; -n video-app --previous<br><br># Execute commands inside a pod (interactive shell)<br>kubectl exec -it &lt;POD_NAME&gt; -n video-app -- /bin/sh</pre><p>🌐 <strong>Service &amp; Networking</strong></p><pre># See all services and their endpoints<br>kubectl get svc --all-namespaces<br><br># Check which pods are behind your service<br>kubectl get endpoints video-app -n video-app<br><br># Describe your LoadBalancer service (shows events, selectors)<br>kubectl describe svc video-app -n video-app<br><br># Port-forward to test a pod directly (bypasses LoadBalancer)<br>kubectl port-forward &lt;POD_NAME&gt; -n video-app 8080:3000</pre><p>🎯 <strong>Deployment &amp; ReplicaSet</strong></p><pre># See deployment status and history<br>kubectl rollout status deployment/video-app -n video-app<br>kubectl rollout history deployment/video-app -n video-app<br><br># See the ReplicaSet managing your pods<br>kubectl get rs -n video-app<br><br># Check HPA (Horizontal Pod Autoscaler) if you had one<br>kubectl get hpa -n 
video-app</pre><p><strong>🔐 RBAC &amp; Security</strong></p><pre># Check what permissions you have<br>kubectl auth can-i --list -n video-app<br><br># View secrets (names only for security)<br>kubectl get secrets -n video-app</pre><p>📊 <strong>Resource Quotas &amp; Limits</strong></p><pre># Check if there are resource quotas<br>kubectl get resourcequota -n video-app<br><br># See limit ranges<br>kubectl get limitrange -n video-app<br><br># View pod resource requests and limits<br>kubectl describe pod &lt;POD_NAME&gt; -n video-app | grep -A 5 &quot;Limits:\|Requests:&quot;</pre><h3>10. Cleanup</h3><pre># Delete the video app<br>kubectl delete -f ~/eks-video-tutorial/k8s/<br><br># Destroy all infrastructure<br># (cd into environments/dev first if you are not already there)<br>cd environments/dev <br><br>terraform destroy</pre><p><strong>🚨⚠️ CRITICAL</strong>: <strong>After running </strong><strong>terraform destroy</strong><strong>, double-check in the AWS console that all resources are really gone (especially the EKS cluster, ELB, and EC2 instances)!</strong> There are cases where people saw no error, some resources were not destroyed, and they got a big bill later, so make sure! 
If you neglect to destroy this, there will be ongoing charges, which could be as much as $10–$20 per day.</p><h3>Looking Ahead…</h3><p><strong>Part 4:</strong> ✅ “I need my Amazon EKS cluster to handle traffic spikes and variable traffic loads”</p><p><strong>In Article 4, we’ll continue with more measures for improved AWS EKS cluster admin and video delivery with high availability:</strong></p><ol><li><strong>CloudFront CDN distribution</strong> integrated with your application</li><li><strong>Metrics Server installation </strong>(required for HPA)</li><li><strong>Horizontal Pod Autoscaler (HPA) </strong>configured for 3–10 pods</li><li><strong>Load testing setup </strong>using the `hey` tool</li><li><strong>CPU-based autoscaling </strong>triggers (70% threshold)</li><li><strong>Real-time monitoring</strong> of pod scaling behavior</li><li><strong>Experience the “Pending pods”</strong> problem (node capacity limits)</li></ol><p>🛠️ Get more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>.</p><p>🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p><strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Savings:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. Also previously held AWS Solutions Architect certification.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e4459ad9ecc0" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon EKS (K8s) Media Cluster: Part 2— Deploy Initial Terraform Multi-AZ EKS Cluster]]></title>
            <link>https://medium.com/@csjcode/amazon-eks-k8s-media-cluster-part-2-deploy-initial-terraform-multi-az-eks-cluster-e1a87efc9925?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/e1a87efc9925</guid>
            <category><![CDATA[scalability]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[aws-certification]]></category>
            <category><![CDATA[aws-eks]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Mon, 15 Dec 2025 23:34:58 GMT</pubDate>
            <atom:updated>2025-12-18T15:24:31.358Z</atom:updated>
            <content:encoded><![CDATA[<h4>🚀Amazon EKS + Terraform, Kubernetes high availability prep for serving media from scalable clusters</h4><p>✅ “I need to deploy my basic Terraform state/structure and an Amazon EKS platform so we can prepare for a high availability media cluster”</p><p><strong>This is Part 2 of a fun ongoing project</strong> to advance with Amazon EKS skills as a Cloud Engineer pro:</p><ul><li>Master skills for <strong>production-grade EKS</strong> at scale (the #1 way companies run Kubernetes going into 2026)</li><li>Deep <strong>Infrastructure as Code</strong> with Terraform (serious companies use it)</li><li>Learn <strong>multi-AZ high availability</strong> the right way (VPC, subnets, load balancers, node placement)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*BBm6aSCEjn3WFpHe87eZOw.jpeg" /></figure><p><strong>Possible future article topics (I’m still preparing these):</strong></p><ul><li>Understand <strong>EKS-optimized node groups</strong>, Karpenter vs Cluster Autoscaler trade-offs</li><li>Work with <strong>EKS Pod Identity (Agent)</strong> — critical for secure media apps talking to S3, DynamoDB, CloudFront, etc.</li><li>Build muscle memory: <strong>kubectl, helm, kustomize, eksctl, AWS CLI</strong> daily</li><li>Set up <strong>monitoring &amp; logging</strong> foundations (CloudWatch, Prometheus)</li><li>Get comfortable with <strong>ALB/NLB Ingress</strong>, cert-manager, external-dns — exactly what media sites need</li><li>We may look later at <strong>GPU workloads</strong> (video encoding/transcoding nodes) and <strong>high-IOPS storage</strong> (EBS gp3, EFS for shared media)</li><li>If we have time may even try some other things like Amazon EKS Capabilities “a layered set of fully managed cluster features that help accelerate developer velocity” .</li></ul><p><strong>If you missed </strong><a 
href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-1-initial-setup-roadmap-176bdb085d32"><strong>Part 1</strong></a><strong>,</strong> you do need to do the instructions in that article <strong>before you start this.</strong> I will refer to that config multiple times.</p><p><strong>✅ PART 1 </strong><a href="https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-1-initial-setup-roadmap-176bdb085d32"><strong>Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap</strong></a></p><p>It includes setup of AWS CLI, Terraform, kubectl (K8s), Docker and ECS info and some testing .</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XnLS1SS4jyC1nnpXk8FsqA.png" /><figcaption>A view from later in the article after we deployed</figcaption></figure><h3>🪧Roadmap</h3><p><strong>In this article, Article 2, we’ll build real Amazon EKS infrastructure with Terraform:</strong></p><ol><li><strong>Project Setup.</strong> Create folder structure (10m)</li><li><strong>Terraform backend. </strong>S3 bucket + DynamoDB for state locking (10m)</li><li><strong>Build a multi-AZ VPC. </strong>Subnets across 3 availability zones (10m)</li><li><strong>Provision EKS cluster. </strong>Control plane + managed node group (10m)</li><li><strong>Deploy first node group. </strong>2 × t3.small instances spread across AZs. (5m)</li><li><strong>Connect kubectl to cluster. </strong>Run kubectl get nodes and see nodes! (5m)</li><li><strong>Cleanup. Destroy all resources </strong>to avoid charges. 
(5m)</li></ol><p>Keep in mind, we are still in the early stages, we will get more advanced as the series continues.</p><p><strong>🔥 What you can do after this part:</strong></p><ul><li>Run kubectl get nodes and see your 3 worker nodes</li><li>Understand Multi-AZ high availability architecture</li><li>Deploy resources to your cluster using kubectl</li><li>Rebuild entire infrastructure in minutes with terraform apply</li></ul><p><strong>⚡️Technical skills gained:</strong></p><ul><li>Terraform state management</li><li>VPC networking (CIDR blocks, subnets, routing)</li><li>EKS cluster configuration</li><li>Node group management</li><li>kubectl authentication setup</li></ul><p><strong>Estimated time:</strong> 45–60 minutes hands-on</p><p><strong>💰Estimated cost:</strong> ~$1, as of writing this article, it’s about $1 or less if you complete and destroy it right away, or ~$1–2 within a few hours. Amazon EKS nodes are standard Amazon EC2 instances and load balancer, so <strong>you pay for those hourly</strong>, and also the EKS managed service.</p><h4>⚠️ <strong>IMPORTANT: Cost variables to be aware of</strong></h4><p><strong>To set up cost guardrails and AWS Budgets alerts see my articles:</strong></p><ul><li><a href="https://medium.com/cloud-cost-savings/12-aws-cost-alerts-to-use-right-now-a082795f3858"><strong>12 AWS Cost Alerts To Use Right Now</strong></a></li><li><a href="https://medium.com/cloud-cost-savings/aws-cost-savings-playbook-2-reports-tracking-heart-soul-of-cost-control-d7ff2d926e72">AWS Cost Savings Playbook (#2): REPORTS/Tracking, Heart &amp; Soul of Cost Control</a></li></ul><p><strong>1. I initially used an earlier version of Kubernetes that was in “extended support” by AWS. It costs more</strong>, ~$0.60/hour — so now I have updated the code in the current article to the newest version of <strong>Kubernetes 1.34</strong> (as I write this) which should charge approx $0.10/hour (as of writing) which should keep costs lower.</p><p><strong>2. 
If you keep the build up longer it will cost more</strong>, and the charges keep accruing until you destroy it. It may cost as much as $10–20 per day, so do not leave it running. Use terraform destroy and confirm in the AWS console.</p><p><strong>3. Removing resources is your responsibility, not automatic, </strong>so<strong> double-check in the AWS console</strong> after you run <strong>terraform destroy</strong>. <strong>I recently had a case where it did not delete the ELBs on any of 3 cycles</strong>, and it cost me a couple of dollars extra by the time I noticed a couple of days later.</p><p>4. 🚨<strong>At the end of this article we </strong>will remind you to destroy resources when you’re done for the day! Always remember to do this!</p><p>🛠️ Get more like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><h3>1. 
Project Setup &amp; AWS CLI: Create folder structure, configure Terraform backend, AWS CLI</h3><p>We’re going to keep a project structure based on this roadmap I’m starting out with.</p><p>Familiarize yourself with these files and filenames; each serves a purpose in keeping things organized.</p><pre>eks-video-tutorial/<br>├── .gitignore<br>├── environments/<br>│   └── dev/<br>│       ├── main.tf           # Root module - calls other modules<br>│       ├── variables.tf      # Input variables<br>│       ├── outputs.tf        # Output values<br>│       ├── providers.tf      # AWS provider configuration<br>│       ├── backend.tf        # S3 backend configuration<br>│       └── terraform.tfvars  # Variable values (not committed to git)<br>├── modules/                  # (Future use - Article 3+)<br>└── README.md</pre><ul><li><strong>Git. </strong>Before going too far, I recommend you create a <strong>GitHub repo </strong>or other git repo to track your changes. You will want to keep this code.</li><li><strong>And perhaps make a branch for each article </strong>so you can easily return to your place and keep it organized</li></ul><p><strong>This is the .gitignore file </strong>I am using (some extra ones in there!):</p><pre># ==========================<br># Terraform<br># ==========================<br>.terraform/<br>*.tfstate<br>*.tfstate.*<br>crash.log<br>crash.*.log<br>*.tfvars<br>*.tfvars.json<br>override.tf<br>override.tf.json<br>*_override.tf<br>*_override.tf.json<br>.terraformrc<br>terraform.rc<br>.terraform.lock.hcl<br><br># ==========================<br># AWS<br># ==========================<br>.aws/<br>*.pem<br>.aws-sam/<br>samconfig.toml<br><br># ==========================<br># Node.js<br># ==========================<br>node_modules/<br>npm-debug.log*<br>yarn-debug.log*<br>yarn-error.log*<br>.npm<br>.yarn-integrity<br>dist/<br>build/<br>.cache/<br><br># ==========================<br># Environment &amp; Secrets<br># 
==========================<br>.env<br>.env.*<br>!.env.example<br>*.key<br>*.crt<br>secrets.json<br><br># ==========================<br># IDE &amp; Editors<br># ==========================<br>.idea/<br>.vscode/<br>*.swp<br>*.swo<br>*~<br><br># ==========================<br># OS Files<br># ==========================<br>.DS_Store<br>Thumbs.db<br><br># ==========================<br># Logs &amp; Coverage<br># ==========================<br>logs/<br>*.log<br>coverage/<br>.nyc_output/</pre><p>Put that at the top level in .gitignore so you do not check in those files.</p><p>Architecture diagram of what we are building:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MMcqvekQPwNpXKYcat_TBA.png" /><figcaption>What we are building in this article with Terraform and Amazon EKS (initial build-out)</figcaption></figure><h3>2. Terraform Backend: S3 bucket + DynamoDB for state locking</h3><p><strong>The Terraform state file</strong> is a record of the resources Terraform has created, modified, or destroyed in your infrastructure.</p><p><strong>Without the state file, Terraform would have no context about existing infrastructure,</strong> leading to potential duplication or errors.</p><h4>2.1 Why Remote State?</h4><p>We are going to set up a<strong> remote/cloud state file in S3 with DynamoDB.</strong></p><p>This is something I always do, and it is an enterprise best practice.</p><p>Using local state (the default behavior, where the state is stored in a terraform.tfstate file on your machine) works fine for solo development or small projects, but is not recommended for team projects.</p><p>I think you should set up the remote state I show here; it only takes a few minutes and is more pro.</p><p>Downsides of a local TF state file:</p><ul><li><strong>Team. </strong>Multiple engineers might need to work on the same infrastructure. Local state means each person has their own copy, which can lead to conflicts, out-of-sync states, or accidental overwrites</li><li><strong>Security. 
</strong>Local state files often contain sensitive information, such as AWS resource IDs, secrets, or connection details.</li><li><strong>Backup. </strong>Cloud backups of the state file are important.</li></ul><p>What we are going to do with the<strong> remote state file:</strong></p><ol><li><strong>Use S3 to store the state file.</strong> S3 provides durable, versioned, and encrypted storage.</li><li><strong>DynamoDB for state locking. </strong>This is primarily used for making sure there are no conflicts between engineers simultaneously making changes.</li></ol><p>Both are important for getting the basics down with Amazon EKS and for more complex Terraform setups later, so don’t skip this step.</p><h4>2.3 Create state file bootstrap directory</h4><p>We have to run this one time, before the main deploy we are doing later.</p><pre>mkdir -p backend-bootstrap<br>cd backend-bootstrap<br><br># If you did not do this from Article 1 - we removed this test folder<br># just to prevent any confusion<br><br>rm -Rf eks-tutorial-test</pre><h4>2.4 Create Backend Resources</h4><p>We need to create a .tf file that creates the resources required for the remote state file. 
We’re only running this once hopefully.</p><p>Create backend-bootstrap/main.tf</p><p>🚨⚠️ <strong>The code below uses my region,</strong> confirm/change to match yours.</p><pre><br># =============================================================================<br># TERRAFORM BACKEND BOOTSTRAP<br># =============================================================================<br># This configuration creates the S3 bucket and DynamoDB table needed to store<br># Terraform state for our main EKS project.<br>#<br># Run this ONCE, then use these resources as the backend for all other configs.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Terraform Configuration<br># -----------------------------------------------------------------------------<br>terraform {<br>  required_version = &quot;&gt;= 1.10.0&quot;<br><br>  required_providers {<br>    aws = {<br>      source  = &quot;hashicorp/aws&quot;<br>      version = &quot;~&gt; 5.0&quot;<br>    }<br>  }<br><br>  # Note: This bootstrap uses LOCAL state intentionally!<br>  # We can&#39;t use remote state to create the remote state bucket.<br>}<br><br># -----------------------------------------------------------------------------<br># AWS Provider<br>#<br># ⚠️ CHANGE the region to yours<br># -----------------------------------------------------------------------------<br>provider &quot;aws&quot; {<br>  region = &quot;us-east-1&quot;<br><br>  default_tags {<br>    tags = {<br>      Project     = &quot;eks-video-tutorial&quot;<br>      Environment = &quot;dev&quot;<br>      ManagedBy   = &quot;terraform&quot;<br>      Purpose     = &quot;terraform-backend&quot;<br>    }<br>  }<br>}<br><br># -----------------------------------------------------------------------------<br># Data Sources<br># -----------------------------------------------------------------------------<br><br># Get current AWS account ID for 
unique bucket naming<br>data &quot;aws_caller_identity&quot; &quot;current&quot; {}<br><br># Get current region<br>data &quot;aws_region&quot; &quot;current&quot; {}<br><br># -----------------------------------------------------------------------------<br># Local Variables<br># -----------------------------------------------------------------------------<br>locals {<br>  account_id  = data.aws_caller_identity.current.account_id<br>  region      = data.aws_region.current.name<br>  bucket_name = &quot;eks-tutorial-tfstate-${local.account_id}&quot;<br>  table_name  = &quot;eks-tutorial-terraform-locks&quot;<br>}<br><br># -----------------------------------------------------------------------------<br># S3 Bucket for Terraform State<br># -----------------------------------------------------------------------------<br><br># Create the S3 bucket<br>resource &quot;aws_s3_bucket&quot; &quot;terraform_state&quot; {<br>  bucket = local.bucket_name<br><br>  # Prevent accidental deletion of this bucket<br>  lifecycle {<br>    prevent_destroy = false  # Set to true in production!<br>  }<br><br>  tags = {<br>    Name        = local.bucket_name<br>    Description = &quot;Terraform state storage for EKS tutorial&quot;<br>  }<br>}<br><br># Enable versioning - allows recovery from bad applies or accidental deletions<br>resource &quot;aws_s3_bucket_versioning&quot; &quot;terraform_state&quot; {<br>  bucket = aws_s3_bucket.terraform_state.id<br><br>  versioning_configuration {<br>    status = &quot;Enabled&quot;<br>  }<br>}<br><br># Enable server-side encryption by default<br>resource &quot;aws_s3_bucket_server_side_encryption_configuration&quot; &quot;terraform_state&quot; {<br>  bucket = aws_s3_bucket.terraform_state.id<br><br>  rule {<br>    apply_server_side_encryption_by_default {<br>      sse_algorithm = &quot;AES256&quot;<br>    }<br>    bucket_key_enabled = true<br>  }<br>}<br><br># Block ALL public access - state files should never be public!<br>resource 
&quot;aws_s3_bucket_public_access_block&quot; &quot;terraform_state&quot; {<br>  bucket = aws_s3_bucket.terraform_state.id<br><br>  block_public_acls       = true<br>  block_public_policy     = true<br>  ignore_public_acls      = true<br>  restrict_public_buckets = true<br>}<br><br># Bucket policy to enforce SSL/TLS connections only<br>resource &quot;aws_s3_bucket_policy&quot; &quot;terraform_state&quot; {<br>  bucket = aws_s3_bucket.terraform_state.id<br><br>  policy = jsonencode({<br>    Version = &quot;2012-10-17&quot;<br>    Statement = [<br>      {<br>        Sid       = &quot;EnforceTLS&quot;<br>        Effect    = &quot;Deny&quot;<br>        Principal = &quot;*&quot;<br>        Action    = &quot;s3:*&quot;<br>        Resource = [<br>          aws_s3_bucket.terraform_state.arn,<br>          &quot;${aws_s3_bucket.terraform_state.arn}/*&quot;<br>        ]<br>        Condition = {<br>          Bool = {<br>            &quot;aws:SecureTransport&quot; = &quot;false&quot;<br>          }<br>        }<br>      }<br>    ]<br>  })<br>}<br><br># -----------------------------------------------------------------------------<br># DynamoDB Table for State Locking<br># -----------------------------------------------------------------------------<br><br># This table prevents concurrent Terraform runs from corrupting state<br>resource &quot;aws_dynamodb_table&quot; &quot;terraform_locks&quot; {<br>  name         = local.table_name<br>  billing_mode = &quot;PAY_PER_REQUEST&quot;  # Only pay for what you use (cheaper for low usage)<br>  hash_key     = &quot;LockID&quot;           # Required by Terraform - do not change!<br><br>  attribute {<br>    name = &quot;LockID&quot;<br>    type = &quot;S&quot;  # String type<br>  }<br><br>  tags = {<br>    Name        = local.table_name<br>    Description = &quot;Terraform state locking for EKS tutorial&quot;<br>  }<br>}<br><br># -----------------------------------------------------------------------------<br># Outputs<br># 
-----------------------------------------------------------------------------<br><br>output &quot;s3_bucket_name&quot; {<br>  description = &quot;Name of the S3 bucket for Terraform state&quot;<br>  value       = aws_s3_bucket.terraform_state.id<br>}<br><br>output &quot;s3_bucket_arn&quot; {<br>  description = &quot;ARN of the S3 bucket&quot;<br>  value       = aws_s3_bucket.terraform_state.arn<br>}<br><br>output &quot;s3_bucket_region&quot; {<br>  description = &quot;Region of the S3 bucket&quot;<br>  value       = local.region<br>}<br><br>output &quot;dynamodb_table_name&quot; {<br>  description = &quot;Name of the DynamoDB table for state locking&quot;<br>  value       = aws_dynamodb_table.terraform_locks.name<br>}<br><br>output &quot;dynamodb_table_arn&quot; {<br>  description = &quot;ARN of the DynamoDB table&quot;<br>  value       = aws_dynamodb_table.terraform_locks.arn<br>}<br><br># Output the backend configuration block to copy into other projects<br>output &quot;backend_config&quot; {<br>  description = &quot;Backend configuration to use in other Terraform projects&quot;<br>  value       = &lt;&lt;-EOT<br><br>    # Copy this into your backend.tf file:<br>    terraform {<br>      backend &quot;s3&quot; {<br>        bucket         = &quot;${aws_s3_bucket.terraform_state.id}&quot;<br>        key            = &quot;environments/dev/terraform.tfstate&quot;<br>        region         = &quot;${local.region}&quot;<br>        dynamodb_table = &quot;${aws_dynamodb_table.terraform_locks.name}&quot;<br>        encrypt        = true<br>      }<br>    }<br><br>  EOT<br>}</pre><p>⚠️ <strong>The code uses my region,</strong> confirm/change code to match yours.</p><h4>2.5 Make sure your AWS CLI is configured</h4><p>Terraform uses the same AWS credentials and profile as your AWS CLI to create resources.</p><p>Therefore, you need to make sure you are logged in correctly.</p><pre># log in to the IAM user for the CLI<br><br># if you cannot remember the name of yours<br>aws configure list-profiles<br><br># use your 
tutorial creds setup earlier (I used profile &quot;terraform-eks-admin&quot;)<br>export AWS_PROFILE=terraform-eks-admin<br><br># verify what account you are in - if issues see Part 1 article.<br>aws sts get-caller-identity<br><br># output<br><br>{<br>    &quot;UserId&quot;: &quot;xxxxxxxxxxxx&quot;,<br>    &quot;Account&quot;: &quot;xxxxxxxxxxxxx&quot;,<br>    &quot;Arn&quot;: &quot;arn:aws:iam::xxxxxxxxxxxxx:user/terraform-eks-admin&quot;<br>}<br><br></pre><h4>2.6 Apply Backend Bootstrap</h4><ul><li>Commands to run: terraform init, terraform apply</li><li>Expected output description</li><li>Note the bucket name for next step</li></ul><pre><br># Reminder of what Terraform I am using<br>terraform -v<br>Terraform v1.14.1<br><br># Make sure you are in the backend-bootstrap directory<br>$ cd ./backend-bootstrap/<br><br># Initialize Terraform (do this again, we deleted the one from last article) <br>terraform init<br><br># Output<br><br>Initializing the backend...<br>Initializing provider plugins...<br>- Finding hashicorp/aws versions matching &quot;~&gt; 5.0&quot;...<br>- Installing hashicorp/aws v5.100.0...<br>- Installed hashicorp/aws v5.100.0 (signed by HashiCorp)<br>Terraform has created a lock file .terraform.lock.hcl to record the provider<br>selections it made above. 
Include this file in your version control repository<br>so that Terraform can guarantee to make the same selections by default when<br>you run &quot;terraform init&quot; in the future.<br><br>Terraform has been successfully initialized!</pre><p>Next we need to run terraform plan which runs through our code and basically is a “preview” that checks if there are any errors in the code.</p><p>Remember that you need to be in the same backend-bootstrap directory still!</p><p>You will see some output like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/1*zUYmXSr6vQ4_oIZVLlZpqg.png" /><figcaption>Near the top</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/502/1*jJsfo0APSjsIye31jZ4qxQ.png" /><figcaption>At the bottom of the output</figcaption></figure><p>Next we will use terraform apply which is required to actually create the resources.</p><pre>terraform apply<br><br># when prompted input yes to continue<br><br>Do you want to perform these actions?<br>  Terraform will perform the actions described above.<br>  Only &#39;yes&#39; will be accepted to approve.<br><br>  Enter a value: yes<br><br># Output<br><br># You will see all the resources being created and at the end <br><br>Apply complete! Resources: 6 added, 0 changed, 0 destroyed.</pre><p><strong>Did you get an error?</strong> A funny thing happened to me while writing this, I actually had several CLIs open for various aspects of this project, but one was not updated to use the new EKS admin account we created in the previous article so it launched resources in <em>another</em> account!</p><p>Check your account: aws sts get-caller-identity</p><p><strong>Good news! 
It’s super-easy to fix this mistake if you made it too:</strong> just run terraform destroy from the <strong>same CLI</strong> session that was using the wrong account, and it will tear down the resources that were created in that account.</p><p>Then switch to your CLI session in the correct account and re-run terraform apply.</p><p>More error troubleshooting is below.</p><h4>2.7 Error Troubleshooting</h4><p>If you saw an <strong>error</strong> at any point, it could be due to the following:</p><ul><li><strong>Make sure your AWS CLI is using the correct profile.</strong> That is the most common issue, because Terraform uses that profile. Logged into the incorrect account? Change AWS CLI profiles (see the previous article in this series).</li><li><strong>If checking the Amazon Console UI for resources,</strong> remember to be in the correct region for S3 and DynamoDB.</li><li><strong>Resources created in wrong region. </strong>The code above uses my region; you may need to change it to match yours.</li><li><strong>No valid credential sources found for AWS Provider.</strong> Run aws configure to set up credentials.</li><li><strong>ExpiredToken or InvalidClientTokenId. </strong>Your AWS credentials have expired (common with SSO/temporary credentials); log in again.</li><li><strong>creating S3 Bucket: BucketAlreadyExists. </strong>The account ID suffix usually prevents this, but if it happens, modify local.bucket_name to add a unique identifier.</li><li><strong>AccessDenied or UnauthorizedAccess. </strong>Your IAM user/role needs access to S3 and DynamoDB. Make sure your AWS CLI is using the correct profile.</li><li><strong>Unsupported Terraform Core version. 
</strong>Check terraform -v; the bootstrap code above requires Terraform 1.10.0 or newer.</li><li><strong>Error acquiring the state lock.</strong> Happens when another Terraform process is already running against the same state.</li></ul><h4>2.8 Review Resources in the AWS Console</h4><p>Let’s quickly check in the console to make sure we see our resources.</p><ul><li>Make sure you are looking in the correct region in the UI!</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*N0upfliFq1REbyZEcCfO_w.png" /><figcaption>this is in us-east-1 for me</figcaption></figure><p>⚠️ Verify billing mode is “On-demand”</p><p>Estimated costs for this (may vary based on factors like region):</p><ul><li><strong>S3:</strong> ~$0.023/GB/month for storage + $0.0004 per 1,000 requests</li><li><strong>DynamoDB (On-Demand):</strong> ~$0.25 per million writes, $0.25 per million reads</li></ul><p>And let’s check S3:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*gXgfQO56JoS3RxkDTCVE4Q.png" /></figure><p>Good to go, let’s continue!</p><h4>2.9 Copy Backend Config</h4><p>You may have seen that the output of apply prints the backend config. 
We’ll be needing that, and you can get it again here:</p><pre><br>terraform output backend_config<br><br># Copy this into your backend.tf file:<br>terraform {<br>  backend &quot;s3&quot; {<br>    bucket         = &quot;eks-tutorial-tfstate-[your account id]&quot;<br>    key            = &quot;environments/dev/terraform.tfstate&quot;<br>    region         = &quot;us-east-1&quot;<br>    dynamodb_table = &quot;eks-tutorial-terraform-locks&quot;<br>    encrypt        = true<br>  }<br>}</pre><p>As another check, let’s make sure you still have a local state file:</p><pre># Check that the state file exists<br>ls -la ~/eks-video-tutorial/backend-bootstrap/<br><br># You should see this in the backend-bootstrap dir:<br><br>-rw-r--r--  main.tf<br>-rw-r--r--  terraform.tfstate<br>-rw-r--r--  terraform.tfstate.backup<br>drwxr-xr-x  .terraform/<br>-rw-r--r--  .terraform.lock.hcl</pre><p>Now we will set up the dev environment for our main Terraform resources, starting with our Multi-AZ VPC.</p><h3>3. VPC Infrastructure: Multi-AZ VPC with public/private subnets</h3><h4>3.1 Create the Project Directory Structure</h4><pre># Create the environments/dev directory - FROM your project root!<br><br># switch to your project root and then do this:<br>mkdir -p environments/dev<br>cd environments/dev</pre><h4>3.2 Backend Configuration</h4><p>Next I want to configure Terraform to use the S3 backend we just created.</p><p>Confirm your CLI is still in the correct account: aws sts get-caller-identity</p><p>File to create: environments/dev/backend.tf</p><p>You need to copy in the info you got above from the output of the previous apply:</p><pre># =============================================================================<br># TERRAFORM BACKEND CONFIGURATION<br># =============================================================================<br># This configures Terraform to store state in S3 with DynamoDB locking.<br># The S3 bucket and DynamoDB table were created by the backend-bootstrap 
config.<br>#<br># IMPORTANT: Update the bucket name to match YOUR account ID!<br># =============================================================================<br><br>terraform {<br>  backend &quot;s3&quot; {<br>    # S3 bucket name - replace XXXXXXXXXXXX with your AWS account ID<br>    # Or copy the exact bucket name from your backend-bootstrap output<br>    bucket = &quot;eks-tutorial-tfstate-XXXXXXXXXXXX&quot;<br><br>    # Path within the bucket for this environment&#39;s state file<br>    key = &quot;environments/dev/terraform.tfstate&quot;<br><br>    # Region where the bucket exists<br>    region = &quot;us-east-1&quot;<br><br>    # DynamoDB table for state locking<br>    dynamodb_table = &quot;eks-tutorial-terraform-locks&quot;<br><br>    # Encrypt state file at rest<br>    encrypt = true<br>  }<br>}</pre><p>When that is done, continue.</p><h4>3.3 Provider Configuration</h4><p>Create a file for all the providers (AWS, Kubernetes, TLS, and Time) we need for deploying EKS: environments/dev/providers.tf</p><pre># =============================================================================<br># TERRAFORM AND PROVIDER CONFIGURATION<br># =============================================================================<br># Configures Terraform version requirements and all providers needed for<br># deploying EKS infrastructure.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Terraform Settings<br># -----------------------------------------------------------------------------<br>terraform {<br>  required_version = &quot;&gt;= 1.5.0&quot;<br><br>  required_providers {<br>    # AWS Provider - for all AWS resources<br>    aws = {<br>      source  = &quot;hashicorp/aws&quot;<br>      version = &quot;~&gt; 5.0&quot;<br>    }<br><br>    # Kubernetes Provider - for K8s resources (used later)<br>    kubernetes = {<br>      source  = &quot;hashicorp/kubernetes&quot;<br>      
version = &quot;~&gt; 2.23&quot;<br>    }<br><br>    # TLS Provider - for certificate handling<br>    tls = {<br>      source  = &quot;hashicorp/tls&quot;<br>      version = &quot;~&gt; 4.0&quot;<br>    }<br><br>    # Time Provider - for adding delays when needed<br>    time = {<br>      source  = &quot;hashicorp/time&quot;<br>      version = &quot;~&gt; 0.9&quot;<br>    }<br>  }<br>}<br><br># -----------------------------------------------------------------------------<br># AWS Provider<br># -----------------------------------------------------------------------------<br>provider &quot;aws&quot; {<br>  region = var.aws_region<br><br>  # Default tags applied to ALL resources created by this configuration<br>  # This is a best practice for cost tracking and resource management<br>  default_tags {<br>    tags = {<br>      Project     = var.project_name<br>      Environment = var.environment<br>      ManagedBy   = &quot;terraform&quot;<br>      Repository  = &quot;eks-video-tutorial&quot;<br>    }<br>  }<br>}<br><br># -----------------------------------------------------------------------------<br># Kubernetes Provider<br># -----------------------------------------------------------------------------<br># This provider is configured to authenticate with our EKS cluster.<br># It uses the AWS CLI to get a token for authentication.<br>#<br># NOTE: This will show a warning during the first run because the cluster<br># doesn&#39;t exist yet. 
This is normal and expected.<br># -----------------------------------------------------------------------------<br>provider &quot;kubernetes&quot; {<br>  # Only configure if the cluster exists<br>  host                   = try(module.eks.cluster_endpoint, null)<br>  cluster_ca_certificate = try(base64decode(module.eks.cluster_certificate_authority_data), null)<br><br>  # Use AWS CLI to get authentication token<br>  exec {<br>    api_version = &quot;client.authentication.k8s.io/v1beta1&quot;<br>    command     = &quot;aws&quot;<br>    args = [<br>      &quot;eks&quot;,<br>      &quot;get-token&quot;,<br>      &quot;--cluster-name&quot;,<br>      var.cluster_name,<br>      &quot;--region&quot;,<br>      var.aws_region<br>    ]<br>  }<br>}</pre><h4>3.4 Variables Definition</h4><p>This file defines all input variables for our infrastructure.</p><p>Create the file environments/dev/variables.tf</p><pre># =============================================================================<br># INPUT VARIABLES<br># =============================================================================<br># These variables allow customization of the infrastructure.<br># Default values are set for the tutorial, but can be overridden in<br># terraform.tfvars or via command line.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># General Configuration<br># -----------------------------------------------------------------------------<br><br>variable &quot;aws_region&quot; {<br>  description = &quot;AWS region to deploy resources&quot;<br>  type        = string<br>  default     = &quot;us-east-1&quot;<br>}<br><br>variable &quot;project_name&quot; {<br>  description = &quot;Name of the project - used for resource naming and tagging&quot;<br>  type        = string<br>  default     = &quot;eks-video-tutorial&quot;<br>}<br><br>variable &quot;environment&quot; {<br>  description 
= &quot;Environment name (dev, staging, prod)&quot;<br>  type        = string<br>  default     = &quot;dev&quot;<br>}<br><br># -----------------------------------------------------------------------------<br># VPC Configuration<br># -----------------------------------------------------------------------------<br><br>variable &quot;vpc_cidr&quot; {<br>  description = &quot;CIDR block for the VPC&quot;<br>  type        = string<br>  default     = &quot;10.0.0.0/16&quot;<br><br>  validation {<br>    condition     = can(cidrhost(var.vpc_cidr, 0))<br>    error_message = &quot;VPC CIDR must be a valid IPv4 CIDR block.&quot;<br>  }<br>}<br><br>variable &quot;availability_zones&quot; {<br>  description = &quot;List of availability zones to use for subnets&quot;<br>  type        = list(string)<br>  default     = [&quot;us-east-1a&quot;, &quot;us-east-1b&quot;, &quot;us-east-1c&quot;]<br><br>  validation {<br>    condition     = length(var.availability_zones) &gt;= 2<br>    error_message = &quot;At least 2 availability zones are required for high availability.&quot;<br>  }<br>}<br><br>variable &quot;private_subnet_cidrs&quot; {<br>  description = &quot;CIDR blocks for private subnets (one per AZ)&quot;<br>  type        = list(string)<br>  default     = [&quot;10.0.1.0/24&quot;, &quot;10.0.2.0/24&quot;, &quot;10.0.3.0/24&quot;]<br>}<br><br>variable &quot;public_subnet_cidrs&quot; {<br>  description = &quot;CIDR blocks for public subnets (one per AZ)&quot;<br>  type        = list(string)<br>  default     = [&quot;10.0.101.0/24&quot;, &quot;10.0.102.0/24&quot;, &quot;10.0.103.0/24&quot;]<br>}<br><br>variable &quot;enable_nat_gateway&quot; {<br>  description = &quot;Enable NAT Gateway for private subnet internet access&quot;<br>  type        = bool<br>  default     = true<br>}<br><br>variable &quot;single_nat_gateway&quot; {<br>  description = &quot;Use a single NAT Gateway (cost savings for dev, not HA)&quot;<br>  type        = bool<br>  default     = true  # Set to false in 
production for HA<br>}<br><br># -----------------------------------------------------------------------------<br># EKS Cluster Configuration<br># -----------------------------------------------------------------------------<br><br>variable &quot;cluster_name&quot; {<br>  description = &quot;Name of the EKS cluster&quot;<br>  type        = string<br>  default     = &quot;eks-video-cluster&quot;<br><br>  validation {<br>    condition     = can(regex(&quot;^[a-zA-Z][a-zA-Z0-9-]*$&quot;, var.cluster_name))<br>    error_message = &quot;Cluster name must start with a letter and contain only alphanumeric characters and hyphens.&quot;<br>  }<br>}<br><br>variable &quot;cluster_version&quot; {<br>  description = &quot;Kubernetes version for the EKS cluster&quot;<br>  type        = string<br>  default     = &quot;1.33&quot; # use a recent one, older legacy versions on AWS get charged more &quot;extended support&quot;<br><br>  validation {<br>    condition     = can(regex(&quot;^1\\.(2[89]|3[0-5])$&quot;, var.cluster_version)) # if later version make sure to update regex<br>    error_message = &quot;Cluster version must be a supported EKS version&quot;<br>  }<br>}<br><br>variable &quot;cluster_endpoint_public_access&quot; {<br>  description = &quot;Enable public access to the EKS API endpoint&quot;<br>  type        = bool<br>  default     = true  # Required for kubectl access from your machine<br>}<br><br>variable &quot;cluster_endpoint_private_access&quot; {<br>  description = &quot;Enable private access to the EKS API endpoint&quot;<br>  type        = bool<br>  default     = true  # Allows nodes to communicate with control plane privately<br>}<br><br># -----------------------------------------------------------------------------<br># EKS Node Group Configuration<br># -----------------------------------------------------------------------------<br><br>variable &quot;node_instance_types&quot; {<br>  description = &quot;List of EC2 instance types for the node group&quot;<br>  
type        = list(string)<br>  default     = [&quot;t3.small&quot;]<br><br>  # t3.small: 2 vCPU, 2 GB RAM - minimum recommended for EKS<br>  # t3.micro is too small for EKS system pods!<br>}<br><br>variable &quot;node_capacity_type&quot; {<br>  description = &quot;Capacity type for nodes: ON_DEMAND or SPOT&quot;<br>  type        = string<br>  default     = &quot;ON_DEMAND&quot;<br><br>  validation {<br>    condition     = contains([&quot;ON_DEMAND&quot;, &quot;SPOT&quot;], var.node_capacity_type)<br>    error_message = &quot;Capacity type must be either ON_DEMAND or SPOT.&quot;<br>  }<br>}<br><br>variable &quot;node_desired_size&quot; {<br>  description = &quot;Desired number of nodes in the node group&quot;<br>  type        = number<br>  default     = 3<br><br>  validation {<br>    condition     = var.node_desired_size &gt;= 1<br>    error_message = &quot;Desired size must be at least 1.&quot;<br>  }<br>}<br><br>variable &quot;node_min_size&quot; {<br>  description = &quot;Minimum number of nodes in the node group&quot;<br>  type        = number<br>  default     = 3 # for high availability<br><br>  validation {<br>    condition     = var.node_min_size &gt;= 1<br>    error_message = &quot;Minimum size must be at least 1.&quot;<br>  }<br>}<br><br>variable &quot;node_max_size&quot; {<br>  description = &quot;Maximum number of nodes in the node group&quot;<br>  type        = number<br>  default     = 4<br><br>  validation {<br>    condition     = var.node_max_size &gt;= 1<br>    error_message = &quot;Maximum size must be at least 1.&quot;<br>  }<br>}<br><br>variable &quot;node_disk_size&quot; {<br>  description = &quot;Disk size in GB for worker nodes&quot;<br>  type        = number<br>  default     = 20  # 20 GB is sufficient for learning<br><br>  validation {<br>    condition     = var.node_disk_size &gt;= 20<br>    error_message = &quot;Disk size must be at least 20 GB.&quot;<br>  }<br>}<br><br># 
-----------------------------------------------------------------------------<br># Additional Tags<br># -----------------------------------------------------------------------------<br><br>variable &quot;additional_tags&quot; {<br>  description = &quot;Additional tags to apply to all resources&quot;<br>  type        = map(string)<br>  default     = {}<br>}</pre><h4>3.5 Variable Values (tfvars)</h4><p>This file sets the actual values for our variables. Most use defaults, but we include it for clarity and future customization.</p><p>Note: I added this file to the .gitignore because tfvars files can contain secrets; it is good practice to ignore this kind of file.</p><p>Create the file: environments/dev/terraform.tfvars</p><pre># =============================================================================<br># TERRAFORM VARIABLE VALUES<br># =============================================================================<br># This file contains the actual values for our infrastructure.<br># These values override the defaults defined in variables.tf<br>#<br># NOTE: This file should NOT be committed to git if it contains secrets!<br>#       For this tutorial, it&#39;s safe since we&#39;re only using non-sensitive values.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># General Configuration<br># -----------------------------------------------------------------------------<br>aws_region   = &quot;us-east-1&quot;<br>project_name = &quot;eks-video-tutorial&quot;<br>environment  = &quot;dev&quot;<br><br># -----------------------------------------------------------------------------<br># VPC Configuration<br># -----------------------------------------------------------------------------<br>vpc_cidr = &quot;10.0.0.0/16&quot;<br><br>availability_zones = [<br>  &quot;us-east-1a&quot;,<br>  &quot;us-east-1b&quot;,<br>  
&quot;us-east-1c&quot;<br>]<br><br># Private subnets - where EKS nodes will run<br>private_subnet_cidrs = [<br>  &quot;10.0.1.0/24&quot;,   # AZ-1a: 251 usable IPs<br>  &quot;10.0.2.0/24&quot;,   # AZ-1b: 251 usable IPs<br>  &quot;10.0.3.0/24&quot;    # AZ-1c: 251 usable IPs<br>]<br><br># Public subnets - for load balancers and NAT gateway<br>public_subnet_cidrs = [<br>  &quot;10.0.101.0/24&quot;, # AZ-1a<br>  &quot;10.0.102.0/24&quot;, # AZ-1b<br>  &quot;10.0.103.0/24&quot;  # AZ-1c<br>]<br><br># NAT Gateway settings<br>enable_nat_gateway = true<br>single_nat_gateway = true  # Use one NAT GW to save costs (not HA!)<br><br># -----------------------------------------------------------------------------<br># EKS Configuration<br># -----------------------------------------------------------------------------<br>cluster_name    = &quot;eks-video-cluster&quot;<br>cluster_version = &quot;1.34&quot;<br><br># Access settings<br>cluster_endpoint_public_access  = true   # Allow kubectl from your machine<br>cluster_endpoint_private_access = true   # Allow node-to-control-plane communication<br><br># -----------------------------------------------------------------------------<br># Node Group Configuration<br># -----------------------------------------------------------------------------<br>node_instance_types = [&quot;t3.small&quot;]  # 2 vCPU, 2 GB RAM<br>node_capacity_type  = &quot;ON_DEMAND&quot;   # Use SPOT for cost savings (less stable)<br>node_desired_size   = 3             # Start with 3 nodes<br>node_min_size       = 3             # Never go below 3 (HA)<br>node_max_size       = 4             # Allow scaling up to 4<br>node_disk_size      = 20            # 20 GB per node<br><br># -----------------------------------------------------------------------------<br># Additional Tags<br># -----------------------------------------------------------------------------<br>additional_tags = {<br>  Owner       = &quot;tutorial&quot;<br>  CostCenter  = &quot;learning&quot;<br>  
DeleteAfter = &quot;2025-12-31&quot;<br>}</pre><p><strong>Why are we using t3.small?</strong></p><p><strong>I believe t3.micro is not going to give us enough memory,</strong> and I’m not sure we can even run what we need on that.</p><p>After some research, I settled on t3.small with 2 GB RAM as the minimum. It is a good start for learning and keeps costs down.</p><p>Generally though, I believe t3.medium would be the minimum in some use cases, and many companies will run bigger than that; it just depends on your use case.</p><p><strong>High Availability tip: </strong>Running a single node is a <strong>single point of failure.</strong> If that node fails, all your pods go down. With nodes spread across AZs, your application survives node failures, so that’s why we are doing this, as a demo of what you can do.</p><p>⚠️ Also, note, <strong>the first time I did this myself before publishing this article, I accidentally set node_desired_size and node_min_size to 2.</strong> When I checked with kubectl (further below), I noticed it was only showing 2 nodes, and after troubleshooting I updated the code (<strong>now correct above</strong>) to:</p><pre>variable &quot;node_desired_size&quot; {<br>  description = &quot;Desired number of nodes in the node group&quot;<br>  type        = number<br>  default     = 3<br><br>  validation {<br>    condition     = var.node_desired_size &gt;= 1<br>    error_message = &quot;Desired size must be at least 1.&quot;<br>  }<br>}<br><br>variable &quot;node_min_size&quot; {<br>  description = &quot;Minimum number of nodes in the node group&quot;<br>  type        = number<br>  default     = 3 # Minimum 3 for high availability<br><br>  validation {<br>    condition     = var.node_min_size &gt;= 1<br>    error_message = &quot;Minimum size must be at least 1.&quot;<br>  }<br>}</pre><p>and in the terraform.tfvars file:</p><pre>node_desired_size   = 3             # Start with 3 nodes<br>node_min_size       = 3             # Never go below 3 
(HA)</pre><p><strong>Notice some differences with ECS:</strong></p><p>EKS is more granular in these areas:</p><ul><li><strong>Terraform module with VPC integration</strong></li><li><strong>Creating a managed Node Group</strong></li><li><strong>K8s-specific subnet tags</strong></li><li><strong>Node Role, EKS Pod Identity (Agent) for pods (not task role)</strong></li><li><strong>We customize to our K8s version</strong></li></ul><p>Take some time to look through this so you understand the code we are creating and all the resources.</p><p>⚠️ <strong>Remember: </strong>once we run terraform apply shortly, these resources will cost money until you destroy them at the end! <strong>It is your responsibility to do that to avoid costs.</strong></p><h3>4. EKS Cluster: Control plane + managed node group setup in Terraform</h3><p>In this section, we’ll build a production-style Amazon EKS cluster using Terraform.</p><ul><li>We’ll create a highly available setup that spans multiple Availability Zones (AZs), with a secure network architecture separating public and private subnets, and a fully managed control plane backed by AWS.</li></ul><p><strong>This foundation emphasizes resilience:</strong> even if one AZ experiences an outage, your cluster’s control plane and worker nodes in the remaining AZs will continue to operate.</p><h4>4.1 Review key components</h4><p>Let’s review the resources that we will initially build for our Amazon EKS demo:</p><ul><li><strong>3 Availability Zones </strong>for high availability.</li></ul><p><strong>Multi-AZ VPC:</strong> Our VPC spans 3 availability zones. If one AZ has an outage, the other two continue operating. This is foundational High Availability (HA).</p><ul><li><strong>Public Subnets:</strong> For load balancers and NAT Gateway</li><li><strong>Private Subnets:</strong> For EKS worker nodes (more secure)</li></ul><p>By passing all 3 private subnet IDs to the node group, EKS automatically distributes nodes across AZs. 
If one AZ fails, nodes in other AZs continue running your workloads.</p><ul><li><strong>NAT Gateway:</strong> Allows private subnets to access the internet (for pulling images). ⚠️ Note that this is a primary cost for this project: it is billed at an hourly rate that adds up, so make sure it is destroyed when you are done.</li></ul><p>⚠️ Also note that in an HA config we would want more than one NAT Gateway. We are not doing that in dev, to save cost, but in prod you would want one per AZ to avoid a single point of failure.</p><ul><li><strong>EKS Control Plane:</strong> Fully managed by AWS, automatically HA across 3 AZs, includes the API server, etcd, Controller Manager and Scheduler.</li></ul><p>AWS automatically runs the EKS control plane (API server, etcd, controllers) across 3 AZs. You get this High Availability built in, with no configuration needed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MMcqvekQPwNpXKYcat_TBA.png" /><figcaption>Architecture diagram of what we are building</figcaption></figure><h4>4.2 Main Configuration File</h4><p>Study this file carefully. 
I’ve tried to comment and document key parts but spend some time reviewing this.</p><p>Create the file environments/dev/main.tf</p><pre># =============================================================================<br># MAIN INFRASTRUCTURE CONFIGURATION<br># =============================================================================<br># This file defines the core infrastructure for our EKS video streaming platform:<br># - VPC with public and private subnets across 3 AZs<br># - EKS cluster with managed node group<br>#<br># ECS Comparison:<br># - In ECS, cluster creation is simpler (just a cluster resource)<br># - EKS requires more explicit VPC configuration and subnet tagging<br># - But EKS gives you full Kubernetes, not just container orchestration<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Local Variables<br># -----------------------------------------------------------------------------<br>locals {<br>  # Common name prefix for resources<br>  name_prefix = &quot;${var.project_name}-${var.environment}&quot;<br><br>  # Common tags to apply to all resources<br>  common_tags = {<br>    Project     = var.project_name<br>    Environment = var.environment<br>    ManagedBy   = &quot;terraform&quot;<br>  }<br><br>  # Merge common tags with any additional tags<br>  tags = merge(local.common_tags, var.additional_tags)<br>}<br><br># -----------------------------------------------------------------------------<br># Data Sources<br># -----------------------------------------------------------------------------<br><br># Get current AWS account ID<br>data &quot;aws_caller_identity&quot; &quot;current&quot; {}<br><br># Get available AZs in the region (validates our AZ choices)<br>data &quot;aws_availability_zones&quot; &quot;available&quot; {<br>  state = &quot;available&quot;<br><br>  # Exclude Local Zones and Wavelength Zones<br>  filter {<br>   
 name   = &quot;opt-in-status&quot;<br>    values = [&quot;opt-in-not-required&quot;]<br>  }<br>}<br><br># =============================================================================<br># VPC MODULE<br># =============================================================================<br># We use the widely adopted terraform-aws-modules VPC module (community-maintained,<br># not an official AWS product) - it&#39;s battle-tested and handles all the<br># complexity of subnets, route tables, NAT gateways, etc.<br>#<br># Documentation: https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws<br># =============================================================================<br><br>module &quot;vpc&quot; {<br>  source  = &quot;terraform-aws-modules/vpc/aws&quot;<br>  version = &quot;~&gt; 5.0&quot;<br><br>  # ---------------------------------------------------------------------------<br>  # Basic VPC Configuration<br>  # ---------------------------------------------------------------------------<br>  name = &quot;${local.name_prefix}-vpc&quot;<br>  cidr = var.vpc_cidr<br><br>  # Use 3 AZs for high availability<br>  azs = var.availability_zones<br><br>  # ---------------------------------------------------------------------------<br>  # Subnet Configuration<br>  # ---------------------------------------------------------------------------<br>  # Private subnets - EKS nodes will run here (more secure)<br>  private_subnets = var.private_subnet_cidrs<br><br>  # Public subnets - for load balancers and NAT gateway<br>  public_subnets = var.public_subnet_cidrs<br><br>  # ---------------------------------------------------------------------------<br>  # NAT Gateway Configuration<br>  # ---------------------------------------------------------------------------<br>  # NAT Gateway allows private subnets to access the internet<br>  # (needed for pulling container images, etc.)<br>  enable_nat_gateway = var.enable_nat_gateway<br><br>  # Single NAT Gateway saves costs (~$32/month) but is not HA<br>  # For production, set this to false (one NAT GW per 
AZ)<br>  single_nat_gateway = var.single_nat_gateway<br><br>  # Place NAT Gateway in first AZ<br>  one_nat_gateway_per_az = false<br><br>  # ---------------------------------------------------------------------------<br>  # DNS Configuration<br>  # ---------------------------------------------------------------------------<br>  # Required for EKS and service discovery<br>  enable_dns_hostnames = true<br>  enable_dns_support   = true<br><br>  # ---------------------------------------------------------------------------<br>  # VPC Flow Logs (Optional - disabled for cost savings in dev)<br>  # ---------------------------------------------------------------------------<br>  enable_flow_log                      = false<br>  create_flow_log_cloudwatch_log_group = false<br>  create_flow_log_cloudwatch_iam_role  = false<br><br>  # ---------------------------------------------------------------------------<br>  # Subnet Tags for EKS Auto-Discovery<br>  # ---------------------------------------------------------------------------<br>  # These tags are REQUIRED for EKS to discover and use the subnets correctly!<br><br>  # Tags for all subnets<br>  tags = merge(local.tags, {<br>    &quot;kubernetes.io/cluster/${var.cluster_name}&quot; = &quot;shared&quot;<br>  })<br><br>  # Tags for public subnets - tells AWS LB Controller to use these for internet-facing LBs<br>  public_subnet_tags = {<br>    &quot;kubernetes.io/cluster/${var.cluster_name}&quot; = &quot;shared&quot;<br>    &quot;kubernetes.io/role/elb&quot;                    = &quot;1&quot;<br>    &quot;Tier&quot;                                      = &quot;public&quot;<br>  }<br><br>  # Tags for private subnets - tells AWS LB Controller to use these for internal LBs<br>  private_subnet_tags = {<br>    &quot;kubernetes.io/cluster/${var.cluster_name}&quot; = &quot;shared&quot;<br>    &quot;kubernetes.io/role/internal-elb&quot;           = &quot;1&quot;<br>    &quot;Tier&quot;                                      = 
&quot;private&quot;<br>  }<br>}<br><br># =============================================================================<br># EKS CLUSTER MODULE<br># =============================================================================<br># We use the community-maintained terraform-aws-modules EKS module - it handles the complexity of:<br># - IAM roles and policies<br># - Security groups<br># - OIDC provider<br># - Add-ons (CoreDNS, kube-proxy, vpc-cni)<br># - Managed node groups<br>#<br># Documentation: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws<br>#<br># ECS Comparison:<br># - ECS Cluster = just a logical grouping<br># - EKS Cluster = full Kubernetes control plane with API server, etcd, etc.<br># - EKS automatically runs the control plane across 3 AZs (HA built-in)<br># =============================================================================<br><br>module &quot;eks&quot; {<br>  source  = &quot;terraform-aws-modules/eks/aws&quot;<br>  version = &quot;~&gt; 20.0&quot;<br><br>  # ---------------------------------------------------------------------------<br>  # Cluster Configuration<br>  # ---------------------------------------------------------------------------<br>  cluster_name    = var.cluster_name<br>  cluster_version = var.cluster_version<br><br>  # ---------------------------------------------------------------------------<br>  # Network Configuration<br>  # ---------------------------------------------------------------------------<br>  vpc_id     = module.vpc.vpc_id<br>  subnet_ids = module.vpc.private_subnets<br><br>  # Control plane subnets (can be different from node subnets)<br>  control_plane_subnet_ids = module.vpc.private_subnets<br><br>  # ---------------------------------------------------------------------------<br>  # Cluster Endpoint Access<br>  # ---------------------------------------------------------------------------<br>  # Public access - allows kubectl from your local machine<br>  cluster_endpoint_public_access = 
var.cluster_endpoint_public_access<br><br>  # Private access - allows nodes to communicate with control plane via VPC<br>  cluster_endpoint_private_access = var.cluster_endpoint_private_access<br><br>  # Restrict public access to specific IPs (optional, more secure)<br>  # cluster_endpoint_public_access_cidrs = [&quot;YOUR_IP/32&quot;]<br><br>  # ---------------------------------------------------------------------------<br>  # Cluster Add-ons<br>  # ---------------------------------------------------------------------------<br>  # These are essential Kubernetes components managed by AWS<br>  cluster_addons = {<br>    # CoreDNS - DNS server for Kubernetes service discovery<br>    coredns = {<br>      most_recent = true<br>      configuration_values = jsonencode({<br>        # Ensure CoreDNS runs on different nodes for HA<br>        affinity = {<br>          podAntiAffinity = {<br>            preferredDuringSchedulingIgnoredDuringExecution = [<br>              {<br>                weight = 100<br>                podAffinityTerm = {<br>                  labelSelector = {<br>                    matchExpressions = [<br>                      {<br>                        key      = &quot;k8s-app&quot;<br>                        operator = &quot;In&quot;<br>                        values   = [&quot;kube-dns&quot;]<br>                      }<br>                    ]<br>                  }<br>                  topologyKey = &quot;kubernetes.io/hostname&quot;<br>                }<br>              }<br>            ]<br>          }<br>        }<br>      })<br>    }<br><br>    # kube-proxy - Network proxy that runs on each node<br>    kube-proxy = {<br>      most_recent = true<br>    }<br><br>    # vpc-cni - AWS VPC CNI plugin for pod networking<br>    vpc-cni = {<br>      most_recent = true<br>      configuration_values = jsonencode({<br>        # Enable prefix delegation for more IPs per node<br>        env = {<br>          ENABLE_PREFIX_DELEGATION = &quot;true&quot;<br>      
    WARM_PREFIX_TARGET       = &quot;1&quot;<br>        }<br>      })<br>    }<br><br>    # EKS Pod Identity Agent - IAM<br>    eks-pod-identity-agent = {<br>      most_recent = true<br>    }<br>  }<br><br>  # ---------------------------------------------------------------------------<br>  # IAM / Access Configuration<br>  # ---------------------------------------------------------------------------<br>  # Allow the Terraform user to administer the cluster<br>  enable_cluster_creator_admin_permissions = true<br><br>  # Access entry configuration (EKS v1.30+ authentication mode)<br>  authentication_mode = &quot;API_AND_CONFIG_MAP&quot;<br><br>  # ---------------------------------------------------------------------------<br>  # Managed Node Group<br>  # ---------------------------------------------------------------------------<br>  # ECS Comparison:<br>  # - ECS Capacity Provider = similar concept<br>  # - Both manage a pool of EC2 instances for running containers<br>  # - EKS nodes run kubelet and join the K8s cluster automatically<br>  # ---------------------------------------------------------------------------<br>  eks_managed_node_groups = {<br>    # Primary node group<br>    primary = {<br>      # Use shorter names to avoid AWS IAM role name length limits (38 chars max)<br>      name            = &quot;primary&quot;<br>      use_name_prefix = false<br><br>      # Override IAM role name to be shorter<br>      iam_role_name            = &quot;${var.cluster_name}-ng-role&quot;<br>      iam_role_use_name_prefix = false<br><br>      # Instance configuration<br>      instance_types = var.node_instance_types<br>      capacity_type  = var.node_capacity_type<br><br>      # Scaling configuration<br>      min_size     = var.node_min_size<br>      max_size     = var.node_max_size<br>      desired_size = var.node_desired_size<br><br>      # Disk configuration<br>      disk_size = var.node_disk_size<br><br>      # Subnet placement - spread across all private subnets 
(AZs)<br>      subnet_ids = module.vpc.private_subnets<br><br>      # Labels applied to all nodes in this group<br>      labels = {<br>        Environment = var.environment<br>        NodeGroup   = &quot;primary&quot;<br>        Project     = var.project_name<br>      }<br><br>      # Tags for the node group and EC2 instances<br>      tags = merge(local.tags, {<br>        Name      = &quot;${local.name_prefix}-node&quot;<br>        NodeGroup = &quot;primary&quot;<br>      })<br><br>      # AMI type - Amazon Linux 2023 (AL2023) is the latest<br>      ami_type = &quot;AL2023_x86_64_STANDARD&quot;<br><br>      # Update configuration - how many nodes can be unavailable during updates<br>      update_config = {<br>        max_unavailable_percentage = 50<br>      }<br><br>      # IAM role additional policies<br>      iam_role_additional_policies = {<br>        AmazonSSMManagedInstanceCore = &quot;arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore&quot;<br>      }<br>    }<br>  }<br><br>  # ---------------------------------------------------------------------------<br>  # Node Security Group Additional Rules<br>  # ---------------------------------------------------------------------------<br>  node_security_group_additional_rules = {<br>    # Allow nodes to communicate with each other on all ports<br>    ingress_self_all = {<br>      description = &quot;Node to node all ports/protocols&quot;<br>      protocol    = &quot;-1&quot;<br>      from_port   = 0<br>      to_port     = 0<br>      type        = &quot;ingress&quot;<br>      self        = true<br>    }<br><br>    # Allow outbound traffic to all destinations<br>    egress_all = {<br>      description      = &quot;Node all egress&quot;<br>      protocol         = &quot;-1&quot;<br>      from_port        = 0<br>      to_port          = 0<br>      type             = &quot;egress&quot;<br>      cidr_blocks      = [&quot;0.0.0.0/0&quot;]<br>      ipv6_cidr_blocks = [&quot;::/0&quot;]<br>    }<br>  }<br><br>  # 
---------------------------------------------------------------------------<br>  # Tags<br>  # ---------------------------------------------------------------------------<br>  tags = local.tags<br>}<br><br># =============================================================================<br># ADDITIONAL RESOURCES<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># Wait for cluster to be ready<br># -----------------------------------------------------------------------------<br># This ensures the cluster is fully ready before we try to use it<br>resource &quot;time_sleep&quot; &quot;wait_for_cluster&quot; {<br>  depends_on = [module.eks]<br><br>  create_duration = &quot;30s&quot;<br>}<br><br># -----------------------------------------------------------------------------<br># Null resource to update kubeconfig<br># -----------------------------------------------------------------------------<br># This is optional - provides a command to run after apply<br>resource &quot;null_resource&quot; &quot;update_kubeconfig&quot; {<br>  depends_on = [time_sleep.wait_for_cluster]<br><br>  provisioner &quot;local-exec&quot; {<br>    command = &quot;aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}&quot;<br>  }<br><br>  # Only run this when the cluster ARN changes (i.e., on initial creation)<br>  triggers = {<br>    cluster_arn = module.eks.cluster_arn<br>  }<br>}</pre><p><strong>Note #1: I believe EKS should autoprovision EKS Pod Identity Agent</strong> by default, so that means we do not need the OIDC provider, however I am keeping that in my main.tf right now because we may need it later. The old way of using<strong> IRSA (IAM Roles for Service Accounts) </strong>and load balancers I believe will need it. 
But it has no impact on our deploy.</p><p><strong>Note #2:</strong> we added one extra add-on setting here for better availability (copied from the code above).</p><p>We configured CoreDNS with pod anti-affinity so that it prefers to run its replicas on different nodes. This helps keep DNS from being a single point of failure.</p><p><strong>TF code relating to this:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/659/1*tSAQJBTtTInusiyvu_AyzQ.png" /></figure><p>You can typically make fine-grained adjustments using sections like this in the TF code.</p><h4>4.3 Output file</h4><p>Create file: environments/dev/outputs.tf</p><p>This will make it easy to see the status and other values after applying.</p><pre># =============================================================================<br># TERRAFORM OUTPUTS<br># =============================================================================<br># These outputs display important information after terraform apply and can be<br># referenced by other Terraform configurations or scripts.<br># =============================================================================<br><br># -----------------------------------------------------------------------------<br># VPC Outputs<br># -----------------------------------------------------------------------------<br><br>output &quot;vpc_id&quot; {<br>  description = &quot;ID of the VPC&quot;<br>  value       = module.vpc.vpc_id<br>}<br><br>output &quot;vpc_cidr_block&quot; {<br>  description = &quot;CIDR block of the VPC&quot;<br>  value       = module.vpc.vpc_cidr_block<br>}<br><br>output &quot;private_subnet_ids&quot; {<br>  description = &quot;List of private subnet IDs&quot;<br>  value       = module.vpc.private_subnets<br>}<br><br>output &quot;public_subnet_ids&quot; {<br>  description = &quot;List of public subnet IDs&quot;<br>  value       = module.vpc.public_subnets<br>}<br><br>output &quot;nat_gateway_ids&quot; {<br>  description = &quot;List of NAT Gateway IDs&quot;<br>  
value       = module.vpc.natgw_ids<br>}<br><br>output &quot;availability_zones&quot; {<br>  description = &quot;List of availability zones used&quot;<br>  value       = module.vpc.azs<br>}<br><br># -----------------------------------------------------------------------------<br># EKS Cluster Outputs<br># -----------------------------------------------------------------------------<br><br>output &quot;cluster_name&quot; {<br>  description = &quot;Name of the EKS cluster&quot;<br>  value       = module.eks.cluster_name<br>}<br><br>output &quot;cluster_arn&quot; {<br>  description = &quot;ARN of the EKS cluster&quot;<br>  value       = module.eks.cluster_arn<br>}<br><br>output &quot;cluster_endpoint&quot; {<br>  description = &quot;Endpoint URL for the EKS cluster API server&quot;<br>  value       = module.eks.cluster_endpoint<br>}<br><br>output &quot;cluster_version&quot; {<br>  description = &quot;Kubernetes version of the EKS cluster&quot;<br>  value       = module.eks.cluster_version<br>}<br><br>output &quot;cluster_certificate_authority_data&quot; {<br>  description = &quot;Base64 encoded certificate data for cluster authentication&quot;<br>  value       = module.eks.cluster_certificate_authority_data<br>  sensitive   = true<br>}<br><br>output &quot;cluster_oidc_issuer_url&quot; {<br>  description = &quot;OIDC issuer URL for the cluster&quot;<br>  value       = module.eks.cluster_oidc_issuer_url<br>}<br><br>output &quot;cluster_oidc_provider_arn&quot; {<br>  description = &quot;ARN of the OIDC provider&quot;<br>  value       = module.eks.oidc_provider_arn<br>}<br><br># -----------------------------------------------------------------------------<br># EKS Node Group Outputs<br># -----------------------------------------------------------------------------<br><br>output &quot;node_group_name&quot; {<br>  description = &quot;Name of the primary node group&quot;<br>  value       = try(module.eks.eks_managed_node_groups[&quot;primary&quot;].node_group_id, 
&quot;&quot;)<br>}<br><br>output &quot;node_group_arn&quot; {<br>  description = &quot;ARN of the primary node group&quot;<br>  value       = try(module.eks.eks_managed_node_groups[&quot;primary&quot;].node_group_arn, &quot;&quot;)<br>}<br><br>output &quot;node_group_status&quot; {<br>  description = &quot;Status of the primary node group&quot;<br>  value       = try(module.eks.eks_managed_node_groups[&quot;primary&quot;].node_group_status, &quot;&quot;)<br>}<br><br>output &quot;node_security_group_id&quot; {<br>  description = &quot;Security group ID attached to the EKS nodes&quot;<br>  value       = module.eks.node_security_group_id<br>}<br><br># -----------------------------------------------------------------------------<br># IAM Outputs<br># -----------------------------------------------------------------------------<br><br>output &quot;cluster_iam_role_arn&quot; {<br>  description = &quot;IAM role ARN of the EKS cluster&quot;<br>  value       = module.eks.cluster_iam_role_arn<br>}<br><br>output &quot;node_iam_role_arn&quot; {<br>  description = &quot;IAM role ARN of the EKS node group&quot;<br>  value       = try(module.eks.eks_managed_node_groups[&quot;primary&quot;].iam_role_arn, &quot;&quot;)<br>}<br><br># -----------------------------------------------------------------------------<br># Useful Commands<br># -----------------------------------------------------------------------------<br><br>output &quot;configure_kubectl&quot; {<br>  description = &quot;Command to configure kubectl for this cluster&quot;<br>  value       = &quot;aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}&quot;<br>}<br><br>output &quot;get_nodes_command&quot; {<br>  description = &quot;Command to list cluster nodes&quot;<br>  value       = &quot;kubectl get nodes -o wide&quot;<br>}<br><br>output &quot;get_pods_command&quot; {<br>  description = &quot;Command to list all pods in all namespaces&quot;<br>  value       = &quot;kubectl get pods 
-A&quot;<br>}<br><br># -----------------------------------------------------------------------------<br># Summary Output<br># -----------------------------------------------------------------------------<br><br>output &quot;summary&quot; {<br>  description = &quot;Summary of created infrastructure&quot;<br>  value       = &lt;&lt;-EOT<br><br>    ============================================================<br>    EKS CLUSTER DEPLOYMENT COMPLETE!<br>    ============================================================<br><br>    Cluster Name:     ${module.eks.cluster_name}<br>    Cluster Version:  ${module.eks.cluster_version}<br>    Cluster Endpoint: ${module.eks.cluster_endpoint}<br><br>    VPC ID:           ${module.vpc.vpc_id}<br>    VPC CIDR:         ${module.vpc.vpc_cidr_block}<br><br>    Availability Zones: ${join(&quot;, &quot;, module.vpc.azs)}<br><br>    Private Subnets:<br>      - ${join(&quot;\n      - &quot;, module.vpc.private_subnets)}<br><br>    Public Subnets:<br>      - ${join(&quot;\n      - &quot;, module.vpc.public_subnets)}<br><br>    ------------------------------------------------------------<br>    NEXT STEPS:<br>    ------------------------------------------------------------<br><br>    1. Configure kubectl:<br>       aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}<br><br>    2. Verify nodes are ready:<br>       kubectl get nodes<br><br>    3. 
Check system pods:<br>       kubectl get pods -n kube-system<br><br>    ============================================================<br><br>  EOT<br>}<br><br># -----------------------------------------------------------------------------<br># HA Verification Output<br># -----------------------------------------------------------------------------<br><br>output &quot;ha_status&quot; {<br>  description = &quot;High availability configuration status&quot;<br>  value       = &lt;&lt;-EOT<br><br>    ============================================================<br>    HIGH AVAILABILITY STATUS<br>    ============================================================<br><br>    ✅ VPC spans ${length(module.vpc.azs)} Availability Zones<br>    ✅ Private subnets in ${length(module.vpc.private_subnets)} AZs (for nodes)<br>    ✅ Public subnets in ${length(module.vpc.public_subnets)} AZs (for load balancers)<br>    ✅ EKS control plane: Managed by AWS across 3 AZs (automatic)<br>    ✅ Node group: Configured to spread across all private subnets<br><br>    ${var.single_nat_gateway ? &quot;⚠️  Single NAT Gateway: NOT highly available (cost savings for dev)&quot; : &quot;✅ NAT Gateway per AZ: Highly available&quot;}<br><br>    Node Configuration:<br>    - Min nodes: ${var.node_min_size}<br>    - Max nodes: ${var.node_max_size}<br>    - Desired:   ${var.node_desired_size}<br><br>    ============================================================<br><br>  EOT<br>}</pre><p>These will output info for diagnostics for you to fix any errors and for guidance on next steps.</p><h4>4.4 Verify Files</h4><pre>ls -la environments/dev/<br><br>Expected:<br><br>-rw-r--r--  backend.tf<br>-rw-r--r--  main.tf<br>-rw-r--r--  outputs.tf<br>-rw-r--r--  providers.tf<br>-rw-r--r--  terraform.tfvars<br>-rw-r--r--  variables.tf<br>-rw-r--r--  .gitignore # (or may be in higher dir.)</pre><p>These are all the files we should have.</p><h3>5. 
Deploy AWS Resources with Terraform</h3><p>Now that we have defined our resources, we need to run a few checks before deploying, to make sure everything is set up as it should be.</p><h4>5.1 Verify account</h4><p>Let’s just check we are in the right account before we launch this thing….</p><pre>aws sts get-caller-identity<br><br># Returns the account you are in</pre><h4>5.2 Update Backend Configuration</h4><p>Validate that your backend state bucket name is correct (we pasted this in much earlier, but check again or we’ll have issues).</p><pre><br># cd from the root of your project<br><br>cd backend-bootstrap<br>terraform output s3_bucket_name</pre><p>Copy that bucket name and, if you have not done it yet, update environments/dev/backend.tf:</p><pre># again from the root of your project<br>cd environments/dev<br>cat backend.tf</pre><h4>5.3 Initialize Terraform</h4><p>Now let’s initialize Terraform with the remote backend.</p><p>⚠️ On the first init, Terraform downloads the providers and modules it needs into a local .terraform directory, which can take up a decent amount of disk space. Keep an eye on that.</p><p>Inside environments/dev:</p><pre>terraform init<br><br># Output - you should see it installing updated libraries needed<br><br>terraform plan<br><br># Output - this will confirm our syntax is correct and do a preview dry run<br><br># If the plan looks good, run terraform apply<br><br>terraform apply<br></pre><h3>That is it, you are deploying!!!!</h3><p>Pray it works…. I think it will 😅. Worked on my machine 🤣</p><p><strong>Troubleshooting: </strong>Terraform, Amazon EKS and associated libraries change frequently. 
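</p><p>A quick way to see which versions you are actually running, so you can compare them with the ones used in this article. This is only a sketch, and it assumes you run it inside environments/dev after terraform init has finished. The exact provider versions that init selected are also recorded in the .terraform.lock.hcl file in that directory.</p><pre># Print the Terraform CLI version and the provider versions selected for this directory<br>terraform version<br><br># List the providers required by the configuration and its modules<br>terraform providers</pre><p>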
If you get any errors, it may be due to a small syntax change between versions.</p><p>Research any errors and feel free to share fixes in the comments.</p><pre># Output<br><br>============================================================<br>EKS CLUSTER DEPLOYMENT COMPLETE!<br>============================================================<br><br>Cluster Name:     eks-video-cluster<br>Cluster Version:  1.34<br>Cluster Endpoint: https://913C8xxxxxxxx.gr7.us-east-1.eks.amazonaws.com<br><br>VPC ID:           vpc-05d295xxxxxxxxxx<br>VPC CIDR:         10.0.0.0/16<br><br>Availability Zones: us-east-1a, us-east-1b, us-east-1c<br><br>Private Subnets:<br>  - subnet-04fd3xxxxxxxxxxx<br>  - subnet-0936400xxxxxxxxxxx<br>  - subnet-0807b3xxxxxxxxxxx<br><br>Public Subnets:<br>  - subnet-0612axxxxxxxxxxx<br>  - subnet-01a1xxxxxxxxxxx<br>  - subnet-01cxxxxxxxxxxx<br></pre><p>🚀 Let’s go! You’re doing it!</p><h3>6. Validate Amazon EKS Launch: Examine what was created, run basic commands with kubectl</h3><p>Let’s go to the AWS console and check this out:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CHPEVf1JPJ5IauMSIpflWA.png" /></figure><ul><li><strong>VPCs:</strong> 2 … We created 1 and there is 1 by default.</li><li><strong>Subnets: </strong>12 … 6 default VPC subnets + 6 we created (3 public + 3 private)</li><li><strong>Route Tables: </strong>4 … 1 public route table + 3 private route tables (one/AZ)</li><li><strong>Internet Gateways: </strong>2 … 1 default + 1 we created</li><li><strong>NAT Gateways:</strong> 1 … Single NAT Gateway (only 1 instead of one per AZ, for cost savings; use one per AZ for prod HA)</li><li><strong>Security Groups: </strong>5 … Default + cluster + nodes + additional</li><li><strong>Running Instances: 2 …. 
</strong>2 EKS worker nodes! (With node_desired_size = 3 as configured above, expect 3 here.)</li></ul><h4>Amazon EKS Clusters in AWS Console</h4><p><strong>Note: I initially rolled this out with version 1.31, but changed the article to use version 1.34 because at the time of writing it cost less</strong> (1.31 is in legacy “extended support”).</p><p><strong>This was the first run with 1.31.</strong> See where it says “Extended support”: this costs more per hour, <strong>so I launched a second time to avoid paying the legacy rates.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mDE92RTm9sCzG_cQRk7nOA.png" /></figure><p><strong>Rerun for K8s 1.34: </strong>This is where I re-ran it later with 1.34. Notice how it says “Standard support”. (Watch this: if you run it a year from now, 1.34 may be in extended support too.)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*N5xo41I9_EdgTyciXwRrsg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WyN62tASTL6OKMpnRa9ApQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BJ0GO7Xw46juA_npvmSN6Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OQtz5VjEGitE1j1fUneuZQ.png" /></figure><h4>Minor IAM issues in AWS Console Amazon EKS</h4><p>⚠️ I noticed a few minor permissions issues in the AWS Console. We do not need to fix them yet, as we are just validating basic health, but we will fix them in later articles.</p><p>These are just minor IAM issues that can be quickly fixed, but I do not want to distract from the core of this lesson.</p><h4>Connect kubectl: Configure local kubectl, verify nodes</h4><p>Without running the command below (or a similar one), kubectl has no idea where your cluster is or how to authenticate to it. Running it does the following:</p><ul><li>Tells kubectl the API server endpoint of your EKS cluster.</li><li>Writes (or updates) a cluster config entry in your ~/.kube/config file.</li><li>Sets up AWS IAM-based authentication to use your 
credentials.</li><li>After running the command, kubectl will automatically switch to this cluster, making it the active context.</li></ul><pre>aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster<br><br># Output<br><br>updated context arn:aws:eks:us-east-1:xxxxxxxxxxxx:cluster/eks-video-cluster in /Users/me/.kube/config</pre><p><strong>You need to re-run this whenever:</strong></p><ul><li>You create a brand-new cluster and need first-time access.</li><li>You switch between multiple EKS clusters.</li><li>You work on a new laptop/machine (no kubeconfig yet).</li><li>~/.kube/config got deleted or corrupted.</li><li>You changed the AWS CLI profile or region.</li></ul><p><strong>You can also give it an alias (optional):</strong></p><pre>aws eks update-kubeconfig --region us-east-1 --name eks-video-cluster --alias video<br><br># then you can do instead:<br><br>kubectl config use-context video</pre><h4>Verify connection is working for kubectl</h4><p>Let’s make sure it is working.</p><pre>kubectl cluster-info<br><br># Output<br><br>Kubernetes control plane is running at https://xxxxxxxxx.gr7.us-east-1.eks.amazonaws.com<br><br>CoreDNS is running at https://xxxxxxxxx.gr7.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy</pre><h4>Check Nodes</h4><p>Let’s make sure our worker nodes are there.</p><pre><br>kubectl get nodes<br><br># Output:<br><br>NAME                         STATUS   ROLES    AGE   VERSION<br>ip-10-0-1-110.ec2.internal   Ready    &lt;none&gt;   6m26s   v1.34.2-eks-xxxxxx<br>ip-10-0-2-6.ec2.internal     Ready    &lt;none&gt;   6m44s   v1.34.2-eks-xxxxxx<br>ip-10-0-3-7.ec2.internal     Ready    &lt;none&gt;   6m45s   v1.34.2-eks-xxxxxx</pre><p>We run this because it is the most basic health check after connecting kubectl.</p><p><strong>If this succeeds, we know:</strong></p><ul><li>Our AWS credentials are correct</li><li>We have permission to talk to EKS 
API</li><li>kubectl can reach the EKS control plane over the internet</li><li>The cluster is alive and has worker nodes joined</li></ul><h4>Verify Nodes are in different AZs (HA Check)</h4><pre>kubectl get nodes -L topology.kubernetes.io/zone<br><br># Output - notice now (far right) it shows the AZs<br><br>NAME                         STATUS   ROLES    AGE   VERSION                ZONE<br>ip-10-0-1-110.ec2.internal   Ready    &lt;none&gt;   6m48s   v1.34.2-eks-xxxxxx   us-east-1a<br>ip-10-0-2-6.ec2.internal     Ready    &lt;none&gt;   7m6s    v1.34.2-eks-xxxxxx   us-east-1b<br>ip-10-0-3-7.ec2.internal     Ready    &lt;none&gt;   7m7s    v1.34.2-eks-xxxxxx   us-east-1c</pre><h4>Check System Pods</h4><p>The output below is what you want to see. It means our brand-new EKS cluster is 100% healthy at the core level.</p><pre>kubectl get pods -n kube-system<br><br># Output<br><br>get pods -n kube-system<br>NAME                           READY   STATUS    RESTARTS   AGE<br>aws-node-8dds9                 2/2     Running   0          6m21s<br>aws-node-l95q6                 2/2     Running   0          6m32s<br>aws-node-vdhm4                 2/2     Running   0          6m10s<br>coredns-975b7d678-f5ghx        1/1     Running   0          6m32s<br>coredns-975b7d678-lnq69        1/1     Running   0          6m32s<br>eks-pod-identity-agent-2wwnk   1/1     Running   0          6m32s<br>eks-pod-identity-agent-7gxt9   1/1     Running   0          6m32s<br>eks-pod-identity-agent-v5fg7   1/1     Running   0          6m32s<br>kube-proxy-2tjxp               1/1     Running   0          6m32s<br>kube-proxy-c2r86               1/1     Running   0          6m24s<br>kube-proxy-ckt9w               1/1     Running   0          6m28s</pre><p><strong>aws-node:</strong> The Amazon VPC CNI plugin (gives pods IPs from your VPC, ENI support, security groups for pods, etc.)</p><p><strong>coredns:</strong> CoreDNS — the cluster’s internal DNS server. 2 replicas for high availability. 
Both are healthy.</p><p><strong>eks-pod-identity-agent:</strong> EKS Pod Identity Agent, the recommended modern replacement for IRSA in newer clusters (older EKS clusters, or teams that prefer it, may still use IRSA instead). It lets pods assume IAM roles securely without putting keys in secrets, and runs one per node.</p><p><strong>kube-proxy: </strong>Makes Kubernetes Services (ClusterIP, NodePort, LoadBalancer) actually work. It runs one per node, and cluster networking does not work without it.</p><p>Taken together, these running pods tell us:</p><ul><li>The control plane is healthy</li><li>Worker nodes successfully joined the cluster</li><li>Networking (VPC CNI) is working</li><li>DNS is working</li><li>IAM roles for service accounts / Pod Identity will work</li><li>Kubernetes Services will work</li></ul><h3>Summary of what we did today</h3><p>So to summarize for your resume and knowledge:</p><p>✅ Created Terraform state file cloud backend.<br>✅ Created Terraform templates for Amazon EKS HA deployment.<br>✅ Learned about differences in container orchestration.<br>✅ VPC spans 3 Availability Zones for high availability.<br>✅ Worker nodes distributed across multiple AZs.<br>✅ EKS control plane runs across 3 AZs (AWS managed).<br>✅ Diagnostic commands with kubectl for validation.<br>✅ CoreDNS replicas on different nodes.</p><p>OK, are you ready to destroy this now?</p><p>⚠️ We need to save money 😁 Until the next article… We will ride again!</p><p>We have Terraform, so we can easily build all this infra again with terraform apply.</p><h3>7. Cleanup. Destroy all resources to avoid charges</h3><p>🚨 <strong>IMPORTANT: Your cluster (only once deployed near the end of the article) costs approximately $0.50–$0.75/hour (as of writing, could vary later, depending on region and version of Kubernetes) while running. </strong>Always destroy resources when you’re done for the day! 
Also, do not just run the destroy command and assume it worked: <strong>double check in the AWS console</strong> that it actually removed the resources.</p><p>Make sure you are in environments/dev, where your main.tf is located.</p><pre>$ terraform destroy</pre><p>That’s it, it will destroy your whole infra. Confirm all resources were destroyed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/590/1*uTT9THr7jRBth7Onxgrzbg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0t2EP0aRlJdA-H_P3wr4ng.png" /></figure><p><strong>🚨⚠️ CRITICAL</strong>: <strong>After running terraform destroy, double-check in the AWS console that all resources are gone (especially the EKS cluster, ELB, and EC2 instances)!</strong> There are cases where people did not see an error, the resources were not destroyed, and they got a big bill later, so make sure! If you neglect to destroy this, there will be ongoing charges, which could be as much as $10-$20 per day.</p><p><strong>To set up cost guardrails and AWS Budgets alerts see my articles:</strong></p><ul><li><a href="https://medium.com/cloud-cost-savings/aws-cost-savings-playbook-2-reports-tracking-heart-soul-of-cost-control-d7ff2d926e72">AWS Cost Savings Playbook (#2): REPORTS/Tracking, Heart &amp; Soul of Cost Control</a></li><li><a href="https://medium.com/cloud-cost-savings/aws-cost-savings-playbook-4-cost-guardrails-e90407984eda">AWS Cost Savings Playbook (#4): Cost Guardrails: Setting up cost guardrails to prevent unintended spending.</a></li></ul><p><strong>🔥What a project! 
</strong>(ongoing with more series articles scheduled)</p><p><strong>🚀 Props and respect to you for making it through this!</strong></p><p>You certainly are a professional to stick with it.</p><p>Later you can rebuild it with terraform apply when you want to, just remember what I said about watching costs— you’ll get charged by the hour for several resources.</p><h3>Looking Ahead…</h3><p>Part 3: ✅ “I need to deploy my video app and Docker image on self-healing Amazon EKS nodes/pods with kubectl diagnostics”</p><p><strong>1. Introduction:</strong> What we’re building, prerequisites (5m)<br><strong>2. Rebuild Cluster: </strong>Quick terraform apply to restore (15–20m)<br><strong>3. Create ECR Repository:</strong> Terraform for private container registry (5m)<br><strong>4. Build the App: </strong>Node.js server + HTML5 video player (10m)<br><strong>5. Docker Build &amp; Push: </strong>Containerize and push to ECR (10m)<br><strong>6. Kubernetes Manifests:</strong> Deployment + Service with HA (10m)<br><strong>7. Deploy to EKS:</strong> kubectl apply, verify pods (5m)<br><strong>8. Access the App: </strong>Open browser, watch video! (5m)<br><strong>9. Explore &amp; Verify HA:</strong> Test replicas, health checks (10m)<br><strong>10. 
Cleanup: </strong>Destroy resources</p><p>Keep in mind, we are still in the early stages, we will get more advanced as the series continues.</p><p>🛠️ Get more like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>🚀Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a> — we follow back!</p><p><strong>🥰 Thanks for reading!</strong> 🔥 Please <strong>clap</strong>, <strong>share</strong>, and <strong>follow</strong> for the next article that I put out!</p><p><strong><em>⚡️ Quick promo message ⚡️</em></strong></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering </strong>check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>I am working on is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing) </strong>which is my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices and then further integrates that with my prior cloud books, and also code solutions and tutorials integrated using multiple AIs and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p>Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Saving:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. Also previously held AWS Solutions Architect certification.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e1a87efc9925" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AWS EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap]]></title>
            <link>https://medium.com/@csjcode/aws-eks-k8s-media-cluster-part-1-initial-setup-roadmap-176bdb085d32?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/176bdb085d32</guid>
            <category><![CDATA[amazon-eks]]></category>
            <category><![CDATA[scalability]]></category>
            <category><![CDATA[aws-eks]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[aws]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Mon, 15 Dec 2025 14:31:29 GMT</pubDate>
            <atom:updated>2026-01-05T21:15:06.093Z</atom:updated>
            <content:encoded><![CDATA[<h3>Amazon EKS (K8s) Media Cluster: Part 1 — Initial Setup/Roadmap</h3><h4>Set up an AWS subaccount/admin, Terraform, Kubernetes, Docker/ECS, AWS CLI and all the prerequisites we need, and look ahead on the roadmap for this project!</h4><p>Although I’ve mentioned Amazon EKS (AWS platform for Kubernetes, K8s) in some past writings, <strong>people have been asking me for a while to do some more in-depth tutorials</strong> on the K8s ecosystem here….</p><p><strong>Now is the time! </strong>This is for intermediate-advanced cloud engineers, or even new devs who have the drive and aspiration to enhance their skills to jump to the next level.</p><p><strong>I’m super-excited</strong> for this <strong>K8s/Amazon EKS</strong> series…. <strong>Let’s do it!</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*SIwrbpc6e4_8R6OQXTOXVA.jpeg" /></figure><p>🛠️ Is it worth the effort? — <strong>Yes</strong>, definitely!!! 🚀</p><p><strong>Imagine your sense of accomplishment and confidence </strong>after completing the below list of highlights….</p><p><strong>This is the best project series you can do for:</strong></p><ul><li>Mastering <strong>production-grade EKS</strong> at scale.</li><li>Deep <strong>Infrastructure as Code</strong> (IaC) with Terraform.</li><li>Learn <strong>multi-AZ high availability</strong> the right way.</li><li>Understand <strong>EKS-optimized node groups</strong>, Karpenter.</li><li>Work with <strong>EKS Pod Identity (Agent)</strong>.</li><li>Prepare for <strong>GPU workloads</strong> and <strong>high-IOPS storage.</strong></li><li>Build skills with <strong>kubectl, helm, kustomize, eksctl, AWS CLI</strong> daily.</li><li>Set up <strong>monitoring &amp; logging</strong> foundations.</li><li>Get comfortable with <strong>ALB/NLB Ingress</strong>, cert-manager, external-dns.</li><li>With a focus on <strong>high availability (HA) and container 
management.</strong></li></ul><p><em>Details may vary since I am still writing this; more will come in part 2+.</em></p><h3>Intro — How we got here.</h3><p><strong>This will be a <em>practical</em>, hands-on tutorial series.</strong></p><ul><li><strong>Go from zero to a fully functional, scalable video-streaming platform running on Amazon EKS.</strong></li><li>All built with <strong>Terraform</strong> and real-world <strong>AWS best practices</strong>.</li><li>Special attention to <strong>high availability</strong> and <strong>media delivery</strong>.</li><li>And comparison to other container solutions like ECS.</li></ul><p><strong>When I was at the original MP3.com, back in the early dotcom days, </strong>our cracked Engineering infra team (accomplished devs and upcoming talent) handled massive-scale media delivery, but they didn’t have Docker, Kubernetes or fancy orchestration tools.</p><p><strong>Back then infrastructure code was mostly orchestrated with custom scripts</strong> copied with Linux automation and CVS (Concurrent Versions System, a precursor to git), and that did work to run thousands of colo bare-metal servers!</p><p>But a lot of media delivery then was <strong>ad-hoc, on-the-fly, seat-of-the-pants learning</strong>… 20 years later, a lot of these learnings in the industry have evolved into a new modern cloud constellation of tools.</p><p><strong>Now? It’s almost the other extreme. </strong>We have so many more sophisticated and elaborate choices to integrate with platforms like AWS and their many services…</p><p><strong>And that is good AND bad.</strong> Good for standardization, docs and integrating tools/controls in more fully-featured management/deployment solutions.</p><p><strong>But ironically…. that can make it all more <em>complex</em></strong> and it takes extra time to ramp devs up on it, which can be tricky. 
It’s almost a fulltime job to do that, besides knowing the features of all the other cloud SaaS, architecture and service varieties out there!</p><p>Looking ahead over the first 3 articles:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MMcqvekQPwNpXKYcat_TBA.png" /><figcaption>Basic INITIAL architecture diagram — Part 2 Terraform initial implementation</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ssiDkKAXSQWOLIpafSzwBw.png" /><figcaption>Testing high availability video hosting page (to demo Amazon EKS) — Part 3</figcaption></figure><p>Initially we will host the video on individual pods, but later we will try different configurations such as with S3 and other ideas. This is an experimental series where we will try different scenarios!</p><h3><strong>Containers, Docker and Kubernetes (K8s)</strong></h3><p><strong>You probably already know why we use containers….</strong> they package an application with its exact runtime, libraries, and dependencies into a single, lightweight, immutable unit that runs identically on a developer’s laptop, CI pipeline, staging, production on AWS, GCP, on-prem, or even an edge device.</p><p><strong>Initially, Docker containers became the go-to. Then, services like Amazon ECS simplified some basic orchestration of Docker containers.</strong></p><p>But running dozens or hundreds of Docker containers at scale was still too complex for many companies to pull off, especially when integrating AWS, Azure or GCP.</p><p><strong>Kubernetes (K8s) sought to address this… and has become more popular every year, filling that gap </strong>especially for larger enterprises and now even a growing number of small and medium-sized tech businesses.</p><p><strong>K8s is an advanced and more granular platform for container orchestration, management, and scaling. 
</strong>We have containers, Docker, pods, a lot of fancy controllers, YAML, kube-xxxx, CLIs and, on AWS for example, serverless options for running containers without managing the underlying servers (if using Fargate).</p><p><strong>While a container service like Amazon ECS is great for small numbers</strong> of Docker-based containers and AWS-native simplicity, <strong>K8s (via EKS) offers a lot of other advantages: portability, lower-latency scalability</strong> and many other benefits discussed below for larger-scale solutions.</p><p><strong>🚀 For many, it’s intimidating. </strong>Let’s get over that, together.</p><h3><strong>Roadmap of this tutorial series</strong></h3><p><strong>Over several focused articles,</strong> we’ll progressively construct a production-like environment while deliberately comparing it to other possible solutions, such as <strong>Amazon ECS</strong> equivalents, so you can <strong>confidently decide which orchestration tool fits your use case. </strong>Also we’ll look at<strong> related tools in the K8s ecosystem.</strong></p><p>Brief summary of our plans over the series:</p><ul><li><strong>You’ll begin by creating an isolated AWS subaccount, </strong>spinning up a minimal<strong> EKS cluster</strong>, and installing <strong>CLIs</strong> and <strong>Terraform</strong>. Isolated environments in IAM and IaC (infrastructure as code) support the Well-Architected pillar of <strong>Operational Excellence.</strong></li><li><strong>Next, quickly deploy a simple Node.js app that serves video</strong>, giving you that satisfying “it works!” moment by the second article. Something is happening; it’s not all configs for the sake of configs.</li><li>From there? 
<strong>We’ll layer on tools</strong> for High Availability HA like S3 (highly-available 99.99% availability, 11 9s durability) storage, <strong>CloudFront</strong> for low-latency delivery, <strong>autoscaling</strong>, <strong>proper load balancing</strong>, k8s extras and you’ll be feeling a sense of accomplishment!!!</li><li>Then we’ll drill down further with monitoring with <strong>Kubecost</strong> (and related tools) and the <strong>Kubernetes Dashboard,</strong> and finally <strong>CI/CD </strong>with <strong>ArgoCD</strong>, turning the project into <strong>something you’d be proud to show</strong> as a portfolio project.</li></ul><p>By the end of this series you’ll not only have a <strong>cost-aware, auto-scaling video platform</strong> you can build or tear down in minutes, but also a deep, comparative understanding of <strong>K8s, EKS versus ECS</strong>, strong <strong>Terraform</strong> muscle memory, and exposure to the modern tools real teams use every day.</p><p><strong>Whether you’re preparing for cloud certifications, system-design interviews, </strong>or leveling up your Kubernetes game for your job, <strong>follow along, code along, and let’s make this Amazon EKS series project grow</strong> into an impressive, resume-worthy architecture.</p><p>🥰 Thanks for reading! … 🔥 please clap and share this article, thanks! 🚀</p><p>Preview of what is in this article, which is mostly setup for phase 1:</p><ol><li><strong>Why are we using Amazon EKS?</strong></li><li><strong>AWS Account with Organizations. </strong>Isolate tutorial resources.</li><li><strong>AWS CLI setup.</strong></li><li><strong>Terraform CLI v1.5+.</strong> Infrastructure as Code for all AWS resources.</li><li><strong>Install kubectl. 
</strong>Command-line tool to interact with Kubernetes.</li><li><strong>Docker basics and comparison to ECS.</strong></li><li><strong>Test Terraform</strong> + AWS</li><li><strong>Clean up</strong> test directory</li><li>What’s Next Preview</li></ol><p><strong>Appendix: Troubleshooting</strong></p><p><strong>Estimated time:</strong> 30–60 minutes hands-on</p><p><strong>Estimated cost: ~$0 in this article, creating only local setup and IAM, not other AWS resources.</strong> ⚠️ <strong>Note that later articles incur some small fees</strong> — assuming you use Terraform destroy to remove AWS resources at the end of the lesson within a couple of hours. If you do not remove the AWS resources when you are done, there will be ongoing charges.</p><p><strong>⚠️ Typically, I found the stacks we make in series parts 2 and 3 should each cost ~$1-$2 (or less) for about 2 hours per article. </strong>You can remove the stack sooner and use the latest version of K8s for less cost.</p><p><strong>⚠️ Important: Use a recent version of K8s — AWS charges more per hour for “extended support”. We use 1.34 (current, “standard support” as of Dec. 2025) in our code to keep prices lower, </strong>but if you read this months or a year from now, you may want a newer version<strong>. Also note: prices may change when you read this, so exact cost could vary, </strong>keep an eye on it with AWS Cost Explorer.</p><h3>1. Why use Amazon EKS and K8s.</h3><p>Kubernetes is an open-source cloud tool that is separate from AWS. So why are we using Amazon EKS (Amazon’s Kubernetes platform)?</p><p><strong>We’re using Amazon EKS because it gives us a fully managed Kubernetes control plane</strong> with zero-downtime upgrades, <strong>easy integration with AWS services</strong> (ALB, CloudWatch, IAM, EBS/EFS), and the same production-grade experience used by Netflix, Expedia, and Snap.</p><ul><li><strong>True portability and multi-cloud. 
</strong>Kubernetes is the industry standard orchestration platform (used by Google, Netflix, Spotify, Airbnb, etc.). Once you know EKS, you can run the exact same manifests on GKE, AKS, DigitalOcean, on-prem, or even move to self-hosted k8s.</li><li><strong>Rich ecosystems and tools.</strong> Examples: <strong>Horizontal Pod Autoscaling + Cluster Autoscaler, </strong>Ingress controllers (ALB, NGINX, Traefik) with real Content-Based routing, canary deployments, etc. Unlike ECS, there are more options for Custom Controller logic with K8s on Amazon EKS.</li><li><strong>Best-in-class deployment and observability stack. </strong>Cost estimate tools, Prometheus/Grafana, K9s/Lens, ArgoCD GitOps — all native and battle-tested. Native support for canary, blue/green, and other custom delivery methods vs. Amazon ECS. Some who have migrated indicate observability is much better and efficient.</li><li><strong>Rich media workload primitives. </strong>Depending how far you want to take this demo you could learn a lot implementing<strong> </strong>Jobs/CronJobs for transcoding, HPA + Cluster Autoscaler for viral spikes, DaemonSets for logging/monitoring agents.</li><li><strong>Advanced networking &amp; scaling control. </strong>We can get more granular on pods and containers and use for scalability and high availability solutions. Some anecdotal discussions from larger companies indicate scaling is much faster with Amazon EKS than ECS. Snapchat has reported up to 50% decrease in scaling latency, after migrating from ECS to EKS, for example.</li><li><strong>In-demand job skill. </strong>A lot of jobs require some knowledge of AWS, k8s and this demonstrates the ability to take on advanced tasks. Amazon EKS + Terraform + ArgoCD, for example, is the stack that many modern startups and mid-size companies actually are implementing for media, gaming, and SaaS in 2026.</li></ul><h3>2. AWS Management Acct. 
&amp; Organizations Setup</h3><p>We’re going to create an isolated AWS subaccount, role and user specifically for this tutorial series.</p><p><strong>Why am I doing <em>that</em> first?</strong></p><p><strong>Is it really <em>necessary</em>?</strong> Yes.</p><p>If you do not want to, you could skip the account isolation, but I do not recommend skipping this.</p><p>⚡️There are <strong>several key benefits:</strong></p><ul><li><strong>Cost isolation. </strong>Easily track tutorial spending separately.</li><li><strong>Security isolation. </strong>No risk to production resources.</li><li><strong>Easy cleanup. </strong>Delete the infra and entire account when done. Deleting infra is super-important to reduce costs, as some costs are charged for <strong>hourly</strong> usage. So when we are done in each section, we want to delete infra so we do not get charged.</li><li><strong>Since we are using Terraform (TF) for the app part, </strong>we can easily tear down and rebuild where we were at (delete, recreate).</li></ul><p>If you have not used Terraform, <strong>it’s like a blueprint template for your cloud infra that you can run locally and check into Git</strong>. You can automatically create and destroy all the pieces easily and quickly. When completely done with everything we will delete the entire account to be sure.</p><ul><li><strong>Realistic practice. 
</strong>Multi-account is an AWS best practice and commonly used in Enterprise projects.</li></ul><h4>2.1 Be Aware of Potential Costs</h4><p>🚨⚠️ <strong>Possible AWS Cost Alert: </strong>Be aware that there may be some small costs involved in this project,<strong> using the AWS platform and Internet Gateway Interface — </strong>as of now, I am <em>estimating</em> <strong>it could be only $5-$10 USD for the full project </strong>(multiple articles)<strong>, if you do each section promptly <em>and</em> follow my instructions to destroy infra when complete.</strong> We will be using some free tier services, but not all are free. Keep tabs on billing.</p><p>If you are unsure or worried about it, then set up AWS Cost Explorer and Budget alerts for the account.</p><p><strong>🚨 I will show you how to destroy the infra with Terraform (to reduce costs) and rebuild it quickly.</strong></p><ul><li>However, keep in mind, <strong>it is 100% <em>your</em> responsibility to confirm that all resources created/charged are also destroyed. If you do not do this, the charges will show up on your AWS bill.</strong></li><li><strong>So confirm you destroyed them at the end of each session.</strong> I will put some warnings at the top and bottom of each article to remind you.</li><li><strong>Cost awareness is part of being a cloud architect and engineer! </strong>If you do not feel up to that, then read up on it before implementing this project.</li></ul><h4><strong>2.2 Initial AWS Organizations Setup for Subaccount</strong></h4><ol><li><strong>Sign in</strong> to the AWS Console, available after you sign up for AWS.<br><a href="https://us-east-1.console.aws.amazon.com/console/home">https://us-east-1.console.aws.amazon.com/console/home</a> or whichever region you use<br>⚠️ <strong>It is recommended to NOT use your Root account</strong> if possible! 
<br>⚠️ But <strong>if you do need to use Root </strong>OR<strong> previously have a config set up where your Org is only managed by Root,</strong> then you will have to set up another admin user account to switch into this new subaccount. I do give instructions for that further below, but it takes about 10 min. extra.</li><li>Enable <strong>AWS Organizations if it is not enabled.</strong></li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vVsGuda2AuhEnX3AtfIkmA.png" /></figure><p>3. Click <strong>Create an organization (with all features)</strong></p><p>4. In the AWS Organizations console, click <strong>Add an AWS account.</strong></p><p>5. Select <strong>Create an AWS account</strong></p><ul><li><strong>AWS account name:</strong> eks-tutorial-dev</li><li><strong>Email address:</strong> Use a unique email for this account if possible.</li><li>Tip: If you use Gmail, you can use email aliases. For example, if your email is yourname@gmail.com, use yourname+eks-tutorial@gmail.com. All emails will still go to your main inbox.</li><li><strong>IAM role name:</strong> Leave as default (OrganizationAccountAccessRole). This is the role you will assume to access the subaccount.</li><li><strong>Tags: </strong>Tagging is a best practice. An enterprise may have its own tagging system, but here we will use these key/value pairs: Project=eks-video-tutorial, Environment=learning, Owner=eks-tutorial-dev, CostCenter=personal-learning, ManagedBy=terraform, DeleteAfter=20251230</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vNvcPzBlO0-TnDSlvjzLiA.png" /><figcaption>You can change “Chris” or leave it in honor of me 🤣</figcaption></figure><p>6. Click <strong>Create AWS account</strong> and wait a few minutes for it to complete.</p><p>7. Note the <strong>Account ID</strong> of your new subaccount (a 12-digit number like 987654321098). 
You will need that soon.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xF8d4CoaJqN30ehCes7xVw.png" /></figure><h4>2.3 Access the Subaccount via Role Assumption</h4><p>Now we have to switch to the subaccount using the role.</p><p>⚠️ <strong>IMPORTANT</strong> — You <strong><em>cannot</em></strong> use your <strong>Root</strong> mgmt. AWS login to switch to a subaccount. <strong>AWS console does not allow you to switch to subaccounts from Root account.</strong></p><p><strong>So how to fix this? We will create a special admin account eks-admin </strong>for this project<strong>.</strong> Then you can switch from that!!!</p><ol><li><strong>In the top-right corner of the AWS Console,</strong> click on your account name/number. Do you have a <strong>Switch</strong> link? If so, click that and you are good. If not, continue below…</li><li>There are a couple of ways of doing this next part. <strong>Several issues can cause you to <em>not</em> have the “Switch Role” link in the upper right, </strong>due to some configs/AWS changes. You may not have it yet, and it may be faster to just use the URL directly below. If that works then you are good, but if it does not work, continue below.</li><li><strong>✅ Prepare a URL like this </strong>and put it in your browser (one line). Replace 123456789012 in this sample with the new account ID (12 digits) you were given<br>https://signin.aws.amazon.com/switchrole?account=123456789012&amp;roleName=OrganizationAccountAccessRole&amp;displayName=eks-tutorial-dev</li><li>Confirm the following:</li></ol><ul><li><strong>Account:</strong> Your new subaccount ID (e.g., 123456789012)</li><li><strong>Role:</strong> OrganizationAccountAccessRole</li><li><strong>Display Name:</strong> EKS-Tutorial (for easy identification)</li><li><strong>Color:</strong> Choose any color (helps identify which account you’re in)</li></ul><p>Now click the Switch button. 
<strong>You should now be in the subaccount.</strong></p><p><strong>⚠️ If you get a permissions error,</strong> read on: we will create a new admin user, and if that does not resolve the issue, you may need to add extra permissions to your management account (see the Troubleshooting Appendix).</p><h4>2.4 Create admin user for this project</h4><p>We will now create a special admin user for this project, called <strong>eks-admin</strong>. You log in as that user, and from there you can switch to the subaccount.</p><ul><li>In your management account console, go to IAM: <a href="https://console.aws.amazon.com/iam/home#/users">https://console.aws.amazon.com/iam/home#/users</a></li><li>Click <strong>Create user</strong>.</li><li>User name: <strong>eks-admin</strong></li><li>Check Provide user access to the AWS Management Console.</li><li>Console password: Auto-generated (let AWS create one; you’ll reset it next).</li><li>Uncheck “Require password reset” (optional, but easier).</li></ul><p><strong>Permissions</strong></p><ul><li>Click Attach policies directly.</li><li>Search for and select AdministratorAccess, as that will be needed for this project.</li><li>Click Next.</li><li>Click Create user, then copy the generated console password.</li></ul><p><strong>Re-login as the new admin user</strong></p><ol><li>Open a <strong>new incognito/private tab</strong> (to avoid root session conflicts).</li><li>Go to <a href="https://aws.amazon.com/console/">https://aws.amazon.com/console/</a>.</li><li>Select <strong>IAM user sign in</strong>.</li></ol><ul><li>Account ID: the 12-digit ID of your management account, where this new user lives<strong> (not the subaccount ID from earlier)</strong>. You can find it in the ARN on the new user’s IAM page, something like <strong>arn:aws:iam::</strong><strong>69019161xxxx:user/eks-admin</strong> (the 12-digit number after <strong>iam::</strong>).</li><li>IAM username: eks-admin.</li><li>Password: The one you copied.</li></ul><p>Now do the Switch.</p><p><strong>Use this URL below. 
You are using this new user to switch to the subaccount in your Organization.</strong></p><p><strong>Therefore, for the URL below you should now be using:</strong></p><ul><li><strong>The account number of the subaccount we created in Organizations earlier (the first one, not the account of the new user here)</strong></li><li>The display name for that account, such as <a href="https://signin.aws.amazon.com/switchrole?account=123456789012&amp;roleName=OrganizationAccountAccessRole&amp;displayName=eks-tutorial-dev">eks-tutorial-dev</a> if you used what I suggested</li><li><strong>URL:</strong></li></ul><p><a href="https://signin.aws.amazon.com/switchrole?account=123456789012&amp;roleName=OrganizationAccountAccessRole&amp;displayName=eks-tutorial-dev">https://signin.aws.amazon.com/switchrole?account=123456789012&amp;roleName=OrganizationAccountAccessRole&amp;displayName=eks-tutorial-dev</a></p><p>If that still does not work, go to the Appendix: Troubleshooting Subaccount Switch at the bottom of the article.</p><h4>2.5 Create an IAM User in the Subaccount for CLI Access</h4><p>We need a user for CLI/Terraform (TF) ops work:</p><ol><li>Navigate to <strong>IAM</strong> &gt; <strong>Users</strong> &gt; <strong>Create user</strong></li><li><strong>User details:</strong></li></ol><ul><li>User name: terraform-eks-admin</li><li>Click <strong>Next</strong></li></ul><p>3. Permissions</p><ul><li>Select <strong>Attach policies directly</strong></li><li>Search for and check AdministratorAccess</li></ul><p>Note: For this demo we are using admin access in this subaccount so I can show many things as I write, but ideally this should be narrowed to least privilege for security. We will revisit that.</p><ul><li>Click <strong>Next</strong></li></ul><p>4. Review and create:</p><ul><li>Click <strong>Create user</strong></li></ul><p><strong>5. 
Create access keys for CLI:</strong></p><ul><li>Click on the newly created user terraform-eks-admin</li><li>Go to the <strong>Security credentials</strong> tab</li><li>Scroll to <strong>Access keys</strong> &gt; Click <strong>Create access key</strong></li><li>Select <strong>Command Line Interface (CLI)</strong></li><li>Check the confirmation box</li><li>Click <strong>Next</strong> &gt; <strong>Create access key</strong></li><li><strong>Important:</strong> Download the CSV and/or copy both values to your secrets/password manager now; the secret access key is shown only once:</li><li>Access key ID</li><li>Secret access key</li><li>You will need to configure these in your AWS CLI setup below.</li><li>Click <strong>Done</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/870/1*8HnvMiFiJQ10kbTjy34pCw.png" /></figure><p><strong>Note: </strong>In an enterprise setup you will probably be using SSO (IAM Identity Center), but this will work for demo purposes for now. Just make sure to delete the keys when you are done, to prevent any security issues.</p><h3>3. 
Install and Configure AWS CLI &amp; Terraform</h3><p>I have written a whole article on this for macOS and Windows at:</p><p><a href="https://medium.com/@csjcode/aws-cli-terraform-setup-for-macos-windows-64e8cdf805d5">AWS CLI/Terraform Setup for MacOS/Windows</a></p><p>⚠️ <strong>When you are done following the setup article above,</strong> “<a href="https://medium.com/@csjcode/aws-cli-terraform-setup-for-macos-windows-64e8cdf805d5"><strong>AWS CLI/Terraform Setup for MacOS/Windows</strong></a>”, continue with the instructions below, which assume you have already done that.</p><p>Verify your identity when you are logged into the CLI:</p><pre>aws sts get-caller-identity<br><br># or to check that the profile exists <br>aws sts get-caller-identity --profile terraform-eks-admin</pre><p>To avoid having to keep passing --profile, you can set it for the current shell session:</p><pre>export AWS_PROFILE=terraform-eks-admin</pre><p>Or add it to your shell startup file so it is the default (use whichever profile name you configured), something like:</p><pre># For bash<br>echo &#39;export AWS_PROFILE=eks-tutorial&#39; &gt;&gt; ~/.bashrc<br>source ~/.bashrc<br><br># For zsh (macOS default)<br>echo &#39;export AWS_PROFILE=eks-tutorial&#39; &gt;&gt; ~/.zshrc<br>source ~/.zshrc</pre><p>Now run:</p><pre>aws sts get-caller-identity<br><br># should give you the correct account and username<br><br>aws s3 ls<br><br># no error; if incorrect login, will have an error</pre><h4>⚠️ Enable Cost Explorer for the Subaccount</h4><ol><li>While switched into the subaccount in the Console, go to <strong>Billing and Cost Management</strong></li><li>In the left sidebar, click <strong>Cost Explorer</strong></li><li>Click <strong>Enable Cost Explorer</strong> (it can take up to 24 hours to populate data)</li></ol><h3>4. 
Install Terraform locally if you do not have it</h3><p>Same article as above, but I’ll put it here again:</p><p><a href="https://medium.com/@csjcode/aws-cli-terraform-setup-for-macos-windows-64e8cdf805d5">AWS CLI/Terraform Setup for MacOS/Windows</a></p><p>Make sure to get Terraform set up; we’ll be using it.</p><p>For this article I am using:</p><pre>terraform -v<br><br>Terraform v1.14.1</pre><h3>5. Install kubectl (k8s)</h3><p><strong>kubectl is the official command-line tool </strong>for interacting with Kubernetes clusters. Use it locally on your dev machine.</p><p>It allows you to deploy applications, inspect and manage cluster resources, view logs, execute commands in pods, trigger scaling or rollouts with a single command, and more. It’s essential for interacting with our resources.</p><p>On macOS:</p><pre>brew install kubectl<br><br># Output: <br><br>==&gt; Fetching downloads for: kubernetes-cli<br>✔︎ Bottle Manifest kubernetes-cli (1.34.3)                                                                [Downloaded    7.5KB/  7.5KB]<br>✔︎ Bottle kubernetes-cli (1.34.3)   </pre><p>On Windows:</p><pre>choco install kubernetes-cli</pre><p>Linux (may vary slightly):</p><pre>curl -LO &quot;https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl&quot;<br><br>chmod +x kubectl<br><br>sudo mv kubectl /usr/local/bin/</pre><p><strong>Validate kubectl CLI response:</strong></p><pre>$ kubectl version --client<br>Client Version: v1.34.3<br>Kustomize Version: v5.7.1<br></pre><p><strong>Enable kubectl Autocompletion (Highly Recommended), Bash example:</strong></p><pre>echo &#39;source &lt;(kubectl completion bash)&#39; &gt;&gt; ~/.bashrc<br>echo &#39;alias k=kubectl&#39; &gt;&gt; ~/.bashrc<br>echo &#39;complete -o default -F __start_kubectl k&#39; &gt;&gt; ~/.bashrc<br>source ~/.bashrc</pre><p>Now 
you can do something like k get pods instead of typing kubectl get pods. Note that if you do that right now you will get a connection error, because we have not connected kubectl to a cluster yet!</p><pre>$ k get pods<br><br># Output:<br>E1210 15:28:11.689728   32330 memcache.go:265] &quot;Unhandled Error&quot; err=&quot;couldn&#39;t get current server API group list: Get \&quot;http://localhost:8080/api?timeout=32s\&quot;: dial tcp [::1]:8080: connect: connection refused&quot;<br>E1210 15:28:11.691210   32330 memcache.go:265] &quot;Unhandled Error&quot; err=&quot;couldn&#39;t get current server API group list: Get \&quot;http://localhost:8080/api?timeout=32s\&quot;: dial tcp [::1]:8080: connect: connection refused&quot;</pre><h3>6. Docker Basics (Quick Review)</h3><p>You don’t need Docker installed locally for this article, but you should have it for later (I believe Part 3). <strong>Also,</strong> <strong>you should understand these core concepts:</strong></p><ul><li><strong>Image: </strong>Packaged app with all dependencies (same for ECS and EKS)</li><li><strong>Container: </strong>Running instance of an image. 
(called ECS Task in ECS)</li><li><strong>Dockerfile: </strong>Text file of instructions for building a container image.</li><li><strong>Registry: </strong>Storage for images (Amazon ECR, Docker Hub), used in both ECS and EKS</li><li><strong>Tag:</strong> Version label to keep track of image versions.</li></ul><p><strong>Differences between ECS and EKS (sample):</strong></p><ul><li><strong>“Pod Spec”</strong> (EKS), <strong>“Task Definition”</strong> (ECS): Container config.</li><li><strong>“Pod” </strong>(EKS), <strong>“Task”</strong> (ECS): Running container instance/s</li><li><strong>EKS Pod Identity </strong>(EKS), “<strong>Task Role”</strong> (ECS): IAM permissions</li><li><strong>Replicas</strong> (EKS), <strong>Desired Count</strong> (ECS): How many instances to run</li><li><strong>Ingress + Service </strong>(EKS), ALB + Target Group (ECS): Routes external traffic.</li><li><strong>Node Group / Karpenter </strong>(EKS), Capacity Provider (ECS): Managing EC2s</li><li><strong>HorizontalPodAutoscaler</strong> (HPA) (EKS), Auto Scaling (ECS): Scaling the number of running instances</li><li><strong>ArgoCD / Flux </strong>(EKS), CodeDeploy (ECS): GitOps/pipelines</li></ul><p>As you can see, there are<strong> similarities and some differences</strong> in the container aspects of ECS and EKS. Quick note: you can also use other, non-Docker container tooling. For this series we’ll use Docker since it’s the most common.</p><p><strong>Sample Dockerfile (plain text, not YAML)</strong></p><pre>FROM node:18-alpine<br># FROM node:18-alpine - Node.js version 18 on Alpine Linux as our base. <br><br># Set working directory inside container<br>WORKDIR /app<br><br># Copy dependency files first (for better caching)<br>COPY package*.json ./<br><br># Install dependencies<br>RUN npm ci --omit=dev<br># npm ci installs exactly what package-lock.json pins; --omit=dev skips devDependencies. RUN executes during image build<br><br># Copy application code<br>COPY . 
.<br><br># Expose the port our app listens on<br>EXPOSE 3000<br><br># Command to run when the container starts (CMD runs at container start, not at build)<br>CMD [&quot;node&quot;, &quot;server.js&quot;]</pre><p><strong>A Dockerfile is a text file containing a list of instructions that Docker reads and executes from top to bottom to build a Docker image. </strong>As you can see, the Dockerfile basics are easy to follow.</p><h4>Optional: Install Docker Desktop</h4><p>We’ll need it later to create and test containers locally:</p><p><strong>macOS/Windows:</strong> Download Docker Desktop from <a href="https://www.docker.com/products/docker-desktop/">https://www.docker.com/products/docker-desktop/</a></p><pre>docker --version</pre><p>Docker is only optional right now. We will use it in article 3, so you do not need to install it now, but it is good to get familiar with it.</p><h3>7. Test Terraform + AWS Integration</h3><p>Before we build real infrastructure, let’s verify Terraform can communicate with your AWS subaccount correctly.</p><p><strong>Create a test directory </strong>(change path as you like):</p><pre>mkdir ~/eks-tutorial-test<br>cd ~/eks-tutorial-test</pre><p><strong>Create a Test Configuration File:</strong></p><p>We need main.tf for Terraform. 
This will just confirm everything works so far and we are on the right track for the next tutorial article.</p><pre># Terraform configuration<br>terraform {<br>  required_version = &quot;&gt;= 1.5.0&quot;<br>  <br>  required_providers {<br>    aws = {<br>      source  = &quot;hashicorp/aws&quot;<br>      version = &quot;~&gt; 5.0&quot;<br>    }<br>  }<br>}<br><br># AWS Provider - credentials come from the profile set in AWS_PROFILE<br>provider &quot;aws&quot; {<br>  region = &quot;us-east-1&quot;<br>  <br>  # Best practice: Default tags for all resources<br>  default_tags {<br>    tags = {<br>      Project     = &quot;eks-video-tutorial&quot;<br>      Environment = &quot;learning&quot;<br>      ManagedBy   = &quot;terraform&quot;<br>    }<br>  }<br>}<br><br># Data source - queries AWS for current identity<br>data &quot;aws_caller_identity&quot; &quot;current&quot; {}<br><br># Data source - gets current region<br>data &quot;aws_region&quot; &quot;current&quot; {}<br><br># Outputs - displayed after terraform apply<br>output &quot;account_id&quot; {<br>  description = &quot;AWS Account ID (should be your subaccount)&quot;<br>  value       = data.aws_caller_identity.current.account_id<br>}<br><br>output &quot;user_arn&quot; {<br>  description = &quot;Current IAM user ARN&quot;<br>  value       = data.aws_caller_identity.current.arn<br>}<br><br>output &quot;region&quot; {<br>  description = &quot;AWS Region&quot;<br>  value       = data.aws_region.current.name<br>}<br><br>output &quot;status&quot; {<br>  value = &quot;Terraform successfully connected to AWS account ${data.aws_caller_identity.current.account_id} in ${data.aws_region.current.name}&quot;<br>}</pre><p><strong>Initialize Terraform</strong></p><pre>$ terraform init<br><br># Output:<br><br>Initializing the backend...<br>Initializing provider plugins...<br>- Finding hashicorp/aws versions matching &quot;~&gt; 5.0&quot;...<br>- Installing hashicorp/aws v5.100.0...<br>- Installed hashicorp/aws v5.100.0 (signed by HashiCorp)<br>Terraform 
has created a lock file .terraform.lock.hcl to record the provider<br>selections it made above. Include this file in your version control repository<br>so that Terraform can guarantee to make the same selections by default when<br>you run &quot;terraform init&quot; in the future.<br><br># Note: If you did not already have the provider plugins TF needs,<br># there will be some downloads as shown here<br><br>Terraform has been successfully initialized!<br><br>You may now begin working with Terraform. Try running &quot;terraform plan&quot; to see<br>any changes that are required for your infrastructure. All Terraform commands<br>should now work.<br><br>If you ever set or change modules or backend configuration for Terraform,<br>rerun this command to reinitialize your working directory. If you forget, other<br>commands will detect it and remind you to do so if necessary.</pre><p>Terraform apply (“111111111111” will be replaced by your account number)</p><pre>$ terraform apply<br><br># Output<br><br>data.aws_caller_identity.current: Reading...<br>data.aws_region.current: Reading...<br>data.aws_region.current: Read complete after 0s [id=us-east-1]<br>data.aws_caller_identity.current: Read complete after 0s [id=111111111111]<br><br># and so on... there will be a lot of info about what is going to occur.<br><br><br>Do you want to perform these actions?<br>  Terraform will perform the actions described above.<br>  Only &#39;yes&#39; will be accepted to approve.<br><br>  Enter a value: yes<br><br>Outputs:<br><br>account_id = &quot;111111111111&quot;<br>region = &quot;us-east-1&quot;<br>status = &quot;Terraform successfully connected to AWS account 111111111111 in us-east-1&quot;<br>user_arn = &quot;arn:aws:iam::111111111111:user/terraform-eks-admin&quot;</pre><p>And you can also check the output with this:</p><pre>$ terraform output<br><br># this will output the same info</pre><p>That’s it!</p><h3>8. 
Cleanup</h3><p>When you are sure you got the correct result and it’s connected as stated above, you can delete the test folder.</p><pre>cd ~<br>rm -rf ~/eks-tutorial-test</pre><ul><li><strong>NOTE: We do NOT have to run the </strong><strong>terraform destroy command</strong>, which is normally used to delete AWS resources created by Terraform, because in this example we only created outputs, not actual resources in AWS.</li></ul><h3>9. What’s Next: Article 2 Preview</h3><p>You’ve completed all the prerequisites! Your AWS subaccount is ready, tools are installed, and Terraform can communicate with AWS.</p><p><strong>In Article 2, we’ll build real infrastructure:</strong></p><ol><li><strong>Create the Terraform backend. </strong>S3 bucket + DynamoDB for state locking</li><li><strong>Build a multi-AZ VPC. </strong>Subnets across 3 availability zones</li><li><strong>Provision your EKS cluster. </strong>Using the terraform-aws-modules/eks module</li><li><strong>Deploy your first node group. </strong>2 × t3.small instances spread across AZs</li><li><strong>Connect kubectl to your cluster. 
</strong>Run kubectl get nodes and see real nodes!</li></ol><p><strong>Estimated time:</strong> 45–60 minutes hands-on</p><p><strong>Estimated cost:</strong> ~$1–2 if you complete and destroy within a few hours (note: costs may vary or change by the time you read this, so double-check, and keep close track)</p><h3>Looking Ahead…</h3><p>Here are the upcoming article topics:</p><ul><li>Master <strong>production-grade EKS</strong> at scale (one of the most common ways companies run Kubernetes on AWS going into 2026)</li><li>Set up the basics of a multi-AZ cluster with Kubernetes.</li><li>Deep <strong>Infrastructure as Code</strong> with Terraform (serious companies use it)</li><li>Learn <strong>multi-AZ high availability</strong> the right way (VPC, subnets, load balancers, node placement)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*BBm6aSCEjn3WFpHe87eZOw.jpeg" /></figure><p><strong>Possible future article topics (I’m still preparing these):</strong></p><ul><li>Understand <strong>EKS-optimized node groups</strong>, Karpenter vs Cluster Autoscaler trade-offs</li><li>Work with <strong>EKS Pod Identity (Agent)</strong>, which is critical for secure media apps talking to S3, DynamoDB, CloudFront, etc.</li><li>Build muscle memory: <strong>kubectl, helm, kustomize, eksctl, AWS CLI</strong> daily</li><li>Set up <strong>monitoring &amp; logging</strong> foundations (CloudWatch, Prometheus)</li><li>Get comfortable with <strong>ALB/NLB Ingress</strong>, cert-manager, external-dns: exactly what media sites need</li></ul><p>If we have time in the series we may even try some other things like Amazon EKS Capabilities, “a layered set of fully managed cluster features that help accelerate developer velocity”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MMcqvekQPwNpXKYcat_TBA.png" /><figcaption>Basic INITIAL architecture diagram: Part 2 Terraform initial implementation</figcaption></figure><h3>⚠️ APPENDIX: <strong>Troubleshooting Subaccounts, Org Permissions 
and Switching between them</strong></h3><p>Reminder: You <strong>cannot</strong> switch to a subaccount from the Root account. If you are trying to do that, it will not work.</p><p>The initial <strong>management account you logged into to create the Organization and subaccount must have the AWSOrganizationsFullAccess policy attached</strong>. If you get a permissions error, go to IAM &gt; Users &gt; the management account (used to create the above subaccount initially) and add the AWSOrganizationsFullAccess policy.</p><p><strong>I have had a couple of errors in the past related to this, </strong>and this time I used a different management account and had to add that policy, so this part may require some extra steps depending on your configuration.</p><p>⚠️ <strong>Troubleshooting step #2:</strong> I also got this error. If it is still not working while logged into your management account (the one that created the subaccount), then you need to add OrganizationAccountAccessRole.</p><ul><li>Go to IAM &gt; Roles &gt; Create Role (while logged into your management account)</li><li>Choose “AWS account” → Choose “Another AWS account”</li><li>In the Account ID box, type the 12-digit ID of your eks-tutorial-dev subaccount</li><li>Click Next</li><li>On the Add Permissions page, tick the box next to AdministratorAccess &gt; Next</li><li>Then enter the Role Name (type it exactly): OrganizationAccountAccessRole</li><li>Description: “Management account access role”</li><li>And click the “Create role” button.</li></ul><p><strong>Now try the same URL we used above.</strong></p><p>⚠️ <strong>Troubleshooting step #3: </strong>If that still does not work, the next step is to log in directly to the subaccount we created.</p><p>To log in, log out of your other session, then sign in from the console using the option “login as root” with your <strong><em>subaccount</em></strong><em> </em><strong><em>email address</em></strong><em>. 
</em>So if you used yourname+eks-tutorial@gmail.com (the pattern suggested above), then log in as that. Since you did not set up a password yet, click “forgot password”; you will be sent a verification code to enter, then you can set a new password and log in again with it.</p><p><strong>Once logged into the subaccount: </strong>Go to IAM &gt; Roles &gt; OrganizationAccountAccessRole &gt; Trust relationships &gt; Edit trust policy</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TT9V6IdjmQSIf1CTVPO64w.png" /></figure>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[✅ Cloud Memory Optimization Checklist (CC #2)]]></title>
            <link>https://cloudchecklists.com/cloud-memory-optimization-checklist-cc-2-a7ef27863e07?source=rss-649f4282ab20------2</link>
            <guid isPermaLink="false">https://medium.com/p/a7ef27863e07</guid>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[aws]]></category>
            <dc:creator><![CDATA[Chris St. John]]></dc:creator>
            <pubDate>Wed, 03 Dec 2025 15:47:22 GMT</pubDate>
            <atom:updated>2025-12-03T15:47:30.513Z</atom:updated>
            <content:encoded><![CDATA[<h4>Right-size RAM allocations, leverage caching, tune JVMs/containers, offload cold data for max efficiency</h4><p><strong>This is the second checklist in the Cloud Checklists series, </strong>focusing entirely on <strong>memory optimization </strong>across AWS, Azure, GCP, and cloud-native environments, with a bit more focus on Linux-based servers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*9jn3119n8deRKUWzmrab4w.jpeg" /><figcaption>Cloud Memory Optimization Checklist (CC #2)</figcaption></figure><p><strong>Why memory deserves its own checklist:</strong></p><ul><li><strong>Memory is often the most expensive resource in cloud VMs,</strong> so we need to get it right-sized. <strong>Most people remember to right-size CPU but, ironically, <em>forget</em> memory!</strong></li><li><strong>Poor memory management causes OOM (out of memory) issues,</strong> garbage collection problems, and cascading performance failures more often (in my opinion) than CPU or I/O bottlenecks do.</li><li><strong>Most cloud cost overruns</strong> I see in real environments (50%+) come from chronically over-provisioned or poorly tuned memory rather than CPU waste.</li><li>This is a good checklist to use to <strong>catch issues others missed!</strong></li></ul><p>Here is what I am going to cover for memory optimization on cloud projects:</p><h4><strong>✅ Cloud Memory Optimization Checklist</strong></h4><ol><li><strong>Monitor memory </strong>utilization</li><li>Select memory-optimized instances</li><li>Add <strong>in-memory caching</strong></li><li>Use <strong>managed cache services</strong></li><li><strong>Compress inactive data</strong></li><li>Tune application <strong>heap size</strong></li><li><strong>Fix memory leaks</strong> fast</li><li><strong>Scale vertically</strong> before horizontally</li><li>Use <strong>swap only as last resort</strong></li><li><strong>Offload to object storage</strong></li></ol><p>🛠️ Get 
more tips like this at <a href="https://www.systemsarchitect.io"><strong>https://www.systemsarchitect.io</strong></a><strong> </strong>and follow the Cloud Checklists series here. 🚀 Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a>. We follow back!</p><h3>1. Monitor memory utilization</h3><p><strong>Memory utilization and true memory pressure are hidden from most default cloud dashboards. For example, </strong>when Linux reports “used” memory, that figure can include file caches that are reclaimed almost instantly under pressure. A system showing 90% memory “used” could still have plenty of headroom.</p><ul><li><strong>Enable any detailed memory metrics </strong>you can get from your platform; look them up in the docs, as they will give you more clues.</li><li><strong>Track out-of-memory events </strong>in container runtimes.</li></ul><p><strong>Essential metrics from </strong><strong>/proc/meminfo:</strong></p><ul><li><strong>MemAvailable</strong>: An estimate of memory available for new applications without swapping. 
This is more useful than MemFree because it accounts for reclaimable caches.</li><li><strong>MemFree</strong>: Truly unused memory (often misleadingly low on healthy systems)</li><li><strong>Cached</strong>: Memory used for file caches (mostly reclaimable)</li><li><strong>Buffers</strong>: Memory used for block device buffers</li><li><strong>SwapFree / SwapTotal</strong>: Swap usage (high swap activity indicates memory pressure)</li><li><strong>AnonPages</strong>: Memory allocated by applications (heap, stack) that isn’t backed by files; this is your actual application memory footprint</li><li><strong>Active vs Inactive</strong>: Memory recently accessed vs memory that can be reclaimed</li><li><strong>Set alerts on MemAvailable &lt; 10% of MemTotal:</strong> A reliable indicator of real memory pressure across Linux distributions, though PSI (Pressure Stall Information) is the preferred signal on newer Linux kernels.</li><li><strong>Use unified observability from providers </strong>like Datadog Memory Deep Dive, New Relic APM, or Prometheus node_exporter + cAdvisor for containers</li></ul><p>🚀 <strong>Tip: Pressure Stall Information (PSI): </strong>Recommended for Linux 4.20+. PSI is the modern, authoritative way to detect memory pressure.</p><p>CLI examples:</p><pre># Snapshot of memory state<br>free -h<br><br># Detailed breakdown<br>cat /proc/meminfo<br><br># Memory pressure (Linux 4.20+)<br>cat /proc/pressure/memory<br><br># Real-time stats with swap activity<br>vmstat 1<br><br># Per-process memory usage<br>ps aux --sort=-%mem | head -20<br><br># OOM killer logs<br>dmesg | grep -i &quot;killed process&quot;</pre><ul><li><strong>AWS: </strong>Memory is not among the default EC2 metrics in CloudWatch; install the CloudWatch agent directly or through Systems Manager to collect it.</li><li><strong>Azure</strong>: Enable VM Insights + Guest OS diagnostics for Percentage Used Memory and Available MB</li><li><strong>GCP</strong>: Install Ops Agent and enable memory metrics in Cloud Monitoring</li><li><strong>Cloud Native: </strong>Use 
Prometheus + node_exporter (node_memory_MemAvailable_bytes) with Grafana dashboards</li></ul><figure><a href="https://aws.amazon.com/blogs/mt/setup-memory-metrics-for-amazon-ec2-instances-using-aws-systems-manager/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h7aE_TxBsDba_OVUpgwxRg.png" /></a><figcaption>More info on setting up CloudWatch</figcaption></figure><h3>2. Select memory-optimized instances</h3><p>Choose instance families <strong>specifically designed for high memory</strong>-to-vCPU ratios when workloads have the highest memory requirements.</p><p><strong>Signs that you need memory-optimized instances:</strong></p><ul><li>Your app performance <strong>scales directly with available RAM</strong></li><li>Data is required to reside <strong>entirely in memory</strong> for acceptable latency</li><li>You’re <strong>over-provisioning vCPUs</strong> just to get enough memory</li><li><strong>Swapping or paging to disk</strong> is degrading performance</li></ul><p>🚀 <strong>Tip: Benchmark your actual workload on a small memory-optimized instance first. </strong>Many “memory-heavy” apps actually become CPU-bound once given enough RAM.</p><p><strong>Platforms </strong>(examples I looked up; check each provider’s docs for more):</p><ul><li><strong>AWS: </strong>R6i/R7g (Intel/Graviton), X2gd (ARM with local NVMe), or High Memory u-*.metal</li><li><strong>Azure: </strong>Easv5 (AMD), Epsv5 (Ampere Arm), Dasv5 (general), Masv5 (SAP HANA)</li><li><strong>GCP: </strong>Tau T2A (Ampere), M3/C3 (high-memory with local SSD)</li><li><strong>Cloud Native: </strong>Karpenter or Cluster Autoscaler</li></ul><h3>3. Add in-memory caching</h3><p>Move “hot” data from disk or database into RAM using application-level caches (Redis, Memcached, Caffeine, Guava, etc.).</p><p><strong>Why this is needed: </strong>Every time someone views a product page, your app queries the database to fetch product details, reviews, and recommendations. 
If that page gets 5,000 views per hour, you’re hitting the database 5,000 times for data that rarely changes.</p><ul><li><strong>Database hits add up: they cost both performance and money.</strong></li></ul><p><strong>Speed estimates (rough ballpark):</strong></p><ul><li><strong>RAM access: </strong>~100 nanoseconds</li><li><strong>SSD access: </strong>1,000x slower than RAM</li><li><strong>Database query over network:</strong> 100,000x+ slower than RAM</li></ul><p><strong><em>I am not going into every caching strategy; </em></strong>there are many, and it would be an insanely long article. <strong>When I was at NIKE I did an innovation sprint report on several caching strategies, and it ran 20–30 slides (about 1 hr)</strong>, and I still went over the time limit 🤣. Just be aware of them, and research your use case when it comes up.</p><p><strong>Cache data that is:</strong></p><ul><li><strong>Read frequently: </strong>product catalogs, user profiles, configuration settings.</li><li><strong>Expensive to compute: </strong>analytics dashboards, report aggregations.</li><li><strong>Slow to fetch: </strong>data from external APIs or complex database joins.</li><li><strong>Relatively stable: </strong>data that doesn’t change every second.</li></ul><p><strong>🚀 Tip:</strong> Pre-warm caches on deployment using a “cache primer” or Lambda that reads the top 10K keys.</p><p><strong>Well-known cache software to explore </strong>(self-hosted, with the managed cloud versions noted):</p><p><strong>Redis</strong> (popular): Stores key-value pairs in memory. Multiple application servers can share the same cache. Supports complex data types, persistence, and pub/sub. Great for distributed systems. For cloud: <em>AWS ElastiCache for Redis, Azure Cache for Redis, or GCP Memorystore for Redis as fully managed services. 
AWS also offers Valkey, an open-source fork of Redis, at lower cost.</em></p><figure><a href="https://redis.io/software/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9InkaXu84h56q0fcFPmcvw.png" /></a></figure><p><strong>Memcached</strong>: Similar to Redis but simpler, with fewer bells and whistles. Used in many high-volume stacks. Very fast. For cloud: <em>AWS ElastiCache for Memcached or GCP Memorystore for Memcached.</em></p><figure><a href="https://memcached.org/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*y0S4qZ6q43cuQnB-LOyjXA.png" /></a></figure><p><strong>Caffeine</strong> (Java): An in-process cache library. Data lives inside your application’s memory. Fast but not shared between servers. For cloud: <em>Runs inside your application on whichever compute service hosts it.</em></p><p><strong>Guava Cache</strong> (Java): Google’s library, similar to Caffeine but older. Still widely used in existing codebases.</p><h3>4. Use managed cache services</h3><p>Offload cache operations, eviction, replication, backups, and scaling to fully managed services instead of self-hosting Redis.</p><p><strong>As covered in more detail above, these are the most popular managed options:</strong></p><p><strong>Redis</strong> (popular): <em>AWS ElastiCache for Redis, Azure Cache for Redis, or GCP Memorystore for Redis as fully managed services. 
AWS also offers Valkey, an open-source fork of Redis, at lower cost.</em></p><p><strong>Memcached</strong>: <em>AWS ElastiCache for Memcached or GCP Memorystore for Memcached.</em></p><p>Some people also use other services such as <strong>Upstash, which is popular for Vercel apps</strong>.</p><figure><a href="https://upstash.com/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2lUUSDGj8z-IHN97AMyh9g.png" /></a></figure><p>Managed cache services are quite popular, and in my experience they work well for many use cases:</p><ul><li><strong>Zero-ops, automatic failover, </strong>encryption at rest/transit, and built-in monitoring</li><li><strong>Pay only for provisioned memory + requests </strong>(often cheaper than running 24/7 standalone)</li></ul><p>🚀 Tip: Use <strong>“cluster mode disabled” + Multi-AZ </strong>for Redis if you only need one shard (has limits but is fast).</p><h3>5. 
Compress inactive data</h3><p>Compress cold or infrequently accessed objects in memory or before persisting to disk.</p><ul><li><strong>Lower memory usage = smaller instances or more capacity </strong>on existing ones</li><li>More active data fits in memory when <strong>inactive data is compressed</strong></li><li>Approaches: <strong>app</strong>-based, <strong>time</strong>-based, <strong>access</strong>-based (driven by access patterns), and <strong>columnar</strong> compression (similar data types stored together)</li></ul><p>Popular algorithms for app-level compression:</p><ul><li><strong>Gzip:</strong> High compression ratio, slower</li><li><strong>LZ4: </strong>Fast compression/decompression, moderate ratio</li><li><strong>Snappy:</strong> Very fast, lower ratio</li><li><strong>Zstandard: </strong>Good balance of speed and ratio</li></ul><figure><a href="https://www.gzip.org/"><img alt="" src="https://cdn-images-1.medium.com/max/986/1*Lm8LBrHGRROcrYF1BIfELw.png" /></a></figure><p>Compression support in caching services and software:</p><ul><li><strong>Redis</strong>: Compresses RDB snapshots and, optionally, list nodes; large values are usually compressed client-side</li><li><strong>Memcached</strong>: Stores compressed data if you compress it client-side</li><li><strong>In-memory DBs</strong>: Often have native compression</li></ul><p><strong><em>⚡️ Quick promo message ⚡️</em></strong><em> (article continues below)</em></p><ul><li>If you would like to <strong>beta test and get involved with my new app </strong><a href="https://www.systemsarchitect.io/"><strong>SystemsArchitect.io</strong></a><strong> for cloud engineering</strong>, check it out, and feel free to send me any comments. You are early, your input counts!</li><li><strong>The Free login (Google/Github or email link signup) gives you access to substantial cloud engineering content,</strong> and I’ll be giving some good <strong>Pro discounts</strong> for testers later for the Pro plan. 
<em>It’s a slow rollout because there is a lot to test!</em></li></ul><figure><a href="https://www.systemsarchitect.io/"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-yZ21fjpUxzRlVWayAnO9A.png" /></a><figcaption><a href="https://www.systemsarchitect.io/">https://www.systemsarchitect.io/</a></figcaption></figure><p><em>I appreciate any feedback, and beta testers who give me feedback may get special discounts and priority access to new features.</em></p><h3>6. Tune application heap size</h3><p>Over-provisioned JVM/Node.js/Python heaps waste huge amounts of RAM and lead to longer GC pauses.</p><p><strong>The heap is the chunk of RAM your application uses to store objects</strong> and data while running.</p><p>In the JVM (Java) and Node.js, you can configure how much memory the heap can use (CPython has no equivalent flag, so Python processes are usually capped at the container or OS level). <strong>It is the maximum amount of RAM allocated for your application’s runtime data</strong> (NOT including code, stack, or native memory).</p><p>😱 So if you don’t have enough heap allocated, you’re screwed. But you should not have too much either.</p><p>What happens if you have…</p><p><strong>Too much (heap allocated):</strong></p><ul><li>Wastes RAM that other apps could use.</li><li>Garbage collection (GC) pauses become longer.</li><li>Wastes money on RAM you don’t need. If you’re running 10 containers, each with a 10GB heap when they only need 2GB, you’re paying for 80GB of wasted RAM.</li></ul><p><strong>Too little:</strong></p><ul><li>Garbage collection “thrashing” (the GC runs constantly).</li><li>Out of memory errors.</li><li>Application crashes.</li></ul><p><strong>Recommended:</strong></p><ul><li><strong>1.5–2x your actual working memory needs</strong>.</li></ul><p>Example with Node.js:</p><pre># Default heap limit varies by Node.js version and available memory<br># (historically ~1.4GB on 64-bit systems)<br>node app.js<br><br># Tuned: Set max old space (heap) size, in MB<br>node --max-old-space-size=2048 app.js  # 2GB heap</pre><h3>7. 
Fix memory leaks fast</h3><p>A memory leak is when an app allocates memory but fails to release it, causing memory consumption to grow gradually until the system runs out of RAM or crashes.</p><p>Even tiny leaks (a few MB/hour) become multi-GB problems in long-running cloud services.</p><ul><li><strong>Implement continuous memory profiling in production</strong></li><li><strong>Don’t wait for OOM</strong> (Out of Memory) errors to set up alerts; do it preemptively.</li><li><strong>Set alerts when memory usage increases &gt;5% per hour</strong> over baseline</li><li><strong>Circular references </strong>and other lingering references (such as forgotten event listeners or unbounded caches) are common causes of leaks in Python and JavaScript</li><li><strong>Enable memory leak detection in CI/CD pipelines</strong></li></ul><p><strong>Platforms:</strong></p><p><strong>AWS:</strong> Use CloudWatch Container Insights with anomaly detection for automatic alerts on memory trend deviations</p><p><strong>Azure:</strong> Application Insights Live Metrics + Snapshot Debugger</p><p><strong>GCP:</strong> Cloud Profiler continuous profiling</p><p><strong>Kubernetes:</strong> Set memory limits so leaking pods are OOM-killed and restarted, and use VerticalPodAutoscaler to right-size requests</p><h3>8. Scale vertically before horizontally</h3><p><strong>Adding more RAM to a single instance is almost always cheaper and lower latency</strong> than distributing across many small instances.</p><ul><li>Network hops, serialization, and sharding overhead often dominate once you go beyond 4–8 nodes</li><li>Vertical scaling avoids split-brain, quorum, and consistency headaches</li></ul><p>🚀 Tip: Use <strong>AWS EC2 High Memory (u-*.metal) </strong>or Azure Mv2 for single-node Redis in the multi-terabyte range, which is hard to match horizontally at the same price/performance</p><h3>9. 
Use swap only as last resort</h3><p>Swap kills performance, and on spot/preemptible or autoscaled nodes an instance that swaps heavily can fail health checks and get replaced.</p><ul><li>The Linux default of swappiness=60 is too high for most cloud workloads; set it to 1 or 0</li></ul><pre># Avoid in cloud: pinning the Linux default of 60<br>echo &#39;vm.swappiness=60&#39; &gt;&gt; /etc/sysctl.conf   # ← BAD<br># Prefer: a low value in a drop-in file (run as root)<br>echo &#39;vm.swappiness=1&#39;  &gt;&gt; /etc/sysctl.d/99-low-swappiness.conf  # ← GOOD<br># Apply without rebooting<br>sysctl --system</pre><ul><li>On burstable or spot nodes, heavy swapping can exhaust CPU and disk burst credits and leave the instance unresponsive</li></ul><h3>10. Offload to object storage</h3><p><strong>Move large blobs, logs, backups, and static assets out of instance memory/disks</strong> into S3/GCS/Azure Blob.</p><ul><li>Use <strong>S3 Intelligent-Tiering or GCS Nearline</strong> for infrequently accessed data</li></ul><p><strong>🚀 Tip: </strong>Use AWS S3 + Cache-Control headers + CloudFront to serve even large 50GB+ datasets with &lt;50 ms first-byte latency on cache hits</p><ul><li><strong>AWS S3 / Glacier Instant Retrieval</strong></li><li><strong>Azure Blob Storage Hot/Cool tiers</strong></li><li><strong>GCP Cloud Storage Nearline + Cloud CDN</strong></li></ul><h3>About me</h3><p><strong>I’m a cloud architect, tech lead and founder who enjoys solving high-value challenges with innovative solutions.</strong></p><p><strong>I’m open to discussing projects, for both enterprise and startups.</strong> If you have an idea, an opportunity to discuss or feedback, you can reach me at csjcode at gmail.</p><p><strong>🚀 My current project </strong>is <a href="https://systemsarchitect.io"><strong>SystemsArchitect.io</strong></a><strong> (in Beta testing)</strong>, my site/app for Cloud Engineers (Cloud Architects, Devs and DevOps). It consists of years of research and writing I have done on cloud best practices, integrated with my prior cloud books and with code solutions and tutorials built using multiple AIs and other cloud tools. 
<strong>Check it out: </strong><a href="https://systemsarchitect.io"><strong>https://systemsarchitect.io</strong></a></p><p>Also follow the <strong>SystemsArchitect X account</strong>: <a href="https://x.com/systemsarch">https://x.com/systemsarch</a></p><p><strong>My latest articles on Medium:</strong> <a href="https://medium.com/@csjcode">https://medium.com/@csjcode</a></p><p><strong>Cloud Cost Saving:</strong> <a href="https://medium.com/cloud-cost-savings">https://medium.com/cloud-cost-savings</a></p><p><strong>Cloud Architect Review:</strong> <a href="https://medium.com/cloud-architect-review">https://medium.com/cloud-architect-review</a></p><p><strong>AI Dev Tips:</strong> <a href="https://medium.com/ai-dev-tips">https://medium.com/ai-dev-tips</a></p><p><strong>Solana Dev Tips:</strong> <a href="https://medium.com/solana-dev-tips">https://medium.com/solana-dev-tips</a></p><p><a href="https://medium.com/@csjcode/subscribe?source=post_page-----21534a072917---------------------------------------">Chris St. John - Medium</a></p><p><strong>I’ve worked 20+ years in software development</strong>, both in an <strong>enterprise</strong> setting such as NIKE and the original MP3.com, as well as <strong>startups</strong> like FreshPatents, Verafy AI, SystemsArchitect.io, and Instantiate.io.</p><p>My experience ranges from <strong>cloud ecommerce, API design/implementation,</strong> serverless, <strong>multiple</strong> <strong>AI integration</strong> for development, content management, <strong>frontend UI/UX architecture</strong> and login/authentication. I give tech talks, tutorials and share documentation for architecting software. 
I also previously held the AWS Solutions Architect certification.</p><hr><p><a href="https://cloudchecklists.com/cloud-memory-optimization-checklist-cc-2-a7ef27863e07">✅ Cloud Memory Optimization Checklist (CC #2)</a> was originally published in <a href="https://cloudchecklists.com">Cloud Checklists</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>