<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[ITNEXT - Medium]]></title>
        <description><![CDATA[ITNEXT is a platform for IT developers &amp; software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies. - Medium]]></description>
        <link>https://itnext.io?source=rss----5b301f10ddcd---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>ITNEXT - Medium</title>
            <link>https://itnext.io?source=rss----5b301f10ddcd---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 06:19:41 GMT</lastBuildDate>
        <atom:link href="https://itnext.io/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Building MCP Apps with Angular]]></title>
            <link>https://itnext.io/building-mcp-apps-with-angular-8721e83572ab?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/8721e83572ab</guid>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[mcp-app]]></category>
            <category><![CDATA[mcp-server]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[angular]]></category>
            <dc:creator><![CDATA[Dale Nguyen]]></dc:creator>
            <pubDate>Sat, 16 May 2026 19:05:20 GMT</pubDate>
            <atom:updated>2026-05-16T19:05:18.368Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*b97rz8ll2jiWy2va.png" /><figcaption>Building MCP Apps with Angular</figcaption></figure><p>If you’ve been building MCP servers, you know the drill: your tool returns JSON, the host renders it as text, and the user squints at a timestamp string. <a href="https://github.com/modelcontextprotocol/ext-apps">MCP Apps</a> change that — they let your server ship an interactive UI that the host renders in an iframe, right inside the conversation.</p><p>MCP Apps are built on the <strong>Model Context Protocol</strong> — an open standard. They’re not tied to Claude or any specific AI provider. Any host that implements the MCP Apps specification (Claude Desktop, custom chat clients, or other AI assistants that adopt MCP) can render your UI. You build it once, and it works everywhere MCP is supported.</p><p>This post walks through building MCP Apps with Angular. We’ll start with a single tool, add a second tool with its own UI, and then show how to share code between them without bloating either bundle.</p><h3>How MCP Apps Work (Quick Recap)</h3><pre>View (Angular App) &lt;--PostMessageTransport--&gt; Host (AppBridge) &lt;--MCP Client--&gt; MCP Server</pre><ul><li><strong>Server</strong> registers tools and resources. Each tool can point to a resource URI containing the UI.</li><li><strong>Host</strong> (the chat client) fetches that resource and renders it in a sandboxed iframe.</li><li><strong>View</strong> is your Angular app running inside that iframe. It uses the App class from @modelcontextprotocol/ext-apps to communicate with the host.</li></ul><p>The key insight: your UI is bundled into a <strong>single self-contained HTML file</strong> using Vite and vite-plugin-singlefile. The host doesn&#39;t need to know it&#39;s Angular — it just loads HTML.</p><h3>Project Structure</h3><pre>basic-server-angular/<br>├── mcp-app.html              # HTML entry point for UI #1<br>├── greeting-app.html         # HTML entry point for UI #2<br>├── src/<br>│   ├── main.ts               # Angular bootstrap for UI #1<br>│   ├── app.component.ts      # Get Time component<br>│   ├── greeting-main.ts      # Angular bootstrap for UI #2<br>│   ├── greeting.component.ts # Greeting component<br>│   ├── shared/<br>│   │   └── mcp-app-setup.ts  # Shared App + theming setup<br>│   └── global.css            # Host-aware CSS variables<br>├── server.ts                 # MCP server (registers tools + resources)<br>├── main.ts                   # Server entry point (HTTP + stdio)<br>└── vite.config.ts            # Builds each HTML into a single file</pre><h3>Step 1: The Server</h3><p>Every MCP App starts on the server side. You register a <strong>tool</strong> (what the LLM calls) and a <strong>resource</strong> (the HTML that gets rendered). They’re linked by a resource URI.</p><pre>// server.ts<br>import { McpServer } from &quot;@modelcontextprotocol/sdk/server/mcp.js&quot;;<br>import type { CallToolResult, ReadResourceResult } from &quot;@modelcontextprotocol/sdk/types.js&quot;;<br>import fs from &quot;node:fs/promises&quot;;<br>import path from &quot;node:path&quot;;<br>import {<br>  registerAppTool,<br>  registerAppResource,<br>  RESOURCE_MIME_TYPE,<br>} from &quot;@modelcontextprotocol/ext-apps/server&quot;;const DIST_DIR = import.meta.filename.endsWith(&quot;.ts&quot;)<br>  ? 
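// running from TS source (dev): the bundled UI files live in ./dist; once compiled, they sit next to this file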
path.join(import.meta.dirname, &quot;dist&quot;)<br>  : import.meta.dirname;<br><br>export function createServer(): McpServer {<br>  const server = new McpServer({<br>    name: &quot;Basic MCP App Server (Angular)&quot;,<br>    version: &quot;1.0.0&quot;,<br>  });<br>  const resourceUri = &quot;ui://get-time/mcp-app.html&quot;;<br>  // Register the tool - this is what the LLM calls<br>  registerAppTool(server, &quot;get-time&quot;, {<br>    title: &quot;Get Time&quot;,<br>    description: &quot;Returns the current server time as an ISO 8601 string.&quot;,<br>    inputSchema: {},<br>    _meta: { ui: { resourceUri } }, // Links this tool to its UI<br>  }, async (): Promise&lt;CallToolResult&gt; =&gt; {<br>    const time = new Date().toISOString();<br>    return { content: [{ type: &quot;text&quot;, text: time }] };<br>  });<br>  // Register the resource - the bundled HTML for this tool&#39;s UI<br>  registerAppResource(server, resourceUri, resourceUri, {<br>    mimeType: RESOURCE_MIME_TYPE,<br>  }, async (): Promise&lt;ReadResourceResult&gt; =&gt; {<br>    const html = await fs.readFile(<br>      path.join(DIST_DIR, &quot;mcp-app.html&quot;),<br>      &quot;utf-8&quot;,<br>    );<br>    return {<br>      contents: [{ uri: resourceUri, mimeType: RESOURCE_MIME_TYPE, text: html }],<br>    };<br>  });<br>  return server;<br>}</pre><p>The _meta.ui.resourceUri is the glue. When the host calls this tool, it reads that field to know which resource to fetch and render.</p><h3>Step 2: The HTML Entry Point</h3><p>Each UI needs an HTML file at the project root. This is the Vite entry point that gets bundled into a single self-contained file.</p><pre>&lt;!-- mcp-app.html --&gt;<br>&lt;!DOCTYPE html&gt;<br>&lt;html lang=&quot;en&quot;&gt;<br>&lt;head&gt;<br>  &lt;meta charset=&quot;UTF-8&quot;&gt;<br>  &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt;<br>  &lt;meta name=&quot;color-scheme&quot; content=&quot;light dark&quot;&gt;<br>  &lt;title&gt;Get Time App&lt;/title&gt;<br>  &lt;link rel=&quot;stylesheet&quot; href=&quot;/src/global.css&quot;&gt;<br>&lt;/head&gt;<br>&lt;body&gt;<br>  &lt;app-root&gt;&lt;/app-root&gt;<br>  &lt;script type=&quot;module&quot; src=&quot;/src/main.ts&quot;&gt;&lt;/script&gt;<br>&lt;/body&gt;<br>&lt;/html&gt;</pre><h3>Step 3: The Angular App</h3><p>The bootstrap is minimal — Angular 19+ with zoneless change detection:</p><pre>// src/main.ts<br>import &quot;@angular/compiler&quot;;<br>import { bootstrapApplication } from &quot;@angular/platform-browser&quot;;<br>import { provideZonelessChangeDetection } from &quot;@angular/core&quot;;<br>import { AppComponent } from &quot;./app.component&quot;;<br>import &quot;./global.css&quot;;<br><br>bootstrapApplication(AppComponent, {<br>  providers: [provideZonelessChangeDetection()],<br>}).catch((err) =&gt; console.error(err));</pre><p>Now the component itself. 
The App class from @modelcontextprotocol/ext-apps is the bridge between your Angular code and the host:</p><pre>// src/app.component.ts<br>import { Component, type OnInit, signal } from &quot;@angular/core&quot;;<br>import {<br>  App,<br>  applyDocumentTheme,<br>  applyHostStyleVariables,<br>  applyHostFonts,<br>  type McpUiHostContext,<br>} from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import type { CallToolResult } from &quot;@modelcontextprotocol/sdk/types.js&quot;;<br><br>function extractText(result: CallToolResult): string {<br>  return result.content?.find((c) =&gt; c.type === &quot;text&quot;)!.text;<br>}<br>@Component({<br>  selector: &quot;app-root&quot;,<br>  template: `<br>    &lt;main<br>      [style.padding-top.px]=&quot;hostContext()?.safeAreaInsets?.top&quot;<br>      [style.padding-right.px]=&quot;hostContext()?.safeAreaInsets?.right&quot;<br>      [style.padding-bottom.px]=&quot;hostContext()?.safeAreaInsets?.bottom&quot;<br>      [style.padding-left.px]=&quot;hostContext()?.safeAreaInsets?.left&quot;<br>    &gt;<br>      &lt;p&gt;&lt;strong&gt;Server Time:&lt;/strong&gt; &lt;code&gt;{{ serverTime() }}&lt;/code&gt;&lt;/p&gt;<br>      &lt;button (click)=&quot;handleGetTime()&quot;&gt;Get Server Time&lt;/button&gt;<br>    &lt;/main&gt;<br>  `,<br>})<br>export class AppComponent implements OnInit {<br>  private app: App | null = null;<br>  hostContext = signal&lt;McpUiHostContext | undefined&gt;(undefined);<br>  serverTime = signal(&quot;Loading...&quot;);<br>  async ngOnInit() {<br>    const instance = new App({ name: &quot;Get Time App&quot;, version: &quot;1.0.0&quot; });<br>    // Called when the host sends tool results back to the UI<br>    instance.ontoolresult = (result) =&gt; {<br>      this.serverTime.set(extractText(result));<br>    };<br>    // Respond to theme and style changes from the host<br>    instance.onhostcontextchanged = (params) =&gt; {<br>      const ctx = { ...this.hostContext(), ...params };<br>      this.hostContext.set(ctx);<br>      if (ctx.theme) applyDocumentTheme(ctx.theme);<br>      if (ctx.styles?.variables) applyHostStyleVariables(ctx.styles.variables);<br>      if (ctx.styles?.css?.fonts) applyHostFonts(ctx.styles.css.fonts);<br>    };<br>    // Connect to the host via PostMessageTransport<br>    await instance.connect();<br>    this.app = instance;<br>    // Apply initial host context<br>    const ctx = instance.getHostContext();<br>    this.hostContext.set(ctx);<br>    if (ctx?.theme) applyDocumentTheme(ctx.theme);<br>    if (ctx?.styles?.variables) applyHostStyleVariables(ctx.styles.variables);<br>    if (ctx?.styles?.css?.fonts) applyHostFonts(ctx.styles.css.fonts);<br>  }<br>  async handleGetTime() {<br>    if (!this.app) return;<br>    try {<br>      const result = await this.app.callServerTool({<br>        name: &quot;get-time&quot;,<br>        arguments: {},<br>      });<br>      this.serverTime.set(extractText(result));<br>    } catch {<br>      this.serverTime.set(&quot;[ERROR]&quot;);<br>    }<br>  }<br>}</pre><p>A few things to note:</p><ul><li><strong>App class</strong> — this is the SDK&#39;s main entry point. You create one, wire up callbacks, and call connect(). That&#39;s it.</li><li><strong>ontoolresult</strong> — fires when the host sends a tool result. This is how data flows from the server to your UI.</li><li><strong>callServerTool()</strong> — lets the UI call tools on the server. The host proxies this through the MCP client.</li><li><strong>onhostcontextchanged</strong> — the host pushes theme and style updates. 
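These arrive as partial updates, which is why the handler merges them into the previous context before applying them.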
The helper functions (applyDocumentTheme, etc.) apply them as CSS variables on document, so your component styles just work.</li><li><strong>safeAreaInsets</strong> — the host tells you how much padding to leave for its chrome. Use it on your root container.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*G1UOknAjICsfzJIR.png" /><figcaption>Get time UI</figcaption></figure><h3>Step 4: Adding a Second UI</h3><p>Here’s where it gets interesting. Say you want a “Greet” tool with its own UI. Each tool gets its own HTML entry point, its own Angular app, and its own resource registration.</p><h3>Server: Register Both Tools</h3><pre>// server.ts — inside createServer()<br><br>// Tool #1: Get Time<br>const timeResourceUri = &quot;ui://get-time/mcp-app.html&quot;;<br>registerAppTool(server, &quot;get-time&quot;, {<br>  title: &quot;Get Time&quot;,<br>  description: &quot;Returns the current server time.&quot;,<br>  inputSchema: {},<br>  _meta: { ui: { resourceUri: timeResourceUri } },<br>}, async (): Promise&lt;CallToolResult&gt; =&gt; {<br>  return { content: [{ type: &quot;text&quot;, text: new Date().toISOString() }] };<br>});<br>registerAppResource(server, timeResourceUri, timeResourceUri, {<br>  mimeType: RESOURCE_MIME_TYPE,<br>}, async (): Promise&lt;ReadResourceResult&gt; =&gt; {<br>  const html = await fs.readFile(path.join(DIST_DIR, &quot;mcp-app.html&quot;), &quot;utf-8&quot;);<br>  return { contents: [{ uri: timeResourceUri, mimeType: RESOURCE_MIME_TYPE, text: html }] };<br>});<br>// Tool #2: Greet<br>const greetResourceUri = &quot;ui://greet/greeting-app.html&quot;;<br>registerAppTool(server, &quot;greet&quot;, {<br>  title: &quot;Greet&quot;,<br>  description: &quot;Returns a personalised greeting.&quot;,<br>  inputSchema: {<br>    name: z.string().optional().default(&quot;World&quot;).describe(&quot;Name to greet&quot;),<br>  },<br>  _meta: { ui: { resourceUri: greetResourceUri } },<br>}, async ({ name }: { name?: string }): Promise&lt;CallToolResult&gt; =&gt; {<br>  const greeting = `Hello, ${name || &quot;World&quot;}! 
Welcome to the MCP Apps SDK.`;<br>  return { content: [{ type: &quot;text&quot;, text: greeting }] };<br>});<br>registerAppResource(server, greetResourceUri, greetResourceUri, {<br>  mimeType: RESOURCE_MIME_TYPE,<br>}, async (): Promise&lt;ReadResourceResult&gt; =&gt; {<br>  const html = await fs.readFile(path.join(DIST_DIR, &quot;greeting-app.html&quot;), &quot;utf-8&quot;);<br>  return { contents: [{ uri: greetResourceUri, mimeType: RESOURCE_MIME_TYPE, text: html }] };<br>});</pre><h3>Greeting Component</h3><p>The greeting UI is a completely separate Angular app:</p><pre>&lt;!-- greeting-app.html --&gt;<br>&lt;!DOCTYPE html&gt;<br>&lt;html lang=&quot;en&quot;&gt;<br>&lt;head&gt;<br>  &lt;meta charset=&quot;UTF-8&quot;&gt;<br>  &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt;<br>  &lt;meta name=&quot;color-scheme&quot; content=&quot;light dark&quot;&gt;<br>  &lt;title&gt;Greeting App&lt;/title&gt;<br>  &lt;link rel=&quot;stylesheet&quot; href=&quot;/src/global.css&quot;&gt;<br>&lt;/head&gt;<br>&lt;body&gt;<br>  &lt;greeting-root&gt;&lt;/greeting-root&gt;<br>  &lt;script type=&quot;module&quot; src=&quot;/src/greeting-main.ts&quot;&gt;&lt;/script&gt;<br>&lt;/body&gt;<br>&lt;/html&gt;</pre><pre>// src/greeting-main.ts<br>import &quot;@angular/compiler&quot;;<br>import { bootstrapApplication } from &quot;@angular/platform-browser&quot;;<br>import { provideZonelessChangeDetection } from &quot;@angular/core&quot;;<br>import { GreetingComponent } from &quot;./greeting.component&quot;;<br>import &quot;./global.css&quot;;<br><br>bootstrapApplication(GreetingComponent, {<br>  providers: [provideZonelessChangeDetection()],<br>}).catch((err) =&gt; console.error(err));</pre><pre>// src/greeting.component.ts<br>import { Component, type OnInit, signal } from &quot;@angular/core&quot;;<br>import { FormsModule } from &quot;@angular/forms&quot;;<br>import {<br>  App,<br>  applyDocumentTheme,<br>  applyHostStyleVariables,<br>  applyHostFonts,<br>  type McpUiHostContext,<br>} from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import type { CallToolResult } from &quot;@modelcontextprotocol/sdk/types.js&quot;;<br><br>function extractText(result: CallToolResult): string {<br>  return result.content?.find((c) =&gt; c.type === &quot;text&quot;)!.text;<br>}<br>@Component({<br>  selector: &quot;greeting-root&quot;,<br>  imports: [FormsModule],<br>  template: `<br>    &lt;main<br>      [style.padding-top.px]=&quot;hostContext()?.safeAreaInsets?.top&quot;<br>      [style.padding-right.px]=&quot;hostContext()?.safeAreaInsets?.right&quot;<br>      [style.padding-bottom.px]=&quot;hostContext()?.safeAreaInsets?.bottom&quot;<br>      [style.padding-left.px]=&quot;hostContext()?.safeAreaInsets?.left&quot;<br>    &gt;<br>      &lt;div&gt;<br>        &lt;label&gt;&lt;strong&gt;Your name:&lt;/strong&gt;&lt;/label&gt;<br>        &lt;input type=&quot;text&quot; [(ngModel)]=&quot;nameText&quot; placeholder=&quot;Enter your name&quot;&gt;<br>        &lt;button (click)=&quot;handleGreet()&quot;&gt;Get Greeting&lt;/button&gt;<br>      &lt;/div&gt;<br>      @if (greeting()) {<br>        &lt;div class=&quot;greeting-display&quot;&gt;{{ greeting() }}&lt;/div&gt;<br>      }<br>    &lt;/main&gt;<br>  `,<br>})<br>export class GreetingComponent implements OnInit {<br>  private app: App | null = null;<br>  hostContext = signal&lt;McpUiHostContext | undefined&gt;(undefined);<br>  greeting = signal(&quot;&quot;);<br>  nameText = &quot;&quot;;<br>  async ngOnInit() {<br>    const instance = 
new App({ name: &quot;Greeting App&quot;, version: &quot;1.0.0&quot; });<br>    instance.ontoolresult = (result) =&gt; {<br>      this.greeting.set(extractText(result));<br>    };<br>    instance.onhostcontextchanged = (params) =&gt; {<br>      const ctx = { ...this.hostContext(), ...params };<br>      this.hostContext.set(ctx);<br>      if (ctx.theme) applyDocumentTheme(ctx.theme);<br>      if (ctx.styles?.variables) applyHostStyleVariables(ctx.styles.variables);<br>      if (ctx.styles?.css?.fonts) applyHostFonts(ctx.styles.css.fonts);<br>    };<br>    await instance.connect();<br>    this.app = instance;<br>    const ctx = instance.getHostContext();<br>    this.hostContext.set(ctx);<br>    if (ctx?.theme) applyDocumentTheme(ctx.theme);<br>    if (ctx?.styles?.variables) applyHostStyleVariables(ctx.styles.variables);<br>    if (ctx?.styles?.css?.fonts) applyHostFonts(ctx.styles.css.fonts);<br>  }<br>  async handleGreet() {<br>    if (!this.app) return;<br>    try {<br>      const name = this.nameText.trim() || &quot;World&quot;;<br>      const result = await this.app.callServerTool({<br>        name: &quot;greet&quot;,<br>        arguments: { name },<br>      });<br>      this.greeting.set(extractText(result));<br>    } catch {<br>      this.greeting.set(&quot;[ERROR]&quot;);<br>    }<br>  }<br>}</pre><h3>Step 5: Sharing Code Between UIs</h3><p>You probably noticed that both components have identical App setup and theming boilerplate. That&#39;s a great candidate for extraction — and since each HTML is a separate Vite entry point, <strong>Vite&#39;s tree-shaking ensures each bundle only includes what it actually imports</strong>.</p><p>Create a shared setup module:</p><pre>// src/shared/mcp-app-setup.ts<br>import {<br>  App,<br>  applyDocumentTheme,<br>  applyHostStyleVariables,<br>  applyHostFonts,<br>  type McpUiHostContext,<br>} from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import type { WritableSignal } from &quot;@angular/core&quot;;<br>import type { CallToolResult } from &quot;@modelcontextprotocol/sdk/types.js&quot;;<br><br>/**<br> * Extract text content from a tool result.<br> */<br>export function extractText(result: CallToolResult): string {<br>  return result.content?.find((c) =&gt; c.type === &quot;text&quot;)!.text;<br>}<br>/**<br> * Apply host context (theme, styles, fonts) to the document.<br> */<br>function applyContext(ctx: McpUiHostContext): void {<br>  if (ctx.theme) applyDocumentTheme(ctx.theme);<br>  if (ctx.styles?.variables) applyHostStyleVariables(ctx.styles.variables);<br>  if (ctx.styles?.css?.fonts) applyHostFonts(ctx.styles.css.fonts);<br>}<br>/**<br> * Create and connect an MCP App instance with standard host-context<br> * handling wired up. 
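Tool input and cancellation notifications are logged via console.info, and errors are routed to console.error.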
Both UIs call this instead of duplicating setup.<br> */<br>export async function createMcpApp(<br>  name: string,<br>  hostContext: WritableSignal&lt;McpUiHostContext | undefined&gt;,<br>  onToolResult?: (result: CallToolResult) =&gt; void,<br>): Promise&lt;App&gt; {<br>  const app = new App({ name, version: &quot;1.0.0&quot; });<br>  app.ontoolinput = (params) =&gt; console.info(&quot;Received tool input:&quot;, params);<br>  app.ontoolcancelled = (params) =&gt; console.info(&quot;Tool cancelled:&quot;, params.reason);<br>  app.onerror = console.error;<br>  if (onToolResult) {<br>    app.ontoolresult = onToolResult;<br>  }<br>  app.onhostcontextchanged = (params) =&gt; {<br>    const ctx = { ...hostContext(), ...params } as McpUiHostContext;<br>    hostContext.set(ctx);<br>    applyContext(ctx);<br>  };<br>  await app.connect();<br>  const ctx = app.getHostContext();<br>  hostContext.set(ctx);<br>  if (ctx) applyContext(ctx);<br>  return app;<br>}</pre><p>Now both components become much simpler:</p><pre>// src/app.component.ts — simplified<br>import { Component, type OnInit, signal } from &quot;@angular/core&quot;;<br>import type { McpUiHostContext } from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import type { App } from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import { createMcpApp, extractText } from &quot;./shared/mcp-app-setup&quot;;<br><br>@Component({<br>  selector: &quot;app-root&quot;,<br>  template: `<br>    &lt;main [style.padding-top.px]=&quot;hostContext()?.safeAreaInsets?.top&quot;&gt;<br>      &lt;p&gt;&lt;strong&gt;Server Time:&lt;/strong&gt; &lt;code&gt;{{ serverTime() }}&lt;/code&gt;&lt;/p&gt;<br>      &lt;button (click)=&quot;handleGetTime()&quot;&gt;Get Server Time&lt;/button&gt;<br>    &lt;/main&gt;<br>  `,<br>})<br>export class AppComponent implements OnInit {<br>  private app: App | null = null;<br>  hostContext = signal&lt;McpUiHostContext | undefined&gt;(undefined);<br>  serverTime = signal(&quot;Loading...&quot;);<br>  async ngOnInit() {<br>    this.app = await createMcpApp(<br>      &quot;Get Time App&quot;,<br>      this.hostContext,<br>      (result) =&gt; this.serverTime.set(extractText(result)),<br>    );<br>  }<br>  async handleGetTime() {<br>    if (!this.app) return;<br>    try {<br>      const result = await this.app.callServerTool({<br>        name: &quot;get-time&quot;,<br>        arguments: {},<br>      });<br>      this.serverTime.set(extractText(result));<br>    } catch {<br>      this.serverTime.set(&quot;[ERROR]&quot;);<br>    }<br>  }<br>}</pre><pre>// src/greeting.component.ts — simplified<br>import { Component, type OnInit, signal } from &quot;@angular/core&quot;;<br>import { FormsModule } from &quot;@angular/forms&quot;;<br>import type { McpUiHostContext } from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import type { App } from &quot;@modelcontextprotocol/ext-apps&quot;;<br>import { createMcpApp, extractText } from &quot;./shared/mcp-app-setup&quot;;<br><br>@Component({<br>  selector: &quot;greeting-root&quot;,<br>  imports: [FormsModule],<br>  template: `<br>    &lt;main [style.padding-top.px]=&quot;hostContext()?.safeAreaInsets?.top&quot;&gt;<br>      &lt;label&gt;&lt;strong&gt;Your name:&lt;/strong&gt;&lt;/label&gt;<br>      &lt;input type=&quot;text&quot; [(ngModel)]=&quot;nameText&quot; placeholder=&quot;Enter your name&quot;&gt;<br>      &lt;button (click)=&quot;handleGreet()&quot;&gt;Get Greeting&lt;/button&gt;<br>      @if (greeting()) {<br>        &lt;div class=&quot;greeting-display&quot;&gt;{{ greeting() 
}}&lt;/div&gt;<br>      }<br>    &lt;/main&gt;<br>  `,<br>})<br>export class GreetingComponent implements OnInit {<br>  private app: App | null = null;<br>  hostContext = signal&lt;McpUiHostContext | undefined&gt;(undefined);<br>  greeting = signal(&quot;&quot;);<br>  nameText = &quot;&quot;;<br>  async ngOnInit() {<br>    this.app = await createMcpApp(<br>      &quot;Greeting App&quot;,<br>      this.hostContext,<br>      (result) =&gt; this.greeting.set(extractText(result)),<br>    );<br>  }<br>  async handleGreet() {<br>    if (!this.app) return;<br>    try {<br>      const name = this.nameText.trim() || &quot;World&quot;;<br>      const result = await this.app.callServerTool({<br>        name: &quot;greet&quot;,<br>        arguments: { name },<br>      });<br>      this.greeting.set(extractText(result));<br>    } catch {<br>      this.greeting.set(&quot;[ERROR]&quot;);<br>    }<br>  }<br>}</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*UrhYUXIMyX0IEto-.png" /><figcaption>Greeting UI</figcaption></figure><p>Both components import createMcpApp and extractText from the shared module. Since they&#39;re in separate Vite builds, tree-shaking still applies — if you add more shared utilities later, each bundle only pulls in what it calls.</p><h3>Sharing Models and Types</h3><p>The same principle works for shared data models. If both UIs work with common types — say a user profile that comes from the server:</p><pre>// src/shared/models.ts<br>import type { CallToolResult } from &quot;@modelcontextprotocol/sdk/types.js&quot;;<br><br>export interface UserProfile {<br>  name: string;<br>  email: string;<br>  role: &quot;admin&quot; | &quot;viewer&quot;;<br>}<br>export function parseUserProfile(result: CallToolResult): UserProfile {<br>  const text = result.content?.find((c) =&gt; c.type === &quot;text&quot;)!.text;<br>  return JSON.parse(text) as UserProfile;<br>}</pre><p>Both components can import { UserProfile, parseUserProfile } from &quot;./shared/models&quot; — the types are erased at build time (zero cost), and the parser function is only included in bundles that call it. This is a natural place to put validation logic, formatters, or any domain code that multiple UIs need.</p><h3>Step 6: The Build</h3><p>The Vite config uses an INPUT environment variable to select which HTML file to build:</p><pre>// vite.config.ts<br>import { defineConfig } from &quot;vite&quot;;<br>import { viteSingleFile } from &quot;vite-plugin-singlefile&quot;;<br><br>const INPUT = process.env.INPUT;<br>if (!INPUT) throw new Error(&quot;INPUT environment variable is not set&quot;);<br>export default defineConfig({<br>  plugins: [viteSingleFile()],<br>  build: {<br>    rollupOptions: { input: INPUT },<br>    outDir: &quot;dist&quot;,<br>    emptyOutDir: false, // Key: don&#39;t wipe previous builds<br>  },<br>});</pre><p>The emptyOutDir: false is important — it lets you run Vite multiple times, once per HTML file, into the same dist/ directory.</p><p>The build script chains them:</p><pre>{<br>  &quot;scripts&quot;: {<br>    &quot;build&quot;: &quot;tsc --noEmit &amp;&amp; cross-env INPUT=mcp-app.html vite build &amp;&amp; cross-env INPUT=greeting-app.html vite build &amp;&amp; tsc -p tsconfig.server.json &amp;&amp; bun build server.ts --outdir dist --target node&quot;<br>  }<br>}</pre><p>Each HTML file produces a fully self-contained output (all JS, CSS, and Angular runtime inlined). 
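</p><p>If you want to convince yourself of that, a quick check (a sketch that assumes the dist/ layout produced by the build script above) is to look for external references in the bundle:</p><pre># prints &quot;fully inlined&quot; when no external script or stylesheet references remain<br>grep -E &#39;src=&quot;http|href=&quot;http&#39; dist/mcp-app.html || echo &quot;fully inlined&quot;</pre><p>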
The two bundles are completely independent.</p><h3>Theming: Looking Native in Any Host</h3><p>MCP Apps can look native in any host (Claude Desktop, a custom chat client, etc.) by using CSS variables that the host provides. The global.css file defines sensible fallbacks:</p><pre>:root {<br>  color-scheme: light dark;<br><br>  --color-text-primary: light-dark(#1f2937, #f3f4f6);<br>  --color-background-primary: light-dark(#ffffff, #1a1a1a);<br>  --color-accent: #2563eb;<br>  --color-text-on-accent: #ffffff;<br>  --border-radius-md: 6px;<br>  --spacing-unit: var(--font-text-md-size);<br>  --spacing-sm: calc(var(--spacing-unit) * 0.5);<br>  --spacing-md: var(--spacing-unit);<br>  --spacing-lg: calc(var(--spacing-unit) * 1.5);<br>  /* ... more variables */<br>}</pre><p>When the host sends style updates via onhostcontextchanged, the helper functions overwrite these variables on the document root. Your Angular component styles reference the variables (var(--color-accent), var(--spacing-md)), so they adapt automatically — no theme prop drilling needed.</p><h3>Recap</h3><p>The pattern for building Angular MCP Apps:</p><ol><li><strong>Server</strong>: register a tool + resource pair per UI, linked by a resource URI</li><li><strong>HTML</strong>: one entry point per UI, each bootstrapping its own Angular app</li><li><strong>Component</strong>: create an App instance, wire up callbacks, call connect()</li><li><strong>Shared code</strong>: extract common setup into a shared module — Vite tree-shakes per entry point</li><li><strong>Build</strong>: run Vite once per HTML file into the same dist/ directory</li><li><strong>Theming</strong>: use host CSS variables with fallbacks, apply updates via onhostcontextchanged</li></ol><p>Each UI is a self-contained Angular application. They share a server, they can share code, but their bundles are independent. Add a third tool? Same pattern — new HTML, new component, new registration, one more vite build in the chain.</p><p>The full source is available in the <a href="https://github.com/dalenguyen/ext-apps/tree/main/examples/basic-server-angular">ext-apps examples</a>.</p><hr><p><a href="https://itnext.io/building-mcp-apps-with-angular-8721e83572ab">Building MCP Apps with Angular</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Securing Your MCP Server with Firebase Auth: A Production Walkthrough]]></title>
            <link>https://itnext.io/securing-your-mcp-server-with-firebase-auth-a-production-walkthrough-651bf398d797?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/651bf398d797</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[firebase]]></category>
            <category><![CDATA[personal-finance]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[mcp-server]]></category>
            <dc:creator><![CDATA[Dale Nguyen]]></dc:creator>
            <pubDate>Sat, 16 May 2026 19:04:48 GMT</pubDate>
            <atom:updated>2026-05-16T19:04:45.915Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*tK9TMrsZJOEN2Tvo.png" /><figcaption>Add Authentication to your MCP</figcaption></figure><p>Model Context Protocol (MCP) servers let AI assistants interact with real user data. That means auth isn’t optional — it’s the difference between a useful tool and a data breach. This post walks through exactly how <a href="https://cantax.fyi/">Can Tax Pro</a> secures its Python MCP server with Firebase Authentication, supporting both Firebase ID tokens (for direct access) and a custom OAuth 2.0 flow (for third-party clients like Claude.ai).</p><h3>Architecture Overview</h3><p>The system has three moving parts:</p><pre>Browser / Claude.ai Client<br>       │<br>       │  Authorization: Bearer &lt;token&gt;<br>       ▼<br>  MCP Server (Python/FastMCP on Cloud Run)<br>       │<br>       │  Firebase Admin SDK<br>       ▼<br>  Firestore (data isolated by userId)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*1cPMrKfra9__psTh.gif" /></figure><p>The MCP server accepts <strong>two token types</strong>:</p><ol><li><strong>Firebase ID tokens</strong> — issued by Firebase Authentication, verified cryptographically</li><li><strong>Custom OAuth tokens</strong> (ctpo_*) — issued by the web app&#39;s OAuth server, stored as hashes in Firestore</li></ol><p>The web app itself acts as the OAuth authorization server for third-party integrations.</p><h3>Step 1: Initialize Firebase Admin SDK</h3><p>The server initializes Firebase Admin once at startup, with environment-aware credential resolution:</p><pre># main.py<br>import firebase_admin<br>from firebase_admin import credentials, auth as firebase_auth, firestore</pre><pre>if not firebase_admin._apps:<br>    sa_json = os.environ.get(&quot;FIREBASE_SERVICE_ACCOUNT&quot;)<br>    project_id = os.environ.get(&quot;FIREBASE_PROJECT_ID&quot;)<br>    if sa_json:<br>        cred = credentials.Certificate(json.loads(sa_json))<br>        firebase_admin.initialize_app(cred, {&quot;projectId&quot;: project_id})<br>    else:<br>        firebase_admin.initialize_app(options={&quot;projectId&quot;: project_id})</pre><p><strong>Locally</strong>: set FIREBASE_SERVICE_ACCOUNT to your service account JSON. <strong>On Cloud Run</strong>: omit it entirely — the SDK picks up Application Default Credentials (ADC) automatically via Workload Identity.</p><p>This means no secrets in production. Your Cloud Run service account just needs the Firebase Admin SDK Administrator Service Agent IAM role.</p><h3>Step 2: Token Resolution</h3><p>Two resolver functions handle each token type:</p><h3>OAuth Tokens (ctpo_*)</h3><p>Custom tokens are never stored in plaintext. The server hashes them with SHA-256 and looks up the hash in Firestore:</p><pre>def resolve_oauth_token(bearer_token: str) -&gt; str:<br>    token_hash = hashlib.sha256(bearer_token.encode()).hexdigest()<br>    doc = db.collection(&quot;oauthTokens&quot;).document(token_hash).get()<br>    if not doc.exists:<br>        raise ValueError(&quot;Invalid or revoked OAuth token&quot;)<br>    data = doc.to_dict()<br>    expires_at = data.get(&quot;expiresAt&quot;)<br>    if expires_at and expires_at &lt; datetime.now(timezone.utc):<br>        raise ValueError(&quot;OAuth token expired&quot;)<br>    return data[&quot;userId&quot;]</pre><p>The Firestore document stores userId, expiresAt, clientId, and a refreshTokenHash. 
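</p><p>For illustration, a stored document might look like this (field names as described above; the values are invented placeholders):</p><pre>// oauthTokens/&lt;sha256 hex of the access token&gt;<br>{<br>  &quot;userId&quot;: &quot;firebase-uid-of-the-owner&quot;,<br>  &quot;clientId&quot;: &quot;third-party-client-id&quot;,<br>  &quot;expiresAt&quot;: &quot;2026-05-16T20:04:48Z&quot;,<br>  &quot;refreshTokenHash&quot;: &quot;&lt;sha256 hex of the refresh token&gt;&quot;<br>}</pre><p>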
Revocation is instant — delete the document and the token stops working on the next request.</p><h3>Firebase ID Tokens</h3><p>Firebase handles the hard part:</p><pre>def resolve_id_token(id_token: str) -&gt; str:<br>    decoded = firebase_auth.verify_id_token(id_token)<br>    return decoded[&quot;uid&quot;]</pre><p>verify_id_token checks the signature against Google&#39;s public keys and validates claims (expiry, issuer, audience). It caches the public keys locally so it doesn&#39;t make a network call on every request.</p><h3>Step 3: The Auth Middleware</h3><p>A Starlette BaseHTTPMiddleware wraps every request. It tries the OAuth path first, then falls back to Firebase ID tokens:</p><pre>class AuthMiddleware(BaseHTTPMiddleware):<br>    async def dispatch(self, request: Request, call_next):<br>        # Skip auth for public endpoints<br>        if request.url.path in _PUBLIC_PATHS:<br>            return await call_next(request)<br>        <br>        auth_header = request.headers.get(&quot;Authorization&quot;, &quot;&quot;)<br>        <br>        if not auth_header.startswith(&quot;Bearer &quot;):<br>            return Response(<br>                status_code=401,<br>                headers={&quot;WWW-Authenticate&quot;: &#39;Bearer realm=&quot;tax-mcp&quot;&#39;},<br>            )<br>        token = auth_header.removeprefix(&quot;Bearer &quot;).strip()<br>        token_set = _user_id_var.set(&quot;&quot;)<br>        try:<br>            # Try OAuth token first<br>            if token.startswith(&quot;ctpo_&quot;):<br>                user_id = resolve_oauth_token(token)<br>            else:<br>                user_id = resolve_id_token(token)<br>            _user_id_var.set(user_id)<br>            return await call_next(request)<br>        except Exception as e:<br>            return Response(<br>                status_code=401,<br>                content=str(e),<br>                headers={&quot;WWW-Authenticate&quot;: &#39;Bearer realm=&quot;tax-mcp&quot;&#39;},<br>            )<br>        finally:<br>            _user_id_var.reset(token_set)</pre><p>Public paths (/health, /.well-known/oauth-protected-resource) bypass auth — necessary for health checks and OAuth discovery.</p><h3>Step 4: Thread-Safe User Context</h3><p>Tools shouldn’t take a user_id parameter — that would pollute every signature and make testing awkward. 
Instead, use a ContextVar to propagate identity through the async call stack:</p><pre># _context.py<br>from contextvars import ContextVar<br><br>_user_id_var: ContextVar[str] = ContextVar(&quot;user_id&quot;)<br>def get_user_id() -&gt; str:<br>    try:<br>        return _user_id_var.get()<br>    except LookupError:<br>        raise RuntimeError(&quot;User context not set - request did not pass through auth middleware.&quot;)</pre><p>Every tool calls get_user_id() to scope its Firestore queries:</p><pre># tools/income.py<br>from .._context import get_user_id<br><br>@mcp.tool()<br>def list_income(tax_year_id: str) -&gt; list[dict]:<br>    docs = (<br>        db.collection(&quot;users&quot;)<br>          .document(get_user_id())<br>          .collection(&quot;taxYears&quot;)<br>          .document(tax_year_id)<br>          .collection(&quot;incomeEntries&quot;)<br>          .stream()<br>    )<br>    return [{&quot;id&quot;: d.id, **d.to_dict()} for d in docs]</pre><p>ContextVar is async-safe — each concurrent request gets its own context, so there&#39;s no cross-contamination between users even under high concurrency.</p><p>The middleware resets the var in a finally block, which is critical: without cleanup, the context leaks to the next request on a reused coroutine.</p><h3>Step 5: The OAuth 2.0 Server (Web App Side)</h3><p>For third-party clients (Claude.ai, etc.), the web app implements an OAuth 2.0 authorization server. Here’s the flow:</p><pre>1. Client → GET /oauth/authorize?client_id=...&amp;code_challenge=...<br>2. User logs in (Firebase Auth)<br>3. Server → redirect with authorization_code<br>4. Client → POST /oauth/token with code + code_verifier<br>5. Server → { access_token: &quot;ctpo_...&quot;, refresh_token: &quot;ctpr_...&quot; }</pre><p><strong>Token generation</strong> (TypeScript, server-side):</p><pre>// server/routes/oauth/token.post.ts<br>import { createHash, randomBytes } from &quot;crypto&quot;;<br><br>const accessToken = &quot;ctpo_&quot; + randomBytes(32).toString(&quot;hex&quot;);<br>const tokenHash = createHash(&quot;sha256&quot;).update(accessToken).digest(&quot;hex&quot;);<br>await db.collection(&quot;oauthTokens&quot;).doc(tokenHash).set({<br>  userId: session.userId,<br>  clientId: session.clientId,<br>  expiresAt: new Date(Date.now() + 60 * 60 * 1000), // 1 hour<br>  createdAt: new Date(),<br>});<br>return { access_token: accessToken, token_type: &quot;bearer&quot;, expires_in: 3600 };</pre><p>The ctpo_ prefix lets the MCP server route to the right resolver without trying Firebase first. Only the hash ever touches the database.</p><p><strong>PKCE</strong> (Proof Key for Code Exchange) protects the authorization code flow. The client sends a code_challenge (SHA-256 of a random code_verifier) in step 1, then proves ownership by sending the raw code_verifier in step 4. The server hashes it and compares:</p><pre>const verifierHash = createHash(&quot;sha256&quot;)<br>  .update(body.code_verifier)<br>  .digest(&quot;base64url&quot;);<br><br>if (verifierHash !== session.codeChallenge) {<br>  throw createError({ statusCode: 400, message: &quot;Invalid code_verifier&quot; });<br>}</pre><h3>Step 6: Client-Side (Browser)</h3><p>The Angular/Analog web app attaches Firebase ID tokens to outbound API requests via an HTTP interceptor:</p><pre>// auth.interceptor.ts<br>export const authInterceptor: HttpInterceptorFn = (req, next) =&gt; {<br>  const auth = inject(Auth);<br>  return from(auth.currentUser?.getIdToken() ?? 
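// no signed-in user: resolve to null and forward the request without an Authorization header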
Promise.resolve(null)).pipe(<br>    switchMap((token) =&gt; {<br>      if (!token) return next(req);<br>      return next(req.clone({<br>        setHeaders: { Authorization: `Bearer ${token}` },<br>      }));<br>    }),<br>  );<br>};</pre><p>getIdToken() auto-refreshes the token when it&#39;s close to expiry, so you never send a stale JWT.</p><h3>Step 7: Firestore Security Rules</h3><p>The MCP server uses the <strong>Admin SDK</strong>, which bypasses all Firestore security rules. The get_user_id() context is your authorization layer. But rules provide defense-in-depth for direct client SDK access:</p><pre>rules_version = &#39;2&#39;;<br>service cloud.firestore {<br>  match /databases/{database}/documents {<br>    match /users/{userId}/{document=**} {<br>      allow read, write: if request.auth != null &amp;&amp; request.auth.uid == userId;<br>    }<br>    match /oauthTokens/{tokenId} {<br>      allow read, write: if false; // Server-only<br>    }<br>  }<br>}</pre><p>OAuth token documents are locked to server-only access. Users can never read or write them directly.</p><h3>Step 8: OAuth Discovery Endpoint</h3><p>Well-behaved OAuth clients (including Claude.ai) auto-discover server capabilities. Expose the RFC 9728 metadata endpoint:</p><pre>@app.get(&quot;/.well-known/oauth-protected-resource&quot;)<br>async def oauth_protected_resource():<br>    return JSONResponse({<br>        &quot;resource&quot;: &quot;https://your-mcp-server.run.app&quot;,<br>        &quot;authorization_servers&quot;: [&quot;https://your-web-app.com&quot;],<br>        &quot;bearer_methods_supported&quot;: [&quot;header&quot;],<br>    })</pre><p>This tells clients where to find the authorization server and how to present tokens. It’s a public endpoint (no auth required) — make sure it’s in _PUBLIC_PATHS.</p><h3>Step 9: Local Testing</h3><p>Don’t rely on a real Firebase project for unit tests. Seed a test token directly:</p><pre># test_auth.py<br>import hashlib<br>from firebase_admin import firestore<br>from datetime import datetime, timezone, timedelta<br><br>TEST_TOKEN = &quot;ctpo_localtest_&quot; + &quot;a&quot; * 48<br>TEST_TOKEN_HASH = hashlib.sha256(TEST_TOKEN.encode()).hexdigest()<br>def seed_test_token(db, user_id: str):<br>    db.collection(&quot;oauthTokens&quot;).document(TEST_TOKEN_HASH).set({<br>        &quot;userId&quot;: user_id,<br>        &quot;clientId&quot;: &quot;test-client&quot;,<br>        &quot;expiresAt&quot;: datetime.now(timezone.utc) + timedelta(hours=1),<br>    })<br>def cleanup(db):<br>    db.collection(&quot;oauthTokens&quot;).document(TEST_TOKEN_HASH).delete()</pre><p>Then test the three critical paths: no token → 401, invalid token → 401, valid token → 200.</p><h3>Security Properties</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FnXXqXRmiD7FlEDnn4B0bg.png" /></figure><h3>Summary</h3><p>Securing an MCP server with Firebase Auth comes down to four things:</p><ol><li><strong>Middleware</strong> that validates tokens before any tool runs</li><li><strong>ContextVar</strong> to propagate user identity without polluting tool signatures</li><li><strong>Hash-only storage</strong> for custom OAuth tokens</li><li><strong>Workload Identity</strong> to eliminate service account key management in production</li></ol><p>The dual-token design (Firebase ID tokens for direct use, custom OAuth tokens for third-party clients) keeps the server flexible while maintaining a single, auditable auth path. 
Every request either has a valid, unexpired token mapping to a real user, or it gets a 401.</p><hr><p><a href="https://itnext.io/securing-your-mcp-server-with-firebase-auth-a-production-walkthrough-651bf398d797">Securing Your MCP Server with Firebase Auth: A Production Walkthrough</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Microsoft Security Copilot Agents: Inside the Agentic SOC]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://itnext.io/microsoft-security-copilot-agents-inside-the-agentic-soc-fcf7bd081afd?source=rss----5b301f10ddcd---4"><img src="https://cdn-images-1.medium.com/max/1448/1*iwVQvfwrdxfw4JKRO3qacg.png" width="1448"></a></p><p class="medium-feed-snippet">A deep technical walkthrough of the Sentinel data lake, the MCP server, and the agents now hunting threats inside Microsoft Defender&#x2026;</p><p class="medium-feed-link"><a href="https://itnext.io/microsoft-security-copilot-agents-inside-the-agentic-soc-fcf7bd081afd?source=rss----5b301f10ddcd---4">Continue reading on ITNEXT »</a></p></div>]]></description>
            <link>https://itnext.io/microsoft-security-copilot-agents-inside-the-agentic-soc-fcf7bd081afd?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/fcf7bd081afd</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Dave R - Microsoft Azure & AI MVP☁️]]></dc:creator>
            <pubDate>Sat, 16 May 2026 19:02:49 GMT</pubDate>
            <atom:updated>2026-05-16T19:02:46.299Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[Tokensparsamkeit for coding assistants]]></title>
            <link>https://itnext.io/https-blog-frankel-ch-tokensparsamkeit-coding-assistants-275d4ddd4faa?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/275d4ddd4faa</guid>
            <category><![CDATA[sparsamkeit]]></category>
            <category><![CDATA[ai-coding-assistant]]></category>
            <category><![CDATA[coding-assistant]]></category>
            <category><![CDATA[budget-management]]></category>
            <category><![CDATA[token]]></category>
            <dc:creator><![CDATA[Nicolas Fränkel]]></dc:creator>
            <pubDate>Sat, 16 May 2026 08:31:33 GMT</pubDate>
            <atom:updated>2026-05-16T08:31:31.050Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*N7rdsV_z0pCi0IKNgHbXYg.jpeg" /></figure><p>You make decisions with data. Most businesses assumed that the more data, the better the decision. Then several factors put a halt to the hoarding of ever more data: GDPR and its localized counterparts, and the cost of storage. However, even before that happened, the Datensparsamkeit approach already existed.</p><blockquote><em>Datensparsamkeit is a German word that’s difficult to translate properly into English. It’s an attitude to how we capture and store data, saying that we should only handle data that we really need.</em></blockquote><blockquote><em>— </em><a href="https://martinfowler.com/bliki/Datensparsamkeit.html"><em>Datensparsamkeit</em></a></blockquote><p>I don’t agree with Martin Fowler’s claim that it’s difficult to translate. The translation of Sparsamkeit is <em>frugality</em>. In the context of coding assistants, token frugality is a good thing.</p><blockquote><em>Today, critical resources aren’t CPU, RAM, or storage, but tokens. Tokens are a finite and expensive resource. My opinion is that soon, developers will be measured on their token usage: the better one will be the one using the fewest tokens to achieve similar results.</em></blockquote><blockquote><em>— </em><a href="https://blog.frankel.ch/writing-agent-skill/"><em>Writing an agent skill</em></a></blockquote><p>Imagine two engineers finishing the same job with the same quality in the same timeframe. If the organization needs to let go of one, it will be the one that costs more. In the era of AI, that means the one who consumes more tokens.</p><p>In this post, I want to show a couple of methods to keep token usage small.</p><h3>Compression</h3><p>One of the first steps toward Tokensparsamkeit is to compress the tokens sent to the underlying LLM <strong>while keeping the same data</strong>. But what are tokens? It’s a gross oversimplification, but for the sake of explanation, let’s consider that a word is a token. Read this <a href="https://letsdatascience.com/blog/tokenization-deep-dive-why-it-matters-more-than-you-think">deep dive</a> if you want more details.</p><p>If we treat tokens as words, we can remove articles and similar words from the payload to decrease the token count. “Find the distance between the Earth and the moon” becomes “Find distance between Earth and moon”. For all intents and purposes, the data received is the same, with fewer words.</p><p>The trick is to set up a proxy between the client and the LLM backend. I’m using rtk myself:</p><blockquote><em>CLI proxy that reduces LLM token consumption by 60–90% on common dev commands. Single Rust binary, zero dependencies</em></blockquote><blockquote><em>— </em><a href="https://github.com/rtk-ai/rtk"><em>rtk project on GitHub</em></a></blockquote><p>The tool works across file commands, git, gh, test runners, build/lint commands, aws, docker, kubectl, etc. Note that it&#39;s not a magical recipe, as rtk itself mentions:</p><blockquote><em>This only applies to Bash tool calls. Claude Code built-in tools such as Read, Grep, and Glob bypass the hook, so use shell commands or explicit rtk commands when you want RTK filtering there.</em></blockquote><h3>Context optimization</h3><p>The second step toward Tokensparsamkeit is to avoid stuffing the context with irrelevant data.</p><p>Most people who start using coding assistants assume the context only consists of the system prompt and user prompts. 
There actually is a lot more. Anthropic’s <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective context engineering for AI agents</a> article mentions:</p><ul><li>System prompt</li><li>User prompt</li><li>Message history</li><li>Tool definitions</li><li>Tool results</li><li>MCP servers</li><li>RAG</li><li>Agent memory if applicable</li></ul><p>Claude Code introduced the option to compact (or clear?) the context before each interaction. It explicitly asked with each interaction whether to do it. I liked it, but they removed it a week or so later. Perhaps too many people didn’t understand what it entailed? In any case, make good use of the /compact command that most assistants provide: it summarizes the conversation history to cut its token usage, while trying to keep the relevant bits.</p><p>Also notice that tools and MCP servers use tokens; the more you configure, the more tokens are used. Some MCP servers are so easy to set up, it’s tempting to stuff your assistant with them. Don’t, or at least enable them only on a case-by-case basis or at the project level. Why enable the Vaadin MCP on a Rust project?</p><p>The same goes for tools, although I don’t think many people use them much in comparison to MCP servers.</p><h3>Local models</h3><p>Token usage only matters for cloud-based billing. We don’t care about it if we use a local model. There are several ways to do it, including AI gateways. In the scope of this article, I’ll keep it simple.</p><p>I want to keep Claude Code as the client, because it’s really good. At the same time, I want to use my own hardware with a local model: the cost is upfront, but afterwards the only recurring cost is power.</p><p>If you want to just do it, <a href="https://unsloth.ai/docs/basics/claude-code">How to Run Local LLMs with Claude Code</a> is where I found the solution. Keep reading this section if you want to learn about the issues I faced.</p><p>I initially tried to run Qwen3 32B via Ollama in Docker. Docker containers cannot access Apple’s Metal GPU framework, so the model ran entirely on CPU. It loaded successfully but crashed during inference with a 500 error; CPU-only inference on a 32B model is simply too slow to be usable.</p><p>I had been using Ollama as the default, because others did. Then I stumbled upon <a href="https://sleepingrobots.com/dreams/stop-using-ollama">Friends Don’t Let Friends Use Ollama</a>. I switched from Ollama to llama.cpp, which enabled low-level configuration.</p><p>The biggest hurdle was the context window size. Claude Code sends <strong>lots</strong> of tokens to the backend. On the <a href="https://github.com/nfrankel/opentelemetry-tracing/">OpenTelemetry tracing demo</a>, it’s around 35k on each request.</p><p>I started with Qwen3 models. The default context size wasn’t big enough. When a model receives more tokens than its maximum, llama-server immediately rejects the request. I tried to increase the limit with the --ctx-size option, to no avail. Qwen3 models are trained with a 32,768-token context. It&#39;s a hard limit baked into the GGUF file metadata. llama-server abides by it.</p><p>llama-server is meant to serve multiple requests simultaneously. It turns out that the pool of available tokens is shared equally across all possible requests. If the maximum number of tokens is T and the server can handle x requests in parallel, each request only has T/x tokens available. 
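With --ctx-size 65536 and --parallel 4, for example, each slot would get only 16,384 tokens, not even half of the ~35k tokens a single Claude Code request sends.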
For this reason, I set the parallelism with --parallel 1.</p><p>Despite all of the above, it still didn’t work.</p><h3>Mixture of Experts vs. dense models</h3><p>I was using a <em>dense model</em>, which is what we use regularly. Dense models load everything in memory at once. The alternative is to use a Mixture of Experts model.</p><blockquote><em>In the context of transformer models, a MoE consists of two main elements:</em></blockquote><ul><li><strong>Sparse MoE layers</strong> are used instead of dense feed-forward network (FFN) layers. MoE layers have a certain number of “experts” (e.g. 8), where each expert is a neural network. In practice, the experts are FFNs, but they can also be more complex networks or even a MoE itself, leading to hierarchical MoEs!</li><li>A <strong>gate network or router</strong>, that determines which tokens are sent to which expert. For example, in the image below, the token “More” is sent to the second expert, and the token “Parameters” is sent to the first network. As we’ll explore later, we can send a token to more than one expert. How to route a token to an expert is one of the big decisions when working with MoEs — the router is composed of learned parameters and is pretrained at the same time as the rest of the network.</li></ul><blockquote><em>— </em><a href="https://huggingface.co/blog/moe"><em>What is a Mixture of Experts</em></a></blockquote><p>In layman’s terms, a MoE segments its weights/parameters into separate specialized submodels called experts. A routing layer activates only the necessary experts depending on the request. Compared to a regular dense model, instead of computing across the entire set of weights, only a small subset of experts is activated per request. The combined size of the activated experts is much smaller than the full model, even though the total of all experts together can be larger than a comparable dense model.</p><p>The Qwen3.5-35B-A3B model is a MoE that works perfectly on my machine.</p><h3>Putting it all together</h3><p>We’re still missing a couple of elements to reach the goal.</p><p>To better interact with Claude Code, the model should return structured content. That’s what the --jinja flag is for. For better performance, you should also use <a href="https://bentoml.com/llm/kernel-optimization/flashattention">Flash Attention</a>. It&#39;s an optimized algorithm for computing the attention mechanism in Transformer models. It&#39;s faster, more memory-efficient, and more scalable than standard attention. Activate it via --flash-attn on. The last configuration parameter is to offload as many layers as possible to the GPU with --n-gpu-layers 99.</p><p>The final server command line is:</p><pre>llama-server \<br>  --model ~/models/Qwen3.5-35B-A3B-Q4_K_M.gguf \<br>  --n-gpu-layers 99 \<br>  --ctx-size 65536 \<br>  --parallel 1 \<br>  --flash-attn on \<br>  --jinja \<br>  --port 8080</pre><p>On the Claude Code side, we need to set several environment variables:</p><table><thead><tr><th>Environment variable</th><th>Meaning</th><th>Example</th></tr></thead><tbody><tr><td>ANTHROPIC_BASE_URL</td><td>URL to the llama-server instance</td><td>http://127.0.0.1:8080</td></tr><tr><td>ANTHROPIC_API_KEY</td><td>Anything</td><td>dummy</td></tr><tr><td>ANTHROPIC_AUTH_TOKEN</td><td>Anything</td><td>dummy</td></tr><tr><td>CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC</td><td>Self-explicit</td><td>1</td></tr></tbody></table><pre>export ANTHROPIC_BASE_URL=http://127.0.0.1:8080<br>export ANTHROPIC_API_KEY=dummy<br>export ANTHROPIC_AUTH_TOKEN=dummy<br>export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1<br>claude</pre><p>At this point, you can use Claude Code, which will query your local model. 
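</p><p>Before launching it, a quick sanity check (optional; /health is llama-server&#39;s readiness endpoint) confirms the model finished loading:</p><pre># returns an OK status once the model is fully loaded<br>curl http://127.0.0.1:8080/health</pre><p>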
Here’s a sample server output for a query, for information.</p><pre>srv  params_from_: Chat format: peg-native<br>slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.788 (&gt; 0.100 thold), f_keep = 0.789<br>slot launch_slot_: id  0 | task -1 | sampler chain: logits -&gt; ?penalties -&gt; ?dry -&gt; ?top-n-sigma -&gt; top-k -&gt; ?typical -&gt; top-p -&gt; min-p -&gt; ?xtc -&gt; temp-ext -&gt; dist <br>slot launch_slot_: id  0 | task 1464 | processing task, is_child = 0<br>slot update_slots: id  0 | task 1464 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 56401<br>slot update_slots: id  0 | task 1464 | n_past = 44456, slot.prompt.tokens.size() = 56378, seq_id = 0, pos_min = 56377, n_swa = 0<br>slot update_slots: id  0 | task 1464 | Checking checkpoint with [56141, 56141] against 44456...<br>slot update_slots: id  0 | task 1464 | Checking checkpoint with [55629, 55629] against 44456...<br>slot update_slots: id  0 | task 1464 | Checking checkpoint with [49151, 49151] against 44456...<br>slot update_slots: id  0 | task 1464 | Checking checkpoint with [40959, 40959] against 44456...<br>slot update_slots: id  0 | task 1464 | restored context checkpoint (pos_min = 40959, pos_max = 40959, n_tokens = 40960, n_past = 40960, size = 62.813 MiB)<br>slot update_slots: id  0 | task 1464 | erased invalidated context checkpoint (pos_min = 49151, pos_max = 49151, n_tokens = 49152, n_swa = 0, pos_next = 40960, size = 62.813 MiB)<br>slot update_slots: id  0 | task 1464 | erased invalidated context checkpoint (pos_min = 55629, pos_max = 55629, n_tokens = 55630, n_swa = 0, pos_next = 40960, size = 62.813 MiB)<br>slot update_slots: id  0 | task 1464 | erased invalidated context checkpoint (pos_min = 56141, pos_max = 56141, n_tokens = 56142, n_swa = 0, pos_next = 40960, size = 62.813 MiB)<br>slot update_slots: id  0 | task 1464 | n_tokens = 40960, memory_seq_rm [40960, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 43008, batch.n_tokens = 2048, progress = 0.762540<br>slot update_slots: id  0 | task 1464 | n_tokens = 43008, memory_seq_rm [43008, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 45056, batch.n_tokens = 2048, progress = 0.798851<br>slot update_slots: id  0 | task 1464 | n_tokens = 45056, memory_seq_rm [45056, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 47104, batch.n_tokens = 2048, progress = 0.835163<br>slot update_slots: id  0 | task 1464 | n_tokens = 47104, memory_seq_rm [47104, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 49152, batch.n_tokens = 2048, progress = 0.871474<br>slot update_slots: id  0 | task 1464 | n_tokens = 49152, memory_seq_rm [49152, end)<br>slot update_slots: id  0 | task 1464 | 8192 tokens since last checkpoint at 40960, creating new checkpoint during processing at position 51200<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 51200, batch.n_tokens = 2048, progress = 0.907785<br>slot update_slots: id  0 | task 1464 | created context checkpoint 6 of 32 (pos_min = 49151, pos_max = 49151, n_tokens = 49152, size = 62.813 MiB)<br>slot update_slots: id  0 | task 1464 | n_tokens = 51200, memory_seq_rm [51200, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 53248, batch.n_tokens = 2048, progress = 0.944097<br>slot update_slots: id  0 | task 1464 | n_tokens = 53248, memory_seq_rm [53248, end)<br>slot update_slots: 
id  0 | task 1464 | prompt processing progress, n_tokens = 55296, batch.n_tokens = 2048, progress = 0.980408<br>slot update_slots: id  0 | task 1464 | n_tokens = 55296, memory_seq_rm [55296, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 55885, batch.n_tokens = 589, progress = 0.990851<br>slot update_slots: id  0 | task 1464 | n_tokens = 55885, memory_seq_rm [55885, end)<br>slot update_slots: id  0 | task 1464 | prompt processing progress, n_tokens = 56397, batch.n_tokens = 512, progress = 0.999929<br>slot update_slots: id  0 | task 1464 | created context checkpoint 7 of 32 (pos_min = 55884, pos_max = 55884, n_tokens = 55885, size = 62.813 MiB)<br>slot update_slots: id  0 | task 1464 | n_tokens = 56397, memory_seq_rm [56397, end)<br>reasoning-budget: activated, budget=2147483647 tokens<br>slot init_sampler: id  0 | task 1464 | init sampler, took 4.37 ms, tokens: text = 56401, total = 56401<br>slot update_slots: id  0 | task 1464 | prompt processing done, n_tokens = 56401, batch.n_tokens = 4<br>slot update_slots: id  0 | task 1464 | created context checkpoint 8 of 32 (pos_min = 56396, pos_max = 56396, n_tokens = 56397, size = 62.813 MiB)<br>srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200<br>reasoning-budget: deactivated (natural end)<br>slot print_timing: id  0 | task 1464 | <br>prompt eval time =   65949.79 ms / 15441 tokens (    4.27 ms per token,   234.13 tokens per second)<br>       eval time =    3639.91 ms /    87 tokens (   41.84 ms per token,    23.90 tokens per second)<br>      total time =   69589.71 ms / 15528 tokens<br>slot      release: id  0 | task 1464 | stop processing: n_tokens = 56487, truncated = 0</pre><h3>Discussion</h3><p>While the underlying model is important, most people undervalue the client. I used both Claude Code and Copilot CLI with the same underlying model, Claude Sonnet 4.6. I found Claude Code superior by far across several sessions.</p><p>The move of most vendors toward subscriptions to benefit from recurring revenues makes sense for them. For the customer, however, it’s another question: once you stop paying, you lose access to the service.</p><p>In the context of coding assistants, vendors justify subscriptions by their cloud usage costs. Unfortunately, the per-token metering is quite opaque. If the vendor doesn’t size their service properly, users get charged more. I don’t think that’s fair.</p><p>Keeping Claude Code while hosting the model locally is a great cost-saving alternative. You only need to pay for the hardware once. Granted, it’s slower, but it’s a business model I prefer. 
If you have well-designed autonomous agents that work reliably, you can run them overnight anyway.</p><p><strong>To go further:</strong></p><ul><li><a href="https://martinfowler.com/bliki/Datensparsamkeit.html">Datensparsamkeit</a></li><li><a href="https://www.tokenoptimize.dev/guides/llm-token-optimization-strategies">LLM Token Optimization Strategies: The Complete Guide for 2026</a></li><li><a href="https://letsdatascience.com/blog/tokenization-deep-dive-why-it-matters-more-than-you-think">Tokenization Deep Dive: Why It Matters More Than You Think</a></li><li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective context engineering for AI agents</a></li><li><a href="https://unsloth.ai/docs/basics/claude-code">How to Run Local LLMs with Claude Code</a></li><li><a href="https://huggingface.co/blog/moe">Mixture of Experts Explained</a></li><li><a href="https://openrouter.ai/qwen/qwen3-coder-next">Qwen3 Coder Next</a></li></ul><p><em>Originally published at </em><a href="https://blog.frankel.ch/tokensparsamkeit-coding-assistants/"><em>A Java Geek</em></a><em> on May 10th, 2026.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=275d4ddd4faa" width="1" height="1" alt=""><hr><p><a href="https://itnext.io/https-blog-frankel-ch-tokensparsamkeit-coding-assistants-275d4ddd4faa">Tokensparsamkeit for coding assistants</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[YOLO Is a Terrible Strategy for Validating Production Changes]]></title>
            <link>https://itnext.io/yolo-is-a-terrible-strategy-for-validating-production-changes-a157369a0382?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/a157369a0382</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[coding]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Benjamin Cane]]></dc:creator>
            <pubDate>Fri, 15 May 2026 22:19:16 GMT</pubDate>
            <atom:updated>2026-05-15T22:06:58.775Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*OESeq5QAldRa4xuf" /><figcaption>Photo by <a href="https://unsplash.com/@bijesh33?utm_source=medium&amp;utm_medium=referral">bijesh regmi</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>YOLO is a terrible strategy for validating production changes.</p><p>How many times have you seen it?</p><p>Your platform is running smoothly. No alerts, no issues. Then suddenly, something breaks.</p><p>After digging in, you discover the cause: another system you depend on made a change, and that change broke your platform.</p><p>They didn’t notice it broke. You did, much too late…</p><p>How many times have you been the cause of another platform breaking?</p><h3>🥶 Cold Reality</h3><p>I wish the above scenario were rare, but it happens constantly across the technology industry.</p><p>It happens between internal teams, third-party integrations, and shared infrastructure teams.</p><p>These scenarios make you wonder, “How was that change validated?”</p><p>Maybe they tested it, and their validation had gaps. Maybe they did little validation at all. If any.</p><p>Either way, the result is the same: <strong>they validated their change with 100% of production traffic.</strong> Bad plan.</p><h3>💡 Better Ways to Validate Changes</h3><p>There are many ways teams can reduce production risk when rolling out changes, and the best teams combine the following approaches.</p><h3>Canary Releases 🐤</h3><p>I talk about canary deployments often.</p><p>Instead of moving 100% of traffic at once, move small percentages gradually and observe behavior closely.</p><p><strong>That “observe closely” part matters.</strong> Look at error rates, latency changes (beyond normal platform warmup), resource spikes, and unexpected retries. All of these indicate customer impact.</p><p>Canary deployments are one of the best ways to reduce the blast radius of changes, identify problems quickly, and self-correct.</p><h3>Shadow Traffic 🪞</h3><p>Traffic mirroring sends a copy of production traffic to the new version before any live traffic is routed there.</p><p>The mirrored responses are discarded, but you can observe behavior and monitor the same signals you would with a canary release, without sacrificing a single customer request.</p><h3>Synthetic Traffic 🤖</h3><p>Synthetic traffic simulates user behavior continuously. It’s best known for monitoring customer experience, but it’s also an effective way to validate new deployments.</p><p>Route synthetic traffic to upgraded instances first and verify behavior before moving real traffic. If it fails with synthetic traffic, it likely won’t survive real traffic.</p><h3>Smoke Tests 😶‍🌫️</h3><p>The classic approach. After deployment, run a small set of fast tests to confirm the platform is fundamentally working.</p><p>Smoke tests don’t need to be fancy; they can be shell scripts, API calls, read-only requests, a test file, or full end-to-end validation.</p><p>Their purpose is simple: to quickly catch obvious breakage. A minimal sketch follows below.</p><h3>🧠 Final Thoughts</h3><p>Don’t think of the above methods as mutually exclusive choices. Combine them.</p><p>Some platforms I work on combine canary releases, shadow traffic, and synthetic traffic. Others use smoke tests plus canary releases.</p><p>The more layers of validation you have, the more likely you are to catch issues before your customers do. 
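</p><p>To make the smoke-test idea concrete, here’s a minimal sketch in TypeScript. The endpoints, expected status codes, and BASE_URL are placeholders; adapt them to whatever “fundamentally working” means on your platform:</p><pre>// smoke-test.ts: run immediately after a deploy; exit non-zero on breakage.<br>const BASE = process.env.BASE_URL ?? &quot;https://staging.example.com&quot;;<br><br>const checks: Array&lt;[string, number]&gt; = [<br>  [&quot;/health&quot;, 200],             // basic liveness<br>  [&quot;/api/orders?limit=1&quot;, 200], // one read-only business endpoint<br>];<br><br>let failed = false;<br>for (const [path, expected] of checks) {<br>  const res = await fetch(BASE + path);<br>  if (res.status !== expected) {<br>    console.error(`FAIL ${path}: got ${res.status}, expected ${expected}`);<br>    failed = true;<br>  } else {<br>    console.log(`OK   ${path}`);<br>  }<br>}<br>process.exit(failed ? 1 : 0);</pre><p>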
Because having your customers validate changes for you is a poor strategy.</p><p><em>Originally published at </em><a href="https://bencane.com/posts/2026-05-07/"><em>https://bencane.com</em></a><em> on May 7, 2026.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a157369a0382" width="1" height="1" alt=""><hr><p><a href="https://itnext.io/yolo-is-a-terrible-strategy-for-validating-production-changes-a157369a0382">YOLO Is a Terrible Strategy for Validating Production Changes</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Terminal Driven Project Scaffolding with Kikplate]]></title>
            <link>https://itnext.io/terminal-driven-project-scaffolding-with-kikplate-695735d0604b?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/695735d0604b</guid>
            <category><![CDATA[terminal-commands]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[java]]></category>
            <category><![CDATA[boilerplate]]></category>
            <dc:creator><![CDATA[Moeid Heidari]]></dc:creator>
            <pubDate>Fri, 15 May 2026 20:49:06 GMT</pubDate>
            <atom:updated>2026-05-15T20:49:05.160Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fIGieOUx61zlU8L8ChgYoA.png" /></figure><p>Kikplate CLI is built for developers who want to discover, install, and manage project templates directly from the terminal. Instead of browsing repositories manually or copying starter projects between machines, Kikplate gives you a workflow that stays entirely inside the command line.</p><p>The CLI connects to a Kikplate server, lets you search for plates, clone them into repositories, manage local templates, and authenticate with your account when needed.</p><p>The entire experience is designed around fast terminal workflows.</p><h3>Getting Started</h3><p>Once the CLI is installed, you can view all available commands.</p><pre>kikplate</pre><p>The CLI includes commands for searching, scaffolding, authentication, local management, submissions, and account operations.</p><pre>Usage:<br>  kikplate [command]<br><br>Available Commands:<br>  completion<br>  config<br>  describe<br>  login<br>  logout<br>  my<br>  plates<br>  scaffold<br>  search<br>  submit<br>  verify<br>  whoami</pre><p>Before using the CLI, you can initialize the default configuration.</p><pre>kikplate config init</pre><p>By default, the CLI stores its configuration here.</p><pre>~/.kikplate/config.yaml</pre><p>You can also provide a custom configuration path.</p><pre>kikplate --config ./config.yaml</pre><h3>Searching Plates</h3><p>One of the most useful parts of the CLI is plate discovery.</p><p>You can search by name.</p><pre>kikplate search --name node</pre><p>Example output:</p><pre>┌─────────────────────┬─────────────────────┬──────────┬────────┬──────────┐<br>│ SLUG                │ NAME                │ CATEGORY │ RATING │ VERIFIED │<br>├─────────────────────┼─────────────────────┼──────────┼────────┼──────────┤<br>│ test-node-js-boiler │ Test Node JS Boiler │ other    │ 4.5    │ yes      │<br>└─────────────────────┴─────────────────────┴──────────┴────────┴──────────┘</pre><p>Search is not limited to names. You can filter using categories, tags, limits, and pagination.</p><pre>kikplate search --help<br>Search plates on the server<br><br>Usage:<br>  kikplate search [flags]<br><br>Flags:<br>      --category string   Filter by category<br>      --limit int         Results per page (default 20)<br>      --name string       Search by name<br>      --page int          Page number (default 1)<br>      --tag string        Filter by tag</pre><p>Searching by tags makes it easy to find templates built around specific architectures or technologies.</p><pre>kikplate search --tag DDD<br><br>┌──────────────────────────┬──────────────────────────┬──────────┬────────┬──────────┐<br>│ SLUG                     │ NAME                     │ CATEGORY │ RATING │ VERIFIED │<br>├──────────────────────────┼──────────────────────────┼──────────┼────────┼──────────┤<br>│ react-clean-architecture │ React Clean Architecture │ other    │ 5.0    │ yes      │<br>└──────────────────────────┴──────────────────────────┴──────────┴────────┴──────────┘<br><br>Total: 1</pre><p>The CLI output is intentionally compact and readable. 
You can quickly scan results without opening a browser.</p><h3>Managing Local Plates</h3><p>Kikplate also keeps track of plates available on your machine.</p><p>To view local plates:</p><pre>kikplate plates list</pre><p>Example output:</p><pre>┌───────────────────────────────┬───────────────────────────────┬────────────────────────────────────────────────────┬──────────────────────────┐<br>│ SLUG                          │ NAME                          │ DESCRIPTION                                        │ SERVER                   │<br>├───────────────────────────────┼───────────────────────────────┼────────────────────────────────────────────────────┼──────────────────────────┤<br>│ go-clean-architecture-starter │ Go Clean Architecture Starter │ HTTP API starter with Chi, PostgreSQL, layered ... │ https://kikplate.dev/api │<br>│ rust-axum-api-starter         │ Rust Axum API Starter         │ HTTP API starter with Axum, Tokio, SQLx, Postgr... │ https://kikplate.dev/api │<br>└───────────────────────────────┴───────────────────────────────┴────────────────────────────────────────────────────┴──────────────────────────┘</pre><p>This makes the CLI useful even when you already know which templates you use frequently.</p><h3>Scaffolding Projects</h3><p>The scaffolding command is where Kikplate becomes part of the actual development workflow.</p><p>You can scaffold a plate directly into a repository.</p><pre>kikplate scaffold rust-axum-api-starter https://github.com/MoeidHeidari/test-kikplate-clone.git/</pre><p>This command clones the target repository and applies the selected plate automatically.</p><p>Example repository created with Kikplate:</p><p>https://github.com/MoeidHeidari/test-kikplate-clone</p><p>You can also scaffold locally using the --local flag.</p><pre>kikplate scaffold rust-axum-api-starter --local</pre><p>This workflow is especially useful when bootstrapping backend services, frontend applications, APIs, or internal tools.</p><h3>Authentication</h3><p>Kikplate supports SSO authentication providers directly from the terminal.</p><p>Log in with GitHub, Google, or GitLab:</p><pre>kikplate login sso github<br>kikplate login sso google<br>kikplate login sso gitlab</pre><p>To remove the active session:</p><pre>kikplate logout</pre><p>After authentication you can verify the current account.</p><pre>kikplate whoami</pre><p>Example output:</p><pre>Username:  moeidheidari<br>Name:      Moeid Heidari<br>Email:     moeidheidari73@gmail.com<br>Account:   405bed9f-ab1e-4bbf-a5d1-d5bd27462592</pre><p>The authentication flow feels native to the terminal and avoids unnecessary setup.</p><h3>Plate Details</h3><p>You can inspect a specific plate using the describe command.</p><pre>kikplate describe rust-axum-api-starter</pre><p>This is useful when checking metadata, tags, descriptions, repository information, and verification status before scaffolding.</p><h3>Submitting Plates</h3><p>Kikplate is not only for consuming templates. 
It also allows developers to publish and share their own plates.</p><p>To submit a repository as a plate:</p><pre>kikplate submit</pre><p>To verify a submitted plate:</p><pre>kikplate verify</pre><p>This creates a workflow where teams can maintain internal starter templates while also exposing public templates to the community.</p><h3>Personal Workspace Commands</h3><p>The CLI also includes commands focused on the authenticated user.</p><pre>kikplate my</pre><p>This command can be used to access personal plates, organizations, and bookmarks.</p><h3>Why the CLI Matters</h3><p>Most template systems stop at downloading repositories. Kikplate goes further by turning templates into a complete terminal workflow.</p><ul><li>You can search templates from the command line.</li><li>You can scaffold projects directly into repositories.</li><li>You can authenticate using SSO.</li><li>You can manage local plates.</li><li>You can publish and verify templates.</li></ul><p>All of this happens without leaving the terminal. That is what makes the CLI feel practical in daily development.</p><h3>CLI Documentation</h3><p>Full CLI documentation is available here: <a href="https://github.com/kikplate/kikplate/blob/main/docs/cli.md">https://github.com/kikplate/kikplate/blob/main/docs/cli.md</a></p><h3>Server Configuration</h3><p>By default, the Kikplate CLI connects to the official Kikplate server at <a href="https://kikplate.dev/api">https://kikplate.dev/api</a>.</p><p>This means all search, authentication, and plate operations are executed against the hosted Kikplate platform without any additional setup.</p><p>If you are running your own Kikplate instance, you can point the CLI to a different API server. This allows teams and organizations to host private plate registries while still using the same CLI workflow.</p><p>The server address can be changed through the CLI configuration or by editing the configuration file directly at ~/.kikplate/config.yaml.</p><p>Once configured, all CLI commands such as search, scaffold, and plates will use your custom server instead of the default one.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=695735d0604b" width="1" height="1" alt=""><hr><p><a href="https://itnext.io/terminal-driven-project-scaffolding-with-kikplate-695735d0604b">Terminal Driven Project Scaffolding with Kikplate</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Bypassing User Isolation on Android with a Screen Reader]]></title>
            <link>https://itnext.io/bypassing-user-isolation-on-android-with-a-screen-reader-8784558d7b66?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/8784558d7b66</guid>
            <category><![CDATA[talkback]]></category>
            <category><![CDATA[accessibility]]></category>
            <category><![CDATA[android]]></category>
            <category><![CDATA[cybersecurity]]></category>
            <dc:creator><![CDATA[Karol Wrótniak]]></dc:creator>
            <pubDate>Fri, 15 May 2026 11:08:31 GMT</pubDate>
            <atom:updated>2026-05-15T11:08:30.450Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*MhKxjuRD7Dz2dpY4.png" /></figure><p>A single missing check in Android lets one user’s screen reader leak another user’s private notifications. Here’s how it happened.</p><h3>Multi-user &amp; accessibility on Android</h3><p>Android’s <a href="https://source.android.com/docs/devices/admin/multi-user">multi-user support</a> lets several people share one device. Each user gets their own space, apps, and data. This feature is common on tablets. But not all smartphones have it. Even so, the code is there. The problem is that <a href="https://developer.android.com/reference/android/accessibilityservice/AccessibilityService">accessibility services</a> run with high privileges. They need to see everything to help users. Sometimes, this power breaks the walls between users.</p><h3>Screen readers &amp; TalkBack</h3><p><a href="https://en.wikipedia.org/wiki/Screen_reader">Screen readers</a> turn text into speech. They allow people who are blind or have low vision to use apps. The screen may even be completely off, but the user can still interact with the device. <a href="https://play.google.com/store/apps/details?id=com.google.android.marvin.talkback"><strong>TalkBack</strong></a> is Google’s screen reader for Android. Normally, TalkBack only reads the currently focused UI elements. But there are ways to make it speak programmatically. One is <a href="https://developer.android.com/reference/android/view/View#announceForAccessibility(java.lang.CharSequence)">announceForAccessibility()</a> (now deprecated) - a method that forces the screen reader to read arbitrary text. Another is <a href="https://appt.org/en/docs/android/samples/accessibility-live-region">live regions</a> - parts of the UI that update without user interaction. When something changes, the system fires an <a href="https://developer.android.com/reference/android/view/accessibility/AccessibilityEvent">accessibility event</a> (a system-level broadcast) that carries the updated text. A screen reader picks it up and reads the new value aloud. Status bar notifications are one example of live regions.</p><h3>The bug: CVE-2022-20448</h3><p>The bug was simple: NotificationManagerService didn&#39;t check if a notification belonged to the current foreground user before dispatching the <a href="https://www.thedroidsonroids.com/blog/what-is-accessibility-in-mobile-apps">accessibility</a> event. This is what caused screen readers to read it out loud. Imagine a phone with two users: <strong>Alice</strong> (using the phone right now) and <strong>Bob</strong> (a background user).</p><ul><li>Bob receives a text message: <em>“Your verification code is 3291”</em>.</li><li>The system posts the notification and fires an <a href="https://www.thedroidsonroids.com/blog/provide-accessibility-in-mobile-app-guide">accessibility</a> event containing that text.</li><li>TalkBack on Alice’s active session picks up the event and reads it aloud.</li><li>Alice hears Bob’s private 2FA code.</li></ul><p>Screen readers weren’t the only apps that could intercept this data. <a href="https://www.thedroidsonroids.com/blog/mobile-app-accessibility-android-guide">Android dispatches accessibility events</a> to <strong>all</strong> registered accessibility services — not just TalkBack. 
Apps like <a href="https://play.google.com/store/apps/details?id=net.dinglisch.android.taskerm">Tasker</a>, which <a href="https://tasker.joaoapps.com/userguide/en/faqs/faq-problem.html">registers as an accessibility service</a> for UI automation, or notification-logging apps would also receive Bob’s notification content.</p><h3>The fix</h3><p>The entire <a href="https://android.googlesource.com/platform/frameworks/base/+/7b9ea7a75ed2de51e883f450b701c8d0d82e6e9c%5E%21/#F0">fix</a> was a single added condition — checking whether the notification actually belongs to the current user — plus a unit test to prevent regression:</p><pre>// frameworks/base/services/core/java/com/android/server/notification/NotificationManagerService.java<br><br>-                &amp;&amp; !suppressedByDnd) {<br>+                &amp;&amp; !suppressedByDnd<br>+                &amp;&amp; isNotificationForCurrentUser(record)) {</pre><p>isNotificationForCurrentUser() returns true only when the notification&#39;s owner matches the foreground user - so background users&#39; notifications are no longer broadcast as <a href="https://www.thedroidsonroids.com/blog/accessibility-standards-mobile-apps">accessibility</a> events.</p><p>I reported this issue on June 29, 2022. Google awarded a $5,000 bounty for the finding. They marked the bug as High severity in the <a href="https://source.android.com/docs/security/bulletin/2022-11-01">November 2022 Android Security Bulletin</a> and released patches for Android 10, 11, 12, 12L, and 13. The vulnerability is tracked as <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-20448">CVE-2022-20448</a>.</p><h3>Takeaway</h3><p>It really makes you wonder just how many security bugs are hiding behind assistive technologies.</p><p><em>Originally published at </em><a href="https://www.thedroidsonroids.com/blog/bypassing-user-isolation-on-android-with-a-screen-reader"><em>https://www.thedroidsonroids.com</em></a><em> on May 11, 2026.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8784558d7b66" width="1" height="1" alt=""><hr><p><a href="https://itnext.io/bypassing-user-isolation-on-android-with-a-screen-reader-8784558d7b66">Bypassing User Isolation on Android with a Screen Reader</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Is Step Functions Still Necessary? The Case for Lambda Durable Functions in 2026]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://itnext.io/is-step-functions-still-necessary-the-case-for-lambda-durable-functions-in-2026-22f5a5f4a1a3?source=rss----5b301f10ddcd---4"><img src="https://cdn-images-1.medium.com/max/1376/1*M0tX8bVBD54obR7HRDCvuw.png" width="1376"></a></p><p class="medium-feed-snippet">AWS just erased the &#x201C;15-minute wall.&#x201D; And it may completely change how TypeScript developers build workflows.</p><p class="medium-feed-link"><a href="https://itnext.io/is-step-functions-still-necessary-the-case-for-lambda-durable-functions-in-2026-22f5a5f4a1a3?source=rss----5b301f10ddcd---4">Continue reading on ITNEXT »</a></p></div>]]></description>
            <link>https://itnext.io/is-step-functions-still-necessary-the-case-for-lambda-durable-functions-in-2026-22f5a5f4a1a3?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/22f5a5f4a1a3</guid>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[lambda-durable-functions]]></category>
            <category><![CDATA[workflow]]></category>
            <category><![CDATA[aws-lambda]]></category>
            <category><![CDATA[serverless]]></category>
            <dc:creator><![CDATA[Hoang Dinh]]></dc:creator>
            <pubDate>Fri, 15 May 2026 07:26:11 GMT</pubDate>
            <atom:updated>2026-05-15T07:26:10.103Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[From Prototype to Production — Developer Abstractions that Accelerate (Part 7)]]></title>
            <link>https://itnext.io/from-prototype-to-production-developer-abstractions-that-accelerate-part-7-548fed473201?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/548fed473201</guid>
            <category><![CDATA[aiops]]></category>
            <category><![CDATA[control-plane]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[ai-control-plane]]></category>
            <dc:creator><![CDATA[Santosh Pai]]></dc:creator>
            <pubDate>Fri, 15 May 2026 07:24:39 GMT</pubDate>
            <atom:updated>2026-05-15T07:24:38.335Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DgLUHwsMEBVL735axJnpaQ.png" /></figure><p>By this stage, the system has all the necessary layers in place. Requests are <strong>validated</strong> before they leave, <strong>routed intelligently,</strong> executed within defined <strong>cost</strong> boundaries, adapted across <strong>environments</strong>, and made fully <strong>observable</strong>. Each of these capabilities addresses a specific aspect of running AI systems in production, but together they introduce a level of complexity that can be difficult for teams to work with directly.</p><h3>Why do developers need abstractions?</h3><p>The challenge shifts from building the system to making it usable. Without clear abstractions, developers are required to understand and manage multiple concerns simultaneously — policies, routing logic, cost constraints, and environment configurations. This often leads to <strong>duplication of effort</strong>, <strong>inconsistencies</strong> across services, and an <strong>increasing reliance on implicit knowledge</strong> rather than shared structure. The result is <strong>cognitive overload</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*qpCXw7oiqhRrrwn1.png" /></figure><p>Abstractions address this by introducing <strong>a unified interface</strong> through which <strong>developers interact</strong> with the system. Instead of handling each layer independently, requests flow through a single entry point that applies guardrails, selects models, enforces limits, and captures observability by default. This <strong>simplifies integration while preserving control</strong>, allowing teams to focus on building features rather than orchestrating infrastructure.</p><p>Over time, these abstractions extend beyond APIs into structured configurations, reusable templates, and shared workflows. Policies become explicit and versioned, routing strategies can be defined once and applied consistently, and common patterns are reused across teams. This reduces fragmentation and makes system behaviour predictable, even as complexity increases.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*KggYcwB6tUYBn-LA.png" /></figure><p>What emerges is a set of tools and a platform that balances flexibility with consistency. Developers can move quickly without bypassing safeguards, and teams can scale systems without introducing instability.</p><h3>The Need for an AI Control Plane</h3><p>The control plane enforces rules, enabling a way of building that remains reliable as systems evolve.</p><p>Looking across the layers — <strong>guardrails</strong>, <strong>routing</strong>, <strong>cost</strong> control, <strong>environment</strong> awareness, and <strong>observability</strong> — it becomes clear that the real value lies in how these capabilities are brought together and made accessible. This is what allows AI systems to transition from isolated prototypes to structured, production-ready systems that can be trusted to operate at scale.</p><h3>Shared Platform Capabilities</h3><p>Over time, the control plane becomes more than a gateway. It evolves into a shared platform capability used across teams and products.</p><p>New applications inherit observability automatically. Environment-aware policies are applied consistently. Routing strategies become reusable. Audit trails exist by default. 
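</p><p>From a product team’s perspective, the entire integration surface can shrink to a single call. The sketch below is illustrative only; the client shape, field names, and policy identifiers are assumptions, not an actual platform API:</p><pre>// Hypothetical control-plane client in TypeScript; every name is illustrative.<br>interface CompleteRequest {<br>  task: string;                // routing class; the platform picks the model<br>  input: string;<br>  budget: { maxUsd: number };  // cost boundary enforced centrally<br>  policies: string[];          // guardrails applied before dispatch<br>}<br><br>declare const controlPlane: {<br>  complete(req: CompleteRequest): Promise&lt;string&gt;;<br>};<br><br>// A product team&#39;s whole integration:<br>const summary = await controlPlane.complete({<br>  task: &quot;summarize-ticket&quot;,<br>  input: &quot;Customer reports intermittent sync failures since the last release.&quot;,<br>  budget: { maxUsd: 0.05 },<br>  policies: [&quot;pii-redaction&quot;],<br>});<br>// Tracing, audit logging, and environment-aware config are inherited by default.</pre><p>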
Instead of each team solving operational concerns independently, the platform provides these capabilities as standardized building blocks.</p><p>This creates a significant shift in how AI systems are developed. Teams spend less time assembling infrastructure layers and more time focusing on product behaviour, user workflows, and business logic.</p><h3>Reducing Operational Fragmentation</h3><p>One of the less visible challenges in production AI systems is operational fragmentation. Different teams often introduce different SDKs, different routing strategies, different observability tools, and different governance models. Over time, this creates systems that behave inconsistently despite serving similar purposes.</p><p><strong>Developer abstractions reduce this fragmentation</strong> by creating a common operational language. Requests follow similar patterns regardless of the application. Policies are enforced consistently. Observability data becomes comparable across teams and environments.</p><p>Consistency at this layer is what allows organizations to scale AI adoption without losing operational control.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*f9GkSka5xtV-vKD6.png" /></figure><h3>The Shift from Integrations to Systems</h3><p>Early AI adoption focused on integrating models into applications. Production AI shifts the focus toward operating systems of behaviour — where governance, routing, observability, and cost management become shared infrastructure concerns rather than isolated implementation details.</p><h3>Series Summary</h3><p>This series explored the operational layers required to <strong>move AI systems from isolated prototypes to reliable production infrastructure</strong>.</p><p>We examined how <strong>guardrails</strong> define what is allowed, how <strong>routing</strong> determines where requests should go, how <strong>cost control</strong> keeps systems sustainable at scale, how <strong>environment-aware behaviour</strong> introduces operational context, and how <strong>observability and auditability</strong> make AI systems understandable over time.</p><p>The final layer focused on <strong>developer abstractions</strong> and the emergence of the <strong>AI Control Plane</strong> — a shared operational layer that brings governance, routing, cost management, environment policies, and observability together into a consistent system that teams can build on reliably.</p><p>Together, these layers represent a shift from simply integrating models into applications toward <strong>building structured, governable, and scalable AI systems</strong>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=548fed473201" width="1" height="1" alt=""><hr><p><a href="https://itnext.io/from-prototype-to-production-developer-abstractions-that-accelerate-part-7-548fed473201">From Prototype to Production — Developer Abstractions that Accelerate (Part 7)</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Semantic Search in OutSystems Developer Cloud]]></title>
            <link>https://itnext.io/semantic-search-in-outsystems-developer-cloud-e621bc185e42?source=rss----5b301f10ddcd---4</link>
            <guid isPermaLink="false">https://medium.com/p/e621bc185e42</guid>
            <category><![CDATA[low-code]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[semantic-search]]></category>
            <category><![CDATA[outsystems]]></category>
            <dc:creator><![CDATA[Stefan Weber]]></dc:creator>
            <pubDate>Thu, 14 May 2026 14:37:32 GMT</pubDate>
            <atom:updated>2026-05-15T11:04:25.711Z</atom:updated>
            <cc:license>http://creativecommons.org/licenses/by/4.0/</cc:license>
            <content:encoded><![CDATA[<p>ODC now lets you add semantic search to your entities with a few clicks. But if you stop there, your retrieval quality will likely disappoint you. Here’s why — and what to do about it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_7jADTi_ZpFF_7-rwr4PVQ.jpeg" /></figure><p>OutSystems Developer Cloud (ODC) has a built-in semantic search mechanism that works directly on top of your entities. You select which entities and text attributes should be searchable, and ODC takes care of the rest. No third-party search service is required.</p><p>That convenience comes with trade-offs. In this article, I’ll explain how semantic search works, walk you through what ODC supports and where it falls short, and then show how you can dramatically improve retrieval quality.</p><h3>Why Semantic Search Matters</h3><p>Instead of matching keywords, semantic search captures the meaning behind a query and returns results based on conceptual relevance. This is essential for chatbots, recommendation engines, and any scenario where users express their intent in natural language.</p><h3>How Semantic Search Works</h3><p>Traditional keyword search matches exact words. If you search for “restart device”, it finds documents containing those words. Semantic search works differently — it converts text into numerical vectors (embeddings) that represent meaning, and then finds other vectors that are close in that meaning space.</p><p>This is powerful. A search for “How do I restart the device?” will match a document titled “Reset procedure” even though the two share no words. The embedding model understands that “restart” and “reset” express a similar intent.</p><p>But this is also where things get tricky. Let me show you some examples.</p><h3>When Similarity Works Well</h3><p>These pairs are semantically different in wording but close in meaning — exactly what you want semantic search to find:</p><pre>Query<br>&quot;How do I restart the device?&quot;<br><br>Matches With<br>&quot;Reset procedure for your equipment&quot;<br><br>Why it Works<br>Different words, same intent</pre><pre>Query<br>&quot;The screen is black&quot;<br><br>Matches With<br>&quot;Display not showing any output&quot;<br><br>Why it Works<br>Symptom described differently</pre><pre>Query<br>&quot;Cancel my subscription&quot;<br><br>Matches With<br>&quot;How to end my membership&quot;<br><br>Why it Works<br>Synonyms and paraphrases</pre><h3>When Similarity Misleads</h3><p>Here’s where it gets dangerous. These pairs have high cosine similarity — they look close in vector space — but their meaning is fundamentally different:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9a56b3490fb3349c33509f0414ea94fd/href">https://medium.com/media/9a56b3490fb3349c33509f0414ea94fd/href</a></iframe><p>These examples illustrate a fundamental limitation of dense vector embeddings. The models are trained on natural language patterns, and they’re excellent at capturing general meaning. But they compress fine-grained differences — negations, specific numbers, error codes, version identifiers — into nearly identical regions of the vector space.</p><blockquote><strong><em>Why this matters:</em></strong><em> This is one of the main reasons why production RAG systems use hybrid search (dense + sparse vectors). Sparse vectors excel at exact term matching and would easily distinguish “Error 503” from “Error 404”. 
ODC doesn’t support hybrid search today, which makes the optimization techniques later in this article even more important.</em></blockquote><h3>Cosine Similarity in a Nutshell</h3><p>When the system compares two embeddings, it uses cosine similarity — a score between -1 and 1, where 1 means identical direction in vector space. In practice, similarity scores for related text pairs typically fall between 0.3 and 0.95, depending on the embedding model and content type. A difference of 0.02 in similarity score can be the difference between a relevant and an irrelevant result, yet the misleading pairs above often score within that margin of the correct result.</p><p>This is why retrieval alone isn’t enough. You need additional mechanisms — query rewriting and reranking — to catch what vector similarity misses. We’ll get to those later.</p><h3>How ODC Implements Semantic Search</h3><p>ODC’s semantic search is tightly integrated with its entity model. Here are the key components:</p><ul><li><strong>Index Ingestion</strong> — You configure one or more text attributes of an entity for indexing. ODC chunks the content, generates embeddings via an embedding model, and stores them in PostgreSQL using the pgvector extension.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/925/0*luzXeIexN4GnDenT.png" /></figure><ul><li><strong>Search Index</strong> — The vector database holds the chunked, embedded data.</li><li><strong>Retrieve</strong> — At query time, the user’s input is embedded and compared against the index using dense vector similarity.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/847/0*nMBYV1UUF6e9id_T.png" /></figure><ul><li><strong>Augment &amp; Generate</strong> — Retrieved chunks can be used in RAG pipelines (e.g., via Agent Workbench or custom applications) to ground LLM responses.</li></ul><p><strong>A note on embedding models:</strong> The quality of your embeddings depends heavily on the underlying model. ODC does not currently disclose which embedding model is used or allow you to bring your own. This limits your ability to evaluate its strengths and weaknesses for your specific domain — particularly for specialized terminology or non-English content.</p><p>At a high level, ODC’s semantic search covers three dimensions: understanding the <strong>intent</strong> behind a query, recognizing <strong>contextual</strong> relationships between words, and grasping <strong>meaning</strong> through synonyms, paraphrases, and linguistic associations. These are the core strengths of any embedding-based retrieval system — and also where its limitations begin, as we’ve seen above.</p><h3>Chunking Strategies in OutSystems</h3><p>Chunking is the process of splitting large documents into smaller, meaningful sections so they can be efficiently embedded, searched, and retrieved. 
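</p><p>To make the mechanics concrete, here is roughly what the simplest of the strategies covered below, fixed-size splitting with overlap, boils down to. This is a minimal illustrative sketch in TypeScript; ODC’s internal chunker is not exposed, so the parameters are assumptions:</p><pre>// Fixed-size chunking with overlap; illustrative only.<br>function chunkFixed(text: string, size = 500, overlap = 50): string[] {<br>  if (overlap &gt;= size) throw new Error(&quot;overlap must be smaller than size&quot;);<br>  const chunks: string[] = [];<br>  for (let start = 0; start &lt; text.length; start += size - overlap) {<br>    chunks.push(text.slice(start, start + size));<br>  }<br>  return chunks;<br>}<br>// A 1,200-character record with size=500/overlap=50 yields chunks starting at<br>// 0, 450, and 900. Nothing stops a cut from landing mid-word or mid-sentence.</pre><p>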
Without chunking, you’d embed entire records as a single vector — burying the meaning of individual sections in noise.</p><p>Good chunking ensures that retrieval is accurate and focused, irrelevant text doesn’t overwhelm the model, hallucinations are reduced, and the system can handle large, mixed-topic content.</p><p>Instead of sending a 200-page product manual to the LLM, chunking retrieves only the section about “Resetting device settings” — so the model gives a precise answer, not noise from the entire document.</p><p>Let’s look at the four chunking strategies available in ODC.</p><h3>Fixed-Size Chunking</h3><p>This one splits text into equally sized pieces based on a maximum character count, with a configurable overlap between chunks.</p><p>It’s very simple and extremely fast, and it works even with unstructured or messy documents. The problem is that it splits sentences and concepts mid-word or mid-thought. A sentence like <em>“hold the On/Off button for 10 seconds until the Bosch logo appears”</em> can be cut right in the middle. This produces meaningless or noisy chunks that result in very poor semantic retrieval quality.</p><blockquote><strong><em>Best practice:</em></strong><em> Fixed-size chunking is the weakest strategy for semantic search. It’s acceptable only as a baseline or when content is completely unstructured and no better option is feasible. For anything with natural sentence or paragraph boundaries, avoid it.</em></blockquote><h3>Sentence-Based Chunking</h3><p>You define how many sentences a chunk may contain, along with a maximum character count and overlap. This respects natural sentence boundaries and produces more coherent chunks than fixed-size splitting.</p><p>There are some things to be aware of though. Sentence detection depends on language. ODC has dictionaries for 15 languages (English, German, French, etc.) that handle nuances like acronyms and abbreviations. For unsupported languages, only punctuation is used — leading to incorrect sentence splits. If the system doesn’t identify the language, it defaults to English.</p><p>More importantly, sentence-based chunking does not respect paragraph or section boundaries. A chunk may contain the last sentence of one topic and the first sentence of the next. Headings, lists, and tables are not treated differently from body text.</p><blockquote><strong><em>Best practice:</em></strong><em> Sentence-based chunking is a step up from fixed-size, but it only works well for homogeneous, well-punctuated prose in supported languages. Structured documents with headings, tables, or mixed content types will suffer.</em></blockquote><h3>Recursive Chunking</h3><p>This approach defines character limits and overlaps while prioritizing a hierarchy of specific characters as delimiters (e.g., headings → paragraphs → sentences). The splitter tries the highest-level separator first and only falls back to smaller ones when chunks exceed the size limit.</p><p>It aligns chunks with natural document structure, which makes it great for manuals, structured PDFs, and HTML content. The downside is that it fails on documents with broken or inconsistent structure. If headings are missing or formatting is irregular, the chunker degrades to something close to fixed-size behavior. It also doesn’t merge semantically related content across sections. 
If “Causes” and “Resolution” live in different sections, they end up in different chunks — even though they belong to the same concept.</p><blockquote><strong><em>Best practice:</em></strong><em> Recursive chunking is the best general-purpose strategy available in ODC. It works well for structured content but cannot handle cross-section reasoning or poorly formatted input.</em></blockquote><h3>Smart Chunking (Default)</h3><p>This is ODC’s default method. It combines recursive chunking with default separators, automatically adapting to the content found in searchable fields. No configuration required — it just works.</p><p>That said, it is still fundamentally recursive chunking with automated separator selection, so it inherits all the same limitations. It has no semantic understanding and cannot detect that “causes”, “symptoms”, and “reset procedure” belong to the same conceptual unit. Since the separators are chosen automatically, it can also be difficult to predict or debug how content is being split.</p><blockquote><strong><em>Best practice:</em></strong><em> Smart chunking is a sensible default, but don’t assume it’s optimal. For critical RAG applications, always evaluate whether recursive chunking with custom separators gives you better results.</em></blockquote><h3>Alternative Chunking Strategies</h3><p>In addition to the built-in chunking strategies, several other approaches have emerged in the RAG space that are deemed more advanced for production systems. Understanding these methods can highlight the limitations of ODC’s built-in options and guide you in implementing your own custom chunking solutions.</p><h3>Semantic Chunking</h3><p>Instead of splitting text by characters, sentences, or structural markers, semantic chunking groups text by meaning. It uses an embedding model to measure how similar consecutive sentences or paragraphs are to each other. When the similarity drops significantly, it creates a chunk boundary.</p><p>This means a troubleshooting article where “symptoms”, “causes”, and “resolution” flow naturally into each other would stay in one chunk — because the meaning is connected. Recursive chunking would split them into separate chunks based on their headings, losing that connection.</p><p>Semantic chunking is especially valuable for poorly structured documents where headings are missing or inconsistent. It doesn’t rely on formatting at all — only on what the text actually means.</p><p><em>Implementation complexity: Moderate — requires an embedding model call for each sentence pair during ingestion. Can be built in ODC by calling an external embedding API in your ingestion pipeline.</em></p><h3>Adaptive Chunking</h3><p>Adaptive chunking dynamically adjusts the chunk size based on content complexity. Dense, technical paragraphs get smaller chunks so that each embedding captures a focused idea. Simple, straightforward passages get larger chunks to avoid fragmenting content unnecessarily.</p><p>Think of a product manual where one section is a simple feature overview and the next is a detailed troubleshooting flow with multiple conditions. Fixed or recursive chunking would use the same granularity for both. Adaptive chunking would produce larger chunks for the overview and smaller, more focused chunks for the troubleshooting steps.</p><p>This prevents information loss in complex sections and reduces noise in simple ones.</p><p><em>Implementation complexity: Moderate to high — requires heuristics or a model to assess content density and adjust chunk sizes dynamically. 
Can be implemented with rule-based logic or a lightweight LLM call per section.</em></p><h3>Context-Enriched Chunking</h3><p>With all the strategies above, each chunk stands on its own. It has no awareness of what came before or after it. Context-enriched chunking solves this by adding a brief summary of neighboring chunks to each chunk.</p><p>For example, if chunk 3 contains a resolution step, context-enriched chunking would prepend something like: <em>“The previous section described error 503 occurring when the device loses network connectivity during a firmware update.”</em> This way, the embedding of chunk 3 captures not just the resolution itself but also the problem it relates to.</p><p>This is critical for multi-step reasoning, where the answer to a user’s question spans multiple sections. Without context enrichment, the retriever might find the resolution chunk but miss the connection to the specific error that caused it.</p><p><em>Implementation complexity: Moderate — requires a post-processing step after initial chunking that generates summaries of neighboring chunks (typically via an LLM call) and prepends them. Straightforward to implement but adds ingestion latency and cost.</em></p><h3>AI-Driven Chunking</h3><p>AI-driven chunking uses an LLM to read the entire document and decide where the most meaningful breakpoints are. Instead of following rules (split at headings, split every N sentences), the LLM identifies conceptual units the way a human reader would.</p><p>This is the most expensive strategy — it requires an LLM call during ingestion for every document — but it produces the most intuitive, human-like chunks. It’s particularly useful for mixed-source documents where structure, formatting, and content types vary wildly and no single rule-based strategy fits.</p><p><em>Implementation complexity: High — requires a full LLM processing call for every document during ingestion. Significantly increases ingestion time and cost. Best reserved for high-value content where retrieval quality is critical.</em></p><blockquote><strong><em>Key takeaway:</em></strong><em> The absence of these strategies means that out of the box, the quality of your ODC semantic search is limited by how well the four built-in chunking methods fit your content. 
For heterogeneous data — a mix of FAQs, troubleshooting guides, product specs, and legal text — no single built-in method will perform well across all content types.</em></blockquote><h3>Custom Chunks in OutSystems</h3><p>ODC does allow you to disable the built-in chunking on semantic search attributes and implement your own chunking logic.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/925/0*PMSjFZcJiSVHH8sn.png" /></figure><p>This means you can:</p><ul><li>Build a custom chunking pipeline in your application that preprocesses text before it is written to the entity.</li><li>Apply different chunking strategies to different content types — e.g., recursive chunking for structured manuals, sentence-based chunking for FAQs, and a custom semantic grouping for troubleshooting guides.</li><li>Implement any of the advanced strategies listed above by calling an LLM during ingestion to determine optimal chunk boundaries.</li><li>Store the pre-chunked text in your entity attributes so that ODC only handles the embedding and indexing — not the splitting.</li></ul><p>This shifts the chunking responsibility from ODC’s built-in mechanism to your application, giving you full control over chunk quality at the cost of additional development effort and ingestion complexity.</p><blockquote><strong><em>Best practice:</em></strong><em> For production RAG applications with diverse or complex content, seriously consider disabling the default chunkers and implementing a custom ingestion pipeline. The built-in methods are convenient for prototyping and simple use cases, but a tailored chunking strategy will consistently deliver better retrieval quality.</em></blockquote><p>In advanced RAG systems, the highest quality comes from a hybrid ingestion pipeline that applies different chunking methods to different content types. With custom chunking in ODC, this is achievable — it just requires you to build and maintain that pipeline yourself.</p><h3>Dense-Only Vectors: A Significant Limitation</h3><p>Chunking determines what goes into your vectors. But the type of vector itself also has a major impact on retrieval quality.</p><p>ODC semantic search uses only dense vector embeddings — compact numerical vectors that capture semantic meaning.</p><p>Dense vectors are great at understanding paraphrases and synonyms. “How do I restart the device?” matches “Reset procedure” — that kind of thing. They work well for natural language queries.</p><p>Where they struggle is exact term matching. Searching for error code “503” or product name “Kiox 300” may return semantically similar but factually wrong results. Domain-specific terms like “HANA”, “SAP”, or “OML” may not be well represented in the embedding model’s training data. Dense embeddings compress numbers and codes into a meaning space that doesn’t distinguish “Error 503” from “Error 510”.</p><p>In production RAG systems, the best practice is Hybrid Search — combining dense vector search (semantic similarity) with sparse vector search (keyword and exact-match relevance). A typical starting point uses a weighted formula:</p><pre>Final Score = 0.7 × Dense Score + 0.3 × Sparse Score</pre><p>Note that this weighting is use-case-dependent and should be tuned for your specific data and query patterns. The 0.7/0.3 split is a commonly cited baseline, not a universal rule.</p><p>This ensures that both semantic relevance and exact matching influence the ranking. 
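</p><p>As a tiny sketch of that merge over two candidates (TypeScript, with scores invented for illustration; since ODC has no sparse side today, those scores would have to come from an index you run yourself):</p><pre>// Weighted hybrid scoring, per the formula above; all numbers are made up.<br>const DENSE_W = 0.7;<br>const SPARSE_W = 0.3;<br><br>const candidates = [<br>  { id: &quot;kiox-300-error-503-fix&quot;, dense: 0.82, sparse: 0.91 },<br>  { id: &quot;generic-error-404-guide&quot;, dense: 0.84, sparse: 0.12 }, // look-alike, wrong code<br>];<br><br>const ranked = candidates<br>  .map((c) =&gt; ({ ...c, score: DENSE_W * c.dense + SPARSE_W * c.sparse }))<br>  .sort((a, b) =&gt; b.score - a.score);<br><br>// kiox-300-error-503-fix wins, 0.847 vs 0.624: the exact-term evidence on<br>// &quot;503&quot; and &quot;Kiox 300&quot; outweighs the slightly higher dense similarity.</pre><p>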
A query like <em>“How do I fix error 503 on the Kiox 300?”</em> benefits from dense search understanding the intent (“fixing an issue”, “troubleshooting”) and sparse search matching the exact terms “503” and “Kiox 300”.</p><p>ODC does not support sparse vectors or hybrid search. This means that queries containing specific identifiers, codes, or product names may return less precise results than a hybrid system would.</p><h3>What’s Available and What’s Not</h3><p>Now that you’ve seen how chunking works and where dense-only vectors fall short, here’s a summary of what ODC semantic search supports today:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ce0585a32257186e4b0b2c2ab9736592/href">https://medium.com/media/ce0585a32257186e4b0b2c2ab9736592/href</a></iframe><h3>Optimizing Retrieval: Query Rewriting and Reranking</h3><p>Given the information above, the question becomes: how can you improve retrieval quality within ODC? The answer lies in two techniques that you can implement in your application logic.</p><h3>Query Rewriting</h3><p>Query rewriting is the process of transforming a user’s original query into a better, clearer, or more complete version before sending it to the retrieval system.</p><p>In RAG, the answer quality heavily depends on what you retrieve. If the query is unclear or incomplete, the retriever may miss relevant documents, retrieve irrelevant content, or show overly broad results. This is especially critical with dense-only search, where the embedding of a vague query lands in a broad, non-specific region of the vector space.</p><p>Consider a chatbot conversation:</p><p><em>User: “How do I reset it?”</em></p><p>The system cannot know what “it” refers to. Sending this raw query to semantic search will produce poor results because the embedding of “How do I reset it?” is far too generic.</p><p>An LLM rewrites the query using the conversation history to produce a self-contained, specific query:</p><p><em>Rewritten Query: “How do I reset the KIOX 300 display when it is stuck in loading state?”</em></p><p>This rewritten query produces a far more precise embedding that lands much closer to the relevant chunks in the vector space.</p><p>Query rewriting can fix ambiguity, expand missing context, add synonyms, convert conversational questions into standalone ones, and turn fragments into full queries.</p><h3>How to Implement in ODC</h3><p>Since ODC doesn’t provide built-in query rewriting, you implement it as a pre-processing step in your application:</p><ol><li>Capture the conversation history.</li><li>Before calling the semantic search action, send the user’s latest message along with the conversation history to an LLM with a prompt like: <em>“Rewrite the following user question as a standalone, self-contained search query. Use the conversation history to resolve any ambiguous references. Return only the rewritten query.”</em></li><li>Use the rewritten query as the input to ODC’s semantic search.</li></ol><p>This is a lightweight LLM call (a few tokens in, a few tokens out) that can dramatically improve retrieval relevance at minimal cost.</p><blockquote><strong><em>Best practice:</em></strong><em> Query rewriting is the single highest-impact, lowest-cost optimization you can make for your RAG pipeline. 
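</em></blockquote><p>In practice, the rewrite step is only a few lines. Here’s a minimal TypeScript sketch; llm() stands in for whatever chat-completion endpoint you call from your application logic and is not an ODC built-in:</p><pre>// Pre-retrieval query rewriting; llm() is a placeholder helper.<br>declare function llm(prompt: string): Promise&lt;string&gt;;<br><br>async function rewriteQuery(history: string[], latest: string): Promise&lt;string&gt; {<br>  const prompt = [<br>    &quot;Rewrite the following user question as a standalone, self-contained&quot;,<br>    &quot;search query. Use the conversation history to resolve any ambiguous&quot;,<br>    &quot;references. Return only the rewritten query.&quot;,<br>    &quot;&quot;,<br>    &quot;History:&quot;,<br>    ...history,<br>    &quot;&quot;,<br>    &quot;Question: &quot; + latest,<br>  ].join(&quot;\n&quot;);<br>  return llm(prompt);<br>}<br><br>// rewriteQuery([&quot;User: the Kiox 300 hangs on the loading screen&quot;], &quot;How do I reset it?&quot;)<br>// could yield: &quot;How do I reset the Kiox 300 display when it is stuck loading?&quot;</pre><blockquote><em>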
<h3>Reranking</h3><p>Reranking is a post-retrieval optimization step where an additional model — often a cross-encoder or LLM-powered relevance scorer — evaluates and reorders the initially retrieved documents to ensure the most relevant items appear at the top.</p><p>Semantic search retrieves candidates based on embedding similarity, which is a fast but rough approximation. The initial ranking often contains results that are topically related but don’t answer the question, results that are semantically similar but factually irrelevant, and truly relevant results buried below mediocre ones.</p><p>This problem is amplified in ODC because there’s no hybrid search, the chunking strategies are limited, and dense-only retrieval can surface conceptually similar but wrong content.</p><h3>How It Works (Two-Stage Retrieval)</h3><p><strong>Stage 1 — Retrieval (fast, broad):</strong> ODC semantic search retrieves a broad set of candidate chunks (e.g., top 10–20 results).</p><p><strong>Stage 2 — Rerank (precise, slower):</strong> A more powerful model (cross-encoder or LLM) evaluates each candidate together with the user query and assigns a refined relevance score.</p><p>Here’s an example. User query: <em>“How do I fix error 503 on the Kiox 300?”</em> The example results table is embedded in the original story: <a href="https://medium.com/media/479c72d14dab08bf012990f8a27ccd01/href">view it on Medium</a>.</p><p>The reranker pushes the most relevant result to the top and filters out noise — something the initial dense vector similarity alone could not accomplish.</p><p>Reranking greatly improves precision, lowers token costs (you pass fewer, better chunks to the LLM for generation), reduces hallucination, and is especially valuable for technical and repetitive domains where many chunks are topically similar but only a few are actually useful.</p><h3>How to Implement in ODC</h3><ol><li><strong>Over-retrieve:</strong> Configure your semantic search to return more results than you ultimately need (e.g., retrieve 15, use 3–5).</li><li><strong>Call a reranking model:</strong> After retrieval, send the query and the retrieved chunks to a reranking API. Options include:<ul><li><strong>Cohere Rerank API</strong> — Purpose-built reranking model. Fast, cost-effective, and consistent. Recommended as a first choice.</li><li><strong>Cross-encoder models</strong> (e.g., via Azure AI or a custom endpoint) — High accuracy, good for domain-specific tuning.</li><li><strong>LLM-as-reranker</strong> — Use a prompt that asks the LLM to score each chunk’s relevance to the query on a scale of 1–10. This works but is slower, more expensive per call, and less deterministic than dedicated reranking models. Use it only when a purpose-built reranker isn’t available.</li></ul></li><li><strong>Sort and filter:</strong> Reorder the results by the new relevance score and take only the top N.</li><li><strong>Pass to generation:</strong> Use the reranked chunks as context for the LLM response.</li></ol><p>A simple LLM-based reranking prompt: <em>“Given the following user query and a list of text passages, rate each passage’s relevance to the query on a scale of 0 to 10. Return only the passage IDs and their scores.”</em></p>
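<p>Here is a minimal LLM-as-reranker sketch in Python, scoring one passage per call for simplicity (the batch prompt above would score all passages in a single call). The call_llm helper is again a hypothetical stand-in for your LLM connector; a dedicated reranker such as Cohere Rerank would replace the scoring loop entirely.</p><pre># Minimal sketch of stage 2: rerank over-retrieved chunks with LLM scores.<br># call_llm is a hypothetical helper for your LLM connector, not an ODC API.<br>RERANK_PROMPT = (<br>    &quot;Given the following user query and a text passage, rate how relevant &quot;<br>    &quot;the passage is to the query on a scale of 0 to 10. Return only the number.\n\n&quot;<br>    &quot;Query: {query}\n\nPassage: {passage}&quot;<br>)<br><br>def rerank(query, chunks, call_llm, top_n=3):<br>    &quot;&quot;&quot;Reorder over-retrieved chunks by refined relevance; keep the best top_n.&quot;&quot;&quot;<br>    scored = []<br>    for chunk in chunks:  # e.g., 15 over-retrieved candidates from ODC<br>        raw = call_llm(RERANK_PROMPT.format(query=query, passage=chunk))<br>        try:<br>            score = float(raw.strip())<br>        except ValueError:<br>            score = 0.0  # unparseable reply: treat the passage as irrelevant<br>        scored.append((score, chunk))<br>    scored.sort(key=lambda pair: pair[0], reverse=True)<br>    return [chunk for _, chunk in scored[:top_n]]</pre>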
<h3>Putting It All Together</h3><p>Here’s the recommended architecture for high-quality semantic search in ODC:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*fxwtUETqJqJx_9H5.png" /></figure><p>This pipeline compensates for most of the limitations. The accompanying table is embedded in the original story: <a href="https://medium.com/media/b3838cf9049c555663b9f5d034171479/href">view it on Medium</a>.</p><blockquote><strong><em>Important caveat:</em></strong><em> Query rewriting and reranking serve as mitigations, not as comprehensive solutions to these limitations. For highly accurate retrieval, a more advanced approach incorporating both sparse and dense vectors, along with hybrid search, is necessary.</em></blockquote>
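<p>Assuming the rewrite_query and rerank helpers sketched earlier, plus a hypothetical odc_semantic_search wrapper around the semantic search action, the whole pipeline reads roughly like this:</p><pre># End-to-end sketch: rewrite, over-retrieve, rerank, then generate.<br># odc_semantic_search is a hypothetical wrapper around the ODC search action;<br># rewrite_query and rerank are the helpers sketched in the sections above.<br>def answer(question, history, call_llm, odc_semantic_search):<br>    query = rewrite_query(question, history, call_llm)       # pre-retrieval<br>    candidates = odc_semantic_search(query, max_results=15)  # over-retrieve<br>    context = rerank(query, candidates, call_llm, top_n=3)   # post-retrieval<br>    prompt = (&quot;Answer the question using only the context below.\n\n&quot;<br>              + &quot;\n\n&quot;.join(context) + &quot;\n\nQuestion: &quot; + question)<br>    return call_llm(prompt)                                  # generation</pre><p>The over-retrieval count and top_n are exactly the kind of reranking thresholds the summary below suggests iterating on.</p>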
<h3>Summary</h3><p>ODC’s built-in semantic search is a significant step forward — it can remove the need for third-party vector databases and simplifies the developer experience. For high-accuracy RAG applications, though, be aware of its constraints.</p><p><strong>Know your content.</strong> Understand the structure and diversity of the data you’re indexing. Choose the chunking strategy that fits — don’t blindly accept the default.</p><p><strong>Use recursive or smart chunking for structured content.</strong> If your entity data contains well-structured text with natural section boundaries, recursive chunking will outperform fixed-size and sentence-based approaches.</p><p><strong>Avoid fixed-size chunking</strong> unless you’re dealing with completely unstructured, messy text and have no better option.</p><p><strong>Consider custom chunking.</strong> While built-in chunking strategies are convenient, custom chunking can significantly enhance accuracy and quality — especially for heterogeneous content.</p><p><strong>Implement query rewriting.</strong> A simple LLM call before retrieval can transform a vague conversational query into a precise search input. This is your highest-impact optimization.</p><p><strong>Implement reranking.</strong> Over-retrieve and then rerank. This compensates for the lack of hybrid search and the limitations of dense-only retrieval. It’s especially critical for technical domains with specific terminology. Prefer dedicated reranking models over LLM-as-reranker for cost, speed, and consistency.</p><p><strong>Acknowledge the limitations you can’t change.</strong> Currently, ODC lacks support for sparse vectors, hybrid search, and custom embedding models. Navigate these constraints by employing the pre- and post-retrieval optimizations mentioned earlier. For high-accuracy requirements, an external tech stack remains necessary.</p><p><strong>Monitor and iterate.</strong> Collect user feedback on search quality. The gap between “good enough” and “production-grade” is almost always closed through iterative refinement of chunking parameters, query rewriting prompts, and reranking thresholds.</p><p>If you’ve implemented any of these optimizations in your ODC projects, I’d love to hear about your results and experiences. Let’s connect on <a href="https://www.linkedin.com/stefanweber1">LinkedIn</a>.</p><hr><p><a href="https://itnext.io/semantic-search-in-outsystems-developer-cloud-e621bc185e42">Semantic Search in OutSystems Developer Cloud</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>