Python + AI Weekly Office Hours: Recordings & Resources #280
Replies: 103 comments
2026/01/06: How do you set up Entra OBO (On-Behalf-Of) flow for Python MCP servers? 📹 5:48 The demo showed how to use the Graph API with the OBO flow to find out the groups of a signed-in user and use that to decide whether to allow access to a particular tool. The flow works as follows:
For the authentication dance, FastMCP handles the DCR (Dynamic Client Registration) flow since Entra itself doesn't support DCR natively. To test from scratch:
Links shared:
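As a sketch of the authorization decision at the end of that flow: after the OBO exchange, the server calls the Graph API /me/memberOf endpoint and gates each tool on group membership. The payload shape below mirrors Graph's response format, but the group names and helper functions are illustrative, not taken from the demo.

```python
# Hypothetical helpers for the group check described above. After the OBO
# token exchange, the server would GET
# https://graph.microsoft.com/v1.0/me/memberOf and feed the JSON here.
def groups_from_graph_response(payload: dict) -> set[str]:
    """Extract group display names from a Graph /me/memberOf response."""
    return {
        obj.get("displayName", "")
        for obj in payload.get("value", [])
        if obj.get("@odata.type") == "#microsoft.graph.group"
    }

def can_use_tool(payload: dict, required_group: str) -> bool:
    """Allow a tool only if the signed-in user is in the required group."""
    return required_group in groups_from_graph_response(payload)

# Illustrative Graph-style payload (directory roles are ignored on purpose):
payload = {
    "value": [
        {"@odata.type": "#microsoft.graph.group", "displayName": "Data-Readers"},
        {"@odata.type": "#microsoft.graph.directoryRole", "displayName": "User"},
    ]
}
```

In a real server this check would run inside the tool handler, after FastMCP has validated the incoming token.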
2026/01/06: Which MCP inspector should I use for testing servers with Entra authentication? 📹 20:24 The standard MCP Inspector doesn't work well with Entra authentication because it doesn't do the DCR (Dynamic Client Registration) dance properly. MCP Jam is recommended instead because it properly handles the OAuth flow with DCR. To set it up:
MCP Jam also has nice features like:
One note: enum values in tools don't yet show as dropdowns in MCP Jam (issue to be filed). Links shared:
What's the difference between MCP Jam and LM Studio? 📹 34:19 LM Studio is primarily for playing around with LLMs locally. MCP Jam has some overlap since it includes a chat interface with access to models, but its main purpose is to help you develop MCP servers and apps. It's focused on the development workflow rather than just chatting with models.
2026/01/06: How do you track LLM usage tokens and costs? 📹 28:04 For basic tracking, Azure portal shows metrics for token usage in your OpenAI accounts. You can see input tokens and output tokens in the metrics section. You can also:
If you use multiple providers, you need a way to consolidate the tracking. OpenTelemetry metrics could work but you'd need a way to hook into each system.
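If you do roll your own consolidation, the core of it is just accumulating per-model usage counts and applying a price table. A minimal sketch follows; the model names and per-million-token prices are placeholders, not real rates:

```python
from collections import defaultdict
from dataclasses import dataclass

# $ per 1M tokens -- illustrative numbers only, not real provider pricing.
PRICES = {
    "gpt-example": {"input": 2.50, "output": 10.00},
    "other-model": {"input": 0.30, "output": 1.20},
}

@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0

totals: dict[str, Usage] = defaultdict(Usage)

def record(model: str, input_tokens: int, output_tokens: int) -> None:
    """Call this after each LLM response, using the usage counts it reports."""
    totals[model].input_tokens += input_tokens
    totals[model].output_tokens += output_tokens

def cost(model: str) -> float:
    """Accumulated dollar cost for one model."""
    u, p = totals[model], PRICES[model]
    return (u.input_tokens * p["input"] + u.output_tokens * p["output"]) / 1_000_000

record("gpt-example", 1000, 200)
record("gpt-example", 500, 100)
```

Each provider's SDK reports usage in a slightly different shape, so the `record` call is where you would normalize them; emitting the same numbers as OpenTelemetry metrics is a natural extension.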
2026/01/06: How do you keep yourself updated with all the new changes related to AI? 📹 30:32 Several sources recommended:
Particularly recommended:
Links shared:
2026/01/06: How do you build a Microsoft Copilot agent in Python with custom API calls? 📹 36:30 For building agents that work with Microsoft 365 Copilot (which appears in Windows Copilot and other Microsoft surfaces):
The agent framework team is responsive if there are issues. Links shared:
2026/01/06: As a backend developer with a non-CS background, how do I learn about AI from scratch? 📹 46:39 Recommended approach:
Links shared:
2026/01/06: What's new with the RAG demo (azure-search-openai-demo) after the SharePoint data source was added? 📹 49:50 The main work is around improving ACL (Access Control List) support. The cloud ingestion feature was added recently, but it doesn't yet support ACLs. The team is working on making ACLs compatible with all features including:
A future feature idea: adding an MCP server to the RAG repo for internal documentation use cases, leveraging the Entra OBO flow for access control.
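For context, ACL filtering in this kind of setup usually works by indexing each document with the IDs of the groups allowed to see it, then attaching a security filter built from the signed-in user's groups to every query. A rough sketch of building an OData-style filter string; the field name `groups` and the exact syntax are assumptions here, not taken from the repo:

```python
def security_filter(user_group_ids: list[str]) -> str:
    """Build an OData-style filter limiting results to the user's groups.

    Assumes each indexed document has a 'groups' collection field listing
    the group IDs allowed to read it (field name is illustrative).
    """
    joined = ",".join(user_group_ids)
    return f"groups/any(g: search.in(g, '{joined}'))"
```

The filter would be passed along with the search query so that documents outside the user's groups never reach the LLM.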
2026/01/06: Do you think companies will create internal MCP servers for AI apps to connect to? 📹 53:53 Yes, this is already happening quite a bit. Common use cases include:
A particularly valuable use case is data science/engineering teams creating MCP servers that enable less technical folks (marketing, PMs, bizdev) to pull data safely without needing to write SQL. The pattern often starts with an engineer building an MCP server for themselves, sharing it with colleagues, adding features based on their needs, and growing from there. Links shared:
2026/01/13: What advantages do other formats have over .txt for prompts? How do you improve prompts with DSPy and evals? 📹 4:55 Prompty is a template format that mixes Jinja and YAML together. The YAML goes at the top for metadata, and the rest is Jinja templating. Jinja is the most common templating system for Python (used by Flask, etc.). The nice thing about Jinja is you can pass in template variables—useful for customization, passing in citations, etc. Prompty turns the file into a Python list of chat messages with roles and contents. However, we're moving from Prompty to plain Jinja files because:
Recommendation: Keep prompts separate from code when possible, especially long system prompts. Use plain .txt or .md if you don't need variables, or Jinja if you want to render variables. With agents and tools, some LLM-facing text (like tool descriptions in docstrings) will inevitably live in your code—that's fine. For iterating on prompts: Run evaluations, change the prompt, and see whether it improves things. There are tools like DSPy and Agent Framework's Lightning that do automated prompt optimization/fine-tuning. Lightning says it "fine-tunes agents" but may actually be doing prompt changes. Most of the time, prompt changes don't make a huge difference, but sometimes they might. Links shared:
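To make the "plain Jinja" recommendation concrete, here is a minimal sketch of rendering a system prompt with citations passed in as template variables. In practice the template string would live in its own .jinja file and be loaded with a jinja2 Environment; the prompt text and citations below are made up:

```python
from jinja2 import Template

# In a real app this string would live in its own .jinja file, separate
# from the code, and be loaded via jinja2.Environment/FileSystemLoader.
PROMPT = Template(
    "Answer using only the sources below.\n"
    "{% for c in citations %}[{{ loop.index }}] {{ c }}\n{% endfor %}"
    "If the sources are insufficient, say you don't know."
)

rendered = PROMPT.render(citations=["perks.md: Dogs are allowed.", "hours.md: Open 9-5."])
```

The rendered string becomes the system message content, and the same template works whether you have two citations or twenty.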
2026/01/13: What is the future of AI and which specialization should I pursue? 📹 11:54 If you enjoy software engineering and full-stack engineering, it helps to understand the models well enough to know why they do what they do, but the real work is in how you build on top of them. There's lots of interesting stuff to learn, and it really depends on what you're most interested in doing.
2026/01/13: Which livestream series should I follow to build a project using several tools and agents, and should I use a framework? 📹 13:33 Everyone should understand tool calling before moving on to agents. From the original 9-part Python + AI series, start with tool calling, then watch the high-level agents overview. The upcoming six-part series in February will dive deeper into each topic, especially how to use Agent Framework. At the bare minimum, you should understand LLMs, tool calling, and agents. Then you can decide whether to do everything with just tool calling (you can do it yourself with an LLM that has tool calling) or use an agent framework like LangChain or Agent Framework if you think it has enough benefits for you. It's important to understand that agents are built on tool calling—it's the foundation, and the success or failure of agents comes down to how well LLMs can use tool calling. Links shared:
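Since tool calling is the foundation here, it's worth seeing the bare loop that every agent framework wraps. In this sketch the "model" is a stub so the loop runs without an API key; a real version would send the messages plus tool schemas to an LLM:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real API call.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    # Stub: a real implementation would call an LLM with tool definitions.
    # First turn: ask for a tool call. After a tool result: answer with it.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Seattle"})}}
    return {"content": messages[-1]["content"]}

def run(question: str) -> str:
    """The core agent loop: call model, execute requested tools, repeat."""
    messages = [{"role": "user", "content": question}]
    while True:
        reply = fake_model(messages)
        if "tool_call" not in reply:
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": result})
```

Agent frameworks add planning, memory, and multi-agent orchestration on top, but this loop is what ultimately runs underneath.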
2026/01/13: How does Azure manage the context window? How do I maintain a long conversation with a small context window? 📹 15:21 There are three general approaches:
With today's large context windows (128K, 256K), it's often easier to just wait for an error and tell the user to start a new chat, or do summarization when the error occurs. This approach is most likely to work across models since every model should throw an error when you're over the context window. Links shared:
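The simplest of those approaches, trimming old turns when over budget, can be sketched like this. The chars-divided-by-4 token estimate is a rough heuristic; a real implementation would count with the model's actual tokenizer:

```python
def approx_tokens(message: dict) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(message["content"]) // 4)

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the system prompt and drop the oldest turns until under budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(approx_tokens, system + rest)) > max_tokens:
        rest.pop(0)  # drop oldest non-system message first
    return system + rest
```

Dropping whole turns is lossy; the summarization variant replaces the dropped turns with a model-written summary instead of discarding them.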
2026/01/13: How do we deal with context rot and how do we summarize context using progressive disclosure techniques? 📹 19:17 Read through Kelly Hong's (Chroma researcher) blog post on context rot. The key point is that even with a 1 million token context window, you don't have uniform performance across that context window. She does various tests to see when performance starts getting worse, including tests on ambiguity, distractors, and implications. A general tip for coding agents with long-running tasks: use a main agent that breaks the task into subtasks and spawns sub-agents for each one, where each sub-agent has its own focused context. This is the approach used by the LangChain Deep Agents repo. You can also look at how different projects implement summarization. LangChain's summarization middleware is open source—you can see their summary prompt and approach. They do approximate token counting and trigger summarization when 80% of the context is reached. Links shared:
How do I deal with context issues when using the Foundry SDK with a single agent? 📹 25:03 If you're using the Foundry SDK with a single agent (hosted agent), you can implement something like middleware through hooks or events. Another approach is the LangChain Deep Agents pattern: implement sub-agents as tools where each tool has a limited context and reports back a summary of its results to the main agent. For the summarization approach with Foundry agents, you'd need to figure out what events, hooks, or middleware systems they have available.
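The 80% summarization trigger described above reduces to a check like the following. Everything here is a simplification: tokens are approximated as chars/4, and summarize() is a stub where real middleware would call the model:

```python
CONTEXT_WINDOW = 128_000   # tokens; depends on the model
THRESHOLD = 0.8            # summarize once 80% of the window is used

def approx_tokens(messages: list[dict]) -> int:
    # Rough heuristic; real middleware also counts approximately.
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages: list[dict]) -> dict:
    # Stub: a real implementation would ask the LLM to summarize these turns.
    return {"role": "system", "content": f"Summary of {len(messages)} earlier messages."}

def maybe_compact(messages: list[dict]) -> list[dict]:
    """Leave history alone until the threshold, then fold old turns into a summary."""
    if approx_tokens(messages) < CONTEXT_WINDOW * THRESHOLD:
        return messages
    recent = messages[-4:]  # keep the latest turns verbatim
    return [summarize(messages[:-4])] + recent
```

Running this check before every model call is exactly the kind of thing a middleware hook or event handler is for, which is why the Foundry question above comes down to what hooks the SDK exposes.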
2026/01/13: Have you seen or implemented anything related to AG-UI or A2UI? 📹 29:02 AG-UI (Agent User Interaction Protocol) is an open standard introduced by the CopilotKit team that standardizes how front-end applications communicate with AI agents. Both Pydantic AI and Microsoft Agent Framework support AG-UI—they provide adapters to convert messages to the AG-UI format. The advantage of standardization is that if people agree on a protocol between backend and frontend, you can build reusable front-end components that understand how to use that backend. Agent Framework also supports other UI event stream protocols, including Vercel AI (though Vercel is a competitor, so support may be limited). These are adapters—you can always adapt output into another format if needed, but it's nice when it's built in. A2UI was created by Google together with CopilotKit and relates to A2A (Agent-to-Agent). A2UI appears to be newer, with less support currently in Agent Framework, though A2A is supported. Links shared:
2026/03/31: What is the GA version of agent framework? Can we install it now? 📹 2:55 At Microsoft, "GA" (Generally Available) means a product is stable enough for production use — similar to what used to be called "beta" vs. "release." Some companies only adopt GA products. Agent framework is currently in public preview — you can definitely install and use it. There are many samples and an entire series built on it. However, the interface has been changing as the team incorporates feedback. The framework has been progressing through release candidates (RC1 through RC6), with RC6 expected to be the last before the official 1.0.0 GA release. Once that ships, it won't be marked as a pre-release anymore. Links shared:
2026/03/31: What are some good local models for agents? 📹 5:08 Pamela shared her experience trying to find good local models for agents on her Mac M1 with 16 GB RAM. Using CanIRun.ai to check hardware compatibility, her options were limited to Phi 3.18B and Qwen 3.5:9B. She used Qwen 3.5:9B extensively and found it technically works for agents but with significant quality issues: it would sometimes output Chinese (likely due to training data), and would forget the original user question after making tool calls. The 27B version is reportedly much better but requires more memory. She also shared a SQL benchmark for LLMs that compares both frontier and local models on text-to-SQL tasks. Links shared:
2026/03/31: What is GEPA's "Optimize Anything"? 📹 7:30 Pamela attended a meetup where she met the creator of "Optimize Anything," which uses GEPA — a prompt optimizer based on genetic algorithms. Instead of trying every possible prompt variation, it intelligently decides what to try next, giving you efficient prompt optimization. It's not just for prompts — people use it for structured outputs, and GEPA can even optimize skills (non-LLM tasks). Developers are reporting really good results from using GEPA to improve their LLM workflows. Links shared:
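To give a feel for the idea (this is a toy, not GEPA itself): treat the prompt as a genome, score candidate variants with an eval, and keep whatever scores best. The scoring function below is a stand-in for a real evaluation over your dataset, and the greedy hill-climb is far simpler than GEPA's genetic search with reflection:

```python
ADDITIONS = ["Be concise.", "Always cite sources.", "Think step by step."]

def score(prompt: str) -> int:
    # Stand-in eval: in reality you'd run the prompt against a test set
    # and measure answer quality.
    p = prompt.lower()
    return ("concise" in p) + ("cite sources" in p)

def optimize(seed: str, rounds: int = 3) -> str:
    """Greedy hill-climb over prompt 'mutations'. GEPA is far smarter:
    it decides which variations are worth trying next instead of
    enumerating them all."""
    best = seed
    for _ in range(rounds):
        candidates = [best] + [best + " " + a for a in ADDITIONS]
        best = max(candidates, key=score)
    return best
```

Even this toy shows the shape of the workflow: the expensive part is the eval, so anything that reduces how many candidates you must score (which is GEPA's contribution) matters a lot.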
2026/03/31: Has anyone at Microsoft tested TurboQuant for local models? 📹 8:34 Pamela hadn't heard of TurboQuant before — it's a KV cache compression technique from Google published just 11 hours before the office hours. She said she'd ask colleagues about it. The discussion continued later in the session, with community members sharing that TurboQuant provides ~6x KV cache memory reduction and up to 8x faster attention computation on H100, with minimal accuracy loss at 3-4 bits. The practical impact is primarily on inference serving rather than application development. It could potentially help when deploying models on serverless GPUs (like Azure Container Apps with serverless GPU using vLLM). Someone already created a vLLM plugin for TurboQuant. Links shared:
2026/03/31: Have you used the Work IQ MCP server? 📹 11:39 Yes — Pamela has it configured in her Copilot CLI. She demonstrated it by asking "Do I have any meetings today?" and showed it querying Work IQ to check her calendar. Work IQ is a stdio (standard input/output) MCP server, not HTTP, and you configure it in your MCP client's configuration. Work IQ provides read-only access — it can read calendar events, search through Teams chats, and search SharePoint. It's similar to using Copilot in Teams but convenient because you can use it from any MCP client. You need to authenticate during setup, and then it works from there.
2026/03/31: Does Microsoft have anything to check agent skills for security and effectiveness? 📹 16:48 For effectiveness, Pamela recommended two tools from her colleague Shane:
For security, there's nothing specific she's aware of yet. She emphasized that security for agents should be enforced at the environment level — lock down what commands an agent can run and what credentials it can access, rather than relying on prompt instructions. Running skills in isolated Docker containers is safer than running directly on a machine with production credentials. Links shared:
2026/03/31: Have you seen the "Mirage" problem with multimodal LLMs? 📹 23:22 Pamela shared a paper called "MIRAGE: The Illusion of Visual Understanding" which found that multimodal LLMs sometimes hallucinate image understanding — they respond as if they've analyzed an image even when no image was provided. For example, when asked to identify tissue in a histology slide with no image attached, the LLM would confidently describe "cardiac muscle tissue." The researchers also found that some benchmark successes weren't actually due to image understanding. They cleaned benchmarks by removing compromised questions and re-evaluated vision-language models, finding performance dropped across the board. Relatedly, Pamela discussed Azure AI Vision (Florence model) being deprecated, with image analysis service retiring in 2028. Microsoft is moving away from dedicated vision models toward generic multimodal LLMs and embedding models like Cohere Embed. For multimodal RAG, Pamela noted a quirk: when providing both an image and its text description to an LLM, the model strongly prefers referencing the text description over actually looking at the image. You may need to explicitly prompt it to examine the image itself. Links shared:
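A small sketch of working around that multimodal RAG quirk, using the common content-parts message shape for multimodal chat APIs. The URL, description, and instruction wording below are placeholders:

```python
def build_messages(image_url: str, description: str, question: str) -> list[dict]:
    """Build a user message that nudges the model to inspect the image,
    treating the retrieved text description as backup context only."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        f"{question}\n\n"
                        "A text description is provided below, but base your "
                        "answer on the image itself, using the description "
                        "only as backup.\n\n"
                        f"Description: {description}"
                    ),
                },
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]
```

Without the explicit "base your answer on the image itself" nudge, the model tends to paraphrase the description and never really look at the pixels.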
2026/03/31: Why are people saying MCP is dead? Is it because of skills? 📹 29:33 Pamela referred back to the previous week's extensive discussion and reiterated: MCP is not dead (though there's a tongue-in-cheek funeral for it on April 1st in NYC). Skills are useful for coding agents, but MCP is particularly valuable when authentication is involved, when you're in deployed environments across organizations, and for features like elicitations and destructive tool decorators. The MCP protocol has a working group exploring how to make skills a first-class concept discoverable from MCP servers, alongside tools, resources, and prompts. All major agent frameworks now support both skills and MCP. Links shared:
2026/03/31: What are some recommended talks from the Py AI conference? 📹 33:00 The talks from the Py AI conference are being uploaded to YouTube. Pamela highlighted several:
Links shared:
2026/03/31: Ollama now powered by MLX on Apple Silicon? 📹 36:44 Ollama announced preview support for MLX on Apple Silicon, promising faster local model inference on Macs. However, the current requirement is a Mac with 32+ GB of unified memory, which means it doesn't help Pamela (16 GB M1). If you have a newer Mac with sufficient memory and GPU neural accelerators (especially M5), it's worth trying. Links shared:
2026/03/31: When can we see agent framework durable agents in an Azure Functions series? 📹 40:04 Pamela hasn't had a chance to play with durable execution yet. The durable task extension for Microsoft Agent Framework provides serverless hosting, session management, deterministic multi-agent orchestrations with automatic checkpointing, and human-in-the-loop support. She noted it's not yet compatible with Foundry hosted agents — you'd deploy on Azure Functions to get durability. She offered to invite Nick (from the durable team) to a future office hours to demo it. Links shared:
2026/03/31: Is private networking supported for Foundry hosted agents? 📹 43:18 Currently, you cannot create hosted agents within network-isolated Foundry projects. Pamela said she'd check with colleagues and hoped this limitation would be addressed soon, given how important private networking is for Microsoft's enterprise customers. Links shared:
2026/03/31: Any tips to maximize GitHub Copilot Auto output quality with the Codex model? 📹 48:16 Pamela doesn't use Auto mode herself (she uses Opus). She recommended reaching out to Burke Holland and Pierce Boggan on Twitter/X — they're VS Code advocates who are more familiar with Auto mode. Update posted after office hours: Pierce (from the VS Code team) responded: "It should do better in Insiders, where our task detection model is now at 100% as of today. Based on those results, we'll roll to stable. Today, Auto is based purely off available capacity and uptime, so the model mix can be quite static in how it's applied." Links shared:
2026/03/31: How to start learning AI? What about the "AI Engineering" book? 📹 49:49 Pamela recommended starting with the Python AI series before reading Chip Huyen's "AI Engineering" book. She shared her blog post on how she learns about generative AI, which covers the AI Engineering book, AI news sources, hands-on practice, and Microsoft's video series. Links shared:
2026/03/31: Announcements 📹 0:39 Agent Framework RC6 released: Release candidate 6 for the Python agent-framework is out and expected to be the last RC before GA 1.0.0. GitHub Copilot privacy policy change: GitHub updated its privacy statement so that Copilot Free, Pro, and Pro Plus users' interaction data (inputs, outputs, code snippets) will be used to train and improve AI models unless users opt out. Enterprise and organization users with private repos are not affected by this change. Links shared: Host Your Agents on Foundry series: A new three-part live stream series starting end of April covering hosting agent framework, LangChain/LangGraph on Foundry, and quality/safety evaluations. Includes office hours after each session. Links shared: Upcoming events:
Each week, we hold office hours about all things Python + AI in the Foundry Discord.
Join the Discord here: http://aka.ms/aipython/oh
This thread lists the recordings of each office hours session, along with any other resources that come out of the sessions. The questions and answers are automatically posted (based on the transcript) as comments in this thread.
March 31, 2026
Topics covered:
March 24, 2026
Topics covered:
March 17, 2026
Topics covered:
February 17, 2026
Topics covered:
February 10, 2026
Topics covered:
February 3, 2026
Topics covered:
January 27, 2026
Topics covered:
January 20, 2026
Topics covered:
January 13, 2026
Topics covered:
January 6, 2026
Topics covered: