<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Beam College</title><link>https://beamcollege.dev/</link><description>Recent content on Beam College</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 15 May 2026 22:53:40 +0000</lastBuildDate><atom:link href="https://beamcollege.dev/index.xml" rel="self" type="application/rss+xml"/><item><title>Apache Beam: Powering the Future of Event-Driven Agents at Scale</title><link>https://beamcollege.dev/sessions/2026/powering-the-future-of-event-driven/</link><pubDate>Mon, 09 Mar 2026 14:02:09 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/powering-the-future-of-event-driven/</guid><description>&lt;p>Where did Apache Beam start, and where is it taking us? Join us as we dive into the origins of the Beam model and examine the critical shift toward a streaming-first data landscape. We’ll break down why—in today’s expansive landscape that demands real-time context—robust streaming has become the essential nervous system for event-driven agents.&lt;/p>
&lt;p>Discover how the unified Beam model enables autonomous agents to react, reason, and act in real-time, turning raw, unbounded data streams into intelligent, decentralized action.&lt;/p></description></item><item><title>Getting Started with Apache Beam (2023)</title><link>https://beamcollege.dev/sessions/2023/1-getting-started/</link><pubDate>Thu, 13 Feb 2025 21:22:35 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/1-getting-started/</guid><description/></item><item><title>Beam in the Data Analytics Lifecycle (2022)</title><link>https://beamcollege.dev/sessions/2022/1-beam-data-lifecycle/</link><pubDate>Thu, 13 Feb 2025 21:08:48 -0600</pubDate><guid>https://beamcollege.dev/sessions/2022/1-beam-data-lifecycle/</guid><description/></item><item><title>How Apache Beam sets you up for a generative AI world</title><link>https://beamcollege.dev/sessions/2024/beaming-future/</link><pubDate>Wed, 14 Feb 2024 09:50:04 -0600</pubDate><guid>https://beamcollege.dev/sessions/2024/beaming-future/</guid><description/></item><item><title>Overview of Beam Quest</title><link>https://beamcollege.dev/sessions/2024/beam-quest/</link><pubDate>Wed, 14 Feb 2024 09:48:13 -0600</pubDate><guid>https://beamcollege.dev/sessions/2024/beam-quest/</guid><description/></item><item><title>Project Shield: How we use Beam to defend democracy</title><link>https://beamcollege.dev/sessions/2024/project-shield/</link><pubDate>Wed, 14 Feb 2024 09:48:13 -0600</pubDate><guid>https://beamcollege.dev/sessions/2024/project-shield/</guid><description/></item><item><title>AI-Powered Data Processing</title><link>https://beamcollege.dev/challenges/ai-powered-data-processing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/challenges/ai-powered-data-processing/</guid><description>Build a data pipeline that leverages AI techniques to process and derive insights from data.</description></item><item><title>Apache Beam: Powering the Future of Event-Driven Agents at 
Scale</title><link>https://beamcollege.dev/sessions/2026/powering-future/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/sessions/2026/powering-future/</guid><description>&lt;p>Where did Apache Beam start, and where is it taking us?&lt;/p>
&lt;p>Join us as we dive into the origins of the Beam model and examine the critical shift toward a streaming-first data landscape. We’ll break down why—in today’s expansive landscape that demands real-time context—robust streaming has become the essential nervous system for event-driven agents.&lt;/p>
&lt;p>Discover how the unified Beam model enables autonomous agents to react, reason, and act in real-time, turning raw, unbounded data streams into intelligent, decentralized action.&lt;/p></description></item><item><title>Beam College 2021 Opening Remarks</title><link>https://beamcollege.dev/sessions/2021/1-opening-remarks/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/sessions/2021/1-opening-remarks/</guid><description>&lt;p>Welcome to the opening remarks for Beam College 2021 where Evren provided a brief overview of the importance of Apache Beam and why it is a great moment to learn.&lt;/p></description></item><item><title>Learning Resources for Apache Beam</title><link>https://beamcollege.dev/sessions/2023/2-learning-resources/</link><pubDate>Thu, 13 Feb 2025 21:23:57 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/2-learning-resources/</guid><description/></item><item><title>Beam Overview (2022)</title><link>https://beamcollege.dev/sessions/2022/2-beam-overview/</link><pubDate>Thu, 13 Feb 2025 21:10:42 -0600</pubDate><guid>https://beamcollege.dev/sessions/2022/2-beam-overview/</guid><description/></item><item><title>Background About Data Processing Systems</title><link>https://beamcollege.dev/sessions/2023/0-background/</link><pubDate>Mon, 13 Feb 2023 21:23:57 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/0-background/</guid><description/></item><item><title>Apache Beam in the Data Analytics Lifecycle</title><link>https://beamcollege.dev/sessions/2021/2-beam-in-data-lifecycle/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/sessions/2021/2-beam-in-data-lifecycle/</guid><description/></item><item><title>Authoring your first pipeline</title><link>https://beamcollege.dev/sessions/2026/authoring-first-pipeline/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://beamcollege.dev/sessions/2026/authoring-first-pipeline/</guid><description>&lt;p>This hands-on session guides beginners through creating their first Apache Beam pipeline from scratch. We&amp;rsquo;ll start with core Beam concepts—PCollections, PTransforms, and the Pipeline object—then walk through a practical example building a data processing pipeline step by step. You&amp;rsquo;ll learn how to read data from sources, apply transformations like Map, FlatMap, and GroupByKey, and write results to sinks. The session covers common patterns, debugging techniques, and best practices for structuring your pipeline code. We&amp;rsquo;ll also explore how these foundational concepts translate to real-world MLOps scenarios like feature engineering pipelines and batch inference workflows. Whether you&amp;rsquo;re new to Beam or looking to integrate it into your ML platform, you&amp;rsquo;ll leave with the confidence to start building production-ready pipelines on runners like Dataflow.&lt;/p></description></item><item><title>Build a data pipeline without coding</title><link>https://beamcollege.dev/challenges/dataflow-builder/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/challenges/dataflow-builder/</guid><description>Prototype common data processing tasks using Apache Beam YAML within a Google Colab environment.</description></item><item><title>Making the Jump from Batch to Streaming</title><link>https://beamcollege.dev/sessions/2026/making-the-jump-from-batch-to-streaming/</link><pubDate>Mon, 09 Feb 2026 16:47:05 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/making-the-jump-from-batch-to-streaming/</guid><description>&lt;p>This session dives into Apache Beam’s streaming primitives, focusing on reading from unbounded sources using Splittable DoFn and Unbounded Source, windowing strategies, and triggers with accumulation modes.&lt;/p></description></item><item><title>From Click to Chart: Building 
a Real-Time Analytics Engine</title><link>https://beamcollege.dev/sessions/2026/from-click-to-chart/</link><pubDate>Mon, 09 Feb 2026 16:43:32 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/from-click-to-chart/</guid><description>&lt;p>This session, &amp;ldquo;From Click to Chart,&amp;rdquo; demystifies the engineering behind real-time analytics. We will trace the lifecycle of a data point as it travels through a modern Google Cloud Platform (GCP) architecture, moving from ingestion to visualization in seconds. Using Apache Beam, we will explore how to build a unified pipeline that handles massive streams of data without the need to manage servers.&lt;/p>
&lt;p>Key Points Addressed:&lt;/p>
&lt;ul>
&lt;li>The &amp;ldquo;Why&amp;rdquo; of Streaming: Understanding the shift from traditional Batch processing to Real-Time Streaming.&lt;/li>
&lt;li>The Architecture: A deep dive into the &amp;ldquo;Golden Path&amp;rdquo; stack: Cloud Pub/Sub (Ingest) → Dataflow (Process) → BigQuery (Store).&lt;/li>
&lt;li>Apache Beam Fundamentals: Introduction to the unified programming model, including Windowing (how to group infinite data) and Watermarks (handling late data).&lt;/li>
&lt;/ul></description></item><item><title>Real-Time Semantic Enrichment &amp; Clustering</title><link>https://beamcollege.dev/sessions/2022/3-semantic-enrichment/</link><pubDate>Thu, 13 Feb 2025 21:11:58 -0600</pubDate><guid>https://beamcollege.dev/sessions/2022/3-semantic-enrichment/</guid><description/></item><item><title>How to be Involved with Beam Community</title><link>https://beamcollege.dev/sessions/2023/3-community/</link><pubDate>Mon, 13 Feb 2023 21:34:45 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/3-community/</guid><description/></item><item><title>Industry Data Management 4.0</title><link>https://beamcollege.dev/sessions/2021/3-industry-data-management/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/sessions/2021/3-industry-data-management/</guid><description/></item><item><title>Real-time Anomaly Detection</title><link>https://beamcollege.dev/challenges/anomaly-detection/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/challenges/anomaly-detection/</guid><description>Build a data pipeline that enables real-time anomaly detection.</description></item><item><title>YAML: a new SDK to author your pipelines</title><link>https://beamcollege.dev/sessions/2026/yaml/</link><pubDate>Mon, 09 Feb 2026 16:30:58 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/yaml/</guid><description>&lt;p>In this talk, we explore a new way of authoring and running your Beam pipelines: via the YAML SDK! 
Learn how you can split your pipeline infrastructure from your complex processing logic.&lt;/p></description></item><item><title>Tutorial: My First Pipeline with Beam</title><link>https://beamcollege.dev/sessions/2022/4-my-first-pipeline/</link><pubDate>Thu, 13 Feb 2025 21:13:46 -0600</pubDate><guid>https://beamcollege.dev/sessions/2022/4-my-first-pipeline/</guid><description/></item><item><title>Give Me Back My Data</title><link>https://beamcollege.dev/sessions/2021/4-give-back-data/</link><pubDate>Thu, 13 Feb 2025 20:25:43 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/4-give-back-data/</guid><description/></item><item><title>Leverage Managed I/O</title><link>https://beamcollege.dev/challenges/iceberg/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/challenges/iceberg/</guid><description>Build a pipeline that integrates with Iceberg | Kafka | BigQuery via Managed I/O.</description></item><item><title>The Dataflow Job Builder</title><link>https://beamcollege.dev/sessions/2026/the-dataflow-job-builder/</link><pubDate>Mon, 09 Feb 2026 16:31:55 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/the-dataflow-job-builder/</guid><description>&lt;p>Learn how you can create low-code and no-code Beam YAML jobs in the Cloud Dataflow UI.&lt;/p></description></item><item><title>Exception Management Inside Beam</title><link>https://beamcollege.dev/sessions/2022/5-exception-management/</link><pubDate>Thu, 13 Feb 2025 21:15:55 -0600</pubDate><guid>https://beamcollege.dev/sessions/2022/5-exception-management/</guid><description/></item><item><title>Apache Beam Overview</title><link>https://beamcollege.dev/sessions/2021/5-apache-beam-overview/</link><pubDate>Thu, 13 Feb 2025 20:28:30 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/5-apache-beam-overview/</guid><description>&lt;p>Slides available at &lt;a href="https://github.com/griscz/beam-college/blob/main/day1/A2-Apache_Beam_Overview.pdf" target="_blank" 
rel="noopener">https://github.com/griscz/beam-college/blob/main/day1/A2-Apache_Beam_Overview.pdf&lt;/a>&lt;/p></description></item><item><title>CI/CD with Dataflow Templates</title><link>https://beamcollege.dev/sessions/2024/ci-cd-dataflow-templates/</link><pubDate>Wed, 14 Feb 2024 09:39:28 -0600</pubDate><guid>https://beamcollege.dev/sessions/2024/ci-cd-dataflow-templates/</guid><description/></item><item><title>Introducing Managed IO, the New Era of Beam Connectors</title><link>https://beamcollege.dev/sessions/2026/introducing-managed-io/</link><pubDate>Mon, 09 Feb 2026 16:45:01 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/introducing-managed-io/</guid><description>&lt;p>Discover what makes Managed IO a major leap towards a more unified, flexible, and upgrade-friendly connector ecosystem. This talk dives into the motivation and design behind Managed IO, highlighting a few key goals:&lt;/p>
&lt;p>• Provide a consistent API across all connectors&lt;br>
• Enable runners to seamlessly upgrade IOs, pulling in bug fixes and new features — all with zero user effort&lt;br>
• Allow runners to fine-tune connector behavior for their environment&lt;/p></description></item><item><title>Part 3: Intro to RunInference</title><link>https://beamcollege.dev/sessions/2023/6-beam-ml-3/</link><pubDate>Thu, 13 Feb 2025 21:27:01 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/6-beam-ml-3/</guid><description/></item><item><title>Multi-language Pipelines with Beam (2022)</title><link>https://beamcollege.dev/sessions/2022/6-multi-language/</link><pubDate>Thu, 13 Feb 2025 21:18:08 -0600</pubDate><guid>https://beamcollege.dev/sessions/2022/6-multi-language/</guid><description/></item><item><title>Describing a Pipeline Declaratively</title><link>https://beamcollege.dev/sessions/2021/6-describing-pipeline-declaratively/</link><pubDate>Thu, 13 Feb 2025 20:30:12 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/6-describing-pipeline-declaratively/</guid><description>&lt;p>Slides available at &lt;a href="https://github.com/griscz/beam-college/blob/main/day1/A3-Describe_pipeline_declaratively.pdf" target="_blank" rel="noopener">https://github.com/griscz/beam-college/blob/main/day1/A3-Describe_pipeline_declaratively.pdf&lt;/a>&lt;/p></description></item><item><title>Open Challenge: Innovate with Beam</title><link>https://beamcollege.dev/challenges/open/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/challenges/open/</guid><description>Build any solution you want using Apache Beam.</description></item><item><title>Part 4: From Speech to Classifier</title><link>https://beamcollege.dev/sessions/2023/7-beam-ml-4/</link><pubDate>Thu, 13 Feb 2025 21:27:01 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/7-beam-ml-4/</guid><description/></item><item><title>Dataflow Templates</title><link>https://beamcollege.dev/sessions/2022/7-dataflow-templates/</link><pubDate>Thu, 13 Feb 2025 21:19:40 
-0600</pubDate><guid>https://beamcollege.dev/sessions/2022/7-dataflow-templates/</guid><description/></item><item><title>Runner Architecture, Management &amp; Autotuning</title><link>https://beamcollege.dev/sessions/2021/7-apache-beam-runner-architecture/</link><pubDate>Thu, 13 Feb 2025 20:31:54 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/7-apache-beam-runner-architecture/</guid><description/></item><item><title>Beam YAML Bootcamp</title><link>https://beamcollege.dev/sessions/2024/beam-yaml-bootcamp/</link><pubDate>Wed, 14 Feb 2024 09:46:58 -0600</pubDate><guid>https://beamcollege.dev/sessions/2024/beam-yaml-bootcamp/</guid><description/></item><item><title>Scaling Iceberg Ingestion with Apache Beam</title><link>https://beamcollege.dev/sessions/2026/scaling-iceberg-ingestion/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/sessions/2026/scaling-iceberg-ingestion/</guid><description>&lt;p>This session explores the technical evolution of Apache Iceberg integration within the Apache Beam ecosystem. We dive into a suite of recent performance enhancements designed to streamline data lake ingestion at scale. Key topics include the adoption of table-defined compression for improved processing and storage efficiency, and the implementation of metadata caching to minimize lookups and prevent metadata service quota exhaustion. 
We also examine direct write capabilities that bypass expensive processing for large bundles, and autosharding mechanisms that optimize file sizes and ensure horizontal scalability.&lt;/p></description></item><item><title>Part 5: LLM to Speech Output</title><link>https://beamcollege.dev/sessions/2023/8-beam-ml-5/</link><pubDate>Thu, 13 Feb 2025 21:27:01 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/8-beam-ml-5/</guid><description/></item><item><title>Apache Beam Demo &amp; Dataflow SQL</title><link>https://beamcollege.dev/sessions/2021/8-demo-dataflow-sql/</link><pubDate>Thu, 13 Feb 2025 20:31:54 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/8-demo-dataflow-sql/</guid><description/></item><item><title>Getting Started with Remote ML Inference in Beam Java</title><link>https://beamcollege.dev/sessions/2026/getting-started-remote-ml-inference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/sessions/2026/getting-started-remote-ml-inference/</guid><description>&lt;p>This session introduces the new Remote ML Inference transform in the Apache Beam Java SDK and shows how Java pipelines can run inference using external model services such as OpenAI. We’ll walk through how the transform works, how to use the OpenAI model handler with practical examples, and how to implement custom model handlers for other remote ML providers. 
The talk covers common usage patterns, framework extensibility, and includes a live demo so developers can quickly add remote ML inference capabilities to their Beam Java pipelines.&lt;/p></description></item><item><title>Real-Time Anomaly Detection with Apache Beam</title><link>https://beamcollege.dev/sessions/2026/real-time-anomaly-detection-with-apache-beam/</link><pubDate>Mon, 09 Feb 2026 16:45:55 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/real-time-anomaly-detection-with-apache-beam/</guid><description>&lt;p>Real-time anomaly detection is essential for identifying unexpected patterns and critical events in streaming data. This talk addresses the unique algorithmic challenges of anomaly detection in streaming environments and introduces a new feature within Apache Beam designed for this purpose. We will demonstrate how to seamlessly integrate both online and pre-trained offline anomaly detection models into Beam pipelines, empowering users to build robust, scalable, and real-time anomaly detection systems.&lt;/p></description></item><item><title>Part 6: Recap and How to Extend the Example</title><link>https://beamcollege.dev/sessions/2023/9-beam-ml-6/</link><pubDate>Thu, 13 Feb 2025 21:27:01 -0600</pubDate><guid>https://beamcollege.dev/sessions/2023/9-beam-ml-6/</guid><description/></item><item><title>Advanced Patterns for Windows &amp; Triggers</title><link>https://beamcollege.dev/sessions/2021/9-advanced-patterns-windows-triggers/</link><pubDate>Thu, 13 Feb 2025 20:36:13 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/9-advanced-patterns-windows-triggers/</guid><description/></item><item><title>Building Scalable Semantic Search and RAG Pipelines</title><link>https://beamcollege.dev/sessions/2026/building-scalable-semantic-search-and-rag-pipelines/</link><pubDate>Mon, 09 Feb 2026 16:43:32 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/building-scalable-semantic-search-and-rag-pipelines/</guid><description>&lt;p>This 
presentation introduces vector-based semantic search and Retrieval Augmented Generation (RAG), demonstrating how to build scalable pipelines for both using Apache Beam. We’ll start by explaining fundamental concepts like chunking, embeddings, and vector similarity. Then we’ll explore semantic search applications before extending to full RAG systems.&lt;/p>
&lt;p>The presentation walks through implementing both semantic search and RAG pipelines using Apache Beam’s ML components, covering data ingestion, chunking, embedding generation, vector database integration, and similarity search. By the end, students will understand the theoretical foundations of both systems and have practical knowledge of how to implement them at scale using Apache Beam’s distributed processing capabilities.&lt;/p></description></item><item><title>DoFn Lifecycle and User Code Requirements</title><link>https://beamcollege.dev/sessions/2021/10-dofn-lifecycle/</link><pubDate>Thu, 13 Feb 2025 20:38:18 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/10-dofn-lifecycle/</guid><description/></item><item><title>Video Data Processing with Apache Beam</title><link>https://beamcollege.dev/sessions/2026/video-data-processing/</link><pubDate>Mon, 09 Feb 2026 16:43:32 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/video-data-processing/</guid><description>&lt;p>This architecture leverages a Splittable DoFn for parallelized video ingestion, distributing frame extraction across workers before applying Sliding Window logic to generate temporal 3D tensors. It utilizes Beam’s RunInference API with a KeyedModelHandler for GPU-accelerated inference, ensuring robust state management. 
Finally, CoGroupByKey synchronizes asynchronous feature vectors with metadata, serializing the aligned dataset into TFRecord SequenceExamples for downstream training.&lt;/p></description></item><item><title>Branching &amp; Merging PCollections</title><link>https://beamcollege.dev/sessions/2021/11-branching-merging/</link><pubDate>Thu, 13 Feb 2025 20:55:06 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/11-branching-merging/</guid><description/></item><item><title>Real-Time Stateful Processing of Video Data</title><link>https://beamcollege.dev/sessions/2026/real-time-stateful-processing/</link><pubDate>Mon, 09 Feb 2026 16:45:55 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/real-time-stateful-processing/</guid><description>&lt;p>You will learn how to build a pipeline that processes video data in real time to enable efficient analysis and event or anomaly detection.&lt;/p>
&lt;ul>
&lt;li>Reads video streams or recorded footage from the source&lt;/li>
&lt;li>Applies sliding-window analysis to examine activity over short intervals&lt;/li>
&lt;li>Uses stateful processing to track patterns and detect anomalies over time&lt;/li>
&lt;/ul>
&lt;p>Finally, it raises alerts or stores annotated events in a structured format that can be used for monitoring dashboards or downstream analysis.&lt;/p></description></item><item><title>Advanced Group &amp; Aggregation</title><link>https://beamcollege.dev/sessions/2021/12-advanced-group-aggregation/</link><pubDate>Thu, 13 Feb 2025 20:56:51 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/12-advanced-group-aggregation/</guid><description/></item><item><title>Assembling the Puzzle: High-Performance Entity Building streaming Beam pipeline using a Two-Tiered State Architecture</title><link>https://beamcollege.dev/sessions/2026/assembling-the-puzzle/</link><pubDate>Mon, 09 Feb 2026 16:43:32 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/assembling-the-puzzle/</guid><description>&lt;p>When source systems emit only partial updates to conserve network bandwidth, Data Engineers face the complex task of reconstructing complete entities in real-time. In this session, we will deep dive into a high-performance, SCD-like streaming pipeline that dynamically reconstructs full entities from partial data before sinking them to the data warehouse.&lt;/p>
&lt;p>The core of our solution is a custom two-tiered state backend architecture. By intelligently combining Apache Beam’s native, low-latency state API (Tier 1) with an external third-party data store (Tier 2), we overcome standard memory and throughput limitations. Join us to explore how this two-tiered design, alongside Beam timers, drastically reduces external database lookups, minimizes network latency, and unlocks unparalleled performance for stateful streaming pipelines.&lt;/p></description></item><item><title>State &amp; Timers Patterns</title><link>https://beamcollege.dev/sessions/2021/13-state-timers/</link><pubDate>Thu, 13 Feb 2025 20:58:07 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/13-state-timers/</guid><description/></item><item><title>Beyond Vectors: Building Scalable GraphRAG with Apache Beam and Cloud Spanner</title><link>https://beamcollege.dev/sessions/2026/beyond-vectors/</link><pubDate>Mon, 09 Feb 2026 16:43:32 -0600</pubDate><guid>https://beamcollege.dev/sessions/2026/beyond-vectors/</guid><description>&lt;p>Retrieval-Augmented Generation (RAG) has revolutionized how we interact with LLMs, but standard vector search often fails to capture the complex, multi-hop relationships hidden in data. &amp;ldquo;GraphRAG&amp;rdquo; solves this by grounding answers in a structured Knowledge Graph, but building these graphs from raw unstructured data at scale presents a massive data engineering challenge.&lt;/p>
&lt;p>In this session, we will demonstrate how to build a production-grade GraphRAG ingestion and retrieval pipeline using Apache Beam and Google Cloud Spanner Graph.&lt;/p></description></item><item><title>Custom Containers</title><link>https://beamcollege.dev/sessions/2021/14-custom-containers/</link><pubDate>Thu, 13 Feb 2025 20:59:26 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/14-custom-containers/</guid><description/></item><item><title>Multi-language Pipelines</title><link>https://beamcollege.dev/sessions/2021/15-multilanguage-pipelines/</link><pubDate>Thu, 13 Feb 2025 21:00:34 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/15-multilanguage-pipelines/</guid><description/></item><item><title>Build a Source with Splittable DoFns</title><link>https://beamcollege.dev/sessions/2021/16-splittable-dofns/</link><pubDate>Thu, 13 Feb 2025 21:02:34 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/16-splittable-dofns/</guid><description/></item><item><title>Dataflow Templates</title><link>https://beamcollege.dev/sessions/2021/17-dataflow-templates/</link><pubDate>Thu, 13 Feb 2025 21:04:04 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/17-dataflow-templates/</guid><description/></item><item><title>GCP Dataflow Architecture</title><link>https://beamcollege.dev/sessions/2021/18-dataflow-architecture/</link><pubDate>Thu, 13 Feb 2025 21:05:31 -0600</pubDate><guid>https://beamcollege.dev/sessions/2021/18-dataflow-architecture/</guid><description/></item><item><title>Hackathon</title><link>https://beamcollege.dev/hackathon/</link><pubDate>Mon, 28 Apr 2025 02:13:32 -0600</pubDate><guid>https://beamcollege.dev/hackathon/</guid><description>&lt;h3 id="what-is-the-schedule">What is the Schedule?&lt;/h3>
&lt;p>The hackathon will be held May 17 &amp;amp; 18, 2025. For more details, check out the &lt;a href="https://beamcollege.dev/schedule">schedule page&lt;/a>.&lt;/p>
&lt;h3 id="who-can-participate">Who can participate?&lt;/h3>
&lt;p>In order to participate in Beam College 2025 you need to fulfill all of these requirements:&lt;br>
be 18 years or older; not be a Google employee; and not be a resident of Cuba, Iran, North Korea, Syria, Crimea, Donetsk People’s Republic, Luhansk People’s Republic, Belarus, or Russia.&lt;/p>
&lt;h3 id="what-are-the-hackathon-prizes">What are the hackathon prizes?&lt;/h3>
&lt;ul>
&lt;li>1st place: $1,500 USD&lt;/li>
&lt;li>2nd place: $1,000 USD&lt;/li>
&lt;li>3rd place: $500 USD&lt;/li>
&lt;li>Each prize will be split evenly among team members.&lt;/li>
&lt;/ul>
&lt;h3 id="how-many-people-can-be-in-a-team">How many people can be in a team?&lt;/h3>
&lt;p>Teams may have 2-6 participants.&lt;/p></description></item><item><title>Stateful processing In Apache Beam</title><link>https://beamcollege.dev/sessions/2025/stateful/</link><pubDate>Tue, 01 Apr 2025 13:10:03 -0500</pubDate><guid>https://beamcollege.dev/sessions/2025/stateful/</guid><description>The stateful processing interface in Apache Beam serves as a versatile tool for data processing, empowering users with advanced capabilities to handle complex workflows. This session will delve into the diverse functionalities provided by stateful processing, illustrating their practical applications through clear and concise code examples.</description></item><item><title>Real-Time Streaming with Kafka</title><link>https://beamcollege.dev/sessions/2025/kafka/</link><pubDate>Tue, 01 Apr 2025 13:08:32 -0500</pubDate><guid>https://beamcollege.dev/sessions/2025/kafka/</guid><description>Explore real-time streaming pipelines with Kafka I/O. This session will share best practices for optimizing Kafka I/O performance and cost-efficiency, including strategies like redistribute transforms and offset-based deduplication. We will also cover integrating Dataflow with Google Managed Kafka for scalable data processing.</description></item><item><title>Real-Time Anomaly Detection with Apache Beam</title><link>https://beamcollege.dev/sessions/2025/anomaly-detection/</link><pubDate>Tue, 01 Apr 2025 12:10:28 -0500</pubDate><guid>https://beamcollege.dev/sessions/2025/anomaly-detection/</guid><description>Real-time anomaly detection is essential for identifying unexpected patterns and critical events in streaming data. This talk addresses the unique algorithmic challenges of anomaly detection in streaming environments and introduces a new feature within Apache Beam designed for this purpose. 
We will demonstrate how to seamlessly integrate both online and pre-trained offline anomaly detection models into Beam pipelines, empowering users to build robust, scalable, and real-time anomaly detection systems.</description></item><item><title>Building Scalable Semantic Search and RAG Pipelines</title><link>https://beamcollege.dev/sessions/2025/rag/</link><pubDate>Tue, 01 Apr 2025 12:09:13 -0500</pubDate><guid>https://beamcollege.dev/sessions/2025/rag/</guid><description>This presentation introduces vector-based semantic search and Retrieval Augmented Generation (RAG), demonstrating how to build scalable pipelines for both using Apache Beam. We&amp;rsquo;ll start by explaining fundamental concepts like chunking, embeddings, and vector similarity. Then we&amp;rsquo;ll explore semantic search applications before extending to full RAG systems.</description></item><item><title>Introducing Managed IO, the New Era of Beam Connectors</title><link>https://beamcollege.dev/sessions/2025/introducing-managed-io/</link><pubDate>Tue, 01 Apr 2025 12:07:40 -0500</pubDate><guid>https://beamcollege.dev/sessions/2025/introducing-managed-io/</guid><description>Discover what makes Managed IO a major leap towards a more unified, flexible, and upgrade-friendly connector ecosystem. This talk dives into the motivation and design behind Managed IO, highlighting a few key goals:</description></item><item><title>YAML: a new SDK to author your pipelines.</title><link>https://beamcollege.dev/sessions/2025/yaml/</link><pubDate>Tue, 01 Apr 2025 12:07:38 -0500</pubDate><guid>https://beamcollege.dev/sessions/2025/yaml/</guid><description>In this talk, we explore a new way of authoring and running your Beam pipelines: via the YAML SDK! 
Learn how you can split your pipeline infrastructure from your complex processing logic.</description></item><item><title>The Dataflow Job Builder</title><link>https://beamcollege.dev/sessions/2025/dataflow-job-builder/</link><pubDate>Fri, 14 Mar 2025 09:46:58 -0600</pubDate><guid>https://beamcollege.dev/sessions/2025/dataflow-job-builder/</guid><description>Learn how you can create low-code and no-code Beam YAML jobs in the Cloud Dataflow UI.</description></item><item><title>Getting Started: Intro to Creating a Beam Pipeline</title><link>https://beamcollege.dev/sessions/2025/getting-started/</link><pubDate>Thu, 13 Feb 2025 21:22:35 -0600</pubDate><guid>https://beamcollege.dev/sessions/2025/getting-started/</guid><description>In this introductory session, Sascha provides a technical overview of Apache Beam and how it enables developers to build data pipelines that run on various frameworks like Spark and Flink. It supports both batch and stream processing, providing flexibility for diverse data needs. Beam&amp;rsquo;s SDKs are available in Go, Python, and Java, allowing developers to choose their preferred language.</description></item><item><title>How Apache Beam sets you up for a generative AI world</title><link>https://beamcollege.dev/sessions/2025/beaming-future/</link><pubDate>Wed, 14 Feb 2024 09:50:04 -0600</pubDate><guid>https://beamcollege.dev/sessions/2025/beaming-future/</guid><description>This session provides an overview of Apache Beam and its place in the generative AI space. What problem does Beam solve? What are its advantages? 
Why and how are companies using it?</description></item><item><title>Implementing a Complex ML Pipeline in Beam</title><link>https://beamcollege.dev/sessions/2025/implementing-complex-ml-pipeline/</link><pubDate>Wed, 14 Feb 2024 09:45:14 -0600</pubDate><guid>https://beamcollege.dev/sessions/2025/implementing-complex-ml-pipeline/</guid><description>In this session we explain how to implement a complex ML pipeline with Apache Beam. The pipeline we will build takes audio data, converts it to text, classifies it to identify the topic or subject, feeds it to an LLM, and then takes the output of the model and turns it back into voice.</description></item><item><title>Implementing a ML Pipeline with Google AI Studio</title><link>https://beamcollege.dev/sessions/2025/implementing-ml-pipeline-ai-studio/</link><pubDate>Wed, 14 Feb 2024 09:45:14 -0600</pubDate><guid>https://beamcollege.dev/sessions/2025/implementing-ml-pipeline-ai-studio/</guid><description>This tutorial demonstrates how to perform streaming inference with Apache Beam and Google AI Studio&amp;rsquo;s Gemini model, focusing on a geography-based example to get country capitals.</description></item><item><title>Making the Jump from Batch to Streaming</title><link>https://beamcollege.dev/sessions/2025/jump-batch-streaming1/</link><pubDate>Wed, 14 Feb 2024 09:42:24 -0600</pubDate><guid>https://beamcollege.dev/sessions/2025/jump-batch-streaming1/</guid><description>This session dives into Apache Beam&amp;rsquo;s streaming primitives, focusing on reading from unbounded sources using Splittable DoFn and Unbounded Source, windowing strategies, and triggers with accumulation modes.</description></item><item><title>About Beam College</title><link>https://beamcollege.dev/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/about/</guid><description>&lt;p>Beam College is an online, free educational program that provides hands-on training to solve data processing use cases using &lt;a 
href="https://beam.apache.org" target="_blank">Apache Beam®&lt;/a>.&lt;/p>
&lt;h3 id="topics">Topics&lt;/h3>
&lt;p>Beam College 2026 sessions are organized in the following 3 tracks:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Beginner&lt;/strong>: Overview of what Apache Beam is and how it fits into your data processing stack, plus the building blocks and fundamentals for building your first pipeline.&lt;/li>
&lt;li>&lt;strong>Tips &amp;amp; Tricks&lt;/strong>: Sessions focused on solving specific use cases with Apache Beam including processing video data, processing time series data, migrating from batch to streaming, etc.&lt;/li>
&lt;li>&lt;strong>New Features&lt;/strong>: Sessions focused on reviewing some of the latest features in Apache Beam covering topics like remote inference for AI, RAG pipelines, managed I/O with Iceberg, etc.&lt;/li>
&lt;/ul>
&lt;p>Sessions are provided as a live online conference. We encourage participants to join live in order to maximize their learning and engagement opportunities. However, all sessions are recorded and published for future reference.&lt;/p></description></item><item><title>Additional Learning Resources</title><link>https://beamcollege.dev/learning-resources/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/learning-resources/</guid><description>&lt;p>Besides the archive of previous sessions that you can find on this website, Apache Beam has these other very useful learning resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://tour.beam.apache.org/" target="_blank" rel="noopener">Tour of Beam&lt;/a> is an interactive way of learning to write Beam code with a sandbox, where you can write and run pipelines while walking through various concepts.&lt;/li>
&lt;li>&lt;a href="https://play.beam.apache.org/" target="_blank" rel="noopener">Beam Playground&lt;/a> is an interactive environment to try out Beam transforms and examples without having to install Apache Beam in your environment.&lt;/li>
&lt;li>&lt;a href="https://www.cloudskillsboost.google/course_templates/724?qlcampaign=3l-event-90" target="_blank" rel="noopener">Beam Quest&lt;/a> is a series of labs that teach you how to write and test Apache Beam pipelines. Each lab takes about 1.5 hours to complete. When you complete the quest, you’re granted a badge that you can use to show your Beam expertise.&lt;/li>
&lt;/ul>
&lt;p>For an extensive list of learning resources for Apache Beam, we recommend the &lt;a href="https://beam.apache.org/get-started/resources/learning-resources/" target="_blank" rel="noopener">learning resources&lt;/a> page on the Apache Beam website.&lt;/p></description></item><item><title>Code of Conduct</title><link>https://beamcollege.dev/coc/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/coc/</guid><description>&lt;h1 id="contributor-covenant-code-of-conduct">Contributor Covenant Code of Conduct&lt;/h1>
&lt;h2 id="our-pledge">Our Pledge&lt;/h2>
&lt;p>We as members, contributors, and leaders pledge to make participation in our&lt;br>
community a harassment-free experience for everyone, regardless of age, body&lt;br>
size, visible or invisible disability, ethnicity, sex characteristics, gender&lt;br>
identity and expression, level of experience, education, socio-economic status,&lt;br>
nationality, personal appearance, race, caste, color, religion, or sexual&lt;br>
identity and orientation.&lt;/p>
&lt;p>We pledge to act and interact in ways that contribute to an open, welcoming,&lt;br>
diverse, inclusive, and healthy community.&lt;/p></description></item><item><title>Contact</title><link>https://beamcollege.dev/contact/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/contact/</guid><description>&lt;h1 id="contact">Contact&lt;/h1>
&lt;p>Hello. :)&lt;/p></description></item><item><title>Frequently Asked Questions</title><link>https://beamcollege.dev/faq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/faq/</guid><description>&lt;p>Frequently asked questions about Beam College.&lt;/p></description></item><item><title>Previous Instructors</title><link>https://beamcollege.dev/instructors/archive/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/instructors/archive/</guid><description>&lt;h3 id="meet-our-archive-of-instructors-from-all-editions-of-beam-college">Meet our archive of instructors from all editions of Beam College.&lt;/h3></description></item><item><title>Privacy Policy</title><link>https://beamcollege.dev/privacy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/privacy/</guid><description>&lt;p>Beam College collects some personal data from its users. We take great care in collecting only the minimum data that we need to provide the service and to manage this data adequately. This document provides more details about the data we gather and how it is used.&lt;/p>
&lt;h4 id="data-controller">Data controller&lt;/h4>
&lt;p>The legal entity responsible for Beam College is:&lt;/p>
&lt;p>Nearshore Link, Inc.&lt;br>
2940 Thousand Oaks Dr.&lt;br>
Austin, TX 78746&lt;/p>
&lt;p>Contact: Pedro Galvan &lt;a href="mailto:pedro@sg.com.mx">pedro@sg.com.mx&lt;/a>&lt;/p></description></item><item><title>Register</title><link>https://beamcollege.dev/register/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/register/</guid><description>&lt;iframe height="804" width="100%" src="https://us.airmeet.com/widgets/event/32d2ede0-230c-11f1-b849-9d0f1f2635e6/embedded-registration?v=2&amp;backgroundColor=9580ff&amp;textColor=ffffff&amp;buttonColor=9580ff&amp;isLightAmbience=false&amp;bgType=gradient&amp;communityId=764624fc-d051-4c74-b573-a1a50759e3c1&amp;title=Registration+form&amp;successMsg=A+confirmation+email+with+the+event+access+link+has+been+sent+to+the+registered+email+ID.+You+will+need+this+link+to+enter+the+event." frameborder="0">&lt;/iframe></description></item><item><title>Schedule</title><link>https://beamcollege.dev/schedule/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/schedule/</guid><description>&lt;h3 id="beam-college-2026-will-be-on-april-21-23-2026">Beam College 2026 will be on April 21-23, 2026.&lt;/h3>
&lt;p>All times in UTC. Click on the calendar link to add to your calendar in your local time.&lt;/p></description></item><item><title>Terms and Conditions</title><link>https://beamcollege.dev/terms-conditions/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://beamcollege.dev/terms-conditions/</guid><description>&lt;p>&lt;em>Version 0.9. March 12th, 2025.&lt;/em>&lt;/p>
&lt;p>By registering for this Hackathon event or participating in any way, you fully and unconditionally agree to comply with all of the terms and conditions below. If you do not agree with any of these terms and conditions, do not register for and participate in this event and do not submit an entry.&lt;/p>
&lt;p>This event is aimed at promoting the development of solutions that involve the use of Apache Beam by giving awards to the winner/s as consideration of his/her/their works.&lt;/p></description></item></channel></rss>