<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[StreamNative - Medium]]></title>
        <description><![CDATA[Cloud-Native Event Streaming powered by Apache Pulsar for enterprises - Medium]]></description>
        <link>https://medium.com/streamnative?source=rss----ab76d1bbc527---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>StreamNative - Medium</title>
            <link>https://medium.com/streamnative?source=rss----ab76d1bbc527---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 10 Apr 2026 13:23:39 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/streamnative" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[StreamNative: Enabling Real-time Messaging and Streaming for the Cloud]]></title>
            <link>https://medium.com/streamnative/streamnative-enabling-real-time-messaging-and-streaming-for-the-cloud-7e7ef46fe7c2?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/7e7ef46fe7c2</guid>
            <category><![CDATA[cloud]]></category>
            <category><![CDATA[pulsar]]></category>
            <category><![CDATA[streamnative]]></category>
            <dc:creator><![CDATA[Xiaofeng Liu]]></dc:creator>
            <pubDate>Fri, 24 Sep 2021 07:43:20 GMT</pubDate>
            <atom:updated>2021-09-24T07:43:19.925Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*W--ObvGE_SLfxgCI.png" /></figure><p>As the CEO of StreamNative, it’s been an exciting ride so far, and today we are proud to share that we’ve <a href="https://www.prnewswire.com/news-releases/original-creators-of-apache-pulsar-raise-23m-series-a-for-streamnative-round-led-by-prosperity7-ventures-301375962.html">raised a $23M series A round</a>. For us, this funding underscores the increased adoption of StreamNative and Apache Pulsar that we are seeing in the market and a bright future ahead.</p><p>In celebrating this milestone, we’d like to look back at how our Pulsar journey began. More than 10 years ago, Matteo and I were at Yahoo! working to develop a consolidated messaging platform that connected all the popular Yahoo! applications, including Yahoo! Finance, Yahoo! Mail, Yahoo! Sports, Flickr, and more, to data. At the time we looked at the existing messaging and streaming technologies, but they were not able to provide the scalability, reliability, and features needed to meet modern architecture and application requirements.</p><p>The team at Yahoo! set out to build a cloud-native messaging service that would work for the global enterprise. We built Pulsar from the ground up to handle millions of topics and partitions with full support for geo-replication and multi-tenancy. Pulsar was open sourced by Yahoo! in 2016 and became a top-level Apache Software Foundation project in 2018.</p><p>Over the past several years there has been a huge market shift from applications and traditional services using monolithic messaging services — either running on-premises, or simply ported to the cloud — to truly cloud-native applications designed to leverage the cloud and Kubernetes. This shift to the cloud and containers has amplified the spotlight on Apache Pulsar.</p><p>Apache Pulsar is unique in that it provides an all-in-one platform with unified messaging and streaming capabilities built for the cloud. Think of it as the combination of Kafka (streaming only) and RabbitMQ (messaging only), designed for multi-tenancy and containers.</p><p>At StreamNative, we work to help organizations around the globe successfully adopt Pulsar. StreamNative builds upon the powerful Apache Pulsar platform with two product offerings, StreamNative Cloud and StreamNative Platform, described below:</p><ol><li>StreamNative Cloud provides Apache Pulsar-as-a-service and delivers a resilient and scalable messaging and event streaming managed service deployable in minutes (alleviating the need to spend time or resources to deploy, upgrade, or maintain clusters).</li><li>StreamNative Platform is a self-managed cloud-native offering that completes Apache Pulsar, providing a distribution of Pulsar with advanced capabilities to help accelerate real-time application development and to simplify enterprise operations at scale.</li></ol><p>We’re excited to see the growth in the Apache Pulsar and StreamNative communities. When we started this journey Kafka was the dominant player in the space, but Pulsar’s rapid adoption since it became a top-level Apache Software Foundation project in 2018 has been remarkable.</p><p>In fact, the number of monthly active Apache Pulsar contributors surpassed Apache Kafka recently (see graph below)!
Many have adopted Apache Pulsar because it offers the potential for faster throughput and lower latency than Apache Kafka, along with a compatible API that allows developers to switch from Kafka to Pulsar with relative ease.</p><p>We are proud to continue to support the Apache Pulsar community through events, training, project updates, and project contributions. In fact, members of the StreamNative team often represent more than half of the monthly Pulsar contributors. We also play a key role sponsoring and hosting the global Pulsar Summits (the next being <a href="https://pulsar-summit.org/">Pulsar Summit Europe 2021 in October</a>).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*lZKTRjX718flUmgS.png" /></figure><p>What’s next? StreamNative will continue to focus on advancing the state of the art in streaming, stream storage, and messaging technologies. From real-time microservices that use Pulsar’s pub/sub features and streaming storage for real-time analytics to infinite storage for deep analysis, we’re all about innovating on Pulsar’s flexible architecture and industry-leading feature set to deliver new capabilities.</p><p>And, we’re hiring! We’re growing our global staff across all departments to accelerate product development, ecosystem expansion, and customer acquisition. If you’re interested in joining the StreamNative team and building a platform based on Apache Pulsar to enable companies to manage the entire lifecycle of data, <a href="https://streamnative.io/en/contact/">contact us</a>.</p><h3>About the CEO</h3><p>Sijie’s journey with Apache Pulsar began at Yahoo!, where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and a PMC member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California. You can follow him on <a href="https://twitter.com/sijieg">Twitter</a>.</p><h3>About the CTO</h3><p>Matteo is the CTO at StreamNative, where he brings rich experience in distributed pub-sub messaging platforms. Matteo was one of the co-creators of Apache Pulsar during his time at Yahoo!. Matteo and Sijie worked to create a global, distributed messaging system for Yahoo!, which would later become Apache Pulsar. Matteo then co-founded Streamlio with Sijie, and later served as Senior Principal Software Engineer at Splunk post-acquisition. Matteo is the PMC Chair of Apache Pulsar, where he helps to guide the community and ensure the success of the Pulsar project. He is also a PMC member of Apache BookKeeper. Matteo lives in Menlo Park, California. You can follow him on <a href="https://twitter.com/merlimat">Twitter</a>.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Speakers Announced for Pulsar Virtual Summit Europe 2021]]></title>
            <link>https://medium.com/streamnative/speakers-announced-for-pulsar-virtual-summit-europe-2021-3e24b52bac7d?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/3e24b52bac7d</guid>
            <category><![CDATA[messaging]]></category>
            <category><![CDATA[open-source]]></category>
            <category><![CDATA[pulsar-summit]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[events]]></category>
            <dc:creator><![CDATA[Xiaofeng Liu]]></dc:creator>
            <pubDate>Wed, 15 Sep 2021 04:57:07 GMT</pubDate>
            <atom:updated>2021-09-15T04:57:07.837Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*VcavLrqkdQT6ZH0R.png" /></figure><p>The first-ever <a href="https://pulsar-summit.org/en/event/europe-2021">Pulsar Virtual Summit Europe</a> is just one month away! Co-hosted by StreamNative and Clever Cloud, this event will be held online on October 6th at 12:00 PM CEST.</p><p>The Pulsar Summit offers a unique opportunity for engineers, architects, data scientists, and technical leaders interested in Pulsar and the messaging and streaming ecosystem to learn and network. Since 2020, the Pulsar Summits have drawn more than 100 speakers, thousands of attendees, and hundreds of companies globally.</p><p>The speaker committee for the Pulsar Summit Europe 2021 includes Apache Pulsar PMC members Matteo Merli from StreamNative, Jerry Peng from Splunk, and Rajan Dhabalia from Verizon Media. Additionally, Till Rohrmann from Ververica, Karthik Ramasamy from Splunk, Addison Higham from StreamNative, and Ricardo Ferreira from Elastic will be participating.</p><p>Featured speakers include engineers, developer advocates, and technical leaders from the Apache Pulsar PMC, Clever Cloud, Databricks, StreamNative, Elastic, DataStax, Flipkart, Zilliz, Tencent, JAMPP, and Softtech.</p><p><a href="https://hopin.com/events/pulsar-summit-europe-2021">Register today</a> and learn about the latest Pulsar project updates, technology deep dives, use cases, and ecosystem developments!</p><h3>Featured Sessions</h3><p>The Pulsar Virtual Summit Europe 2021 will feature 3 keynotes and 12 breakout sessions. Below is a sneak peek into some of the featured breakout sessions.</p><h4>1. Tracking Apache Pulsar Messages with Apache SkyWalking</h4><p>Presented by Penghui Li, Apache Pulsar PMC Member and Software Engineer at StreamNative</p><p>Apache SkyWalking is a popular application performance monitoring tool for distributed systems, specially designed for microservices, cloud-native, and container-based (Docker, K8s) architectures. In this talk, the speaker will walk you through the features of Apache SkyWalking and Pulsar, and demo how to track Pulsar messages with SkyWalking to troubleshoot issues related to message publishing and receiving.</p><h4>2. Log System as Backbone–How We Built the World’s Most Advanced Vector Database on Pulsar</h4><p>Presented by Xiaofan Luan, Partner and Director of Engineering at Zilliz</p><p>Milvus is an open-source vector database for building and managing vector similarity search applications. It has been adopted in production by thousands of companies, including Lucidworks, Shutterstock, and Cloudinary. In this talk, Xiaofan Luan will share with you how the community built Milvus 2.0, a cloud-native, highly scalable, and extensible vector similarity solution, on Pulsar.</p><h4>3. Writing Custom Sink Connectors for Pulsar I/O</h4><p>Presented by Ricardo Ferreira, Principal Developer Advocate at Elastic</p><p>In this talk, Ricardo Ferreira will show you how to write and deploy custom sink connectors for Pulsar I/O that work just like the built-in ones. He will also discuss some of the design decisions that your custom connectors may need to address.</p><h4>4. Pulsar Watermarking</h4><p>Presented by Eron Wright, Cloud Engineering Lead at StreamNative</p><p>The goal of the Pulsar Watermarking project is to simplify and improve the correctness of stream processing applications.
In this session, Eron Wright will do a technical deep dive into the Apache Pulsar community’s plan to support event-time watermarking in a Pulsar topic.</p><h4>5. Application of Apache Pulsar in Tencent Billing and Tencent Advertising</h4><p>Presented by Mingyu Bao, Senior Engineer at Tencent</p><p>Mingyu Bao will provide a behind-the-scenes look at Tencent’s adoption of Pulsar for its billing and advertising use cases and share some of the challenges involved. He will also discuss the adaptations and improvements Tencent made with Pulsar in order to meet their performance and operations requirements.</p><h3>Register Now</h3><p>Don’t miss this opportunity to learn from top Pulsar thought leaders. <a href="https://hopin.com/events/pulsar-summit-europe-2021">Register now</a> to participate and connect with the Pulsar community at the summit. Check out <a href="https://pulsar-summit.org/en/event/europe-2021/schedule/first-day">the full schedule</a> for more details.</p><h3>About the Author</h3><p>Alice Bi is a content strategist at StreamNative. She has experience with digital marketing, UX design, and communication research. Alice is based in Los Angeles, California.</p><p>This post was originally published on the <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exactly-Once Semantics with Transactions in Pulsar]]></title>
            <link>https://medium.com/streamnative/exactly-once-semantics-with-transactions-in-pulsar-40eb851db93c?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/40eb851db93c</guid>
            <category><![CDATA[transactions]]></category>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[exactlyonce]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Thu, 29 Jul 2021 16:02:38 GMT</pubDate>
            <atom:updated>2021-07-29T16:02:37.498Z</atom:updated>
<content:encoded><![CDATA[<p>We have hit an exciting milestone for the Apache Pulsar community: exactly-once semantics. As part of the Pulsar 2.8 release, we have evolved exactly-once semantics from <a href="https://github.com/apache/pulsar/wiki/PIP-6:-Guaranteed-Message-Deduplication">guaranteed message deduplication</a> on a single topic to atomic produce and acknowledgement over multiple topics via the Transaction API. In this post, I’ll explain what this means, how we made this evolution, and how the transaction features in Pulsar simplify exactly-once semantics for building messaging and streaming applications.</p><p>Before diving into the transaction features, let’s get started with an overview of messaging semantics.</p><h3>What is exactly-once semantics?</h3><p>In any distributed system, the machines that form the system can always fail independently of one another. In Apache Pulsar, an individual broker or bookie can crash, or a network failure can happen while the producer is producing a message to a topic. Depending on how the producer handles such a failure, the application can get one of three different semantics.</p><h4>At-least-once Semantics</h4><p>If the producer receives an acknowledgement (ACK) from the Pulsar broker, it means that the message has been written to the Pulsar topic. However, if a producer times out on receiving an acknowledgement or receives an error from the Pulsar broker, it might retry sending the message to the Pulsar topic. If the broker had failed right before it sent the ACK but after the message was successfully written to the Pulsar topic, this reattempt leads to the message being written twice and delivered more than once to the consumers.</p><h4>At-most-once Semantics</h4><p>If the producer does not attempt to produce the message again when it times out on receiving an acknowledgement or receives an error, then the message might end up not being written to the Pulsar topic, and never delivered to the consumers. In some cases, in order to avoid the possibility of duplication, we accept that messages may not be written.</p><h4>Exactly-once Semantics</h4><p>Exactly-once semantics guarantees that even if a producer retries sending a message multiple times, the message will be written exactly once to the Pulsar topic. Exactly-once semantics is the most desirable guarantee, but also one that is not well understood. It requires coordination between the messaging system itself and the application producing and consuming the messages. For example, if after consuming and acknowledging a message successfully, your application rewinds the subscription to a previous message ID, your application will receive all the messages from that message ID to the latest one, all over again.</p><h3>Challenges in supporting exactly-once semantics</h3><p>Supporting exactly-once delivery semantics in messaging systems presents some challenges. To describe them, I’ll start with a simple example.</p><p>Suppose there is a producer that sends a message “Hello StreamNative” to a Pulsar topic called “Greetings”. Further suppose a consumer on the other end receives messages from the topic and prints them. On the happy path where there are no failures, this works well, and the message “Hello StreamNative” is written to the “Greetings” topic only once. The consumer receives the message, processes it, and acknowledges it to indicate that it has completed its processing.
The consumer will not receive the message again, even if the consumer application crashes and restarts.</p><p>However, at scale, failure scenarios can happen all the time.</p><h4>A bookie can fail</h4><p>Pulsar stores messages in BookKeeper. BookKeeper is a highly available, durable log storage service where data written to a ledger (a segment of a Pulsar topic) is persisted and replicated multiple times (n copies). As a result, BookKeeper can tolerate n-1 bookie failures, meaning that a ledger is available as long as there is at least one bookie available. Inherited from Zab/Paxos, BookKeeper’s replication protocol guarantees that once the data has been successfully written to a quorum of bookies, the data is permanently stored and will be replicated to all bookies within the same ensemble.</p><h4>A broker can fail or the producer-to-broker connection can fail</h4><p>Durability in Pulsar depends on the producer receiving an ACK from the Pulsar broker. Failure to receive that ACK does not necessarily mean that the produce request itself failed. The broker can crash after writing a message but before it sends an ACK back to the producer. It can also crash before even writing the message to the topic. Since there is no way for the producer to know the nature of the failure, it is forced to assume that the message was not written successfully and to retry it. In some cases, the same message is duplicated in the Pulsar topic, causing the consumers to receive it more than once.</p><h4>The Pulsar client can fail</h4><p>Exactly-once delivery must account for client failures as well. But it is also hard to tell whether a client has actually failed or is just temporarily partitioned from the Pulsar brokers or undergoing an application pause. Having the ability to distinguish between a permanent failure and a soft one is important. The Pulsar broker should discard messages sent by a zombie producer, and likewise for the consumer. Once a new client has been restarted, it must be able to recover from whatever state the previous failed client left behind and begin processing from a safe point.</p><p>The Pulsar community completed support for exactly-once semantics in steps. We first introduced the Idempotent Producer to support exactly-once semantics on a single topic in the Pulsar 1.20.0-incubating release, and then completed the vision by introducing the Transaction API to provide atomicity across multiple topics in the recent 2.8.0 release.</p><h3>Idempotent producer: exactly-once semantics on a single topic</h3><p>We started the journey of supporting exactly-once semantics in Pulsar by introducing the Idempotent Producer in the 1.20.0-incubating release.</p><p>What does Idempotent Producer mean? An idempotent operation can be performed once or many times without causing a different result. If Guaranteed Message Deduplication is enabled at the cluster level or the namespace level and a producer is configured to be an Idempotent Producer, the produce requests are idempotent. In the event of an error that causes a producer to retry, a message sent by the producer multiple times is guaranteed to be written to the Pulsar topic only once by the broker.</p><p>To turn on this feature and get exactly-once semantics per partition — meaning no duplicates, no data loss, and in-order semantics — configure the following:</p><ul><li>Enable message deduplication for all namespaces/topics at the cluster level, or for a specific namespace at the namespace policy level, or for a specific topic at the topic policy level</li><li>Specify a name for the producer and set the message timeout to 0</li></ul>
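<p>For illustration, here is a minimal sketch of that configuration; the namespace, topic, and producer names are assumptions, not from the original post:</p><pre>$ bin/pulsar-admin namespaces set-deduplication public/default --enable</pre><pre>Producer&lt;byte[]&gt; producer = pulsarClient.newProducer()<br>        .topic(&quot;my-topic&quot;)<br>        .producerName(&quot;my-idempotent-producer&quot;) // a stable name lets the broker track sequence IDs<br>        .sendTimeout(0, TimeUnit.SECONDS)        // disable the send timeout<br>        .create();</pre>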
<p>How does this feature work? Under the hood, it works in a way very similar to TCP: each message produced to Pulsar contains a sequence ID that the Pulsar broker uses to dedupe any duplicated message. However, unlike TCP, which provides guarantees only within a transient connection, this sequence ID is persisted to the Pulsar topic along with the message, and the Pulsar broker keeps track of the last received sequence ID. So even if the Pulsar broker fails, any broker that takes over the topic ownership will also know whether a message is duplicated. The overhead of this mechanism is very low, adding negligible performance overhead over the non-idempotent producer.</p><p>You can try out this feature in any Pulsar version newer than 1.20.0-incubating by following the tutorial <a href="http://pulsar.apache.org/docs/en/cookbooks-deduplication/">here</a>.</p><p>While powerful, the Idempotent Producer only solves a narrow set of challenges for exactly-once semantics. There are still many other challenges it doesn’t resolve. For example, there is no atomicity when a producer attempts to produce messages to multiple topics. A publish error can occur when the broker serving one of the topics crashes. If the producer doesn’t retry publishing the message, it results in some messages being persisted once and others being lost. If the producer retries, it results in some messages being persisted multiple times.</p><p>On the consumer side, message acknowledgement was a best-effort operation. Message ACKs could be lost because the consumer has no way of knowing whether the broker has received them and will not retry sending them. This then results in consumers receiving duplicate messages.</p><h3>Transactions: atomic writes and acknowledgments across multiple topics</h3><p>To address the remaining challenges described above, we’ve strengthened Pulsar’s delivery semantics by introducing a Pulsar Transaction API to support atomic writes and acknowledgments across multiple topics. This allows a producer to send a batch of messages to multiple topics such that either all messages in the batch are eventually visible to any consumer or none are ever visible to consumers. It also allows you to acknowledge the messages you have processed across multiple topics in the same transaction, thereby enabling end-to-end exactly-once semantics.</p><p>Here is an example code snippet to demonstrate the use of the Transaction API:</p><pre>PulsarClient pulsarClient = PulsarClient.builder()<br>        .serviceUrl(&quot;pulsar://localhost:6650&quot;)<br>        .enableTransaction(true)<br>        .build();</pre><pre>Transaction txn = pulsarClient<br>        .newTransaction()<br>        .withTransactionTimeout(1, TimeUnit.MINUTES)<br>        .build()<br>        .get();</pre><pre>producer.newMessage(txn).value(&quot;Hello Pulsar Transaction&quot;.getBytes()).send();</pre><pre>Message&lt;byte[]&gt; message = consumer.receive();<br>consumer.acknowledge(message.getMessageId(), txn);</pre><pre>txn.commit().get();</pre><p>The code example above describes how you can use the new producer API with the Transaction API to send messages atomically to a set of topics, and how to use the new consumer API with transactions to acknowledge the processed messages in the same transaction.</p>
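<p>For completeness, the snippet assumes that the producer and consumer were created from the same transaction-enabled client. A minimal sketch of that setup (the topic and subscription names here are illustrative) might look like the following; note that a producer used with transactions is expected to have its send timeout disabled:</p><pre>Producer&lt;byte[]&gt; producer = pulsarClient.newProducer()<br>        .topic(&quot;my-output-topic&quot;)<br>        .sendTimeout(0, TimeUnit.SECONDS)<br>        .create();</pre><pre>Consumer&lt;byte[]&gt; consumer = pulsarClient.newConsumer()<br>        .topic(&quot;my-input-topic&quot;)<br>        .subscriptionName(&quot;my-subscription&quot;)<br>        .subscribe();</pre>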
<p>It is worth noting that:</p><ul><li>A Pulsar topic might have some messages that are part of a transaction while others are not.</li><li>A Pulsar client can have multiple concurrent transactions outstanding. This design is fundamentally different from the transaction implementations in older messaging systems, and results in much higher throughput.</li><li>The current Pulsar Transaction API only supports the READ_COMMITTED isolation level. The consumer can only read messages that are not part of a transaction and messages that are part of a committed transaction. Messages produced in an aborted transaction are not delivered to any consumers.</li></ul><p>Beyond enabling transactions on the client, you don’t need any additional settings to use the Transaction API.</p><h3>End-to-end exactly-once stream processing made simple: a Pulsar+Flink example</h3><p>Exactly-once stream processing is now possible through the Pulsar Transaction API.</p><p>One of the most critical questions for a stream processing system is, “Does my stream processing application get the right answer, even if one of the instances crashes in the middle of processing?” The key, when recovering a failed instance, is to resume processing in exactly the same state as before the crash.</p><p>Stream processing on Apache Pulsar is a read-process-write operation on Pulsar topics. A source operator that runs a Pulsar consumer reads messages from one or multiple Pulsar topics, some processing operators transform the messages or modify the state maintained by them, and a sink operator that runs a Pulsar producer writes the resulting messages to another Pulsar topic. Exactly-once stream processing is simply the ability to execute a read-process-write operation exactly once. In such a context, “getting the right answer” means not missing any input messages from the source operator or producing any duplicates to the sink operator. This is the behavior users expect from an exactly-once stream processor.</p><p>Let’s take the Pulsar and Flink integration as an example.</p><p>Prior to Pulsar 2.8.0, the Pulsar and Flink integration only supported an exactly-once source connector and an at-least-once sink connector.
That means if you want to use Flink to build stream applications with Apache Pulsar, the highest processing guarantee you can get end-to-end is at-least-once — the resulting messages from these streaming applications may be produced multiple times to the resulting topic in Pulsar.</p><p>With the introduction of Pulsar Transaction in 2.8.0, the Pulsar-Flink sink connector can be easily enhanced to support exactly-once semantics. Because Flink uses a two-phase commit protocol to ensure end-to-end exactly-once semantics, we can implement the designated TwoPhaseCommitSinkFunction and hook the Flink sink message lifecycle up to the Pulsar Transaction API. When the Pulsar-Flink sink connector calls beginTransaction, it starts a Pulsar transaction and obtains the transaction ID. All the subsequent messages written to the sink connector will be associated with this transaction ID. They will be flushed to Pulsar when the connector calls preCommit. The Pulsar transaction will then be committed or aborted when the connector calls recoverAndCommit or recoverAndAbort, respectively. The integration is very straightforward: the connector just has to persist the transaction ID together with Flink checkpoints so that the transaction ID can be retrieved for committing or aborting.</p>
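<p>As a rough illustration of that hookup — a sketch under simplifying assumptions, not the actual connector code — the skeleton below maps Flink’s two-phase commit callbacks onto the Pulsar Transaction API. Serializer setup, client and producer creation in open(), and recovering a transaction handle from a checkpointed transaction ID are all omitted:</p><pre>class PulsarTransactionalSink extends TwoPhaseCommitSinkFunction&lt;byte[], Transaction, Void&gt; {<br>    private transient PulsarClient client;   // created in open(), omitted here<br>    private transient Producer&lt;byte[]&gt; producer;<br><br>    PulsarTransactionalSink(TypeSerializer&lt;Transaction&gt; txnSerializer,<br>                            TypeSerializer&lt;Void&gt; ctxSerializer) {<br>        super(txnSerializer, ctxSerializer);<br>    }<br><br>    @Override<br>    protected Transaction beginTransaction() throws Exception {<br>        // One Pulsar transaction per checkpoint interval<br>        return client.newTransaction()<br>                .withTransactionTimeout(5, TimeUnit.MINUTES)<br>                .build().get();<br>    }<br><br>    @Override<br>    protected void invoke(Transaction txn, byte[] value, Context ctx) throws Exception {<br>        // Every record written between checkpoints joins the open transaction<br>        producer.newMessage(txn).value(value).sendAsync();<br>    }<br><br>    @Override<br>    protected void preCommit(Transaction txn) throws Exception {<br>        producer.flush(); // flush pending messages before the checkpoint completes<br>    }<br><br>    @Override<br>    protected void commit(Transaction txn) {<br>        txn.commit().join(); // make the messages visible to consumers<br>    }<br><br>    @Override<br>    protected void abort(Transaction txn) {<br>        txn.abort().join(); // discard the messages of a failed checkpoint<br>    }<br>}</pre>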
<p>Based on the idempotency and atomicity provided by Pulsar Transactions and the globally consistent checkpoint algorithm offered by Apache Flink, streaming applications built on Pulsar and Flink can easily achieve end-to-end exactly-once semantics.</p><h3>Where to go from here</h3><p>Exactly-once semantics via the Transaction API is now supported in <a href="https://console.streamnative.cloud">StreamNative Cloud</a> as well as in <a href="https://streamnative.io/en/platform">StreamNative Platform</a> v1.0 and later. If you’d like to understand the exactly-once guarantees in more detail, I’d recommend checking out <a href="https://github.com/apache/pulsar/wiki/PIP-31%3A-Transaction-Support">PIP-31</a> for the transaction feature. If you’d like to dive deeper into the detailed design, this <a href="https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx">design document</a> is worth reading.</p><p>This post primarily focuses on describing the nature of the user-facing guarantees supported by the Transaction API introduced in Apache Pulsar 2.8.0, and how you can use this feature. In our next post, we will go into more detail about the API and design.</p><p>If you want to put the new Transaction API to practical use, check out <a href="https://console.streamnative.cloud">StreamNative Cloud</a> or download <a href="https://streamnative.io/en/platform">StreamNative Platform</a> 1.0 to create your own applications with Pulsar Java clients.</p><p>My colleagues Sijie Guo and Addison Higham are going to give a presentation, “<a href="https://www.na2021.pulsar-summit.org/exactly-once-made-easy-transactional-messaging-in-apache-pulsar">Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar</a>”, at the upcoming <a href="https://www.na2021.pulsar-summit.org/">Pulsar Virtual Summit North America 2021</a> on June 16–17. If you are interested in this topic, <a href="https://hopin.com/events/pulsar-summit-north-america-2021">reserve your spot</a> today and listen to them dive into every detail of Pulsar Transactions.</p><h3>Credits</h3><p>An amazing team of Pulsar committers and contributors worked for over a year to bring this exactly-once work to Pulsar. Thanks to everyone who has been involved in this feature development: Penghui Li, Ran Gao, Bo Cong, Addison Higham, Jia Zhai, Yong Zhang, Xiaolong Ran, Matteo Merli, and Sijie Guo.</p><h3>About the Author</h3><p><strong>Penghui Li</strong> is a PMC member of Apache Pulsar and a tech lead at StreamNative. Previously, he worked at Zhaopin.com, where he led the adoption of Pulsar. His career has centered on messaging, from messaging systems through microservices to his current work with Pulsar. You can follow him on <a href="https://twitter.com/lipenghui6">Twitter</a>.</p><p>This post was originally published on the <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Apache Pulsar Launches 2.8: Unified Messaging and Streaming With Transactions]]></title>
            <link>https://medium.com/streamnative/apache-pulsar-launches-2-8-unified-messaging-and-streaming-with-transactions-37dad479cba1?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/37dad479cba1</guid>
            <category><![CDATA[transactions]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[release-notes]]></category>
            <category><![CDATA[tech-blog]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Fri, 16 Jul 2021 16:02:37 GMT</pubDate>
            <atom:updated>2021-07-16T16:02:36.151Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*90me7QXXbP8TxPkZ.png" /></figure><h3>An Overview of the 2.8.0 Release</h3><p>Today, the Apache Pulsar Project Management Committee announced the release of Apache Pulsar 2.8.0, which includes a number of exciting upgrades and enhancements. This blog provides a deep dive into the updates from the 2.8.0 release as well as a detailed look at the major Pulsar developments that have helped it evolve into the unified messaging and streaming platform it is today.</p><p>Note: The Pulsar community typically puts out a major release every 3 months, but it has been 6 months since the release of 2.7.0. We spent more time on 2.8.0 in order to make the transaction API generally available to the Pulsar community.</p><h3>Release 2.8 Overview</h3><p>The key features and updates in this release are:</p><ul><li>Exclusive Producer</li><li>Package Management API</li><li>Simplified Client Memory Limit Settings</li><li>Broker Entry Metadata</li><li>New Protobuf Code Generator</li><li>Transactions</li></ul><h4>Exclusive Producer</h4><p>By default, the Pulsar producer API provides a “multi-writer” semantic to append messages to a topic. However, there are several use cases that require exclusive access for a single writer, such as ensuring a linear non-interleaved history of messages or providing a mechanism for leader election.</p><p>This new feature allows applications to require exclusive producer access in order to achieve a “single-writer” situation. It guarantees that there is a single writer under any combination of failures. If the producer loses its exclusive access, no more messages from it can be published on the topic.</p><p>One use case for this feature is the metadata controller in Pulsar Functions. In order to write a single linear history of all the functions metadata updates, the metadata controller requires that one leader be elected and that all the “decisions” made by this leader be written to the metadata topic. By leveraging the exclusive producer feature, Pulsar guarantees that the metadata topic contains different segments of updates, one for each successive leader, and there is no interleaving across different leaders. See “<a href="https://github.com/apache/pulsar/wiki/PIP-68%3A-Exclusive-Producer">PIP-68: Exclusive Producer</a>” for more details.</p>
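<p>A minimal sketch of requesting exclusive access (the topic name here is illustrative) might look like this:</p><pre>// Creation fails with a ProducerFencedException if another producer<br>// already holds exclusive access to the topic<br>Producer&lt;byte[]&gt; producer = client.newProducer()<br>        .topic(&quot;functions-metadata-topic&quot;)<br>        .accessMode(ProducerAccessMode.Exclusive)<br>        .create();</pre>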
<h4>Package Management API</h4><p>Since its introduction in version 2.0, the Functions API has become hugely popular among Pulsar users. While it offers many benefits, there are a number of ways to improve the user experience. For example, today, if a function is deployed multiple times, the function package ends up being uploaded multiple times. Also, there is no version management in Pulsar for Functions and IO connectors. The newly introduced package management API provides an easier way to manage the packages for Functions and IO connectors and significantly simplifies the upgrade and rollback processes. Read “<a href="http://pulsar.apache.org/docs/en/admin-api-packages/">Package Management API</a>” for more details.</p><h4>Simplified Client Memory Limit Settings</h4><p>Prior to 2.8, there were multiple settings in producers and consumers that allowed controlling the sizes of the internal message queues. These settings ultimately controlled the amount of memory the Pulsar client used. However, there were a few issues with this approach that made it complicated to select an overall configuration controlling the total usage of memory.</p><p>For example, the settings were based on the “number of messages”, so the expected message size had to be adjusted per producer or consumer. If an application had a large (or unknown) number of producers or consumers, it was very difficult to select an appropriate value for the queue sizes. The same was true for topics with many partitions.</p><p>In 2.8, we introduced a new API to set the memory limit. This single memoryLimit setting specifies a maximum amount of memory for a given Pulsar client. The producers and consumers compete for the memory assigned, and the memory used by the Pulsar client is guaranteed not to go beyond the set limit. Read “<a href="https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits">PIP-74: Pulsar client memory limits</a>” for more details.</p>
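<p>A minimal sketch of the new setting (the limit value here is illustrative) might look like this:</p><pre>// One overall cap shared by all producers and consumers of this client<br>PulsarClient client = PulsarClient.builder()<br>        .serviceUrl(&quot;pulsar://localhost:6650&quot;)<br>        .memoryLimit(64, SizeUnit.MEGA_BYTES)<br>        .build();</pre>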
<h4>Broker Entry Metadata</h4><p>Pulsar messages define a very comprehensive set of metadata properties. However, to add a new property, the MessageMetadata definition in the Pulsar protocol must change to inform both the broker and the client of the newly introduced property.</p><p>In certain cases, however, a metadata property might need to be added on the broker side, or to be retrieved by the broker at a very low cost. To avoid deserializing such properties from the message metadata, we introduced “Broker Entry Metadata” in 2.8.0: a lightweight approach to adding metadata properties without serializing and deserializing the protobuf-encoded MessageMetadata.</p><p>This feature unblocks a new set of capabilities for Pulsar. For example, we can leverage broker entry metadata to generate a broker publish time for the messages appended to a Pulsar topic. Another example is generating a monotonically increasing sequence ID for messages produced to a Pulsar topic. We use this feature in Kafka-on-Pulsar to implement the Kafka offset.</p><h4>New Protobuf Code Generator</h4><p>Pulsar uses Google Protobuf to perform serialization and deserialization of the commands that are exchanged between clients and brokers. Because of the overhead involved with the regular Protobuf implementation, we have been using a modified version of Protobuf 2.4.1. The modifications were made to produce more efficient serialization code that uses a thread-local cache for the objects involved.</p><p>This approach introduced a few issues. For example, the patch to the Protobuf code generator is based only on Protobuf version 2.4.1 and cannot be carried forward to newer Protobuf versions. In 2.8, we switched from the patched Protobuf 2.4.1 to Splunk’s LightProto as the code generator. The new code generator generates the fastest possible Java code for Protobuf SerDe, is 100% compatible with the proto2 definition and wire protocol, and provides zero-copy deserialization using Netty ByteBuf.</p><h4>Transactions</h4><p>Prior to Pulsar 2.8, Pulsar only supported exactly-once semantics on a single topic through the Idempotent Producer. While powerful, the Idempotent Producer only solves a narrow set of challenges for exactly-once semantics. For example, there is no atomicity when a producer attempts to produce messages to multiple topics. A publish error can occur when the broker serving one of the topics crashes. If the producer doesn’t retry publishing the message, it results in some messages being persisted once and others being lost. If the producer retries, it results in some messages being persisted multiple times.</p><p>In order to address the remaining challenges described above, we’ve strengthened Pulsar’s delivery semantics by introducing a Pulsar Transaction API to support atomic writes and acknowledgements across multiple topics. The addition of the Transaction API to Apache Pulsar completes our vision of making Pulsar a complete unified messaging and streaming platform.</p><p>Pulsar PMC member and StreamNative Engineering Lead Penghui Li goes over this functionality in great detail in his recent blog, Exactly-once Semantics with Transactions in Pulsar. You can read it to learn more about the <a href="https://streamnative.io/en/blog/release/2021-06-14-exactly-once-semantics-with-transactions-in-pulsar">exactly-once semantics support in Pulsar</a>.</p><h3>Building a Unified Messaging and Streaming Platform with Apache Pulsar</h3><h4>The Evolution of Apache Pulsar</h4><p>Apache Pulsar is widely adopted by hundreds of companies across the globe, including Splunk, Tencent, Verizon, and Yahoo! JAPAN, just to name a few. Born as a cloud-native distributed messaging system, Apache Pulsar has evolved into a complete messaging and streaming platform for publishing and subscribing, storing, and processing streams of data at scale and in real time.</p><p>Back in 2012, the Yahoo! team was looking for a global, geo-replicated infrastructure that could manage all of Yahoo!’s messaging data. After vetting the messaging and streaming landscape, it became clear that existing technologies could not serve the needs of an event-driven organization. As a result, the team at Yahoo! set out to build its own.</p><p>At the time, there were generally two types of systems to handle in-motion data: message queues that handled mission-critical business events in real time, and streaming systems that handled data pipelines at scale. Companies had to limit their capabilities to one or the other, or they had to adopt multiple different technologies. If they chose multiple technologies, they would end up with a complex infrastructure that often resulted in data segregation and data silos, with one silo for message queues used to build application services and the other silo for streaming systems used to build data services. The figure below illustrates what this can look like.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*k3MjAgSK51q-SbVV.png" /></figure><p>However, with the diversity of data that companies need to process beyond operational data (like log data, click events, etc.), coupled with the increase in the number of downstream systems that need access to combined business data and operational data, the system would need to support both message queuing and streaming.</p><p>Beyond that, companies needed an infrastructure platform that would allow them to build all of their applications on top of it, and then have those applications handle in-motion data (messaging and streaming data) by default. This way, real-time data infrastructure could be significantly simplified, as illustrated in the diagram below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*xmrtQII5uOEAEFTG.png" /></figure><p>With that vision, the Yahoo! team started working on building a unified messaging and streaming platform for in-motion data.
Below is an overview of the key milestones on the Pulsar journey, from inception to today.</p><h4>Step 1: Scalable storage for streams of data</h4><p>The journey of Pulsar began with Apache BookKeeper. Apache BookKeeper implements a log-like abstraction for continuous streams and provides the ability to run it at internet scale with simple write-read log APIs. A log provides a great abstraction for building distributed systems, such as distributed databases and pub-sub messaging. The write APIs take the form of appends to the log, and the read APIs take the form of continuous reads from a starting offset defined by the readers. The implementation of BookKeeper created the foundation — a scalable log-backed messaging and streaming system.</p><h4>Step 2: A multi-layered architecture that separates compute from storage</h4><p>On top of the scalable log storage, a serving layer was introduced that runs stateless brokers for publishing and consuming messages. This multi-layered architecture separates serving/compute from storage, allowing Pulsar to manage serving and storage in separate layers.</p><p>This architecture also ensures instant scalability and higher availability. Both of these factors are extremely important and make Pulsar well-suited for building mission-critical services, such as billing platforms for financial use cases, transaction processing systems for e-commerce and retailers, and real-time risk control systems for financial institutions.</p><h4>Step 3: Unified messaging model and API</h4><p>In a modern data architecture, real-time use cases typically fall into two categories: queuing and streaming. Queuing is typically used for building core business application services, while streaming is typically used for building real-time data services such as data pipelines.</p><p>Providing one platform able to serve both application and data services required a unified messaging model that integrates queuing and streaming semantics. Pulsar topics become the source of truth for consumption. Messages can be stored only once on topics, but can be consumed in different ways via different subscriptions. Such unification significantly reduces the complexity of managing and developing messaging and streaming applications.</p><h4>Step 4: Schema API</h4><p>Next, a new Pulsar schema registry and a new type-safe producer &amp; consumer API were added. The built-in schema registry enables message producers and consumers on Pulsar topics to coordinate on the structure of the topic’s data through the Pulsar broker itself, without needing an external coordination mechanism. With data schemas, every single piece of data traveling through Pulsar is completely discoverable, enabling you to build systems that can easily adapt as the data changes.</p><p>Furthermore, the schema registry keeps track of data compatibility between versions of the schema. As new schemas are uploaded, the registry ensures that new schema versions are able to be read by old consumers. This ensures that producers cannot break consumers.</p>
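<p>As an illustration (the User class and topic name here are assumptions, not from the original post), a type-safe producer and consumer built against the schema registry might look like this:</p><pre>// The client uploads the Avro schema for User; the broker checks compatibility<br>Producer&lt;User&gt; producer = client.newProducer(Schema.AVRO(User.class))<br>        .topic(&quot;user-events&quot;)<br>        .create();</pre><pre>Consumer&lt;User&gt; consumer = client.newConsumer(Schema.AVRO(User.class))<br>        .topic(&quot;user-events&quot;)<br>        .subscriptionName(&quot;user-sub&quot;)<br>        .subscribe();<br>User user = consumer.receive().getValue(); // already deserialized into the typed object</pre>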
<h4>Step 5: Functions and IO API</h4><p>The next step was to build APIs that made it easy to get data in and out of Pulsar and process it. The goal was to make it easy to build event-driven applications and real-time data pipelines with Apache Pulsar, so you can then process those events when they arrive, no matter where they originated.</p><p>The Pulsar IO API allows you to build real-time streaming data pipelines by plugging in various source connectors to get data from external systems into Pulsar and sink connectors to get data from Pulsar into external systems. Today, Pulsar provides several built-in connectors that you can use.</p><p>Additionally, StreamNative hosts StreamNative Hub (a registry of Pulsar connectors) that provides dozens of connectors integrated with popular data systems. If the IO API is for building streaming data pipelines, the Functions API is for building event-driven applications and real-time stream processors.</p><p>The team adopted serverless function concepts into stream processing and built the Functions API as a lightweight serverless library that lets you write event processing logic in any language you like. The underlying motivation was to enable your engineering team to write stream processing logic without the operational complexity of running and maintaining yet another cluster.</p>
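<p>To give a flavor of the programming model (this echo-style function is an illustrative sketch, not from the original post), a Pulsar Function in Java is just a class implementing a simple interface:</p><pre>import org.apache.pulsar.functions.api.Context;<br>import org.apache.pulsar.functions.api.Function;</pre><pre>// Receives each message from the input topics; the return value is<br>// published to the configured output topic<br>public class ExclamationFunction implements Function&lt;String, String&gt; {<br>    @Override<br>    public String process(String input, Context context) {<br>        return input + &quot;!&quot;;<br>    }<br>}</pre>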
<h4>Step 6: Infinite storage for Pulsar via Tiered Storage</h4><p>As adoption of Apache Pulsar continued and the amount of data stored in Pulsar increased, users eventually hit a “retention cliff”, at which point it became significantly more expensive to store, manage, and retrieve data in Apache BookKeeper. To work around this, operators and application developers typically use an external store like AWS S3 as a sink for long-term storage. This means you lose most of the benefits of Pulsar’s immutable stream and ordering semantics, and instead end up having to manage two different systems with different access patterns.</p><p>The introduction of Tiered Storage allows Pulsar to offload the majority of the data to remote cloud-native storage. This cheaper form of storage readily scales with the volume of data. More importantly, with the addition of Tiered Storage, Pulsar provides the batch storage capabilities needed to support batch processing when integrating with a unified batch and stream processor like Flink. The unified batch and stream processing capabilities integrated with Pulsar enable companies to query real-time streams with historical context quickly and easily, unlocking a unique competitive advantage.</p><h4>Step 7: Protocol Handler</h4><p>After introducing tiered storage, Pulsar evolved from a pub-sub messaging system into a scalable stream data system that can ingest, store, and process streams of data. However, existing applications written using other messaging protocols such as Kafka, AMQP, and MQTT had to be rewritten to adopt Pulsar’s messaging protocol.</p><p>The Protocol Handler API further reduces the overhead of adopting Pulsar for building messaging and streaming applications, and allows developers to extend Pulsar’s capabilities to other messaging domains by leveraging all the benefits provided by the Pulsar architecture. This resulted in major collaborations between StreamNative and other industry leaders to develop popular protocol handlers, including:</p><ul><li><a href="https://hub.streamnative.io/protocol-handlers/kop/0.2.0">Kafka-on-Pulsar (KoP)</a>, which was <a href="https://streamnative.io/en/blog/tech/2020-03-24-bring-native-kafka-protocol-support-to-apache-pulsar">launched in March 2020</a> by OVHCloud and StreamNative.</li><li><a href="https://hub.streamnative.io/protocol-handlers/aop/0.1.0">AMQP-on-Pulsar (AoP)</a>, which was <a href="https://streamnative.io/en/blog/tech/2020-06-15-announcing-aop-on-pulsar">announced in June 2020</a> by China Mobile and StreamNative.</li><li><a href="https://hub.streamnative.io/protocol-handlers/mop/0.2.0">MQTT-on-Pulsar (MoP)</a>, which was <a href="https://streamnative.io/en/blog/tech/2020-09-28-announcing-mqtt-on-pulsar">announced in August 2020</a> by StreamNative.</li><li><a href="https://github.com/streamnative/rop">RocketMQ-on-Pulsar (RoP)</a>, which was launched in May 2021 by Tencent Cloud and StreamNative.</li></ul><h4>Step 8: Transaction API for exactly-once stream processing</h4><p>More recently, transactions were added to Apache Pulsar to enable exactly-once semantics for stream processing. This is a fundamental feature that provides a strong guarantee for streaming data transformations, making it easy to build scalable, fault-tolerant, stateful messaging and streaming applications that process streams of data.</p><p>Furthermore, the Transaction API capabilities are not limited to a given language client. Pulsar’s support for transactional messaging and streaming is primarily a protocol-level capability that can be surfaced in any language, and such a protocol-level capability can be leveraged in all kinds of applications.</p><h3>Building an ecosystem for unified messaging and streaming</h3><p>In addition to contributing to the Pulsar technology, the community is also working to build a robust ecosystem to support it. Pulsar’s ability to support a rich ecosystem of pub-sub libraries, connectors, functions, protocol handlers, and integrations with popular query engines will enable Pulsar adopters to streamline workflows and achieve new use cases.</p><h3>What is Next?</h3><p>If you are interested in learning more about Pulsar 2.8.0, you can <a href="https://pulsar.apache.org/en/versions/">download 2.8.0</a> and try it out today!</p><p>If you want to learn more about how companies have adopted Pulsar, you can <a href="https://hopin.com/events/pulsar-summit-north-america-2021">sign up</a> for Pulsar Summit NA 2021!</p><p>For more information about the Apache Pulsar project and its progress, please visit the official website at <a href="https://pulsar.apache.org/">https://pulsar.apache.org</a> and follow the project on Twitter <a href="https://twitter.com/apache_pulsar">@apache_pulsar</a> or <a href="https://twitter.com/streamnativeio">@streamnativeio</a>.</p><h3>About the Author</h3><p><strong>Matteo Merli</strong> is the PMC Chair of Apache Pulsar and CTO at StreamNative. He co-created Pulsar while at Yahoo!, co-founded Streamlio, and is a committer and PMC member of Apache Pulsar and Apache BookKeeper. He has worked with The Apache Software Foundation for 4+ years and has rich experience in distributed pub-sub messaging platforms. You can follow him on <a href="https://twitter.com/merlimat">Twitter</a> and
<a href="https://www.linkedin.com/in/matteomerli/">LinkedIn</a>.</p><p><strong>Sijie Guo</strong> is the co-founder and CEO of StreamNative, which provides a cloud-native event streaming platform powered by Apache Pulsar. Sijie has worked on messaging and streaming data technologies for more than a decade. Prior to StreamNative, Sijie co-founded Streamlio, a company focused on real-time solutions. At Twitter, Sijie was the tech lead for the messaging infrastructure group, where he co-created DistributedLog and Twitter EventBus. Prior to that, he worked on the push notification infrastructure at Yahoo!, where he was one of the original developers of BookKeeper and Pulsar. He is also the VP of Apache BookKeeper and a PMC member of Apache Pulsar. You can follow him on <a href="https://twitter.com/sijieg">Twitter</a>.</p><p>This post was originally published on the <a href="http://pulsar.apache.org/blog/">Apache Pulsar blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Pulsar Isolation for Dummies: Separate Pulsar Clusters]]></title>
            <link>https://medium.com/streamnative/pulsar-isolation-for-dummies-separate-pulsar-clusters-ffa790d36ea5?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/ffa790d36ea5</guid>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[dummies]]></category>
            <category><![CDATA[separate]]></category>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[isolation]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Thu, 10 Jun 2021 16:02:24 GMT</pubDate>
            <atom:updated>2021-06-10T16:02:23.491Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/855/0*MOkKmrptRcmcRjqK.jpg" /></figure><p>This blog is for Pulsar users of all levels. If you follow the instructions in this blog, you will successfully configure isolation in Pulsar.</p><p>This is the second blog in the series on configuring isolation in <a href="https://pulsar.apache.org/docs/en/next/standalone/">Apache Pulsar</a>. The first blog, <a href="https://streamnative.io/en/blog/tech/2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar">Taking an In-Depth Look at How to Achieve Isolation in Pulsar</a>, explains how to use the following approaches to achieve isolation in Pulsar:</p><ul><li>Separate Pulsar clusters</li><li>Shared BookKeeper cluster</li><li>Single Pulsar cluster</li></ul><p>This blog details how to create multiple, separate Pulsar clusters to isolate resources. Because this approach segregates resources and does not share storage or local ZooKeeper with other clusters, it provides the highest level of isolation. You should use this approach if you want to isolate critical workloads (such as billing and ads). You can create multiple, separate clusters dedicated to each workload.</p><p>To help you get started quickly, this blog walks you through every step for the following parts:</p><ol><li>Deploy two separate Pulsar clusters</li><li>Verify data isolation of clusters</li><li>Synchronize and migrate data between clusters (optional)</li><li>Scale up and down nodes (optional)</li></ol><h3>Deploy environment</h3><p>The examples in this blog are developed on macOS (version 11.2.3, 8 GB of memory).</p><p><strong>Software requirement</strong></p><ul><li>Java 8</li></ul><p>You will deploy two clusters, each of which runs the following services:</p><ul><li>1 ZooKeeper</li><li>1 bookie</li><li>1 broker</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*EhBjTdbzVgQ_Z5wi.png" /><figcaption>Figure 1 — Two Separate Pulsar Clusters</figcaption></figure><p>The following are the details of the two Pulsar clusters you will deploy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/786/1*E7CRcjBSnaPj_t4_eiXZNg.png" /></figure><h4>Prepare deployment</h4><ol><li><a href="https://pulsar.apache.org/docs/en/next/standalone/#install-pulsar-using-binary-release">Download Pulsar</a> and untar the tarball.</li></ol><p>In this example, Pulsar 2.7.0 is installed.</p><p>2. Create empty directories using the following structure, naming them as shown.</p><p>You can create the directories anywhere in your local environment.</p><p><strong>Input</strong></p><pre>|-separate-clusters<br>    |-configuration-store<br>        |-zk1<br>    |-cluster1<br>        |-zk1<br>        |-bk1<br>        |-broker1<br>    |-cluster2<br>        |-zk1<br>        |-bk1<br>        |-broker1</pre><p>3. Copy the untarred Pulsar files into each directory you created in step 2.</p><p>4. Start the <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-the-configuration-store">configuration store</a>.</p><p>The configuration store operates at the instance level and provides configuration management and task coordination across clusters.
In this example, cluster1 and cluster2 share one configuration store.</p><p><strong>Input</strong></p><pre>cd configuration-store/zk1<br><br>bin/pulsar-daemon start configuration-store</pre><h4>Deploy Pulsar cluster1</h4><ol><li>Start a <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-local-zookeeper">local ZooKeeper</a>.</li></ol><p>For each Pulsar cluster, you need to deploy 1 local ZooKeeper to manage configurations and coordinate tasks.</p><p><strong>Input</strong></p><pre>cd cluster1/zk1<br><br>bin/pulsar-daemon start zookeeper</pre><p>2. <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#cluster-metadata-initialization">Initialize metadata</a>.</p><p>Write metadata to ZooKeeper.</p><p><strong>Input</strong></p><pre>cd cluster1/zk1<br><br>bin/pulsar initialize-cluster-metadata \<br>  --cluster cluster1 \<br>  --zookeeper localhost:2181 \<br>  --configuration-store localhost:2184 \<br>  --web-service-url http://localhost:8080/ \<br>  --web-service-url-tls https://localhost:8443/ \<br>  --broker-service-url pulsar://localhost:6650/ \<br>  --broker-service-url-tls pulsar+ssl://localhost:6651/</pre><p>3. <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-bookkeeper">Deploy BookKeeper</a>.</p><p>BookKeeper provides <a href="https://pulsar.apache.org/docs/en/next/concepts-architecture-overview#persistent-storage">persistent storage</a> for messages on Pulsar. In this deployment, each cluster runs its own bookie, and the BookKeeper cluster shares the local ZooKeeper with the Pulsar cluster.</p><p>(1) <a href="https://pulsar.apache.org/docs/en/next/concepts-architecture-overview#persistent-storage">Configure bookies</a>.</p><p>Change the value of the following configurations in the cluster1/bk1/conf/bookkeeper.conf file.</p><pre>allowLoopback=true<br>prometheusStatsHttpPort=8002<br>httpServerPort=8002</pre><p>(2) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#start-bookies">Start bookies</a>.</p><p><strong>Input</strong></p><pre>cd cluster1/bk1<br><br>bin/pulsar-daemon start bookie</pre><p>Check whether the bookie is started successfully.</p><p><strong>Input</strong></p><pre>bin/bookkeeper shell bookiesanity</pre><p><strong>Output</strong></p><pre>Bookie sanity test succeeded</pre><p>4. 
<a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-brokers">Deploy brokers</a>.</p><p>(1) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#broker-configuration">Configure brokers</a>.</p><p>Change the value of the following configurations in the cluster1/broker1/conf/broker.conf file.</p><pre>zookeeperServers=127.0.0.1:2181<br>configurationStoreServers=127.0.0.1:2184<br>clusterName=cluster1<br>managedLedgerDefaultEnsembleSize=1<br>managedLedgerDefaultWriteQuorum=1<br>managedLedgerDefaultAckQuorum=1</pre><p>(2) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#start-the-broker-service">Start brokers</a>.</p><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-daemon start broker</pre><h4>Deploy Pulsar cluster2</h4><ol><li>Deploy a <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-local-zookeeper">local ZooKeeper</a>.</li></ol><p>(1) Configure a local ZooKeeper.</p><ul><li>Change the value of the following configurations in the cluster2/zk1/conf/zookeeper.conf file.</li></ul><pre>clientPort=2186<br>admin.serverPort=9992</pre><ul><li>Add the following configurations to the cluster2/zk1/conf/pulsar_env.sh file.</li></ul><pre>OPTS=&quot;-Dstats_server_port=8011&quot;</pre><p>(2) Start a <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-local-zookeeper">local ZooKeeper</a>.</p><p><strong>Input</strong></p><pre>cd cluster2/zk1<br><br>bin/pulsar-daemon start zookeeper</pre><p>2. <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#cluster-metadata-initialization">Initialize metadata</a>.</p><p><strong>Input</strong></p><pre>bin/pulsar initialize-cluster-metadata \<br>  --cluster cluster2 \<br>  --zookeeper localhost:2186 \<br>  --configuration-store localhost:2184 \<br>  --web-service-url http://localhost:8081/ \<br>  --web-service-url-tls https://localhost:8444/ \<br>  --broker-service-url pulsar://localhost:6660/ \<br>  --broker-service-url-tls pulsar+ssl://localhost:6661/</pre><p>3. <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-bookkeeper">Deploy BookKeeper</a>.</p><p>(1) <a href="https://pulsar.apache.org/docs/en/next/concepts-architecture-overview#persistent-storage">Configure bookies</a>.</p><p>Change the value of the following configurations in the cluster2/bk1/conf/bookkeeper.conf file.</p><pre>bookiePort=3182<br>zkServers=localhost:2186<br>allowLoopback=true<br>prometheusStatsHttpPort=8003<br>httpServerPort=8003</pre><p>(2) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#start-bookies">Start bookies</a>.</p><p><strong>Input</strong></p><pre>cd cluster2/bk1<br><br>bin/pulsar-daemon start bookie</pre><p>Check whether the bookie is started successfully.</p><p><strong>Input</strong></p><pre>bin/bookkeeper shell bookiesanity</pre><p><strong>Output</strong></p><pre>Bookie sanity test succeeded</pre><p>4. 
<a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-brokers">Deploy brokers</a>.</p><p>(1) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#broker-configuration">Configure brokers</a>.</p><ul><li>Change the value of the following configurations in the cluster2/broker1/conf/broker.conf file.</li></ul><pre>clusterName=cluster2<br>zookeeperServers=127.0.0.1:2186<br>configurationStoreServers=127.0.0.1:2184<br>brokerServicePort=6660<br>webServicePort=8081<br>managedLedgerDefaultEnsembleSize=1<br>managedLedgerDefaultWriteQuorum=1<br>managedLedgerDefaultAckQuorum=1</pre><ul><li>Change the value of the following configurations in the cluster2/broker1/conf/client.conf file.</li></ul><pre>webServiceUrl=http://localhost:8081/<br>brokerServiceUrl=pulsar://localhost:6660/</pre><p>(2) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#start-the-broker-service">Start brokers</a>.</p><p><strong>Input</strong></p><pre>cd cluster2/broker1<br><br>bin/pulsar-daemon start broker</pre><h3>Verify data isolation of clusters</h3><p>This section verifies whether the data in the two Pulsar clusters is isolated.</p><ol><li>Create namespace1 and assign it to cluster1.</li></ol><blockquote><strong><em>Tip</em></strong><em><br>The format of a namespace name is </em><em>&lt;tenant-name&gt;/&lt;namespace-name&gt;. For more information, see Namespaces.</em></blockquote><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-admin namespaces create -c cluster1 public/namespace1</pre><p>Check the result.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin namespaces list public</pre><p><strong>Output</strong></p><pre>&quot;public/default&quot;<br>&quot;public/namespace1&quot;</pre><p>2. Set the retention policy for namespace1.</p><blockquote><strong><em>Note</em></strong><em><br>If the retention policy is not set and the topic is not subscribed, the data stored on the topic is deleted automatically after a while.</em></blockquote><p><strong>Input</strong></p><pre>bin/pulsar-admin namespaces set-retention -s 100M -t 3d public/namespace1</pre><p>3. Create topic1 in namespace1 and write 1000 messages to this topic.</p><blockquote><strong><em>Tip</em></strong><em><br>The pulsar-client is a command line tool to send and consume data. For more information, see </em><a href="https://pulsar.apache.org/docs/en/next/reference-cli-tools/"><em>Pulsar command line tools</em></a><em>.</em></blockquote>
<p><strong>Input</strong></p><pre>bin/pulsar-client produce -m &#39;hello c1 to c2&#39; -n 1000 public/namespace1/topic1<br><br>09:56:34.504 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 1000 messages successfully produced</pre><p>Check the result.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin --admin-url http://localhost:8080 topics stats-internal public/namespace1/topic1</pre><p><strong>Output</strong></p><p>The entriesAddedCounter parameter shows that 1000 messages are added.</p><pre>{<br>  &quot;entriesAddedCounter&quot; : 1000,<br>  &quot;numberOfEntries&quot; : 1000,<br>  &quot;totalSize&quot; : 65616,<br>  &quot;currentLedgerEntries&quot; : 1000,<br>  &quot;currentLedgerSize&quot; : 65616,<br>  &quot;lastLedgerCreatedTimestamp&quot; : &quot;2021-04-22T10:24:00.582+08:00&quot;,<br>  &quot;waitingCursorsCount&quot; : 0,<br>  &quot;pendingAddEntriesCount&quot; : 0,<br>  &quot;lastConfirmedEntry&quot; : &quot;4:999&quot;,<br>  &quot;state&quot; : &quot;LedgerOpened&quot;,<br>  &quot;ledgers&quot; : [ {<br>    &quot;ledgerId&quot; : 4,<br>    &quot;entries&quot; : 0,<br>    &quot;size&quot; : 0,<br>    &quot;offloaded&quot; : false<br>  } ],<br>  &quot;cursors&quot; : { },<br>  &quot;compactedLedger&quot; : {<br>    &quot;ledgerId&quot; : -1,<br>    &quot;entries&quot; : -1,<br>    &quot;size&quot; : -1,<br>    &quot;offloaded&quot; : false<br>  }<br>}</pre><p>4. Check the data stored on public/namespace1/topic1 from cluster2 (localhost:8081).</p><p><strong>Input</strong></p><pre>bin/pulsar-admin --admin-url http://localhost:8081 topics stats-internal public/namespace1/topic1</pre><p><strong>Output</strong></p><p>The attempt failed. The error message shows that public/namespace1 is assigned only to cluster1. This proves that the data is isolated.</p><pre>Namespace missing local cluster name in clusters list: local_cluster=cluster2 ns=public/namespace1 clusters=[cluster1]<br><br>Reason: Namespace missing local cluster name in clusters list: local_cluster=cluster2 ns=public/namespace1 clusters=[cluster1]</pre><p>5. Write data to public/namespace1/topic1 in cluster2.</p><p><strong>Input</strong></p><pre>cd cluster2/broker1<br><br>bin/pulsar-client produce -m &#39;hello c1 to c2&#39; -n 1000 public/namespace1/topic1</pre><p><strong>Output</strong></p><p>The output shows that 0 messages were written. The attempt failed because namespace1 is assigned only to cluster1. This proves that the data is isolated.</p><pre>12:09:50.005 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 0 messages successfully produced</pre><h3>Synchronize and migrate data between clusters</h3><p>After verifying that the data is isolated, you can synchronize (using geo-replication) and migrate data between clusters.</p><ol><li>Assign namespace1 to cluster2, that is, add cluster2 to the cluster list of namespace1.</li></ol><p>This enables geo-replication to synchronize the data between cluster1 and cluster2.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin namespaces set-clusters --clusters cluster1,cluster2 public/namespace1</pre><p>Check the result.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin namespaces get-clusters public/namespace1</pre><p><strong>Output</strong></p><pre>&quot;cluster1&quot;<br>&quot;cluster2&quot;</pre>
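<p>If you manage replication programmatically, the same assignment can be made through the Java admin API. The following is a minimal sketch (the service URL matches this deployment; error handling is omitted):</p><pre>PulsarAdmin admin = PulsarAdmin.builder()<br>        .serviceHttpUrl(&quot;http://localhost:8080&quot;)<br>        .build();<br><br>// equivalent to: bin/pulsar-admin namespaces set-clusters --clusters cluster1,cluster2 public/namespace1<br>admin.namespaces().setNamespaceReplicationClusters(<br>        &quot;public/namespace1&quot;, new HashSet&lt;&gt;(Arrays.asList(&quot;cluster1&quot;, &quot;cluster2&quot;)));<br><br>admin.close();</pre>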
<p>2. Check whether topic1 is in cluster2.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin --admin-url http://localhost:8081 topics stats-internal public/namespace1/topic1</pre><p><strong>Output</strong></p><p>The output shows that there are 1000 messages on cluster2/topic1. This proves that the data stored on cluster1/topic1 is replicated to cluster2 successfully.</p><pre>{<br>  &quot;entriesAddedCounter&quot; : 1000,<br>  &quot;numberOfEntries&quot; : 1000,<br>  &quot;totalSize&quot; : 75616,<br>  &quot;currentLedgerEntries&quot; : 1000,<br>  &quot;currentLedgerSize&quot; : 75616,<br>  &quot;lastLedgerCreatedTimestamp&quot; : &quot;2021-04-23T12:02:52.929+08:00&quot;,<br>  &quot;waitingCursorsCount&quot; : 1,<br>  &quot;pendingAddEntriesCount&quot; : 0,<br>  &quot;lastConfirmedEntry&quot; : &quot;1:999&quot;,<br>  &quot;state&quot; : &quot;LedgerOpened&quot;,<br>  &quot;ledgers&quot; : [ {<br>    &quot;ledgerId&quot; : 1,<br>    &quot;entries&quot; : 0,<br>    &quot;size&quot; : 0,<br>    &quot;offloaded&quot; : false<br>  } ],<br>  &quot;cursors&quot; : {<br>    &quot;pulsar.repl.cluster1&quot; : {<br>      &quot;markDeletePosition&quot; : &quot;1:999&quot;,<br>      &quot;readPosition&quot; : &quot;1:1000&quot;,<br>      &quot;waitingReadOp&quot; : true,<br>      &quot;pendingReadOps&quot; : 0,<br>      &quot;messagesConsumedCounter&quot; : 1000,<br>      &quot;cursorLedger&quot; : 2,<br>      &quot;cursorLedgerLastEntry&quot; : 2,<br>      &quot;individuallyDeletedMessages&quot; : &quot;[]&quot;,<br>      &quot;lastLedgerSwitchTimestamp&quot; : &quot;2021-04-23T12:02:53.248+08:00&quot;,<br>      &quot;state&quot; : &quot;Open&quot;,<br>      &quot;numberOfEntriesSinceFirstNotAckedMessage&quot; : 1,<br>      &quot;totalNonContiguousDeletedMessagesRange&quot; : 0,<br>      &quot;properties&quot; : { }<br>    }<br>  },<br>  &quot;compactedLedger&quot; : {<br>    &quot;ledgerId&quot; : -1,<br>    &quot;entries&quot; : -1,<br>    &quot;size&quot; : -1,<br>    &quot;offloaded&quot; : false<br>  }<br>}</pre><p>3. Migrate the producer and consumer from cluster1 to cluster2.</p><pre>PulsarClient pulsarClient1 = PulsarClient.builder().serviceUrl(&quot;pulsar://localhost:6650&quot;).build();<br>// migrate the client to cluster2 at pulsar://localhost:6660<br>PulsarClient pulsarClient2 = PulsarClient.builder().serviceUrl(&quot;pulsar://localhost:6660&quot;).build();</pre>
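<p>In practice, migrating means closing the producers and consumers created from pulsarClient1 and recreating them from pulsarClient2. A minimal sketch (the topic matches this tutorial; error handling is omitted):</p><pre>// the producer originally created from pulsarClient1 (cluster1)<br>Producer&lt;byte[]&gt; producer = pulsarClient1.newProducer()<br>        .topic(&quot;public/namespace1/topic1&quot;).create();<br><br>// to migrate: close it, then recreate it from pulsarClient2 (cluster2)<br>producer.close();<br>producer = pulsarClient2.newProducer()<br>        .topic(&quot;public/namespace1/topic1&quot;).create();<br>producer.send(&quot;hello from cluster2&quot;.getBytes());</pre>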
<p>4. Remove cluster1 from the cluster list of namespace1.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin namespaces set-clusters --clusters cluster2 public/namespace1</pre><p>Check if the data is stored on cluster1/topic1.</p><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-admin --admin-url http://localhost:8080 topics stats-internal public/namespace1/topic1</pre><p><strong>Output</strong></p><p>The data is removed from cluster1/topic1 successfully since the value of the numberOfEntries parameter is 0.</p><pre>{<br>  &quot;entriesAddedCounter&quot; : 0,<br>  &quot;numberOfEntries&quot; : 0,<br>  &quot;totalSize&quot; : 0,<br>  &quot;currentLedgerEntries&quot; : 0,<br>  &quot;currentLedgerSize&quot; : 0,<br>  &quot;lastLedgerCreatedTimestamp&quot; : &quot;2021-04-23T15:20:08.1+08:00&quot;,<br>  &quot;waitingCursorsCount&quot; : 1,<br>  &quot;pendingAddEntriesCount&quot; : 0,<br>  &quot;lastConfirmedEntry&quot; : &quot;3:-1&quot;,<br>  &quot;state&quot; : &quot;LedgerOpened&quot;,<br>  &quot;ledgers&quot; : [ {<br>    &quot;ledgerId&quot; : 3,<br>    &quot;entries&quot; : 0,<br>    &quot;size&quot; : 0,<br>    &quot;offloaded&quot; : false<br>  } ],<br>  &quot;cursors&quot; : {<br>    &quot;pulsar.repl.cluster2&quot; : {<br>      &quot;markDeletePosition&quot; : &quot;3:-1&quot;,<br>      &quot;readPosition&quot; : &quot;3:0&quot;,<br>      &quot;waitingReadOp&quot; : true,<br>      &quot;pendingReadOps&quot; : 0,<br>      &quot;messagesConsumedCounter&quot; : 0,<br>      &quot;cursorLedger&quot; : 4,<br>      &quot;cursorLedgerLastEntry&quot; : 0,<br>      &quot;individuallyDeletedMessages&quot; : &quot;[]&quot;,<br>      &quot;lastLedgerSwitchTimestamp&quot; : &quot;2021-04-23T15:20:08.122+08:00&quot;,<br>      &quot;state&quot; : &quot;Open&quot;,<br>      &quot;numberOfEntriesSinceFirstNotAckedMessage&quot; : 1,<br>      &quot;totalNonContiguousDeletedMessagesRange&quot; : 0,<br>      &quot;properties&quot; : { }<br>    }<br>  },<br>  &quot;compactedLedger&quot; : {<br>    &quot;ledgerId&quot; : -1,<br>    &quot;entries&quot; : -1,<br>    &quot;size&quot; : -1,<br>    &quot;offloaded&quot; : false<br>  }<br>}</pre><p>At this point, you replicated data from cluster1/topic1 to cluster2 and then removed the data from cluster1/topic1.</p><h3>Scale up and down nodes</h3><p>If you need to handle increasing or decreasing workloads, you can scale nodes up or down. This section demonstrates how to scale up and scale down nodes (brokers and bookies).</p><h4>Broker</h4><p><strong>Scale up brokers</strong></p><p>In this procedure, you’ll create 2 partitioned topics on cluster1/broker1 and add 2 brokers. Then, you’ll unload the partitioned topics and check the data distribution among the 3 brokers.</p><ol><li>Check the information about brokers in cluster1.</li></ol><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-admin brokers list cluster1</pre><p><strong>Output</strong></p><p>The output shows that broker1 is the only broker in cluster1.</p><pre>&quot;192.168.0.105:8080&quot;</pre><p>2. 
Create 2 partitioned topics on cluster1/broker1.</p><p>Create 6 partitions for partitioned-topic1 and 7 partitions for partitioned-topic2.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics create-partitioned-topic -p 6 public/namespace1/partitioned-topic1<br><br>bin/pulsar-admin topics create-partitioned-topic -p 7 public/namespace1/partitioned-topic2</pre><p>Check the result.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics partitioned-lookup public/namespace1/partitioned-topic1</pre><p><strong>Output</strong></p><p>All partitions of partitioned-topic1 are served by broker1.</p><pre>&quot;persistent://public/namespace1/partitioned-topic1-partition-0    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-1    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-2    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-3    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-4    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-5    pulsar://192.168.0.105:6650&quot;</pre><p><strong>Input</strong></p><pre>bin/pulsar-admin topics partitioned-lookup public/namespace1/partitioned-topic2</pre><p><strong>Output</strong></p><p>All partitions of partitioned-topic2 are served by broker1.</p><pre>&quot;persistent://public/namespace1/partitioned-topic2-partition-0    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-1    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-2    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-3    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-4    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-5    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-6    pulsar://192.168.0.105:6650&quot;</pre><p>3. Add broker2 and broker3.</p><p>(1) Prepare for deployment.</p><p>Create two empty directories (broker2 and broker3) under the cluster1 directory.
Copy the untarred Pulsar files into these two directories.</p><pre>|-separate-clusters<br>    |-configuration-store<br>        |-zk1<br>    |-cluster1<br>        |-zk1<br>        |-bk1<br>        |-broker1<br>        |-broker2<br>        |-broker3<br>    |-cluster2<br>        |-zk1<br>        |-bk1<br>        |-broker1</pre><p>(2) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#deploy-brokers">Deploy brokers</a>.</p><p>(2.a) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#broker-configuration">Configure brokers</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/756/1*alL8_87UeeLblbW6qMLn9g.png" /></figure><p>(2.b) <a href="https://pulsar.apache.org/docs/en/next/deploy-bare-metal-multi-cluster/#start-the-broker-service">Start brokers</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/475/1*HzJF2-pMjt59xeR7evhVHQ.png" /></figure><p>(2.c) Check the information about the running brokers in cluster1.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin brokers list cluster1</pre><p><strong>Output</strong></p><pre>&quot;192.168.0.105:8080&quot; // broker1<br>&quot;192.168.0.105:8082&quot; // broker2<br>&quot;192.168.0.105:8083&quot; // broker3</pre><p>4. Unload public/namespace1 so that its topics are redistributed across the brokers.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin namespaces unload public/namespace1</pre><p>Check the result.</p><p>(1) Check the distribution of data stored on partitioned-topic1.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics partitioned-lookup public/namespace1/partitioned-topic1</pre><p><strong>Output</strong></p><p>The output shows that the partitions of partitioned-topic1 are now distributed across broker1, broker2, and broker3.</p><pre>&quot;persistent://public/namespace1/partitioned-topic1-partition-0    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-1    pulsar://192.168.0.105:6653&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-2    pulsar://192.168.0.105:6652&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-3    pulsar://192.168.0.105:6653&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-4    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-5    pulsar://192.168.0.105:6653&quot;</pre><p>(2) Check the distribution of data stored on partitioned-topic2.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics partitioned-lookup public/namespace1/partitioned-topic2</pre><p>The output shows that the partitions of partitioned-topic2 are now distributed across broker1, broker2, and broker3.</p><p><strong>Output</strong></p><pre>&quot;persistent://public/namespace1/partitioned-topic2-partition-0    pulsar://192.168.0.105:6653&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-1    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-2    pulsar://192.168.0.105:6653&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-3    pulsar://192.168.0.105:6652&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-4    pulsar://192.168.0.105:6653&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-5    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-6    pulsar://192.168.0.105:6653&quot;</pre>
<p><strong>Scale down brokers</strong></p><blockquote><strong><em>Tip</em></strong><em><br>The following steps continue from the previous section “Scale up brokers”.</em></blockquote><p>In this procedure, you’ll stop 1 broker in cluster1 and check how the data stored on the partitioned topics is distributed among the other brokers.</p><ol><li>Stop broker3.</li></ol><p><strong>Input</strong></p><pre>cd cluster1/broker3<br><br>bin/pulsar-daemon stop broker</pre><p>Check the result.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin brokers list cluster1</pre><p><strong>Output</strong></p><p>The output shows that only broker1 and broker2 are running in cluster1.</p><pre>&quot;192.168.0.105:8080&quot; // broker1<br>&quot;192.168.0.105:8082&quot; // broker2</pre><p>2. Check the distribution of data stored on partitioned-topic1.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics partitioned-lookup public/namespace1/partitioned-topic1</pre><p><strong>Output</strong></p><p>The output shows that the partitions of partitioned-topic1 are now distributed between broker1 and broker2, which means that the partitions previously served by broker3 have been redistributed.</p><pre>&quot;persistent://public/namespace1/partitioned-topic1-partition-0    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-1    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-2    pulsar://192.168.0.105:6652&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-3    pulsar://192.168.0.105:6652&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-4    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic1-partition-5    pulsar://192.168.0.105:6650&quot;</pre><p>Similarly, the partitions of partitioned-topic2 are now distributed between broker1 and broker2.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics partitioned-lookup public/namespace1/partitioned-topic2</pre><p><strong>Output</strong></p><pre>&quot;persistent://public/namespace1/partitioned-topic2-partition-0    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-1    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-2    pulsar://192.168.0.105:6652&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-3    pulsar://192.168.0.105:6652&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-4    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-5    pulsar://192.168.0.105:6650&quot;<br>&quot;persistent://public/namespace1/partitioned-topic2-partition-6    pulsar://192.168.0.105:6652&quot;</pre><h4>Bookie</h4><p><strong>Scale up bookies</strong></p><p>In this procedure, you’ll add 2 bookies (bk2 and bk3) to cluster1.
Then, you’ll write data to topic1 and check whether the replicas are saved.</p><ol><li>Check the information about bookies in cluster1.</li></ol><p><strong>Input</strong></p><pre>cd cluster1/bk1<br><br>bin/bookkeeper shell listbookies -rw -h</pre><p><strong>Output</strong></p><p>The output shows that bookie1 is the only bookie in cluster1.</p><pre>12:31:34.933 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :<br>12:31:34.946 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105</pre><p>2. Configure the broker to write to 3 bookies.</p><p>Change the values of the following configurations in the cluster1/broker1/conf/broker.conf file.</p><pre>managedLedgerDefaultEnsembleSize=3 // specify the number of bookies to use when creating a ledger<br>managedLedgerDefaultWriteQuorum=3 // specify the number of copies to store for each message<br>managedLedgerDefaultAckQuorum=2  // specify the number of guaranteed copies (acks to wait before writing is completed)</pre><p>3. Restart broker1 to enable the configurations.</p><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-daemon stop broker<br><br>bin/pulsar-daemon start broker</pre><p>4. Set the retention policy for the messages in public/default.</p><blockquote><strong><em>Note</em></strong><em><br>If the retention policy is not set and the topic is not subscribed, the data of the topic is deleted automatically after a while.</em></blockquote><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-admin namespaces set-retention -s 100M -t 3d public/default</pre><p>5. Create topic1 in public/default and write 100 messages to this topic.</p><p><strong>Input</strong></p><pre>bin/pulsar-client produce -m &#39;hello&#39; -n 100 topic1</pre><p><strong>Output</strong></p><p>The data is not written successfully because there are not enough bookies: the broker now requires an ensemble of 3 bookies, but only bookie1 is running.</p><pre>···<br><br>12:40:38.886 [pulsar-client-io-1-1] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0x56f92aff, L:/192.168.0.105:53069 - R:/192.168.0.105:6650] Received error from server: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available<br><br>...<br><br>12:40:38.886 [main] ERROR org.apache.pulsar.client.cli.PulsarClientTool - Error while producing messages<br><br>...<br><br>12:40:38.890 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 0 messages successfully produced</pre><p>6. Add bookie2 and bookie3.</p><p>(1) Prepare for deployment.</p><p>Create two empty directories (bk2 and bk3) under the cluster1 directory.
Copy the untarred Pulsar files into these two directories.</p><pre>|-separate-clusters<br>    |-configuration-store<br>        |-zk1<br>    |-cluster1<br>        |-zk1<br>        |-bk1<br>        |-bk2<br>        |-bk3<br>        |-broker1<br>    |-cluster2<br>        |-zk1<br>        |-bk1<br>        |-broker1</pre><p>(2) Deploy bookies.</p><p>(2.a) Configure bookies.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/757/1*TPRh7JebfdEWkTuDgTxDyg.png" /></figure><p>(2.b) Start bookies.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/516/1*0OWi6Z6n1ScpvOQRmwEFKA.png" /></figure><p>(2.c) Check the running bookies in cluster1.</p><p><strong>Input</strong></p><pre>bin/bookkeeper shell listbookies -rw -h</pre><p><strong>Output</strong></p><p>All three bookies are running in cluster1: bookie1 (192.168.0.105:3181), bookie2 (192.168.0.105:3183), and bookie3 (192.168.0.105:3184).</p><pre>12:12:47.574 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3183, IP:192.168.0.105, Port:3183, Hostname:192.168.0.105<br>12:12:47.575 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3184, IP:192.168.0.105, Port:3184, Hostname:192.168.0.105<br>12:12:47.576 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105</pre><p>7. Set the retention policy for messages in public/default.</p><blockquote><strong><em>Note</em></strong><em><br>If the retention policy is not set and the topic is not subscribed, the data stored on the topic is deleted automatically after a while.</em></blockquote><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-admin namespaces set-retention -s 100M -t 3d public/default</pre><p>8. Create topic1 in public/default and write 100 messages to this topic.</p><p><strong>Input</strong></p><pre>bin/pulsar-client produce -m &#39;hello&#39; -n 100 topic1</pre><p><strong>Output</strong></p><p>The messages are written successfully.</p><pre>...<br>12:17:40.222 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 100 messages successfully produced</pre><p>9. Check the information about topic1.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics stats-internal topic1</pre><p><strong>Output</strong></p><p>The output shows that the data stored on topic1 is saved in the ledger with ledgerId 5.</p><pre>{<br>  &quot;entriesAddedCounter&quot; : 100,<br>  &quot;numberOfEntries&quot; : 100,<br>  &quot;totalSize&quot; : 5500,<br>  &quot;currentLedgerEntries&quot; : 100,<br>  &quot;currentLedgerSize&quot; : 5500,<br>  &quot;lastLedgerCreatedTimestamp&quot; : &quot;2021-05-11T12:17:38.881+08:00&quot;,<br>  &quot;waitingCursorsCount&quot; : 0,<br>  &quot;pendingAddEntriesCount&quot; : 0,<br>  &quot;lastConfirmedEntry&quot; : &quot;5:99&quot;,<br>  &quot;state&quot; : &quot;LedgerOpened&quot;,<br>  &quot;ledgers&quot; : [ {<br>    &quot;ledgerId&quot; : 5,<br>    &quot;entries&quot; : 0,<br>    &quot;size&quot; : 0,<br>    &quot;offloaded&quot; : false<br>  } ],<br>  &quot;cursors&quot; : { },<br>  &quot;compactedLedger&quot; : {<br>    &quot;ledgerId&quot; : -1,<br>    &quot;entries&quot; : -1,<br>    &quot;size&quot; : -1,<br>    &quot;offloaded&quot; : false<br>  }<br>}</pre><p>10. 
Check in which bookies the ledger with ledgerId 5 is saved.</p><p><strong>Input</strong></p><pre>bin/bookkeeper shell ledgermetadata -ledgerid 5</pre><p><strong>Output</strong></p><p>As configured previously, the ledger with ledgerId 5 is saved on bookie1 (3181), bookie2 (3183), and bookie3 (3184).</p><pre>...<br>12:23:17.705 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - ledgerID: 5<br>12:23:17.714 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=3, ackQuorumSize=2, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[192.168.0.105:3184, 192.168.0.105:3181, 192.168.0.105:3183]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90b3BpYzE=, application=base64:cHVsc2Fy}}<br>...</pre><p><strong>Scale down bookies</strong></p><blockquote><strong><em>Tip</em></strong><em><br>The following steps continue from the previous section “Scale up bookies”.</em></blockquote><p>In this procedure, you’ll remove 2 bookies. Then, you’ll write data to topic2 and check where the data is saved.</p><ol><li>Configure the broker to write to 1 bookie.</li></ol><p>Change the values of the following configurations in the cluster1/broker1/conf/broker.conf file.</p><pre>managedLedgerDefaultEnsembleSize=1 // specify the number of bookies to use when creating a ledger<br>managedLedgerDefaultWriteQuorum=1 // specify the number of copies to store for each message<br>managedLedgerDefaultAckQuorum=1  // specify the number of guaranteed copies (acks to wait before writing is completed)</pre><p>2. Restart broker1 to enable the configurations.</p><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-daemon stop broker<br><br>bin/pulsar-daemon start broker</pre><p>3. Check the information about bookies in cluster1.</p><p><strong>Input</strong></p><pre>cd cluster1/bk1<br><br>bin/bookkeeper shell listbookies -rw -h</pre><p><strong>Output</strong></p><p>All three bookies are running in cluster1, including bookie1 (3181), bookie2 (3183), and bookie3 (3184).</p><pre>...<br>15:47:41.370 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :<br>15:47:41.382 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3183, IP:192.168.0.105, Port:3183, Hostname:192.168.0.105<br>15:47:41.383 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3184, IP:192.168.0.105, Port:3184, Hostname:192.168.0.105<br>15:47:41.384 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105<br>...</pre><p>4. Stop bookie2 and bookie3.</p><blockquote><strong><em>Tip</em></strong><em><br>For more information about how to stop bookies, see </em><a href="https://bookkeeper.apache.org/docs/4.13.0/admin/decomission/"><em>Decommission Bookies</em></a><em>.</em></blockquote><p><strong>Input</strong></p><pre>cd cluster1/bk2<br><br>bin/bookkeeper shell listunderreplicated<br><br>bin/pulsar-daemon stop bookie<br><br>bin/bookkeeper shell decommissionbookie</pre><p><strong>Input</strong></p><pre>cd cluster1/bk3<br><br>bin/bookkeeper shell listunderreplicated<br><br>bin/pulsar-daemon stop bookie<br><br>bin/bookkeeper shell decommissionbookie</pre><p>5. 
Check the information about bookies in cluster1.</p><p><strong>Input</strong></p><pre>cd cluster1/bk1<br><br>bin/bookkeeper shell listbookies -rw -h</pre><p><strong>Output</strong></p><p>The output shows that bookie1 (3181) is the only running bookie in cluster1.</p><pre>...<br>16:05:28.690 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :<br>16:05:28.700 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:192.168.0.105:3181, IP:192.168.0.105, Port:3181, Hostname:192.168.0.105<br>...</pre><p>6. Set the retention policy for the messages in public/default.</p><blockquote><strong><em>Note</em></strong><em><br>If the retention policy is not set and the topic is not subscribed, the data stored on the topic is deleted automatically after a while.</em></blockquote><p><strong>Input</strong></p><pre>cd cluster1/broker1<br><br>bin/pulsar-admin namespaces set-retention -s 100M -t 3d public/default</pre><p>7. Create topic2 in public/default and write 100 messages to this topic.</p><p><strong>Input</strong></p><pre>bin/pulsar-client produce -m &#39;hello&#39; -n 100 topic2</pre><p><strong>Output</strong></p><p>The data is written successfully.</p><pre>...<br>16:06:59.448 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 100 messages successfully produced</pre><p>8. Check the information about topic2.</p><p><strong>Input</strong></p><pre>bin/pulsar-admin topics stats-internal topic2</pre><p><strong>Output</strong></p><p>The data stored on topic2 is saved in the ledger with ledgerId 7.</p><pre>{<br>  &quot;entriesAddedCounter&quot; : 100,<br>  &quot;numberOfEntries&quot; : 100,<br>  &quot;totalSize&quot; : 5400,<br>  &quot;currentLedgerEntries&quot; : 100,<br>  &quot;currentLedgerSize&quot; : 5400,<br>  &quot;lastLedgerCreatedTimestamp&quot; : &quot;2021-05-11T16:06:59.058+08:00&quot;,<br>  &quot;waitingCursorsCount&quot; : 0,<br>  &quot;pendingAddEntriesCount&quot; : 0,<br>  &quot;lastConfirmedEntry&quot; : &quot;7:99&quot;,<br>  &quot;state&quot; : &quot;LedgerOpened&quot;,<br>  &quot;ledgers&quot; : [ {<br>    &quot;ledgerId&quot; : 7,<br>    &quot;entries&quot; : 0,<br>    &quot;size&quot; : 0,<br>    &quot;offloaded&quot; : false<br>  } ],<br>  &quot;cursors&quot; : { },<br>  &quot;compactedLedger&quot; : {<br>    &quot;ledgerId&quot; : -1,<br>    &quot;entries&quot; : -1,<br>    &quot;size&quot; : -1,<br>    &quot;offloaded&quot; : false<br>  }<br>}</pre><p>9. Check where the ledger with ledgerId 7 is saved.</p><p><strong>Input</strong></p><pre>bin/bookkeeper shell ledgermetadata -ledgerid 7</pre><p><strong>Output</strong></p><p>The ledger with ledgerId 7 is saved on bookie1 (3181).</p><pre>...<br>16:11:28.843 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - ledgerID: 7<br>16:11:28.846 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=1, writeQuorumSize=1, ackQuorumSize=1, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[192.168.0.105:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90b3BpYzM=, application=base64:cHVsc2Fy}}<br>...</pre><h3>Conclusion</h3><p>This is the second blog in the series on configuring isolation in Apache Pulsar. 
Now you should know how to:</p><ol><li>Deploy two separate Pulsar clusters</li><li>Verify data isolation of clusters</li><li>Synchronize and migrate data between clusters</li><li>Scale up and down nodes (brokers and bookies)</li></ol><p>The next blog will discuss how to configure Pulsar isolation in a shared BookKeeper cluster. Coming soon!</p><h3>Further reading</h3><p>If you’re interested in Pulsar isolation policy, feel free to check out the following resources!</p><ul><li>For beginners: <a href="https://pulsar.apache.org/docs/en/next/administration-isolation/">Pulsar Isolation Policy — User Guide</a></li><li>For advanced users: <a href="https://streamnative.io/en/blog/tech/2021-03-02-taking-an-in-depth-look-at-how-to-achieve-isolation-in-pulsar">Taking an In-Depth Look at How to Achieve Isolation in Pulsar</a></li></ul><h3>About the Author</h3><p><strong>Ran Gao</strong> is a software engineer at StreamNative. Before that, he was responsible for the development of the search service at Zhaopin.com. Prior to that, he worked on the development of the logistics system at JD Logistics. Being interested in open source and messaging systems, Ran is an Apache Pulsar committer. You can follow him on <a href="https://twitter.com/leon_ran_10">twitter</a>.</p><p><strong>Yu Liu</strong> is an Apache Pulsar committer and a content strategist from StreamNative. You can follow her on <a href="https://twitter.com/Anonymitaet1">twitter</a>.</p><p>This post was originally published on <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ffa790d36ea5" width="1" height="1" alt=""><hr><p><a href="https://medium.com/streamnative/pulsar-isolation-for-dummies-separate-pulsar-clusters-ffa790d36ea5">Pulsar Isolation for Dummies: Separate Pulsar Clusters</a> was originally published in <a href="https://medium.com/streamnative">StreamNative</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What’s New in Apache Pulsar 2.6.4]]></title>
            <link>https://medium.com/streamnative/whats-new-in-apache-pulsar-2-6-4-af3ce95119ba?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/af3ce95119ba</guid>
            <category><![CDATA[release-notes]]></category>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[bug-fixes]]></category>
            <category><![CDATA[enhancement]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Thu, 10 Jun 2021 16:02:24 GMT</pubDate>
            <atom:updated>2021-06-10T16:02:23.089Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/855/0*a9JpmUjwdRZJvlt-.jpg" /></figure><h3>What’s New in Apache Pulsar 2.6.4</h3><p>We are excited to see that the Apache Pulsar community has successfully released version 2.6.4! 10 contributors provided improvements and bug fixes across 16 PRs.</p><p>Highlights:</p><ul><li>Broker no longer delivers old messages after a topic is closed (<a href="https://github.com/apache/pulsar/pull/8634">#8634</a>)</li><li>AWS credentials are refreshed after expiry (<a href="https://github.com/apache/pulsar/pull/9387">#9387</a>)</li><li>Pulsar identifies when individual message deletes cause an unsynced cursor (<a href="https://github.com/apache/pulsar/pull/9732">#9732</a>)</li></ul><p>This blog walks through the most noteworthy changes. For the complete list including all enhancements and bug fixes, check out the <a href="http://pulsar.apache.org/release-notes/#264-mdash-2021-06-02-a-id264a">Pulsar 2.6.4 Release Notes</a>.</p><h3>Notable enhancement</h3><h4>C++ client</h4><p><strong>C++ client supports multiple topic subscriptions across multiple namespaces (</strong><a href="https://github.com/apache/pulsar/pull/9520"><strong>#9520</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, you could not subscribe to different topics on different namespaces.</p><p><strong>Resolution</strong></p><ul><li>Moved the namespace check from MultiTopicsConsumerImpl to PatternMultiTopicsConsumerImpl, which uses a regex subscription.</li><li>Fixed the existing tests for subscriptions on topics across different namespaces.</li></ul><h3>Notable bug fixes</h3><h4>Broker</h4><p><strong>Pulsar guarantees security for clients using JWT (</strong><a href="https://github.com/apache/pulsar/pull/9172"><strong>#9172</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, it was possible for attackers to connect to Pulsar instances because the signature of the JWT was not validated when the token algorithm was set to none.</p><p><strong>Resolution</strong></p><p>Modified the JWT handling to use parseClaimsJws instead of parse to get the token objects. Now, parseClaimsJws guarantees the correct security model for parsing signed JWTs.</p>
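<p>The idea behind the fix can be illustrated with a short sketch in the style of the jjwt (io.jsonwebtoken) library that Pulsar uses for token parsing. This is a simplified illustration rather than Pulsar’s actual code, and validationKey stands in for the broker’s configured token key:</p><pre>boolean isTokenValid(String token, java.security.Key validationKey) {<br>    try {<br>        // parseClaimsJws() accepts only signed JWTs and verifies the signature;<br>        // a plain parse() would also accept unsigned (&quot;alg&quot;: &quot;none&quot;) tokens<br>        Jwts.parserBuilder().setSigningKey(validationKey).build().parseClaimsJws(token);<br>        return true;<br>    } catch (JwtException e) {<br>        return false; // reject the connection<br>    }<br>}</pre>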
<p><strong>Pulsar identifies when individual message deletes cause an unsynced cursor (</strong><a href="https://github.com/apache/pulsar/pull/9732"><strong>#9732</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, cursors were not being flushed when acknowledgements caused a dirty cursor. Instead of deleting the acknowledged messages, messages were redelivered.</p><p><strong>Resolution</strong></p><p>Fixed code to mark the individual acknowledgements and automatically trigger the flush of dirty cursors.</p><p><strong>Pulsar can expire a range of messages (</strong><a href="https://github.com/apache/pulsar/pull/9083"><strong>#9083</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, only a single message expired after an expiry check. As a result, many expired messages remained in a subscription and were delivered to consumers after the expiry time.</p><p><strong>Resolution</strong></p><p>Modified OpFindNewest to jump to a valid position, which allows PersistentMessageExpiryMonitor to find the best range of messages to expire.</p><p><strong>Pulsar allows manual (forced) topic deletion after removing non-durable subscriptions (</strong><a href="https://github.com/apache/pulsar/pull/7356"><strong>#7356</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, during the removal of non-durable subscriptions, there was a race condition that left a topic in a state where you could not delete it until it was unloaded or reloaded.</p><p><strong>Resolution</strong></p><p>Fixed the race condition by setting the topic fence before performing any delete operations and reverting the topic state after the delete operations.</p><p><strong>Broker no longer delivers old messages after a topic is closed (</strong><a href="https://github.com/apache/pulsar/pull/8634"><strong>#8634</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, it was possible to re-deliver very old messages if a topic was not gracefully closed. The cursor rolled back to the last persisted position and triggered the re-delivery of those messages.</p><p><strong>Resolution</strong></p><p>Fixed the redelivery of messages by setting a time-bound period after which all cursor updates are flushed to disk.</p><p><strong>Batch index acknowledgement data is no longer persisted (</strong><a href="https://github.com/apache/pulsar/pull/9504"><strong>#9504</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, the batch index acknowledgement data persisted because batchDeletedIndexInfoBuilder generated the batch index acknowledgement data but did not clear the current set before adding the delete set.</p><p><strong>Resolution</strong></p><p>Fixed by clearing the delete set before adding a new delete set.</p><p><strong>Closed ledgers are deleted after expiration (</strong><a href="https://github.com/apache/pulsar/pull/9136"><strong>#9136</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, a closed ledger (with no incoming traffic) could fail to be deleted after expiring because the read position of the cursor still pointed to the last entry of the closed ledger.</p><p><strong>Resolution</strong></p><p>Updated the behavior when closing the current ledger. Now, when the cursor’s mark-delete position points to the last entry of the current ledger, the read position is moved to the newly created ledger.</p><h4>Tiered storage</h4><p><strong>AWS credentials are refreshed after expiry (</strong><a href="https://github.com/apache/pulsar/pull/9387"><strong>#9387</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, expired AWS credentials were reused. With the refactor of Azure support, a regression occurred where the AWS credentials were fetched once and then used for the entire process.</p><p><strong>Resolution</strong></p><p>The AWS credential provider chain takes care of the credential refresh.
When integrating with JClouds, you still need to return a new set of credentials each time.</p><h4>Java client</h4><p><strong>Compression applied during schema preparation (</strong><a href="https://github.com/apache/pulsar/pull/9396"><strong>#9396</strong></a><strong>)</strong></p><p><strong>Issue</strong></p><p>Previously, compression was not applied during deferred schema preparation, and the consumer could receive an uncompressed message and then fail.</p><p><strong>Resolution</strong></p><p>Fixed by enforcing compression during the schema preparation.</p><h3>Get involved</h3><p>To get started, you can <a href="https://pulsar.apache.org/en/download/">download Pulsar directly</a>, or you can spin up a Pulsar cluster with a free 30-day trial of <a href="https://auth.streamnative.cloud/login?state=hKFo2SBVeG81YTFiSWUtdDhhQkgtd19LdWhWYm9jUng4NGpua6FupWxvZ2luo3RpZNkgVHh1bFN0bHozeEFpeDR5QlNGMnlWM19oUHpwcTlvSk2jY2lk2SA2ZXI3M3FLcTQycUIwd2JzcjFTT01hWWJhdTdLaGxldw&amp;client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&amp;protocol=oauth2&amp;audience=https%3A%2F%2Fapi.streamnative.cloud&amp;redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&amp;defaultMethod=singup&amp;scope=openid%20profile%20email%20offline_access&amp;response_type=code&amp;response_mode=query&amp;nonce=VDRWNG5rYVhpcWZJYTdOWlF4Q1BDeENxcFZKQlFneU9VYlllRzdTdXF4UQ%3D%3D&amp;code_challenge=W__xPbFyDLkHTgO8p7DmrT84cHkZC3RvLsr3iE438sQ&amp;code_challenge_method=S256&amp;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9">StreamNative Cloud</a>, which ships the Pulsar 2.6.4 changes! Moreover, we offer technical consulting and expert training to help get your organization started. As always, we are highly responsive to your feedback. Feel free to <a href="https://streamnative.io/en/contact">contact us</a> if you have any questions at any time. We look forward to hearing from you, and stay tuned for the next Pulsar release!</p><h3>About the Author</h3><p><strong>Yong Zhang</strong> is an Apache Pulsar committer. He works as a software engineer at StreamNative.</p><p><strong>Yu Liu</strong> is an Apache Pulsar committer and a content strategist from StreamNative. You can follow her on <a href="https://twitter.com/Anonymitaet1">twitter</a>.</p><p>This post was originally published on <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=af3ce95119ba" width="1" height="1" alt=""><hr><p><a href="https://medium.com/streamnative/whats-new-in-apache-pulsar-2-6-4-af3ce95119ba">What’s New in Apache Pulsar 2.6.4</a> was originally published in <a href="https://medium.com/streamnative">StreamNative</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Function Mesh — Simplify Complex Streaming Jobs in Cloud]]></title>
            <link>https://medium.com/streamnative/function-mesh-simplify-complex-streaming-jobs-in-cloud-a9d95c19c371?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/a9d95c19c371</guid>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[open-source]]></category>
            <category><![CDATA[pulsar-functions]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Thu, 10 Jun 2021 16:02:24 GMT</pubDate>
            <atom:updated>2021-06-10T16:02:22.898Z</atom:updated>
<content:encoded><![CDATA[<h3>Function Mesh — Simplify Complex Streaming Jobs in Cloud</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*IW5YWRNh1vQca_-W.jpg" /></figure><p>Today, we are excited to introduce Function Mesh, a serverless framework purpose-built for event streaming applications. It brings powerful event-streaming capabilities to your applications by orchestrating multiple <a href="http://pulsar.apache.org/docs/en/next/functions-overview/">Pulsar Functions</a> and <a href="http://pulsar.apache.org/docs/en/next/io-overview/">Pulsar IO connectors</a> for complex event streaming jobs on Kubernetes.</p><h3>What is Function Mesh</h3><p>Function Mesh is a Kubernetes operator that enables users to run Pulsar Functions and connectors natively on Kubernetes, unlocking the full power of Kubernetes’ application deployment, scaling, and management. For example, Function Mesh relies on Kubernetes’ scheduling functionality, which ensures that functions are resilient to failures and can be scheduled properly at any time.</p><p>Function Mesh is also a serverless framework to orchestrate multiple Pulsar Functions and I/O connectors for complex streaming jobs in a simple way. If you’re seeking cloud-native serverless streaming solutions, Function Mesh is an ideal tool for you. The key benefits of Function Mesh include:</p><ul><li>Eases the management of Pulsar Functions and connectors when you run multiple instances of Functions and connectors together.</li><li>Utilizes the full power of the Kubernetes scheduler, including deployment, scaling, and management, to manage and scale Pulsar Functions and connectors.</li><li>Makes Pulsar Functions and connectors run natively in the cloud environment, which leads to greater possibilities when more resources become available in the cloud.</li><li>Enables Pulsar Functions to work with different messaging systems and to integrate with existing tools in the cloud environment (Function Mesh runs Pulsar Functions and connectors independently from Pulsar).</li></ul><p>Function Mesh is well-suited for common, lightweight streaming use cases, such as ETL jobs, and is not intended to be used as a full-power streaming engine.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*tvyw-mmjB39vbF__.png" /></figure><h3>Why Function Mesh</h3><p>Pulsar has included Pulsar Functions and Pulsar I/O since its 2.0 release.</p><p><a href="http://pulsar.apache.org/docs/en/next/functions-overview/">Pulsar Functions</a> is a turnkey serverless event streaming framework built natively for Apache Pulsar. Pulsar Functions enables users to create event processing logic on a per-message basis and brings simplicity and serverless concepts to event streaming, thus eliminating the need to deploy a separate system. Popular use cases of Pulsar Functions include ETL jobs, real-time aggregation, microservices, reactive services, event routing, and more.</p><p><a href="http://pulsar.apache.org/docs/en/next/io-overview/">Pulsar IO connector</a> is a framework that allows you to ingress or egress data from and to Pulsar using the existing Pulsar Functions framework. Pulsar IO consists of source and sink connectors. A source is an event processor that ingests data from an external system into Pulsar, and a sink is an event processor that egresses data from Pulsar to an external system.</p><p>Both Pulsar Functions and Pulsar I/O have made building event streaming applications simpler.
<p><a href="http://pulsar.apache.org/docs/en/next/io-overview/">Pulsar IO connectors</a> provide a framework that allows you to ingress or egress data from and to Pulsar using the existing Pulsar Functions framework. Pulsar IO consists of source and sink connectors. A source is an event processor that ingests data from an external system into Pulsar, and a sink is an event processor that egresses data from Pulsar to an external system.</p><p>Both Pulsar Functions and Pulsar I/O have made building event streaming applications simpler. Pulsar Functions supports running functions and connectors on Kubernetes. However, the existing implementation has a few drawbacks:</p><ol><li>The function metadata is stored in Pulsar and the function running state is managed by Kubernetes. This results in inconsistency between metadata and running state, which makes management complicated and error-prone. For example, the StatefulSet running Pulsar Functions can be deleted from Kubernetes while Pulsar isn’t aware of it.</li><li>The existing implementation uses Pulsar topics for storing function metadata. It can cause broker crash loops if the function metadata topics are temporarily unavailable.</li><li>Functions are tied to a specific Pulsar cluster, making it difficult to use functions across multiple Pulsar clusters.</li><li>The existing implementation makes it hard for users deploying Pulsar Functions on Kubernetes to implement certain features, such as auto-scaling.</li></ol><p>Additionally, with the increased adoption of Pulsar Functions and Pulsar I/O connectors for building serverless event streaming applications, users want to orchestrate multiple functions into a single streaming job to achieve complex event streaming capabilities. Without Function Mesh, organizing and managing multiple functions to process events requires a lot of manual work.</p><p>To solve these pain points and make Pulsar Functions Kubernetes-native, we developed Function Mesh: a serverless framework purpose-built for running Pulsar Functions and connectors natively on Kubernetes, and for simplifying the building of complex event streaming jobs.</p><h3>Core Concepts</h3><p>Function Mesh enables you to build event streaming applications leveraging your familiarity with Apache Pulsar and modern stream processing technologies. Three concepts are foundational to building an event streaming application: streams, functions, and connectors.</p><h4>Stream</h4><p>A stream is a partitioned, immutable, append-only sequence of events that represents a series of historical facts. For example, the events of a stream could model a sequence of financial transactions, like “Jack sent $100 to Alice”, followed by “Alice sent $50 to Bob”. A stream is used for connecting functions and connectors. The streams in Function Mesh are implemented using topics in Apache Pulsar.</p><h4>Function</h4><p>A function is a lightweight event processor that consumes messages from one or more input streams, applies user-supplied processing logic to one or more messages, and produces the results to another stream. The functions in Function Mesh are implemented based on Pulsar Functions.</p><h4>Connector</h4><p>A connector is an event processor that ingresses or egresses events from and to streams. There are two types of connectors in Function Mesh:</p><ul><li>Source Connector (aka Source): an event processor that ingests events from an external data system into a stream.</li><li>Sink Connector (aka Sink): an event processor that egresses events from streams to an external data system.</li></ul><p>The connectors in Function Mesh are implemented based on Pulsar IO connectors. The available Pulsar IO connectors can be found at <a href="https://hub.streamnative.io/">StreamNative Hub</a>.</p>
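<p>For illustration, here is a minimal sketch of a sink connector written against the Pulsar IO Java API; the class name and its logging behavior are hypothetical stand-ins for whatever external system you target:</p><pre>import java.util.Map;<br>import org.apache.pulsar.functions.api.Record;<br>import org.apache.pulsar.io.core.Sink;<br>import org.apache.pulsar.io.core.SinkContext;<br><br>// A hypothetical sink that egresses every event from a stream to an external system.<br>public class LoggingSink implements Sink&lt;String&gt; {<br>    @Override<br>    public void open(Map&lt;String, Object&gt; config, SinkContext context) {<br>        // Read connection settings for the external system from the connector config.<br>    }<br><br>    @Override<br>    public void write(Record&lt;String&gt; record) {<br>        System.out.println(record.getValue()); // deliver the event, then acknowledge it<br>        record.ack();<br>    }<br><br>    @Override<br>    public void close() {<br>        // Release any resources held by the external system client.<br>    }<br>}</pre>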
<h4>FunctionMesh</h4><p>A FunctionMesh (aka Mesh) is a collection (either a Directed Acyclic Graph (DAG) or a cyclic graph) of functions and connectors connected by streams that are orchestrated together to achieve powerful stream processing logic. All the functions and connectors in a Mesh share the same lifecycle: they are started when the Mesh is created and terminated when the Mesh is destroyed. All the functions and connectors are long-running processes that Function Mesh auto-scales based on the workload.</p><h3>How Function Mesh works</h3><p>Function Mesh APIs build on existing Kubernetes APIs, so Function Mesh resources are compatible with other Kubernetes-native resources and can be managed by cluster administrators using existing Kubernetes tools. The foundational concepts are delivered as Kubernetes Custom Resource Definitions (CRDs), which can be configured by a cluster administrator for developing event streaming applications.</p><p>Instead of using the Pulsar admin CLI tool to send function admin requests to Pulsar clusters, you can now use kubectl to submit a Function Mesh CRD manifest directly to Kubernetes clusters. The Function Mesh controller watches the CRD and creates Kubernetes resources to run the defined Function, Source, Sink, or Mesh. The benefit of this approach is that both the function metadata and the function running state are stored and managed directly by Kubernetes, avoiding the inconsistency problem seen in Pulsar’s existing approach.</p><p>The following diagram illustrates a typical user flow of Function Mesh.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*Jn5VY6tVOmLytv5D.png" /></figure><h4>Function Mesh Internals</h4><p>Function Mesh mainly consists of two components. One is a Kubernetes operator that watches Function Mesh CRDs and creates Kubernetes resources (e.g., StatefulSets) to run functions, connectors, and meshes on Kubernetes; the other is a Function Runner that invokes the function and connector logic when receiving events from input streams and produces the results to output streams. The Runner is currently implemented using the Pulsar Functions runner.</p><p>The diagram below illustrates the overall architecture of Function Mesh. When a user creates a Function Mesh CRD, the controller receives the submitted CRD from the Kubernetes API server. The controller processes the CRD and generates the corresponding Kubernetes resources. For example, when the controller processes the Function CRD, it creates a StatefulSet to run the function. Each pod of this function StatefulSet launches a Runner to invoke the function logic.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*blKmmVlpYgo-PnSR.png" /></figure><h3>How to use Function Mesh</h3><p>To use Function Mesh, you need to install the Function Mesh operator and CRDs into the Kubernetes cluster first. For more details, refer to the <a href="https://functionmesh.io/docs/install-function-mesh/">installation guide</a>.</p><p>After installing the Function Mesh operator and deploying a Pulsar cluster, you need to package your functions/connectors, define CRDs for functions, connectors, and Function Mesh, and then submit the CRDs to the Kubernetes cluster with the following command.</p><pre>$ kubectl apply -f /path/to/custom-crd.yaml</pre><p>Once your Kubernetes cluster receives the CRD, the Function Mesh operator will schedule the individual parts and run the functions as a StatefulSet along with other necessary resource objects.</p><p>Below, we illustrate how to run functions, connectors, and meshes with examples.</p><h4>How to run functions using Function Mesh</h4><p>Function Mesh does not change how you develop Pulsar Functions to run in the cloud. 
The submission process simply switches from the pulsar-admin client tool to a YAML file. Behind the scenes, we developed CRD resources for Pulsar Functions and a controller to handle them.</p><p>After developing and testing your function, you need to package it and then submit it to a Pulsar cluster, or build it as a Docker image and upload it to the image registry. For details, refer to <a href="https://functionmesh.io/docs/functions/run-function">run Pulsar Functions using Function Mesh</a>.</p><p>The following Function CRD example launches an ExclamationFunction inside Kubernetes with auto-scaling enabled, using the Java runtime to talk to the Pulsar messaging system.</p><pre>apiVersion: compute.functionmesh.io/v1alpha1<br>kind: Function<br>metadata:<br>  name: function-sample<br>  namespace: default<br>spec:<br>  className: org.apache.pulsar.functions.api.examples.ExclamationFunction<br>  replicas: 1<br>  maxReplicas: 5<br>  image: streamnative/function-mesh-example:latest<br>  logTopic: persistent://public/default/logging-function-logs<br>  input:<br>    topics:<br>    - persistent://public/default/source-topic<br>    typeClassName: java.lang.String<br>  output:<br>    topic: persistent://public/default/sink-topic<br>    typeClassName: java.lang.String<br>  resources:<br>    requests:<br>      cpu: &quot;0.1&quot;<br>      memory: 1G<br>    limits:<br>      cpu: &quot;0.2&quot;<br>      memory: 1.1G<br>  pulsar:<br>    pulsarConfig: &quot;test-pulsar&quot;<br>  java:<br>    jar: &quot;/pulsar/examples/api-examples.jar&quot;</pre><h4>How to run connectors using Function Mesh</h4><p>Sources and sinks are specialized functions. If you use Pulsar built-in or StreamNative-managed connectors, you can create them by specifying the Docker image in the source or sink CRDs. These Docker images are publicly available on Docker Hub, with image names in the format streamnative/pulsar-io-CONNECTOR-NAME:TAG, such as streamnative/pulsar-io-hbase:2.7.1. You can check all supported connectors in the <a href="https://hub.streamnative.io/">StreamNative Hub</a>.</p><p>If you use self-built connectors, you can package them as an external package or a Docker image, upload the package, and then submit the connectors through CRDs. 
For details, refer to <a href="https://functionmesh.io/docs/connectors/run-connector">run Pulsar connectors using Function Mesh</a>.</p><p>In the following CRD YAML files, the source connector captures changes from MongoDB using Debezium, and the sink connector writes the output to Elasticsearch.</p><p>Define the CRD YAML file for the source:</p><pre>apiVersion: compute.functionmesh.io/v1alpha1<br>kind: Source<br>metadata:<br>  name: source-sample<br>spec:<br>  image: streamnative/pulsar-io-debezium-mongodb:2.7.1<br>  className: org.apache.pulsar.io.debezium.mongodb.DebeziumMongoDbSource<br>  replicas: 1<br>  output:<br>    topic: persistent://public/default/destination<br>    typeClassName: org.apache.pulsar.common.schema.KeyValue<br>  sourceConfig:<br>    mongodb.hosts: rs0/mongo-dbz-0.mongo.default.svc.cluster.local:27017,rs0/mongo-dbz-1.mongo.default.svc.cluster.local:27017,rs0/mongo-dbz-2.mongo.default.svc.cluster.local:27017<br>    mongodb.name: dbserver1<br>    mongodb.user: debezium<br>    mongodb.password: dbz<br>    mongodb.task.id: &quot;1&quot;<br>    database.whitelist: inventory<br>    pulsar.service.url: pulsar://test-pulsar-broker.default.svc.cluster.local:6650<br>  pulsar:<br>    pulsarConfig: &quot;test-source&quot;<br>  java:<br>    jar: connectors/pulsar-io-debezium-mongodb-2.7.1.nar<br>    jarLocation: &quot;&quot; # use pulsar provided connectors</pre><p>Define the CRD YAML file for the sink:</p><pre>apiVersion: compute.functionmesh.io/v1alpha1<br>kind: Sink<br>metadata:<br>  name: sink-sample<br>spec:<br>  image: streamnative/pulsar-io-elastic-search:2.7.1<br>  className: org.apache.pulsar.io.elasticsearch.ElasticSearchSink<br>  replicas: 1<br>  input:<br>    topics:<br>    - persistent://public/default/input<br>    typeClassName: &quot;[B&quot;<br>  sinkConfig:<br>    elasticSearchUrl: &quot;http://quickstart-es-http.default.svc.cluster.local:9200&quot;<br>    indexName: &quot;my_index&quot;<br>    typeName: &quot;doc&quot;<br>    username: &quot;elastic&quot;<br>    password: &quot;X2Mq33FMWMnqlhvw598Z8562&quot;<br>  pulsar:<br>    pulsarConfig: &quot;test-sink&quot;<br>  java:<br>    jar: connectors/pulsar-io-elastic-search-2.7.1.nar<br>    jarLocation: &quot;&quot; # use pulsar provided connectors</pre><h4>How to Run Function Mesh on Kubernetes</h4><p>A FunctionMesh orchestrates functions, sources, and sinks together and manages them as a whole. The FunctionMesh CRD has a list of fields for functions, sources, and sinks, and you can connect them together through the topics field. Once the YAML file is submitted, the FunctionMesh controller will reconcile it into multiple function/source/sink resources and delegate each of them to the corresponding controllers. The function/source/sink controllers reconcile each task and launch the corresponding sub-components. 
The FunctionMesh controller collects the status of each component from the system and aggregates it into its own status field.</p><p>The following FunctionMesh job example launches two functions and streams the input through both functions to append exclamation marks.</p><pre>apiVersion: compute.functionmesh.io/v1alpha1<br>kind: FunctionMesh<br>metadata:<br>  name: mesh-sample<br>spec:<br>  functions:<br>    - name: ex1<br>      className: org.apache.pulsar.functions.api.examples.ExclamationFunction<br>      replicas: 1<br>      maxReplicas: 5<br>      input:<br>        topics:<br>          - persistent://public/default/source-topic<br>        typeClassName: java.lang.String<br>      output:<br>        topic: persistent://public/default/mid-topic<br>        typeClassName: java.lang.String<br>      pulsar:<br>        pulsarConfig: &quot;mesh-test-pulsar&quot;<br>      java:<br>        jar: pulsar-functions-api-examples.jar<br>        jarLocation: public/default/test<br>    - name: ex2<br>      className: org.apache.pulsar.functions.api.examples.ExclamationFunction<br>      replicas: 1<br>      maxReplicas: 3<br>      input:<br>        topics:<br>          - persistent://public/default/mid-topic<br>        typeClassName: java.lang.String<br>      output:<br>        topic: persistent://public/default/sink-topic<br>        typeClassName: java.lang.String<br>      pulsar:<br>        pulsarConfig: &quot;mesh-test-pulsar&quot;<br>      java:<br>        jar: pulsar-functions-api-examples.jar<br>        jarLocation: public/default/test</pre><p>The first function’s output topic is the second function’s input topic, so the first function publishes its results to that topic and the second function consumes them from it.</p><h4>Work with pulsar-admin CLI tool</h4><p>If you want to use Function Mesh without changing the way you create and submit functions, you can use the Function Mesh worker service. It is similar to the Pulsar Functions worker service but uses Function Mesh to schedule and run functions. The Function Mesh worker service enables you to use the <a href="https://pulsar.apache.org/docs/en/pulsar-admin/">pulsar-admin</a> CLI tool to manage Pulsar Functions and connectors in Function Mesh. The following figure illustrates how the Function Mesh worker service works with the Pulsar proxy to convert and forward requests to the Kubernetes cluster.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/0*zePXs58hAvRws8wL.png" /></figure><p>For details about the usage, you can refer to <a href="https://functionmesh.io/docs/install-function-mesh/#work-with-pulsar-admin-cli-tool">work with pulsar-admin CLI tool</a>.</p><h4>Migrate Pulsar Functions to Function Mesh</h4><p>If you run Pulsar Functions using the existing Kubernetes runtime and want to migrate them to Function Mesh, Function Mesh provides a tool that generates CRDs for your existing functions. You can then apply these CRDs to have Function Mesh take over ownership of the running Pulsar Functions on Kubernetes. 
For details, refer to the <a href="https://functionmesh.io/docs/migration/migrate-function">Pulsar Functions migration guide</a>.</p><h4>Supported Features</h4><p>Currently, Function Mesh supports the following features:</p><ul><li>Running Pulsar Functions and connectors natively in Kubernetes.</li><li>Orchestrating multiple Pulsar Functions and connectors as a streaming job.</li><li>Compatibility with the original Pulsar admin API for submitting functions and connectors.</li><li>Auto-scaling instances for functions and connectors using the Horizontal Pod Autoscaler.</li><li>Authentication and authorization.</li><li>Multiple runtimes with Java, Python, and Golang support.</li><li>Schema and SerDe.</li><li>Resource limits.</li></ul><h3>Future Plans</h3><p>We plan to enable the following features in upcoming releases. If you have any ideas or would like to contribute, feel free to contact us.</p><ul><li>Improve the capability level of the Function Mesh operator.</li><li>Feature parity with Pulsar Functions, such as stateful functions.</li><li>Support additional runtimes based on a self-contained function runtime, such as WebAssembly.</li><li>Develop better tools and frontends to manage and inspect Function Meshes.</li><li>Group individual functions together to improve latency and reduce cost.</li><li>Support advanced auto-scaling based on Pulsar metrics.</li><li>Integrate the function registry with <a href="http://pulsar.apache.org/docs/en/next/admin-api-packages/">Apache Pulsar Packages</a>.</li></ul><h3>Try Function Mesh Now</h3><p>Function Mesh is now <a href="https://github.com/streamnative/function-mesh">open source</a>. Try it on your Kubernetes clusters today!</p><p>To learn more about Function Mesh, <a href="https://functionmesh.io/docs/">read the docs</a> and <a href="https://youtu.be/u_9YDM44fMw">watch a live demo</a>.</p><p>If you have any feedback or suggestions for this project, feel free to <a href="mailto:function-mesh@streamnative.io">contact us</a> or open issues in the <a href="https://github.com/streamnative/function-mesh">GitHub repo</a>. Any feedback is highly appreciated.</p><h3>About the Author</h3><p><strong>Neng Lu</strong> is a staff software engineer at StreamNative, where he drives the development of Apache Pulsar and its integrations with the big data ecosystem. Before that, he was a senior software engineer at Twitter, where he was a core committer on the Heron project and the lead engineer for Heron development. He also worked on Twitter’s monitoring and key-value storage systems. Before joining Twitter, he earned a master’s degree from UCLA and a bachelor’s degree from Zhejiang University. You can follow him on <a href="https://twitter.com/nlu90">Twitter</a> or <a href="https://www.linkedin.com/in/neng-lu-002b4a27/">LinkedIn</a>.</p><p><strong>Rui Fu</strong> is a software engineer at StreamNative. Before joining StreamNative, he was a platform engineer at the Energy Internet Research Institute of Tsinghua University, where he led stream data processing and IoT platform development. Rui received his postgraduate degree from HKUST and an undergraduate degree from The University of Sheffield. You can follow him on <a href="https://www.linkedin.com/in/ruifu1/">LinkedIn</a>.</p><p>This post was originally published on <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? 
See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a9d95c19c371" width="1" height="1" alt=""><hr><p><a href="https://medium.com/streamnative/function-mesh-simplify-complex-streaming-jobs-in-cloud-a9d95c19c371">Function Mesh — Simplify Complex Streaming Jobs in Cloud</a> was originally published in <a href="https://medium.com/streamnative">StreamNative</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building Connectors On Pulsar Made Simple]]></title>
            <link>https://medium.com/streamnative/building-connectors-on-pulsar-made-simple-af534685fd54?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/af534685fd54</guid>
            <category><![CDATA[simple]]></category>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[steamnative-hub]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[connector]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Fri, 04 Jun 2021 16:01:10 GMT</pubDate>
            <atom:updated>2021-06-04T16:01:08.788Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/855/0*GLszlVjWown_8Nuu.jpg" /></figure><p>New updates in StreamNative Hub make developing and using Pulsar connectors even easier! You can also expect more connectors to be deployed and adopted across more cloud providers, with GUI tools!</p><h3>Why Pulsar Connectors?</h3><p><a href="https://pulsar.apache.org/docs/en/next/io-overview/">Pulsar connectors</a> enable Pulsar to quickly and easily integrate with various external systems. In fact, according to the 2021 Pulsar User Survey Report (which will be published later this month), connectors are one of the most-used Pulsar features, with 30% of Pulsar users relying on them.</p><p>To facilitate connector development and improve ease of use, we launched <a href="https://streamnative.io/en/blog/tech/2020-05-26-intro-to-hub">StreamNative Hub</a> in 2020 to provide a single place to find, download, use, store, and share Pulsar-related extensions, and to offer a broad spectrum of Pulsar integrations. Since its launch last year, dozens of connectors have been created and added to the Hub. Some popular Pulsar plugins on StreamNative Hub include the <a href="https://streamnative.io/en/blog/tech/2021-03-17-announcing-aws-sqs-connector-for-apache-pulsar">AWS SQS connector</a>, <a href="https://github.com/streamnative/pulsar-io-aws-lambda">AWS Lambda connector</a>, <a href="https://streamnative.io/en/blog/tech/2021-04-26-announcing-amqp10-connector-for-apache-pulsar">AMQP1_0 connector</a>, <a href="https://github.com/streamnative/pulsar-io-iotdb">IoTDB connector</a>, and more.</p><p>In this blog, we introduce recent updates that make developing and using Pulsar connectors even easier.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*UV9D8N2-44aeuL6z.png" /></figure><h3>About StreamNative Hub</h3><p><a href="https://streamnative.io/en/blog/tech/2020-05-26-intro-to-hub">StreamNative Hub</a> is an app store for developing event streaming applications and provides dozens of plugins and integrations. Its key components include:</p><ul><li>Connectors: Allow you to move streaming data in and out of Pulsar, which simplifies integration for enterprises bringing Pulsar into their existing infrastructure. All Pulsar built-in connectors are shipped in the StreamNative Hub.</li><li>Offloaders: Allow you to offload the majority of the data from BookKeeper to external remote storage, which provides a cheaper form of storage that readily scales with the volume of data.</li><li>Protocol handlers: Allow you to support other messaging protocols natively and dynamically in Pulsar brokers at runtime, which streamlines operations with Pulsar’s enterprise-grade features without modifying code. Kafka, AMQP, and MQTT are supported.</li></ul><p>As more and more members have contributed and used connectors, we’ve identified some opportunities to improve the Hub’s ease of use. Read on to learn more.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*GdsbTtFx9i1UxLLa.png" /></figure><h3>New Pulsar Connector Development Guide</h3><p>To simplify the integration between Pulsar and external systems, we created a new development guide, the <a href="https://github.com/streamnative/pulsar-io-template/blob/master/README.md">Pulsar Connector Development Guide</a>, that developers can reference to improve productivity and boost efficiency when developing a connector. 
This guide helps with the following:</p><ul><li><strong>Developing a new connector</strong>: If you need to pipe data in or out of Pulsar and other systems that do not have a connector yet, you can read the <a href="https://github.com/streamnative/pulsar-io-template/blob/master/README.md">Pulsar Connector Development Guide</a>. It contains step-by-step guidelines for how to develop and contribute a connector to StreamNative Hub, including detailed instructions and various templates for both code and documentation.</li><li><strong>Promoting awareness and usage of an existing connector</strong>: If you already developed a connector and want to make it available to the community, we recommend you host it in a public repository and show it on StreamNative Hub. You can host the connector repo at your desired location and then sync the documentation to StreamNative Hub using a simple script with just one line of code by following the instructions in the <a href="https://github.com/streamnative/pulsar-io-template/blob/master/README.md">Pulsar Connector Development Guide</a>.</li></ul><h3>Future StreamNative Hub Upgrades</h3><p>We are continuously looking for new ways to improve StreamNative Hub, and we are working on additional upgrades, such as adding more comprehensive tests to improve the usability, reliability, and performance of connectors. You can also expect more connectors to be deployed and adopted across more cloud providers, with GUI tools. Stay tuned!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Sn9SuQpnSf6LlwsZ.png" /></figure><h3>Contribute Your Connector</h3><p>If you develop connectors, we encourage you to add your connector to StreamNative Hub! In StreamNative Hub, your connector will get exposure to the widest possible audience and benefit from faster innovation and development cycles. You will also be contributing to a robust Pulsar ecosystem.</p><h3>Get Involved in the Pulsar Community</h3><p>In addition to adding a connector, there are more ways you can contribute, including:</p><ul><li>Improve documentation! The documentation hosted at StreamNative Hub is open source. Feel free to submit or request changes (fix typos, add clarifications, and more).</li><li>Report bugs.</li><li>Review pull requests.</li><li>Provide feedback on proposed features, enhancements, or designs.</li><li>Suggest new features.</li><li>Answer questions in issues or channels.</li></ul><h3>Ready to Get Started?</h3><p>Start your journey with connectors now with the <a href="https://github.com/streamnative/pulsar-io-template/blob/master/README.md">Quick Start Guide!</a></p><p>Happy Connectoring!</p><h3>About the Author</h3><p><strong>Guangning E</strong> is an Apache Pulsar committer and the main contributor to Apache Pulsar IO and Apache Pulsar Manager. He works as a senior software engineer at StreamNative, where he specializes in cloud platforms, cloud computing, and big data.</p><p><strong>Yu Liu</strong> is an Apache Pulsar committer and a content strategist at StreamNative. You can follow her on <a href="https://twitter.com/Anonymitaet1">Twitter</a>.</p><p>This post was originally published on <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? 
See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=af534685fd54" width="1" height="1" alt=""><hr><p><a href="https://medium.com/streamnative/building-connectors-on-pulsar-made-simple-af534685fd54">Building Connectors On Pulsar Made Simple</a> was originally published in <a href="https://medium.com/streamnative">StreamNative</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What’s New in Apache Pulsar 2.7.2]]></title>
            <link>https://medium.com/streamnative/whats-new-in-apache-pulsar-2-7-2-6cbf5fcb6a16?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/6cbf5fcb6a16</guid>
            <category><![CDATA[improvement]]></category>
            <category><![CDATA[bug-fixes]]></category>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[release-notes]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Thu, 27 May 2021 16:03:51 GMT</pubDate>
            <atom:updated>2021-05-27T16:03:49.513Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/855/0*QsCgZDj1ds8sVofy.jpg" /></figure><p>We are excited to see that the Apache Pulsar community has successfully released version 2.7.2! More than 38 contributors provided improvements and bug fixes across 85 commits.</p><p>Highlights of this release include:</p><ul><li>Consumers are no longer blocked after receiving multiple retry messages in Docker environments.</li><li>Messages in non-persistent topics are now correctly dispatched to consumers when using the Key_Shared subscription type.</li></ul><p>This blog walks through the most noteworthy changes grouped by key functionality. For the complete list, including all enhancements and bug fixes, check out the <a href="https://pulsar.apache.org/en/release-notes/">Pulsar 2.7.2 Release Notes</a>.</p><h3>Notable bug fixes and enhancements</h3><p>Pulsar 2.7.2 includes the following changes for the broker, bookie, proxy, Pulsar admin, Pulsar SQL, and clients.</p><h4>Broker</h4><ul><li>Fix NPEs and thread safety issues in PersistentReplicator. <a href="https://github.com/apache/pulsar/pull/9763">PR-9763</a><br>Previously, the cursor field in PersistentReplicator was updated asynchronously in another thread without adequate safeguards, which could cause NullPointerExceptions (NPEs). This PR makes the following changes:<br>· Make the cursor field volatile since the field is updated asynchronously in another thread.<br>· Remove the synchronization on the openCursorAsync method since it is not needed.<br>· Add null checks before accessing the cursor field since statistics might be updated before the cursor is available.</li><li>Fix the issue of a message not dispatched for the Key_Shared subscription type in a non-persistent topic. <a href="https://github.com/apache/pulsar/pull/9826">PR-9826</a><br>Previously, in a non-persistent topic with a Key_Shared subscription, messages were marked as published in the topic stats, but consumers did not consume them. This PR fixes this issue.</li><li>Fix the issue of a consumer being blocked after receiving retry messages. <a href="https://github.com/apache/pulsar/pull/10078">PR-10078</a><br>Previously, in the Docker environment, if a consumer enabled the retry feature and set the retry topic in DeadLetterPolicy, the consumer was blocked after receiving multiple retry messages because the hasMessageAvailable check was set to false. This PR fixes this issue. (See the sketch after this list for a typical retry/dead-letter consumer setup.)</li><li>Fix the issue of schema not added when subscribing to an empty topic without schema. <a href="https://github.com/apache/pulsar/pull/9853">PR-9853</a><br>Previously, when a consumer with a schema subscribed to an empty topic without a schema, the check used isActive, which only checked whether the topic could be deleted. However, it should have checked whether the topic had any connected producer or consumer. With the previous implementation, even if a topic had no active producers or consumers, the topic&#39;s subscription list was not empty, so isActive returned true; the consumer&#39;s schema was then not attached to the topic and an IncompatibleSchemaException was thrown. This PR changes the check to whether the topic has active producers or consumers instead of whether it can be deleted.</li><li>Fix the issue of schema type check when using the ALWAYS_COMPATIBLE strategy. <a href="https://github.com/apache/pulsar/pull/10367">PR-10367</a><br>This PR provides the following enhancements when using the ALWAYS_COMPATIBLE strategy for schema type checks:<br>· For a non-transitive strategy, it checks only the schema type of the last schema.<br>· For a transitive strategy, it checks all schema types.<br>· For getting a schema by schema data, it considers different schema types.</li><li>Fix the issue of 100% CPU usage when deleting a namespace. <a href="https://github.com/apache/pulsar/pull/10337">PR-10337</a><br>Previously, when deleting a namespace, the namespace Policies setting was marked as deleted, triggering the topic&#39;s onPoliciesUpdate and a read of ZooKeeper’s Policies node via checkReplicationAndRetryOnFailure. Because the namespace was deleted, the ZooKeeper node no longer existed, and the failed read triggered infinite retries. This PR fixes this issue by adding a method to check for non-deleted policies.</li></ul>
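<p>For context on the retry-message fix above, here is a minimal sketch of how a Java consumer typically enables the retry feature together with a DeadLetterPolicy; the service URL, topic, and subscription names are illustrative assumptions:</p><pre>import java.util.concurrent.TimeUnit;<br>import org.apache.pulsar.client.api.*;<br><br>public class RetryConsumerExample {<br>    public static void main(String[] args) throws Exception {<br>        PulsarClient client = PulsarClient.builder()<br>                .serviceUrl(&quot;pulsar://localhost:6650&quot;) // assumed local broker<br>                .build();<br><br>        // Failed messages are routed to the retry topic, and to the dead<br>        // letter topic after the maximum number of redeliveries.<br>        Consumer&lt;String&gt; consumer = client.newConsumer(Schema.STRING)<br>                .topic(&quot;persistent://public/default/input&quot;)<br>                .subscriptionName(&quot;retry-sub&quot;)<br>                .enableRetry(true)<br>                .deadLetterPolicy(DeadLetterPolicy.builder()<br>                        .maxRedeliverCount(3)<br>                        .retryLetterTopic(&quot;persistent://public/default/retry-topic&quot;)<br>                        .deadLetterTopic(&quot;persistent://public/default/dlq-topic&quot;)<br>                        .build())<br>                .subscribe();<br><br>        Message&lt;String&gt; msg = consumer.receive();<br>        try {<br>            // process the message, then acknowledge it<br>            consumer.acknowledge(msg);<br>        } catch (Exception e) {<br>            // re-deliver the message via the retry topic after a 10-second delay<br>            consumer.reconsumeLater(msg, 10, TimeUnit.SECONDS);<br>        }<br>    }<br>}</pre>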
<a href="https://github.com/apache/pulsar/pull/10367">PR-10367</a><br>This PR provides the following enhancements when using the ALWAYS_COMPATIBLE strategy for schema type check:<br>· For non-transitive strategy, it checks only schema type for the last schema.<br>· For transitive strategy, it checks all schema types.<br>· For getting schema by schema data, it considers different schema types.</li><li>Fix the issue of CPU 100% usage when deleting namespace. <a href="https://github.com/apache/pulsar/pull/10337">PR-10337</a><br>Previously, When deleting a namespace, the namespace Policies setting was marked as deleted, triggering the topic&#39;s onPoliciesUpdate and a read of the data of ZooKeeper’s Policies node as checkReplicationAndRetryOnFailure. Because the namespace was deleted, the ZooKeeper node no longer existed and the failure to read data triggered infinite retries. This PR fixes this issue by adding a method to check for non-deleted policies.</li></ul><h4>Bookie</h4><ul><li>Fallback to PULSAR_GC if BOOKIE_GC is not defined. <a href="https://github.com/apache/pulsar/pull/9621">PR-9621</a><br>This PR changes fallback from PULSAR_MEM to PULSAR_GC if BOOKIE_GC is not defined.</li><li>Fallback to PULSAR_EXTRA_OPTS if BOOKIE_EXTRA_OPTS is not defined. <a href="https://github.com/apache/pulsar/pull/10397">PR-10397</a><br>This PR defines that -Dio.netty.* does not pass the system properties if PULSAR_EXTRA_OPTS or BOOKIE_EXTRA_OPTS is set. This change ensures consistency with PULSAR_EXTRA_OPTS behavior and prevents duplicate properties. This PR also adds -Dio.netty.leakDetectionLevel=disabled (unless BOOKIE_EXTRA_OPTS is set) since PULSAR_EXTRA_OPTS does not include that setting by default.</li></ul><h4>Proxy</h4><ul><li>Fix authorization error while using proxy and Prefix subscription authentication mode. <a href="https://github.com/apache/pulsar/pull/10226">PR-10226</a><br>Previously, when using Pulsar proxy and Prefix subscription authentication mode, org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider#canConsumeAsync threw an exception, which caused the consumer error. This PR updates the org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider#allowTopicOperationAsync logic, checks isSuperUser first, and then returns isAuthorizedFuture.</li></ul><h4>Pulsar admin</h4><ul><li>Add get version command for Pulsar REST API, pulsar-admin, and pulsar-client. <a href="https://github.com/apache/pulsar/pull/9975">PR-9975</a></li></ul><h4>Pulsar SQL</h4><ul><li>Fix the issue of BKNoSuchLedgerExistsException. <a href="https://github.com/apache/pulsar/pull/9910">PR-9910</a><br>Previously, when using Pulsar SQL to query messages, BKNoSuchLedgerExistsException was thrown if the ZooKeeper ledger root directory was changed. This PR fixes this issue.</li></ul><h4>Client</h4><p>Pulsar 2.7.2 includes the following changes for Java, Python, C++, and WebSocket clients.</p><p><strong>Java</strong></p><ul><li>Fix the issue that ClientConfigurationData’s objects are not equal. <a href="https://github.com/apache/pulsar/pull/10091">PR-10091</a><br>This PR fixes this issue and reuses AuthenticationDisabled.INSTANCE as default instead of creating a new one.</li><li>Fix the issue of AutoConsumeSchema KeyValue encoding. <a href="https://github.com/apache/pulsar/pull/10089">PR-10089</a><br>This PR keeps the KeyValueEncodingType when auto-consuming a KeyValue schema.</li><li>Fix the error of OutOfMemoryError while using KeyValue&lt;GenericRecord, GenericRecord&gt;. 
<a href="https://github.com/apache/pulsar/pull/9981">PR-9981</a><br>Previously, a topic with schema KeyValue&lt;GenericRecord, GenericRecord&gt; could not be consumed due to a problem inHttpLookupService. The HttpLookupService downloaded the schema in JSON format but the KeyValue schema was expected to be encoded in binary form. This PR uses the existing utility functions to convert the JSON representation of the KeyValue schema to the desired format.</li><li>Fix the concurrency issue in the client’s producer epoch handling. <a href="https://github.com/apache/pulsar/pull/10436">PR-10436</a><br>This PR uses a volatile field for epoch and AtomicLongFieldUpdater for incrementing the value.</li><li>Handle NPE while receiving ack for a closed producer. <a href="https://github.com/apache/pulsar/pull/8979">PR-8979</a></li><li>Fix the issue of batch size not set when deserializing from a byte array. <a href="https://github.com/apache/pulsar/pull/9855">PR-9855</a><br>Previously, batch index message acknowledgment was added to the seek method to support more precise seek using ACK sets. However, when the seek was performed by a message that was serialized and deserialized, the batchSize was set to zero, which led to a discrepancy between messageId forms and seek results. This PR fixes this issue.</li><li>Fix the issue of a single-topic consumer being unable to close. <a href="https://github.com/apache/pulsar/pull/9849">PR-9849</a></li></ul><p><strong>Python</strong></p><ul><li>Support setting the default value when using Python Avro Schema. <a href="https://github.com/apache/pulsar/pull/10265">PR-10265</a><br>Previously, the default value for the Python Avro schema could not be set, causing the Python schema to not be updated. This PR fixes this issue and adds the following changes:<br>· Add the required field to control the type of schema that can set null.<br>· Add the required_default field to control the schema whether it has a default attribute or not.<br>· Add the default field to control the default value of the schema.</li><li>Fix the issue of nested Map or Array in schema does not work. <a href="https://github.com/apache/pulsar/pull/9548">PR-9548</a><br>Previously, the Python client did not handle nested Map or Array well, and the generated schema string was invalid. When the Map/Array&#39;s schema() method set the values field of the schema string, it ignored the Record type but not Map and Array. This PR fixes the issue and adds 4 tests for Map&lt;Map&gt;, Map&lt;Array&gt;, Array&lt;Array&gt;, and Array&lt;Map&gt; to cover all nested cases that involve Map or Array.</li><li>Add TLS SNI support for Python and C++ clients. <a href="https://github.com/apache/pulsar/pull/8957">PR-8957</a><br>This PR adds TLS SNI support for CPP and Python clients, so you can connect to brokers through the proxy.</li></ul><p><strong>C++</strong></p><ul><li>Fix the issue that the C++ client cannot be built on Windows. <a href="https://github.com/apache/pulsar/pull/10363">PR-10363</a><br>This PR puts PULSAR_PUBLIC before the variable type and keeps the LIB_NAME as the shared library&#39;s name (for example, removing the dll suffix).</li><li>Fix the issue of the paused zero queue consumer pre-fetches messages. <a href="https://github.com/apache/pulsar/pull/10036">PR-10036</a><br>Previously, zero queue consumers (the consumer’s receiver queue size is 0) pre-fetched messages after pauseMessageListener was called. 
<p><strong>Python</strong></p><ul><li>Support setting the default value when using the Python Avro schema. <a href="https://github.com/apache/pulsar/pull/10265">PR-10265</a><br>Previously, the default value for the Python Avro schema could not be set, causing the Python schema to not be updated. This PR fixes this issue and adds the following changes:<br>· Add the required field to control whether the schema type can be set to null.<br>· Add the required_default field to control whether the schema has a default attribute.<br>· Add the default field to control the default value of the schema.</li><li>Fix the issue that nested Map or Array schemas do not work. <a href="https://github.com/apache/pulsar/pull/9548">PR-9548</a><br>Previously, the Python client did not handle nested Map or Array types well, and the generated schema string was invalid: when the Map/Array&#39;s schema() method set the values field of the schema string, it handled the Record type but not nested Map and Array types. This PR fixes the issue and adds 4 tests for Map&lt;Map&gt;, Map&lt;Array&gt;, Array&lt;Array&gt;, and Array&lt;Map&gt; to cover all nested cases that involve Map or Array.</li><li>Add TLS SNI support for Python and C++ clients. <a href="https://github.com/apache/pulsar/pull/8957">PR-8957</a><br>This PR adds TLS SNI support for the C++ and Python clients, so you can connect to brokers through the proxy.</li></ul><p><strong>C++</strong></p><ul><li>Fix the issue that the C++ client cannot be built on Windows. <a href="https://github.com/apache/pulsar/pull/10363">PR-10363</a><br>This PR puts PULSAR_PUBLIC before the variable type and keeps LIB_NAME as the shared library&#39;s name (for example, removing the dll suffix).</li><li>Fix the issue that a paused zero-queue consumer pre-fetches messages. <a href="https://github.com/apache/pulsar/pull/10036">PR-10036</a><br>Previously, zero-queue consumers (consumers whose receiver queue size is 0) pre-fetched messages after pauseMessageListener was called. This was because ConsumerImpl::increaseAvailablePermits did not check the boolean variable messageListenerRunning_, which became false after pauseMessageListener was called. Therefore, after the zero-queue consumer was paused, it still sent the FLOW command to pre-fetch a message into its internal unbounded queue incomingMessages_. This PR fixes this issue and makes the following changes:<br>· Add a check for messageListenerRunning_ in the increaseAvailablePermits method, making the implementation consistent with the Java client&#39;s ConsumerImpl#increaseAvailablePermits, and change the type of availablePermits_ to std::atomic_int.<br>· Invoke increaseAvailablePermits in resumeMessageListener to send the FLOW command after the consumer resumes, since pauseMessageListener no longer pre-fetches messages.</li><li>Fix the issue of a segmentation fault when getting a topic name from the received message ID. <a href="https://github.com/apache/pulsar/pull/10006">PR-10006</a><br>Previously, the C++ client supported getting a topic name from both the received message and its message ID. However, for a consumer subscribed to a non-partitioned topic, getting a topic name from the received message ID caused a segmentation fault. This PR uses setTopicName for every single message when a consumer receives a batch and adds related tests for all types of consumers (including ConsumerImpl, MultiTopicsConsumerImpl, and PartitionedConsumerImpl).</li><li>Fix the issue of the SinglePartitionMessageRouter always picking the same partition. <a href="https://github.com/apache/pulsar/pull/9702">PR-9702</a><br>Previously, the SinglePartitionMessageRouter was supposed to pick a random partition for a given producer and stick with it. The problem was that the C rand() call always used the seed 0, so multiple processes always deterministically picked the same partition. This PR fixes this issue.</li><li>Reduce the log level for the ack-grouping tracker. <a href="https://github.com/apache/pulsar/pull/10094">PR-10094</a><br>Previously, a warning was logged when the ACK grouping tracker tried to send ACKs while the connection was closed. This PR changes the log level to debug when the connection is not ready in AckGroupingTrackerEnabled::flush.</li></ul><p><strong>WebSocket</strong></p><ul><li>Optimize the URL token param value. <a href="https://github.com/apache/pulsar/pull/10187">PR-10187</a><br>This PR removes the Bearer prefix requirement for the token param value of the WebSocket URL.</li><li>Make the browser client support token authentication. <a href="https://github.com/apache/pulsar/pull/9886">PR-9886</a><br>Previously, the WebSocket client used the HTTP request header to transport the authentication params, but the browser JavaScript WebSocket client could not add new headers. This PR uses the query param token to transport the authentication token for the browser JavaScript WebSocket client.</li></ul><h4>Function and connector</h4><ul><li>Allow customizable function logging. <a href="https://github.com/apache/pulsar/pull/10389">PR-10389</a><br>Previously, the function log configuration was in the jar package and could not be dynamically customized. This PR moves the function log configuration file to the configuration directory, where it can be customized.</li><li>Pass through record properties from Pulsar sources. <a href="https://github.com/apache/pulsar/pull/9943">PR-9943</a></li><li>Fix the issue of the time unit in Pulsar Go functions. 
<a href="https://github.com/apache/pulsar/pull/10160">PR-10160</a><br>This PR changes the time unit of avg process latency from ns to ms.</li><li>Fix the issue that the Kinesis sink did not try to resend messages. <a href="https://github.com/apache/pulsar/pull/10420">PR-10420</a><br>Previously, when the Kinesis sink connector failed to send a message, it did not retry. In this case, if retainOrdering was enabled, it would lead to subsequent messages not being sent. This PR adds retry logic for the Kinesis sink connector. A message is retried to send if it fails to send.</li><li>Fix the issue of null error messages in the onFailure exception in the Kinesis sink. <a href="https://github.com/apache/pulsar/pull/10416">PR-10416</a><br>Previously, if the Kinesis producer failed to send a message, the error message in the onFailure exception was null. This PR extracts the UserRecordFailedException to show the real error messages.</li></ul><h4>Tiered storage</h4><ul><li>Prevent class loader leak and restore offloader directory override. <a href="https://github.com/apache/pulsar/pull/9878">PR-9878</a><br>Previously, there was a class loader leak. This PR updates the PulsarService and the PulsarConnectorCache classes to use a map from directory strings to offloaders.</li><li>Add logs for cleanup of offloaded data operation. <a href="https://github.com/apache/pulsar/pull/9852">PR-9852</a><br>Previously, the cleanup offloaded data operation lacked logs making it hard for users to analyze the reason for the tiered storage data loss. This PR adds some logs for the cleanup of offloaded data operation.</li></ul><h3>Get involved</h3><p>To get started, you can <a href="https://pulsar.apache.org/en/download/">download Pulsar</a> directly or you can spin up a Pulsar cluster on StreamNative Cloud with a free 30-day trial of <a href="https://auth.streamnative.cloud/login?state=hKFo2SBVeG81YTFiSWUtdDhhQkgtd19LdWhWYm9jUng4NGpua6FupWxvZ2luo3RpZNkgVHh1bFN0bHozeEFpeDR5QlNGMnlWM19oUHpwcTlvSk2jY2lk2SA2ZXI3M3FLcTQycUIwd2JzcjFTT01hWWJhdTdLaGxldw&amp;client=6er73qKq42qB0wbsr1SOMaYbau7Khlew&amp;protocol=oauth2&amp;audience=https%3A%2F%2Fapi.streamnative.cloud&amp;redirect_uri=https%3A%2F%2Fconsole.streamnative.cloud%2Fcallback&amp;defaultMethod=singup&amp;scope=openid%20profile%20email%20offline_access&amp;response_type=code&amp;response_mode=query&amp;nonce=VDRWNG5rYVhpcWZJYTdOWlF4Q1BDeENxcFZKQlFneU9VYlllRzdTdXF4UQ%3D%3D&amp;code_challenge=W__xPbFyDLkHTgO8p7DmrT84cHkZC3RvLsr3iE438sQ&amp;code_challenge_method=S256&amp;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9">StreamNative Cloud</a> in which Pulsar 2.7.2 changes are shipped! Moreover, we offer technical consulting and expert training to help get your organization started. As always, we are highly responsive to your feedback. Feel free to <a href="https://streamnative.io/en/contact">contact us</a> if you have any questions at any time. Look forward to hearing from you and stay tuned for the next Pulsar release!</p><h3>About the Author</h3><p><strong>Yong Zhang</strong> is an Apache Pulsar committer. He works as a software engineer at StreamNative.</p><p><strong>Yu Liu</strong> is an Apache Pulsar committer and a technical writer from StreamNative. You can follow her on <a href="https://twitter.com/Anonymitaet1">twitter</a>.</p><p>This post was originally published on <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? 
See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6cbf5fcb6a16" width="1" height="1" alt=""><hr><p><a href="https://medium.com/streamnative/whats-new-in-apache-pulsar-2-7-2-6cbf5fcb6a16">What’s New in Apache Pulsar 2.7.2</a> was originally published in <a href="https://medium.com/streamnative">StreamNative</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Flink SQL on StreamNative Cloud]]></title>
            <link>https://medium.com/streamnative/flink-sql-on-streamnative-cloud-4c69b562a165?source=rss----ab76d1bbc527---4</link>
            <guid isPermaLink="false">https://medium.com/p/4c69b562a165</guid>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[tech-blog]]></category>
            <category><![CDATA[apache-pulsar]]></category>
            <category><![CDATA[flink]]></category>
            <category><![CDATA[streaming]]></category>
            <dc:creator><![CDATA[Sijia-w]]></dc:creator>
            <pubDate>Thu, 22 Apr 2021 16:03:37 GMT</pubDate>
            <atom:updated>2021-04-22T16:03:33.821Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*cnCgmkllft7_TUpd.png" /></figure><p>We are excited to announce the launch of Flink SQL on StreamNative Cloud. Flink SQL on StreamNative Cloud (aka “Flink SQL”) provides an intuitive and interactive SQL interface that reduces the complexity of building real-time data queries on Apache Pulsar. StreamNative is a cloud partner of Ververica, the original developers of Apache Flink and the company behind it. This partnership has enabled close collaboration and integration and has helped us create a powerful, turnkey platform for real-time data insights.</p><h3>Why Apache Flink and Flink SQL?</h3><p>Apache Flink is a distributed stream processing engine that provides high-throughput, low-latency data processing, powerful abstractions, and operational flexibility. With Apache Flink, users can easily develop and deploy event-driven applications, data analytics jobs, and data pipelines to handle real-time and historical data in complex distributed systems. Because of its powerful functionality and mature community, Apache Flink is widely adopted globally by some of the largest and most successful data-driven enterprises, including Alibaba, Netflix, and Uber.</p><p>Flink SQL provides relational abstractions of events stored in Apache Pulsar. It supports SQL standards for unified stream and batch processing. With Flink SQL, users can write SQL queries and access key insights from their real-time data without having to write a line of Java or Python.</p><p>With a powerful execution engine and a simple abstraction layer, Apache Flink and Flink SQL provide a distributed, real-time data processing solution with low development and maintenance costs. With Pulsar and Flink, StreamNative offers both stream storage and stream compute for a complete streaming solution.</p><h3>Flink + Pulsar: A Cloud-Native Streaming Platform for Infinite Data Streams</h3><p>The need for real-time data insights has never been more critical. But data insights aren’t limited to real-time data. Companies also need to integrate and understand large amounts of historical data in order to gain a complete picture of their business. This requires the ability to capture, store, and compute both real-time and historical data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*IhMRs_qov9KmaYO4.png" /><figcaption><em>Figure 1. Real-time data insights require the ability to capture, store and compute both real-time and historical data</em></figcaption></figure><p>Pulsar’s <a href="http://pulsar.apache.org/docs/en/concepts-tiered-storage/">tiered storage model</a> provides the storage capabilities required for both batch and stream processing, enabling StreamNative Cloud to offer unified storage. Integrating Apache Flink and Flink SQL enables us to offer unified batch and stream processing, and Flink SQL simplifies the execution.</p><p>In a streaming-first world, the core abstraction of data is the infinite stream. Tables are derived from the stream and updated continuously as new data arrives in the stream. Apache Pulsar is the storage for infinite streams, and Apache Flink is the engine that creates the materialized views in the form of streaming tables. 
You can then run streaming queries to perform continuous transformations, or run batch queries against streaming tables to get the latest value for every key in the stream in real time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*HlYVNeikx9LBHD5a.png" /><figcaption><em>Figure 2. Stream &amp; Table</em></figcaption></figure><p>Integrating Apache Flink with Apache Pulsar enables companies to represent and process streaming data in new ways. The Pulsar infinite stream is the core storage abstraction for streaming data, and everything else is a materialized view over the infinite stream, including databases, search indexes, or other data serving systems in the company. All the data enrichment and ETL needed to create these derived views can now be done in a streaming fashion using Apache Flink. Monitoring, security, anomaly and threat detection, analytics, and response to failures can be done in real time by combining historical context with real-time data analytics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*S9ZQvFUNClnW3Hq0.png" /><figcaption><em>Figure 3. StreamNative Cloud as a complete streaming solution</em></figcaption></figure><h3>When to Use Flink SQL</h3><p>With Flink SQL on StreamNative Cloud, Pulsar clusters are treated as Flink catalogs. Users can query infinite streams of events in Apache Pulsar using Flink SQL. Below are some top use cases for streaming SQL queries over Pulsar streams:</p><h4>1. Real-time monitoring</h4><p>We often think of monitoring as tracking low-level performance statistics using counters and gauges. While these metrics can tell you that your CPU usage is high, they can’t tell you if your application is doing what it’s supposed to. Flink SQL allows you to define custom metrics from streams of messages that applications generate, whether they are logging events, captured change data, or any other kind. For example, a cloud service might need to check that every time a new user signs up, a welcome email is sent, a new user record is created, and their credit card is billed. These functions might be spread over multiple services or applications, and you want to verify that each step happened for each new customer within a certain SLA.</p><p>Below is a streaming SQL query to monitor error counts over a stream of error codes.</p><pre>INSERT INTO error_counts<br>SELECT error_code, count(*) FROM monitoring_stream<br>WHERE type = &#39;ERROR&#39;<br>GROUP BY TUMBLE(ts, INTERVAL &#39;1&#39; MINUTE), error_code;</pre>
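<p>On StreamNative Cloud, the SQL editor submits such queries for you. For readers curious about what this looks like programmatically, below is a minimal sketch of submitting the same query with Flink’s Java Table API; the table names come from the example above, and the environment setup is an assumption for illustration:</p><pre>import org.apache.flink.table.api.EnvironmentSettings;<br>import org.apache.flink.table.api.TableEnvironment;<br><br>public class ErrorCountsJob {<br>    public static void main(String[] args) {<br>        // Create a streaming TableEnvironment; the catalog and tables are<br>        // assumed to be configured to point at the Pulsar cluster.<br>        TableEnvironment tableEnv = TableEnvironment.create(<br>                EnvironmentSettings.newInstance().inStreamingMode().build());<br><br>        // Continuously aggregate error events into one-minute tumbling windows.<br>        tableEnv.executeSql(<br>                &quot;INSERT INTO error_counts &quot;<br>              + &quot;SELECT error_code, count(*) FROM monitoring_stream &quot;<br>              + &quot;WHERE type = &#39;ERROR&#39; &quot;<br>              + &quot;GROUP BY TUMBLE(ts, INTERVAL &#39;1&#39; MINUTE), error_code&quot;);<br>    }<br>}</pre>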
<h4>2. Real-time anomaly detection</h4><p>Security use cases often look a lot like monitoring and analytics. Rather than monitoring application behavior or business behavior, application developers are looking for patterns of fraud, abuse, spam, intrusion, or other bad behavior. Flink SQL provides a simple, real-time way of defining these patterns and querying real-time Pulsar streams.</p><p>Below is a streaming SQL query to detect fraud over a stream of transactions.</p><pre>INSERT INTO possible_fraud<br>SELECT card_number, count(*)<br>FROM transactions<br>GROUP BY TUMBLE(ts, INTERVAL &#39;1&#39; MINUTE), card_number<br>HAVING count(*) &gt; 3;</pre><h4>3. Real-time data pipelines</h4><p>Companies build real-time data pipelines for data enrichment. These data pipelines capture data changes coming out of several databases, transform them, join them together, and store them in a key-value database, search index, cache, or other data serving systems.</p><p>For a long time, ETL pipelines were built as periodic batch jobs that, for example, ingest the raw data in real time but transform it only every few hours to enable efficient queries. For many real-time use cases, such as transaction or payment processing, this delay is unacceptable. Flink SQL together with Pulsar I/O connectors enables real-time data integration between different systems.</p><p>Now you can enrich streams of events with metadata stored in a different table using joins, or perform simple filtering of Personally Identifiable Information (PII) data before loading the stream into another system.</p><p>The streaming SQL query below shows an example of enriching a click stream using a users table.</p><pre>INSERT INTO vip_users<br>SELECT user_id, page, action<br>FROM clickstream c<br>LEFT JOIN users u ON c.user_id = u.user_id<br>WHERE u.level = &#39;Platinum&#39;;</pre><h3>Pulsar Abstractions in Flink SQL</h3><p>The integration of Flink SQL and Apache Pulsar utilizes Flink’s <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/catalogs.html">catalog API</a> to reference existing Pulsar metadata and automatically map it to Flink’s corresponding metadata. There are a few core abstractions in this integration that map to the core abstractions in Pulsar and allow you to manipulate Pulsar topics using SQL.</p><ul><li>Catalog: A catalog is a collection of databases. It is mapped to an existing Pulsar cluster.</li><li>Database: A database is a collection of tables. It is mapped to a namespace in Apache Pulsar. All the namespaces within a Pulsar cluster are automatically converted to Flink databases in a Pulsar catalog. Databases can also be created or deleted via Data Definition Language (DDL) queries, where the underlying Pulsar namespaces will be created or deleted.</li></ul><pre>CREATE DATABASE userdb;</pre><ul><li>Table: A Pulsar topic can be presented as a STREAMING table or an UPSERT table.</li><li>Schema: The schema of a Pulsar topic is automatically mapped to a Flink table schema if the topic already exists with a schema. If a Pulsar topic doesn’t exist, creating a table via DDL queries converts the Flink table schema to a Pulsar schema for creating the Pulsar topic.</li><li>Metadata Columns: The message metadata and properties of a Pulsar message are mapped into the metadata columns of a Flink table. These metadata columns are:<br>· messageId: the message ID of a Pulsar message (read-only)<br>· sequenceId: the sequence ID of a Pulsar message (read-only)<br>· publishTime: the publish timestamp of a Pulsar message (read-only)<br>· eventTime: the event timestamp of a Pulsar message (readable/writable)<br>· properties: the message properties of a Pulsar message (readable/writable)</li></ul><p>A Pulsar topic can be presented as a STREAMING table or an UPSERT table in Flink.</p><h4>STREAMING table</h4><p>A streaming table represents an unbounded sequence of structured data (“facts”). For example, we could have a stream of financial transactions such as “Jack sent $100 to Kate, then Alice sent $200 to Kate”. Facts in a table are immutable, which means new events can be inserted into a table, but existing events can never be updated or deleted. 
<ul><li>Table: A Pulsar topic can be presented as a STREAMING table or an UPSERT table.</li><li>Schema: The schema of a Pulsar topic will automatically be mapped to the Flink table schema if the topic already exists with a schema. If a Pulsar topic doesn’t exist, creating a table via DDL queries will convert the Flink table schema to a Pulsar schema and create the Pulsar topic.</li><li>Metadata Columns: The message metadata and properties of a Pulsar message will be mapped into the metadata columns of a Flink table. These metadata columns, illustrated in the query sketch after this list, are:<ul><li>messageId: the message ID of a Pulsar message (read-only)</li><li>sequenceId: the sequence ID of a Pulsar message (read-only)</li><li>publishTime: the publish timestamp of a Pulsar message (read-only)</li><li>eventTime: the event timestamp of a Pulsar message (readable/writable)</li><li>properties: the message properties of a Pulsar message (readable/writable)</li></ul></li></ul>
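<p>As a quick illustration, the sketch below projects message metadata alongside payload columns. It assumes the pageviews table defined below (see the STREAMING table section) has already been created, and that the metadata columns are selectable under the names listed above, per the mapping just described.</p><pre>SELECT messageId, publishTime, eventTime, -- metadata columns mapped by the catalog<br>       user_id, page_id                   -- payload columns of the assumed pageviews table<br>FROM pageviews;</pre>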
<p>A Pulsar topic can thus be presented as either a STREAMING table or an UPSERT table in Flink.</p><h4>STREAMING table</h4><p>A streaming table represents an unbounded sequence of structured data (“facts”). For example, we could have a stream of financial transactions such as “Jack sent $100 to Kate, then Alice sent $200 to Kate”. Facts in a streaming table are immutable, which means new events can be inserted into the table, but existing events can never be updated or deleted. All the topics within a Pulsar namespace will automatically be mapped to streaming tables in a catalog configured to use a pulsar connector. Streaming tables can also be created or deleted via DDL queries, in which case the underlying Pulsar topics will be created or deleted.</p><pre>CREATE TABLE pageviews (<br>  user_id BIGINT,<br>  page_id BIGINT,<br>  viewtime TIMESTAMP,<br>  user_region STRING,<br>  WATERMARK FOR viewtime AS viewtime - INTERVAL &#39;2&#39; SECOND<br>);</pre><h4>UPSERT table</h4><p>An upsert table represents a collection of evolving facts. For example, we could have a table that contains the latest financial information, such as “Kate’s current account balance is $300”. It is the equivalent of a traditional database table, but enriched by streaming semantics such as windowing. Facts in an UPSERT table are mutable, which means new facts can be inserted into the table, and existing facts can be updated or deleted. Upsert tables can be created by setting the connector option to upsert-pulsar.</p><pre>CREATE TABLE pageviews_per_region (<br>  user_region STRING,<br>  pv BIGINT,<br>  uv BIGINT,<br>  PRIMARY KEY (user_region) NOT ENFORCED<br>) WITH (<br>  &#39;connector&#39; = &#39;upsert-pulsar&#39;<br>);</pre><p>By integrating the concepts of streaming tables and upsert tables, Flink SQL allows joining upsert tables that represent the current state of the world with streaming tables that represent events that are happening right now. A topic in Pulsar can be represented as either a streaming table or an upsert table in Flink SQL, depending on the intended semantics of the processing on the topic.</p><p>For instance, if you want to read the data in a topic as a series of independent values, you would treat the Pulsar topic as a streaming table. An example of such a streaming table is a topic that captures page view events, where each page view event is unrelated to and independent of the others. On the other hand, if you want to read the data in a topic as an evolving collection of updatable values, you would treat the topic as an upsert table. An example of a topic that should be read as an UPSERT table in Flink is one that captures user metadata, where each event represents the latest metadata for a particular user ID, including the user’s name, address, or preferences.</p>
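<p>To see the two kinds of tables working together, here is a minimal sketch that continuously materializes the streaming pageviews table from above into the upsert pageviews_per_region table: as new page view events arrive, the per-region values are updated in place rather than appended. Reading pv as total page views and uv as unique users is an assumption based on the column names.</p><pre>INSERT INTO pageviews_per_region<br>SELECT<br>  user_region,<br>  count(*) AS pv,                 -- running count of page views per region<br>  count(DISTINCT user_id) AS uv   -- running count of distinct users per region<br>FROM pageviews<br>GROUP BY user_region;</pre>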
<h3>A Dive into Flink SQL on StreamNative Cloud</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ZHqVBV-EFJfURdXL.png" /><figcaption><em>Figure 4. StreamNative Cloud Architecture</em></figcaption></figure><p>StreamNative Cloud operates out of a control plane and cloud pools.</p><p>The control plane includes the backend services that StreamNative manages in its own cloud account, chiefly a Cloud API service and a Cloud console. Users can interact with StreamNative Cloud via the Cloud console, and applications can interact with it via the Cloud API service.</p><p>The cloud pools can be managed by StreamNative in its own cloud account or in the customers’ cloud accounts. Pulsar clusters run inside the cloud pools, and SQL queries run on them as well.</p><p>The diagram below demonstrates how authentication and authorization are implemented in our system. It assumes that data has already been ingested into the Pulsar clusters on StreamNative Cloud, but you can also ingest data from external sources, such as event data, streaming data, IoT data, and more, using Pulsar’s pub/sub messaging API.</p><p>Users or applications interact with the StreamNative control plane to create a Pulsar cluster. Once the Pulsar cluster is ready, users can either create a Flink session cluster and use the SQL editor in StreamNative’s Cloud console to run interactive queries, or create long-running deployments to continuously process data streams in the Pulsar cluster.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*kgOafv23liYvKCYf.png" /><figcaption><em>Figure 5. How Flink SQL interacts with Pulsar clusters</em></figcaption></figure><p>For each Flink session cluster, there is a SQL Gateway process that parses SQL queries and either executes them locally or submits them to the Flink cluster. Each SQL session in the SQL Gateway initiates Pulsar catalogs, with each catalog representing one existing Pulsar cluster. The catalog contains all the information needed to securely access that Pulsar cluster. DDL queries are executed directly in the SQL Gateway, while DML queries are submitted to the Flink session cluster for execution. For security, every SQL query is executed under the identity of the user who submits it.</p><h3>What’s Next for Flink + Pulsar Integration on StreamNative Cloud?</h3><p>We are releasing Flink SQL on StreamNative Cloud as a developer preview feature to gather feedback. We plan to add several more capabilities, such as running Flink SQL as continuous deployments and the ability to run arbitrary Flink jobs, as we work with both the Pulsar and Flink communities to build a robust, unified batch and streaming solution.</p><h3>How Do I Access Flink SQL on StreamNative Cloud?</h3><p>You can get started by watching the <a href="https://www.youtube.com/watch?v=0BxXjEqoJlU">quick start tutorial</a> for Flink SQL on StreamNative Cloud. We’d love to hear about any ideas you have for improvement and to work closely with early adopters. Note that the Flink SQL offering is only available on paid clusters for now; we will give free cloud credits to our early adopters. If you are interested in trying it out, please email us at <a href="mailto:info@streamnative.io">info@streamnative.io</a>.</p><p>To learn more about Flink SQL, you can:</p><ul><li>Watch the <a href="https://youtu.be/9ojajM7Zt0M?t=2105">intro video</a>.</li><li>Read about Flink SQL <a href="https://docs.streamnative.io/cloud/stable/compute/flink-sql">here</a>.</li><li>Get started with Flink SQL in <a href="http://console.streamnative.cloud/">StreamNative Cloud</a>.</li></ul><p>Finally, if you’re interested in messaging and event streaming, and want to help build Pulsar and Flink, <a href="https://streamnative.io/en/careers">we are hiring</a>.</p><h3>About the Author</h3><p><strong>Sijie Guo</strong> is the co-founder and CEO of StreamNative, which provides a cloud-native event streaming platform powered by Apache Pulsar. Sijie has worked on messaging and streaming data technologies for more than a decade. Prior to StreamNative, Sijie co-founded Streamlio, a company focused on real-time solutions. At Twitter, Sijie was the tech lead for the messaging infrastructure group, where he co-created DistributedLog and Twitter EventBus. Prior to that, he worked on the push notification infrastructure at Yahoo!, where he was one of the original developers of BookKeeper and Pulsar. He is also the VP of Apache BookKeeper and a PMC member of Apache Pulsar. You can follow him on <a href="https://twitter.com/sijieg">Twitter</a>.</p><p>This post was originally published on the <a href="https://streamnative.io/blog">StreamNative blog</a>.</p><p><em>Like this post? Please recommend and/or share.</em></p><p><em>Want to learn more? See </em><a href="https://streamnative.io/blog/"><em>https://streamnative.io/blog</em></a><em>.</em> <em>Follow us </em><a href="https://medium.com/streamnative"><em>here</em></a><em> on Medium and check out our </em><a href="https://github.com/streamnative"><em>GitHub</em></a><em>.</em></p>]]></content:encoded>
        </item>
    </channel>
</rss>