[doc] improve quickstart: merge Iceberg/Paimon sections & adjust DML behavior #1924

xx789633 · 2025-11-03T02:39:30Z

Purpose

Linked issue: close #1921

Brief change log

Tests

API and Format

Documentation


luoyuxia

@xx789633 Thanks for the PR. Left a minor comment.
But one thing I found is that the side navigator no longer works with tabs:

Find the same navigator item.

Click the navigator item: it doesn't jump to the section.

luoyuxia · 2025-11-03T06:47:28Z

+To enable lakehouse functionality as a tiered storage solution for a table, you must create the table with the configuration option `table.datalake.enabled = true`. 
+Return to the `SQL client` and execute the following SQL statement to create a table with data lake integration enabled:
+```sql  title="Flink SQL"
+CREATE TABLE datalake_enriched_orders (


It's a pity that `CREATE TABLE datalake_enriched_orders` and `INSERT INTO datalake_enriched_orders` can't be shared by Paimon and Iceberg. Union read in batch mode is not supported for primary-key tables, so in Iceberg you can only use an append-only table.

Hi @luoyuxia, the table of contents doesn't work with tabs, see facebook/docusaurus#5343. The only workaround we may have is to disable the table of contents in tabs: facebook/docusaurus#5343 (comment)

Hi @luoyuxia, I've updated the doc with the changes we discussed offline. Please take a look.

Copilot

Pull Request Overview

This PR refactors the Flink quickstart documentation by consolidating duplicate content between the Paimon and Iceberg guides into shared files, and implementing a tabbed interface to switch between the two data lake formats.

Extracted common sections (prerequisites, Flink SQL operations, cleanup) into reusable shared files
Reorganized the main flink.md to use a tabbed interface with Paimon and Iceberg as separate tabs
Created new MDX files (_flink-paimon.mdx and _flink-iceberg.mdx) that compose the shared content
Removed the standalone flink-iceberg.md file

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `website/docs/quickstart/flink.md` | Converted to a tabbed interface that displays Paimon and Iceberg guides |
| `website/docs/quickstart/flink-iceberg.md` | Deleted; content moved to `_flink-iceberg.mdx` |
| `website/docs/quickstart/_flink-paimon.mdx` | New file containing Paimon-specific quickstart content with shared imports |
| `website/docs/quickstart/_flink-iceberg.mdx` | New file containing Iceberg-specific quickstart content with shared imports |
| `website/docs/quickstart/_shared-prerequisites.md` | Extracted common prerequisites section |
| `website/docs/quickstart/_shared-flink-sql.md` | Extracted common Flink SQL operations, table creation, queries, and update/delete operations |
| `website/docs/quickstart/_shared-streaming-into-fluss.md` | Extracted common datalake-enabled table creation and streaming data writing |
| `website/docs/quickstart/_shared-cleanup.md` | Extracted common cleanup and learn more sections |


Copilot · 2025-11-03T06:59:03Z

+    `nation_name` STRING,
+    PRIMARY KEY (`order_key`) NOT ENFORCED


The PRIMARY KEY definition is missing from the datalake_enriched_orders table in the shared file. Looking at the original flink.md (Paimon version), the table had PRIMARY KEY (order_key) NOT ENFORCED on line 20, but the original flink-iceberg.md did not have this constraint. The shared file now includes the PRIMARY KEY, which changes the behavior for Iceberg users who previously created tables without primary keys. This could break existing workflows or cause unexpected behavior.

Suggested change

`nation_name` STRING,

PRIMARY KEY (`order_key`) NOT ENFORCED

`nation_name` STRING

Copilot · 2025-11-03T06:59:03Z

+includes the [fluss-flink](engine-flink/getting-started.md), [iceberg-flink](https://iceberg.apache.org/docs/latest/flink/) and
+[flink-connector-faker](https://flink-packages.org/packages/flink-faker) to simplify this guide.
+
+3. To start all containers, run:


The step numbering is incorrect. Step 2 creates the lib directory and downloads the Hadoop jar (lines 29-34), but then step 3 appears twice: once for creating the docker-compose.yml file (line 49) and again for starting containers (line 138). The docker-compose.yml creation should be labeled as step 3, and starting containers should be step 4.

Suggested change

3. To start all containers, run:

4. To start all containers, run:

Copilot

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

luoyuxia

@xx789633 Thanks for the PR. It should be ready to merge in the next iteration. After you update the doc, please walk through the quickstart to make sure it works.

luoyuxia · 2025-11-07T13:13:18Z

+1. Create a working directory for this guide.
+
+```shell
+mkdir fluss-quickstart-flink


Suggested change

mkdir fluss-quickstart-flink

mkdir fluss-quickstart-flink-paimon

luoyuxia · 2025-11-07T13:18:47Z

@@ -437,8 +447,103 @@ LEFT JOIN fluss_nation FOR SYSTEM_TIME AS OF o.ptime AS n
    ON c.nation_key = n.nation_key;
 ```

+  </TabItem>
+</Tabs>
+
 ### Real-Time Analytics on Fluss datalake-enabled Tables


There isn't much difference in this step. Can these two tabs be merged into one, except for the part that views the files?

luoyuxia · 2025-11-07T13:19:47Z

-## Learn more
-Now that you're up and running with Fluss and Flink with Iceberg, check out the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn more features with Flink or [this guide](/maintenance/observability/quickstart.md) to learn how to set up an observability stack for Fluss and Flink.
+  </TabItem>
+</Tabs>


We still need the cleanup section.

luoyuxia · 2025-11-07T13:23:06Z

-## Streaming into Fluss
-
-First, run the following SQL to sync data from source tables to Fluss tables:
+Next, perform streaming data writing into the **datalake-enabled** table, `datalake_enriched_orders`:


From

Next, perform streaming data writing into the **datalake-enabled**

to

INSERT INTO datalake_enriched_orders

we can reuse the same content, but remember to use:

INSERT INTO datalake_enriched_orders
SELECT o.order_key,
       o.cust_key,
       o.total_price,
       o.order_date,
       o.order_priority,
       o.clerk,
       c.name,
       c.phone,
       c.acctbal,
       c.mktsegment,
       n.name
FROM (
    SELECT *, PROCTIME() as ptime
    FROM `default_catalog`.`default_database`.source_order
) o
LEFT JOIN fluss_customer FOR SYSTEM_TIME AS OF o.ptime AS c
    ON o.cust_key = c.cust_key
LEFT JOIN fluss_nation FOR SYSTEM_TIME AS OF o.ptime AS n
    ON c.nation_key = n.nation_key;

This content is short. I think we can keep it as is?

luoyuxia

@xx789633 Left two more minor comments.

luoyuxia · 2025-11-10T02:32:06Z

-the `fluss_orders` table with information from the `fluss_customer` and `fluss_nation` primary-key tables.
+```sql  title="Flink SQL"
+-- execute DML job asynchronously
+SET 'table.dml-sync' = 'false';


We don't need this, since it's `false` by default.

luoyuxia · 2025-11-10T02:34:22Z

+-- switch to batch mode
+SET 'execution.runtime-mode' = 'batch';
+```
+


SET 'sql-client.execution.result-mode' = 'tableau';

Also change this so the output looks better on screen.

luoyuxia · 2025-11-10T07:55:37Z

-SET 'execution.runtime-mode' = 'batch';
+-- use tableau result mode
+SET 'sql-client.execution.result-mode' = 'tableau';
 ```


I think setting batch mode is still required, right?
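A minimal sketch of how the session setup could keep both settings, assuming batch mode is still required for the union-read queries as the question above suggests. `table.dml-sync` is left out because it defaults to `false`; where exactly this block sits in the quickstart is an assumption.

```sql  title="Flink SQL"
-- run subsequent queries as bounded batch jobs (assumed still required for union read)
SET 'execution.runtime-mode' = 'batch';
-- print query results directly in the SQL client instead of the paged table view
SET 'sql-client.execution.result-mode' = 'tableau';
```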

wuchong

Thanks @xx789633 for the great work!

Currently, we have three shared Markdown snippets:

_shared-cleanup.md
_shared-create-table.md
_shared-lake-analytics.md

While content sharing can reduce duplication, it also harms readability and flow—especially in documentation meant for users. In this case:

_shared-lake-analytics.md is only used by flink-lake.md, so there’s no real benefit to extracting it.
_shared-cleanup.md contains just a few lines, keeping it inline improves clarity.
_shared-create-table.md does reuse some content, but it disrupts the logical order of the CREATE TABLE statements, which feels unnatural and confusing from a user’s perspective.

The original intent of this issue was to maximize reuse of the “Real-Time Analytics with Flink” section across the Paimon and Iceberg quickstarts. But now that we’ve split them into two distinct guides (“Real-Time Analytics with Flink” and “Building a Streaming Lakehouse”) we’ve already addressed the core concern.

Therefore, I suggest we avoid content reuse altogether here. Our goal isn’t to eliminate all duplication at the cost of usability and readability.

Let me know what you think!

wuchong · 2025-11-10T11:34:51Z

+import CreateTable from './_shared-create-table.md';

-For more information on working with Flink, refer to the [Apache Flink Engine](engine-flink/getting-started.md) section.
+This guide will help you set up a basic streaming Lakehouse using Fluss with Paimon or Iceberg.


Suggested change

This guide will help you set up a basic streaming Lakehouse using Fluss with Paimon or Iceberg.

This guide will help you set up a basic Streaming Lakehouse using Fluss with Paimon or Iceberg, and help you better understand the powerful feature of Union Read.

wuchong · 2025-11-10T11:37:05Z

+
+```shell
+mkdir fluss-quickstart-flink-paimon
+cd fluss-quickstart-flink-paimon


Can we just name this directory `fluss-quickstart-paimon`? In the future, we want to introduce Trino and other query engines into this quickstart, so binding the name to a specific engine is not feasible. The same applies to the Iceberg directory.
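A minimal sketch of the renamed step for the Paimon tab. The Iceberg tab would presumably use `fluss-quickstart-iceberg`, a name inferred from the comment rather than confirmed in the thread; `mkdir -p` is used only so the step can be re-run safely.

```shell
# Engine-neutral working directory for the Paimon tab (suggested name)
mkdir -p fluss-quickstart-paimon
cd fluss-quickstart-paimon
```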

wuchong · 2025-11-10T11:44:23Z

@@ -285,104 +384,15 @@ SELECT o.order_key,
       c.mktsegment,
       n.name
 FROM fluss_order o


`fluss_order` is not defined on this page.

Besides, there are no records in `fluss_customer`, `fluss_order` and `fluss_nation`, as there is no `INSERT INTO` job writing to these tables.

Also, this query differs from the one in the Iceberg tab. I think we need to keep them consistent?

wuchong · 2025-11-10T11:49:46Z

+        ├── LATEST
+        └── snapshot-1
+```
+The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/paimon_catalog/).


Keep this consistent with the Iceberg tab? Use "Trino" and "Spark" here?

wuchong · 2025-11-10T11:53:10Z

@@ -1,15 +1,16 @@
 ---
-title: Real-Time Analytics with Flink (Iceberg)
+title: Build Streaming Lakehouse


Rename the title from `Build Streaming Lakehouse` to `Building a Streaming Lakehouse`; this matches standard quickstart naming conventions.

Consider renaming the file from flink-lake.md to lakehouse.md for better clarity and consistency, since the guide now focuses on the broader lakehouse architecture, not just Flink integration.

wuchong

I appended a commit to do some minor improvements.


wuchong · 2025-11-10T16:10:06Z

I cherry-picked this PR to release-0.8 branch: c9dcf80

@xx789633 could you help verify whether all the v0.8 quickstarts (Flink, Iceberg, Paimon) still work as expected?

https://fluss.apache.org/docs/quickstart/flink/
https://fluss.apache.org/docs/quickstart/lakehouse/

xx789633 · 2025-11-10T16:12:24Z

I'll do this tomorrow.

xx789633 · 2025-11-11T02:10:15Z

This SQL in the Iceberg tab

-- insert tuples into datalake_enriched_orders
INSERT INTO datalake_enriched_orders
SELECT o.order_key,
       o.cust_key,
       o.total_price,
       o.order_date,
       o.order_priority,
       o.clerk,
       c.name,
       c.phone,
       c.acctbal,
       c.mktsegment,
       n.name
FROM fluss_order o
LEFT JOIN fluss_customer FOR SYSTEM_TIME AS OF `o`.`ptime` AS `c`
    ON o.cust_key = c.cust_key
LEFT JOIN fluss_nation FOR SYSTEM_TIME AS OF `o`.`ptime` AS `n`
    ON c.nation_key = n.nation_key;

needs to be changed to:

-- insert tuples into datalake_enriched_orders
INSERT INTO datalake_enriched_orders
SELECT o.order_key,
       o.cust_key,
       o.total_price,
       o.order_date,
       o.order_priority,
       o.clerk,
       c.name,
       c.phone,
       c.acctbal,
       c.mktsegment,
       n.name
FROM (
    SELECT *, PROCTIME() as ptime
    FROM `default_catalog`.`default_database`.source_order
) o
LEFT JOIN fluss_customer FOR SYSTEM_TIME AS OF o.ptime AS c
    ON o.cust_key = c.cust_key
LEFT JOIN fluss_nation FOR SYSTEM_TIME AS OF o.ptime AS n
    ON c.nation_key = n.nation_key;

I have created a pull request to fix that: #1964

Other than that, everything works as expected.

…Real-Time Analytics with Flink" (apache#1924)

share the same content between iceberg/paimon quickstart; adjust table.dml-sync

7575354

xx789633 force-pushed the quick-improve branch from 79edc11 to 7575354 Compare November 3, 2025 02:41

luoyuxia reviewed Nov 3, 2025


luoyuxia requested a review from Copilot November 3, 2025 06:57

Copilot AI reviewed Nov 3, 2025


xx789633 force-pushed the quick-improve branch from 4585132 to 7575354 Compare November 3, 2025 11:23

xx789633 added 5 commits November 4, 2025 22:18

wip

1a7d677

wip

50a1b06

wip

5c60561

wip

7dad2c2

nit

8f5a9ef

luoyuxia requested a review from Copilot November 5, 2025 06:55

Copilot AI reviewed Nov 5, 2025


Update website/docs/quickstart/flink.md

2dc1dc7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

luoyuxia reviewed Nov 7, 2025


xx789633 added 4 commits November 8, 2025 09:29

nit

fb91025

nit

53cc839

nit

a2d10fc

nit

de41522

luoyuxia reviewed Nov 10, 2025


nit

1eb4026

luoyuxia reviewed Nov 10, 2025


xx789633 added 2 commits November 10, 2025 16:40

nit

1129ffd

nit

2e10185

wuchong reviewed Nov 10, 2025


xx789633 and others added 3 commits November 10, 2025 21:09

nit

11b1503

nit

02d74d4

Jark's improvement

7d3b5e3

wuchong approved these changes Nov 10, 2025


wuchong merged commit 8faff0e into apache:main Nov 10, 2025
2 checks passed

wuchong pushed a commit that referenced this pull request Nov 10, 2025

[doc] Separate Quickstart into "Building a Streaming Lakehouse" and "…

c9dcf80

…Real-Time Analytics with Flink" (#1924) (cherry picked from commit 8faff0e)

Ugbot pushed a commit to Ugbot/fluss that referenced this pull request Apr 26, 2026

[doc] Separate Quickstart into "Building a Streaming Lakehouse" and "…

a5f7725

…Real-Time Analytics with Flink" (apache#1924)

