
Dynamic partition create redundant partition assignment in Zookeeper #1185

@zcoo

Description


Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

main (development)

Please describe the bug 🐞

Reproduce steps:

  1. Create a partitioned table.
  2. Use Flink to insert enough data into the table to trigger dynamic partition creation.
  3. In my case, the data falls into 24 partitions.
  4. Under the ZK path /metadata/databases/[databaseName]/tables/[tableName]/partitions, 24 partitions are created.
    However, under the ZK path /tabletservers/partitions/[partitionId], 32 partitions are created.
  5. Drop the table in Flink. The 24 partitions are deleted successfully, but 8 partitions are left behind in ZK.

This looks like a thread-safety issue.

Solution

Although we found this in the dynamic partition scenario, it is not specific to dynamic partitioning. It happens whenever more than one client tries to create a partition with the same table id and the same partition name at the same time (which usually occurs with dynamic partitioning).

com.alibaba.fluss.server.coordinator.MetadataManager#createPartition
This method is not thread-safe, even though it checks whether the partition exists before registering it to zk.
Thread 1:
t1: check zk for partition name "x" -> t2: register to zk
Thread 2:
t3: check zk for partition name "x" -> t4: register to zk
t1 and t3 can happen at the same time, so both checks find that the partition does not exist yet.
t2 and t4 can then happen at the same time as well.

try {
	long partitionId = zookeeperClient.getPartitionIdAndIncrement();
	// register partition assignments to zk first
	zookeeperClient.registerPartitionAssignment(
	        partitionId, partitionAssignment);
	// then register the partition metadata to zk
	zookeeperClient.registerPartition(
	        tablePath, tableId, partitionName, partitionId);
	LOG.info(
	        "Register partition {} to zookeeper for table [{}].",
	        partitionName,
	        tablePath);
}

Thus "registerPartitionAssignment" is called twice and creates two nodes, because each thread gets a different partition id,
but "zookeeperClient.registerPartition" takes effect only once, because both calls target the same zk node.
That is why:
In zk path /metadata/databases/[databaseName]/tables/[tableName]/partitions, 24 partitions are created.
However, in zk path /tabletservers/partitions/[partitionId], 32 partitions are created.
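
The interleaving above can be reproduced without ZooKeeper. The following is only a minimal sketch, not Fluss code: the class PartitionRaceSketch, its two in-memory maps (stand-ins for the two zk subtrees), and the methods createUnsafe and createSafe are all hypothetical names introduced for illustration. createUnsafe follows the check-then-register order described above, with a barrier that forces both threads past the check before either one registers. createSafe shows one possible direction for a fix, assuming all create requests pass through a single coordinator process: make the check and both registrations one critical section.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

public class PartitionRaceSketch {
    // Hypothetical in-memory stand-ins for the two zk subtrees; not Fluss code.
    final Map<String, Long> partitionsByName = new ConcurrentHashMap<>();
    final Map<Long, String> assignmentsById = new ConcurrentHashMap<>();
    final AtomicLong nextPartitionId = new AtomicLong();

    // Check-then-register, in the same order as described in the issue.
    void createUnsafe(String name, CyclicBarrier afterCheck) throws Exception {
        if (partitionsByName.containsKey(name)) {
            return; // partition already exists
        }
        // Force the bad interleaving: both threads pass the check before either registers.
        afterCheck.await();
        long id = nextPartitionId.getAndIncrement();
        assignmentsById.put(id, name);          // always creates an entry: the id is unique
        partitionsByName.putIfAbsent(name, id); // only the first caller creates the entry
    }

    // One possible fix (an assumption): check and register inside one critical section.
    synchronized void createSafe(String name) {
        if (partitionsByName.containsKey(name)) {
            return;
        }
        long id = nextPartitionId.getAndIncrement();
        assignmentsById.put(id, name);
        partitionsByName.put(name, id);
    }

    // Runs two concurrent create requests for the same partition name "x".
    static PartitionRaceSketch run(boolean safe) throws Exception {
        PartitionRaceSketch sketch = new PartitionRaceSketch();
        CyclicBarrier barrier = new CyclicBarrier(2);
        Runnable task = () -> {
            try {
                if (safe) {
                    sketch.createSafe("x");
                } else {
                    sketch.createUnsafe("x", barrier);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        first.join();
        second.join();
        return sketch;
    }

    public static void main(String[] args) throws Exception {
        PartitionRaceSketch unsafe = run(false);
        System.out.println("unsafe: partitions=" + unsafe.partitionsByName.size()
                + " assignments=" + unsafe.assignmentsById.size());
        PartitionRaceSketch safe = run(true);
        System.out.println("safe: partitions=" + safe.partitionsByName.size()
                + " assignments=" + safe.assignmentsById.size());
    }
}
```

In the unsafe run the sketch ends with one partition entry and two assignment entries, which matches the 24 versus 32 mismatch seen in zk. A process-local lock only helps if every create request is handled by the same process; registering the partition metadata node first and creating the assignment only when that registration succeeds might be another option, but whether either fits the real code path is untested here.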

Are you willing to submit a PR?

  • I'm willing to submit a PR!
