Search before asking
Fluss version
main (development)
Please describe the bug 🐞
Reproduce steps:
- Create a partitioned table
- Insert enough data into the table with Flink to trigger dynamic partition creation.
- In my case, the data is spread across 24 partitions.
- Under the ZK path /metadata/databases/[databaseName]/tables/[tableName]/partitions, 24 partitions are created.
However, under the ZK path /tabletservers/partitions/[partitionId], 32 partitions are created.
- Drop the table in Flink. 24 partitions are deleted successfully, but 8 partitions are left behind in ZK.
This looks like a thread-safety issue.
Solution
Although we found this in a dynamic partition scenario, it is not specific to dynamic partitioning. It happens whenever more than one client tries to create a partition with the same table id and the same partition name at the same time (which is most likely to occur with dynamic partitioning).
com.alibaba.fluss.server.coordinator.MetadataManager#createPartition
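This method is not thread-safe, even though it checks whether the partition exists before registering it to zk. The check and the registration are separate steps, so two threads can both pass the check before either one registers.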
Thread 1:
t1: check zk for partition name "x" -> t2: register to zk
Thread 2:
t3: check zk for partition name "x" -> t4: register to zk
t1 and t3 can happen at the same time.
t2 and t4 can happen at the same time.
try {
    long partitionId = zookeeperClient.getPartitionIdAndIncrement();
    // register partition assignments to zk first
    zookeeperClient.registerPartitionAssignment(
            partitionId, partitionAssignment);
    // then register the partition metadata to zk
    zookeeperClient.registerPartition(
            tablePath, tableId, partitionName, partitionId);
    LOG.info(
            "Register partition {} to zookeeper for table [{}].",
            partitionName,
            tablePath);
}
As a result, "registerPartitionAssignment" creates two nodes, because each thread gets a different partition id,
but "zookeeperClient.registerPartition" creates only one node, because both threads target the same zk node.
That is why:
Under the zk path /metadata/databases/[databaseName]/tables/[tableName]/partitions, 24 partitions are created.
However, under the zk path /tabletservers/partitions/[partitionId], 32 partitions are created.
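The interleaving described above can be reproduced outside Fluss with a small simulation. The sketch below is an assumption-laden illustration, not the actual MetadataManager code or the fix that should be merged: FakeZk, createUnsafe, createSafe and run are hypothetical names, the two maps stand in for the two zk subtrees, and putIfAbsent models the fact that the second writer does not add a second partition node. The "safe" variant simply holds one lock around the check and both registrations, which is one possible way to make the sequence atomic inside a single coordinator process.

```java
import java.util.Map;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

final class PartitionRaceSketch {

    /** Hypothetical in-memory stand-in for the two zk subtrees; not the Fluss client. */
    static final class FakeZk {
        final AtomicLong nextPartitionId = new AtomicLong();
        // Models /tabletservers/partitions/[partitionId]
        final Map<Long, String> assignments = new ConcurrentHashMap<>();
        // Models /metadata/databases/[databaseName]/tables/[tableName]/partitions
        final Map<String, Long> partitions = new ConcurrentHashMap<>();
    }

    /** Check-then-act in the same order as the excerpt: id, assignment, then metadata. */
    static void createUnsafe(FakeZk zk, String name, Runnable afterCheck) {
        if (zk.partitions.containsKey(name)) {
            return; // partition already exists
        }
        afterCheck.run(); // hook used to force both threads past the check
        long id = zk.nextPartitionId.getAndIncrement();
        zk.assignments.put(id, name);        // always adds a node: ids differ
        zk.partitions.putIfAbsent(name, id); // second writer adds nothing
    }

    /** One possible fix (an assumption): make check and registration atomic with a lock. */
    static void createSafe(FakeZk zk, String name) {
        synchronized (zk) {
            createUnsafe(zk, name, () -> {});
        }
    }

    /** Runs two concurrent creators for partition "x"; returns {assignments, partitions}. */
    static int[] run(boolean safe) throws InterruptedException {
        FakeZk zk = new FakeZk();
        CyclicBarrier bothChecked = new CyclicBarrier(2);
        Runnable task = () -> {
            if (safe) {
                createSafe(zk, "x");
            } else {
                createUnsafe(zk, "x", () -> await(bothChecked));
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        return new int[] {zk.assignments.size(), zk.partitions.size()};
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] racy = run(false);
        int[] locked = run(true);
        System.out.println("unsafe: assignments=" + racy[0] + ", partitions=" + racy[1]);
        System.out.println("locked: assignments=" + locked[0] + ", partitions=" + locked[1]);
    }
}
```

In the unsafe run the barrier forces both threads to finish the existence check before either registers, so the simulation ends with two assignment entries and one partition entry, which is the same kind of mismatch as 32 versus 24. A process-local lock only helps if every createPartition call goes through the same coordinator instance; whether that holds for Fluss, or whether cleaning up the orphaned assignment on a failed registerPartition is preferable, is left open here.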
Are you willing to submit a PR?