Search before asking
Description
In #1749 we introduced log recovery for unclean shutdowns. However, on production clusters we have observed that this recovery takes excessively long. As a result, a TabletServer running in Kubernetes can be killed by health checks that time out during recovery, which leads to repeated restarts. The recovery process should therefore be optimized to reduce the recovery time.
Willingness to contribute