wallet: Ensure best block matches wallet scan state#30221
wallet: Ensure best block matches wallet scan state#30221ryanofsky merged 7 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30221. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
|
|
I'm not sure, but it seems like a potentially good thing to me that The last block processed is the block that CWallet in-memory state has been synced to (particularly CWalletTx::m_state which includes mempool / abandoned information not serialized to disk). The BESTBLOCK record is the last block whose data has been flushed to disk, that the wallet should begin syncing from next time it is reloaded. It seems like the first commit ae3b8280a7dbb6cc87814ed97dbea5a122cf8215 could lead to a performance regression if it is writing the flushing the BESTBLOCK record on every single blockConnected event from the node. For example during reindexing it could write and sync the database each time a block is connected, even if the block is before the wallets birthday, or doesn't have any relevant transactions? I think it probably makes sense to write BESTBLOCK in a smarter way, and probably write it more frequently, but it seems unnecessary to have to write it every block, and it might be bad for performance now or make optimizations harder in the future. I'm also curious about the idea of saving height with the best block record. I could imagine this being useful, since IIRC parts of wallet loading are complicated by not knowing heights of transactions before the chain is attached, but it doesn't really seem like the height is used for much yet. Was this intended to be used by something else later? |
But none of that state is relevant to the last block processed. Any state related to blocks (confirmed or conflicted) is written to disk.
The point is that this discrepancy can result in skipping blocks. It doesn't make sense to me that we wouldn't store the block the wallet's state has been synced to, and it doesn't make sense that we should rely on a field that means something else to determine what point to start rescanning from on the next load. If BESTBLOCK doesn't match the chainstate, that's fine because it's a locator and we will just find the fork and sync from there. I don't think it's necessary to record which block we think the chainstate is synced to. |
You may be right this is the better approach. But I think the previous approach does make some sense, too. You might not want to write to the wallet database each time a block is connected if the block doesn't contain any relevant transactions, especially during reindexing. |
I think you would since not writing it would result in possibly rescanning that block at the next loading which takes a little bit of time, regardless of whether any relevant transactions are in the block.
Perhaps an easy solution to this is to just write the best block every 1000 blocks (or some other interval) when we are in IBD?
This was mainly done to avoid looking up the height every time we load since that was being problematic and causing a tests to fail. But it definitely could be more useful in the future. |
|
The idea of saving heights is interesting to me because wallet code assumes it know block heights many places, but it doesn't actually store heights anywhere. So for example, if the wallet stored a mapping of block hashes to heights (or other block information) it might be useful for allowing the wallet to work in an offline mode or letting the wallet CLI tool have more capabilities. This is all pretty abstract though, so I'm not suggesting something specific.
Yes but I think that just adds back the complexity you were trying to get rid of, in a form that seems worse than the status quo. If you implement that, the wallet will be back to tracking the last block processed separately from the best block written to the database. And now, instead of the node determining when data should be flushed to disk and having all wallets flush that data simultaneously, each wallet will have internal logic deciding when to flush. This would be more complicated, and could be worse for power consumption and performance if there are multiple wallets and flushes are happening more frequently at different times. I think overall this PR is doing some good things, but the goals seem either not clear, or not clearly good, so I wonder if maybe you could take these changes and make more targeted PRs for the things you want to achieve? Or, if this PR is definitely the direction you want to go, I'm happy to review it. I don't like some things it is doing, but overall I think it is a reasonable change. |
e5c9876 to
56c71b8
Compare
fjahr
left a comment
There was a problem hiding this comment.
Concept ACK
I can confirm that best block locator and last_processed_block being different is confusing, see also #30455 (comment)
Currently, this needs a rebase and I'm curious if @achow101 plans to make further changes based on @ryanofsky 's comments.
test/functional/wallet_assumeutxo.py
Outdated
There was a problem hiding this comment.
I'm thinking that it might make sense to keep this backup where it is and add second backup created at 199. Then this could test that both cases work as expected, i.e. 199 errors below and 299 doesn't. 299 is an interesting edge case in general (snapshot height == backup height).
I wonder if something else could be done to resolve this confusion other than writing to every wallet file every time a new block is connected, even if the block doesn't contain any relevant transactions. To me, "last block processed" and "last block flushed" seem like different concepts, and we could force them to be the same but only if we:
|
|
re: #30221 (comment)
This seems unrelated to the other CI failures, but it looks like this failure is caused by a bug in the fuzz test: Where the CWallet object is destroyed after the TestingSetup object so the interfaces::Chain object is already destroyed and this error happens. Simplest fix would probably be to make the CWallet object a static variable in the function like the TestingSetup variable is, instead of a global variable. |
7eed9f1 to
dde9704
Compare
It can't be static because the But, with moving |
dde9704 to
e2e9c30
Compare
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK e2e9c30a4d04bd8173c3386c5b97dbac227e810e just moving WriteBestBlock call from destructor back to RemoveWallet() since last review.
I left a bunch of suggestions below and repeated some of Martin's but they are all nitpicking at this point so feel free to ignore
| } | ||
|
|
||
| // Update the best block | ||
| SetLastBlockProcessed(block.height - 1, *Assert(block.prev_hash)); |
There was a problem hiding this comment.
In commit "wallet: Update best block record after block dis/connect" (f1f254f6c9d3cead617300367442c1bbb449af7c)
re: #30221 (comment)
To deal with unclean shutdowns, during disconnect the reverse order compared to blockConnected would be better: If the locator is updated first, an unclean shutdown will lead to a rescan of the disconnected block (which might do nothing but correct the locator). But if txns are set to Inactive first and have an unclean shutdown after that, the transactions will be missing from the wallet / balance. As mentioned above, doing it all in one batch would resolve this in a more elegant way.
I might not totally understand this comment, but I think it is suggesting calling WriteBestBlock() at the top and SetLastBlockProcessed() at the bottom of blockDisconnected() , which is the opposite of what currently happens in blockConnected() where the last block is updated at the top and best block is updated at the bottom.
This could be reasonable. I also think it could be reasonable to do the opposite and just update the last block and best block at the bottom of both methods.
I think if the concern here is dealing with "unclean shutdowns" there is probably nothing special about the block disconnected method or reason to be particularly concerned about shutdowns here (unless I'm missing something), and it should be if the ok best block is a little out of date in that case.
src/wallet/wallet.cpp
Outdated
There was a problem hiding this comment.
In commit "wallet: Write best block record on unload" (8ecc7653556a1977eae5a902d51a2ececde94913)
Just to be sure, current state is that if WriteBestBlock is called here, rather than being called in the destructor or being called in FlushAndDeleteWallet, the best block locator could be a little out of date and this is maybe not ideal but fine?
re: #30221 (comment)
It can't be static because the
FUZZ_TARGETbelow needs to access it. However, the wallet could be created in there instead of in the setup, as the other wallet fuzz tests do.
Yes, suggestion would be to follow the fuzz test setup pattern and create a static variable and global pointer. But no need to change this if debugging the CI failure is hard and only consequence is the best block could be a little behind.
e2e9c30 to
4e3acb8
Compare
rkrux
left a comment
There was a problem hiding this comment.
This partial review is for the previous diff at e2e9c30a4d04bd8173c3386c5b97dbac227e810e. I started looking into it today and need to spend some more time in internalising the consequences of this diff but at the moment I'm inclining towards a concept ack.
Left few questions below.
src/wallet/wallet.cpp
Outdated
There was a problem hiding this comment.
This comment is specific to the concept of scanning for wallet transactions and then updating the last block processed both in memory and disk:
Additionally, after rescanning on wallet loading, instead of writing
the locator for the current chain tip, write the locator for the last
block that the rescan had scanned.
There are couple other usages of ScanForWalletTransactions in rescanblockchain RPC and conditionally in importdescriptors. I guess we don't want to update the block locator in disk in these cases because we also sync the block record to disk every 144 blocks now & doing it in these 2 RPCs doesn't seem necessary?
There was a problem hiding this comment.
Those usages of ScanForWalletTransactions specifically set the option to not write the last block scanned to disk. This is because they are expected to be used during normal operation when the wallet will still be handling incoming blocks and transactions, so having that record possibly be out sync is actually quite bad.
There was a problem hiding this comment.
Also just mentioning that rescanblockchain can be used with ranges (start_height / stop_height). If stop_height is set to some past block, setting the best block back to that old block would be incorrect and lead to wrong balances etc.
There was a problem hiding this comment.
Thanks, this makes sense to me.
Limiting the writing of best block to disk mainly to cases1 that are actively involved in processing the new blocks seems logical. Since these cases will be updating the best block periodically & conditionally, it's prudent to not interfere by doing the same via other user driven RPCs that work on different criterion as highlighted in the above comments.
This is because they are expected to be used during normal operation when the wallet will still be handling incoming blocks and transactions
I find this corroborated here in the scan function:
Lines 1893 to 1896 in c779ee3
Footnotes
-
blockConnected, blockDisconnected notifications ↩
When a block is connected, if the new block had anything relevant to the wallet, update the best block record on disk. If not, also sync the best block record to disk every 144 blocks. Also reuse the new WriteBestBlock method in BackupWallet.
The only reason to call chainStateFlushed during wallet loading is to ensure that the best block is written. Do these writes explicitly to prepare for removing chainStateFlushed, while also ensuring that the wallet's in memory state tracking is written to disk. Additionally, after rescanning on wallet loading, instead of writing the locator for the current chain tip, write the locator for the last block that the rescan had scanned. This ensures that the stored best block record matches the wallet's current state. Any blocks dis/connected during the rescan are processed after the rescan and the last block processed will be updated accordingly.
Migrating a wallet requires having a bestblock record. This is always the case in normal operation as such a record is always written on wallet loading if it didn't already exist. However, within the unit tests and benchmarks, this is not guaranteed. Since migration requires the record, WalletMigration needs to also add this record before the benchmark.
chainStateFlushed is no longer needed since the best block is updated after a block is scanned. Since the chainstate being flushed does not necessarily coincide with the wallet having processed said block, it does not entirely make sense for the wallet to be recording that block as its best block, and this can cause race conditions where some blocks are not processed. Thus, remove this notification.
StopWallets, which was being called prior to UnloadWallets, performs an unnecessary database closing step. This causes issues in UnloadWallets which does additional database cleanups. Since the database closing step is unnecessary, StopWallets is removed, and UnloadWallets is now called from WalletLoaderImpl::stop.
Since CWallet::chainStateFlushed is now no-op, this test no longer tests the concurrent writes scenario. There are no other cases where multiple DatabaseBatches are open at the same time.
rkrux
left a comment
There was a problem hiding this comment.
ACK 30a94b1
I like how this PR brings the best/last block processed updates both in memory & disk in unison, making it easier to reason about the wallet code ultimately. The detailed commit messages aids review.
I have verified that the location of such changes makes it possible for the following wallet RPCs to correctly handle best block related memory & disk updates:
createwallet
loadwallet
restorewallet
unloadwallet
backupwallet
I feel that updating the locator when wallet transactions are found in the latest block & periodically every 144 blocks in blockConnected notifications, and also going back to the previous best block hash in the corresponding blockConnected notification makes it more intuitive & easier to reason about. There's a possibility that the best block locator will be updated frequently if the wallet has transactions in contiguous blocks but I don't think that's a big deal as it's relatively less likely to occur.
Even though I had not read the chainStateFlushed code prior to this PR, I found consistent usage of that terminology within wallet instance slightly distracting & thus support its removal.
I prefer the SetLastBlockProcessedInMem naming as it clearly mentions the storage that's being updated. Also in favour of SetLastBlockProcessed function updating in memory & disk in one go.
I realised that there's an opportunity to put m_last_block_processed & m_last_block_processed_height together in a struct as they are more often than not used together, I see a related refactor has been alluded to in this comment: #30221 (comment)
| @@ -3220,13 +3224,21 @@ bool CWallet::AttachChain(const std::shared_ptr<CWallet>& walletInstance, interf | |||
|
|
|||
There was a problem hiding this comment.
Within AttachChain itself, there is another opportunity to use setLastBlockProcessedInMem here:
Lines 3137 to 3143 in c779ee3
The cs_wallet is held at the start of this function.
There was a problem hiding this comment.
Will do if I need to retouch.
src/wallet/wallet.cpp
Outdated
There was a problem hiding this comment.
Thanks, this makes sense to me.
Limiting the writing of best block to disk mainly to cases1 that are actively involved in processing the new blocks seems logical. Since these cases will be updating the best block periodically & conditionally, it's prudent to not interfere by doing the same via other user driven RPCs that work on different criterion as highlighted in the above comments.
This is because they are expected to be used during normal operation when the wallet will still be handling incoming blocks and transactions
I find this corroborated here in the scan function:
Lines 1893 to 1896 in c779ee3
Footnotes
-
blockConnected, blockDisconnected notifications ↩
| // Also save the best block locator because rescanning only updates it intermittently. | ||
| walletInstance->SetLastBlockProcessed(*scan_res.last_scanned_height, scan_res.last_scanned_block); |
There was a problem hiding this comment.
I verified that ScanForWalletTransactions does update the best block locator on disk intermittently:
Lines 1875 to 1883 in c779ee3
But I don't see it updating m_last_block_processed, instead it reads it. Even if m_last_block_processed comes out to be same as the best block locator at the end in the AttachChain case, I find updating m_last_block_processed and best block locator here after rescan better both in terms of data consistency and readability - gives more confidence.
| walletInstance->SetLastBlockProcessed(*scan_res.last_scanned_height, scan_res.last_scanned_block); | ||
| } | ||
| } | ||
| walletInstance->m_attaching_chain = false; |
There was a problem hiding this comment.
Though these m_attaching_chain updates were done in AttachChain function, it looks quite cleaner now to get rid of this low level boolean from wallet instance, makes it easier to wrap head around AttachChain.
| } | ||
| walletInstance->m_attaching_chain = false; | ||
| // Set and update the best block record | ||
| // Set last block scanned as the last block processed as it may be different in case the case of a reorg. |
There was a problem hiding this comment.
as it may be different in case the case of a reorg.
Redundancy if retouched.
| # no longer the case (after the unclean shutdown, the node's chain returned to the pre-invalidation tip). | ||
| # This issue blocks any future spending and results in an incorrect balance display. | ||
| # After disconnecting the block, the wallet should record the new best block. | ||
| # Upon reload after the crash, since the chainstate was not flushed, the tip contains the previously abandoned |
There was a problem hiding this comment.
since the chainstate was not flushed
I now realise that this refers to the node specific chainstate.
Having seeing the removal of chain state flushed function in the wallet, I initially thought it referred to that (which doesn't exist anymore).
| # Due to an existing bug, the wallet incorrectly keeps the transaction in an abandoned state, even though that's | ||
| # no longer the case (after the unclean shutdown, the node's chain returned to the pre-invalidation tip). | ||
| # This issue blocks any future spending and results in an incorrect balance display. | ||
| # After disconnecting the block, the wallet should record the new best block. |
There was a problem hiding this comment.
the wallet should record the new best block.
I assume there is no way for us to verify this before the crash and after tip invalidation. The wallet info returns value based on the one that's in memory: m_last_block_processed
wallet.getwalletinfo()['lastprocessedblock']['hash']
There was a problem hiding this comment.
Not sure what would be verified? After invalidateblock and before the crash, we do verify that the coinbase transaction is marked abandoned. This check is to verify that starting after the crash, the wallet has the correct before-invalidate state.
There was a problem hiding this comment.
Not sure what would be verified? After invalidateblock and before the crash, we do verify that the coinbase transaction is marked abandoned.
I wanted to check that the best block locator is indeed persisted to disk after invalidation but the getwalletinfo RPC reads lastprocessedblock from memory, which can't be used for verifying this PR's change. The coinbase transaction verification was done before this PR as well, so that also not verifies.
Upon reload after the crash, since the chainstate was not flushed, the tip contains the previously abandoned coinbase. This should be rescanned and now un-abandoned.
While this comment is helpful, but I'd prefer to assert the "rescanning" portion in the test. Previously, I had in mind some functionality that reads from disk while returning the response in RPC but that might be going a bit too far just for tests. The other easier way I found out is to verify using the debug log from AttachChain that confirms the rescan indeed does happen - which tells me that the best block locator was persisted to disk after block invalidation and before node crash.
Line 3192 in 3f83c74
Functional test diff that worked in my system:
@@ -104,6 +104,7 @@ class ReorgsRestoreTest(BitcoinTestFramework):
# Disconnect tip and sync wallet state
tip = wallet.getbestblockhash()
+ tip_height = wallet.getblockstats(tip)["height"]
wallet.invalidateblock(tip)
wallet.syncwithvalidationinterfacequeue()
@@ -116,7 +117,8 @@ class ReorgsRestoreTest(BitcoinTestFramework):
node.kill_process()
# Restart the node and confirm that it has not persisted the last chain state changes to disk
- self.start_node(0)
+ with self.nodes[0].assert_debug_log(expected_msgs=[f"Rescanning last 1 blocks (from block {tip_height - 1})...\n"]):
+ self.start_node(0)
assert_equal(node.getbestblockhash(), tip)
# After disconnecting the block, the wallet should record the new best block.There was a problem hiding this comment.
Ah, I see. I think that's unnecessary and the current test is sufficient to verify the behavior.
| # Upon reload after the crash, since the chainstate was not flushed, the tip contains the previously abandoned | ||
| # coinbase. This should be rescanned and now un-abandoned. | ||
| wallet = node.get_wallet_rpc("reorg_crash") | ||
| assert_equal(wallet.getwalletinfo()['immature_balance'], 0) # FIXME: #31824. |
There was a problem hiding this comment.
- assert_equal(wallet.getwalletinfo()['immature_balance'], 0) # FIXME: #31824.
+ assert_greater_than(wallet.getwalletinfo()['immature_balance'], 0)Nit: The correct balance is there, can be asserted.
| # coinbase. This should be rescanned and now un-abandoned. | ||
| wallet = node.get_wallet_rpc("reorg_crash") | ||
| assert_equal(wallet.getwalletinfo()['immature_balance'], 0) # FIXME: #31824. | ||
| assert_equal(wallet.gettransaction(coinbase_tx_id)['details'][0]['abandoned'], False) |
There was a problem hiding this comment.
self.log.info("Test that wallet doesn't crash due to a duplicate block disconnection event after an unclean shutdown")
IMO the test log should be updated to mention this improvement also.
That the wallet transactions are not missed anymore in case of temporarily invalidated blocks with unclean node shutdown.
There was a problem hiding this comment.
Will do if I need to retouch.
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK 30a94b1. Only changes since last review are using WriteBestBlock method more places and updating comments.
It looks like there are some small review comments above that could be followed up, but with 3 acks I think it would be good to merge this shortly and let smaller improvements can be made later. I plan to merge this soon after testing locally.
|
|
||
| if (!m_last_block_processed.IsNull()) { | ||
| CBlockLocator loc; | ||
| chain().findBlock(m_last_block_processed, FoundBlock().locator(loc)); |
There was a problem hiding this comment.
In commit "wallet: Update best block record after block dis/connect" (7bacabb)
IMO would be good to assert !loc.IsNull() as a sanity check to ensure that doesn't happen. (It shouldn't, but node or wallet code could change in the future and invalidate current assumptions.)
I did the follow-ups in #32580. |
Implements the idea discussed in #29652 (comment)
Currently,
m_last_block_processedandm_last_block_processed_heightare not guaranteed to match the block locator stored in the wallet, nor do either of those fields actually represent the last block that the wallet is synced up to. This is confusing and unintuitive.This PR changes those last block fields to be updated whenever the wallet makes a change to the db for new transaction state found in new blocks. Whenever a block is received that contains a transaction relevant to the wallet, the last block locator will now be written to disk. Furthermore, every block disconnection will now write an updated locator.
To ensure that the locator is relatively recent and loading rescans are fairly quick in the event of unplanned shutdown, it is also now written every 144 blocks (~1 day). Additionally it is now written when the wallet is unloaded so that it is accurate when the wallet is loaded again.
Lastly, the
chainstateFlushednotification in the wallet is changed to be a no-op. The best block locator record is no longer written whenchainstateFlushedis received from the node since it should already be mostly up to date.