Retrieval of peers' public keys from chainbase db. #1228
Conversation
…oesn't build in our ci/cd which used libstdc++ v11
plugins/net_plugin/peer_keys_db.cpp
Outdated
```cpp
auto lower = idx.lower_bound(boost::make_tuple(t_id->id));
auto upper = idx.lower_bound(boost::make_tuple(next_tid));

for (auto itr = lower; itr != upper; ++itr) {
```
The table could be rather large. Seems like we should loop through the producer schedule, find each one, and only copy those out.
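A rough sketch of that approach, assuming `active_schedule` is the tracked proposer schedule and `copy_row` stands in for the unpack-and-store logic (both names are hypothetical):

```cpp
// look up only the producers we care about instead of scanning the whole
// [lower, upper) range; the index is keyed by (table_id, primary_key)
for (const auto& prod : active_schedule.producers) {
   auto itr = idx.find(boost::make_tuple(t_id->id, prod.producer_name.to_uint64_t()));
   if (itr != idx.end())
      copy_row(*itr); // copy out just this producer's {name, key} row
}
```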
There are 679 registered producers in EOS. Granted, there are only 30 finalizers, but that number can grow unbounded.
A few hundred name/public-key pairs is tiny, a few KB. I feel that adding extra code is not desirable since it will not make any measurable difference, but that may not be the prevalent viewpoint at ENF :-)
But what if the table is maliciously made large?
I'll address this with the version as well.
I think we should avoid a host function if at all possible.
@heifner one question I have: should we consider both proposers and finalizers?
Also, should we use the current producers and the ones in the next schedule?
Should we consider the current finalizers plus all the pending ones?
Upon reflection, I think it is better to just efficiently reflect the table from chainbase, using the version scheme I described above.
I think I agree with that. The net_plugin can track the current proposers and finalizers easily. It already tracks the active and pending proposer schedules. The account has to pay for the on-chain RAM, so duplicating it in the node seems like it would be safe. We could limit the copy time to, say, 5ms so it can't be weaponized.
However, why bother keeping a copy? Chainbase access should only be slightly slower than a C++ memory lookup. You can use a hash lookup instead of a chainbase lookup, but it seems not worth making a copy.
However, why bother keeping a copy?
So it can be accessed without locks from any net_plugin thread, even when the main thread is executing transactions and updating the chainbase db.
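A minimal sketch of that pattern, with stand-in types (the real code maps `name` to a public key; `std::atomic_load`/`std::atomic_store` on `shared_ptr` keep both sides lock-free, and `std::atomic<std::shared_ptr>` is the C++20 alternative):

```cpp
#include <boost/unordered/unordered_flat_map.hpp>
#include <memory>
#include <string>

// stand-ins for the real key/value types
using peer_key_map_t = boost::unordered_flat_map<uint64_t, std::string>;

struct peer_keys_db_t {
   // net_plugin threads: grab a snapshot without locks; it stays valid even
   // if the main thread publishes a new map while we hold it
   std::shared_ptr<const peer_key_map_t> get_peer_key_map() const {
      return std::atomic_load(&_map);
   }

   // main thread: publish an updated map atomically
   void set_peer_key_map(std::shared_ptr<const peer_key_map_t> m) {
      std::atomic_store(&_map, std::move(m));
   }

private:
   std::shared_ptr<const peer_key_map_t> _map = std::make_shared<peer_key_map_t>();
};
```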
Apologies if this was covered in some design doc I missed: Was using a ROtrx considered instead of fishing in the tables manually?
I didn't consider it. Wouldn't really know how to do it either.
Interesting idea. Currently read-only trxs are not enabled on all nodes, but that could be changed if we wanted to use them "internally" for something like this. Also I wonder how much overhead that would be compared to direct table access.
libraries/chain/peer_keys_db.cpp
Outdated
```cpp
fc::raw::unpack(ds, row_name);
fc::raw::unpack(ds, row_version);
fc::raw::unpack(ds, row_key);
assert(row_version > _version && row_version <= *cb_version);
```
Is this putting too much trust into the contract that the row contents will always be well formed? Previously one malformed table row would "just" cause the entire peer list to fail to update, so I didn't bring it up, but now a malformed table row can abort the whole process if asserts are enabled.
How could a row not be well formed though?
It's too short (throws an exception), the public key is malformed (exception), or row_version == 0 (assert), etc.
You mean the serialized data we unpack? How is it possible that the system contract would not populate a correct row?
a malformed table row can abort the whole process if asserts are enabled.
I feel that that's the whole point of enabling asserts. You want the process to abort if something that shouldn't be possible does happen.
That's not how I remember us describing the use case for asserts. I remember asserts being present to describe invariants that cannot be violated. In this case this isn't an invariant but rather input validation on what the contract is communicating to nodeos. Asserts should be impossible to trigger from the viewpoint of nodeos, but in this case a misbehaving contract can force an assert to fail.
invariants that cannot be violated
According to how we understand the code to work, IMO! And that should include contract code.
Really, an assert says: if my code and the compiler are correct, this has to be true.
That is different from user input validation, which we use EOS_ASSERT for.
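In sketch form, the distinction being drawn (both lines appear elsewhere in this PR; the first guards a nodeos-internal invariant, the second validates contract-produced input):

```cpp
// internal invariant: if nodeos logic is correct this can never fail, so
// abort loudly in builds with asserts enabled
assert(*cb_version > _version);

// data written by the contract is input: validate it and throw a recoverable
// exception rather than aborting the process
EOS_ASSERT(row_key.valid(), misc_exception,
           "deserialized invalid public key from `peerkeys`");
```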
I don't consider the system contract and nodeos to be one entity, so communication between them would qualify as requiring input validation. Even the system contract isn't worthy enough to blindly trust to the point of allowing it to fail asserts.
I have EOS_ASSERTs now so we should be fine.
libraries/chain/peer_keys_db.cpp
Outdated
```cpp
size_t num_updated = 0;
try {
   const auto& db = chain.db();
   auto cb_version = _get_version(db);
```
I am not really seeing any benefit to using chainbase's revision over simply the applied block num? That's really what the contract is tracking: the last block num the row was modified at. (version seems a misnomer in that respect.) Relying on chainbase revision == head block num might be an implementation detail.
This tracks the version of the peerkeys table, and this version is incremented every time the peerkeys table is modified.
What I meant by cb_version is the version of peerkeys in chainbase, vs. the version of peerkeys we currently have in the unordered_flat_map.
But afaict you're just using chainbase's revision as a proxy for current block num. I don't believe that's a pattern used elsewhere -- normally you'd look at forkdb or the block passed in accepted_block.
just using chainbase's revision as a proxy for current block num
Where do I do that? peer_keys_db_t::_get_version() returns the version stored in the peerkeysver table.
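For context, a rough sketch of what such a lookup can look like; the table layout and the singleton's primary key are assumptions here, not the PR's actual code:

```cpp
std::optional<uint64_t> peer_keys_db_t::_get_version(const chainbase::database& db) {
   // locate the `peerkeysver` table owned by the system account
   const auto* t_id = db.find<table_id_object, by_code_scope_table>(
      boost::make_tuple(config::system_account_name, config::system_account_name,
                        "peerkeysver"_n));
   if (t_id == nullptr)
      return {}; // table not created yet, nothing to reflect
   // assuming a CDT singleton, whose row uses the table name as primary key
   const auto* kv = db.find<key_value_object, by_scope_primary>(
      boost::make_tuple(t_id->id, name("peerkeysver").to_uint64_t()));
   if (kv == nullptr)
      return {};
   fc::datastream<const char*> ds(kv->value.data(), kv->value.size());
   uint64_t version = 0;
   fc::raw::unpack(ds, version);
   return version;
}
```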
Whoops, you are right, I misread the earlier code.
libraries/chain/peer_keys_db.cpp
Outdated
```cpp
auto cb_version = _get_version(db);
if (!cb_version || *cb_version == _version)
   return num_updated; // nothing to do
assert(*cb_version > _version);
```
Won't this potentially assert on fork switches?
Hmm, good point, let me think about it.
Should be addressed by 2bcfc00 and VaultaFoundation/system-contracts@90773cb, which now update the reflected maps only up to lib.
Doesn't matter what signal you attach to, the state of the node is as of HEAD unless you are running in IRREVERSIBLE mode.
Sorry, I misunderstood your comment. You are only pulling rows with block numbers up to lib from the table.
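In sketch form (hedged; the actual change is in the commits referenced above), the update pass simply refuses to fold in rows that are not yet irreversible:

```cpp
// rows record the block number at which they were last modified; anything
// above LIB could still be forked out, so defer it to a later update
const uint32_t lib = chain.last_irreversible_block_num();
if (row_block_num > lib)
   continue; // pick this row up once its block becomes irreversible
```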
libraries/chain/peer_keys_db.cpp
Outdated
```cpp
auto lower = secidx.lower_bound(std::make_tuple(t_id->id._id, _version + 1));
auto upper = secidx.upper_bound(std::make_tuple(t_id->id._id, *cb_version));

auto new_map = boost::make_shared<peer_key_map_t>(*get_peer_key_map());
```
Do we have confidence, given that the size of this map can be maliciously controlled, that performance will not be a problem at 100k, 500k, etc. entries when doing the deep copy? It may also be possible for someone to put bloated 512KB PUB_WA keys in these entries, making this a very heavy copy (this may be something the contract can defend against, though, by simply disallowing PUB_WA here). I'd like to see more data on the overhead and/or defense against abuse.
Good point on the PUB_WA; I'll disallow it in the contract.
regpeerkey registers a key for an existing account. To register 500k peer keys, you'd have to create 500k accounts, at a cost of 1GB of RAM. The cost for the table would only be 50MB. And because those updates would not occur in a single block, it would not take an excessive time. It doesn't seem that this would be the most effective attack against Spring.
I do believe that if someone created 500K accounts, there are likely many other issues in Spring that would degrade more than the user registering 500K peer keys and using 50MB of RAM.
There is a possible attack: create an account with a name, register a key, then delete the account (which I think would return the RAM cost), then do the same thing over with a different name. I can add some protection against that if necessary.
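(Rough arithmetic behind those figures, assuming roughly 2 KB of RAM per new account and roughly 100 bytes per peerkeys row: 500,000 × 2 KB ≈ 1 GB for the accounts, and 500,000 × 100 B ≈ 50 MB for the table.)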
disallowing webauthn keys: VaultaFoundation/system-contracts@fd86ed8
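A sketch of what that restriction can look like contract-side; the real change is in the linked commit, this is only illustrative:

```cpp
// CDT's eosio::public_key is a variant whose alternative 2 is the webauthn
// key type, which can carry a large payload, so reject it outright
check(key.index() < 2, "webauthn keys are not allowed for peer keys");
```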
I wasn't too worried about changes; worst case (assuming the action executes in 100µs) I guess you would end up with 40,000 changed keys to iterate through. I was more worried about the unbounded size of the map being copied. Nobody who owns 1GB, 10GB, or whatever should be able to cause large delays inside of nodeos.
The map with 10,000 keys is copied in 0.059ms, so I don't see the large delays inside of nodeos happening.
But if you think this is worth worrying about, I can add some code to ensure that no large delay occurs even if the map were to grow to 1M entries or more.
Another alternative would be to keep two updated unordered_flat_maps in memory, and switch the atomic shared_ptr back and forth between them.
I think I like this solution better because it makes an attack impractical, and in the case where there is no attack the cost of keeping duplicated maps in memory is trivial.
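A rough sketch of the flip-flop idea, reusing the `peer_key_map_t` alias from the sketch earlier in the thread (note the author later moves the PR back to draft after finding an issue with this scheme; one hazard is that a reader may still hold the standby map while it is being resynced):

```cpp
#include <atomic>

struct flip_flop_maps {
   peer_key_map_t      maps[2];   // both kept current; no full copy needed
   std::atomic<size_t> active{0}; // index readers should use

   const peer_key_map_t& read() const { return maps[active.load()]; }

   // apply an incremental batch of {name, key} changes and flip readers over
   template <class Delta>
   void publish(const Delta& delta) {
      size_t standby = 1 - active.load();
      for (const auto& [n, k] : delta) maps[standby][n] = k;     // update standby
      active.store(standby);                                     // flip
      for (const auto& [n, k] : delta) maps[1 - standby][n] = k; // resync old map
   }
};
```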
Implemented the flip-flop solution f7384d5
I'm not sure how the separate thread solution would work. Couldn't you just put a time limit of, say, 5ms per accepted block? If it is delayed in getting the data, no real harm, especially if the data is an attack.
Couldn't you just put a time limit of, say, 5ms per accepted block?
I implemented the flip-flop solution, so a time limit is not needed anymore. I think the current version is safe.
I'm not sure how the separate thread solution would work
I was thinking the copy and merge would occur on a separate thread.
spoonincode left a comment
requires further discussion:
#1228 (comment)
```
symbol.cpp
whitelisted_intrinsics.cpp
thread_utils.cpp
peer_keys_db.cpp
```
Originally this was in net_plugin. I think that is a better place for it.
I moved it into controller because I wanted to be able to test it in unit tests, which I think is useful.
net_plugin has unit tests.
The unit test is in eos-system-contracts/tests and accesses the controller via the tester. I don't think the cmake system is plumbed for accessing net_plugin, but I could be wrong.
I didn't realize you were using it in eos-system-contracts/tests. Seems like those tests should be moved to Spring as they test Spring functionality.
I didn't realize you were using it in eos-system-contracts/tests. Seems like those tests should be moved to Spring as they test Spring functionality.
Well, I need the contracts from eos-system-contracts to modify chainbase memory when registering keys using the regpeerkey action, so the test has to be in eos-system-contracts/tests. But in the same test I verify the reading from chainbase memory part which is in libraries/chain, so I can check that both the registering of keys, and the reflection into an unordered_flat_map work.
I think I would rather test the eos-system-contracts in Spring than test Spring in eos-system-contracts. Not sure of @spoonincode's thoughts on this. I can approve as is, but it seems cleaner to move all the tests to Spring.
I think I would rather test the eos-system-contracts in Spring than test Spring in eos-system-contracts
The short answer is I agree with that.
I had a longer answer about how maybe the ideal way to test this feature is in spring but not via eos-system-contracts, because my understanding from prior discussions is that reference-contracts/eos-system-contracts being part of spring's tests has been considered somewhat¹ of an anti-pattern, especially outside of integration-level tests. But since this will ultimately involve net_plugin tests too, perhaps it will eventually be part of an integration test.
Footnotes
1. Or maybe not even 'somewhat', depending who you ask. ↩
It is hard to make sure that the code writing into chainbase in eos-system-contracts stays in sync with the code reading the data in spring if we are not using both spring code and eos-system-contracts in the tests.
For example, when I changed the code in eos-system-contracts to store block_num as uint32_t instead of uint64_t, the test failed because the code was not in sync with spring. I wouldn't want to lose that connection by duplicating the contract code in spring.
Thinking on it more, it's clearer that the right place to test is in spring, for two reasons:
- I take it eventually there will be an integration test for the feature as a whole (i.e. with net stuff tested) too, and that can only be done in spring anyways. But even if not making such a test,
- contract tests should not depend on random internals of libchain, because that prevents us from creating a clean libtester API. In other words, contract unit tests directly interacting beyond the tester APIs is not something we want to encourage; we want to reduce the leakiness, not add more. As an extreme example of why it's a problem, consider if we want to rewrite the system contracts tests in Vert¹. It's literally impossible to have this test with Vert.
I see @heifner has approved so I won't block the PR on this issue either. I'll re-review shortly.
Footnotes
1. This might not even be an 'extreme' example since it has been floated more than once! ↩
Silly me, I realize there is an issue with my flip-flop scheme... moving the PR back to draft.
Code is ready for review again. I ended up using a …
libraries/chain/CMakeLists.txt
Outdated
```cmake
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang" AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 14.0)
  target_compile_options(eosio_chain PUBLIC -Wthread-safety)
endif()
```
In the future we should consider making this common somewhere and not copy-paste it all over.
Also, since this is PUBLIC it means everything that links to libchain also gets it enabled. This is true today for everything that links to net_plugin; for example, nodeos' main.cpp currently has -Wthread-safety enabled. Is that a problem? If it's not a problem, all the more reason to consider just enabling it globally.
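For instance, something like this at the top-level CMakeLists.txt would be the blanket version (a sketch, not a concrete proposal):

```cmake
# enable clang's thread-safety analysis for every target at once instead of
# copy-pasting the per-target block
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang" AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 14.0)
  add_compile_options(-Wthread-safety)
endif()
```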
Would you like me to add it to the top CMakeLists.txt?
I wasn't really sure what the side effect was for enabling -Wthread-safety on code that isn't prepared for it. We're already doing that a little because of net_plugin, and now we're going to be doing it a lot because of libchain. If that's fine, maybe we blanket enable it at the top anyways; if that's bad, then maybe we should change these to PRIVATE instead. Doesn't need to be done now.
I was just trying it out and it builds fine. Checking it in to see if there is any test issue. I can undo that commit if desired.
eb52ee9
I wasn't really sure what the side effect was for enabling -Wthread-safety on code that isn't prepared for it.
I believe that it should be fine (a no-op).
```cpp
private:
   std::optional<uint64_t> _get_version(const chainbase::database& db);

   bool _active; // if not active (the default), no update occurs
```
Looks like this needs a `= false`
Thanks c60b6e9
libraries/chain/peer_keys_db.cpp
Outdated
| "deserialized invalid version from `peerkeys`"); | ||
|
|
||
| fc::raw::unpack(ds, row_key); | ||
| EOS_ASSERT(row_key.valid(), misc_exception, "deserialized invalid public key from `peerkeys`"); |
I am suspicious this opens up the possibility for a denial of service. It looks like you should just ignore entries with a bad key instead of failing to update all entries.
I believe this is what we are doing with the try/catch block within the loop.
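That structure, roughly (a hedged sketch; `add_row` stands in for the unpack-and-validate logic):

```cpp
for (auto itr = lower; itr != upper; ++itr) {
   try {
      add_row(*itr); // unpack, EOS_ASSERT-validate, insert into the new map
      ++num_updated;
   } catch (const fc::exception& e) {
      // a single malformed row is skipped; it cannot block the whole update
      wlog("ignoring malformed `peerkeys` row: ${err}", ("err", e.top_message()));
   }
}
```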
You are right, I got cross-eyed with the doubled-up try blocks.
spoonincode left a comment
libraries/chain/peer_keys_db.cpp
Outdated
```cpp
EOS_ASSERT(row_block_num > static_cast<uint64_t>(_block_num), misc_exception,
           "deserialized invalid version from `peerkeys`");

fc::raw::unpack(ds, row_version); // no need to check, all versions must include the `key` optional
```
Similar to https://github.com/eosnetworkfoundation/eos-system-contracts/pull/185/files#r2001804923 -- comment does not match my understanding of prior discussion. It seems like if version != 0 this should skip the row.
Resolves #1183.
It is needed for #1169.
Performance tests for reflection of peer keys from the chainbase db to the unordered_flat_map (release build, AMD 7950x). I used 10,000 keys, which is many more than can be expected.
- Reflecting an unordered_flat_map with 10,000 {name/public_key} pairs from chainbase: 1.71573ms
- Updating an unordered_flat_map containing any number of {name/public_key} pairs with 32 new keys: ~0.01ms