
Conversation

@greg7mdp
Contributor

@greg7mdp greg7mdp commented Mar 4, 2025

Resolves #1183.

It is needed for #1169.

Performance tests for reflecting peer keys from the chainbase db into the unordered_flat_map (release build, AMD 7950x). I used 10,000 keys, which is many more than can be expected.

  • update empty unordered_flat_map with 10,000 {name/public_key} pairs from chainbase: 1.71573ms
  • update unordered_flat_map containing any number of {name/public_key} pairs with 32 new keys: ~0.01ms.
  • each pair occupies 88 bytes, so 10,000 pairs occupy about 1MB per map, or 2MB total.

Note: when a peerkey row is deleted in chainbase via delpeerkey, the key is not deleted from the unordered_flat_map. I don't think it should be a problem.

auto lower = idx.lower_bound(boost::make_tuple(t_id->id));
auto upper = idx.lower_bound(boost::make_tuple(next_tid));

for (auto itr = lower; itr != upper; ++itr) {
Contributor

The table could be rather large. Seems like we should loop through the producer schedule, find each one, and only copy those out.

Contributor

There are 679 registered producers in EOS. Granted there are only 30 finalizers, but that can grow unbounded.

Contributor Author

A few hundred name/public-key pairs is tiny, a few KB. I feel that adding extra code is not desirable since it will not make any measurable difference, but that may not be the prevalent viewpoint at ENF :-)

Contributor

But what if the table is maliciously made large?

Contributor Author

I'll address this with the version scheme as well.

Contributor

I think we should avoid a host function if at all possible.

Contributor Author

@heifner one question I have: should we consider both proposers and finalizers?
Also, should we use the current producers and the ones on the next schedule?
Should we consider the current finalizers plus all the pending ones?

Upon reflection, I think it is better to just efficiently reflect the table from chainbase, using the version scheme I described above.

Contributor

I think I agree with that. The net_plugin can track the current proposers and finalizers easily. It already tracks the active and pending proposer schedule. The account has to pay for the on-chain ram, so duplicating it in the node seems like it would be safe. We could limit the copy time to say 5ms so it can't be weaponized.

Contributor

However, why bother keeping a copy? Chainbase access should only be slightly slower than a C++ memory lookup. You could use a hash lookup instead of a chainbase lookup, but it seems not worth making a copy.

Contributor Author

@greg7mdp greg7mdp Mar 5, 2025

However, why bother keeping a copy.

So it can be accessed without locks from any net_plugin thread, even when the main thread is executing transactions and updating the chainbase db.
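The lock-free access pattern being described can be sketched as follows. This is an illustrative sketch, not Spring's actual code: the class and method names (`peer_keys_view`, `snapshot`, `publish`) are made up, and it uses the standard atomic free functions for `shared_ptr`. Readers on net_plugin threads grab an immutable snapshot via an atomic pointer load, while the main thread publishes a freshly built map after applying chainbase updates.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <unordered_map>

// stand-in for the name -> public-key map reflected from chainbase
using peer_key_map_t = std::unordered_map<std::string, std::string>;

class peer_keys_view {
   std::shared_ptr<const peer_key_map_t> _map = std::make_shared<peer_key_map_t>();
public:
   // called from any net_plugin thread: no lock, just an atomic pointer load;
   // the returned snapshot stays valid even if a new map is published meanwhile
   std::shared_ptr<const peer_key_map_t> snapshot() const {
      return std::atomic_load(&_map);
   }
   // called from the main thread after reflecting updates from chainbase
   void publish(peer_key_map_t new_map) {
      std::atomic_store(&_map, std::shared_ptr<const peer_key_map_t>(
                                  std::make_shared<peer_key_map_t>(std::move(new_map))));
   }
};
```

Because readers hold a `shared_ptr` to the old map, the writer can swap in a new one at any time without invalidating in-flight lookups; the old map is freed when the last reader drops its snapshot.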

@spoonincode
Contributor

Apologies if this was covered in some design doc I missed: Was using a ROtrx considered instead of fishing in the tables manually?

@greg7mdp
Contributor Author

greg7mdp commented Mar 4, 2025

Apologies if this was covered in some design doc I missed: Was using a ROtrx considered instead of fishing in the tables manually?

I didn't consider it. Wouldn't really know how to do it either.

@heifner
Contributor

heifner commented Mar 4, 2025

Apologies if this was covered in some design doc I missed: Was using a ROtrx considered instead of fishing in the tables manually?

Interesting idea. Currently read-only trxs are not enabled on all nodes, but that could be changed if we wanted to use them "internally" for something like this. Also I wonder how much overhead that would be compared to direct table access.

@ericpassmore
Contributor

Note:start
category: Other
component: P2P
summary: Retrieval of peers public keys from chainbase db.
Note:end

fc::raw::unpack(ds, row_name);
fc::raw::unpack(ds, row_version);
fc::raw::unpack(ds, row_key);
assert(row_version > _version && row_version <= *cb_version);
Contributor

Is this putting too much trust in to the contract that the row contents will always be well formed? Previously one malformed table row would "just" cause the entire peer list to fail to update so I didn't bring it up, but now a malformed table row can abort the whole process if asserts are enabled.

Contributor Author

How could a row not be well formed though?

Contributor

It's too short (throws exception), public key is malformed (exception), or row_version==0 (assert), etc

Contributor Author

You mean the serialized data we unpack? How is it possible that the system contract would not populate a correct row?

Contributor Author

a malformed table row can abort the whole process if asserts are enabled.

I feel that that's the whole point of enabling asserts. You want the process to abort if something that shouldn't be possible does happen.

Contributor

That's not how I remember us describing the use case for asserts. I remember asserts being present to describe invariants that cannot be true. In this case this isn't an invariant but rather input validation on what the contract is communicating to nodeos. asserts should be impossible from the viewpoint of nodeos but in this case a misbehaving contract can force an assert to be true.

Contributor Author

@greg7mdp greg7mdp Mar 10, 2025

invariants that cannot be true

According to how we understand the code to work IMO! And that should include contract code.
Really an assert says: If my code and the compiler are correct, this has to be true.

Different from user input validation that we use EOS_ASSERT for.

Contributor

I don't consider the system contract and nodeos to be one entity, so communication between them would qualify as requiring input validation. Even the system contract isn't worthy enough to blindly trust to the point of allowing it to fail asserts.

Contributor Author

I have EOS_ASSERTs now so we should be fine.

size_t num_updated = 0;
try {
const auto& db = chain.db();
auto cb_version = _get_version(db);
Contributor

I am not really seeing any benefit to using chainbase's revision over just simply the applied block num? That's really what the contract is tracking -- the last block num the row was modified at. (version seems a misnomer in that aspect) Relying on chainbase revision == head block num might be an implementation detail.

Contributor Author

@greg7mdp greg7mdp Mar 10, 2025

This tracks the version of the peerkeys table, and this version is incremented every time the peerkeys table is modified.
What I meant with cb_version is the version of peerkeys in chainbase, vs the version of peerkeys we currently have in the unordered_flat_map.
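The versioning scheme being described can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not Spring's actual code: `row_t`, `peer_keys_cache`, and the flat `rows` vector (standing in for the chainbase secondary index ordered by version) are all made up for the example. The contract bumps a table-wide version on every `peerkeys` change and records, per row, the version at which the row was last modified; the node then only needs to apply rows whose version lies in `(_version, cb_version]`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// a chainbase peerkeys row, reduced to the fields relevant here
struct row_t { std::string name; uint64_t version; std::string key; };

struct peer_keys_cache {
   uint64_t _version = 0;                               // table version reflected so far
   std::unordered_map<std::string, std::string> _map;   // cached name -> key map

   // `rows` stands in for the chainbase table; in the real code a secondary
   // index keyed by version lets us iterate only the new rows directly
   size_t update(uint64_t cb_version, const std::vector<row_t>& rows) {
      size_t num_updated = 0;
      if (cb_version == _version)
         return num_updated;                            // nothing to do
      for (const auto& r : rows) {
         if (r.version > _version && r.version <= cb_version) {
            _map[r.name] = r.key;                       // only rows changed since last update
            ++num_updated;
         }
      }
      _version = cb_version;
      return num_updated;
   }
};
```

The point of the scheme is the early-out: when the cached `_version` already matches the table version in chainbase, the update is a single comparison, and otherwise only the delta is copied.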

Contributor

But afaict you're just using chainbase's revision as a proxy for current block num. I don't believe that's a pattern used elsewhere -- normally you'd look at forkdb or the block passed in accepted_block.

Contributor Author

just using chainbase's revision as a proxy for current block num

Where do I do that? peer_keys_db_t::_get_version() returns the version stored in the peerkeysver table.

Contributor

whoops you are right I misread the earlier code

auto cb_version = _get_version(db);
if (!cb_version || *cb_version == _version)
return num_updated; // nothing to do
assert(*cb_version > _version);
Contributor

Won't this potentially assert on fork switches?

Contributor Author

Hum, good point, let me think about it.

Contributor Author

Should be addressed by 2bcfc00 and
VaultaFoundation/system-contracts@90773cb

which now update the reflected maps only up to lib.

Contributor

Doesn't matter what signal you attach to, the state of the node is as of HEAD unless you are running in IRREVERSIBLE mode.

Contributor

Sorry, I misunderstood your comment. You are only pulling rows with block numbers up to lib from the table.

auto lower = secidx.lower_bound(std::make_tuple(t_id->id._id, _version + 1));
auto upper = secidx.upper_bound(std::make_tuple(t_id->id._id, *cb_version));

auto new_map = boost::make_shared<peer_key_map_t>(*get_peer_key_map());
Contributor

Do we have confidence that given the size of this map can be maliciously controlled that performance will not be a problem at 100k, 500k, etc entries when doing the deep copy? It may be possible someone can make bloated 512KB PUB_WA in these entries as well, making this a very heavy copy (this may be something that the contract can defend against though by simply disallowing PUB_WA here). I'd like to see more data on the overhead and/or defense against abuse.

Contributor Author

@greg7mdp greg7mdp Mar 10, 2025

Good point for the PUB_WA. I'll disallow it in the contract.

regpeerkey registers a key for an existing account.

To register 500k peerkeys, you'd have to create 500k accounts. At a cost of 1GB of ram.

The cost for the table would only be 50MB. And because those updates would not occur in a single block it would not take an excessive time. It doesn't seem that that would be the most effective attack against spring.

I do believe that if someone created 500K accounts, there are likely many other things in Spring that would degrade more than the user registering 500K peer keys and using 50MB of RAM.

There is a possible attack by creating an account with a name, registering a key, then deleting the account which I think would return the ram cost, and then doing the same thing over with a different name. I can add some protection against that if necessary.

Contributor Author

disallowing webauthn keys: VaultaFoundation/system-contracts@fd86ed8

Contributor

I wasn't too worried about changes; worst case (assuming action executes 100µs) I guess you would end up with 40000 changed keys to iterate through. I was more worried about the unbounded size of the map being copied. Nobody who owns 1GB, 10GB, or whatever should be able to cause large delays inside of nodeos.

Contributor Author

@greg7mdp greg7mdp Mar 10, 2025

The map with 10,000 keys is copied in 0.059ms. I don't see the large delays inside of nodeos happening.

But if you think this is worth worrying about, I can add some code to ensure that no large delay occurs even if the map was to grow to 1M entries or more.

Contributor Author

Another alternative would be to keep two updated unordered_flat_map in memory, and switch the atomic shared_ptr back and forth between them.
I think I like this solution better because it makes an attack impractical, and in the case where there is no attack the cost of keeping duplicated maps in memory is trivial.

Contributor Author

Implemented the flip-flop solution f7384d5

Contributor

I'm not sure how the separate thread solution would work. Couldn't you just put a time limit of say 5ms per accept block? If it is delayed in getting the data no real harm especially if the data is an attack.

Contributor Author

Couldn't you just put a time limit of say 5ms per accept block?

I implemented the flip-flop solution, so a time limit is not needed anymore. I think the current version is safe.

Contributor

I'm not sure how the separate thread solution would work

I was thinking the copy and merge would occur on a separate thread.

Contributor

@spoonincode spoonincode left a comment

requires further discussion:
#1228 (comment)

peer_keys_db.cpp
Contributor

Originally this was in net_plugin. I think that is a better place for it.

Contributor Author

I moved it into controller because I wanted to be able to test it in unit tests, which I think is useful.

Contributor

net_plugin has unit tests.

Contributor Author

The unit test is in eos-system-contracts/tests and accesses the controller via the tester. I don't think the cmake system is plumbed for accessing net_plugin, but I could be wrong.

Contributor

I didn't realize you were using it in eos-system-contracts/tests. Seems like those tests should be moved to Spring as they test Spring functionality.

Contributor Author

@greg7mdp greg7mdp Mar 11, 2025

I didn't realize you were using it in eos-system-contracts/tests. Seems like those tests should be moved to Spring as they test Spring functionality.

Well, I need the contracts from eos-system-contracts to modify chainbase memory when registering keys using the regpeerkey action, so the test has to be in eos-system-contracts/tests. But in the same test I verify the reading-from-chainbase part, which is in libraries/chain, so I can check that both the registering of keys and the reflection into an unordered_flat_map work.

Contributor

I think I would rather test the eos-system-contracts in Spring than test Spring in eos-system-contracts. Not sure @spoonincode thoughts on this. I can approve as is, but seems cleaner to move all the tests to Spring.

Contributor

I think I would rather test the eos-system-contracts in Spring than test Spring in eos-system-contracts

The short answer is I agree with that.

I had a longer answer how maybe the ideal way to test this feature is in spring but not via eos-system-contracts, because my understanding from prior discussions is that reference-contracts/eos-system-contracts being part of spring's tests has been considered somewhat[1] an anti-pattern, especially outside of integration level tests. But since this will ultimately involve net plugin tests too perhaps it will eventually be part of an integration test.

Footnotes

  1. Or maybe not even 'somewhat' depending who you ask

Contributor Author

@greg7mdp greg7mdp Mar 13, 2025

It is hard to make sure that the code writing into chainbase in eos-system-contracts is in sync with the code reading the data in spring, if we are not using both spring code and eos-system-contracts in the tests.

For example when I changed the code in eos-system-contracts to store block_num as uint32_t instead of uint64_t, the test failed because the code was not in sync with spring. I wouldn't want to lose that connection by duplicating the contract code in spring.

Contributor

Thinking on it more, it's clearer that the right place to test is in spring for two reasons --

  1. I take it eventually there will be an integration test for the feature as a whole (i.e. with net stuff tested) too and that can only be done in spring anyways. But even if not making such a test,
  2. contract tests should not depend on random internals of libchain because it prevents us from creating a clean libtester api. In other words, contract unit tests directly interacting beyond the tester apis is not something we want to encourage -- we want to reduce the leakiness not add more. As an extreme example of why it's a problem, consider if we want to rewrite the system contracts tests in Vert[1]. It's literally impossible to have this test with Vert.

I see @heifner has approved so I won't block the PR on this issue either. I'll rereview shortly.

Footnotes

  1. This might not even be an 'extreme' example since it has been floated more than once!

@greg7mdp
Contributor Author

Silly me, I realize there is an issue with my flip-flop scheme... moving the PR back to draft.

@greg7mdp greg7mdp marked this pull request as draft March 11, 2025 21:37
@greg7mdp greg7mdp marked this pull request as ready for review March 12, 2025 00:58
@greg7mdp
Contributor Author

Code is ready for review again. I ended up using a mutex, but considering the updates to the hash_map are quite fast and infrequent (per block containing new peer keys), I feel it shouldn't be an issue.
I could use a shared_mutex to only read lock for operator[](), but I doubt it would be an improvement.
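The mutex-based design being described can be sketched as below. This is a hedged sketch, not Spring's actual `peer_keys_db_t` interface: the class name `peer_keys_map_t` and the method names are illustrative. It shows the `shared_mutex` variant mentioned above, where lookups take a shared (read) lock and the infrequent per-block update takes an exclusive lock.

```cpp
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

class peer_keys_map_t {
   mutable std::shared_mutex                    _m;
   std::unordered_map<std::string, std::string> _map;   // name -> public key
public:
   // lookups from net_plugin threads: shared lock, many concurrent readers allowed
   std::optional<std::string> get_peer_key(const std::string& name) const {
      std::shared_lock l(_m);
      if (auto it = _map.find(name); it != _map.end())
         return it->second;
      return std::nullopt;
   }
   // update from the main thread: exclusive lock, fast and at most once per
   // block containing new peer keys, so contention stays negligible
   void set_peer_key(const std::string& name, std::string key) {
      std::unique_lock l(_m);
      _map[name] = std::move(key);
   }
};
```

As the comment notes, with updates this fast and infrequent a plain `std::mutex` is likely indistinguishable in practice; the `shared_mutex` form only pays off if reads are frequent enough to contend.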


if(CMAKE_CXX_COMPILER_ID MATCHES "Clang" AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 14.0)
target_compile_options(eosio_chain PUBLIC -Wthread-safety)
endif()
Contributor

In the future we should consider making this common somewhere and not copy paste it all over.

Also, since this is PUBLIC it means everything that links to libchain also gets it enabled. This is true today for everything that links to net_plugin -- for example nodeos' main.cpp currently have -Wthread-safety enabled. Is that a problem? If it's not a problem, all the more reason to consider just enabling it globally.

Contributor Author

Would you like me to add it to the top CMakeLists.txt?

Contributor

I wasn't really sure what the side effect was for enabling -Wthread-safety on code that isn't prepared for it. We're already doing that a little because of net_plugin and now we're going to be doing it a lot because of libchain. If that's fine maybe we blanket enable it at top anyways, if that's bad then maybe we should change these to PRIVATE instead. Doesn't need to be done now.

Contributor Author

@greg7mdp greg7mdp Mar 15, 2025

I was just trying it out and it builds fine. Checking it in to see if there is any test issue. I can undo that commit if desired.
eb52ee9

Contributor Author

I wasn't really sure what the side effect was for enabling -Wthread-safety on code that isn't prepared for it.

I believe that it should be fine (a no-op)
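For context on why it is effectively a no-op: `-Wthread-safety` enables Clang's thread-safety analysis, which only fires on code carrying annotations such as `guarded_by`. A minimal illustration (the macro guard is an assumption for portability; with libc++, `std::mutex` is already annotated as a capability, while with other standard libraries the analysis may need an annotated wrapper type):

```cpp
#include <mutex>

// expand the annotation only under Clang; other compilers see plain code
#if defined(__clang__)
#define GUARDED_BY(m) __attribute__((guarded_by(m)))
#else
#define GUARDED_BY(m)
#endif

class counter {
   std::mutex _m;
   int        _n GUARDED_BY(_m) = 0;   // analysis: _n may only be touched with _m held
public:
   int increment() {
      std::lock_guard g(_m);           // ok: _m is held while _n is accessed
      return ++_n;
   }
   // int bad() { return _n; }         // would warn under clang -Wthread-safety
};
```

Unannotated code compiles identically with or without the flag, which is why enabling it broadly (or globally, as discussed) costs nothing for translation units that don't use the annotations.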

private:
std::optional<uint64_t> _get_version(const chainbase::database& db);

bool _active; // if not active (the default), no update occurs
Contributor

Looks like this needs a = false

Contributor Author

Thanks c60b6e9

"deserialized invalid version from `peerkeys`");

fc::raw::unpack(ds, row_key);
EOS_ASSERT(row_key.valid(), misc_exception, "deserialized invalid public key from `peerkeys`");
Contributor

I am suspicious this opens up the possibility for a denial of service. It looks like you should just ignore entries with a bad key instead of failing to update all entries.

Contributor Author

I believe this is what we are doing with the try/catch block within the loop.
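The per-row try/catch being referred to can be sketched like this. It is an illustrative sketch only: `parse_row` is a hypothetical stand-in for the real `fc::raw::unpack` + `EOS_ASSERT` validation, and the names are made up. The key point is the inner try block, which scopes a deserialization failure to one row instead of aborting the whole update.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// hypothetical stand-in for unpacking + validating one serialized peerkeys row;
// throws on malformed input, like fc::raw::unpack / EOS_ASSERT would
static std::pair<std::string, std::string> parse_row(const std::string& raw) {
   auto pos = raw.find(':');
   if (pos == std::string::npos || pos == 0 || pos + 1 == raw.size())
      throw std::runtime_error("malformed peerkeys row");
   return {raw.substr(0, pos), raw.substr(pos + 1)};
}

static size_t update_from_rows(const std::vector<std::string>& rows,
                               std::unordered_map<std::string, std::string>& out) {
   size_t num_updated = 0;
   for (const auto& raw : rows) {
      try {                                   // inner try: a failure affects one row only
         auto [name, key] = parse_row(raw);
         out[name] = key;
         ++num_updated;
      } catch (...) {
         // skip the bad row and keep going; remaining entries still update
      }
   }
   return num_updated;
}
```

An outer try block (as in the quoted `size_t num_updated = 0; try { ... }` hunk above) can still guard against failures outside the loop, which is the doubled-up structure the reviewers were discussing.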

Contributor

you are right, I got cross-eyed with the doubled-up try blocks

Contributor

@spoonincode spoonincode left a comment

EOS_ASSERT(row_block_num > static_cast<uint64_t>(_block_num), misc_exception,
"deserialized invalid version from `peerkeys`");

fc::raw::unpack(ds, row_version); // no need to check, all versions must include the `key` optional
Contributor

Similar to https://github.com/eosnetworkfoundation/eos-system-contracts/pull/185/files#r2001804923 -- comment does not match my understanding of prior discussion. It seems like if version != 0 this should skip the row.

@greg7mdp greg7mdp merged commit 329f7d9 into main Mar 24, 2025
36 checks passed
@greg7mdp greg7mdp deleted the gh_1183 branch March 24, 2025 19:17

Development

Successfully merging this pull request may close these issues.

Add support for accessing the on-chain peer public keys