libnet: Controller: cache networks & endpoints in-memory#49736
libnet: Controller: cache networks & endpoints in-memory#49736vvoland merged 3 commits intomoby:masterfrom
Conversation
| @@ -0,0 +1,11 @@ | |||
| package maputil | |||
There was a problem hiding this comment.
CI is failing because of that:
| package maputil | |
| //go:build go1.22 | |
| package maputil |
99867d9 to
40f7c61
Compare
robmry
left a comment
There was a problem hiding this comment.
LGTM!
Just some suggestions/nits ... and should it have a changelog comment, saying this might solve a long-standing issue with a "has active endpoints" on network deletion?
libnetwork/sandbox_store.go
Outdated
| } | ||
| } | ||
| } | ||
| c.cacheEndpoint(ep) |
There was a problem hiding this comment.
Looks like getEndpointFromStore will already have cached it, so this should be moved to the if err != nil block above?
There was a problem hiding this comment.
Yikes! That's where it was originally 🙈
libnetwork/endpoint_store.go
Outdated
| // maintained by the Controller. | ||
| // | ||
| // This method is thread-safe. | ||
| func (c *Controller) deleteEndpoint(ep *Endpoint) error { |
There was a problem hiding this comment.
The name might be a bit misleading - deleteEndpoint doesn't actually delete the endpoint, it forgets it ... the caller should already have deleted it from the Sandbox etc? Perhaps deleteStoredEndpoint or something like that?
At some point, it might be good to put the cache in a separate package to protect its map/mutex, then the call would be something like c.epStore.Delete(ep) would also work. But, as you noted offline, that'd be a bigger job because Endpoint/Network would create a circular dependency, or the cache would need to be implemented with a generic.
(upsertEndpoint is clearer, but ditto for deleteNetwork.)
There was a problem hiding this comment.
Renamed upsertEndpoint into storeEndpoint, and deleteEndpoint into deleteStoredEndpoint (and followed the same pattern for networks).
|
|
||
| package maputil | ||
|
|
||
| func FilterValues[K comparable, V any](in map[K]V, fn func(V) bool) []V { |
There was a problem hiding this comment.
The callers only want a count, the slice always gets discarded after a len() ... so maybe CountValues, and countEndpoints/countNetworks - and we can add findXYZ funcs if it turns out they're needed?
There was a problem hiding this comment.
Then you could get rid of these warnings for findEndpoints/findNetworks, because there'd be no danger ...
// This method is thread-safe, but do not use it unless you're sure your code
// uses the returned endpoints in thread-safe way (see the comment on
// Controller.endpoints).
There was a problem hiding this comment.
and we can add findXYZ funcs if it turns out they're needed?
I didn't want to change ActiveEndpointsError in this PR to make clear it's focused on data persistence.
But once this one is merged, I'll open another PR to change it, and findNetworks / findEndpoints will be needed. So, to avoid needless code churn, I think it's fine to add these here.
e49ce34 to
baf40a2
Compare
The `Controller`'s store is used by: - `deleteFromStore` - `getEndpointFromStore` - `getEndpointsFromStore` - `updateToStore` - … and other methods that can't store / delete / retrieve an Endpoint Calls to `updateToStore` and `deleteFromStore` have been replaced with `upsertEndpoint` and `deleteEndpoint`. Both `getEndpointFromStore` and `getEndpointsFromStore` call `cacheEndpoint` to ensure endpoints loaded from the datastore are kept in-memory. Finally, `sandboxRestore` was instantiating `Endpoint` itself. These are cached too. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The `Controller`'s store is used by: - `deleteFromStore` - `getNetworks` - `getNetworksFromStore` - `updateToStore` - … and other methods that can't store / delete / retrieve a Network Calls to `updateToStore` and `deleteFromStore` have been replaced with `upsertNetwork` and `deleteNetwork`. Both `getNetworks` and `getNetworksFromStore` call `cacheNetwork` to ensure networks loaded from the datastore are kept in-memory. Finally, `sandboxRestore` was instantiating `Network` itself. These are cached too. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
baf40a2 to
1169763
Compare
endpointCnt is a refcounter used to track how many endpoints use a network, and how many networks references a config-only network. It's stored separately from the network. This is only used to determine if a network can be removed. This commit removes the `endpointCnt` struct and all its references. The refcounter is replaced by two lookups in the newly introduced `networks` and `endpoints` caches added to the `Controller`. Signed-off-by: Albin Kerouanton <albinker@gmail.com>
1169763 to
51d7f95
Compare
| assert.Equal(t, len(found), 2) | ||
| assert.Equal(t, found[0], ep1) | ||
| assert.Equal(t, found[1], ep2) |
There was a problem hiding this comment.
nit: could use assert.Check
| assert.Equal(t, len(found), 1) | ||
| assert.Equal(t, found[0], ep2) |
| "gotest.tools/v3/assert" | ||
| ) | ||
|
|
||
| func TestEndpointStore(t *testing.T) { |
There was a problem hiding this comment.
I just hit some flakiness:
https://github.com/moby/moby/actions/runs/14308075196/job/40096446556?pr=49746#step:7:2040
networknamehas error has active endpoints #42119- What I did
EndpointandNetworkin libnetController.--config-fromnetwork, or endpoint is referencing a network before deleting it.endpointCntstruct.Libnetwork is afflicted by long-standing data consistency issues caused by its persistence layer creating new instances of
Network,EndpointandSandboxevery time an object is loaded from the datastore or its in-memory cache. See for instance the following code that ends with aCopyTo:moby/libnetwork/datastore/cache.go
Lines 134 to 149 in 2e92272
For instance, this led to #47186.
Fixing the data persistence layer itself is particularly hard as it touches virtually all code paths in libnetwork. We need to be especially cautious with mutex locking -- libnetwork makes heavy use of mutexes, but since two concurrent API calls referencing the same object result in to two different instances of that object living in memory, the concurrency model might be broken (unknowingly).
By maintaining a separate cache of all objects in libnet's
Controller, we'll be able to slowly iterate on all code paths, and verify each one through static analysis, without touching the data persistence layer.This PR specifically adds an
Endpointand aNetworkcache to theController. This allows us to delete the structendpointCntwhich was used to ref count networks (either when they were referenced byconfig-fromnetworks, or by endpoints) to prevent deleting networks in use.Instead,
(*Network).delete()looks for endpoints /config-fromnetworks that reference the network being deleted from the cache maintained in theController.- How I did it
See individual commits.
- How to verify it
Through unit and integration tests.
- Human readable description for the release notes
- Improve how network-endpoint relationships are tracked to reduce the chance of the "has active endpoints" error to be wrongfully returned.