gh-112529: Implement GC for free-threaded builds #114262
Conversation
This implements a mark and sweep GC for the free-threaded builds of CPython. The implementation relies on mimalloc to find GC tracked objects (i.e., "containers").
@DinoV, @pablogsal, @nascheme - I think this is ready for review now. There is duplicate code between Python/gc.c and Python/gc_free_threading.c. I'll put up a PR for the devguide as well.
The design, structure and code style look good to me. A couple of minor suggestions.

Next to the definition of ob_tid, I think it should also mention that this field is (ab)used for the linked list and for the GC refs count.

I think using the marking stack is okay, despite the extra allocations needed. If it turns out to be a problem, e.g. for a program with a very deeply linked object structure, a fairly simple fix is as follows: limit the size of the stack, and if the size is exceeded, re-start the marking process. An additional refinement is to have an "already marked" bit on each mimalloc page, so that if you re-start, those pages can be quickly skipped.
Thanks for the review, Neil. I've renamed …

Yeah, that makes sense.

In a subsequent change (not yet a PR), I'm using …
Python/gc_free_threading.c
Outdated
```c
}

static int
gc_visit_heaps_locked(PyInterpreterState *interp, mi_block_visit_fun *visitor,
```
Do we want to call this gc_visit_heaps_lock_held, per our discussion on naming this style of function?
Python/gc_free_threading.c
Outdated
```c
if (_PyObject_GC_IS_TRACKED(op) && !_Py_IsImmortal(op)) {
    // If update_refs hasn't reached this object yet, mark it
    // as (tentatively) unreachable and initialize ob_tid to zero.
    if (!gc_is_unreachable(op)) {
```
It'd be nice to factor this into a helper like gc_init_refs or gc_maybe_init_refs that is used here and in update_refs
Python/gc_free_threading.c
Outdated
```c
if (PyTuple_CheckExact(op)) {
    _PyTuple_MaybeUntrack(op);
    if (!_PyObject_GC_IS_TRACKED(op)) {
        if (gc_is_unreachable(op)) {
```
This could similarly be a gc_restore_refs or gc_maybe_restore_refs and used below
```c
    return NULL;
}

// Append all objects to a worklist. This abuses ob_tid. We will restore
```
Is it worth stopping the world here? It seems like we're fine to race with ob_tid from a correctness standpoint, but it seems like a lot of objects are going to end up with merged refcounts in a multithreaded environment!
We need to stop the world for the heap traversal (gc_visit_heaps) to be thread-safe... which I forgot to add here. I'll also add an assert to gc_visit_heaps() that the world is stopped.
This doesn't merge most refcounts. They get restored from the mimalloc data structure. They'll only get merged if the owning thread has already exited.
```c
// it later. NOTE: We can't append to the list during gc_visit_heaps()
// because PyList_Append() may reclaim an abandoned mimalloc segment
// while we are traversing it.
struct get_objects_args args = { 0 };
```
Ditto to get referrers
```c
op->ob_tid = 0;
op->ob_ref_local = 0;
op->ob_ref_shared = _Py_REF_SHARED(refcount, _Py_REF_MERGED);
```
It seems like this needs to be an atomic operation? Can this just be a call to _Py_ExplicitMergeRefcount?
The other threads in the interpreter must be paused, so no atomics necessary. I'll add an assert here.
We could probably use _Py_ExplicitMergeRefcount(), but it seems convenient to avoid the atomic operations.
```c
// Clear weakrefs and enqueue callbacks (but do not call them).
clear_weakrefs(state);
_PyEval_StartTheWorld(interp);
```
So, if I understand this correctly: the thread ID and reference count changes may happen at this point, but the assumption is that the thread ID is never going to actually clash with an object reference in the worklist. And anything that does have a reference count operation will just become merged when we restore it?
Yeah, the key assumption is that _PyThread_Id() is itself a pointer to a distinct object, so it never conflicts with an object in the worklists. The implementation of _PyThread_Id() uses the address of the platform's thread control block or equivalent.
On the GC side, we abuse ob_tid for two purposes:

- gc_refs, which can have arbitrary values (might conflict!), but this only happens while other threads are paused, so nobody sees it.
- worklist pointers, which must not conflict with _PyThread_Id().

Some worklists are only used while other threads are paused (e.g., in _PyGC_GetObjects()). Other worklists are created while threads are paused, but still used while other threads may be running (e.g., unreachable, wrcb_to_call).

If a worklist continues to be used while other threads are running, then there are two other important considerations:

- The worklist holds a strong reference to the object so that it can't be freed while in the worklist.
- The object's reference count fields are merged before adding it to the worklist. This avoids some other thread trying to merge the refcount fields, which would be messy. (It also makes the deallocation happen more quickly when we get around to calling tp_clear.)
Here's the PR so far for the devguide: python/devguide#1263
This implements the GC for free-threaded builds. The free-threading GC follows the same basic algorithms as the existing GC, but operates on different data types. Specifically:

- mimalloc is used to find GC-tracked objects, instead of a doubly linked list of objects (the PyGC_Head linked list).
- The ob_tid field (thread id) is abused for computing gc_refs as well as for "worklists" that keep track of the "unreachable" set of objects, objects with legacy finalizers, and weakref callbacks.
- Marking uses _PyObjectStack, a linked list of fixed-sized chunks. This requires memory allocation during GC for marking. It would be possible to avoid this, but at the cost of extra scans over the whole heap, which I don't think is worthwhile. For context, GC implementations in other languages, like OpenJDK and Go, use essentially the same data structure for marking.
- ob_ref_local is used for computing gc_refs because ob_tid is in use for the "unreachable" worklist.

There is a bunch of clean-up and improvements that I'd like to defer to later PRs to keep the size of this manageable:

- Reduce NUM_GENERATIONS to 1 in free-threaded builds. As written, every collection in the free-threaded build is a full collection, but we still behave as if we have three generations in a few places.
- Remove the PyGC_Head pre-header in free-threaded builds.
- Unify Python/gc.c and Python/gc_free_threading.c once Mark's incremental GC is landed.

Related issue: --disable-gil builds #112529