Replace HashMap COOP transitions with Epoch-Based Reclamation (EBR) #124307
Merged
AaronRobinsonMSFT merged 24 commits into dotnet:main on Feb 20, 2026
HashMap's async mode used GCX_MAYBE_COOP_NO_THREAD_BROKEN to transition into cooperative GC mode on every operation, preventing the GC from freeing obsolete bucket arrays mid-read. Old bucket arrays were queued via SyncClean::AddHashMap and freed during GC pauses.

This caused a deadlock: when HashMap::LookupValue() was called while holding the DebuggerController lock, the COOP transition (which is level-equivalent to taking the ThreadStore lock) violated lock ordering constraints, since ThreadStore must be acquired before DebuggerController.

Replace both mechanisms with Epoch-Based Reclamation (EBR), based on Fraser's algorithm from 'Practical Lock-Freedom' (UCAM-CL-TR-579):
- EnterCriticalRegion/ExitCriticalRegion are simple atomic flag stores with memory barriers; they never block or trigger GC transitions
- Obsolete bucket arrays are queued for deferred deletion and freed once all threads have passed through a quiescent state
- An RAII holder (EbrCriticalRegionHolder) replaces GCX_MAYBE_COOP at all 6 call sites in hash.cpp

Changes:
- New: src/coreclr/vm/ebr.h, ebr.cpp (EbrCollector, ~340 lines)
- hash.cpp: Replace 6 GCX_MAYBE_COOP_NO_THREAD_BROKEN with EBR holders, replace SyncClean::AddHashMap with QueueForDeletion
- syncclean.hpp/cpp: Remove HashMap-related members and cleanup code
- ceemain.cpp: Init g_HashMapEbr at startup, shutdown at EE shutdown
- CrstTypes.def: Add CrstEbrThreadList, CrstEbrPending
- crsttypes_generated.h: Regenerated with new Crst types
- CMakeLists.txt: Add ebr.cpp, ebr.h to build
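The following is a minimal sketch of the epoch scheme described above, not the code in ebr.cpp. It assumes a fixed number of reader slots, a three-bucket pending list, and sequentially consistent atomics throughout. SketchEbrCollector, SketchCriticalRegionHolder, the slot parameter, and TryAdvanceEpoch's return value are hypothetical names and shapes chosen for illustration; epoch overflow is ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified collector. Names echo the PR; the layout is assumed.
class SketchEbrCollector {
public:
    static constexpr int kMaxThreads = 8;  // assumption: fixed reader slots

    // Publish "active in epoch e" with a single atomic store; never blocks.
    void EnterCriticalRegion(int slot) {
        assert(slot >= 0 && slot < kMaxThreads);
        uint32_t e = m_globalEpoch.load(std::memory_order_seq_cst);
        m_slots[slot].store((e << 1) | 1u, std::memory_order_seq_cst);
    }

    // Clearing the slot marks a quiescent state for this reader.
    void ExitCriticalRegion(int slot) {
        m_slots[slot].store(0, std::memory_order_seq_cst);
    }

    bool InCriticalRegion(int slot) const {
        return (m_slots[slot].load(std::memory_order_seq_cst) & 1u) != 0;
    }

    // Defer a deleter; it is filed under the epoch in which it was retired.
    void QueueForDeletion(std::function<void()> deleter) {
        std::lock_guard<std::mutex> lock(m_pendingLock);
        m_pending[m_globalEpoch.load() % 3].push_back(std::move(deleter));
    }

    // Advance only if every active reader has observed the current epoch.
    // Entries retired in the epoch before the current one become unreachable.
    bool TryAdvanceEpoch() {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(m_pendingLock);
            uint32_t e = m_globalEpoch.load();
            for (const auto& s : m_slots) {
                uint32_t v = s.load();
                if ((v & 1u) != 0 && (v >> 1) != (e & 0x7FFFFFFFu))
                    return false;  // a reader is still in an older epoch
            }
            m_globalEpoch.store(e + 1);
            ready.swap(m_pending[(e + 2) % 3]);  // bucket of epoch e - 1
        }
        for (auto& d : ready) d();  // run deleters outside the lock
        return true;
    }

private:
    std::atomic<uint32_t> m_globalEpoch{0};
    std::atomic<uint32_t> m_slots[kMaxThreads] = {};  // (epoch << 1) | active
    std::mutex m_pendingLock;
    std::vector<std::function<void()>> m_pending[3];
};

// RAII holder in the spirit of EbrCriticalRegionHolder (shape assumed).
class SketchCriticalRegionHolder final {
public:
    SketchCriticalRegionHolder(SketchEbrCollector& c, int slot)
        : m_c(c), m_slot(slot) { m_c.EnterCriticalRegion(m_slot); }
    ~SketchCriticalRegionHolder() { m_c.ExitCriticalRegion(m_slot); }
    SketchCriticalRegionHolder(const SketchCriticalRegionHolder&) = delete;
    SketchCriticalRegionHolder& operator=(const SketchCriticalRegionHolder&) = delete;

private:
    SketchEbrCollector& m_c;
    int m_slot;
};
```

In this sketch a reader that entered in epoch 0 allows one advance (to epoch 1) but blocks the next, so anything queued while it is active stays alive until its holder is destroyed.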
- Rename memoryBudget/m_pendingSize to memoryBudgetInBytes/m_pendingSizeInBytes
- Mark EbrCollector and EbrCriticalRegionHolder as final
- Delete move constructors/assignment operators
- Move NextObsolete from hash.h (public) to hash.cpp (file-static)
- Reuse DeleteObsoleteBuckets for sync-mode path in Rehash
- Trim redundant backstory comments at EBR call sites
- Remove unused forward decls from syncclean.hpp
Pull request overview
This PR replaces HashMap async-mode protection that relied on per-operation COOP GC transitions and GC-time cleanup with an Epoch-Based Reclamation (EBR) mechanism to avoid lock-ordering deadlocks (notably involving DebuggerController vs ThreadStore/GC transitions).
Changes:
- Introduces a new EBR implementation (EbrCollector + EbrCriticalRegionHolder) and a global collector for HashMap async mode (g_HashMapEbr).
- Updates HashMap async call sites to use EBR critical regions and queues obsolete bucket arrays for deferred deletion via EBR.
- Removes the HashMap-specific deferred cleanup path from SyncClean and adds new Crst types for EBR internal locks.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/coreclr/vm/syncclean.hpp | Removes HashMap cleanup surface from SyncClean. |
| src/coreclr/vm/syncclean.cpp | Removes HashMap obsolete-bucket list tracking and GC-time deletion. |
| src/coreclr/vm/hash.h | Removes NextObsolete helper from the header. |
| src/coreclr/vm/hash.cpp | Adds EBR critical region usage and EBR-based deferred deletion for obsolete buckets. |
| src/coreclr/vm/ebr.h | Adds public EBR APIs (EbrCollector, EbrCriticalRegionHolder) and global collector declaration. |
| src/coreclr/vm/ebr.cpp | Implements the EBR collector, per-thread tracking, and deferred deletion queues. |
| src/coreclr/vm/ceemain.cpp | Initializes/shuts down the global HashMap EBR collector during runtime startup/shutdown. |
| src/coreclr/vm/CMakeLists.txt | Adds EBR sources/headers to the VM build. |
| src/coreclr/inc/crsttypes_generated.h | Adds new CrstEbrPending / CrstEbrThreadList types and metadata. |
| src/coreclr/inc/CrstTypes.def | Declares new EBR Crst types. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
noahfalk reviewed Feb 12, 2026
- QueueForDeletion: leak object on OOM instead of immediate deletion, which could cause use-after-free for concurrent EBR readers. Track leaked count via InterlockedIncrement counter.
- Rehash: read obsolete bucket size directly from allocation base instead of calling GetSize with wrong pointer (undefined behavior).
- Shutdown: early-return if !m_initialized instead of asserting
- Buckets()/Rehash(): simplify assert to !m_fAsyncMode || InCriticalRegion()
- LookupValue: remove GC thread exclusion from EBR critical region
- Comment fixes in InsertValue and Rehash deferred deletion
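The leak-on-OOM choice for QueueForDeletion can be illustrated with a small sketch. It assumes a singly linked pending list with no locking, and it uses std::atomic in place of InterlockedIncrement. SketchEntry, SketchDeferredQueue, simulateOom, DeleteAll, and DeleteInt are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <new>

// Hypothetical tracking node for one deferred deletion.
struct SketchEntry {
    void* object;
    void (*deleter)(void*);
    SketchEntry* next;
};

struct SketchDeferredQueue {
    SketchEntry* head = nullptr;
    std::atomic<long> leakedCount{0};
    bool simulateOom = false;  // test hook, not part of any real API

    // Returns true if the object was queued, false if it was leaked.
    bool QueueForDeletion(void* object, void (*deleter)(void*)) {
        assert(object != nullptr && deleter != nullptr);
        SketchEntry* entry = simulateOom
            ? nullptr
            : new (std::nothrow) SketchEntry{object, deleter, head};
        if (entry == nullptr) {
            // The object cannot be tracked. Deleting it right now could free
            // memory a concurrent reader still uses, so leak it and count it.
            leakedCount.fetch_add(1, std::memory_order_relaxed);
            return false;
        }
        head = entry;
        return true;
    }

    // Stand-in for "all readers are quiescent": delete everything queued.
    int DeleteAll() {
        int deleted = 0;
        while (head != nullptr) {
            SketchEntry* next = head->next;
            head->deleter(head->object);
            delete head;
            head = next;
            ++deleted;
        }
        return deleted;
    }
};

// Example deleter for an int allocated with new.
inline void DeleteInt(void* p) { delete static_cast<int*>(p); }
```

The trade-off shown here is a bounded, counted leak in exchange for never freeing memory that a reader inside a critical region might still dereference.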
Add EbrCollector::ThreadDetach() to unlink and free per-thread EBR data. Call it from ThreadDetaching() in corhost.cpp, following the existing StressLog::ThreadDetach() pattern. This prevents unbounded growth of the EBR thread list in processes with short-lived threads.
Replace thread_local EbrThreadData* with thread_local EbrThreadData
value, eliminating the OOM failure path in GetOrCreateThreadData().
This removes the risk of null dereference in ExitCriticalRegion()
when the RAII holder unwinds after a failed EnterCriticalRegion().
Shutdown and ThreadDetach now clear the data with = {} instead of
deleting heap memory.
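A rough sketch of why storing the per-thread data by value removes the failure path: nothing is allocated, so the accessor cannot return null and the exit path always has valid data to touch. SketchThreadData and its fields are assumptions, and the atomics and fences a real collector needs for cross-thread visibility are deliberately omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread record; the real field set is not shown in the PR text.
struct SketchThreadData {
    uint32_t localEpoch = 0;
    bool inCriticalRegion = false;
    bool registered = false;
};

// Stored by value: thread-local storage is provided by the runtime, so there
// is no allocation that could fail on first use.
static thread_local SketchThreadData t_threadData;

// Cannot fail and never returns null, unlike a lazily allocated pointer.
inline SketchThreadData& GetThreadData() noexcept { return t_threadData; }

inline void EnterCriticalRegion() noexcept {
    SketchThreadData& data = GetThreadData();
    data.registered = true;
    data.inCriticalRegion = true;
}

inline void ExitCriticalRegion() noexcept {
    // Safe even during unwinding: the data always exists.
    SketchThreadData& data = GetThreadData();
    assert(data.inCriticalRegion);
    data.inCriticalRegion = false;
}

// Reset the value in place instead of deleting heap memory.
inline void ThreadDetach() noexcept { t_threadData = {}; }
```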
jkotas reviewed Feb 13, 2026
Introduce static AllocateBuckets and FreeBuckets helpers to ensure consistent BYTE[] allocation and deallocation of bucket arrays. Move GetSize/SetSize from HashMap members to file-static functions. Remove vestigial NextObsolete and chain-traversal loop from DeleteObsoleteBuckets since EBR queues each array independently.
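One way to picture the paired helpers is the sketch below. It assumes the array is allocated as BYTE[] with one extra leading slot that records the bucket count, so the size is always read from the allocation base. SketchBucket and this exact layout are assumptions for illustration, not the layout in hash.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

using BYTE = unsigned char;

// Hypothetical bucket shape; only its size and trivial destructor matter here.
struct SketchBucket {
    uintptr_t keys[4];
    uintptr_t values[4];
};

// The slot at the allocation base stores the number of usable buckets.
static void SetSize(SketchBucket* base, size_t count) { base->keys[0] = count; }
static size_t GetSize(const SketchBucket* base) { return base->keys[0]; }

// Allocate count usable buckets plus the leading size slot, always as BYTE[].
static SketchBucket* AllocateBuckets(size_t count) {
    BYTE* raw = new (std::nothrow) BYTE[(count + 1) * sizeof(SketchBucket)];
    if (raw == nullptr)
        return nullptr;
    SketchBucket* base = nullptr;
    for (size_t i = 0; i <= count; i++) {
        // Value-initialize each slot in place so every field starts at zero.
        SketchBucket* b = new (raw + i * sizeof(SketchBucket)) SketchBucket();
        if (i == 0)
            base = b;
    }
    SetSize(base, count);
    return base;
}

// Free with the same form used to allocate: delete[] on the BYTE array.
static void FreeBuckets(SketchBucket* base) {
    delete[] reinterpret_cast<BYTE*>(base);
}
```

Callers in this sketch would use base + 1 as the first usable bucket and pass base itself to GetSize and FreeBuckets, which keeps allocation and deallocation in a single matched pair.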
Split DrainQueue into DetachQueue (under lock) and DeletePendingEntries (outside lock) so CRT free calls don't hold m_pendingLock. Add EbrPendingEntry constructor to initialize fields at allocation time.
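The detach-then-delete split might look roughly like this, using std::mutex as a stand-in for m_pendingLock. SketchPendingEntry, SketchPendingList, Push, and the returned delete count are hypothetical; only the names DetachQueue and DeletePendingEntries come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Entry whose constructor initializes every field at allocation time.
struct SketchPendingEntry {
    SketchPendingEntry(void* obj, void (*del)(void*), size_t bytes)
        : object(obj), deleter(del), sizeInBytes(bytes), next(nullptr) {}
    void* object;
    void (*deleter)(void*);
    size_t sizeInBytes;
    SketchPendingEntry* next;
};

class SketchPendingList {
public:
    void Push(SketchPendingEntry* entry) {
        std::lock_guard<std::mutex> lock(m_pendingLock);
        entry->next = m_head;
        m_head = entry;
        m_pendingSizeInBytes += entry->sizeInBytes;
    }

    // Under the lock: only pointer swaps, no calls into the allocator.
    SketchPendingEntry* DetachQueue() {
        std::lock_guard<std::mutex> lock(m_pendingLock);
        SketchPendingEntry* detached = m_head;
        m_head = nullptr;
        m_pendingSizeInBytes = 0;
        return detached;
    }

    // Outside the lock: run deleters and free the entries themselves.
    static size_t DeletePendingEntries(SketchPendingEntry* head) {
        size_t deleted = 0;
        while (head != nullptr) {
            SketchPendingEntry* next = head->next;
            head->deleter(head->object);
            delete head;
            head = next;
            ++deleted;
        }
        return deleted;
    }

    size_t PendingSizeInBytes() {
        std::lock_guard<std::mutex> lock(m_pendingLock);
        return m_pendingSizeInBytes;
    }

private:
    std::mutex m_pendingLock;
    SketchPendingEntry* m_head = nullptr;
    size_t m_pendingSizeInBytes = 0;
};
```

Because the detached list is private to the calling thread, the potentially slow free calls run without blocking threads that are queueing new entries.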
…mplify thread detachment logic
- Fix comment on t_pThreadData: it is a thread_local value, not a heap-allocated pointer, and ThreadDetach prunes it (not TryAdvanceEpoch).
- Fix m_threadListLock comment: used for pruning and epoch scanning, not only pruning.
- Fix fence comments: MemoryBarrier() is a full fence, not an acquire fence.
- Remove CrstEbrPending and CrstEbrThreadList from CrstTypes.def and regenerate crsttypes_generated.h.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jkotas reviewed Feb 19, 2026
jkotas reviewed Feb 20, 2026
…ndling, and refine pending size management
Supersedes #123492
This pull request introduces Epoch-Based Reclamation (EBR) to the CoreCLR runtime for safe, low-overhead deferred deletion in concurrent data structures such as HashMap. The EBR mechanism enables memory reclamation without requiring garbage collection suspension or cooperative mode transitions, improving performance and safety in async scenarios. The main changes include adding new EBR lock types, integrating the EBR collector, updating HashMap memory management, and ensuring proper cleanup of deferred deletions.

EBR integration and lock management:
- Adds new lock types, CrstEbrPending and CrstEbrThreadList, specifically for EBR's internal thread list and pending deletion queues. These are leaf locks and must not be held across HashMap operations or GC transitions. (src/coreclr/inc/CrstTypes.def, src/coreclr/inc/crsttypes_generated.h)

EBR collector implementation and initialization:
- Adds the ebr.h and ebr.cpp files, defining the EbrCollector class and related functions for managing critical regions, deferred deletion, and epoch advancement. The global collector g_HashMapEbr is initialized at EE startup and used throughout the runtime. (src/coreclr/vm/CMakeLists.txt, src/coreclr/vm/ceemain.cpp, src/coreclr/vm/ebr.h)

Deferred deletion and cleanup integration:
- src/coreclr/vm/finalizerthread.cpp

HashMap memory management and async safety:
- Updates HashMap bucket allocation and deletion to use new helper functions (AllocateBuckets, FreeBuckets, DeleteObsoleteBuckets) and integrates EBR critical region holders to protect async operations from concurrent memory reclamation. (src/coreclr/vm/hash.cpp)

Build and include updates:
- src/coreclr/vm/CMakeLists.txt, src/coreclr/vm/corhost.cpp, src/coreclr/vm/hash.cpp, src/coreclr/vm/ceemain.cpp, src/coreclr/vm/finalizerthread.cpp