Replace spin-wait with semaphore-based backoff for epoch table exhaustion #1543

badrishc · 2026-02-05T21:12:52Z

When hundreds of threads compete for epoch table entries, the previous spin-wait loop in ReserveEntry caused 100% CPU utilization due to tight spinning with Thread.Yield().

Changes:

- Add SemaphoreSlim-based wait mechanism for threads when epoch table is full
- Split ReserveEntry into fast path (TryAcquireEntry) and slow path (ReserveEntryWait)
- Fast path: probes startOffset1, startOffset2, then circles table twice - fully inlinable
- Slow path: uses try/finally with semaphore wait - marked NoInlining since kernel wait dominates cost anyway
- Release() signals one waiting thread via volatile waiterCount check (nearly zero overhead when no waiters)
- Double-check pattern in ReserveEntryWait prevents lost wakeups: increment waiterCount, re-check for slots, then wait
- SemaphoreSlim uses Monitor.Pulse internally, which in practice wakes waiters in roughly FIFO order and so limits starvation (note that the .NET documentation does not guarantee any wake-up order for blocked threads)
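
The fast-path/slow-path split and the double-check described above can be sketched as follows. This is a minimal Python stand-in, not the C# code from this PR: `EpochTableSketch` and all of its members are hypothetical names, plain locks replace the interlocked operations, a linear scan replaces the startOffset probing, and `threading.Semaphore` plays the role of SemaphoreSlim.

```python
import threading


class EpochTableSketch:
    """Toy slot table: fast-path probe plus semaphore-based slow path."""

    def __init__(self, size):
        self._slots = [None] * size            # None means the slot is free
        self._slot_lock = threading.Lock()     # stands in for a per-slot compare-and-swap
        self._waiter_count = 0                 # threads currently in the slow path
        self._count_lock = threading.Lock()    # stands in for an interlocked increment
        self._wakeups = threading.Semaphore(0)

    def try_acquire(self, owner):
        """Fast path: claim any free slot, or return -1 if the table is full."""
        with self._slot_lock:
            for i, holder in enumerate(self._slots):
                if holder is None:
                    self._slots[i] = owner
                    return i
        return -1

    def reserve(self, owner):
        """Try the fast path first and fall back to waiting."""
        index = self.try_acquire(owner)
        return index if index >= 0 else self._reserve_wait(owner)

    def _reserve_wait(self, owner):
        """Slow path: register as a waiter, re-check for a slot, then block."""
        with self._count_lock:
            self._waiter_count += 1
        try:
            while True:
                # Re-check AFTER registering: a release that ran before this
                # point left a free slot, and a release that runs after it
                # will see waiter_count > 0 and signal the semaphore.
                index = self.try_acquire(owner)
                if index >= 0:
                    return index
                self._wakeups.acquire()
        finally:
            with self._count_lock:
                self._waiter_count -= 1

    def release(self, index):
        """Free a slot and wake one waiter only if someone is waiting."""
        with self._slot_lock:
            self._slots[index] = None
        if self._waiter_count > 0:             # cheap read when there are no waiters
            self._wakeups.release()
```

The loop in `_reserve_wait` tolerates spurious signals: a woken thread that loses the race for the freed slot simply blocks again.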

LightEpoch Change:

We make LightEpoch keep a separate epoch table and thread allocation per instance instead of using a static thread index table. This ensures there is no deadlock when a thread holds one epoch (e.g. KV) while blocked on another (e.g. AOF), because the AOF epoch is guaranteed to progress as long as other AOF threads are able to find slots in their dedicated thread table.

Performance characteristics:

- No contention: unchanged - fast path acquires entry with same probing logic
- Table full: threads wait efficiently instead of burning CPU
- Release hot path: single volatile read of waiterCount when no waiters

If lock contention in SemaphoreSlim becomes a problem, a possible optimization is batched signaling: wait until some number of entries have been released, then signal that many waiters at once.
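
One way to picture the batched signaling idea is the sketch below. It is only an illustration in Python under assumed names (`BatchedSignaler`, `on_release`, `batch_size` are hypothetical and not part of this PR): releases are counted, and the semaphore is signaled only once a full batch has accumulated.

```python
import threading


class BatchedSignaler:
    """Accumulates releases and wakes waiters in groups of `batch_size`."""

    def __init__(self, batch_size):
        self._batch_size = batch_size
        self._pending = 0                      # releases not yet signaled
        self._lock = threading.Lock()
        self.wakeups = threading.Semaphore(0)  # waiters would block on this

    def on_release(self):
        """Record one released entry; return how many waiters were signaled."""
        with self._lock:
            self._pending += 1
            if self._pending < self._batch_size:
                return 0
            batch, self._pending = self._pending, 0
        # Signal outside the lock so woken waiters do not contend on it.
        for _ in range(batch):
            self.wakeups.release()
        return batch
```

A scheme like this would presumably also need a flush or timeout, since waiters could otherwise stall when fewer than `batch_size` entries are ever released.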


Cancel epoch table waiters on dispose for graceful shutdown

When the epoch table is full, threads block on a SemaphoreSlim in ReserveEntryWait until a slot is released. If LightEpoch is disposed while threads are waiting, they remain blocked indefinitely, preventing graceful shutdown. Add a CancellationTokenSource that is cancelled during Dispose, causing blocked threads to receive an OperationCanceledException. Dispose then spin-waits for all waiters to finish unwinding before disposing the CancellationTokenSource and SemaphoreSlim.
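
The shutdown sequence can be sketched roughly as below. Python has no CancellationTokenSource, so this sketch swaps in a different mechanism: a disposed flag plus one semaphore release per registered waiter. `WaitersSketch`, `TableDisposedError`, and the method names are hypothetical; only the overall shape (wake blocked threads with an exception, then wait for them to unwind) mirrors the description above.

```python
import threading
import time


class TableDisposedError(Exception):
    """Hypothetical stand-in for OperationCanceledException."""


class WaitersSketch:
    def __init__(self):
        self._wakeups = threading.Semaphore(0)
        self._lock = threading.Lock()
        self._waiter_count = 0
        self._disposed = False

    def wait_for_slot(self):
        """Block until signaled; raise if the table was disposed meanwhile."""
        with self._lock:
            if self._disposed:
                raise TableDisposedError()
            self._waiter_count += 1
        try:
            self._wakeups.acquire()
            if self._disposed:
                raise TableDisposedError()
        finally:
            with self._lock:
                self._waiter_count -= 1

    def signal_one(self):
        """Normal wake-up, as when a slot is released."""
        self._wakeups.release()

    def dispose(self):
        """Wake every registered waiter, then wait for them to unwind."""
        with self._lock:
            self._disposed = True
            blocked = self._waiter_count
        for _ in range(blocked):
            self._wakeups.release()
        # Spin until all waiters have left wait_for_slot, mirroring the
        # "spin-waits for all waiters to finish unwinding" step.
        while self._waiter_count > 0:
            time.sleep(0.001)
```

Threads that arrive after the flag is set fail immediately instead of registering, so `dispose` only has to account for waiters counted under the lock.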

Copilot

Pull request overview

This PR replaces the spin-wait mechanism in LightEpoch's epoch table entry reservation with a semaphore-based backoff system to prevent 100% CPU utilization when hundreds of threads compete for epoch table entries. Additionally, it introduces per-instance epoch tables to prevent deadlocks when one epoch is held while waiting for another.

Changes:

- Replaced tight spin-wait loops with SemaphoreSlim-based waiting when the epoch table is exhausted
- Split epoch table entry acquisition into a fast path (TryAcquireEntry) and slow path (ReserveEntryWait) for optimal performance
- Changed from a static shared thread index table to per-instance epoch tables, preventing inter-epoch deadlocks
- Renamed ReleaseIfHeld() to TrySuspend() in the IEpochAccessor interface and moved the interface to its own file
- Added benchmark tests for epoch operations and improved BenchmarkDotNet debugging support

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

File	Description
libs/storage/Tsavorite/cs/src/core/Epochs/LightEpoch.cs	Core changes implementing per-instance epoch tables with InstanceIndexBuffer, semaphore-based backoff in ReserveEntryWait, and split fast/slow path acquisition logic
libs/storage/Tsavorite/cs/src/core/Epochs/IEpochAccessor.cs	New file extracting IEpochAccessor interface with renamed TrySuspend method
libs/storage/Tsavorite/cs/src/core/TsavoriteLog/TsavoriteLog.cs	Updated to use renamed TrySuspend method
libs/storage/Tsavorite/cs/benchmark/BDN-Tsavorite.Benchmark/EpochTests.cs	New benchmark tests for epoch operations (Resume/Suspend, ProtectAndDrain, BumpCurrentEpoch)
libs/storage/Tsavorite/cs/benchmark/BDN-Tsavorite.Benchmark/BenchmarkDotNetTestsApp.cs	Added DEBUG conditional compilation for DebugInProcessConfig


…tion (#1543)

* When hundreds of threads compete for epoch table entries, the previous spin-wait loop in ReserveEntry caused 100% CPU utilization due to tight spinning with Thread.Yield().

  Changes:
  - Add SemaphoreSlim-based wait mechanism for threads when epoch table is full
  - Split ReserveEntry into fast path (TryAcquireEntry) and slow path (ReserveEntryWait)
  - Fast path: probes startOffset1, startOffset2, then circles table twice - fully inlinable
  - Slow path: uses try/finally with semaphore wait - marked NoInlining since kernel wait dominates cost anyway
  - Release() signals one waiting thread via volatile waiterCount check (nearly zero overhead when no waiters)
  - Double-check pattern in ReserveEntryWait prevents lost wakeups: increment waiterCount, re-check for slots, then wait
  - SemaphoreSlim uses Monitor.Pulse internally which provides FIFO wake-up order, preventing starvation

  Performance characteristics:
  - No contention: unchanged - fast path acquires entry with same probing logic
  - Table full: threads wait efficiently instead of burning CPU
  - Release hot path: single volatile read of waiterCount when no waiters
* add small comment
* clarify comments, increment version
* make lightepoch isolate instances properly
* nits
* nits
* Cancel epoch table waiters on dispose for graceful shutdown

  When the epoch table is full, threads block on a SemaphoreSlim in ReserveEntryWait until a slot is released. If LightEpoch is disposed while threads are waiting, they remain blocked indefinitely, preventing graceful shutdown. Add a CancellationTokenSource that is cancelled during Dispose, causing blocked threads to receive an OperationCanceledException. Dispose then spin-waits for all waiters to finish unwinding before disposing the CancellationTokenSource and SemaphoreSlim.
* nit
* comments
* nit
* Apply suggestion from @Copilot

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* nit
* fix dispose to be robust
* nit
* nit
* nit
* no need to refresh here
* add better epoch logging, fix garnet epoch dispose
* nit
* fixes
* Fix MultiDatabase to correctly dispose devices
* undo change to test
* share epoch across all aof instances
* fix testcase to wait for checkpoint to complete
* fix HasKeysInSlots
* add debug helper static method to LightEpoch
* actually add
* reduce logger verbosity
* nits
* fix
* fix CloseLock semantics to ensure dispose happens after write lock is released.
* nit
* change lock style for clarity
* fixes
* updates
* nit
* fix formatting
* update test suite to check LightEpoch disposal
* updatwe tsavo tests to have tear down checks in one place
* ensure epochs are disposed if server throws in constructor
* fix tsavo tests to properly dispose epoch
* fix test
* fixes
* nit
* fix test
* improve comments
* update LightEp;och copy in client
* nit
* clean up struct Entry
* use new epoch for garnet client correctly
* fix
* nit
* fix CanDoBulkDeleteTests
* share client epoch for failover
* fix
* update version for release

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


badrishc added 10 commits February 2, 2026 16:53

add small comment

e6fbab9

clarify comments, increment version

db7669a

Merge remote-tracking branch 'origin/main' into badrishc/epoch-avoid-spin

f85c1d4

Merge remote-tracking branch 'origin/main' into badrishc/epoch-avoid-spin

56a638e

make lightepoch isolate instances properly

4c3f826

nits

2ba9ef5

nits

f1e1084

merge from main

c8ccd0f

Copilot AI review requested due to automatic review settings February 5, 2026 21:12

Copilot started reviewing on behalf of badrishc February 5, 2026 21:13

badrishc added 2 commits February 5, 2026 13:14

nit

266d83c

comments

191f461

Copilot AI reviewed Feb 5, 2026

nit

0e79191

badrishc marked this pull request as draft February 5, 2026 21:36

badrishc and others added 13 commits February 5, 2026 13:50

Apply suggestion from @Copilot

797e19c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @Copilot

71403b8

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

nit

f59e81b

fix dispose to be robust

66ff756

nit

736a47e

nit

2046e1c

nit

4066d0a

no need to refresh here

e060758

Merge branch 'main' into badrishc/epoch-avoid-spin

f0472df

add better epoch logging, fix garnet epoch dispose

4108d25

Merge branch 'badrishc/epoch-avoid-spin' of github.com:microsoft/garnet into badrishc/epoch-avoid-spin

92d605c

nit

7155ded

fixes

5bbe674

badrishc added 3 commits February 10, 2026 18:23

updatwe tsavo tests to have tear down checks in one place

820484c

ensure epochs are disposed if server throws in constructor

8ebc5d4

fix tsavo tests to properly dispose epoch

4395641

TedHartMS reviewed Feb 11, 2026

Comment thread libs/storage/Tsavorite/cs/src/core/Epochs/LightEpoch.cs

badrishc added 11 commits February 10, 2026 21:55

fix test

febe949

fixes

3063058

nit

419a7b5

fix test

066d59d

improve comments

349c546

update LightEp;och copy in client

43336c3

nit

2b0ac0a

clean up struct Entry

6e0dda0

use new epoch for garnet client correctly

1db5360

fix

8db6b0e

nit

dd6fb04

TedHartMS approved these changes Feb 11, 2026

fix CanDoBulkDeleteTests

e793cde

TedHartMS approved these changes Feb 11, 2026

badrishc added 2 commits February 11, 2026 13:53

share client epoch for failover

5f5d097

fix

6e75827

TedHartMS approved these changes Feb 11, 2026

badrishc added 3 commits February 11, 2026 15:55

Merge branch 'main' into badrishc/epoch-avoid-spin

8c8ee89

update version for release

ee33cd3

Merge branch 'badrishc/epoch-avoid-spin' of github.com:microsoft/garnet into badrishc/epoch-avoid-spin

3c95648

TedHartMS approved these changes Feb 12, 2026

badrishc merged commit a1c052f into main Feb 12, 2026
89 of 92 checks passed

badrishc deleted the badrishc/epoch-avoid-spin branch February 12, 2026 04:48

github-actions Bot locked and limited conversation to collaborators Apr 13, 2026

badrishc merged 68 commits into main from badrishc/epoch-avoid-spin

3 participants
