Skip to content

Fix LockSublogs mutual exclusion and GC hole in AofAddress.Span#1774

Merged
vazois merged 5 commits into
devfrom
vazois/mlog-fixes
May 7, 2026
Merged

Fix LockSublogs mutual exclusion and GC hole in AofAddress.Span#1774
vazois merged 5 commits into
devfrom
vazois/mlog-fixes

Conversation

@vazois

@vazois vazois commented May 6, 2026

Copy link
Copy Markdown
Contributor

Summary

Two targeted fixes for the AOF multi-log replay infrastructure:

1. Fix LockSublogs mutual exclusion

ShardedLog.LockSublogs() was not enforcing mutual exclusion - it ORed bits without checking if they were already held. This could allow concurrent multi-sublog enqueues sharing a physical sublog to reorder TxnCommit markers, causing barrier-level deadlocks during replay.

Fix: Spin-wait with CAS until none of the requested bits are set, then atomically set them. This serializes multi-sublog enqueues that share any physical sublog.

2. Fix GC hole in AofAddress.Span property

The previous implementation used a fixed pointer that escaped scope, creating a dangling Span. The new implementation uses MemoryMarshal.CreateSpan with Unsafe.As and [UnscopedRef] to return a GC-tracked managed reference that remains valid as long as the struct is alive.

Testing

  • Solution builds with 0 warnings, 0 errors
  • Existing AOF replay tests pass

vazois and others added 2 commits May 5, 2026 20:02
The previous implementation unconditionally ORed bits into the lockMap
without checking if any requested bits were already held. This allowed
concurrent transactions to interleave their TxnCommit markers across
physical sublogs in different orders, potentially causing barrier-level
deadlocks during multi-log replay.

The fix spins until none of the requested sublog bits are held before
atomically setting them via CAS, serializing overlapping multi-sublog
enqueues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace fixed/pointer-based Span construction with managed-ref approach
using MemoryMarshal.CreateSpan + Unsafe.As. The previous implementation
returned a Span whose underlying pointer escaped the fixed block scope,
allowing the GC to relocate the struct and leave the Span dangling.

The new approach returns a Span backed by a tracked managed reference,
eliminating the pin requirement and ensuring GC safety. Added
[UnscopedRef] to allow the ref to escape the property.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 6, 2026 18:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets correctness and safety in the AOF multi-log (sharded sublog) infrastructure by tightening sublog mutual exclusion and fixing unsafe span creation over the AOF address vector.

Changes:

  • Enforces mutual exclusion in ShardedLog.LockSublogs() by waiting for overlapping sublog-lock bits to clear before atomically acquiring them.
  • Reworks AofAddress.Span to avoid returning a Span<byte> backed by a pointer that escapes a fixed scope, switching to MemoryMarshal.CreateSpan + Unsafe.As.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
libs/server/AOF/ShardedLog.cs Fixes lock-bit acquisition logic to prevent overlapping sublog acquisitions from interleaving.
libs/server/AOF/AofAddress.cs Changes Span implementation to avoid a dangling span from an escaped fixed pointer.

Comment thread libs/server/AOF/ShardedLog.cs Outdated
Comment thread libs/server/AOF/AofAddress.cs Outdated
…f] from AofAddress.Span

- Replace Thread.Yield() with SpinWait for progressive backoff under contention
- Add SpinOnce() on CAS failure path (previously only yielded on overlap check)
- Remove [UnscopedRef] from Span property: all callers use it in safe scopes,
  so the compiler's ref-escape analysis can enforce lifetime safety at compile time
- Remove unused System.Diagnostics.CodeAnalysis import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread libs/server/AOF/ShardedLog.cs Outdated
vazois and others added 2 commits May 7, 2026 11:19
Restructure the loop so CAS is only attempted when no requested bits
are held, and both failure paths (bits already held or CAS race) fall
through to a single SpinOnce() call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vazois vazois requested a review from TedHartMS May 7, 2026 18:50
@vazois vazois merged commit 14f01b8 into dev May 7, 2026
23 checks passed
@vazois vazois deleted the vazois/mlog-fixes branch May 7, 2026 20:30
chenhao-ye pushed a commit to chenhao-ye/garnet that referenced this pull request May 13, 2026
…osoft#1774)

* Fix LockSublogs to enforce mutual exclusion for multi-sublog enqueue

The previous implementation unconditionally ORed bits into the lockMap
without checking if any requested bits were already held. This allowed
concurrent transactions to interleave their TxnCommit markers across
physical sublogs in different orders, potentially causing barrier-level
deadlocks during multi-log replay.

The fix spins until none of the requested sublog bits are held before
atomically setting them via CAS, serializing overlapping multi-sublog
enqueues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix GC hole in AofAddress.Span property

Replace fixed/pointer-based Span construction with managed-ref approach
using MemoryMarshal.CreateSpan + Unsafe.As. The previous implementation
returned a Span whose underlying pointer escaped the fixed block scope,
allowing the GC to relocate the struct and leave the Span dangling.

The new approach returns a Span backed by a tracked managed reference,
eliminating the pin requirement and ensuring GC safety. Added
[UnscopedRef] to allow the ref to escape the property.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: use SpinWait in LockSublogs and remove [UnscopedRef] from AofAddress.Span

- Replace Thread.Yield() with SpinWait for progressive backoff under contention
- Add SpinOnce() on CAS failure path (previously only yielded on overlap check)
- Remove [UnscopedRef] from Span property: all callers use it in safe scopes,
  so the compiler's ref-escape analysis can enforce lifetime safety at compile time
- Remove unused System.Diagnostics.CodeAnalysis import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify LockSublogs: single SpinOnce for both failure paths

Restructure the loop so CAS is only attempted when no requested bits
are held, and both failure paths (bits already held or CAS race) fall
through to a single SpinOnce() call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chenhao-ye pushed a commit to chenhao-ye/garnet that referenced this pull request May 13, 2026
…osoft#1774)

* Fix LockSublogs to enforce mutual exclusion for multi-sublog enqueue

The previous implementation unconditionally ORed bits into the lockMap
without checking if any requested bits were already held. This allowed
concurrent transactions to interleave their TxnCommit markers across
physical sublogs in different orders, potentially causing barrier-level
deadlocks during multi-log replay.

The fix spins until none of the requested sublog bits are held before
atomically setting them via CAS, serializing overlapping multi-sublog
enqueues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix GC hole in AofAddress.Span property

Replace fixed/pointer-based Span construction with managed-ref approach
using MemoryMarshal.CreateSpan + Unsafe.As. The previous implementation
returned a Span whose underlying pointer escaped the fixed block scope,
allowing the GC to relocate the struct and leave the Span dangling.

The new approach returns a Span backed by a tracked managed reference,
eliminating the pin requirement and ensuring GC safety. Added
[UnscopedRef] to allow the ref to escape the property.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: use SpinWait in LockSublogs and remove [UnscopedRef] from AofAddress.Span

- Replace Thread.Yield() with SpinWait for progressive backoff under contention
- Add SpinOnce() on CAS failure path (previously only yielded on overlap check)
- Remove [UnscopedRef] from Span property: all callers use it in safe scopes,
  so the compiler's ref-escape analysis can enforce lifetime safety at compile time
- Remove unused System.Diagnostics.CodeAnalysis import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify LockSublogs: single SpinOnce for both failure paths

Restructure the loop so CAS is only attempted when no requested bits
are held, and both failure paths (bits already held or CAS race) fall
through to a single SpinOnce() call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants