Skip to content

Comments

[Android] Prevent race condition by synchronizing buffer access#117950

Merged
kotlarmilos merged 19 commits intodotnet:mainfrom
kotlarmilos:bugfix/android-coreclr-ssl-context
Jul 31, 2025
Merged

[Android] Prevent race condition by synchronizing buffer access#117950
kotlarmilos merged 19 commits intodotnet:mainfrom
kotlarmilos:bugfix/android-coreclr-ssl-context

Conversation

@kotlarmilos
Copy link
Member

@kotlarmilos kotlarmilos commented Jul 22, 2025

Problem

Android SSL callbacks WriteToConnection/ReadFromConnection receive an IntPtr that should reference a GCHandle to SafeDeleteSslContext. The handle was freed while native code could still invoke the callbacks with the IntPtr referencing a different object. This led to a NullReferenceException or InvalidCastException:

System.InvalidCastException: Unable to cast object of type 'System.Threading.Thread' to type 'System.Net.SafeDeleteSslContext'

Description

This PR synchronizes all buffer mutations behind a shared _lock, and updates the managed callbacks to verify that the handle is still allocated and its target is a SafeDeleteSslContext before using it.

Changes

  • Implemented WriteToConnection/ReadFromConnection checks for allocated GCHandle
  • Implemented lock in all buffer mutations to prevent race condition

Testing

The System.Net.Http.Functional.Tests now pass without the crash on Android.

Fixes #117045

@kotlarmilos
Copy link
Member Author

/azp run runtime-android,runtime-androidemulator

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@kotlarmilos
Copy link
Member Author

/azp run runtime-android,runtime-androidemulator

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to 'arch-android': @vitek-karas, @simonrozsival, @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a race condition in Android SSL implementation where GCHandle references were being freed while native callbacks could still access them, causing InvalidCastException or NullReferenceException. The fix extends the GCHandle lifetime to match the SafeDeleteSslContext object and adds proper synchronization for buffer operations.

Key changes:

  • Replace weak GCHandle allocation with persistent handle that lives for the full object lifetime
  • Add comprehensive locking around all buffer operations to prevent race conditions
  • Implement null/disposed checks in native callbacks with graceful error handling

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/libraries/tests.proj Re-enables System.Net.Http.Functional.Tests for Android and adds it to smoke tests
src/libraries/System.Net.Security/src/System/Net/Security/Pal.Android/SafeDeleteSslContext.cs Implements persistent GCHandle, thread-safe buffer access, and defensive callback handling

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

…Android/SafeDeleteSslContext.cs

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@kotlarmilos
Copy link
Member Author

/azp run runtime-android,runtime-androidemulator

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@kotlarmilos
Copy link
Member Author

/azp run runtime-android,runtime-androidemulator

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Member

@simonrozsival simonrozsival left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the failing tests are known and unrelated to these changes.

@kotlarmilos
Copy link
Member Author

/azp run runtime-android,runtime-androidemulator

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Member

@simonrozsival simonrozsival left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the failing tests are known and unrelated to these changes.

@kotlarmilos kotlarmilos merged commit 1d8a341 into dotnet:main Jul 31, 2025
106 of 110 checks passed
@jkotas
Copy link
Member

jkotas commented Jul 31, 2025

All the failing tests are known

What's the known issue for the wasm failure?

[17:58:18] fail: [MONO] no object of size 536935192d

Error
    at Mc (http://127.0.0.1:58317/_framework/dotnet.runtime.js:3:172233)
    at wasm_trace_logger (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[158]:0xb02e)
    at eglib_log_adapter (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[656]:0x455d4)
    at monoeg_g_logv_nofree (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[581]:0x436c9)
    at monoeg_g_log (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[583]:0x4378c)
    at ms_find_block_obj_size_index (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[952]:0x52e7b)
    at alloc_obj (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[991]:0x593be)
    at major_alloc_object (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[982]:0x54a4a)
    at alloc_for_promotion (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[1058]:0x5c4ec)
    at copy_object_no_checks (http://127.0.0.1:58317/_framework/dotnet.native.wasm:wasm-function[998]:0x5a05b)

@jkotas
Copy link
Member

jkotas commented Jul 31, 2025

I have opened #118244 on the wasm test failure.

All CI failure - even the ones unrelated to the change - should have tracking issue: https://github.com/dotnet/runtime/blob/main/docs/workflow/ci/failure-analysis.md#what-to-do-if-you-determine-the-failure-is-unrelated

@kotlarmilos
Copy link
Member Author

Thanks Jan — I merged the PR on the second CI attempt and the failure occurred on the first attempt, which is why it let me merge without an escape comment.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 31, 2025
@kotlarmilos
Copy link
Member Author

/backport to release/9.0-staging

@github-actions github-actions bot unlocked this conversation Feb 19, 2026
@github-actions
Copy link
Contributor

Started backporting to release/9.0-staging (link to workflow run)

@github-actions
Copy link
Contributor

@kotlarmilos backporting to release/9.0-staging failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Dispose SafeSslHandle and use thread-safe operations on PAL layer
Using index info to reconstruct a base tree...
M	src/libraries/System.Net.Security/src/System/Net/Security/Pal.Android/SafeDeleteSslContext.cs
M	src/libraries/tests.proj
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.Net.Security/src/System/Net/Security/Pal.Android/SafeDeleteSslContext.cs
Auto-merging src/libraries/tests.proj
CONFLICT (content): Merge conflict in src/libraries/tests.proj
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Dispose SafeSslHandle and use thread-safe operations on PAL layer
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Android] System.Net.Http.Functional.Tests.csproj fails with SIGABRT in libcoreclr.so

6 participants