win,pipe: support vectored writes, fix IPC deadlock#1843
Conversation
d027c94 to
4c6b72f
Compare
| UV_REQ_FIELDS | ||
| uv_write_cb cb; | ||
| uv_stream_t* send_handle; | ||
| uv_stream_t* send_handle; /* TODO: make private and unix-only in v2.x. */ |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
src/win/pipe.c
Outdated
| } | ||
| } else { | ||
| uv_pipe_read_error_or_eof(loop, handle, GetLastError(), buf); | ||
| /* It is possible that more bytes more bytes were read than we thought |
|
CI is red, on Linux it complains about undefined reference to not ok 150 - pipe_getsockname_blocking
# exit code -1073741819
# Output from process `pipe_getsockname_blocking`: (no output) |
eaa1388 to
04a5b3c
Compare
|
@bnoordhuis |
santigimeno
left a comment
There was a problem hiding this comment.
Some comments, style mainly.
| @@ -0,0 +1,151 @@ | |||
| /* Copyright Joyent, Inc. and other Node contributors. All rights reserved. | |||
There was a problem hiding this comment.
Right, thanks, copypasta.
| } | ||
|
|
||
| TEST_IMPL(ipc_heavy_traffic_deadlock_bug) { | ||
| uv_loop_t* loop = uv_default_loop(); |
There was a problem hiding this comment.
style: separate declaration from assignment
There was a problem hiding this comment.
loop was unused anyway, so I removed it.
| 'test-idle.c', | ||
| 'test-ip6-addr.c', | ||
| 'test-ipc.c', | ||
| 'test-ipc-heavy-traffic-deadlock-bug.c', |
There was a problem hiding this comment.
Add this file to checksparse.sh as well?
src/win/pipe.c
Outdated
| data_length = 0; | ||
| for (i = 0; i < nbufs; i++) { | ||
| data_length += bufs[i].len; | ||
| } |
src/win/pipe.c
Outdated
| * because WriteFile() won't accept buffers larger than that. */ | ||
| if (data_length > UINT32_MAX) { | ||
| return WSAENOBUFS; /* Maps to UV_ENOBUFS. */ | ||
| } |
src/win/pipe.c
Outdated
| heap_buffer = uv__malloc(heap_buffer_length); | ||
| if (heap_buffer == NULL) { | ||
| return ERROR_NOT_ENOUGH_MEMORY; /* Maps to UV_ENOMEM. */ | ||
| } |
src/win/pipe.c
Outdated
| } | ||
| if (data_length > UINT32_MAX) { | ||
| return WSAENOBUFS; /* Maps to UV_ENOBUFS. */ | ||
| } |
| req->u.io.queued_bytes = write_buf.len; | ||
| handle->write_queue_size += req->u.io.queued_bytes; | ||
| if (WaitForSingleObject(req->u.io.overlapped.hEvent, INFINITE) != | ||
| WAIT_OBJECT_0) { |
There was a problem hiding this comment.
Just curious: we're using uv_translate_sys_error while in the rest on the function the error is returned directly
There was a problem hiding this comment.
Yeah, that looks incorrect. The code path is likely never taken, I don't think WaitForSingleObject can conceivably fail, unless hEvent is invalid, in which case there is a bug elsewhere.
I haven't analyzed the test but by calling the test with |
|
@santigimeno If you run the test without supervision of the test harness ( |
|
I see two potential issues:
|
|
What I see is that the child process receives all the data and exits correctly but the parent process is stuck. It looks some pending handle needs to be closed. |
Where in the code is this / should it be?
I think that's unlikely. The tests makes 256 write calls and sends a grand total of 6mb of data to the child process. The child process does the same in the other direction. That should not even take a second. |
| ssize_t i; | ||
| int r; | ||
|
|
||
| ASSERT(nread >= 0); |
There was a problem hiding this comment.
I believe nread here is overloaded with the number of bytes written | error code from the syscall (which can be negative)
There was a problem hiding this comment.
That's correct. The assert is here to check that there were no read errors (we don't expect any).
|
@piscisaureus - highlighted the point above. Agreed on the timeout stuff, shouldn't take that much ( I was speculating). Now that @santigimeno has spotted a hang, I guess that looks to be the right direction. |
|
The ipc channel needs to be closed. This patch fixes the issue locally for me: diff --git a/test/test-ipc-heavy-traffic-deadlock-bug.c b/test/test-ipc-heavy-traffic-deadlock-bug.c
index 62868e6a..d1b54bff 100644
--- a/test/test-ipc-heavy-traffic-deadlock-bug.c
+++ b/test/test-ipc-heavy-traffic-deadlock-bug.c
@@ -81,6 +81,15 @@ static void alloc_cb(uv_handle_t* handle,
static void read_cb(uv_stream_t* handle, ssize_t nread, const uv_buf_t* buf) {
ssize_t i;
int r;
+ uv_shutdown_t shut_req;
+
+ if (nread == UV_EOF) {
+ ASSERT(bytes_read == XFER_SIZE);
+ r = uv_read_stop(handle);
+ ASSERT(r == 0);
+ uv_close((uv_handle_t*)handle, NULL);
+ return;
+ }
ASSERT(nread >= 0);
bytes_read += nread;
@@ -91,7 +100,7 @@ static void read_cb(uv_stream_t* handle, ssize_t nread, const uv_buf_t* buf) {
free(buf->base);
if (bytes_read >= XFER_SIZE) {
- r = uv_read_stop(handle);
+ uv_shutdown(&shut_req, handle, NULL);
ASSERT(r == 0);
}
} |
5ff9808 to
973d73f
Compare
PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
* Don't silently ignore ERROR_OPERATION_ABORTED. Code to silently ignore this error was added in c42a4ca, under the false premise that this error is somehow equivalent to EINTR on posix platforms. This isn't true; ERROR_OPERATION_ABORTED doesn't happen unless the application explicitly aborts an I/O operation. Silently ignoring this error elsewhere could potentially hide bugs, hence libuv shouldn't do it. Instead, explicitly deal with it where it is expected. * Don't mark aborted reads as successful. The worker thread used to call ReadFile() on synchronous pipes would incorrectly mark cancelled read requests as successful, leading to issues later on. * Rely on main thread to restart aborted reads. After a blocking ReadFile() call was cancelled, the worker thread would previously attempt to restart it immediately, making synchronization logic needlessly complex. Instead, we simply cancel the operation, and leave it to the main loop to restart it if so desired. Since we now realy on the main thread to restart interrupted ReadFile() calls, we can now have a single function interrupts a synchronous read until the event loop restarts it again. * Clean up uv__pipe_read_stop(). A single function to interrupt blocking reads also allows us to remove weird logic in uv__pipe_read_stop() that quickly pauses and unpauses a read operation in the hope of cancelling it. * Assume CancelIo() and CancelSynchronousIo() are always available. Since libuv doesn't support windows XP and Server 2003 any more, we can assume that these APIs are always available, and do away with branching around them. PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
This fixes a bug where IPC pipe communication would deadlock when both ends of the pipe are written to simultaneously, and the kernel pipe buffer has already been filled up by earlier writes. The root cause of the deadlock is that, while writes to an IPC pipe are generally asynchronous, the IPC frame header is written synchronously. So when both ends of the pipe are sending a frame header at the same time, neither will read data off the pipe, causing both header writes to block indefinitely. Additionally, this patch somewhat reduces the spaghetti level in win/pipe.c. Fixes: #1099 Refs: nodejs/node#7657 Refs: electron/electron#10107 Refs: parcel-bundler/parcel#637 Refs: parcel-bundler/parcel#900 Refs: parcel-bundler/parcel#1137 PR-URL: #1843 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Notable changes: - Building via cmake is now supported. PR-URL: libuv/libuv#1850 - Stricter checks have been added to prevent watching the same file descriptor multiple times. PR-URL: libuv/libuv#1851 Refs: nodejs#3604 - An IPC deadlock on Windows has been fixed. PR-URL: libuv/libuv#1843 Fixes: nodejs#9706 Fixes: nodejs#7657 - uv_fs_lchown() has been added. PR-URL: libuv/libuv#1826 Refs: nodejs#19868 - uv_fs_copyfile() sets errno on error. PR-URL: libuv/libuv#1881 Fixes: nodejs#21329 - uv_fs_fchmod() supports -A files on Windows. PR-URL: libuv/libuv#1819 Refs: nodejs#12803 PR-URL: nodejs#21466 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
Notable changes: - Building via cmake is now supported. PR-URL: libuv/libuv#1850 - Stricter checks have been added to prevent watching the same file descriptor multiple times. PR-URL: libuv/libuv#1851 Refs: #3604 - An IPC deadlock on Windows has been fixed. PR-URL: libuv/libuv#1843 Fixes: #9706 Fixes: #7657 - uv_fs_lchown() has been added. PR-URL: libuv/libuv#1826 Refs: #19868 - uv_fs_copyfile() sets errno on error. PR-URL: libuv/libuv#1881 Fixes: #21329 - uv_fs_fchmod() supports -A files on Windows. PR-URL: libuv/libuv#1819 Refs: #12803 PR-URL: #21466 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
Notable changes: - Building via cmake is now supported. PR-URL: libuv/libuv#1850 - Stricter checks have been added to prevent watching the same file descriptor multiple times. PR-URL: libuv/libuv#1851 Refs: nodejs#3604 - An IPC deadlock on Windows has been fixed. PR-URL: libuv/libuv#1843 Fixes: nodejs#9706 Fixes: nodejs#7657 - uv_fs_lchown() has been added. PR-URL: libuv/libuv#1826 Refs: nodejs#19868 - uv_fs_copyfile() sets errno on error. PR-URL: libuv/libuv#1881 Fixes: nodejs#21329 - uv_fs_fchmod() supports -A files on Windows. PR-URL: libuv/libuv#1819 Refs: nodejs#12803 PR-URL: nodejs#21466 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
Notable changes: - Building via cmake is now supported. PR-URL: libuv/libuv#1850 - Stricter checks have been added to prevent watching the same file descriptor multiple times. PR-URL: libuv/libuv#1851 Refs: #3604 - An IPC deadlock on Windows has been fixed. PR-URL: libuv/libuv#1843 Fixes: #9706 Fixes: #7657 - uv_fs_lchown() has been added. PR-URL: libuv/libuv#1826 Refs: #19868 - uv_fs_copyfile() sets errno on error. PR-URL: libuv/libuv#1881 Fixes: #21329 - uv_fs_fchmod() supports -A files on Windows. PR-URL: libuv/libuv#1819 Refs: #12803 Backport-PR-URL: #24103 PR-URL: #21466 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
Guard against sending the handle over the UNIX domain socket twice when the first sendmsg() didn't write all bytes. The changes to src/win partially undo changes made earlier this year, see the referenced pull request for details. Libuv never made promises about the value of `req->send_handle` at different points in time so this should be a safe, non-breaking change. No tests because this particular condition is pretty much impossible to hit reliably. Fixes: libuv#2086 Refs: libuv#1843
Guard against sending the handle over the UNIX domain socket twice when the first sendmsg() didn't write all bytes. The changes to src/win partially undo changes made earlier this year, see the referenced pull request for details. Libuv never made promises about the value of `req->send_handle` at different points in time so this should be a safe, non-breaking change. No tests because this particular condition is hard to hit reliably across platforms. I spent a lot of time trying to write one but it turned out hideously complex, and worse, flaky. Fixes: libuv#2086 PR-URL: libuv#2097 Refs: libuv#1843 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Guard against sending the handle over the UNIX domain socket twice when the first sendmsg() didn't write all bytes. The changes to src/win partially undo changes made earlier this year, see the referenced pull request for details. Libuv never made promises about the value of `req->send_handle` at different points in time so this should be a safe, non-breaking change. No tests because this particular condition is hard to hit reliably across platforms. I spent a lot of time trying to write one but it turned out hideously complex, and worse, flaky. Fixes: libuv#2086 PR-URL: libuv#2097 Refs: libuv#1843 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Fixes: #794
Fixes: #1099
Refs: nodejs/node#7657
Refs: electron/electron#10107
Refs: parcel-bundler/parcel#637
Refs: parcel-bundler/parcel#900
Refs: parcel-bundler/parcel#1137