src/pv/transfer.c: timer going off returns with total_written==0; which is not an error. #92

Merged
a-j-wood merged 2 commits from eborisch/pv:write-can-return-0 into main 2024-07-16 22:47:33 +02:00
Image
Contributor

UPDATED:

The issue is not write(2) returning 0; I thought I had that pinned down in the debugger, but alas, PEBKAC. write(2) is returning -1, with errno == EINTR.

The full error path is:

  1. select() reporting that a write is possible
  2. Attempting a write a small (16k), but larger than the available space (greater than 0, as select() indicated. Kernel tracing appears to indiciate this is indeed the case, with at least 4k space available) to the output fd.
    a. This blocks (on FreeBSD -- likely only for some/small sizes). If the timer expires before it is able to finish writing (due to the downstream reader process not fetching new data for over a second), -1/EINTR will result.
  3. The timer goes off during this (first) attempted write in pv__transfer_write_repeated() -- while total_written is still 0.

After receiving nwritten < 0 and handling EINTR (line 231), it returns 0 (total_written), and pv__transfer_write takes this as a reason to stop writing to stdout.

On Linux, in (2) it completes the partial write to top-off the buffer, and then returns EAGAIN on the attempted follow-up, so total_written will be the partial (>0) value. So I don't think you'll hit this specific set of conditions on linux.

The fix remains the same here; this is a transient / self-inflicted EINTR, don't stop attempting to write.

I'm not sure where this "on output EOF" concept is coming from -- that might be a misreading of the man page. (I think the man page is suggesting that a "writer can request a write of size 0 (returns 0)" to signal to a pipe listener that it is the end of the file (read returns 0) without SIGPIPE ?)

You can reproduce this on FreeBSD with the following:

pv < /dev/zero | { while sleep 1.1; do dd bs=63k count=1 of=/dev/null; done ; }

1+0 records in1 [62.1KiB/s] [ <=> ]
1+0 records out
64512 bytes transferred in 0.000093 secs (690647482 bytes/sec)
0+1 records in2 [0.00 B/s] [ <=> ]
0+1 records out
1024 bytes transferred in 0.000049 secs (21070392 bytes/sec)
64.0KiB 0:00:10 [0.00 B/s] [ <=>
[ End of output ]

Change the sleep to 0.9, for example, and it just keeps running.

UPDATED: The issue _is not_ write(2) returning 0; I thought I had that pinned down in the debugger, but alas, PEBKAC. write(2) is returning -1, with errno == EINTR. The full error path is: 1. select() reporting that a write is possible 2. Attempting a write a small (16k), but larger than the available space (greater than 0, as select() indicated. Kernel tracing appears to indiciate this is indeed the case, with at least 4k space available) to the output fd. a. This blocks (on FreeBSD -- likely only for some/small sizes). If the timer expires before it is able to finish writing (due to the downstream reader process not fetching new data for over a second), -1/EINTR will result. 3. The timer goes off during this (first) attempted write in pv__transfer_write_repeated() -- while total_written is still 0. After receiving nwritten < 0 and handling EINTR (line 231), it returns 0 (total_written), and pv__transfer_write takes this as a reason to stop writing to stdout. On Linux, in (2) it completes the partial write to top-off the buffer, and then returns EAGAIN on the attempted follow-up, so total_written will be the partial (>0) value. So I don't think you'll hit this specific set of conditions on linux. **The fix remains the same here; this is a transient / self-inflicted EINTR, don't stop attempting to write.** I'm not sure where this "on output EOF" concept is coming from -- that might be a misreading of the man page. (I think the man page is suggesting that a "writer can request a write of size 0 (returns 0)" to signal to a pipe listener that it is the end of the file (read returns 0) without SIGPIPE ?) You can reproduce this on FreeBSD with the following: # pv < /dev/zero | { while sleep 1.1; do dd bs=63k count=1 of=/dev/null; done ; } 1+0 records in1 [62.1KiB/s] [ <=> ] 1+0 records out 64512 bytes transferred in 0.000093 secs (690647482 bytes/sec) 0+1 records in2 [0.00 B/s] [ <=> ] 0+1 records out 1024 bytes transferred in 0.000049 secs (21070392 bytes/sec) 64.0KiB 0:00:10 [0.00 B/s] [ <=> [ End of output ] Change the sleep to 0.9, for example, and it just keeps running.
This is being hit easily on FreeBSD with pv sitting in a zfs send/recv pipeline. As write(2) returning 0 is not an error, handle it similar to returning -1 with EINTR or EAGAIN: wait a bit and try again.
Image
Author
Contributor

At least, it's not an error so far as I can see in the man 2 write on freebsd:

RETURN VALUES
Upon successful completion the number of bytes which were written is
returned. Otherwise a -1 is returned and the global variable errno is
set to indicate the error.

Some combination of the timers that are being configured, writing to a pipe, etc. is generating a 0 return when there isn't an error. See the discussion here (https://lists.freebsd.org/archives/freebsd-stable/2024-July/002282.html) to see if others decide this is, in fact, a bug in FreeBSD instead.

At least, it's not an error so far as I can see in the man 2 write on freebsd: RETURN VALUES Upon successful completion the number of bytes which were written is returned. Otherwise a -1 is returned and the global variable errno is set to indicate the error. Some combination of the timers that are being configured, writing to a pipe, etc. is generating a 0 return when there isn't an error. See the discussion here (https://lists.freebsd.org/archives/freebsd-stable/2024-July/002282.html) to see if others decide this is, in fact, a bug in FreeBSD instead.
Image eborisch changed title from src/pv/transfer.c: write(2) returning 0 is not an error to src/pv/transfer.c: write(2) returning 0 is not (?) an error 2024-07-16 05:51:08 +02:00
Image
Owner

Hello
Thanks for this.

The manual for write(2) suggests that it returning zero has different semantics to read(2) returning zero and PV's code doesn't seem to reflect that at first glance. The manual on Debian, at least. Certainly PV's code talks about "EOF on write" which maybe isn't really a thing as far as I can tell, so that's confusing at best.

It may be that Linux and FreeBSD behave differently here.

When you apply your changes, does "make check" still work?

Can you provide any examples of this behaviour without recourse to zfs, such that we could put it into a test that will run on multiple platforms (without admin rights)?

Hello Thanks for this. The manual for write(2) suggests that it returning zero has different semantics to read(2) returning zero and PV's code doesn't seem to reflect that at first glance. The manual on Debian, at least. Certainly PV's code talks about "EOF on write" which maybe isn't really a thing as far as I can tell, so that's confusing at best. It may be that Linux and FreeBSD behave differently here. When you apply your changes, does "`make check`" still work? Can you provide any examples of this behaviour without recourse to zfs, such that we could put it into a test that will run on multiple platforms (without admin rights)?
Image
Author
Contributor

I’ve tried (unsuccessfully so far) to create a synthetic test that aggravates this issue.

On the mailing list I linked to I’ve described the problem and posed the “should write(2) ever return 0 (when asked to write >0 bytes)?” question. (The FreeBSD manual page on the subject mainly says that on error the return will be -1.)

On FreeBSD 14-1-RELEASE-p1 with the patch, all 38 non-skipped (3 Memory_safety tests) tests pass.

I’ve tried (unsuccessfully so far) to create a synthetic test that aggravates this issue. On the mailing list I linked to I’ve described the problem and posed the “should write(2) ever return 0 (when asked to write >0 bytes)?” question. (The FreeBSD manual page on the subject mainly says that on error the return will be -1.) On FreeBSD 14-1-RELEASE-p1 with the patch, all 38 non-skipped (3 Memory_safety tests) tests pass.
Image
Author
Contributor

I'm curious if this may be related to #87 as well as the non-blocking issue with DD?

I'm curious if this may be related to #87 as well as the non-blocking issue with DD?
Image
Owner

It's good to know the current tests still pass with the changes in place.

I'm curious if this may be related to #87 as well as the non-blocking issue with DD?

That sounds like an interesting avenue to explore - it certainly sounds plausible. Thanks.

It's good to know the current tests still pass with the changes in place. > I'm curious if this may be related to #87 as well as the non-blocking issue with DD? That sounds like an interesting avenue to explore - it certainly sounds plausible. Thanks.
Image eborisch changed title from src/pv/transfer.c: write(2) returning 0 is not (?) an error to src/pv/transfer.c: timer going off returns with total_written==0; which is not an error. 2024-07-16 18:53:33 +02:00
Image
Owner

Thanks again for this. It has now been merged, with a few small changes applied afterwards.

It looks like the #87 issue is not related as that seems to involve signals not interrupting writes rather than early exits.

Thanks again for this. It has now been merged, with a few small changes applied afterwards. It looks like the #87 issue is not related as that seems to involve signals not interrupting writes rather than early exits.
Image
Author
Contributor

I've updated the overall description at the top to better describe what is happening.

I've updated the overall description at the top to better describe what is happening.
Image eborisch deleted branch write-can-return-0 2024-07-17 06:18:00 +02:00
Image a-j-wood referenced this pull request from a commit 2024-07-17 08:45:19 +02:00
Sign in to join this conversation.
No description provided.