eborisch commented

Contributor

UPDATED:

The issue is not write(2) returning 0; I thought I had that pinned down in the debugger, but alas, PEBKAC. write(2) is returning -1, with errno == EINTR.

The full error path is:

select() reporting that a write is possible
Attempting a write a small (16k), but larger than the available space (greater than 0, as select() indicated. Kernel tracing appears to indiciate this is indeed the case, with at least 4k space available) to the output fd.
a. This blocks (on FreeBSD -- likely only for some/small sizes). If the timer expires before it is able to finish writing (due to the downstream reader process not fetching new data for over a second), -1/EINTR will result.
The timer goes off during this (first) attempted write in pv__transfer_write_repeated() -- while total_written is still 0.

After receiving nwritten < 0 and handling EINTR (line 231), it returns 0 (total_written), and pv__transfer_write takes this as a reason to stop writing to stdout.

On Linux, in (2) it completes the partial write to top-off the buffer, and then returns EAGAIN on the attempted follow-up, so total_written will be the partial (>0) value. So I don't think you'll hit this specific set of conditions on linux.

The fix remains the same here; this is a transient / self-inflicted EINTR, don't stop attempting to write.

I'm not sure where this "on output EOF" concept is coming from -- that might be a misreading of the man page. (I think the man page is suggesting that a "writer can request a write of size 0 (returns 0)" to signal to a pipe listener that it is the end of the file (read returns 0) without SIGPIPE ?)

You can reproduce this on FreeBSD with the following:

pv < /dev/zero | { while sleep 1.1; do dd bs=63k count=1 of=/dev/null; done ; }

1+0 records in1 [62.1KiB/s] [ <=> ]
1+0 records out
64512 bytes transferred in 0.000093 secs (690647482 bytes/sec)
0+1 records in2 [0.00 B/s] [ <=> ]
0+1 records out
1024 bytes transferred in 0.000049 secs (21070392 bytes/sec)
64.0KiB 0:00:10 [0.00 B/s] [ <=>
[ End of output ]

Change the sleep to 0.9, for example, and it just keeps running.

UPDATED: The issue _is not_ write(2) returning 0; I thought I had that pinned down in the debugger, but alas, PEBKAC. write(2) is returning -1, with errno == EINTR. The full error path is: 1. select() reporting that a write is possible 2. Attempting a write a small (16k), but larger than the available space (greater than 0, as select() indicated. Kernel tracing appears to indiciate this is indeed the case, with at least 4k space available) to the output fd. a. This blocks (on FreeBSD -- likely only for some/small sizes). If the timer expires before it is able to finish writing (due to the downstream reader process not fetching new data for over a second), -1/EINTR will result. 3. The timer goes off during this (first) attempted write in pv__transfer_write_repeated() -- while total_written is still 0. After receiving nwritten < 0 and handling EINTR (line 231), it returns 0 (total_written), and pv__transfer_write takes this as a reason to stop writing to stdout. On Linux, in (2) it completes the partial write to top-off the buffer, and then returns EAGAIN on the attempted follow-up, so total_written will be the partial (>0) value. So I don't think you'll hit this specific set of conditions on linux. **The fix remains the same here; this is a transient / self-inflicted EINTR, don't stop attempting to write.** I'm not sure where this "on output EOF" concept is coming from -- that might be a misreading of the man page. (I think the man page is suggesting that a "writer can request a write of size 0 (returns 0)" to signal to a pipe listener that it is the end of the file (read returns 0) without SIGPIPE ?) You can reproduce this on FreeBSD with the following: # pv < /dev/zero | { while sleep 1.1; do dd bs=63k count=1 of=/dev/null; done ; } 1+0 records in1 [62.1KiB/s] [ <=> ] 1+0 records out 64512 bytes transferred in 0.000093 secs (690647482 bytes/sec) 0+1 records in2 [0.00 B/s] [ <=> ] 0+1 records out 1024 bytes transferred in 0.000049 secs (21070392 bytes/sec) 64.0KiB 0:00:10 [0.00 B/s] [ <=> [ End of output ] Change the sleep to 0.9, for example, and it just keeps running.

eborisch added 1 commit

2024-07-15 23:40:41 +02:00

src/pv/transfer.c: write(2) returning 0 is not an error 11e5b5fe95

This is being hit easily on FreeBSD with pv sitting in a zfs send/recv pipeline. As write(2) returning 0 is not an error, handle it similar to returning -1 with EINTR or EAGAIN: wait a bit and try again.

eborisch commented

2024-07-16 00:06:55 +02:00

Author

Contributor

At least, it's not an error so far as I can see in the man 2 write on freebsd:

RETURN VALUES
Upon successful completion the number of bytes which were written is
returned. Otherwise a -1 is returned and the global variable errno is
set to indicate the error.

Some combination of the timers that are being configured, writing to a pipe, etc. is generating a 0 return when there isn't an error. See the discussion here (https://lists.freebsd.org/archives/freebsd-stable/2024-July/002282.html) to see if others decide this is, in fact, a bug in FreeBSD instead.

At least, it's not an error so far as I can see in the man 2 write on freebsd: RETURN VALUES Upon successful completion the number of bytes which were written is returned. Otherwise a -1 is returned and the global variable errno is set to indicate the error. Some combination of the timers that are being configured, writing to a pipe, etc. is generating a 0 return when there isn't an error. See the discussion here (https://lists.freebsd.org/archives/freebsd-stable/2024-July/002282.html) to see if others decide this is, in fact, a bug in FreeBSD instead.

eborisch changed title from ~~src/pv/transfer.c: write(2) returning 0 is not an error~~ to src/pv/transfer.c: write(2) returning 0 is not (?) an error

2024-07-16 05:51:08 +02:00

eborisch added 1 commit

2024-07-16 05:57:14 +02:00

Clarify state. d86d8cf65a

a-j-wood commented

2024-07-16 06:39:46 +02:00

Owner

Hello
Thanks for this.

The manual for write(2) suggests that it returning zero has different semantics to read(2) returning zero and PV's code doesn't seem to reflect that at first glance. The manual on Debian, at least. Certainly PV's code talks about "EOF on write" which maybe isn't really a thing as far as I can tell, so that's confusing at best.

It may be that Linux and FreeBSD behave differently here.

When you apply your changes, does "make check" still work?

Can you provide any examples of this behaviour without recourse to zfs, such that we could put it into a test that will run on multiple platforms (without admin rights)?

Hello Thanks for this. The manual for write(2) suggests that it returning zero has different semantics to read(2) returning zero and PV's code doesn't seem to reflect that at first glance. The manual on Debian, at least. Certainly PV's code talks about "EOF on write" which maybe isn't really a thing as far as I can tell, so that's confusing at best. It may be that Linux and FreeBSD behave differently here. When you apply your changes, does "`make check`" still work? Can you provide any examples of this behaviour without recourse to zfs, such that we could put it into a test that will run on multiple platforms (without admin rights)?

eborisch commented

2024-07-16 07:03:30 +02:00

Author

Contributor

I’ve tried (unsuccessfully so far) to create a synthetic test that aggravates this issue.

On the mailing list I linked to I’ve described the problem and posed the “should write(2) ever return 0 (when asked to write >0 bytes)?” question. (The FreeBSD manual page on the subject mainly says that on error the return will be -1.)

On FreeBSD 14-1-RELEASE-p1 with the patch, all 38 non-skipped (3 Memory_safety tests) tests pass.

I’ve tried (unsuccessfully so far) to create a synthetic test that aggravates this issue. On the mailing list I linked to I’ve described the problem and posed the “should write(2) ever return 0 (when asked to write >0 bytes)?” question. (The FreeBSD manual page on the subject mainly says that on error the return will be -1.) On FreeBSD 14-1-RELEASE-p1 with the patch, all 38 non-skipped (3 Memory_safety tests) tests pass.

eborisch commented

2024-07-16 07:20:08 +02:00

Author

Contributor

I'm curious if this may be related to #87 as well as the non-blocking issue with DD?

a-j-wood commented

2024-07-16 07:31:54 +02:00

Owner

It's good to know the current tests still pass with the changes in place.

I'm curious if this may be related to #87 as well as the non-blocking issue with DD?

That sounds like an interesting avenue to explore - it certainly sounds plausible. Thanks.

It's good to know the current tests still pass with the changes in place. > I'm curious if this may be related to #87 as well as the non-blocking issue with DD? That sounds like an interesting avenue to explore - it certainly sounds plausible. Thanks.

eborisch changed title from ~~src/pv/transfer.c: write(2) returning 0 is not (?) an error~~ to src/pv/transfer.c: timer going off returns with total_written==0; which is not an error.

2024-07-16 18:53:33 +02:00

a-j-wood merged commit c7104f0c0d into main

2024-07-16 22:47:33 +02:00

a-j-wood referenced this pull request from a commit

2024-07-16 22:47:35 +02:00

Merge pull request 'src/pv/transfer.c: timer going off returns with total_written==0; which is not an error.' (#92) from eborisch/pv:write-can-return-0 into main

a-j-wood referenced this pull request from a commit

2024-07-16 23:04:04 +02:00

News item and acknowledgement for zero write bug fix (#92).

a-j-wood commented

2024-07-16 23:38:06 +02:00

Owner

Thanks again for this. It has now been merged, with a few small changes applied afterwards.

It looks like the #87 issue is not related as that seems to involve signals not interrupting writes rather than early exits.

Thanks again for this. It has now been merged, with a few small changes applied afterwards. It looks like the #87 issue is not related as that seems to involve signals not interrupting writes rather than early exits.

eborisch commented

2024-07-17 00:11:29 +02:00

Author

Contributor

I've updated the overall description at the top to better describe what is happening.

eborisch deleted branch write-can-return-0

2024-07-17 06:18:00 +02:00

eborisch referenced this pull request

2024-07-17 06:54:00 +02:00

Improve description and comments for blocking-write() after select() #93

a-j-wood referenced this pull request from a commit

2024-07-17 08:45:19 +02:00