I would suggest that this discussion should lead to an RFC regardless of the outcome, if only to clarify what’s going on. Right now it seems to be the case that:
- It is not undefined behavior to hand safe code references to undefined bytes.
- However, it is undefined behavior for that safe code to read such bytes. We have no way, in a function signature, to distinguish arguments that will only be written to from those that will be both read from and written to, nor to enforce such a distinction inside a safe function itself. We are therefore in the awkward position where you can start with an initially correct program, then change only an implementation detail of a completely safe function, without changing the interface, without adding an unsafe block (directly or through a call to a function with an unsafe block), and without changing the outputs, and still cause technically undefined behavior.
- It is undefined behavior to hand to safe code references to shared bytes, if those bytes may be defined or mutated by another thread at any moment, since this is a data race.
- It is undefined behavior to hand to safe code, or in any other way use, primitive types that may have invalid values (e.g. references that may be dangling), even in private fields. This by extension rules out passing uninitialized data of these types. Presumably, this rule is intended to apply to &[T], and to guarantee that not only the reference but also the length part of the fat pointer is valid.
- It is not explicitly disallowed to hand types with dangerously broken invariants to safe code. E.g. passing a Vec with a correct pointer, but a wrong length and capacity. However, it is UB to actually use such objects in a way that causes problems, which should ultimately only happen in unsafe code. E.g. indexing into a vector implicitly calls a function that does pointer arithmetic in an unsafe block.
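To make the second bullet concrete, here is a small sketch of two entirely safe functions with the same signature and the same output. The function names are made up for illustration. Under the rules as summarized above, only the first would be acceptable to call with a buffer of undefined bytes, yet nothing in the signature tells the caller which one they have.

```rust
/// Writes a marker byte into every slot without ever reading the old
/// contents. Hypothetical example function.
pub fn fill_write_only(buf: &mut [u8]) -> usize {
    for b in buf.iter_mut() {
        *b = 0xAB;
    }
    buf.len()
}

/// Same interface and same result, but the implementation now reads each
/// byte before overwriting it. No unsafe block was added, and the output
/// is identical, yet this read is what would be UB on undefined bytes.
pub fn fill_read_then_write(buf: &mut [u8]) -> usize {
    for b in buf.iter_mut() {
        let old = *b; // read of the previous contents
        *b = if old == 0xAB { old } else { 0xAB };
    }
    buf.len()
}
```

Both functions are only ever called on initialized memory here, so the sketch itself stays well defined; the point is that the difference between them is invisible to the type system.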
With that out of the way, here are my thoughts on the performance problem at hand.
If the desire is to avoid a problem with any arbitrary read implementation, then we have to worry about things such as a read implementation calling broken FFI functions, which can report a wrong number of bytes written to the buffer you hand them. This is really insoluble unless you zero out the buffer. But I think that trying to solve it is likely to be a waste of time, since one should really just accept that any broken unsafe code, including FFI calls, can allow any number of bad things to happen.
However, if you only want to guarantee that an entirely safe function does not read uninitialized data, then I think @carllerche and @reem are on the right track; we just need some way of representing a reference to write-only memory, without exposing that memory to safe code. In that case, something like the original RFC can be passed without a significant, unavoidable performance hit that drives people to use unsafe.
The only other question I can see is whether we can also guarantee that the safe code writes the number of bytes that it says it does, which is a little bit harder, and perhaps not strictly necessary, but also desirable as a safety guarantee. If I understand correctly, it could be done with a variant on MutBuf where advance is also unsafe (preventing the “skipping” of any bytes in completely safe code), provided that the user (e.g. read_to_end) determines the number of bytes by using the remaining function, rather than assuming that some independent byte count (e.g. the return value of read) is correct. (It might be best to provide a consumed query function for MutBuf that exposes the internal pos in this case, to avoid unnecessary cruft code to subtract remaining from the total capacity.)
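A rough sketch of what such a MutBuf variant might look like follows. This is not the actual MutBuf API from any crate: the method names write_slice, unwritten_ptr and filled are assumptions made for illustration, and std::mem::MaybeUninit is used as a stand-in for "memory that safe code cannot read".

```rust
use std::mem::MaybeUninit;

/// Hypothetical write-only cursor over possibly uninitialized memory.
pub struct MutBuf<'a> {
    buf: &'a mut [MaybeUninit<u8>],
    pos: usize, // bytes in [0, pos) are known to be initialized
}

impl<'a> MutBuf<'a> {
    pub fn new(buf: &'a mut [MaybeUninit<u8>]) -> Self {
        MutBuf { buf, pos: 0 }
    }

    /// Number of bytes that can still be written.
    pub fn remaining(&self) -> usize {
        self.buf.len() - self.pos
    }

    /// Number of bytes written so far (exposes the internal pos).
    pub fn consumed(&self) -> usize {
        self.pos
    }

    /// Safe path: copies as much of `src` as fits and advances by exactly
    /// that amount, so safe code cannot skip over unwritten bytes.
    pub fn write_slice(&mut self, src: &[u8]) -> usize {
        let n = src.len().min(self.remaining());
        let dst = &mut self.buf[self.pos..self.pos + n];
        for (slot, &byte) in dst.iter_mut().zip(src) {
            slot.write(byte);
        }
        self.pos += n;
        n
    }

    /// Raw pointer to the unwritten tail, e.g. to hand to an FFI call.
    /// Writing through it requires an unsafe block at the call site.
    pub fn unwritten_ptr(&mut self) -> *mut u8 {
        self.buf[self.pos..].as_mut_ptr().cast::<u8>()
    }

    /// Unsafe path: the caller vouches that the next `n` bytes of the
    /// unwritten tail have really been initialized.
    pub unsafe fn advance(&mut self, n: usize) {
        assert!(n <= self.remaining(), "advance past end of buffer");
        self.pos += n;
    }

    /// The initialized prefix, readable by the owner of the buffer.
    pub fn filled(&self) -> &[u8] {
        // SAFETY: bytes in [0, pos) were either written by write_slice or
        // vouched for by a caller of the unsafe advance.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr().cast::<u8>(), self.pos) }
    }
}
```

With this shape, a caller such as read_to_end would size its result from consumed (or remaining) rather than trusting a separately reported byte count.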
In this case, a completely safe read_to_mut_buf method (e.g. doing memory-to-memory copies of bytes) can use the safe interface. A read_to_mut_buf method that has an unsafe block (e.g. doing I/O through an OS or other FFI call) can use the unsafe functions, which will be faster anyway in some specific cases.
This does involve a change to the Read trait, which is unfortunate. That churn could be ameliorated for users (but not so much for implementers) of Read by keeping the current read with the byte slice argument as-is, except that it would now be considered UB to hand it an uninitialized slice. It would have a default implementation that uses the new method (read_to_mut_buf or whatever it will be called).
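As a sketch of how that compatibility shim could work, assuming a trait named Read2 (to avoid clashing with std::io::Read), a required read_to_mut_buf method returning io::Result<()>, and a pared-down stand-in for MutBuf, none of which are settled names or signatures:

```rust
use std::io;
use std::mem::MaybeUninit;

/// Pared-down stand-in for the hypothetical MutBuf.
pub struct MutBuf<'a> {
    buf: &'a mut [MaybeUninit<u8>],
    pos: usize,
}

impl<'a> MutBuf<'a> {
    /// Wraps an already initialized slice, as the legacy `read` needs to.
    pub fn from_init(buf: &'a mut [u8]) -> Self {
        // SAFETY: u8 and MaybeUninit<u8> have the same layout, and this
        // type only ever writes initialized bytes into the slice.
        let buf = unsafe { &mut *(buf as *mut [u8] as *mut [MaybeUninit<u8>]) };
        MutBuf { buf, pos: 0 }
    }

    pub fn remaining(&self) -> usize {
        self.buf.len() - self.pos
    }

    pub fn consumed(&self) -> usize {
        self.pos
    }

    /// Safe write that advances by exactly the number of bytes copied.
    pub fn write_slice(&mut self, src: &[u8]) -> usize {
        let n = src.len().min(self.remaining());
        for (slot, &byte) in self.buf[self.pos..self.pos + n].iter_mut().zip(src) {
            slot.write(byte);
        }
        self.pos += n;
        n
    }
}

/// Hypothetical revised trait.
pub trait Read2 {
    /// New required method: implementers write into the write-only buffer.
    fn read_to_mut_buf(&mut self, buf: &mut MutBuf<'_>) -> io::Result<()>;

    /// Legacy entry point with a default implementation. The slice must be
    /// initialized; the byte count comes from the buffer, not the reader.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut mb = MutBuf::from_init(buf);
        self.read_to_mut_buf(&mut mb)?;
        Ok(mb.consumed())
    }
}

/// An entirely safe implementer doing a memory-to-memory copy.
pub struct SliceReader<'a>(pub &'a [u8]);

impl<'a> Read2 for SliceReader<'a> {
    fn read_to_mut_buf(&mut self, buf: &mut MutBuf<'_>) -> io::Result<()> {
        let n = buf.write_slice(self.0);
        self.0 = &self.0[n..];
        Ok(())
    }
}
```

Existing callers of read keep compiling unchanged, while implementers only have to provide read_to_mut_buf; an implementer that wraps an OS or FFI call would use the unsafe parts of the buffer interface instead of write_slice.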