-
-
Notifications
You must be signed in to change notification settings - Fork 184
Add new StreamReader.readuntil() method. #297
Conversation
asyncio/streams.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The opening """ should be on the same line as the first word of the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
Ping? Any updates? |
|
@gvanrossum pong. There is nothing to change in that PR as I think. If not - please say what should be done. |
|
Here's what I recommend: instead of adding an optional argument to readline(), leave readline() alone and add a new method readrecord() with a mandatory trailing separator. You can then change the policies of readrecord() to be more strict when it encounters premature EOF or a record that's too long (or you could ignore the limit -- after all read() and readexactly() ignore the limit too). |
|
This is pleriminary fix. I don't know if I should change So, I'm asking your to:
After that I will change commit message, squash and so on. I'm happy to change anything. |
|
@gvanrossum ping |
|
Sorry, I've been alternatingly busy and sick. But I'll get to this On Wed, Dec 9, 2015 at 1:12 PM, Коренберг Марк [email protected]
--Guido van Rossum (python.org/~guido) |
|
Why Maybe |
|
@1st1 agree. Tornado call it |
|
BTW, do we consume the separator? Should we add a bool flag to either leave it in the buffer or consume it? |
|
@1st1 could you provide case when we should preserve separator ? I mean two different things - preserving it in internal buffer and prserving at the end of returned value. |
|
Does
I don't know any concrete use case for that, but I can imagine that being useful for separators more complex than |
|
Yes, if complete record is read successfully, separator is removed both from buffer and from returned value. In any case, it is guaranteed, that removed separator is the same as argument of the function, so data is not lost. |
|
It should consume (and return) the separator, similar to readline(). (The On Fri, Dec 11, 2015 at 9:48 AM, Yury Selivanov [email protected]
--Guido van Rossum (python.org/~guido) |
|
Oh, I see, that's different from readline(). Do you have a reason for that? On Fri, Dec 11, 2015 at 10:02 AM, Коренберг Марк [email protected]
--Guido van Rossum (python.org/~guido) |
|
@gvanrossum yes, I have reason. First, this function will be used to read message of unknown size from a stream. I.e multipart form data or end of http header or even end of http headers( Next, storing separator in returned value have meaning for readline api, since presense of newline at end of string is only way to detect if complete line is read. In our case, incomplete line reading is acknowledged by exception. the same for overlimited line. In other words, for my function, if something is returned, it is guaranteed that this is complete data. Next, after reading of such chunk, users will validate, parse it or pass to struct.unpack. In both cases, user does not need separator and he will cut it. Even for logging it is required to cut newline. So, my function consume separator, but not return it. Seems this is very suitable for all tasks. If not, please provide example. Yes, it is not traditional (tornado returns separator), but much more convenient. In extra case, user may manually append separator to returned value before next processing of data, since separator was passed as argument. But I'm consider this is very rare case or even impossible in real situations. I have experience in writing protocol implementations. |
|
@socketpair All good points. OTOH, I still don't like this subtle difference between |
|
@1st1 I have renamed function |
|
@1st1 and @gvanrossum. Will you accept this new function with current separator behaviour? |
asyncio/streams.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1st1 and @gvanrossum Is this exception's name good enough? I have problems with naming and with my english.
|
I am almost happy with this. Left a few comments. Thanks for persevering (and sorry for being so slow -- being alternatingly sick and busy is no fun). |
|
@gvanrossum you should not apologize, everyone understands your workload. :) I'm happy that you personally review my code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gvanrossum @socketpair Why do we leave the data in the buffer? Suppose one implements a MIME parser, and tries to read until they see a boundary. Would it make sense for them to handle the LimitOverrunError, and instead of closing the connection, trying to recover with just reading the data? Is there a realistic use case for this?
The reason I'm asking this, is because I don't like that readline and readuntil will behave differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In previous implementation there was configurable limit. now it has gone. So, user may make a warning, and try to read with bigger limit, or, read by portions to skip that bad data...or, maybe read until another separator, trying to recover from error.
One of the most unwanted feature of readline - is to leave stream in unpredicatable state when such error occur.
I can remove that odd (as you think) behaviour, but in that case I should add closing of connection right before raising such exception. (at least, disable any readings and simulate EOF).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[..] and try to read with bigger limit, or, read by portions to skip that bad data...or, maybe read until another separator, trying to recover from error.
Unfortunately I haven't seen such an advanced code yet. Usually the code just crashes, and the connection gets closed :)
One of the most unwanted feature of readline - is to leave stream in unpredicatable state when such error occur.
Agree. But... The readline method before your PR did exactly that? Sorry, I'm a bit confused now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, readline before me just clears buffer (sometimes, depending on fact it found newline or not), and was not closing connection. You may see original, not so obvious code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, even though it is an obscure border case. There is actually an alternative way to read that buffered data -- using plain read().
|
@socketpair there are a bunch of questions to be resolved before merging this. Please read my comments. Sorry it takes so long for this PR, but it's a sizeable rewrite of the existing code + new methods. We have to be careful here. Thanks! |
|
OK, I understand you in your requirements, this is OK. My PR in nodejs tooks about 4 months :) and succeeded after that (so, I have discovered one other megabug while writing test on bug I found (assertion failed in C code)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment explaining what's going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a comment, saying that this restore original behaviour of readline (see - I even did not touch original tests). Since "original behaviour" is unknown for people seeing that comment in future, I decide to remove comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment(s) should be restored. The only reason we have these two except clauses is for backwards compatibility, so we should state that. Otherwise in the future people may waste time trying to understand the reason or try to fix it and accidentally break code that depends on the old way. (Actually, the return e.partial is for compatibility with readline() methods on synchronous I/O classes; the backwards compatibility concern applies to deleting the consumed portion of the buffer only.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Alternatively, in the future people may legitimately decide to leave the excess data in the buffer for the LimitOverrunError case -- it helps that the only reason to not do that was backwards compatibility, not anything deeper.)
|
Seems, I have fixed everything requested, except long comment and I speak about REQUIREMENTS, i.e. changes, without of which you will not merge. |
I left you a comment re |
This function overcome ReadStream.readline() limitation on separator value. Also update docstrings of read*() functions and do some refactoring on them.
|
Well, only question leaft is a big comment. Will you merge with that? Also, when I remove that |
We'll merge the PR, it's only a matter of time to fix the remaining nits!
That might be a good reason to keep that |
|
Happy New Year everybody! There seems to be one open issue here (the investigation into why that +1 is needed). I will try to conduct a separate review. Then we can hopefully merge this. (Personally I have no problem with the big comment.) |
|
I looked into the reason for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should not clear the buffer here? The same argument applies -- there's another way to read the rejected data if an app really wants it. That may be better than stuffing it in the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should not clear the buffer here?
- It is compatibility with
readexactly. - If
LimitOverrunoccurs, this may not mean some protocol violation (say, in HTTP there are no limits on length in various places). This exception is not "fatal", and we should preserve data in order to continue. For example, users of library may issue warning in logs and try to re-read with bigger limit. UPD: Specifyinglimitin that function was removed (but I do not agree with that, but remove it in order to be merged). - If
IncompleteReadErroroccurs, this mean protocol violation (or sudden disconnection). That mean requested operation can never complete. Yes, someone may want to re-read data in different way, but I can not provide real use case (as opposed toLimitOverruncase). Partially fetched record can not be used in a general, so this exception is "fatal". If someone wants to inspect what data still have been fetched - it have ability to. If I remove buffer cleanup, and user wants to get that partial data, he will required to call.read()in a loop in order to fetch everything unread, since we can not get amount of data in buffer (and callreadexactly). Moreover, it is common to use this function to read data line-by-line, and so users should distinguish between empty and non-empty data after this exception happen. In other words, having data inside THIS exception is very suitable.
|
OK, you've convinced me. LGTM. Maybe Yury has some final remarks? Yury, do you want to merge? |
|
@gvanrossum Yes, I like this PR, and the new @socketpair Would you be able to create an issue on bugs.python.org and draft a patch for Python documentation (update asyncio docs, whatsnew, etc)? |
|
|
I believe it's https://github.com/python/cpython/blob/master/Doc/library/asyncio-stream.rst for doc update |
|
Mark, I've committed your changes in 47fafbb. Thank you very much for your hard work on this PR. Please upload a draft patch with documentation for |
No description provided.