Skip to content
This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Conversation

@socketpair
Copy link

No description provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The opening """ should be on the same line as the first word of the docstring.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@gvanrossum
Copy link
Member

Ping? Any updates?

@socketpair
Copy link
Author

@gvanrossum pong. There is nothing to change in that PR as I think. If not - please say what should be done.

@gvanrossum
Copy link
Member

Here's what I recommend: instead of adding an optional argument to readline(), leave readline() alone and add a new method readrecord() with a mandatory trailing separator. You can then change the policies of readrecord() to be more strict when it encounters premature EOF or a record that's too long (or you could ignore the limit -- after all read() and readexactly() ignore the limit too).

@socketpair
Copy link
Author

This is pleriminary fix. I don't know if I should change readline code ever. Now, all test passed and no code duplication. Also, I use offset instead of merging/slicing buffers - this is faster than previous implementation.

So, I'm asking your to:

  1. Decide, should I leave original readline implementation or not
  2. make a quick code-review + English-related fixes.

After that I will change commit message, squash and so on.

I'm happy to change anything.

@socketpair
Copy link
Author

@gvanrossum ping

@gvanrossum
Copy link
Member

Sorry, I've been alternatingly busy and sick. But I'll get to this
eventually!

On Wed, Dec 9, 2015 at 1:12 PM, Коренберг Марк [email protected]
wrote:

@gvanrossum https://github.com/gvanrossum ping


Reply to this email directly or view it on GitHub
#297 (comment).

--Guido van Rossum (python.org/~guido)

@1st1
Copy link
Member

1st1 commented Dec 11, 2015

Why readrecord? When I think about records I assume that they have constant sizes, or are encoded in [header+body-len][body] format.

Maybe readuntil would be a better name?

@socketpair
Copy link
Author

@1st1 agree. Tornado call it read_until. I wIll rename to same function, but without underscore, as you said.

@1st1
Copy link
Member

1st1 commented Dec 11, 2015

BTW, do we consume the separator? Should we add a bool flag to either leave it in the buffer or consume it?

@socketpair
Copy link
Author

@1st1 could you provide case when we should preserve separator ?

I mean two different things - preserving it in internal buffer and prserving at the end of returned value.

@1st1
Copy link
Member

1st1 commented Dec 11, 2015

Does readrecord in your current patch consume the separator (i.e. will it be left for future consumption in the internal StreamReader buffer)? The docstrting doesn't mention that.

@1st1 could you provide case when we should preserve separator ?

I don't know any concrete use case for that, but I can imagine that being useful for separators more complex than \r\n. But I think this is something we can always add later.

@socketpair
Copy link
Author

Yes, if complete record is read successfully, separator is removed both from buffer and from returned value.

In any case, it is guaranteed, that removed separator is the same as argument of the function, so data is not lost.

@gvanrossum
Copy link
Member

It should consume (and return) the separator, similar to readline(). (The
one difference is at EOF -- IIUC if readuntil() sees EOF before seeing a
full separator it should raise EOFError (and also if it sees EOF
immediately -- similar to readexactly()).

On Fri, Dec 11, 2015 at 9:48 AM, Yury Selivanov [email protected]
wrote:

Does readrecord in your current patch consume the separator (i.e. will it
be left for future consumption in the internal StreamReader buffer)? The
docstrting doesn't mention that.

@1st1 https://github.com/1st1 could you provide case when we should
preserve separator ?

I don't know any concrete use case for that, but I can imagine that being
useful for separators more complex than \r\n. But I think this is
something we can always add later.


Reply to this email directly or view it on GitHub
#297 (comment).

--Guido van Rossum (python.org/~guido)

@gvanrossum
Copy link
Member

Oh, I see, that's different from readline(). Do you have a reason for that?

On Fri, Dec 11, 2015 at 10:02 AM, Коренберг Марк [email protected]
wrote:

Yes, if complete record is read successfully, separator is removed both
from buffer and from returned value.

In any case, it is guaranteed, that removed separator is the same as
argument of the function, so data is not lost.


Reply to this email directly or view it on GitHub
#297 (comment).

--Guido van Rossum (python.org/~guido)

@socketpair
Copy link
Author

@gvanrossum yes, I have reason. First, this function will be used to read message of unknown size from a stream. I.e multipart form data or end of http header or even end of http headers(\r\n\r\n). In all cases, separator is a some protocol defined delimiter, it have no meaning in data it service for.

Next, storing separator in returned value have meaning for readline api, since presense of newline at end of string is only way to detect if complete line is read. In our case, incomplete line reading is acknowledged by exception. the same for overlimited line. In other words, for my function, if something is returned, it is guaranteed that this is complete data.

Next, after reading of such chunk, users will validate, parse it or pass to struct.unpack. In both cases, user does not need separator and he will cut it. Even for logging it is required to cut newline.

So, my function consume separator, but not return it. Seems this is very suitable for all tasks. If not, please provide example. Yes, it is not traditional (tornado returns separator), but much more convenient. In extra case, user may manually append separator to returned value before next processing of data, since separator was passed as argument. But I'm consider this is very rare case or even impossible in real situations. I have experience in writing protocol implementations.

@1st1
Copy link
Member

1st1 commented Dec 12, 2015

@socketpair All good points.

OTOH, I still don't like this subtle difference between readline and readuntil. Which one of them returns the delimiter is something that I will constantly forget. Removing the delimiter is a matter of doing a simple slice operation. Tornado, by the way, doesn't strip the delimiter: http://www.tornadoweb.org/en/stable/iostream.html#tornado.iostream.BaseIOStream.read_until

@1st1 1st1 changed the title ReadStream.readline() now supports custom separators Add new StreamReader.readuntil() method. Dec 12, 2015
@socketpair
Copy link
Author

@1st1 I have renamed function

@socketpair
Copy link
Author

@1st1 and @gvanrossum. Will you accept this new function with current separator behaviour?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1st1 and @gvanrossum Is this exception's name good enough? I have problems with naming and with my english.

@gvanrossum
Copy link
Member

I am almost happy with this. Left a few comments. Thanks for persevering (and sorry for being so slow -- being alternatingly sick and busy is no fun).

@socketpair
Copy link
Author

@gvanrossum you should not apologize, everyone understands your workload. :) I'm happy that you personally review my code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvanrossum @socketpair Why do we leave the data in the buffer? Suppose one implements a MIME parser, and tries to read until they see a boundary. Would it make sense for them to handle the LimitOverrunError, and instead of closing the connection, trying to recover with just reading the data? Is there a realistic use case for this?

The reason I'm asking this, is because I don't like that readline and readuntil will behave differently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous implementation there was configurable limit. now it has gone. So, user may make a warning, and try to read with bigger limit, or, read by portions to skip that bad data...or, maybe read until another separator, trying to recover from error.

One of the most unwanted feature of readline - is to leave stream in unpredicatable state when such error occur.

I can remove that odd (as you think) behaviour, but in that case I should add closing of connection right before raising such exception. (at least, disable any readings and simulate EOF).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[..] and try to read with bigger limit, or, read by portions to skip that bad data...or, maybe read until another separator, trying to recover from error.

Unfortunately I haven't seen such an advanced code yet. Usually the code just crashes, and the connection gets closed :)

One of the most unwanted feature of readline - is to leave stream in unpredicatable state when such error occur.

Agree. But... The readline method before your PR did exactly that? Sorry, I'm a bit confused now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, readline before me just clears buffer (sometimes, depending on fact it found newline or not), and was not closing connection. You may see original, not so obvious code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, even though it is an obscure border case. There is actually an alternative way to read that buffered data -- using plain read().

@1st1
Copy link
Member

1st1 commented Dec 14, 2015

@socketpair there are a bunch of questions to be resolved before merging this. Please read my comments. Sorry it takes so long for this PR, but it's a sizeable rewrite of the existing code + new methods. We have to be careful here. Thanks!

@socketpair
Copy link
Author

OK, I understand you in your requirements, this is OK. My PR in nodejs tooks about 4 months :) and succeeded after that (so, I have discovered one other megabug while writing test on bug I found (assertion failed in C code))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining what's going on here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a comment, saying that this restore original behaviour of readline (see - I even did not touch original tests). Since "original behaviour" is unknown for people seeing that comment in future, I decide to remove comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment(s) should be restored. The only reason we have these two except clauses is for backwards compatibility, so we should state that. Otherwise in the future people may waste time trying to understand the reason or try to fix it and accidentally break code that depends on the old way. (Actually, the return e.partial is for compatibility with readline() methods on synchronous I/O classes; the backwards compatibility concern applies to deleting the consumed portion of the buffer only.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Alternatively, in the future people may legitimately decide to leave the excess data in the buffer for the LimitOverrunError case -- it helps that the only reason to not do that was backwards compatibility, not anything deeper.)

@socketpair
Copy link
Author

Seems, I have fixed everything requested, except long comment and offset = buflen - seplen. Will it be merged in current state, or it is required to fix something ?

I speak about REQUIREMENTS, i.e. changes, without of which you will not merge.

@1st1
Copy link
Member

1st1 commented Dec 21, 2015

Seems, I have fixed everything requested, except long comment and offset = buflen - seplen. Will it be merged in current state, or it is required to fix something ?

I left you a comment re offset = buflen - seplen.

@1st1 1st1 closed this Dec 21, 2015
@1st1 1st1 reopened this Dec 21, 2015
This function overcome ReadStream.readline() limitation on separator
value.

Also update docstrings of read*() functions and do some refactoring on
them.
@socketpair
Copy link
Author

Well, only question leaft is a big comment. Will you merge with that? Also, when I remove that +1 tests hang (!). I now discovering why.

@1st1
Copy link
Member

1st1 commented Dec 21, 2015

Well, only question leaft is a big comment. Will you merge with that?

We'll merge the PR, it's only a matter of time to fix the remaining nits!

Also, when I remove that +1 tests hang (!). I now discovering why.

That might be a good reason to keep that + 1. Or, maybe, it's a hint that there is another bug. Please investigate.

@gvanrossum
Copy link
Member

Happy New Year everybody! There seems to be one open issue here (the investigation into why that +1 is needed). I will try to conduct a separate review. Then we can hopefully merge this. (Personally I have no problem with the big comment.)

@gvanrossum
Copy link
Member

I looked into the reason for the + 1 and it is correct. The reason the test hangs is simple: It sets the limit to 7, fills the buffer with 8 bytes (none of which are the separator, which has size 1), and then expects LimitOverrunError to be raised. Using offset = bufsize + 1 - seplen, offset becomes 8, which is > 7, so the exception is raised. But without + 1, offset is only 7, which doesn't cause the exception. But the test neither sets EOF not adds more data, so then readuntil() simply hangs (there is no higher-level timeout in the test or in the stream code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should not clear the buffer here? The same argument applies -- there's another way to read the rejected data if an app really wants it. That may be better than stuffing it in the exception.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvanrossum

Maybe we should not clear the buffer here?

  1. It is compatibility with readexactly.
  2. If LimitOverrun occurs, this may not mean some protocol violation (say, in HTTP there are no limits on length in various places). This exception is not "fatal", and we should preserve data in order to continue. For example, users of library may issue warning in logs and try to re-read with bigger limit. UPD: Specifying limit in that function was removed (but I do not agree with that, but remove it in order to be merged).
  3. If IncompleteReadError occurs, this mean protocol violation (or sudden disconnection). That mean requested operation can never complete. Yes, someone may want to re-read data in different way, but I can not provide real use case (as opposed to LimitOverrun case). Partially fetched record can not be used in a general, so this exception is "fatal". If someone wants to inspect what data still have been fetched - it have ability to. If I remove buffer cleanup, and user wants to get that partial data, he will required to call .read() in a loop in order to fetch everything unread, since we can not get amount of data in buffer (and call readexactly). Moreover, it is common to use this function to read data line-by-line, and so users should distinguish between empty and non-empty data after this exception happen. In other words, having data inside THIS exception is very suitable.

@gvanrossum
Copy link
Member

OK, you've convinced me. LGTM. Maybe Yury has some final remarks? Yury, do you want to merge?

@1st1
Copy link
Member

1st1 commented Jan 8, 2016

@gvanrossum Yes, I like this PR, and the new readuntil method. Please let me to experiment with the patch over the weekend and I can merge it on Monday.

@socketpair Would you be able to create an issue on bugs.python.org and draft a patch for Python documentation (update asyncio docs, whatsnew, etc)?

@socketpair
Copy link
Author

@1st1

  1. Thanks to spend time with my code :)
  2. Yes, I able to. http://bugs.python.org/issue26050
  3. Please point me which files I should update

@asvetlov
Copy link

asvetlov commented Jan 8, 2016

@1st1
Copy link
Member

1st1 commented Jan 11, 2016

Mark, I've committed your changes in 47fafbb.

Thank you very much for your hard work on this PR. Please upload a draft patch with documentation for readuntil and LimitOverrunError to http://bugs.python.org/issue26050.

@1st1 1st1 closed this Jan 11, 2016
@socketpair
Copy link
Author

@socketpair socketpair deleted the separ branch January 14, 2016 15:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants