Skip to content

Conversation

@methane
Copy link
Member

@methane methane commented Jul 4, 2018

tarfile._Stream has two buffer for compressed and uncompressed data.
Those buffers are not aligned so unnecessary bytes slicing happens
for every reading chunks.

This commit bypass compressed buffering.

In this benchmark 1, user time become 250ms from 300ms.

https://bugs.python.org/issue34043

tarfile._Stream has two buffer for compressed and uncompressed data.
Those buffers are not aligned so unnecessary bytes slicing happens
for every reading chunks.

This commit bypass compressed buffering.

In this benchmark [1], user time become 250ms from 300ms.

[1]: https://bugs.python.org/msg320763
@methane methane added the performance Performance or resource usage label Jul 4, 2018
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor nitpick.

Lib/tarfile.py Outdated
# Skip underlaying buffer to avoid unaligned double
# buffering.
if self.buf:
buf, self.buf = self.buf, b""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I would prefer to do that on two lines, but it's just a matter of taste.

@serhiy-storchaka serhiy-storchaka self-requested a review July 4, 2018 13:05
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just saw a typo in a comment.

Lib/tarfile.py Outdated
buf = self.__read(self.bufsize)
if not buf:
break
# Skip underlaying buffer to avoid unaligned double
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo? underlying?

@methane methane merged commit 8d13091 into python:master Jul 6, 2018
@methane methane deleted the tarfile branch July 6, 2018 05:06
If size is not defined, return all bytes of the stream
up to EOF.
"""
if size is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only one question. Why this branch was removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is follow up of bpo-34010, (GH-8020).
See also, https://bugs.python.org/issue34010#msg321040

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Instances of _Stream are never leaked to the end user? Then this change LGTM.

Copy link
Member Author

@methane methane Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be accessed via TarFile.fileobj.
But using it directly is not pragramatic, and TarFile.fileobj.read() caused TypeError because of "".join() bug.
So it must not used for previous versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that the class is private, but I didn't notice that TarFile.fileobj is public. Maybe just replace the assertion with a regular if/raise, just in case? Maybe raise a NotImplementedError?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance or resource usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants