bpo-34043: Optimize tarfile uncompress performance #8089

methane · 2018-07-04T12:20:04Z

tarfile._Stream has two buffers, one for compressed and one for uncompressed data.
These buffers are not aligned, so unnecessary bytes slicing happens
for every chunk that is read.

This commit bypasses the buffer for compressed data.

In this benchmark [1], user time dropped from 300ms to 250ms.

[1]: https://bugs.python.org/msg320763

https://bugs.python.org/issue34043
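
To illustrate the idea in the description, here is a minimal sketch and not the code from this PR. `SketchStream` and `read_all` are hypothetical names, and zlib stands in for whichever decompressor is in use. The read loop drains any leftover compressed bytes once, then feeds whole chunks from the file directly to the decompressor, so the compressed side is never sliced.

```python
import io
import zlib


class SketchStream:
    """Hypothetical, simplified stand-in for a decompressing stream reader."""

    def __init__(self, fileobj, bufsize=16 * 1024):
        self.fileobj = fileobj
        self.bufsize = bufsize
        self.cmp = zlib.decompressobj()
        self.buf = b""   # leftover compressed bytes, e.g. read while parsing a header
        self.dbuf = b""  # decompressed bytes not yet handed to the caller

    def read(self, size):
        """Return up to size decompressed bytes (fewer only at end of stream)."""
        chunks = [self.dbuf]
        have = len(self.dbuf)
        while have < size:
            if self.buf:
                # Drain the leftover compressed bytes once, without slicing them.
                buf = self.buf
                self.buf = b""
            else:
                # Read whole chunks straight from the underlying file.
                buf = self.fileobj.read(self.bufsize)
                if not buf:
                    break
            data = self.cmp.decompress(buf)
            chunks.append(data)
            have += len(data)
        joined = b"".join(chunks)
        # Only the decompressed side is sliced, once per call.
        self.dbuf = joined[size:]
        return joined[:size]


def read_all(stream, chunk_size=512):
    """Hypothetical helper: read the stream to the end in fixed-size requests."""
    parts = []
    while True:
        part = stream.read(chunk_size)
        if not part:
            break
        parts.append(part)
    return b"".join(parts)
```

In this sketch the request size (`chunk_size`) and the file read size (`bufsize`) are independent, which is the situation the description calls unaligned buffers.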

vstinner

LGTM, just a minor nitpick.

vstinner · 2018-07-04T12:27:16Z

Lib/tarfile.py

+            # Skip underlaying buffer to avoid unaligned double
+            # buffering.
+            if self.buf:
+                buf, self.buf = self.buf, b""


nitpick: I would prefer to do that on two lines, but it's just a matter of taste.

vstinner

LGTM. I just saw a typo in a comment.

vstinner · 2018-07-05T12:22:51Z

Lib/tarfile.py

-            buf = self.__read(self.bufsize)
-            if not buf:
-                break
+            # Skip underlaying buffer to avoid unaligned double


typo? underlying?

serhiy-storchaka · 2018-07-06T06:45:47Z

Lib/tarfile.py

-           If size is not defined, return all bytes of the stream
-           up to EOF.
-        """
-        if size is None:


I have only one question. Why was this branch removed?

methane · 2018-07-06

This issue is a follow-up to bpo-34010 (GH-8020).
See also https://bugs.python.org/issue34010#msg321040

serhiy-storchaka · 2018-07-06

Thank you. Are instances of _Stream never leaked to the end user? If so, this change LGTM.

methane · 2018-07-06

It can be accessed via TarFile.fileobj.
But using it directly is not pragmatic, and TarFile.fileobj.read() raised TypeError because of the "".join() bug.
So it cannot have been used in previous versions.

vstinner · 2018-07-06

I saw that the class is private, but I didn't notice that TarFile.fileobj is public. Maybe just replace the assertion with a regular if/raise, just in case? Maybe raise a NotImplementedError?
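
As a rough sketch of the if/raise suggestion above (the class name `SketchReader` and the error message are hypothetical, not taken from tarfile): an explicit check keeps working when Python runs with `-O`, which strips `assert` statements.

```python
class SketchReader:
    """Hypothetical reader whose read() requires an explicit size."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size):
        # An explicit check instead of `assert size is not None`:
        # it still fires under `python -O`, where asserts are removed.
        if size is None:
            raise NotImplementedError("read() without a size is not supported")
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk
```

A caller that passes `None` gets a clear exception type instead of an `AssertionError` or, with optimizations enabled, undefined behavior further down.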

the-knights-who-say-ni added the CLA signed label Jul 4, 2018

bedevere-bot added the awaiting merge label Jul 4, 2018

methane added the performance label Jul 4, 2018

vstinner approved these changes Jul 4, 2018

serhiy-storchaka self-requested a review July 4, 2018 13:05

split two assignments

ab33cea

vstinner approved these changes Jul 5, 2018

fix typo

cd48a2c

methane merged commit 8d13091 into python:master Jul 6, 2018

bedevere-bot removed the awaiting merge label Jul 6, 2018

methane deleted the tarfile branch July 6, 2018 05:06

serhiy-storchaka reviewed Jul 6, 2018
