Skip to content

Reduce single file format overhead by lazily importing modules and loading .gitignore / --exclude#3211

Merged
ichard26 merged 3 commits into
mainfrom
reduce-single-file-format-overhead
Aug 27, 2022
Merged

Reduce single file format overhead by lazily importing modules and loading .gitignore / --exclude#3211
ichard26 merged 3 commits into
mainfrom
reduce-single-file-format-overhead

Conversation

@ichard26
Copy link
Copy Markdown
Collaborator

@ichard26 ichard26 commented Aug 5, 2022

Description

`black.reformat_many` depends on a lot of slow-to-import modules. When
formatting simply a single file, the time paid to import those modules
is totally wasted. So I moved `black.reformat_many` and its helpers
to `black.concurrency` which is now *only* imported if there's more
than one file to reformat. This way, running Black over a single file
is snappier

Here are the numbers before and after this patch running `python -m
black --version`:

- interpreted: 411 ms +- 9 ms -> 342 ms +- 7 ms: 1.20x faster
- compiled: 365 ms +- 15 ms -> 304 ms +- 7 ms: 1.20x faster
Loading .gitignore and compiling the exclude regex can take more than
15ms. We shouldn't and don't need to pay this cost if we're simply
formatting files given on the command line directly.

I would've loved to lazily import pathspec, but the patch won't be clean
until the file collection and discovery logic is refactored first.

Co-authored-by: Fabio Zadrozny <fabiofz@gmail.com>

Fixes #2564 and closes #2656.

Checklist - did you ...

  • Add a CHANGELOG entry if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation? -> n/a

@ichard26 ichard26 added the C: performance Black is too slow. Or too fast. label Aug 5, 2022
@ichard26
Copy link
Copy Markdown
Collaborator Author

ichard26 commented Aug 5, 2022

For what it is worth, I'd prefer a rebase merge for this PR given the commits are mostly independent (other than the changelog entry).

@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26
Copy link
Copy Markdown
Collaborator Author

ichard26 commented Aug 5, 2022

Coolio, diff-shades is failing and I don't know why at all ...

@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
Copy link
Copy Markdown

@Doyhenc99 Doyhenc99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadang bib

Comment thread src/black/__init__.py Outdated
Comment thread src/black/__init__.py Outdated
ichard26 and others added 3 commits August 13, 2022 13:49
`black.reformat_many` depends on a lot of slow-to-import modules. When
formatting simply a single file, the time paid to import those modules
is totally wasted. So I moved `black.reformat_many` and its helpers
to `black.concurrency` which is now *only* imported if there's more
than one file to reformat. This way, running Black over a single file
is snappier

Here are the numbers before and after this patch running `python -m
black --version`:

- interpreted: 411 ms +- 9 ms -> 342 ms +- 7 ms: 1.20x faster
- compiled: 365 ms +- 15 ms -> 304 ms +- 7 ms: 1.20x faster

Co-authored-by: Fabio Zadrozny <fabiofz@gmail.com>
Loading .gitignore and compiling the exclude regex can take more than
15ms. We shouldn't and don't need to pay this cost if we're simply
formatting files given on the command line directly.

I would've loved to lazily import pathspec, but the patch won't be clean
until the file collection and discovery logic is refactored first.

Co-authored-by: Fabio Zadrozny <fabiofz@gmail.com>
os.cpu_count() can return None (sounds like a super arcane edge case
though) so the type annotation for the `workers` parameter of
`black.main` is wrong. This *could* technically cause a runtime
TypeError since it'd trip one of mypyc's runtime type checks so we
might as well fix it.

Reading the documentation (and cross-checking with the source code),
you are actually allowed to pass None as `max_workers` to
`concurrent.futures.ProcessPoolExecutor`. If it is None, the pool
initializer will simply call os.cpu_count() [^1] (defaulting to 1 if it
returns None [^2]). It'll even round down the worker count to a level
that's safe for Windows.

... so theoretically we don't even need to call os.cpu_count()
ourselves, but our Windows limit is 60 (unlike the stdlib's 61) and I'd
prefer not accidentally reintroducing a crash on machines with many,
many CPU cores.

[^1]: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
[^2]: https://github.com/python/cpython/blob/a372a7d65320396d44e8beb976e3a6c382963d4e/Lib/concurrent/futures/process.py#L600
@ichard26 ichard26 force-pushed the reduce-single-file-format-overhead branch from aafd5b9 to 2c0d427 Compare August 13, 2022 17:50
@ichard26 ichard26 requested a review from JelleZijlstra August 13, 2022 17:50
@ichard26 ichard26 marked this pull request as ready for review August 13, 2022 17:50
@ichard26
Copy link
Copy Markdown
Collaborator Author

PTAL. Please note I'd prefer a rebase merge here if you approve and chose to merge right after.

Comment thread src/black/concurrency.py
workers: Optional[int],
) -> None:
"""Reformat multiple files using a ProcessPoolExecutor."""
maybe_install_uvloop()
Copy link
Copy Markdown
Collaborator Author

@ichard26 ichard26 Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to keep the helper function instead of always attempting an import of uvloop in the module scope just as insurance if we ever accidentally break the invariant black.concurrency should only be imported via black.main if there's more than one file to format.

@github-actions
Copy link
Copy Markdown
Contributor

diff-shades reports zero changes comparing this PR (2c0d427) to main (4ebf14d).


What is this? | Workflow run | diff-shades documentation

@ichard26
Copy link
Copy Markdown
Collaborator Author

This works on Windows with PyInstaller (just tested locally) so I'm merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C: performance Black is too slow. Or too fast.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Black slow startup to format a single file

3 participants