Use memchr to speedup newline search on x86
#3985
Merged
MichaReiser
merged 3 commits into
main
from
Use_memchr_to_speedup_newline_search_on_x86
Apr 26, 2023
Conversation
Contributor
PR Check Results
Ecosystem: ✅ ecosystem check detected no changes.
Benchmark: Linux, Windows
0e7a8fa to
c477216
Compare
9791361 to
ebfeb11
Compare
Merged
dc30757 to
815f484
Compare
63fed61 to
322b800
Compare
8eca22d to
c29914a
Compare
322b800 to
6d1311c
Compare
cfe0fe1 to
78b9a89
Compare
6d1311c to
4b3c569
Compare
7f7036b to
c1b739d
Compare
charliermarsh
approved these changes
Apr 20, 2023
Member
charliermarsh
left a comment
(Needs to be rebased but I reviewed the new commits individually.)
4b3c569 to
c1b40a3
Compare
90fc963 to
5995306
Compare
c1b40a3 to
d22ae89
Compare
This was referenced Apr 26, 2023
5995306 to
7893968
Compare
d22ae89 to
1df35db
Compare
renovate bot
referenced
this pull request
in ixm-one/pytest-cmake-presets
May 2, 2023
This PR contains the following updates:

| Package | Change |
|---|---|
| [ruff](https://togithub.com/charliermarsh/ruff) | `^0.0.263` -> `^0.0.264` |

---

### Release Notes

<details>
<summary>charliermarsh/ruff</summary>

### [`v0.0.264`](https://togithub.com/charliermarsh/ruff/releases/tag/v0.0.264)

[Compare Source](https://togithub.com/charliermarsh/ruff/compare/v0.0.263...v0.0.264)

#### What's Changed

##### Rules

- Autofix `EM101`, `EM102`, `EM103` if possible by [@dhruvmanila](https://togithub.com/dhruvmanila) in [https://github.com/charliermarsh/ruff/pull/4123](https://togithub.com/charliermarsh/ruff/pull/4123)
- Add bugbear immutable functions as allowed in dataclasses by [@mosauter](https://togithub.com/mosauter) in [https://github.com/charliermarsh/ruff/pull/4122](https://togithub.com/charliermarsh/ruff/pull/4122)

##### Settings

- Add support for providing command-line arguments via `argfile` by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4087](https://togithub.com/charliermarsh/ruff/pull/4087)

##### Bug Fixes

- Make D410/D411 autofixes mutually exclusive by [@evanrittenhouse](https://togithub.com/evanrittenhouse) in [https://github.com/charliermarsh/ruff/pull/4110](https://togithub.com/charliermarsh/ruff/pull/4110)
- Remove `pyright` comment prefix from PYI033 checks by [@evanrittenhouse](https://togithub.com/evanrittenhouse) in [https://github.com/charliermarsh/ruff/pull/4152](https://togithub.com/charliermarsh/ruff/pull/4152)
- Fix F811 false positive with match by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in [https://github.com/charliermarsh/ruff/pull/4161](https://togithub.com/charliermarsh/ruff/pull/4161)
- Fix `E713` and `E714` false positives for multiple comparisons by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in [https://github.com/charliermarsh/ruff/pull/4083](https://togithub.com/charliermarsh/ruff/pull/4083)
- Fix B023 shadowed variables in nested functions by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/4111](https://togithub.com/charliermarsh/ruff/pull/4111)
- Preserve star-handling special-casing for force-single-line by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4129](https://togithub.com/charliermarsh/ruff/pull/4129)
- Respect parent-scoping rules for `NamedExpr` assignments by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4145](https://togithub.com/charliermarsh/ruff/pull/4145)
- Fix UP032 auto-fix by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in [https://github.com/charliermarsh/ruff/pull/4165](https://togithub.com/charliermarsh/ruff/pull/4165)
- Allow boolean parameters for `pytest.param` by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4176](https://togithub.com/charliermarsh/ruff/pull/4176)

##### Internal

- Replace row/column based `Location` with byte-offsets. by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/3931](https://togithub.com/charliermarsh/ruff/pull/3931)
- perf(logical-lines): Various small perf improvements by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/4022](https://togithub.com/charliermarsh/ruff/pull/4022)
- Use `memchr` to speedup newline search on x86 by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/3985](https://togithub.com/charliermarsh/ruff/pull/3985)
- Remove `ScopeStack` in favor of child-parent `ScopeId` pointers by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4138](https://togithub.com/charliermarsh/ruff/pull/4138)

**Full Changelog**: astral-sh/ruff@v0.0.263...v0.0.264

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://app.renovatebot.com/dashboard#github/ixm-one/pytest-cmake-presets).

Signed-off-by: Renovate Bot <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
renovate bot
referenced
this pull request
in allenporter/flux-local
May 3, 2023
renovate bot
referenced
this pull request
in allenporter/pyrainbird
May 3, 2023
BurntSushi
added a commit
that referenced
this pull request
Nov 10, 2023
Eliding bounds checks very rarely results in a meaningful performance improvement. This is usually because branch predictors are very good, and in cases like this, the branch predictor likely predicts perfectly given that the code is correct.

It looks like this use of unsafe was added in #3985 as part of an optimization to use `memchr`. But `memchr` is where the real win is.

Benchmarks for before and after:

    $ critcmp main test
    group                                       main                                        test
    -----                                       ----                                        ----
    linter/all-rules/large/dataset.py           1.01      5.2±0.03ms     7.8 MB/sec        1.00      5.2±0.01ms     7.9 MB/sec
    linter/all-rules/numpy/ctypeslib.py         1.00   1375.3±6.38µs    12.1 MB/sec        1.00  1379.3±10.77µs    12.1 MB/sec
    linter/all-rules/numpy/globals.py           1.01    157.1±0.63µs    18.8 MB/sec        1.00    155.5±0.73µs    19.0 MB/sec
    linter/all-rules/pydantic/types.py          1.00      2.6±0.00ms     9.9 MB/sec        1.00      2.6±0.00ms     9.9 MB/sec
    linter/all-rules/unicode/pypinyin.py        1.00    647.7±1.27µs     6.5 MB/sec        1.00    647.5±3.46µs     6.5 MB/sec
    linter/default-rules/large/dataset.py       1.00      2.3±0.00ms    17.5 MB/sec        1.00      2.3±0.00ms    17.6 MB/sec
    linter/default-rules/numpy/ctypeslib.py     1.00    456.1±1.13µs    36.5 MB/sec        1.00    458.3±1.53µs    36.3 MB/sec
    linter/default-rules/numpy/globals.py       1.00     39.7±0.21µs    74.2 MB/sec        1.00     39.7±0.23µs    74.3 MB/sec
    linter/default-rules/pydantic/types.py      1.01   1005.8±5.49µs    25.4 MB/sec        1.00    995.5±2.99µs    25.6 MB/sec
    linter/default-rules/unicode/pypinyin.py    1.01    139.9±0.91µs    30.0 MB/sec        1.00    138.0±0.22µs    30.5 MB/sec

Typically, eliding bounds checks makes the most sense when it unlocks some other kind of optimization (e.g., autovectorization).
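For readers unfamiliar with the trade-off described in this commit message, here is a rough sketch of the two styles of access. It is not the code touched by the commit: both function names are hypothetical, and the standard-library `position` scan merely stands in for the `memchr` search so the example has no dependencies.

```rust
/// Hypothetical helper: returns the first `\n` or `\r` byte using ordinary,
/// bounds-checked indexing. The index check is a well-predicted branch.
fn first_newline_byte_checked(bytes: &[u8]) -> Option<u8> {
    // Stand-in for the real search; returns an index into `bytes`.
    let position = bytes.iter().position(|&b| b == b'\n' || b == b'\r')?;
    Some(bytes[position])
}

/// Hypothetical helper: same result, but skips the bounds check with `unsafe`.
fn first_newline_byte_unchecked(bytes: &[u8]) -> Option<u8> {
    let position = bytes.iter().position(|&b| b == b'\n' || b == b'\r')?;
    // SAFETY: `position` was produced by a search over `bytes`, so it is
    // always strictly less than `bytes.len()`.
    Some(unsafe { *bytes.get_unchecked(position) })
}
```

Both functions return the same value for every input; the second only removes a branch that is almost always predicted correctly, which is consistent with the benchmark numbers above showing no measurable difference.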
BurntSushi
added a commit
that referenced
this pull request
Nov 27, 2023
BurntSushi
added a commit
that referenced
this pull request
Nov 28, 2023

Use `memchr` to find the newline characters in strings. I expect this to improve performance on x86 processors because `memchr` uses SIMD internally.
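As an illustration of the idea (not the code from this PR), a newline search built around a single "find `\n` or `\r`" byte scan might look like the sketch below. All names here (`LineEnding`, `find_cr_or_lf`, `find_newline`, `split_lines`) are hypothetical. To keep the example dependency-free, `find_cr_or_lf` uses a plain standard-library scan; with the `memchr` crate, that one function body would be replaced by `memchr::memchr2(b'\n', b'\r', haystack)`, which is where the SIMD speedup would come from.

```rust
/// Kind of line terminator found by the search (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LineEnding {
    Lf,   // "\n"
    Cr,   // "\r"
    CrLf, // "\r\n"
}

impl LineEnding {
    /// Number of bytes the terminator occupies.
    fn byte_len(self) -> usize {
        match self {
            LineEnding::CrLf => 2,
            LineEnding::Lf | LineEnding::Cr => 1,
        }
    }
}

/// Dependency-free stand-in for `memchr::memchr2(b'\n', b'\r', haystack)`.
fn find_cr_or_lf(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'\n' || b == b'\r')
}

/// Returns the byte offset and kind of the first line terminator in `text`.
fn find_newline(text: &str) -> Option<(usize, LineEnding)> {
    let bytes = text.as_bytes();
    let position = find_cr_or_lf(bytes)?;
    let ending = match bytes[position] {
        b'\n' => LineEnding::Lf,
        // A `\r` immediately followed by `\n` is one two-byte terminator.
        b'\r' if bytes.get(position + 1) == Some(&b'\n') => LineEnding::CrLf,
        _ => LineEnding::Cr,
    };
    Some((position, ending))
}

/// Splits `text` into lines without their terminators.
fn split_lines(text: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut rest = text;
    while let Some((position, ending)) = find_newline(rest) {
        // `\n` and `\r` are ASCII, so `position` is always a char boundary.
        lines.push(&rest[..position]);
        rest = &rest[position + ending.byte_len()..];
    }
    if !rest.is_empty() {
        lines.push(rest);
    }
    lines
}
```

Searching on bytes rather than `char`s is safe here because neither `\n` nor `\r` can appear inside a multi-byte UTF-8 sequence.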