cut: fix -s flag for newline delimiter and optimize memory allocation #11143

Merged
cakebaker merged 1 commit into uutils:main from akervald:fix-cut-newline-s-flag on Mar 4, 2026

@akervald

akervald commented Feb 27, 2026

Contributor

Fixes & Improvements

  • -s logic fix: Add the missing only_delimited check to properly suppress non-delimited lines.
  • Field-Level Streaming: Replace whole-file split().collect() with a memchr-powered loop. This shifts memory complexity from O(Total File Size) to O(Max Field Size), which is as "OOM-safe" as the specification allows.
  • Zero-Allocation Skipping: Bypass unselected fields using BufReader::consume() to avoid heap copies.
  • Sequential Pointer Tracking: Replace nested loops and segments.get() lookups with a single-pass range_idx pointer that synchronizes "Skip" and "Keep" paths in one linear sweep.
  • Early Exit: Terminate I/O immediately once the highest requested field is processed.
  • Edge Case Support: Correctly handle single lines lacking a trailing newline.
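The points above can be illustrated with a minimal sketch. It is not the code from this PR: the function name `cut_newline_fields`, its signature, and the `(low, high)` range representation are hypothetical, and `Iterator::position` stands in for `memchr` so that the example depends only on the standard library. It assumes sorted, non-overlapping, 1-based inclusive ranges, and it assumes that an input containing no newline at all counts as "non-delimited" for the purposes of `-s`.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: streams newline-delimited fields from `reader`
/// and writes the selected ones to `out`, each followed by a newline.
/// `ranges` must be sorted, non-overlapping, 1-based and inclusive.
fn cut_newline_fields<R: BufRead, W: Write>(
    mut reader: R,
    mut out: W,
    ranges: &[(usize, usize)],
    only_delimited: bool,
) -> io::Result<()> {
    let mut range_idx = 0; // forward-only pointer into `ranges`
    let mut field_no = 1usize; // 1-based number of the field being read
    let mut saw_delim = false; // has any delimiter been seen so far?
    let mut field = Vec::new(); // holds at most one field at a time

    loop {
        // Move past ranges that end before the current field.
        while range_idx < ranges.len() && ranges[range_idx].1 < field_no {
            range_idx += 1;
        }
        // Early exit: no remaining range can match a later field.
        if range_idx == ranges.len() {
            break;
        }
        let keep = ranges[range_idx].0 <= field_no;
        // Field 1 is buffered without -s because a non-delimited input
        // has to be echoed in full.
        let buffer_it = keep || (field_no == 1 && !only_delimited);

        field.clear();
        let mut found = false; // a delimiter ended this field
        let mut any = false; // at least one byte was available
        loop {
            let buf = reader.fill_buf()?;
            if buf.is_empty() {
                break; // end of input
            }
            any = true;
            // Stand-in for memchr::memchr(b'\n', buf).
            let (len, hit) = match buf.iter().position(|&b| b == b'\n') {
                Some(i) => (i, true),
                None => (buf.len(), false),
            };
            if buffer_it {
                field.extend_from_slice(&buf[..len]);
            }
            // Skipped fields are consumed without being copied.
            reader.consume(len + usize::from(hit));
            if hit {
                found = true;
                break;
            }
        }
        if !found && !any {
            break; // input ended right after a delimiter, or was empty
        }
        saw_delim |= found;
        if !saw_delim {
            // Non-delimited input: suppressed by -s, echoed otherwise.
            if !only_delimited {
                out.write_all(&field)?;
                out.write_all(b"\n")?;
            }
            break;
        }
        if keep {
            out.write_all(&field)?;
            out.write_all(b"\n")?;
        }
        if !found {
            break; // last field had no trailing newline
        }
        field_no += 1;
    }
    out.flush()
}
```

In this sketch, calling the function with the input `a\nb\nc\nd\n`, the ranges `[(2, 2), (4, 4)]`, and `only_delimited = true` writes `b\nd\n` and never holds more than one field in memory.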

Benchmarks

10,000,000 records (seq 1 10000000 > bench_input.txt), base M1 Pro chip.

Case 1: Filtered Selection with Early Exit (-s -d $'\n' -f 2,1024,4096)

Command     Mean [ms]     Min [ms]   Max [ms]   Relative
gcut        330.8 ± 5.6   326.5      341.5      169.60 ± 30.28
./cut_old   437.1 ± 9.1   430.7      456.6      224.09 ± 40.10
./cut_new   2.0 ± 0.3     1.3        3.3        1.00

Result: ~224x faster than cut_old, ~170x faster than GNU cut.

Case 2: Full File Read / Base Throughput (-s -d $'\n' -f 1-10000000)

Command     Mean [ms]      Min [ms]   Max [ms]   Relative
gcut        676.7 ± 5.1    673.7      689.3      3.96 ± 0.13
./cut_old   527.0 ± 16.8   517.9      573.0      3.08 ± 0.14
./cut_new   171.1 ± 5.5    168.9      192.1      1.00

Result: ~3x faster than cut_old, ~4x faster than GNU cut.

References

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/misc/io-errors is no longer failing!
Congrats! The gnu test tests/tail/tail-n0f is now passing!

Comment thread tests/by-util/test_cut.rs Outdated
Comment thread tests/by-util/test_cut.rs
@codspeed-hq

codspeed-hq Bot commented Feb 27, 2026

Merging this PR will improve performance by 45.19%

⚡ 1 improved benchmark
✅ 301 untouched benchmarks
🆕 2 new benchmarks
⏩ 42 skipped benchmarks [1]

Performance Changes

     Mode         Benchmark                  BASE      HEAD       Efficiency
🆕   Simulation   cut_fields_newline_delim   N/A       189.8 µs   N/A
🆕   Memory       cut_fields_newline_delim   N/A       67.8 KB    N/A
     Memory       cut_fields_custom_delim    67.8 KB   46.7 KB    +45.19%

Comparing akervald:fix-cut-newline-s-flag (fe4e36b) with main (f335d14)

Footnotes

  1. 42 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@akervald akervald closed this Feb 27, 2026
@akervald akervald reopened this Feb 27, 2026
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/unexpand/bounded-memory is now being skipped but was previously passing.
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/follow-name (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/dd/no-allocate is now being skipped but was previously passing.
Note: The gnu test tests/pr/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.
Congrats! The gnu test tests/expand/bounded-memory is now passing!
Note: The gnu test tests/env/env-signal-handler was skipped on 'main' but is now failing.

@akervald akervald marked this pull request as draft February 28, 2026 10:36
@akervald akervald marked this pull request as ready for review February 28, 2026 11:58
@akervald akervald requested a review from cakebaker February 28, 2026 12:03
@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/follow-name (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cut/cut-huge-range is now being skipped but was previously passing.
Congrats! The gnu test tests/expand/bounded-memory is now passing!
Note: The gnu test tests/env/env-signal-handler was skipped on 'main' but is now failing.

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/date/resolution (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/io-errors is no longer failing!
Congrats! The gnu test tests/timeout/timeout-group is no longer failing!
Note: The gnu test tests/seq/seq-epipe is now being skipped but was previously passing.
Congrats! The gnu test tests/tail/tail-n0f is now passing!

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cut/cut. tests/cut/cut is passing on 'main'. Maybe you have to rebase?
Note: The gnu test tests/tail/pipe-f is now being skipped but was previously passing.
Congrats! The gnu test tests/cp/link-heap is now passing!
Congrats! The gnu test tests/seq/seq-epipe is now passing!

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/date/resolution (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/io-errors is no longer failing!
Congrats! The gnu test tests/tail/retry is no longer failing!

@akervald

Contributor Author

Hi @cakebaker, the tests passed, but the benchmark failed due to an infrastructure issue. Could you please re-run that job? Thanks!

@akervald

akervald commented Feb 28, 2026

Contributor Author

@sylvestre I noticed Attempt №3 was cancelled. Since I don't have permissions to trigger the CI/CD jobs myself, could you let me know if there’s a specific fix I need to make, or if you could re-run the checks when the environment is ready? Thanks!

Comment thread tests/by-util/test_cut.rs Outdated
Comment thread tests/by-util/test_cut.rs Outdated
Comment thread tests/by-util/test_cut.rs Outdated
@github-actions

github-actions Bot commented Mar 2, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/rm/isatty is no longer failing!
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.

@akervald akervald requested a review from cakebaker March 2, 2026 12:14
@akervald

akervald commented Mar 2, 2026

Contributor Author

@cakebaker Switching this to Draft. I've found a performance regression in the hot loop: iter().any() is too slow when handling complex or numerous ranges. I'm going to optimize the range-filtering logic so that we keep the performance lead over GNU before I ask for a final review.
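For readers following the thread, the cost difference can be sketched as follows. This is an illustration under assumptions, not the code that was merged: `selected_by_scan` and `RangeCursor` are hypothetical names, and the cursor only works when the ranges are sorted and non-overlapping and field numbers are queried in non-decreasing order. The scan inspects every range for every field, while the cursor advances a single index at most once per range over the whole run.

```rust
/// Baseline: scans all ranges for every field, so the total work is
/// proportional to fields × ranges.
fn selected_by_scan(ranges: &[(usize, usize)], field_no: usize) -> bool {
    ranges.iter().any(|&(lo, hi)| lo <= field_no && field_no <= hi)
}

/// Hypothetical cursor over sorted, non-overlapping, inclusive ranges.
struct RangeCursor<'a> {
    ranges: &'a [(usize, usize)],
    idx: usize, // index of the first range that has not ended yet
}

impl<'a> RangeCursor<'a> {
    fn new(ranges: &'a [(usize, usize)]) -> Self {
        Self { ranges, idx: 0 }
    }

    /// Must be called with non-decreasing field numbers. The index only
    /// moves forward, so the total work is proportional to
    /// fields + ranges.
    fn contains(&mut self, field_no: usize) -> bool {
        while self.idx < self.ranges.len() && self.ranges[self.idx].1 < field_no {
            self.idx += 1;
        }
        self.idx < self.ranges.len() && self.ranges[self.idx].0 <= field_no
    }

    /// True once every range lies behind the last queried field, which
    /// is the point where reading can stop early.
    fn exhausted(&self) -> bool {
        self.idx == self.ranges.len()
    }
}
```

Both approaches give the same answer for each field; the cursor additionally reports when no later field can match.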

@akervald akervald marked this pull request as draft March 2, 2026 20:13
@akervald akervald marked this pull request as ready for review March 2, 2026 21:15
@github-actions

github-actions Bot commented Mar 3, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)

@github-actions

github-actions Bot commented Mar 3, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/symlink (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)

Comment thread src/uu/cut/src/cut.rs Outdated
@github-actions

github-actions Bot commented Mar 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

@github-actions

github-actions Bot commented Mar 3, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)

@akervald

akervald commented Mar 3, 2026

Contributor Author

@cakebaker should be ready for a review/merge

Comment thread src/uu/cut/src/cut.rs Outdated
@github-actions

github-actions Bot commented Mar 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.

@github-actions

github-actions Bot commented Mar 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

Comment thread src/uu/cut/src/cut.rs Outdated
Comment thread src/uu/cut/src/cut.rs Outdated
@github-actions

github-actions Bot commented Mar 4, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/tail/retry is no longer failing!
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

- Fixed the -s flag incorrectly suppressing output when the delimiter is a newline.
- Improved performance in cut_fields_newline_char_delim.
- Updated tests to match GNU cut behavior for newline delimiters.
@github-actions

github-actions Bot commented Mar 4, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/seq/seq-epipe is now being skipped but was previously passing.

@akervald akervald requested a review from cakebaker March 4, 2026 08:30
@cakebaker cakebaker merged commit 9bbb58b into uutils:main Mar 4, 2026
163 checks passed
@cakebaker

Contributor

Thanks for your PR!

Development

Successfully merging this pull request may close these issues.

cut: incorrect delimiter handling when delimiter is newline

2 participants