fix(apply): review-driven fixes — malformed CSV, init bugs, doc typos, perf#3741
Conversation
…, perf - Reject --new-column (-c) with multi-column selection for operations and emptyreplace: the loop pushed N data fields per row but only one new header, producing malformed CSV. Add validation + regression test. - Switch Round/Thousands/Crc32/Gender_Guess from OnceLock::set to get_or_init so reusing them in a chain no longer fails with a misleading "Cannot initialize" error. - Drop the CRC32 OnceLock entirely (the lock just stored a fresh hasher that was cloned per row — equivalent to constructing one inline). - Fix --new_column → --new-column in user-facing validation errors (the actual flag is hyphenated; copy-paste of the old text fails). - Fix two docstring typos (mismatched quotes in calcconv example; "CALCONV" → "CALCCONV"). - DynFmt/CalcConv: stop silently outputting "" when the first selected column is empty (the formatstr may not even reference that column). Always run the template; for CalcConv, only skip eval if the formatted expression is blank. - DynFmt/CalcConv: build the format-args buffer as Vec<&str> instead of cloning every field into Vec<String> per row. - Hoist the duplicated fract_3digits regex into a shared helper. - Use eq_ignore_ascii_case for the <NULL> check (avoids per-row allocation) and clarify the comment that the substitution is chain-wide. - Drop unused `use base62;` and `use crc32fast;` extern-crate-style imports. https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
Same family of bugs the apply.rs review uncovered also lived in applydp.rs.
- Reject --new-column (-c) with multi-column selection for operations and
emptyreplace; the loop pushed N data fields per row but only one new header,
producing malformed CSV. Add validation + regression test.
- Switch Round from OnceLock::set to get_or_init so reusing it in a chain no
longer fails with a misleading "Cannot initialize" error.
- Fix --new_column → --new-column in the copy validation error.
- DynFmt: stop silently outputting "" when the first selected column happens
to be empty (the formatstr may not even reference that column). Always run
the template; fall back to the source cell only if formatting itself errors.
- DynFmt: build the format-args buffer as Vec<&str> instead of cloning every
field into Vec<String> per row.
- "dynfmt/calcconv subcommand requires headers" → "dynfmt subcommand requires
headers" (applydp has no calcconv).
- <NULL>: use eq_ignore_ascii_case (no allocation) and clarify the comment
that the substitution is chain-wide.
- Doc bugs: example regex selector `\_name$\` → `'/_name$/'`; missing
`operations` subcommand in the replace example; doubled "to to"; comparand
description mentioned similarity ops which applydp doesn't have.
- Replace `match { Ok(...) => ..., Err(e) => return Err(e) }` with `?`.
- Use `fail_incorrectusage_clierror!` for the "unknown subcommand" arm to
match apply.rs.
- Collapse Mtrim/Mltrim/Mrtrim arm into a guard pattern (clippy warning).
https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 84 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Pull request overview
This PR hardens apply/applydp against producing malformed CSV output when --new-column is combined with multi-column transforms, while also addressing initialization pitfalls with OnceLock, improving dynfmt/calcconv behavior on empty inputs, and reducing per-row allocations.
Changes:
- Add validation to reject
--new-column (-c)with multi-column selection foroperationsandemptyreplace, plus regression tests. - Update dynfmt/calcconv to always run the template (even if the first selected cell is empty) and reduce allocations by formatting from
Vec<&str>. - Cleanup/fixes: doc/help typos, switch some
OnceLock::setusage toget_or_init, remove CRC32OnceLock, and hoist the 3-decimal regex into a helper.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
tests/test_applydp.rs |
Adds regression test ensuring multi-column --new-column is rejected for applydp operations. |
tests/test_apply.rs |
Fixes expected error wording and adds regression test for multi-column --new-column rejection in apply. |
src/cmd/applydp.rs |
Implements the new validation, improves dynfmt formatting path, adjusts help/docs, and makes Round init reusable. |
src/cmd/apply.rs |
Adds the new validation, improves dynfmt/calcconv formatting/allocation behavior, refactors OnceLock usage, removes CRC32 lock, and deduplicates regex logic. |
| -c, --new-column <name> Put the transformed values in a new column instead. | ||
| -r, --rename <name> New name for the transformed column. | ||
| -C, --comparand=<string> The string to compare against for replace & similarity operations. | ||
| -C, --comparand=<string> The string to compare against for replace & strip operations. |
There was a problem hiding this comment.
The -C/--comparand help text now says it’s used for “replace & strip operations”, but --comparand is also required/used by match-trim ops (mtrim/mltrim/mrtrim) and by regex_replace. Consider updating this option description so users can discover all operations that depend on --comparand.
| -C, --comparand=<string> The string to compare against for replace & strip operations. | |
| -C, --comparand=<string> The string to compare against for replace, strip, | |
| match-trim (mtrim/mltrim/mrtrim) & regex_replace operations. |
There was a problem hiding this comment.
Good catch — fixed in 48d31c4. Updated the help to list all four operation families that consume --comparand: replace, strip, match-trim (mtrim/mltrim/mrtrim) & regex_replace.
Generated by Claude Code
| // operations or emptyreplace. Use --rename (-r) for multi-column transforms. | ||
| if args.flag_new_column.is_some() | ||
| && (args.cmd_operations || args.cmd_emptyreplace) | ||
| && sel.len() > 1 | ||
| { | ||
| return fail_incorrectusage_clierror!( | ||
| "--new-column (-c) requires a single input column. Use --rename (-r) for \ | ||
| multi-column transformations." |
There was a problem hiding this comment.
This validation error suggests using --rename (-r) for multi-column usage, but for operations/emptyreplace the most direct fix is often to omit --new-column (-c) (in-place transform), and --rename is optional. Consider adjusting the wording to avoid implying --rename is required or applicable in all cases (e.g., copy requires -c).
| // operations or emptyreplace. Use --rename (-r) for multi-column transforms. | |
| if args.flag_new_column.is_some() | |
| && (args.cmd_operations || args.cmd_emptyreplace) | |
| && sel.len() > 1 | |
| { | |
| return fail_incorrectusage_clierror!( | |
| "--new-column (-c) requires a single input column. Use --rename (-r) for \ | |
| multi-column transformations." | |
| // operations or emptyreplace. For multi-column transforms, omit --new-column | |
| // to update in place; use --rename (-r) only if you also want to rename columns. | |
| if args.flag_new_column.is_some() | |
| && (args.cmd_operations || args.cmd_emptyreplace) | |
| && sel.len() > 1 | |
| { | |
| return fail_incorrectusage_clierror!( | |
| "--new-column (-c) requires a single input column. For multi-column \ | |
| operations or emptyreplace, omit --new-column to transform in place; \ | |
| optionally use --rename (-r) to rename the affected columns." |
There was a problem hiding this comment.
Agreed — reworded in 48d31c4 to lead with the in-place fix and treat --rename as optional: "For multi-column operations/emptyreplace, omit --new-column to transform columns in place; optionally use --rename (-r) to rename the transformed columns." Updated the regression test to match.
Generated by Claude Code
| // operations or emptyreplace. Use --rename (-r) for multi-column transforms. | ||
| if args.flag_new_column.is_some() | ||
| && (args.cmd_operations || args.cmd_emptyreplace) | ||
| && sel.len() > 1 | ||
| { | ||
| return fail_incorrectusage_clierror!( | ||
| "--new-column (-c) requires a single input column. Use --rename (-r) for \ | ||
| multi-column transformations." |
There was a problem hiding this comment.
This validation error suggests using --rename (-r) for multi-column usage, but for operations/emptyreplace the most direct fix is often to omit --new-column (-c) (in-place transform), and --rename is optional. Consider adjusting the wording to avoid implying --rename is required or applicable in all cases (e.g., copy requires -c).
| // operations or emptyreplace. Use --rename (-r) for multi-column transforms. | |
| if args.flag_new_column.is_some() | |
| && (args.cmd_operations || args.cmd_emptyreplace) | |
| && sel.len() > 1 | |
| { | |
| return fail_incorrectusage_clierror!( | |
| "--new-column (-c) requires a single input column. Use --rename (-r) for \ | |
| multi-column transformations." | |
| // operations or emptyreplace. For multi-column transforms, omit --new-column | |
| // to update the selected columns in place; use --rename (-r) only if you also | |
| // want to rename those transformed columns. | |
| if args.flag_new_column.is_some() | |
| && (args.cmd_operations || args.cmd_emptyreplace) | |
| && sel.len() > 1 | |
| { | |
| return fail_incorrectusage_clierror!( | |
| "--new-column (-c) requires a single input column. For multi-column \ | |
| operations/emptyreplace, omit --new-column to transform columns in place; \ | |
| optionally use --rename (-r) to rename the transformed columns." |
There was a problem hiding this comment.
Same fix applied to apply.rs in 48d31c4 — reworded message and updated the matching regression test.
Generated by Claude Code
| let (cell_for_eval, append_unit) = | ||
| if let Some(stripped) = formatted.strip_suffix("<UNIT>") { | ||
| (stripped, true) | ||
| } else { | ||
| &cell | ||
| (formatted.as_str(), false) | ||
| }; |
There was a problem hiding this comment.
strip_suffix("<UNIT>") removes only a single trailing <UNIT>, whereas the previous trim_end_matches("<UNIT>") behavior removed all consecutive trailing occurrences. If multiple <UNIT> suffixes can appear (even accidentally), this changes evaluation semantics and may cause eval() failures. Consider using trim_end_matches("<UNIT>") (or an equivalent loop) to preserve the prior behavior.
There was a problem hiding this comment.
Confirmed regression — reverted to trim_end_matches("<UNIT>") in 48d31c4 so consecutive trailing <UNIT> markers are all stripped, matching the original semantics. Added a comment noting why strip_suffix would be wrong here.
Generated by Claude Code
- Restore trim_end_matches semantics for the calcconv <UNIT> suffix: the refactor used strip_suffix which only removes one occurrence, but the original stripped all consecutive trailing <UNIT> markers. Repeated occurrences would otherwise leak into eval() and cause errors. - Reword the multi-column + --new-column error message in both apply and applydp to point users to the more direct fix (omit --new-column for in-place transform) rather than implying --rename is required. - applydp: --comparand help text was incomplete after the previous edit; list all operations that consume it (replace, strip, mtrim/mltrim/mrtrim, regex_replace) so users can discover them from --help. - Update the two regression tests that pin the new error wording. https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
Reflow long error messages to match nightly rustfmt's wrap column. No behavior change. https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
emptyreplace: the loop pushed N data fields per row but only one new header,
producing malformed CSV. Add validation + regression test.
so reusing them in a chain no longer fails with a misleading "Cannot
initialize" error.
was cloned per row — equivalent to constructing one inline).
flag is hyphenated; copy-paste of the old text fails).
"CALCCONV").
is empty (the formatstr may not even reference that column). Always run the
template; for CalcConv, only skip eval if the formatted expression is blank.
every field into Vec per row.
and clarify the comment that the substitution is chain-wide.
use base62;anduse crc32fast;extern-crate-style imports.https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3