Skip to content

fix(apply): review-driven fixes — malformed CSV, init bugs, doc typos, perf#3741

Merged
jqnatividad merged 4 commits into
masterfrom
claude/review-apply-rs-1qGGs
Apr 25, 2026
Merged

fix(apply): review-driven fixes — malformed CSV, init bugs, doc typos, perf#3741
jqnatividad merged 4 commits into
masterfrom
claude/review-apply-rs-1qGGs

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator
  • Reject --new-column (-c) with multi-column selection for operations and
    emptyreplace: the loop pushed N data fields per row but only one new header,
    producing malformed CSV. Add validation + regression test.
  • Switch Round/Thousands/Crc32/Gender_Guess from OnceLock::set to get_or_init
    so reusing them in a chain no longer fails with a misleading "Cannot
    initialize" error.
  • Drop the CRC32 OnceLock entirely (the lock just stored a fresh hasher that
    was cloned per row — equivalent to constructing one inline).
  • Fix --new_column → --new-column in user-facing validation errors (the actual
    flag is hyphenated; copy-paste of the old text fails).
  • Fix two docstring typos (mismatched quotes in calcconv example; "CALCONV" →
    "CALCCONV").
  • DynFmt/CalcConv: stop silently outputting "" when the first selected column
    is empty (the formatstr may not even reference that column). Always run the
    template; for CalcConv, only skip eval if the formatted expression is blank.
  • DynFmt/CalcConv: build the format-args buffer as Vec<&str> instead of cloning
    every field into Vec per row.
  • Hoist the duplicated fract_3digits regex into a shared helper.
  • Use eq_ignore_ascii_case for the check (avoids per-row allocation)
    and clarify the comment that the substitution is chain-wide.
  • Drop unused use base62; and use crc32fast; extern-crate-style imports.

https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3

claude added 2 commits April 25, 2026 15:32
…, perf

- Reject --new-column (-c) with multi-column selection for operations and
  emptyreplace: the loop pushed N data fields per row but only one new header,
  producing malformed CSV. Add validation + regression test.
- Switch Round/Thousands/Crc32/Gender_Guess from OnceLock::set to get_or_init
  so reusing them in a chain no longer fails with a misleading "Cannot
  initialize" error.
- Drop the CRC32 OnceLock entirely (the lock just stored a fresh hasher that
  was cloned per row — equivalent to constructing one inline).
- Fix --new_column → --new-column in user-facing validation errors (the actual
  flag is hyphenated; copy-paste of the old text fails).
- Fix two docstring typos (mismatched quotes in calcconv example; "CALCONV" →
  "CALCCONV").
- DynFmt/CalcConv: stop silently outputting "" when the first selected column
  is empty (the formatstr may not even reference that column). Always run the
  template; for CalcConv, only skip eval if the formatted expression is blank.
- DynFmt/CalcConv: build the format-args buffer as Vec<&str> instead of cloning
  every field into Vec<String> per row.
- Hoist the duplicated fract_3digits regex into a shared helper.
- Use eq_ignore_ascii_case for the <NULL> check (avoids per-row allocation)
  and clarify the comment that the substitution is chain-wide.
- Drop unused `use base62;` and `use crc32fast;` extern-crate-style imports.

https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
Same family of bugs the apply.rs review uncovered also lived in applydp.rs.

- Reject --new-column (-c) with multi-column selection for operations and
  emptyreplace; the loop pushed N data fields per row but only one new header,
  producing malformed CSV. Add validation + regression test.
- Switch Round from OnceLock::set to get_or_init so reusing it in a chain no
  longer fails with a misleading "Cannot initialize" error.
- Fix --new_column → --new-column in the copy validation error.
- DynFmt: stop silently outputting "" when the first selected column happens
  to be empty (the formatstr may not even reference that column). Always run
  the template; fall back to the source cell only if formatting itself errors.
- DynFmt: build the format-args buffer as Vec<&str> instead of cloning every
  field into Vec<String> per row.
- "dynfmt/calcconv subcommand requires headers" → "dynfmt subcommand requires
  headers" (applydp has no calcconv).
- <NULL>: use eq_ignore_ascii_case (no allocation) and clarify the comment
  that the substitution is chain-wide.
- Doc bugs: example regex selector `\_name$\` → `'/_name$/'`; missing
  `operations` subcommand in the replace example; doubled "to to"; comparand
  description mentioned similarity ops which applydp doesn't have.
- Replace `match { Ok(...) => ..., Err(e) => return Err(e) }` with `?`.
- Use `fail_incorrectusage_clierror!` for the "unknown subcommand" arm to
  match apply.rs.
- Collapse Mtrim/Mltrim/Mrtrim arm into a guard pattern (clippy warning).

https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
@jqnatividad jqnatividad requested a review from Copilot April 25, 2026 15:40
@codacy-production

codacy-production Bot commented Apr 25, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 84 complexity

Metric Results
Complexity 84

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens apply/applydp against producing malformed CSV output when --new-column is combined with multi-column transforms, while also addressing initialization pitfalls with OnceLock, improving dynfmt/calcconv behavior on empty inputs, and reducing per-row allocations.

Changes:

  • Add validation to reject --new-column (-c) with multi-column selection for operations and emptyreplace, plus regression tests.
  • Update dynfmt/calcconv to always run the template (even if the first selected cell is empty) and reduce allocations by formatting from Vec<&str>.
  • Cleanup/fixes: doc/help typos, switch some OnceLock::set usage to get_or_init, remove CRC32 OnceLock, and hoist the 3-decimal regex into a helper.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
tests/test_applydp.rs Adds regression test ensuring multi-column --new-column is rejected for applydp operations.
tests/test_apply.rs Fixes expected error wording and adds regression test for multi-column --new-column rejection in apply.
src/cmd/applydp.rs Implements the new validation, improves dynfmt formatting path, adjusts help/docs, and makes Round init reusable.
src/cmd/apply.rs Adds the new validation, improves dynfmt/calcconv formatting/allocation behavior, refactors OnceLock usage, removes CRC32 lock, and deduplicates regex logic.

Comment thread src/cmd/applydp.rs Outdated
-c, --new-column <name> Put the transformed values in a new column instead.
-r, --rename <name> New name for the transformed column.
-C, --comparand=<string> The string to compare against for replace & similarity operations.
-C, --comparand=<string> The string to compare against for replace & strip operations.

Copilot AI Apr 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -C/--comparand help text now says it’s used for “replace & strip operations”, but --comparand is also required/used by match-trim ops (mtrim/mltrim/mrtrim) and by regex_replace. Consider updating this option description so users can discover all operations that depend on --comparand.

Suggested change
-C, --comparand=<string> The string to compare against for replace & strip operations.
-C, --comparand=<string> The string to compare against for replace, strip,
match-trim (mtrim/mltrim/mrtrim) & regex_replace operations.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 48d31c4. Updated the help to list all four operation families that consume --comparand: replace, strip, match-trim (mtrim/mltrim/mrtrim) & regex_replace.


Generated by Claude Code

Comment thread src/cmd/applydp.rs Outdated
Comment on lines +271 to +278
// operations or emptyreplace. Use --rename (-r) for multi-column transforms.
if args.flag_new_column.is_some()
&& (args.cmd_operations || args.cmd_emptyreplace)
&& sel.len() > 1
{
return fail_incorrectusage_clierror!(
"--new-column (-c) requires a single input column. Use --rename (-r) for \
multi-column transformations."

Copilot AI Apr 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation error suggests using --rename (-r) for multi-column usage, but for operations/emptyreplace the most direct fix is often to omit --new-column (-c) (in-place transform), and --rename is optional. Consider adjusting the wording to avoid implying --rename is required or applicable in all cases (e.g., copy requires -c).

Suggested change
// operations or emptyreplace. Use --rename (-r) for multi-column transforms.
if args.flag_new_column.is_some()
&& (args.cmd_operations || args.cmd_emptyreplace)
&& sel.len() > 1
{
return fail_incorrectusage_clierror!(
"--new-column (-c) requires a single input column. Use --rename (-r) for \
multi-column transformations."
// operations or emptyreplace. For multi-column transforms, omit --new-column
// to update in place; use --rename (-r) only if you also want to rename columns.
if args.flag_new_column.is_some()
&& (args.cmd_operations || args.cmd_emptyreplace)
&& sel.len() > 1
{
return fail_incorrectusage_clierror!(
"--new-column (-c) requires a single input column. For multi-column \
operations or emptyreplace, omit --new-column to transform in place; \
optionally use --rename (-r) to rename the affected columns."

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — reworded in 48d31c4 to lead with the in-place fix and treat --rename as optional: "For multi-column operations/emptyreplace, omit --new-column to transform columns in place; optionally use --rename (-r) to rename the transformed columns." Updated the regression test to match.


Generated by Claude Code

Comment thread src/cmd/apply.rs Outdated
Comment on lines +473 to +480
// operations or emptyreplace. Use --rename (-r) for multi-column transforms.
if args.flag_new_column.is_some()
&& (args.cmd_operations || args.cmd_emptyreplace)
&& sel.len() > 1
{
return fail_incorrectusage_clierror!(
"--new-column (-c) requires a single input column. Use --rename (-r) for \
multi-column transformations."

Copilot AI Apr 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation error suggests using --rename (-r) for multi-column usage, but for operations/emptyreplace the most direct fix is often to omit --new-column (-c) (in-place transform), and --rename is optional. Consider adjusting the wording to avoid implying --rename is required or applicable in all cases (e.g., copy requires -c).

Suggested change
// operations or emptyreplace. Use --rename (-r) for multi-column transforms.
if args.flag_new_column.is_some()
&& (args.cmd_operations || args.cmd_emptyreplace)
&& sel.len() > 1
{
return fail_incorrectusage_clierror!(
"--new-column (-c) requires a single input column. Use --rename (-r) for \
multi-column transformations."
// operations or emptyreplace. For multi-column transforms, omit --new-column
// to update the selected columns in place; use --rename (-r) only if you also
// want to rename those transformed columns.
if args.flag_new_column.is_some()
&& (args.cmd_operations || args.cmd_emptyreplace)
&& sel.len() > 1
{
return fail_incorrectusage_clierror!(
"--new-column (-c) requires a single input column. For multi-column \
operations/emptyreplace, omit --new-column to transform columns in place; \
optionally use --rename (-r) to rename the transformed columns."

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same fix applied to apply.rs in 48d31c4 — reworded message and updated the matching regression test.


Generated by Claude Code

Comment thread src/cmd/apply.rs Outdated
Comment on lines 683 to 688
let (cell_for_eval, append_unit) =
if let Some(stripped) = formatted.strip_suffix("<UNIT>") {
(stripped, true)
} else {
&cell
(formatted.as_str(), false)
};

Copilot AI Apr 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip_suffix("<UNIT>") removes only a single trailing <UNIT>, whereas the previous trim_end_matches("<UNIT>") behavior removed all consecutive trailing occurrences. If multiple <UNIT> suffixes can appear (even accidentally), this changes evaluation semantics and may cause eval() failures. Consider using trim_end_matches("<UNIT>") (or an equivalent loop) to preserve the prior behavior.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed regression — reverted to trim_end_matches("<UNIT>") in 48d31c4 so consecutive trailing <UNIT> markers are all stripped, matching the original semantics. Added a comment noting why strip_suffix would be wrong here.


Generated by Claude Code

- Restore trim_end_matches semantics for the calcconv <UNIT> suffix: the
  refactor used strip_suffix which only removes one occurrence, but the
  original stripped all consecutive trailing <UNIT> markers. Repeated
  occurrences would otherwise leak into eval() and cause errors.
- Reword the multi-column + --new-column error message in both apply and
  applydp to point users to the more direct fix (omit --new-column for
  in-place transform) rather than implying --rename is required.
- applydp: --comparand help text was incomplete after the previous edit;
  list all operations that consume it (replace, strip, mtrim/mltrim/mrtrim,
  regex_replace) so users can discover them from --help.
- Update the two regression tests that pin the new error wording.

https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Reflow long error messages to match nightly rustfmt's wrap column. No
behavior change.

https://claude.ai/code/session_018dUycY9NG1Z8hr7ruR3wN3
@jqnatividad jqnatividad merged commit f77ee59 into master Apr 25, 2026
17 checks passed
@jqnatividad jqnatividad deleted the claude/review-apply-rs-1qGGs branch April 25, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants