Skip to content

[MergeDups] Separate identical vs similar duplicate finding with intermediate dialog#4039

Merged
imnasnainaec merged 15 commits intomasterfrom
copilot/add-identical-vernacular-indication
Jan 29, 2026
Merged

[MergeDups] Separate identical vs similar duplicate finding with intermediate dialog#4039
imnasnainaec merged 15 commits intomasterfrom
copilot/add-identical-vernacular-indication

Conversation

Copy link
Contributor

Copilot AI commented Nov 21, 2025

When all identical-vernacular duplicates are processed, users need feedback and a choice before the system searches for similar duplicates (which can take minutes on large projects).

With deferred duplicates, with over 1000 frontier entries, still loading:
Screenshot 2026-01-29 164100

With deferred duplicates, done loading:
Screenshot 2026-01-29 164154

Original prompt

This section details on the original issue you should resolve

<issue_title>[MergeDups] Give indication when all identical-vernacular duplicates have been dealt with</issue_title>
<issue_description>When all sets of words with identical vernacular have been dealt with (or deferred) is a good time to:

  • Congratulate the user for getting through them all
  • Warn that finding potential duplicates with similar vernacular forms takes a long time (e.g., 2.5 minutes for an 18,000 entry project)
  • If there are deferred duplicates, say so and offer button to load those instead

Todo:

  • Backend/Services/MergeService.cs:
    • Add to GetPotentialDuplicates a new parameter bool identicalVernacular, right after int maxLists,. If true, use dupFinder.GetIdenticalVernWords; if false, use dupFinder.GetSimilarWords(collection, isUnavailableSet, ignoreProtected).
    • Update GetAndStorePotentialDuplicates to call GetPotentialDuplicates with identicalVernacular = false
  • Backend/Controllers/MergeController.cs:
    • Add method for new endpoint [HttpGet("findidenticaldups/{maxInList:int}/{maxLists:int}/{ignoreProtected:bool}", Name = "FindIdenticalPotentialDuplicates")]. Unlike the existing FindPotentialDuplicates, finding those with identical vernacular is fast, so instead of using a get-then-signal function, just call on _mergeService.GetPotentialDuplicates directly with identicalVernacular = true
  • Update interfaces, mocks, and tests
  • Run npm run fmt-backend
  • Run npm run backend, initiate the Python virtual environment (venv) specified in the README, then run python scripts/generate_openapi.py
  • src/backend/index.ts:
    • Add new function findIdenticalDuplicates (before findDuplicates) that uses mergeApi.findIdenticalPotentialDuplicates
  • src/goals/Redux/GoalActions.ts:
    • asyncAddGoal no longer needs a special GoalType.MergeDups
    • loadGoalData in the GoalType.MergeDups case should use the new findIdenticalDuplicates function
  • src/goals/MergeDuplicates/*
    • If loadGoalData loaded an empty list, then instead of the "nothing to merge" screen, an action should be dispatched to do the stuff dropped from asyncAddGoal
    • While loading (i.e. waiting for the signal), there should be a dialog pop up over the loading screen:
      • Congratulate user for having processed all sets of words with identical vernacular
      • If there are any deferred dups, let user know how many there are, and include a button at the bottom of the dialog to review deferred duplicate
      • Explain that The Combine is now finding potential duplicates with similar (non-identical) vernacular form.
      • If the project has more than 1000 frontier entries, warn that this can take several minutes
      • Have button at bottom to cancel (returns to Data Entry)
      • Have button at bottom to continue loading
      • Automatically close the dialog if the signal comes that the duplicate finding is done</issue_description>

Comments on the Issue (you are @copilot in this section)


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.


This change is Reviewable

Summary by CodeRabbit

  • New Features

    • Dedicated flow to find identical-vernacular duplicates and a completion dialog that lets you review deferred items or continue.
  • API

    • New endpoint to retrieve frontier word counts and separate operations for identical vs similar duplicate searches.
  • Utilities

    • Introduced a public limit for merge lists (max words per list).
  • Tests

    • Added tests for identical-duplicates authorization and success paths; updated tests to cover frontier count behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI and others added 5 commits November 21, 2025 19:58
…egration

Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Copilot AI changed the title [WIP] Add indication for resolving identical-vernacular duplicates [MergeDups] Separate identical vs similar duplicate finding with intermediate dialog Nov 21, 2025
Copilot AI requested a review from imnasnainaec November 21, 2025 20:16
@github-actions
Copy link

github-actions bot commented Nov 24, 2025

⚠️ Commit Message Format Issues ⚠️
commit 165c2c3464:
1: T1 Title exceeds max length (77>72): "Backend: Add findIdenticalDuplicates endpoint and update frontend integration"

@codecov
Copy link

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 49.33333% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.73%. Comparing base (3fb1a86) to head (7698cc1).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
...oals/MergeDuplicates/IdenticalDuplicatesDialog.tsx 9.52% 19 Missing ⚠️
src/goals/DefaultGoal/BaseGoalScreen.tsx 0.00% 9 Missing ⚠️
src/goals/Redux/GoalActions.ts 57.89% 7 Missing and 1 partial ⚠️
src/goals/MergeDuplicates/MergeHub.tsx 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4039      +/-   ##
==========================================
- Coverage   74.92%   74.73%   -0.19%     
==========================================
  Files         295      296       +1     
  Lines       10911    10958      +47     
  Branches     1366     1369       +3     
==========================================
+ Hits         8175     8190      +15     
- Misses       2341     2373      +32     
  Partials      395      395              
Flag Coverage Δ
backend 86.54% <100.00%> (+0.08%) ⬆️
frontend 65.67% <26.92%> (-0.36%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@imnasnainaec

This comment was marked as resolved.

Copy link
Collaborator

@imnasnainaec imnasnainaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imnasnainaec reviewed 2 files and all commit messages.
Reviewable status: 2 of 14 files reviewed, all discussions resolved.

@coderabbitai
Copy link

coderabbitai bot commented Jan 28, 2026

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • 🔍 Trigger a full review
📝 Walkthrough

Walkthrough

This PR implements a feature for handling identical vernacular duplicates in the merge workflow. It adds a backend API endpoint for fast identical duplicate detection, a frontend dialog informing users of completion and next steps, and refactors the merge duplicate loading logic to split identical and similar duplicate searches into two distinct phases.

Changes

Cohort / File(s) Summary
Backend API Endpoint
Backend/Controllers/MergeController.cs, Backend/Interfaces/IMergeService.cs, Backend/Services/MergeService.cs
Added FindIdenticalPotentialDuplicates endpoint and exposed GetPotentialDuplicates as public method with new identicalVernacular boolean parameter to control whether identical or similar duplicates are returned. Updated GetAndStorePotentialDuplicates to call the refactored method.
Backend Tests
Backend.Tests/Controllers/MergeControllerTests.cs
Added two unit tests: permission check (ForbidResult) and successful response (OkObjectResult) for the new endpoint.
Generated API Client
src/api/api/merge-api.ts
Generated API client code for new findIdenticalPotentialDuplicates endpoint across param creator, FP, factory, and class-based layers; includes new request parameter interface.
Backend Export Functions
src/backend/index.ts
Replaced findDuplicates with findIdenticalDuplicates and added findSimilarDuplicates to separate identical and similar duplicate workflows.
Goal State Management
src/goals/Redux/GoalActions.ts, src/goals/Redux/tests/GoalRedux.test.ts
Refactored asyncAddGoal and loadGoalData to orchestrate data loading via asyncLoadNewGoalData, updated to use findIdenticalDuplicates for MergeDups and trigger findSimilarDuplicates when appropriate. Updated mock setup in tests.
Frontend UI Components
src/goals/DefaultGoal/BaseGoalScreen.tsx, src/goals/MergeDuplicates/IdenticalDuplicatesDialog.tsx, src/goals/MergeDuplicates/MergeHub.tsx
Added new IdenticalDuplicatesDialog component with congratulations message and options to review deferred duplicates or continue to similar search. Integrated dialog display logic in BaseGoalScreen. Minor async/await refactoring in MergeHub.
Localization
public/locales/en/translation.json
Added identicalCompleted translation block under mergeDups with keys for dialog title, congratulations, deferred status, warning about similar search duration, and action buttons.
Utility Constant
src/utilities/goalUtilities.tsx
Added exported maxWordsInMergeList = 5 constant.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend as Frontend<br/>(BaseGoalScreen)
    participant API as API Client<br/>(merge-api)
    participant Backend as Backend<br/>(MergeController)
    participant Service as MergeService
    
    User->>Frontend: Open MergeDups Goal
    Frontend->>API: findIdenticalDuplicates(projectId, maxInList, maxLists)
    API->>Backend: GET /v1/projects/{id}/merge/findidenticaldups/...
    Backend->>Service: GetPotentialDuplicates(..., identicalVernacular=true)
    Service-->>Backend: List<List<Word>>
    Backend-->>API: OkObjectResult(duplicates)
    API-->>Frontend: Response with identical duplicates
    Frontend->>Frontend: Show IdenticalDuplicatesDialog (Loading state)
    alt Has identical duplicates found
        rect rgba(100, 149, 237, 0.5)
        Note over Frontend,Service: Dialog shows completion congratulations
        User->>Frontend: Click "Review Deferred"
        Frontend->>Frontend: Dispatch ReviewDeferredDups goal
        Frontend->>Frontend: Close dialog
        end
    else User clicks "Continue"
        rect rgba(144, 238, 144, 0.5)
        Note over Frontend,Service: Initiate similar duplicates search
        Frontend->>API: findSimilarDuplicates(projectId, maxInList, maxLists)
        API->>Backend: GET /v1/projects/{id}/merge/findpotentialdups/...
        Backend->>Service: GetPotentialDuplicates(..., identicalVernacular=false)
        Service-->>Backend: List<List<Word>>
        Backend-->>API: OkObjectResult(duplicates)
        API-->>Frontend: Response with similar duplicates
        Frontend->>Frontend: Close dialog, load similar duplicates
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • sillsdev/TheCombine#4095 — Modifies MergeService class implementation and public API signatures with related method refactoring and test updates.

Suggested labels

project, 🟨Medium

Suggested reviewers

  • jasonleenaylor

Poem

🐰 Hopping through duplicates with glee,
Identical vernaculars now we see!
A dialog pops to celebrate the feat,
Then similar search completes the treat! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.31% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: separating identical vs similar duplicate finding with an intermediate dialog, which aligns with the primary objective of adding a two-phase flow with user-visible breakpoint.
Linked Issues check ✅ Passed All key requirements from issue #3721 are met: backend endpoint FindIdenticalPotentialDuplicates added with identicalVernacular parameter, frontend findIdenticalDuplicates function created, IdenticalDuplicatesDialog implemented with congratulations/deferred-count/warning/actions, and two-phase flow (identical then similar) fully integrated.
Out of Scope Changes check ✅ Passed All changes directly support the two-phase duplicate-finding flow and IdenticalDuplicatesDialog feature from issue #3721; no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@imnasnainaec

This comment was marked as resolved.

@coderabbitai

This comment was marked as outdated.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/goals/DefaultGoal/BaseGoalScreen.tsx`:
- Around line 58-75: The dialog stays open because useEffect uses
setOpenDupsDialog(prev => prev || dataLoadStatus === DataLoadStatus.Loading) so
once true it never resets; change the logic in the useEffect for
GoalType.MergeDups to setOpenDupsDialog(dataLoadStatus ===
DataLoadStatus.Loading && goalType === GoalType.MergeDups && status !==
GoalStatus.Completed) (or otherwise directly tie openDupsDialog to
dataLoadStatus when the goal is active) so the IdenticalDuplicatesDialog
automatically closes when dataLoadStatus is no longer Loading; update references
in the same effect that use useEffect, setOpenDupsDialog, openDupsDialog,
dataLoadStatus, GoalType.MergeDups, and GoalStatus.Completed.

In `@src/goals/MergeDuplicates/IdenticalDuplicatesDialog.tsx`:
- Around line 47-52: The mount useEffect currently calls hasGraylistEntries()
and getFrontierWords() without handling rejections; update the effect that calls
hasGraylistEntries, getFrontierWords (and that calls setHasDeferred and
setFrontierCount) to handle promise failures — e.g., use an async IIFE or attach
.catch handlers and in the catch set safe defaults (false for setHasDeferred and
0 for setFrontierCount) and log or report the error. Ensure both
hasGraylistEntries and getFrontierWords invocations are wrapped so no unhandled
rejection can occur and state updates only happen for successful responses.
🧹 Nitpick comments (4)
Backend.Tests/Controllers/MergeControllerTests.cs (1)

157-172: Add a basic payload assertion to the success test.

This strengthens the test by ensuring the endpoint returns a non-null payload, not just an OK status.

✅ Suggested tweak
             var result = _mergeController.FindIdenticalPotentialDuplicates(ProjId, 5, 10, false).Result;
             Assert.That(result, Is.InstanceOf<OkObjectResult>());
+            var ok = result as OkObjectResult;
+            Assert.That(ok, Is.Not.Null);
+            Assert.That(ok!.Value, Is.Not.Null);
src/goals/Redux/tests/GoalRedux.test.ts (1)

37-65: Align findSimilarDuplicates mock with its async signature.

Right now it returns undefined, which can be brittle if the action creator ever awaits or chains it. Mocking it as a resolved promise makes the tests more robust.

♻️ Suggested test mock setup
+const mockFindSimilarDuplicates = jest.fn();
+
 jest.mock("backend", () => ({
   addGoalToUserEdit: (...args: any[]) => mockAddGoalToUserEdit(...args),
   addStepToGoal: jest.fn(),
   createUserEdit: () => mockCreateUserEdit(),
   findIdenticalDuplicates: () => mockFindIdenticalDuplicates(),
-  findSimilarDuplicates: jest.fn(),
+  findSimilarDuplicates: () => mockFindSimilarDuplicates(),
   getGraylistEntries: () => Promise.resolve([]),
   getUserEditById: (...args: any[]) => mockGetUserEditById(...args),
   hasGraylistEntries: jest.fn(),
   retrieveDuplicates: () => mockRetrieveDuplicates(),
 }));
 
 function setMockFunctions(): void {
   mockAddGoalToUserEdit.mockResolvedValue(0);
   mockCreateUserEdit.mockResolvedValue(mockUser());
   mockFindIdenticalDuplicates.mockResolvedValue([]);
+  mockFindSimilarDuplicates.mockResolvedValue(undefined);
   mockGetUserEditById.mockResolvedValue(mockUserEdit(true));
   mockRetrieveDuplicates.mockResolvedValue(goalDataMock.plannedWords);
 }
src/goals/Redux/GoalActions.ts (2)

174-188: Redundant state access.

currentProjectState.project is already destructured at line 153-154. Line 181 unnecessarily calls getState() again.

♻️ Suggested simplification
     } else if (
       currentGoal.goalType === GoalType.MergeDups &&
       dataLoadStatus !== DataLoadStatus.Success
     ) {
       // All identical-vernacular duplicates have been processed.
       // Initialize similar-vernacular duplicate finding in the backend.
       dispatch(setDataLoadStatus(DataLoadStatus.Loading));
-      const currentProj = getState().currentProjectState.project;
       await Backend.findSimilarDuplicates(
         maxWordsInMergeList,
         maxNumSteps(currentGoal.goalType),
-        currentProj.protectedDataMergeAvoidEnabled === OffOnSetting.On
+        currentProjectState.project.protectedDataMergeAvoidEnabled === OffOnSetting.On
       );
       return;
     }

260-279: Wrap switch case declaration in a block.

The const dups declaration inside the switch case can potentially be accessed from other cases. While this case returns immediately, wrapping in a block is best practice and satisfies the linter.

♻️ Proposed fix
   switch (goalType) {
-    case GoalType.MergeDups:
+    case GoalType.MergeDups: {
       // Catch failure and pass to caller to allow for error dispatch.
       const dups =
         dataLoadStatus === DataLoadStatus.Success
           ? await Backend.retrieveDuplicates().catch(() => {})
           : await Backend.findIdenticalDuplicates(
               maxWordsInMergeList,
               maxNumSteps(goalType),
               project.protectedDataMergeAvoidEnabled === OffOnSetting.On
             ).catch(() => {});
       return dups ? checkMergeData(dups) : Promise.reject();
+    }
     case GoalType.ReviewDeferredDups:

imnasnainaec

This comment was marked as outdated.

@imnasnainaec imnasnainaec marked this pull request as ready for review January 29, 2026 21:46
@imnasnainaec imnasnainaec added the 🟥High High-priority PR: please review this asap! label Jan 29, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/goals/Redux/GoalActions.ts`:
- Around line 255-271: The MergeDups case in loadGoalData declares local
variable dups directly in the switch case which leaks scope; wrap the entire
case body in a block (i.e., add { ... } after case GoalType.MergeDups:) so
variables like dups are block-scoped, keep the existing async awaits and the
return dups ? checkMergeData(dups) : Promise.reject() inside that block, and
ensure no other switch cases are affected.
- Around line 174-187: Wrap the awaited Backend.findSimilarDuplicates call
inside a try/catch in the asyncAddGoal flow (the branch checking
currentGoal.goalType === GoalType.MergeDups) so rejections don't leave the UI in
Loading; on catch, dispatch setDataLoadStatus(DataLoadStatus.Failure) (matching
the loadGoalData pattern), optionally log or pass the error for diagnostics, and
ensure the function returns after handling the error to avoid continuing as if
successful.

Copy link
Contributor

@jasonleenaylor jasonleenaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@jasonleenaylor reviewed 21 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @copilot[bot]).

Copy link
Contributor

@jasonleenaylor jasonleenaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@jasonleenaylor reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @copilot[bot]).

Copy link
Collaborator

@imnasnainaec imnasnainaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imnasnainaec reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @copilot[bot]).

@imnasnainaec imnasnainaec merged commit db43ae0 into master Jan 29, 2026
19 of 20 checks passed
@imnasnainaec imnasnainaec deleted the copilot/add-identical-vernacular-indication branch January 29, 2026 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend frontend goal: MergeDup goal 🟥High High-priority PR: please review this asap! test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MergeDups] Give indication when all identical-vernacular duplicates have been dealt with

3 participants