Skip to content

[Data Cleanup] Speed up check for deferred duplicates#3793

Merged
imnasnainaec merged 14 commits intomasterfrom
has-graylist
May 19, 2025
Merged

[Data Cleanup] Speed up check for deferred duplicates#3793
imnasnainaec merged 14 commits intomasterfrom
has-graylist

Conversation

@imnasnainaec
Copy link
Collaborator

@imnasnainaec imnasnainaec commented May 13, 2025

This change is Reviewable

@codecov
Copy link

codecov bot commented May 13, 2025

Codecov Report

Attention: Patch coverage is 79.16667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 72.87%. Comparing base (dbbe87c) to head (f6d039d).
Report is 43 commits behind head on master.

Files with missing lines Patch % Lines
Backend/Services/MergeService.cs 81.25% 3 Missing ⚠️
Backend/Controllers/MergeController.cs 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3793      +/-   ##
==========================================
+ Coverage   72.82%   72.87%   +0.05%     
==========================================
  Files         286      286              
  Lines       10715    10737      +22     
  Branches     1330     1335       +5     
==========================================
+ Hits         7803     7825      +22     
+ Misses       2517     2515       -2     
- Partials      395      397       +2     
Flag Coverage Δ
backend 81.92% <78.26%> (+0.08%) ⬆️
frontend 65.73% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@imnasnainaec imnasnainaec requested a review from andracc May 13, 2025 17:41
/// <param name="userId"> Id of user whose merge graylist is to be used. </param>
[HttpGet("hasgraylist/{userId}", Name = "HasGraylistEntries")]
[ProducesResponseType(StatusCodes.Status200OK, Type = typeof(bool))]
public async Task<IActionResult> getGraylistEntries(string projectId, string userId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name getGraylistEntries should be capitalized

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

using var activity = OtelService.StartActivityWithTag(otelTagName, "checking if Frontier contains words");

var words = await _wordDatabase.Frontier.Find(GetProjectWordsFilter(projectId, wordIds))
.Limit(count).ToListAsync();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might I suggest CountAsync?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might... it's a good suggestion. I opted for Find so I could use .Limit, which doesn't work with CountDocuments. But at your suggestion, I took a look at https://www.mongodb.com/docs/manual/reference/method/db.collection.countDocuments/ and discovered that it supports the limit in a different way!

Copy link
Collaborator Author

@imnasnainaec imnasnainaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions (waiting on @andracc and @hahn-kev)

/// <param name="userId"> Id of user whose merge graylist is to be used. </param>
[HttpGet("hasgraylist/{userId}", Name = "HasGraylistEntries")]
[ProducesResponseType(StatusCodes.Status200OK, Type = typeof(bool))]
public async Task<IActionResult> getGraylistEntries(string projectId, string userId)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

using var activity = OtelService.StartActivityWithTag(otelTagName, "checking if Frontier contains words");

var words = await _wordDatabase.Frontier.Find(GetProjectWordsFilter(projectId, wordIds))
.Limit(count).ToListAsync();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might... it's a good suggestion. I opted for Find so I could use .Limit, which doesn't work with CountDocuments. But at your suggestion, I took a look at https://www.mongodb.com/docs/manual/reference/method/db.collection.countDocuments/ and discovered that it supports the limit in a different way!

Copy link
Collaborator

@andracc andracc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r1, 1 of 7 files at r2, 1 of 3 files at r3.
Reviewable status: 3 of 13 files reviewed, 3 unresolved discussions (waiting on @hahn-kev and @imnasnainaec)


Backend.Tests/Mocks/WordRepositoryMock.cs line 81 at r3 (raw file):

        {
            return Task.FromResult(
                _frontier.Where(w => w.ProjectId == projectId && wordIds.Contains(w.Id)).Count() == count);

Since count represents a minimum and we are not limiting the results here, it looks like we can use >= instead of ==.

Copy link
Collaborator Author

@imnasnainaec imnasnainaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @hahn-kev from 2 discussions.
Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @imnasnainaec)


Backend.Tests/Mocks/WordRepositoryMock.cs line 81 at r3 (raw file):

Previously, andracc (Andra) wrote…

Since count represents a minimum and we are not limiting the results here, it looks like we can use >= instead of ==.

We shouldn't have multiple words in the frontier with the same id, though we're not enforcing that in the test setting, so we could go either >= or == here. I'm inclined to keep it as == to closer mimic the expectations of the real thing.

Copy link
Collaborator Author

@imnasnainaec imnasnainaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @imnasnainaec)


Backend.Tests/Mocks/WordRepositoryMock.cs line 81 at r3 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

We shouldn't have multiple words in the frontier with the same id, though we're not enforcing that in the test setting, so we could go either >= or == here. I'm inclined to keep it as == to closer mimic the expectations of the real thing.

Oh, I'm sorry! Thanks for nudging me on this in Slack. (For some reason I was thinking about the case when count is the same as wordIds.Count.) I've fixed it now.

Copy link
Collaborator

@andracc andracc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 7 files at r1, 4 of 7 files at r2, 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @imnasnainaec)

@imnasnainaec imnasnainaec merged commit 9697905 into master May 19, 2025
18 checks passed
@imnasnainaec imnasnainaec deleted the has-graylist branch May 19, 2025 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants