[Data Cleanup] Speed up check for deferred duplicates#3793
[Data Cleanup] Speed up check for deferred duplicates#3793imnasnainaec merged 14 commits intomasterfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3793 +/- ##
==========================================
+ Coverage 72.82% 72.87% +0.05%
==========================================
Files 286 286
Lines 10715 10737 +22
Branches 1330 1335 +5
==========================================
+ Hits 7803 7825 +22
+ Misses 2517 2515 -2
- Partials 395 397 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| /// <param name="userId"> Id of user whose merge graylist is to be used. </param> | ||
| [HttpGet("hasgraylist/{userId}", Name = "HasGraylistEntries")] | ||
| [ProducesResponseType(StatusCodes.Status200OK, Type = typeof(bool))] | ||
| public async Task<IActionResult> getGraylistEntries(string projectId, string userId) |
There was a problem hiding this comment.
the name getGraylistEntries should be capitalized
| using var activity = OtelService.StartActivityWithTag(otelTagName, "checking if Frontier contains words"); | ||
|
|
||
| var words = await _wordDatabase.Frontier.Find(GetProjectWordsFilter(projectId, wordIds)) | ||
| .Limit(count).ToListAsync(); |
There was a problem hiding this comment.
might I suggest CountAsync?
There was a problem hiding this comment.
You might... it's a good suggestion. I opted for Find so I could use .Limit, which doesn't work with CountDocuments. But at your suggestion, I took a look at https://www.mongodb.com/docs/manual/reference/method/db.collection.countDocuments/ and discovered that it supports the limit in a different way!
imnasnainaec
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions (waiting on @andracc and @hahn-kev)
| /// <param name="userId"> Id of user whose merge graylist is to be used. </param> | ||
| [HttpGet("hasgraylist/{userId}", Name = "HasGraylistEntries")] | ||
| [ProducesResponseType(StatusCodes.Status200OK, Type = typeof(bool))] | ||
| public async Task<IActionResult> getGraylistEntries(string projectId, string userId) |
| using var activity = OtelService.StartActivityWithTag(otelTagName, "checking if Frontier contains words"); | ||
|
|
||
| var words = await _wordDatabase.Frontier.Find(GetProjectWordsFilter(projectId, wordIds)) | ||
| .Limit(count).ToListAsync(); |
There was a problem hiding this comment.
You might... it's a good suggestion. I opted for Find so I could use .Limit, which doesn't work with CountDocuments. But at your suggestion, I took a look at https://www.mongodb.com/docs/manual/reference/method/db.collection.countDocuments/ and discovered that it supports the limit in a different way!
andracc
left a comment
There was a problem hiding this comment.
Reviewed 1 of 7 files at r1, 1 of 7 files at r2, 1 of 3 files at r3.
Reviewable status: 3 of 13 files reviewed, 3 unresolved discussions (waiting on @hahn-kev and @imnasnainaec)
Backend.Tests/Mocks/WordRepositoryMock.cs line 81 at r3 (raw file):
{ return Task.FromResult( _frontier.Where(w => w.ProjectId == projectId && wordIds.Contains(w.Id)).Count() == count);
Since count represents a minimum and we are not limiting the results here, it looks like we can use >= instead of ==.
imnasnainaec
left a comment
There was a problem hiding this comment.
Dismissed @hahn-kev from 2 discussions.
Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @imnasnainaec)
Backend.Tests/Mocks/WordRepositoryMock.cs line 81 at r3 (raw file):
Previously, andracc (Andra) wrote…
Since
countrepresents a minimum and we are not limiting the results here, it looks like we can use>=instead of==.
We shouldn't have multiple words in the frontier with the same id, though we're not enforcing that in the test setting, so we could go either >= or == here. I'm inclined to keep it as == to closer mimic the expectations of the real thing.
imnasnainaec
left a comment
There was a problem hiding this comment.
Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @imnasnainaec)
Backend.Tests/Mocks/WordRepositoryMock.cs line 81 at r3 (raw file):
Previously, imnasnainaec (D. Ror.) wrote…
We shouldn't have multiple words in the frontier with the same id, though we're not enforcing that in the test setting, so we could go either
>=or==here. I'm inclined to keep it as==to closer mimic the expectations of the real thing.
Oh, I'm sorry! Thanks for nudging me on this in Slack. (For some reason I was thinking about the case when count is the same as wordIds.Count.) I've fixed it now.
andracc
left a comment
There was a problem hiding this comment.
Reviewed 3 of 7 files at r1, 4 of 7 files at r2, 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @imnasnainaec)
This change is