Add new lint hashset_insert_after_contains#12873
Conversation
llogiq
left a comment
There was a problem hiding this comment.
Thanks for taking on this lint. This is a good start. I left a few notes. We'd want a lint check before merging it though.
I'm currently not at my desk, will look into running the check some time next week when I get around to it.
| value: &'tcx Expr<'tcx>, | ||
| span: Span, | ||
| } | ||
| fn try_parse_contains<'tcx>(cx: &LateContext<'_>, expr: &'tcx Expr<'_>) -> Option<ContainsExpr<'tcx>> { |
There was a problem hiding this comment.
try_parse_insert and try_parse_contains are equal on all but two things: The UnOp (Not vs. Deref) and the call sym (insert vs. contains). Please factor out a try_parse_op_call method to use in both cases (oh, and the contains result also containing the span, but we can live with ignoring that for the insert case later on).
There was a problem hiding this comment.
@llogiq thanks for your comment. I was trying to take a look at it and I just have one doubt that I wanted to validate with you.
There is another difference betwheen try_parse_insert and try_parse_contains: on try_parse_insert I am peeling the expr while it is a Not, at the beginning, and at the try_parse_contains I am peeling the value of the expr -if it is a method call- while it is a Deref. So the UnOp is applied on different things (expr vs value of method call expr). To make a generic function I think that I would need to always do both peelings. I don't think it would be a problem, in terms of correctness, but probably it is not necessary, so I just wanted to double check with you.
Does it make sense?
| span_lint_and_then( | ||
| cx, | ||
| HASHSET_INSERT_AFTER_CONTAINS, | ||
| expr.span, |
There was a problem hiding this comment.
It might be a good idea to use a MultiSpan here containing the !contains and the insert call each (simply call span_lint with a vec![contains_span, insert_span]). That will reduce the visual clutter especially if there is more code within the if expression.
There was a problem hiding this comment.
Nice! I think I did it as you mentioned, here 2b0cad6
| { | ||
| span_lint_and_then( | ||
| cx, | ||
| HASHSET_INSERT_AFTER_CONTAINS, |
There was a problem hiding this comment.
How about SET_CONTAINS_OR_INSERT? Yes, for now this only works on HashSets, but we can easily extend it to also work on BTreeSets or the respective Map types.
There was a problem hiding this comment.
Ok, sounds more generic! Renamed it here 2b0cad6
| let borrow_set = &mut set; | ||
| if !borrow_set.contains(&value) { | ||
| borrow_set.insert(value); | ||
| } |
There was a problem hiding this comment.
How about
if set.contains(&value) {
println!("value is already in set");
} else {
set.insert(value);
}There was a problem hiding this comment.
Good point. Right now the code is only searching for the insert in the then part of the IF. I think it would make sense to search as well in the else part. I am just not sure if we would not warn if we find it in the else and the then has "a lot of things" (meaning that probably it was a stylist choice of the developer) or if we keep it simple and just warn if we find the insert in the else as well... I think that keeping it simple would be the best choice, but, for sure, I am open to suggestions.
| /// | ||
| /// ### Why is this bad? | ||
| /// Using just `insert` and checking the returned `bool` is more efficient. | ||
| /// |
There was a problem hiding this comment.
I know of one possible false positive: If the value is only borrowed & expensive to clone or impossible to clone twice, we may opt to check with contains before inserting to avoid the clone. There should be a "known problems" section mentioning this.
| #[clippy::version = "1.80.0"] | ||
| pub HASHSET_INSERT_AFTER_CONTAINS, | ||
| nursery, | ||
| "unnecessary call to `HashSet::contains` followed by `HashSet::insert`" |
There was a problem hiding this comment.
| "unnecessary call to `HashSet::contains` followed by `HashSet::insert`" | |
| "call to `HashSet::contains` followed by `HashSet::insert`" |
As stated, we cannot know whether the call is necessary.
|
☔ The latest upstream changes (presumably #12849) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Sorry this is taking so long, I am unfortunately quite busy at the moment. I'll ping you once I've run the check so you don't need to rebase more than once. |
| /// | ||
| /// ### Known problems | ||
| /// In case the value that wants to be inserted is borrowed and also expensive or impossible | ||
| /// to clone. In such scenario, the developer might want to check with `contain` before inserting, |
There was a problem hiding this comment.
| /// to clone. In such scenario, the developer might want to check with `contain` before inserting, | |
| /// to clone. In such a scenario, the developer might want to check with `contains` before inserting, |
|
I finally came around to do a lintcheck run, and it looked OK. So r=me after a rebase and the docs fix @bitfield suggested. |
|
@bors r=llogiq |
|
@lochetti: 🔑 Insufficient privileges: Not in reviewers |
Oops. It looks like I can't |
|
No problem. @bors r+ |
|
☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test |
|
Awesome work! One question -- would it make sense to generalize the name of this lint to |
|
@nyurik: Currently the lint only catches HashMaps, but I'd welcome a PR to change that along with the name. |
|
I could do a quick PR to rename the lint, but implementing it for other types might take a bit longer, and might not even make the cutoff for the next release (at which point the lint name would become permanent) |
|
Oh, my apologies, this was already renamed in the code to
|
|
Renamed in #13053 |
This PR closes #11103.
This is my first PR creating a new lint (and the second attempt of creating this PR, the first one I was not able to continue because of personal reasons). Thanks for the patience :)
The idea of the lint is to find insert in hashmanps inside if staments that are checking if the hashmap contains the same value that is being inserted. This is not necessary since you could simply call the insert and check for the bool returned if you still need the if statement.
changelog: new lint: [hashset_insert_after_contains]