-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Fix a bug in String.replace_leading/3 and String.replace_trailing/3 #5022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a bug in String.replace_leading/3 and String.replace_trailing/3 #5022
Conversation
lib/elixir/lib/string.ex
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to do this in a guard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like using a guard (or pattern matching in a function) is the usual way of dealing with invalid input.
def replace_leading(_string, "", _replacement) do
raise ArgumentError, "cannot use an empty string as the match to replace"
end
def replace_leading(string, match, replacement)
when is_binary(string) and is_binary(match) and is_binary(replacement) do
# implementation
endlooks to me the most idiomatic. I would also suspect guards to introduce less overhead (although that's marginally important in that case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about overhead. The problem with your example is that then we say that the problem with replace_leading(:not_a_binary, "", %{hello: "mom, look not a binary!"}) will be that the match is "" instead of the other two arguments not being binaries. So the solution would be to add is_binary/1 clauses to the "" clause as well, but at that point I prefer the if version. ¯_(ツ)_/¯ I can change if other people prefer that, but personally I think the difference is not very strong anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. using if saves us from two extra is_binary guard checks (one for string, one for replacement),
if you want to use guards, which seems reasonable, is that you need to create a private function, check only once in the public function, and not check anything in the private one.
def replace_leading(string, match, replacement) when is_binary(string) and is_binary(match) and is_binary(replacement),
do: do_replace_leading(string, match, replacement)
defp do_replace_leading(_string, "", _replacement) do
raise ArgumentError, "cannot use an empty string as the match to replace"
end
defp do_replace_leading(string, match, replacement) do
# implementation
endThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but here I think the if is the cleanest and easiest thing to read :)
|
I'm not so sure about this behaviour. I will do a bit of research. another weird behaviour is iex> Regex.replace(~r//, "123", "x")
"x1x2x3x" |
|
@eksperimental if we return the original string, it will be as if we considered |
|
Ok, it makes sense, but there's another case. I have tried previous regex in Perl, and it works as in Elixir $ echo "123a456" | perl -pe "s//X/g"
X1X2X3XaX4X5X6X |
|
@eksperimental the thing is that |
|
yes. I can see what you mean. it makes sense. |
|
@eksperimental good call, updated with a new test. |
lib/elixir/lib/string.ex
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return should be t | :no_return
since we are raising
|
@whatyouhide we need to update the docs explaining this special case. |
|
Let's take a step back and understand the usage of "" throughout String: ex(2)> String.split "foo", ""
["f", "o", "o", ""]
iex(3)> String.contains? "foo", ""
true
iex(4)> String.ends_with? "foo", ""
trueIf a string contains "", shouldn't replacing "" mean something? How are other languages tackling this? |
|
I am pinging @ThomasArts because he may be able to encode this discussion into properties that may resonate with everyone. :) |
|
@josevalim then it should behave same way as my Regex example, |
|
In Ruby, it's: 'foo'.split('')
#=> ['f', 'o', 'o']In Node, it's: 'foo'.split('')
//=> ['f', 'o', 'o']In Python (thankfully someone makes sense here 😄): >>> 'foo'.split('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: empty separatorI think the correct behaviour might be to raise for empty separator in As for |
|
@whatyouhide I was mostly concerned about how replace_leading and replace_prefix works on other languages with empty strings. :) To put things in perspective I would expect this to happen: if |
|
Yeah I agree for The property you showed would be nice to have but I think this is an intrinsic problem with having |
|
I think we should raise on empty string in |
|
@lexmag that's good news. I can take care of that, and I can add tests for a Now we only need to decide what to do with |
|
@whatyouhide we are not changing String.split. If we want to, we should open up a discussion apart from this one but I would very likely veto it since it is well documented and I see no reason why we should stick with the behaviour described in :binary.match/2. Specially given we changed String.contains? to match :binary.match in the past and then @ThomasArts proved with properties that not following :binary.match was the best way to go. |
|
I fail to see the problem with the infinity of empty strings.
|
|
We can of course look into this from the properties perspective, that would be helpful I think. But which property The empty string is dangerous and I think we should treat it as such. |
|
@eksperimental you're making |
|
@whatyouhide how come? |
|
@eksperimental you're just treating it like a codepoint in your examples. |
|
@whatyouhide in my examples I am treating it as infinite codepoints, and every code point will be surrounded by infinite empty codepoints. |
|
It's a slippery slope dealing with infinites @eksperimental. In your reasonings, ∞−∞ = 0, which is not a truth in math (it's indeterminate). This math.stackexchange.com answer explains this nicely :) |
|
but where in my examples I end up with 0? It always ends up with inifite empty spaces |
|
@eksperimental you replace infinite empty strings with infinite empty strings considering them equal because they're both infinite empty strings, but as I said above, ∞−∞ = 0 (which means ∞ = ∞) is not true and indeterminate in math. E.g. you do |
|
Here is another property we could consider: Given that String.split on "" does not return infinite empty strings at the end, I think it is reasonable for replace_trailing with an empty string to behave the same as replace_suffix with an empty string. |
|
Considering how |
|
Yes we either raise in all of them or in none of them (raising in all three would be consistent as well, just saying 😄). So what shall we go with? |
|
I think it should behave like replace_suffix and replace_prefix (basically the value is appended or prepended). |
|
@whatyouhide sorry, i don't agree. My reasoning for _prefix I that I replace one of the empty strings in the series of infinite strings, with non-empty strings which is surrounded by infinite empty strings.
for a more practical way of explaining _leading (which is pretty much the same as for _prefix).. think of grabbing an infinite number of empty strings, split it right in the middle and insert a non-empty string which is surrounded by infinite number of empty strings. |
|
Ok @josevalim, let's go with that then. @eksperimental you're doing the same you did before; you can't replace |
|
The property I typically would like to see hold is: In Elixir version 1.3.2, this property passes 100,000 tests without problem. In other words, if we take a random strings My string generator does indeed generate empty strings. As a consequence of the |
|
Another property... which reveals the problem that you discuss is: Given a string that is just This is a good reason to raise an exception when |
|
Thanks for getting me into the discussion. I was a bit late with my reaction, but I added properties to the discussion and let me know if you want more of such. I came to the same conclusion as you reached: raise an exception when the leading string is empty.
|
|
Thanks a lot @ThomasArts! 💟 This was superhelpful. And your latter property shows pretty well that things start to break (conceptually I mean) when we allow to replace all the leading occurrences of Thoughts? |
|
For the replace_prefix, I expect the following property to hold: Which it does, even for ls being the empty string. The reason is that the empty string appended to an arbitrary string always at most represents one string. In my opinion, allowing the empty string as prefix makes sense, It turns replace_prefix in the <> operator. In general, the simpler the specification, the easier for people to understand what the function does. If one adds complexity to the specification, such as: Then the user has to program extra code to check and avoid the first argument to be empty or to reason that this in the specific application will never happen. /Thomas
|
|
@ThomasArts yes, for |
|
That was really helpful @ThomasArts, thank you! So to tidy up the discussion:
Let's make sure to add documentation that says it is impossible to replace multiple (leading|trailing) empty strings, hence the error. |
|
I like the current proposed solution, as I think that even though it is not "mathematically' consistent, it is pragmatic and will prevent people from doing something that doesn't really make sense to do. 👍 |
|
Small update: |
These function would end up in infinite recursion if the "match" argument to replace was an empty string. With this commit, we explicitly check for that empty string and possibly raise an ArgumentError exception.
|
Update the PR so that |
|
❤️ 💚 💙 💛 💜 |
Closes #5019.
These function would end up in infinite recursion if the "match" argument to replace was an empty string. With this commit, we explicitly check for that empty string and possibly raise an
ArgumentErrorexception.