Fix a bug in String.replace_leading/3 and String.replace_trailing/3 #5022

whatyouhide · 2016-07-17T11:30:03Z

Closes #5019.

These function would end up in infinite recursion if the "match" argument to replace was an empty string. With this commit, we explicitly check for that empty string and possibly raise an ArgumentError exception.

michalmuskala · 2016-07-17T11:57:28Z

lib/elixir/lib/string.ex

Wouldn't it be better to do this in a guard?

It seems like using a guard (or pattern matching in a function) is the usual way of dealing with invalid input.

def replace_leading(_string, "", _replacement) do raise ArgumentError, "cannot use an empty string as the match to replace" end def replace_leading(string, match, replacement) when is_binary(string) and is_binary(match) and is_binary(replacement) do # implementation end

looks to me the most idiomatic. I would also suspect guards to introduce less overhead (although that's marginally important in that case).

I wouldn't worry about overhead. The problem with your example is that then we say that the problem with replace_leading(:not_a_binary, "", %{hello: "mom, look not a binary!"}) will be that the match is "" instead of the other two arguments not being binaries. So the solution would be to add is_binary/1 clauses to the "" clause as well, but at that point I prefer the if version. ¯_(ツ)_/¯ I can change if other people prefer that, but personally I think the difference is not very strong anyways.

yes. using if saves us from two extra is_binary guard checks (one for string, one for replacement),
if you want to use guards, which seems reasonable, is that you need to create a private function, check only once in the public function, and not check anything in the private one.

def replace_leading(string, match, replacement) when is_binary(string) and is_binary(match) and is_binary(replacement), do: do_replace_leading(string, match, replacement) defp do_replace_leading(_string, "", _replacement) do raise ArgumentError, "cannot use an empty string as the match to replace" end defp do_replace_leading(string, match, replacement) do # implementation end

Yes, but here I think the if is the cleanest and easiest thing to read :)

eksperimental · 2016-07-17T12:21:07Z

I'm not so sure about this behaviour.
I think if replacement is "", we should return the original string.

I will do a bit of research.

another weird behaviour is

iex>  Regex.replace(~r//, "123", "x")
"x1x2x3x"

whatyouhide · 2016-07-17T12:41:21Z

@eksperimental if we return the original string, it will be as if we considered "" to not have matched, but that makes no sense since "" always matches. I discussed this with @lexmag as well before submitting the PR and we both agreed on this, hence why I went with the raising.

eksperimental · 2016-07-17T12:47:37Z

Ok, it makes sense, but there's another case.
String.replace_leading("", "", "found")
if empty string is found, and it was same as original, shouldn't we return "found", instead of raising?

I have tried previous regex in Perl, and it works as in Elixir

$ echo "123a456" | perl -pe "s//X/g" 
X1X2X3XaX4X5X6X

whatyouhide · 2016-07-17T12:59:30Z

@eksperimental the thing is that String.replace_leading/3 (and _trailing) accept the "match" as a controlled input, meaning the user is supposed to control it and check it's something meaningful. I think it's reasonable to never allow "" because semantically it makes no sense. And in the replace_leading("", "", "foo") example, I still think "" matches infinite times even if the string is an empty string itself: infinite empty strings concatenated are still an empty string and match everything infinite times, even the empty string itself. Did I explain what I mean? :)

eksperimental · 2016-07-17T13:02:15Z

yes. I can see what you mean. it makes sense.
Could we please add String.replace_leading("", "", "found") as a test?

whatyouhide · 2016-07-17T13:18:57Z

@eksperimental good call, updated with a new test.

eksperimental · 2016-07-17T13:30:11Z

lib/elixir/lib/string.ex

return should be t | :no_return
since we are raising

eksperimental · 2016-07-17T13:31:48Z

@whatyouhide we need to update the docs explaining this special case.

josevalim · 2016-07-17T13:34:23Z

Let's take a step back and understand the usage of "" throughout String:

ex(2)> String.split "foo", ""
["f", "o", "o", ""]
iex(3)> String.contains? "foo", ""
true
iex(4)> String.ends_with? "foo", ""
true

If a string contains "", shouldn't replacing "" mean something? How are other languages tackling this?

josevalim · 2016-07-17T13:35:05Z

I am pinging @ThomasArts because he may be able to encode this discussion into properties that may resonate with everyone. :)

eksperimental · 2016-07-17T13:40:24Z

@josevalim then it should behave same way as my Regex example,
and it will also be in agreement with that @whatyouhide says: #5022 (comment)
there are infinite empty strings between each letter, so slot in between gets a replacement inserted

whatyouhide · 2016-07-17T13:42:23Z

In Ruby, it's:

'foo'.split('')
#=> ['f', 'o', 'o']

In Node, it's:

'foo'.split('')
//=> ['f', 'o', 'o']

In Python (thankfully someone makes sense here 😄):

>>> 'foo'.split('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator

I think the correct behaviour might be to raise for empty separator in String.split/2 as well. If you want codepoints/graphemes, we have String.codepoints/1 and String.graphemes/1 so achieving that through String.split/2 seems like a weird way to do it.

As for contains?/2 and ends_with?/2, they seem to behave consistently in all languages (e.g. every string contains the empty string and every string ends/starts with the empty string). These properties may make sense as the empty string has no size so every string contains infinite empty strings. And this is the reason why we raise in replace_leading but not replace_prefix (we can replace one prefix, but not all of them because there's infinite). Makes sense? :)

josevalim · 2016-07-17T13:48:09Z

@whatyouhide I was mostly concerned about how replace_leading and replace_prefix works on other languages with empty strings. :)

To put things in perspective I would expect this to happen: if String.ends_with?(string, value) then String.replace_trailing(string, value, another) |> String.ends_with?(another) should be true. The code in this PR violates this property. However, I do understand the matter of infinity which happens with replace_trailing which it does not happen with replace_suffix. Should we at least agree that replace_prefix and replace_suffix should accept empty strings and add tests for those too?

whatyouhide · 2016-07-17T13:56:03Z

Yeah I agree for replace_prefix and replace_suffix, those make sense to me as they replace the first occurrence of the match so it makes sense with "" as the match as well.

The property you showed would be nice to have but I think this is an intrinsic problem with having String.replace_leading/3 and String.replace_trailing/3 in the stdlib. Other languages (at least the ones I'm looking at) appear to have no similar functionality and you have to build that yourself, treating the empty string in the way that better fits you. Not sure what to do then 😕

lexmag · 2016-07-17T13:59:04Z

I think we should raise on empty string in String.split/2 as well; today we have a "hack" here to handle "" match, since :binary.split("asd", "", [:global]) raises ArgumentError.

whatyouhide · 2016-07-17T14:00:19Z

@lexmag that's good news. I can take care of that, and I can add tests for a "" match to replace_prefix and replace_suffix.

Now we only need to decide what to do with replace_leading and replace_trailing.

josevalim · 2016-07-17T14:06:35Z

@whatyouhide we are not changing String.split. If we want to, we should open up a discussion apart from this one but I would very likely veto it since it is well documented and I see no reason why we should stick with the behaviour described in :binary.match/2. Specially given we changed String.contains? to match :binary.match in the past and then @ThomasArts proved with properties that not following :binary.match was the best way to go.

eksperimental · 2016-07-17T14:12:10Z

I fail to see the problem with the infinity of empty strings.
let's say [IES] (Infinite empty strings)
"foo" is [IES]f[IES]o[IES]o[IES]

String.prefix("foo", "", "X), replaces the first empty space in [IES].
should be interpreted as:
``String.prefix([IES]f[IES]o[IES]o[IES], [IES], [IES]X[IES]) should return:[IES]X[IES][IES]f[IES]o[IES]o[IES]` which reads as `"Xfoo"`

String.leading("foo", "", "X"), replaces all empty spaces in [IES].
should be interpreted as:
``String.prefix([IES]f[IES]o[IES]o[IES], [IES], [IES]X[IES]) should return:[IES]X[IES]f[IES]o[IES]o[IES]` which reads as `"Xfoo"`

whatyouhide · 2016-07-17T14:13:06Z

We can of course look into this from the properties perspective, that would be helpful I think. But which property String.split/2 holds, that sounds harder :) Also, String.split("", "") == [""], this stuff is a slippery slope: does that mean that "" is actually an empty string followed by an empty string? Or an empty string preceded by an empty string?

The empty string is dangerous and I think we should treat it as such.

whatyouhide · 2016-07-17T14:13:56Z

@eksperimental you're making IES (infinite empty string) finite like that :)

eksperimental · 2016-07-17T14:14:35Z

@whatyouhide how come?

whatyouhide · 2016-07-17T14:16:17Z

@eksperimental you're just treating it like a codepoint in your examples.

eksperimental · 2016-07-17T14:19:51Z

@whatyouhide in my examples I am treating it as infinite codepoints, and every code point will be surrounded by infinite empty codepoints.

whatyouhide · 2016-07-17T14:22:12Z

It's a slippery slope dealing with infinites @eksperimental. In your reasonings, ∞−∞ = 0, which is not a truth in math (it's indeterminate). This math.stackexchange.com answer explains this nicely :)

eksperimental · 2016-07-17T14:32:40Z

but where in my examples I end up with 0? It always ends up with inifite empty spaces

whatyouhide · 2016-07-17T14:37:34Z

@eksperimental you replace infinite empty strings with infinite empty strings considering them equal because they're both infinite empty strings, but as I said above, ∞−∞ = 0 (which means ∞ = ∞) is not true and indeterminate in math. E.g. you do String.prefix([IES]f[IES]o[IES]o[IES], [IES], [IES]X[IES]) and replace [IES] at the beginning, under the wrong assumption that IES == IES, that's where you apply ∞−∞ = 0.

josevalim · 2016-07-17T14:39:49Z

String.split("", "") == [""] an empty string cannot be further split. Split also always returns a non-empty list. I would say this result is actually quite reasonable.

Here is another property we could consider:

if String.split(string, value) |> Enum.at(-1) == "" then
  String.split(String.replace_trailing(string, value, another), another) |> Enum.at(-1) == ""

Given that String.split on "" does not return infinite empty strings at the end, I think it is reasonable for replace_trailing with an empty string to behave the same as replace_suffix with an empty string.

lexmag · 2016-07-17T14:43:21Z

Considering how String.split/2 behaves today, I think not raising on empty string in replace_trailing and replace_leading is consistent.

whatyouhide · 2016-07-17T14:44:46Z

Yes we either raise in all of them or in none of them (raising in all three would be consistent as well, just saying 😄). So what shall we go with?

josevalim · 2016-07-17T14:45:32Z

I think it should behave like replace_suffix and replace_prefix (basically the value is appended or prepended).

eksperimental · 2016-07-17T14:51:07Z

@whatyouhide sorry, i don't agree. My reasoning for _prefix I that I replace one of the empty strings in the series of infinite strings, with non-empty strings which is surrounded by infinite empty strings.

String.prefix([IES_1][IES_2][IES_3][IES_...]f[IES_...]o[IES_...]o[IES_...], [IES_...], [IES_...]X[IES_...])
returns: [IES_...]X[IES_...][IES_2][IES_3][IES_...]f[IES_...]o[IES_...]o[IES_...]
which ends up being squeezed up as: [IES_...]X[IES_...]f[IES_...]o[IES_...]o[IES_...]

for a more practical way of explaining _leading (which is pretty much the same as for _prefix).. think of grabbing an infinite number of empty strings, split it right in the middle and insert a non-empty string which is surrounded by infinite number of empty strings.
it makes no difference whether is _leading or _prefix. you will always will end up with the non-empty string surrounded by infinite empty strings.

whatyouhide · 2016-07-17T14:58:49Z

Ok @josevalim, let's go with that then.

@eksperimental you're doing the same you did before; you can't replace IES_1 with IES_... because you have to replace matching strings and IES_1 (supposedly a single empty string) doesn't match IES_.... Also, I could argue that IES_1 is not one empty string but infinite empty strings; no way to disproof that :) Math is hard, so let's not try to justify this with it and let's just go with prepending/appending and document this well, and be sad that chaos rules this world. ¯_(ツ)_/¯

ThomasArts · 2016-07-17T18:12:19Z

The property I typically would like to see hold is:

property "replace all leading occurrence of strings" do
    forall {ls, n, s, rs}  <- {string(), nat(), string(), string()} do
      implies not String.starts_with?(s, ls) do
         string = String.duplicate(ls, n) <> s 
         replaced = String.duplicate(rs, n) <> s
         ensure String.replace_leading(string, ls, rs) == replaced
        end
    end
end

In Elixir version 1.3.2, this property passes 100,000 tests without problem.

In other words, if we take a random strings lsand s such that ls is not a prefix of s, then I expect that replace leading replaces n copies of ls with ncopies of rs. No matter what value n and rs have.

My string generator does indeed generate empty strings.

As a consequence of the implies operator, this means that we never call the replace_leading function with the empty string, since it makes no sense to predict the outcome.

ThomasArts · 2016-07-17T18:43:20Z

Another property... which reveals the problem that you discuss is:

  property "replace all multiple occurrences of string" do
    forall {ls, n, rs}  <- {string(), nat(), string()} do
        string = String.duplicate(ls, n) 
        replaced = String.duplicate(rs, n)
        ensure String.replace_leading(string, ls, rs) == replaced
      end
  end

Given a string that is just n copies of a random string ls, we expect this to result in n copies of rs. However, this property fails with a timeout. The replacement takes too longif ls is the empty string.

This is a good reason to raise an exception when ls is the empty string. After all, how would you replace 4 copies of "" by 4 copies of "foo"?

ThomasArts · 2016-07-18T07:41:33Z

Thanks for getting me into the discussion. I was a bit late with my reaction, but I added properties to the discussion and let me know if you want more of such.

I came to the same conclusion as you reached: raise an exception when the leading string is empty.
Cheers
Thomas

On 17 Jul 2016, at 15:40, Eksperimental [email protected] wrote:

@josevalim https://github.com/josevalim then it should behave same way as my Regex example,
and it will also be in agreement with that @whatyouhide https://github.com/whatyouhide says: #5022 (comment) #5022 (comment)
there are infinite empty strings between each letter, so slot in between gets a replacement inserted

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #5022 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AAm8P_xN8_jtqYxUfPu2bEZMf1IjSq9qks5qWjDYgaJpZM4JONOS.

whatyouhide · 2016-07-18T08:37:51Z

Thanks a lot @ThomasArts! 💟 This was superhelpful. And your latter property shows pretty well that things start to break (conceptually I mean) when we allow to replace all the leading occurrences of "" in a string. We may go with the solution @josevalim proposed as well (make it behave like replace_prefix/3) for the sake of just "making it work", but I am not convinced.

Thoughts?

ThomasArts · 2016-07-18T09:07:18Z

For the replace_prefix, I expect the following property to hold:

 property "replace one prefix occurrences of string" do
    forall {ls, rs, s}  <- {string(), string(), string()} do
        string = ls <> s
        replaced = rs <> s
        ensure String.replace_prefix(string, ls, rs) == replaced
    end
  end

Which it does, even for ls being the empty string. The reason is that the empty string appended to an arbitrary string always at most represents one string.
However, in replace_leading, the same string can represent zero or more leading empty strings. The information on the amount of occurrences is not present when string length is zero.

In my opinion, allowing the empty string as prefix makes sense, It turns replace_prefix in the <> operator.
Not very useful, but consistent.

In general, the simpler the specification, the easier for people to understand what the function does. If one adds complexity to the specification, such as:

 property "replace one prefix occurrences of string" do
    forall {ls, rs, s}  <- {non_empty(string()), string(), string()} do
        string = ls <> s
        replaced = rs <> s
        ensure String.replace_prefix(string, ls, rs) == replaced
    end
  end

Then the user has to program extra code to check and avoid the first argument to be empty or to reason that this in the specific application will never happen.

/Thomas

On 18 Jul 2016, at 10:38, Andrea Leopardi [email protected] wrote:

Thanks a lot @ThomasArts https://github.com/ThomasArts! 💟 This was superhelpful. And your latter property shows pretty well that things start to break (conceptually I mean) when we allow to replace all the leading occurrences of "" in a string. We may go with the solution @josevalim https://github.com/josevalim proposed as well (make it behave like replace_prefix/3) for the sake of just "making it work", but I am not convinced.

Thoughts?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #5022 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AAm8P80KOUhyvtVLtG2YiL_4tuu26zElks5qWztvgaJpZM4JONOS.

whatyouhide · 2016-07-18T09:13:32Z

@ThomasArts yes, for replace_prefix we agreed that "" is allowed and it turns it in <> basically, as you said. So no need for empty string checks and such. But for replace_leading, as you showed, it's a bit more complicated and we may need the users to do their empty string checks in order to be able to have a consistent function.

josevalim · 2016-07-18T14:56:14Z

That was really helpful @ThomasArts, thank you!

So to tidy up the discussion:

replace_prefix and replace_suffix will work with empty strings
replace_leading and replace_trailing will raise for empty strings

Let's make sure to add documentation that says it is impossible to replace multiple (leading|trailing) empty strings, hence the error.

ericentin · 2016-07-18T15:03:57Z

I like the current proposed solution, as I think that even though it is not "mathematically' consistent, it is pragmatic and will prevent people from doing something that doesn't really make sense to do. 👍

whatyouhide · 2016-07-18T16:42:58Z

Small update: String.replace_prefix/3 and String.replace_suffix/3 now (93bc364) officially support "" as the match, in which case they prepend/append the given replacement, respectively.

These function would end up in infinite recursion if the "match" argument to replace was an empty string. With this commit, we explicitly check for that empty string and possibly raise an ArgumentError exception.

whatyouhide · 2016-07-18T17:02:11Z

Update the PR so that replace_leading/3 and replace_trailing/3 raise if the match is ""; I also update the typespec with no_return for both and documented the new behaviour.

josevalim · 2016-07-19T16:46:51Z

❤️ 💚 💙 💛 💜

michalmuskala reviewed Jul 17, 2016
View reviewed changes

eksperimental reviewed Jul 17, 2016
View reviewed changes

lib/elixir/lib/string.ex Outdated

Copy link

Contributor

eksperimental Jul 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return should be t | :no_return
since we are raising

Fix a bug in String.replace_leading/3 and String.replace_trailing/3

57cf0dd

These function would end up in infinite recursion if the "match" argument to replace was an empty string. With this commit, we explicitly check for that empty string and possibly raise an ArgumentError exception.

josevalim merged commit 070e395 into elixir-lang:master Jul 19, 2016

whatyouhide deleted the string-replace-recursion-bug branch July 19, 2016 19:14

lexmag mentioned this pull request Sep 13, 2016

String.trim("abc", "") won't return anything #5210

Closed

lexmag mentioned this pull request Sep 11, 2017

Allow passing empty string as match argument to String.replace/4 #6559

Merged

josevalim mentioned this pull request Sep 11, 2017

String.replace_trailing/leading match empty string #6563

Closed

Fix a bug in String.replace_leading/3 and String.replace_trailing/3 #5022

Fix a bug in String.replace_leading/3 and String.replace_trailing/3 #5022

Uh oh!

Conversation

whatyouhide commented Jul 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michalmuskala Jul 17, 2016

Choose a reason for hiding this comment

Uh oh!

whatyouhide Jul 17, 2016

Choose a reason for hiding this comment

Uh oh!

michalmuskala Jul 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

whatyouhide Jul 17, 2016

Choose a reason for hiding this comment

Uh oh!

eksperimental Jul 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

whatyouhide Jul 17, 2016

Choose a reason for hiding this comment

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eksperimental Jul 17, 2016

Choose a reason for hiding this comment

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

josevalim commented Jul 17, 2016

Uh oh!

josevalim commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

josevalim commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

lexmag commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

josevalim commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016

Uh oh!

eksperimental commented Jul 17, 2016

Uh oh!

whatyouhide commented Jul 17, 2016 •

edited

Loading

michalmuskala Jul 17, 2016 •

edited

Loading

eksperimental Jul 17, 2016 •

edited

Loading

whatyouhide commented Jul 17, 2016 •

edited

Loading

eksperimental commented Jul 17, 2016 •

edited

Loading

eksperimental commented Jul 17, 2016 •

edited

Loading

ThomasArts commented Jul 17, 2016 •

edited

Loading