fix for quantization error with fractional step#247

mbostock · 2022-04-03T02:37:08Z

Ref. test breakage in observablehq/plot@2551afa

The new quantization approach for binning was introducing some rounding error, resulting in values being put in the wrong adjacent bin. For example, if x = 1.64, x0 = 1.2, and step = -50 (a bin width of 0.02), then quantization would produce a value of (x0 - x) * step = (1.2 - 1.64) * -50 = 21.999999999999996. When this is floored to 21, the value 1.64 would be erroneously placed in the bin representing [1.62, 1.64) rather than [1.64, 1.66).

If we switch to (x0 * step - x * step), then we get (1.2 * -50 - 1.64 * -50) = 22, producing the correct result. I’m not totally sure that this fixes the binning in all cases, but I think it might?

Fil · 2022-04-03T10:34:18Z

Unfortunately it doesn't fix all cases, see https://observablehq.com/@d3/d3-bin-errors-247

My hunch is we should revert to 3.1.2, and create a new API for quantized bins—this API would only return a sum, and would make no promises of being exact for numbers that are "at" the limit between two bins.

Fil · 2022-04-03T14:16:48Z

Another possibility would be to test each time if the quantization "got it right", like so:

@@ -91,7 +91,9 @@ export default function bin() {
       } else if (step < 0) {
         for (i = 0; i < n; ++i) {
           if ((x = values[i]) != null && x0 <= x && x <= x1) {
-            bins[Math.floor(x0 * step - x * step)].push(data[i]);
+            let k = Math.floor(x0 * step - x * step);
+            if (bins[k].x1 <= x) k++;
+            bins[k].push(data[i]);
           }
         }
       }

and we could add the following unit test:

it("bin(data) doesn't round the values", () => {
  for (const n of [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000]) {
    assert.strictEqual(bin().thresholds(n)(ticks(1, 2, n)).filter(d => d.length !== 1).length, 0);
  }
});

mbostock · 2022-04-03T14:42:55Z

Yes, this is what I was thinking (though I’m probably off by one):

if (tz[k] <= x) ++k;

But don’t we need to test both sides? Meaning, it might need to be more like this:

if (tz[k] <= x) ++k;
else if (tz[k - 1] < x) --k;

And theoretically we might need a while loop if you can be off by more than one, but I think that would only apply if the bin width is at the limit of floating point precision and we can just ignore that.

mbostock · 2022-04-03T14:52:18Z

I pushed the conditional fix. It also means we can go back to using (x0 - x) * step instead of x0 * step - x * step. Which do you prefer?

Fil · 2022-04-03T15:28:03Z

this one is fine (and one multiplication less than the other)

fix for quantization error with fractional step

279de9b

mbostock requested a review from Fil April 3, 2022 02:37

conditional fix for rounding error

6507f5c

Fil approved these changes Apr 3, 2022

View reviewed changes

mbostock merged commit 57225fc into main Apr 3, 2022

mbostock deleted the mbostock/bin-test branch April 3, 2022 15:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix for quantization error with fractional step#247

fix for quantization error with fractional step#247
mbostock merged 2 commits intomainfrom
mbostock/bin-test

mbostock commented Apr 3, 2022

Uh oh!

Fil commented Apr 3, 2022

Uh oh!

Fil commented Apr 3, 2022

Uh oh!

mbostock commented Apr 3, 2022

Uh oh!

mbostock commented Apr 3, 2022

Uh oh!

Fil commented Apr 3, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

mbostock commented Apr 3, 2022

Uh oh!

Fil commented Apr 3, 2022

Uh oh!

Fil commented Apr 3, 2022

Uh oh!

mbostock commented Apr 3, 2022

Uh oh!

mbostock commented Apr 3, 2022

Uh oh!

Fil commented Apr 3, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants