fix for quantization error with fractional step#247
Conversation
|
Unfortunately it doesn't fix all cases, see https://observablehq.com/@d3/d3-bin-errors-247 My hunch is we should revert to 3.1.2, and create a new API for quantized bins—this API would only return a sum, and would make no promises of being exact for numbers that are "at" the limit between two bins. |
|
Another possibility would be to test each time if the quantization "got it right", like so: @@ -91,7 +91,9 @@ export default function bin() {
} else if (step < 0) {
for (i = 0; i < n; ++i) {
if ((x = values[i]) != null && x0 <= x && x <= x1) {
- bins[Math.floor(x0 * step - x * step)].push(data[i]);
+ let k = Math.floor(x0 * step - x * step);
+ if (bins[k].x1 <= x) k++;
+ bins[k].push(data[i]);
}
}
}and we could add the following unit test: it("bin(data) doesn't round the values", () => {
for (const n of [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000]) {
assert.strictEqual(bin().thresholds(n)(ticks(1, 2, n)).filter(d => d.length !== 1).length, 0);
}
}); |
|
Yes, this is what I was thinking (though I’m probably off by one): if (tz[k] <= x) ++k;But don’t we need to test both sides? Meaning, it might need to be more like this: if (tz[k] <= x) ++k;
else if (tz[k - 1] < x) --k;And theoretically we might need a while loop if you can be off by more than one, but I think that would only apply if the bin width is at the limit of floating point precision and we can just ignore that. |
|
I pushed the conditional fix. It also means we can go back to using |
|
this one is fine (and one multiplication less than the other) |
Ref. test breakage in observablehq/plot@2551afa
The new quantization approach for binning was introducing some rounding error, resulting in values being put in the wrong adjacent bin. For example, if x = 1.64, x0 = 1.2, and step = -50 (a bin width of 0.02), then quantization would produce a value of (x0 - x) * step = (1.2 - 1.64) * -50 = 21.999999999999996. When this is floored to 21, the value 1.64 would be erroneously placed in the bin representing [1.62, 1.64) rather than [1.64, 1.66).
If we switch to (x0 * step - x * step), then we get (1.2 * -50 - 1.64 * -50) = 22, producing the correct result. I’m not totally sure that this fixes the binning in all cases, but I think it might?