Use a binary search tree #128

Merged
dumbmatter merged 4 commits into dumbmatter:master from nolanlawson:binary-tree
Aug 27, 2025

Conversation

@nolanlawson
Contributor

Fixes #44

This replaces the current array-based approach with a simple binary search tree. The runtime of the benchmark is ~13x faster (using hyperfine):

  • Before: 4.300 s ± 0.178 s
  • After: 323.2 ms ± 42.1 ms

My main goal with the binary search tree was simplicity of implementation. I figure fake-indexeddb has more value as a "reference implementation" for IndexedDB rather than a hyper-optimized database. (But it's still nice for it to be ~13x faster!)

I opted for a basic red-black tree with a deletion strategy borrowed from the scapegoat tree. Deletion in a scapegoat tree is much simpler: you just mark nodes with deleted tombstones and rebuild the tree if there are too many tombstones, which avoids rotations to rebalance. I felt this was a good compromise to try to keep things simple. I doubt most fake-indexeddb users are deleting a bunch of nodes (you'd more normally just throw away the whole database between each test), and even if they do, the amortized cost of the deletions is still O(log(n)).
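The tombstone idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `TombstoneTree`, `TNode`, and `makeNode` are hypothetical, the insert is a plain unbalanced BST insert (the red-black balancing is omitted for brevity), and the rebuild threshold of "more tombstones than live nodes" is an assumption.

```typescript
// Hypothetical names throughout; this is not fake-indexeddb's actual code.
interface TNode {
  key: number;
  deleted: boolean; // tombstone flag
  left: TNode | null;
  right: TNode | null;
}

function makeNode(key: number): TNode {
  return { key, deleted: false, left: null, right: null };
}

class TombstoneTree {
  private root: TNode | null = null;
  private live = 0; // nodes not marked deleted
  private tombstones = 0; // nodes marked deleted

  // Plain BST insert. A real implementation would rebalance here
  // (e.g. with red-black rotations); this sketch leaves that out.
  insert(key: number): void {
    if (this.root === null) {
      this.root = makeNode(key);
      this.live++;
      return;
    }
    let cur: TNode = this.root;
    for (;;) {
      if (key === cur.key) {
        // Re-inserting a tombstoned key just revives the node.
        if (cur.deleted) {
          cur.deleted = false;
          this.tombstones--;
          this.live++;
        }
        return;
      }
      const side = key < cur.key ? "left" : "right";
      const next = cur[side];
      if (next === null) {
        cur[side] = makeNode(key);
        this.live++;
        return;
      }
      cur = next;
    }
  }

  // Deletion never restructures the tree: it only sets the tombstone flag.
  delete(key: number): boolean {
    const n = this.find(key);
    if (n === null || n.deleted) return false;
    n.deleted = true;
    this.live--;
    this.tombstones++;
    // Assumed threshold: rebuild once tombstones outnumber live nodes.
    if (this.tombstones > this.live) this.rebuild();
    return true;
  }

  has(key: number): boolean {
    const n = this.find(key);
    return n !== null && !n.deleted;
  }

  stats(): { live: number; tombstones: number } {
    return { live: this.live, tombstones: this.tombstones };
  }

  private find(key: number): TNode | null {
    let cur = this.root;
    while (cur !== null && cur.key !== key) {
      cur = key < cur.key ? cur.left : cur.right;
    }
    return cur;
  }

  // Collect live keys in sorted order, then build a balanced tree from them.
  private rebuild(): void {
    const keys: number[] = [];
    const walk = (n: TNode | null): void => {
      if (n === null) return;
      walk(n.left);
      if (!n.deleted) keys.push(n.key);
      walk(n.right);
    };
    walk(this.root);
    const build = (lo: number, hi: number): TNode | null => {
      if (lo > hi) return null;
      const mid = (lo + hi) >> 1;
      const n = makeNode(keys[mid]);
      n.left = build(lo, mid - 1);
      n.right = build(mid + 1, hi);
      return n;
    };
    this.root = build(0, keys.length - 1);
    this.tombstones = 0;
  }
}
```

With this threshold, a rebuild is linear in the tree size but only happens after a comparable number of deletions has accumulated, which is where the amortized bound comes from.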

(Using a scapegoat tree itself is a nice idea, but it falls apart on insertions. The "FDBObjectStore.count performance should be reasonable" test is extremely slow with a scapegoat tree because nodes are inserted in sorted order, which basically devolves into a linked list where the tree is rebuilt on every insertion. With the current tree, that particular test is only slightly slower than with the array-based approach: 214ms vs 127ms. It could probably be faster if I avoided recursion/generators, but again I was opting for readability.)
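To illustrate why sorted-order insertion is the worst case for a tree that does not rebalance on every insert, here is a small sketch that compares the height of a naive BST under two insertion orders. All names (`BNode`, `insertNaive`, `buildTree`, `height`, `midFirstOrder`) are hypothetical and none of this is taken from the PR; it only shows the "devolves into a linked list" effect, not a scapegoat tree's rebuild logic.

```typescript
// Hypothetical helper names; a naive BST with no rebalancing at all.
interface BNode {
  key: number;
  left: BNode | null;
  right: BNode | null;
}

function insertNaive(root: BNode | null, key: number): BNode {
  const fresh: BNode = { key, left: null, right: null };
  if (root === null) return fresh;
  let cur: BNode = root;
  for (;;) {
    const side = key < cur.key ? "left" : "right";
    const next = cur[side];
    if (next === null) {
      cur[side] = fresh;
      return root;
    }
    cur = next;
  }
}

function buildTree(keys: number[]): BNode | null {
  let root: BNode | null = null;
  for (const k of keys) root = insertNaive(root, k);
  return root;
}

// Number of levels in the tree, computed iteratively so that a
// degenerate (list-shaped) tree cannot overflow the call stack.
function height(root: BNode | null): number {
  let levels = 0;
  let level: BNode[] = root ? [root] : [];
  while (level.length > 0) {
    levels++;
    const next: BNode[] = [];
    for (const n of level) {
      if (n.left) next.push(n.left);
      if (n.right) next.push(n.right);
    }
    level = next;
  }
  return levels;
}

// Reorders sorted keys so that each subrange's midpoint comes first,
// which makes the naive insert produce a balanced tree.
function midFirstOrder(sorted: number[]): number[] {
  const out: number[] = [];
  const visit = (lo: number, hi: number): void => {
    if (lo > hi) return;
    const mid = (lo + hi) >> 1;
    out.push(sorted[mid]);
    visit(lo, mid - 1);
    visit(mid + 1, hi);
  };
  visit(0, sorted.length - 1);
  return out;
}

const sortedKeys = Array.from({ length: 1023 }, (_, i) => i + 1);
console.log(height(buildTree(sortedKeys))); // one node per level
console.log(height(buildTree(midFirstOrder(sortedKeys)))); // logarithmic height
```

For 1023 keys, the sorted order yields a tree 1023 levels deep, while the midpoint-first order yields 10 levels. A tree that keeps itself balanced on insertion avoids the first shape without needing whole-tree rebuilds.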

@nolanlawson
Contributor Author

BTW the "FDBObjectStore.count performance should be reasonable" regression looks like it's caused by the lookup costs rather than the insertion costs, which I imagine is because we're comparing the cost of iterating through an array versus iterating through a custom binary search implementation. I still think it could be improved, but I'm not sure it's worth the effort.

@nolanlawson
Contributor Author

^ Made some optimizations; the "FDBObjectStore.count performance should be reasonable" test is now down to ~162ms.

@dumbmatter dumbmatter merged commit 5442f71 into dumbmatter:master Aug 27, 2025
@dumbmatter
Owner

This is really fantastic, thank you so much!

Would now be a good time to do a new release, or is there other stuff you are thinking of doing soon?

@nolanlawson
Contributor Author

Your call! 🙂 I was planning on doing some more tinkering this weekend, but maybe it's good to have a release just with this change, since there is some chance of regression with a big overhaul like this.

Development

Successfully merging this pull request may close these issues.

Poor insertion performance when using multiEntry

2 participants