Jemalloc performance on 64-bit ARM #34476
Closed
Labels
A-runtime (Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows)
I-slow (Issue: Problems and improvements with respect to performance of generated code)
I've just run the `binary_trees` benchmark on an ARMv8 Cortex-A53 processor, having converted an Android TV box to Linux. I'd found previously, on a much weaker (but more power-efficient) ARMv7 Cortex-A5, that the results were equal. On the new machine (using the latest official `aarch64` rustc nightly), `./binary_trees 23` produces the following results:

| Allocator | real | user | sys |
|---|---|---|---|
| sysalloc | 1m28s | 5m10s | 0m10s |
| jemalloc | 1m35s | 5m10s | 0m53s |

So jemalloc is actually palpably worse here, even though the Cortex-A53 is a much stronger core: real time is about 7 s longer and sys time is more than five times higher, while user time is identical.
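For readers unfamiliar with the benchmark: binary-trees (from the Computer Language Benchmarks Game) builds and immediately discards huge numbers of small heap-allocated trees, so its running time is dominated by the allocator. Below is a simplified, single-threaded sketch of that workload, not the code from the attachment. The default depth, the `min_depth` of 4 and the output format are assumptions, the usual "stretch tree" step is omitted, and the timings above (user time well above real time) suggest the attached version spreads the work across threads.

```rust
// Simplified sketch of the binary-trees workload (not the attachment's code).
// Every node is a separate Box, so building and dropping trees hammers malloc/free.
struct Tree {
    left: Option<Box<Tree>>,
    right: Option<Box<Tree>>,
}

// Builds a perfect binary tree of the given depth: 2^(depth + 1) - 1 nodes.
fn bottom_up(depth: u32) -> Box<Tree> {
    if depth == 0 {
        Box::new(Tree { left: None, right: None })
    } else {
        Box::new(Tree {
            left: Some(bottom_up(depth - 1)),
            right: Some(bottom_up(depth - 1)),
        })
    }
}

// Walks the tree and counts its nodes.
fn check(tree: &Tree) -> u32 {
    1 + tree.left.as_ref().map_or(0, |t| check(t))
        + tree.right.as_ref().map_or(0, |t| check(t))
}

fn main() {
    // Depth from the command line (the issue used 23); a small default keeps the sketch quick.
    let max_depth: u32 = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(10);
    let min_depth = 4;

    // One tree that stays alive for the whole run.
    let long_lived = bottom_up(max_depth);

    // Many short-lived trees: shallow ones often, deep ones rarely.
    let mut depth = min_depth;
    while depth <= max_depth {
        let iterations = 1u32 << (max_depth - depth + min_depth);
        let mut total: u64 = 0;
        for _ in 0..iterations {
            // The temporary tree is dropped (freed node by node) at the end of this statement.
            total += u64::from(check(&bottom_up(depth)));
        }
        println!("{}\t trees of depth {}\t check: {}", iterations, depth, total);
        depth += 2;
    }

    println!("long lived tree of depth {}\t check: {}", max_depth, check(&long_lived));
}
```

Because nearly all of the work is allocating and freeing 16-byte nodes, differences between allocators (and in how much time they spend in the kernel) show up directly in the `sys` column.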
I'm beginning to think jemalloc only makes sense on Intel processors with heaps of L1/L2 cache. More benchmark ideas are welcome, though.
Added retroactively:
To reproduce, unpack the attachment and run:
inside the `binary_trees` directory. Uncomment the first two lines in `main.rs` to produce a sysalloc version.
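The two allocator lines from the attachment aren't reproduced in the issue. In 2016 they were most likely the then-unstable `#![feature(alloc_system)]` / `extern crate alloc_system;` pair. Since Rust 1.32 the system allocator is the default anyway, and it can be chosen explicitly with `#[global_allocator]`. The sketch below does that, wrapping `std::alloc::System` in a hypothetical `Counting` allocator that tallies allocations, which is a cheap way to confirm how allocation-heavy a workload like `binary_trees` is. `Counting`, `ALLOCS` and `GLOBAL` are illustrative names, not part of the original benchmark.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper: forwards every request to the system allocator
// (the "sysalloc" configuration) and counts calls to `alloc`.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `alloc_zeroed` and `realloc` keep their default implementations,
    // which route through `self.alloc`, so they are counted too.
}

// Registers the wrapper as the process-wide allocator for this binary.
#[global_allocator]
static GLOBAL: Counting = Counting;

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    // Many small boxes: the same kind of churn binary_trees produces.
    let boxes: Vec<Box<u64>> = (0..1000u64).map(Box::new).collect();
    let after = ALLOCS.load(Ordering::Relaxed);
    println!("allocations while building {} boxes: {}", boxes.len(), after - before);
}
```

To compare against jemalloc on a current toolchain, an external crate such as `tikv-jemallocator` can be registered the same way, with its `Jemalloc` type in place of `Counting`.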