4,706 questions
Advice
3
votes
18
replies
445
views
Is a CPU allowed to perform speculative execution on nullptr branch conditions?
A CPU often performs speculative execution on code branches like
if ( a > b)
And discards the result in case of a misprediction.
However, let's consider the following:
int* ptr = nullptr;
//do some work
...
Advice
1
vote
1
replies
63
views
What is the difference between Node and Cell in a NUMA system?
Can someone explain the difference between a Node and a Cell in a NUMA system? They seem to refer to the same thing.
Advice
0
votes
7
replies
135
views
Number of bits used for representing different types vs what the CPU uses
I was thinking about floats and doubles and was wondering if the fact that they use 32 bits and 64 bits respectively means there is a performance increase either in terms of memory used or time for ...
1
vote
0
answers
80
views
PyTorch and NVIDIA FLARE are taking all computing resources in machine learning experiments
I am using PyTorch for federated experiments. Because my experiments involve 50 datasets with models, I have to run multiple ML model experiments in parallel.
The code for training the ML model is ...
7
votes
0
answers
216
views
Why don't partial sums (multiple accumulators) improve performance on M2?
I have a loop that counts the newlines in a string. I played with loop unrolling and saw an improvement in performance. Then I thought that the CPU won't be able to utilize its pipelining ...
0
votes
0
answers
102
views
Why does my Next.js Pages Router app show periodic CPU spikes above container limits after deployment?
The chart shows CPU usage over time for a container running a Next.js 12 Pages Router application.
I’m running a Next.js 12 application using the Pages Router with SSR (getServerSideProps) in production....
Best practices
1
vote
14
replies
6k
views
Integer comparison performance on x86
Do all the integer comparison operators (>=, >, <=, <, ==) have the same performance? If we look at this C++ code for example:
#include <iostream>
int main(int argc, char * argv[])
{
...
1
vote
0
answers
96
views
IntelliJ Ultimate edition V2025.3: "Profiler" does not exist in settings
I have IntelliJ Ultimate edition V2025.3, and "Profiler" does not exist in Settings/Preferences > Build, Execution, Deployment > Java Profiler.
I have tried the below option as well, no ...
Advice
1
vote
2
replies
157
views
How the Computer Handles Interrupts
What is the difference between an interrupt and a context switch?
I understand the concept of an interrupt and how it occurs. However, I'm digging deeper into the topic.
I studied Computer ...
3
votes
1
answer
184
views
How to catch EXCEPTION_PRIV_INSTRUCTION from RDPMC directly in Assembly (and without SEH)?
I'm experimenting with measuring CPU instruction latency and throughput on P and E cores using RDPMC on Windows 11, something like this:
MOV ECX, 0x40000000 ; Instructions Counter
RDPMC ; Read ...
0
votes
1
answer
128
views
Cache Allocation Technology in 13th Generation Core i9 13900E Intel CPU [closed]
I am trying to measure the impact of Cache Allocation Technology on my CPU. However, when I use either lscpu to check whether my CPU supports it, or cpuid -l 0x10, the output is false.
How is this possible?
How ...
1
vote
1
answer
132
views
Randomness instructions vs syscalls [closed]
I've been digging into the idea of "true" randomness, and I've noticed that modern CPUs support instructions for generating randomness. x64 has the RDRAND instruction, while ARM has RNDR (I'm not ...
1
vote
1
answer
126
views
Is CPU multithreading affected by divergence?
Building on this question here
The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big ...
0
votes
1
answer
1k
views
How to handle "Could not initialize NNPACK! Reason: Unsupported hardware" warning in PyTorch / Silero VAD on cloud CPU?
I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...
7
votes
1
answer
252
views
Why are all IMUL µOPs dispatched to Port 1 only (on Haswell), even when multiple IMULs are executed in parallel?
I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, base clock 3.5 GHz, turbo boost up to 3.6 GHz, Hyper-Threading enabled).
My test loop is ...