Disable hyper-threading
- In BIOS
- Some sources mention the noht kernel parameter, but others say it doesn't work.
- echo 0 > /sys/devices/system/cpu/cpuN/online for all N that don't have their own core id in cat /proc/cpuinfo (https://serverfault.com/questions/235825/disable-hyperthreading-from-within-linux-no-access-to-bios)
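The echo 0 > .../online approach can be scripted. A minimal sketch, assuming the usual x86 /proc/cpuinfo layout with processor, physical id and core id fields; the function name ht_siblings is made up for this example:

```shell
#!/bin/sh
# Print every logical CPU whose (physical id, core id) pair was already seen
# on a lower-numbered CPU, i.e. the hyper-threading siblings.
# Optional argument: a file in cpuinfo format (defaults to /proc/cpuinfo).
ht_siblings() {
  awk -F': *' '
    /^processor/   { cpu = $2 }
    /^physical id/ { pkg = $2 }
    /^core id/     { if (seen[pkg ":" $2]++) print cpu }
  ' "${1:-/proc/cpuinfo}"
}

# Usage (as root): take each sibling offline.
# for n in $(ht_siblings); do echo 0 > "/sys/devices/system/cpu/cpu$n/online"; done
```

The setting does not survive a reboot, so it would have to be reapplied before each benchmark session.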
NUMA
The machine only has a single NUMA node, so we don't need to worry about it.
http://stackoverflow.com/questions/11126093/how-do-i-know-if-my-server-has-numa
scala@scalabench:~$ sudo dmesg | grep -i numa
[ 0.000000] No NUMA configuration found
scala@scalabench:~$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
Use cpu sets
Install cset: sudo apt-get install cpuset. (On NUMA machines, cset also handles sets of memory nodes, but we only have one.)
- cset set: create and manipulate CPU sets
- cset proc: manage processes into sets
- cset shield: a convenience command, simpler to use, allows isolating a process
Shielding
- cset shield: shows the current status
- cset shield -c 1-3: creates 3 sets: "root" with all CPUs, "user" with CPUs 1-3 (the "shield"), and "system" with the other CPUs.
  - userspace processes in root are moved to system
- cset shield -k on: moves kernel threads (those that can be moved) from root to system (some kernel threads are specific to a CPU and not moved)
- cset shield -v -s/-u: show shielded / unshielded processes
- cset shield -e cmd -- -cmdArg: execute cmd -cmdArg in the shield
- cset shield -r: reset the shield
References
- https://rt.wiki.kernel.org/index.php/Cpuset_Management_Utility/tutorial
- http://stackoverflow.com/questions/11111852/how-to-shield-a-cpu-from-the-linux-scheduler-prevent-it-scheduling-threads-onto
Use isolated CPUs
NOTE: Using isolated CPUs for running the JVM is not a good idea. The kernel doesn't do any load balancing across isolated CPUs. https://groups.google.com/forum/#!topic/mechanical-sympathy/Tkcd2I6kG-s, https://www.novell.com/support/kb/doc.php?id=7009596. Use cset instead of isolcpus and taskset.
lscpu --all --extended lists CPUs, also logical cores (if hyper-threading is enabled). The CORE column shows the physical core.
Kernel parameter isolcpus=2,3 removes CPUs 2 and 3 from the kernel's scheduler.
- In /etc/default/grub, for example GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=2,3"
- sudo update-grub
Verify
- cat /proc/cmdline
- cat /sys/devices/system/cpu/isolated
- taskset -cp 1: affinity list of process 1
- ps -eww --forest -o pid,ppid,psr,user,stime,args: there should be nothing on isolated cores.
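The kernel prints CPU lists such as the content of /sys/devices/system/cpu/isolated in a compact range format (for example 2-3). A small helper that expands such a list makes the ps check scriptable. This is only a sketch; expand_cpu_list is a hypothetical name, not an existing tool:

```shell
#!/bin/sh
# Expand a kernel CPU list such as "0,2-3" into "0 2 3".
expand_cpu_list() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    [ -n "$lo" ] || continue          # empty list: no isolated CPUs
    seq "$lo" "${hi:-$lo}"            # a single CPU is a range of length 1
  done | tr '\n' ' ' | sed 's/ $//'
}

# Usage: list tasks whose last-used processor (psr) is an isolated CPU.
# Per-CPU kernel threads will probably still show up here.
# for c in $(expand_cpu_list "$(cat /sys/devices/system/cpu/isolated)"); do
#   ps -e -o pid,psr,comm | awk -v c="$c" '$2 == c'
# done
```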
Use taskset -c 2,3 <cmd> to run cmd (and child processes) only on CPUs 2 and 3.
Questions
- Running on fewer cores probably impacts performance as the JVM runs compilation and GC concurrently.
- When using taskset -c 2,3, does the JVM still think the system has 4 cores? Would that be a problem?
$ taskset -c 0,1 ~/scala/scala-2.11.8/bin/scala -e 'println(Runtime.getRuntime().availableProcessors())'
2
$ taskset -c 1 ~/scala/scala-2.11.8/bin/scala -e 'println(Runtime.getRuntime().availableProcessors())'
2
References
- https://haypo.github.io/journey-to-stable-benchmark-system.html
- https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter
Tickless / NOHZ
Disable scheduling-clock interrupts on the CPUs used for benchmarking by adding the nohz_full=2,3 kernel parameter. The tick is only omitted while there's a single runnable task (thread) on the CPU.
Verify
- cat /sys/devices/system/cpu/nohz_full
- dmesg | grep dyntick should show the CPUs
- sudo perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 stress -t 1 -c 1 should show 1 tick (see redhat reference)
  - On my test system (after building a kernel with CONFIG_NO_HZ_FULL), I got numbers between 20 and 90 ticks on the otherwise idle CPU 1. Running on CPU 0, I get ~390 ticks.
- watch -n 1 -d grep LOC /proc/interrupts shows 1 tick per second on CPU 1 when idle
  - Running anything, e.g. stress -t 1 -c 1, on CPU 1 causes more ticks
  - Running the scala REPL on CPU 1 causes more ticks whenever the REPL is not idle
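Instead of watching /proc/interrupts by eye, the tick count of one CPU can be sampled twice and subtracted. A rough sketch with made-up function names; it assumes the CPU columns in /proc/interrupts are numbered without gaps (no offline CPUs in between):

```shell
#!/bin/sh
# Print the local timer interrupt (LOC) count of one CPU.
# $1: CPU number, $2: optional file in /proc/interrupts format.
loc_ticks() {
  # Field 1 is "LOC:", field 2 is CPU0, field 3 is CPU1, and so on.
  awk -v cpu="$1" '$1 == "LOC:" { print $(cpu + 2) }' "${2:-/proc/interrupts}"
}

# Print how many ticks CPU $1 received during $2 seconds (default 1).
ticks_during() {
  before=$(loc_ticks "$1")
  sleep "${2:-1}"
  after=$(loc_ticks "$1")
  echo $((after - before))
}

# Usage: ticks_during 1 10
```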
NOTE: disabling interrupts has some effect on CPU frequency, see https://fosdem.org/2017/schedule/event/python_stable_benchmark/ (24:45). Make sure to use a fixed CPU frequency. I don't have the full picture yet, but it's something like this: the intel_pstate driver is no longer notified and does not update the CPU frequency.
- Therefore: disable intel_pstate when using tickless mode
- https://bugzilla.redhat.com/show_bug.cgi?id=1378529
(Some more advanced stuff in http://www.breakage.org/2013/11, pin some regular tasks to specific CPUs, writeback/cpumask, writeback/numa).
References
- https://haypo.github.io/journey-to-stable-benchmark-system.html
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Performance_Tuning_Guide/sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-CPU-Configuration_suggestions.html
- https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
rcu_nocbs
RCU is a thread synchronization mechanism. RCU callbacks may prevent a cpu from entering adaptive-tick mode (tickless with 0/1 tasks). https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
The rcu_nocbs=2,3 kernel param offloads RCU callback processing from CPUs 2 and 3 to kernel threads, so the callbacks don't have to run on those CPUs.
References
- https://en.wikipedia.org/wiki/Read-copy-update
- Mentioned on http://stackoverflow.com/questions/20133523/how-do-i-get-tickless-kernel-to-work-nohz-full-rcu-nocbs-isolcpus-what-else
- https://fosdem.org/2017/schedule/event/python_stable_benchmark/ (6:30): "I don't know the details, but the idea is that it will not spawn kernel code on this CPU"
- https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
Interrupt handlers
Avoid running interrupt handlers on certain CPUs
- /proc/irq/default_smp_affinity is the default bit mask of CPUs permitted for an interrupt handler
- /proc/irq/N/ contains smp_affinity (bit mask of allowed CPUs) and smp_affinity_list (list of CPUs able to execute the interrupt handler)
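The smp_affinity files take a hexadecimal bit mask in which bit N stands for CPU N. A small sketch for computing such a mask from CPU numbers; cpu_mask is a hypothetical helper name:

```shell
#!/bin/sh
# Build the hexadecimal bit mask for the given CPU numbers,
# e.g. CPUs 0 and 1 give 3, CPUs 2 and 3 give c.
cpu_mask() {
  mask=0
  for cpu in "$@"; do
    mask=$((mask | (1 << cpu)))
  done
  printf '%x\n' "$mask"
}

# Usage (as root): keep interrupt handlers on CPUs 0 and 1, away from
# benchmark CPUs 2 and 3. Writes may fail for IRQs that cannot be moved.
# cpu_mask 0 1 > /proc/irq/default_smp_affinity
# for irq in /proc/irq/[0-9]*; do cpu_mask 0 1 > "$irq/smp_affinity"; done
```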
Verify
cat /proc/interrupts
There's an irqbalance service (systemctl status irqbalance)
- https://www.novell.com/support/kb/doc.php?id=7007602: make sure to disable irqbalance when pinning irq handlers to certain processors
- https://serverfault.com/questions/513807/is-there-still-a-use-for-irqbalance-on-modern-hardware: "You should use irqbalance unless You are manually pinning your apps/IRQ's to specific cores for a very good reason"
References
- https://haypo.github.io/journey-to-stable-benchmark-system.html
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
CPU Frequency
Disable Turbo Boost
- In BIOS
- Or write 1 to /sys/devices/system/cpu/intel_pstate/no_turbo, if using intel_pstate
  - with intel_pstate=disable, find out how to disable turbo boost in the system
There seem to be two linux tools:
- cpufrequtils, with cpufreq-info and cpufreq-set (https://wiki.debian.org/HowTo/CpuFrequencyScaling), used by krun
- cpupower (https://wiki.archlinux.org/index.php/CPU_frequency_scaling) - for debian jessie that only exists in backports
  - It seems cpupower is actively developed and has more features, support for newer cpus (https://bbs.archlinux.org/viewtopic.php?id=135820)
Intel CPUs can run in different P-States (voltage-frequency pairs) when running a process. C-States are idle / power saving states. The intel_pstate driver handles this.
The intel_pstate=disable kernel argument disables the intel_pstate driver and uses acpi-cpufreq instead (see redhat reference).
- sudo apt-get install linux-cpupower (in jessie backports only!)
- cpupower frequency-info and cpupower idle-info to show the active drivers.
CPU Info
- lscpu
- cat /proc/cpuinfo (| grep MHz)
- cpupower frequency-info
- watch -n 1 grep \"cpu MHz\" /proc/cpuinfo
CPUfreq Governors
- List available governors: cpupower frequency-info --governors (Examples: performance, powersave, ...). Should use performance, which keeps the maximal frequency. NOTE: the intel_pstate driver still does dynamic scaling in this mode.
- Check active governors: cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- Set governor: cpupower -c 1-3 frequency-set --governor [governor] (on CPUs 1-3)
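Checking the active governors can be turned into a pre-benchmark assertion. A minimal sketch; governors_ok is a hypothetical function name, and the files to check are passed in explicitly:

```shell
#!/bin/sh
# Succeed only if every given scaling_governor file contains the expected governor.
# $1: expected governor, remaining arguments: files to check.
governors_ok() {
  want=$1
  shift
  for f in "$@"; do
    got=$(cat "$f") || return 1
    if [ "$got" != "$want" ]; then
      echo "$f: $got (expected $want)" >&2
      return 1
    fi
  done
}

# Usage:
# governors_ok performance /sys/devices/system/cpu/cpu[1-3]/cpufreq/scaling_governor
```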
Set a specific frequency: sudo cpupower -c 1-3 frequency-set -f 2400MHz. Use -u for max, -d for min.
- This activates the userspace cpu governor
- Valid frequencies: cpupower frequency-info
- Which frequency? Dmitry suggested to under-clock.
- Does not work with the intel_pstate driver (http://stackoverflow.com/questions/23526671/how-to-solve-the-cpufreqset-errors)
The intel_pstate driver has /sys/devices/system/cpu/intel_pstate/min_perf_pct and max_perf_pct, maybe these can be used if we stick with that driver?
References
- https://software.intel.com/en-us/articles/power-management-states-p-states-c-states-and-package-c-states
- https://haypo.github.io/intel-cpus.html
- https://fosdem.org/2017/schedule/event/python_stable_benchmark/ (18:15)
- https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
- https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/cpufreq_governors.html
- https://wiki.archlinux.org/index.php/CPU_frequency_scaling
Disable git gc
https://stackoverflow.com/questions/28092485/how-to-prevent-garbage-collection-in-git
$ git config --global gc.auto 0
Disable hpet
Suggested by Dmitry, I haven't found any other references.
hpet is a hardware timer with a frequency of at least 10 MHz (higher than older timer circuits).
- Current source: cat /sys/devices/system/clocksource/clocksource0/current_clocksource
- Available sources: cat /sys/devices/system/clocksource/clocksource0/available_clocksource
Change using a kernel parameter clocksource=acpi_pm
Explanation of clock sources: https://access.redhat.com/solutions/18627
Ramdisk
tmpfs vs ramfs
- ramfs: older, cannot set size limit
- tmpfs: can set size limit, may swap if system runs out of memory
- https://www.jamescoyle.net/knowledge/951-the-difference-between-a-tmpfs-and-ramfs-ram-disk
Added to /etc/fstab
tmpfs /mnt/ramdisk tmpfs defaults,size=16g 0 0
Disable "transparent hugepages"
There are some recommendations out there to disable "transparent hugepages", mostly for database servers
- khugepaged process
- https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
- https://docs.oracle.com/database/122/CWLIN/disabling-transparent-hugepages.htm
Disable khungtaskd
Probably not useful, runs every 120 seconds only. Detects hung tasks.
Cron jobs
https://help.ubuntu.com/community/CronHowto
- User crontabs: crontab -e to edit, crontab -l to show
- Show all user crontabs: for user in $(cut -f1 -d: /etc/passwd); do sudo crontab -u $user -l; done. Or make sure that the /var/spool/cron/crontabs directory is empty.
- System crontab: /etc/crontab - should not edit by hand
- /etc/cron.d contains files with system crontab entries
- /etc/cron.hourly / .daily / .monthly / .weekly contain scripts executed from /etc/crontab (or by anacron, if installed)
Disable / enable cron
- systemctl stop cron
- systemctl start cron
Disable / enable at
- systemctl stop atd
- systemctl start atd
Run under perf stat
Suggestion by Dmitry: discard benchmarks with too many cpu-migrations or context-switches. Would need to keep track of expected values.
- sudo perf stat -x, scalac Test.scala (machine-readable output)
- -prof perfnorm in jmh
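The discard rule could look roughly like the sketch below. It assumes the CSV layout value,unit,event,... that perf stat -x, prints without interval or per-CPU options; older perf versions or event modifiers may change the columns or names, so check the output of your perf first. The function names and the limits are placeholders, not measured values:

```shell
#!/bin/sh
# Extract one counter from `perf stat -x,` output.
# $1: event name, $2: file holding the CSV output.
# Assumption: field 1 is the counter value, field 3 the event name.
perf_counter() {
  awk -F, -v ev="$1" '$3 == ev { print $1 }' "$2"
}

# Succeed if the counter exists and is at most the given limit.
# $1: event name, $2: limit, $3: CSV file.
counter_within() {
  value=$(perf_counter "$1" "$3")
  [ -n "$value" ] && [ "$value" -le "$2" ]
}

# Usage:
# sudo perf stat -x, -o stat.csv scalac Test.scala
# counter_within cpu-migrations 10 stat.csv &&
#   counter_within context-switches 500 stat.csv || echo "discard this run"
```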
Build custom kernel
Ah well, probably have to figure out some more details how to do this correctly.
- https://github.com/softdevteam/krun#building-a-tickless-kernel
- https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official
- https://debian-handbook.info/browse/stable/sect.kernel-compilation.html
apt-get install linux-source-4.9
tar xaf /usr/src/linux-source-4.9.tar.xz
apt-get install build-essential fakeroot libncurses5-dev
cd linux-source-4.9
cp /boot/config-4.9.0-0.bpo.2-amd64 .config
make menuconfig
- General setup->Timers subsystem->Timer tick handling -> Full dynticks system (tickless)
- Up one level -> Full dynticks system on all CPUs by default (except CPU 0)
- General setup->Local Version, enter a simple string
nano .config
- comment out CONFIG_SYSTEM_TRUSTED_KEYS
https://unix.stackexchange.com/questions/293642/attempting-to-compile-any-kernel-yields-a-certification-error
make deb-pkg
cd ..
sudo dpkg -i linux-image-4.9.18_4.9.18-1_amd64.deb
Scripting all of that
It seems that python3's "perf" package will do most configurations:
pip3 install perf
python3 -m perf system show
python3 -m perf system tune
python3 -m perf system reset
Important: check all settings before starting a benchmark.
Check load
Find a way to ensure that the benchmark machine is idle before starting a job.
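One possible way, sketched under the assumption that the 1-minute load average in /proc/loadavg is a good enough idleness signal; load_below is a hypothetical helper and the threshold is a placeholder:

```shell
#!/bin/sh
# Succeed if the 1-minute load average is below the threshold.
# $1: threshold, $2: optional file in /proc/loadavg format.
load_below() {
  # The first field of /proc/loadavg is the 1-minute load average.
  awk -v max="$1" '{ exit !($1 < max) }' "${2:-/proc/loadavg}"
}

# Usage: block until the machine has calmed down, then start the job.
# until load_below 0.1; do sleep 10; done
```

The load average reacts slowly, so a short burst of activity right before the job starts would not be caught by this check.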
Machine Specs
NX236-S2HD (http://www.nixsys.com/nx236-s2hd.html)
- Motherboard: X11SSZ https://www.supermicro.com/manuals/motherboard/C236/MNL-1744.pdf
- Intel C236 Chipset
- Intel Core i7-6700 (4 Core, 8 Thread)
- 64GB (4x 16GB) DDR4 PC17000 (2133MHz)
- WD Black 2TB (WD2003FZEX)
- Samsung 850 Pro 512GB