Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  [[email protected]]  [@jschauma]  [RSS]

Stacking Threads

October 29th, 2025

[Diagram: a process's layout in memory as a vertical rectangle with the high address at the top, the stack growing down towards the heap, and the data and text segments at the low address.]

In my Advanced Programming in the UNIX Environment class, we discuss the layout of a Unix process in memory with the aid of a diagram like the one to the right, illustrating the location of the different segments. You've no doubt seen similar ones before.

During our discussion, a question relating to multi-threaded applications came up. In such an application, each thread gets its own stack, but within the same process space as the main program. The question was where those stacks are placed, and whether or not they would be located at predictable offsets.

Now normally, I'd answer questions about threads with a link to shouldiusethreads.com, but I figured "What the hell, let's explore this." My initial take was that, assuming the use of Address Space Layout Randomization (ASLR), the location of each thread's stack ought to be non-predictable. But of course the answer is never quite that easy.

Locating thread stacks

As we've done before, let's print the address of a local variable in each thread to estimate the location of its function frame. Since each thread still runs within the main process's space, we expect those stacks to be below main.
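The source of thread-stacks1.c isn't reproduced here, but the idea can be sketched roughly as follows. This is a hypothetical minimal version, not the program that produced the output below: the names record_frame, collect_stack_addresses, print_stack_addresses and NTHREADS are made up for illustration, and it skips the guard and stack size reporting.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS 3  /* assumption: three threads, as in the sample output */

/* Each thread records the address of one of its own local variables,
 * which lies within that thread's current stack frame. */
static void *
record_frame(void *arg)
{
    int local = 0;

    *(uintptr_t *)arg = (uintptr_t)&local;
    return NULL;
}

/* Start NTHREADS threads and store one stack address per thread in
 * 'addrs'.  Returns 0 on success, or the pthread_create error code. */
int
collect_stack_addresses(uintptr_t addrs[NTHREADS])
{
    pthread_t tids[NTHREADS];
    int i, rc;

    for (i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&tids[i], NULL, record_frame, &addrs[i]);
        if (rc != 0) {
            /* Clean up the threads we already started. */
            while (i-- > 0)
                (void)pthread_join(tids[i], NULL);
            return rc;
        }
    }
    /* Join only after all threads were created, so no stack gets
     * handed out twice while we are still measuring. */
    for (i = 0; i < NTHREADS; i++)
        (void)pthread_join(tids[i], NULL);
    return 0;
}

/* Print each address and the distance between neighbouring threads. */
void
print_stack_addresses(const uintptr_t addrs[NTHREADS])
{
    int i;

    for (i = 0; i < NTHREADS; i++)
        printf("Thread %d is at 0x%jx.\n", i, (uintmax_t)addrs[i]);
    for (i = 0; i + 1 < NTHREADS; i++)
        printf("Thread %d - Thread %d: %jd bytes\n", i, i + 1,
            (intmax_t)addrs[i] - (intmax_t)addrs[i + 1]);
}
```

Calling collect_stack_addresses() and then print_stack_addresses() from main() should give lines in the spirit of the "Thread N is at" output; depending on your platform, you may need to link with -lpthread.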

$ uname -mrsp
NetBSD 10.1 evbarm aarch64
$ clang -Wall -Wextra thread-stacks1.c  -lpthread -lm
$ ./a.out
argc at 0xfffffffc4b58
argv at 0xfffffffc4b50
envp at 0xfffffffc4b48
main at 0xfffffffc4b28

Guard size:   65536
Stack size: 8388608

Thread 1 is at 0xf0ef84bdffc0.
Thread 1 stack size: 8388608.
Thread 0 is at 0xf0ef85bfffc0.
Thread 0 stack size: 8388608.
Thread 2 is at 0xf0ef853effc0.
Thread 2 stack size: 8388608.

Stack address differences between threads:
Thread 0 - Thread 1: 8454144 bytes
Thread 1 - Thread 2: 8454144 bytes
$ 

Running this multiple times in succession on a NetBSD/amd64 10.0 system shows that, while the initial thread is placed at an unpredictable location, each subsequent thread always gets placed at a fixed offset below the previous thread. That offset can be calculated as the sum of the stack size (see ulimit -s / RLIMIT_STACK; here: 8388608) and the size of the stack guard (see sysctl vm.thread_guard_size; here: 65536), adding up to 8454144 in this example.
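That arithmetic can be written down as a small helper. The sketch below is an illustration only: expected_stride and default_stride are hypothetical names, and it assumes the default thread attributes report the sizes actually used for new threads, which need not match RLIMIT_STACK on every system.

```c
#include <pthread.h>
#include <stddef.h>

/* Distance we expect between two neighbouring thread stacks if the
 * only thing separating them is one guard region per stack. */
size_t
expected_stride(size_t stack_size, size_t guard_size)
{
    return stack_size + guard_size;
}

/* Ask the default thread attributes for the stack and guard sizes and
 * compute the stride from them.  Returns 0 on success, otherwise the
 * error code of the failing pthread_attr_* call. */
int
default_stride(size_t *stride)
{
    pthread_attr_t attr;
    size_t stack = 0, guard = 0;
    int rc;

    if ((rc = pthread_attr_init(&attr)) != 0)
        return rc;
    rc = pthread_attr_getstacksize(&attr, &stack);
    if (rc == 0)
        rc = pthread_attr_getguardsize(&attr, &guard);
    (void)pthread_attr_destroy(&attr);
    if (rc == 0)
        *stride = expected_stride(stack, guard);
    return rc;
}
```

With the NetBSD numbers from above, expected_stride(8388608, 65536) yields 8454144. Note that this only models the simple "stack plus guard" case; it says nothing about systems that add extra gaps or randomize placement.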

Comparing this to other operating systems, we find:

  • On a Linux/x86_64 5.15.0 system, this looks very similar, with the predictable offset also equal to the sum of the stack size and the guard page size; on a different Linux/x86_64 6.8.0 system, the offset is larger than that, but still a fixed size. [1]
  • macOS 15.7.1 shows a fixed offset as well (although it looks like there the thread stack size is 0.5MB, i.e., smaller than RLIMIT_STACK) and the thread stacks are located above the main stack.
  • OpenBSD 7.7 and OmniOS r151044 appear to apply ASLR for this placement as well and locate the thread stacks at unpredictable, inconsistent offsets from one another.
  • FreeBSD 15.0 BETA places the thread stacks at unpredictable offsets from one another, but, like macOS, above the main stack.
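Whether a set of observed addresses shows a "fixed offset" can be checked mechanically. The following is a small sketch (has_fixed_stride is a hypothetical helper, not part of the test programs) that takes the per-thread addresses in thread order and reports whether all neighbouring distances are equal.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the common distance in '*stride' if every pair
 * of neighbouring addresses is the same distance apart; returns 0
 * otherwise (or if fewer than two addresses were given).  A negative
 * stride means later threads sit at higher addresses. */
int
has_fixed_stride(const uint64_t *addrs, size_t n, int64_t *stride)
{
    int64_t first;
    size_t i;

    if (n < 2)
        return 0;
    first = (int64_t)(addrs[0] - addrs[1]);
    for (i = 1; i + 1 < n; i++) {
        if ((int64_t)(addrs[i] - addrs[i + 1]) != first)
            return 0;
    }
    *stride = first;
    return 1;
}
```

Fed with the three thread addresses from one of the runs shown in this post, this reports a stride of 8392704 for the Linux 5.15.0 run and -573440 for the macOS run, and no fixed stride for the OpenBSD run. A single run can of course only hint at the pattern; the claim about predictability rests on repeating the experiment.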

These layout choices can then be visualized as shown below:

$ uname -srm
Linux 5.15.0-119-generic x86_64
$ ./a.out
argc at 0x7ffc033042d8
argv at 0x7ffc033042d0
envp at 0x7ffc033042c8
main at 0x7ffc033042a0

Guard size:    4096
Stack size: 8388608

Thread 0 is at 0x7fde47a9be40.
Thread 0 stack size: 8388608.
Thread 1 is at 0x7fde4729ae40.
Thread 1 stack size: 8388608.
Thread 2 is at 0x7fde46a99e40.
Thread 2 stack size: 8388608.

Stack address differences between threads:
Thread 0 - Thread 1: 8392704 bytes
Thread 1 - Thread 2: 8392704 bytes

[Diagram: a process's layout in memory as a vertical rectangle with the high address at the top, the stack growing down, and three thread stacks adjacent to each other.]

$ uname -srm
Linux 6.8.0-53-generic x86_64
$ ./a.out
argc at 0x7fffa00cd858
argv at 0x7fffa00cd850
envp at 0x7fffa00cd848
main at 0x7fffa00cd820

Guard size:    4096
Stack size: 8388608

Thread 0 is at 0x7316083ffeb0.
Thread 0 stack size: 8388608.
Thread 1 is at 0x7316079ffeb0.
Thread 1 stack size: 8388608.
Thread 2 is at 0x731606fffeb0.
Thread 2 stack size: 8388608.

Stack address differences between threads:
Thread 0 - Thread 1: 10485760 bytes
Thread 1 - Thread 2: 10485760 bytes

[Diagram: a process's layout in memory as a vertical rectangle with the high address at the top, the stack growing down, and three thread stacks adjacent to each other, but with a fixed-size gap.]

$ uname -orm
Darwin 24.6.0 arm64
$ ./a.out 
argc at 0x16ee72f40
argv at 0x16ee72f38
envp at 0x16ee72f30
main at 0x16ee72f68

Guard size:   16384
Stack size: 8372224

Thread 1 is at 0x16ef86fb0.
Thread 1 stack size: 536576.
Thread 0 is at 0x16eefafb0.
Thread 0 stack size: 536576.
Thread 2 is at 0x16f012fb0.
Thread 2 stack size: 536576.

Stack address differences between threads:
Thread 0 - Thread 1: -573440 bytes
Thread 1 - Thread 2: -573440 bytes

[Diagram: a process's layout in memory as a vertical rectangle with the high address at the top, the stack growing down, and three thread stacks adjacent to each other, but above the main stack.]

$ uname -mrsp
OpenBSD 7.7 arm64 aarch64
$ ./a.out
argc at 0x61c1e61fa8
argv at 0x61c1e61fa0
envp at 0x61c1e61f98
main at 0x61c1e61f70

Guard size:    4096
Stack size: 4194304

Thread 0 is at 0x1f919a6fc0.
Thread 1 is at 0x1f5a3f5e60.
Thread 0 stack size: 524288.
Thread 1 stack size: 524288.
Thread 2 is at 0x1f459cfb70.
Thread 2 stack size: 524288.

Stack address differences between threads:
Thread 0 - Thread 1: 928715104 bytes
Thread 1 - Thread 2: 346186480 bytes

[Diagram: a process's layout in memory as a vertical rectangle with the high address at the top, the stack growing down, and three thread stacks placed at unpredictable offsets.]

$ uname -mrsp
FreeBSD 15.0-BETA2 arm64 aarch64
$ ./a.out
argc at 0x80b7da78
argv at 0x80b7da70
envp at 0x80b7da68
main at 0x80b7da48

Guard size:       4096
Stack size: 1073741824

Thread 2 is at 0x8636cfa0.
Thread 2 stack size: 2097152.
Thread 1 is at 0x85f4ffa0.
Thread 1 stack size: 2097152.
Thread 0 is at 0x84981fa0.
Thread 0 stack size: 2097152.

Stack address differences between threads:
Thread 0 - Thread 1: -22863872 bytes
Thread 1 - Thread 2: -4313088 bytes

[Diagram: a process's layout in memory as a vertical rectangle with the high address at the top, the stack growing down, and three thread stacks at unpredictable offsets above the main stack.]

If you've noticed that some of the addresses there don't add up quite as neatly as the illustrations show, well, we glossed over a few details here to illustrate the point. Depending on your OS, you can inspect all the details via, e.g., /proc/self/maps or (on macOS) vmmap(1).
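If you'd rather look up a specific address from within the program, the maps file can also be parsed directly. Here is a minimal sketch, assuming a Linux-style /proc/self/maps where each line starts with a hexadecimal "start-end" range; find_mapping is a hypothetical helper, and on systems without that file it simply reports failure.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Look for the /proc/self/maps entry that contains 'addr' and leave
 * that line in 'line'.  Returns 1 if a mapping was found, 0 if not,
 * and -1 if the maps file could not be opened. */
int
find_mapping(const void *addr, char *line, size_t len)
{
    uint64_t target = (uint64_t)(uintptr_t)addr;
    uint64_t lo, hi;
    int found = 0;
    FILE *fp;

    if (line == NULL || len < 2)
        return 0;
    if ((fp = fopen("/proc/self/maps", "r")) == NULL)
        return -1;
    while (fgets(line, (int)len, fp) != NULL) {
        /* Each entry begins with "start-end" in hex; 'end' is exclusive. */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64, &lo, &hi) == 2 &&
            target >= lo && target < hi) {
            found = 1;
            break;
        }
    }
    (void)fclose(fp);
    return found;
}
```

Passing the address of a local variable from main() or from a thread function and printing the resulting line shows which mapping that stack lives in, including its exact start and end addresses.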

Thread Local Storage

Looking at the placement of the thread stacks, another question that arises is where each thread saves its registers (stack pointer, program counter, link register, what have you). Due to context switching, all the details of the thread must be saved somewhere, and we might have any number of threads at runtime, but as we saw above, sometimes the thread stacks are right below one another -- so where do those details go?

To investigate this, we are looking for the Thread Control Block (TCB), which is dynamically allocated at thread creation time together with the Thread Local Storage (TLS) and the space for the stack. The address of the TLS (no, not that one) is found in the thread pointer register (%fs on x86_64, tpidr_el0 on arm64); if you want to inspect it, you'll have to use inline assembly to pull the address from the register. (Combined with the use of some non-portable functions to get a thread's address, the resulting code shows why writing cross-platform code using POSIX threads is not much fun.)
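Reading the thread pointer might look roughly like the sketch below. It is not the code from tcb.c: thread_pointer is a hypothetical helper, it uses GCC/Clang-style inline assembly, and it only covers Linux on x86_64 (assuming the C library keeps a self-pointer at %fs:0, as glibc and musl do) and non-Apple arm64; everywhere else it returns NULL rather than guess.

```c
#include <stddef.h>

/* Return the calling thread's thread pointer, or NULL on platforms
 * this sketch does not know how to handle. */
void *
thread_pointer(void)
{
    void *tp = NULL;

#if defined(__x86_64__) && defined(__linux__)
    /* Assumption: the first word of the TCB points to the TCB itself,
     * so loading %fs:0 yields the thread pointer. */
    __asm__ volatile ("movq %%fs:0, %0" : "=r" (tp));
#elif defined(__aarch64__) && !defined(__APPLE__)
    /* tpidr_el0 is the user-space thread pointer register. */
    __asm__ volatile ("mrs %0, tpidr_el0" : "=r" (tp));
#endif
    return tp;
}
```

Called from several threads, this should return a different value in each, and the same value for repeated calls within one thread. The per-platform #if ladder is a small taste of the portability pain mentioned above.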

Let's again see what this looks like on the different platforms:

$ uname -srmp
NetBSD 10.1 evbarm aarch64
$ clang -Wall -Wextra tcb.c -lpthread
$ ./a.out

[Main]
  argc at             : 0xffffffb8cd58
  argv at             : 0xffffffb8cd50
  envp at             : 0xffffffb8cd48
  Main stack start    : 0xffffffb8cd40
  Malloc'd pointer p2 : 0xf9d765d82400
  Malloc'd pointer p1 : 0xf9d765d82000

[Thread 01]
  T01 TCB address     : 0xf9d765960000
  T01 TCB/stack offset: +9830400
  T01 Stack start     : 0xf9d765000000
  T01 Thread arg      : 0xf9d764ffffc8
  T01 Thread local var: 0xf9d764ffffc4
  T01 Stack size      : 8388608 bytes
  T01 Stack low addr  : 0xf9d764800000

[Thread 02]
  T02 TCB address     : 0xf9d765962000
  T02 TCB/stack offset: +18292736
  T02 Stack start     : 0xf9d7647f0000
  T02 Thread arg      : 0xf9d7647effc8
  T02 Thread local var: 0xf9d7647effc4
  T02 Stack size      : 8388608 bytes
  T02 Stack low addr  : 0xf9d763ff0000

[...]

[Diagram: a process's layout in memory showing the location of the TCB below the heap and the thread stack frames below that.]

$ uname -srm
Linux 5.15.0-119-generic x86_64
$ ./a.out

[Main]
  argc at             : 0x7ffec360f1b8
  argv at             : 0x7ffec360f1b0
  envp at             : 0x7ffec360f1a8
  Main stack start    : 0x7ffec360f1a0
  Malloc'd pointer p2 : 0x559a43d732c0
  Malloc'd pointer p1 : 0x559a43d712b0

[Thread 2]
  T02 TCB address     : 0x7f19a92fa640
  T02 TCB/stack offset: -2496
  T02 Stack start     : 0x7f19a92fb000
  T02 Thread arg      : 0x7f19a92f9e20
  T02 Thread local var: 0x7f19a92f9e1c
  T02 Stack size      : 8388608 bytes
  T02 Stack low addr  : 0x7f19a8afb000

[Thread 1]
  T01 TCB address     : 0x7f19a9afb640
  T01 TCB/stack offset: -2496
  T01 Stack start     : 0x7f19a9afc000
  T01 Thread arg      : 0x7f19a9afae20
  T01 Thread local var: 0x7f19a9afae1c
  T01 Stack size      : 8388608 bytes
  T01 Stack low addr  : 0x7f19a92fc000

[...]

[Diagram: a process's layout in memory showing the location of the TCB on each thread's stack, adjacent to one another and above the heap.]

$ uname -srm
Linux 6.8.0-53-generic x86_64
$ ./a.out

[Main]
  argc at             : 0x7ffcaec786e8
  argv at             : 0x7ffcaec786e0
  envp at             : 0x7ffcaec786d8
  Main stack start    : 0x7ffcaec786d0
  Malloc'd pointer p2 : 0x61d57a7892c0
  Malloc'd pointer p1 : 0x61d57a7872b0

[Thread 1]
  T01 TCB address     : 0x7555ef2006c0
  T01 TCB/stack offset: -2368
  T01 Stack start     : 0x7555ef201000
  T01 Thread arg      : 0x7555ef1ffe90
  T01 Thread local var: 0x7555ef1ffe8c
  T01 Stack size      : 8388608 bytes
  T01 Stack low addr  : 0x7555eea01000

[Thread 2]
  T02 TCB address     : 0x7555ee8006c0
  T02 TCB/stack offset: -2368
  T02 Stack start     : 0x7555ee801000
  T02 Thread arg      : 0x7555ee7ffe90
  T02 Thread local var: 0x7555ee7ffe8c
  T02 Stack size      : 8388608 bytes
  T02 Stack low addr  : 0x7555ee001000
[...]

[Diagram: a process's layout in memory showing the location of the TCB on each thread's stack, adjacent to one another and above the heap.]

$ uname -orm
Darwin 24.6.0 arm64
$ ./a.out

[Main]
  argc at             : 0x00016f2c6ee0
  argv at             : 0x00016f2c6ed8
  envp at             : 0x00016f2c6ed0
  Main stack start    : 0x00016f2c6ec8
  Malloc'd pointer p2 : 0x00014980cc00
  Malloc'd pointer p1 : 0x00014980c800

[Thread 1]
  T01 TCB address     : 0x00016f34f0e0
  T01 TCB/stack offset: +224
  T01 Stack start     : 0x00016f34f000
  T01 Thread arg      : 0x00016f34ef60
  T01 Thread local var: 0x00016f34ef5c
  T01 Stack size      : 536576 bytes
  T01 Stack low addr  : 0x00016f2cc000

[Thread 2]
  T02 TCB address     : 0x00016f3db0e0
  T02 TCB/stack offset: +224
  T02 Stack start     : 0x00016f3db000
  T02 Thread arg      : 0x00016f3daf60
  T02 Thread local var: 0x00016f3daf5c
  T02 Stack size      : 536576 bytes
  T02 Stack low addr  : 0x00016f358000
[...]

[Diagram: a process's layout in memory showing the location of the TCB on each thread's stack, atop the main stack frame and above the heap.]

$ uname -mrsp
OpenBSD 7.7 arm64 aarch64
$ ./a.out

[Main]
  argc at             : 0x006d954a03f8
  argv at             : 0x006d954a03f0
  envp at             : 0x006d954a03e8
  Main stack start    : 0x006d954a03e0
  Malloc'd pointer p2 : 0x000d3e655c00
  Malloc'd pointer p1 : 0x000d3e655400

[Thread 1]
  T01 TCB address     : 0x000c78e488f8
  T01 TCB/stack offset: -194971400
  T01 Stack start     : 0x000c84839000
  T01 Thread arg      : 0x000c84838a78
  T01 Thread local var: 0x000c84838a74
  T01 Stack size      : 524288 bytes
  T01 Stack low addr  : 0x000c847b9000

[Thread 2]
  T02 TCB address     : 0x000d05126af8
  T02 TCB/stack offset: -407684360
  T02 Stack start     : 0x000d1d5f3000
  T02 Thread arg      : 0x000d1d5f27c8
  T02 Thread local var: 0x000d1d5f27c4
  T02 Stack size      : 524288 bytes
  T02 Stack low addr  : 0x000d1d573000
[...]

[Diagram: a process's layout in memory showing the location of the TCB and thread stacks randomly above and below the heap.]

$ uname -mrsp
FreeBSD 15.0-BETA2 arm64 aarch64
$ ./a.out

[Main]
  argc at             : 0x000080dbe288
  argv at             : 0x000080dbe280
  envp at             : 0x000080dbe278
  Main stack start    : 0x000080dbe270
  Malloc'd pointer p2 : 0x2241fc214400
  Malloc'd pointer p1 : 0x2241fc214000

[Thread 6]
  T06 TCB address     : 0x2241fbe39010
  T06 TCB/stack offset: +37664523825168
  T06 Stack start     : 0x000087536000
  T06 Thread arg      : 0x000087535fa8
  T06 Thread local var: 0x000087535fa4
  T06 Stack size      : 2097152 bytes
  T06 Stack low addr  : 0x000087336000

[Thread 1]
  T01 TCB address     : 0x2241fbe25010
  T01 TCB/stack offset: +37664588726288
  T01 Stack start     : 0x00008373d000
  T01 Thread arg      : 0x00008373cfa8
  T01 Thread local var: 0x00008373cfa4
  T01 Stack size      : 2097152 bytes
  T01 Stack low addr  : 0x00008353d000
[...]

[Diagram: a process's layout in memory showing the location of the TCB in order above the randomly placed thread stacks, but below the heap.]

$ uname -msp
SunOS i86pc i386
$ . /etc/os-release
$ echo ${PRETTY_NAME}
OmniOS Community Edition v11 r151044
$ ./a.out

[Main]
  argc at             : 0xfffffc7fffdfcf88
  argv at             : 0xfffffc7fffdfcf80
  envp at             : 0xfffffc7fffdfcf78
  Main stack start    : 0xfffffc7fffdfcf70
  Malloc'd pointer p2 : 0x000000886840
  Malloc'd pointer p1 : 0x000000886430

[Thread 1]
  T01 TCB address     : 0xfffffc7feea10240
  T01 TCB/stack offset: +9286208
  T01 Stack start     : 0xfffffc7fee135000
  T01 Thread arg      : 0xfffffc7fee134f80
  T01 Thread local var: 0xfffffc7fee134f7c
  T01 Stack size      : 2088960 bytes
  T01 Stack low addr  : 0xfffffc7fedf37000

[Thread 8]
  T08 TCB address     : 0xfffffc7feea13a40
  T08 TCB/stack offset: +12732992
  T08 Stack start     : 0xfffffc7feddef000
  T08 Thread arg      : 0xfffffc7feddeef80
  T08 Thread local var: 0xfffffc7feddeef7c
  T08 Stack size      : 2088960 bytes
  T08 Stack low addr  : 0xfffffc7fedbf1000
[...]

[Diagram: a process's layout in memory showing the location of the TCB in order above the randomly placed thread stacks.]

It may be easier for you to view the relative placement of each element by running the command as ./a.out | sort -r -k2 -t: | grep 0x.

Observations of interest:

  • NetBSD appears to place 4 TCBs underneath one another in one location, then place another 4 at a randomized location below that, then place the first thread's stack at a randomized location below all TCBs, then place all subsequent thread stacks underneath that, one next to the other.
  • Linux places the TCB into the thread's stack, shortly below the thread stack's high address, above the thread's argument and the local variables.
  • macOS places the TCB at 224 bytes above the thread stack's high address.
  • OpenBSD completely randomizes the placement of both the TCBs and the thread stacks, with some TCBs and some thread stacks ending up above the heap, and some below.
  • FreeBSD (on arm64, anyway) places the heap way above the main stack, the TCBs growing upwards below the heap, and the thread stacks below that, but still above the main stack.
  • OmniOS places all TCBs underneath one another, but then randomizes the placement of each thread's stack. In addition, OmniOS may reuse a given thread's stack location if the thread that was placed there first has already terminated by the time another thread is spawned.
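The "TCB/stack offset" and "Stack low addr" values in the listings above are consistent with simple address arithmetic, which the following sketch reproduces. The helper names are made up for illustration, and treating the offset as "TCB address minus stack start" is an inference from the printed numbers, not something taken from tcb.c.

```c
#include <stdint.h>

/* Signed distance from the stack's high address ("Stack start") to
 * the TCB; negative means the TCB lies below the stack start. */
int64_t
tcb_stack_offset(uint64_t tcb, uint64_t stack_start)
{
    return (int64_t)(tcb - stack_start);
}

/* Lowest address of a downward-growing stack. */
uint64_t
stack_low_addr(uint64_t stack_start, uint64_t stack_size)
{
    return stack_start - stack_size;
}

/* Returns 1 if the TCB lies within [stack low, stack start), i.e.
 * inside the thread's own stack region, and 0 otherwise. */
int
tcb_inside_stack(uint64_t tcb, uint64_t stack_start, uint64_t stack_size)
{
    return tcb >= stack_low_addr(stack_start, stack_size) &&
        tcb < stack_start;
}
```

For the Linux 5.15.0 thread 2 values (TCB 0x7f19a92fa640, stack start 0x7f19a92fb000, size 8388608), this gives an offset of -2496 with the TCB inside the stack region; for macOS thread 1 (TCB 0x16f34f0e0, stack start 0x16f34f000), the offset is +224 and the TCB sits just outside it.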

What else?

As we've seen here, the layout of virtual memory for a given process really can be quite a bit different from the simplified illustrations we use when explaining core concepts. It's useful to periodically be reminded of the fact that, as so often in Computer Science, we're dealing with layers of abstraction, and that the implementations of such abstractions may well vary from operating system to operating system.

While digging into all of this, I noticed a few other angles worth investigating and explaining, including the way arguments are passed and how to better understand shared memory. But this blog post is already too long, so I'll get back to those topics another time.

[Image: a cropped section of Dali's 'Soft Monster in Angelic Landscape']



Footnotes:

[1] I'm not sure what causes the observed difference; it could have to do with a change in the stack_guard_gap between the two kernel versions, or with the use of a shadow stack in the newer kernel, or simply with some alignment of the stacks, but honestly, I'm really just guessing here.

