<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Digital Cabin - linux</title>
<link rel="alternate" href="https://cabin.digital"/>
<link rel="self" href="https://cabin.digital/tags/linux.xml"/>
<id>https://cabin.digital/</id>
<updated>2025-06-17T23:00:00Z</updated>
<generator uri="https://cabin.digital/golog.html" version="1.10.1beta">golog</generator>
<subtitle>Comfy cabin in a rough digital ocean</subtitle>
<author>
<name>dweller</name>
</author>
<category term="sh"/>
<category term="bench"/>
<category term="programming"/>
<category term="programing"/>
<category term="dev"/>
<category term="c"/>
<category term="linux"/>
<category term="x86"/>
<category term="asm"/>
<entry>
<id>https://cabin.digital/tags/linux.xml:::https://cabin.digital/log/small_elfs.html</id>
<link rel="alternate" href="https://cabin.digital/log/small_elfs.html"/>
<title>Small ELFs</title>
<updated>2025-06-17T23:00:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programing"/>
<category term="dev"/>
<category term="c"/>
<category term="linux"/>
<category term="x86"/>
<category term="asm"/>
<category term="sh"/>
<category term="bench"/>
<content type="html"><![CDATA[<p>In a <a href="/log/flatbin.html">previous</a> log I mentioned that I was trying to make small
<a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF</a>s. Well, that also got me wondering
how big an average ELF is. So I spent too much time writing some sh(1) and awk(1) scripts, and
this is the result:</p>

<table class="scroll-indicator">
<thead>
<tr>
<th>lang</th>
<th>size</th>
<th>rundeps</th>
<th>↓total</th>
<th>type</th>
<th>arch</th>
<th>linkage</th>
<th>debug?</th>
<th>strip?</th>
<th>buildT</th>
<th>runT/100</th>
</tr>
</thead>
<tbody><tr>
<td>nasm-elf32-s</td>
<td>134</td>
<td>0</td>
<td>134</td>
<td>ELF32l</td>
<td>Intel 80386</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>6.05ms</td>
<td>134.12µs</td>
</tr>
<tr>
<td>nasm-elf32</td>
<td>141</td>
<td>0</td>
<td>141</td>
<td>ELF32l</td>
<td>Intel 80386</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>7.99ms</td>
<td>165.81µs</td>
</tr>
<tr>
<td>nasm-elf-s</td>
<td>165</td>
<td>0</td>
<td>165</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>14.65ms</td>
<td>131.77µs</td>
</tr>
<tr>
<td>nasm-elf</td>
<td>173</td>
<td>0</td>
<td>173</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>7.66ms</td>
<td>150.67µs</td>
</tr>
<tr>
<td>jmpld</td>
<td>56</td>
<td>308</td>
<td>364</td>
<td>bin</td>
<td>jmpld</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>19.09ms</td>
<td>144.06µs</td>
</tr>
<tr>
<td>bld</td>
<td>54</td>
<td>493</td>
<td>547</td>
<td>bin</td>
<td>bld</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>9.28ms</td>
<td>233.59µs</td>
</tr>
<tr>
<td>nasm</td>
<td>8.30KB</td>
<td>0</td>
<td>8.30KB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>14.62ms</td>
<td>139.83µs</td>
</tr>
<tr>
<td>c-gcc-nlibc</td>
<td>8.72KB</td>
<td>0</td>
<td>8.72KB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>50.08ms</td>
<td>180.19µs</td>
</tr>
<tr>
<td>c-musl-st</td>
<td>13.45KB</td>
<td>0</td>
<td>13.45KB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>41.59ms</td>
<td>190.95µs</td>
</tr>
<tr>
<td>jmpld-c</td>
<td>56</td>
<td>17.45KB</td>
<td>17.50KB</td>
<td>bin</td>
<td>jmpld</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>83.34ms</td>
<td>164.79µs</td>
</tr>
<tr>
<td>c-gcc-st</td>
<td>648.45KB</td>
<td>0</td>
<td>648.45KB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>109.54ms</td>
<td>304.16µs</td>
</tr>
<tr>
<td>c-clang-st</td>
<td>656.55KB</td>
<td>0</td>
<td>656.55KB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>141.96ms</td>
<td>301.99µs</td>
</tr>
<tr>
<td>oksh-musl-st</td>
<td>42</td>
<td>1010.13KB</td>
<td>1010.17KB</td>
<td>script</td>
<td>oksh</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>4.07ms</td>
<td>571.29µs</td>
</tr>
<tr>
<td>go-s</td>
<td>1.36MB</td>
<td>0</td>
<td>1.36MB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>206.59ms</td>
<td>1.08ms</td>
</tr>
<tr>
<td>c++gcc-st</td>
<td>2.10MB</td>
<td>0</td>
<td>2.10MB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>707.85ms</td>
<td>560.04µs</td>
</tr>
<tr>
<td>go</td>
<td>2.10MB</td>
<td>0</td>
<td>2.10MB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>yes</td>
<td>no</td>
<td>257.03ms</td>
<td>1.07ms</td>
</tr>
<tr>
<td>c++clang-st</td>
<td>2.14MB</td>
<td>0</td>
<td>2.14MB</td>
<td>ELF64l</td>
<td>x86-64</td>
<td>static</td>
<td>no</td>
<td>yes</td>
<td>722.34ms</td>
<td>563.63µs</td>
</tr>
<tr>
<td>rust-nstd-s</td>
<td>14.05KB</td>
<td>13.45MB</td>
<td>13.46MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>163.81ms</td>
<td>598.39µs</td>
</tr>
<tr>
<td>c-gcc-s-lto</td>
<td>14.14KB</td>
<td>13.45MB</td>
<td>13.47MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>104.36ms</td>
<td>479.64µs</td>
</tr>
<tr>
<td>c-gcc-s</td>
<td>14.14KB</td>
<td>13.45MB</td>
<td>13.47MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>66.75ms</td>
<td>472.23µs</td>
</tr>
<tr>
<td>c-clang-s</td>
<td>14.16KB</td>
<td>13.45MB</td>
<td>13.47MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>95.86ms</td>
<td>458.92µs</td>
</tr>
<tr>
<td>rust-nstd</td>
<td>19.70KB</td>
<td>13.45MB</td>
<td>13.47MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>82.25ms</td>
<td>457.37µs</td>
</tr>
<tr>
<td>c-gcc</td>
<td>19.88KB</td>
<td>13.45MB</td>
<td>13.47MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>63.22ms</td>
<td>621.80µs</td>
</tr>
<tr>
<td>c-clang</td>
<td>19.89KB</td>
<td>13.45MB</td>
<td>13.47MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>90.55ms</td>
<td>554.57µs</td>
</tr>
<tr>
<td>c-gcc-g</td>
<td>48.71KB</td>
<td>13.45MB</td>
<td>13.50MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>77.53ms</td>
<td>511.37µs</td>
</tr>
<tr>
<td>sh-unix</td>
<td>23</td>
<td>13.57MB</td>
<td>13.57MB</td>
<td>script</td>
<td>dash</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>3.11ms</td>
<td>729.54µs</td>
</tr>
<tr>
<td>sh</td>
<td>34</td>
<td>13.57MB</td>
<td>13.57MB</td>
<td>script</td>
<td>dash</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>2.71ms</td>
<td>721.37µs</td>
</tr>
<tr>
<td>rust-s</td>
<td>312.67KB</td>
<td>13.63MB</td>
<td>13.93MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>2.97s</td>
<td>658.44µs</td>
</tr>
<tr>
<td>rust-s-nlto</td>
<td>360.77KB</td>
<td>13.63MB</td>
<td>13.98MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>160.25ms</td>
<td>753.09µs</td>
</tr>
<tr>
<td>rust</td>
<td>436.70KB</td>
<td>13.63MB</td>
<td>14.05MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>139.50ms</td>
<td>797.12µs</td>
</tr>
<tr>
<td>c++gcc-s</td>
<td>14.15KB</td>
<td>20.21MB</td>
<td>20.22MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>549.65ms</td>
<td>1.22ms</td>
</tr>
<tr>
<td>c++clang-s</td>
<td>14.19KB</td>
<td>20.21MB</td>
<td>20.22MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>no</td>
<td>yes</td>
<td>612.06ms</td>
<td>1.15ms</td>
</tr>
<tr>
<td>c++clang</td>
<td>20.21KB</td>
<td>20.21MB</td>
<td>20.23MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>558.83ms</td>
<td>1.14ms</td>
</tr>
<tr>
<td>c++gcc</td>
<td>20.40KB</td>
<td>20.21MB</td>
<td>20.23MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>529.38ms</td>
<td>1.22ms</td>
</tr>
<tr>
<td>c++clang-g</td>
<td>35.40KB</td>
<td>20.21MB</td>
<td>20.24MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>629.34ms</td>
<td>1.41ms</td>
</tr>
<tr>
<td>c++gcc-g</td>
<td>148.93KB</td>
<td>20.21MB</td>
<td>20.35MB</td>
<td>ELF64l-PIE</td>
<td>x86-64</td>
<td>dynamic</td>
<td>yes</td>
<td>no</td>
<td>622.06ms</td>
<td>1.17ms</td>
</tr>
<tr>
<td>tail</td>
<td>32</td>
<td>20.55MB</td>
<td>20.55MB</td>
<td>script</td>
<td>tail</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>3.19ms</td>
<td>724.24µs</td>
</tr>
<tr>
<td>bash</td>
<td>40</td>
<td>21.89MB</td>
<td>21.89MB</td>
<td>script</td>
<td>bash</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>2.16ms</td>
<td>2.01ms</td>
</tr>
<tr>
<td>zsh</td>
<td>39</td>
<td>25.42MB</td>
<td>25.42MB</td>
<td>script</td>
<td>zsh</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>3.19ms</td>
<td>2.45ms</td>
</tr>
<tr>
<td>perl</td>
<td>42</td>
<td>27.96MB</td>
<td>27.96MB</td>
<td>script</td>
<td>perl</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>2.65ms</td>
<td>1.71ms</td>
</tr>
<tr>
<td>ruby</td>
<td>40</td>
<td>29.07MB</td>
<td>29.07MB</td>
<td>script</td>
<td>ruby</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>5.30ms</td>
<td>57.78ms</td>
</tr>
<tr>
<td>py3</td>
<td>45</td>
<td>29.17MB</td>
<td>29.17MB</td>
<td>script</td>
<td>python3.13</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>4.94ms</td>
<td>14.39ms</td>
</tr>
<tr>
<td>nodejs</td>
<td>49</td>
<td>177.31MB</td>
<td>177.31MB</td>
<td>script</td>
<td>node</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>7.88ms</td>
<td>26.03ms</td>
</tr>
<tr>
<td>c-musl-s</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>FAIL</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>c-musl</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>FAIL</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>rust-nstd-st</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>FAIL</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>rust-st</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>FAIL</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>rust-tiny</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>FAIL</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</tbody></table>

<p>Explanation of naming:</p>

<ul>
<li>-g = debug build;</li>
<li>-s = small (stripped, plus something like -Os where available and I knew of it);</li>
<li>-st = static build;</li>
<li>-lto = with link-time optimization (LTO);</li>
<li>-nX = without X, like -nlibc = without any libc, or -nlto = without LTO.</li>
</ul>

<p>Some things failed because:</p>

<ol>
<li>I only had static musl built, so dynamic failed;</li>
<li>I didn&rsquo;t know how to coax some Rust builds into working, since I don&rsquo;t use Rust.</li>
</ol>

<p><strong>A big caveat regarding the &ldquo;runtime&rdquo;.</strong>
The &ldquo;runtime&rdquo; benchmarks measure shell parsing, fork(2)ing, execve(2) and the program&rsquo;s
runtime start-up more than anything else, since the programs don&rsquo;t really do anything. Still, they do show
how slow the start-up is, which I guess matters for shell scripts and small tools.</p>

<p>You can check out the source for this all at my
<a href="https://cabin.digital/git/elfsizes.git/tree/">git repo</a>.</p>

<p>IDK if it has any use except to, maybe, show that static binaries are not as bloated as people often
think, because any decent library is split into many .o files and only the ones you actually use get linked in (plus LTO).
Which is just another reason to use them over the death of us all that is dynamic linking.</p>

<p><script src="/js/tablesort.js"></script></p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/linux.xml:::https://cabin.digital/log/flatbin.html</id>
<link rel="alternate" href="https://cabin.digital/log/flatbin.html"/>
<title>Loading flat binaries in Linux</title>
<updated>2025-05-22T23:14:36Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programming"/>
<category term="dev"/>
<category term="c"/>
<category term="linux"/>
<category term="x86"/>
<category term="asm"/>
<content type="html"><![CDATA[<p>Do you remember the good old times when things were simple and you could just load some binary into
RAM and jump into it? Yea, me neither. But that time existed. These days things are complicated,
often for no reason, but this isn&rsquo;t one of those cases. That said, don&rsquo;t you wanna do it? Come on,
it&rsquo;s fun, right? Just assemble and load!</p>

<p>I stumbled onto this idea while attempting the smallest ELF &ldquo;challenge&rdquo; on Linux. You know:
turn the 1.4MB (a whole fucking 3½&rdquo; HD floppy!) abomination that is Go&rsquo;s &ldquo;Hello world&rdquo; program into
the smallest equivalent. Anyway, <a href="https://cirosantilli.com/elf-hello-world">plenty</a>
of <a href="https://github.com/tchajed/minimal-elf">people</a> have written
<a href="https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html">about</a> that better than I can, but
maybe I will write up <em>my own</em> journey next time. (Spoiler alert: I got it down to 91 bytes
for just <code>exit(0)</code>.) What I want to write about is a thought I had along the way: if you disregard the loader&rsquo;s
size (which people often do for dynamically linked executables), then I can craft my own loader, drop the ELF
wrapping entirely, and just load a flat binary!</p>

<p>First, our payload: a simple program that just prints &ldquo;Nothing happens.&rdquo; will do. Let&rsquo;s call it
<code>xyzzy</code> and write it as a minimal x86_64 assembly program. I&rsquo;ll use
<a href="https://www.nasm.us/"><code>nasm</code></a> as my assembler of choice this time:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>
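<span class='golog-keyword'>db</span> <span class='golog-string'>&#39;#!./bld&#39;</span>,0xA   ; shebang, up to and including the newline

start:
       xor edi, edi
       inc edi            ; edi = 1 = stdout
       lea rsi, [rel msg] ; buf, RIP-relative so it works wherever we get mapped
       mov edx, len       ; count
       mov eax, edi       ; 1 = write(2)
       syscall

       dec edi            ; edi = 0, exit status
       mov eax, <span class='golog-number'>60</span>        ; 60 = exit(2)
       syscall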

msg: <span class='golog-keyword'>db</span> <span class='golog-string'>&#34;Nothing happens.&#34;</span>,0xA
len: <span class='golog-keyword'>equ</span> $-msg
</code></pre></div>
</div>

<p>Build it with <code>nasm -fbin xyzzy.asm</code> and <code>chmod +x xyzzy</code> it.
The very first thing you&rsquo;ll notice is the embedded &ldquo;#!./bld&rdquo; line. On execve(2), Linux goes through
the list of binary format loaders built into the kernel and tries each of them. One of them looks for &ldquo;#!&rdquo; and
instead executes the path after it (up to &lsquo;\n&rsquo;, aka 0x0A), passing it the path of the original executable
(the one we tried to launch) so that it can handle it in user space. This is how your Python, Perl, Ruby and,
of course, shell scripts work. But as you <em>might</em> know, <a href="/log/celf.html#how-does-it-work">POSIX shell scripts can work even without
that</a>. (Note that &ldquo;./bld&rdquo; is resolved relative to the current working directory, not to where
<code>xyzzy</code> lives.) The shebang saves us the trouble of writing our loader in kernel
space. Although I think I would like to do that anyway, and maybe shave off some more bytes, since
in the real world you&rsquo;d have at least &ldquo;#!/bin/bld&rdquo;, which is 11 bytes (remember the &lsquo;\n&rsquo;). A kernel loader
could instead just look at the filename or a shorter magic number. But I am getting sidetracked.</p>

<p>Okay, let&rsquo;s write the simplest loader I could think of. I&rsquo;ll use C for now:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#if</span> <span class='golog-number'>0</span>
    exe=<span class='golog-string'>&#34;$(basename &#34;</span>$<span class='golog-number'>0</span><span class='golog-string'>&#34; | cut -d. -f1)&#34;</span>
    CC=musl-gcc
    CFLAGS=<span class='golog-string'>&#34;-pipe -<span class='golog-keyword'>static</span> -std=c89 -Os -s -Wall -Wextra&#34;</span>
    exec $CC $CFLAGS <span class='golog-string'>&#34;$<span class='golog-number'>0</span>&#34;</span> -o <span class='golog-string'>&#34;$exe&#34;</span>
<span class='golog-keyword'>#endif</span>

<span class='golog-keyword'>#include</span> &lt;stdlib.h&gt;
<span class='golog-keyword'>#include</span> &lt;stdio.h&gt;
<span class='golog-keyword'>#include</span> &lt;errno.h&gt;
<span class='golog-keyword'>#include</span> &lt;unistd.h&gt;
<span class='golog-keyword'>#include</span> &lt;fcntl.h&gt;
<span class='golog-keyword'>#include</span> &lt;sys/stat.h&gt;
<span class='golog-keyword'>#include</span> &lt;sys/mman.h&gt;


<span class='golog-keyword'>typedef</span> <span class='golog-keyword'>void</span> (*jmp_t)(<span class='golog-keyword'>void</span>);

<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    <span class='golog-keyword'>int</span> fd = -<span class='golog-number'>1</span>;
    off_t i = <span class='golog-number'>0</span>;
    <span class='golog-keyword'>struct</span> stat st = {<span class='golog-number'>0</span>};
    <span class='golog-keyword'>char</span>* newcode = <span class='golog-keyword'>NULL</span>;

    <span class='golog-keyword'>if</span>(argc &lt; <span class='golog-number'>2</span>) exit(<span class='golog-number'>1</span>);

    fd = open(argv[<span class='golog-number'>1</span>], O_RDONLY); /* no O_NOATIME: open(2) fails with EPERM on files owned by another user */
    <span class='golog-keyword'>if</span>(fd &lt; <span class='golog-number'>0</span>)
    {
        perror(<span class='golog-string'>&#34;open&#34;</span>);
        exit(<span class='golog-number'>2</span>);
    }

    <span class='golog-keyword'>if</span>(fstat(fd, &amp;st) != <span class='golog-number'>0</span>)
    {
        perror(<span class='golog-string'>&#34;fstat&#34;</span>);
        exit(<span class='golog-number'>3</span>);
    }

    newcode = mmap(<span class='golog-keyword'>NULL</span>, st.st_size, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, <span class='golog-number'>0</span>);
    <span class='golog-keyword'>if</span>(newcode == MAP_FAILED)
    {
        perror(<span class='golog-string'>&#34;mmap&#34;</span>);
        exit(<span class='golog-number'>4</span>);
    }

    close(fd);

    <span class='golog-keyword'>for</span>(i = <span class='golog-number'>0</span>; i &lt; st.st_size; i++)
    {
        <span class='golog-keyword'>if</span>(newcode[i] == <span class='golog-string'>&#39;\n&#39;</span>)
        {
            jmp_t jmp = (jmp_t)((<span class='golog-keyword'>void</span>*)(newcode + i + <span class='golog-number'>1</span>));
            jmp();
            exit(<span class='golog-number'>0</span>);
        }
    }

    fprintf(stderr, <span class='golog-string'>&#34;loader: Failed to skip shebang.\n&#34;</span>);
    exit(<span class='golog-number'>5</span>);
}
</code></pre></div>
</div>

<p>As you can see, I use my self-build trick described <a href="/log/celf.html">previously</a> to build the
loader. The idea here is simple: we open the file, map it into executable memory, find the
end of the shebang (#!) line and jump there. Very crude. The payload simply runs inside the loader&rsquo;s process,
so it inherits its whole execution environment, including the process name, so it won&rsquo;t look nice in <code>ps</code>/<code>top</code>. But it&rsquo;s small and simple.</p>

<p>Before we turn it into a more proper loader, let&rsquo;s try it out:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ./bld.c # build
$ ./xyzzy
Nothing happens.
$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 18K May 22 23:41 bld
-rwxr-xr-x 1 dwlr dwlr  54 May 22 23:41 xyzzy
</code></pre></div>
</div>

<p>First, hell yea it works! :) Second, the <code>xyzzy</code> executable is only 54 bytes! But look at our
loader! Oof. That ain&rsquo;t smol, and remember, I started this in the spirit of minimal ELF files. So
let&rsquo;s fix that first. All I am going to do is rewrite the loader in nasm, just like we wrote
<code>xyzzy</code>, but as a proper ELF:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>

st:   <span class='golog-keyword'>equ</span> <span class='golog-number'>144</span>               ; sizeof(struct stat)
stsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>48</span>                ; offsetof(struct stat, st_size)

<span class='golog-keyword'>section</span> .text
<span class='golog-keyword'>global</span> _start
_start:
        cmp DWORD [rsp], <span class='golog-number'>2</span>  ; [rsp] = argc
        jge .<span class='golog-number'>1</span>              ; argc &lt; <span class='golog-number'>2</span> ?
            xor edi, edi
            inc edi
            jmp exit
    .<span class='golog-number'>1</span>:
        mov eax, <span class='golog-number'>2</span>          ; <span class='golog-number'>2</span> = open(<span class='golog-number'>2</span>)
        lea rdi, [rsp+<span class='golog-number'>16</span>]   ; [rsp+<span class='golog-number'>16</span>] = argv[<span class='golog-number'>1</span>]
        mov rdi, [rdi]
        xor esi, esi        ; O_RDONLY = <span class='golog-number'>0</span>
        xor edx, edx        ; no mode, not creating
        syscall

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>2</span>              ; fd &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>2</span>
            jmp exit
    .<span class='golog-number'>2</span>:
        mov r15, rax        ; save fd

        sub rsp, st         ; push struct stat onto a stack

        mov rdi, r15        ; fd
        mov rsi, rsp        ; struct stat*
        mov eax, <span class='golog-number'>5</span>          ; <span class='golog-number'>5</span> = fstat(<span class='golog-number'>2</span>)
        syscall

        cmp eax, <span class='golog-number'>0</span>
        je .<span class='golog-number'>3</span>               ; ret != <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>3</span>
            jmp exit
    .<span class='golog-number'>3</span>:
        xor edi, edi        ; NULL
        mov rsi, [rsp+stsz] ; st.st_size
        mov edx, <span class='golog-number'>1|2</span>|4      ; PROT_READ | PROT_WRITE | PROT_EXEC
        mov r10, <span class='golog-number'>2</span>          ; MAP_PRIVATE
        mov r8,  r15        ; fd
        xor r9,  r9         ; offset
        mov eax, <span class='golog-number'>9</span>          ; <span class='golog-number'>9</span> = mmap(<span class='golog-number'>2</span>)
        syscall

        cmp rax, -<span class='golog-number'>128</span>        ; raw mmap returns -errno (all &lt; 128) on failure, not -1
        jb .<span class='golog-number'>4</span>               ; unsigned: below -128 means a valid address
            mov edi, <span class='golog-number'>4</span>
            jmp exit
    .<span class='golog-number'>4</span>:
        mov r14, rax        ; save new memory addr

        mov rdi, r15        ; fd, closing it so <span class='golog-string'>&#34;child&#34;</span> doesn&#39;t have an extra fd, takes space tho
        mov eax, <span class='golog-number'>3</span>          ; <span class='golog-number'>3</span> = close(<span class='golog-number'>2</span>)
        syscall             ; not checking this one, since if I couldn&#39;t close, who cares

        cld                 ; clear direction (for scas*)
        mov al,  0xA        ; needle, newline 0xA = <span class='golog-string'>&#39;\n&#39;</span>
        mov rdi, r14        ; hay
        mov rcx, [rsp+stsz] ; hay size
        repne scasb         ; search for it

        cmp rcx, <span class='golog-number'>0</span>
        je .<span class='golog-number'>5</span>               ; rcx == 0: no newline, or nothing after it
            add rsp, st     ; pop struct stat from stack
            jmp rdi         ; rdi = byte after the newline: <span class='golog-string'>&#34;load&#34;</span> <span class='golog-string'>&#34;child&#34;</span>
            mov edi, <span class='golog-number'>0</span>    ; never reached: jmp pushes no return address
            jmp exit
    .<span class='golog-number'>5</span>:
        mov edi, <span class='golog-number'>5</span>
exit:
        mov eax, <span class='golog-number'>60</span>         ; <span class='golog-number'>60</span> = exit(<span class='golog-number'>2</span>)
        syscall
</code></pre></div>
</div>

<p>And build like this:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
</span></pre></div>
    <div class="golog-lines"><pre><code>$ nasm -felf64 bld.asm
$ ld -static -nostdlib bld.o -o bld
$ ./xyzzy
Nothing happens.
$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 5.0K May 22 23:55 bld
-rwxr-xr-x 1 dwlr dwlr   54 May 22 23:41 xyzzy
$ strip bld
$ ls -lh bld
-rwxr-xr-x 1 dwlr dwlr 4.4K May 22 23:56 bld
</code></pre></div>
</div>

<p>It&rsquo;s essentially the same program as the C one, but in assembly. You can read the comments if you&rsquo;re
bad at assembly like me. I wasn&rsquo;t trying too hard to optimize its code size, partially because
I don&rsquo;t really know how; I am not an assembly wizard. 4.4K! Not bad, but we can do
way better. <code>ld</code> puts in a lot of cruft we don&rsquo;t really need for our purposes, so let&rsquo;s ditch
<code>ld</code> and just handroll an ELF straight in the assembly! The only difference between the above
program and the following is the ELF headers, so I will skip the actual code:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>

BASE:   <span class='golog-keyword'>equ</span> <span class='golog-number'>0x400000</span>
ENTRY:  <span class='golog-keyword'>equ</span> BASE + _start
PAGESZ: <span class='golog-keyword'>equ</span> <span class='golog-number'>0x1000</span>

elfhead:
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0x7F</span>, <span class='golog-string'>&#34;ELF&#34;</span>    ; magic
        <span class='golog-keyword'>db</span>   <span class='golog-number'>2</span>              ; <span class='golog-number'>64</span>-bit
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>              ; Little Endian
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>              ; ELF Version
        <span class='golog-keyword'>db</span>   <span class='golog-number'>3</span>              ; Linux
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>              ; ignored on Linux
<span class='golog-keyword'>times</span> <span class='golog-number'>7</span> <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>              ; padding
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>2</span>              ; executable file
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0x3E</span>           ; AMD64
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>              ; ELF Version, again
        <span class='golog-keyword'>dq</span>   ENTRY          ; program entry
        <span class='golog-keyword'>dq</span>   progheads      ; program headers offset
        <span class='golog-keyword'>dq</span>   sectheads      ; <span class='golog-keyword'>section</span> headers offset
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>0</span>              ; flags
        <span class='golog-keyword'>dw</span>   elfsz          ; size of this ELF header
        <span class='golog-keyword'>dw</span>   pgsz           ; size of <span class='golog-number'>1</span> program header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>1</span>              ; num of program headers
        <span class='golog-keyword'>dw</span>   scsz           ; size of <span class='golog-number'>1</span> <span class='golog-keyword'>section</span> header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>              ; num of <span class='golog-keyword'>section</span> headers
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>              ; <span class='golog-keyword'>section</span> header string table index
elfsz: <span class='golog-keyword'>equ</span> $-elfhead

progheads:
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>              ; Load
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1|2</span>|4          ; Exec | Write | Read
        <span class='golog-keyword'>dq</span>   <span class='golog-number'>0</span>              ; offset
        <span class='golog-keyword'>dq</span>   BASE           ; virt mem addr
        <span class='golog-keyword'>dq</span>   BASE           ; phys mem addr if applic.
        <span class='golog-keyword'>dq</span>   allsz          ; file size
        <span class='golog-keyword'>dq</span>   allsz          ; mem size
        <span class='golog-keyword'>dq</span>   PAGESZ         ; alignment
pgsz: <span class='golog-keyword'>equ</span> $-progheads

sectheads: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span>
scsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span> ;$-sectheads

st:   <span class='golog-keyword'>equ</span> <span class='golog-number'>144</span>               ; sizeof(struct stat)
stsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>48</span>                ; offsetof(struct stat, st_size)

text:
_start:
    ; ... same code here as the _start in the previous code block

textsz: <span class='golog-keyword'>equ</span> $-text
allsz: <span class='golog-keyword'>equ</span> $-elfhead
</code></pre></div>
</div>

<p>We build it the same way as <code>xyzzy</code>: <code>nasm -fbin bld.asm</code> and <code>chmod +x bld</code> it.
If you go to the Wikipedia page about
<a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELFs</a> you can see the whole format;
I just replicated the structure straight in the assembly. If you&rsquo;re totally unfamiliar with assembly,
<code>db</code>, <code>dw</code>, <code>dd</code> and <code>dq</code> emit a byte, a word (16 bits), a double word (32 bits) and a quad word (64 bits),
respectively, straight into the resulting binary. So I just define all the fields we need, starting from
the beginning of the binary. <code>equ</code> is kind of like <code>#define</code> in C: it defines a value but does not
emit anything into the binary. <code>$</code> is the current position in the binary, which advances as we add
things, including code. Labels are just named positions, a bit like labels in C. Although, to be honest, if you&rsquo;re reading
this and know C, you must be familiar enough to get this, so IDK who I am explaining this to. :P
Search the web if you&rsquo;re totally lost.</p>

<p>So, the only things required for a valid ELF are the ELF header, which describes the
ISA (Instruction Set Architecture) and ABI (Application Binary Interface) the ELF is for, and
at least one Program Header, which tells the ELF loader where in the file the executable code is, where in
memory to load it and how. Of course, normal ELFs have way more than this, especially if
you have debugging info embedded in them, but this is not an article on ELFs and DWARFs.
On Linux the ABI field is System V or GNU/Linux, and the ISA here is AMD64 (x86_64). Our executable is non-PIE
(not a Position Independent Executable), so we just provide a static address as the entry point. The single
program header asks the ELF loader to load the binary into readable, writable (don&rsquo;t really need
this, but if you&rsquo;re into self-modifying code and security breaches this is for you ;)) and
executable memory, and tells it how big it is. That&rsquo;s it!</p>
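<p>The same two headers can be sketched in C. This is an illustration under assumptions, not the loader itself. <code>build_min_elf</code> is a hypothetical helper that uses glibc&rsquo;s <code>&lt;elf.h&gt;</code> constants. It assumes a little-endian host (matching <code>ELFDATA2LSB</code>), because the structs are copied out byte for byte.</p>

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write an ELF header plus one PT_LOAD program header
 * into buf (at least 120 bytes), mirroring the fields in the listing.
 * The whole file is mapped RWX at `base`; execution starts at base + code_off.
 * Returns the number of header bytes written. */
size_t build_min_elf(unsigned char *buf, uint64_t base, uint64_t code_off, uint64_t file_size)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    memset(&eh, 0, sizeof eh);   /* padding, e_shoff, e_flags, section fields = 0 */
    memset(&ph, 0, sizeof ph);

    memcpy(eh.e_ident, ELFMAG, SELFMAG);      /* 0x7F "ELF" */
    eh.e_ident[EI_CLASS]   = ELFCLASS64;      /* 64-bit */
    eh.e_ident[EI_DATA]    = ELFDATA2LSB;     /* little endian */
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_ident[EI_OSABI]   = ELFOSABI_LINUX;  /* 3 */
    eh.e_type      = ET_EXEC;                 /* non-PIE executable */
    eh.e_machine   = EM_X86_64;               /* 0x3E */
    eh.e_version   = EV_CURRENT;
    eh.e_entry     = base + code_off;         /* static entry address */
    eh.e_phoff     = sizeof eh;               /* program header follows right after */
    eh.e_ehsize    = sizeof eh;
    eh.e_phentsize = sizeof ph;
    eh.e_phnum     = 1;

    ph.p_type   = PT_LOAD;
    ph.p_flags  = PF_X | PF_W | PF_R;
    ph.p_offset = 0;                          /* map from the start of the file */
    ph.p_vaddr  = base;
    ph.p_paddr  = base;
    ph.p_filesz = file_size;
    ph.p_memsz  = file_size;
    ph.p_align  = 0x1000;

    memcpy(buf, &eh, sizeof eh);
    memcpy(buf + sizeof eh, &ph, sizeof ph);
    return sizeof eh + sizeof ph;
}
```

<p>With <code>base = 0x400000</code> and the code placed right after the 120 header bytes, <code>e_entry</code> comes out as <code>0x400078</code>.</p>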

<p>Let&rsquo;s see the results:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 308 May 22 23:41 bld
-rwxr-xr-x 1 dwlr dwlr  54 May 22 23:41 xyzzy
</code></pre></div>
</div>

<p>Now we&rsquo;re talking! Only 308 bytes! I can live with that. What I can&rsquo;t live with is that it
doesn&rsquo;t actually load the binary as a new executable. It&rsquo;s more of a &ldquo;module&rdquo; loader, which is fine, but
not what I <em>really</em> wanted. Let&rsquo;s fix that.</p>

<p>The main problem here is that we want to load a program without using the kernel&rsquo;s
<a href="https://www.man7.org/linux/man-pages/man2/execve.2.html">execve(2)</a> syscall,
you know, the one that actually loads a program into memory. Why? It was already called (usually by the shell).
The kernel saw that <code>xyzzy</code> isn&rsquo;t an ELF, saw &ldquo;#!&rdquo; and passed the path to <em>us</em>!
So we&rsquo;re past execve(2) and, as I said, <code>xyzzy</code> isn&rsquo;t an ELF.</p>
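<p>Since the kernel only hands us the script&rsquo;s path, the loader has to work out by itself where <code>xyzzy</code>&rsquo;s machine code starts: right after the shebang line. Below is a C sketch of that search. The listing does it with <code>repne scasb</code>, looking for <code>'\n'</code>. The <code>#!</code> check and the name <code>shebang_payload_offset</code> are additions for illustration.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: for a "#!interpreter\n<payload>" file, return the
 * offset of the first payload byte (just past the first '\n'), or -1 if the
 * buffer does not start with "#!" or contains no newline. */
long shebang_payload_offset(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return -1;
    const unsigned char *nl = memchr(buf, '\n', len);  /* like repne scasb */
    return nl ? (long)(nl - buf) + 1 : -1;
}
```

<p>For the 54-byte <code>xyzzy</code> in the strace output at the end, which starts with <code>#!./bld\n</code>, this gives offset 8.</p>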

<p>My first idea was to basically do execve(2) ourselves, without the kernel. Using
<a href="https://www.man7.org/linux/man-pages/man2/fork.2.html">fork(2)</a> +
<a href="https://www.man7.org/linux/man-pages/man2/ptrace.2.html">ptrace(2)</a> + reading and writing
<em>/proc/$pid/mem</em>, I could change the child&rsquo;s memory, close fds and do general clean-up. This is
essentially how <code>gdb</code> and other debuggers are able to change stuff in the debuggee. But that
sounds like a lot of work, not to mention I had never used ptrace(2), so I would have to learn it. And
even though that might be fun, I wasn&rsquo;t feeling it at the time. This idea came to me before I wrote
the assembly version of the loader.</p>

<p>Maybe you noticed that, the way the Program Header is set up, I don&rsquo;t just load the code into
memory, I load the whole file, including the ELF header. A new idea hatched:</p>

<ol>
<li>Loader is loaded into the memory with its ELF headers</li>
<li>Loader copies its ELF into some new place</li>
<li>Loader opens and appends the program to be loaded (<code>xyzzy</code> for us) after the ELF copy</li>
<li>Fix up the ELF copy&rsquo;s size and entry point</li>
<li>execve(2) the new binary.</li>
<li>&hellip;</li>
<li>PROFIT</li>
</ol>
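<p>Step 4 boils down to a few stores into the copied headers. Here is a hedged C sketch: <code>fixup_elf</code> is a hypothetical name, it uses glibc&rsquo;s <code>&lt;elf.h&gt;</code>, and it assumes a little-endian host. It points <code>e_entry</code> at the payload and sizes the single <code>PT_LOAD</code> segment to cover the whole image. The final listing stores <code>st.st_size</code> in those fields instead. That still runs, presumably because the segment is mapped with page granularity, but covering the whole image is the more conservative choice.</p>

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: img holds a copy of the loader's ELF headers followed
 * by the appended payload. Point e_entry at the payload's code and make the
 * single PT_LOAD segment cover the whole image. memcpy avoids unaligned
 * stores; values are written in host byte order (little endian assumed). */
void fixup_elf(unsigned char *img, uint64_t base, uint64_t entry_off, uint64_t image_size)
{
    uint64_t entry = base + entry_off;
    memcpy(img + offsetof(Elf64_Ehdr, e_entry), &entry, sizeof entry);

    unsigned char *ph = img + sizeof(Elf64_Ehdr);   /* the one program header */
    memcpy(ph + offsetof(Elf64_Phdr, p_filesz), &image_size, sizeof image_size);
    memcpy(ph + offsetof(Elf64_Phdr, p_memsz),  &image_size, sizeof image_size);
}
```

<p>Take the 120 header bytes plus the 54-byte <code>xyzzy</code>: the image is 174 bytes and the code starts at offset 120 + 8 = 128. With <code>base = 0x400000</code>, <code>e_entry</code> becomes <code>0x400080</code>.</p>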

<p>There was one snag: execve(2) only takes a file path to the executable&hellip; Well, we could write out
the new ELF to <em>/tmp</em> and launch that, but that just feels wrong. Fortunately, since Linux 3.17
we&rsquo;ve had a very useful syscall,
<a href="https://www.man7.org/linux/man-pages/man2/memfd_create.2.html">memfd_create(2)</a>, which creates
a file purely in memory. I can then map it as shared memory and basically have a view into the
file. &ldquo;Okay, that&rsquo;s all cool and nice, but don&rsquo;t we need a file path, not an open fd?&rdquo; an observant
reader would ask. And you would be correct, but since Linux 3.19 there is another syscall,
<a href="https://www.man7.org/linux/man-pages/man2/execveat.2.html">execveat(2)</a>. At first glance
it won&rsquo;t help us, since it&rsquo;s just like
<a href="https://www.man7.org/linux/man-pages/man2/openat.2.html">openat(2)</a> but for executing instead of
opening files. That is, it resolves a relative file path against a <em>directory</em> fd. But
if you read the man page:</p>

<blockquote>
<p>If pathname is an empty string and the AT_EMPTY_PATH flag is
specified, then the file descriptor dirfd specifies the file to be
executed (i.e., dirfd refers to an executable file, rather than a
directory).</p>
</blockquote>

<p>Bingo! So the plan of action is:</p>

<ol>
<li>Loader is loaded into the memory with its ELF headers</li>
<li>Loader <a href="https://www.man7.org/linux/man-pages/man2/open.2.html">open(2)</a>s the to-be-loaded binary
with O_CLOEXEC, so it is automatically closed on execveat(2)</li>
<li>Loader <a href="https://www.man7.org/linux/man-pages/man2/fstat.2.html">fstat(2)</a>s the binary to get its
size</li>
<li>Create a memfd and <a href="https://www.man7.org/linux/man-pages/man2/ftruncate.2.html">ftruncate(2)</a> it
to the size of the loader&rsquo;s ELF headers + the size of the binary-to-be-loaded</li>
<li><a href="https://www.man7.org/linux/man-pages/man2/mmap.2.html">mmap(2)</a> shared memory backed by the memfd</li>
<li>Copy the ELF headers and <a href="https://www.man7.org/linux/man-pages/man2/read.2.html">read(2)</a> the
binary into the freshly mapped memory</li>
<li>Fix up the ELF copy</li>
<li>execveat(2)</li>
<li>&hellip;</li>
<li>PROFIT</li>
</ol>
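<p>Steps 4 to 8 can be sketched in C as well. The sketch assumes Linux 3.19+ and issues memfd_create(2) and execveat(2) through raw <code>syscall()</code>, the way the listing does. <code>exec_image</code> and <code>run_image</code> are hypothetical names. For simplicity it copies an already complete executable image into the memfd, instead of gluing the loader&rsquo;s headers to a payload.</p>

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>        /* open, O_RDONLY, AT_EMPTY_PATH */
#include <string.h>       /* memcpy */
#include <sys/mman.h>     /* mmap, munmap, MFD_CLOEXEC */
#include <sys/syscall.h>  /* SYS_memfd_create, SYS_execveat */
#include <sys/types.h>
#include <sys/wait.h>     /* waitpid */
#include <unistd.h>       /* syscall, ftruncate, fork, close, _exit */

/* Fallbacks in case the feature macros were not in effect (same values as the listing) */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/* Copy an executable image into an anonymous in-memory file and run it.
 * On success this never returns; on failure it returns -1. */
int exec_image(const void *img, size_t len, char *const argv[], char *const envp[])
{
    /* memfd_create(2): a file with no path; MFD_CLOEXEC closes it on exec */
    int fd = (int)syscall(SYS_memfd_create, "elfstub", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    void *mem = MAP_FAILED;
    if (ftruncate(fd, (off_t)len) == 0)    /* give the file its final size */
        mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (mem != MAP_FAILED) {
        memcpy(mem, img, len);             /* MAP_SHARED: the bytes land in the memfd */
        munmap(mem, len);
        /* empty path + AT_EMPTY_PATH: execute the file fd itself refers to */
        syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
    }
    close(fd);                             /* only reached on failure */
    return -1;
}

/* Usage helper: run the image in a forked child, return its exit status or -1. */
int run_image(const void *img, size_t len, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        exec_image(img, len, argv, envp);
        _exit(127);                        /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

<p>Fed the bytes of an ordinary binary such as <code>/bin/true</code>, <code>run_image</code> should return 0. Inside the child, <code>exec_image</code> only comes back if something failed.</p>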

<p>We don&rsquo;t need to worry about the files we open(2) or memfd_create(2), since we pass them O_CLOEXEC
and MFD_CLOEXEC, so they are automatically closed on execveat(2). We didn&rsquo;t really do anything else to
pollute the child&rsquo;s execution environment except mmap(2). Even that doesn&rsquo;t matter, because execve(2) and execveat(2)
create a whole new memory map for the new process, so the mapping is not inherited. Although you could say it is, since
the new program <em>is</em> running from the memfd memory.
Anyways! Even though the code is very similar to the previous one, I will paste the whole listing
here:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>

BASE:   <span class='golog-keyword'>equ</span> <span class='golog-number'>0x400000</span>
ENTRY:  <span class='golog-keyword'>equ</span> BASE + _start
PAGESZ: <span class='golog-keyword'>equ</span> <span class='golog-number'>0x1000</span>

elfhead:
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0x7F</span>, <span class='golog-string'>&#34;ELF&#34;</span>            ; magic
        <span class='golog-keyword'>db</span>   <span class='golog-number'>2</span>                      ; <span class='golog-number'>64</span>-bit
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>                      ; Little Endian
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>                      ; ELF Version
        <span class='golog-keyword'>db</span>   <span class='golog-number'>3</span>                      ; Linux
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>                      ; ignored on Linux
<span class='golog-keyword'>times</span> <span class='golog-number'>7</span> <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>                      ; padding
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>2</span>                      ; executable file
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0x3E</span>                   ; AMD64
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>                      ; ELF Version, again
e_entr: <span class='golog-keyword'>dq</span>   ENTRY                  ; program entry
        <span class='golog-keyword'>dq</span>   progheads              ; program headers offset
        <span class='golog-keyword'>dq</span>   sectheads              ; <span class='golog-keyword'>section</span> headers offset
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>0</span>                      ; flags
        <span class='golog-keyword'>dw</span>   elfheadsz              ; size of this ELF header
        <span class='golog-keyword'>dw</span>   pgsz                   ; size of <span class='golog-number'>1</span> program header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>1</span>                      ; num of program headers
        <span class='golog-keyword'>dw</span>   scsz                   ; size of <span class='golog-number'>1</span> <span class='golog-keyword'>section</span> header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>                      ; num of <span class='golog-keyword'>section</span> headers
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>                      ; <span class='golog-keyword'>section</span> header string table index
elfheadsz: <span class='golog-keyword'>equ</span> $-elfhead

progheads:
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>                      ; Load
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1|2</span>|4                  ; Exec | Write | Read
        <span class='golog-keyword'>dq</span>   <span class='golog-number'>0</span>                      ; offset
        <span class='golog-keyword'>dq</span>   BASE                   ; virt mem addr
        <span class='golog-keyword'>dq</span>   BASE                   ; phys mem addr if applic.
ph_fsz: <span class='golog-keyword'>dq</span>   allsz                  ; file size
ph_msz: <span class='golog-keyword'>dq</span>   allsz                  ; mem size
        <span class='golog-keyword'>dq</span>   PAGESZ                 ; alignment
pgsz: <span class='golog-keyword'>equ</span> $-progheads

sectheads: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span>
scsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span> ;$-sectheads

elfsz: <span class='golog-keyword'>equ</span> $-elfhead

st:   <span class='golog-keyword'>equ</span> <span class='golog-number'>144</span>                       ; sizeof(struct stat)
stsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>48</span>                        ; offsetof(struct stat, st_size)

memfdname: <span class='golog-keyword'>db</span> <span class='golog-string'>&#39;elfstub&#39;</span>,<span class='golog-number'>0</span>
text:
_start:
        mov rbp, rsp

        cmp DWORD [rbp], <span class='golog-number'>2</span>          ; [rbp] = argc
        jge .<span class='golog-number'>1</span>                      ; argc &lt; <span class='golog-number'>2</span> ?
            xor edi, edi
            inc edi
            jmp exit
    .<span class='golog-number'>1</span>:
        mov eax, <span class='golog-number'>2</span>                  ; <span class='golog-number'>2</span> = open(<span class='golog-number'>2</span>)
        lea rdi, [rbp+<span class='golog-number'>16</span>]           ; [rbp+<span class='golog-number'>16</span>] = argv[<span class='golog-number'>1</span>]
        mov rdi, [rdi]
        mov esi, <span class='golog-number'>0x80000</span>            ; <span class='golog-number'>02000000</span> == <span class='golog-number'>0x80000</span> == O_CLOEXEC, O_RDONLY == <span class='golog-number'>0</span>
        xor edx, edx                ; no mode, not creating
        syscall

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>2</span>                      ; fd &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>2</span>
            jmp exit
    .<span class='golog-number'>2</span>:
        mov r15, rax                ; save fd

        sub rsp, st                 ; reserve a struct stat on the stack

        mov rdi, r15                ; fd
        mov rsi, rsp                ; struct stat*
        mov eax, <span class='golog-number'>5</span>                  ; <span class='golog-number'>5</span> = fstat(<span class='golog-number'>2</span>)
        syscall

        cmp eax, <span class='golog-number'>0</span>
        je .<span class='golog-number'>3</span>                       ; ret != <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>3</span>
            jmp exit
    .<span class='golog-number'>3</span>:
        lea rdi, [rel memfdname]    ; name
        mov esi, <span class='golog-number'>1</span>                  ; <span class='golog-number'>1</span> == MFD_CLOEXEC
        mov eax, <span class='golog-number'>319</span>                ; <span class='golog-number'>319</span> = memfd_create(<span class='golog-number'>2</span>)
        syscall
        mov r12, rax                ; save memfd

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>4</span>                      ; ret &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>4</span>
            jmp exit
    .<span class='golog-number'>4</span>:
        mov rdi, r12                ; memfd
        mov rsi, [rsp+stsz]         ; st.st_size + full ELF size
        add rsi, elfsz
        mov eax, <span class='golog-number'>77</span>                 ; <span class='golog-number'>77</span> = ftruncate(<span class='golog-number'>2</span>)
        syscall

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>5</span>                      ;  ret &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>5</span>
            jmp exit
    .<span class='golog-number'>5</span>:
        xor edi, edi                ; NULL
                                    ; RSI, size unchanged
        mov edx, <span class='golog-number'>1|2</span>|4              ; PROT_READ | PROT_WRITE | PROT_EXEC
        mov r10, <span class='golog-number'>1</span>                  ; MAP_SHARED
        mov r8,  r12                ; memfd
        xor r9,  r9                 ; offset
        mov eax, <span class='golog-number'>9</span>                  ; <span class='golog-number'>9</span> = mmap(<span class='golog-number'>2</span>)
        syscall

        cmp rax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>6</span>                      ; ret &lt; <span class='golog-number'>0</span> ? (raw mmap returns -errno, not MAP_FAILED)
            mov edi, <span class='golog-number'>6</span>
            jmp exit
    .<span class='golog-number'>6</span>:
        mov r14, rax                ; save new memory addr

        cld                         ; clear direction flag (for movs*/scas*)

        mov rsi, BASE               ; source, start of own image, ELF header
        mov rdi, r14                ; dest, newmem
        mov rcx, elfsz / <span class='golog-number'>8</span>          ; size, ELF headers size
        rep movsq                   ; memcpy(<span class='golog-number'>3</span>)

        mov rsi, rdi                ; newmem + elf header
        mov rdi, r15                ; fd = file
        mov rdx, [rsp+stsz]         ; size
        xor eax, eax                ; <span class='golog-number'>0</span> = read(<span class='golog-number'>2</span>)
        syscall

        cmp rax, rdx
        je .<span class='golog-number'>7</span>                       ;  ret != st.st_size ?
            mov edi, <span class='golog-number'>7</span>
            jmp exit
    .<span class='golog-number'>7</span>:
        mov r13, rsi                ; save newmem + ELF header offset

        mov al,  0xA                ; needle, newline 0xA = <span class='golog-string'>&#39;\n&#39;</span>
        mov rdi, r13                ; hay
        mov rcx, [rsp+stsz]         ; hay size
        repne scasb                 ; search for it

        cmp rcx, <span class='golog-number'>0</span>
        jne .<span class='golog-number'>8</span>                      ; not found
            mov edi, <span class='golog-number'>8</span>
            jmp exit
    .<span class='golog-number'>8</span>:
        sub rdi, r14                ; patch e_entr in memfd = BASE + (BinStart[<span class='golog-string'>&#39;\n&#39;</span>] - BinStart)
        add rdi, BASE
        mov [r14+e_entr], rdi

        mov rdi, [rsp+stsz]         ; load st.st_size and patch memfd ELF Prog. Headers
        mov [r14+ph_fsz], rdi
        mov [r14+ph_msz], rdi

        mov rdi, r12                ; memfd
        mov BYTE [rbp-<span class='golog-number'>16</span>], <span class='golog-number'>0</span>
        lea rsi, [rbp-<span class='golog-number'>16</span>]
        lea rdx, [rbp+<span class='golog-number'>16</span>]           ; argv = loader argv + <span class='golog-number'>1</span>
        mov rax, [rbp]              ;   tmp argc
        lea r10, [rbp + rax*<span class='golog-number'>8</span> + <span class='golog-number'>16</span>] ; envp
        mov r8,  <span class='golog-number'>0x1000</span>             ; AT_EMPTY_PATH = <span class='golog-number'>0x1000</span>
        mov eax, <span class='golog-number'>322</span>                ; <span class='golog-number'>322</span> = execveat(<span class='golog-number'>2</span>)
        syscall

    .<span class='golog-number'>9</span>:
        mov edi, <span class='golog-number'>9</span>
exit:
        mov eax, <span class='golog-number'>60</span>                 ; <span class='golog-number'>60</span> = exit(<span class='golog-number'>2</span>)
        syscall

textsz: <span class='golog-keyword'>equ</span> $-text
allsz: <span class='golog-keyword'>equ</span> $-elfhead
</code></pre></div>
</div>

<p>And that&rsquo;s it, we build it the same way as before aaaand:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 493 May 23 01:06 bld
-rwxr-xr-x 1 dwlr dwlr  54 May 23 01:06 xyzzy
$ ./xyzzy
Nothing happens.
</code></pre></div>
</div>

<p>One more thing to mention is the need to pass the command line arguments and the environment to the
child. That&rsquo;s easily done. Read <code>argc</code> from the top of the initial stack (rsp, the stack pointer, saved in rbp),
then skip that many <code>argv</code> pointers plus one more (there&rsquo;s a NULL pointer after the last one), and
you&rsquo;re at <code>envp</code>, the environment pointers. We, of course, also skip the 1st argument, as it would be
&ldquo;./bld&rdquo;.</p>
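<p>Here is the same stack arithmetic as a C sketch. On x86-64 Linux the initial stack holds <code>argc</code>, then the <code>argv</code> pointers, a NULL, then the <code>envp</code> pointers. The sketch models the 8-byte slots as an array of pointers so it can be tested without real process startup; that is an assumption. <code>child_args_from_stack</code> is a hypothetical name.</p>

```c
#include <stdint.h>

/* Hypothetical type: the argv/envp the loader hands to the child */
struct child_args {
    char **argv;   /* loader's argv + 1, so "./bld" is dropped */
    char **envp;
};

/* sp models the initial stack at _start:
 * argc, argv[0..argc-1], NULL, envp[0..], NULL, ...
 * Every slot is pointer-sized, with argc stored in the first one. */
struct child_args child_args_from_stack(char **sp)
{
    uintptr_t argc = (uintptr_t)sp[0];  /* [rbp]                = argc    */
    char **argv = sp + 1;               /* [rbp + 8]            = argv[0] */
    struct child_args a;
    a.argv = argv + 1;                  /* [rbp + 16]           = argv[1] */
    a.envp = argv + argc + 1;           /* [rbp + argc*8 + 16]  = envp[0] */
    return a;
}
```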

<p>And that&rsquo;s about it. The last thing I wanted to mention is that while developing this,
<a href="https://www.man7.org/linux/man-pages/man1/strace.1.html"><code>strace(1)</code></a> was an invaluable help.
I could write simple, normal C programs and see what they do, as well as quickly debug wrong
arguments to syscalls in my assembly programs. Much quicker than loading gdb and stepping through
it.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
</span></pre></div>
    <div class="golog-lines"><pre><code>$ strace ./xyzzy &gt; /dev/null
execve(&#34;./xyzzy&#34;, [&#34;./xyzzy&#34;], 0x7fff60023570 /* 86 vars */) = 0
open(&#34;./xyzzy&#34;, O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=54, ...}) = 0
memfd_create(&#34;elfstub&#34;, MFD_CLOEXEC)    = 4
ftruncate(4, 174)                       = 0
mmap(NULL, 174, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 4, 0) = 0x7f29e6ecf000
read(3, &#34;#!./bld\n1\377\377\307H\2155\22\0\0\0\272\21\0\0\0\211\370\17\5\377\317\270&lt;&#34;..., 54) = 54
execveat(4, &#34;&#34;, [&#34;./xyzzy&#34;], 0x7ffdcf145970 /* 86 vars */, AT_EMPTY_PATH) = 0
write(1, &#34;Nothing happens.\n&#34;, 17)      = 17
exit(0)                                 = ?
+++ exited with 0 +++
</code></pre></div>
</div>
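<p>strace turns syscall numbers into names, which makes a wrong number easy to spot. The numbers hard-coded in the listing can also be cross-checked against the <code>SYS_*</code> constants in <code>&lt;sys/syscall.h&gt;</code>. Here is a minimal sketch. It is valid for x86-64 only, since other architectures use different numbers, and <code>listing_syscalls_match</code> is a hypothetical name.</p>

```c
#include <sys/syscall.h>

/* Compare the syscall numbers used in the listing with the system's SYS_*
 * constants. Returns 1 if all match, 0 if any differ, and -1 when not
 * building for 64-bit x86-64 (where the listing's numbers do not apply). */
int listing_syscalls_match(void)
{
#if defined(__x86_64__) && !defined(__ILP32__)
    return SYS_read == 0 && SYS_open == 2 && SYS_fstat == 5 && SYS_mmap == 9 &&
           SYS_exit == 60 && SYS_ftruncate == 77 &&
           SYS_memfd_create == 319 && SYS_execveat == 322;
#else
    return -1;
#endif
}
```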

<p>After a break, I think I will try to move <code>bld</code> into the kernel proper. Could be fun! Bye! o/</p>
]]></content>
</entry>
</feed>
