<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Digital Cabin - programming</title>
<link rel="alternate" href="https://cabin.digital"/>
<link rel="self" href="https://cabin.digital/tags/programming.xml"/>
<id>https://cabin.digital/</id>
<updated>2025-05-28T06:40:00Z</updated>
<generator uri="https://cabin.digital/golog.html" version="1.10.1beta">golog</generator>
<subtitle>Comfy cabin in a rough digital ocean</subtitle>
<author>
<name>dweller</name>
</author>
<category term="sh"/>
<category term="unix"/>
<category term="x86"/>
<category term="rant"/>
<category term="programming"/>
<category term="dev"/>
<category term="c"/>
<category term="haskell"/>
<category term="glocal"/>
<category term="langdev"/>
<category term="linux"/>
<category term="asm"/>
<category term="python"/>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/celf2.html</id>
<link rel="alternate" href="https://cabin.digital/log/celf2.html"/>
<title>Executable C source files 2: Shell Boogaloo</title>
<updated>2025-05-28T06:40:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programming"/>
<category term="dev"/>
<category term="c"/>
<category term="sh"/>
<category term="unix"/>
<content type="html"><![CDATA[<p>Hey, this small and dumb post is a sidequel to my <a href="/log/celf.html">previous post</a>. Check it out for
more context.</p>

<p>So, last time I psyched out the C preprocessor and abused the UNIX shell. This time we&rsquo;re gonna let the C
preprocessor do its job, and we won&rsquo;t abuse shells to achieve a very similar effect. I came up with
this idea on a sleepless night, like most other dumb ideas, and it arrived in its final form from
the get-go. But I am going to waste your time by going at it step by step. :P</p>

<h3 id="everything-is-a-file">Everything is a file</h3>

<p>Did you know that on UNIX®, everything is a file <sub>(terms and conditions may apply)</sub>? Yes,
it&rsquo;s (mostly) true, and that means that stdin also exists and looks like a normal file on a file
system. Well, a pipe most likely, but that&rsquo;s basically a file. Each process can open <em>/dev/stdin</em>
and friends to access its stdin, stdout and stderr. And as you know, we can also change which
files actually get opened as those with shell redirection and pipes.</p>

<p>You might also be aware of <a href="https://en.wikipedia.org/wiki/Here_document">heredocs</a>, &ldquo;file
literals&rdquo; so to say. So if we combine the two, we can embed C source code, and pipe it to the stdin
of our compiler!</p>

<blockquote>
<p>But <code>cc</code> doesn&rsquo;t read from stdin, you doofus!</p>
</blockquote>

<p>A mean and obtuse reader would say, but I am sure I don&rsquo;t have readers like that! Of course it can,
just pass it <em>/dev/stdin</em>!</p>

<h3 id="piping-cc-s">Piping <code>cc</code>s</h3>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cat &lt;&lt; EOF | cc /dev/stdin
&gt; #include &lt;stdio.h&gt;
&gt; int main(void)
&gt; {
&gt;   puts(&#34;Hell world...&#34;);
&gt;   return 0;
&gt; }
&gt; EOF
/usr/bin/ld: /dev/stdin: file not recognized: Illegal seek
</code></pre></div>
</div>

<p>Well darn&hellip; But do not despair, gcc has billions of switches, I am sure we can find something.
Something like:</p>

<blockquote>
<p>-x language
  Specify explicitly the language for the following input files (rather than letting the
  compiler choose a default based on the file name suffix).  This option applies to all
  following input files until the next -x option.  Possible values for language are:</p>

<ul>
<li>c</li>
<li>c-header</li>
<li>cpp-output</li>
<li>c++</li>
</ul>

<p>[&hellip;]</p>
</blockquote>

<p>&ndash; gcc(1)</p>

<p>Since ld(1), which is launched by gcc(1), said &ldquo;file not recognized&rdquo;, let&rsquo;s help &lsquo;em out:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cat &lt;&lt; EOF | cc -x c /dev/stdin
&gt; #include &lt;stdio.h&gt;
&gt; int main(void)
&gt; {
&gt;   puts(&#34;Hell world...&#34;);
&gt;   return 0;
&gt; }
&gt; EOF
$ ./a.out
Hell world...
</code></pre></div>
</div>

<p>Voilà! Ez-pz, lemon squeeze and all that, life is good. But it can be better!</p>

<h3 id="find-the-errors-in-your-ways">Find the errors in your ways</h3>

<p>But suppose we make an error, I bet the compiler output will be pretty bad!</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cat &lt;&lt; EOF | cc -x c /dev/stdin
&gt; #include &lt;stdio.h&gt;
&gt; int main(void)
&gt; {
&gt;   for puts(&#34;Hell world...&#34;);
&gt;   return 0;
&gt; }
&gt; EOF
/dev/stdin: In function &#39;main&#39;:
/dev/stdin:4:6: error: expected &#39;(&#39; before &#39;puts&#39;
</code></pre></div>
</div>

<p>Yeah, as expected not good. The file name is literally &ldquo;/dev/stdin&rdquo; and the line numbers are off
because of our little shell header. Fortunately, the C preprocessor has just the directive for us,
<em>#line</em>:</p>

<blockquote>
<ul>
<li>#line linenum</li>
</ul>

<p>linenum is a non-negative decimal integer constant. It specifies the line number which should be reported for the following line of input [&hellip;]</p>

<ul>
<li>#line linenum filename</li>
</ul>

<p>linenum is the same as for the first form, and has the same effect. In addition, filename is a string constant. [&hellip;]</p>
</blockquote>

<p>&ndash; <a href="https://gcc.gnu.org/onlinedocs/cpp/Line-Control.html">GCC online docs</a></p>
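
<p>To get a feel for the directive in isolation, here is a minimal C sketch. It is not taken from the
examples in this post: the file name &ldquo;pretend.c&rdquo; and the two helper names are made up for
illustration.</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>/* Everything below the directive is reported as if it lived in
 * &#34;pretend.c&#34; (a made-up name), starting at line 100. */
#line 100 &#34;pretend.c&#34;
static int reported_line(void) { return __LINE__; }          /* on &#34;line 100&#34; */
static const char *reported_file(void) { return __FILE__; }  /* on &#34;line 101&#34; */
</code></pre></div>
</div>

<p>Called from a <code>main</code>, these should hand back 100 and &ldquo;pretend.c&rdquo; no matter what the real file is
called or where the functions really sit in it, and that is the knob we need for the error messages.</p>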

<p>So we can control what the compiler will think the line numbers and file names are! Since the
heredoc is actually a shell file, we can use all the shell substitutions and features inside of it.
So let&rsquo;s set the line number to 3, since that&rsquo;s the number of the line following the directive, and
the file name to &ldquo;$0&rdquo;, as that&rsquo;s usually the script&rsquo;s name:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cat &lt;&lt; EOF | cc -x c /dev/stdin
#line 3 &#34;$0&#34;
#include &lt;stdio.h&gt;

int main(void)
{
    for puts(&#34;Hell world...&#34;);
    return 0;
}
EOF
    7 | ??????????????????????????

                                 ???????
                                               ????
      |         ^~~~
      |         (
</code></pre></div>
</div>

<p>Uhh, I guess it&rsquo;s not happy that this is typed right into the shell&hellip; Probably &ldquo;$0&rdquo; is something like
<em>/bin/sh</em> or whatever shell you use. Let&rsquo;s make a proper file, and make it self-executable while
we&rsquo;re at it:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</span></pre></div>
    <div class="golog-lines"><pre><code>$ nl -ba c.sh
     1  #!/bin/sh -e
     2  cat &lt;&lt; EOF | cc -x c /dev/stdin -o .&#34;$0&#34; &amp;&amp; ./.&#34;$0&#34;
     3  #line 4 &#34;$0&#34;
     4  #include &lt;stdio.h&gt;
     5
     6  int main(void)
     7  {
     8     for puts(&#34;Hell world...&#34;);
     9     return 0;
    10  }
    11  EOF
$ chmod +x c.sh
$ ./c.sh
./c.sh: In function &#39;main&#39;:
./c.sh:8:9: error: expected &#39;(&#39; before &#39;puts&#39;
    8 |     for puts(&#34;Hell world...&#34;);
      |         ^~~~
      |         (
$ vim c.sh # fix the error
$ ./c.sh
Hell world...
</code></pre></div>
</div>

<p>And so it works! But what if you want to add lots of flags to your compiler, and keep them on
different lines too? That would move our #line directive, and fixing it manually every time would be
a pain. Well, if you want to use bash(1) instead of sh(1) I think it has a built-in variable like
<code>$BASH_LINENO</code> or something, don&rsquo;t quote me on that. But if you want to keep it POSIX, we could make
this a tad more complex, but automatic. We can make the shell script search inside of itself for the
very line that does the searching. For example, with awk(1), which might be overkill, but makes it
easy:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</span></pre></div>
    <div class="golog-lines"><pre><code>$ vim c.sh # reintroduce the error
$ nl -ba c.sh
     1  #!/bin/sh -e
     2  cat &lt;&lt; EOF | cc -x c /dev/stdin -o .&#34;$0&#34; \
     3     -Wall -Wextra -pedantic \
     4     -std=c89 -O2 \
     5     &amp;&amp; ./.&#34;$0&#34;
     6  #line $(nl -ba &#34;$0&#34; | awk &#39;/#line/ {print $1+1}&#39;) &#34;$0&#34;
     7  #include &lt;stdio.h&gt;
     8
     9  int main(void)
    10  {
    11     for puts(&#34;Hell world...&#34;);
    12     return 0;
    13  }
    14  EOF
$ ./c.sh
./c.sh: In function &#39;main&#39;:
./c.sh:11:9: error: expected &#39;(&#39; before &#39;puts&#39;
   11 |     for puts(&#34;Hell world...&#34;);
      |         ^~~~
      |         (
$ vim c.sh # fix the error
$ ./c.sh
Hell world...
</code></pre></div>
</div>

<p>And that&rsquo;s about it! Just use nl(1) to list the file with line numbers, select the #line line with awk(1)
and print its line number + 1, as per the docs. Kind of meta if you think about it.</p>

<p>Anyways, I doubt this will be useful anywhere but in some obfuscation thing, but if you read
this far, now you know it!</p>

<p>More soon! (No promises&hellip;)</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/flatbin.html</id>
<link rel="alternate" href="https://cabin.digital/log/flatbin.html"/>
<title>Loading flat binaries in Linux</title>
<updated>2025-05-22T23:14:36Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programming"/>
<category term="dev"/>
<category term="c"/>
<category term="linux"/>
<category term="x86"/>
<category term="asm"/>
<content type="html"><![CDATA[<p>Do you remember the good old times when things were simple and you could just load some binary into
RAM and jump into it? Yea, me neither. But that time existed. These days things are complicated,
often for no reason, but this isn&rsquo;t one of those cases. That said, don&rsquo;t you wanna do it? Come on,
it&rsquo;s fun, right? Just assemble and load!</p>

<p>I accidentally came to this idea when trying to do the smallest ELF &ldquo;challenge&rdquo; on Linux. You know,
turn a 1.4MB (a whole fucking 3½&rdquo; HD floppy!) abomination that is Go&rsquo;s &ldquo;Hello world&rdquo; program into
the smallest equivalent? Anyways, <a href="https://cirosantilli.com/elf-hello-world">plenty</a>
of <a href="https://github.com/tchajed/minimal-elf">people</a> wrote
<a href="https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html">about</a> that better than I can, but
maybe I will make a write-up about <em>my own</em> journey next time. (Spoiler alert, I got it to 91 bytes
for just <code>exit(0)</code>.) What I want to write about is how I had a thought: if you disregard the loader
size (which people often do for dynamically linked executables), then if I craft my own loader I can
strip the ELF and just load a flat binary!</p>

<p>First, our payload, a simple program that just prints &ldquo;Nothing happens.&rdquo; shall do. Let&rsquo;s call it
<code>xyzzy</code>. Let&rsquo;s write a minimal x86_64 assembly program for it, I&rsquo;ll use
<a href="https://www.nasm.us/"><code>nasm</code></a> as my assembler of choice this time:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>
<span class='golog-keyword'>db</span> <span class='golog-string'>&#39;#!./bld&#39;</span>,0xA

start:
       xor edi, edi
       inc edi
       lea rsi, [rel msg]
       mov edx, len
       mov eax, edi
       syscall

       dec edi
       mov eax, <span class='golog-number'>60</span>
       syscall

msg: <span class='golog-keyword'>db</span> <span class='golog-string'>&#34;Nothing happens.&#34;</span>,0xA
len: <span class='golog-keyword'>equ</span> $-msg
</code></pre></div>
</div>

<p>Build it with <code>nasm -fbin xyzzy.asm</code> and <code>chmod +x xyzzy</code> it.
The very first thing you&rsquo;ll notice is the embedded &ldquo;#!./bld&rdquo; line. Linux goes through the list of
loaders built into the kernel and tries them all. One of them looks for &ldquo;#!&rdquo; and executes
the path after it (till &lsquo;\n&rsquo; aka 0x0A) instead, passing it the path of the original executable
(the one we tried to launch) and letting it handle things in user space. This is how your Python, Perl, Ruby and,
of course, shell scripts work. But as you <em>might</em> know, <a href="/log/celf.html#how-does-it-work">POSIX shell scripts can work even without
that</a>. This saves us the trouble of writing our loader in kernel
space. Although I think I would like to do that anyways, and maybe shave off some more bytes, since
in the real world you&rsquo;d have &ldquo;#!/bin/bld&rdquo; at least, which is 11 bytes (remember the &lsquo;\n&rsquo;). Instead we
could just look at either the filename or a smaller magic number. But I am getting sidetracked.</p>
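
<p>For the curious, the &ldquo;#!&rdquo; handling can be sketched in a few lines of C. This is a simplified
illustration and not the kernel&rsquo;s actual code: <code>shebang_interp</code> is a made-up helper, and it takes
the whole rest of the line as the interpreter instead of splitting off an optional argument.</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper: copies whatever follows &#34;#!&#34; on the first line
 * into out (NUL-terminated) and returns the offset of the first byte
 * after the newline, i.e. where the payload starts. Returns 0 if buf
 * does not start with a usable shebang line. */
static size_t shebang_interp(const char *buf, size_t len,
                             char *out, size_t outsz)
{
    const char *nl;
    size_t n;

    if (len &lt; 3 || buf[0] != &#39;#&#39; || buf[1] != &#39;!&#39;) return 0;

    nl = memchr(buf, &#39;\n&#39;, len);       /* end of the shebang line */
    if (nl == NULL) return 0;

    n = (size_t)(nl - buf) - 2;         /* bytes between &#34;#!&#34; and the newline */
    if (n == 0 || n &gt;= outsz) return 0;

    memcpy(out, buf + 2, n);
    out[n] = &#39;\0&#39;;
    return (size_t)(nl - buf) + 1;
}
</code></pre></div>
</div>

<p>Fed the first bytes of <code>xyzzy</code>, this should yield &ldquo;./bld&rdquo; as the interpreter and 8 as the payload
offset, which is the spot a loader has to jump to.</p>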

<p>Okay let&rsquo;s write the simplest loader I could think of, I&rsquo;ll use C for now:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#if</span> <span class='golog-number'>0</span>
    exe=<span class='golog-string'>&#34;$(basename &#34;</span>$<span class='golog-number'>0</span><span class='golog-string'>&#34; | cut -d. -f1)&#34;</span>
    CC=musl-gcc
    CFLAGS=<span class='golog-string'>&#34;-pipe -<span class='golog-keyword'>static</span> -std=c89 -Os -s -Wall -Wextra&#34;</span>
    exec $CC $CFLAGS <span class='golog-string'>&#34;$<span class='golog-number'>0</span>&#34;</span> -o <span class='golog-string'>&#34;$exe&#34;</span>
<span class='golog-keyword'>#endif</span>

<span class='golog-keyword'>#include</span> &lt;stdlib.h&gt;
<span class='golog-keyword'>#include</span> &lt;stdio.h&gt;
<span class='golog-keyword'>#include</span> &lt;errno.h&gt;
<span class='golog-keyword'>#include</span> &lt;unistd.h&gt;
<span class='golog-keyword'>#include</span> &lt;fcntl.h&gt;
<span class='golog-keyword'>#include</span> &lt;sys/stat.h&gt;
<span class='golog-keyword'>#include</span> &lt;sys/mman.h&gt;


<span class='golog-keyword'>typedef</span> <span class='golog-keyword'>void</span> (*jmp_t)(<span class='golog-keyword'>void</span>);

<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    <span class='golog-keyword'>int</span> fd = -<span class='golog-number'>1</span>;
    off_t i = <span class='golog-number'>0</span>;
    <span class='golog-keyword'>struct</span> stat st = {<span class='golog-number'>0</span>};
    <span class='golog-keyword'>char</span>* newcode = <span class='golog-keyword'>NULL</span>;

    <span class='golog-keyword'>if</span>(argc &lt; <span class='golog-number'>2</span>) exit(<span class='golog-number'>1</span>);

    fd = open(argv[<span class='golog-number'>1</span>], O_RDONLY | O_NOATIME);
    <span class='golog-keyword'>if</span>(fd &lt; <span class='golog-number'>0</span>)
    {
        perror(<span class='golog-string'>&#34;open&#34;</span>);
        exit(<span class='golog-number'>2</span>);
    }

    <span class='golog-keyword'>if</span>(fstat(fd, &amp;st) != <span class='golog-number'>0</span>)
    {
        perror(<span class='golog-string'>&#34;fstat&#34;</span>);
        exit(<span class='golog-number'>3</span>);
    }

    newcode = mmap(<span class='golog-keyword'>NULL</span>, st.st_size, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, <span class='golog-number'>0</span>);
    <span class='golog-keyword'>if</span>(newcode == MAP_FAILED)
    {
        perror(<span class='golog-string'>&#34;mmap&#34;</span>);
        exit(<span class='golog-number'>4</span>);
    }

    close(fd);

    <span class='golog-keyword'>for</span>(i = <span class='golog-number'>0</span>; i &lt; st.st_size; i++)
    {
        <span class='golog-keyword'>if</span>(newcode[i] == <span class='golog-string'>&#39;\n&#39;</span>)
        {
            jmp_t jmp = (jmp_t)((<span class='golog-keyword'>void</span>*)(newcode + i + <span class='golog-number'>1</span>));
            jmp();
            exit(<span class='golog-number'>0</span>);
        }
    }

    fprintf(stderr, <span class='golog-string'>&#34;loader: Failed to skip shebang.\n&#34;</span>);
    exit(<span class='golog-number'>5</span>);
}
</code></pre></div>
</div>

<p>As you can see, I use my self-build trick described <a href="/log/celf.html">previously</a> to build the
loader. The idea here is simple: we open the file, map it into executable memory, find the
end of the shebang (#!) line and jump there. Very crude. The child inherits the loader&rsquo;s execution
environment, including the process name, so it won&rsquo;t look nice in <code>ps</code>/<code>top</code>. But it&rsquo;s small and simple.</p>

<p>Before we turn it into a more proper loader, let&rsquo;s check something out:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ./bld.c # build
$ ./xyzzy
Nothing happens.
$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 18K May 22 23:41 bld
-rwxr-xr-x 1 dwlr dwlr  54 May 22 23:41 xyzzy
</code></pre></div>
</div>

<p>First, hell yea it works! :) Second, the <code>xyzzy</code> executable is only 54 bytes! But look at our
loader! Oof. That ain&rsquo;t smol, and remember I started this in the spirit of the minimal ELF challenge. So
let&rsquo;s fix this first. All I am going to do is rewrite this program in nasm just like we wrote
<code>xyzzy</code>, except as a proper ELF:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>

st:   <span class='golog-keyword'>equ</span> <span class='golog-number'>144</span>               ; sizeof(struct stat)
stsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>48</span>                ; offsetof(struct stat, st_size)

<span class='golog-keyword'>section</span> .text
<span class='golog-keyword'>global</span> _start
_start:
        cmp DWORD [rsp], <span class='golog-number'>2</span>  ; [rsp] = argc
        jge .<span class='golog-number'>1</span>              ; argc &lt; <span class='golog-number'>2</span> ?
            xor edi, edi
            inc edi
            jmp exit
    .<span class='golog-number'>1</span>:
        mov eax, <span class='golog-number'>2</span>          ; <span class='golog-number'>2</span> = open(<span class='golog-number'>2</span>)
        lea rdi, [rsp+<span class='golog-number'>16</span>]   ; [rsp+<span class='golog-number'>16</span>] = argv[<span class='golog-number'>1</span>]
        mov rdi, [rdi]
        xor esi, esi        ; O_RDONLY = <span class='golog-number'>0</span>
        xor edx, edx        ; no mode, not creating
        syscall

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>2</span>              ; fd &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>2</span>
            jmp exit
    .<span class='golog-number'>2</span>:
        mov r15, rax        ; save fd

        sub rsp, st         ; push struct stat onto a stack

        mov rdi, r15        ; fd
        mov rsi, rsp        ; struct stat*
        mov eax, <span class='golog-number'>5</span>          ; <span class='golog-number'>5</span> = fstat(<span class='golog-number'>2</span>)
        syscall

        cmp eax, <span class='golog-number'>0</span>
        je .<span class='golog-number'>3</span>               ; ret != <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>3</span>
            jmp exit
    .<span class='golog-number'>3</span>:
        xor edi, edi        ; NULL
        mov rsi, [rsp+stsz] ; st.st_size
        mov edx, <span class='golog-number'>1|2</span>|4      ; PROT_READ | PROT_WRITE | PROT_EXEC
        mov r10, <span class='golog-number'>2</span>          ; MAP_PRIVATE
        mov r8,  r15        ; fd
        xor r9,  r9         ; offset
        mov eax, <span class='golog-number'>9</span>          ; <span class='golog-number'>9</span> = mmap(<span class='golog-number'>2</span>)
        syscall

        cmp rax, -<span class='golog-number'>1</span>
        jne .<span class='golog-number'>4</span>              ; ret == MAP_FAILED ? (the raw syscall returns -errno, so -1 only matches EPERM)
            mov edi, <span class='golog-number'>4</span>
            jmp exit
    .<span class='golog-number'>4</span>:
        mov r14, rax        ; save new memory addr

        mov rdi, r15        ; fd, closing it so <span class='golog-string'>&#34;child&#34;</span> doesn&#39;t have an extra fd, takes space tho
        mov eax, <span class='golog-number'>3</span>          ; <span class='golog-number'>3</span> = close(<span class='golog-number'>2</span>)
        syscall             ; not checking this one, since if I couldn&#39;t close, who cares

        cld                 ; clear direction (for scas*)
        mov al,  0xA        ; needle, newline 0xA = <span class='golog-string'>&#39;\n&#39;</span>
        mov rdi, r14        ; hay
        mov rcx, [rsp+stsz] ; hay size
        repne scasb         ; search for it

        cmp rcx, <span class='golog-number'>0</span>
        je .<span class='golog-number'>5</span>               ; found
            add rsp, st     ; pop struct stat from stack
            jmp rdi         ; <span class='golog-string'>&#34;load&#34;</span> <span class='golog-string'>&#34;child&#34;</span>
            mov edi, <span class='golog-number'>0</span>
            jmp exit
    .<span class='golog-number'>5</span>:
        mov edi, <span class='golog-number'>5</span>
exit:
        mov eax, <span class='golog-number'>60</span>         ; <span class='golog-number'>60</span> = exit(<span class='golog-number'>2</span>)
        syscall
</code></pre></div>
</div>

<p>And build like this:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
</span></pre></div>
    <div class="golog-lines"><pre><code>$ nasm -felf64 bld.asm
$ ld -static -nostdlib bld.o -o bld
$ ./xyzzy
Nothing happens.
$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 5.0K May 22 23:55 bld
-rwxr-xr-x 1 dwlr dwlr   54 May 22 23:41 xyzzy
$ strip bld &amp;&amp; ls -lh bld
-rwxr-xr-x 1 dwlr dwlr 4.4K May 22 23:56 bld
</code></pre></div>
</div>

<p>It&rsquo;s literally the same program as the C one, but in assembly. You can read the comments if you&rsquo;re
bad at assembly like me, and I wasn&rsquo;t trying too hard to optimize its code size. Partially because
I don&rsquo;t really know how, since as I said, I am not an assembly wizard. 4.4K! Not bad, but we can do
way better. <code>gcc</code> and <code>ld</code> put a lot of cruft we don&rsquo;t really need for our purposes, so let&rsquo;s ditch
<code>ld</code> and just hand-roll an ELF straight in the assembly! The only difference between the above
program and the following one is the ELF headers, so I will skip the actual code:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>

BASE:   <span class='golog-keyword'>equ</span> <span class='golog-number'>0x400000</span>
ENTRY:  <span class='golog-keyword'>equ</span> BASE + _start
PAGESZ: <span class='golog-keyword'>equ</span> <span class='golog-number'>0x1000</span>

elfhead:
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0x7F</span>, <span class='golog-string'>&#34;ELF&#34;</span>    ; magic
        <span class='golog-keyword'>db</span>   <span class='golog-number'>2</span>              ; <span class='golog-number'>64</span>-bit
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>              ; Little Endian
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>              ; ELF Version
        <span class='golog-keyword'>db</span>   <span class='golog-number'>3</span>              ; Linux
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>              ; ignored on Linux
<span class='golog-keyword'>times</span> <span class='golog-number'>7</span> <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>              ; padding
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>2</span>              ; executable file
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0x3E</span>           ; AMD64
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>              ; ELF Version, again
        <span class='golog-keyword'>dq</span>   ENTRY          ; program entry
        <span class='golog-keyword'>dq</span>   progheads      ; program headers offset
        <span class='golog-keyword'>dq</span>   sectheads      ; <span class='golog-keyword'>section</span> headers offset
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>0</span>              ; flags
        <span class='golog-keyword'>dw</span>   elfsz          ; size of this ELF header
        <span class='golog-keyword'>dw</span>   pgsz           ; size of <span class='golog-number'>1</span> program header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>1</span>              ; num of program headers
        <span class='golog-keyword'>dw</span>   scsz           ; size of <span class='golog-number'>1</span> <span class='golog-keyword'>section</span> header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>              ; num of <span class='golog-keyword'>section</span> headers
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>              ; <span class='golog-keyword'>section</span> header string table index
elfsz: <span class='golog-keyword'>equ</span> $-elfhead

progheads:
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>              ; Load
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1|2</span>|4          ; Exec | Write | Read
        <span class='golog-keyword'>dq</span>   <span class='golog-number'>0</span>              ; offset
        <span class='golog-keyword'>dq</span>   BASE           ; virt mem addr
        <span class='golog-keyword'>dq</span>   BASE           ; phys mem addr if applic.
        <span class='golog-keyword'>dq</span>   allsz          ; file size
        <span class='golog-keyword'>dq</span>   allsz          ; mem size
        <span class='golog-keyword'>dq</span>   PAGESZ         ; alignment
pgsz: <span class='golog-keyword'>equ</span> $-progheads

sectheads: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span>
scsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span> ;$-sectheads

st:   <span class='golog-keyword'>equ</span> <span class='golog-number'>144</span>               ; sizeof(struct stat)
stsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>48</span>                ; offsetof(struct stat, st_size)

text:
_start:
    ; ... same code here as in previous code segment&#39;s _start function

textsz: <span class='golog-keyword'>equ</span> $-text
allsz: <span class='golog-keyword'>equ</span> $-elfhead
</code></pre></div>
</div>

<p>We build it the same way as <code>xyzzy</code>: <code>nasm -fbin bld.asm</code> and <code>chmod +x bld</code> it.
If you just go to Wikipedia and find the page about
<a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELFs</a> you can see the whole format,
I just replicated the structure straight in the assembly. If you&rsquo;re totally unfamiliar with assembly,
the <code>db</code>, <code>dw</code>, <code>dd</code>, <code>dq</code> define a byte, word (16 bits), double word (32 bits) and quad word (64 bits)
respectively, straight in the resulting binary. So I just define all the fields we need starting from
the beginning of the binary. <code>equ</code> is kind of like <code>#define</code> in C. It stores the value but does not
emit it into the binary. <code>$</code> is the current location in the binary, which is auto-incremented as we add
things, including code. The labels are the same as in C. Although to be honest, if you&rsquo;re reading
this and know C, you must be familiar enough to get this, so IDK who I am explaining this to. :P
Search the web if you&rsquo;re totally lost.</p>

<p>So, the only things we are required to have for a valid ELF are the ELF header, which describes what
ISA (Instruction Set Architecture) and what ABI (Application Binary Interface) the ELF is for, and
at least 1 Program Header, which tells the ELF loader where to find the executable code in the
binary, and where and how to load it. Of course normal ELFs have way more than this, especially if
you have debugging info embedded into it, but this is not an article on ELFs and DWARFs.
Linux will use System V or GNU/Linux for ABI on AMD64 (x86_64) ISA. Our executable is non-PIE
(Position Independent Executable) so we just provide a static address as an entry point. The single
program header asks the ELF loader to load the binary as readable, writable (don&rsquo;t really need
this, but if you&rsquo;re into self-modifying code and security breaches this is for you ;)) and
executable memory and tells how big it is. That&rsquo;s it!</p>
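
<p>If assembly data directives aren&rsquo;t your thing, roughly the same header can be sketched in C with
the structs from Linux&rsquo;s <code>&lt;elf.h&gt;</code>. Treat this as an illustration, not as how the loader is built:
<code>make_ehdr</code> is a made-up helper, and it assumes the single program header sits directly after the
ELF header, as in the listing above.</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>#include &lt;elf.h&gt;     /* Linux-specific: Elf64_Ehdr, Elf64_Phdr and friends */
#include &lt;string.h&gt;

/* Hypothetical helper mirroring the hand-written header, field by field. */
static Elf64_Ehdr make_ehdr(Elf64_Addr entry)
{
    Elf64_Ehdr h;

    memset(&amp;h, 0, sizeof h);                 /* padding, flags, no sections */
    memcpy(h.e_ident, ELFMAG, SELFMAG);      /* 0x7F, &#34;ELF&#34; */
    h.e_ident[EI_CLASS]   = ELFCLASS64;      /* 2 = 64-bit */
    h.e_ident[EI_DATA]    = ELFDATA2LSB;     /* 1 = little endian */
    h.e_ident[EI_VERSION] = EV_CURRENT;      /* 1 */
    h.e_ident[EI_OSABI]   = ELFOSABI_LINUX;  /* 3 */
    h.e_type      = ET_EXEC;                 /* 2 = executable file */
    h.e_machine   = EM_X86_64;               /* 0x3E */
    h.e_version   = EV_CURRENT;
    h.e_entry     = entry;
    h.e_phoff     = sizeof(Elf64_Ehdr);      /* program header follows directly */
    h.e_ehsize    = sizeof(Elf64_Ehdr);      /* 64 bytes */
    h.e_phentsize = sizeof(Elf64_Phdr);      /* 56 bytes */
    h.e_phnum     = 1;
    return h;
}
</code></pre></div>
</div>

<p>The two structs come to 64 + 56 = 120 bytes, so with nothing else in front of the code the entry
point would be the base address plus 120, e.g. <code>make_ehdr(0x400000 + 120)</code>. Everything in the file
past those 120 bytes is actual code.</p>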

<p>Let&rsquo;s see the results:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 308 May 22 23:41 bld
-rwxr-xr-x 1 dwlr dwlr  54 May 22 23:41 xyzzy
</code></pre></div>
</div>

<p>Now we&rsquo;re talking! Only 308 bytes! I can live with that, but what I can&rsquo;t live with is that it&rsquo;s not
actually loading the binary as a new executable. It&rsquo;s more of a &ldquo;module&rdquo; loader. Which is fine, but
not what I <em>really</em> wanted. Let&rsquo;s fix that.</p>

<p>The main problem here is that we want to load a program without using the kernel&rsquo;s
<a href="https://www.man7.org/linux/man-pages/man2/execve.2.html">execve(2)</a> syscall.
You know, the one that actually loads a program into memory. Why? It was already called by the shell
(usually), failed to load an ELF (since <code>xyzzy</code> isn&rsquo;t an ELF), saw &ldquo;#!&rdquo; and passed the path to <em>us</em>!
So we&rsquo;re past the execve(2) and, as I said, <code>xyzzy</code> isn&rsquo;t an ELF.</p>

<p>My first idea was to basically do execve(2) ourselves, without the kernel. Using
<a href="https://www.man7.org/linux/man-pages/man2/fork.2.html">fork(2)</a> +
<a href="https://www.man7.org/linux/man-pages/man2/ptrace.2.html">ptrace(2)</a> + reading and writing to the
<em>/proc/$pid/mem</em> I could change the child&rsquo;s memory, close fds and do general clean up. This is
essentially how <code>gdb</code> and other debuggers are able to change stuff in the debuggee program. But that
sounds like a lot of work, not to mention I never used ptrace(2) so I would have to learn it. And
even though that might be fun, I wasn&rsquo;t feeling it at the time. This idea came to me before I wrote
the assembly version of the loader.</p>

<p>Maybe you noticed, but the way the Program Headers are set up, I don&rsquo;t just load the code into
memory, I load the whole file. Including the ELF header. A new idea hatched:</p>

<ol>
<li>Loader is loaded into the memory with its ELF headers</li>
<li>Loader copies its ELF into some new place</li>
<li>Loader opens and appends the program to be loaded (<code>xyzzy</code> for us) after the ELF copy</li>
<li>Fix up the ELF copy size and entry point</li>
<li>execve(2) the new binary.</li>
<li>&hellip;</li>
<li>PROFIT</li>
</ol>

<p>There was one snag, execve(2) only takes a file path to the executable&hellip; Well, we could write out
the new ELF to <em>/tmp</em> and launch that, but that just feels wrong. Fortunately, since Linux 3.17
we&rsquo;ve got a new very useful syscall -
<a href="https://www.man7.org/linux/man-pages/man2/memfd_create.2.html">memfd_create(2)</a>. So we can create
a file purely in memory. I can then map it as shared memory and basically have a view into the
file. &ldquo;Okay, that&rsquo;s all cool and nice, but don&rsquo;t we need a file path, not an open fd?&rdquo; an observant
reader would ask. And you would be correct, but since Linux 3.19 there is another
syscall - <a href="https://www.man7.org/linux/man-pages/man2/execveat.2.html">execveat(2)</a>. At first glance
it won&rsquo;t help us, since it&rsquo;s just like
<a href="https://www.man7.org/linux/man-pages/man2/openat.2.html">openat(2)</a> but for executing instead of
opening files. That is, it uses a <em>directory</em> fd as a root to search for file path relative to. But
if you read the man page:</p>

<blockquote>
<p>If pathname is an empty string and the AT_EMPTY_PATH flag is
specified, then the file descriptor dirfd specifies the file to be
executed (i.e., dirfd refers to an executable file, rather than a
directory).</p>
</blockquote>

<p>Bingo! So the plan of action is:</p>

<ol>
<li>Loader is loaded into the memory with its ELF headers</li>
<li>Loader <a href="https://www.man7.org/linux/man-pages/man2/open.2.html">open(2)</a>s the to-be-loaded binary
with O_CLOEXEC so it automatically closes on execveat(2)</li>
<li>Loader <a href="https://www.man7.org/linux/man-pages/man2/fstat.2.html">fstat(2)</a>s the binary to get its
size</li>
<li>Create memfd, and <a href="https://www.man7.org/linux/man-pages/man2/ftruncate.2.html">ftruncate(2)</a> it
to the size of loader&rsquo;s ELF headers + the size of binary-to-be-loaded</li>
<li><a href="https://www.man7.org/linux/man-pages/man2/mmap.2.html">mmap(2)</a> a shared mapping backed by the memfd</li>
<li>Copy the ELF headers and <a href="https://www.man7.org/linux/man-pages/man2/read.2.html">read(2)</a> the
binary into the freshly mapped memory</li>
<li>Fix up the ELF copy</li>
<li>execveat(2)</li>
<li>&hellip;</li>
<li>PROFIT</li>
</ol>

<p>We don&rsquo;t need to worry about the fds we open(2)ed and memfd_create(2)ed, since we pass them O_CLOEXEC
and MFD_CLOEXEC, so they automatically close on execveat(2). We didn&rsquo;t really do anything else to
pollute the child&rsquo;s execution environment, except mmap(2), but execve(2) and execveat(2) create a
whole new memory map for the new process, so it&rsquo;s not inherited. Although you could say it is, since
the new program <em>is</em> running from the memfd memory.
Anyways! Even though the code is very similar to the previous one, I will paste the whole listing
here:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>bits</span> <span class='golog-number'>64</span>

BASE:   <span class='golog-keyword'>equ</span> <span class='golog-number'>0x400000</span>
ENTRY:  <span class='golog-keyword'>equ</span> BASE + _start
PAGESZ: <span class='golog-keyword'>equ</span> <span class='golog-number'>0x1000</span>

elfhead:
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0x7F</span>, <span class='golog-string'>&#34;ELF&#34;</span>            ; magic
        <span class='golog-keyword'>db</span>   <span class='golog-number'>2</span>                      ; <span class='golog-number'>64</span>-bit
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>                      ; Little Endian
        <span class='golog-keyword'>db</span>   <span class='golog-number'>1</span>                      ; ELF Version
        <span class='golog-keyword'>db</span>   <span class='golog-number'>3</span>                      ; Linux
        <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>                      ; ignored on Linux
<span class='golog-keyword'>times</span> <span class='golog-number'>7</span> <span class='golog-keyword'>db</span>   <span class='golog-number'>0</span>                      ; padding
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>2</span>                      ; executable file
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0x3E</span>                   ; AMD64
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>                      ; ELF Version, again
e_entr: <span class='golog-keyword'>dq</span>   ENTRY                  ; program entry
        <span class='golog-keyword'>dq</span>   progheads              ; program headers offset
        <span class='golog-keyword'>dq</span>   sectheads              ; <span class='golog-keyword'>section</span> headers offset
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>0</span>                      ; flags
        <span class='golog-keyword'>dw</span>   elfheadsz              ; size of this ELF header
        <span class='golog-keyword'>dw</span>   pgsz                   ; size of <span class='golog-number'>1</span> program header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>1</span>                      ; num of program headers
        <span class='golog-keyword'>dw</span>   scsz                   ; size of <span class='golog-number'>1</span> <span class='golog-keyword'>section</span> header
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>                      ; num of <span class='golog-keyword'>section</span> headers
        <span class='golog-keyword'>dw</span>   <span class='golog-number'>0</span>                      ; <span class='golog-keyword'>section</span> header string table index
elfheadsz: <span class='golog-keyword'>equ</span> $-elfhead

progheads:
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1</span>                      ; Load
        <span class='golog-keyword'>dd</span>   <span class='golog-number'>1|2</span>|4                  ; Exec | Write | Read
        <span class='golog-keyword'>dq</span>   <span class='golog-number'>0</span>                      ; offset
        <span class='golog-keyword'>dq</span>   BASE                   ; virt mem addr
        <span class='golog-keyword'>dq</span>   BASE                   ; phys mem addr if applic.
ph_fsz: <span class='golog-keyword'>dq</span>   allsz                  ; file size
ph_msz: <span class='golog-keyword'>dq</span>   allsz                  ; mem size
        <span class='golog-keyword'>dq</span>   PAGESZ                 ; alignment
pgsz: <span class='golog-keyword'>equ</span> $-progheads

sectheads: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span>
scsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>0</span> ;$-sectheads

elfsz: <span class='golog-keyword'>equ</span> $-elfhead

st:   <span class='golog-keyword'>equ</span> <span class='golog-number'>144</span>                       ; sizeof(struct stat)
stsz: <span class='golog-keyword'>equ</span> <span class='golog-number'>48</span>                        ; offsetof(struct stat, st_size)

memfdname: <span class='golog-keyword'>db</span> <span class='golog-string'>&#39;elfstub&#39;</span>,<span class='golog-number'>0</span>
text:
_start:
        mov rbp, rsp

        cmp DWORD [rbp], <span class='golog-number'>2</span>          ; [rbp] = argc
        jge .<span class='golog-number'>1</span>                      ; argc &lt; <span class='golog-number'>2</span> ?
            xor edi, edi
            inc edi
            jmp exit
    .<span class='golog-number'>1</span>:
        mov eax, <span class='golog-number'>2</span>                  ; <span class='golog-number'>2</span> = open(<span class='golog-number'>2</span>)
        lea rdi, [rbp+<span class='golog-number'>16</span>]           ; [rbp+<span class='golog-number'>16</span>] = argv[<span class='golog-number'>1</span>]
        mov rdi, [rdi]
        mov esi, <span class='golog-number'>0x80000</span>            ; <span class='golog-number'>02000000</span> == <span class='golog-number'>0x80000</span> == O_CLOEXEC, O_RDONLY == <span class='golog-number'>0</span>
        xor edx, edx                ; no mode, not creating
        syscall

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>2</span>                      ; fd &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>2</span>
            jmp exit
    .<span class='golog-number'>2</span>:
        mov r15, rax                ; save fd

        sub rsp, st                 ; push struct stat onto a stack

        mov rdi, r15                ; fd
        mov rsi, rsp                ; struct stat*
        mov eax, <span class='golog-number'>5</span>                  ; <span class='golog-number'>5</span> = fstat(<span class='golog-number'>2</span>)
        syscall

        cmp eax, <span class='golog-number'>0</span>
        je .<span class='golog-number'>3</span>                       ; ret != <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>3</span>
            jmp exit
    .<span class='golog-number'>3</span>:
        lea rdi, [rel memfdname]    ; name
        mov esi, <span class='golog-number'>1</span>                  ; <span class='golog-number'>1</span> == MFD_CLOEXEC
        mov eax, <span class='golog-number'>319</span>                ; <span class='golog-number'>319</span> = memfd_create(<span class='golog-number'>2</span>)
        syscall
        mov r12, rax                ; save memfd

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>4</span>                      ; ret &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>4</span>
            jmp exit
    .<span class='golog-number'>4</span>:
        mov rdi, r12                ; memfd
        mov rsi, [rsp+stsz]         ; st.st_size + full ELF size
        add rsi, elfsz
        mov eax, <span class='golog-number'>77</span>                 ; <span class='golog-number'>77</span> = ftruncate(<span class='golog-number'>2</span>)
        syscall

        cmp eax, <span class='golog-number'>0</span>
        jge .<span class='golog-number'>5</span>                      ;  ret &lt; <span class='golog-number'>0</span> ?
            mov edi, <span class='golog-number'>5</span>
            jmp exit
    .<span class='golog-number'>5</span>:
        xor edi, edi                ; NULL
                                    ; RSI, size unchanged
        mov edx, <span class='golog-number'>1|2</span>|4              ; PROT_READ | PROT_WRITE | PROT_EXEC
        mov r10, <span class='golog-number'>1</span>                  ; MAP_SHARED
        mov r8,  r12                ; memfd
        xor r9,  r9                 ; offset
        mov eax, <span class='golog-number'>9</span>                  ; <span class='golog-number'>9</span> = mmap(<span class='golog-number'>2</span>)
        syscall

        cmp rax, -<span class='golog-number'>1</span>
        jne .<span class='golog-number'>6</span>                      ;  ret == MAP_FAILED
            mov edi, <span class='golog-number'>6</span>
            jmp exit
    .<span class='golog-number'>6</span>:
        mov r14, rax                ; save new memory addr

        cld                         ; clear direction flag (for movs*/scas*)

        mov rsi, BASE               ; source, start of own image, ELF header
        mov rdi, r14                ; dest, newmem
        mov rcx, elfsz / <span class='golog-number'>8</span>          ; size, ELF headers size
        rep movsq                   ; memcpy(<span class='golog-number'>3</span>)

        mov rsi, rdi                ; newmem + elf header
        mov rdi, r15                ; fd = file
        mov rdx, [rsp+stsz]         ; size
        xor eax, eax                ; <span class='golog-number'>0</span> = read(<span class='golog-number'>2</span>)
        syscall

        cmp rax, rdx
        je .<span class='golog-number'>7</span>                       ;  ret != st.st_size ?
            mov edi, <span class='golog-number'>7</span>
            jmp exit
    .<span class='golog-number'>7</span>:
        mov r13, rsi                ; save newmem + ELF header offset

        mov al,  0xA                ; needle, newline 0xA = <span class='golog-string'>&#39;\n&#39;</span>
        mov rdi, r13                ; hay
        mov rcx, [rsp+stsz]         ; hay size
        repne scasb                 ; search for it

        cmp rcx, <span class='golog-number'>0</span>
        jne .<span class='golog-number'>8</span>                      ; not found
            mov edi, <span class='golog-number'>8</span>
            jmp exit
    .<span class='golog-number'>8</span>:
        sub rdi, r14                ; patch e_entr in memfd = BASE + (BinStart[<span class='golog-string'>&#39;\n&#39;</span>] - BinStart)
        add rdi, BASE
        mov [r14+e_entr], rdi

        mov rdi, [rsp+stsz]         ; load st.st_size and patch memfd ELF Prog. Headers
        mov [r14+ph_fsz], rdi
        mov [r14+ph_msz], rdi

        mov rdi, r12                ; memfd
        mov BYTE [rbp-<span class='golog-number'>16</span>], <span class='golog-number'>0</span>
        lea rsi, [rbp-<span class='golog-number'>16</span>]
        lea rdx, [rbp+<span class='golog-number'>16</span>]           ; argv = loader argv + <span class='golog-number'>1</span>
        mov rax, [rbp]              ;   tmp argc
        lea r10, [rbp + rax*<span class='golog-number'>8</span> + <span class='golog-number'>16</span>] ; envp
        mov r8,  <span class='golog-number'>0x1000</span>             ; AT_EMPTY_PATH = <span class='golog-number'>0x1000</span>
        mov eax, <span class='golog-number'>322</span>                ; <span class='golog-number'>322</span> = execveat(<span class='golog-number'>2</span>)
        syscall

    .<span class='golog-number'>9</span>:
        mov edi, <span class='golog-number'>9</span>
exit:
        mov eax, <span class='golog-number'>60</span>                 ; <span class='golog-number'>60</span> = exit(<span class='golog-number'>2</span>)
        syscall

textsz: <span class='golog-keyword'>equ</span> $-text
allsz: <span class='golog-keyword'>equ</span> $-elfhead
</code></pre></div>
</div>

<p>And that&rsquo;s it, we build it the same way as before aaaand:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ls -lh xyzzy bld
-rwxr-xr-x 1 dwlr dwlr 493 May 23 01:06 bld
-rwxr-xr-x 1 dwlr dwlr  54 May 23 01:06 xyzzy
$ ./xyzzy
Nothing happens.
</code></pre></div>
</div>
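<p>The entry point fix-up (the <code>repne scasb</code> part of the listing) is easier to read in C. A minimal sketch, with a made-up helper name, assuming the payload starts with a &ldquo;#!&rdquo; line and the real code begins right after its newline:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the newline search in the loader.
 * `base` is the load address, `hdr_size` the size of the ELF headers that
 * are placed in front of the payload. Returns 0 if there is no newline. */
uint64_t entry_after_shebang(const unsigned char *payload, size_t len,
                             uint64_t base, uint64_t hdr_size)
{
    const unsigned char *nl = memchr(payload, '\n', len);
    if (!nl) return 0;
    /* +1: the entry point is the byte right after the newline */
    return base + hdr_size + (uint64_t)(nl - payload) + 1;
}
```

<p>With the values from this post (BASE of 0x400000, 120 bytes of headers and the 8 byte &ldquo;#!./bld\n&rdquo; line) this gives 0x400080.</p>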

<p>One more thing to mention is the need to pass the command line arguments and the environment to the
child. That&rsquo;s easily done by reading <code>argc</code> from where the rsp register (stack pointer) points at entry, then skipping
the required number of <code>argv</code> pointers plus one more (there&rsquo;s a NULL pointer after the last one), and
that&rsquo;s your <code>envp</code> environment pointers. We, of course, also skip the 1st argument as it would be
&ldquo;./bld&rdquo;.</p>
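<p>The same pointer arithmetic, written as a C sketch. The struct and function names are hypothetical; the layout (argc, the argv pointers, NULL, the envp pointers, NULL) is what Linux puts on the initial stack on x86_64:</p>

```c
#include <stddef.h>
#include <stdint.h>

struct start_args {
    long   argc;
    char **argv;
    char **envp;
};

/* Hypothetical helper: `sp` points at the initial process stack:
 * argc, argv[0..argc-1], NULL, envp[0..], NULL. */
struct start_args parse_initial_stack(void **sp)
{
    struct start_args a;
    a.argc = (long)(intptr_t)sp[0];
    a.argv = (char **)&sp[1];
    a.envp = (char **)&sp[1 + a.argc + 1];   /* skip argv and its NULL */
    return a;
}
```

<p>The loader then hands <code>argv + 1</code> and <code>envp</code> to execveat(2), which matches the <code>[rbp+16]</code> and <code>[rbp + rax*8 + 16]</code> operands in the listing.</p>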

<p>And that&rsquo;s about it. Last thing I wanted to mention is that while developing this,
<a href="https://www.man7.org/linux/man-pages/man1/strace.1.html"><code>strace(1)</code></a> was an invaluable help, as
I could write simple and normal C programs and see what they do, as well as quickly debug wrong
arguments to syscalls in my assembly programs. Much quicker than loading gdb and stepping through
it.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
</span></pre></div>
    <div class="golog-lines"><pre><code>$ strace ./xyzzy &gt; /dev/null
execve(&#34;./xyzzy&#34;, [&#34;./xyzzy&#34;], 0x7fff60023570 /* 86 vars */) = 0
open(&#34;./xyzzy&#34;, O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=54, ...}) = 0
memfd_create(&#34;elfstub&#34;, MFD_CLOEXEC)    = 4
ftruncate(4, 174)                       = 0
mmap(NULL, 174, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 4, 0) = 0x7f29e6ecf000
read(3, &#34;#!./bld\n1\377\377\307H\2155\22\0\0\0\272\21\0\0\0\211\370\17\5\377\317\270&lt;&#34;..., 54) = 54
execveat(4, &#34;&#34;, [&#34;./xyzzy&#34;], 0x7ffdcf145970 /* 86 vars */, AT_EMPTY_PATH) = 0
write(1, &#34;Nothing happens.\n&#34;, 17)      = 17
exit(0)                                 = ?
+++ exited with 0 +++
</code></pre></div>
</div>

<p>After a break, I think I will try to move <code>bld</code> into the kernel proper, could be fun! Bye! o/</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/fw_dev.html</id>
<link rel="alternate" href="https://cabin.digital/log/fw_dev.html"/>
<title>Rule #0 of Firmware Development</title>
<updated>2024-12-25T00:00:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="dev"/>
<category term="programming"/>
<content type="html"><![CDATA[<p>I was reading some datasheets and documentation for a
<a href="https://en.wikipedia.org/wiki/System_on_a_chip">SoC</a> I was tinkering with, and was reminded
about a very important practice in firmware development. In order to appreciate it, first I want to
ask you a few <em>hypotheticals</em>.</p>

<p>Have you ever had your <a href="https://en.wikipedia.org/wiki/X86_64">x86_64</a>,
<a href="https://en.wikipedia.org/wiki/UEFI">EFI32</a> laptop bricked by an OS because it decided,
without asking the user, to &ldquo;update&rdquo; the firmware on your machine? And it just so happened that it
didn&rsquo;t check the EFI platform and just assumed that if the CPU is 64-bit, so is the EFI? No?
Well, Ubuntu did that to my old MacBook. (I do not endorse using Apple, it&rsquo;s my dad&rsquo;s old laptop that
held sentimental value.)</p>

<p>Have you ever been in the middle of updating your phone&rsquo;s ROM just to brick it because either you
made a mistake in the convoluted steps or because the ROM is bad? This hasn&rsquo;t happened to me (yet),
but it did to my friends lots of times.</p>

<p>Or maybe you have <a href="https://en.wikipedia.org/wiki/Internet_of_things">IoT</a> devices (my condolences)
and one of them (or more) just died on an unattended update you weren&rsquo;t even aware of?
(Push updates are the worst thing ever BTW, separate rant.)</p>

<p>All of these have one thing in common. They violate the <strong>0th rule of firmware development</strong>.
So what&rsquo;s the rule? It&rsquo;s simple:</p>

<blockquote>
<p>Always keep two copies of your firmware on the device.</p>
</blockquote>

<p>In case of an update you download to one slot, and mark it &ldquo;not-tested&rdquo; or something. Set the boot
to that slot, start a <a href="https://en.wikipedia.org/wiki/Watchdog_timer">watchdog timer</a>
(IDK any modern SoC that doesn&rsquo;t have a watchdog; if yours doesn&rsquo;t,
add an <a href="https://en.wikipedia.org/wiki/I%C2%B2C">I²C</a>
one or something connected to a <a href="https://en.wikipedia.org/wiki/Non-maskable_interrupt">NMI</a>)
and reboot. If the watchdog timer runs out before
successful boot, or the boot failed in some other, detectable way more than N times,
you mark that slot as &ldquo;bad&rdquo; and boot back from the previous, working slot. In case of success, you mark
the slot as &ldquo;working&rdquo; and mark the previous slot as &ldquo;old&rdquo;. So next update will pick that slot.</p>
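<p>The bookkeeping described above fits into a tiny state machine. A minimal sketch in C, assuming exactly two slots and an arbitrary N of 3; all names are made up, and a real bootloader would keep this struct somewhere that survives resets:</p>

```c
enum slot_state { SLOT_OLD, SLOT_WORKING, SLOT_UNTESTED, SLOT_BAD };

struct fw_slots {
    enum slot_state state[2];
    int boot_slot;      /* slot the bootloader will try next */
    int failed_boots;   /* failed attempts on an untested slot */
};

#define MAX_FAILED_BOOTS 3   /* the "N" from the text, value is arbitrary */

/* An update was written to the non-active slot: try it on the next boot. */
void fw_update_installed(struct fw_slots *s)
{
    int target = 1 - s->boot_slot;
    s->state[target] = SLOT_UNTESTED;
    s->boot_slot = target;
    s->failed_boots = 0;
}

/* Called after a watchdog reset or any other detected boot failure. */
void fw_boot_failed(struct fw_slots *s)
{
    if (s->state[s->boot_slot] != SLOT_UNTESTED) return;
    if (++s->failed_boots > MAX_FAILED_BOOTS) {   /* more than N times */
        s->state[s->boot_slot] = SLOT_BAD;
        s->boot_slot = 1 - s->boot_slot;          /* back to the old slot */
        s->failed_boots = 0;
    }
}

/* Called by the firmware once it considers itself successfully booted. */
void fw_boot_succeeded(struct fw_slots *s)
{
    if (s->state[s->boot_slot] != SLOT_UNTESTED) return;
    s->state[s->boot_slot] = SLOT_WORKING;
    s->state[1 - s->boot_slot] = SLOT_OLD;        /* next update goes here */
    s->failed_boots = 0;
}
```

<p>Keeping the logic this small is deliberate: it has to live in the bootloader, the one piece that can never be updated as casually as the rest.</p>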

<p>This technique also helps in case your ROM develops a bad sector
(a <a href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">CRC</a> check during boot should
mark the slot as &ldquo;bad&rdquo;), or a very rare and unfortunate bug you didn&rsquo;t catch in dev cleared the wrong
Flash page. In any case, generally speaking, having a redundant copy of the firmware lets your device be
way more resilient to fatal conditions. It also makes it possible for technicians to issue a firmware
downgrade in a very easy, fast and less error-prone manner.</p>
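<p>The boot-time CRC check could look roughly like this. It is a sketch that assumes the common CRC-32 (the zlib/PNG one) and a stored checksum next to the image; many SoCs have a hardware CRC unit you would use instead, and the function names are hypothetical:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical boot-time check: 1 if the image matches the stored CRC. */
int fw_image_ok(const unsigned char *image, size_t len, uint32_t stored_crc)
{
    return crc32_bitwise(image, len) == stored_crc;
}
```

<p>If <code>fw_image_ok</code> returns 0 for a slot, the bootloader marks it as &ldquo;bad&rdquo; and boots the other one.</p>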

<p>This requires you to sacrifice storage/code space, and it necessitates a bootloader of sorts.
But it is way easier to thoroughly test the bootloader vs your whole firmware. I also understand
that many older embedded devices simply didn&rsquo;t have the room to store even one firmware the
developers wanted, let alone two. But there is no excuse in the modern world for embedded devices to
not have enough Flash storage. Nor has there been any excuse, for ages now, for consumer-oriented PCs
and laptops to not have that space.</p>

<p>I honestly have no idea why so many devices to this day are shipped in a state where one wrong move
bricks them. Well, maybe one idea - planned obsolescence. Or just sheer incompetence. Whatever it
is, I didn&rsquo;t come up with this rule alone; I am sure any reasonable person who has worked in the embedded
space recognized at some point how valuable this is. (Like my old coworkers.)</p>

<p>I am giving this rule #0 because I am fed up with devices getting bricked all the time.</p>

<p>And with that, I leave you be for now, back to celebrations and nice food! Hope you have nice
holidays and have fun!</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/goto_macros_skills.html</id>
<link rel="alternate" href="https://cabin.digital/log/goto_macros_skills.html"/>
<title>Macros, GOTOs and Skill Issues</title>
<updated>2024-11-15T21:50:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="c"/>
<category term="programming"/>
<category term="rant"/>
<content type="html"><![CDATA[<p>I often heard people repeat that C macros are bad and you shouldn&rsquo;t use them. Same with <code>goto</code>s and
I never really knew why. I mean, yes, I know it can be hard to debug macros or even read them
sometimes. And <code>goto</code>s are evil because they are
<a href="https://en.wikipedia.org/wiki/Considered_harmful">considered &ldquo;harmful&rdquo;</a>. Even so, to me, they
clearly have their use. <code>goto</code>s can be useful for cleanup:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#include</span> &lt;stdio.h&gt;

<span class='golog-keyword'>#define</span> fail(ret, label) <span class='golog-keyword'>do</span>{rc=ret; <span class='golog-keyword'>goto</span> label;}<span class='golog-keyword'>while</span>(<span class='golog-number'>0</span>)

<span class='golog-keyword'>int</span> func(<span class='golog-keyword'>void</span>)
{
    <span class='golog-keyword'>int</span> rc = <span class='golog-number'>0</span>;
    FILE* a;
    FILE* b;

    a = fopen(<span class='golog-string'>&#34;a&#34;</span>, <span class='golog-string'>&#34;r&#34;</span>);
    <span class='golog-keyword'>if</span>(!a) fail(<span class='golog-number'>1</span>, fail_a);

    b = fopen(<span class='golog-string'>&#34;b&#34;</span>, <span class='golog-string'>&#34;w&#34;</span>);
    <span class='golog-keyword'>if</span>(!b) fail(<span class='golog-number'>2</span>, fail_b);

    <span class='golog-comment'>/* lots of code here */</span>

    fclose(b);
fail_b:
    fclose(a);
fail_a:
    <span class='golog-keyword'>return</span> rc;
}
</code></pre></div>
</div>

<p>And macros can be used for all sort of things, like <a href="https://en.wikipedia.org/wiki/X_Macro">X macros</a>,
intrusive or otherwise generic data structures, generic &ldquo;functions&rdquo;, etc. Are they a perfect tool?
No, of course not, I&rsquo;d rather have <code>defer</code> and real meta-programming. BUT not only do we not have that
in C, just because they aren&rsquo;t perfect doesn&rsquo;t mean they can&rsquo;t be useful at all and need a total ban
in the codebase. As long as you don&rsquo;t see everything as a nail for your <del>new</del> <code>goto</code>/macro hammer.</p>
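<p>For those who haven&rsquo;t seen X macros, here is a small sketch. The error list is invented for illustration; the point is that the enum and its string table come from one list and can&rsquo;t drift apart:</p>

```c
/* Hypothetical list of error codes: one line per entry, expanded twice. */
#define ERROR_LIST(X)            \
    X(ERR_OK,    "ok")           \
    X(ERR_OPEN,  "cannot open")  \
    X(ERR_WRITE, "cannot write")

/* First expansion: the enum constants. */
#define AS_ENUM(name, text) name,
enum error_code { ERROR_LIST(AS_ENUM) ERR_COUNT };
#undef AS_ENUM

/* Second expansion: the matching strings, in the same order. */
#define AS_TEXT(name, text) text,
static const char *const error_text[] = { ERROR_LIST(AS_TEXT) };
#undef AS_TEXT

const char *error_to_string(enum error_code e)
{
    if ((unsigned)e >= ERR_COUNT) return "unknown";
    return error_text[e];
}
```

<p>Adding a new error is then a one line change in <code>ERROR_LIST</code>.</p>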

<p>My point, <strong>SKILL ISSUE</strong>! No but really, people are afraid of what they don&rsquo;t know. If you&rsquo;re not
skilled enough to differentiate good vs bad use of nuanced features, you&rsquo;ll probably find a
consensus from someone you trust enough (which is a problem if you&rsquo;re new, because you don&rsquo;t know
who&rsquo;s trustworthy since you have no experience to see who talks bullshit since&hellip; Well you got it.)
and stick to their opinion. Which is in modern day &ldquo;X IS CONSIDERED HARMFUL!&rdquo; Which is true for X,
formerly known as Twitter, but not true for many things in programming that you hear on &ldquo;teh
interwebz&rdquo;.</p>

<p>I realized it only now, when I was reading about
<a href="https://en.wikipedia.org/wiki/M4_(computer_language)">m4</a>. To me it was always this arcane thing
that graybeards used and abused before switching to Perl or something. But it was because I didn&rsquo;t
know it. I mean I knew it&rsquo;s a macro processor, but I didn&rsquo;t know how it worked. Now I know, now I
see that it&rsquo;s cool, and that it can be useful, and that you need to treat it as you&rsquo;d treat any
<em>nuanced</em> tool, with due diligence and research.</p>

<p>So, TL;DR, skill issue, as always. Learn your tools, stop being afraid of things, &ldquo;X is considered
harmful&rdquo; is considered harmful.</p>

<p><strong>P.S.</strong></p>

<p>Dammit, this was supposed to be a short and sweet <a href="/tags/ulog.html">#ulog</a> entry, but as always I
am a bad writer who cannot condense text down to the crux of the issue. In the <em>&ldquo;wise&rdquo;</em> words of
whoever the fuck wrote Dragon Age: Veilguard - &ldquo;Sorry. I ramble sometimes. I am a rambler.&rdquo; (Don&rsquo;t
look it up, it&rsquo;s&hellip; it&rsquo;s dumb.)</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/c89_embed.html</id>
<link rel="alternate" href="https://cabin.digital/log/c89_embed.html"/>
<title>Binary resource inclusion like it's 1989</title>
<updated>2024-07-30T23:40:49Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programming"/>
<category term="dev"/>
<category term="c"/>
<content type="html"><![CDATA[<p>So C23 is adding the <code>#embed</code> preprocessor directive, which can be useful to embed resources into your
binary. It looks something like this:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>const</span> <span class='golog-keyword'>char</span> image[] =
{
    <span class='golog-keyword'>#embed</span> <span class='golog-string'>&#34;image.tga&#34;</span>
};
</code></pre></div>
</div>

<p>It also has some niceties like <code>if_empty</code> so you can embed some default data if file is empty or
non-existent (it&rsquo;s an assumption about the latter.) Check out
<a href="https://en.cppreference.com/w/c/preprocessor/embed">cppreference.com</a> for more information,
or get the <a href="https://open-std.org/JTC1/SC22/WG14/www/docs/n3301.pdf">latest C23 draft</a> as of writing this
post.</p>

<p>In any case, at the time of writing this entry to my log, there are no compilers that support this.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
</span></pre></div>
    <div class="golog-lines"><pre><code>$ gcc -std=c2x -Wall -Wextra -pedantic test.c
test.c:5:6: error: invalid preprocessing directive #embed
    5 |     #embed &#34;image.tga&#34;
      |      ^~~~~
test.c:3:12: error: zero or negative size array &#39;image&#39;
    3 | const char image[] =
      |            ^~~~~
</code></pre></div>
</div>

<p>Not only that, this would be useful for people like me who are either stuck with, or intentionally
use older standards like C99 or even C89 (like me most of the time.)</p>

<p>While conversing with my friend about adding an embed-like feature to his programming language he
said:</p>

<blockquote>
<p>you mean #include? :)</p>
</blockquote>

<p>Thus the idea was born. <code>#include</code> indeed just <em>embeds</em> a file into your source code. Alas, the
compiler will try to parse the raw binary and won&rsquo;t be happy:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>const</span> <span class='golog-keyword'>char</span> image[] =
{
    <span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;image.tga&#34;</span>
};
</code></pre></div>
</div>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cc -std=c89 -Wall -Wextra -pedantic test2.c
...
image.tga:13:14: error: stray &#39;\377&#39; in program
   13 |        &lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0016&gt;&lt;U+000C&gt;&lt;d9&gt;&lt;ff&gt;&lt;U+000F&gt;&lt;U+000B&gt;&lt;e3&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0004&gt;Z&lt;f7&gt;&lt;ff&gt;&lt;U+0002&gt;m&lt;fa&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+000B&gt;&lt;9d&gt;&lt;fc&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;?
      |                                                                            ^~~~

image.tga:13:15: warning: null character(s) ignored
   13 |    &lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0016&gt;&lt;U+000C&gt;&lt;d9&gt;&lt;ff&gt;&lt;U+000F&gt;&lt;U+000B&gt;&lt;e3&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0004&gt;Z&lt;f7&gt;&lt;ff&gt;&lt;U+0002&gt;m&lt;fa&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+000B&gt;&lt;9d&gt;&lt;fc&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;?
      |                                                                            ^~~~~~~~

image.tga:13:23: error: stray &#39;\4&#39; in program
   13 |    &lt;e3&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0004&gt;Z&lt;f7&gt;&lt;ff&gt;&lt;U+0002&gt;m&lt;fa&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+000B&gt;&lt;9d&gt;&lt;fc&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;?
      |                                                                            ^~~~~~~~

image.tga:13:25: error: stray &#39;\367&#39; in program
   13 |   &lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0004&gt;Z&lt;f7&gt;&lt;ff&gt;&lt;U+0002&gt;m&lt;fa&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+000B&gt;&lt;9d&gt;&lt;fc&gt;&lt;ff&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;&lt;U+0000&gt;?
      |                                                                            ^~~~
...
</code></pre></div>
</div>

<p>Well, let&rsquo;s make the compiler happy. All we need to do is do some preprocessing before the C
preprocessor. Prepreprocessing if you will.</p>

<p>My first idea was simple and worked out of the box, with a small nuance. Let&rsquo;s just read a binary
file, output its bytes as escape sequences, wrap it all in quotes, and finish with a semicolon. It is
easily done with <em>printf(3)</em>&rsquo;s <code>%x</code> conversion specifier (see the code in the <em>bin2c.c</em> file below).</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>const</span> <span class='golog-keyword'>char</span> image[] =
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;image.tga.h&#34;</span>

</code></pre></div>
</div>
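
<p>A minimal sketch of that first, string-literal attempt might look like the following. This is an
assumption-laden illustration, not the code from the repository: the helper name
<code>emit_string_literal</code> is hypothetical, and error handling is kept to the bare minimum.</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>#include &lt;stdio.h&gt;

/* Hypothetical helper, not part of bin2c.c: copy `in` to `out` as one C
 * string literal made of \xNN escapes, terminated with a semicolon.
 * Returns 0 on success, -1 on a read or write error. */
int emit_string_literal(FILE* in, FILE* out)
{
    int c;

    if(fputc(&#39;&#34;&#39;, out) == EOF) return -1;

    /* getc() yields each byte as an unsigned char value, or EOF */
    while((c = getc(in)) != EOF)
    {
        if(fprintf(out, &#34;\\x%02x&#34;, (unsigned)c) &lt; 0) return -1;
    }
    if(ferror(in)) return -1;

    /* close the literal and finish the declaration */
    if(fputs(&#34;\&#34;;\n&#34;, out) == EOF) return -1;

    return 0;
}
</code></pre></div>
</div>

<p>Since every byte is escaped, a hex escape is always followed by a backslash or the closing quote,
so it cannot swallow the next character. Do note that a string literal also appends a terminating
NUL, so <code>sizeof(image)</code> would be one larger than the file.</p>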

<p>This works, but as I mentioned above, it has one itty-bitty problem:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
</span></pre></div>
    <div class="golog-lines"><pre><code>... warning: string length &#39;21000&#39; is greater than the length &#39;509&#39; ISO C90 compilers are required to
support [-Woverlength-strings]

</code></pre></div>
</div>

<p>You live, you learn. Apparently ISO C89/C90 compilers are not required to handle string literals
longer than 509 characters. GCC 13.2.0 seems to be handling it well, but I wanted to be in spec.
As such, I simply output a <code>char</code> literal for each byte. This sadly makes the post-processed file
larger.</p>

<p><em>bin2c.c</em>:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#define</span> _XOPEN_SOURCE <span class='golog-number'>500</span>

<span class='golog-keyword'>#include</span> &lt;stdlib.h&gt;
<span class='golog-keyword'>#include</span> &lt;stdio.h&gt;
<span class='golog-keyword'>#include</span> &lt;errno.h&gt;


<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    <span class='golog-keyword'>int</span>   rc  = <span class='golog-number'>0</span>;
    <span class='golog-keyword'>int</span>   ret = EXIT_SUCCESS;
    FILE* in  = stdin;
    FILE* out = stdout;
    <span class='golog-keyword'>const</span> <span class='golog-keyword'>char</span>* name  = <span class='golog-string'>&#34;stdin&#34;</span>;
    <span class='golog-keyword'>char</span> buffer[<span class='golog-number'>4096</span>] = {<span class='golog-number'>0</span>};
    <span class='golog-keyword'>size_t</span> got = <span class='golog-number'>0</span>;

    <span class='golog-keyword'>if</span>(argc == <span class='golog-number'>2</span>)
    {
        name = argv[<span class='golog-number'>1</span>];
        in = fopen(name, <span class='golog-string'>&#34;rb&#34;</span>);
        <span class='golog-keyword'>if</span>(!in)
        {
            perror(<span class='golog-string'>&#34;fopen&#34;</span>);
            exit(EXIT_FAILURE);
        }

        rc = snprintf(buffer, <span class='golog-keyword'>sizeof</span>(buffer) - <span class='golog-number'>1</span>, <span class='golog-string'>&#34;%s.h&#34;</span>, name);
        <span class='golog-keyword'>if</span>(rc &lt; <span class='golog-number'>0</span>)
        {
            perror(<span class='golog-string'>&#34;snprintf&#34;</span>);
            exit(EXIT_FAILURE);
        }

        out = fopen(buffer, <span class='golog-string'>&#34;w&#34;</span>);
        <span class='golog-keyword'>if</span>(!out)
        {
            perror(<span class='golog-string'>&#34;fopen&#34;</span>);
            fclose(in);
            exit(EXIT_FAILURE);
        }

    }
    <span class='golog-keyword'>else</span> <span class='golog-keyword'>if</span>(argc &gt; <span class='golog-number'>2</span>) fprintf(stderr, <span class='golog-string'>&#34;warning: ignoring excess parameters\n&#34;</span>);

    <span class='golog-keyword'>for</span>(;;)
    {
        <span class='golog-keyword'>size_t</span> i;

        got = fread(buffer, <span class='golog-number'>1</span>, <span class='golog-keyword'>sizeof</span>(buffer), in);
        rc  = ferror(in);
        <span class='golog-keyword'>if</span>(rc)
        {
            perror(<span class='golog-string'>&#34;fread&#34;</span>);
            ret = EXIT_FAILURE;
            <span class='golog-keyword'>goto</span> end;
        }

        <span class='golog-keyword'>for</span>(i = <span class='golog-number'>0</span>; i &lt; got; i++) fprintf(out, <span class='golog-string'>&#34;<span class='golog-string'>&#39;\\x%02x&#39;</span>,&#34;</span>, (<span class='golog-keyword'>unsigned</span> <span class='golog-keyword'>char</span>)buffer[i]);

        rc = feof(in);
        <span class='golog-keyword'>if</span>(rc) <span class='golog-keyword'>break</span>;
    }

    fprintf(out, <span class='golog-string'>&#34;\n&#34;</span>);

end:
    fclose(in);
    fclose(out);
    <span class='golog-keyword'>return</span> ret;
}
</code></pre></div>
</div>

<p>As you can see, this simple program just creates a header file named after the input file, with
<code>.h</code> appended. It can also read from standard input and write to standard output, so you can
chain it with pipes in scripts. I also wrote it to only depend on the standard C library so anyone
can use it. You are free to steal this.</p>

<p>With this, we finally can <code>#include</code> files in our source code to embed them in the binary.
To demonstrate this, I wrote a simple program with an embedded
<a href="https://en.wikipedia.org/wiki/Truevision_TGA">TGA</a> file that it prints to standard out using
<a href="https://en.wikipedia.org/wiki/ANSI_escape_code">ANSI escape sequences</a> for color. For this code
to work, your terminal has to support 24-bit True Color sequences.</p>
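
<p>For a rough idea of what such a True Color sequence looks like, here is a minimal sketch. The
helper <code>format_pixel</code> is hypothetical and is not taken from <em>cli.c</em>; it only assumes the
common <code>ESC[38;2;R;G;Bm</code> foreground sequence and the <code>ESC[0m</code> reset.</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>#include &lt;stdio.h&gt;

/* Hypothetical helper, not taken from cli.c: format one &#34;##&#34; pixel in 24-bit
 * foreground color. buf needs room for at least 26 bytes.
 * Returns the number of characters written, excluding the NUL. */
int format_pixel(char* buf, unsigned char r, unsigned char g, unsigned char b)
{
    /* ESC[38;2;R;G;Bm selects an RGB foreground, ESC[0m resets attributes */
    return sprintf(buf, &#34;\033[38;2;%u;%u;%um##\033[0m&#34;,
                   (unsigned)r, (unsigned)g, (unsigned)b);
}
</code></pre></div>
</div>

<p>Writing the resulting buffer to standard out with <code>fputs</code> should show two colored
<code>#</code> characters on a terminal that understands the sequence.</p>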

<p><em>example.c</em>:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#include</span> &lt;stdio.h&gt;

<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;common.h&#34;</span>
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;tga.c&#34;</span>
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;cli.c&#34;</span>


<span class='golog-keyword'>const</span> <span class='golog-keyword'>u8</span> image[] =
{
    <span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;image.tga.h&#34;</span>
};

<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>void</span>)
{
    texture tex = {<span class='golog-number'>0</span>};
    tga2tex_from_mem(&amp;tex, image, <span class='golog-keyword'>sizeof</span>(image));
    cli_draw_tex(&amp;tex, <span class='golog-keyword'>true</span>); <span class='golog-comment'>/* <span class='golog-keyword'>true</span> - Black&amp;White, <span class='golog-keyword'>false</span> - True Color */</span>

    <span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;
}
</code></pre></div>
</div>

<p>Here&rsquo;s an example in black and white.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ls
bin2c.c cli.c  example.c  tga.c  common.h  image.tga
$ cc -std=c89 -Wall -Wextra -pedantic bin2c.c -o bin2c
$ ./bin2c image.tga
$ cc -std=c89 -Wall -Wextra -pedantic example.c -o example
$ ./example

                        ####
        ####################
      ##########        ####
    ##########            ##
  ##########      ##        ##
  ##########        ##      ##
  ########                    ##
    ############################
    ####  ##                ##
    ##  ####  ####          ##
    ####  ##  ####    ####  ##
    ##  ####          ####  ##
          ##          ####  ##

$
</code></pre></div>
</div>

<p>And here&rsquo;s a screenshot in True Color:
<img src="/images/c89_embed.png" alt="Same as above code block, but the image represented with &lsquo;#&rsquo; symbols is in color" /></p>

<p>Success! You can easily add <code>bin2c</code> to your <em>Makefile</em> or any other build script and have it
generate headers that you can embed in the source. Ez pz, no need for C23! ;)</p>

<p>P.S. Interested in the rest of the owl? You can check it out at my
<a href="https://cabin.digital/git/bin2c.git">git repository</a>. It&rsquo;s pretty barebones, and doesn&rsquo;t handle all TGA
files properly, only non-RLE ones with ARGB channels in that order.
But what did you expect for just an example?</p>

<p>¯\_(ツ)_/¯</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/celf.html</id>
<link rel="alternate" href="https://cabin.digital/log/celf.html"/>
<title>Executable C source files</title>
<updated>2024-07-18T03:16:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programming"/>
<category term="dev"/>
<category term="c"/>
<category term="sh"/>
<category term="unix"/>
<content type="html"><![CDATA[<h3 id="on-unix-like-systems">(on UNIX-like systems)</h3>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
</span></pre></div>
    <div class="golog-lines"><pre><code>$ tail -6 magic.c
extern int write(int, const void*, unsigned long long int);
int main(void)
{
    write(0, &#34;Hell world!\n&#34;, 12ULL);
    return 0;
}
$ chmod +x magic.c
$ ./magic.c
Hell world!
$
</code></pre></div>
</div>

<p>So some time ago I came up with a way to build and run C source files as if they were shell or
Python scripts with a shebang (<code>#!</code>). I called it <a href="/share/code/celf">&ldquo;celf&rdquo;</a> as in self, as in
self-building C files. It also produces <em>ELF</em> executables on modern UNIX-like platforms.</p>

<p>Questionable play on words aside, it might not be immediately obvious how it works to someone not
familiar with POSIX <code>sh</code>, the C pre-processor and the peculiarities of Linux/UNIX executable
loading. As the README says:</p>

<blockquote>
<p>== How does it work?</p>

<p>It abuses C preprocessor, shell and old UNIX heritage to run shell scripts.</p>
</blockquote>

<h2 id="but-wait-why-though">But wait. Why, though?</h2>

<blockquote>
<p>I came up with this scheme while just messing around, so it&rsquo;s probably not for you.</p>
</blockquote>

<p>Don&rsquo;t you want to just <code>chmod +x main.c</code> and <code>./main.c</code>? No? Just me? Okay. Well that&rsquo;s why.</p>

<p>I primarily use this idea for my &ldquo;playground&rdquo;/scratch space projects. Sometimes you just want to slap
something together to test an idea. Sure I can keep manually writing <code>cc ... main.c -o ...</code>, or hit
<code>↑</code> a bunch. Heck, maybe I&rsquo;ll even do <code>history | grep cc</code> or something and <code>!xxx</code>. But you know what,
that doesn&rsquo;t scale at all. Some of the playground projects need a lot of flags, not to mention I
like to turn on a lot of the warnings, and all those flags add up. &ldquo;Just use a Makefile&rdquo; some would
say, but most of the playground projects are just one file (although this scheme does work with
multi-file projects, see <a href="#how-i-build-my-c-projects">below</a>.)
Besides, I hate adding dependencies, even build dependencies, they add up.
Say I want to bootstrap my own minimal POSIX system, or just a Linux distro. Now I have to compile
GNU Make and all it depends on, or BSD Make, better but still. These days I usually just write a
<code>./build.sh</code> shell script for most of my projects, but in the usual case, now I have 2 files per
project.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cd playground
$ ls
proj1.c proj1.sh proj2.c proj2.sh proj3.c proj3.sh proj4_1.c proj4.c proj4.sh proj5/
</code></pre></div>
</div>

<p>&ldquo;Put them in separate dirs&rdquo;. Uhm, sure I do so for multi-file projects, but that&rsquo;s just hiding the
problem.</p>

<h2 id="the-real-problem">The real problem</h2>

<p>The real problem is that C does not have a builtin build system. Now I don&rsquo;t mean something like
Rust&rsquo;s &ldquo;cargo&rdquo; or Go&rsquo;s &ldquo;go build&rdquo;. God forbid! Those are, in my <em>humble</em> opinion, horrible things.
They often obfuscate from the programmer what is going on with the toolchain and how it is
configured, often producing surprising results. And I don&rsquo;t want to be surprised with a dynamically
linked executable when a static one was promised (looking at you, Go.)</p>

<p>What I envision is something much simpler, a way to tell the compiler how and what to compile in
the programming language itself. For an example of this, see Jonathan Blow&rsquo;s unnamed language known as
&ldquo;Jai&rdquo;. Now, I am not a member of the closed beta, and my memory may fail me, so correct me if I am
wrong via an <a href="/whoami.html#contact">email</a>, but if I recall correctly, you write some Jai code in
your first file that sets the compiler&rsquo;s options, adds files to the project and can even run arbitrary
Jai code at build time.</p>

<p>I am not sure about the last step, although I have nothing against it in principle. But the core
idea of not leaving the programming language to describe what to build in it is <em>very</em> appealing to
me.</p>

<p>What&rsquo;s funny is that Microsoft&rsquo;s MSVC has some
<a href="https://learn.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp">non-standard <code>#pragma</code>&rsquo;s</a>
(do note that MS loves to kill its links, so it might be dead in the future)
that let you include a library or set some linker flags straight in the source code. I think that&rsquo;s
a step in the right direction.</p>
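
<p>As a small illustration of that pragma (a fragment, not a full program; the library name is just
an example I picked, and the guard is there because other compilers do not understand it):</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>/* MSVC only: ask the linker to pull in user32.lib without touching the
 * build script or the command line. The library name is illustrative. */
#ifdef _MSC_VER
#pragma comment(lib, &#34;user32.lib&#34;)
#endif
</code></pre></div>
</div>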

<p>This will make more sense after I describe how I build C programs in general, but first&hellip;</p>

<h2 id="how-does-it-work">How does it work?</h2>

<p>Let&rsquo;s examine the whole <em>magic.c</em> file from the example at the beginning of this post:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#if</span> <span class='golog-number'>0</span>
    cc $<span class='golog-number'>0</span> &amp;&amp; exec ./a.out
<span class='golog-keyword'>#endif</span>

<span class='golog-keyword'>extern</span> <span class='golog-keyword'>int</span> write(<span class='golog-keyword'>int</span>, <span class='golog-keyword'>const</span> <span class='golog-keyword'>void</span>*, <span class='golog-keyword'>unsigned</span> <span class='golog-keyword'>long</span> <span class='golog-keyword'>long</span> <span class='golog-keyword'>int</span>);
<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>void</span>)
{
    write(<span class='golog-number'>0</span>, <span class='golog-string'>&#34;Hell world!\n&#34;</span>, 12ULL);
    <span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;
}
</code></pre></div>
</div>

<p>The whole magic is in the first 3 lines. Let&rsquo;s go step by step:</p>

<ol>
<li>Abuse the fact that # is used in all C pre-processor directives, <em>and</em> is a comment in POSIX <code>sh</code>.
So <code>sh</code> will ignore <code>#if 0</code> and <code>#endif</code> and C will ignore everything between those;</li>
<li>Since we cannot use a shebang, as it is not a valid pre-processor directive, we simply assume we
will run in the shell and write a shell script in between the <code>#if 0</code> and <code>#endif</code>.
At this point we could just run <code>$ sh magic.c</code>, but we don&rsquo;t even need to do that;</li>
<li>In ye olden days, before the UNIX kernel supported the <code>#!</code> magic sequence, the <em>execve(2)</em> syscall would fail
when loading a file that was not in a format it &ldquo;knew&rdquo; how to execute (like a.out, COFF, and now ELF).
People back then wanted to just execute scripts like we do now, as in run <code>$ ./script</code> instead of
<code>$ sh script</code>, and so <code>sh</code> developers added a hack. If the exec syscall failed, they tried to interpret
the file as a shell script, simply assuming that it was one. I am not sure if later they used some sort
of heuristic to determine if a file is a valid shell script or not, but what matters to us is that
this hack became standardized in POSIX. Hence any POSIX-compliant <code>sh</code> implementation like <code>ash</code>, or
even those that extend it, like <code>bash</code> or <code>zsh</code>, retains this behaviour. If a file has the <code>x</code> bit set,
and you try to run it from a POSIX-compliant shell, it will try to exec it, and upon failure try to
interpret it as a script;</li>
<li>Summing up, we write a valid C source file that is at the same time a valid POSIX shell file.
We gate the shell script inside C&rsquo;s <code>#if 0</code> pre-processor directive, and we don&rsquo;t let the shell
start interpreting C code (well, trying to) by manually terminating the script, either with <code>exit</code> or,
as in the example above, by <code>exec</code>ing into the built executable.</li>
</ol>
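
<p>The fallback from step 3 can be sketched in C roughly like this. It is only an illustration under
a couple of assumptions: <code>exec_or_interpret</code> is a made-up name, a POSIX shell is assumed to
live at <code>/bin/sh</code>, and real shells typically handle this case internally rather than
re-executing themselves.</p>
<div class="golog-codeblock">
    <div class="golog-lines"><pre><code>#include &lt;errno.h&gt;
#include &lt;stddef.h&gt;
#include &lt;unistd.h&gt;

/* Hypothetical helper: try to run `path` directly; if the kernel does not
 * recognise the file format (ENOEXEC), feed the file to /bin/sh instead.
 * On success this never returns; on failure it returns -1 with errno set. */
int exec_or_interpret(const char* path, char* const envp[])
{
    char* direct[2];
    char* script[3];

    direct[0] = (char*)path;
    direct[1] = NULL;
    execve(path, direct, envp);   /* only returns if the exec failed */

    if(errno != ENOEXEC) return -1;

    /* assumption: a POSIX shell lives at /bin/sh */
    script[0] = &#34;sh&#34;;
    script[1] = (char*)path;
    script[2] = NULL;
    execve(&#34;/bin/sh&#34;, script, envp);

    return -1;
}
</code></pre></div>
</div>

<p>Called on an executable <em>magic.c</em>, the first <em>execve(2)</em> would be expected to fail with
<code>ENOEXEC</code>, and the second one would run the file as a script, which is exactly the behaviour
the trick relies on.</p>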

<p>And that&rsquo;s it! For those who aren&rsquo;t much into shell scripting, <code>$0</code> is the 0th argument to the
shell script (or rather, any program); by convention it&rsquo;s the filepath of the executable, in our case the C
source file. So we run the C compiler <code>cc</code> on our source file <code>$0</code>, and (<code>&amp;&amp;</code>) if it returns with
exit code 0 (success) we <code>exec</code> the resulting binary, which by default is &ldquo;a.out&rdquo; since we didn&rsquo;t
specify a name with <code>cc source -o out</code>. <code>exec</code> will not <em>fork(2)</em> from the currently executing script&rsquo;s
shell, but directly replace the executable image of the running shell script with the
file passed as its argument. So it runs our program and exits, hence no need for a separate <code>exit</code> at
the end of our shell script part.</p>

<p>And so the mystery is revealed. Not much of a mystery, just a bunch of hacks.</p>

<p>The core &ldquo;insight&rdquo; here is that we can run arbitrary shell scripts that are stored in a C file that
was set as executable. And so I can use it to put any build script I want there. This does not put
me in my desired &ldquo;one language&rdquo; system, but it does put us in the next best thing. Everything is in
the same file.</p>

<p><a href="/share/code/celf">&ldquo;celf&rdquo;</a> is one such build script. It&rsquo;s a small script, and as I say in the
README, I do encourage you to read the script itself.
It has some basic things built in, like timing the build process, not
rebuilding if files didn&rsquo;t change, passing and setting debug/release flags, and running the resulting
executable. Really it&rsquo;s just an example of the technique, not a &ldquo;product&rdquo;. You can call <code>make</code>
from there if you really want to, although it somewhat defeats the purpose.</p>

<p>So our <em>magic.c</em> using celf would look like:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#if</span> <span class='golog-number'>0</span>
    CFLAGS=<span class='golog-string'>&#34;-Wall -Wextra -pedantic&#34;</span>
    . build.sh
<span class='golog-keyword'>#endif</span>

<span class='golog-keyword'>extern</span> <span class='golog-keyword'>int</span> write(<span class='golog-keyword'>int</span>, <span class='golog-keyword'>const</span> <span class='golog-keyword'>void</span>*, <span class='golog-keyword'>unsigned</span> <span class='golog-keyword'>long</span> <span class='golog-keyword'>long</span> <span class='golog-keyword'>int</span>);
<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>void</span>)
{
    write(<span class='golog-number'>0</span>, <span class='golog-string'>&#34;Hell world!\n&#34;</span>, 12ULL);
    <span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;
}
</code></pre></div>
</div>

<p>And produce:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
</span></pre></div>
    <div class="golog-lines"><pre><code>$ ./magic.c
 --- cc time: .028220280 sec
 --- debug=yes; static=yes
 --- Program output:

Hell world!
$ ./magic.c
 --- rebuild not necessary
 --- Program output:

Hell world!
</code></pre></div>
</div>

<p>For me, the only negative is the non-portable nature of this, as I cannot do something like this on
Microsoft Windows. (I would also have to use different <code>cl.exe</code> (MSVC compiler) flags anyways.)
So I still require different build scripts/systems for non-POSIX platforms.</p>

<h2 id="but-that-s-stupid">But that&rsquo;s stupid.</h2>

<p>Yes. But I like stupid.</p>

<h2 id="how-i-build-my-c-projects">How I build my C projects</h2>

<p>For most small to medium projects having an incremental build system, hell, any
build system is way overkill. Not only is it an unnecessary build dependency, but
also it encourages the complexity demon (see: <a href="https://grugbrain.dev/">https://grugbrain.dev/</a>). So for
tiny projects you can just call:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cc myprog.c -o myprog
</code></pre></div>
</div>

<p>For small to medium projects I prefer a Single Compilation Unit build. You might
have heard it called a &ldquo;Unity&rdquo; build (no relation to the mediocre game engine). If you
are unaware of them, the gist is that you collect all your sources into one
compilation unit (think one .c file) and just compile that. How is it better?
Well, it is usually faster and produces better code. It can and will be slower to build
for large projects, but see the next paragraph about that. The resulting code
is usually smaller and faster because the compiler has visibility of the whole source:
it has the full context, and that lets it use more &ldquo;aggressive&rdquo; optimizations.</p>

<p>I don&rsquo;t often do large projects. But they, too, can probably be built using SCU as
long as they don&rsquo;t use excessive source dependencies. Or you can break a large project into
logical units and SCU those; you will still have a separate linking stage, but you don&rsquo;t need to
recompile the whole project.</p>

<p>This really is a tooling issue, as Jonathan Blow&rsquo;s &ldquo;Jai&rdquo; and Google&rsquo;s <a href="https://github.com/carbon-language/carbon-lang/blob/trunk/README.md">Carbon</a> (or so they claim,
I didn&rsquo;t look into the latter) compilers show incredible speed, sophisticated features and modern
optimizations.</p>

<p>So, typically, these days, I have a file called <em>build.c</em> that <code>#include</code>&rsquo;s all the other <em>*.c</em>
and <em>*.h</em> files, <code>#define</code>&rsquo;s global constants, and, if using it, calls celf. Otherwise I use it as
a single input file to compile in a separate build shell or batch script.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-keyword'>#if</span> <span class='golog-number'>0</span>
    OUT=my_program
    CFLAGS=&#34;-Wall -Wextra -Wpedantic -Wno-<span class='golog-keyword'>long</span>-<span class='golog-keyword'>long</span> -Wformat=<span class='golog-number'>2</span> -Wfloat-equal -Wshadow \
        -std=c89 -fwrapv -fwhole-program \
        -pipe \
    &#34;
    DBGFLAGS=<span class='golog-string'>&#34;-g3 -Og -DDEBUG=<span class='golog-number'>1</span> -fsanitize=undefined&#34;</span>
    RELFLAGS=<span class='golog-string'>&#34;-O2&#34;</span>
    . build.sh
<span class='golog-keyword'>#endif</span>

<span class='golog-keyword'>#define</span> _DEFAULT_SOURCE
<span class='golog-keyword'>#define</span> _POSIX_C_SOURCE 200809L

<span class='golog-keyword'>#define</span> PROGNAME   <span class='golog-string'>&#34;my_program&#34;</span>
<span class='golog-keyword'>#define</span> PROGVER_MAJ <span class='golog-number'>1</span>
<span class='golog-keyword'>#define</span> PROGVER_MIN <span class='golog-number'>0</span>
<span class='golog-keyword'>#define</span> PROGVER_FIX <span class='golog-number'>0</span>
<span class='golog-keyword'>#define</span> PROGVER_REL <span class='golog-string'>&#34;rel&#34;</span> <span class='golog-comment'>/* release */</span>

<span class='golog-keyword'>#define</span>  EXTERNAL_LIB_IMPL
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;lib/external.h&#34;</span>

<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;base/common.c&#34;</span>
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;base/special.c&#34;</span>

<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;unit/a.c&#34;</span>
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;unit/b.c&#34;</span>

<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;main.c&#34;</span>
</code></pre></div>
</div>

<p>This lets me have everything in one place, which I really like. I just go to <em>build.c</em> and change
the things I need. It&rsquo;s all there, in one place. I don&rsquo;t need to hunt and decipher Makefiles in each
folder (not that I do that when I <em>do</em> use Makefiles.) Nor do I have to, God forbid, deal with cmake,
ninjas, yarns, ants and whatever else people came up with to create more problems for the rest of us.</p>

<p>I also switched to (almost) exclusively static builds, because at some point someone has to notice
that building containers (especially things like AppImages, snaps and flatpaks) is just a worse way
to do a statically linked executable. Like, I get the idea of bundling the configuration files and
maybe resources. But if you claim dynamic libraries are good because you can update them, but then
you version them, and then you pack them into a static container&hellip; My friend, reexamine your life
choices. But this is a separate rant.</p>

<p><em>You</em> are not forced to do as I do, you can add linker flags and pass <code>-lname</code> to dynamically link
with your libraries. And of course, you don&rsquo;t need to use Unity/SCU build, just gather your sources
with <code>find</code> and call the compiler on them. Or just don&rsquo;t use this at all, a single BSD Makefile is
probably fine.</p>

<h2 id="exit">exit</h2>

<p>In closing, I hope this was at least interesting. It would be even easier if the C pre-processor ignored
<code>#!</code> so I could just &lsquo;#!/usr/local/bin/celf&rsquo; or something. But really we just need a new language
that fits the niche that C has, but modernized. I don&rsquo;t think Rust is that. It&rsquo;s more of a C++
contender, and don&rsquo;t get me started on Go. It has GC, that&rsquo;s all one needs to know that it&rsquo;s in a different
world. <a href="https://ziglang.org/">Zig</a>, <a href="http://odin-lang.org/">Odin</a>, maybe even
<a href="https://nim-lang.org/">Nim</a> are all trying, and I&rsquo;ve yet to try all of them. But I am not sure
if any of them have something like this in mind, except Zig and of course the unreleased Jai.</p>

<p>Perhaps I should jump on the bandwagon and write my own language? No, that&rsquo;d be potentially useful!
(Probably not though.) And I&rsquo;m all about that useless stuff, like <a href="/log/cpu0.html">wiring a CPU in Logisim!</a> ;)
Although another toylang would be fun to make one day. I&rsquo;ve been reading about FORTH you know :P</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/include-this.html</id>
<link rel="alternate" href="https://cabin.digital/log/include-this.html"/>
<title>#include "this.h"</title>
<updated>2019-10-08T00:00:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="dev"/>
<category term="programming"/>
<category term="c"/>
<category term="python"/>
<content type="html"><![CDATA[<p>So, I work with a couple of Python developers, and they really like their
Pythonic way. Which is fine, it has many good ideas, but they keep making
<code>import this</code> jokes, and I just had to do one for C. And, well, here we are.</p>

<p>First, let&rsquo;s just see it in action.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-comment'>// main.c</span>
<span class='golog-keyword'>#include</span> <span class='golog-string'>&#34;this.h&#34;</span>


<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    (<span class='golog-keyword'>void</span>)argc;
    (<span class='golog-keyword'>void</span>)argv;

    <span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;
}
</code></pre></div>
</div>

<p>The first ingredient is our regular C source file. It&rsquo;s just a main function
that does nothing. The 2nd one is our secret sauce; we will discuss it later.
And of course, the final ingredient is the C compiler. I&rsquo;ll use <code>gcc</code>.</p>

<p>So, let&rsquo;s compile and run it:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</span></pre></div>
    <div class="golog-lines"><pre><code>$ gcc -Wall -Wextra -pedantic main.c -o main
$ ./main
The Zen of C, by dweller

Code is better than cliché guidelines.
Working code is better than cute, but broken one.
Simple is better than complex, complex is better than broken.
Simple is not necessarily easy.
Zero cost abstractions is lack of abstractions.
Crash often.
There are usually multiple ways to do it.
Especially if it&#39;s multi-platform.
Avoid undefined behavior.
Know your tools, understand your platform.
Code, profile, optimize. In that order.
If the implementation is hard to explain, it may be a bad idea.
Or it may be hard to explain.
Delete more code than you write.
</code></pre></div>
</div>

<p>Magic, ain&rsquo;t it? :) Well, needless to say the magic is in the <em>this.h</em> file. So
let&rsquo;s inspect it.</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-comment'>// this.h</span>
<span class='golog-keyword'>#ifndef</span> THIS_H
<span class='golog-keyword'>#define</span> THIS_H

<span class='golog-keyword'>#include</span> &lt;stdio.h&gt;


<span class='golog-keyword'>#define</span> main                                                                 \
    __dummy(<span class='golog-keyword'>void</span>) {<span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;}                                                \
    <span class='golog-keyword'>int</span> __user_main(<span class='golog-keyword'>int</span>, <span class='golog-keyword'>char</span>**);                                            \
    <span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)                                          \
    {                                                                        \
        printf(<span class='golog-string'>&#34;The Zen of C, by dweller\n&#34;</span>                                  \
          <span class='golog-string'>&#34;\n&#34;</span>                                                               \
          <span class='golog-string'>&#34;Code is better than cliché guidelines.\n&#34;</span>                         \
          <span class='golog-string'>&#34;Working code is better than cute, but broken one.\n&#34;</span>              \
          <span class='golog-string'>&#34;Simple is better than complex, complex is better than broken.\n&#34;</span>  \
          <span class='golog-string'>&#34;Simple is not necessarily easy.\n&#34;</span>                                \
          <span class='golog-string'>&#34;Zero cost abstractions is lack of abstractions.\n&#34;</span>                \
          <span class='golog-string'>&#34;Crash often.\n&#34;</span>                                                   \
          <span class='golog-string'>&#34;There are usually multiple ways to <span class='golog-keyword'>do</span> it.\n&#34;</span>                      \
          <span class='golog-string'>&#34;Especially <span class='golog-keyword'>if</span> it&#39;s multi-platform.\n&#34;</span>                             \
          <span class='golog-string'>&#34;Avoid undefined behavior.\n&#34;</span>                                      \
          <span class='golog-string'>&#34;Know your tools, understand your platform.\n&#34;</span>                     \
          <span class='golog-string'>&#34;Code, profile, optimize. In that order.\n&#34;</span>                        \
          <span class='golog-string'>&#34;If the implementation is hard to explain, it may be a bad idea.\n&#34;</span>\
          <span class='golog-string'>&#34;Or it may be hard to explain.\n&#34;</span>                                  \
          <span class='golog-string'>&#34;Delete more code than you write.\n&#34;</span>                               \
          );                                                                 \
        <span class='golog-keyword'>return</span> __user_main(argc, argv);                                      \
    }                                                                        \
    <span class='golog-keyword'>int</span> __user_main

<span class='golog-keyword'>#endif</span> <span class='golog-comment'>// THIS_H</span>
</code></pre></div>
</div>

<p>This is the secret sauce. It replaces every <code>main</code> token that appears after
the include (which is <em>usually</em> your program entry point) with this mess.
The first part is <code>__dummy(void) {return 0;}</code>; it simply uses up the <code>int</code> that we assume comes before <code>main</code>. After that, it declares a function <code>__user_main</code>
that has the same signature as regular <code>main</code>. <em>Then</em>, we declare and
define the actual <code>main</code> function: we print our little joke-ish Zen, call
<code>__user_main</code> with the arguments from the shell, and return whatever it
returns. The <code>main</code> inside the macro body is left alone, because the preprocessor never re-expands a macro within its own expansion. And finally, at the end, we simply start the actual definition of
<code>__user_main</code>, as we expect the user&rsquo;s <code>main</code> implementation right after this.</p>
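
<p>The trick is not special to <code>main</code>; it should work on any function name. Here is a minimal sketch of it in miniature, assuming made-up names (<code>square</code>, <code>square_impl</code>, <code>square_dummy</code>, <code>wrapper_calls</code>) that have nothing to do with <em>this.h</em>. It also avoids the double-underscore prefix, since such identifiers are reserved for the implementation:</p>

```c
/* Hypothetical names throughout; none of them come from this.h. */
static int wrapper_calls = 0;

/* Same shape as the main macro: use up the leading return type, declare the
 * renamed function, define the wrapper, then leave "int square_impl"
 * dangling so the user's parameter list and body attach to it.
 * The "square" inside the body is not expanded again by the preprocessor. */
#define square                                  \
    square_dummy(void);                         \
    int square_impl(int);                       \
    int square(int x)                           \
    {                                           \
        wrapper_calls++;                        \
        return square_impl(x);                  \
    }                                           \
    int square_impl

/* Looks like a definition of square, but actually defines square_impl. */
int square(int x)
{
    return x * x;
}

/* Without this, every later call to square(...) would expand the macro too. */
#undef square
```

<p>After this, <code>square(4)</code> goes through the wrapper first, so it returns 16 and bumps <code>wrapper_calls</code> by one. The <code>#undef</code> is the part the <code>main</code> version gets away without, because nobody normally calls <code>main</code> by name.</p>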

<p>Let&rsquo;s see the output of C preprocessor, ignoring the <code>stdio.h</code> stuff:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
</span></pre></div>
    <div class="golog-lines"><pre><code>$ cpp main.c
</code></pre></div>
</div>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</span></pre></div>
    <div class="golog-lines"><pre><code># <span class='golog-number'>1</span> <span class='golog-string'>&#34;main.c&#34;</span>
# <span class='golog-number'>1</span> <span class='golog-string'>&#34;&lt;built-in&gt;&#34;</span>
# <span class='golog-number'>1</span> <span class='golog-string'>&#34;&lt;command-line&gt;&#34;</span>
# <span class='golog-number'>31</span> <span class='golog-string'>&#34;&lt;command-line&gt;&#34;</span>
# <span class='golog-number'>1</span> <span class='golog-string'>&#34;/usr/include/stdc-predef.h&#34;</span> <span class='golog-number'>1 3</span> 4
# <span class='golog-number'>32</span> <span class='golog-string'>&#34;&lt;command-line&gt;&#34;</span> <span class='golog-number'>2</span>
# <span class='golog-number'>1</span> <span class='golog-string'>&#34;main.c&#34;</span>
# <span class='golog-number'>1</span> <span class='golog-string'>&#34;this.h&#34;</span> <span class='golog-number'>1</span>

<span class='golog-comment'>// tons of stdio.h stuff...</span>

# <span class='golog-number'>5</span> <span class='golog-string'>&#34;this.h&#34;</span> <span class='golog-number'>2</span>
# <span class='golog-number'>2</span> <span class='golog-string'>&#34;main.c&#34;</span> <span class='golog-number'>2</span>

# <span class='golog-number'>4</span> <span class='golog-string'>&#34;main.c&#34;</span>
<span class='golog-keyword'>int</span> __dummy(<span class='golog-keyword'>void</span>) {<span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;} <span class='golog-keyword'>int</span> __user_main(<span class='golog-keyword'>int</span>, <span class='golog-keyword'>char</span>**); <span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv) { printf(<span class='golog-string'>&#34;The Zen of C, by dweller\n&#34;</span> <span class='golog-string'>&#34;\n&#34;</span> <span class='golog-string'>&#34;Code is better than cliché guidelines.\n&#34;</span> <span class='golog-string'>&#34;Working code is better than cute, but broken one.\n&#34;</span> <span class='golog-string'>&#34;Simple is better than complex, complex is better than broken.\n&#34;</span> <span class='golog-string'>&#34;Simple is not necessarily easy.\n&#34;</span> <span class='golog-string'>&#34;Zero cost abstractions is lack of abstractions.\n&#34;</span> <span class='golog-string'>&#34;Crash often.\n&#34;</span> <span class='golog-string'>&#34;There are usually multiple ways to <span class='golog-keyword'>do</span> it.\n&#34;</span> <span class='golog-string'>&#34;Especially <span class='golog-keyword'>if</span> it&#39;s multi-platform.\n&#34;</span> <span class='golog-string'>&#34;Avoid undefined behavior.\n&#34;</span> <span class='golog-string'>&#34;Know your tools, understand your platform.\n&#34;</span> <span class='golog-string'>&#34;Code, profile, optimize. In that order.\n&#34;</span> <span class='golog-string'>&#34;If the implementation is hard to explain, it may be a bad idea.\n&#34;</span> <span class='golog-string'>&#34;Or it may be hard to explain.\n&#34;</span> <span class='golog-string'>&#34;Delete more code than you write.\n&#34;</span> ); <span class='golog-keyword'>return</span> __user_main(argc, argv); } <span class='golog-keyword'>int</span> __user_main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    (<span class='golog-keyword'>void</span>)argc;
    (<span class='golog-keyword'>void</span>)argv;

    <span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;
}
</code></pre></div>
</div>

<p>If we massage the code to make it a bit more readable, since all of it is on
the same line, and remove all the <code>cpp</code> line markers, we get:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
</span></pre></div>
    <div class="golog-lines"><pre><code><span class='golog-comment'>// stdio.h stuff...</span>

<span class='golog-keyword'>int</span> __dummy(<span class='golog-keyword'>void</span>) {<span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;}
<span class='golog-keyword'>int</span> __user_main(<span class='golog-keyword'>int</span>, <span class='golog-keyword'>char</span>**);

<span class='golog-keyword'>int</span> main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    printf(<span class='golog-string'>&#34;The Zen of C, by dweller\n&#34;</span>
           <span class='golog-string'>&#34;\n&#34;</span>
           <span class='golog-string'>&#34;Code is better than cliché guidelines.\n&#34;</span>
           <span class='golog-string'>&#34;Working code is better than cute, but broken one.\n&#34;</span>
           <span class='golog-string'>&#34;Simple is better than complex, complex is better than broken.\n&#34;</span>
           <span class='golog-string'>&#34;Simple is not necessarily easy.\n&#34;</span>
           <span class='golog-string'>&#34;Zero cost abstractions is lack of abstractions.\n&#34;</span>
           <span class='golog-string'>&#34;Crash often.\n&#34;</span>
           <span class='golog-string'>&#34;There are usually multiple ways to <span class='golog-keyword'>do</span> it.\n&#34;</span>
           <span class='golog-string'>&#34;Especially <span class='golog-keyword'>if</span> it&#39;s multi-platform.\n&#34;</span>
           <span class='golog-string'>&#34;Avoid undefined behavior.\n&#34;</span>
           <span class='golog-string'>&#34;Know your tools, understand your platform.\n&#34;</span>
           <span class='golog-string'>&#34;Code, profile, optimize. In that order.\n&#34;</span>
           <span class='golog-string'>&#34;If the implementation is hard to explain, it may be a bad idea.\n&#34;</span>
           <span class='golog-string'>&#34;Or it may be hard to explain.\n&#34;</span>
           <span class='golog-string'>&#34;Delete more code than you write.\n&#34;</span>);

    <span class='golog-keyword'>return</span> __user_main(argc, argv);
}

<span class='golog-keyword'>int</span> __user_main(<span class='golog-keyword'>int</span> argc, <span class='golog-keyword'>char</span>** argv)
{
    (<span class='golog-keyword'>void</span>)argc;
    (<span class='golog-keyword'>void</span>)argv;

    <span class='golog-keyword'>return</span> <span class='golog-number'>0</span>;
}
</code></pre></div>
</div>

<p>And I think that&rsquo;s pretty self-explanatory if you know C. Needless to say, this
has a lot of issues. It only works if you include it in the file that
contains the <code>main</code> function, before <code>main</code> is defined, and if <code>main</code> is declared as <code>int main(int, char**)</code>,
which is not always true. So this is just a toy to generate a bit of airflow
through the nostrils of a few programmers.</p>
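
<p>If you are fine with leaving ISO C behind, one way around the signature problem might be to not touch <code>main</code> at all. This is a different technique from the macro above: a rough sketch that assumes GCC or Clang (the <code>constructor</code> attribute is a compiler extension), uses the hypothetical names <code>zen_text</code> and <code>zen_print</code>, and shortens the Zen to two lines:</p>

```c
#include <stdio.h>

/* Hypothetical helper: hands out the text so it can be reused or inspected.
 * Shortened here; the full Zen would go in the same string. */
static const char *zen_text(void)
{
    return "The Zen of C, by dweller\n"
           "\n"
           "Avoid undefined behavior.\n";
}

/* GCC/Clang extension, not ISO C: functions marked as constructors are run
 * by the startup code before main is entered, whatever main looks like. */
__attribute__((constructor))
static void zen_print(void)
{
    fputs(zen_text(), stdout);
}
```

<p>Compile this together with any program and the text should show up before <code>main</code> gets to do anything, no matter if it is <code>int main(void)</code> or the two-argument form. The price is portability, which is kind of a Zen violation in itself.</p>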

<p>Have a nice one!</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/whitespace.html</id>
<link rel="alternate" href="https://cabin.digital/log/whitespace.html"/>
<title>Significant Whitespace</title>
<updated>2016-01-11T00:00:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="programming"/>
<category term="python"/>
<category term="haskell"/>
<category term="rant"/>
<content type="html"><![CDATA[<h1 id="or-more-like-significant-pain-in-the-rear">Or more like significant pain in the rear</h1>

<p>Today I want to talk about something that has been bugging me lately. As you might&rsquo;ve guessed from the title, it&rsquo;s significant whitespace in programming languages.</p>

<p>While I was learning Python, I didn&rsquo;t give it much thought. Admittedly, I don&rsquo;t know Python as well as I&rsquo;d like to, but I&rsquo;ve written enough scripts and small tools in it that I feel <em>comfortable</em>. I hadn&rsquo;t thought about significant whitespace much, because it was new and it seemed cool (it makes me format my code, not that that was ever a problem). But now that I&rsquo;ve tried Haskell, my opinion has changed, and for the worse.</p>

<p>Okay, so you&rsquo;re a programmer? You solve problems, and you write the solution down in such a way that even the dumbest thing, a computer, can understand it. So most of the time programmers are <del>drinking coffee and staring at the ceiling</del> solving problems and writing the solutions down. Does it seem like a good idea, then, to make a developer think about <em>how</em> to write it down rather than <em>what</em> to write down? In my humble opinion, you can always refactor and/or prettify the code later, after the task has been solved.</p>

<p>When I had just started learning Haskell, I got a bunch of errors complaining about indentation. That threw me off track from solving the problem at hand a few times. I would write a possible solution down, try it, and get errors that didn&rsquo;t tell me what was <em>wrong</em> with my solution, but that my <em>code was misaligned</em>. While I tried to fix that, I lost my train of thought. (Also, what is it with Haskell and tabs? <code>:unset -fwarn-tabs</code>, anyone?)</p>

<p>In the end, all I want to say is that in my opinion tools should help us solve the problems, not get in the way. And significant whitespace gets in the way, especially in the beginning.</p>
]]></content>
</entry>
<entry>
<id>https://cabin.digital/tags/programming.xml:::https://cabin.digital/log/glocal.html</id>
<link rel="alternate" href="https://cabin.digital/log/glocal.html"/>
<title>Glocal - an interpreter I am making</title>
<updated>2015-04-07T00:00:00Z</updated>
<author>
<name>dweller</name>
</author>
<category term="glocal"/>
<category term="programming"/>
<category term="langdev"/>
<category term="dev"/>
<content type="html"><![CDATA[<p><img src="https://cabin.digital/images/Logo1.png" alt="Glocal logo" title="Glocal logo" /></p>

<p>Hello, I just decided to share that I am currently developing my own little interpreter. I am coding it in C#, because I am most familiar with it, and feel comfortable in it.</p>

<p>The language itself is pretty limited at the moment. It is called Glocal, short for &ldquo;Glorified Calculator&rdquo;, since that is basically what it is right now.</p>

<p>I am coding it from scratch, with no prior experience. So everything from Lexer to Evaluation is made with no external library. Reason? Just for fun! :D</p>

<p>Here is the list of features it has right now:</p>

<ul>
<li>Math Expressions</li>
<li>Variables</li>
<li>If-Then-Else Statement</li>
<li>Times Loop</li>
<li>Basic I/O with keywords (print and scan)</li>
<li>Basic Types (Int, Real, Bool, String)</li>
</ul>

<p>I also plan to add:</p>

<ul>
<li>Lists/Tables</li>
<li>Scopes (Currently everything is global)</li>
<li>Structures</li>
<li>Functions/Procedures</li>
<li>Module loading (includes/using)</li>
<li>DLL support</li>
</ul>

<p>Here&rsquo;s a code snippet:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
</span></pre></div>
    <div class="golog-lines"><pre><code>{
    ; Has one line comments

    ; variables and basic I/O
    <span class='golog-keyword'>print</span> <span class='golog-string'>&#34;Echo: &#34;</span>
    a = <span class='golog-keyword'>scan</span>
    <span class='golog-keyword'>print</span> a

    ; Just added types, so no bool operations yet :(
    ; Has flow control
    <span class='golog-keyword'>if</span> <span class='golog-keyword'>true</span> <span class='golog-keyword'>then</span>
    {
        ; Has power operator
        b = <span class='golog-number'>2</span> ^ <span class='golog-number'>10</span>
        <span class='golog-keyword'>print</span> b
    }
    <span class='golog-keyword'>else</span>
    {
        ; And other basic math
        c = <span class='golog-number'>213.2</span> % <span class='golog-number'>3.1</span> + (<span class='golog-number'>16</span> - <span class='golog-number'>3</span>)
        <span class='golog-keyword'>print</span> c
    }

    <span class='golog-keyword'>print</span> <span class='golog-string'>&#34;Enter a number: &#34;</span>
    num = <span class='golog-keyword'>scan</span>

    i = <span class='golog-number'>0</span>

    ; Has a loop
    <span class='golog-keyword'>times</span> num <span class='golog-keyword'>do</span>
    {
        i = i + <span class='golog-number'>1</span>
        <span class='golog-keyword'>print</span> i
    }

    <span class='golog-keyword'>print</span> <span class='golog-string'>&#34;What is your name?&#34;</span>
    name = <span class='golog-keyword'>scan</span>
    <span class='golog-keyword'>print</span> <span class='golog-string'>&#34;Hello, &#34;</span> + name + <span class='golog-string'>&#34;!&#34;</span>
</code></pre></div>
</div>

<p>Output:</p>
<div class="golog-codeblock">
    <div class="golog-linenums"><pre><span>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</span></pre></div>
    <div class="golog-lines"><pre><code>D:\C#\Interpreter\Interpreter\Debug&gt;glocal test5.gc
Echo:
echo?
echo?
1024
Enter a number:
5
1
2
3
4
5
What is your name?
Dweller
Hello, Dweller!

D:\C#\Interpreter\Interpreter\Debug&gt;
</code></pre></div>
</div>

<p>Thanks for reading! :D</p>
]]></content>
</entry>
</feed>
