<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Himwant</title><description>Musings from the mountain-top</description><link>https://himwant.org</link><item><title>xv6 - Introduction</title><link>https://himwant.org/posts/xv6-intro</link><guid isPermaLink="true">https://himwant.org/posts/xv6-intro</guid><description>Introducing the xv6 kernel</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Why??&lt;/h1&gt;
&lt;p&gt;I am currently going through the book &lt;em&gt;&lt;strong&gt;Operating Systems: Three Easy Pieces&lt;/strong&gt;&lt;/em&gt;, and the next step after completing the first piece, Virtualisation, is to go through the xv6 kernel. The xv6 kernel has two implementations, one for x86 and one for RISC-V. As the x86 one is now unmaintained, and on the recommendation of a mentor, I am using &lt;a href=&quot;https://github.com/mit-pdos/xv6-riscv&quot;&gt;xv6-riscv&lt;/a&gt;! So without further ado, let&apos;s get started.&lt;/p&gt;
&lt;h1&gt;The xv6 Kernel&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Based on Unix kernel&lt;/li&gt;
&lt;li&gt;Created by MIT for educational purposes&lt;/li&gt;
&lt;li&gt;Implemented for RISC-V&lt;/li&gt;
&lt;li&gt;Multicore&lt;/li&gt;
&lt;li&gt;~6000 lines of code (C and Assembly)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Features&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Processes&lt;/li&gt;
&lt;li&gt;Virtual Address Spaces
&lt;ul&gt;
&lt;li&gt;Page Tables&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Files and Directories&lt;/li&gt;
&lt;li&gt;Pipes&lt;/li&gt;
&lt;li&gt;Multitasking
&lt;ul&gt;
&lt;li&gt;Time-Slicing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;21 system calls&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;User Programs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;sh&lt;/li&gt;
&lt;li&gt;cat&lt;/li&gt;
&lt;li&gt;echo&lt;/li&gt;
&lt;li&gt;grep&lt;/li&gt;
&lt;li&gt;kill&lt;/li&gt;
&lt;li&gt;ln&lt;/li&gt;
&lt;li&gt;ls&lt;/li&gt;
&lt;li&gt;mkdir&lt;/li&gt;
&lt;li&gt;rm&lt;/li&gt;
&lt;li&gt;wc&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What is missing?&lt;/h2&gt;
&lt;p&gt;All the complexity of a &quot;real&quot; operating system is missing: for example, user IDs, file protections, IPC, &quot;mount&quot;able filesystems, etc.&lt;/p&gt;
&lt;h2&gt;SMP: Shared Memory Multiprocessor&lt;/h2&gt;
&lt;p&gt;CPU = core = hart (RISC-V&apos;s &quot;hardware thread&quot;)&lt;/p&gt;
&lt;p&gt;Main memory (RAM) is shared and is 128 MB in size&lt;/p&gt;
&lt;h2&gt;Supported Devices&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;UART    (Serial Comm.    Tx &amp;lt;=&amp;gt; Rx)&lt;/li&gt;
&lt;li&gt;Disk&lt;/li&gt;
&lt;li&gt;Timer Interrupts&lt;/li&gt;
&lt;li&gt;PLIC: Platform Level Interrupt Controller&lt;/li&gt;
&lt;li&gt;CLINT: Core-Local Interruptor&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Memory Management&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Page Size = 4096 bytes    (&lt;code&gt;#define PGSIZE&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Single Free List&lt;/li&gt;
&lt;li&gt;No variable sized allocation&lt;/li&gt;
&lt;li&gt;No &quot;malloc&quot;&lt;/li&gt;
&lt;li&gt;Page Tables-
&lt;ul&gt;
&lt;li&gt;Three Levels&lt;/li&gt;
&lt;li&gt;One table per process + One table for Kernel&lt;/li&gt;
&lt;li&gt;Pages marked - R/W/X/U/V&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Scheduler&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Round Robin&lt;/li&gt;
&lt;li&gt;Size of TimeSlice is fixed (1,000,000 cycles)&lt;/li&gt;
&lt;li&gt;All cores share one &quot;Ready Queue&quot;&lt;/li&gt;
&lt;li&gt;Next TimeSlice may be on a different core&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Boot Sequence&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;QEMU
&lt;ul&gt;
&lt;li&gt;Loads kernel code at a fixed address (0x8000-0000)&lt;/li&gt;
&lt;li&gt;Starts all cores running&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;No bootloader/boot-block/BIOS&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Locking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Spin Locks&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sleep()&lt;/code&gt; &amp;amp; &lt;code&gt;wakeup()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&quot;Param.h&quot;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Fixed Limits (like no. of processes, no. of open files, etc)&lt;/li&gt;
&lt;li&gt;Several Arrays (&quot;&lt;code&gt;kill(pid)&lt;/code&gt;&quot; =&amp;gt; Linear search of Processes array)&lt;/li&gt;
&lt;/ul&gt;
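&lt;p&gt;To make the second point concrete, here is a minimal user-space sketch of how &lt;code&gt;kill(pid)&lt;/code&gt; boils down to a linear scan of a fixed-size process array (my own simplification; the real xv6 version also takes per-process locks):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#define NPROC 64              // fixed limit, as in param.h

struct proc {
  int pid;
  int killed;
};

struct proc proc[NPROC];      // one global array, no dynamic allocation

int
kill(int pid)
{
  for(struct proc *p = proc; p &amp;lt; &amp;amp;proc[NPROC]; p++){
    if(p-&amp;gt;pid == pid){
      p-&amp;gt;killed = 1;          // the victim notices this flag and exits
      return 0;
    }
  }
  return -1;                  // no such process
}
&lt;/code&gt;&lt;/pre&gt;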
&lt;h2&gt;User Address Space&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;./ua-space.jpeg&quot; alt=&quot;img&quot; title=&quot;Sorry for the bad handwriting!&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Arguments (argc, argv) will be placed on the stack before the program begins execution.&lt;/p&gt;
&lt;h2&gt;RISC-V Virtual Addresses&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Multiple Schemes are available&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sv32 -&amp;gt; Used for 2-level Page Tables&lt;/li&gt;
&lt;li&gt;Sv39 -&amp;gt; For 3-levels&lt;/li&gt;
&lt;li&gt;Sv48 -&amp;gt; For 4-levels!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;xv6 uses Sv39&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;VA Size-&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  39 bits
  2^39 = 512 GB
       = 0x80-0000-0000
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;But xv6 uses only 38 bits for its VAs&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  38 bits
  2^38 = 256 GB
       = 0x40-0000-0000
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Therefore the allowed range is -  0…0x3F-FFFF-FFFF&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
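&lt;p&gt;To make the three levels concrete: an Sv39 virtual address splits into a 12-bit page offset plus three 9-bit page-table indices (9 + 9 + 9 + 12 = 39). This is roughly how xv6&apos;s &lt;code&gt;riscv.h&lt;/code&gt; extracts each index (the &lt;code&gt;typedef&lt;/code&gt; is added here for self-containment):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;typedef unsigned long uint64;

#define PGSHIFT 12                          // bits of offset within a page
#define PXMASK  0x1FF                       // 9 bits per level
#define PXSHIFT(level) (PGSHIFT + (9 * (level)))

// Extract the 9-bit page-table index for a given level (2 = root).
#define PX(level, va) ((((uint64)(va)) &amp;gt;&amp;gt; PXSHIFT(level)) &amp;amp; PXMASK)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the highest valid address, &lt;code&gt;0x3F-FFFF-FFFF&lt;/code&gt;, the level-2 index comes out as only &lt;code&gt;0xFF&lt;/code&gt;, reflecting that xv6 leaves the 39th bit unused.&lt;/p&gt;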
&lt;h2&gt;Startup&lt;/h2&gt;
&lt;p&gt;We go from&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;     entry.S         -&amp;gt;          start.c             -&amp;gt;            main.c
   (Setup Stack)              (Machine Mode)                  (Supervisor Mode)
&lt;/code&gt;&lt;/pre&gt;
</content:encoded><author>Akshit Gaur</author></item><item><title>xv6 - SpinLocks</title><link>https://himwant.org/posts/xv6-spinlocking</link><guid isPermaLink="true">https://himwant.org/posts/xv6-spinlocking</guid><description>Should I spin or should I lock?</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Spin Locking&lt;/h1&gt;
&lt;p&gt;We know that xv6 uses SpinLocks, sleep and wakeup for its locking. In this post, let us discuss SpinLocks.&lt;/p&gt;
&lt;p&gt;SpinLocks are defined in &lt;a href=&quot;https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/spinlock.c&quot;&gt;spinlock.c&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Basics&lt;/h2&gt;
&lt;p&gt;There are primarily &lt;strong&gt;two&lt;/strong&gt; functions related to locks: &lt;code&gt;acquire&lt;/code&gt; and &lt;code&gt;release&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; Spin locks should not be held for a long time, as spinning wastes cycles!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; In such cases, consider using &lt;code&gt;sleep()&lt;/code&gt; and &lt;code&gt;wakeup()&lt;/code&gt;!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;acquire&lt;/h3&gt;
&lt;p&gt;Acquiring the lock seems simple enough on the surface.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;while (locked == 1);
locked = 1;
cpu = mycpu();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But two processors can simultaneously execute this! Therefore we need an atomic operation that reads and writes &lt;code&gt;locked&lt;/code&gt; in one instruction. Fortunately, RISC-V has one such instruction, &lt;code&gt;AMOSWAP&lt;/code&gt;, which we will use.&lt;/p&gt;
&lt;p&gt;Another thing to note is that we need to disable preemption or interrupts while we are in the critical section!&lt;/p&gt;
&lt;p&gt;Therefore, we have-&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   1 -&amp;gt; [  Locked  ] -&amp;gt; y
   if (y != 0) try again
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;release&lt;/h3&gt;
&lt;p&gt;release is quite simple: just store 0 in the &lt;code&gt;locked&lt;/code&gt; field!&lt;/p&gt;
&lt;h2&gt;Theory&lt;/h2&gt;
&lt;p&gt;Now, &lt;code&gt;acquire()&lt;/code&gt; disables the interrupts and &lt;code&gt;release()&lt;/code&gt; re-enables them. But situations with multiple locks become problematic.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;acquire(&amp;amp;lk1);
acquire(&amp;amp;lk2);

// Critical Section

release(&amp;amp;lk2);    // Interrupts re-enabled prematurely!!

// Some more operations

release(&amp;amp;lk1);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, we store a counter &lt;code&gt;noff&lt;/code&gt; and a status int &lt;code&gt;intena&lt;/code&gt; (interrupts enabled) for each hart.&lt;/p&gt;
&lt;p&gt;If this is the first acquire (&lt;code&gt;noff == 0&lt;/code&gt;), we set &lt;code&gt;intena&lt;/code&gt; to the current interrupt status (0 if disabled, 1 otherwise). We then increment the counter regardless.&lt;/p&gt;
&lt;p&gt;During release, we decrement the counter and if it becomes 0, we restore the interrupts to their saved state in &lt;code&gt;intena&lt;/code&gt;!&lt;/p&gt;
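&lt;p&gt;To make the bookkeeping concrete, here is a small user-space model of this logic (my own sketch: &lt;code&gt;interrupts_on&lt;/code&gt; stands in for the hart interrupt-enable bit, and in the real kernel &lt;code&gt;noff&lt;/code&gt; and &lt;code&gt;intena&lt;/code&gt; live in the per-hart &lt;code&gt;struct cpu&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;int interrupts_on = 1;  // stand-in for the hart interrupt-enable bit
int noff = 0;           // nesting depth of push_off()
int intena = 0;         // state saved by the outermost push_off()

void
push_off(void)
{
  int old = interrupts_on;
  interrupts_on = 0;              // intr_off() in the real kernel
  if(noff == 0)
    intena = old;                 // remember the state only once
  noff += 1;
}

void
pop_off(void)
{
  noff -= 1;
  if(noff == 0 &amp;amp;&amp;amp; intena)
    interrupts_on = 1;            // intr_on(): only the outermost release restores
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this scheme, releasing &lt;code&gt;lk2&lt;/code&gt; in the earlier example no longer re-enables interrupts prematurely; only the final &lt;code&gt;pop_off()&lt;/code&gt; does.&lt;/p&gt;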
&lt;h2&gt;Code&lt;/h2&gt;
&lt;h3&gt;Structure&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;struct spinlock&lt;/code&gt; itself is defined in &lt;a href=&quot;https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/spinlock.h&quot;&gt;spinlock.h&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;struct spinlock {
  uint locked;       // Is the lock held?

  // For debugging:
  char *name;        // Name of lock.
  struct cpu *cpu;   // The cpu holding the lock.
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It contains an unsigned int &lt;code&gt;locked&lt;/code&gt;, which indicates whether the lock is held: the lock is free when &lt;code&gt;locked == 0&lt;/code&gt; and held when &lt;code&gt;locked == 1&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Imports&lt;/h3&gt;
&lt;p&gt;We first import the necessary files. Nothing too exciting here.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#include &quot;types.h&quot;
#include &quot;param.h&quot;
#include &quot;memlayout.h&quot;
#include &quot;spinlock.h&quot;
#include &quot;riscv.h&quot;
#include &quot;proc.h&quot;
#include &quot;defs.h&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Initialising the lock&lt;/h3&gt;
&lt;p&gt;Before we can even acquire the lock, we need to actually initialise it! Its steps are simple enough to be self-explanatory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void
initlock(struct spinlock *lk, char *name)
{
  lk-&amp;gt;name = name;
  lk-&amp;gt;locked = 0;
  lk-&amp;gt;cpu = 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Acquiring the Lock&lt;/h3&gt;
&lt;p&gt;Let&apos;s start writing the acquire function!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void acquire(struct spinlock *lk) {
  &amp;lt;&amp;lt;aq_disable_interrupt&amp;gt;&amp;gt;
  &amp;lt;&amp;lt;aq_atomic_swap&amp;gt;&amp;gt;
  &amp;lt;&amp;lt;aq_misc&amp;gt;&amp;gt;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;As discussed earlier, we first disable the interrupts. &lt;code&gt;push_off()&lt;/code&gt; takes care of that. We also ensure we aren&apos;t already holding the lock.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  push_off();
  if(holding(lk))
      panic(&quot;acquire&quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Next, we spin: we atomically swap 1 into &lt;code&gt;locked&lt;/code&gt; until the old value comes back as 0, meaning we won the lock. On RISC-V, &lt;code&gt;__sync_lock_test_and_set&lt;/code&gt; compiles down to the &lt;code&gt;amoswap&lt;/code&gt; instruction discussed earlier.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   while(__sync_lock_test_and_set(&amp;amp;lk-&amp;gt;locked, 1) != 0);
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We need a memory fence (&lt;code&gt;__sync_synchronize()&lt;/code&gt;) to ensure that neither the compiler nor the CPU reorders the critical section&apos;s memory accesses to before the lock is taken. We also store the cpu&apos;s information in the struct for debugging.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   __sync_synchronize();
   lk-&amp;gt;cpu = mycpu();
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our final acquire function-&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void acquire(struct spinlock *lk) {
  push_off();
  if(holding(lk))
      panic(&quot;acquire&quot;);
  while(__sync_lock_test_and_set(&amp;amp;lk-&amp;gt;locked, 1) != 0);
  __sync_synchronize();
  lk-&amp;gt;cpu = mycpu();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Releasing the lock&lt;/h3&gt;
&lt;p&gt;Releasing is a lot simpler.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void release(struct spinlock *lk) {
  &amp;lt;&amp;lt;rl_misc&amp;gt;&amp;gt;
  &amp;lt;&amp;lt;rl_en_int&amp;gt;&amp;gt;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Ensure we are holding the lock, scrub cpu information, synchronise, and release the lock&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   if(!holding(lk))
     panic(&quot;release&quot;);

   lk-&amp;gt;cpu = 0;

   __sync_synchronize();

   __sync_lock_release(&amp;amp;lk-&amp;gt;locked);
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Re-enable Interrupts&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   pop_off();
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Release function-&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void release(struct spinlock *lk) {
  if(!holding(lk))
    panic(&quot;release&quot;);

  lk-&amp;gt;cpu = 0;

  __sync_synchronize();

  __sync_lock_release(&amp;amp;lk-&amp;gt;locked);
  pop_off();
}
&lt;/code&gt;&lt;/pre&gt;
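&lt;p&gt;As a parting experiment, the same two GCC builtins work in ordinary user space too. This toy program (my own, not kernel code) runs two threads that bump a shared counter under a hand-rolled spin lock; with the lock in place, the result is always exact:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Build with: gcc spin_demo.c -pthread
#include &amp;lt;pthread.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

int locked = 0;
long counter = 0;

void *worker(void *arg) {
  for (int i = 0; i &amp;lt; 100000; i++) {
    while (__sync_lock_test_and_set(&amp;amp;locked, 1) != 0);  // acquire
    __sync_synchronize();
    counter++;                                            // critical section
    __sync_synchronize();
    __sync_lock_release(&amp;amp;locked);                        // release
  }
  return 0;
}

int main() {
  pthread_t t1, t2;
  pthread_create(&amp;amp;t1, 0, worker, 0);
  pthread_create(&amp;amp;t2, 0, worker, 0);
  pthread_join(t1, 0);
  pthread_join(t2, 0);
  printf(&quot;%ld\n&quot;, counter);  // prints 200000 every time
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Remove the acquire/release pair and the count becomes unpredictable, which is exactly the race the kernel is defending against.&lt;/p&gt;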
</content:encoded><author>Akshit Gaur</author></item><item><title>xv6 - Memory Management</title><link>https://himwant.org/posts/xv6-mem-management</link><guid isPermaLink="true">https://himwant.org/posts/xv6-mem-management</guid><description>How does xv6 manage memory?</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;We know that xv6 uses a simple freelist, keeping that in mind, let&apos;s take a look at &lt;a href=&quot;https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/kalloc.c&quot;&gt;kalloc.c&lt;/a&gt;!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  extern char end[];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First memory address after kernel.
Defined by &lt;code&gt;kernel.ld&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  struct run {
    struct run *next;
  };
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a freelist node: each free page stores a pointer to the next free page in its own first few bytes, so the freelist needs no extra memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  struct {
    struct spinlock lock;
    struct run *freelist;
  } kmem;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;kmem&lt;/code&gt; bundles the freelist together with the spinlock that protects it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  void
  kinit()
  {
    initlock(&amp;amp;kmem.lock, &quot;kmem&quot;);
    freerange(end, (void*)PHYSTOP);
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Initialises the lock and frees the entire memory space above the kernel.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  void
  freerange(void *pa_start, void *pa_end)
  {
    char *p;
    p = (char*)PGROUNDUP((uint64)pa_start);
    for(; p + PGSIZE &amp;lt;= (char*)pa_end; p += PGSIZE)
      kfree(p);
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We round &lt;code&gt;pa_start&lt;/code&gt; up to a page boundary with &lt;code&gt;PGROUNDUP&lt;/code&gt;, then walk page by page up to &lt;code&gt;pa_end&lt;/code&gt; and free each one!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
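&lt;p&gt;The &lt;code&gt;PGROUNDUP&lt;/code&gt; used in &lt;code&gt;freerange&lt;/code&gt; (and its sibling &lt;code&gt;PGROUNDDOWN&lt;/code&gt;) comes from &lt;code&gt;riscv.h&lt;/code&gt;. It simply rounds an address to a page boundary with bit masking, which works because &lt;code&gt;PGSIZE&lt;/code&gt; is a power of two:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#define PGSIZE 4096

// Round up (or down) to the nearest page boundary.
#define PGROUNDUP(sz)  (((sz) + PGSIZE - 1) &amp;amp; ~(PGSIZE - 1))
#define PGROUNDDOWN(a) ((a) &amp;amp; ~(PGSIZE - 1))

// PGROUNDUP(1)      == 4096
// PGROUNDUP(4096)   == 4096
// PGROUNDDOWN(4097) == 4096
&lt;/code&gt;&lt;/pre&gt;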
&lt;p&gt;The next two functions do require a closer look-&lt;/p&gt;
&lt;h1&gt;kfree&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;void
kfree(void *pa)
{
  struct run *r;

  if(((uint64)pa % PGSIZE) != 0 || (char*)pa &amp;lt; end || (uint64)pa &amp;gt;= PHYSTOP)
    panic(&quot;kfree&quot;);

  // Fill with junk to catch dangling refs.
  memset(pa, 1, PGSIZE);

  r = (struct run*)pa;

  acquire(&amp;amp;kmem.lock);
  r-&amp;gt;next = kmem.freelist;
  kmem.freelist = r;
  release(&amp;amp;kmem.lock);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s go line by line here.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;   void
   kfree(void *pa)
   {
     struct run *r;

     ...
   }
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;We take a physical address (&lt;code&gt;pa&lt;/code&gt;) as input: a pointer to the page we want to free.&lt;/li&gt;
&lt;li&gt;We also declare a &lt;code&gt;struct run&lt;/code&gt; pointer &lt;code&gt;r&lt;/code&gt;, which we will use to link the page into the freelist.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;   if(((uint64)pa % PGSIZE) != 0 || (char*)pa &amp;lt; end || (uint64)pa &amp;gt;= PHYSTOP)
     panic(&quot;kfree&quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, we check the conditions under which running &lt;code&gt;kfree&lt;/code&gt; is invalid and should panic-&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;(uint64)pa % PGSIZE != 0&lt;/code&gt; -&amp;gt; The pointer &lt;code&gt;pa&lt;/code&gt; is not page-aligned!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;(char*)pa &amp;lt; end&lt;/code&gt; -&amp;gt; &lt;code&gt;pa&lt;/code&gt; points to a kernel page!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;(uint64)pa &amp;gt;= PHYSTOP&lt;/code&gt; -&amp;gt; &lt;code&gt;pa&lt;/code&gt; points to a memory address larger than the physical memory!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;   memset(pa, 1, PGSIZE);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As is quite obvious here, we fill the entire page with junk so that residual data does not survive, and so that any code still using a dangling reference reads garbage instead of stale values. This makes use-after-free bugs much easier to catch.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;   r = (struct run*)pa;

   acquire(&amp;amp;kmem.lock);
   r-&amp;gt;next = kmem.freelist;
   kmem.freelist = r;
   release(&amp;amp;kmem.lock);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Step-by-step execution of this block-&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We assign the page to &lt;code&gt;r&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We acquire the lock to memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We point &lt;code&gt;r&lt;/code&gt;&apos;s &lt;code&gt;next&lt;/code&gt; at the current head of the freelist.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We point the freelist pointer to &lt;code&gt;r&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We release the lock on memory we are currently holding.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And that concludes the &lt;code&gt;kfree&lt;/code&gt; function!&lt;/p&gt;
&lt;h1&gt;kalloc&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;void *
kalloc(void)
{
  struct run *r;

  acquire(&amp;amp;kmem.lock);
  r = kmem.freelist;
  if(r)
    kmem.freelist = r-&amp;gt;next;
  release(&amp;amp;kmem.lock);

  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s again analyse each line (only unique ones)-&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  struct run *r;

  acquire(&amp;amp;kmem.lock);
  r = kmem.freelist;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We acquire the memory lock and point a variable &lt;code&gt;r&lt;/code&gt; at the head of the freelist.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  if(r)
    kmem.freelist = r-&amp;gt;next;
  release(&amp;amp;kmem.lock);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If &lt;code&gt;r&lt;/code&gt; is non-null (the freelist was not empty), we advance the freelist head to the second node and release the lock.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;pre&gt;&lt;code&gt;  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, if &lt;code&gt;r&lt;/code&gt; is non-null, we fill the page with junk so that stale data does not survive and any would-be snooper (or dangling reference) finds nothing useful!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And that concludes the &lt;code&gt;kalloc.c&lt;/code&gt; file in the xv6 kernel! You are finally ready to conquer your memory!!&lt;/p&gt;
&lt;p&gt;Our final &lt;code&gt;kalloc.c&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Physical memory allocator, for user processes,
// kernel stacks, page-table pages,
// and pipe buffers. Allocates whole 4096-byte pages.

#include &quot;types.h&quot;
#include &quot;param.h&quot;
#include &quot;memlayout.h&quot;
#include &quot;spinlock.h&quot;
#include &quot;riscv.h&quot;
#include &quot;defs.h&quot;

void freerange(void *pa_start, void *pa_end);

extern char end[]; // first address after kernel.
                   // defined by kernel.ld.

struct run {
  struct run *next;
};

struct {
  struct spinlock lock;
  struct run *freelist;
} kmem;

void
kinit()
{
  initlock(&amp;amp;kmem.lock, &quot;kmem&quot;);
  freerange(end, (void*)PHYSTOP);
}

void
freerange(void *pa_start, void *pa_end)
{
  char *p;
  p = (char*)PGROUNDUP((uint64)pa_start);
  for(; p + PGSIZE &amp;lt;= (char*)pa_end; p += PGSIZE)
    kfree(p);
}

// Free the page of physical memory pointed at by pa,
// which normally should have been returned by a
// call to kalloc().  (The exception is when
// initializing the allocator; see kinit above.)
void
kfree(void *pa)
{
  struct run *r;

  if(((uint64)pa % PGSIZE) != 0 || (char*)pa &amp;lt; end || (uint64)pa &amp;gt;= PHYSTOP)
    panic(&quot;kfree&quot;);

  // Fill with junk to catch dangling refs.
  memset(pa, 1, PGSIZE);

  r = (struct run*)pa;

  acquire(&amp;amp;kmem.lock);
  r-&amp;gt;next = kmem.freelist;
  kmem.freelist = r;
  release(&amp;amp;kmem.lock);
}

// Allocate one 4096-byte page of physical memory.
// Returns a pointer that the kernel can use.
// Returns 0 if the memory cannot be allocated.
void *
kalloc(void)
{
  struct run *r;

  acquire(&amp;amp;kmem.lock);
  r = kmem.freelist;
  if(r)
    kmem.freelist = r-&amp;gt;next;
  release(&amp;amp;kmem.lock);

  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
}
&lt;/code&gt;&lt;/pre&gt;
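&lt;p&gt;If you want to play with the core idea outside the kernel, here is a tiny user-space model of the same LIFO freelist (my own sketch: single-threaded, so the lock is omitted, and &lt;code&gt;malloc&lt;/code&gt; stands in for physical memory):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

#define PGSIZE 4096
#define NPAGES 4

struct run { struct run *next; };
struct run *freelist = 0;

void kfree_page(void *pa) {   // push the page onto the freelist head
  struct run *r = pa;
  r-&amp;gt;next = freelist;
  freelist = r;
}

void *kalloc_page(void) {     // pop the head, or return 0 if empty
  struct run *r = freelist;
  if (r)
    freelist = r-&amp;gt;next;
  return r;
}

int main() {
  char *mem = malloc(NPAGES * PGSIZE);
  for (int i = 0; i &amp;lt; NPAGES; i++)
    kfree_page(mem + i * PGSIZE);
  // LIFO: the last page freed is the first page allocated.
  printf(&quot;%d\n&quot;, kalloc_page() == (void *)(mem + 3 * PGSIZE));
  free(mem);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;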
</content:encoded><author>Akshit Gaur</author></item><item><title>RSoC 2026: A new CPU scheduler for Redox</title><link>https://himwant.org/posts/redox-dwrr</link><guid isPermaLink="true">https://himwant.org/posts/redox-dwrr</guid><description>Say hi to the brand new scheduler</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;TL;DR&lt;/h1&gt;
&lt;p&gt;I have replaced the legacy Round Robin scheduler with a Deficit Weighted Round Robin scheduler. Thanks to this, we finally have a way of assigning different priorities to our process contexts. Under light load you may not notice any difference, but under heavy load the new scheduler outperforms the old one: e.g. a ~150 FPS gain in the &lt;code&gt;pixelcannon&lt;/code&gt; 3D Redox demo, a ~1.5x gain in operations/sec for CPU-bound tasks, and a similar improvement in responsiveness (measured through &lt;a href=&quot;https://gitlab.redox-os.org/akshitgaur2005/schedrs&quot;&gt;schedrs&lt;/a&gt;).&lt;/p&gt;
&lt;h1&gt;What is Redox OS?&lt;/h1&gt;
&lt;p&gt;Let&apos;s keep this brief, Redox OS is Rust-based microkern-&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Redox OS is a complete Unix-like microkernel-based operating system written in Rust, with a focus on security, reliability and safety&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Being written in Rust gives it an inherently higher floor for quality than C, in my humble opinion.&lt;/p&gt;
&lt;h2&gt;Round Robin Scheduler&lt;/h2&gt;
&lt;p&gt;Redox OS currently uses a simple &lt;a href=&quot;https://osdev.wiki/wiki/Scheduling_Algorithms#Round_Robin&quot;&gt;Round Robin Scheduler&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Imagine you are sitting at a bar with a few of your friends. All drinks are free tonight, and as a result the bar is effectively understaffed, with far fewer bartenders than customers. The bartenders start from the left, serve a customer, and move to their right.&lt;/p&gt;
&lt;p&gt;Some patrons drink slowly and may still have a drink when the bartender returns. As a result, even though not everyone needs a new drink each time, bartenders must still check with them, which introduces inefficiency into the system.&lt;/p&gt;
&lt;p&gt;This system works well enough: customers wait for a while, but everybody waits the same amount of time, and everyone is happy, or at least equally unhappy.&lt;/p&gt;
&lt;p&gt;Unfortunately for these bartenders, a local politician, with quite a short temper and a very large ego, happens to be one of the customers today. If these poor bartenders follow their usual protocol and treat the VIP the same as the rest, he will sigterm their employment; but, bound by the protocol, they have no choice but to keep moving in their loop, watching the VIP boil in rage.&lt;/p&gt;
&lt;p&gt;In an Operating System, that VIP customer is a high-priority, I/O-bound, interactive process (like your audio stack, where even the slightest delay can cause audible artifacts). If it waits in an RR queue behind a CPU-hogging background task, the system feels unresponsive and the user sigkills many children in frustration.&lt;/p&gt;
&lt;p&gt;Enter..&lt;/p&gt;
&lt;h2&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Deficit_round_robin&quot;&gt;Deficit Weighted Round Robin Scheduler&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Following the debacle at the bar, the offer for free drinks is now over, but the bartenders REALLY like to give free drinks, so they come up with a solution, a stone to kill two birds!&lt;/p&gt;
&lt;p&gt;They set up 3 token dispensers, each giving out tokens at a different speed: 1, 2, and 4 tokens per second. 3 queues (A, B, C) form along these dispensers, with a bouncer assigning each customer to a queue. The price of a beer has been set at two tokens, so a customer standing in queue A waits two ticks before leaving, while a customer in queue C can afford two beers in just one tick. The customer then leaves the queue and &quot;purchases&quot; the beer!&lt;/p&gt;
&lt;p&gt;The problem for VIPs is solved! As soon as a VIP arrives, the bouncer directs him to queue C, and everybody in the other two queues is put on the backburner.&lt;/p&gt;
&lt;p&gt;Unfortunately though, we now have a problem with starvation, as queue C is closer to the exit than B, which is closer than A. If people from all queues have accumulated their balance and want to leave, the customer in C will always be the one to go first. This is a problem! If there are many customers in queue C, the customers from queue A and B will get no chance to purchase beer and will die of starvation!&lt;/p&gt;
&lt;p&gt;Deficit Weighted Round Robin Scheduler (DWRR) groups the processes in multiple queues, assigning priority to each queue. At each context switch, it starts with the queue with the highest priority, adds its weight to its balance, and keeps running tasks from it until its balance is below some base price, and then the scheduler moves on to the next queue in the list.&lt;/p&gt;
&lt;p&gt;It correctly prioritises the processes with high priority, but can lead to starvation and higher latency for lower priority processes.&lt;/p&gt;
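&lt;p&gt;The core loop can be sketched like this (a toy model with made-up numbers matching the bar analogy, not the actual Redox implementation; every task slice costs 2 tokens):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#define NQ   3
#define COST 2                           // tokens per &quot;beer&quot; (task slice)

int weight[NQ]  = {1, 2, 4};             // tokens added per pass (queues A, B, C)
int deficit[NQ] = {0, 0, 0};
int pending[NQ] = {5, 5, 5};             // tasks waiting in each queue
int served[NQ]  = {0, 0, 0};

void pass(void) {
  for (int q = NQ - 1; q &amp;gt;= 0; q--) {    // highest-priority queue first
    deficit[q] += weight[q];             // top up the balance
    while (pending[q] &amp;gt; 0 &amp;amp;&amp;amp; deficit[q] &amp;gt;= COST) {
      deficit[q] -= COST;                // &quot;run&quot; one task slice
      pending[q]--;
      served[q]++;
    }
    if (pending[q] == 0)
      deficit[q] = 0;                    // classic DRR: no hoarding when idle
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After two passes, &lt;code&gt;served&lt;/code&gt; is &lt;code&gt;{1, 2, 4}&lt;/code&gt;: throughput tracks the weights while the lower-priority queues still crawl forward.&lt;/p&gt;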
&lt;h3&gt;Interleaving&lt;/h3&gt;
&lt;p&gt;To solve the starvation problem without sacrificing the VIPs&apos; needs, we move to an Interleaved approach. Instead of letting one queue exhaust its entire balance in one go, the scheduler &quot;interleaves&quot; the work.&lt;/p&gt;
&lt;p&gt;Think of it as the bartenders serving one round to the VIP queue, then immediately checking if anyone in the lower-priority queues has enough tokens for just one drink, before swinging back to the VIPs.&lt;/p&gt;
&lt;p&gt;This results in a slight increase in context-switch overhead, but the latency benefits are undeniable.&lt;/p&gt;
&lt;p&gt;Read more about this comparison &lt;a href=&quot;https://en.wikipedia.org/wiki/Weighted_round_robin#Interleaved_WR&quot;&gt;on wikipedia&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;How to set it up?&lt;/h2&gt;
&lt;h3&gt;Scheduler&lt;/h3&gt;
&lt;p&gt;After setting up Redox OS and ensuring it builds, check out &lt;a href=&quot;https://gitlab.redox-os.org/redox-os/kernel/-/merge_requests/539&quot;&gt;this kernel MR&lt;/a&gt;, and all the related MRs mentioned in the first comment.&lt;/p&gt;
&lt;p&gt;For anyone trying to test it, you have to clone the repos of &lt;code&gt;redox_syscall&lt;/code&gt; and &lt;code&gt;libredox&lt;/code&gt; (given in the related MRs) into &lt;code&gt;recipes/core/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Your directory should finally look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;akshit@laptop-of-doom ~/w/r/r/r/core&amp;gt; ls
base/         binutils/    contain/    dash/        findutils/  ion/     libredox/  netutils/  pkgutils/  redoxfs/        relibc/  userutils/
base-initfs/  bootloader/  coreutils/  extrautils/  installer/  kernel/  netdb/     pkgar/     profiled/  redox_syscall/  strace/  uutils/

akshit@laptop-of-doom ~/w/r/r/r/core&amp;gt; pwd
/home/akshit/workspace/rust/redox/recipes/core
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I have temporarily added Cargo.toml patches for all the MRs; these will be removed before merging.&lt;/p&gt;
&lt;h3&gt;Nice &amp;amp; Renice&lt;/h3&gt;
&lt;p&gt;Check out this &lt;a href=&quot;https://gitlab.redox-os.org/redox-os/redox/-/merge_requests/2034/diffs&quot;&gt;MR&lt;/a&gt;, and add &lt;code&gt;renice = {}&lt;/code&gt; to your desktop.toml.&lt;/p&gt;
&lt;p&gt;You are now set up to try the goodness of the new scheduler!&lt;/p&gt;
&lt;p&gt;Usage of &lt;code&gt;nice&lt;/code&gt; and &lt;code&gt;renice&lt;/code&gt; is quite self-evident.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nice -n -10 pixelcannon
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;renice -n -5 -p 1234
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Numbers&lt;/h2&gt;
&lt;p&gt;For comparing the different schedulers, I built an &lt;a href=&quot;https://gitlab.redox-os.org/akshitgaur2005/sched-sim&quot;&gt;isolated testing harness&lt;/a&gt;. Let&apos;s see their results!&lt;/p&gt;
&lt;h3&gt;Idealised Workflow&lt;/h3&gt;
&lt;p&gt;40,000 tasks are initialised at &lt;code&gt;t=0&lt;/code&gt;. These tasks are CPU-hogging and never block, and their runtime is long enough that they will not finish within our simulation timeframe of 100,000 ticks. Our simulated CPU has 16 cores.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Round Robin&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== Simulation finished (100000 ticks, 16 cores) ===
  Tasks spawned : 40000
  Tasks finished: 0
  Execute events: 1600000
  Block events  : 0
  Idle slots    : 0
  Tasks alive   : 40000

=== Diagnostics ===
  Absolute CPU Util : 100.00%
  Scheduler Effic.  : 100.00%
  Forced Idles      : 0
  Wasted Idles      : 0
  Avg Turnaround    : N/A (No tasks finished)
  Avg Wait Time     : 99960.00 ticks (All tasks)
  Max Wait Time     : 99960 ticks

=== Scheduler Verification ===
  Avg Response Time : 1249.50 ticks (All tasks)
  Context Switches  : 1599984
  Ctx Switch Rate   : 100.00% of executed ticks
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As this is a very idealised workflow, the only important metric to note here is Avg Response Time. As we will see further on, the response time for RR is the lowest, but as discussed previously, we have no way of assigning different priorities to our tasks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deficit Weighted Round Robin&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== Simulation finished (100000 ticks, 16 cores) ===
  Tasks spawned : 40000
  Tasks finished: 0
  Execute events: 1600000
  Block events  : 0
  Idle slots    : 0
  Tasks alive   : 40000

=== Diagnostics ===
  Absolute CPU Util : 100.00%
  Scheduler Effic.  : 100.00%
  Forced Idles      : 0
  Wasted Idles      : 0
  Avg Turnaround    : N/A (No tasks finished)
  Avg Wait Time     : 99960.00 ticks (All tasks)
  Max Wait Time     : 100000 ticks

=== Scheduler Verification ===
  Avg Response Time : 34459.56 ticks (All tasks)
  Context Switches  : 1599984
  Ctx Switch Rate   : 100.00% of executed ticks

  Prio | Theor. Weight | Avg Execs/Task | Avg Wait/Task | Avg Resp/Task | Samples
  ----------------------------------------------------------------------------------
     0 |            11 |           0.05 |      99999.95 |      98296.72 |    1000
     1 |            14 |           0.05 |      99999.95 |      97630.59 |    1000
     2 |            18 |           0.08 |      99999.92 |      96717.01 |    1000
     3 |            23 |           0.10 |      99999.90 |      95577.52 |    1000
     4 |            29 |           0.13 |      99999.87 |      94231.06 |    1000
     5 |            36 |           0.16 |      99999.84 |      92651.36 |    1000
     6 |            46 |           0.19 |      99999.81 |      90405.76 |    1000
     7 |            57 |           0.26 |      99999.74 |      87896.05 |    1000
     8 |            72 |           0.32 |      99999.68 |      84514.77 |    1000
     9 |            90 |           0.40 |      99999.60 |      80455.06 |    1000
    10 |           112 |           0.50 |      99999.50 |      75471.92 |    1000
    11 |           140 |           0.62 |      99999.38 |      69147.95 |    1000
    12 |           175 |           0.78 |      99999.22 |      61239.20 |    1000
    13 |           219 |           0.98 |      99999.02 |      51306.77 |    1000
    14 |           274 |           1.23 |      99998.77 |      40981.82 |    1000
    15 |           343 |           1.54 |      99998.46 |      32685.75 |    1000
    16 |           428 |           1.92 |      99998.08 |      26163.02 |    1000
    17 |           535 |           2.40 |      99997.60 |      20889.75 |    1000
    18 |           669 |           3.01 |      99996.99 |      16676.70 |    1000
    19 |           836 |           3.76 |      99996.24 |      13305.07 |    1000
    20 |          1024 |           4.62 |      99995.38 |      10655.50 |    1000
    21 |          1280 |           5.78 |      99994.22 |       8591.62 |    1000
    22 |          1600 |           7.22 |      99992.78 |       6861.10 |    1000
    23 |          2000 |           9.02 |      99990.98 |       5469.71 |    1000
    24 |          2500 |          11.28 |      99988.72 |       4331.65 |    1000
    25 |          3125 |          14.10 |      99985.90 |       3432.66 |    1000
    26 |          3906 |          17.63 |      99982.37 |       2714.61 |    1000
    27 |          4882 |          22.03 |      99977.97 |       2144.54 |    1000
    28 |          6103 |          27.55 |      99972.45 |       1703.30 |    1000
    29 |          7629 |          34.45 |      99965.55 |       1327.54 |    1000
    30 |          9536 |          43.06 |      99956.94 |       1042.08 |    1000
    31 |         11920 |          53.82 |      99946.18 |        817.14 |    1000
    32 |         14901 |          67.28 |      99932.72 |        639.05 |    1000
    33 |         18626 |          84.10 |      99915.90 |        507.95 |    1000
    34 |         23283 |         104.80 |      99895.20 |        408.92 |    1000
    35 |         29103 |         130.96 |      99869.04 |        340.15 |    1000
    36 |         36379 |         163.70 |      99836.30 |        296.71 |    1000
    37 |         45474 |         204.62 |      99795.38 |        284.85 |    1000
    38 |         56843 |         255.79 |      99744.21 |        274.55 |    1000
    39 |         71053 |         319.73 |      99680.27 |        294.75 |    1000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s clarify the columns first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prio - The priority of the task.&lt;/li&gt;
&lt;li&gt;Theor. Weight - Its theoretical weight.&lt;/li&gt;
&lt;li&gt;Avg Execs/Task - How many times, on average, a task in this priority queue was executed (should be higher for higher priorities).&lt;/li&gt;
&lt;li&gt;Avg Wait/Task - How much time a task spends sitting in the run queue waiting for a CPU to pick it up. (It is nearly the same across priorities in the idealised workflows because of the sheer number of tasks versus only 16 cores.)&lt;/li&gt;
&lt;li&gt;Avg Resp/Task - The time from when a task first arrives to the very first time it gets to execute on a CPU.&lt;/li&gt;
&lt;li&gt;Samples - Number of tasks in this priority queue.&lt;/li&gt;
&lt;/ul&gt;
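&lt;p&gt;The Avg Resp/Task column boils down to a per-bucket average of (first execution tick - arrival tick). A minimal sketch, with made-up (arrival, first_exec) pairs standing in for the simulator&apos;s internal bookkeeping:&lt;/p&gt;

```rust
// Sketch of the Avg Resp/Task metric: the gap between a task's arrival tick
// and the tick it first ran, averaged over one priority bucket.
// The (arrival, first_exec) pairs are illustrative, not the simulator's types.
fn avg_response(bucket: [(u64, u64); 3]) -> f64 {
    let mut total = 0u64;
    for (arrival, first_exec) in bucket {
        // response time = first time on a CPU minus arrival time
        total += first_exec - arrival;
    }
    total as f64 / bucket.len() as f64
}

fn main() {
    let bucket = [(0u64, 5u64), (10, 12), (20, 29)];
    println!("avg response = {} ticks", avg_response(bucket));
}
```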
&lt;p&gt;As can be seen, the avg response time has shot up from 1249 to 34459 ticks, a 27x increase!
But the story changes when we look at Avg Execs/Task and Avg Resp/Task: a task with prio 39 has seen a significant increase in its execs and a decrease in its response time!&lt;/p&gt;
&lt;p&gt;We can also notice the plight of the lower-priority tasks and their starvation.&lt;/p&gt;
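&lt;p&gt;As an aside, the Theor. Weight column looks geometric: each level weighs roughly 25% more than the one below it, with 1024 at priority 20, much like Linux&apos;s nice-to-weight table. A rough sketch of that shape (my assumption; the simulator&apos;s exact rounding clearly differs for some entries, especially below the baseline):&lt;/p&gt;

```rust
// Hedged guess at the shape of the weight curve: 1024 at prio 20,
// multiplied by ~1.25 per level. Matches the table around the baseline
// (1024, 1280, 1600, 2000, ...) but not every entry exactly.
fn theoretical_weight(prio: i32) -> u64 {
    (1024.0 * 1.25_f64.powi(prio - 20)) as u64
}

fn main() {
    for p in [20, 21, 22, 23] {
        println!("prio {p} -> weight {}", theoretical_weight(p));
    }
}
```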
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Interleaved DWRR&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== Simulation finished (100000 ticks, 16 cores) ===
  Tasks spawned : 40000
  Tasks finished: 0
  Execute events: 1600000
  Block events  : 0
  Idle slots    : 0
  Tasks alive   : 40000

=== Diagnostics ===
  Absolute CPU Util : 100.00%
  Scheduler Effic.  : 100.00%
  Forced Idles      : 0
  Wasted Idles      : 0
  Avg Turnaround    : N/A (No tasks finished)
  Avg Wait Time     : 99960.00 ticks (All tasks)
  Max Wait Time     : 100000 ticks

=== Scheduler Verification ===
  Avg Response Time : 7442.54 ticks (All tasks)
  Context Switches  : 1599984
  Ctx Switch Rate   : 100.00% of executed ticks

  Prio | Theor. Weight | Avg Execs/Task | Avg Wait/Task | Avg Resp/Task | Samples
  ----------------------------------------------------------------------------------
     0 |            11 |           0.86 |      99999.14 |      57294.02 |    1000
     1 |            14 |           1.10 |      99998.90 |      45990.23 |    1000
     2 |            18 |           1.41 |      99998.59 |      35901.44 |    1000
     3 |            23 |           1.79 |      99998.21 |      28225.50 |    1000
     4 |            29 |           2.24 |      99997.76 |      22508.50 |    1000
     5 |            36 |           2.77 |      99997.23 |      18245.71 |    1000
     6 |            46 |           3.50 |      99996.50 |      14408.67 |    1000
     7 |            57 |           4.30 |      99995.70 |      11741.59 |    1000
     8 |            72 |           5.38 |      99994.62 |       9418.16 |    1000
     9 |            90 |           6.61 |      99993.39 |       7653.62 |    1000
    10 |           112 |           8.06 |      99991.94 |       6265.28 |    1000
    11 |           140 |           9.84 |      99990.16 |       5131.14 |    1000
    12 |           175 |          11.94 |      99988.06 |       4223.29 |    1000
    13 |           219 |          14.42 |      99985.58 |       3494.20 |    1000
    14 |           274 |          17.28 |      99982.72 |       2911.74 |    1000
    15 |           343 |          20.53 |      99979.47 |       2444.05 |    1000
    16 |           428 |          24.13 |      99975.87 |       2077.24 |    1000
    17 |           535 |          28.08 |      99971.92 |       1780.25 |    1000
    18 |           669 |          32.35 |      99967.65 |       1543.24 |    1000
    19 |           836 |          36.78 |      99963.22 |       1353.78 |    1000
    20 |          1024 |          40.93 |      99959.07 |       1205.02 |    1000
    21 |          1280 |          45.47 |      99954.53 |       1089.07 |    1000
    22 |          1600 |          49.92 |      99950.08 |        991.85 |    1000
    23 |          2000 |          54.14 |      99945.86 |        914.02 |    1000
    24 |          2500 |          58.08 |      99941.92 |        850.26 |    1000
    25 |          3125 |          61.66 |      99938.34 |        800.43 |    1000
    26 |          3906 |          64.86 |      99935.14 |        760.65 |    1000
    27 |          4882 |          67.65 |      99932.35 |        728.98 |    1000
    28 |          6103 |          70.08 |      99929.92 |        704.68 |    1000
    29 |          7629 |          72.16 |      99927.84 |        684.01 |    1000
    30 |          9536 |          73.90 |      99926.10 |        668.34 |    1000
    31 |         11920 |          75.38 |      99924.62 |        655.91 |    1000
    32 |         14901 |          76.59 |      99923.41 |        646.07 |    1000
    33 |         18626 |          77.58 |      99922.42 |        638.79 |    1000
    34 |         23283 |          78.40 |      99921.60 |        632.95 |    1000
    35 |         29103 |          79.07 |      99920.93 |        628.70 |    1000
    36 |         36379 |          79.60 |      99920.40 |        625.49 |    1000
    37 |         45474 |          80.05 |      99919.95 |        623.61 |    1000
    38 |         56843 |          80.40 |      99919.60 |        621.18 |    1000
    39 |         71053 |          80.69 |      99919.31 |        619.84 |    1000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The avg response time is significantly lower than in the non-interleaved version, but on the other hand the Avg Execs/Task are no longer as extreme as in plain DWRR. The difference in execs between any two adjacent priorities is now quite small, especially at the extremes.&lt;/p&gt;
&lt;p&gt;As can be seen, the interleaved scheduler is a lot fairer while still giving us a way to prioritise some tasks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Randomised Tasks&lt;/h3&gt;
&lt;p&gt;This time, at each time step, we have a chance to generate up to one new task per core. Each new task has a random total runtime (range 2..100000) and an attribute called blocking chance (range 0..0.001), which dictates how likely the task is to block at each time step it is executed. With 16 cores, a 0.001 spawn chance per core, and 100000 ticks, we expect roughly 16 * 0.001 * 100000 = 1600 spawns, which lines up with the ~1700 tasks spawned in the runs below.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for _ in 0..self.cores.len() {
    if self.rng.random_bool(self.new_task_chance) { // new_task_chance = 0.001
        news.push(self.scheduler.new_task(
            self.last_pid,
            self.current_time,
            self.rng.random_range(2..100000),       // runtime
            self.rng.random_range(0.0..0.001f64),   // blocking chance
        ));
        self.last_pid += 1;
    }
}
&lt;/code&gt;&lt;/pre&gt;
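&lt;p&gt;To make the blocking chance concrete: every tick a task executes, the simulator draws a uniform number and the task blocks if the draw falls under its chance. An illustrative sketch (the &lt;code&gt;Task&lt;/code&gt; struct and &lt;code&gt;execute_tick&lt;/code&gt; here are hypothetical stand-ins, not the simulator&apos;s actual API):&lt;/p&gt;

```rust
// Hypothetical stand-ins for the simulator's task type and per-tick step.
struct Task {
    remaining: u64,       // ticks of runtime left
    blocking_chance: f64, // probability of blocking on each executed tick
}

// Runs the task for one tick; `roll` stands in for the uniform draw the
// simulator makes via rng.random_bool(blocking_chance).
// Returns the updated task and whether it blocked.
fn execute_tick(mut task: Task, roll: f64) -> (Task, bool) {
    // one tick of CPU time consumed
    task.remaining = task.remaining.saturating_sub(1);
    // block if the draw falls under the blocking chance
    let blocked = task.blocking_chance > roll;
    (task, blocked)
}

fn main() {
    let task = Task { remaining: 100, blocking_chance: 0.0005 };
    let (task, blocked) = execute_tick(task, 0.0001); // draw under the chance
    println!("remaining {}, blocked {}", task.remaining, blocked);
}
```

With the blocking chance capped at 0.001, blocks should be rare, which matches the low Block events counts (under ~800 out of ~1.6M execute events) in the runs below.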
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Round Robin&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== Simulation finished (100000 ticks, 16 cores) ===
  Tasks spawned : 1694
  Tasks finished: 18
  Execute events: 1593554
  Block events  : 795
  Idle slots    : 6446
  Tasks alive   : 1676

=== Diagnostics ===
  Absolute CPU Util : 99.60%
  Scheduler Effic.  : 99.64%
  Forced Idles      : 720
  Wasted Idles      : 5726
  Avg Turnaround    : 23976.50 ticks (Completed tasks only)
  Avg Wait Time     : 48120.16 ticks (All tasks)
  Max Wait Time     : 94245 ticks

=== Scheduler Verification ===
  Avg Response Time : 51.85 ticks (All tasks)
  Context Switches  : 1584937
  Ctx Switch Rate   : 99.46% of executed ticks
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, we will compare the avg response time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;DWRR&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== Simulation finished (100000 ticks, 16 cores) ===
  Tasks spawned : 1675
  Tasks finished: 19
  Execute events: 1593758
  Block events  : 785
  Idle slots    : 6242
  Tasks alive   : 1656

=== Diagnostics ===
  Absolute CPU Util : 99.61%
  Scheduler Effic.  : 99.65%
  Forced Idles      : 720
  Wasted Idles      : 5522
  Avg Turnaround    : 28837.89 ticks (Completed tasks only)
  Avg Wait Time     : 48501.35 ticks (All tasks)
  Max Wait Time     : 98986 ticks

=== Scheduler Verification ===
  Avg Response Time : 3572.60 ticks (All tasks)
  Context Switches  : 1513098
  Ctx Switch Rate   : 94.94% of executed ticks

  Prio | Theor. Weight | Avg Execs/Task | Avg Wait/Task | Avg Resp/Task | Samples
  ----------------------------------------------------------------------------------
     0 |            11 |          24.21 |      50892.05 |      23183.64 |      42
     1 |            14 |          24.19 |      50822.86 |      20124.14 |      42
     2 |            18 |          24.45 |      50754.79 |      16988.43 |      42
     3 |            23 |          26.07 |      50705.07 |      14319.05 |      42
     4 |            29 |          26.50 |      50643.31 |      12152.24 |      42
     5 |            36 |          28.05 |      50581.81 |      10211.95 |      42
     6 |            46 |          28.52 |      50528.93 |       8212.40 |      42
     7 |            57 |          30.79 |      50473.64 |       6936.67 |      42
     8 |            72 |          32.64 |      50403.57 |       5629.48 |      42
     9 |            90 |          35.24 |      50358.76 |       4683.40 |      42
    10 |           112 |          38.36 |      50298.83 |       3792.81 |      42
    11 |           140 |          39.88 |      50238.76 |       3216.88 |      42
    12 |           175 |          45.74 |      50184.26 |       2574.95 |      42
    13 |           219 |          50.10 |      50117.07 |       2055.76 |      42
    14 |           274 |          58.02 |      50045.31 |       1656.71 |      42
    15 |           343 |          66.71 |      49986.55 |       1335.38 |      42
    16 |           428 |          76.60 |      49901.86 |       1048.12 |      42
    17 |           535 |          89.17 |      49825.38 |        861.93 |      42
    18 |           669 |         105.76 |      49743.50 |        693.57 |      42
    19 |           836 |         126.05 |      49669.76 |        556.07 |      42
    20 |          1024 |         148.60 |      49592.67 |        449.21 |      42
    21 |          1280 |         177.10 |      49502.98 |        359.67 |      42
    22 |          1600 |         215.60 |      49407.50 |        285.60 |      42
    23 |          2000 |         260.86 |      49303.71 |        232.95 |      42
    24 |          2500 |         315.62 |      49187.90 |        187.12 |      42
    25 |          3125 |         384.62 |      48405.21 |        146.33 |      42
    26 |          3906 |         470.67 |      48924.67 |        116.45 |      42
    27 |          4882 |         577.90 |      47528.00 |         95.79 |      42
    28 |          6103 |         710.60 |      48586.55 |         79.24 |      42
    29 |          7629 |         873.36 |      48359.79 |         60.71 |      42
    30 |          9536 |        1074.52 |      47464.40 |         47.31 |      42
    31 |         11920 |        1325.69 |      47796.31 |         42.40 |      42
    32 |         14901 |        1628.88 |      47406.24 |         32.21 |      42
    33 |         18626 |        2015.14 |      46700.14 |         25.55 |      42
    34 |         23283 |        2485.21 |      46150.69 |         20.00 |      42
    35 |         29103 |        3149.07 |      44751.83 |         19.80 |      41
    36 |         36379 |        3885.54 |      44196.22 |         16.49 |      41
    37 |         45474 |        4800.20 |      44096.80 |         13.46 |      41
    38 |         56843 |        5937.27 |      39151.80 |         10.27 |      41
    39 |         71053 |        7125.95 |      36560.95 |          6.15 |      41
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we can see once again, the higher-priority tasks see a very significant improvement in their stats, while the lowest priorities see a significant degradation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Interleaved DWRR&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== Simulation finished (100000 ticks, 16 cores) ===
  Tasks spawned : 1694
  Tasks finished: 17
  Execute events: 1594185
  Block events  : 771
  Idle slots    : 5815
  Tasks alive   : 1677

=== Diagnostics ===
  Absolute CPU Util : 99.64%
  Scheduler Effic.  : 99.68%
  Forced Idles      : 720
  Wasted Idles      : 5095
  Avg Turnaround    : 19845.41 ticks (Completed tasks only)
  Avg Wait Time     : 47730.81 ticks (All tasks)
  Max Wait Time     : 99029 ticks

=== Scheduler Verification ===
  Avg Response Time : 309.42 ticks (All tasks)
  Context Switches  : 1593932
  Ctx Switch Rate   : 99.98% of executed ticks

  Prio | Theor. Weight | Avg Execs/Task | Avg Wait/Task | Avg Resp/Task | Samples
  ----------------------------------------------------------------------------------
     0 |            11 |          39.14 |      49599.84 |       2327.05 |      43
     1 |            14 |          44.56 |      49524.23 |       1883.81 |      43
     2 |            18 |          51.70 |      49457.37 |       1465.60 |      43
     3 |            23 |          61.65 |      49391.60 |       1174.26 |      43
     4 |            29 |          71.91 |      49323.74 |        907.98 |      43
     5 |            36 |          84.74 |      49256.72 |        753.70 |      43
     6 |            46 |         101.40 |      49188.16 |        596.65 |      43
     7 |            57 |         120.81 |      49112.67 |        473.23 |      43
     8 |            72 |         145.35 |      49035.26 |        394.09 |      43
     9 |            90 |         174.26 |      48950.81 |        320.21 |      43
    10 |           112 |         208.19 |      48839.07 |        264.07 |      43
    11 |           140 |         249.74 |      48728.21 |        213.07 |      43
    12 |           175 |         298.33 |      48601.60 |        172.63 |      43
    13 |           219 |         355.44 |      48484.21 |        143.51 |      43
    14 |           274 |         431.48 |      49508.79 |        122.57 |      42
    15 |           343 |         508.14 |      49374.55 |        102.98 |      42
    16 |           428 |         592.57 |      49240.57 |         87.62 |      42
    17 |           535 |         684.57 |      49090.19 |         74.33 |      42
    18 |           669 |         783.95 |      47656.31 |         62.60 |      42
    19 |           836 |         887.79 |      48761.90 |         56.00 |      42
    20 |          1024 |         980.93 |      48604.95 |         51.57 |      42
    21 |          1280 |        1087.26 |      48440.60 |         46.36 |      42
    22 |          1600 |        1190.40 |      46318.52 |         39.00 |      42
    23 |          2000 |        1285.60 |      47352.98 |         37.43 |      42
    24 |          2500 |        1374.31 |      47994.64 |         36.07 |      42
    25 |          3125 |        1454.71 |      47553.24 |         33.88 |      42
    26 |          3906 |        1527.64 |      47717.24 |         30.07 |      42
    27 |          4882 |        1590.74 |      47598.86 |         31.10 |      42
    28 |          6103 |        1643.95 |      47484.29 |         29.93 |      42
    29 |          7629 |        1690.29 |      45980.26 |         27.60 |      42
    30 |          9536 |        1728.45 |      47283.40 |         27.79 |      42
    31 |         11920 |        1761.95 |      45358.05 |         25.98 |      42
    32 |         14901 |        1789.07 |      44462.19 |         26.17 |      42
    33 |         18626 |        1803.95 |      47015.21 |         27.26 |      42
    34 |         23283 |        1821.83 |      44388.64 |         24.98 |      42
    35 |         29103 |        1837.10 |      43475.88 |         24.14 |      42
    36 |         36379 |        1848.98 |      44444.36 |         24.48 |      42
    37 |         45474 |        1858.00 |      44644.26 |         24.55 |      42
    38 |         56843 |        1866.29 |      44897.07 |         25.55 |      42
    39 |         71053 |        1871.83 |      46633.38 |         26.17 |      42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Avg Response Time is now 11x lower than standard DWRR, while the highest-priority queue has gone from 6 ticks to 26 ticks in response time. Avg Execs do show a large reduction for the higher priorities, from ~7000 to ~1800.&lt;/p&gt;
&lt;p&gt;I think I have convinced you by this point that interleaved DWRR is a good middle ground between the fairness of simple RR and the performance of DWRR (for high-priority tasks)!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Real World Workflow&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;pixelcannon&lt;/p&gt;
&lt;p&gt;The current simple Round Robin gives ~1000 FPS with two CPU-hogging programs (&lt;code&gt;while true; do :; done&lt;/code&gt; in bash, or &lt;code&gt;while (1) printf(&quot;Hello!\n&quot;);&lt;/code&gt; in C) running in the background.&lt;/p&gt;
&lt;p&gt;On the other hand, the new scheduler with prio 0 for all tasks also gives ~1000 FPS, within the margin of error.&lt;/p&gt;
&lt;p&gt;If we increase the priority (decrease the niceness) of pixelcannon and decrease the priority of the two CPU-hogging applications, pixelcannon now delivers ~1150 FPS!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Schedrs&lt;/p&gt;
&lt;p&gt;This is my Rust &lt;a href=&quot;https://gitlab.redox-os.org/akshitgaur2005/schedrs&quot;&gt;rewrite&lt;/a&gt; of &lt;a href=&quot;https://openbenchmarking.org/test/pts/schbench&quot;&gt;schbench&lt;/a&gt;. As expected, this is the area where the new scheduler particularly shines. To replicate my results, just run two CPU-hogging programs, and then run schedrs.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;RR&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./rr.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;DWRR&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./n20.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Verdict&lt;/h1&gt;
&lt;p&gt;While the simulator showed us the theoretical limits, the real-world tests prove the architecture works as intended under contention.&lt;/p&gt;
&lt;p&gt;By running two aggressive background &lt;code&gt;while(true)&lt;/code&gt; CPU hogs, we forced the system into a high-contention state, allowing us to compare behaviour under load:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Interactive Workloads (pixelcannon):&lt;/strong&gt; &lt;code&gt;renice&lt;/code&gt;-ing pixelcannon to a higher priority boosted framerates from ~1000 FPS to ~1150 FPS (a 15% gain in interactive responsiveness).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context-Switch Latency (Schedrs):&lt;/strong&gt; The most dramatic improvement was in pure scheduling overhead. Ops/sec jumped from 243 to 360 (a 48% increase), and median wakeup latencies dropped massively.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The DWRR scheduler successfully protects high-priority and latency-sensitive tasks from being starved by background noise.&lt;/p&gt;
&lt;p&gt;Next up: replacing the static queue logic with the dynamic lag-calculations of full EEVDF!&lt;/p&gt;
</content:encoded><author>Akshit Gaur</author></item></channel></rss>