Mission: DOOM on a Satellite (MVP)
Taking on this challenge means dealing more with pure engineering than with code: latency and bandwidth are our worst enemies.
Our assumptions:
- Pure physics (Orbit): Going further (e.g. GEO) pushes ping above 500 ms. LEO gives us a playable ping. time = distance / c = 500 km / 3×10⁸ m/s ≈ 1.67 ms
Limited Bandwidth: We assume (Downlink: 64 kbps | Uplink: 9.6 kbps): Pure realism. We forced ourselves into the harshest scenario for cheap satellite communications, compelling us to optimize at every level.
AArch64 Hardware (ARM Cortex-A53 Quad-Core): MVP mindset. The multiple ARM cores let us separate the game and the compression into distinct threads without losing performance.
The Brick Wall: Bandwidth by the Numbers
Before writing a single line of code, we had to do the disaster math to understand the size of our UDP data "pipe."
The numbers: A 64 kbps link equals 8,000 bytes per second. To run at 20 FPS, our absolute limit is 400 bytes per frame.
Raw frame weight: Reducing the screen to 160×100 pixels in grayscale (4 bits/pixel), a raw frame weighs 8,000 bytes.
The challenge: We need extreme real-time compression (at least 20:1).
Why the Initial Idea Failed (The XOR + RLE Collapse)
Our first theory was to use a differential algorithm (XOR) combined with run-length encoding (RLE). On paper, it worked. In practice, the DOOM engine destroyed it.
The Theory: When the player stands still, 96% of the screen does not change. RLE easily groups those static pixels (many consecutive zeros after XOR), yielding a compressed frame of around 300 bytes. So far so good.
The Reality (The Bloodbath): DOOM is a frantic FPS built on an old raycaster engine.
Lateral movement: Turning just one degree shifts every screen column horizontally. 90% of pixels change value instantly.
Dynamic textures: Walking forward scales the floor and ceiling, breaking any previously established pattern.
Data inflation: When 80% of the screen is moving noise, RLE finds no repetitions. Instead of compressing, it adds extra control bytes to each pixel.
Critical Conclusion
During combat, the frame size under RLE jumps from 300 bytes to over 5,000 bytes, blowing past our 400-byte budget. This saturates the UDP buffer, causes massive packet loss, and produces insurmountable lag. RLE is ruled out; the only viable path for our MVP is software H.264 video compression.
Note: other compression alternatives were evaluated, including neural compression specialized for DOOM, leveraging the deterministic nature of its maps and fixed color palette. It was discarded: real-time neural inference at 20 FPS requires dedicated hardware (NPU / GPU / TPU).
More on Bandwidth
Reducing bandwidth comes down to 3 questions: what is sent, when, and how. We have already addressed the last point, so let's focus on the first two. DOOM runs at 320×200 pixels — the first step is to reduce that. The minimum configurable resolution is 96×48, though it is not the most playable.
At this point, as good engineers, and thanks to our 64 kbps downlink assumption, we drew an analogy with another technology that lived under the same constraint: the video-call codecs used in 3G networks in the 2000s. They used the well-known 176×144 (QCIF) resolution. If they could do it, so could we.
Architecture
System Overview
The project implements a distributed client-server architecture split into two physically separate components that communicate via network protocols optimized for high-latency, low-bandwidth satellite links.
Main Components
1. Satellite (ARM Linux SoC)
The satellite component runs DOOM headless and transmits compressed video. Its architecture is based on multithreading with separation of concerns:
Thread 1 — DOOM Engine (35 Hz, Core 0)
- Runs the modified doomgeneric engine
- Ticks at 35 Hz (DOOM's native tick rate, non-negotiable)
- Writes RGB888 frames (176×144×3 bytes) to a shared lock-free ring buffer
- Receives player input from a thread-safe queue fed by the uplink module
Thread 2 — Encoder / Transmitter (20 FPS, Core 1)
- Reads frames from the ring buffer when available
- Encodes RGB888 → YUV420p → H.264 using libx264 (satellite/encoder.c)
- Packetizes NAL units into RTP and transmits over UDP (satellite/downlink.c)
- Runs at 20 FPS (~43% load reduction vs 35 FPS while maintaining playability)
Lock-Free Ring Buffer (satellite/ringbuffer.c)
- Producer-consumer communication with no mutex on the critical path
- 3–4 fixed-size slots (latency / frame-drop trade-off)
- Atomic states: EMPTY → WRITING → READY → READING → EMPTY
- Allows DOOM and the encoder to run at different rates without blocking each other
Uplink Module (satellite/uplink.h)
- Non-blocking UDP socket on port 5000 (configurable)
- Receives 8-byte input packets (serialized key bitfield)
- Deserializes and feeds the queue consumed by DOOM
2. Ground Station
The ground component captures player input and reconstructs / displays the received video:
Input Module (ground/input.c)
- Captures SDL2 keyboard events
- Maps keys to a 16-bit bitfield (movement, fire, weapon switch, menus)
- Serializes an
input_packet_t(8 bytes: bitfield + timestamp + seq_number) - Transmits over UDP only when state changes (bandwidth optimization)
Receiver Module (ground/receiver.c)
- Non-blocking UDP socket on port 5001 (configurable)
- FU-A fragment reassembly (RTP Fragmentation Unit type A):
- Small NALs (≤ MTU): processed directly
- Large NALs (> MTU): accumulated in buffer until marker bit = 1
- H.264 decoding with FFmpeg (libavcodec):
- Full NAL wrapped into an
AVPacket avcodec_send_packet()→avcodec_receive_frame()- YUV420p → RGB24 conversion via libswscale (
sws_scale)
- Full NAL wrapped into an
- Returns a pointer to the decoded RGB frame (valid until the next poll)
Display Module (ground/display.c)
- Scaled SDL2 window (704×576, 4× QCIF to maintain aspect ratio)
- RGB texture with nearest-neighbor filtering for retro pixel aesthetics
SDL_RENDERER_PRESENTVSYNCfor vertical sync
Communication Protocol
Downlink (Satellite → Ground): H.264 Video over RTP/UDP
RTP (Real-time Transport Protocol) was chosen over TCP for the following critical reasons in satellite environments:
Predictable latency: TCP retransmits lost packets, multiplying RTT (30–40 ms for LEO → 200+ ms with retransmissions). RTP over UDP accepts minor losses in exchange for constant latency.
No head-of-line blocking: In TCP, a lost packet stalls all subsequent ones until retransmission. In video streaming, a retransmitted old frame is useless — the next frame has already arrived.
Inadequate congestion control: TCP congestion control algorithms (CUBIC, BBR) assume terrestrial networks and unnecessarily degrade throughput under high satellite latency.
RTP packet structure (common/include/protocol.h):
[ RTP Header (12 bytes) ][ NAL Unit Payload (variable) ]
RTP Header fields:
- Sequence number (16 bits): loss detection and reordering
- Timestamp (32 bits): timing reconstruction at the receiver (90 kHz clock for H.264)
- Marker bit: signals end of frame (last NAL of the frame)
- SSRC: source identifier (stateless, no session required)
FU-A Fragmentation (for NALs > MTU ~1400 bytes):
- First fragment: FU indicator + FU header (S=1, Start) + payload
- Middle fragments: FU indicator + FU header + payload
- Last fragment: FU indicator + FU header (E=1, End) + payload
This allows transmitting large NALs (typically I-frames) without exceeding the Ethernet/IP MTU, avoiding IP-level fragmentation — which would cause total NAL loss if any single IP fragment is dropped.
Uplink (Ground → Satellite): Player Input
Input packet (common/include/protocol.h):
typedef struct {
uint16_t bitfield; // Active keys (16 flags)
uint32_t timestamp; // Monotonic ms since start
uint16_t seq_number; // Packet counter
} input_packet_t; // Total: 8 bytes, big-endian
16-bit bitfield encodes all DOOM actions:
- Bits 0–3: Movement (forward, backward, turn left/right)
- Bits 4–9: Strafe, run, strafe mode, fire (Ctrl), use/open (Space)
- Bits 10–12: Weapon select (0–6)
- Bits 13–15: Menu navigation (Enter, Escape, Y)
Optimizations:
- Transmitted only when the bitfield changes (no continuous flooding)
- Fixed 8-byte size enables ultra-fast parsing without heap allocation
- Big-endian serialization for cross-platform compatibility
Implemented Micro-optimizations
We experimented with x264 configuration, specifically: bitrate, preset options, and other tuning parameters. The settings that gave us the best performance are those in satellite/encoder.c.
Possible Micro-optimizations (Future Work)
The 4 SSRC bytes in the RTP header are currently hardcoded and unused. Repurposing them would gain 4 bytes per packet for useful payload or reduce header size by 4 bytes.
The bottom portion of the DOOM screen changes relatively little. It could be handled separately using delta XOR + RLE with keyframes.
Results and Final Conclusions
Like our marine, this challenge has been a grueling battle against bandwidth and the DOOM codebase itself — which we had to modify far more than we would have liked in order to work with threads. The threading decision was driven by one simple fact: threads cooperate, while processes compete. However, the overhead of working with threads prevented us from spending more time on the possible micro-optimizations. Although we met the uplink requirement, the downlink operates at around 240 kbps, which exceeds our initial 64 kbps assumption.
Log in or sign up for Devpost to join the conversation.