Inspiration
Flying in from Canada with limited luggage meant no VR gear, but we still wanted immersive, motion-driven play. We realized many kids can’t access headsets due to cost or availability, so we asked: can a webcam, a printed marker, and cardboard deliver VR-like immersion? That constraint became our spark.
What it does
A printed ArUco marker on a cardboard “gun” is tracked in real time to drive a fully interactive 3D weapon in an FPS game (x, y, z plus yaw, pitch, roll). We map the marker center’s x-position to an aiming arc, detect a quick up-then-down “recoil” motion to shoot, and use a left-hand fist gesture to reload, complete with on-screen feedback and cooldowns.
How we built it
Using OpenCV’s ArUco detector (DICT_4X4_50) and solvePnP (IPPE_SQUARE), we estimate 6-DoF pose from the marker’s known 3D corners and the camera intrinsics, so lens distortion is handled during pose estimation. For aiming, we take the marker center’s x-coordinate in the image, normalize it by the frame width, and convert it into an in-game aiming arc. MediaPipe Hands supplies 21-landmark gestures (curl ratios + fingertip compactness) for reloads. To shoot, the player flicks the gun in a recoil motion, which we detect by checking for parabolic trends in the y-coordinates of the ArUco marker. The game itself was built with Pygame + OpenGL, using both vector sprites and pygltflib to load 3D models. Weapon orientation is smoothed with quaternions and spherical linear interpolation (slerp), and a two-segment inverse kinematics solver drives the reloading animation. Sketches of each piece follow.
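
A minimal sketch of the pose step, assuming OpenCV ≥ 4.7, a 5 cm marker, and placeholder intrinsics (real values should come from a calibration step):

```python
import cv2
import numpy as np

MARKER_SIZE_M = 0.05  # assumed physical side length of the printed marker (metres)

# Placeholder camera intrinsics; in practice these come from calibration.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
DIST = np.zeros(5)  # distortion coefficients (zeros = no-distortion assumption)

# Corner order required by SOLVEPNP_IPPE_SQUARE: TL, TR, BR, BL, centred at the origin.
s = MARKER_SIZE_M / 2.0
OBJ_POINTS = np.array([[-s,  s, 0], [ s,  s, 0],
                       [ s, -s, 0], [-s, -s, 0]], dtype=np.float32)

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50),
    cv2.aruco.DetectorParameters())

def estimate_pose(frame):
    """Return (rvec, tvec) for the first detected marker, or None."""
    corners, ids, _ = detector.detectMarkers(
        cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    if ids is None:
        return None
    img_points = corners[0].reshape(4, 2).astype(np.float32)
    ok, rvec, tvec = cv2.solvePnP(OBJ_POINTS, img_points, K, DIST,
                                  flags=cv2.SOLVEPNP_IPPE_SQUARE)
    return (rvec, tvec) if ok else None
```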
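The aiming map is simple; here is an illustrative version where the 60° arc half-width is an assumed constant, not our tuned value:

```python
import numpy as np

ARC_HALF_ANGLE_DEG = 60.0  # hypothetical half-width of the aiming arc

def x_to_yaw(marker_center_x, frame_width):
    """Map the marker centre's x pixel coordinate to an in-game yaw angle.

    Normalizes x to [-1, 1] about the frame centre, then scales to the arc.
    """
    t = (marker_center_x / frame_width) * 2.0 - 1.0   # [-1, 1]
    t = float(np.clip(t, -1.0, 1.0))
    return t * ARC_HALF_ANGLE_DEG
```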
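A hedged sketch of the fist-reload check: the landmark indices are MediaPipe’s, while the thresholds are illustrative assumptions rather than our tuned values:

```python
import mediapipe as mp
import numpy as np

hands = mp.solutions.hands.Hands(max_num_hands=1,
                                 min_detection_confidence=0.6)

WRIST = 0
TIPS = (8, 12, 16, 20)   # index..pinky fingertips
MCPS = (5, 9, 13, 17)    # matching knuckles

def _dist(a, b):
    return np.hypot(a.x - b.x, a.y - b.y)

def is_fist(rgb_frame, curl_thresh=1.1, compact_thresh=0.25):
    """Heuristic fist check: curled fingers plus tightly clustered fingertips."""
    result = hands.process(rgb_frame)
    if not result.multi_hand_landmarks:
        return False
    lm = result.multi_hand_landmarks[0].landmark
    wrist = lm[WRIST]
    # Curl ratio: a curled fingertip ends up barely farther from the wrist
    # than its own knuckle, so the tip/knuckle distance ratio stays near 1.
    curled = all(_dist(lm[t], wrist) / _dist(lm[m], wrist) < curl_thresh
                 for t, m in zip(TIPS, MCPS))
    # Compactness: the fingertips of a fist cluster into a small region.
    tips = np.array([[lm[t].x, lm[t].y] for t in TIPS])
    compact = np.ptp(tips, axis=0).max() < compact_thresh
    return curled and compact
```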
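Recoil detection, sketched as a quadratic fit over a short buffer of marker-centre y positions; the buffer length and thresholds are assumptions:

```python
from collections import deque
import numpy as np

y_history = deque(maxlen=8)  # recent marker-centre y positions (pixels)

def check_recoil(y, min_curvature=1.5, min_amplitude=12.0):
    """Detect a quick up-then-down recoil by fitting a parabola to recent y.

    In image coordinates y grows downward, so an up-then-down flick traces a
    valley: a quadratic fit with a positive leading coefficient.
    """
    y_history.append(y)
    if len(y_history) < y_history.maxlen:
        return False
    t = np.arange(len(y_history))
    a, b, c = np.polyfit(t, np.array(y_history), 2)
    amplitude = max(y_history) - min(y_history)
    return a > min_curvature and amplitude > min_amplitude
```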
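For orientation smoothing, a standard slerp over unit quaternions (wxyz order assumed) looks like this:

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the short path around the hypersphere
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: lerp and renormalize
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    s0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * q0 + s1 * q1
```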
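And the two-segment IK reduces to the classic two-bone, law-of-cosines solution; a planar sketch with segment lengths supplied by the caller:

```python
import numpy as np

def two_bone_ik(target, l1, l2):
    """Analytic two-segment planar IK: (shoulder, elbow) angles reaching target (x, y)."""
    x, y = target
    d = min(np.hypot(x, y), l1 + l2 - 1e-6)   # clamp to the reachable radius
    # Law of cosines gives the elbow bend; the shoulder is the aim angle
    # minus the offset introduced by the bent second segment.
    cos_elbow = (d * d - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = np.arccos(np.clip(cos_elbow, -1.0, 1.0))
    shoulder = np.arctan2(y, x) - np.arctan2(l2 * np.sin(elbow),
                                             l1 + l2 * np.cos(elbow))
    return shoulder, elbow
```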
Challenges we ran into
We tried multiple methods to estimate the gun’s position. The rotation vectors from solvePnP were too sensitive to use directly: the gun jittered and aimed chaotically. Instead, we devised our own method of normalizing the marker’s center x by the frame width, which gave a stable aiming position. We also couldn’t get MediaPipe to detect trigger pulls, so we switched to a recoil motion for shooting, and we had to improvise the gun prop from scrap materials. Loading 3D models was particularly difficult and time-consuming; most of that time went into importing an enemy model, but it never reached a point we were satisfied with, so we ultimately phased it out.
Accomplishments that we’re proud of
We achieved convincing 6-DoF motion from a single printed square with no IMU and no headset. The recoil-to-shoot and left-fist-to-reload interactions feel natural and low-latency, and the visual overlays make the system approachable. Additionally, this was the first time we implemented complex math in a software project, and seeing it work was one of our proudest moments. Most importantly, Blitz delivers a VR-style experience using equipment many people already have: we built an engaging game on the lowest hardware budget possible.
What we learned
Accurate camera intrinsics (via the pinhole model), distortion handling, and the correct real-world marker size are crucial for reliable pose and scale. Gesture design benefits from relational features (curl ratios, compactness, relative position) rather than single-landmark tests. And it is possible to build impressive projects with almost no hardware: ours used only scraps of tape, cardboard, scavenged paper, and borrowed markers.
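
For reference, the pinhole model we leaned on, with placeholder focal lengths and principal point (calibration supplies the real ones):

```python
import numpy as np

# Pinhole model: a camera-space point (X, Y, Z) projects to pixel coordinates
# u = fx * X / Z + cx, v = fy * Y / Z + cy. Values below are placeholders.
fx, fy = 800.0, 800.0   # focal lengths in pixels
cx, cy = 320.0, 240.0   # principal point (image centre for a 640x480 feed)

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(point_3d):
    """Project a camera-space 3D point to pixel coordinates."""
    p = K @ np.asarray(point_3d, dtype=float)
    return p[:2] / p[2]
```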
What’s next for the project
Make the game mechanics more interesting (enemy spawning, moving targets, better environment styling, etc.). Running inference on a GPU should raise the frame rate and reduce in-game latency.

