Gallery:
- Spatial Engine Architecture: fusing Gemini 3 reasoning with deterministic physics on a scalable Cloud Run stack.
- Advanced optical engineering console with real-time illuminance calculations, photon distribution mapping, and physics-based light.
- Energy ROI analysis dashboard featuring cost-benefit optimization, annual savings calculations, payback-period metrics, and EIR tracking.
- Spatial audit visualization with multi-lamp analysis and light-distribution heatmaps, showing material identification and shadow mapping.
Spatial Engine is a next-generation "Action Era" agent designed to bridge the gap between digital intelligence and physical environments. Built on the Gemini 3 Pro model family, it addresses a critical global challenge: Energy Efficiency in our built environments.
Inefficient lighting wastes terawatt-hours of energy annually. Standard AI models can chat about physics, but they cannot do physics reliably. Spatial Engine changes this. It is a multimodal agent that uses live video and spatial-temporal reasoning to analyze room geometry, identifying window positions, furniture layouts, and shadow zones with precision.
By autonomously executing deterministic Python code, Spatial Engine calculates the optimal placement for light sources, providing users with a "Lux Optimization Map" and actionable procurement plans. It is not just an analyzer; it is an active engineer that helps reduce residential energy consumption by up to 40%, proving that in the Action Era, every photon counts.
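To make the "Lux Optimization Map" idea concrete, here is a minimal, hypothetical sketch of sampling illuminance over a floor grid from the inverse square law. The `illuminance` and `lux_map` helpers and every number below are illustrative assumptions, not the project's actual engine; the cosine factor is the standard correction for off-axis points on a horizontal plane.

```python
import math

def illuminance(intensity_cd: float, dx: float, dy: float, mount_h: float) -> float:
    """Point-source illuminance on a horizontal plane: E = I * cos(theta) / d^2.

    On-axis this reduces to the pure inverse square law E = I / d^2; off-axis,
    the cosine of the incidence angle scales the horizontal component.
    """
    d2 = dx * dx + dy * dy + mount_h * mount_h
    cos_theta = mount_h / math.sqrt(d2)
    return intensity_cd * cos_theta / d2

def lux_map(lamp_xy, intensity_cd, mount_h, room_w, room_l, step=0.5):
    """Sample illuminance (lux) at a grid of floor points."""
    grid = {}
    x = 0.0
    while x <= room_w:
        y = 0.0
        while y <= room_l:
            grid[(x, y)] = illuminance(
                intensity_cd, x - lamp_xy[0], y - lamp_xy[1], mount_h
            )
            y += step
        x += step
    return grid

# Illustrative example: an 800 cd lamp centered in a 4 m x 3 m room, 2.4 m up.
m = lux_map((2.0, 1.5), 800.0, 2.4, 4.0, 3.0)
print(round(m[(2.0, 1.5)], 1))  # directly under the lamp: 800 / 2.4^2 ≈ 138.9
```

An optimizer would then score candidate lamp positions by how evenly such a grid meets a target lux level.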
Inspiration
I realized that while current LLMs are incredible at conversation, they are often terrible at engineering precision. "Hallucinating" a lux level in a safety-critical environment is dangerous. I wanted to build an agent that sees like a designer (using Gemini Vision) but calculates like a physicist (using deterministic code). I was inspired by the "Action Era" manifesto: AI shouldn't just describe the world; it should have the agency to optimize it.
What it does
Spatial Engine AI is a multimodal autonomous agent that upgrades your physical environment for maximum energy efficiency.
- Vision Audit: Uses Gemini 3 Vision to decompose room geometry from a single photo, identifying materials (to estimate albedo) and detecting shadow zones.
- Physics Core: A deterministic engine (not LLM guesses) that applies the Inverse Square Law ($E=I/d^2$) to calculate exact illuminance and ensure ISO/SanPiN health compliance.
- Market Intelligence: An active agent that searches the live market for energy-efficient products, verifying "dimmable" specs and calculating real-time ROI and Payback Periods based on local electricity rates.
- Action & Interactivity: With the Gemini Live API, users can interact with the agent in real-time, asking it to "look closer at that corner" or "simulate a warmer light," making the optimization process collaborative and fluid.
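The ROI arithmetic behind the Market Intelligence step can be sketched in a few lines. The function name, wattages, and prices below are illustrative assumptions, not the project's actual code:

```python
def lighting_roi(old_watts: float, new_watts: float, hours_per_day: float,
                 rate_per_kwh: float, upgrade_cost: float):
    """Annual savings and simple payback period for a lighting swap."""
    kwh_saved = (old_watts - new_watts) / 1000 * hours_per_day * 365
    annual_savings = kwh_saved * rate_per_kwh
    payback_years = upgrade_cost / annual_savings if annual_savings > 0 else float("inf")
    return annual_savings, payback_years

# Illustrative example: swap a 60 W incandescent for a 9 W LED,
# 5 h/day at $0.15/kWh, with a $12 bulb.
savings, payback = lighting_roi(60, 9, 5, 0.15, 12)
print(round(savings, 2), round(payback, 2))  # ≈ 13.96 USD/year, ≈ 0.86 years
```

In the real agent, `rate_per_kwh` would come from the user's local electricity tariff and the product prices from live search results.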
How I built it
The core brain is Gemini 3 Pro, accessed via the Google GenAI SDK. I utilized:
- Gemini 3 Vision for spatial analysis and material detection.
- Gemini Multimodal Live API for real-time, low-latency voice and video interaction, giving the agent a "persona" that feels present in the room.
- Google Search Tooling for grounding recommendations in real-time market data.
- Python/FastAPI for the deterministic Physics Engine and backend orchestration.
- React 19 for the Generative UI that visualizes the agent's "thought process" through heatmaps and dynamic charts.
Gemini Integration Description
Spatial Engine AI heavily relies on the Gemini 3 model family to function as a "Senior Optical Physicist."
I use Gemini 3 Pro as the central reasoning orchestrator. It doesn't just chat; it uses Function Calling to wield tools—triggering my custom Python Physics Engine to perform deterministic math ($E=I/d^2$) and checking ISO safety standards. This hybrid approach solves the "hallucination in math" problem common in LLMs.
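A sketch of the hybrid pattern: a plain Python tool returns exact numbers, and the model only decides when to call it. `check_illuminance` and the 300-lux threshold are illustrative assumptions; the commented call shows the rough shape of the google-genai SDK's automatic function calling, not the project's exact configuration.

```python
def check_illuminance(intensity_cd: float, distance_m: float,
                      required_lux: float) -> dict:
    """Deterministic tool the model calls instead of guessing the math.

    Applies E = I / d^2 and compares against a required lux threshold
    (e.g. 300 lux for a workspace under common standards).
    """
    lux = intensity_cd / (distance_m ** 2)
    return {"lux": round(lux, 1), "compliant": lux >= required_lux}

# With the google-genai SDK, plain Python functions can be registered as
# tools (automatic function calling), roughly:
#   client.models.generate_content(
#       model="gemini-...", contents=prompt,
#       config=types.GenerateContentConfig(tools=[check_illuminance]))
print(check_illuminance(800, 2.0, 300))  # {'lux': 200.0, 'compliant': False}
```

Because the tool is deterministic, the same inputs always yield the same lux value, which is what closes the "hallucination in math" gap.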
Gemini 3 Vision is critical for my "Vision Audit." It decomposes complex 3D room geometry from 2D images, estimating scale via reference objects and identifying surface materials to determine reflection coefficients (Albedo).
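One way to picture the albedo step: the vision model names a surface material, and a reflectance lookup feeds a crude single-bounce correction. The table values and the 20% coupling constant below are illustrative assumptions, not measured data or the project's actual model.

```python
# Hypothetical material -> reflectance (albedo) table; real values vary by finish.
ALBEDO = {
    "white_paint": 0.85,
    "light_wood": 0.45,
    "dark_wood": 0.20,
    "concrete": 0.30,
    "carpet_dark": 0.10,
}

def effective_illuminance(direct_lux: float, wall_material: str) -> float:
    """First-bounce approximation: add a reflected contribution scaled by albedo.

    A single-bounce term is a crude stand-in for full radiosity; it assumes a
    fixed 20% of direct light reaches the wall (an illustrative constant).
    Unknown materials fall back to a mid-range reflectance of 0.3.
    """
    reflectance = ALBEDO.get(wall_material, 0.3)
    return direct_lux * (1 + 0.2 * reflectance)

print(round(effective_illuminance(200.0, "white_paint"), 1))  # 234.0
```

The design choice here is that the LLM only classifies the material; the reflectance numbers themselves live in deterministic code.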
Finally, I integrated the Gemini Multimodal Live API to push the "Action Era" boundary. This allows users to have a real-time, bi-directional voice and video session with the agent, discussing lighting changes as they happen, making the agent feel like a true collaborator in the physical space. The application reduces latency by streaming audio directly, creating a fluid, human-level interaction experience.
Built With
- fastapi
- gemini
- gemini-live-api
- google-cloud-run
- google-gemini-3.0
- google-genai-sdk
- google-web-audio-api
- python
- react
- tailwindcss
- typescript
- vite
