\documentclass[12pt, a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{geometry}
\geometry{margin=1in}

\title{\textbf{BASALT SENTINEL: A Neural-Engineered Observability Framework}}
\author{InsightGraph Engineering Team}
\date{January 2026}

\begin{document}

\maketitle

\section{Inspiration}
In modern microservice architectures, complexity is the precursor to catastrophe. When a core service fails, it triggers a ``thundering herd'' of errors that floods dashboards and causes alert fatigue. We were inspired by the concept of \textbf{Topological Forensics}: the idea that an SRE should not have to hunt for the cause of a crash during an active incident. We built \textit{Basalt Sentinel} to act as a ``flight recorder'' for service meshes, capturing the exact moment of impact and providing an AI-generated explanation of the failure.

\section{What it does}
\textbf{Basalt Sentinel} is a real-time observability platform that transforms raw system logs into a live dependency graph.
\begin{itemize}
  \item \textbf{Live Topology Mapping:} Visualizes service-to-service communication.
  \item \textbf{Automated Blast Radius Calculation:} Identifies all downstream services in the path of a failure.
  \item \textbf{Neural RCA:} Feeds failure metadata to \textbf{Google Gemini 1.5 Flash}, which acts as an on-call AI engineer and returns an instant Root Cause Analysis with remediation steps.
\end{itemize}

\section{How we built it}
The architecture is a decoupled stack designed for sub-second latency:
\begin{itemize}
  \item \textbf{The Engine (Go):} A custom backend that uses depth-first search (DFS) to identify failure propagation paths in $O(V + E)$ time.
  \item \textbf{The Intelligence (Gemini AI):} Integration with the Gemini 1.5 Flash API for high-speed forensic reasoning over graph metadata.
  \item \textbf{The Pipeline (SSE):} Server-Sent Events for a unidirectional, low-overhead data stream from server to client.
  \item \textbf{The Command Center (React + D3):} A cyberpunk-inspired UI that uses \texttt{react-force-graph} for physics-based network visualization.
\end{itemize}
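The linear bound quoted for the traversal follows from a standard counting argument, sketched here under the assumption that the service graph is stored as adjacency lists (the storage format is not stated above). Writing $V$ for the set of services and $E$ for the set of dependency edges, a depth-first search marks each vertex as visited at most once and scans each outgoing edge list at most once, so the total work $T$ satisfies
\begin{equation*}
T \le \sum_{v \in V} \bigl(1 + \deg^{+}(v)\bigr) = |V| + |E|,
\end{equation*}
where $\deg^{+}(v)$ denotes the out-degree of $v$. A search started from a single failed node explores only the part of the graph reachable from it, so $|V| + |E|$ is an upper bound rather than the typical cost.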

\section{Technical Deep-Dive \& Logic}
To determine the impact of a failure, the engine calculates the \textbf{Impact Set} $S(v)$ for a failed node $v$ within a directed acyclic graph (DAG) $G = (V, E)$. The blast radius is defined as:

\begin{equation}
S(v) = \{\, u \in V \mid \exists\, \text{path}(v \to u) \,\}
\end{equation}
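As a small illustration, consider a hypothetical five-service graph. The node names and edges are invented for this example, and an edge $(x, y)$ is assumed to mean that a failure of $x$ propagates to $y$:
\begin{equation*}
V = \{a, b, c, d, e\}, \qquad E = \{(a,b),\, (a,c),\, (b,d),\, (c,d),\, (d,e)\}.
\end{equation*}
Following the edges outward from each failed node gives
\begin{align*}
S(a) &= \{b, c, d, e\}, \\
S(b) &= \{d, e\}, \\
S(e) &= \emptyset.
\end{align*}
Here a path is taken to contain at least one edge, so a node is not counted in its own impact set. That reading is an assumption, since the definition does not say either way.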

To gauge system entropy and the probability of a total cascading failure, $P(C)$, the engine uses the following heuristic:

\begin{equation} P(C) \approx \sum_{i=1}^{n} \Phi(i) \cdot \Omega(i) \end{equation}

where the sum runs over the $n$ failed nodes and:
\begin{itemize}
  \item $\Phi(i)$ is the \textbf{Blast Radius} of the $i$-th failed node (the number of affected downstream nodes).
  \item $\Omega(i)$ is the \textbf{Criticality Score} of the $i$-th failed node.
\end{itemize}
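A quick worked example may help calibrate the heuristic. The numbers below are hypothetical, $n$ is taken to count the failed nodes, and the criticality scores are assumed to lie in $[0, 1]$, a scale the definitions do not specify. With $n = 2$, $\Phi(1) = 4$, $\Omega(1) = 0.05$, $\Phi(2) = 2$, and $\Omega(2) = 0.10$:
\begin{equation*}
P(C) \approx 4 \cdot 0.05 + 2 \cdot 0.10 = 0.20 + 0.20 = 0.40.
\end{equation*}
Because the sum is not normalized, larger blast radii or scores can push it above $1$; in that case it is better read as a relative risk score than as a literal probability.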

\section{Challenges we ran into}
\begin{itemize}
  \item \textbf{JSON Tag Discordance:} Our Go structs serialized PascalCase keys (\texttt{json:"Root"}), while the JavaScript client expected camelCase, so we needed a marshaling layer to reconcile the two.
  \item \textbf{Force-Graph Stability:} Managing real-time node updates without the graph ``exploding'' required fine-tuning the \textbf{D3 Alpha Decay} rate and velocity coefficients.
  \item \textbf{Asynchronous Forensic Latency:} Tuning the prompts so that the AI's RCA arrived before the operator lost visual context of the failure.
\end{itemize}

\section{Accomplishments that we're proud of}
\begin{itemize}
  \item \textbf{Sub-400ms Forensics:} Delivering AI reports within 400\,ms of a node failure.
  \item \textbf{Backpressure-Safe Streaming:} Building a Go-channel-based buffer that absorbs bursts of failure events without overwhelming the browser's main thread.
  \item \textbf{Physics-Driven UX:} Creating a UI where the ``weight'' and ``gravity'' of nodes change dynamically based on their health status.
\end{itemize}

\section{What we learned}
\begin{itemize}
  \item \textbf{Topological Context is King:} An LLM performs significantly better at debugging when provided with service relationships rather than raw stack traces.
  \item \textbf{Concurrency Patterns:} We mastered Go's \texttt{select} statements and channel synchronization for high-frequency telemetry.
  \item \textbf{Physics as a Signal:} We discovered that using a physics engine for service maps makes system health more intuitive; when the graph ``shakes,'' the operator feels the instability.
\end{itemize}

\section{What's next for Basalt Sentinel}
\begin{itemize}
  \item \textbf{Predictive Entropy:} Implementing a model to calculate the probability of failure $P(f)$ before it occurs based on latency variance.
  \item \textbf{Auto-Remediation Hooks:} Allowing the AI to trigger Kubernetes \texttt{kubectl rollout undo} commands directly from the dashboard.
  \item \textbf{Cross-Cluster Mesh:} Bridging multiple Kubernetes clusters into a single unified global topology.
\end{itemize}

\end{document}
