ScoutIQ

An NBA player performance anomaly detector and front-office scouting platform.

ScoutIQ identifies statistical deviations in NBA game logs and flags sudden performance shifts (breakouts, slumps, usage rate changes), separating true momentum from statistical noise. It is built to simulate an analytics-driven general manager evaluating trade prospects and free agency signings.

Architecture / Technical Overview

ScoutIQ uses a decoupled architecture: a Python/FastAPI backend and a React/Next.js frontend. The backend fetches historical game logs from the external nba_api, caches the results in a local PostgreSQL database, and engineers core features (usage rate, true shooting, game score). An IsolationForest model runs on the recent temporal window to identify outlier games. Features are then converted to Z-scores, which are used to calculate regression-to-the-mean probabilities. This mathematical context is injected into a prompt for the Google Gemini LLM, which acts as a front-office analyst and generates a natural-language scouting report.
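The outlier-detection and Z-score steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the feature columns, the synthetic values, and the `n_estimators` setting are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-game feature matrix (synthetic data, for illustration only).
# Columns: [true_shooting_pct, usage_rate, game_score]
rng = np.random.default_rng(0)
baseline = rng.normal(loc=[0.57, 24.0, 15.0], scale=[0.04, 2.0, 4.0], size=(29, 3))
breakout = np.array([[0.85, 38.0, 45.0]])  # one deliberately extreme game
games = np.vstack([baseline, breakout])

# Fit on the recent window and score every game in it.
model = IsolationForest(n_estimators=200, random_state=0)
labels = model.fit_predict(games)    # -1 = anomaly, 1 = normal
scores = model.score_samples(games)  # lower = more anomalous

# Z-score of each feature relative to the window's mean and standard deviation.
z_scores = (games - games.mean(axis=0)) / games.std(axis=0)

most_anomalous = int(np.argmin(scores))
print(most_anomalous, labels[most_anomalous], z_scores[most_anomalous].round(2))
```

IsolationForest only ranks how easily a game is isolated from the rest of the window. The Z-scores add the per-feature direction and magnitude that a written report needs.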

Core Tech Stack

  • Backend: Python 3.11, FastAPI, SQLAlchemy
  • Data / ML: Pandas, scikit-learn (IsolationForest), numpy
  • LLM: Google gemini-2.5-flash API via google-genai SDK
  • Database: PostgreSQL (Dockerized)
  • Frontend: Next.js (App Router), TypeScript, TailwindCSS, Framer Motion, Recharts

Key Features

  • End-to-end machine learning pipeline that uses IsolationForest to flag true statistical game anomalies and filter out typical variance.
  • Automated contextual LLM generation that grounds gemini-2.5-flash in purely mathematical deviations (Z-scores) to limit hallucination and produce concrete trade analysis.
  • Z-score caching and database lookups that prevent duplicate LLM API generation and limit cloud costs.
  • Resilient data ingestion that uses custom browser headers and exponential backoff to work around stats.nba.com rate limits.
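One way the Z-score grounding and report caching could fit together is sketched below. Everything here is hypothetical: the function names, the prompt wording, and the in-memory dict (standing in for a Postgres table) are assumptions for illustration, and the `generate` callable stands in for the real Gemini call.

```python
import hashlib
import json
from statistics import mean, pstdev


def z_score(latest, history):
    """Z-score of the latest value against a window of prior values."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (latest - mean(history)) / sigma


def build_prompt(player, z_by_feature):
    """Ground the LLM in numeric deviations only (illustrative wording)."""
    lines = [f"- {name}: z = {z:+.2f}" for name, z in sorted(z_by_feature.items())]
    return (
        "You are an NBA front-office analyst. Using only the deviations below, "
        f"write a short scouting report on {player}.\n" + "\n".join(lines)
    )


def cache_key(player, z_by_feature):
    """Stable key so identical inputs never trigger a second generation."""
    rounded = {name: round(z, 2) for name, z in z_by_feature.items()}
    payload = json.dumps({"player": player, "z": rounded}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


report_cache = {}  # stand-in for a database table keyed by cache_key


def get_report(player, z_by_feature, generate):
    """Return a cached report, calling `generate` only on a cache miss."""
    key = cache_key(player, z_by_feature)
    if key not in report_cache:
        report_cache[key] = generate(build_prompt(player, z_by_feature))
    return report_cache[key]
```

Rounding the Z-scores before hashing means tiny floating-point differences between runs do not defeat the cache.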

Prerequisites & Installation

# Clone the repository
git clone git@github.com:Techdude01/NBAnomaly.git
cd NBAnomaly

# Setup Database
make up

# Set up Python virtual environment and dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r backend/requirements.txt

# Configure Environment Variables
# Add a .env file that sets POSTGRES_URL and your GEMINI_API_KEY
cp backend/.env.example backend/.env

# Setup Frontend dependencies
cd frontend
npm install

⚡ One-Tap Execution (Docker)

If you have Docker installed, skip the manual environment setup and just run:

make docker-up

This pulls or builds all services (Postgres, FastAPI, Next.js) and maps each to its port. Use make down to stop the stack cleanly.

🛠️ Manual Development

Backend (FastAPI)

# From root
source venv/bin/activate
make dev

Frontend (Next.js)

# New terminal
cd frontend
npm run dev

Navigate to http://localhost:3000 to interact with the dashboard.

Technical Trade-offs / Challenges

The nba_api package relies heavily on endpoints tied to stats.nba.com, which enforces abrupt, strict rate limiting and bot-protection timeouts during bulk data fetching. Initially, fetching a player's full career game logs would repeatedly crash the container with TimeoutError. To work around this without paying for a commercial sports data provider, I abandoned the nba_api PlayerGameLog abstraction entirely. Instead, I reverse-engineered the raw API requests and built a custom requests wrapper that sends browser-like Sec-Ch-Ua headers and retries with exponential backoff. This lets the ingestion sequence load into Postgres reliably without being flagged as a bot.
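The backoff part of that wrapper might look something like the sketch below. It is an assumption-laden illustration rather than the project's code: `fetch_with_backoff`, its parameters, and the delay values are hypothetical, and the HTTP call itself is passed in as a callable so the retry logic stays independent of any client library.

```python
import random
import time


def fetch_with_backoff(
    fetch,
    max_attempts=5,
    base_delay=1.0,
    max_delay=30.0,
    retry_on=(TimeoutError, ConnectionError),
    sleep=time.sleep,
):
    """Call `fetch()` and retry with exponentially growing delays.

    `retry_on` should list the exception types your HTTP client raises
    for timeouts and dropped connections (the defaults are Python built-ins).
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the requests library, `fetch` could be a lambda wrapping `requests.get(url, headers=..., timeout=...)`, and `retry_on` would then name that library's own timeout and connection exception classes.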
