Overfished

Surface illegal or high-risk fishing using vessel activity, regional regulations, and evidence-style artifacts suitable for enforcement demos (LA Hacks / hackathon scope).

User journey (high level)

Draw or select a region on the map.
Inspect activity and heatmap-style signals (fused from external fishing-data APIs).
Optionally highlight SAR / computer-vision detections as a showcase narrative.
Produce narrative and artifacts (reports, simulated hail, etc.) via the API, optionally backed by an agent plugin.

Honesty and scope

External fishing APIs are near-real-time, not live; product copy should say so.
Operational vessel detections may come from provider-classified data; custom CV (e.g. SAR / YOLO) is a separate showcase path documented in ml/ and optionally databricks/.

Stack and layers

Layer	Location	Role
Presentation	frontend/	Vite + React; calls the BFF over HTTP only.
API (BFF)	api/	FastAPI routes, OpenAPI, DI; no embedded business logic in this scaffold.
ML library	ml/	Local-first medallion ETL + sequence/CV modules; can run without Databricks.
Agent plugins (optional)	plugins/langchain_plugin/, plugins/fetch_plugin/	LangChain and/or Fetch-style orchestration behind `AgentBackend`.
Databricks (optional)	databricks/	Jobs, notebooks, MLflow / Unity Catalog when cloud orchestration is needed.

Architecture detail: docs/ARCHITECTURE.md.

startup (full stack, local)

"startup" means bringing up the Python/ML/proxy prep (if needed), the map/demo API, the auth + agent + sequence BFF, and the Vite app with linked data (as with npm run dev). In Cursor, when you say "startup", the agent should use this flow.

./scripts/startup.sh

startup.sh runs, in order (unless you skip; see below):

Python stack (only if overfished-ml, overfished_api, or overfished_fetch_plugin is missing): installs backend/requirements.txt (Jinja2, LangChain for tools in the map API, the playwright library, etc.), ml/, pip install -e "api[ml,dev]", and plugins/fetch_plugin/ (agentic /agent when AGENT_BACKEND=hybrid, which is the script's default).
Playwright Chromium: playwright install chromium, so on-demand and pre-rendered PDFs work (see backend/documents/render.py).
ML smoke: one synthetic RNN + BiLSTM + ensemble run, so torch + overfished-ml are exercised the same way as the BFF sequence routes (no extra HTTP process).
PDF prerender: python -m documents.render in backend/ writes the demo IUU case family to backend/output/… (served as /static from the map API).
Servers: uvicorn on 8000 + 8001, then npm run dev in frontend/.

Faster re-runs (skip heavy steps): STARTUP_QUICK=1 ./scripts/startup.sh only starts the three long-running services. You can also set individually: STARTUP_SKIP_PIP=1, STARTUP_SKIP_PLAYWRIGHT=1, STARTUP_SKIP_PDF_PRERENDER=1, STARTUP_SKIP_ML_SMOKE=1. Optional: STARTUP_INSTALL_LANGCHAIN_PLUGIN=1 adds the langchain plugin for AGENT_BACKEND=langchain.
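
The skip flags above can be read as a small decision table. The sketch below is a Python illustration of that logic, not the contents of startup.sh: the step names are hypothetical labels, and it ignores the script's "only if a package is missing" check for the Python stack.

```python
import os

# Hypothetical step labels, listed in the order startup.sh is described as running them.
# Each maps to the environment flag that skips it.
HEAVY_STEPS = {
    "pip": "STARTUP_SKIP_PIP",
    "playwright": "STARTUP_SKIP_PLAYWRIGHT",
    "ml_smoke": "STARTUP_SKIP_ML_SMOKE",
    "pdf_prerender": "STARTUP_SKIP_PDF_PRERENDER",
}


def planned_steps(env=None):
    """Return the steps that would run for a given environment mapping."""
    env = os.environ if env is None else env
    # Assumption: STARTUP_QUICK=1 behaves like setting all four skip flags.
    quick = env.get("STARTUP_QUICK") == "1"
    steps = [
        name
        for name, flag in HEAVY_STEPS.items()
        if not quick and env.get(flag) != "1"
    ]
    steps.append("servers")  # the three long-running services always start
    return steps
```

Calling planned_steps({"STARTUP_QUICK": "1"}) in this sketch leaves only the servers step, which matches the description of a quick re-run.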

Port	App
8000	backend/ `api.main` — map, cases, Twilio, fixtures, static output / PDFs (same routes the SPA calls via `/api/…` in dev).
8001	api/ `overfished_api` — `/agent`, `/ml/sequence/…`, `/audio`, Auth0. Proxied as `/agentapi/…` in frontend/vite.config.ts.
5173	frontend/ Vite dev with codegen + link-data via `predev`.

One-time environment (same venv is typical):

python3 -m venv .venv && . .venv/bin/activate (or your preferred venv)
Run ./scripts/startup.sh once, or install manually: pip install -r backend/requirements.txt, pip install -e "ml", pip install -e "api[ml,dev]", pip install -e "./plugins/fetch_plugin"
Root .env with Auth0, USE_FIXTURES=1 for the demo, and any API keys you need; avoid committing secrets

If the BFF exits with No module named 'fastapi_plugin', the Auth0 API shim is missing — pip install -e "api" (or re-run the installs above) pulls auth0-fastapi-api into the same venv. Address already in use on 8000/8001 means another uvicorn (or a prior startup); stop it or free those ports.
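
To see which of the three ports is already taken before re-running startup, a quick check like the following may help. It is a stdlib-only sketch and not part of the repo; the helper names and the 127.0.0.1 host are assumptions.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=(8000, 8001, 5173)):
    """Return the subset of the stack's ports that are already occupied."""
    return [port for port in ports if port_in_use(port)]


if __name__ == "__main__":
    for port in busy_ports():
        print(f"port {port} is already in use; stop the process holding it")
```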

Optional: Databricks, GX10, or overfished-pipeline on large CSVs (see data/local_pipeline/) — not part of the default startup script.

Repository layout

frontend/                 Vite + React app
api/                      FastAPI BFF (routers + ports + deps)
ml/                       Local-first ML package (pipeline + sequence/CV modules)
plugins/langchain_plugin/ Optional LangChain implementation of AgentBackend
plugins/fetch_plugin/     Optional Fetch.ai-style implementation of AgentBackend
databricks/               Asset bundle placeholder + platform README
docs/                     Architecture notes

Environment variables (names only)

Set these in deployment or .env for the API (see api/ when running locally). Do not commit secrets.

Name	Used by	Purpose
`AGENT_BACKEND`	API	`noop` (default), `langchain`, `fetch`, or `hybrid` (`fetch_hybrid`).
`FETCH_AGENT_ENABLED`	API hybrid/fetch	Toggle Fetch path in hybrid mode (`true`/`false`).
`FETCH_AGENT_TIMEOUT_SECONDS`	API hybrid/fetch	Timeout budget for Fetch attempt before fallback.
`FETCH_AGENT_MAX_RETRIES`	API hybrid/fetch	Bounded retry attempts before fallback.
`VITE_API_URL`	Frontend	BFF base URL; empty = same-origin `/api` (Vite proxy in dev).
`VITE_AUTH0_DOMAIN`	Frontend	Auth0 tenant domain (no `https://`, e.g. `dev-6n2vvavwg11rnrx2.us.auth0.com`).
`VITE_AUTH0_CLIENT_ID`	Frontend	Auth0 SPA application Client ID.
`VITE_AUTH0_AUDIENCE`	Frontend	Custom API identifier, e.g. `https://overfished-bff` (not Management API).
`AUTH0_DOMAIN` / `AUTH0_AUDIENCE`	API (BFF)	Same as Vite; used for JWT validation. Or rely on Vite-prefixed vars if using one root `.env`.
`AUTH0_BYPASS`	API	`1` = skip JWT (e.g. tests); `0` = require Bearer token.
`CORS_ALLOW_ORIGINS`	API	Comma-separated if the SPA and API are on different origins.
`DATABRICKS_HOST`	Databricks CLI / jobs	Workspace host.
`DATABRICKS_TOKEN`	Databricks CLI / jobs	PAT (never commit).
`MLFLOW_TRACKING_URI`	ML jobs (optional)	Experiment tracking.
`GX10_HOST`	ML pipeline runtime	Remote host/IP for optional GX10 SSH execution.
`GX10_USER`	ML pipeline runtime	SSH user for GX10 execution.
`GX10_REMOTE_WORKDIR`	ML pipeline runtime	Remote directory where synced code/data are executed.
`GX10_SSH_KEY_PATH`	ML pipeline runtime (optional)	SSH private key path for GX10 auth.
`GX10_SSH_PORT`	ML pipeline runtime (optional)	SSH port (default `22`).
`VESSEL_IMAGE_OFFLINE`	ML image enrichment (optional)	Set `1` to use only cache/CSV/overrides (no MarineTraffic or Commons).
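
As an illustration of how the API-side variables in the table could be resolved, here is a minimal sketch. It is not the code in api/: the ApiSettings class and load_settings function are hypothetical, and treating an unset AUTH0_BYPASS as "require a token" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ApiSettings:
    agent_backend: str
    auth0_domain: Optional[str]
    auth0_audience: Optional[str]
    auth0_bypass: bool
    cors_allow_origins: Tuple[str, ...]


def load_settings(env=None):
    """Build settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    def first(*names):
        # Return the first non-empty value among the given variable names.
        for name in names:
            value = env.get(name, "").strip()
            if value:
                return value
        return None

    origins = tuple(
        origin.strip()
        for origin in env.get("CORS_ALLOW_ORIGINS", "").split(",")
        if origin.strip()
    )
    return ApiSettings(
        agent_backend=env.get("AGENT_BACKEND", "noop"),
        # Fall back to the Vite-prefixed names when one root .env is shared.
        auth0_domain=first("AUTH0_DOMAIN", "VITE_AUTH0_DOMAIN"),
        auth0_audience=first("AUTH0_AUDIENCE", "VITE_AUTH0_AUDIENCE"),
        # Assumption: anything other than "1" means JWT validation stays on.
        auth0_bypass=env.get("AUTH0_BYPASS", "0") == "1",
        cors_allow_origins=origins,
    )
```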

Auth0: "Client is not authorized to access resource server"

The Custom API is registered (e.g. audience https://overfished-bff), but the SPA application is not allowed to request tokens for it. This is fixed only in the Auth0 Dashboard:

APIs → open the API whose Identifier exactly matches VITE_AUTH0_AUDIENCE / AUTH0_AUDIENCE (e.g. https://overfished-bff). Create it if missing.
Applications → your Single Page App (the Client ID in .env).
Open the APIs tab (or API Authorization in Settings, depending on UI) and authorize this application for that API, then save.

Use Application type = Single Page Application, Token Endpoint Authentication = None, and add http://localhost:5173 to Allowed Callback URLs, Allowed Logout URLs, and Allowed Web Origins.

With ./scripts/startup.sh, Vite proxies /api to the demo map API on port 8000 and /agentapi to the Auth0 BFF on port 8001. Map, health, and vessel data do not go through the BFF; only agent and ML calls (and any other route under /agentapi) do. A 401 on an /agentapi/* call usually means a missing or invalid access token, not a bad /api setup.
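
When debugging which service a request should reach, the prefix split can be modelled in a few lines. This Python sketch only mirrors the mapping described here; the real rules live in frontend/vite.config.ts, the 127.0.0.1 host is an assumption, and any path rewriting the proxy may do is not modelled.

```python
# Prefix-to-upstream mapping as described for the dev proxy (host is assumed).
UPSTREAMS = (
    ("/agentapi", "http://127.0.0.1:8001"),  # Auth0 BFF: agent, ML, audio
    ("/api", "http://127.0.0.1:8000"),       # demo map API
)


def upstream_for(path):
    """Return the upstream base URL for a request path, or None if Vite serves it."""
    for prefix, target in UPSTREAMS:
        # Match the prefix only on a path-segment boundary.
        if path == prefix or path.startswith(prefix + "/"):
            return target
    return None
```

A 401 from a path that resolves to port 8001 points at the access token; a failure on a path that resolves to port 8000 points at the map API instead.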

Still stuck? Copy the full line from the browser (Auth0 error page, Network tab, or the app’s red sign-in error) when asking for help; “Callback URL mismatch”, invalid_scope, and unauthorized point to different dashboard fixes.

Install and run (scaffold)

Frontend

cd frontend && npm install && npm run dev

API

cd api && pip install -e ".[dev]"
uvicorn overfished_api.main:app --reload --host 0.0.0.0 --port 8000

Optional LangChain plugin

From the repository root (so both editables share the same environment):

pip install -e ./api
pip install -e "./plugins/langchain_plugin[langchain]"
export AGENT_BACKEND=langchain

The plugin does not declare a path dependency on overfished-api; install the API package first.

With AGENT_BACKEND=noop (default), LangChain is not required.

Optional Fetch plugin (standalone or hybrid)

pip install -e ./api
pip install -e ./plugins/fetch_plugin
export AGENT_BACKEND=fetch      # or hybrid

For hackathon stability, AGENT_BACKEND=hybrid routes /agent/run through Fetch first and falls back deterministically if Fetch times out/errors.
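
The "Fetch first, then deterministic fallback" behaviour can be sketched with asyncio. This is an illustration under assumptions, not the plugin's code: run_hybrid, fetch_call, and fallback_call are hypothetical names, and max_retries is read here as retries on top of the first attempt, which may differ from how FETCH_AGENT_MAX_RETRIES is counted.

```python
import asyncio


async def run_hybrid(query, fetch_call, fallback_call,
                     timeout_seconds=5.0, max_retries=1):
    """Try the Fetch path with a timeout and bounded retries, then fall back.

    fetch_call: async callable taking the query (hypothetical Fetch client).
    fallback_call: plain callable returning a deterministic local answer.
    """
    attempts = max_retries + 1  # assumption: one initial try plus the retries
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(fetch_call(query), timeout=timeout_seconds)
        except Exception:
            # Timeouts and upstream errors are treated alike: try again or give up.
            continue
    return fallback_call(query)
```

Keeping the fallback synchronous and local means the demo still answers when the upstream agent is slow or unreachable.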

Medallion Spark on ASUS GX10 and bulk fishing events

Put large GFW/warehouse-style CSVs under data/local_pipeline/raw/ (see data/local_pipeline/raw/README.md) or set FISHING_EVENTS_CSV to a repo-relative path.
On a GX10 with GX10_HOST / GX10_USER / GX10_REMOTE_WORKDIR set: overfished-pipeline --runtime gx10 --sync-gx10 (tune with SPARK_SHUFFLE_PARTITIONS, SPARK_DRIVER_MEMORY, etc.).
For headless image URL enrichment on GX10: overfished-gx10 sync && overfished-gx10 enrich && overfished-gx10 pull (see ml/README.md and data/local_pipeline/README.md).
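
Before invoking the GX10 runtime it can be useful to confirm the required variables are set. The check below is a hedged sketch, not part of overfished-pipeline; the function name and the returned dictionary shape are hypothetical.

```python
import os

# Variables the GX10 path is documented as needing.
REQUIRED_GX10_VARS = ("GX10_HOST", "GX10_USER", "GX10_REMOTE_WORKDIR")


def gx10_target(env=None):
    """Validate GX10 settings and return the pieces an SSH call would need."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_GX10_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing GX10 settings: " + ", ".join(missing))
    return {
        "ssh_target": f"{env['GX10_USER']}@{env['GX10_HOST']}",
        "port": int(env.get("GX10_SSH_PORT", "22")),  # documented default is 22
        "workdir": env["GX10_REMOTE_WORKDIR"],
        "key_path": env.get("GX10_SSH_KEY_PATH"),     # optional
    }
```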

Sequence model (RNN + BiLSTM) and “sus” ships (optional)

Install the ML package into the same venv the API uses, then restart uvicorn:

# from repo root, with .venv active
pip install -e "./ml[dev]"   # torch + overfished_ml; see ml/README.md

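HTTP demo (browser or curl; the first call takes ~10–20s). The URL assumes the API is running standalone on port 8000 as in the scaffold instructions; under ./scripts/startup.sh the BFF listens on 8001 instead: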
http://127.0.0.1:8000/ml/sequence/demo
Returns JSON with soft per-class scores, a short narration, and a suspect_readout of MMSI windows where the ensemble is uncertain or disagrees with the label (triage / demo only).
Agent (fetch / hybrid): ask for suspicious / sus vessels or the sequence model; the Fetch plugin runs the same report. Example body for POST /agent/run:

{
  "query": "Which ships are sus based on the sequence model?",
  "context": { "use_synthetic": true, "epochs": 1 }
}

The same request with curl (use port 8001 if you started the stack with ./scripts/startup.sh):

curl -sS http://127.0.0.1:8000/agent/run \
  -H "Content-Type: application/json" \
  -d '{"query":"Which ships are sus based on the sequence model?","context":{"use_synthetic":true,"epochs":1}}'

(Queries that mention e.g. sus, suspicious, BILSTM, LSTM, or sequence: true in context route to this path. Use repo CSV via context.csv_path like data/local_pipeline/sample_fishing_events.csv when you are not using synthetic data.)
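
The routing rule in the note above can be approximated as follows. This is a guess at the shape of the check, not the Fetch plugin's implementation: the function name is hypothetical, and matching whole words (so that "sustainable" does not trigger on "sus") is an assumption.

```python
import re

# Keywords named in the note above, lower-cased for comparison.
SEQUENCE_KEYWORDS = ("sus", "suspicious", "bilstm", "lstm")


def routes_to_sequence(query, context=None):
    """Return True if a request would be sent to the sequence-model report."""
    # An explicit flag in the context wins regardless of the query text.
    if context and context.get("sequence") is True:
        return True
    # Assumption: whole-word, case-insensitive keyword matching.
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return any(keyword in words for keyword in SEQUENCE_KEYWORDS)
```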

Who does what (RACI)

Replace TBD with teammate names. R = Responsible, A = Accountable, C = Consulted, I = Informed.

Area	R	A	C	I
Map UI, heatmap, demo UX	TBD	TBD	API owner	Whole team
BFF API, routing, caching strategy, OpenAPI	TBD	TBD	Databricks (data contracts)	Frontend
External API clients (e.g. GFW), rate limits	TBD	TBD	Legal / ToS	Agent owner
Databricks jobs, tables, MLflow	TBD	TBD	CV owner	API
CV pipeline (e.g. SAR / YOLO showcase)	TBD	TBD	Databricks admin	Demo lead
LangChain plugin (tools, prompts, wiring)	TBD	TBD	API (`ports/`)	Frontend (copy/UX)
Demo script, pre-cached JSON, stage narrative	TBD	TBD	All signal owners	Judges

Definition of done (per layer, for this scaffold)

Frontend: App runs; VITE_API_URL documented; domain calls go through the BFF only.
API: /health returns OK; OpenAPI served at /docs; AgentBackend bound via deps (noop default).
ML: Importable package; CV modules are documented stubs with no training implementation.
Plugin: Optional install; build_agent_backend() returns an object satisfying AgentBackend (stub responses only).
Databricks: Bundle validates or documents next steps; README names job/notebook ownership.
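
For the API and Plugin items, "an object satisfying AgentBackend" could look like the sketch below. AgentBackend and build_agent_backend are names from this README, but the run method, its signature, and the response shape are assumptions made for illustration; check api/ for the real port definition.

```python
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class AgentBackend(Protocol):
    """Assumed shape of the port: one call taking a query and optional context."""

    def run(self, query: str, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        ...


class NoopAgentBackend:
    """Stub backend that answers without calling any model or service."""

    def run(self, query: str, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        return {"backend": "noop", "query": query, "answer": "stub response"}


def build_agent_backend(name: str = "noop") -> AgentBackend:
    """Return a backend for the given AGENT_BACKEND value (only noop in this sketch)."""
    if name == "noop":
        return NoopAgentBackend()
    raise ValueError(f"unsupported backend in this sketch: {name}")
```

A structural Protocol lets a plugin satisfy the port without importing a base class from the API package, which fits the note that the plugins do not declare a path dependency on overfished-api.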

Operations (external APIs)

Long-running or rate-limited upstream APIs (e.g. concurrent limits, gateway timeouts on heavy reports) make pre-cached JSON and a live vs demo toggle the default posture for demos. See internal planning docs; implement caching in the API layer when you add real clients.
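
One possible shape for the live vs demo toggle is sketched below, using a directory of pre-cached JSON files. Nothing here exists in the scaffold yet: load_report, the one-file-per-key layout, and the fall-back-to-cache behaviour on upstream errors are all assumptions.

```python
import json
from pathlib import Path


def load_report(key, cache_dir, fetch_live=None, demo=True):
    """Return report JSON from the cache (demo) or from a live client.

    fetch_live: hypothetical callable that takes the key and returns
    JSON-serialisable data from an upstream API.
    """
    cache_file = Path(cache_dir) / f"{key}.json"

    def read_cache():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    if demo or fetch_live is None:
        return read_cache()

    try:
        data = fetch_live(key)
    except Exception:
        # Upstream timed out or was rate-limited: serve the cached copy if any.
        if cache_file.exists():
            return read_cache()
        raise

    # Refresh the cache so the next demo run sees the latest successful result.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```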

License

TBD.
