Surface illegal or high-risk fishing using vessel activity, regional regulations, and evidence-style artifacts suitable for enforcement demos (LA Hacks / hackathon scope).
- Draw or select a region on the map.
- Inspect activity and heatmap-style signals (fused from external fishing-data APIs).
- Optionally highlight SAR / computer-vision detections as a showcase narrative.
- Produce narrative and artifacts (reports, simulated hail, etc.) via the API, optionally backed by an agent plugin.
- External fishing APIs are near-real-time, not live; product copy should say so.
- Operational vessel detections may come from provider-classified data; custom CV (e.g. SAR / YOLO) is a separate showcase path documented in ml/ and optionally databricks/.
| Layer | Location | Role |
|---|---|---|
| Presentation | frontend/ | Vite + React; calls the BFF over HTTP only. |
| API (BFF) | api/ | FastAPI routes, OpenAPI, DI; no embedded business logic in this scaffold. |
| ML library | ml/ | Local-first medallion ETL + sequence/CV modules; can run without Databricks. |
| Agent plugins (optional) | plugins/langchain_plugin/, plugins/fetch_plugin/ | LangChain and/or Fetch-style orchestration behind AgentBackend. |
| Databricks (optional) | databricks/ | Jobs, notebooks, MLflow / Unity Catalog when cloud orchestration is needed. |
Architecture detail: docs/ARCHITECTURE.md.
In this repo, "startup" means: do the Python/ML/proxy prep (if needed), then bring up the map/demo API, the auth + agent + sequence BFF, and the Vite app with linked data (as with `npm run dev`). In Cursor, when you say "startup", the agent should follow this flow.
```bash
./scripts/startup.sh
```

`startup.sh` runs, in order (unless you skip steps; see below):

- Python stack (only if `overfished-ml`, `overfished_api`, or `overfished_fetch_plugin` is missing): `backend/requirements.txt` (Jinja2, LangChain for tools in the map API, the `playwright` library, etc.), `ml/`, `pip install -e "api[ml,dev]"`, and `plugins/fetch_plugin/` (agentic `/agent` when `AGENT_BACKEND=hybrid`, the default in the script).
- Playwright Chromium: `playwright install chromium` so on-demand and pre-rendered PDFs work (see `backend/documents/render.py`).
- ML smoke: one synthetic RNN + BiLSTM + ensemble run, so `torch` + `overfished-ml` are exercised the same way as the BFF sequence routes (no extra HTTP process).
- PDF prerender: `python -m documents.render` in `backend/` writes the demo IUU case family to `backend/output/…` (served as `/static` from the map API).
- Servers: uvicorn on 8000 + 8001, then `npm run dev` in `frontend/`.
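The "only if … is missing" condition on the Python stack step can be approximated from Python with `importlib.util.find_spec`. This is a sketch, not what `startup.sh` actually runs; the import names `overfished_ml`, `overfished_api`, and `overfished_fetch_plugin` are assumed from the package names listed above.

```python
import importlib.util

# Import names assumed from the packages listed in the Python stack step.
REQUIRED_MODULES = ("overfished_ml", "overfished_api", "overfished_fetch_plugin")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found in this interpreter."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


def needs_pip_install(names=REQUIRED_MODULES):
    """True when at least one required package is absent, i.e. the pip step should run."""
    return bool(missing_modules(names))


if __name__ == "__main__":
    gaps = missing_modules()
    print("missing:", ", ".join(gaps) if gaps else "none")
```

Running it inside the project venv before `./scripts/startup.sh` shows whether the slow pip step will be triggered.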
Faster re-runs (skip heavy steps): STARTUP_QUICK=1 ./scripts/startup.sh only starts the three long-running services. You can also set individually: STARTUP_SKIP_PIP=1, STARTUP_SKIP_PLAYWRIGHT=1, STARTUP_SKIP_PDF_PRERENDER=1, STARTUP_SKIP_ML_SMOKE=1. Optional: STARTUP_INSTALL_LANGCHAIN_PLUGIN=1 adds the langchain plugin for AGENT_BACKEND=langchain.
| Port | App |
|---|---|
| 8000 | backend/ api.main — map, cases, Twilio, fixtures, static output / PDFs (same routes the SPA calls via /api/… in dev). |
| 8001 | api/ overfished_api — /agent, /ml/sequence/…, /audio, Auth0. Proxied as /agentapi/… in frontend/vite.config.ts. |
| 5173 | frontend/ Vite dev with codegen + link-data via predev. |
One-time environment (the same venv is typical):

- `python3 -m venv .venv && . .venv/bin/activate` (or your preferred venv)
- Run `./scripts/startup.sh` once, or install manually: `pip install -r backend/requirements.txt`, `pip install -e "ml"`, `pip install -e "api[ml,dev]"`, `pip install -e "./plugins/fetch_plugin"`
- Root `.env` with Auth0 settings, `USE_FIXTURES=1` for the demo, and any API keys you need; avoid committing secrets
If the BFF exits with `No module named 'fastapi_plugin'`, the Auth0 API shim is missing: `pip install -e "api"` (or re-running the installs above) pulls `auth0-fastapi-api` into the same venv. `Address already in use` on 8000/8001 means another uvicorn (or a prior startup) is still running; stop it or free those ports.
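To see which dev port is already taken before re-running the script, a small standard-library check like the following can help. It is a hypothetical helper (not shipped in the repo) and only detects listeners reachable on `127.0.0.1`.

```python
import socket

# Ports from the table above.
DEV_PORTS = {
    8000: "backend/ api.main (map API)",
    8001: "api/ overfished_api (BFF)",
    5173: "frontend/ Vite dev server",
}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, label in DEV_PORTS.items():
        state = "IN USE" if port_in_use(port) else "free"
        print(f"{port:>5}  {state:<6}  {label}")
```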
Optional: Databricks, GX10, or `overfished-pipeline` on large CSVs (see `data/local_pipeline/`); these are not part of the default startup script.
- `frontend/`: Vite + React app
- `api/`: FastAPI BFF (routers + ports + deps)
- `ml/`: Local-first ML package (pipeline + sequence/CV modules)
- `plugins/langchain_plugin/`: Optional LangChain implementation of `AgentBackend`
- `plugins/fetch_plugin/`: Optional Fetch.ai-style implementation of `AgentBackend`
- `databricks/`: Asset bundle placeholder + platform README
- `docs/`: Architecture notes
Set these in deployment or .env for the API (see api/ when running locally). Do not commit secrets.
| Name | Used by | Purpose |
|---|---|---|
| `AGENT_BACKEND` | API | `noop` (default), `langchain`, `fetch`, or `hybrid` (`fetch_hybrid`). |
| `FETCH_AGENT_ENABLED` | API hybrid/fetch | Toggle the Fetch path in hybrid mode (`true`/`false`). |
| `FETCH_AGENT_TIMEOUT_SECONDS` | API hybrid/fetch | Timeout budget for a Fetch attempt before fallback. |
| `FETCH_AGENT_MAX_RETRIES` | API hybrid/fetch | Bounded retry attempts before fallback. |
| `VITE_API_URL` | Frontend | BFF base URL; empty = same-origin `/api` (Vite proxy in dev). |
| `VITE_AUTH0_DOMAIN` | Frontend | Auth0 tenant domain without `https://` (e.g. `dev-6n2vvavwg11rnrx2.us.auth0.com`). |
| `VITE_AUTH0_CLIENT_ID` | Frontend | Auth0 SPA application Client ID. |
| `VITE_AUTH0_AUDIENCE` | Frontend | Custom API identifier, e.g. `https://overfished-bff` (not the Management API). |
| `AUTH0_DOMAIN` / `AUTH0_AUDIENCE` | API (BFF) | Same values as the Vite vars; used for JWT validation. Or rely on the Vite-prefixed vars if using one root `.env`. |
| `AUTH0_BYPASS` | API | `1` = skip JWT (e.g. tests); `0` = require a Bearer token. |
| `CORS_ALLOW_ORIGINS` | API | Comma-separated list, needed if the SPA and API are on different origins. |
| `DATABRICKS_HOST` | Databricks CLI / jobs | Workspace host. |
| `DATABRICKS_TOKEN` | Databricks CLI / jobs | PAT (never commit). |
| `MLFLOW_TRACKING_URI` | ML jobs (optional) | Experiment tracking. |
| `GX10_HOST` | ML pipeline runtime | Remote host/IP for optional GX10 SSH execution. |
| `GX10_USER` | ML pipeline runtime | SSH user for GX10 execution. |
| `GX10_REMOTE_WORKDIR` | ML pipeline runtime | Remote directory where synced code/data are executed. |
| `GX10_SSH_KEY_PATH` | ML pipeline runtime (optional) | SSH private key path for GX10 auth. |
| `GX10_SSH_PORT` | ML pipeline runtime (optional) | SSH port (default 22). |
| `VESSEL_IMAGE_OFFLINE` | ML image enrichment (optional) | Set `1` to use only cache/CSV/overrides (no MarineTraffic or Commons). |
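As an illustration of how the API-side variables in this table fit together, here is a minimal settings loader. It is a sketch under assumptions, not the loader in `api/`: the names `BffSettings` and `load_settings` are hypothetical, and the defaults for the Fetch timeout, retries, and enabled flag are placeholders. It shows the two behaviours the table describes: falling back to the Vite-prefixed Auth0 vars when `AUTH0_*` is unset, and splitting `CORS_ALLOW_ORIGINS` on commas.

```python
import os
from dataclasses import dataclass, field

VALID_BACKENDS = {"noop", "langchain", "fetch", "hybrid", "fetch_hybrid"}


def _flag(value, default=False):
    """Parse 1/0, true/false, yes/no style strings; use `default` when unset or blank."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class BffSettings:  # hypothetical container, not the real api/ settings class
    agent_backend: str = "noop"
    fetch_enabled: bool = True
    fetch_timeout_seconds: float = 10.0  # placeholder default
    fetch_max_retries: int = 1           # placeholder default
    auth0_domain: str = ""
    auth0_audience: str = ""
    auth0_bypass: bool = False
    cors_allow_origins: list = field(default_factory=list)


def load_settings(env=None):
    env = os.environ if env is None else env
    backend = env.get("AGENT_BACKEND", "noop").strip().lower() or "noop"
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported AGENT_BACKEND: {backend!r}")
    return BffSettings(
        agent_backend=backend,
        fetch_enabled=_flag(env.get("FETCH_AGENT_ENABLED"), default=True),
        fetch_timeout_seconds=float(env.get("FETCH_AGENT_TIMEOUT_SECONDS", "10")),
        fetch_max_retries=int(env.get("FETCH_AGENT_MAX_RETRIES", "1")),
        # Prefer AUTH0_*, else reuse the Vite-prefixed names from a shared root .env.
        auth0_domain=env.get("AUTH0_DOMAIN") or env.get("VITE_AUTH0_DOMAIN", ""),
        auth0_audience=env.get("AUTH0_AUDIENCE") or env.get("VITE_AUTH0_AUDIENCE", ""),
        auth0_bypass=_flag(env.get("AUTH0_BYPASS")),
        cors_allow_origins=[
            o.strip() for o in env.get("CORS_ALLOW_ORIGINS", "").split(",") if o.strip()
        ],
    )
```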
A common cause of sign-in token errors: the Custom API is registered (e.g. audience `https://overfished-bff`), but the SPA application is not allowed to request tokens for it. This can only be fixed in the Auth0 Dashboard:
- APIs → open the API whose Identifier exactly matches `VITE_AUTH0_AUDIENCE` / `AUTH0_AUDIENCE` (e.g. `https://overfished-bff`). Create it if missing.
- Applications → your Single Page App (the Client ID in `.env`).
- Open the APIs tab (or API Authorization in Settings, depending on the UI), authorize this application for that API, then save.
Use Application type = Single Page Application, Token Endpoint Authentication = None, and add http://localhost:5173 to Allowed Callback URLs, Allowed Logout URLs, and Allowed Web Origins.
With `./scripts/startup.sh`, Vite proxies `/api` to the demo map API on port 8000 and `/agentapi` to the Auth0 BFF on port 8001. Map, health, and vessel data do not go through the BFF; only agent/ML calls (and any route under `/agentapi`) do. A 401 on an `/agentapi/*` call usually means a missing or invalid access token, not a bad `/api` setup.
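The routing rule above can be restated as a tiny lookup, which is handy when reading a failing request in the Network tab. This is a hypothetical triage helper, not the real proxy: the actual rules live in `frontend/vite.config.ts`, and the `127.0.0.1` hostnames are an assumption.

```python
# Dev-server proxy targets as described for ./scripts/startup.sh.
# Hostnames are assumed; the real rules live in frontend/vite.config.ts.
PROXY_TARGETS = {
    "/agentapi": "http://127.0.0.1:8001",  # Auth0 BFF: agent + ML routes
    "/api": "http://127.0.0.1:8000",       # demo map API: map, health, vessel data
}


def upstream_for(path):
    """Return (prefix, target) for a dev-server path, or None if Vite serves it directly."""
    for prefix, target in PROXY_TARGETS.items():
        # Match whole path segments so "/apiary" does not count as "/api".
        if path == prefix or path.startswith(prefix + "/"):
            return prefix, target
    return None


def likely_cause_of_401(path):
    """Rough triage for a 401 seen in the browser Network tab."""
    match = upstream_for(path)
    if match is not None and match[0] == "/agentapi":
        return "missing or invalid Auth0 access token for the BFF"
    return "not a BFF token issue; check which service handled the request"
```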
Still stuck? Copy the full line from the browser (Auth0 error page, Network tab, or the app’s red sign-in error) when asking for help; “Callback URL mismatch”, invalid_scope, and unauthorized point to different dashboard fixes.
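Frontend

```bash
cd frontend && npm install && npm run dev
```

API

```bash
cd api && pip install -e ".[dev]"
uvicorn overfished_api.main:app --reload --host 0.0.0.0 --port 8000
```

This standalone command serves the BFF on port 8000; `./scripts/startup.sh` instead runs the BFF on 8001 and puts the `backend/` map API on 8000.

Optional LangChain plugin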
From the repository root (so both editables share the same environment):

```bash
pip install -e ./api
pip install -e "./plugins/langchain_plugin[langchain]"
export AGENT_BACKEND=langchain
```

The plugin does not declare a path dependency on `overfished-api`; install the API package first.
With AGENT_BACKEND=noop (default), LangChain is not required.
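The reason LangChain is optional under `noop` is easiest to see with a lazily-importing selector. The sketch below is hypothetical: the `run(query, context)` method on `AgentBackend` is an assumed shape, `overfished_langchain_plugin` is a guessed import name, and it assumes each plugin exposes a module-level `build_agent_backend()`. `hybrid` is deliberately left out here.

```python
from __future__ import annotations

import importlib
import os
from typing import Any, Protocol


class AgentBackend(Protocol):
    """Assumed shape of the AgentBackend port (the real one in api/ may differ)."""

    def run(self, query: str, context: dict[str, Any]) -> dict[str, Any]: ...


class NoopBackend:
    """Default backend: returns a stub answer without LangChain or Fetch."""

    def run(self, query: str, context: dict[str, Any]) -> dict[str, Any]:
        return {"backend": "noop", "query": query, "answer": None}


# "overfished_fetch_plugin" is the name checked by startup.sh;
# "overfished_langchain_plugin" is a hypothetical guess.
PLUGIN_MODULES = {
    "langchain": "overfished_langchain_plugin",
    "fetch": "overfished_fetch_plugin",
}


def select_backend(name: str | None = None) -> AgentBackend:
    """Pick a backend from `name` or AGENT_BACKEND; hybrid is not covered by this sketch."""
    name = (name or os.environ.get("AGENT_BACKEND") or "noop").strip().lower()
    if name == "noop":
        return NoopBackend()
    if name not in PLUGIN_MODULES:
        raise ValueError(f"unsupported AGENT_BACKEND in this sketch: {name!r}")
    # Import lazily: the plugin (and LangChain) is only needed when selected.
    module = importlib.import_module(PLUGIN_MODULES[name])
    # Assumes the plugin exposes a module-level build_agent_backend() factory.
    return module.build_agent_backend()
```

Because the import happens inside `select_backend`, a `noop` deployment never touches the plugin packages.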
Optional Fetch plugin (standalone or hybrid)
```bash
pip install -e ./api
pip install -e ./plugins/fetch_plugin
export AGENT_BACKEND=fetch   # or hybrid
```

For hackathon stability, `AGENT_BACKEND=hybrid` routes `/agent/run` through Fetch first and falls back deterministically if Fetch times out or errors.
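The hybrid behaviour, together with `FETCH_AGENT_ENABLED`, `FETCH_AGENT_TIMEOUT_SECONDS`, and `FETCH_AGENT_MAX_RETRIES`, can be sketched with `asyncio.wait_for`. This is a minimal illustration under assumptions, not the plugin's code: `run_hybrid`, `fetch_call`, and `fallback_call` are hypothetical names, and the 10-second default is a placeholder.

```python
import asyncio


async def run_hybrid(query, context, fetch_call, fallback_call,
                     enabled=True, timeout_s=10.0, max_retries=1):
    """Try Fetch with a per-attempt timeout and bounded retries, then fall back.

    `fetch_call` and `fallback_call` are hypothetical async callables (query, context) -> result.
    """
    if enabled:
        # One initial attempt plus up to `max_retries` retries.
        for _ in range(1 + max_retries):
            try:
                result = await asyncio.wait_for(fetch_call(query, context), timeout=timeout_s)
                return {"source": "fetch", "result": result}
            except Exception:
                # Timeouts (asyncio.TimeoutError) and plugin errors both count as a failed attempt.
                continue
    # Deterministic fallback: always the same local path once Fetch is disabled or exhausted.
    result = await fallback_call(query, context)
    return {"source": "fallback", "result": result}
```

Tagging the response with its `source` makes it easy to show on stage whether the Fetch path or the fallback answered.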
Medallion Spark on ASUS GX10 and bulk fishing events
- Put large GFW/warehouse-style CSVs under `data/local_pipeline/raw/` (see `data/local_pipeline/raw/README.md`) or set `FISHING_EVENTS_CSV` to a repo-relative path.
- On a GX10 with `GX10_HOST` / `GX10_USER` / `GX10_REMOTE_WORKDIR` set: `overfished-pipeline --runtime gx10 --sync-gx10` (tune with `SPARK_SHUFFLE_PARTITIONS`, `SPARK_DRIVER_MEMORY`, etc.).
- For headless image URL enrichment on GX10: `overfished-gx10 sync && overfished-gx10 enrich && overfished-gx10 pull` (see `ml/README.md` and `data/local_pipeline/README.md`).
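For readers new to the medallion layout, the sketch below shows the three layers on a toy CSV in plain Python: bronze keeps raw rows, silver drops malformed or duplicate rows and fixes types, gold aggregates per vessel. It only illustrates the pattern; the real pipeline runs on Spark, and the column names `mmsi`, `event_id`, and `fishing_hours` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def bronze(raw_csv_text):
    """Bronze: ingest raw rows as-is (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(raw_csv_text)))


def silver(rows):
    """Silver: drop malformed rows, normalise types, de-duplicate on (mmsi, event_id)."""
    seen, clean = set(), []
    for row in rows:
        try:
            mmsi = row["mmsi"].strip()
            hours = float(row["fishing_hours"])
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # missing column, None from a short row, or unparsable number
        if not mmsi or hours < 0:
            continue
        key = (mmsi, row.get("event_id"))
        if key in seen:
            continue
        seen.add(key)
        clean.append({"mmsi": mmsi, "event_id": row.get("event_id"), "fishing_hours": hours})
    return clean


def gold(rows):
    """Gold: per-vessel aggregates ready for the map or reports."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["mmsi"]] += row["fishing_hours"]
        counts[row["mmsi"]] += 1
    return {m: {"events": counts[m], "fishing_hours": totals[m]} for m in totals}
```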
Sequence model (RNN + BiLSTM) and “sus” ships (optional)
Install the ML package into the same venv the API uses, then restart uvicorn:
```bash
# from repo root, with .venv active
pip install -e "./ml[dev]"   # torch + overfished_ml
```

See [ml/README.md](ml/).
- HTTP demo (browser or curl, ~10–20 s the first time): http://127.0.0.1:8000/ml/sequence/demo (port 8000 assumes the BFF was started with the standalone `uvicorn` command; with `./scripts/startup.sh` the BFF is on 8001). Returns JSON with soft per-class scores, a short `narration`, and a `suspect_readout` of MMSI windows where the ensemble is uncertain or disagrees with the label (triage / demo only).
- Agent (`fetch` / `hybrid`): ask for suspicious / sus vessels or the sequence model; the Fetch plugin runs the same report. Example body for `POST /agent/run`:

  ```json
  {
    "query": "Which ships are sus based on the sequence model?",
    "context": { "use_synthetic": true, "epochs": 1 }
  }
  ```

  Or with curl (same port note as above; add an `Authorization: Bearer <token>` header if the BFF requires a token, i.e. `AUTH0_BYPASS=0`):

  ```bash
  curl -sS http://127.0.0.1:8000/agent/run \
    -H "Content-Type: application/json" \
    -d '{"query":"Which ships are sus based on the sequence model?","context":{"use_synthetic":true,"epochs":1}}'
  ```

  Queries that mention e.g. sus, suspicious, BILSTM, or LSTM, or that set `sequence: true` in `context`, route to this path. When you are not using synthetic data, point `context.csv_path` at a repo CSV such as `data/local_pipeline/sample_fishing_events.csv`.
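The `suspect_readout` rule ("the ensemble is uncertain or disagrees with the label") can be sketched as follows. This is a hypothetical illustration, not the `overfished-ml` implementation: it assumes the ensemble simply averages the RNN and BiLSTM soft scores, and the 0.6 confidence threshold and the `Window` record are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Window:  # hypothetical per-MMSI window record
    mmsi: str
    label: str                 # label attached to the window in the dataset
    rnn: dict[str, float]      # soft per-class scores from the RNN
    bilstm: dict[str, float]   # soft per-class scores from the BiLSTM


def ensemble_scores(window):
    """Average the two models' class probabilities (classes in sorted order)."""
    classes = sorted(set(window.rnn) | set(window.bilstm))
    return {c: (window.rnn.get(c, 0.0) + window.bilstm.get(c, 0.0)) / 2 for c in classes}


def suspect_readout(windows, min_confidence=0.6):
    """Flag windows where the ensemble is unsure or contradicts the label."""
    flagged = []
    for w in windows:
        scores = ensemble_scores(w)
        top_class = max(scores, key=scores.get)
        top_p = scores[top_class]
        reasons = []
        if top_p < min_confidence:
            reasons.append("uncertain")
        if top_class != w.label:
            reasons.append("disagrees_with_label")
        if reasons:
            flagged.append({"mmsi": w.mmsi, "predicted": top_class,
                            "confidence": round(top_p, 3), "reasons": reasons})
    return flagged
```

Keeping both reasons separate lets the narration say why a vessel was surfaced, which matters because this output is for triage, not a verdict.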
Replace TBD with teammate names. R = Responsible, A = Accountable, C = Consulted, I = Informed.
| Area | R | A | C | I |
|---|---|---|---|---|
| Map UI, heatmap, demo UX | TBD | TBD | API owner | Whole team |
| BFF API, routing, caching strategy, OpenAPI | TBD | TBD | Databricks (data contracts) | Frontend |
| External API clients (e.g. GFW), rate limits | TBD | TBD | Legal / ToS | Agent owner |
| Databricks jobs, tables, MLflow | TBD | TBD | CV owner | API |
| CV pipeline (e.g. SAR / YOLO showcase) | TBD | TBD | Databricks admin | Demo lead |
| LangChain plugin (tools, prompts, wiring) | TBD | TBD | API (ports/) | Frontend (copy/UX) |
| Demo script, pre-cached JSON, stage narrative | TBD | TBD | All signal owners | Judges |
- Frontend: App runs; `VITE_API_URL` documented; domain calls go through the BFF only.
- API: `/health` returns OK; OpenAPI served at `/docs`; `AgentBackend` bound via `deps` (`noop` default).
- ML: Importable package; CV modules are documented stubs with no training implementation.
- Plugin: Optional install; `build_agent_backend()` returns an object satisfying `AgentBackend` (stub responses only).
- Databricks: Bundle validates or documents next steps; README names job/notebook ownership.
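The API item in this checklist can be spot-checked with a small standard-library script. This is a hypothetical helper, not part of the repo; it assumes the `api/` BFF listens on port 8001 (as with `./scripts/startup.sh`) and that `/health` and `/docs` answer unauthenticated GETs.

```python
import json
import urllib.error
import urllib.request


def endpoint_ok(url, timeout=3.0):
    """Return True when GET url answers with HTTP 2xx (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx), refused connections, and timeouts all mean "not OK".
        return False


def check_api(base_url="http://127.0.0.1:8001", timeout=3.0):
    """Check the acceptance endpoints: /health and the OpenAPI UI at /docs."""
    base = base_url.rstrip("/")
    return {path: endpoint_ok(base + path, timeout) for path in ("/health", "/docs")}


if __name__ == "__main__":
    print(json.dumps(check_api(), indent=2))
```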
Long-running or rate-limited upstream APIs (e.g. concurrency limits, gateway timeouts on heavy reports) make pre-cached JSON and a live-vs-demo toggle the default posture for demos. See internal planning docs; implement caching in the API layer when you add real clients.
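One way to combine pre-cached JSON with a live-vs-demo toggle in the API layer is sketched below. It is a hypothetical design, not existing repo code: `CachedUpstream`, the `fetch_live` callable, and the `<key>.json` fixture layout are assumptions, and reading `USE_FIXTURES=1` as "demo mode" is an interpretation of the `.env` setting mentioned earlier. In live mode it keeps a small TTL cache and degrades to the fixture when the upstream times out or errors.

```python
import json
import os
import time
from pathlib import Path


class CachedUpstream:
    """Demo mode serves pre-cached JSON; live mode uses a TTL cache with fixture fallback."""

    def __init__(self, fetch_live, fixtures_dir, ttl_s=300.0, demo=None, clock=time.monotonic):
        self.fetch_live = fetch_live            # hypothetical callable: key -> dict
        self.fixtures_dir = Path(fixtures_dir)  # hypothetical folder of <key>.json files
        self.ttl_s = ttl_s
        # USE_FIXTURES=1 selects demo mode unless `demo` is passed explicitly.
        self.demo = os.environ.get("USE_FIXTURES") == "1" if demo is None else demo
        self.clock = clock
        self._cache = {}  # key -> (timestamp, value)

    def _fixture(self, key):
        return json.loads((self.fixtures_dir / f"{key}.json").read_text(encoding="utf-8"))

    def get(self, key):
        if self.demo:
            return self._fixture(key)
        now = self.clock()
        hit = self._cache.get(key)
        if hit and now - hit[0] < self.ttl_s:
            return hit[1]
        try:
            value = self.fetch_live(key)
        except Exception:
            # Upstream timed out or hit a rate limit: degrade to the pre-cached copy.
            return self._fixture(key)
        self._cache[key] = (now, value)
        return value
```

Injecting `clock` keeps TTL behaviour testable without sleeping.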
TBD.