gitcrawl

A local-first GitHub triage tool and a drop-in caching gh shim. Sync issues and PRs into SQLite for search and clustering — then let agents call gh against that same cache so you stop burning the API rate limit.


#Two jobs, one binary

gitcrawl mirrors a GitHub repository's issues and pull requests into local SQLite, then layers semantic clustering, full-text search, and a gh-compatible shim on top. A maintainer, or an agent acting on their behalf, can then triage threads and serve everyday gh reads without spending live API quota.

  • Local SQLite first. All issues, PRs, comments, reviews, files, commits, checks, and workflow runs land in ~/.config/gitcrawl/gitcrawl.db. Queries hit the disk, not GitHub.
  • Drop-in gh cache. Symlink gitcrawl-gh as gh and most read-only calls (gh search, gh issue/pr view, gh pr checks, gh run, REST GETs, GraphQL queries) answer from local SQLite. Agents make far fewer live API calls, so they are much less likely to hit rate limits; mutating commands pass through unchanged. Run gh xcache stats to see hit rate, per-command misses, and evictions.
  • Semantic clustering. OpenAI embeddings group related reports, with deterministic GitHub reference evidence (#123, pull/123) preventing weak similarity bridges from forming mega-clusters.
  • Terminal UI. gitcrawl tui is a keyboard- and mouse-driven cluster browser with bidirectional sort, jump-to-number, neighbors, and member-level governance actions.
  • Agent-friendly JSON. Every command supports --json for clean automation surfaces.
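
The reference evidence mentioned under semantic clustering can be illustrated with a small sketch. It pulls explicit issue and PR numbers out of free text, using only the two forms named above (#123 and pull/123). The pattern and the function names are assumptions made for illustration; gitcrawl's actual matching rules, and how it weighs this evidence against embedding similarity, are not described here.

```python
import re

# Hypothetical pattern covering the two forms named in the text:
#   "#123"     shorthand, not preceded by a word character or "/"
#   "pull/123" as found at the tail of a pull request URL
REF_PATTERN = re.compile(r"(?<![\w/])#(\d+)\b|\bpull/(\d+)\b")


def extract_refs(text: str) -> set[int]:
    """Return the issue/PR numbers explicitly referenced in text."""
    refs = set()
    for match in REF_PATTERN.finditer(text):
        # Exactly one of the two groups is set for any given match.
        refs.add(int(match.group(1) or match.group(2)))
    return refs


def has_direct_reference(number_a: int, body_a: str,
                         number_b: int, body_b: str) -> bool:
    """True when either thread explicitly names the other."""
    return number_b in extract_refs(body_a) or number_a in extract_refs(body_b)
```

A check like has_direct_reference is deterministic: the same two threads always give the same answer, which is what makes it usable as a guard alongside a fuzzy similarity score.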

#Pick your path

#I want to try it

Quickstart walks you from git clone to a populated cluster view in five minutes.

#I want to wire up an agent

The gh shim is the fastest way to cut GitHub API load — point your agent at gitcrawl-gh, keep the agent's gh calls intact.
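
The symlink step can be scripted. The following is a minimal sketch, assuming gitcrawl-gh has already been built somewhere on disk; the function names and the directory layout are hypothetical and not part of gitcrawl.

```python
import os
from pathlib import Path


def install_gh_shim(shim_binary: Path, bin_dir: Path) -> Path:
    """Create bin_dir/gh as a symlink pointing at the gitcrawl-gh binary."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    link = bin_dir / "gh"
    if link.is_symlink():
        link.unlink()  # replace a link left by an earlier run
    elif link.exists():
        # Refuse to clobber a real file or directory named "gh".
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(shim_binary)
    return link


def agent_env(bin_dir: Path) -> dict[str, str]:
    """Copy the current environment with bin_dir placed first on PATH."""
    env = dict(os.environ)
    env["PATH"] = f"{bin_dir}{os.pathsep}{env.get('PATH', '')}"
    return env
```

Passing agent_env(bin_dir) as the environment of the agent process means its unmodified gh calls resolve to the shim first. Running gh xcache stats afterwards shows how many of those calls were answered locally.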

#I want to triage a busy repo

Read Sync to bring data local, then Clustering and the TUI for the maintainer workflow.
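
Once a sync has run, the local database can be inspected directly. The sketch below opens ~/.config/gitcrawl/gitcrawl.db read-only and lists its tables. It makes no assumption about the schema, which is not documented in this overview; list_tables is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path

# Database location as stated in this overview.
DEFAULT_DB = Path.home() / ".config" / "gitcrawl" / "gitcrawl.db"


def list_tables(db_path: Path = DEFAULT_DB) -> list[str]:
    """Return the table names in the database, opened read-only."""
    # mode=ro keeps this helper from modifying or creating the database.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
        return [name for (name,) in rows]
    finally:
        conn.close()
```

If the database file does not exist yet, the read-only open raises sqlite3.OperationalError instead of creating an empty file, which is a quick way to tell that no sync has run.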

#I want the full reference

Commands lists every flag and JSON field. Configuration covers env vars and paths.
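
Because every command accepts --json, a small wrapper is usually enough for automation. This is a sketch only: run_json is a hypothetical helper, and the subcommand to pass and the shape of its output should be taken from the Commands reference.

```python
import json
import subprocess


def run_json(argv: list[str]) -> object:
    """Run argv with --json appended and return the parsed stdout."""
    result = subprocess.run(
        [*argv, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)
```

A call would look like run_json(["gitcrawl", subcommand]), with subcommand being one of the documented commands.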


#Project status

Early bootstrap. The implementation is being built in small commits — see the changelog for what shipped recently.

The product contract in SPEC.md is the source of truth for what is in and out of scope.

#Out of scope

  • Local HTTP API
  • Hosted service runtime
  • Browser web UI
  • GitHub write-back actions (use gh for those)

#License

Released under the MIT license.