Cache & Public-Repo Guard
Octopool owns a shared edge + D1 read-through cache for gh reads, and guards every repo route with a public-visibility check. Both keep private data out of the shared cache and reduce load on pooled identities.
Source: src/cache.ts, src/cache-policy.ts, src/cache-coalesce.ts, src/edge-cache.ts, src/public-repos.ts, src/pr-state.ts, src/maintenance.ts, migrations 0002/0003/0005/0006/0009/0011.
#Read-through edge + D1 cache
On a cacheable route the relay computes a stable cache key, checks Cloudflare's data-center-local Cache API, falls back to github_cache_entries in D1, and serves a fresh hit without touching GitHub. D1 hits warm the edge cache. On a miss it first tries a token-free public web/raw endpoint when one can produce the same shape. A successful direct repository-resource response also proves that the repository is public, avoiding a separate repository metadata request; routes that need a pooled identity still run the explicit public-repository guard first. Successful results write through to both layers.
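The lookup order described above can be sketched as follows. This is a minimal illustration and not the relay's actual code: `Layer`, `Backends`, and `readThrough` are hypothetical names, both cache layers are modeled as in-memory maps, and the public-repository guard is assumed to have run before `pooled` is called.

```typescript
// Hypothetical shapes: none of these names come from the Octopool source.
type Entry = { body: string; freshUntil: number };
type Layer = Map<string, Entry>;

interface Backends {
  // Resolves to null when no token-free endpoint can produce the same shape.
  tokenFree(key: string): Promise<string | null>;
  // Pooled read; assumed to run only after the public-repo guard has passed.
  pooled(key: string): Promise<string>;
}

type Source = "edge" | "d1" | "token-free" | "pooled";

async function readThrough(
  key: string,
  edge: Layer,
  d1: Layer,
  backends: Backends,
  now: number,
  ttlMs: number,
): Promise<{ body: string; source: Source }> {
  const edgeHit = edge.get(key);
  if (edgeHit && edgeHit.freshUntil > now) {
    return { body: edgeHit.body, source: "edge" };
  }
  const d1Hit = d1.get(key);
  if (d1Hit && d1Hit.freshUntil > now) {
    edge.set(key, d1Hit); // a D1 hit warms the edge layer
    return { body: d1Hit.body, source: "d1" };
  }
  // Miss: try the token-free path first, then fall back to a pooled identity.
  const direct = await backends.tokenFree(key);
  const body = direct !== null ? direct : await backends.pooled(key);
  const entry: Entry = { body, freshUntil: now + ttlMs };
  edge.set(key, entry); // write through to both layers
  d1.set(key, entry);
  return { body, source: direct !== null ? "token-free" : "pooled" };
}
```

Injecting the backends keeps the ordering logic testable without any network access.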
#Cache key
SHA-256 (base64url) over a stable, sorted JSON of: pool, method, path, normalized query, the vary headers, the normalized route key, and any validated state discriminator. Default pagination (page=1, per_page=30) and default JSON accept variants are folded together; custom media types and non-default query values still produce distinct entries. The key is pool-scoped, so pools never share cache entries.
PR file-list routes may include a validated route_hint.pr_head_sha or closed/merged route_hint.pr_state discriminator. Clients that already know the current PR state can use that to avoid mixing entries across head SHAs while letting Octopool keep files warm longer. Hints are first checked against GitHub and then cached briefly in github_pr_state_proofs, so repeated cache hits do not need to re-contact GitHub just to validate the hint.
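A rough sketch of how such a key could be derived is shown below. The `KeyParts` shape, the field names, and the array-based serialization are assumptions made for illustration; only the ingredients (pool, method, path, normalized query, vary headers, route key, optional state discriminator) and the SHA-256/base64url encoding come from the description above. The example uses Node's `node:crypto`; a Worker would more likely use Web Crypto.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; field names are illustrative, not Octopool's.
interface KeyParts {
  pool: string;
  method: string;
  path: string;
  query: Record<string, string>;
  vary: Record<string, string>;
  routeKey: string;
  stateDiscriminator?: string;
}

function sortedEntries(obj: Record<string, string>): [string, string][] {
  return Object.entries(obj).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Drop default pagination so equivalent requests collapse onto one key.
function normalizeQuery(query: Record<string, string>): [string, string][] {
  return sortedEntries(query).filter(
    ([k, v]) => !((k === "page" && v === "1") || (k === "per_page" && v === "30")),
  );
}

function cacheKey(parts: KeyParts): string {
  // An array has a fixed element order, so the serialization is stable.
  const stable = JSON.stringify([
    parts.pool,
    parts.method.toUpperCase(),
    parts.path,
    normalizeQuery(parts.query),
    sortedEntries(parts.vary),
    parts.routeKey,
    parts.stateDiscriminator ?? null,
  ]);
  return createHash("sha256").update(stable).digest("base64url");
}
```

Because the pool is part of the hashed input, two pools requesting the same path produce different keys, and a validated head SHA passed as `stateDiscriminator` keeps PR file lists for different heads apart.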
#What is cached
Only 200 responses on cacheable routes are stored. The cache is bypassed when:
- the route is a log route, large-payload route, or `rate_limit`, or
- the request carries a conditional header (`if-none-match`/`if-modified-since`).
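The storage and bypass rules above amount to a small predicate. The sketch below assumes a hypothetical `BypassRouteKind` classification and a plain header map; the real route classification lives in the source files listed at the top.

```typescript
// Hypothetical route kinds, for illustration only.
type BypassRouteKind = "log" | "large_payload" | "rate_limit" | "cacheable";

const CONDITIONAL_HEADERS = ["if-none-match", "if-modified-since"];

function shouldBypassCache(kind: BypassRouteKind, headers: Record<string, string>): boolean {
  if (kind !== "cacheable") return true;
  // Header names are case-insensitive, so compare in lowercase.
  return Object.keys(headers).some((name) => CONDITIONAL_HEADERS.indexOf(name.toLowerCase()) !== -1);
}

function shouldStore(kind: BypassRouteKind, headers: Record<string, string>, status: number): boolean {
  // Only 200 responses on non-bypassed routes are written to the cache.
  return status === 200 && !shouldBypassCache(kind, headers);
}
```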
#Token-free GitHub reads
Before spending a pooled identity, Octopool can use anonymous GitHub API, public page/raw, and Git smart HTTP endpoints. Successful direct repository-resource responses are themselves a public visibility proof; ambiguous search responses still require the explicit repository guard. The canonical route-by-route inventory is Token-Free GitHub Endpoints.
The main transport classes are:
- PR diff/patch media requests (`gh pr diff`, or `GET /pulls/{number}` with a diff or patch `Accept` header) via `github.com/{owner}/{repo}/pull/{number}.diff|patch`
- commit diff/patch media requests via `github.com/{owner}/{repo}/commit/{sha}.diff|patch`
- compare diff/patch media requests via `github.com/{owner}/{repo}/compare/{base...head}.diff|patch`
- supported top-level `gh run list/view` summaries (up to 25 results, with branch/status or workflow filters) and bounded `gh run view --json jobs` job/step metadata prefer public GitHub pages once anonymous API quota falls below 50%; raw API requests retain exact REST semantics, and log bodies remain authenticated
- exact public GitHub API reads without caller credentials for repo metadata, commits, compare JSON, contents, README, PRs, issues, checks/statuses, Actions run/workflow metadata, branches, tags, labels, milestones, topics, community profiles, forks, stargazers, subscribers, deployments, Git object reads, languages, contributors, licenses, release assets, GitHub metadata/license/gitignore APIs, org repository lists, org public events and members, user/gist reads, user follower/following/event/key lists, reactions, assignees, repo-wide issue/PR comments and events, commit pull/check-suite/branch/status metadata, network events, repository stats, repository search, and repo-scoped issue/commit search
- explicit-ref contents reads can prefer `raw.githubusercontent.com`, returned as an API-shaped JSON file payload, once anonymous API quota falls below 50%
- branch refs, matching branch prefixes, and annotated-tag refs can use Git smart HTTP advertisements with exact REST-compatible IDs and object metadata below 50% anonymous API quota; ambiguous lightweight tags remain API-only
- supported top-level `gh pr view` summaries and `gh workflow view` metadata can use bounded public GitHub page data below 50% anonymous API quota
- release list/latest/tag/id/asset reads via unauthenticated `api.github.com` requests so pooled credentials never expose draft releases; supported top-level `gh release view` summaries prefer public release HTML once anonymous API quota falls below 50%, while raw API requests retain exact REST semantics
Anonymous API rate snapshots are stored by GitHub resource until their advertised reset time. When a public-page/raw parser cannot satisfy a request, Octopool retains the successful anonymous API response as the fallback.
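The 50% threshold used by several transports above can be pictured as a check against the stored rate snapshot. In this sketch `RateSnapshot` and `preferPublicPage` are hypothetical names, and treating a missing or expired snapshot as "full quota" is an assumption, not documented behavior.

```typescript
// Hypothetical snapshot of anonymous quota for one GitHub resource.
interface RateSnapshot {
  limit: number;
  remaining: number;
  resetAtMs: number; // advertised reset time, in epoch milliseconds
}

function preferPublicPage(snapshot: RateSnapshot | undefined, nowMs: number): boolean {
  // Assumption: with no usable snapshot, keep using the anonymous API.
  if (!snapshot || snapshot.resetAtMs <= nowMs || snapshot.limit <= 0) return false;
  return snapshot.remaining / snapshot.limit < 0.5;
}
```

When a page or raw parser cannot satisfy the request, the caller would still return the anonymous API response it already holds, as described above.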
Successful web reads are cached in the same D1 table with no source identity. A cached web hit still re-checks that public proof covers the entry before returning it.
#TTLs
Per route kind and response state (cacheTTLSeconds):
- workflow runs, jobs, checks, check suites, and commit statuses → 30s while active; terminal payloads get 1h fresh plus up to 24h bounded stale fallback
- run/workflow lists → 30s while active, 2m when every returned run is completed; lists remain mutable because new runs can appear
- PR files with a validated state discriminator → 5m; PR commits, reviews, comments, issue comments/events/timeline, and undiscriminated PR files → 1m..5m
- repository-scoped `gh search issues|prs` shim calls use cacheable GitHub Search requests, with misses accounted against the search bucket
- closed PRs/issues → 1h; open PRs → 2m; open issues → 5m
- release lists/latest → 5m; release by tag/id → 1h
- immutable commit objects → 24h; commit lists → 5m; contents → 1h
- repo metadata → 10m; workflow metadata → 1h
- large logs, explicit log routes, `rate_limit`, and conditional requests still bypass
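Part of the table above can be expressed as a lookup. The sketch below is not `cacheTTLSeconds` itself: `TtlRouteKind`, `sketchTtl`, and the single `settled` flag (terminal CI payload, all runs completed, or closed PR/issue) are simplifications invented for illustration, and only the stale window that the list states explicitly is modeled.

```typescript
// Hypothetical, simplified route kinds covering part of the TTL table.
type TtlRouteKind =
  | "ci_detail" | "run_list" | "pull" | "issue"
  | "release_list" | "release_item"
  | "commit" | "commit_list" | "contents"
  | "repo" | "workflow";

interface Ttl {
  freshSeconds: number;
  staleSeconds: number; // 0 where the list above states no stale window
}

const MINUTE = 60;
const HOUR = 60 * MINUTE;

function sketchTtl(kind: TtlRouteKind, settled: boolean): Ttl {
  const ttl = (freshSeconds: number, staleSeconds = 0): Ttl => ({ freshSeconds, staleSeconds });
  switch (kind) {
    case "ci_detail": return settled ? ttl(HOUR, 24 * HOUR) : ttl(30);
    case "run_list": return settled ? ttl(2 * MINUTE) : ttl(30);
    case "pull": return settled ? ttl(HOUR) : ttl(2 * MINUTE);
    case "issue": return settled ? ttl(HOUR) : ttl(5 * MINUTE);
    case "release_list": return ttl(5 * MINUTE);
    case "release_item": return ttl(HOUR);
    case "commit": return ttl(24 * HOUR);
    case "commit_list": return ttl(5 * MINUTE);
    case "contents": return ttl(HOUR);
    case "repo": return ttl(10 * MINUTE);
    case "workflow": return ttl(HOUR);
  }
}
```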
#Cache-hit integrity
A fresh or bounded-stale hit is only served if:
- the source identity recorded on the entry is still an active candidate for the route (web-origin entries have no identity), and
- the repo's public-visibility proof still covers the entry (re-checked, with a small historical-proof allowance during GitHub outages or secondary rate limiting; see below).
If the eligible token-free and pooled backends are unavailable, depleted, cooling down, or rate-limited, Octopool may serve an expired public cache entry for a short route-specific grace window. Mutable CI payloads get only minutes; terminal CI payloads get up to a day; PR/issue detail routes get up to an hour; immutable-ish commit views can get up to a day. Stale serves still run the public-repo guard and active-identity check before returning.
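The serving rules above can be condensed into a single classification step. All names in this sketch (`CachedEntry`, `HitContext`, `classifyEntry`) are hypothetical, and the public-proof check is reduced to a precomputed boolean.

```typescript
// Hypothetical entry and context shapes, for illustration only.
interface CachedEntry {
  sourceIdentity: string | null; // null for web-origin entries
  freshUntilMs: number;
  staleUntilMs: number; // end of the route-specific grace window
}

interface HitContext {
  nowMs: number;
  activeIdentities: Set<string>;
  publicProofCoversEntry: boolean;
  backendsUnavailable: boolean; // unavailable, depleted, cooling down, or rate-limited
}

function classifyEntry(entry: CachedEntry, ctx: HitContext): "hit" | "stale" | "miss" {
  const identityOk =
    entry.sourceIdentity === null || ctx.activeIdentities.has(entry.sourceIdentity);
  // Both integrity checks apply to fresh and stale serves alike.
  if (!identityOk || !ctx.publicProofCoversEntry) return "miss";
  if (ctx.nowMs < entry.freshUntilMs) return "hit";
  if (ctx.backendsUnavailable && ctx.nowMs < entry.staleUntilMs) return "stale";
  return "miss";
}
```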
Cache publication is awaited before returning a miss response, closing the response/write race for immediate repeat reads. Concurrent identical misses also claim a short pool-scoped fill lease in the Durable Object; followers wait for the leader's publication and serve the resulting hit instead of duplicating the GitHub request. Public-repository proof refreshes use the same coordinator pattern, so simultaneous expired-proof checks share one GitHub request. Audit writes remain deferred. An hourly scheduled task deletes cache entries after each entry's route-specific stale_expires_at deadline in bounded batches, preserving every configured stale-serving window while keeping D1 growth bounded.
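The leader/follower behavior can be illustrated with an in-process stand-in. The real coordination uses a pool-scoped lease in a Durable Object; the `FillCoalescer` class below is a hypothetical, single-process approximation that only shows how followers share the leader's result.

```typescript
// Hypothetical single-process stand-in for the Durable Object fill lease.
class FillCoalescer<T> {
  private inFlight = new Map<string, Promise<T>>();

  run(key: string, fill: () => Promise<T>): { result: Promise<T>; role: "leader" | "follower" } {
    const existing = this.inFlight.get(key);
    if (existing) {
      // A fill is already running: wait for the leader instead of refetching.
      return { result: existing, role: "follower" };
    }
    const pending = fill();
    this.inFlight.set(key, pending);
    // Release the lease once the fill settles, whether it succeeded or failed.
    const release = (): void => { this.inFlight.delete(key); };
    pending.then(release, release);
    return { result: pending, role: "leader" };
  }
}
```

A key would combine the pool and the cache key, so that leases stay pool-scoped just as cache entries do.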
Hits are still audited, with the cached identity attributed. Each audit row records cache status as hit, stale, miss, bypass, or unknown, which powers octopool stats and the dashboard hit-rate/top-route views. Coalesced followers are marked separately. Stats count both fresh and stale hits as saved GitHub requests and expose an eligible hit rate that excludes failed misses and deliberate local fallback responses.
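One way to read the hit-rate definition above is sketched below. The `AuditRow` fields `failed` and `localFallback` and the exact eligibility rule are assumptions; the source states only that fresh and stale hits count as saved requests and that failed misses and deliberate local fallbacks are excluded.

```typescript
type CacheStatus = "hit" | "stale" | "miss" | "bypass" | "unknown";

// Hypothetical audit row; the real columns are listed under Schema.
interface AuditRow {
  cacheStatus: CacheStatus;
  failed: boolean;        // assumed flag: the upstream request failed
  localFallback: boolean; // assumed flag: answered by a deliberate local fallback
}

function summarizeHitRate(rows: AuditRow[]): { saved: number; eligible: number; hitRate: number } {
  let saved = 0;
  let eligible = 0;
  for (const row of rows) {
    if (row.localFallback) continue; // excluded from the eligible rate
    if (row.cacheStatus === "hit" || row.cacheStatus === "stale") {
      saved++; // fresh and stale hits both save a GitHub request
      eligible++;
    } else if (row.cacheStatus === "miss" && !row.failed) {
      eligible++; // only successful misses count against the rate
    }
  }
  return { saved, eligible, hitRate: eligible === 0 ? 0 : saved / eligible };
}
```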
#Public-repo guard
The shared cache and pooled identities are public-repository only. Before any repo route uses a pooled identity or a cache entry, ensurePublicGitHubRepo confirms the repo is public.
- An unauthenticated `GET /repos/{owner}/{repo}` is made against GitHub.
- If `OCTOPOOL_GITHUB_ORG_TOKEN` is configured, that server-side token is used for the check to avoid shared unauthenticated GitHub quota; Octopool still requires the response body to say `private: false`.
- `404` or `private !== false` → `403 repo_not_public`.
- If both authenticated and anonymous API checks are rate-limited or unavailable, Octopool can prove visibility from GitHub's public repository page marker without an API token.
- A successful anonymous request for a direct repository resource is also accepted as the live public proof, so a cache miss does not need a second GitHub metadata request. Search responses still run the explicit visibility check because an empty result does not prove that a `repo:` qualifier names a public repository.
- A successful public check is recorded in `github_public_repos` with a TTL (`PUBLIC_REPO_TTL_SECONDS`, default 30s) and the edge cache; subsequent cache hits reuse the fresh proof instead of re-hitting GitHub.
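The decision made from the repository metadata response could look roughly like this. It is not `ensurePublicGitHubRepo`: `GuardDecision`, `decideFromRepoResponse`, and `proofExpiresAtMs` are hypothetical names, and failing closed on any unrecognized status is an assumption.

```typescript
type GuardDecision = "public" | "deny_repo_not_public" | "try_fallback_proof";

function decideFromRepoResponse(status: number, body: unknown, rateLimited: boolean): GuardDecision {
  if (status === 200) {
    const flag =
      typeof body === "object" && body !== null
        ? (body as { private?: unknown }).private
        : undefined;
    // Only an explicit `private: false` counts as proof of public visibility.
    return flag === false ? "public" : "deny_repo_not_public";
  }
  // Rate limits and server errors say nothing about visibility, so another
  // proof (page marker or historical record) may be consulted instead.
  if (status >= 500 || rateLimited) return "try_fallback_proof";
  // Assumption: 404 and every other status fail closed.
  return "deny_repo_not_public";
}

// A recorded proof stays fresh for the configured TTL (default 30s).
function proofExpiresAtMs(checkedAtMs: number, ttlSeconds = 30): number {
  return checkedAtMs + ttlSeconds * 1000;
}
```

A `deny_repo_not_public` decision corresponds to the `403 repo_not_public` response described above.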
#Historical proof during outages
If the live public check fails with a 5xx, or a 403 with x-ratelimit-remaining: 0 (secondary rate limit), the guard may fall back to a previously recorded proof that was captured close to the cache entry's creation time (within 5s). This lets cached public data keep serving through transient GitHub failures without ever relaxing the private-repo block — a hard 404/private response always denies.
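The fallback condition can be sketched as two small checks. The names are hypothetical, header names are assumed to be lowercased already, and "within 5s" is interpreted here as an absolute difference in either direction, which the source does not specify.

```typescript
// Hypothetical record of an earlier successful public check.
interface ProofRecord {
  checkedAtMs: number;
}

function isTransientGuardFailure(status: number, headers: Record<string, string>): boolean {
  if (status >= 500 && status <= 599) return true;
  return status === 403 && headers["x-ratelimit-remaining"] === "0";
}

function historicalProofCovers(
  entryCreatedAtMs: number,
  proofs: ProofRecord[],
  windowMs = 5000,
): boolean {
  // Assumption: a proof counts if it was captured within 5s of entry creation.
  return proofs.some((p) => Math.abs(p.checkedAtMs - entryCreatedAtMs) <= windowMs);
}

function mayServeOnHistoricalProof(
  status: number,
  headers: Record<string, string>,
  entryCreatedAtMs: number,
  proofs: ProofRecord[],
): boolean {
  // A hard 404 or private response is never transient, so it always denies.
  return isTransientGuardFailure(status, headers) && historicalProofCovers(entryCreatedAtMs, proofs);
}
```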
#Schema
- `github_cache_entries`: cache key, pool, method, path, query/headers JSON, route key/kind, status, response headers JSON, body JSON, body encoding, source identity, created/fresh/stale expiration timestamps (migrations `0002` and `0011`).
- `github_public_repos`: `owner`, `repo`, `checked_at`, `expires_at` (migration `0003`).
- `github_pr_state_proofs`: short-lived validated PR head/state discriminators for state-scoped PR subresource cache keys (migration `0006`).
- `audit_events.cache_status` / `audit_events.cacheable`: per-request cache metrics (migration `0005`).
- `audit_events.fallback_reason` / `audit_events.coalesced`: local fallback classification and duplicate-fill telemetry (migration `0009`).
Secret values are never written to the cache. R2 is deferred; current routes are bounded enough to live in D1, and large Actions logs skip the cache entirely.