Robots.txt, sitemaps, and crawl directives

Rules

4XX Pages in Sitemap

Checks for sitemap URLs returning 4XX status codes

All Non-Indexed Pages

Lists all pages blocked from indexing so the exclusions can be audited

Canonical Chain

Checks for redirect chains on canonical URLs

HTML Size

Checks HTML document size against Googlebot crawl limits

Indexability Check

Identifies pages blocked from search engine indexing

Indexability Conflicts

Detects conflicting signals between robots.txt and robots meta tags or X-Robots-Tag headers (see the example after this list)

Noindex in Sitemap

Checks for noindexed pages listed in the sitemap

Pagination

Checks that paginated pages have proper canonicals

PDF Size

Checks linked PDF sizes against Googlebot's 64 MB truncation limit

Redirect Chains

Detects multi-hop redirect chains that waste crawl budget

Robots Meta Conflict

Detects conflicts between robots meta tags and robots.txt

Robots.txt

Checks if robots.txt exists and is properly configured

Schema + Noindex Conflict

Detects pages with rich result schema that are blocked from indexing

Sitemap Coverage

Checks for indexable pages that are not in the sitemap

Sitemap Domain

Checks that all sitemap URLs belong to the expected domain

Sitemap Exists

Checks if XML sitemap exists and is referenced in robots.txt

Sitemap Valid

Validates sitemap structure and URL limits
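
As an illustration of the kind of conflict the Indexability Conflicts and Robots Meta Conflict rules look for, consider a hypothetical page whose path is disallowed in robots.txt but which also carries a noindex meta tag. Because the disallow prevents the page from being crawled, the noindex directive can never be read, so the two signals contradict each other and the URL may still be indexed from links alone. The paths below are illustrative only, not output from the tool.

robots.txt
User-agent: *
Disallow: /drafts/

/drafts/post.html
<meta name="robots" content="noindex">

The general point is that crawl-blocking and index-blocking directives should not be combined on the same URL if the noindex is meant to take effect.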

Disable All Crawlability Rules

squirrel.toml
[rules]
disable = ["crawl/*"]
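
To disable a single rule rather than the whole category, the disable list presumably also accepts individual rule identifiers. The identifiers below are hypothetical examples following the crawl/ prefix implied by the glob above, not a confirmed list of rule names.

squirrel.toml
[rules]
disable = ["crawl/redirect-chains", "crawl/pdf-size"]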