API Reference

The Spider API is based on REST. It is predictable, returns JSON-encoded responses, and uses standard HTTP response codes and bearer-token authentication. The API supports bulk updates: you can work on multiple objects per request for the core endpoints.

Authentication

Include your API key in the Authorization header.

Authorization: Bearer sk-xxxx...
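A minimal sketch of building an authenticated request against the base URL. The /crawl path and payload fields follow the parameter reference below and are illustrative; the key is a placeholder, and the request is constructed but not sent.

```python
import json
import urllib.request

API_KEY = "sk-xxxx"  # placeholder API key

# Build (but do not send) an authenticated JSON request.
req = urllib.request.Request(
    "https://api.spider.cloud/crawl",
    data=json.dumps({"url": "https://example.com", "limit": 1}).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
```

Sending it is then a single `urllib.request.urlopen(req)` call, or the equivalent in your HTTP client of choice.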

Response formats

Set the Content-Type header to choose the response format:

  • application/json
  • application/xml
  • text/csv
  • application/jsonl

Prefix any path with v1 to lock the version. Requests on this page consume live credits.

Just getting started? Quickstart guide →

Not a developer? Use Spider's no-code options to get started without writing code.

Base URL
https://api.spider.cloud

Client libraries

  • OpenAPI Spec
  • llms.txt

Common Parameters

These parameters are shared across Crawl, Scrape, Unblocker, Search, Links, Screenshot, and Fetch. Click any parameter to jump to its full description in the Crawl section.

Advanced (35)

blacklist (array)

Blacklist a set of paths that you do not want to crawl. You can use regex patterns to help with the list.

block_ads (boolean, default: true)

Block advertisements when running the request as chrome or smart. This can greatly increase performance.

block_analytics (boolean, default: true)

Block analytics when running the request as chrome or smart. This can greatly increase performance.

block_stylesheets (boolean, default: true)

Block stylesheets when running the request as chrome or smart. This can greatly increase performance.

budget (object)

An object mapping paths to counters for limiting the number of pages. Use {"*":1} to crawl only the root page. The wildcard matches all routes, and you can set child paths to limit depth, e.g. { "/docs/colors": 10, "/docs/": 100 }.
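For example, a budget can cap the whole crawl or individual subtrees (payloads are illustrative; the URL is a placeholder):

```python
# Crawl only the root page: the wildcard matches all routes.
root_only = {"url": "https://example.com", "budget": {"*": 1}}

# Cap subtrees independently: at most 10 pages under /docs/colors,
# and at most 100 pages under /docs/ overall.
scoped = {
    "url": "https://example.com",
    "budget": {"/docs/colors": 10, "/docs/": 100},
}
```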

chunking_alg (object)

Use a chunking algorithm to segment your content output. Pass an object like { "type": "bysentence", "value": 2 } to split the text into an array by every 2 sentences. Works well with markdown or text formats.
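A local approximation of what bysentence chunking does to returned text; the server's sentence splitting may differ, and this sketch only illustrates the grouping behavior of { "type": "bysentence", "value": n }.

```python
import re

def chunk_by_sentence(text, n):
    # Split on sentence-ending punctuation, then group every n sentences.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]

chunks = chunk_by_sentence("One. Two. Three. Four. Five.", 2)
# chunks -> ["One. Two.", "Three. Four.", "Five."]
```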

concurrency_limit (number)

Set the concurrency limit to help balance requests for slower websites. The default is unlimited.

crawl_timeout (object)

The crawl_timeout parameter puts a maximum duration on the entire crawl. The default is 2 minutes.

The values for the timeout duration are in the object shape { secs: 300, nanos: 0 }.
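Duration objects can be built from a timedelta; the helper name here is ours, the { secs, nanos } shape is from the docs.

```python
from datetime import timedelta

def to_duration(td):
    # Spider duration objects use the shape { "secs": ..., "nanos": ... }.
    return {"secs": int(td.total_seconds()), "nanos": td.microseconds * 1000}

payload = {
    "url": "https://example.com",  # placeholder
    "crawl_timeout": to_duration(timedelta(minutes=5)),
}
```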

data_connectors (object)

Stream crawl results directly to cloud storage and data services. Configure one or more connectors to automatically receive page data as it is crawled. Supports S3, Google Cloud Storage, Google Sheets, Azure Blob Storage, and Supabase. { s3: { bucket, access_key_id, secret_access_key, region?, prefix?, content_type? }, gcs: { bucket, service_account_base64, prefix? }, google_sheets: { spreadsheet_id, service_account_base64, sheet_name? }, azure_blob: { connection_string, container, prefix? }, supabase: { url, anon_key, table }, on_find: bool, on_find_metadata: bool }
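For instance, streaming results to S3 might look like this; the bucket and credentials are placeholders, and the field names follow the shape above.

```python
payload = {
    "url": "https://example.com",
    "data_connectors": {
        "s3": {
            "bucket": "my-crawl-bucket",     # placeholder
            "access_key_id": "AKIAEXAMPLE",  # placeholder
            "secret_access_key": "secret",   # placeholder
            "region": "us-east-1",
            "prefix": "spider/",
        },
        "on_find": True,  # push each page as it is crawled
    },
}
```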

depth (number, default: 25)

The crawl limit for maximum depth. If 0, no limit will be applied.

disable_intercept (boolean, default: false)

Disable request interception when running request as chrome or smart. This may help bypass pages that use third-party scripts or external domains.

event_tracker (object)

Track the request, response, and automation output when using browser rendering. Set requests and responses in the object to capture the network output of the page. Setting automation sends detailed information, including a screenshot of each automation step, under automation_scripts.

exclude_selector (string)

A CSS query selector used to exclude content from the markup of the response.

execution_scripts (object)

Run custom JavaScript on certain paths. Requires chrome or smart request mode. The values should be in the shape "/path_or_url": "custom js".
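A sketch of the path-to-script map; the paths and scripts are illustrative.

```python
payload = {
    "url": "https://example.com",
    "request": "chrome",  # execution_scripts requires chrome or smart
    "execution_scripts": {
        # Run custom JS on matching paths before content is captured.
        "/pricing": "document.querySelector('#annual-toggle')?.click();",
        "/docs": "window.scrollTo(0, document.body.scrollHeight);",
    },
}
```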

external_domains (array)

A list of external domains to treat as one domain. You can use regex paths to include the domains. Set one of the array values to * to include all domains.

full_resources (boolean)

Crawl and download all the resources for a website.

max_credits_allowed (number)

Set the maximum number of credits to use per run. The request returns blocked by client if the initial response is empty. Credits are measured in decimal units: 10,000 credits equal one dollar (100 credits per penny).

max_credits_per_page (number)

Set the maximum number of credits to use per page. Credits are measured in decimal units: 10,000 credits equal one dollar (100 credits per penny).
metadata (boolean, default: false)

Collect metadata about the content, such as the page title, description, and keywords. This can help improve AI interoperability.

preserve_host (boolean, default: false)

Preserve the default HOST header for the client. This may help bypass pages that require a HOST header, or pages where the TLS configuration cannot be determined.

redirect_policy (string, default: Loose)

The network redirect policy to use when performing HTTP requests.

request (string, default: smart)

The request type to perform. Use smart to perform a plain HTTP request by default, upgrading to JavaScript rendering when the HTML requires it.

request_timeout (number, default: 60)

The timeout to use for the request, in seconds. Timeouts can range from 5 to 255 seconds.

root_selector (string)

The root CSS query selector to use when extracting content from the markup for the response.

run_in_background (boolean, default: false)

Run the request in the background. Useful when storing data and triggering crawls from the dashboard.

session (boolean, default: true)

Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session.

sitemap (boolean, default: false)

Include sitemap results in the crawl.

sitemap_only (boolean, default: false)

Only crawl the sitemap results.

sitemap_path (string, default: sitemap.xml)

The sitemap URL to use when sitemap is enabled.

subdomains (boolean, default: false)

Allow subdomains to be included.

tld (boolean, default: false)

Allow TLDs to be included.

user_agent (string)

Add a custom HTTP user agent to the request. By default this is set to a random agent.

wait_for (object)

The wait_for parameter allows you to specify various waiting conditions for a website operation. If provided, it contains the following sub-parameters:

The key idle_network specifies the conditions to wait for the network request to be idle within a period. It can include an optional timeout value.

The key idle_network0 specifies the conditions to wait for the network request to be idle with a max timeout. It can include an optional timeout value.

The key almost_idle_network0 specifies the conditions to wait for the network request to be almost idle with a max timeout. It can include an optional timeout value.

The key selector specifies the conditions to wait for a particular CSS selector to be found on the page. It includes an optional timeout value, and the CSS selector to wait for.

The key dom specifies the conditions to wait for a particular element to stop updating for a duration on the page. It includes an optional timeout value, and the CSS selector to wait for.

The key delay specifies a delay to wait for, with an optional timeout value.

The key page_navigations, when set to true, waits for all page navigations to be handled.

If wait_for is not provided, the default behavior is to wait for the network to be idle for 500 milliseconds. All of the durations are capped at 60 seconds.

The values for the timeout duration are in the object shape { secs: 10, nanos: 0 }.
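Putting the sub-parameters together might look like the following; the exact sub-field names inside each condition are our reading of the shapes above, so treat this as a sketch.

```python
wait_for = {
    # Wait for the network to go idle, up to 10 seconds.
    "idle_network": {"timeout": {"secs": 10, "nanos": 0}},
    # Also wait for a specific element, up to 5 seconds.
    "selector": {"selector": "main#content", "timeout": {"secs": 5, "nanos": 0}},
    # Handle all page navigations.
    "page_navigations": True,
}
payload = {"url": "https://example.com", "wait_for": wait_for}
```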

webhooks (object)

Use webhooks to get notified on events like credit depleted, new pages, metadata, and website status. { destination: string, on_credits_depleted: bool, on_credits_half_depleted: bool, on_website_status: bool, on_find: bool, on_find_metadata: bool }
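For example (the destination URL is a placeholder; field names follow the shape above):

```python
payload = {
    "url": "https://example.com",
    "webhooks": {
        "destination": "https://hooks.example.com/spider",  # placeholder
        "on_credits_depleted": True,
        "on_website_status": True,
        "on_find": True,
        "on_find_metadata": False,
    },
}
```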

whitelist (array)

Whitelist a set of paths that you want to crawl, ignoring all other routes that do not match the patterns. You can use regex patterns to help with the list.

Core (6)

disable_hints (boolean)

Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

limit (number, default: 0)

The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

lite_mode (boolean)

Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

max_size (number)

The max content size in bytes per page response. Content exceeding this limit will be truncated with a smart head/tail strategy that preserves the beginning and end of the content.

network_blacklist (string[])

Blocks matching network requests from being fetched/loaded. Use this to reduce bandwidth and noise by preventing known-unneeded third-party resources from ever being requested.

Each entry is a string match pattern (commonly a hostname, domain, or URL substring). If both whitelist and blacklist are set, whitelist takes precedence.

  • Good targets: googletagmanager.com, doubleclick.net, maps.googleapis.com
  • Prefer specific domains over broad substrings to avoid breaking essential assets.
network_whitelist (string[])

Allows only matching network requests to be fetched/loaded. Use this for a strict "allowlist-first" approach: keep the crawl lightweight while still permitting the essential scripts/styles needed for rendering and JS execution.

Each entry is a string match pattern (commonly a hostname, domain, or URL substring). When set, requests not matching any whitelist entry are blocked by default.

  • Start with first-party: example.com, cdn.example.com
  • Add only what you observe you truly need (fonts/CDNs), then iterate.

Output (16)

clean_html (boolean)

Clean the HTML of unwanted attributes.

css_extraction_map (object)

Use CSS or XPath selectors to scrape contents from the web page. Set the paths and the extraction object map to perform extractions per path or page.

encoding (string)

The type of encoding to use, such as UTF-8 or SHIFT_JIS.

filter_images (boolean)

Filter image elements from the markup.

filter_output_images (boolean)

Filter the images from the output.

filter_output_main_only (boolean, default: true)

Filter the nav, aside, and footer from the output.

filter_output_svg (boolean)

Filter the svg tags from the output.

filter_svg (boolean)

Filter SVG elements from the markup.

link_rewrite (object)

Optional URL rewrite rule applied to every discovered link before it's crawled. This lets you normalize or redirect URLs (for example, rewriting paths or mapping one host pattern to another).

The value must be a JSON object with a type field. Supported types:

  • "replace" – simple substring replacement.
    Fields:
    • host?: string (optional) – only apply when the link's host matches this value (e.g. "blog.example.com").
    • find: string – substring to search for in the URL.
    • replace_with: string – replacement substring.
  • "regex" – regex-based rewrite with capture groups.
    Fields:
    • host?: string (optional) – only apply for this host.
    • pattern: string – regex applied to the full URL.
    • replace_with: string – replacement string supporting $1, $2, etc.

Invalid or unsafe regex patterns (overly long, unbalanced parentheses, advanced lookbehind constructs, etc.) are rejected by the server and ignored.
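A local illustration of how a "replace" rule transforms discovered links; the helper is ours, and server-side matching may differ in edge cases.

```python
from urllib.parse import urlparse

def apply_replace_rule(url, rule):
    # Apply a { "type": "replace", ... } link_rewrite rule to one URL.
    host = rule.get("host")
    if host and urlparse(url).hostname != host:
        return url  # host filter did not match; leave the link alone
    return url.replace(rule["find"], rule["replace_with"])

rule = {
    "type": "replace",
    "host": "blog.example.com",
    "find": "/old/",
    "replace_with": "/new/",
}
rewritten = apply_replace_rule("https://blog.example.com/old/post", rule)
untouched = apply_replace_rule("https://shop.example.com/old/post", rule)
```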

readability (boolean, default: false)

Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage.

return_cookies (boolean, default: false)

Return the HTTP response cookies with the results.

return_embeddings (boolean, default: false)

Include OpenAI embeddings for title and description. Requires metadata to be enabled.

return_format (string | array, default: raw)

The format to return the data in. Possible values are markdown, commonmark, raw, text, xml, bytes, and empty. Use raw to return the default format of the page, such as HTML.

return_headers (boolean, default: false)

Return the HTTP response headers with the results.

return_json_data (boolean, default: false)

Return the JSON data found in scripts used for SSR.

return_page_links (boolean, default: false)

Return the links found on each page.

Config (7)

cookies (string)

Add HTTP cookies to use for the request.

fingerprint (boolean, default: true)

Use advanced fingerprint detection for chrome.

headers (object)

Forward HTTP headers to use for all requests. The object is expected to be a map of key-value pairs.

proxy ('residential' | 'mobile' | 'isp')

Select the proxy pool for this request. Leave blank to disable proxy routing. Using this param overrides all other proxy_* shorthand configurations. See the pricing table for full details. Alternatively, use Proxy-Mode to route standard HTTP traffic through Spider's proxy endpoint.

proxy_enabled (boolean, default: false)

Enable premium high-performance proxies to prevent detection and increase speed. You can also use Proxy-Mode to route requests through Spider's proxy front-end instead.

remote_proxy (string)

Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.

stealth (boolean, default: true)

Use stealth mode for headless chrome requests to help prevent being blocked.

Performance (5)

cache (boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }, default: true)

Use HTTP caching for the crawl to speed up repeated runs. Defaults to true.

Accepts either:

  • true / false
  • A cache control object:
    • maxAge (ms) — freshness window (default: 172800000 = 2 days). Set 0 for always fetch fresh.
    • allowStale — serve cached results even if stale.
    • period — RFC3339 timestamp cutoff (overrides maxAge), e.g. "2025-11-29T12:00:00Z"
    • skipBrowser — skip browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.

Default behavior by route type:

  • Standard routes (/crawl, /scrape, /unblocker) — cache is true with skipBrowser enabled by default. Cached pages return instantly without re-launching Chrome. To force a fresh browser fetch, set cache: false or { "skipBrowser": false }.
  • AI routes (/ai/crawl, /ai/scrape, etc.) — cache is true but skipBrowser is not enabled. AI routes always use the browser to ensure live page content for extraction.
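Two common cache configurations (URLs are placeholders):

```python
# Always fetch fresh content, ignoring any cached copy.
fresh = {"url": "https://example.com", "cache": {"maxAge": 0}}

# Fastest reads: accept stale cached HTML and skip the browser entirely.
instant = {
    "url": "https://example.com",
    "cache": {"allowStale": True, "skipBrowser": True},
}
```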

delay (number, default: 0)

Add a crawl delay of up to 60 seconds; setting a delay disables concurrency. The delay is specified in milliseconds.

respect_robots (boolean, default: true)

Respect the robots.txt file for crawling.

service_worker_enabled (boolean, default: true)

Allow the website to use Service Workers as needed.

skip_config_checks (boolean, default: true)

Skip checking the database for website configuration. This will increase performance for requests that use limit=1.

Automation (4)

automation_scripts (object)

Run custom web automated tasks on certain paths. Requires chrome or smart request mode.

Below are the available actions for web automation:
  • Evaluate: Runs custom JavaScript code.
    { "Evaluate": "console.log('Hello, World!');" }
  • Click: Clicks on an element identified by a CSS selector.
    { "Click": "button#submit" }
  • ClickAll: Clicks on all elements matching a CSS selector.
    { "ClickAll": "button.loadMore" }
  • ClickPoint: Clicks at the position x and y coordinates.
    { "ClickPoint": { "x": 120.5, "y": 340.25 } }
  • ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.).
    { "ClickAllClickable": true }
  • ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds.
    { "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }
  • ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds.
    { "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }
  • ClickDrag: Click-and-drag from one element to another (selector → selector) with optional modifier.
    { "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }
  • ClickDragPoint: Click-and-drag from one point to another with optional modifier.
    { "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }
  • Wait: Waits for a specified duration in milliseconds.
    { "Wait": 2000 }
  • WaitForNavigation: Waits for the next navigation event.
    { "WaitForNavigation": true }
  • WaitFor: Waits for an element to appear identified by a CSS selector.
    { "WaitFor": "div#content" }
  • WaitForWithTimeout: Waits for an element to appear with a timeout (ms).
    { "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }
  • WaitForAndClick: Waits for an element to appear and then clicks on it, identified by a CSS selector.
    { "WaitForAndClick": "button#loadMore" }
  • WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body) with timeout (ms).
    { "WaitForDom": { "selector": "main", "timeout": 12000 } }
  • ScrollX: Scrolls the screen horizontally by a specified number of pixels.
    { "ScrollX": 100 }
  • ScrollY: Scrolls the screen vertically by a specified number of pixels.
    { "ScrollY": 200 }
  • Fill: Fills an input element with a specified value.
    { "Fill": { "selector": "input#name", "value": "John Doe" } }
  • Type: Type a key into the browser with an optional modifier.
    { "Type": { "value": "John Doe", "modifier": 0 } }
  • InfiniteScroll: Scrolls the page until the end for a certain duration (ms).
    { "InfiniteScroll": 3000 }
  • Screenshot: Perform a screenshot on the page.
    { "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }
  • ValidateChain: Set this before a step to validate the prior action to break out of the chain.
    { "ValidateChain": true }
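Chained together, a path-scoped script might look like this; we assume the map shape is path → list of actions, and the flow itself is illustrative.

```python
automation_scripts = {
    "/search": [
        {"WaitFor": "input#q"},                                # wait for the box
        {"Fill": {"selector": "input#q", "value": "spider"}},  # type a query
        {"Click": "button#submit"},                            # submit
        {"WaitForNavigation": True},                           # wait for results
        {"ScrollY": 800},                                      # bring results into view
    ]
}
payload = {
    "url": "https://example.com",
    "request": "chrome",  # automation requires chrome or smart
    "automation_scripts": automation_scripts,
}
```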

evaluate_on_new_document (string)

Set a custom script to evaluate on new document creation.

scroll (number)

Infinite scroll the page as new content loads, up to a duration in milliseconds. You may still need to use the wait_for parameters. Requires chrome request mode.

viewport (object)

Configure the viewport for chrome.

Geolocation (2)

country_code (string)

Set an ISO country code for proxy connections. View the locations list for available countries.

locale (string)

The locale to use for the request, e.g. en-US.

Per-endpoint notes

Scrape and Unblocker are single-page endpoints; they exclude limit, depth, and delay.

Screenshot excludes request, return_format, and readability, and returns image data.

Every endpoint below includes these parameters in its own parameter tabs with full descriptions. This section is a quick-reference index.