API Reference
The Spider API is based on REST. It is predictable, returns JSON-encoded responses, uses standard HTTP response codes, and authenticates every request. The API supports bulk updates: you can operate on multiple objects per request for the core endpoints.
Authentication
Include your API key in the authorization header.
Authorization: Bearer sk-xxxx...
Response formats
Set the content-type header to shape the response.
Prefix any path with v1 to pin the API version. Requests on this page consume live credits.
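Putting the pieces above together, here is a minimal sketch of an authenticated, versioned request. The `sk-xxxx` key is a placeholder, and the `/crawl` path and `limit` parameter are taken from the endpoints and tables documented below:

```python
import json
import urllib.request

API_KEY = "sk-xxxx"  # placeholder; substitute your real key

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON request against the v1-pinned base URL."""
    return urllib.request.Request(
        f"https://api.spider.cloud/v1{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The request is built but not sent here; pass it to urllib.request.urlopen
# (or your HTTP client of choice) to execute it and consume live credits.
req = build_request("/crawl", {"url": "https://example.com", "limit": 1})
```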
Just getting started? Quickstart guide →
Not a developer? Use Spider's no-code options to get started without writing code.
Base URL: https://api.spider.cloud
Common Parameters
These parameters are shared across Crawl, Scrape, Unblocker, Search, Links, Screenshot, and Fetch. Click any parameter to jump to its full description in the Crawl section.
Advanced (35)
| Name | Type | Default | Description |
|---|---|---|---|
| blacklist | array | — | Blacklist a set of paths that you do not want to crawl. You can use regex patterns to help with the list. |
| block_ads | boolean | true | Block advertisements when running the request as `chrome` or `smart`. |
| block_analytics | boolean | true | Block analytics scripts when running the request as `chrome` or `smart`. |
| block_stylesheets | boolean | true | Block stylesheets when running the request as `chrome` or `smart`. |
| budget | object | — | Object that maps paths to a counter limiting the number of pages crawled per path. You can use `*` as a wildcard to set a global cap, for example `{"*": 1}` crawls only the root page. |
| chunking_alg | object | — | Use a chunking algorithm to segment your content output. Pass an object specifying the algorithm (for example, by words, lines, characters, or sentences) and the segment size. |
| concurrency_limit | number | — | Set the concurrency limit to help balance requests for slower websites. The default is unlimited. |
| crawl_timeout | object | — | The maximum duration to allow for the entire crawl. The timeout duration is an object of the shape `{"secs": number, "nanos": number}`. |
| data_connectors | object | — | Stream crawl results directly to cloud storage and data services. Configure one or more connectors to automatically receive page data as it is crawled. Supports S3, Google Cloud Storage, Google Sheets, Azure Blob Storage, and Supabase. |
| depth | number | 25 | The crawl limit for maximum depth. If set to `0`, no depth limit is applied. |
| disable_intercept | boolean | false | Disable request interception when running the request as `chrome`. |
| event_tracker | object | — | Track the request events, responses, and automation output when using browser rendering. Pass an object with boolean keys such as `requests` and `responses` selecting what to capture. |
| exclude_selector | string | — | A CSS query selector to use for ignoring content from the markup of the response. |
| execution_scripts | object | — | Run custom JavaScript on certain paths. Requires the request to run as `chrome`. |
| external_domains | array | — | A list of external domains to treat as one domain. You can use regex paths to include the domains. Set one of the array values to `*` to include all external domains. |
| full_resources | boolean | — | Crawl and download all the resources for a website. |
| max_credits_allowed | number | — | Set the maximum number of credits to use per run. If the limit is exhausted before the initial response completes, the request returns a "blocked by client" error. Credits are measured in decimal units, where 10,000 credits equal one dollar (100 credits per penny). |
| max_credits_per_page | number | — | Set the maximum number of credits to use per page. Credits are measured in decimal units, where 10,000 credits equal one dollar (100 credits per penny). |
| metadata | boolean | false | Collect metadata about the content found, such as page title, description, and keywords. This can help improve AI interoperability. |
| preserve_host | boolean | false | Preserve the default Host header for the client. This may help bypass pages that require a Host header, and cases where the TLS configuration cannot be determined. |
| redirect_policy | string | Loose | The network redirect policy to use when performing HTTP requests. Possible values are `Loose` and `Strict`. |
| request | string | smart | The request type to perform. Use `http` for a plain HTTP request, `chrome` for headless browser rendering, or `smart` to decide between the two automatically. |
| request_timeout | number | 60 | The timeout, in seconds, to use for each request. |
| root_selector | string | — | The root CSS query selector to use when extracting content from the markup for the response. |
| run_in_background | boolean | false | Run the request in the background. Useful when storing data and triggering crawls that appear in the dashboard. |
| session | boolean | true | Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session. |
| sitemap | boolean | false | Include the sitemap results to crawl. |
| sitemap_only | boolean | false | Only include the sitemap results to crawl. |
| sitemap_path | string | sitemap.xml | The sitemap URL to use when `sitemap` or `sitemap_only` is enabled. |
| subdomains | boolean | false | Allow subdomains to be included. |
| tld | boolean | false | Allow other top-level domains (TLDs) of the site to be included. |
| user_agent | string | — | Add a custom HTTP user agent to the request. By default this is set to a random agent. |
| wait_for | object | — | Wait conditions to apply before returning the page content when using browser rendering, such as waiting for a CSS selector, network idle, or a fixed delay. Timeout durations are objects of the shape `{"secs": number, "nanos": number}`. |
| webhooks | object | — | Use webhooks to get notified on events like credit depleted, new pages, metadata, and website status. |
| whitelist | array | — | Whitelist a set of paths that you want to crawl, ignoring all other routes that do not match the patterns. You can use regex patterns to help with the list. |
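As a hedged illustration, several of the parameters above combine into a single crawl payload. The parameter names come from the table; the target URL and values are arbitrary examples:

```python
# An illustrative crawl configuration using the advanced parameters above.
payload = {
    "url": "https://example.com",
    "depth": 3,                            # stop following links after 3 levels
    "blacklist": ["/login", "/admin/.*"],  # regex paths to skip
    "budget": {"*": 50},                   # cap the whole crawl at 50 pages
    "subdomains": True,                    # include subdomains
    "metadata": True,                      # collect titles, descriptions, keywords
}
```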
Core (6)
| Name | Type | Default | Description |
|---|---|---|---|
| disable_hints | boolean | — | Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs. |
| limit | number | 0 | The maximum number of pages allowed to crawl per website. Remove the value or set it to `0` to crawl without a limit. |
| lite_mode | boolean | — | Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections. |
| max_size | number | — | The max content size in bytes per page response. Content exceeding this limit will be truncated with a smart head/tail strategy that preserves the beginning and end of the content. |
| network_blacklist | string[] | — | Blocks matching network requests from being fetched/loaded. Use this to reduce bandwidth and noise by preventing known-unneeded third-party resources from ever being requested. Each entry is a string match pattern (commonly a hostname, domain, or URL substring). If both whitelist and blacklist are set, whitelist takes precedence. |
| network_whitelist | string[] | — | Allows only matching network requests to be fetched/loaded. Use this for a strict "allowlist-first" approach: keep the crawl lightweight while still permitting the essential scripts/styles needed for rendering and JS execution. Each entry is a string match pattern (commonly a hostname, domain, or URL substring). When set, requests not matching any whitelist entry are blocked by default. |
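The precedence rule described above (whitelist wins when both filters are set) can be sketched with a toy substring matcher. This is an illustration of the documented behavior, not the service's actual implementation:

```python
def is_request_allowed(url: str, whitelist=None, blacklist=None) -> bool:
    """Toy model of network filtering: whitelist, when set, takes precedence."""
    if whitelist:
        # Allowlist-first: anything not matching a whitelist entry is blocked.
        return any(pattern in url for pattern in whitelist)
    if blacklist:
        # Otherwise block anything matching a blacklist entry.
        return not any(pattern in url for pattern in blacklist)
    return True  # no filters configured

# Whitelist takes precedence even when a blacklist is also set.
allowed = is_request_allowed(
    "https://cdn.example.com/app.js",
    whitelist=["example.com"],
    blacklist=["example.com"],
)
```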
Output (16)
| Name | Type | Default | Description |
|---|---|---|---|
| clean_html | boolean | — | Clean the HTML of unwanted attributes. |
| css_extraction_map | object | — | Use CSS or XPath selectors to scrape contents from the web page. Set the paths and the extraction object map to perform extractions per path or page. |
| encoding | string | — | The type of encoding to use, such as `UTF-8` or `SHIFT_JIS`. |
| filter_images | boolean | — | Filter image elements from the markup. |
| filter_output_images | boolean | — | Filter the images from the output. |
| filter_output_main_only | boolean | true | Filter the nav, aside, and footer from the output. |
| filter_output_svg | boolean | — | Filter the svg tags from the output. |
| filter_svg | boolean | — | Filter SVG elements from the markup. |
| link_rewrite | json | — | Optional URL rewrite rule applied to every discovered link before it's crawled. This lets you normalize or redirect URLs (for example, rewriting paths or mapping one host pattern to another). The value must be a JSON object containing a regex pattern and its replacement. Invalid or unsafe regex patterns (overly long, unbalanced parentheses, advanced lookbehind constructs, etc.) are rejected by the server and ignored. |
| readability | boolean | false | Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. |
| return_cookies | boolean | false | Return the HTTP response cookies with the results. |
| return_embeddings | boolean | false | Include OpenAI embeddings for the page content with the results. |
| return_format | string \| array | raw | The format to return the data in. Possible values include `markdown`, `raw`, `text`, and `bytes`; pass an array to receive multiple formats. |
| return_headers | boolean | false | Return the HTTP response headers with the results. |
| return_json_data | boolean | false | Return the JSON data found in scripts used for SSR. |
| return_page_links | boolean | false | Return the links found on each page. |
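A hedged sketch of a payload combining the output parameters above. The `css_extraction_map` shape here is illustrative, so check the Crawl section for the exact structure:

```python
payload = {
    "url": "https://example.com/blog",
    "return_format": "markdown",      # convert page content to markdown
    "readability": True,              # pre-process content for reading
    "return_page_links": True,        # include links found on each page
    "filter_output_main_only": True,  # drop nav, aside, and footer
    # Illustrative per-path extraction map; the exact shape may differ.
    "css_extraction_map": {
        "/blog": [{"name": "headings", "selectors": ["h1", "h2"]}],
    },
}
```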
Config (7)
| Name | Type | Default | Description |
|---|---|---|---|
| cookies | string | — | Add HTTP cookies to use for requests. |
| fingerprint | boolean | true | Use advanced fingerprint detection for Chrome. |
| headers | object | — | Forward HTTP headers to use for all requests. The object is expected to be a map of key value pairs. |
| proxy | string | — | Select the proxy pool for this request: `residential`, `mobile`, or `isp`. Leave blank to disable proxy routing. Using this parameter overrides all other proxy settings. |
| proxy_enabled | boolean | false | Enable premium high-performance proxies to prevent detection and increase speed. You can also use Proxy-Mode to route requests through Spider's proxy front-end instead. |
| remote_proxy | string | — | Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy. |
| stealth | boolean | true | Use stealth mode for headless Chrome requests to help prevent being blocked. |
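The configuration options above might combine as follows. This is a sketch; the cookie string and header values are placeholders:

```python
payload = {
    "url": "https://example.com",
    "proxy_enabled": True,
    "proxy": "residential",            # or "mobile" / "isp"
    "headers": {"Accept-Language": "en-US,en;q=0.9"},
    "cookies": "session_id=abc123",    # placeholder cookie string
    "stealth": True,
    "fingerprint": True,
}
```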
Performance (5)
| Name | Type | Default | Description |
|---|---|---|---|
| cache | boolean \| object | true | Use HTTP caching for the crawl to speed up repeated runs. Defaults to `true`. Accepts either a boolean or an options object of the shape `{ maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }`. |
| delay | number | 0 | Add a crawl delay of up to 60 seconds; enabling a delay disables concurrency. The delay is specified in milliseconds. |
| respect_robots | boolean | true | Respect the robots.txt file for crawling. |
| service_worker_enabled | boolean | true | Allow the website to use Service Workers as needed. |
| skip_config_checks | boolean | true | Skip checking the database for website configuration. This increases performance for requests that use `limit=1`. |
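A sketch of the performance knobs above; the cache options object follows the shape shown in the table:

```python
payload = {
    "url": "https://example.com",
    # Options-object form of `cache`, per the shape in the table above.
    "cache": {"maxAge": 3600, "allowStale": False},
    "delay": 1500,           # milliseconds between requests; disables concurrency
    "respect_robots": True,  # honor robots.txt
}
```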
Automation (4)
| Name | Type | Default | Description |
|---|---|---|---|
| automation_scripts | object | — | Run custom automated web tasks on certain paths. Requires the request to run as `chrome`. Automation actions such as evaluating scripts, clicking, filling inputs, waiting, and scrolling are available. |
| evaluate_on_new_document | string | — | Set a custom script to evaluate on new document creation. |
| scroll | number | — | Infinite scroll the page as new content loads, up to a duration in milliseconds. You may still need to use the `wait_for` parameter to capture late-loading content. |
| viewport | object | — | Configure the viewport for chrome. |
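A hedged sketch of the browser-automation parameters above. The viewport object shape is an assumption, so verify it in the Crawl section:

```python
payload = {
    "url": "https://example.com/feed",
    "request": "chrome",                         # automation needs browser rendering
    "scroll": 10_000,                            # infinite-scroll for up to 10 seconds
    "viewport": {"width": 1280, "height": 800},  # assumed shape: width/height in px
}
```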
Geolocation (2)
| Name | Type | Default | Description |
|---|---|---|---|
| country_code | string | — | Set an ISO 3166-1 alpha-2 country code for proxy connections. View the locations list for available countries. |
| locale | string | — | The locale to use for requests, for example `en-US`. |
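Combining the two geolocation parameters, a request pinned to Germany might look like this sketch:

```python
payload = {
    "url": "https://example.com",
    "country_code": "de",  # ISO 3166-1 alpha-2 code for the proxy location
    "locale": "de-DE",     # language/region to present with the request
}
```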
Per-endpoint notes
Scrape & Unblocker are single-page endpoints; they exclude limit, depth, and delay.
Screenshot returns image data; it excludes request, return_format, and readability.
Every endpoint below includes these parameters in its own parameter tabs with full descriptions. This section is a quick-reference index.