API Reference
The Spider API is based on REST. It is predictable, returns JSON-encoded responses, uses standard HTTP response codes, and requires authentication.
Set your API secret key in the Authorization header using the format `Bearer $TOKEN`. You can set the Content-Type header to `application/json`, `application/xml`, `text/csv`, or `application/jsonl` to shape the response.
The Spider API supports bulk updates. You can work on multiple objects per request for the core API endpoints.
You can add `v1` before any path to pin that version. Executing a request on this page by pressing the Run button consumes live credits, and the response is a genuine result.
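A minimal authenticated request, sketched in Python with the `requests` library (the `/crawl` endpoint and its parameters are documented in the Crawl section below):

```python
import requests

headers = {
    "Authorization": "Bearer YOUR-API-KEY",  # your API secret key
    "Content-Type": "application/json",      # shape the response via content type
}

# Minimal crawl call; see the Crawl section for the full parameter list.
response = requests.post(
    "https://api.spider.cloud/crawl",
    headers=headers,
    json={"url": "https://spider.cloud", "limit": 1},
)
print(response.status_code, response.json())
```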
Just getting started?
Check out our development quickstart guide.
Not a developer?
Use Spider's no-code options or applications to get started and do more with your Spider account, no code required.
Base URL: https://api.spider.cloud

Client libraries
Crawl
Start crawling website(s) to collect resources. You can pass an array of objects in the request body.
Body
application/json

Crawl API - url string required
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
Crawl API - limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to `0` to crawl all pages. Defaults to `0`.
It is better to set a limit upfront on websites whose size you do not know. Re-crawling can effectively use the cache to keep costs low as new pages are found.
Crawl API - disable_hints boolean
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating `network_blacklist`/`network_whitelist` recommendations based on observed request-pattern outcomes). Hints are enabled by default for all `smart` request modes.
Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
Tip
If you're tuning filters, keep hints enabled and pair with `event_tracker` to see the complete URL list; once stable, you can flip `disable_hints` on to lock behavior.
Crawl API - request string
The request type to perform. Possible values are `http`, `chrome`, and `smart`. Use `smart` to perform an HTTP request by default, upgrading to JavaScript rendering only when it is needed for the HTML. Defaults to `smart`.
The request type greatly influences how the output will look. If the page is server-side rendered, you can stick with the defaults for the most part.
Crawl API - depth number
The maximum crawl depth. If `0`, no limit is applied. Defaults to `25`.
Depth lets you cap the distance between the base URL path and its sub-paths.
Crawl API - metadata boolean
Collect metadata about the content found, such as page title, description, and keywords. This can improve AI interoperability. Defaults to `false`.
Using metadata can help extract critical information to use for AI.
Crawl API - return_format string | array
The format to return the data in. Possible values are `markdown`, `commonmark`, `raw`, `text`, `xml`, `bytes`, and `empty`. Use `raw` to return the default format of the page, such as HTML. Defaults to `raw`.
Usually you want `markdown` or `text` for LLM processing. If you need to store the files without losing any encoding, use the `bytes` or `raw` format. PDF transformations may cost up to 1 cent per page for high accuracy.
Crawl API - readability boolean
Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. Defaults to `false`.
This uses the Safari Reader Mode algorithm to extract only the important information from the content.
Crawl API - css_extraction_map object
Use CSS or XPath selectors to scrape content from the web page. Set the paths and the extraction object map to perform extractions per path or page.
You can scrape using selectors at no extra cost.
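A sketch of what an extraction map might look like, keyed by path. The exact schema is not shown on this page; the field-name-to-selector shape below is a hypothetical illustration, so verify it against the client libraries:

```python
# Hypothetical shape for css_extraction_map: keys are URL paths, values map
# output field names to CSS/XPath selectors. Verify the real schema before use.
payload = {
    "url": "https://spider.cloud",
    "css_extraction_map": {
        "/": {
            "headlines": ["h1", "h2"],        # CSS selectors
            "nav_links": ["//nav//a/@href"],  # XPath is also supported per the docs
        }
    },
}
```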
Crawl API - proxy 'residential' | 'mobile' | 'isp'
Select the proxy pool you'd like to use for this request. Leave it blank to disable proxy routing. Supported values are:
- `residential` – entry-level residential pool.
- `mobile` – 4G / 5G mobile proxies for maximum stealth.
- `isp` (`datacenter` alias) – ISP-grade / datacenter-style routing with residential ASN.
Each pool carries a different price multiplier (from ×1.2 for `residential` up to ×2 for `mobile`). See the pricing table for full details. Using this param overrides all other `proxy_*` shorthand configurations.
Crawl API - remote_proxy string
Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.
Use your own proxy to bypass any firewall as needed or connect to private web servers.
Crawl API - cache boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }
Use HTTP caching for the crawl to speed up repeated runs. Defaults to `false`. Accepts either:
- `true`/`false`
- A cache control object:
  - `maxAge` (ms) — freshness window (default: `172800000` = 2 days). Set `0` to always fetch fresh.
  - `allowStale` — serve cached results even if stale.
  - `period` — RFC 3339 timestamp cutoff (overrides `maxAge`), e.g. `"2025-11-29T12:00:00Z"`.
  - `skipBrowser` — skip the browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.
Enabling caching can save costs on repeated runs and when using chrome to get assets on pages.
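For example, a cache control object combining the documented fields (a sketch; the values are illustrative):

```python
payload = {
    "url": "https://spider.cloud",
    "cache": {
        "maxAge": 86_400_000,  # 1 day freshness window, in milliseconds
        "allowStale": True,    # serve a cached result even if it has gone stale
        "skipBrowser": True,   # return cached HTML without launching Chrome
    },
}
```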
Crawl API - delay number
Add a crawl delay of up to 60 seconds, disabling concurrency. The delay must be given in milliseconds. Defaults to `0`, which means it is disabled.
Using a delay can help with websites that are crawled on a cron schedule and do not require immediate data retrieval.
Crawl API - respect_robots boolean
Respect the robots.txt file for crawling. Defaults to `true`.
If you have trouble crawling a website, it may be an issue with the robots.txt file. Setting the value to `false` could help. Make sure to use this config sparingly.
Crawl API - scroll number
Infinite-scroll the page as new content loads, up to a duration in milliseconds. The duration represents the maximum time you would wait while scrolling. You may still need to use the `wait_for` parameters, and you need to ensure the request is made using `chrome`.
Use the `wait_for` configuration to control how long to scroll, and `disable_intercept` to make sure you get data from the network regardless of hostname.
Crawl API - viewport object
Configure the viewport for chrome. Defaults to a random desktop viewport.
If you need to get data from a website as a mobile device, set the viewport to a phone-sized screen, e.g. `375x414`.
Crawl API - automation_scripts object
Run custom web automation tasks on certain paths. The `request` mode must be `chrome` or `smart` for the scripts to run.
Custom web automation allows you to take control of the browser with events for up to 60 seconds at a time per page.
Below are the available actions for web automation; a combined usage example follows the list:
- Evaluate: Runs custom JavaScript code. `{ "Evaluate": "console.log('Hello, World!');" }`
- Click: Clicks on an element identified by a CSS selector. `{ "Click": "button#submit" }`
- ClickAll: Clicks on all elements matching a CSS selector. `{ "ClickAll": "button.loadMore" }`
- ClickPoint: Clicks at the given x and y coordinates. `{ "ClickPoint": { "x": 120.5, "y": 340.25 } }`
- ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.). `{ "ClickAllClickable": true }`
- ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds. `{ "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }`
- ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds. `{ "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }`
- ClickDrag: Click-and-drag from one element to another (selector → selector) with an optional modifier. `{ "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }`
- ClickDragPoint: Click-and-drag from one point to another with an optional modifier. `{ "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }`
- Wait: Waits for a specified duration in milliseconds. `{ "Wait": 2000 }`
- WaitForNavigation: Waits for the next navigation event. `{ "WaitForNavigation": true }`
- WaitFor: Waits for an element, identified by a CSS selector, to appear. `{ "WaitFor": "div#content" }`
- WaitForWithTimeout: Waits for an element to appear, with a timeout (ms). `{ "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }`
- WaitForAndClick: Waits for an element, identified by a CSS selector, to appear and then clicks on it. `{ "WaitForAndClick": "button#loadMore" }`
- WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body), with a timeout (ms). `{ "WaitForDom": { "selector": "main", "timeout": 12000 } }`
- ScrollX: Scrolls the screen horizontally by a specified number of pixels. `{ "ScrollX": 100 }`
- ScrollY: Scrolls the screen vertically by a specified number of pixels. `{ "ScrollY": 200 }`
- Fill: Fills an input element with a specified value. `{ "Fill": { "selector": "input#name", "value": "John Doe" } }`
- Type: Types a key into the browser with an optional modifier. `{ "Type": { "value": "John Doe", "modifier": 0 } }`
- InfiniteScroll: Scrolls the page until the end for a given duration. `{ "InfiniteScroll": 3000 }`
- Screenshot: Takes a screenshot of the page. `{ "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }`
- ValidateChain: Set this before a step to validate the prior action and break out of the chain. `{ "ValidateChain": true }`
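Putting a few actions together (a sketch; the path-keyed top-level shape shown here is inferred from "run custom web automation tasks on certain paths" and should be verified against the client libraries):

```python
# Sketch: automation_scripts keyed by URL path, each value an ordered action list.
# The path-keyed shape is an assumption inferred from the description above.
payload = {
    "url": "https://example.com",
    "request": "chrome",  # scripts require the chrome or smart request mode
    "automation_scripts": {
        "/products": [
            {"WaitFor": "div#content"},
            {"ClickAll": "button.loadMore"},
            {"ScrollY": 1200},
            {"Wait": 2000},
            {"Screenshot": {"full_page": True, "omit_background": False, "output": "products.png"}},
        ]
    },
}
```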
Crawl API - country_code string
Set an ISO country code for proxy connections.
The country code allows you to run requests in regions where access to the website is restricted to within that specific region. View the locations list for available countries.
Crawl API - locale string
The locale to use for the request, e.g. `en-US`.
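A complete Crawl request combining several of the parameters above (a sketch in Python; all parameter names come from this section):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/crawl",
    headers={
        "Authorization": "Bearer YOUR-API-KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://spider.cloud",
        "limit": 10,                  # cap pages when the site size is unknown
        "request": "smart",           # HTTP first, JavaScript rendering if needed
        "return_format": "markdown",  # LLM-friendly output
        "metadata": True,
    },
)
for page in response.json():
    print(page["url"], page["status"], page["costs"]["total_cost"])
```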
[ { "content": "<resource>...", "error": null, "status": 200, "duration_elapsed_ms": 122, "costs": { "ai_cost": 0, "compute_cost": 0.00001, "file_cost": 0.00002, "bytes_transferred_cost": 0.00002, "total_cost": 0.00004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Scrape
Start scraping a single page on website(s) to collect resources. You can pass an array of objects in the request body.
Body
application/json

Scrape API - url string required
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
Scrape API - disable_hints boolean
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating `network_blacklist`/`network_whitelist` recommendations based on observed request-pattern outcomes). Hints are enabled by default for all `smart` request modes.
Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
Tip
If you're tuning filters, keep hints enabled and pair with `event_tracker` to see the complete URL list; once stable, you can flip `disable_hints` on to lock behavior.
Scrape API - lite_mode boolean
Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.
Scrape API - request string
The request type to perform. Possible values are `http`, `chrome`, and `smart`. Use `smart` to perform an HTTP request by default, upgrading to JavaScript rendering only when it is needed for the HTML. Defaults to `smart`.
The request type greatly influences how the output will look. If the page is server-side rendered, you can stick with the defaults for the most part.
Scrape API - metadata boolean
Collect metadata about the content found, such as page title, description, and keywords. This can improve AI interoperability. Defaults to `false`.
Using metadata can help extract critical information to use for AI.
Scrape API - session boolean
Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session. Defaults to `true`.
Scrape API - return_format string | array
The format to return the data in. Possible values are `markdown`, `commonmark`, `raw`, `text`, `xml`, `bytes`, and `empty`. Use `raw` to return the default format of the page, such as HTML. Defaults to `raw`.
Usually you want `markdown` or `text` for LLM processing. If you need to store the files without losing any encoding, use the `bytes` or `raw` format. PDF transformations may cost up to 1 cent per page for high accuracy.
Scrape API - readability boolean
Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. Defaults to `false`.
This uses the Safari Reader Mode algorithm to extract only the important information from the content.
Scrape API - css_extraction_map object
Use CSS or XPath selectors to scrape content from the web page. Set the paths and the extraction object map to perform extractions per path or page.
You can scrape using selectors at no extra cost.
Scrape API - proxy 'residential' | 'mobile' | 'isp'
Select the proxy pool you'd like to use for this request. Leave it blank to disable proxy routing. Supported values are:
- `residential` – entry-level residential pool.
- `mobile` – 4G / 5G mobile proxies for maximum stealth.
- `isp` (`datacenter` alias) – ISP-grade / datacenter-style routing with residential ASN.
Each pool carries a different price multiplier (from ×1.2 for `residential` up to ×2 for `mobile`). See the pricing table for full details. Using this param overrides all other `proxy_*` shorthand configurations.
Scrape API - remote_proxy string
Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.
Use your own proxy to bypass any firewall as needed or connect to private web servers.
Scrape API - cache boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }
Use HTTP caching for the crawl to speed up repeated runs. Defaults to `false`. Accepts either:
- `true`/`false`
- A cache control object:
  - `maxAge` (ms) — freshness window (default: `172800000` = 2 days). Set `0` to always fetch fresh.
  - `allowStale` — serve cached results even if stale.
  - `period` — RFC 3339 timestamp cutoff (overrides `maxAge`), e.g. `"2025-11-29T12:00:00Z"`.
  - `skipBrowser` — skip the browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.
Enabling caching can save costs on repeated runs and when using chrome to get assets on pages.
Scrape API - respect_robots boolean
Respect the robots.txt file for crawling. Defaults to `true`.
If you have trouble crawling a website, it may be an issue with the robots.txt file. Setting the value to `false` could help. Make sure to use this config sparingly.
Scrape API - skip_config_checks boolean
Skip checking the database for website configuration. This increases performance for requests using limit=1. Set the value to `false` in order to get the configs. Defaults to `true`.
Scrape API - scroll number
Infinite-scroll the page as new content loads, up to a duration in milliseconds. The duration represents the maximum time you would wait while scrolling. You may still need to use the `wait_for` parameters, and you need to ensure the request is made using `chrome`.
Use the `wait_for` configuration to control how long to scroll, and `disable_intercept` to make sure you get data from the network regardless of hostname.
Scrape API - viewport object
Configure the viewport for chrome. Defaults to a random desktop viewport.
If you need to get data from a website as a mobile device, set the viewport to a phone-sized screen, e.g. `375x414`.
Scrape API - automation_scripts object
Run custom web automation tasks on certain paths. The `request` mode must be `chrome` or `smart` for the scripts to run.
Custom web automation allows you to take control of the browser with events for up to 60 seconds at a time per page.
Below are the available actions for web automation:
- Evaluate: Runs custom JavaScript code. `{ "Evaluate": "console.log('Hello, World!');" }`
- Click: Clicks on an element identified by a CSS selector. `{ "Click": "button#submit" }`
- ClickAll: Clicks on all elements matching a CSS selector. `{ "ClickAll": "button.loadMore" }`
- ClickPoint: Clicks at the given x and y coordinates. `{ "ClickPoint": { "x": 120.5, "y": 340.25 } }`
- ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.). `{ "ClickAllClickable": true }`
- ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds. `{ "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }`
- ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds. `{ "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }`
- ClickDrag: Click-and-drag from one element to another (selector → selector) with an optional modifier. `{ "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }`
- ClickDragPoint: Click-and-drag from one point to another with an optional modifier. `{ "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }`
- Wait: Waits for a specified duration in milliseconds. `{ "Wait": 2000 }`
- WaitForNavigation: Waits for the next navigation event. `{ "WaitForNavigation": true }`
- WaitFor: Waits for an element, identified by a CSS selector, to appear. `{ "WaitFor": "div#content" }`
- WaitForWithTimeout: Waits for an element to appear, with a timeout (ms). `{ "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }`
- WaitForAndClick: Waits for an element, identified by a CSS selector, to appear and then clicks on it. `{ "WaitForAndClick": "button#loadMore" }`
- WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body), with a timeout (ms). `{ "WaitForDom": { "selector": "main", "timeout": 12000 } }`
- ScrollX: Scrolls the screen horizontally by a specified number of pixels. `{ "ScrollX": 100 }`
- ScrollY: Scrolls the screen vertically by a specified number of pixels. `{ "ScrollY": 200 }`
- Fill: Fills an input element with a specified value. `{ "Fill": { "selector": "input#name", "value": "John Doe" } }`
- Type: Types a key into the browser with an optional modifier. `{ "Type": { "value": "John Doe", "modifier": 0 } }`
- InfiniteScroll: Scrolls the page until the end for a given duration. `{ "InfiniteScroll": 3000 }`
- Screenshot: Takes a screenshot of the page. `{ "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }`
- ValidateChain: Set this before a step to validate the prior action and break out of the chain. `{ "ValidateChain": true }`
Scrape API - country_code string
Set an ISO country code for proxy connections.
The country code allows you to run requests in regions where access to the website is restricted to within that specific region. View the locations list for available countries.
Scrape API - locale string
The locale to use for the request, e.g. `en-US`.
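An example Scrape call (a sketch; the `/scrape` path is assumed from the section name):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/scrape",  # path assumed from the section name
    headers={"Authorization": "Bearer YOUR-API-KEY", "Content-Type": "application/json"},
    json={
        "url": "https://spider.cloud",
        "return_format": "text",
        "readability": True,  # pre-process the content for reading
        "lite_mode": True,    # cheaper, with the trade-offs noted above
    },
)
print(response.json()[0]["content"])
```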
[ { "content": "<resource>...", "error": null, "status": 200, "duration_elapsed_ms": 122, "costs": { "ai_cost": 0, "compute_cost": 0.00001, "file_cost": 0.00002, "bytes_transferred_cost": 0.00002, "total_cost": 0.00004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Unblocker
Start unblocking challenging website(s) to collect data. You can pass an array of objects in the request body. Costs an additional 10-40 credits per success.
Body
application/json

Unblocker API - url string required
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
Unblocker API - disable_hints boolean
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating `network_blacklist`/`network_whitelist` recommendations based on observed request-pattern outcomes). Hints are enabled by default for all `smart` request modes.
Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
Tip
If you're tuning filters, keep hints enabled and pair with `event_tracker` to see the complete URL list; once stable, you can flip `disable_hints` on to lock behavior.
Unblocker API - lite_mode boolean
Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.
Unblocker API - request string
The request type to perform. Possible values are `http`, `chrome`, and `smart`. Use `smart` to perform an HTTP request by default, upgrading to JavaScript rendering only when it is needed for the HTML. Defaults to `smart`.
The request type greatly influences how the output will look. If the page is server-side rendered, you can stick with the defaults for the most part.
Unblocker API - metadata boolean
Collect metadata about the content found, such as page title, description, and keywords. This can improve AI interoperability. Defaults to `false`.
Using metadata can help extract critical information to use for AI.
Unblocker API - session boolean
Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session. Defaults to `true`.
Unblocker API - return_format string | array
The format to return the data in. Possible values are `markdown`, `commonmark`, `raw`, `text`, `xml`, `bytes`, and `empty`. Use `raw` to return the default format of the page, such as HTML. Defaults to `raw`.
Usually you want `markdown` or `text` for LLM processing. If you need to store the files without losing any encoding, use the `bytes` or `raw` format. PDF transformations may cost up to 1 cent per page for high accuracy.
Unblocker API - readability boolean
Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. Defaults to `false`.
This uses the Safari Reader Mode algorithm to extract only the important information from the content.
Unblocker API - css_extraction_map object
Use CSS or XPath selectors to scrape content from the web page. Set the paths and the extraction object map to perform extractions per path or page.
You can scrape using selectors at no extra cost.
Unblocker API - proxy 'residential' | 'mobile' | 'isp'
Select the proxy pool you'd like to use for this request. Leave it blank to disable proxy routing. Supported values are:
- `residential` – entry-level residential pool.
- `mobile` – 4G / 5G mobile proxies for maximum stealth.
- `isp` (`datacenter` alias) – ISP-grade / datacenter-style routing with residential ASN.
Each pool carries a different price multiplier (from ×1.2 for `residential` up to ×2 for `mobile`). See the pricing table for full details. Using this param overrides all other `proxy_*` shorthand configurations.
Unblocker API - remote_proxy string
Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.
Use your own proxy to bypass any firewall as needed or connect to private web servers.
Unblocker API - cache boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }
Use HTTP caching for the crawl to speed up repeated runs. Defaults to `false`. Accepts either:
- `true`/`false`
- A cache control object:
  - `maxAge` (ms) — freshness window (default: `172800000` = 2 days). Set `0` to always fetch fresh.
  - `allowStale` — serve cached results even if stale.
  - `period` — RFC 3339 timestamp cutoff (overrides `maxAge`), e.g. `"2025-11-29T12:00:00Z"`.
  - `skipBrowser` — skip the browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.
Enabling caching can save costs on repeated runs and when using chrome to get assets on pages.
Unblocker API - respect_robots boolean
Respect the robots.txt file for crawling. Defaults to `true`.
If you have trouble crawling a website, it may be an issue with the robots.txt file. Setting the value to `false` could help. Make sure to use this config sparingly.
Unblocker API - skip_config_checks boolean
Skip checking the database for website configuration. This increases performance for requests using limit=1. Set the value to `false` in order to get the configs. Defaults to `true`.
Unblocker API - scroll number
Infinite-scroll the page as new content loads, up to a duration in milliseconds. The duration represents the maximum time you would wait while scrolling. You may still need to use the `wait_for` parameters, and you need to ensure the request is made using `chrome`.
Use the `wait_for` configuration to control how long to scroll, and `disable_intercept` to make sure you get data from the network regardless of hostname.
Unblocker API - viewport object
Configure the viewport for chrome. Defaults to a random desktop viewport.
If you need to get data from a website as a mobile device, set the viewport to a phone-sized screen, e.g. `375x414`.
Unblocker API - automation_scripts object
Run custom web automation tasks on certain paths. The `request` mode must be `chrome` or `smart` for the scripts to run.
Custom web automation allows you to take control of the browser with events for up to 60 seconds at a time per page.
Below are the available actions for web automation:
- Evaluate: Runs custom JavaScript code. `{ "Evaluate": "console.log('Hello, World!');" }`
- Click: Clicks on an element identified by a CSS selector. `{ "Click": "button#submit" }`
- ClickAll: Clicks on all elements matching a CSS selector. `{ "ClickAll": "button.loadMore" }`
- ClickPoint: Clicks at the given x and y coordinates. `{ "ClickPoint": { "x": 120.5, "y": 340.25 } }`
- ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.). `{ "ClickAllClickable": true }`
- ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds. `{ "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }`
- ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds. `{ "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }`
- ClickDrag: Click-and-drag from one element to another (selector → selector) with an optional modifier. `{ "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }`
- ClickDragPoint: Click-and-drag from one point to another with an optional modifier. `{ "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }`
- Wait: Waits for a specified duration in milliseconds. `{ "Wait": 2000 }`
- WaitForNavigation: Waits for the next navigation event. `{ "WaitForNavigation": true }`
- WaitFor: Waits for an element, identified by a CSS selector, to appear. `{ "WaitFor": "div#content" }`
- WaitForWithTimeout: Waits for an element to appear, with a timeout (ms). `{ "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }`
- WaitForAndClick: Waits for an element, identified by a CSS selector, to appear and then clicks on it. `{ "WaitForAndClick": "button#loadMore" }`
- WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body), with a timeout (ms). `{ "WaitForDom": { "selector": "main", "timeout": 12000 } }`
- ScrollX: Scrolls the screen horizontally by a specified number of pixels. `{ "ScrollX": 100 }`
- ScrollY: Scrolls the screen vertically by a specified number of pixels. `{ "ScrollY": 200 }`
- Fill: Fills an input element with a specified value. `{ "Fill": { "selector": "input#name", "value": "John Doe" } }`
- Type: Types a key into the browser with an optional modifier. `{ "Type": { "value": "John Doe", "modifier": 0 } }`
- InfiniteScroll: Scrolls the page until the end for a given duration. `{ "InfiniteScroll": 3000 }`
- Screenshot: Takes a screenshot of the page. `{ "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }`
- ValidateChain: Set this before a step to validate the prior action and break out of the chain. `{ "ValidateChain": true }`
Unblocker API - country_code string
Set an ISO country code for proxy connections.
The country code allows you to run requests in regions where access to the website is restricted to within that specific region. View the locations list for available countries.
Unblocker API - locale string
The locale to use for the request, e.g. `en-US`.
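An example Unblocker call (a sketch; the `/unblocker` path is assumed from the section name):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/unblocker",  # path assumed from the section name
    headers={"Authorization": "Bearer YOUR-API-KEY", "Content-Type": "application/json"},
    json={
        "url": "https://example.com/protected",
        "proxy": "residential",
        "country_code": "us",
    },
)
result = response.json()[0]
print(result["status"], result["cookies"], result["headers"])
```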
[ { "url": "https://spider.cloud", "status": 200, "cookies": { "a": "something", "b": "something2" }, "headers": { "x-id": 123, "x-cookie": 123 }, "status": 200, "costs": { "ai_cost": 0.001, "ai_cost_formatted": "0.0010", "bytes_transferred_cost": 3.1649999999999997e-9, "bytes_transferred_cost_formatted": "0.0000000031649999999999997240", "compute_cost": 0.0, "compute_cost_formatted": "0", "file_cost": 0.000029291250000000002, "file_cost_formatted": "0.0000292912499999999997868372", "total_cost": 0.0010292944150000001, "total_cost_formatted": "0.0010292944149999999997865612", "transform_cost": 0.0, "transform_cost_formatted": "0" }, "content": "<html>...</html>", "error": null }, // more content... ]
Search
Perform a Google search to gather a list of websites for crawling and resource collection, including fallback options if the query yields no results. You can pass an array of objects in the request body.
Body
application/json

Search API - search string required
The search query to run.
Search API - limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to `0` to crawl all pages. Defaults to `0`.
It is better to set a limit upfront on websites whose size you do not know. Re-crawling can effectively use the cache to keep costs low as new pages are found.
Search API - quick_search boolean
Prioritize speed over output quantity.
Search API - search_limit number
The maximum number of URLs to fetch or crawl from the search results. Remove the value or set it to `0` to crawl all URLs from the realtime search results. This is a shorthand if you do not want to use `num`.
Search API - fetch_page_content boolean
Fetch the full content of the result websites by performing crawls. If this is disabled, only the search results are returned, with the meta `title` and `description`. Defaults to `false`.
Search API - request string
The request type to perform. Possible values are `http`, `chrome`, and `smart`. Use `smart` to perform an HTTP request by default, upgrading to JavaScript rendering only when it is needed for the HTML. Defaults to `smart`.
The request type greatly influences how the output will look. If the page is server-side rendered, you can stick with the defaults for the most part.
Search API - country string
The country code to use for the search. It's a two-letter country code (e.g. `us` for the United States).
Search API - location string
The location from where you want the search to originate.
Search API - language string
The language to use for the search. It's a two-letter language code (e.g., `en` for English).
Search API - return_format string | array
The format to return the data in. Possible values are `markdown`, `commonmark`, `raw`, `text`, `xml`, `bytes`, and `empty`. Use `raw` to return the default format of the page, such as HTML. Defaults to `raw`.
Usually you want `markdown` or `text` for LLM processing. If you need to store the files without losing any encoding, use the `bytes` or `raw` format. PDF transformations may cost up to 1 cent per page for high accuracy.
Search API - readability boolean
Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. Defaults to `false`.
This uses the Safari Reader Mode algorithm to extract only the important information from the content.
Search API - css_extraction_map object
Use CSS or XPath selectors to scrape content from the web page. Set the paths and the extraction object map to perform extractions per path or page.
You can scrape using selectors at no extra cost.
Search API - proxy 'residential' | 'mobile' | 'isp'
Select the proxy pool you'd like to use for this request. Leave it blank to disable proxy routing. Supported values are:
- `residential` – entry-level residential pool.
- `mobile` – 4G / 5G mobile proxies for maximum stealth.
- `isp` (`datacenter` alias) – ISP-grade / datacenter-style routing with residential ASN.
Each pool carries a different price multiplier (from ×1.2 for `residential` up to ×2 for `mobile`). See the pricing table for full details. Using this param overrides all other `proxy_*` shorthand configurations.
Search API - remote_proxy string
Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.
Use your own proxy to bypass any firewall as needed or connect to private web servers.
Search API - cache boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }
Use HTTP caching for the crawl to speed up repeated runs. Defaults to `false`. Accepts either:
- `true`/`false`
- A cache control object:
  - `maxAge` (ms) — freshness window (default: `172800000` = 2 days). Set `0` to always fetch fresh.
  - `allowStale` — serve cached results even if stale.
  - `period` — RFC 3339 timestamp cutoff (overrides `maxAge`), e.g. `"2025-11-29T12:00:00Z"`.
  - `skipBrowser` — skip the browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.
Enabling caching can save costs on repeated runs and when using chrome to get assets on pages.
Search API - delay number
Add a crawl delay of up to 60 seconds, disabling concurrency. The delay must be given in milliseconds. Defaults to `0`, which means it is disabled.
Using a delay can help with websites that are crawled on a cron schedule and do not require immediate data retrieval.
Search API - respect_robots boolean
Respect the robots.txt file for crawling. Defaults to `true`.
If you have trouble crawling a website, it may be an issue with the robots.txt file. Setting the value to `false` could help. Make sure to use this config sparingly.
Search API - scroll number
Infinite-scroll the page as new content loads, up to a duration in milliseconds. The duration represents the maximum time you would wait while scrolling. You may still need to use the `wait_for` parameters, and you need to ensure the request is made using `chrome`.
Use the `wait_for` configuration to control how long to scroll, and `disable_intercept` to make sure you get data from the network regardless of hostname.
Search API - viewport object
Configure the viewport for chrome. Defaults to a random desktop viewport.
If you need to get data from a website as a mobile device, set the viewport to a phone-sized screen, e.g. `375x414`.
Search API - automation_scripts object
Run custom web automation tasks on certain paths. The `request` mode must be `chrome` or `smart` for the scripts to run.
Custom web automation allows you to take control of the browser with events for up to 60 seconds at a time per page.
Below are the available actions for web automation:
- Evaluate: Runs custom JavaScript code. `{ "Evaluate": "console.log('Hello, World!');" }`
- Click: Clicks on an element identified by a CSS selector. `{ "Click": "button#submit" }`
- ClickAll: Clicks on all elements matching a CSS selector. `{ "ClickAll": "button.loadMore" }`
- ClickPoint: Clicks at the given x and y coordinates. `{ "ClickPoint": { "x": 120.5, "y": 340.25 } }`
- ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.). `{ "ClickAllClickable": true }`
- ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds. `{ "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }`
- ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds. `{ "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }`
- ClickDrag: Click-and-drag from one element to another (selector → selector) with an optional modifier. `{ "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }`
- ClickDragPoint: Click-and-drag from one point to another with an optional modifier. `{ "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }`
- Wait: Waits for a specified duration in milliseconds. `{ "Wait": 2000 }`
- WaitForNavigation: Waits for the next navigation event. `{ "WaitForNavigation": true }`
- WaitFor: Waits for an element, identified by a CSS selector, to appear. `{ "WaitFor": "div#content" }`
- WaitForWithTimeout: Waits for an element to appear, with a timeout (ms). `{ "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }`
- WaitForAndClick: Waits for an element, identified by a CSS selector, to appear and then clicks on it. `{ "WaitForAndClick": "button#loadMore" }`
- WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body), with a timeout (ms). `{ "WaitForDom": { "selector": "main", "timeout": 12000 } }`
- ScrollX: Scrolls the screen horizontally by a specified number of pixels. `{ "ScrollX": 100 }`
- ScrollY: Scrolls the screen vertically by a specified number of pixels. `{ "ScrollY": 200 }`
- Fill: Fills an input element with a specified value. `{ "Fill": { "selector": "input#name", "value": "John Doe" } }`
- Type: Types a key into the browser with an optional modifier. `{ "Type": { "value": "John Doe", "modifier": 0 } }`
- InfiniteScroll: Scrolls the page until the end for a given duration. `{ "InfiniteScroll": 3000 }`
- Screenshot: Takes a screenshot of the page. `{ "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }`
- ValidateChain: Set this before a step to validate the prior action and break out of the chain. `{ "ValidateChain": true }`
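An example Search call (a sketch; the `/search` path is assumed from the section name):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/search",  # path assumed from the section name
    headers={"Authorization": "Bearer YOUR-API-KEY", "Content-Type": "application/json"},
    json={
        "search": "sports",
        "search_limit": 10,
        "fetch_page_content": False,  # titles/descriptions/urls only
        "country": "us",
        "language": "en",
    },
)
for item in response.json()["content"]:
    print(item["title"], item["url"])
```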
{ "content": [ { "description": "Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.", "title": "ESPN - Serving Sports Fans. Anytime. Anywhere.", "url": "https://www.espn.com/" }, { "description": "Sports Illustrated, SI.com provides sports news, expert analysis, highlights, stats and scores for the NFL, NBA, MLB, NHL, college football, soccer, ...", "title": "Sports Illustrated", "url": "https://www.si.com/" }, { "description": "CBS Sports features live scoring, news, stats, and player info for NFL football, MLB baseball, NBA basketball, NHL hockey, college basketball and football.", "title": "CBS Sports - News, Live Scores, Schedules, Fantasy ...", "url": "https://www.cbssports.com/" }, { "description": "Sport is a form of physical activity or game. Often competitive and organized, sports use, maintain, or improve physical ability and skills.", "title": "Sport", "url": "https://en.wikipedia.org/wiki/Sport" }, { "description": "Watch FOX Sports and view live scores, odds, team news, player news, streams, videos, stats, standings & schedules covering NFL, MLB, NASCAR, WWE, NBA, NHL, ...", "title": "FOX Sports News, Scores, Schedules, Odds, Shows, Streams ...", "url": "https://www.foxsports.com/" }, { "description": "Founded in 1974 by tennis legend, Billie Jean King, the Women's Sports Foundation is dedicated to creating leaders by providing girls access to sports.", "title": "Women's Sports Foundation: Home", "url": "https://www.womenssportsfoundation.org/" }, { "description": "List of sports · Running. Marathon · Sprint · Mascot race · Airsoft · Laser tag · Paintball · Bobsleigh · Jack jumping · Luge · Shovel racing · Card stacking ...", "title": "List of sports", "url": "https://en.wikipedia.org/wiki/List_of_sports" }, { "description": "Stay up-to-date with the latest sports news and scores from NBC Sports.", "title": "NBC Sports - news, scores, stats, rumors, videos, and more", "url": "https://www.nbcsports.com/" }, { "description": "r/sports: Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.", "title": "r/sports", "url": "https://www.reddit.com/r/sports/" }, { "description": "The A-Z of sports covered by the BBC Sport team. Find all the latest live sports coverage, breaking news, results, scores, fixtures, tables, ...", "title": "AZ Sport", "url": "https://www.bbc.com/sport/all-sports" } ] }
Links
Start crawling website(s) to collect the links found. You can pass an array of objects in the request body. This endpoint can save on latency if you only need to index the content URLs.
Body
application/json

Get API - url string required
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
Get API - limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to `0` to crawl all pages. Defaults to `0`.
It is better to set a limit upfront on websites whose size you do not know. Re-crawling can effectively use the cache to keep costs low as new pages are found.
Get API - disable_hints boolean
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating `network_blacklist`/`network_whitelist` recommendations based on observed request-pattern outcomes). Hints are enabled by default for all `smart` request modes.
Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
Tip
If you're tuning filters, keep hints enabled and pair with `event_tracker` to see the complete URL list; once stable, you can flip `disable_hints` on to lock behavior.
Get API - request string
The request type to perform. Possible values are `http`, `chrome`, and `smart`. Use `smart` to perform an HTTP request by default, upgrading to JavaScript rendering only when it is needed for the HTML. Defaults to `smart`.
The request type greatly influences how the output will look. If the page is server-side rendered, you can stick with the defaults for the most part.
Get API - depth number
The maximum crawl depth. If `0`, no limit is applied. Defaults to `25`.
Depth lets you cap the distance between the base URL path and its sub-paths.
Get API - metadata boolean
Collect metadata about the content found, such as page title, description, and keywords. This can improve AI interoperability. Defaults to `false`.
Using metadata can help extract critical information to use for AI.
Get API - return_format string | array
The format to return the data in. Possible values are `markdown`, `commonmark`, `raw`, `text`, `xml`, `bytes`, and `empty`. Use `raw` to return the default format of the page, such as HTML. Defaults to `raw`.
Usually you want `markdown` or `text` for LLM processing. If you need to store the files without losing any encoding, use the `bytes` or `raw` format. PDF transformations may cost up to 1 cent per page for high accuracy.
Get API - readability boolean
Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. Defaults to `false`.
This uses the Safari Reader Mode algorithm to extract only the important information from the content.
Get API - css_extraction_map object
Use CSS or XPath selectors to scrape content from the web page. Set the paths and the extraction object map to perform extractions per path or page.
You can scrape using selectors at no extra cost.
Get API - proxy 'residential' | 'mobile' | 'isp'
Select the proxy pool you'd like to use for this request. Leave it blank to disable proxy routing. Supported values are:
- `residential` – entry-level residential pool.
- `mobile` – 4G / 5G mobile proxies for maximum stealth.
- `isp` (`datacenter` alias) – ISP-grade / datacenter-style routing with residential ASN.
Each pool carries a different price multiplier (from ×1.2 for `residential` up to ×2 for `mobile`). See the pricing table for full details. Using this param overrides all other `proxy_*` shorthand configurations.
Get API - remote_proxy string
Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.
Use your own proxy to bypass any firewall as needed or connect to private web servers.
Get API - cache boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }
Use HTTP caching for the crawl to speed up repeated runs. Defaults to `false`. Accepts either:
- `true`/`false`
- A cache control object:
  - `maxAge` (ms) — freshness window (default: `172800000` = 2 days). Set `0` to always fetch fresh.
  - `allowStale` — serve cached results even if stale.
  - `period` — RFC 3339 timestamp cutoff (overrides `maxAge`), e.g. `"2025-11-29T12:00:00Z"`.
  - `skipBrowser` — skip the browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.
Enabling caching can save costs on repeated runs and when using chrome to get assets on pages.
Get API - delay number
Add a crawl delay of up to 60 seconds, disabling concurrency. The delay must be given in milliseconds. Defaults to `0`, which means it is disabled.
Using a delay can help with websites that are crawled on a cron schedule and do not require immediate data retrieval.
Get API - respect_robots boolean
Respect the robots.txt file for crawling. Defaults to `true`.
If you have trouble crawling a website, it may be an issue with the robots.txt file. Setting the value to `false` could help. Make sure to use this config sparingly.
Get API - scroll number
Infinite-scroll the page as new content loads, up to a duration in milliseconds. The duration represents the maximum time you would wait while scrolling. You may still need to use the `wait_for` parameters, and you need to ensure the request is made using `chrome`.
Use the `wait_for` configuration to control how long to scroll, and `disable_intercept` to make sure you get data from the network regardless of hostname.
Get API - viewport object
Configure the viewport for chrome. Defaults to a random desktop viewport.
If you need to get data from a website as a mobile device, set the viewport to a phone-sized screen, e.g. `375x414`.
Get API - automation_scripts object
Run custom web automation tasks on certain paths. The `request` mode must be `chrome` or `smart` for the scripts to run.
Custom web automation allows you to take control of the browser with events for up to 60 seconds at a time per page.
Below are the available actions for web automation:
- Evaluate: Runs custom JavaScript code. `{ "Evaluate": "console.log('Hello, World!');" }`
- Click: Clicks on an element identified by a CSS selector. `{ "Click": "button#submit" }`
- ClickAll: Clicks on all elements matching a CSS selector. `{ "ClickAll": "button.loadMore" }`
- ClickPoint: Clicks at the given x and y coordinates. `{ "ClickPoint": { "x": 120.5, "y": 340.25 } }`
- ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.). `{ "ClickAllClickable": true }`
- ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds. `{ "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }`
- ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds. `{ "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }`
- ClickDrag: Click-and-drag from one element to another (selector → selector) with an optional modifier. `{ "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }`
- ClickDragPoint: Click-and-drag from one point to another with an optional modifier. `{ "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }`
- Wait: Waits for a specified duration in milliseconds. `{ "Wait": 2000 }`
- WaitForNavigation: Waits for the next navigation event. `{ "WaitForNavigation": true }`
- WaitFor: Waits for an element, identified by a CSS selector, to appear. `{ "WaitFor": "div#content" }`
- WaitForWithTimeout: Waits for an element to appear, with a timeout (ms). `{ "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }`
- WaitForAndClick: Waits for an element, identified by a CSS selector, to appear and then clicks on it. `{ "WaitForAndClick": "button#loadMore" }`
- WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body), with a timeout (ms). `{ "WaitForDom": { "selector": "main", "timeout": 12000 } }`
- ScrollX: Scrolls the screen horizontally by a specified number of pixels. `{ "ScrollX": 100 }`
- ScrollY: Scrolls the screen vertically by a specified number of pixels. `{ "ScrollY": 200 }`
- Fill: Fills an input element with a specified value. `{ "Fill": { "selector": "input#name", "value": "John Doe" } }`
- Type: Types a key into the browser with an optional modifier. `{ "Type": { "value": "John Doe", "modifier": 0 } }`
- InfiniteScroll: Scrolls the page until the end for a given duration. `{ "InfiniteScroll": 3000 }`
- Screenshot: Takes a screenshot of the page. `{ "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }`
- ValidateChain: Set this before a step to validate the prior action and break out of the chain. `{ "ValidateChain": true }`
Get API - country_code string
Set an ISO country code for proxy connections.
The country code allows you to run requests in regions where access to the website is restricted to within that specific region. View the locations list for available countries.
Get API - locale string
The locale to use for the request, e.g. `en-US`.
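An example Links call (a sketch; the `/links` path is assumed from the section name):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/links",  # path assumed from the section name
    headers={"Authorization": "Bearer YOUR-API-KEY", "Content-Type": "application/json"},
    json={"url": "https://spider.cloud", "limit": 50, "depth": 2},
)
for link in response.json():
    print(link["url"], link["status"])
```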
[ { "url": "https://spider.cloud", "status": 200, "duration_elasped_ms": 112 "error": null }, // more content... ]
Screenshot
Take screenshots of a website as base64 or binary encoding. You can pass an array of objects in the request body.
Body
application/json

Screenshot API - url string required
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, improve performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
Screenshot API - limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to `0` to crawl all pages. Defaults to `0`.
It is better to set a limit upfront on websites whose size you do not know. Re-crawling can effectively use the cache to keep costs low as new pages are found.
Screenshot API - disable_hints boolean
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating `network_blacklist`/`network_whitelist` recommendations based on observed request-pattern outcomes). Hints are enabled by default for all `smart` request modes.
Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
Tip
If you're tuning filters, keep hints enabled and pair with `event_tracker` to see the complete URL list; once stable, you can flip `disable_hints` on to lock behavior.
Screenshot API - depth number
The maximum crawl depth. If `0`, no limit is applied. Defaults to `25`.
Depth lets you cap the distance between the base URL path and its sub-paths.
Screenshot API - metadata boolean
Collect metadata about the content found, such as page title, description, and keywords. This can improve AI interoperability. Defaults to `false`.
Using metadata can help extract critical information to use for AI.
Screenshot API - session boolean
Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session. Defaults to `true`.
Screenshot API - css_extraction_map object
Use CSS or XPath selectors to scrape content from the web page. Set the paths and the extraction object map to perform extractions per path or page.
You can scrape using selectors at no extra cost.
Screenshot API - link_rewrite json
Optional URL rewrite rule applied to every discovered link before it's crawled. This lets you normalize or redirect URLs (for example, rewriting paths or mapping one host pattern to another).
The value must be a JSON object with a `type` field. Supported types:
- `"replace"` – simple substring replacement. Fields:
  - `host?: string` (optional) – only apply when the link's host matches this value (e.g. `"blog.example.com"`).
  - `find: string` – substring to search for in the URL.
  - `replace_with: string` – replacement substring.
- `"regex"` – regex-based rewrite with capture groups. Fields:
  - `host?: string` (optional) – only apply for this host.
  - `pattern: string` – regex applied to the full URL.
  - `replace_with: string` – replacement string supporting `$1`, `$2`, etc.
Invalid or unsafe regex patterns (overly long, unbalanced parentheses, advanced lookbehind constructs, etc.) are rejected by the server and ignored. Example rules are sketched below.
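Both rule types, built from the documented fields (a sketch; the example hosts and paths are illustrative):

```python
# Substring replacement, limited to one host.
replace_rule = {
    "type": "replace",
    "host": "blog.example.com",  # optional: only rewrite links on this host
    "find": "/old-path/",
    "replace_with": "/new-path/",
}

# Regex rewrite using capture groups against the full URL.
regex_rule = {
    "type": "regex",
    "pattern": r"^https://(\w+)\.example\.com/(.*)$",
    "replace_with": "https://example.com/$1/$2",  # $1, $2 reference capture groups
}

payload = {"url": "https://example.com", "link_rewrite": regex_rule}
```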
Screenshot API - clean_html boolean
Clean the HTML of unwanted attributes.
Screenshot API - proxy 'residential' | 'mobile' | 'isp'
Select the proxy pool you'd like to use for this request. Leave it blank to disable proxy routing. Supported values are:
- `residential` – entry-level residential pool.
- `mobile` – 4G / 5G mobile proxies for maximum stealth.
- `isp` (`datacenter` alias) – ISP-grade / datacenter-style routing with residential ASN.
Each pool carries a different price multiplier (from ×1.2 for `residential` up to ×2 for `mobile`). See the pricing table for full details. Using this param overrides all other `proxy_*` shorthand configurations.
Screenshot API - remote_proxy string
Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.
Use your own proxy to bypass any firewall as needed or connect to private web servers.
Screenshot API - cache boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }
Use HTTP caching for the crawl to speed up repeated runs. Defaults to `false`. Accepts either:
- `true`/`false`
- A cache control object:
  - `maxAge` (ms) — freshness window (default: `172800000` = 2 days). Set `0` to always fetch fresh.
  - `allowStale` — serve cached results even if stale.
  - `period` — RFC 3339 timestamp cutoff (overrides `maxAge`), e.g. `"2025-11-29T12:00:00Z"`.
  - `skipBrowser` — skip the browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.
Enabling caching can save costs on repeated runs and when using chrome to get assets on pages.
Screenshot API - delay number
Add a crawl delay of up to 60 seconds, disabling concurrency. The delay must be given in milliseconds. Defaults to `0`, which means it is disabled.
Using a delay can help with websites that are crawled on a cron schedule and do not require immediate data retrieval.
Screenshot API - respect_robots boolean
Respect the robots.txt file for crawling. Defaults to `true`.
If you have trouble crawling a website, it may be an issue with the robots.txt file. Setting the value to `false` could help. Make sure to use this config sparingly.
Screenshot API - scroll number
Infinite-scroll the page as new content loads, up to a duration in milliseconds. The duration represents the maximum time you would wait while scrolling. You may still need to use the `wait_for` parameters, and you need to ensure the request is made using `chrome`.
Use the `wait_for` configuration to control how long to scroll, and `disable_intercept` to make sure you get data from the network regardless of hostname.
Screenshot API - viewport object
Configure the viewport for chrome. Defaults to a random desktop viewport.
If you need to get data from a website as a mobile device, set the viewport to a phone-sized screen, e.g. `375x414`.
Screenshot API - automation_scripts object
Run custom web automation tasks on certain paths. The `request` mode must be `chrome` or `smart` for the scripts to run.
Custom web automation allows you to take control of the browser with events for up to 60 seconds at a time per page.
Below are the available actions for web automation:
- Evaluate: Runs custom JavaScript code. `{ "Evaluate": "console.log('Hello, World!');" }`
- Click: Clicks on an element identified by a CSS selector. `{ "Click": "button#submit" }`
- ClickAll: Clicks on all elements matching a CSS selector. `{ "ClickAll": "button.loadMore" }`
- ClickPoint: Clicks at the given x and y coordinates. `{ "ClickPoint": { "x": 120.5, "y": 340.25 } }`
- ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.). `{ "ClickAllClickable": true }`
- ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds. `{ "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }`
- ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds. `{ "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }`
- ClickDrag: Click-and-drag from one element to another (selector → selector) with an optional modifier. `{ "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }`
- ClickDragPoint: Click-and-drag from one point to another with an optional modifier. `{ "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }`
- Wait: Waits for a specified duration in milliseconds. `{ "Wait": 2000 }`
- WaitForNavigation: Waits for the next navigation event. `{ "WaitForNavigation": true }`
- WaitFor: Waits for an element, identified by a CSS selector, to appear. `{ "WaitFor": "div#content" }`
- WaitForWithTimeout: Waits for an element to appear, with a timeout (ms). `{ "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }`
- WaitForAndClick: Waits for an element, identified by a CSS selector, to appear and then clicks on it. `{ "WaitForAndClick": "button#loadMore" }`
- WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body), with a timeout (ms). `{ "WaitForDom": { "selector": "main", "timeout": 12000 } }`
- ScrollX: Scrolls the screen horizontally by a specified number of pixels. `{ "ScrollX": 100 }`
- ScrollY: Scrolls the screen vertically by a specified number of pixels. `{ "ScrollY": 200 }`
- Fill: Fills an input element with a specified value. `{ "Fill": { "selector": "input#name", "value": "John Doe" } }`
- Type: Types a key into the browser with an optional modifier. `{ "Type": { "value": "John Doe", "modifier": 0 } }`
- InfiniteScroll: Scrolls the page until the end for a given duration. `{ "InfiniteScroll": 3000 }`
- Screenshot: Takes a screenshot of the page. `{ "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }`
- ValidateChain: Set this before a step to validate the prior action and break out of the chain. `{ "ValidateChain": true }`
Screenshot API - country_code string
Set an ISO country code for proxy connections.
The country code allows you to run requests in regions where access to the website is restricted to within that specific region. View the locations list for available countries.
Screenshot API - locale string
The locale to use for the request, e.g. `en-US`.
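An example Screenshot call (a sketch; the `/screenshot` path is assumed from the section name, and the viewport `width`/`height` keys are an assumed shape):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/screenshot",  # path assumed from the section name
    headers={"Authorization": "Bearer YOUR-API-KEY", "Content-Type": "application/json"},
    json={
        "url": "https://spider.cloud",
        "viewport": {"width": 375, "height": 414},  # width/height keys assumed
    },
)
print(response.json()[0]["status"])
```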
[ { "content": "<resource>...", "error": null, "status": 200, "duration_elapsed_ms": 122, "costs": { "ai_cost": 0, "compute_cost": 0.00001, "file_cost": 0.00002, "bytes_transferred_cost": 0.00002, "total_cost": 0.00004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Transform HTML
Transform HTML into Markdown or plain text quickly. Each HTML transformation starts at 0.1 credits, while PDF transformations can cost up to 10 credits per page. You can submit up to 10 MB of data per request. The Transform API is also integrated into the /crawl endpoint via the return_format parameter.
Body
application/json

Transform API - data object required
A list of HTML data to transform. Each object in the list takes the keys `html` and `url`. The `url` key is optional and is only used when readability is enabled.
Transform API - return_format string | array
The format to return the data in. Possible values are `markdown`, `commonmark`, `raw`, `text`, `xml`, `bytes`, and `empty`. Use `raw` to return the default format of the page, such as HTML. Defaults to `raw`.
Usually you want `markdown` or `text` for LLM processing. If you need to store the files without losing any encoding, use the `bytes` or `raw` format. PDF transformations may cost up to 1 cent per page for high accuracy.
Transform API - readability boolean
Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. Defaults to `false`.
This uses the Safari Reader Mode algorithm to extract only the important information from the content.
Transform API - clean_full boolean
Fully clean the HTML of unwanted attributes.
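An example Transform call built from the documented fields (a sketch; the `/transform` path is assumed from the section name):

```python
import requests

response = requests.post(
    "https://api.spider.cloud/transform",  # path assumed from the section name
    headers={"Authorization": "Bearer YOUR-API-KEY", "Content-Type": "application/json"},
    json={
        "data": [
            {
                "html": "<html><body><h1>Example Website</h1></body></html>",
                "url": "https://example.com",  # optional; used when readability is on
            }
        ],
        "return_format": "markdown",
        "readability": True,
    },
)
print(response.json()["content"][0])
```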
{ "content": [ "# Example Website This is some example markup to use to test the transform function. [Guides](https://spider.cloud/guides)" ], "cost": { "ai_cost": 0, "compute_cost": 0, "file_cost": 0, "bytes_transferred_cost": 0, "total_cost": 0, "transform_cost": 0.0001 }, "error": null, "status": 200 }
Proxy-Mode
Spider also offers a proxy front-end to the service. The Spider proxy handles requests just like any standard request, with the option to use high-performance and residential proxies at up to 10 GB/s. Take a look at all of our proxy locations to see if we support the country.
- **HTTP address**: proxy.spider.cloud:80
- **HTTPS address**: proxy.spider.cloud:443
- **Username**: YOUR-API-KEY
- **Password**: PARAMETERS

Residential
- Speed: Up to 1GB/s
- Purpose: Real-User IPs, Global Reach, High Anonymity
- Cost: $1/GB - $4/GB
ISP
- Speed: Up to 10GB/s
- Purpose: Stable Datacenter IPs, Highest Performance
- Cost: $1/GB
Mobile
- Speed: Up to 100MB/s
- Purpose: Real Mobile Devices, Avoid Detection
- Cost: $2/GB
Use the country_code parameter to determine the proxy geolocation and the proxy parameter to change the proxy.
| Proxy Type | Price | Multiplier | Description |
|---|---|---|---|
| residential | $2.00/GB | ×2–×4 | Entry-level residential pool |
| mobile | $2.00/GB | ×2 | 4G/5G mobile proxies for stealth |
| isp | $1.00/GB | ×1 | ISP-grade residential routing |
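Routing a request through the proxy front-end with Python's `requests` (a sketch; how PARAMETERS are encoded in the password field is assumed here, so consult the proxy docs before relying on it):

```python
import requests

# Username is your API key; the password field carries request parameters.
# The "country_code-us" encoding below is an assumed illustration.
proxies = {
    "http": "http://YOUR-API-KEY:country_code-us@proxy.spider.cloud:80",
    "https": "http://YOUR-API-KEY:country_code-us@proxy.spider.cloud:443",
}

response = requests.get("https://example.com", proxies=proxies)
print(response.status_code)
```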
Queries
Query the data that you collect during crawling and scraping. Add dynamic filters for extracting exactly what is needed.
Logs
Get the last 24 hours of logs.
Params
Logs API - url string
Filter records by a single URL.
Logs API - limit string
The maximum number of records to get.
Logs API - domain string
Filter records by a single domain.
Logs API - page number
The current page to get.
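A sketch of fetching logs with the documented query params (the endpoint path here is hypothetical and used only for illustration; check the client libraries for the real one):

```python
import requests

response = requests.get(
    "https://api.spider.cloud/data/crawl_logs",  # hypothetical path for illustration
    headers={"Authorization": "Bearer YOUR-API-KEY"},
    params={"domain": "spider.cloud", "limit": 10, "page": 0},
)
print(response.json()["data"])
```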
{ "data": { "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "domain": "spider.cloud", "url": "https://spider.cloud", "links": 1, "credits_used": 3, "mode": 2, "crawl_duration": 340, "message": null, "request_user_agent": "Spider", "level": "UI", "status_code": 0, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" }, "error": null }
Credits
Get the remaining credits available.
{ "data": { "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "credits": 53334, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" } }