API Reference

The Spider API is based on REST. Our API is predictable, returns JSON-encoded responses, and uses standard HTTP response codes and bearer-token authentication.

Set your API secret key in the Authorization header using the format Bearer $TOKEN. You can set the Content-Type header to application/json, application/xml, text/csv, or application/jsonl to shape the response format.

The Spider API supports bulk updates. You can work on multiple objects per request for the core API endpoints.

You can add v1 before any path to lock in that version of the API. Note that executing a request on this page with the Run button consumes live credits and returns a genuine result.
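
As a minimal sketch of the conventions above, the following request pins the v1 path, sets the bearer token and a JSON content type, and sends an array of objects so several websites are handled in one call (the body fields mirror the endpoint examples below):

import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

# Bulk request: one object per target website.
json_data = [
    {"url": "https://spider.cloud", "limit": 2, "return_format": "markdown"},
    {"url": "https://example.com", "limit": 2, "return_format": "markdown"},
]

# Prepending v1 locks the request to that API version.
response = requests.post('https://api.spider.cloud/v1/crawl',
  headers=headers, json=json_data)

print(response.json())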

Download the OpenAPI Specification.

LLM-Ready API Docs: llms.txt

Just getting started?

Check out our development quickstart guide.

Not a developer?

Use Spider's no-code options or applications to get started with Spider and do more with your account, no code required.

Base URL
https://api.spider.cloud

Crawl

Start crawling website(s) to collect resources. You can pass an array of objects for the request body.

POST https://api.spider.cloud/crawl

Body

application/json
  • url string required

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.


    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • limit number

    The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.


    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

  • disable_hints boolean

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    Tip

    If you’re tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
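
A short sketch of a crawl body with hints disabled; disable_hints is passed as a top-level body field alongside the other parameters, as in the request below:

# Crawl body with hints disabled for deterministic, fully manual filtering.
json_data = {
    "url": "https://spider.cloud",
    "limit": 5,
    "return_format": "markdown",
    "disable_hints": True,  # omit or set to False to keep hints enabled
}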

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/crawl', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "duration_elapsed_ms": 122,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]

Scrape

Start scraping a single page on website(s) to collect resources. You can pass an array of objects for the request body.

POST https://api.spider.cloud/scrape

Body

application/json
  • url string required

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.


    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • disable_hints boolean

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    Tip

    If you’re tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.

  • lite_mode boolean

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.
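
A short sketch of a scrape body with lite mode turned on; lite_mode is passed as a top-level body field alongside the other parameters, as in the request below:

# Scrape body trading speed, accuracy, geo-targeting, and reliability for a
# 50% reduction in data transfer costs.
json_data = {
    "url": "https://spider.cloud",
    "return_format": "markdown",
    "lite_mode": True,  # omit or set to False for the default behavior
}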

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/scrape', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "duration_elapsed_ms": 122,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]

Unblocker

Start unblocking challenging website(s) to collect data. You can pass an array of objects for the request body. Costs an additional 10-40 credits per success.

POST https://api.spider.cloud/unblocker

Body

application/json
  • url string required

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.


    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • disable_hints boolean

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    Tip

    If you’re tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.

  • lite_mode boolean

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/unblocker', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "url": "https://spider.cloud",
    "status": 200,
    "cookies": {
        "a": "something",
        "b": "something2"
    },
    "headers": {
        "x-id": 123,
        "x-cookie": 123
    },
    "status": 200,
    "costs": {
        "ai_cost": 0.001,
        "ai_cost_formatted": "0.0010",
        "bytes_transferred_cost": 3.1649999999999997e-9,
        "bytes_transferred_cost_formatted": "0.0000000031649999999999997240",
        "compute_cost": 0.0,
        "compute_cost_formatted": "0",
        "file_cost": 0.000029291250000000002,
        "file_cost_formatted": "0.0000292912499999999997868372",
        "total_cost": 0.0010292944150000001,
        "total_cost_formatted": "0.0010292944149999999997865612",
        "transform_cost": 0.0,
        "transform_cost_formatted": "0"
    },
    "content": "<html>...</html>",
    "error": null
  },
  // more content...
]

Search

Perform a Google search to gather a list of websites for crawling and resource collection, including fallback options if the query yields no results. You can pass an array of objects for the request body.

POST https://api.spider.cloud/search

Body

application/json
  • limit number

    The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.


    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"search":"sports news today","search_limit":3,"limit":5,"return_format":"markdown"}

response = requests.post('https://api.spider.cloud/search', 
  headers=headers, json=json_data)

print(response.json())
Response
{
  "content": [
      {
          "description": "Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.",
          "title": "ESPN - Serving Sports Fans. Anytime. Anywhere.",
          "url": "https://www.espn.com/"
      },
      {
          "description": "Sports Illustrated, SI.com provides sports news, expert analysis, highlights, stats and scores for the NFL, NBA, MLB, NHL, college football, soccer,&nbsp;...",
          "title": "Sports Illustrated",
          "url": "https://www.si.com/"
      },
      {
          "description": "CBS Sports features live scoring, news, stats, and player info for NFL football, MLB baseball, NBA basketball, NHL hockey, college basketball and football.",
          "title": "CBS Sports - News, Live Scores, Schedules, Fantasy ...",
          "url": "https://www.cbssports.com/"
      },
      {
          "description": "Sport is a form of physical activity or game. Often competitive and organized, sports use, maintain, or improve physical ability and skills.",
          "title": "Sport",
          "url": "https://en.wikipedia.org/wiki/Sport"
      },
      {
          "description": "Watch FOX Sports and view live scores, odds, team news, player news, streams, videos, stats, standings &amp; schedules covering NFL, MLB, NASCAR, WWE, NBA, NHL,&nbsp;...",
          "title": "FOX Sports News, Scores, Schedules, Odds, Shows, Streams ...",
          "url": "https://www.foxsports.com/"
      },
      {
          "description": "Founded in 1974 by tennis legend, Billie Jean King, the Women's Sports Foundation is dedicated to creating leaders by providing girls access to sports.",
          "title": "Women's Sports Foundation: Home",
          "url": "https://www.womenssportsfoundation.org/"
      },
      {
          "description": "List of sports · Running. Marathon · Sprint · Mascot race · Airsoft · Laser tag · Paintball · Bobsleigh · Jack jumping · Luge · Shovel racing · Card stacking&nbsp;...",
          "title": "List of sports",
          "url": "https://en.wikipedia.org/wiki/List_of_sports"
      },
      {
          "description": "Stay up-to-date with the latest sports news and scores from NBC Sports.",
          "title": "NBC Sports - news, scores, stats, rumors, videos, and more",
          "url": "https://www.nbcsports.com/"
      },
      {
          "description": "r/sports: Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.",
          "title": "r/sports",
          "url": "https://www.reddit.com/r/sports/"
      },
      {
          "description": "The A-Z of sports covered by the BBC Sport team. Find all the latest live sports coverage, breaking news, results, scores, fixtures, tables,&nbsp;...",
          "title": "AZ Sport",
          "url": "https://www.bbc.com/sport/all-sports"
      }
  ]
}
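
A hedged sketch of the workflow this endpoint enables: take the URLs returned by the search and submit them to /crawl as a single batched request (field names follow the examples in this document):

import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

# 1. Gather candidate websites from a search query.
search_body = {"search": "sports news today", "search_limit": 3}
results = requests.post('https://api.spider.cloud/search',
  headers=headers, json=search_body).json()

# 2. Crawl each returned site; one object per URL keeps the call batched.
crawl_body = [
    {"url": item["url"], "limit": 5, "return_format": "markdown"}
    for item in results.get("content", [])
]

if crawl_body:
    pages = requests.post('https://api.spider.cloud/crawl',
      headers=headers, json=crawl_body).json()
    print(len(pages), "pages collected")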

Links

Start crawling website(s) to collect the links found. You can pass an array of objects for the request body. This endpoint can save on latency if you only need to index the content URLs.

POST https://api.spider.cloud/links

Body

application/json
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/links', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "url": "https://spider.cloud",
    "status": 200,
    "duration_elasped_ms": 112
    "error": null
  },
  // more content...
]

Screenshot

Take screenshots of a website in base64 or binary encoding. You can pass an array of objects for the request body.

POST https://api.spider.cloud/screenshot

Body

application/json
  • url string required

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.


    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.

  • limit number

    The maximum number of pages to crawl per website. Remove the value or set it to 0 to crawl all pages. Defaults to 0.


    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.

  • disable_hints boolean

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    Tip

    If you’re tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/screenshot', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "duration_elapsed_ms": 122,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]
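
A hedged sketch for handling the result, assuming the default response carries each screenshot as a base64 string in the content field (with binary output you would write the bytes directly instead):

import base64
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit": 1, "url": "https://spider.cloud"}

response = requests.post('https://api.spider.cloud/screenshot',
  headers=headers, json=json_data)

# Assumption: "content" holds a base64-encoded image; the .png extension is
# illustrative and depends on the image format the service returns.
for i, item in enumerate(response.json()):
    if item.get("content"):
        with open(f"screenshot-{i}.png", "wb") as f:
            f.write(base64.b64decode(item["content"]))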

Transform HTML

Transform HTML into Markdown or plain text quickly. Each HTML transformation starts at 0.1 credits, while PDF transformations can cost up to 10 credits per page. You can submit up to 10 MB of data per request. The Transform API is also integrated into the /crawl endpoint via the return_format parameter.

POST https://api.spider.cloud/transform

Body

application/json
  • data object required

    A list of HTML data to transform. Each object takes the keys html and url. The url key is optional and only used when readability is enabled.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","data":[{"html":"<html><body>\n<h1>Example Website</h1>\n<p>This is some example markup to use to test the transform function.</p>\n<p><a href=\"https://spider.cloud/guides\">Guides</a></p>\n</body></html>","url":"https://example.com"}]}

response = requests.post('https://api.spider.cloud/transform', 
  headers=headers, json=json_data)

print(response.json())
Response
{
    "content": [
      "# Example Website
This is some example markup to use to test the transform function.
[Guides](https://spider.cloud/guides)"
    ],
    "cost": {
        "ai_cost": 0,
        "compute_cost": 0,
        "file_cost": 0,
        "bytes_transferred_cost": 0,
        "total_cost": 0,
        "transform_cost": 0.0001
    },
    "error": null,
    "status": 200
  }
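
Because data is a list, several documents can be transformed in a single request. A small sketch, assuming the response returns one content entry per input, in order:

import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

# Batch two HTML snippets into one transform call; the url key is optional
# and only matters when readability is enabled.
json_data = {
    "return_format": "markdown",
    "data": [
        {"html": "<html><body><h1>First page</h1></body></html>",
         "url": "https://example.com/one"},
        {"html": "<html><body><h1>Second page</h1></body></html>",
         "url": "https://example.com/two"},
    ],
}

response = requests.post('https://api.spider.cloud/transform',
  headers=headers, json=json_data)

for markdown in response.json().get("content", []):
    print(markdown)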

Proxy-Mode

Spider also offers a proxy front-end to the service. The Spider proxy handles requests just like any standard request, with the option to use high-performance and residential proxies at up to 10 GB/s. Take a look at all of our proxy locations to see if we support the country you need.

**HTTP address**: proxy.spider.cloud:80
**HTTPS address**: proxy.spider.cloud:443
**Username**: YOUR-API-KEY
**Password**: PARAMETERS

Residential

  • Speed: Up to 1GB/s
  • Purpose: Real-User IPs, Global Reach, High Anonymity
  • Cost: $1/GB - $4/GB

ISP

  • Speed: Up to 10GB/s
  • Purpose: Stable Datacenter IPs, Highest Performance
  • Cost: $1/GB

Mobile

  • Speed: Up to 100MB/s
  • Purpose: Real Mobile Devices, Avoid Detection
  • Cost: $2/GB

Use the country_code parameter to set the proxy geolocation and the proxy parameter to select the proxy type.

| Proxy Type  | Price    | Multiplier | Description                      |
|-------------|----------|------------|----------------------------------|
| residential | $2.00/GB | ×2–×4      | Entry-level residential pool     |
| mobile      | $2.00/GB | ×2         | 4G/5G mobile proxies for stealth |
| isp         | $1.00/GB | ×1         | ISP-grade residential routing    |
Example proxy request
import requests, os


# Proxy configuration
proxies = {
    'http': f"http://{os.getenv('SPIDER_API_KEY')}:proxy=residential@proxy.spider.cloud:8888",
    'https': f"https://{os.getenv('SPIDER_API_KEY')}:proxy=residential@proxy.spider.cloud:8889"
}

# Function to make a request through the proxy
def get_via_proxy(url):
    try:
        response = requests.get(url, proxies=proxies)
        response.raise_for_status()
        print('Response HTTP Status Code: ', response.status_code)
        print('Response HTTP Response Body: ', response.content)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None

# Example usage
if __name__ == "__main__":
     get_via_proxy("https://www.example.com")
     get_via_proxy("https://www.example.com/community")

Queries

Query the data that you collect during crawling and scraping. Add dynamic filters for extracting exactly what is needed.

Logs

Get the last 24 hours of logs.

GET https://api.spider.cloud/data/crawl_logs

Params

  • url string

    Filter records for a single URL.

  • limit string

    The maximum number of records to return.

  • domain string

    Filter records for a single domain.

  • page number

    The page of results to return.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/jsonl',
}

response = requests.get('https://api.spider.cloud/data/crawl_logs?limit=5&url=https%3A%2F%2Fspider.cloud', 
  headers=headers)

print(response.json())
Response
{
  "data": {
    "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
    "domain": "spider.cloud",
    "url": "https://spider.cloud",
    "links": 1,
    "credits_used": 3,
    "mode": 2,
    "crawl_duration": 340,
    "message": null,
    "request_user_agent": "Spider",
    "level": "UI",
    "status_code": 0,
    "created_at": "2024-04-21T01:21:32.886863+00:00",
    "updated_at": "2024-04-21T01:21:32.886863+00:00"
  },
  "error": null
}
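
A short sketch using the remaining documented filters, domain and page, passed as query parameters:

import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
}

# Filter the last 24 hours of logs by domain and request a specific page of
# records. The starting page index is an assumption; adjust as needed.
params = {"domain": "spider.cloud", "limit": 10, "page": 1}

response = requests.get('https://api.spider.cloud/data/crawl_logs',
  headers=headers, params=params)

print(response.json())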

Credits

Get the remaining credits available.

GET https://api.spider.cloud/data/credits
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/jsonl',
}

response = requests.get('https://api.spider.cloud/data/credits', 
  headers=headers)

print(response.json())
Response
{
  "data": {
    "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
    "credits": 53334,
    "created_at": "2024-04-21T01:21:32.886863+00:00",
    "updated_at": "2024-04-21T01:21:32.886863+00:00"
  }
}