

These docs cover scrapegraph-js ≥ 2.1.0. The v2 SDK is ESM-only and requires Node ≥ 22. Earlier 0.x/1.x releases expose a different, deprecated API.
Breaking in 2.1.0 (types only): all exported TypeScript types and Zod schemas dropped the Api prefix and now match scrapegraph-py 1:1 (ApiScrapeRequest → ScrapeRequest, ApiFetchConfig → FetchConfig, apiScrapeRequestSchema → scrapeRequestSchema, etc.). Monitor input types are also renamed: ApiMonitorCreateInput → MonitorCreateRequest, ApiMonitorUpdateInput → MonitorUpdateRequest, ApiMonitorActivityParams → MonitorActivityRequest. ApiResult<T> is the only type that keeps the prefix. Runtime JS code is unchanged — only TypeScript consumers need to rename imports.
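For TypeScript consumers the upgrade is a pure rename. A minimal before/after sketch using the names listed above:

```diff
-// <= 2.0.x
-import type { ApiScrapeRequest, ApiFetchConfig, ApiResult } from "scrapegraph-js";
+// >= 2.1.0 (only ApiResult keeps its prefix)
+import type { ScrapeRequest, FetchConfig, ApiResult } from "scrapegraph-js";
```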

Installation

# npm
npm i scrapegraph-js@latest     # installs the latest release (>= 2.1.0)

# pnpm
pnpm add scrapegraph-js@latest

# yarn
yarn add scrapegraph-js@latest

# bun
bun add scrapegraph-js@latest

What’s new in v2

  • New entry point: import { ScrapeGraphAI } from "scrapegraph-js" and instantiate once — no more passing the API key to every call.
  • Nested resources: sgai.crawl.*, sgai.monitor.*, sgai.history.*.
  • ApiResult<T> wrapper: no throws — every call returns { status, data, error, elapsedMs }.
  • Auto-picks the API key from SGAI_API_KEY (or pass { apiKey } to the factory).
  • Removed: markdownify, agenticScraper, sitemap, feedback — use sgai.scrape() with the right format entry instead.
v2 is a breaking change. See the Migration Guide if you’re upgrading from v1.

Quick Start

import { ScrapeGraphAI } from "scrapegraph-js";

// reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI({ apiKey: "..." })
const sgai = ScrapeGraphAI();

const res = await sgai.scrape({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
});

if (res.status === "success") {
  console.log(res.data?.results.markdown?.data?.[0]);
} else {
  console.error(res.error);
}
Store your API keys securely in environment variables. Use .env files and libraries like dotenv to load them into your app.
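For example, a minimal setup with a local .env file (assuming the dotenv package; the key below is a placeholder):

```ini
# .env — keep this file out of version control (add it to .gitignore)
SGAI_API_KEY=sgai-xxxxxxxxxxxxxxxx
```

Load it before instantiating the client, e.g. import "dotenv/config" at the top of your entry point; ScrapeGraphAI() then picks the key up automatically.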

Return Type

Every method returns ApiResult<T>:
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};
Check res.status before accessing res.data.
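If you prefer try/catch at call sites, a small wrapper (a hypothetical helper, not part of the SDK) can convert any ApiResult into data-or-throw:

```javascript
// unwrap: return res.data on success, otherwise throw res.error.
// Works for any ApiResult<T>, since every SDK method returns that shape.
function unwrap(res) {
  if (res.status !== "success" || res.data === null) {
    throw new Error(res.error ?? "ScrapeGraphAI request failed");
  }
  return res.data;
}

// usage sketch: const data = unwrap(await sgai.scrape({ url: "https://example.com" }));
```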

Services

sgai.scrape()

Fetch a page in one or more formats (markdown, html, screenshot, json, links, images, summary, branding).
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
    { type: "json", prompt: "Extract product info" },
  ],
  contentType: "text/html",      // optional, auto-detected
  fetchConfig: {                 // optional
    mode: "js",
    stealth: true,
    timeout: 30000,
    wait: 2000,
    scrolls: 3,
  },
});

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to scrape |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| contentType | string | No | Override detected content type (e.g. "application/pdf") |
| fetchConfig | FetchConfig | No | Fetch configuration |
Formats:
  • markdown — Clean markdown (modes: normal, reader, prune)
  • html — Raw HTML (modes: normal, reader, prune)
  • links — All links on the page
  • images — All image URLs
  • summary — AI-generated summary
  • json — Structured extraction with prompt/schema
  • branding — Brand colors, typography, logos
  • screenshot — Page screenshot (fullPage, width, height, quality)
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "links" },
    { type: "images" },
    { type: "screenshot", fullPage: false, width: 1440, height: 900, quality: 90 },
  ],
});

if (res.status === "success") {
  const r = res.data?.results;
  console.log("Markdown:", r?.markdown?.data?.[0]?.slice(0, 200));
  console.log("Links:", r?.links?.metadata?.count);
  console.log("Screenshot URL:", r?.screenshot?.data?.url);
}

sgai.extract()

Extract structured data from a URL, HTML, or markdown.
const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the main heading and description",
});

if (res.status === "success") {
  console.log(res.data?.json);
  console.log("Tokens:", res.data?.usage);
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes* | URL of the page |
| html | string | Yes* | Raw HTML (alternative to url) |
| markdown | string | Yes* | Raw markdown (alternative to url) |
| prompt | string | Yes | What to extract |
| schema | object | No | JSON schema for structured output |
| mode | string | No | HTML processing mode: "normal", "reader", "prune" |
| contentType | string | No | Override the detected content type |
| fetchConfig | FetchConfig | No | Fetch configuration |
*One of url, html, or markdown is required.
const res = await sgai.extract({
  url: "https://example.com/article",
  prompt: "Extract the article information",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      author: { type: "string" },
      publishDate: { type: "string" },
      content: { type: "string" },
    },
    required: ["title"],
  },
});

if (res.status === "success") {
  console.log(res.data?.json);
}
sgai.search()

Web search with optional AI extraction.
const res = await sgai.search({
  query: "best programming languages 2024",
  numResults: 5,
});

if (res.status === "success") {
  for (const r of res.data?.results ?? []) {
    console.log(`${r.title} - ${r.url}`);
  }
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query (1–500 chars) |
| numResults | number | No | Number of results (1–20). Default: 3 |
| prompt | string | No | Prompt for AI extraction from the fetched results |
| schema | object | No | JSON schema (requires prompt) |
| format | string | No | "markdown" (default) or "html" |
| timeRange | string | No | "past_hour", "past_24_hours", "past_week", "past_month", "past_year" |
| location | GeoCode (string) | No | Two-letter country code (e.g. "us") |
| fetchConfig | FetchConfig | No | Fetch configuration |
const res = await sgai.search({
  query: "typescript best practices",
  numResults: 5,
  prompt: "Extract the main tips and recommendations",
  schema: {
    type: "object",
    properties: {
      tips: { type: "array", items: { type: "string" } },
    },
  },
});

if (res.status === "success") {
  console.log("Results:", res.data?.results.length);
  console.log("Extracted:", res.data?.json);
}

sgai.crawl.*

Crawl a site. Access the resource via sgai.crawl.
const start = await sgai.crawl.start({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
  maxPages: 50,
  maxDepth: 2,
  maxLinksPerPage: 10,
  includePatterns: ["/blog/*"],
  excludePatterns: ["/admin/*"],
});

const crawlId = start.data?.id;

// Status
await sgai.crawl.get(crawlId);

// Control
await sgai.crawl.stop(crawlId);
await sgai.crawl.resume(crawlId);
await sgai.crawl.delete(crawlId);

crawl.start() parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Starting URL |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| maxDepth | number | No | Maximum crawl depth. Default: 2 |
| maxPages | number | No | Maximum pages (1–1000). Default: 50 |
| maxLinksPerPage | number | No | Links followed per page. Default: 10 |
| allowExternal | boolean | No | Allow crossing domains. Default: false |
| includePatterns | string[] | No | URL patterns to include |
| excludePatterns | string[] | No | URL patterns to exclude |
| contentTypes | string[] | No | Allowed content types |
| fetchConfig | FetchConfig | No | Fetch configuration |
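crawl.start() returns immediately with a job id, so you typically poll crawl.get() until the job finishes. A sketch; the terminal status values "completed"/"failed" on the payload are assumptions, so check your actual crawl.get() response:

```javascript
// Poll a status getter until it reports a terminal state or we give up.
// getStatus is any () => Promise<ApiResult>, e.g. () => sgai.crawl.get(crawlId).
async function pollCrawl(getStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await getStatus();
    if (res.status === "error") throw new Error(res.error);
    // "completed"/"failed" are assumed terminal states of the crawl job.
    if (res.data && ["completed", "failed"].includes(res.data.status)) {
      return res.data;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Crawl still running after ${maxAttempts} attempts`);
}

// usage sketch: const job = await pollCrawl(() => sgai.crawl.get(crawlId));
```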

sgai.monitor.*

Scheduled monitoring jobs.
// Create
const res = await sgai.monitor.create({
  url: "https://example.com",
  name: "Price Monitor",
  interval: "0 * * * *",       // cron expression
  formats: [{ type: "markdown" }],
  webhookUrl: "https://...",   // optional
});

const cronId = res.data?.cronId;

// Manage
await sgai.monitor.list();
await sgai.monitor.get(cronId);
await sgai.monitor.update(cronId, { interval: "0 */6 * * *" });
await sgai.monitor.pause(cronId);
await sgai.monitor.resume(cronId);
await sgai.monitor.delete(cronId);
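The interval is a standard five-field cron expression (minute, hour, day-of-month, month, day-of-week). A few common values:

```text
0 * * * *      every hour, on the hour
0 */6 * * *    every 6 hours
0 9 * * 1-5    09:00 on weekdays (Mon-Fri)
0 0 * * *      daily at midnight
```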

monitor.activity() — poll tick history

Paginate through per-run ticks.
const activity = await sgai.monitor.activity(cronId, { limit: 20 });

if (activity.status === "success") {
  for (const tick of activity.data?.ticks ?? []) {
    const changed = tick.changed ? "CHANGED" : "no change";
    console.log(`[${tick.createdAt}] ${tick.status} - ${changed} (${tick.elapsedMs}ms)`);
  }

  if (activity.data?.nextCursor) {
    const next = await sgai.monitor.activity(cronId, {
      limit: 20,
      cursor: activity.data.nextCursor,
    });
  }
}
Params: limit (1–100, default 20) and cursor for pagination. Each tick exposes id, createdAt, status, changed, elapsedMs, and diffs.
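To drain the full tick history, loop on nextCursor until it is absent. A sketch, with fetchPage standing in for (cursor) => sgai.monitor.activity(cronId, { limit: 20, cursor }):

```javascript
// Collect every tick across pages by following nextCursor.
async function allTicks(fetchPage) {
  const ticks = [];
  let cursor; // undefined on the first request
  do {
    const res = await fetchPage(cursor);
    if (res.status !== "success") throw new Error(res.error);
    ticks.push(...(res.data?.ticks ?? []));
    cursor = res.data?.nextCursor;
  } while (cursor);
  return ticks;
}
```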

sgai.history.*

const list = await sgai.history.list({
  service: "scrape",   // optional filter
  page: 1,
  limit: 20,
});

const entry = await sgai.history.get("request-id");

sgai.credits() / sgai.healthy()

const credits = await sgai.credits();
// { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }

const health = await sgai.healthy();
// { status: "ok", uptime: 12345 }

Configuration Objects

FetchConfig

Controls how pages are fetched. See the proxy configuration guide for details.
{
  mode: "js",          // "auto" (default) | "fast" | "js"
  stealth: true,        // Residential proxies / anti-bot headers
  timeout: 15000,       // ms (1000–60000)
  wait: 2000,           // ms after page load (0–30000)
  scrolls: 3,           // 0–100
  country: "us",        // ISO 3166-1 alpha-2
  headers: { "X-Custom": "header" },
  cookies: { key: "value" },
  mock: false,          // Enable mock mode for testing
}

Error Handling

const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the title",
});

if (res.status === "success") {
  console.log(res.data);
} else {
  console.error(`Request failed: ${res.error}`);
}

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| SGAI_API_KEY | Your ScrapeGraphAI API key | |
| SGAI_API_URL | Override API base URL | https://v2-api.scrapegraphai.com/api |
| SGAI_DEBUG | Enable debug logging ("1") | off |
| SGAI_TIMEOUT | Request timeout in seconds | 120 |

Support

GitHub

Report issues and contribute to the SDK

Email Support

Get help from our development team