

These docs cover scrapegraph-js ≥ 2.1.0. The v2 SDK is ESM-only and requires Node ≥ 22. Earlier 0.x/1.x releases expose a different, deprecated API.
Breaking in 2.1.0 (types only): all exported TypeScript types and Zod schemas dropped the Api prefix and now match scrapegraph-py 1:1 (ApiScrapeRequest → ScrapeRequest, ApiFetchConfig → FetchConfig, apiScrapeRequestSchema → scrapeRequestSchema, etc.). Monitor input types are also renamed: ApiMonitorCreateInput → MonitorCreateRequest, ApiMonitorUpdateInput → MonitorUpdateRequest, ApiMonitorActivityParams → MonitorActivityRequest. ApiResult<T> is the only type that keeps the prefix. Runtime JS code is unchanged — only TypeScript consumers need to rename imports.
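For TypeScript consumers the upgrade is a pure rename. A minimal before/after sketch using the names listed above:

```diff
-// <= 2.0.x
-import type { ApiScrapeRequest, ApiFetchConfig, ApiResult } from "scrapegraph-js";
+// >= 2.1.0 (only ApiResult keeps its prefix)
+import type { ScrapeRequest, FetchConfig, ApiResult } from "scrapegraph-js";
```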

Installation

# npm
npm i scrapegraph-js@latest     # installs the latest release (>= 2.1.0)

# pnpm
pnpm add scrapegraph-js@latest

# yarn
yarn add scrapegraph-js@latest

# bun
bun add scrapegraph-js@latest

What’s new in v2

  • New entry point: import { ScrapeGraphAI } from "scrapegraph-js" and instantiate once — no more passing the API key to every call.
  • Nested resources: sgai.crawl.*, sgai.monitor.*, sgai.history.*.
  • ApiResult<T> wrapper: no throws — every call returns { status, data, error, elapsedMs }.
  • Auto-picks the API key from SGAI_API_KEY (or pass { apiKey } to the factory).
  • Removed: markdownify, agenticScraper, sitemap, feedback — use sgai.scrape() with the right format entry instead.
v2 is a breaking change. See the Migration Guide if you’re upgrading from v1.

Quick Start

import { ScrapeGraphAI } from "scrapegraph-js";

// reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI({ apiKey: "..." })
const sgai = ScrapeGraphAI();

const res = await sgai.scrape({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
});

if (res.status === "success") {
  console.log(res.data?.results.markdown?.data?.[0]);
} else {
  console.error(res.error);
}
Store your API keys securely in environment variables. Use .env files and libraries like dotenv to load them into your app.
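For example, a minimal setup with a local .env file (assuming the dotenv package; the key below is a placeholder):

```ini
# .env — keep this file out of version control (add it to .gitignore)
SGAI_API_KEY=sgai-xxxxxxxxxxxxxxxx
```

Load it before instantiating the client, e.g. import "dotenv/config" at the top of your entry point; ScrapeGraphAI() then picks the key up automatically.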

Return Type

Every method returns ApiResult<T>:
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};
Check res.status before accessing res.data.
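If you prefer try/catch at call sites, a small wrapper (a hypothetical helper, not part of the SDK) can convert any ApiResult into data-or-throw:

```javascript
// unwrap: return res.data on success, otherwise throw res.error.
// Works for any ApiResult<T>, since every SDK method returns that shape.
function unwrap(res) {
  if (res.status !== "success" || res.data === null) {
    throw new Error(res.error ?? "ScrapeGraphAI request failed");
  }
  return res.data;
}

// usage sketch: const data = unwrap(await sgai.scrape({ url: "https://example.com" }));
```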

Services

sgai.scrape()

Fetch a page in one or more formats (markdown, html, screenshot, json, links, images, summary, branding).
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
    { type: "json", prompt: "Extract product info" },
  ],
  contentType: "text/html",      // optional, auto-detected
  fetchConfig: {                 // optional
    mode: "js",
    stealth: true,
    timeout: 30000,
    wait: 2000,
    scrolls: 3,
  },
});

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to scrape |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| contentType | string | No | Override detected content type (e.g. "application/pdf") |
| fetchConfig | FetchConfig | No | Fetch configuration |
Formats:
  • markdown — Clean markdown (modes: normal, reader, prune)
  • html — Raw HTML (modes: normal, reader, prune)
  • links — All links on the page
  • images — All image URLs
  • summary — AI-generated summary
  • json — Structured extraction with prompt/schema
  • branding — Brand colors, typography, logos
  • screenshot — Page screenshot (fullPage, width, height, quality)
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "links" },
    { type: "images" },
    { type: "screenshot", fullPage: false, width: 1440, height: 900, quality: 90 },
  ],
});

if (res.status === "success") {
  const r = res.data?.results;
  console.log("Markdown:", r?.markdown?.data?.[0]?.slice(0, 200));
  console.log("Links:", r?.links?.metadata?.count);
  console.log("Screenshot URL:", r?.screenshot?.data?.url);
}

sgai.extract()

Extract structured data from a URL, HTML, or markdown.
const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the main heading and description",
});

if (res.status === "success") {
  console.log(res.data?.json);
  console.log("Tokens:", res.data?.usage);
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes* | URL of the page |
| html | string | Yes* | Raw HTML (alternative to url) |
| markdown | string | Yes* | Raw markdown (alternative to url) |
| prompt | string | Yes | What to extract |
| schema | object | No | JSON schema for structured output |
| mode | string | No | HTML processing mode: "normal", "reader", "prune" |
| contentType | string | No | Override the detected content type |
| fetchConfig | FetchConfig | No | Fetch configuration |
*One of url, html, or markdown is required.
const res = await sgai.extract({
  url: "https://example.com/article",
  prompt: "Extract the article information",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      author: { type: "string" },
      publishDate: { type: "string" },
      content: { type: "string" },
    },
    required: ["title"],
  },
});

if (res.status === "success") {
  console.log(res.data?.json);
}
sgai.search()

Web search with optional AI extraction.
const res = await sgai.search({
  query: "best programming languages 2024",
  numResults: 5,
});

if (res.status === "success") {
  for (const r of res.data?.results ?? []) {
    console.log(`${r.title} - ${r.url}`);
  }
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query (1–500 chars) |
| numResults | number | No | Number of results (1–20). Default: 3 |
| prompt | string | No | Prompt for AI extraction from the fetched results |
| schema | object | No | JSON schema (requires prompt) |
| format | string | No | "markdown" (default) or "html" |
| timeRange | string | No | "past_hour", "past_24_hours", "past_week", "past_month", "past_year" |
| location | GeoCode (string) | No | Two-letter country code (e.g. "us") |
| fetchConfig | FetchConfig | No | Fetch configuration |
const res = await sgai.search({
  query: "typescript best practices",
  numResults: 5,
  prompt: "Extract the main tips and recommendations",
  schema: {
    type: "object",
    properties: {
      tips: { type: "array", items: { type: "string" } },
    },
  },
});

if (res.status === "success") {
  console.log("Results:", res.data?.results.length);
  console.log("Extracted:", res.data?.json);
}

sgai.crawl.*

Crawl a site. Access the resource via sgai.crawl.
const start = await sgai.crawl.start({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
  maxPages: 50,
  maxDepth: 2,
  maxLinksPerPage: 10,
  includePatterns: ["/blog/*"],
  excludePatterns: ["/admin/*"],
});

const crawlId = start.data?.id;

// Status
await sgai.crawl.get(crawlId);

// Control
await sgai.crawl.stop(crawlId);
await sgai.crawl.resume(crawlId);
await sgai.crawl.delete(crawlId);

crawl.start() parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Starting URL |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| maxDepth | number | No | Maximum crawl depth. Default: 2 |
| maxPages | number | No | Maximum pages (1–1000). Default: 50 |
| maxLinksPerPage | number | No | Links followed per page. Default: 10 |
| allowExternal | boolean | No | Allow crossing domains. Default: false |
| includePatterns | string[] | No | URL patterns to include |
| excludePatterns | string[] | No | URL patterns to exclude |
| contentTypes | string[] | No | Allowed content types |
| fetchConfig | FetchConfig | No | Fetch configuration |
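crawl.start() returns immediately with a job id, so you typically poll crawl.get() until the job finishes. A sketch; the terminal status values "completed"/"failed" on the payload are assumptions, so check your actual crawl.get() response:

```javascript
// Poll a status getter until it reports a terminal state or we give up.
// getStatus is any () => Promise<ApiResult>, e.g. () => sgai.crawl.get(crawlId).
async function pollCrawl(getStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await getStatus();
    if (res.status === "error") throw new Error(res.error);
    // "completed"/"failed" are assumed terminal states of the crawl job.
    if (res.data && ["completed", "failed"].includes(res.data.status)) {
      return res.data;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Crawl still running after ${maxAttempts} attempts`);
}

// usage sketch: const job = await pollCrawl(() => sgai.crawl.get(crawlId));
```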

sgai.monitor.*

Scheduled monitoring jobs.
// Create
const res = await sgai.monitor.create({
  url: "https://example.com",
  name: "Price Monitor",
  interval: "0 * * * *",       // cron expression
  formats: [{ type: "markdown" }],
  webhookUrl: "https://...",   // optional
});

const cronId = res.data?.cronId;

// Manage
await sgai.monitor.list();
await sgai.monitor.get(cronId);
await sgai.monitor.update(cronId, { interval: "0 */6 * * *" });
await sgai.monitor.pause(cronId);
await sgai.monitor.resume(cronId);
await sgai.monitor.delete(cronId);
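The interval is a standard five-field cron expression (minute, hour, day-of-month, month, day-of-week). A few common values:

```text
0 * * * *      every hour, on the hour
0 */6 * * *    every 6 hours
0 9 * * 1-5    09:00 on weekdays (Mon-Fri)
0 0 * * *      daily at midnight
```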

monitor.activity() — poll tick history

Paginate through per-run ticks.
const activity = await sgai.monitor.activity(cronId, { limit: 20 });

if (activity.status === "success") {
  for (const tick of activity.data?.ticks ?? []) {
    const changed = tick.changed ? "CHANGED" : "no change";
    console.log(`[${tick.createdAt}] ${tick.status} - ${changed} (${tick.elapsedMs}ms)`);
  }

  if (activity.data?.nextCursor) {
    const next = await sgai.monitor.activity(cronId, {
      limit: 20,
      cursor: activity.data.nextCursor,
    });
  }
}
Params: limit (1–100, default 20) and cursor for pagination. Each tick exposes id, createdAt, status, changed, elapsedMs, and diffs.
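To drain the full tick history, loop on nextCursor until it is absent. A sketch, with fetchPage standing in for (cursor) => sgai.monitor.activity(cronId, { limit: 20, cursor }):

```javascript
// Collect every tick across pages by following nextCursor.
async function allTicks(fetchPage) {
  const ticks = [];
  let cursor; // undefined on the first request
  do {
    const res = await fetchPage(cursor);
    if (res.status !== "success") throw new Error(res.error);
    ticks.push(...(res.data?.ticks ?? []));
    cursor = res.data?.nextCursor;
  } while (cursor);
  return ticks;
}
```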

sgai.history.*

const list = await sgai.history.list({
  service: "scrape",   // optional filter
  page: 1,
  limit: 20,
});

const entry = await sgai.history.get("request-id");

sgai.credits() / sgai.healthy()

const credits = await sgai.credits();
// { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }

const health = await sgai.healthy();
// { status: "ok", uptime: 12345 }

Configuration Objects

FetchConfig

Controls how pages are fetched. See the proxy configuration guide for details.
{
  mode: "js",          // "auto" (default) | "fast" | "js"
  stealth: true,        // Residential proxies / anti-bot headers
  timeout: 15000,       // ms (1000–60000)
  wait: 2000,           // ms after page load (0–30000)
  scrolls: 3,           // 0–100
  country: "us",        // ISO 3166-1 alpha-2
  headers: { "X-Custom": "header" },
  cookies: { key: "value" },
  mock: false,          // Enable mock mode for testing
}

Error Handling

const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the title",
});

if (res.status === "success") {
  console.log(res.data);
} else {
  console.error(`Request failed: ${res.error}`);
}

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| SGAI_API_KEY | Your ScrapeGraphAI API key | |
| SGAI_API_URL | Override API base URL | https://v2-api.scrapegraphai.com/api |
| SGAI_DEBUG | Enable debug logging ("1") | off |
| SGAI_TIMEOUT | Request timeout in seconds | 120 |

Support

GitHub

Report issues and contribute to the SDK

Email Support

Get help from our development team