Python SDK

Installation

Install the package using pip:

pip install scrapegraph-py

Features

AI-Powered Extraction: Advanced web scraping using artificial intelligence
Flexible Clients: Both synchronous and asynchronous support
Type Safety: Structured output with Pydantic schemas
Production Ready: Detailed logging and automatic retries
Developer Friendly: Comprehensive error handling

Quick Start

Initialize the client with your API key:

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

You can also set the SGAI_API_KEY environment variable and initialize the client without parameters: client = Client()
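When relying on the environment variable, it can help to fail early with a clear message if the key is missing. The following is a minimal sketch: load_api_key is a hypothetical helper and not part of the SDK; only the variable name SGAI_API_KEY comes from the text above.

```python
import os


def load_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    # load_api_key is a hypothetical helper for illustration, not an SDK function.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client"
        )
    return key
```

The returned value can then be passed explicitly, for example Client(api_key=load_api_key()), so that a missing key surfaces before any request is made.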

Services

SmartScraper

Extract specific information from any webpage using AI:

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

Parameters

Parameter	Type	Required	Description
website_url	string	Yes	The URL of the webpage that needs to be scraped.
user_prompt	string	Yes	A textual description of what you want to achieve.
output_schema	object	No	The Pydantic object that describes the structure and format of the response.
render_heavy_js	boolean	No	Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: False

Basic Schema Example

Define a simple schema for basic data extraction:

from pydantic import BaseModel, Field

class ArticleData(BaseModel):
    title: str = Field(description="The article title")
    author: str = Field(description="The author's name")
    publish_date: str = Field(description="Article publication date")
    content: str = Field(description="Main article content")
    category: str = Field(description="Article category")

response = client.smartscraper(
    website_url="https://example.com/blog/article",
    user_prompt="Extract the article information",
    output_schema=ArticleData
)

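# The response is a dict; the extracted fields are under "result"
article = response["result"]
print(f"Title: {article['title']}")
print(f"Author: {article['author']}")
print(f"Published: {article['publish_date']}")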

Advanced Schema Example

Define a complex schema for nested data structures:

from typing import List
from pydantic import BaseModel, Field

class Employee(BaseModel):
    name: str = Field(description="Employee's full name")
    position: str = Field(description="Job title")
    department: str = Field(description="Department name")
    email: str = Field(description="Email address")

class Office(BaseModel):
    location: str = Field(description="Office location/city")
    address: str = Field(description="Full address")
    phone: str = Field(description="Contact number")

class CompanyData(BaseModel):
    name: str = Field(description="Company name")
    description: str = Field(description="Company description")
    industry: str = Field(description="Industry sector")
    founded_year: int = Field(description="Year company was founded")
    employees: List[Employee] = Field(description="List of key employees")
    offices: List[Office] = Field(description="Company office locations")
    website: str = Field(description="Company website URL")

# Extract comprehensive company information
response = client.smartscraper(
    website_url="https://example.com/about",
    user_prompt="Extract detailed company information including employees and offices",
    output_schema=CompanyData
)

# Access nested data (the extracted fields are under "result")
company = response["result"]
print(f"Company: {company['name']}")
print("\nKey Employees:")
for employee in company["employees"]:
    print(f"- {employee['name']} ({employee['position']})")

print("\nOffice Locations:")
for office in company["offices"]:
    print(f"- {office['location']}: {office['address']}")

Enhanced JavaScript Rendering Example

For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:

from scrapegraph_py import Client
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: str = Field(description="Product price")
    description: str = Field(description="Product description")
    availability: str = Field(description="Product availability status")

client = Client(api_key="your-api-key")

# Enable enhanced JavaScript rendering for a React-based e-commerce site
response = client.smartscraper(
    website_url="https://example-react-store.com/products/123",
    user_prompt="Extract product details including name, price, description, and availability",
    output_schema=ProductInfo,
    render_heavy_js=True  # Enable for React/Vue/Angular sites
)

print(f"Product: {response['result']['name']}")
print(f"Price: {response['result']['price']}")
print(f"Available: {response['result']['availability']}")

When to use render_heavy_js:

React, Vue, or Angular applications
Single Page Applications (SPAs)
Sites with heavy client-side rendering
Dynamic content loaded via JavaScript
Interactive elements that depend on JavaScript execution
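Since it is not always obvious in advance whether a site falls into one of these categories, one option is to try a normal scrape first and retry with render_heavy_js only when nothing comes back. This is a sketch under assumptions: scrape_with_js_fallback is a hypothetical helper, the response is assumed to be a dict with a "result" key (as in the example above), and treating an empty result as a sign that JavaScript rendering is needed is a heuristic, not documented SDK behavior.

```python
def scrape_with_js_fallback(smartscraper, website_url, user_prompt, output_schema=None):
    """Scrape once normally, then retry with render_heavy_js=True if the result is empty.

    `smartscraper` is any callable that accepts the same keyword arguments as
    client.smartscraper (hypothetical wrapper, not part of the SDK).
    """
    params = {"website_url": website_url, "user_prompt": user_prompt}
    if output_schema is not None:
        params["output_schema"] = output_schema

    response = smartscraper(**params)
    # Assumption: an empty or missing "result" means the page needed JS rendering.
    if response.get("result"):
        return response
    return smartscraper(**params, render_heavy_js=True)
```

It would be called as scrape_with_js_fallback(client.smartscraper, "https://example.com", "Extract the main heading"). The second request is only sent when the first returns no data.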

SearchScraper

Search and extract information from multiple web sources using AI:

from scrapegraph_py.models import TimeRange

response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?",
    time_range=TimeRange.PAST_WEEK  # Optional: Filter results by time range
)

Parameters

Parameter	Type	Required	Description
user_prompt	string	Yes	A textual description of what you want to achieve.
num_results	number	No	Number of websites to search (3-20). Default: 3.
extraction_mode	boolean	No	True = AI extraction mode (10 credits/page), False = markdown mode (2 credits/page). Default: True
output_schema	object	No	The Pydantic object that describes the structure and format of the response (AI extraction mode only)
location_geo_code	string	No	Optional geo code for location-based search (e.g., “us”)
time_range	TimeRange	No	Optional time range filter for search results. Options: TimeRange.PAST_HOUR, TimeRange.PAST_24_HOURS, TimeRange.PAST_WEEK, TimeRange.PAST_MONTH, TimeRange.PAST_YEAR
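The num_results and extraction_mode parameters together determine the cost of a search. The arithmetic can be sketched as below; estimate_credits is a hypothetical helper, and it assumes that exactly one page is billed per result, which the table implies but does not state.

```python
AI_EXTRACTION_CREDITS_PER_PAGE = 10  # extraction_mode=True
MARKDOWN_CREDITS_PER_PAGE = 2        # extraction_mode=False


def estimate_credits(num_results: int = 3, extraction_mode: bool = True) -> int:
    """Estimate the credits for one searchscraper call (hypothetical helper)."""
    if not 3 <= num_results <= 20:
        raise ValueError("num_results must be between 3 and 20")
    per_page = (
        AI_EXTRACTION_CREDITS_PER_PAGE if extraction_mode else MARKDOWN_CREDITS_PER_PAGE
    )
    # Assumption: one page is billed per search result.
    return num_results * per_page


print(estimate_credits())                          # 30
print(estimate_credits(5, extraction_mode=False))  # 10
```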

Basic Schema Example

Define a simple schema for structured search results:

from typing import List

from pydantic import BaseModel, Field
from scrapegraph_py.models import TimeRange

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    description: str = Field(description="Product description")
    price: str = Field(description="Product price")
    features: List[str] = Field(description="List of key features")
    availability: str = Field(description="Availability information")

response = client.searchscraper(
    user_prompt="Find information about iPhone 15 Pro",
    output_schema=ProductInfo,
    location_geo_code="us",  # Optional: Geo code for location-based search
    time_range=TimeRange.PAST_MONTH  # Optional: Filter results by time range
)

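# The response is a dict; the extracted fields are under "result"
product = response["result"]
print(f"Product: {product['name']}")
print(f"Price: {product['price']}")
print("\nFeatures:")
for feature in product["features"]:
    print(f"- {feature}")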

Advanced Schema Example

Define a complex schema for comprehensive market research:

from typing import List

from pydantic import BaseModel, Field
from scrapegraph_py.models import TimeRange

class MarketPlayer(BaseModel):
    name: str = Field(description="Company name")
    market_share: str = Field(description="Market share percentage")
    key_products: List[str] = Field(description="Main products in market")
    strengths: List[str] = Field(description="Company's market strengths")

class MarketTrend(BaseModel):
    name: str = Field(description="Trend name")
    description: str = Field(description="Trend description")
    impact: str = Field(description="Expected market impact")
    timeframe: str = Field(description="Trend timeframe")

class MarketAnalysis(BaseModel):
    market_size: str = Field(description="Total market size")
    growth_rate: str = Field(description="Annual growth rate")
    key_players: List[MarketPlayer] = Field(description="Major market players")
    trends: List[MarketTrend] = Field(description="Market trends")
    challenges: List[str] = Field(description="Industry challenges")
    opportunities: List[str] = Field(description="Market opportunities")

# Perform comprehensive market research
response = client.searchscraper(
    user_prompt="Analyze the current AI chip market landscape",
    output_schema=MarketAnalysis,
    location_geo_code="us",  # Optional: Geo code for location-based search
    time_range=TimeRange.PAST_MONTH  # Optional: Filter results by time range
)

# Access structured market data (the extracted fields are under "result")
analysis = response["result"]
print(f"Market Size: {analysis['market_size']}")
print(f"Growth Rate: {analysis['growth_rate']}")

print("\nKey Players:")
for player in analysis["key_players"]:
    print(f"\n{player['name']}")
    print(f"Market Share: {player['market_share']}")
    print("Key Products:")
    for product in player["key_products"]:
        print(f"- {product}")

print("\nMarket Trends:")
for trend in analysis["trends"]:
    print(f"\n{trend['name']}")
    print(f"Impact: {trend['impact']}")
    print(f"Timeframe: {trend['timeframe']}")

Markdown Mode Example

Use markdown mode for cost-effective content gathering:

from scrapegraph_py import Client
from scrapegraph_py.models import TimeRange

client = Client(api_key="your-api-key")

# Enable markdown mode for cost-effective content gathering
response = client.searchscraper(
    user_prompt="Latest developments in artificial intelligence",
    num_results=3,
    extraction_mode=False,  # Enable markdown mode (2 credits per page vs 10 credits)
    location_geo_code="us",  # Optional: Geo code for location-based search
    time_range=TimeRange.PAST_WEEK  # Optional: Filter results by time range
)

# Access the raw markdown content
markdown_content = response['markdown_content']
reference_urls = response['reference_urls']

print(f"Markdown content length: {len(markdown_content)} characters")
print(f"Reference URLs: {len(reference_urls)}")

# Process the markdown content
print("Content preview:", markdown_content[:500] + "...")

# Save to file for analysis
with open('ai_research_content.md', 'w', encoding='utf-8') as f:
    f.write(markdown_content)

print("Content saved to ai_research_content.md")

Markdown Mode Benefits:

Cost-effective: Only 2 credits per page (vs 10 credits for AI extraction)
Full content: Get complete page content in markdown format
Faster: No AI processing overhead
Perfect for: Content analysis, bulk data collection, building datasets

Time Range Filter Example

Filter search results by date range to get only recent information:

from scrapegraph_py import Client
from scrapegraph_py.models import TimeRange

client = Client(api_key="your-api-key")

# Search for recent news from the past week
response = client.searchscraper(
    user_prompt="Latest news about AI developments",
    num_results=5,
    time_range=TimeRange.PAST_WEEK  # Options: PAST_HOUR, PAST_24_HOURS, PAST_WEEK, PAST_MONTH, PAST_YEAR
)

print("Recent AI news:", response['result'])
print("Reference URLs:", response['reference_urls'])

Time Range Options:

TimeRange.PAST_HOUR - Results from the past hour
TimeRange.PAST_24_HOURS - Results from the past 24 hours
TimeRange.PAST_WEEK - Results from the past week
TimeRange.PAST_MONTH - Results from the past month
TimeRange.PAST_YEAR - Results from the past year

Use Cases:

Finding recent news and updates
Tracking time-sensitive information
Getting latest product releases
Monitoring recent market changes
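When the freshness requirement is expressed as a duration (for example "nothing older than three days"), the time range options listed above can be chosen programmatically. The sketch below works with the option names as strings so it runs without the SDK; narrowest_time_range is a hypothetical helper, and the 31-day month and 366-day year are approximations chosen for this example.

```python
from datetime import timedelta

# Approximate span of each TimeRange option, ordered from narrowest to widest.
# Assumption: 31 days per month and 366 days per year.
TIME_RANGE_SPANS = [
    ("PAST_HOUR", timedelta(hours=1)),
    ("PAST_24_HOURS", timedelta(hours=24)),
    ("PAST_WEEK", timedelta(days=7)),
    ("PAST_MONTH", timedelta(days=31)),
    ("PAST_YEAR", timedelta(days=366)),
]


def narrowest_time_range(max_age):
    """Return the name of the smallest option covering max_age, or None if none does."""
    for name, span in TIME_RANGE_SPANS:
        if max_age <= span:
            return name
    return None


print(narrowest_time_range(timedelta(days=3)))  # PAST_WEEK
```

The returned name can be turned into the corresponding option with getattr(TimeRange, name). When the function returns None, the time_range argument can simply be omitted.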

Markdownify

Convert any webpage into clean, formatted markdown:

response = client.markdownify(
    website_url="https://example.com"
)

Async Support

All endpoints support asynchronous operations:

import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
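The main benefit of the asynchronous client is running several requests concurrently. A minimal sketch of that pattern follows; scrape_many is a hypothetical helper, it only assumes that the client exposes an awaitable smartscraper method as shown above, and the concurrency limit of 3 is an arbitrary default rather than an SDK or API limit.

```python
import asyncio


async def scrape_many(client, urls, user_prompt, max_concurrency=3):
    """Scrape several URLs concurrently; results are returned in the order of `urls`."""
    # scrape_many is a hypothetical helper, not part of the SDK.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url):
        async with semaphore:  # cap the number of in-flight requests
            return await client.smartscraper(website_url=url, user_prompt=user_prompt)

    return await asyncio.gather(*(scrape_one(url) for url in urls))
```

Inside main() above, it could be used as results = await scrape_many(client, ["https://example.com/a", "https://example.com/b"], "Extract the main content").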

Feedback

Help us improve by submitting feedback programmatically:

client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
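In practice the request ID usually comes from an earlier response rather than being typed by hand. The sketch below wires the two together; send_feedback is a hypothetical helper, and it assumes that the response dict carries the ID under a "request_id" key, which this page does not state explicitly.

```python
def send_feedback(client, response, rating, feedback_text=""):
    """Submit feedback for the request that produced `response`.

    Returns False without calling the API when no request ID is found.
    """
    # Assumption: the response is a dict with the ID stored under "request_id".
    request_id = response.get("request_id")
    if not request_id:
        return False
    client.submit_feedback(
        request_id=request_id,
        rating=rating,
        feedback_text=feedback_text,
    )
    return True
```

For example, send_feedback(client, response, rating=5, feedback_text="Great results!") right after a smartscraper call.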

Support

GitHub

Report issues and contribute to the SDK

Email Support

Get help from our development team

License

This project is licensed under the MIT License. See the LICENSE file for details.
