# TinyScout — Autonomous Web Research & Insight Engine

Inspiration

Search engines give links.
LLMs give answers.

But real research needs both, and it needs a process: planning what to look for, collecting evidence, cross-checking sources, rejecting irrelevant pages, and producing a structured report with citations.

We built TinyScout to behave like a real research assistant. You type a research goal, and it autonomously plans, browses the web, evaluates evidence, and synthesizes a report—optionally with visuals.


What It Does

TinyScout is an autonomous research agent that turns a single research goal into a multi-step investigation.

Workflow

Planner
Breaks the user’s research goal into actionable sub-tasks
(e.g., “key players”, “trends”, “unmet needs”, “case studies”).

Retriever + Browser
Gathers evidence from the web using the TinyFish Web Agent API, with an HTTP retriever fallback when needed.

Analyzer
Reads each extracted document, scores relevance, extracts key facts, and flags missing evidence.

Synthesizer
Compiles a clean final report:

  • Executive summary
  • Structured findings
  • Sources

Visuals (optional)
If enabled, the system generates relevant visuals and infographics using Freepik, based on prompts derived from the report.
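The four stages above can be sketched as a minimal Python pipeline. All function and class names here are illustrative stand-ins, not TinyScout's actual API, and the planner/retriever/analyzer are stubbed rather than calling real models:

```python
from dataclasses import dataclass

@dataclass
class Document:
    url: str
    text: str
    relevance: float = 0.0

def plan(goal: str) -> list[str]:
    # Planner: break the goal into sub-tasks (stubbed; real version uses an LLM).
    return [f"{goal}: key players", f"{goal}: trends", f"{goal}: unmet needs"]

def retrieve(task: str) -> list[Document]:
    # Retriever + Browser: fetch evidence (stubbed; real version hits the web).
    return [Document(url=f"https://example.com/{task.replace(' ', '-')}", text="...")]

def analyze(docs: list[Document]) -> list[Document]:
    # Analyzer: score relevance and keep only documents above a threshold.
    for d in docs:
        d.relevance = 1.0 if d.text else 0.0
    return [d for d in docs if d.relevance >= 0.5]

def synthesize(goal: str, docs: list[Document]) -> str:
    # Synthesizer: compile summary, findings, and a sources list.
    sources = "\n".join(f"- {d.url}" for d in docs)
    return f"# Report: {goal}\n\n## Sources\n{sources}"

def run(goal: str) -> str:
    docs: list[Document] = []
    for task in plan(goal):
        docs.extend(analyze(retrieve(task)))
    return synthesize(goal, docs)
```

The key structural point is that each stage only consumes the previous stage's output, so backends (retriever, model) can be swapped independently.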


User Interface

A simple dashboard where users can:

  • Choose model and retriever backend
  • Run a research job
  • View run ID and status
  • Read the final report with full source trace

Why It’s Unique

Most “research agents” fail on two real-world problems:

  1. Bad retrieval

    • Irrelevant pages
    • Repeated cached sources
    • Weak or low-quality content
  2. No evidence discipline

    • LLMs hallucinate instead of stopping when sources are weak

TinyScout explicitly addresses both problems with:

  • Topic-aware retrieval & fallback logic
    Prevents pulling irrelevant seed sources when the query topic shifts.

  • Evidence gating
    Returns “Insufficient Evidence” instead of hallucinating answers.

  • Source credibility rules
    Biases toward high-quality sources, with controlled fallback to weaker ones.

  • Full traceability
    Shows:

    • Fetched URLs
    • Retrieval method (TinyFish / HTTP / cache)
    • Relevance scores
    • Selection rationale

This makes TinyScout reliable beyond demo-friendly queries.
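The evidence-gating idea can be shown in a few lines. The thresholds and document shape below are illustrative assumptions, not TinyScout's real values:

```python
MIN_DOCS = 2           # illustrative: require at least two sources
MIN_AVG_RELEVANCE = 0.5  # illustrative relevance floor

def gate_evidence(docs):
    """Return None ("Insufficient Evidence") if sources are too weak to use."""
    if len(docs) < MIN_DOCS:
        return None
    avg = sum(d["relevance"] for d in docs) / len(docs)
    if avg < MIN_AVG_RELEVANCE:
        return None
    return docs

evidence = gate_evidence([{"url": "https://example.com", "relevance": 0.9}])
print("Insufficient Evidence" if evidence is None else "Proceed")
# prints "Insufficient Evidence" (only one document, below MIN_DOCS)
```

The synthesizer only runs when the gate returns a non-None document set, which is what keeps weak retrieval from turning into hallucinated answers.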


How We Built It (Architecture)

Frontend

  • Streamlit dashboard
    • Research input
    • Run controls
    • Settings
    • Output display

Core Agent Pipeline

  • Planner → task plan generation
  • RetrieverFactory → backend selection
    • TinyFish retriever (primary)
    • HTTP / DuckDuckGo retriever (fallback)
  • Web Agent → page fetching & text extraction
  • Analyzer → relevance scoring & fact extraction
  • Synthesizer → final report generation
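The RetrieverFactory's primary/fallback selection can be sketched like this. The class shape, function names, and error handling are assumptions for illustration; only the "try TinyFish first, fall back to HTTP" ordering comes from the design above:

```python
class RetrieverFactory:
    def __init__(self, backends):
        # Ordered list of (name, retrieve_fn); earlier entries are preferred.
        self.backends = backends

    def retrieve(self, query):
        errors = []
        for name, fn in self.backends:
            try:
                results = fn(query)
                if results:  # non-empty result set: accept this backend
                    return name, results
            except Exception as exc:
                errors.append((name, exc))  # record and try the next backend
        raise RuntimeError(f"all backends failed: {errors}")

def tinyfish_search(query):
    # Stand-in for the TinyFish Web Agent API call.
    raise ConnectionError("TinyFish unavailable in this sketch")

def http_search(query):
    # Stand-in for the HTTP / DuckDuckGo fallback retriever.
    return [f"https://duckduckgo.com/?q={query}"]

factory = RetrieverFactory([("tinyfish", tinyfish_search), ("http", http_search)])
backend, urls = factory.retrieve("ev+charging")
```

Because the factory returns the backend name alongside the results, the retrieval method can be recorded in the trace (see Observability below).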

Visual Generation

  • Freepik API
    Generates visuals from agent-derived prompts and attaches them to the report when useful.

Models

  • Anthropic Claude models
    Used for planner, analyzer, and synthesizer
    (configurable via environment variables / settings)

Observability

  • Run ID and status
  • Retrieval trace (URL, method, relevance score)
  • Debug logs
    • Useful for diagnosing 403/404 blocks and fetch failures
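A per-fetch trace record covering those fields might look like the following. The field names are hypothetical; the point is that each fetched URL carries its method, score, and rationale so runs are auditable:

```python
from dataclasses import dataclass, asdict

@dataclass
class RetrievalTrace:
    run_id: str
    url: str
    method: str      # "tinyfish" | "http" | "cache"
    relevance: float
    rationale: str   # why this source was selected

trace = RetrievalTrace("run-001", "https://example.com", "http", 0.82,
                       "matched sub-task 'key players'")
row = asdict(trace)  # dict form, ready for a SQLite insert or a log line
```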

Challenges We Ran Into

Planner Output Parsing

Some models returned plans in inconsistent formats (JSON vs plain text).
We added robust parsing:

  • Try JSON format
  • Fallback to list format
  • Fallback to single-task plan if parsing fails
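The three-level fallback above can be sketched as one parser. The JSON shapes it accepts (a bare list or a `{"tasks": [...]}` object) are assumptions about typical model output, not a documented TinyScout format:

```python
import json

def parse_plan(raw: str, goal: str) -> list[str]:
    # 1) Try JSON: either a list of tasks or an object with a "tasks" key.
    try:
        data = json.loads(raw)
        if isinstance(data, dict):
            data = data.get("tasks", [])
        if isinstance(data, list) and data:
            return [str(t) for t in data]
    except json.JSONDecodeError:
        pass
    # 2) Fallback: treat bullet / numbered lines as individual tasks.
    tasks = [line.lstrip("-*0123456789. ").strip()
             for line in raw.splitlines() if line.strip()]
    tasks = [t for t in tasks if t]
    if tasks:
        return tasks
    # 3) Last resort: a single task covering the whole goal.
    return [goal]
```

Each level only fires when the one above it fails, so well-formed JSON plans are never mangled by the looser text parsing.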

Web Fetching Failures (403 / 404 / SSL)

Many sites block bots or change URLs frequently.
We implemented:

  • Retry strategies
  • Alternative fetch methods (httpx → requests fallback)
  • “Too thin” content filtering
  • Graceful degradation:
    • Proceed with partial data
    • Or stop if evidence is insufficient
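Those behaviors compose into a single fetch loop. This is a sketch: the retry count, the "too thin" threshold, and the zero back-off are illustrative, and the fetchers are passed in as plain callables standing in for the httpx- and requests-based implementations:

```python
import time

def fetch_with_fallback(url, fetchers, retries=2, min_chars=200):
    """Try each fetcher in order with retries; reject pages that are too thin."""
    for fetch in fetchers:
        for _ in range(retries + 1):
            try:
                text = fetch(url)
            except Exception:
                time.sleep(0)  # back-off placeholder (kept at 0 s for the sketch)
                continue       # retry the same fetcher
            if len(text) >= min_chars:
                return text    # usable content
            break              # thin content: retries won't help, try next fetcher
    return None                # graceful degradation: caller proceeds or stops
```

Returning `None` instead of raising is what lets the pipeline continue with partial data, or hit the evidence gate if too many fetches fail.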

Relevance Drift

Cached sources from earlier runs sometimes appeared in unrelated queries.
We tightened:

  • Topic classification
  • Cache invalidation rules
  • “Seed fallback blocked if topic unknown” logic
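The "seed fallback blocked if topic unknown" rule is essentially one guard. Topic labels and the exact-match check below are illustrative assumptions:

```python
def allow_seed_fallback(topic, cached_topics):
    """Gate cached 'seed' sources: block them when the query topic is
    unknown or doesn't match the topic the cache entries were stored under."""
    if topic is None:              # topic unknown: never fall back to seeds
        return False
    return topic in cached_topics  # only reuse seeds from the same topic
```

This is why cached sources from an earlier EV-charging run can no longer leak into, say, a healthcare query.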

What We Learned

  • Research agents need retrieval discipline more than better prompting.
  • A good research system must be comfortable saying: “Not enough evidence.”
  • Real-world browsing requires resilient error handling and multi-backend retrieval.

What’s Next

  • Stronger source quality enforcement

    • Tier A/B source preference
    • Dynamic domain expansion when topics change
  • Built-in citation formatting and quote extraction

  • Improved run history and UI visibility for every extracted document

  • Parallel research tasks (multi-agent mode) for speed and coverage


Built With

  • claude
  • fastapi
  • freepik
  • langgraph
  • python
  • sqlite
  • streamlit
  • tinyfish
  • yutori