Inspiration

Every developer has felt the "Automation Tax": the grueling hours spent inspecting DOM elements and writing brittle CSS selectors for a UI that will inevitably change next week. We were inspired by the gap between human intent ("I want to deploy this instance") and machine execution ("Click div.btn-primary-021"). With the release of Amazon Nova Act, we saw an opportunity to move past static scripting and create "self-healing" automation that understands the purpose of a UI, not just its coordinates.

What it does

ScriptZero is an AI-driven automation engine that converts manual web actions into robust, production-ready code. It eliminates the need for developers to manually inspect HTML elements or write fragile "find-by-ID" scripts that break whenever a website updates.

Core Functions

*Intelligent Observation: While you perform a task in the browser, ScriptZero uses Amazon Nova Act to "watch" your actions. It doesn't just record clicks; it understands your intent (e.g., "The user is trying to export a monthly billing report").

*Semantic Element Mapping: Instead of relying on brittle CSS selectors (div > span > button), it uses Nova Multimodal Embeddings to identify UI components by their function and visual context.

*Code Generation: It automatically generates high-quality Playwright (Node.js) or Python scripts. These scripts are "self-healing," meaning they use AI reasoning to find the correct buttons even if the website's layout changes.

*Complex Workflow Handling: It can navigate multi-step processes, such as logging into an AWS Console, navigating through nested menus, and handling dynamic pop-ups or MFA prompts.
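
To make the idea of semantic element mapping concrete, here is a minimal sketch of matching a remembered element against the elements on a changed page by embedding similarity. It is not ScriptZero's actual code: the vectors are toy values standing in for Nova Multimodal Embeddings output, and `best_match`, the element IDs, and the 0.8 threshold are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(target, candidates, threshold=0.8):
    """Return the ID of the most similar candidate, or None if none reach the threshold."""
    best_id, best_score = None, threshold
    for element_id, vector in candidates.items():
        score = cosine(target, vector)
        if score >= best_score:
            best_id, best_score = element_id, score
    return best_id


# Toy embedding remembered for the "export report" button (hypothetical values).
EXPORT_BUTTON = [0.9, 0.1, 0.2]

# The same page after a redesign: the IDs changed, the button's meaning did not.
PAGE_AFTER_REDESIGN = {
    "btn-a91f": [0.88, 0.12, 0.25],
    "btn-c07d": [0.10, 0.90, 0.30],
}

if __name__ == "__main__":
    print(best_match(EXPORT_BUTTON, PAGE_AFTER_REDESIGN))
```

Because the lookup keys on meaning rather than on the ID string, renaming `btn-a91f` would not change the result. Returning `None` below the threshold gives the caller a clear signal to retry or escalate instead of clicking a poor match.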

How we built it

ScriptZero is built on a high-performance AWS stack designed for low-latency reasoning.

*The Brain: We utilized Amazon Nova 2 Lite via Amazon Bedrock to interpret user actions and translate them into logical steps.

*The Actor: We integrated Nova Act to handle the heavy lifting of browser navigation, allowing the agent to interact with complex dashboards like the AWS Console and Jira.

*The Framework: We used LangChain and Strands Agents to orchestrate the "Observe-Reason-Generate" loop.

*The Output: A custom post-processing layer transforms Nova’s reasoning into production-ready Playwright and Python code.
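
The "Generate" end of that loop can be sketched as a small renderer that turns logical steps into a Playwright script for Python. This is an illustrative assumption about how such a post-processing layer might look, not the project's implementation: the `Step` type, its three actions, and the choice to treat every click target as a button are hypothetical. The emitted calls (`page.goto`, `page.get_by_role`, `page.get_by_label`) are standard Playwright locator APIs.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One logical step (hypothetical shape for this sketch)."""
    action: str      # "goto", "click", or "fill"
    target: str      # URL for goto; accessible name or label otherwise
    value: str = ""  # text to type, used only by "fill"


def render_step(step: Step) -> str:
    """Render one step as a line of Playwright (Python) code."""
    if step.action == "goto":
        return f"page.goto({step.target!r})"
    if step.action == "click":
        # Assumption: click targets are buttons, located by accessible name.
        return f"page.get_by_role('button', name={step.target!r}).click()"
    if step.action == "fill":
        return f"page.get_by_label({step.target!r}).fill({step.value!r})"
    raise ValueError(f"unsupported action: {step.action}")


def render_script(steps) -> str:
    """Wrap the rendered steps in a minimal synchronous Playwright script."""
    lines = [
        "from playwright.sync_api import sync_playwright",
        "",
        "with sync_playwright() as p:",
        "    browser = p.chromium.launch()",
        "    page = browser.new_page()",
    ]
    lines += [f"    {render_step(step)}" for step in steps]
    lines.append("    browser.close()")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_script([
        Step("goto", "https://example.com/billing"),
        Step("click", "Export report"),
    ]))
```

Locating by role and accessible name rather than by CSS class is what keeps the generated script readable and less sensitive to markup changes.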

Challenges we ran into

*Dynamic Latency: Real-time UI automation is hardware-intensive. We had to optimize our prompt tokens to ensure Nova Act responded within sub-second intervals to avoid session timeouts.

*State Management: Handling "pop-ups" and asynchronous loading was tricky. We implemented a recursive "Retry-and-Reason" logic where the agent re-scans the DOM if a predicted element isn't immediately visible.

*Security: Masking sensitive data (like passwords) during the "Observation" phase required building a custom PII-redaction middleware before the data reached the model.
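
The recursive "Retry-and-Reason" idea from the State Management challenge can be sketched as follows. This is a simplified illustration under stated assumptions: `scan_dom` and `matches` are hypothetical callables standing in for the DOM re-scan and the model's judgement of whether an element fits the intent, and the retry count and delay are arbitrary.

```python
import time


def locate(scan_dom, matches, retries=3, delay_s=0.5):
    """Re-scan the page until an element satisfies `matches` or retries run out.

    scan_dom: callable returning the currently visible elements (hypothetical).
    matches:  callable deciding whether an element fits the intent (hypothetical).
    Returns the first matching element, or None if nothing matched.
    """
    for element in scan_dom():
        if matches(element):
            return element
    if retries == 0:
        return None
    time.sleep(delay_s)  # give asynchronous content time to load
    return locate(scan_dom, matches, retries - 1, delay_s)
```

Returning `None` rather than raising leaves the decision about what to do next (fail, re-plan, or ask a human) to the caller.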

Accomplishments that we're proud of

  1. Achieving 90%+ Execution Accuracy: While standard LLM-based agents often struggle with "hallucinating" selectors or missing dynamic pop-ups, we reached a 90% task-completion rate. By using Nova Act, which is natively trained for "computer use," our agent successfully navigated complex multi-step workflows (like date-pickers and nested dropdowns) that typically break traditional automation scripts.

  2. Zero-Brittle "Self-Healing" Code: We are incredibly proud of our Semantic Rerouting engine. Instead of generating scripts tied to static IDs (e.g., id="button-452"), ScriptZero generates code that uses Nova Multimodal Embeddings to locate elements based on visual and functional context. We successfully tested this by running a script on a website, manually changing the CSS classes, and watching the script "heal" itself and continue the task without error.

  3. Real-Time "Intent Mapping": We successfully implemented a system that translates messy human actions into clean logical structures. If a user clicks around a page looking for a setting, ScriptZero’s Nova 2 Lite backbone filters out the "noise" and identifies the core intent. It doesn't just record clicks; it synthesizes an optimized path, often producing a script that is 30% faster than the human's original recording.

  4. Mastering the 1M Token Context: By leveraging the massive 1-million token context window of Nova 2 Lite, we were able to feed the agent entire documentation sets for complex platforms (like the AWS Service Guide) alongside the live DOM. This allows the agent to not just "guess" what a button does, but to understand it based on official technical documentation.

  5. Secure "Human-in-the-Loop" Escalation: UI automation can be unpredictable. We built a graceful Escalation Protocol using Amazon Bedrock AgentCore. If the agent encounters an unexpected MFA prompt or a total UI overhaul, it doesn't just crash; it pauses the session, takes a screenshot, and asks the developer for guidance, effectively learning from the human intervention for future runs.
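
The escalation behaviour in item 5 can be approximated with a small control loop. The sketch below is an assumption-laden illustration, not the AgentCore-based protocol itself: `execute` and `ask_human` are hypothetical callbacks, a failed step is signalled with `LookupError`, the screenshot is omitted, and "learning" is reduced to a dictionary of remembered fixes.

```python
def run_with_escalation(steps, execute, ask_human, learned):
    """Run steps in order; on failure, pause and ask a human, then remember the fix.

    steps:     list of step descriptions (strings in this sketch).
    execute:   callable that performs a step and raises LookupError if it cannot.
    ask_human: callable(step, reason) returning a corrected step (blocks until answered).
    learned:   dict mapping a failing step to its human-provided replacement.
    Returns the list of steps that were actually executed.
    """
    executed = []
    for step in steps:
        action = learned.get(step, step)  # reuse an earlier human fix if one exists
        try:
            execute(action)
        except LookupError as err:
            action = ask_human(step, str(err))  # session is paused here
            learned[step] = action
            execute(action)
        executed.append(action)
    return executed
```

Passing the same `learned` dictionary into later runs is what lets a second run skip the question the first run had to ask.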

What we learned

UI Automation is a Reasoning Problem, Not a Selector Problem: The biggest takeaway was that traditional automation fails because it treats web pages as static code trees. We learned that by using Amazon Nova Act, we could treat the UI as a visual and functional environment. We shifted our focus from "find the ID" to "describe the intent," which made our scripts significantly more resilient to the "noise" of modern web development (like dynamic class names and A/B test variations).

The Power of "Thinking Budgets": Working with Nova 2 Lite taught us the importance of balancing speed and intelligence. We discovered that simple tasks (like clicking a known link) don't need high reasoning, but complex workflows (like resolving a data entry error) benefit immensely from the "Extended Thinking" mode. Mastering the toggle between Low and High Thinking Budgets allowed us to optimize ScriptZero for both cost and performance.
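
One way to picture that toggle is a tiny router that picks a budget before each model call. This is a hypothetical heuristic for illustration only: the action names and the `"low"`/`"high"` labels are assumptions, and the sketch deliberately does not show how the chosen budget is passed to Bedrock.

```python
# Actions assumed cheap enough to handle without extended reasoning (hypothetical set).
SIMPLE_ACTIONS = {"goto", "click_link", "scroll"}


def choose_budget(action: str, failed_attempts: int = 0) -> str:
    """Return "low" for simple first-try actions and "high" for everything else."""
    if failed_attempts > 0 or action not in SIMPLE_ACTIONS:
        return "high"
    return "low"
```

Escalating to the high budget after any failure keeps the common path cheap while still giving difficult steps more reasoning.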

Human Intent is Non-Linear: By analyzing recording sessions, we learned that humans often take inefficient paths (clicking back and forth, scrolling aimlessly). We realized that our agent shouldn't just mimic the user; it should distill the user's actions. We implemented a "Logical Cleanup" phase using Nova 2 Lite to ignore accidental clicks and output the most direct, optimized path to the goal.
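
A deterministic stand-in for that cleanup is loop erasure: whenever the recording returns to a page it has already visited, the detour in between is dropped. The sketch below assumes each recorded step can be reduced to a page name; ScriptZero itself delegates this judgement to Nova 2 Lite, so treat `distill` as an illustration of the idea rather than the project's method.

```python
def distill(path):
    """Remove detours: revisiting a page discards everything since its first visit."""
    cleaned = []
    position = {}  # page -> index in `cleaned`
    for page in path:
        if page in position:
            keep = position[page] + 1
            for dropped in cleaned[keep:]:
                del position[dropped]
            del cleaned[keep:]
        else:
            position[page] = len(cleaned)
            cleaned.append(page)
    return cleaned


if __name__ == "__main__":
    recording = ["Home", "Settings", "Home", "Billing", "Reports", "Billing", "Export"]
    print(distill(recording))  # ['Home', 'Billing', 'Export']
```

The seven recorded steps collapse to three, which is the kind of shortening behind a faster generated script.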

Safety is a First-Class Citizen: We learned that giving an AI control over a browser comes with massive responsibility. We had to implement strict Guardrails using Amazon Bedrock AgentCore to ensure the agent didn't navigate to unauthorized domains or interact with sensitive data fields (like passwords) unless explicitly governed by IAM roles. This taught us that in agentic AI, "what the agent shouldn't do" is as important as "what it can do."
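
The two checks described here, a domain allowlist and protection of sensitive fields, can be sketched in plain Python. This is a simplified illustration and not the AgentCore configuration: the allowlist entries, the event dictionary shape, and the `[REDACTED]` placeholder are assumptions, and IAM-based governance is not modeled.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; subdomains of each entry are also permitted.
ALLOWED_DOMAINS = {"console.aws.amazon.com", "example.atlassian.net"}
SENSITIVE_INPUT_TYPES = {"password"}


def is_allowed_url(url, allowed=ALLOWED_DOMAINS):
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


def redact_event(event):
    """Return a copy of a recorded input event with sensitive values masked."""
    if event.get("input_type") in SENSITIVE_INPUT_TYPES:
        return {**event, "value": "[REDACTED]"}
    return event
```

Matching on the parsed hostname, rather than on a substring of the URL, avoids being fooled by look-alike hosts such as `console.aws.amazon.com.evil.io`.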

Vertical Integration Matters: Building on the AWS Ecosystem (Bedrock + Nova Act + S3) proved that vertical integration is a massive advantage. We learned that when the model, the browser actuator, and the security layer are all part of the same stack, the end-to-end latency drops and the task success rate climbs compared to "stitched-together" solutions using different providers.

What's next for ScriptZero

The current version of ScriptZero successfully bridges the gap between manual actions and automated code. Our roadmap for 2026 focuses on moving from "Reactive Automation" to "Proactive Orchestration."

Collaborative Multi-Agent "Crews": We plan to integrate CrewAI and Amazon Bedrock AgentCore to allow multiple specialized agents to work together. Imagine a "QA Crew" where:

*The Observer records the developer's workflow.

*The Security Agent scans the UI for leaked PII or exposed credentials during the recording.

*The Optimizer refines the generated Playwright code for maximum execution speed.

Zero-Touch Maintenance via Self-Healing: Currently, ScriptZero helps a developer fix broken scripts. Our next goal is Autonomous Maintenance. By running a background "Health Check" using Nova 2 Lite, ScriptZero will automatically detect when a production site's UI has changed, re-run the "Observation" logic silently, and submit a Pull Request with the updated automation code before the developer even notices a failure.

Native Integration with Nova Forge: We intend to use Nova Forge to fine-tune a specialized version of Nova 2 Lite specifically on "Developer Intent" datasets. This "ScriptZero-Model" would be hyper-aware of complex cloud consoles and enterprise ERPs, allowing it to predict the next 5 steps of a workflow based on just the first 2 clicks.

Cross-Platform "Universal Translator": While we currently output Playwright and Python, our vision is to create a "Universal Automation Translator." This would allow a developer to record a task once and instantly export it as an AWS Step Functions workflow, a Terraform module, or even a Nova Act natural language agent.

Predictive "Intent-First" Generation: Leveraging the 1M token context window, we want ScriptZero to ingest a company's entire internal Wiki and API documentation. This would allow the agent to say: "I see you are manually onboarding a user in the internal portal. I have already generated the automation script for this based on the 'Onboarding SOP' I found in your documentation."

Built With

  • amazon-bedrock
  • amazon-ecr
  • amazon-nova-2-lite
  • amazon-nova-act
  • amazon-nova-multimodal-embedding
  • amazon-web-services
  • aws-lambda
  • langchain
  • playwright
  • strands-agents-sdk