AI Browser - Human-in-the-loop Navigator
Inspiration
We've all been there: trying to help a friend or family member navigate a complex website over the phone, or struggling to remember the exact steps for an online task we only do once a year. Traditional screen recorders capture what happens but don't teach how to do it. Automation tools complete tasks but leave users dependent and unable to learn.
We were inspired to create something different: an AI-powered browser that doesn't just do things for you, but teaches you how to do them. By combining the intelligence of AI with the irreplaceable human element of learning and decision-making, we built a tool that bridges the gap between automation and education.
What it does
AI Browser is an intelligent, human-in-the-loop navigation assistant that transforms complex web workflows into guided learning experiences. Here's what makes it special:
Intelligent Task Planning
- Users describe what they want to accomplish in natural language
- AI analyzes the task and breaks it down into clear, manageable steps
- Creates a visual roadmap of the entire workflow
Multi-Mode Learning System
- Plan Mode: AI generates and refines step-by-step instructions through a conversational interface
- Explore Mode: Users discover interface elements with AI assistance, building a shared understanding
- Tutorial Mode: AI guides users step-by-step with real-time element highlighting and contextual instructions
- Practice Mode: Users attempt tasks independently while the system records their actions
- Reflection Mode: Side-by-side comparison of correct workflow vs. user attempts, highlighting areas for improvement
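The Reflection Mode comparison can be pictured as a sequence diff between the reference steps and what the user actually did. The sketch below is an illustration only, assuming each step is reduced to a short string label; `compare_attempt` and the step labels are hypothetical and not taken from the project's code.

```python
from difflib import SequenceMatcher


def compare_attempt(reference, attempt):
    """Return the differences between the reference workflow and a user attempt.

    Each entry is (tag, reference_steps, attempt_steps), where tag is one of
    'delete' (user skipped steps), 'insert' (user added steps) or
    'replace' (user did something else at that point).
    """
    matcher = SequenceMatcher(a=reference, b=attempt, autojunk=False)
    return [
        (tag, reference[i1:i2], attempt[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

With a reference of five steps and an attempt that leaves one out, the result contains a single `delete` entry naming the skipped step, which is the kind of item a side-by-side view could highlight.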
Smart Element Detection
- HTML Mode: Analyzes DOM structure to identify interactive elements
- Vision Mode: Uses computer vision to locate elements when HTML matching fails
- Fallback system ensures guidance continues even when exact elements can't be highlighted
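The fallback order described above (HTML matching first, then vision, then plain instructions) can be sketched as a simple chain of locators. This is a minimal illustration under stated assumptions, not the project's implementation: `locate_with_fallback`, the locator callables, and the shape of the returned dictionary are all hypothetical.

```python
def locate_with_fallback(description, locators):
    """Try each (mode, locate) pair in order and return the first hit.

    `locate` is any callable that takes the step description and returns
    a target (for example a selector or a bounding box) or None.
    If every locator fails, fall back to showing the instruction as a card.
    """
    for mode, locate in locators:
        try:
            found = locate(description)
        except Exception:
            # A failing locator should not stop guidance; try the next one.
            found = None
        if found is not None:
            return {"mode": mode, "target": found}
    return {"mode": "card", "target": None, "instruction": description}
```

Passing the locators as an ordered list keeps the priority explicit and makes it easy to add or reorder detection strategies.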
Adaptive Learning
- Records user progress and mistakes
- Provides contextual help when users get stuck
- Builds confidence through progressive independence
How we built it
Technology Stack
Frontend
- Electron: Desktop application framework with webview integration
- Vanilla JavaScript: Fast, lightweight rendering without framework overhead
- Custom CSS: Sophisticated grayscale theme with gradient accents and lotus branding
Backend
- Python + Flask: RESTful API server handling AI requests
- LLM Integration: OpenAI GPT for natural language understanding and task planning
- Computer Vision: Image analysis for element detection fallback
- DOM Analysis: Intelligent parsing and element matching algorithms
Key Architecture Decisions
- Preload Scripts: Secure communication between renderer and webview
- Session Management: Stateful agent tracking across multiple page interactions
- IPC Communication: Efficient message passing for real-time element highlighting
- Modular Design: Separate modules for planning, navigation, observation, and translation
Development Workflow
- Built core Electron shell with webview integration
- Developed DOM scanning and element highlighting system
- Integrated LLM for task planning and step generation
- Created multi-mode learning workflow (Plan → Tutorial → Practice → Reflect)
- Added computer vision fallback for robust element detection
- Designed and implemented grayscale UI theme with lotus branding
- Implemented session management for continuous task tracking
Challenges we ran into
1. Dynamic Web Content
Modern websites use complex JavaScript frameworks that constantly modify the DOM. Our initial approach of static element selection failed when pages updated. We solved this by:
- Implementing real-time DOM observation
- Creating resilient CSS selectors that survive page mutations
- Adding retry logic with exponential backoff
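As a rough sketch of the retry idea in the last bullet: keep asking for the element, waiting longer after each miss. The function name, defaults, and the convention that a lookup returns `None` on failure are assumptions made for this example; the project's real retry code is not shown in this write-up.

```python
import time


def retry_with_backoff(action, attempts=5, base_delay=0.1, factor=2.0, sleep=time.sleep):
    """Call `action` until it returns something other than None.

    Waits base_delay, then base_delay * factor, and so on between tries.
    Returns the result, or None if every attempt failed.
    `sleep` is injectable so the delays can be observed in tests.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        result = action()
        if result is not None:
            return result
        if attempt < attempts:
            sleep(delay)
            delay *= factor
    return None
```

Growing the delay gives a slow page time to finish rendering without hammering the DOM with lookups.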
2. Element Identification Ambiguity
Matching AI-generated instructions to actual HTML elements proved surprisingly difficult. A button described as "Submit" might have aria-label="Complete form" or no text at all. Our solution:
- Multi-criteria matching (text content, ARIA labels, placeholders, role attributes)
- Confidence scoring system to detect low-quality matches
- Computer vision fallback using screenshot analysis
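One way to picture multi-criteria matching with a confidence score is shown below. It is only a sketch: the attribute weights, the 0.6 threshold, the dictionary representation of an element, and the use of `difflib` string similarity are all assumptions for illustration, not the scoring the project actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical weights: visible text is trusted most, role least.
WEIGHTS = {"text": 1.0, "aria_label": 0.9, "placeholder": 0.7, "role": 0.4}


def similarity(a, b):
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_element(description, element):
    """Best weighted similarity between the description and any attribute."""
    best = 0.0
    for attr, weight in WEIGHTS.items():
        value = element.get(attr) or ""
        if value:
            best = max(best, weight * similarity(description, value))
    return best


def best_match(description, elements, threshold=0.6):
    """Return (element, score); element is None when the match is low quality."""
    if not elements:
        return None, 0.0
    scored = [(score_element(description, el), el) for el in elements]
    score, element = max(scored, key=lambda pair: pair[0])
    return (element if score >= threshold else None), score
```

Returning the score alongside the element lets the caller decide when to hand off to the vision fallback instead of highlighting a doubtful match.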
3. Cross-Domain Communication
Electron's security model made it challenging to inject code into arbitrary websites. We overcame this with:
- Careful preload script architecture
- IPC message passing between isolated contexts
- Sandbox permissions for network and git operations
4. State Management Complexity
Coordinating state between frontend UI, backend AI, and webview content required careful design:
- Session-based tracking with unique IDs
- Event-driven architecture for asynchronous updates
- Proper cleanup and reset mechanisms
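A minimal sketch of session-based tracking with unique IDs and explicit reset and cleanup might look like the following. `Session`, `SessionStore`, and their fields are hypothetical names for this example and use an in-memory dictionary; the project's actual storage and state shape are not described here.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    task: str
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    step_index: int = 0
    events: list = field(default_factory=list)


class SessionStore:
    """In-memory registry of sessions keyed by their unique ID."""

    def __init__(self):
        self._sessions = {}

    def create(self, task):
        session = Session(task=task)
        self._sessions[session.session_id] = session
        return session

    def get(self, session_id):
        return self._sessions[session_id]  # raises KeyError if unknown

    def record(self, session_id, event):
        self.get(session_id).events.append(event)

    def advance(self, session_id):
        self.get(session_id).step_index += 1

    def reset(self, session_id):
        session = self.get(session_id)
        session.step_index = 0
        session.events.clear()

    def close(self, session_id):
        self._sessions.pop(session_id, None)
```

Keeping every state change behind the store makes it easier for the UI, the backend, and the webview to agree on which step a task is at.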
5. UI/UX for Learning
Giving enough guidance without overwhelming users was tricky:
- Created collapsible sidebar to maximize browser space
- Designed non-intrusive element highlighting
- Implemented progressive disclosure of information
- Added help button for easy access during practice mode
Accomplishments that we're proud of
- Hybrid AI-Human Approach: We didn't just build another automation tool; we created a learning system that respects human agency while providing intelligent assistance.
- Sophisticated UI Design: Our grayscale theme with subtle gradients and the custom lotus logo creates a professional, calming interface that doesn't distract from the learning experience.
- Robust Fallback System: When exact element matching fails, the system gracefully degrades to card-based instructions, ensuring users never get stuck.
- Complete Learning Loop: From planning to practice to reflection, we built a comprehensive system that mirrors effective educational psychology principles.
- Real-World Applicability: Successfully demonstrated with complex workflows like restaurant reservations on Google Maps, a genuinely useful application.
- Session Continuity: The agent maintains context across multiple page navigations, understanding the full workflow rather than treating each step in isolation.
What we learned
Technical Insights
- LLM Prompt Engineering: Crafting prompts that consistently produce structured, actionable outputs required extensive iteration
- DOM Manipulation at Scale: Working with arbitrary websites taught us resilience patterns for unpredictable environments
- Electron Architecture: Deep understanding of Electron's security model, preload scripts, and IPC communication
- Computer Vision Integration: Learned to combine traditional CV with modern AI for hybrid element detection
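To illustrate the first insight about structured LLM output: one common safeguard is to ask the model for JSON and validate the reply before using it. The sketch below assumes a reply of the form `{"steps": [{"instruction": ..., "target": ...}]}`; that schema and the name `parse_plan` are hypothetical, and no real model call is made here.

```python
import json


def parse_plan(raw_reply):
    """Validate a JSON plan from the model and return a normalized step list.

    Raises ValueError when the reply is not usable, so the caller can
    re-prompt instead of showing the user a broken plan.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    steps = data.get("steps") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("reply must contain a non-empty 'steps' list")

    plan = []
    for number, step in enumerate(steps, start=1):
        instruction = step.get("instruction") if isinstance(step, dict) else None
        if not isinstance(instruction, str) or not instruction.strip():
            raise ValueError(f"step {number} has no instruction")
        plan.append({
            "number": number,
            "instruction": instruction.strip(),
            "target": step.get("target"),  # optional element description
        })
    return plan
```

Failing loudly on malformed replies keeps prompt iteration honest: every rejected reply is a concrete case to fix in the prompt.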
Design Insights
- Progressive Learning: Users need different levels of guidance at different stages; one size doesn't fit all
- Confidence Matters: Showing confidence scores helps users know when to trust AI vs. verify manually
- Minimalism Wins: A clean, grayscale interface helps users focus on learning rather than fighting the tool
Product Insights
- Human-in-the-Loop is Powerful: The best AI tools augment rather than replace human decision-making
- Context is Everything: Task completion isn't just about clicking buttons; it's about understanding why
- Learning Sticks: Users who complete the tutorial-practice-reflect cycle retain knowledge better than those who just watch automation
What's next for AI Browser
Near-Term Enhancements
- Voice Guidance: Audio narration for hands-free learning during tutorials
- Mobile Companion: Sync learned workflows to mobile devices for on-the-go reference
- Workflow Sharing: Community marketplace for common tasks (tax filing, travel booking, etc.)
- Video Generation: Automatic creation of polished tutorial videos from recorded sessions
Advanced Features
- Adaptive Difficulty: AI adjusts guidance level based on user proficiency
- Workflow Decomposition: Break complex multi-site workflows into manageable chunks
- Integration Hub: Connect with password managers, form fillers, and productivity tools
- Analytics Dashboard: Track learning progress and identify areas needing more practice
Enterprise Vision
- Team Collaboration: Share and co-edit workflows within organizations
- Compliance Mode: Ensure workflows follow company policies and regulations
- Training Analytics: Track employee onboarding and process adoption
- Custom Deployment: Self-hosted versions for security-sensitive environments
Research Directions
- Reinforcement Learning: AI learns from user corrections to improve future guidance
- Predictive Assistance: Anticipate user needs based on context and history
- Multi-Language Support: Automatic translation of workflows and instructions
- Accessibility Features: Screen reader integration, keyboard-only navigation support
The future of web navigation isn't full automation; it's intelligent assistance that empowers humans to learn, adapt, and succeed. AI Browser is just the beginning of this journey.
Try It Out
Visit our GitHub repository to:
- Download the latest release
- View the source code
- Contribute to development
- Report issues or request features
Let's make the web accessible through learning, not just automation.
