bird-bench/GenUI-Agent

GenUI-Agent

A complete pipeline for training and evaluating AI agents on interactive UI/UX tasks. The system covers supervised fine-tuning (SFT), reinforcement learning (GRPO), end-to-end evaluation, and a dynamic UI rendering engine.

Figure 1. Top: Overview of the GenUI-Agent architecture. GenUI-Agent consists of a Tool Agent for tool reasoning and execution, and a GUI Agent for generating clarification UIs and collecting structured feedback. Bottom: The unfolded three-stage GenUI-Agent workflow. The Tool Agent first explores candidate records and missing slots, the GUI-Coder generates a clarification UI, and the Tool Agent uses the resulting structured feedback to execute final actions.

Project Structure

.
├── ux_train/          # Model training (SFT + GRPO)
├── ux_infer/          # Benchmark evaluation (9-step pipeline)
└── dynamic-ux/        # UI rendering engine (WebSocket server + frontend)

Components

ux_train — Training

Supervised fine-tuning and GRPO reinforcement learning for UX code generation models.

  • SFT: Fine-tune base models (e.g., Qwen2.5-Coder-7B-Instruct) on UX interaction data
  • GRPO: Reward-guided RL that uses an API-based LLM judge to optimize code quality

# SFT training
bash ux_train/sft_coder/sft_qwen2.5_coder_7b.sh

# GRPO training (requires API_KEY)
export API_KEY="your-api-key"
bash ux_train/rl_coder/grpo_ux.sh
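
For readers new to GRPO, the core idea is that each completion's reward is compared against the other completions sampled for the same prompt. The snippet below is a minimal sketch of that group-relative normalization, not the code in `grpo_ux.sh` or `reward_fn_batch.py`. The function name, the `eps` value, the use of the population standard deviation, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation and a small eps to avoid
    division by zero; training frameworks may differ in both details.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical LLM-judge scores for four TSX completions of one prompt.
scores = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(scores)
print(advantages)
```

Completions scored above the group mean receive a positive advantage and those below receive a negative one, so the policy is pushed toward the better samples without needing a separate value model.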

ux_infer — Evaluation

A 9-step end-to-end evaluation pipeline with two inference modes: API (Cloud API) and GPU (vLLM).

The pipeline is driven by a GenUI Agent composed of two sub-agents:

  • Tool Agent: Runs multi-turn reasoning over domain databases and tool calls to gather information and execute actions.
  • GUI-Coder: Generates interactive React/TSX UIs that collect user feedback based on the Tool Agent's findings.

Tool Agent: Pre-Agent TAO → ··· → Post-Agent TAO
                          ↘                ↗
GUI-Coder:          Reflection UI → TSX Gen → Rendering → User Simulation

# API mode
export API_KEY="your-api-key"
cd ux_infer/run/agent/api && ./run_eval_lite.sh

# GPU mode
cd ux_infer/run/agent/gpu && ./run_eval_lite.sh
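
The hand-off between the two sub-agents can be pictured as three calls in sequence. The following is a rough sketch of that control flow only. Every name in it (`Exploration`, `run_episode`, the four callables) and the order-cancellation example are hypothetical and do not come from the `ux_infer` source. The stubs stand in for the real Tool Agent, GUI-Coder, renderer, and user simulator.

```python
from dataclasses import dataclass


@dataclass
class Exploration:
    """What the Tool Agent found before asking the user (hypothetical shape)."""
    candidates: list[str]
    missing_slots: list[str]


def run_episode(query, explore, generate_ui, collect_feedback, execute):
    exploration = explore(query)                  # Stage 1: Tool Agent explores
    ui_code = generate_ui(exploration)            # Stage 2: GUI-Coder writes the UI
    feedback = collect_feedback(ui_code)          # Rendering + user simulation
    return execute(query, exploration, feedback)  # Stage 3: Tool Agent acts


# Stub implementations, for illustration only.
def explore(query):
    return Exploration(candidates=["order-1", "order-2"], missing_slots=["order_id"])


def generate_ui(exploration):
    # Placeholder string, not real TSX output.
    return f"<Select options={exploration.candidates} />"


def collect_feedback(ui_code):
    return {"order_id": "order-2"}


def execute(query, exploration, feedback):
    return f"cancel({feedback['order_id']})"


result = run_episode("Cancel my order", explore, generate_ui, collect_feedback, execute)
print(result)
```

Passing the stages in as callables keeps the sketch independent of whether inference runs in API or GPU mode.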

dynamic-ux — UI Rendering Engine

WebSocket server and React frontend for rendering TSX components and executing user simulations. Required by ux_infer Steps 4 & 6.

# Install dependencies (both services must be running before ux_infer)
cd dynamic-ux
yarn install
pip install -r packages/live-app-server/requirements.txt

# Terminal 1, from dynamic-ux/: WebSocket server (ws://localhost:8765)
cd packages/live-app-server && python3 websocket_server.py

# Terminal 2, from dynamic-ux/: Frontend app (http://localhost:5173)
cd packages/live-app-server && yarn dev
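
Because the evaluation depends on both services, it can help to confirm that they are listening before launching `ux_infer`. The check below is a small sketch using only the Python standard library. It tests that the TCP ports from the commands above accept connections, and does not verify the WebSocket or HTTP protocol itself. The helper names `port_open` and `check_services` are made up for this example.

```python
import socket

# Hosts and ports taken from the startup commands above.
SERVICES = {
    "WebSocket server": ("localhost", 8765),
    "Frontend app": ("localhost", 5173),
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'DOWN'}")
```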

End-to-End Workflow

Base Model (e.g., Qwen2.5-Coder-7B-Instruct)
    │
    ▼
[ SFT ]  ux_train/sft_coder/     Fine-tune on UX interaction data
    │
    ▼
[ GRPO ] ux_train/rl_coder/      RL with LLM-judge reward
    │
    ▼
[ Eval ] ux_infer/               9-step benchmark evaluation
    │        │
    │        └── dynamic-ux/     UI rendering & user simulation
    ▼
Results   ux_infer/output/       Metrics, screenshots, logs
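
The stages above could be chained from one driver script. The sketch below only strings together the launch commands listed in this README and defaults to a dry run that prints the plan. It assumes each script is already configured to pick up the previous stage's checkpoint, which this README does not describe. `STAGES` and `run_pipeline` are hypothetical names, and API mode is used for the evaluation step.

```python
import subprocess
from pathlib import Path

# (stage name, working directory relative to the repo root, command)
STAGES = [
    ("SFT", ".", ["bash", "ux_train/sft_coder/sft_qwen2.5_coder_7b.sh"]),
    ("GRPO", ".", ["bash", "ux_train/rl_coder/grpo_ux.sh"]),
    ("Eval", "ux_infer/run/agent/api", ["./run_eval_lite.sh"]),
]


def run_pipeline(repo_root=".", dry_run=True):
    """Run each stage in order; with dry_run=True, only describe the plan."""
    planned = []
    for name, workdir, argv in STAGES:
        cwd = Path(repo_root) / workdir
        planned.append(f"[{name}] (cd {cwd.as_posix()} && {' '.join(argv)})")
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(argv, cwd=cwd, check=True)
    return planned


plan = run_pipeline()
for line in plan:
    print(line)
```

The GRPO and API-mode evaluation stages still expect `API_KEY` in the environment, and the `dynamic-ux` services must be running before the evaluation stage starts.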

Code Structure

.
├── ux_train/                                   # Model training
│   ├── sft_coder/                              # SFT for GUI-Coder (code generation)
│   │   └── sft_qwen2.5_coder_7b.sh
│   ├── sft_tool/                               # SFT for Tool Agent (tool-use reasoning)
│   │   └── sft_qwen2.5_7b.sh
│   └── rl_coder/                               # GRPO reinforcement learning
│       ├── grpo_ux.sh                          # Training launch script
│       ├── reward_fn_batch.py                  # 4-dimension LLM judge reward
│       ├── call_api.py                         # Cloud API wrapper for reward calls
│       ├── prompt.py                           # Prompt templates for LLM judge
│       └── code_exec/                          # TSX code execution & validation
│
├── ux_infer/                                   # Benchmark evaluation (9-step pipeline)
│   ├── bench/                                  # Datasets & domain databases
│   ├── prompt/                                 # System & user prompt templates
│   ├── src/
│   │   ├── agent/{api,gpu}/                    # Tool Agent: TAO reasoning (Steps 1, 8)
│   │   ├── show_ui/{api,gpu}/                  # GUI-Coder: TSX gen, rendering, user sim (Steps 2-6)
│   │   ├── evaluation/                         # Action tool-calling evaluation (Step 9)
│   │   └── pre_process/                        # Data preprocessing
│   ├── utils/                                  # Cloud API & vLLM wrappers
│   └── run/agent/{api,gpu}/                    # Run scripts & configuration
│
└── dynamic-ux/                                 # UI rendering engine (Nx monorepo)
    └── packages/
        ├── live-app-server/                    # WebSocket server + React frontend
        │   ├── websocket_server.py             # WS server (ws://localhost:8765)
        │   ├── playwright_runner.py            # Playwright execution runtime
        │   └── src/                            # React app (http://localhost:5173)
        ├── dynamic-guest/                      # TSX transpilation & sandboxed execution
        └── spark-ux/                           # UI component library (shadcn/ui)

About

UI generation for human-computer interaction (HCI)
