A complete pipeline for training and evaluating AI agents on interactive UI/UX tasks. The system covers supervised fine-tuning (SFT), reinforcement learning (GRPO), end-to-end evaluation, and a dynamic UI rendering engine.
Figure 1. Top: Overview of the GenUI-Agent architecture. GenUI-Agent consists of a Tool Agent for tool reasoning and execution, and a GUI Agent for generating clarification UIs and collecting structured feedback. Bottom: The unfolded three-stage GenUI-Agent workflow. The Tool Agent first explores candidate records and missing slots, the GUI-Coder generates a clarification UI, and the Tool Agent uses the resulting structured feedback to execute final actions.
.
├── ux_train/ # Model training (SFT + GRPO)
├── ux_infer/ # Benchmark evaluation (9-step pipeline)
└── dynamic-ux/ # UI rendering engine (WebSocket server + frontend)
ux_train — Training
Supervised fine-tuning and GRPO reinforcement learning for UX code generation models.
- SFT: Fine-tune base models (e.g., Qwen2.5-Coder-7B-Instruct) on UX interaction data
- GRPO: Reward-guided RL that uses an API-based LLM judge to optimize code quality
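To illustrate the GRPO step above, here is a minimal sketch of the group-relative advantage computation that GRPO is generally built on: several candidates are sampled for the same prompt, each receives a reward, and rewards are normalized within that group. The function name, the `eps` constant, and the example scores are assumptions for illustration. This is not the code in `ux_train/rl_coder/`, and real implementations may differ in how they compute the standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the
    population standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for four TSX candidates generated from one prompt.
scores = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(scores)
```

Candidates that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.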
# SFT training
bash ux_train/sft_coder/sft_qwen2.5_coder_7b.sh
# GRPO training (requires API_KEY)
export API_KEY="your-api-key"
bash ux_train/rl_coder/grpo_ux.sh
ux_infer — Evaluation
A 9-step end-to-end evaluation pipeline with two inference modes: API (cloud API) and GPU (vLLM).
The pipeline is driven by GenUI-Agent, which is composed of two sub-agents:
- Tool Agent — Multi-turn reasoning over domain databases and tool calls to gather information and execute actions.
- GUI-Coder — Generates interactive React/TSX UIs for collecting user feedback based on the Tool Agent's findings.
Tool Agent: Pre-Agent TAO → ··· → Post-Agent TAO
↘ ↗
GUI-Coder: Reflection UI → TSX Gen → Rendering → User Simulation
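The hand-off in the diagram above can be sketched as three function calls. Everything in this sketch is a stand-in: the function names, the `Exploration` dataclass, and the placeholder return values are hypothetical and do not correspond to modules in `ux_infer/src/`. It only shows the order in which information flows between the Tool Agent and the GUI-Coder.

```python
from dataclasses import dataclass


@dataclass
class Exploration:
    candidates: list[str]     # candidate records found by the Tool Agent
    missing_slots: list[str]  # information still needed from the user


def explore(request: str) -> Exploration:
    # Stage 1 (Tool Agent): stand-in for multi-turn reasoning and tool calls.
    return Exploration(candidates=["record_a", "record_b"], missing_slots=["date"])


def generate_ui(exploration: Exploration) -> str:
    # Stage 2 (GUI-Coder): stand-in for TSX generation; returns a placeholder string.
    fields = ", ".join(exploration.missing_slots)
    return f"<ClarificationForm fields='{fields}' />"


def collect_feedback(tsx: str, exploration: Exploration) -> dict[str, str]:
    # Stand-in for rendering plus user simulation: fills every missing slot.
    return {slot: "simulated-value" for slot in exploration.missing_slots}


def execute(exploration: Exploration, feedback: dict[str, str]) -> list[str]:
    # Stage 3 (Tool Agent): stand-in for the final tool calls.
    target = exploration.candidates[0]
    return [f"act({target}, {slot}={value})" for slot, value in feedback.items()]


def run_pipeline(request: str) -> list[str]:
    exploration = explore(request)
    tsx = generate_ui(exploration)
    feedback = collect_feedback(tsx, exploration)
    return execute(exploration, feedback)


actions = run_pipeline("reschedule my appointment")
```

The point of the structure is that the second Tool Agent pass receives structured feedback (a slot-to-value mapping here) rather than free text, so it can act without another round of clarification.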
# API mode
export API_KEY="your-api-key"
cd ux_infer/run/agent/api && ./run_eval_lite.sh
# GPU mode
cd ux_infer/run/agent/gpu && ./run_eval_lite.sh
dynamic-ux — UI Rendering Engine
WebSocket server and React frontend for rendering TSX components and executing user simulations. Required by ux_infer Steps 4 & 6.
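Because ux_infer depends on both services, it can help to confirm they are reachable before starting an evaluation run. The following is a minimal sketch, not part of the repository: the helper names are made up, and it assumes the default ports used in this README (8765 for the WebSocket server, 5173 for the frontend). It only checks that a TCP connection can be opened; it does not perform a WebSocket handshake.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed defaults, taken from the addresses listed in this README.
SERVICES = {
    "websocket server": ("localhost", 8765),
    "frontend": ("localhost", 5173),
}


def check_services(services=SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'DOWN'}")
```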
# Start services (required before running ux_infer)
cd dynamic-ux
yarn install
pip install -r packages/live-app-server/requirements.txt
# Terminal 1: WebSocket server (ws://localhost:8765)
cd packages/live-app-server && python3 websocket_server.py
# Terminal 2: Frontend app (http://localhost:5173)
cd packages/live-app-server && yarn dev
Base Model (e.g., Qwen2.5-Coder-7B-Instruct)
│
▼
[ SFT ] ux_train/sft_coder/ Fine-tune on UX interaction data
│
▼
[ GRPO ] ux_train/rl_coder/ RL with LLM-judge reward
│
▼
[ Eval ] ux_infer/ 9-step benchmark evaluation
│ │
│ └── dynamic-ux/ UI rendering & user simulation
▼
Results ux_infer/output/ Metrics, screenshots, logs
.
├── ux_train/ # Model training
│ ├── sft_coder/ # SFT for GUI-Coder (code generation)
│ │ └── sft_qwen2.5_coder_7b.sh
│ ├── sft_tool/ # SFT for Tool Agent (tool-use reasoning)
│ │ └── sft_qwen2.5_7b.sh
│ └── rl_coder/ # GRPO reinforcement learning
│ ├── grpo_ux.sh # Training launch script
│ ├── reward_fn_batch.py # 4-dimension LLM judge reward
│ ├── call_api.py # Cloud API wrapper for reward calls
│ ├── prompt.py # Prompt templates for LLM judge
│ └── code_exec/ # TSX code execution & validation
│
├── ux_infer/ # Benchmark evaluation (9-step pipeline)
│ ├── bench/ # Datasets & domain databases
│ ├── prompt/ # System & user prompt templates
│ ├── src/
│ │ ├── agent/{api,gpu}/ # Tool Agent: TAO reasoning (Steps 1, 8)
│ │ ├── show_ui/{api,gpu}/ # GUI-Coder: TSX gen, rendering, user sim (Steps 2-6)
│ │ ├── evaluation/ # Action tool-calling evaluation (Step 9)
│ │ └── pre_process/ # Data preprocessing
│ ├── utils/ # Cloud API & vLLM wrappers
│ └── run/agent/{api,gpu}/ # Run scripts & configuration
│
└── dynamic-ux/ # UI rendering engine (Nx monorepo)
└── packages/
├── live-app-server/ # WebSocket server + React frontend
│ ├── websocket_server.py # WS server (ws://localhost:8765)
│ ├── playwright_runner.py # Playwright execution runtime
│ └── src/ # React app (http://localhost:5173)
├── dynamic-guest/ # TSX transpilation & sandboxed execution
└── spark-ux/ # UI component library (shadcn/ui)
