Research Study

Robot Evaluation Study

Help us understand how different robot policies perform by comparing side-by-side video clips.

📋 Study Overview

Thank you for participating in our study! Your feedback will help us understand how different robot policies perform on simulated manipulation tasks.

In this study, you will watch short video clips showing two robots (side-by-side) attempting to complete the same task (e.g., stacking, moving objects). Each robot follows a different control policy.

For each pair of videos, watch both clips to the end before judging, rather than deciding partway through. First, briefly describe what each robot did. Then give each video its own score from 0 to 100, where 100 means the robot fully succeeded at the task. Finally, choose which robot performed better overall, or mark a tie if the two videos look equally good or equally bad.

Select your participant type and enter a unique ID to begin. Using the same ID in the same browser lets you resume where you left off.

Paid Workers: Use your Amazon Worker ID. You'll complete a short qualification quiz before the main study.
Volunteers: Create any unique identifier for yourself to track your progress.

🎬 What to Expect

Here's an example trial. Watch both videos fully, describe each robot's attempt, assign each video a 0-100 score where 100 means full task success, then choose which policy performed better overall.

If a video is slow to load, please wait a moment or refresh the page.

Task: Put toy bear into black bowl
Policy A
Policy B

Describe both videos above to continue.

Your Evaluation

Policy A Score

Give this video a score from `0` to `100`, where `100` means full task success.

Policy B Score

Give this video a score from `0` to `100`, where `100` means full task success.
Tie (equally good or equally bad)

After completing all clips, you will receive a completion code to submit for credit.