Peter Sushko

I work on multimodal web agents with the PRIOR team at AI2. Previously, I conducted AI research with Professor Ranjay Krishna at the University of Washington, where I earned my Master's degree in Statistics in 2024.

Before that, I worked as a Machine Learning Engineer at Neustar, where I built models for marketing attribution.

I hold a Bachelor's degree in Mathematics and Economics from Santa Clara University, where I was advised by Professor Frank Farris.

My research interests are multimodal agents and generative AI.

Email / Resume / LinkedIn / GitHub

Research and Publications

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
Tanmay Gupta*, Piper Wolters*, Zixian Ma*, Peter Sushko*, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna
Preprint, 2026
Paper / Model / Code / Data / Demo

We introduce MolmoWeb, an open visual web agent built on Molmo 2 that navigates websites using only screenshots. Alongside the model, we release MolmoWebMix, a large dataset of browser task demonstrations and GUI perception data. MolmoWeb achieves state-of-the-art results among open-weight web agents, outperforming even agents built on much larger proprietary models.

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Peter Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna
In submission
arXiv / Project page / Code (coming soon)

We introduce OmniView, a unified diffusion framework for 3D and 4D view synthesis that generalizes across novel view synthesis, camera-controlled video generation, and keyframe interpolation. By separately representing space, time, and view conditions, OmniView is competitive with task-specific models across diverse benchmarks.

DiScoFormer: Plug-In Density and Score Estimation with Transformers
Vasily Ilin, Peter Sushko
GRaM Workshop @ ICLR 2026
arXiv

We introduce DiScoFormer, a "train-once, infer-anywhere" equivariant Transformer that maps i.i.d. samples to both density values and score vectors, generalizing across distributions and sample sizes. We prove that self-attention can recover normalized KDE, and show the model outperforms classical methods for density estimation, Fisher information computation, and Fokker-Planck-type PDEs.

MultiRef: Controllable Image Generation with Multiple Visual References
Ruoxi Chen, Dongping Chen, Siyuan Wu, Sinan Wang, Shiyun Lang, Peter Sushko, Gaoyang Jiang, Yao Wan, Ranjay Krishna
ACM MM 2025
arXiv / Project page / Code / Data

We introduce MultiRef, a benchmark and dataset for controllable image generation with multiple visual references. MultiRef-bench offers 1,990 evaluation samples, and the MultiRef dataset provides 38k high-quality images via our RefBlend engine. Experiments show that even state-of-the-art models struggle with multi-reference conditioning, underscoring challenges and opportunities for more flexible creative tools.

Score-Based Deterministic Density Sampling
Vasily Ilin, Peter Sushko, Jingwei Hu
Communications on Pure and Applied Analysis (CPAA), 2026
arXiv

We propose a deterministic sampling method that learns time-varying scores on the fly to sample from unnormalized densities. Our approach produces smooth trajectories with monotone convergence, achieving the same optimal rates as exact gradient flow while being more sample-efficient than stochastic methods.

REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim, Vasily Ilin, Ben Caffee, Dongping Chen, Mohammadreza Salehi, Cheng-Yu Hsieh, Ranjay Krishna
CVPR 2025
arXiv / Project page / Code / Model / Data

We introduce REALEDIT, a large-scale image editing dataset of authentic user requests and human-made edits from Reddit, enabling models to better address real-world needs. Our model, finetuned on REALEDIT, achieves state-of-the-art results and generates high-quality edits.

Engineering Projects

Conformal Prediction Kaggle Competition Winner
Peter Sushko
In-class Kaggle Competition, 2024
Kaggle

Leveraged neural networks and statistical methods to optimize prediction intervals. Implemented Jackknife resampling to construct robust intervals based on empirical error distributions. Designed a dual-network architecture to predict upper and lower confidence bounds, employing custom asymmetric loss functions.

Ninigrams
Peter Sushko, Nina Koh
Reddit Games and Puzzles Hackathon Submission, 2024
Reddit / GitHub

A Nonogram-inspired game deployed on Reddit using Devvit and playable directly in a Reddit post. Rendering is done in TypeScript, user data is collected and stored via the Redis API, and backend puzzle generation is implemented in Python.

Sloop
Tim Shur, Akash Katir, Peter Sushko
Personal Project, 2023
Website

Sloop, designed by Tim and Akash, is a browser-based game. It is built with Node.js, initialized using Create Next App, and deployed on Vercel. I contributed additional features and hidden Easter eggs to enhance gameplay and user experience.


Design and source code from Jon Barron's website.