This repository accompanies our survey paper:
Reinforcement Learning for Multimodal Foundation Models: A Survey
- [2026-01-28] To better define the scope of our claims, we have renamed the survey to "Reinforcement Learning for Multimodal Foundation Models: A Survey".
- [2025-08-13] We have released "Reinforcement Learning for Large Model: A Survey", the first comprehensive survey dedicated to the emerging paradigm of "RL for Large Model".
- [2025-08-13] We reorganized the repository and aligned the classifications in the survey.
- [2025-06-08] We created this repository to maintain the Awesome-Visual-Reinforcement-Learning paper list. Everyone is welcome to push updates and add related work!
Reinforcement Learning for Multimodal Foundation Models enables agents to learn decision-making policies directly from visual observations (e.g., images or videos), rather than structured state inputs. It lies at the intersection of reinforcement learning and computer vision, with applications in robotics, embodied AI, games, and interactive environments.
Awesome-Visual-Reinforcement-Learning is a curated list of papers, libraries, and resources on learning control policies from visual input. It aims to help researchers and practitioners navigate the fast-evolving Visual RL landscape — from perception and representation learning to policy learning and real-world applications.
We organize this collection along the major research directions of visual RL. The chart below groups existing work by high-level domain (MLLMs, visual generation, unified models, and vision-language-action agents) and then by finer-grained task, illustrating representative papers for each branch:
- Libraries and tools
- Benchmarks, environments, and datasets for Visual RL
- Multi-Modal Large Language Models with RL
- Visual Generation with RL
- RL for Unified Model
- Vision Language Action Models with RL
- Others
- MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning (Mar. 2025)
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 (Mar. 2025)
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? (May. 2025)
Definition: We refer to conventional RL-based MLLMs as approaches that apply reinforcement learning primarily to align a vision–language backbone with verifiable, task-level rewards, without explicitly modeling multi-step chain-of-thought reasoning.
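Many of the listed works share a common recipe: sample a group of responses per prompt, score each with a rule-based, verifiable reward, and standardize rewards within the group to obtain advantages (the GRPO-style objective). Below is a minimal sketch of that reward-and-advantage step; the exact-match reward and function names are illustrative, not any specific paper's API:

```python
import statistics

def verifiable_reward(prediction, ground_truth):
    """Rule-based, task-level reward: 1.0 for an exact match, else 0.0."""
    return 1.0 if prediction.strip() == ground_truth.strip() else 0.0

def grpo_advantages(rewards):
    """Group-normalized advantages: standardize each sampled response's
    reward against its own group's mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against uniform groups
    return [(r - mean) / std for r in rewards]

# A group of 4 sampled answers to one VQA question with ground truth "cat"
group = ["cat", "dog", "cat ", "bird"]
rewards = [verifiable_reward(p, "cat") for p in group]
advs = grpo_advantages(rewards)  # correct answers get positive advantage
```

In a full pipeline these advantages would weight the policy-gradient update on each response's tokens; only the reward and normalization logic is shown here.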
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (Jan. 2026)
- RL makes MLLMs see better than SFT (Oct. 2025)
- CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning (Sep. 2025)
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning (Sep. 2025)
- LENS: Learning to Segment Anything with Unified Reinforced Reasoning (Aug. 2025)
- DocR1: Evidence Page-Guided GRPO for Multi-Page Document Understanding (Aug. 2025)
- BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning (Aug. 2025)
- RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models (Jun. 2025)
- GoalLadder: Incremental Goal Discovery with Vision-Language Models (Jun. 2025)
- Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning (Jun. 2025)
- VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model (May. 2025)
- GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning (Jun. 2025)
- Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment (Jun. 2025)
- MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning (May. 2025)
- One RL to See Them All: Visual Triple Unified Reinforcement Learning (Mar. 2025)
- Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning (May. 2025)
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners (Mar. 2025)
- Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles (Mar. 2025)
- SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning (Jun. 2025)
- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization (Mar. 2025)
- Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning (May. 2025)
- SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories (Mar. 2025)
- VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank (Mar. 2025)
- Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning (Jul. 2025)
- SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards (Mar. 2025)
- Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning (Apr. 2025)
- MMSearch-R1: Incentivizing LMMs to Search (Jun. 2025)
- Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation (May. 2025)
Definition: Perception-centric works apply RL to sharpen object detection, segmentation, and grounding, without engaging in lengthy chain-of-thought reasoning.
- Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration (May. 2025)
- DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes (Mar. 2025)
- MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse (Mar. 2025)
- BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning (Jun. 2025)
- Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations (Jun. 2025)
- Perception-R1: Pioneering Perception Policy with Reinforcement Learning (Apr. 2025)
- VisRL: Intention-Driven Visual Perception via Reinforced Reasoning (Mar. 2025)
- ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs (Jun. 2025)
- VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning (Mar. 2025)
- Grounded Reinforcement Learning for Visual Reasoning (Mar. 2025)
- Visual-RFT: Visual Reinforcement Fine-Tuning (Mar. 2025)
- Video-R1: Reinforcing Video Reasoning in MLLMs (Mar. 2025)
- GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning (Jul. 2025)
- SVQA-R1: Reinforcing Spatial Reasoning in MLLMs via View-Consistent Reward Optimization (Jun. 2025)
- VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training (Jun. 2025)
- Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning (Jun. 2025)
- EasyARC: Evaluating Vision Language Models on True Visual Reasoning (Jun. 2025)
- STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs (Jun. 2025)
- Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning (Mar. 2025)
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning (Mar. 2025)
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning (Mar. 2025)
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning (Jun. 2025)
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning (Mar. 2025)
- GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking (Jun. 2025)
- Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning (Mar. 2025)
- Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward (Jun. 2025)
- MiMo-VL Technical Report (Jun. 2025)
- Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning (Mar. 2025)
Definition: Thinking with Images elevates the picture to an active, external workspace: models iteratively generate, crop, highlight, sketch or insert explicit visual annotations as tokens in their chain-of-thought, thereby aligning linguistic logic with grounded visual evidence.
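The "image as workspace" loop described above can be sketched as a simple tool-use cycle: at each step the model either requests a visual operation (here, a crop appended to its context as new evidence) or emits an answer. Everything below is a toy illustration; `scripted_model` and `crop` are stand-ins for a real MLLM and its vision tools, not any listed paper's API:

```python
def think_with_images(image, question, model, crop_tool, max_steps=3):
    """Interleave reasoning with visual tool calls: the model may request a
    crop of the image, which becomes grounded evidence in its context."""
    context = [image, question]
    for _ in range(max_steps):
        step = model(context)                       # returns an action dict
        if step["action"] == "crop":
            region = crop_tool(image, step["box"])  # zoom into the requested box
            context.append(region)                  # add the visual evidence
        else:
            return step["answer"]
    return model(context)["answer"]                 # force a final answer

# Toy demo: a scripted "model" that first zooms in, then answers.
def crop(img, box):
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]

calls = {"n": 0}
def scripted_model(context):
    calls["n"] += 1
    if calls["n"] == 1:
        return {"action": "crop", "box": (0, 0, 2, 2)}
    return {"action": "answer", "answer": sum(map(sum, context[-1]))}

image = [[1, 2, 9], [3, 4, 9], [9, 9, 9]]
result = think_with_images(image, "sum of the top-left 2x2?", scripted_model, crop)
```

Real systems replace the scripted policy with an RL-trained MLLM whose crop/draw/zoom actions are rewarded for leading to verifiably correct answers.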
- Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback (Jul. 2025)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (Jul. 2025)
- Visual Planning: Let's Think Only with Images (May. 2025)
- GRIT: Teaching MLLMs to Think with Images (Mar. 2025)
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing (Jun. 2025)
- Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information (Mar. 2025)
- Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning (Mar. 2025)
- Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning (May. 2025)
- DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning (Mar. 2025)
- TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs (Mar. 2025)
- VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning (May. 2025)
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection (Mar. 2025)
- Thinking with Generated Images (May. 2025)
- Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL (Mar. 2025)
- OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning (May. 2025)
- Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO (May. 2025)
- VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use (May. 2025)
- MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning (Sep. 2025)
- TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs (Sep. 2025)
- Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding (Aug. 2025)
- Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning (Aug. 2025)
- VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning (Jun. 2025)
- Reinforcing Video Reasoning with Focused Thinking (Mar. 2025)
- EgoVLM: Policy Optimization for Egocentric Video Understanding (Jun. 2025)
- VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning (Jun. 2025)
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO (Jun. 2025)
- TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning (Jun. 2025)
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning (Apr. 2025)
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency (Jun. 2025)
Definition: These works study RL agents that generate or manipulate visual content to achieve goals or enable creative visual tasks.
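A recurring alignment objective in this section (used by the DPO-style diffusion and video works below) compares a preferred and a dispreferred sample under the policy and a frozen reference model. A minimal per-pair sketch of the standard DPO loss, assuming per-sample log-likelihoods are already available (for diffusion models these are typically approximated, e.g., via the ELBO):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO objective on one preference pair: push the policy to widen the
    log-likelihood margin of the preferred sample (w) over the dispreferred
    one (l), measured relative to a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

At initialization (policy equals reference) the margin is zero and the loss is log 2; raising the preferred sample's likelihood lowers the loss, which is the gradient signal the listed methods exploit.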
- Unified Personalized Reward Model for Vision Generation (Mar. 2026)
- DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment (Jan. 2026)
- GARDO: Reinforcing Diffusion Models without Reward Hacking (Dec. 2025)
- VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation (Jan. 2026)
- MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency (Oct. 2025)
- Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation (Oct. 2025)
- UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward (Sep. 2025)
- OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning (Aug. 2025)
- USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning (Aug. 2025)
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning (Jul. 2025)
- TempFlow-GRPO: When Timing Matters for GRPO in Flow Models (Aug. 2025)
- Qwen-Image Technical Report (Aug. 2025)
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation (Apr. 2023)
- ReasonGen-R1: CoT for Autoregressive Image Generation Models through SFT and RL (May. 2025)
- FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL (Jun. 2025)
- DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models (May. 2023)
- A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning (Mar. 2025)
- PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference (NeurIPS 2024)
- Rendering-Aware Reinforcement Learning for Vector Graphics Generation (May. 2025)
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning (May. 2025)
- D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples (May. 2025)
- Training Diffusion Models with Reinforcement Learning (May. 2023)
- Diffusion Model Alignment Using Direct Preference Optimization (Nov. 2023)
- Aligning Diffusion Models by Optimizing Human Utility (Apr. 2024)
- Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization (Jun. 2024)
- Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation (Jan. 2024)
- Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning (ECCV 2024)
- Subject-Driven Text-to-Image Generation via Preference-Based Reinforcement Learning (Jul. 2024)
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning (May. 2025)
- RL for Consistency Models: Reward-Guided Text-to-Image Generation with Fast Inference (Mar. 2024)
- Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards (CVPR 2025)
- DiffPPO: Reinforcement Learning Fine-Tuning of Diffusion Models for Text-to-Image Generation (ICNC 2024)
- SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL (Apr. 2025)
- Multimodal LLMs as Customized Reward Models for Text-to-Image Generation (Jul. 2025)
- Flow-GRPO: Training Flow Matching Models via Online RL (May. 2025)
- Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization (May. 2025)
- Enhancing Diffusion Models with Text-Encoder Reinforcement Learning (ECCV 2024)
- Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes (Jan. 2026)
- EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling (Sep. 2025)
- The Promise of RL for Autoregressive Image Editing (Aug. 2025)
- Manifold-Aware Exploration for Reinforcement Learning in Video Generation (Mar. 2026)
- WorldCompass: Reinforcement Learning for Long-Horizon World Models (Nov. 2025)
- PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models (Jan. 2026)
- PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation (Nov. 2025)
- Video Generation Models Are Good Latent Reward Models (Nov. 2025)
- Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation (Nov. 2025)
- RewardDance: Reward Scaling in Visual Generation (Sep. 2025)
- DanceGRPO: Unleashing GRPO on Visual Generation (Jun. 2025)
- InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO (May. 2025)
- Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning (Apr. 2025)
- Improving Video Generation with Human Feedback (Jan. 2025)
- TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning (May. 2025)
- Aligning Anime Video Generation with Human Feedback (Apr. 2025)
- InstructVideo: Instructing Video Diffusion Models with Human Feedback (Dec. 2023)
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation (Jun. 2024)
- GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning (Mar. 2025)
- Boosting Text-to-Video Generative Model with MLLMs Feedback (NeurIPS 2024)
- DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision (Jun. 2024)
- DreamReward: Text-to-3D Generation with Human Preference (ECCV 2024)
- DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization (Feb. 2025)
- Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards (Jun. 2025)
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning (Mar. 2025)
- Co-Reinforcement Learning for Unified Multimodal Understanding and Generation (May. 2025)
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning (Jun. 2025)
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation (Feb. 2025)
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning (Apr. 2025)
- Emu3: Next-Token Prediction is All You Need (Sep. 2024)
- X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again (Jul. 2025)
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning (Mar. 2025)
- MMaDA: Multimodal Large Diffusion Language Models (Mar. 2025)
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning (Sep. 2025)
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding (Jul. 2025)
- GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning (Aug. 2025)
- GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents (Apr. 2025)
- Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning (Mar. 2025)
- UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (Mar. 2025)
- UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning (Mar. 2025)
- AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning (Jun. 2025)
- MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment (Jul. 2025)
- ProgRM: Build Better GUI Agents with Progress Rewards (May. 2025)
- Mobile-R1: Towards Interactive Reinforcement Learning for VLM-Based Mobile Agent via Task-Level Rewards (Jun. 2025)
- GTA1: GUI Test-time Scaling Agent (Jul. 2025)
- LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization (Jun. 2025)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents (Jan. 2025)
- Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API (Oct. 2023)
- AppVLM: A Lightweight Vision Language Model for Online App Control (Feb. 2025)
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning (Jun. 2024, NeurIPS 2024)
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay (May. 2025)
- OctoNav: Towards Generalist Embodied Navigation (Jun. 2025)
- MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models (ICRA 2025)
- RAPID: Robust and Agile Planner Using Inverse Reinforcement Learning for Vision-Based Drone Navigation (Feb. 2025)
- VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning (Jun. 2025)
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning (Sep. 2024)
- IRL-VLA: Training a Vision-Language-Action Policy via Reward World Model (Aug. 2025)
- Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning (Apr. 2025)
- Selective Visual Representations Improve Convergence and Generalization for Embodied AI (ICLR 2024 Spotlight)
- π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models (Oct. 2025)
- VLA-RFT: Vision-Language-Action Reinforcement Fine-Tuning with Verified Rewards in World Simulators (Oct. 2025)
- RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training (Oct. 2025)
- SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning (Sep. 2025)
- GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning (Aug. 2025)
- FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control (May. 2025)
- TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization (Jun. 2025)
- RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback (May. 2025)
- What Can RL Bring to VLA Generalization? An Empirical Study (May. 2025)
- VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning (May. 2025)
- ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy (Feb. 2025)
- Improving Vision-Language-Action Model with Online Reinforcement Learning (ICRA 2025)
- Interactive Post-Training for Vision-Language-Action Models (May. 2025)
- ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning (May. 2025)
- VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning (Jun. 2025)
- Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics (May. 2025)
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning (Dec. 2024)
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning (Oct. 2024)
- MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations (Dec. 2022)
- Visual IRL for Human-Like Robotic Manipulation (Dec. 2024)
- Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning (IROS 2018)
- Reinforcement Pre-Training (Jun. 2025)
- Visual Pre-Training on Unlabeled Images using Reinforcement Learning (Jun. 2025)
- DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models (Mar. 2025)
- Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering (Mar. 2025)
- Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound (Jul. 2025)
- MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning (Feb. 2025)
Definition: These works learn predictive models of environment dynamics from visual inputs, enabling planning and long-horizon reasoning in RL.
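The planning idea behind these works is simple to state: roll candidate action sequences through the learned dynamics model and act on the best imagined return. A toy sketch with exhaustive search over short discrete sequences; real systems plan in learned latent spaces with smarter optimizers (e.g., CEM or learned policies), and all names here are illustrative:

```python
import itertools

def plan_with_world_model(dynamics, reward, state, horizon=3, actions=(-1, 0, 1)):
    """Score every short action sequence entirely inside the (learned) model
    and return the first action of the highest-return imagined rollout."""
    def imagined_return(seq):
        s, total = state, 0.0
        for a in seq:
            s = dynamics(s, a)   # imagined next state, no real environment step
            total += reward(s)
        return total
    best = max(itertools.product(actions, repeat=horizon), key=imagined_return)
    return best[0]

# Toy 1-D world: states are positions, the goal is position 10.
action_toward_goal = plan_with_world_model(
    lambda s, a: s + a,          # known dynamics standing in for a learned model
    lambda s: -abs(s - 10),      # dense reward: negative distance to goal
    state=0,
)
```

Starting left of the goal, the planner picks the action that moves right; only the model is queried during planning, which is what makes the approach sample-efficient.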
- Mastering Diverse Domains through World Models (Nature 2025)
- CoWorld: Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning (NeurIPS 2024)
- LS-Imagine: Open-World Reinforcement Learning over Long Short-Term Imagination (ICLR 2025 Oral)
- Reinforcement Learning Guide (2025, Blog)
- Can RL From Pixels be as Efficient as RL From State? (Jul. 2025, Blog)
- The 37 Implementation Details of Proximal Policy Optimization (Mar. 2022, ICLR Blog)
- Group Relative Policy Optimization (GRPO) Illustrated Breakdown & Explanation (Jul. 2025, Blog)
- Deep Reinforcement Learning Course from Hugging Face (Hugging Face)
- Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes (Aug. 2024)
- Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models (May. 2025)
- Reinforcement Learning for Generative AI: A Survey (Aug. 2023)
- Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives (Oct. 2024)
This template is provided by Awesome-Video-Diffusion, and our repository builds upon numerous contributions from prior resources such as Awesome Visual RL.
🔥 This project is actively maintained, and we welcome your contributions. If you have any suggestions, such as missing papers or information, please feel free to open an issue or submit a pull request.
🤖 Try our Awesome-Paper-Agent. Just provide an arXiv URL link, and it will automatically return formatted information, like this:
User:
https://arxiv.org/abs/2312.13108
GPT:
+ [AssistGUI: Task-Oriented Desktop Graphical User Interface Automation](https://arxiv.org/abs/2312.13108) (Dec. 2023)
[](https://github.com/showlab/assistgui)
[](https://arxiv.org/abs/2312.13108)
[](https://showlab.github.io/assistgui/)
You can then copy this formatted entry directly into your pull request.
⭐ If you find this repository useful, please give it a star.
If you have any suggestions (missing papers, new papers, or typos), please feel free to edit and submit a pull request. Even just suggesting paper titles is a great contribution — you can also open an issue or contact us via email (weijiawu96@gmail.com).
If you find our survey and this repository useful for your research, please consider citing our work:
@article{wu2025reinforcement,
title={Reinforcement Learning in Vision: A Survey},
author={Wu, Weijia and Gao, Chen and Chen, Joya and Lin, Kevin Qinghong and Meng, Qingwei and Zhang, Yiming and Qiu, Yuke and Zhou, Hong and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2508.08189},
year={2025}
}
