Experiential Reinforcement Learning

🚀 Experiential Reinforcement Learning for Language Agents 🌟

This repository implements Experiential Reinforcement Learning (ERL), a post-training method for language agents built around an explicit experience–reflection–consolidation loop. Rather than relying solely on reward optimization, ERL has agents learn through structured reflection and refinement, turning environmental feedback into durable behavioral improvements. The repository provides tools to build custom agents and environments, train them with ERL, and deploy adaptive agents for complex, real-world tasks.

Algorithm Overview

Experiential Reinforcement Learning (ERL) embeds an explicit experience–reflection–consolidation loop into reinforcement learning so that language agents can learn from interaction and internalize improvements. For each task, the agent produces an initial attempt, receives feedback, generates a reflection to diagnose failures, and produces a refined second attempt whose improvements are reinforced into the policy. We optimize the first attempt, reflection, and second attempt with a policy gradient objective (GRPO by default, with support for other methods). To internalize lessons from reflection and experience, we distill successful second attempts via supervised finetuning with context distillation, training the model to produce the improved response directly from the original prompt.
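The loop described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: `run_erl_episode`, `group_relative_advantages`, `distillation_pairs`, the prompt templates, and the toy policy and environment are all hypothetical names introduced here. The sketch assumes the agent always reflects after the first attempt, uses the standard GRPO-style normalization of rewards within a group, and treats a second attempt as successful when its reward reaches a threshold.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Tuple

# Hypothetical type aliases for this sketch (not rLLM or verl types).
Policy = Callable[[str], str]
Feedback = Callable[[str, str], Tuple[float, str]]  # (prompt, answer) -> (reward, message)


@dataclass
class Episode:
    prompt: str
    first_attempt: str
    first_reward: float
    reflection: str
    second_attempt: str
    second_reward: float


def run_erl_episode(prompt: str, policy: Policy, env_feedback: Feedback) -> Episode:
    """One experience -> reflection -> refinement pass for a single task."""
    first = policy(prompt)
    r1, message = env_feedback(prompt, first)
    # Assumed prompt templates; the real ones live in the example workflows.
    reflection = policy(
        f"{prompt}\nAttempt: {first}\nFeedback: {message}\nDiagnose what went wrong:"
    )
    second = policy(f"{prompt}\nLesson: {reflection}\nRevised attempt:")
    r2, _ = env_feedback(prompt, second)
    return Episode(prompt, first, r1, reflection, second, r2)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: rewards normalized by the group's mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def distillation_pairs(episodes: List[Episode], threshold: float = 1.0) -> List[Tuple[str, str]]:
    """Context distillation data: original prompt paired with a successful second attempt."""
    return [(e.prompt, e.second_attempt) for e in episodes if e.second_reward >= threshold]


# Toy stand-ins so the sketch runs end to end.
def toy_policy(text: str) -> str:
    if text.endswith("Revised attempt:"):
        return "42"
    if text.endswith("Diagnose what went wrong:"):
        return "The first answer was off by one."
    return "41"


def toy_env(prompt: str, answer: str) -> Tuple[float, str]:
    correct = answer == "42"
    return (1.0 if correct else 0.0, "correct" if correct else "wrong answer")


episode = run_erl_episode("What is 6 * 7?", toy_policy, toy_env)
print(episode.first_reward, episode.second_reward)
print(distillation_pairs([episode]))
print(group_relative_advantages([episode.first_reward, episode.second_reward]))
```

Note that the distillation pair drops the reflection and the first attempt from the input, so supervised finetuning on it trains the model to produce the improved response directly from the original prompt.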

Getting Started 🎯

Our implementation of Experiential RL is based on rLLM v0.2.1 and verl v0.6.1. Please refer to their documentation for installation instructions.

Step 1: Building rLLM

rLLM requires Python >= 3.11. You can install it either directly via pip or build from source.
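Before installing, it can help to confirm that the active interpreter meets the version requirement. The snippet below is a small convenience check using only the standard library; `python_is_supported` is a hypothetical helper name, not part of rLLM.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in this README


def python_is_supported(version=None) -> bool:
    """Return True if the given (or current) interpreter version is >= 3.11."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


if not python_is_supported():
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {found} found; rLLM requires Python >= 3.11")
```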

Option A: Direct Installation

uv pip install "git+https://github.com/rllm-org/rllm.git"

Option B: Building from Source

# Clone the repository
git clone https://github.com/rllm-org/rllm.git
cd rllm

# Create a conda environment
conda create -n rllm python=3.11 -y
conda activate rllm

# Build rLLM from source
uv pip install -e .

Step 2: Installing Training Backend

rLLM supports two training backends: verl and Tinker. ERL currently supports only verl.

# Install verl
bash scripts/install_verl.sh

Installation with Docker 🐳

For a containerized setup, you can use Docker:

# Build the Docker image
docker build -t rllm .

# Create and start the container
docker create --runtime=nvidia --gpus all --net=host --shm-size="10g" --cap-add=SYS_ADMIN -v .:/workspace/rllm -v /tmp:/tmp --name rllm-container rllm sleep infinity
docker start rllm-container

# Enter the container
docker exec -it rllm-container bash

Examples

ERL for Sokoban: grid-based planning workflow
ERL for FrozenLake: navigation/control workflow
ERL for HotpotQA: retrieval-augmented QA workflow

Adding New Tasks

You can extend ERL in two ways:

1. If you already have an agent built with an existing framework (for example LangGraph or AutoGen), use the rLLM SDK Engine.
   - SDK docs: rLLM SDK Engine
   - Reference example: ERL for HotpotQA
   - In this example, ErlHotpotSearchAgent defines the LangGraph-based agent loop, and ErlHotpotWorkflow in the same file wraps it into the ERL first-attempt -> reflection -> second-attempt training flow.
2. If you want a native rLLM workflow implementation, build your task with AgentWorkflowEngine.
   - Workflow engine docs: rLLM AgentWorkflowEngine
   - Reference example: ERL for FrozenLake
   - In this case, replace ErlFrozenLakeAgent, ErlFrozenLakeEnv, and ErlFrozenLakeWorkflow with your own task, then wire your classes in train_erl_frozenlake_flow.py.
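To make the division of labor between an agent, an environment, and a workflow concrete, here is a framework-free sketch. It deliberately does not use rLLM classes: `CorridorEnv`, `CorridorAgent`, `rollout`, and `erl_workflow` are hypothetical names, and their method signatures are assumptions for illustration only. Consult the FrozenLake example for the actual base classes and hooks.

```python
from typing import Tuple


class CorridorEnv:
    """Hypothetical toy task: start at position 0 and reach `length` by moving right."""

    def __init__(self, length: int = 3, max_steps: int = 5) -> None:
        self.length, self.max_steps = length, max_steps
        self.pos, self.steps = 0, 0

    def reset(self) -> int:
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action: str) -> Tuple[int, float, bool]:
        self.steps += 1
        self.pos = max(0, self.pos + (1 if action == "right" else -1))
        reached = self.pos >= self.length
        done = reached or self.steps >= self.max_steps
        return self.pos, (1.0 if reached else 0.0), done


class CorridorAgent:
    """Hypothetical agent whose behavior changes once it holds a reflection hint."""

    def __init__(self) -> None:
        self.hint = ""

    def act(self, observation: int) -> str:
        return "right" if "right" in self.hint else "left"

    def reflect(self, reward: float) -> str:
        return "" if reward > 0 else "Moving left never reaches the goal; move right."


def rollout(agent: CorridorAgent, env: CorridorEnv) -> float:
    """Run one attempt to completion and return the final reward."""
    obs, reward, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
    return reward


def erl_workflow(agent: CorridorAgent, env: CorridorEnv) -> Tuple[float, str, float]:
    """First attempt -> reflection -> second attempt on the same task."""
    first_reward = rollout(agent, env)
    agent.hint = agent.reflect(first_reward)
    second_reward = rollout(agent, env)
    return first_reward, agent.hint, second_reward


print(erl_workflow(CorridorAgent(), CorridorEnv()))
```

In a real task the agent's `act` and `reflect` would call the language model, and the workflow would hand the three trajectories to the trainer instead of returning rewards.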

Acknowledgements

This work was done as part of the USC LIME Lab and the Microsoft Office of Applied Research. We give special thanks to the Berkeley Sky Computing Lab and rLLM for their support. The implementation of Experiential RL is based on rLLM.

Citation

@misc{shi2026experientialreinforcementlearning,
      title={Experiential Reinforcement Learning}, 
      author={Taiwei Shi and Sihao Chen and Bowen Jiang and Linxin Song and Longqi Yang and Jieyu Zhao},
      year={2026},
      eprint={2602.13949},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.13949}, 
}

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third-party’s policies.
