DARIL: When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

  • Conference paper
In: Collaborative Intelligence and Autonomy in Image-Guided Surgery (COLAS 2025)

Abstract

Surgical action planning requires predicting future instrument-verb-target triplets for real-time assistance. While teleoperated robotic surgery provides natural expert demonstrations for imitation learning (IL), reinforcement learning (RL) could potentially discover superior strategies through exploration. We present the first comprehensive comparison of IL versus RL for surgical action planning on CholecT50. Our Dual-task Autoregressive Imitation Learning (DARIL) baseline achieves 34.6% action triplet recognition mAP and 33.6% next-frame prediction mAP, with smooth planning degradation to 29.2% at 10-second horizons. We evaluated three RL variants: world model-based RL, direct video RL, and inverse RL enhancement. Surprisingly, all RL approaches underperformed DARIL: world model RL dropped to 3.1% mAP at 10 s, while direct video RL achieved only 15.9%. Our analysis reveals that distribution matching on expert-annotated test sets systematically favors IL over potentially valid RL policies that differ from the training demonstrations. This challenges assumptions about RL superiority in sequential decision making and provides crucial insights for surgical AI development.
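The abstract reports results as mean average precision (mAP) over action-triplet classes. As a minimal sketch of how such a metric could be computed (this is illustrative, not the paper's evaluation code, which likely follows the official CholecT50 `ivtmetrics` protocol), per-class average precision can be derived from ranked prediction scores and then averaged across triplet classes:

```python
def average_precision(scores, labels):
    """AP for one class: precision averaged at each true-positive rank.

    scores: per-frame confidence scores for this triplet class.
    labels: matching binary ground-truth labels (1 = class present).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall point
    return sum(precisions) / max(hits, 1)


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average AP over classes (columns), skipping classes with no positives."""
    n_classes = len(score_matrix[0])
    aps = []
    for c in range(n_classes):
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):  # AP is undefined for classes never present
            col_scores = [row[c] for row in score_matrix]
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / max(len(aps), 1)
```

For example, a class scored `[0.9, 0.8, 0.1]` with labels `[1, 0, 1]` ranks one true positive first (precision 1.0) and the other third (precision 2/3), giving an AP of (1.0 + 2/3) / 2 ≈ 0.833.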



Author information

Correspondence to Maxence Boels.


Copyright information

© 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Boels, M., Robertshaw, H., Booth, T.C., Dasgupta, P., Granados, A., Ourselin, S. (2026). DARIL: When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning. In: Dou, Q., Ban, Y., Jin, Y., Bano, S., Unberath, M. (eds) Collaborative Intelligence and Autonomy in Image-Guided Surgery. COLAS 2025. Lecture Notes in Computer Science, vol 16298. Springer, Cham. https://doi.org/10.1007/978-3-032-09784-2_18
