Yunlong Lin - Homepage

Welcome to my academic homepage. I am Yunlong Lin, a master's student at Xiamen University (XMU) in the SmartDSP group, advised by Prof. Xinghao Ding. I interned at Tencent Hunyuan (Tencent HY) and am currently at ByteDance Seed. My key research areas and methods are summarized below.

📌 Research Highlights

My recent research focuses on VLM/LLM/Agent systems for visual creation and perception. A central theme is improving multimodal agents with stronger post-training signals, tighter plan-act-reflect loops, and scalable reward design.

Agentic VLM/LLM for end-to-end workflows: build multimodal agents that can search, reason, call tools, and execute complete creative tasks.
Reinforcement learning for multimodal decision making: optimize search and editing trajectories with evaluator feedback, self-reflection, and policy-level improvements.
Reliable multimodal perception and generation: connect restoration, 4D understanding, and synthesis so models perform better in real-world settings, not only on static metrics.

✉️ Feel free to contact me for discussions and collaborations on VLM/LLM/Agent and reinforcement learning research.

🔬 Selected Research

Open-Source Project

JarvisX-Cowork: A Personal AI Creative Assistant for End-to-End Creative Workflows

An open-source creative assistant that supports end-to-end workflows from web reference search to image generation, refinement, video creation, and polished demos, decks, or docs.

Insight: Treating creative production as a unified agent workflow is more scalable than isolated tools. A single planner with shared memory and tool interfaces significantly reduces context switching and improves final output consistency.

💻 GitHub | 🎬 Demo Video

Preprint 2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng

Insight: Image synthesis quality improves when generation is grounded in an explicit world model rather than prompt-only decoding. Unifying perception, planning, and generation inside one agent reduces mismatch between intent and rendered scenes.

📄 PDF | 🧾 arXiv | 💻 Code

Preprint 2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue

Insight: Better images come from better search trajectories before generation. Reinforcement learning over agentic search decisions gives stronger global composition and semantic faithfulness than one-shot prompt tuning.

📄 PDF | 🧾 arXiv | 🌐 Project | 💻 Code

CVPR 2026 (Tencent HY)

JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization

Yunlong Lin*, Linqing Wang, Kunjie Lin, Zixu Lin*, Kaixiong Gong, Wenbo Li, Bin Lin, Zhenxi Li, Shiyi Zhang, Yuyang Peng, Wenxun Dai, Xinghao Ding♣, Chunyu Wang†, Qinglin Lu†

Insight: A self-evolving loop between editor and evaluator provides stable long-horizon improvement for image editing agents. Reward shaping plus reflection-driven updates enables continual capability gains without handcrafting new pipelines for each task.

📄 PDF | 🌐 Project | 🤗 HF Paper | 💻 Code

NeurIPS 2025

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Yunlong Lin*, Zixu Lin*, Kunjie Lin*, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li♣, Shuicheng Yan†

Insight: High-quality artistic retouching is easier when the agent decomposes editing into interpretable subgoals and tool calls. This decomposition improves controllability for users while preserving strong aesthetic quality.

CVPR 2025

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Panwang Pan*, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding

Insight: Restoration should be optimized for downstream driving perception, not just pixel-level scores. Injecting agent-style decision signals into enhancement leads to more robust perception gains under adverse conditions.

📄 PDF | 🌐 Project | 🤗 Online Demo | 💻 Code

NeurIPS 2025

DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan

Insight: Physical priors are critical for temporally consistent 4D world modeling. Combining physically-aware constraints with multimodal context improves both realism and controllability in dynamic scene generation.

📄 PDF | 🌐 Project

Preprint 2025

PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Sixiang Chen, Jianyu Lai, Jialin Gao, Tian Ye, Haoyu Chen, Hengyu Shi, Shitong Shao, Yunlong Lin, Song Fei, Zhaohu Xing, Yeying Jin, Junfeng Luo, Xiaoming Wei, Lei Zhu

Insight: A unified framework can jointly handle layout planning and visual rendering instead of treating them as separate stages. This coupling yields better text-image harmony and more stable poster aesthetics.

📄 PDF | 🌐 Project | 💻 Code | 📹 Demo Video

📝 Publications

AAAI 2025

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement

Yunlong Lin*, Tian Ye*, Sixiang Chen*, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding

Insight: Training-free enhancement can still be practical when diffusion models are guided by carefully designed global and local luminance priors. This design avoids expensive retraining while maintaining real-world robustness.

📄 PDF | 🌐 Project | 💻 Code

AAAI 2025

Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors

Yunlong Lin*, Zhenqi Fu*, Kairun Wen, Tian Ye, Sixiang Chen, Ge Meng, Yingying Wang, Yue Huang, Xiaotong Tu, Xinghao Ding

Insight: Lightweight LUT-based enhancement and diffusion priors are complementary rather than conflicting. Their combination provides a strong efficiency-quality trade-off for unsupervised low-light restoration.

📄 PDF | 🌐 Project

CVPR 2025

SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization

Jianyu Lai*, Sixiang Chen*, Yunlong Lin, Tian Ye, Yun Liu, Song Fei, Zhaohu Xing, Hongtao Wu, Weiming Wang, Lei Zhu

Insight: MLLM-driven feedback is effective for coordinating multiple restoration experts in difficult weather conditions. Multi-model feedback optimization improves desnowing quality and reduces brittle failure cases.

📄 PDF | 🌐 Project

📱 Contact

WeChat: lyl20136148
Email: linyl@stu.xmu.edu.cn

🔥 News

Our JarvisEvo has been accepted to CVPR 2026!
Our JarvisEvo (Towards Self-Evolution), the first intelligent photo retouching agent to integrate an "Edit-Evaluate-Reflect" loop, has been released! We're honored that our work coincides with OpenAI x Adobe's new feature!
Our JarvisArt, the first intelligent photo retouching agent seamlessly integrated with Adobe Lightroom, has been accepted to NeurIPS 2025! The code and benchmark are now fully open-sourced!

💼 Experience

Jun'2025 - Dec'2025: Qingyun Intern, Tencent Hunyuan
Jan'2026 - Present: ByteDance Seed