FLARE: Learning Future-Aware Latent Representations from Vision-Language Models for Autonomous Driving

Xie, Chengen; Sima, Chonghao; Li, Tianyu; Sun, Bin; Wu, Junjie; Hao, Zhihui; Li, Hongyang

arXiv:2601.05611 (cs)

[Submitted on 9 Jan 2026 (v1), last revised 9 Mar 2026 (this version, v2)]

Abstract: While Vision-Language Models (VLMs) offer rich world knowledge for end-to-end autonomous driving, current approaches heavily rely on labor-intensive language annotations (e.g., VQA) to bridge perception and control. This paradigm suffers from a fundamental mismatch between discrete linguistic tokens and continuous driving trajectories, often leading to suboptimal control policies and inefficient utilization of pre-trained knowledge. To address these challenges, we propose FLARE (Future-aware LAtent REpresentation), a novel framework that activates the visual-semantic capabilities of pre-trained VLMs without requiring language supervision. Instead of aligning with text, we introduce a self-supervised future feature prediction objective. This mechanism compels the model to anticipate scene dynamics and ego-motion directly in the latent space, enabling the learning of robust driving representations from large-scale unlabeled trajectory data. Furthermore, we integrate Group Relative Policy Optimization (GRPO) into the planning process to refine decision-making quality. Extensive experiments on the NAVSIM benchmark demonstrate that FLARE achieves state-of-the-art performance, validating the effectiveness of leveraging VLM knowledge via predictive self-supervision rather than explicit language generation.
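
The abstract names GRPO but does not describe how FLARE applies it. As a rough illustration of the general idea behind group-relative methods only, the sketch below scores several candidate plans sampled for the same scene and normalizes each reward against its own group, so no learned value baseline is needed. The function name, the example rewards, the `eps` constant, and the use of the population standard deviation are all assumptions made for this illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    Hypothetical helper: a generic group-relative normalization, not the
    authors' implementation. eps avoids division by zero when every
    candidate in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up scores for four candidate trajectories sampled for one scene.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Candidates that score above the group average receive a positive advantage and those below receive a negative one; a policy update would then weight each candidate's log-probability by its advantage.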

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2601.05611 [cs.CV]
	(or arXiv:2601.05611v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2601.05611

Submission history

From: Chengen Xie
[v1] Fri, 9 Jan 2026 08:06:44 UTC (2,176 KB)
[v2] Mon, 9 Mar 2026 09:35:48 UTC (2,982 KB)
