alphaXiv

Embarrassingly Simple Self-Distillation Improves Code Generation

01 Apr 2026

Ruixiang Zhang

Richard He Bai

Huangjie Zheng

A method called Simple Self-Distillation (SSD) enables large language models to enhance their code generation performance by training exclusively on self-generated, unverified solutions. This approach improved the Qwen3-30B-Instruct model's pass@1 score on LiveCodeBench v6 from 42.4% to 55.3%, demonstrating its efficacy without external supervision or complex verification.

#computer-science #computation-and-language #fine-tuning

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

02 Apr 2026

Xinlei Yu

Zhangquan Chen

Yongbo He

A comprehensive survey unifies the fragmented understanding of latent space in language-based models, detailing its conceptual foundation, historical evolution, technical mechanisms, emergent abilities, and future research directions. It positions latent space as a machine-native computational substrate enabling enhanced intelligence beyond explicit token-level operations.

#computer-science #artificial-intelligence #generative-models

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

02 Apr 2026

Zhengxi Lu

Zhiyuan Yao

Jinyang Wu

The SKILL0 framework introduces In-Context Reinforcement Learning (ICRL) to enable Large Language Model (LLM) agents to internalize skills into their parameters, so that they act autonomously without external skill descriptions at inference time. The approach reaches an 87.9% success rate on ALFWorld (+9.7% over AgentOCR) while cutting context token costs more than fivefold compared to skill-augmented methods.

#agentic-frameworks #agents #computer-science

Screening Is Enough

01 Apr 2026

Ken M. Nakanishi

Microsoft developed the AI Greenferencing framework and its Heron router, enabling the routing of AI inferencing workloads to modular data centers located at wind farms. This system leverages underutilized renewable energy and helps mitigate strain on the power grid, demonstrating a 1.8x goodput improvement and a 27% median end-to-end latency reduction for AI tasks through intelligent routing and multi-timescale resource optimization.

#attention-mechanisms #computer-science #artificial-intelligence

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

02 Apr 2026

Ao Qu

Han Zheng

Zijian Zhou

The CORAL framework introduces an autonomous multi-agent evolutionary system, empowering Large Language Model agents to control their discovery process and collaborate indirectly through shared persistent memory. This approach achieved new state-of-the-art results on 8 out of 11 mathematical and systems optimization tasks, exhibiting a 3-10x higher improvement rate and an 18.3% cycle reduction in kernel engineering.

#agentic-frameworks #agents #computer-science

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

02 Apr 2026

Gengsheng Li

Tianyu Yang

Junfeng Fang

Sample-Routed Policy Optimization (SRPO) unifies Group Relative Policy Optimization (GRPO) and Self-Distillation Policy Optimization (SDPO) for large language model post-training. This framework uses adaptive sample routing and entropy-aware dynamic weighting to improve early training efficiency, enhance long-horizon stability, and achieve higher peak performance on various benchmarks.

#agents #computer-science #artificial-intelligence

Meta-Harness: End-to-End Optimization of Model Harnesses

30 Mar 2026

Yoonho Lee

Roshen Nair

Qizheng Zhang

Meta-Harness provides an end-to-end optimization framework for LLM harnesses, the external code that dictates how models interact with their environment. The system utilizes an agentic proposer with filesystem access to uncompressed historical code and execution traces, leading to a 7.7-point accuracy improvement in text classification, a 4.7-point average gain in math reasoning, and competitive pass rates on agentic coding benchmarks.

#agents #computer-science #artificial-intelligence

A Simple Baseline for Streaming Video Understanding

02 Apr 2026

Yujiao Shen

Shulin Tian

Jingkang Yang

SIMPLESTREAM, a minimalist baseline employing only the most recent N video frames with an off-the-shelf visual-language model, achieves state-of-the-art performance on streaming video understanding benchmarks like OVO-Bench and StreamingBench, demonstrating that complex memory mechanisms are often not necessary for competitive results.

#computer-science #computer-vision-and-pattern-recognition #model-interpretation

Generative World Renderer

02 Apr 2026

Zheng-Hui Huang

Zhixiang Wang

Jiaming Tan

This research introduces a large-scale dataset, "Generative World Renderer," comprising over 4 million frames from AAA games with synchronized RGB and five G-buffer channels, alongside a novel VLM-based evaluation protocol. The work demonstrates that training on this dataset substantially improves inverse rendering performance and G-buffer-conditioned forward synthesis, enabling more robust material decomposition, precise geometry, and enhanced real-world game editing with complex atmospheric effects.

#computer-science #computer-vision-and-pattern-recognition #data-curation

A Survey of On-Policy Distillation for Large Language Models

01 Apr 2026

Mingyang Song

Mao Zheng

A comprehensive survey outlines On-Policy Distillation (OPD) for Large Language Models (LLMs), introducing a unified theoretical framework based on f-divergences and a three-dimensional taxonomy for existing methods. The work details how OPD mitigates exposure bias, reducing error accumulation from quadratic to linear, and explores various methodologies and future research directions for efficient and capable LLM distillation.

#agents #computer-science #computation-and-language

Steerable Visual Representations

02 Apr 2026

Jona Ruthardt

Manu Gaur

Deva Ramanan

SteerViT introduces a method to equip any pretrained Vision Transformer with language-steerable visual representations by integrating lightweight gated cross-attention layers for early text-visual fusion. This approach achieves a Pareto improvement by enabling fine-grained control over visual features with natural language, demonstrated by a 96.0% top-1 retrieval accuracy on the CORE benchmark, while preserving the original encoder's representation quality with only 21 million trainable parameters.

#attention-mechanisms #computer-science #artificial-intelligence

LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model

02 Apr 2026

Jiachun Jin

Zetong Zhou

Xiao Yang

Researchers at Shanghai Jiao Tong University, Tsinghua University, and UCSD developed LatentUM, a unified model that integrates visual understanding and generation within a shared semantic latent space, eliminating pixel-space mediation. This approach enhances interleaved cross-modal reasoning for tasks like visual spatial planning and action-conditioned world modeling, achieving state-of-the-art performance on relevant benchmarks.

#computer-science #computer-vision-and-pattern-recognition #machine-learning

Think Anywhere in Code Generation

02 Apr 2026

Xue Jiang

Tianyu Zhang

Ge Li

Peking University and Alibaba Group researchers developed THINK-ANYWHERE, a mechanism enabling Large Language Models (LLMs) to invoke on-demand, token-level reasoning during code generation, enhancing adaptability and resource efficiency. The approach achieves an average pass@1 score of 70.3% across four code generation benchmarks, representing a 9.3% absolute improvement, and also generalizes to mathematical reasoning tasks.

#agents #computer-science #machine-learning

Understanding Transformers and Attention Mechanisms: An Introduction for Applied Mathematicians

01 Apr 2026

Michel Fabrice Serret

Michel Fabrice Serret's preprint introduces the fundamental concepts of Transformer models and attention mechanisms, tailored for applied mathematicians. It systematically formalizes text encoding, attention operations, and architectural components, and discusses computational optimizations such as KV caching and latent attention, preparing the audience to engage with quantitative challenges in the field.

VOID: Video Object and Interaction Deletion

02 Apr 2026

Saman Motamed

William Harvey

Benjamin Klein

The VOID framework enables video object removal that generates physically plausible counterfactual scenarios by modeling how a scene would dynamically evolve in the absence of a target object and its interactions. It achieved 64.8% human preference and consistently high VLM-as-a-judge scores, particularly for interaction and physics realism, surpassing existing baselines.

#causal-inference #computer-science #artificial-intelligence

ByteRover: Agent-Native Memory Through LLM-Curated Hierarchical Context

02 Apr 2026

Andy Nguyen

Danh Doan

Hoang Pham

ByteRover introduces an agent-native memory architecture where large language models directly manage and structure their own knowledge using a hierarchical, file-based Context Tree. This approach mitigates issues like semantic drift in memory-augmented generation systems and achieves 96.1% accuracy on LoCoMo, outperforming baselines by 6.2 percentage points, and 92.8% on LongMemEval-S with a median query latency of 1.6 seconds, all without relying on external databases.

#computer-science #artificial-intelligence

ActionParty: Multi-Subject Action Binding in Generative Video Games

02 Apr 2026

Alexander Pondaven

Ziyi Wu

Igor Gilitschenski

ActionParty introduces an autoregressive video generator for interactive games that effectively binds actions to multiple individual subjects. The system achieves precise per-subject control by jointly modeling video frames with explicit subject states and employing specialized attention mechanisms, outperforming baselines in action accuracy and visual fidelity across 46 diverse game environments.

#computer-science #artificial-intelligence #computer-vision-and-pattern-recognition

Reinforcement Learning for Speculative Trading under Exploratory Framework

02 Apr 2026

Yun Zhao

Alex S.L. Tse

Harry Zheng

This research introduces a reinforcement learning framework for sequential optimal stopping in speculative trading, where agents learn optimal entry and exit times under general price dynamics and utility functions. The methodology employs randomized intensity controls with entropy regularization, yielding closed-form optimal policies as Gibbs distributions and providing theoretical convergence guarantees to the original problem.

#agent-based-systems #computer-science #machine-learning

World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

02 Apr 2026

Yuejiang Liu

Fan Feng

Lingjing Kong

The World Action Verifier (WAV) framework enhances action-conditioned world models by employing a self-improving cycle that leverages forward-inverse asymmetry to identify and rectify prediction errors. This approach resulted in a 2x improvement in sample efficiency for world model learning and an 18% increase in average reward for downstream robotic manipulation policies compared to baseline methods.

#computer-science #artificial-intelligence #machine-learning

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

02 Apr 2026

Yanchen Wu

Tenghui Lin

Yingli Zhou

A systematic investigation into LLM agent memory architectures proposes a modular framework for analysis, conducting comprehensive evaluations of ten existing methods. This research introduces a new memory method that achieves competitive performance, outperforming strong baselines by 5.17% in overall F1 score on LONGMEMEVAL, while maintaining low computational overhead.

#agentic-frameworks #agents #computer-science
