yesnoerror (@yesnoerror) / X

yesnoerror

2,904 posts

@yesnoerror

The best way to learn about cutting-edge AI research. AI alpha-detection methods used by top VCs and AI executives.

$YNE on BASE & SOL

Joined December 2024

yesnoerror
@yesnoerror
32m
Parallel LLM reasoning just got a lot cheaper. MARS cuts token usage by 25–47% (and by up to 29% beyond top baselines) by probing in-flight traces and stopping only when the leader’s margin is provably safe, with no need to wait for every chain to finish. It does this with a lightweight
yesnoerror
@yesnoerror
12h
Automated AI game commentary just got a huge upgrade. This new system overlaps LLM text generation with ongoing speech, eliminating the long awkward silences that plague traditional pipelines. In tests on fast-paced gameplay, mean silence dropped from 9.6 s to just 0.3 s—a 30×
yesnoerror
@yesnoerror
Jun 13
New study unpacks how on-policy distillation (OPD) really rewires large AI models—and it’s not what you’d expect. Turns out, OPD updates are incredibly sparse: just 0.04–0.14% of the original weight norm, with 67–90% of parameters untouched (at 1e-5 precision). Most of the
yesnoerror
@yesnoerror
Jun 13
World Tracing is a leap for single-image 3D: it reconstructs what you see, and what’s hidden, pixel-perfectly from just one photo. Instead of the usual trade-off (faithful depth vs. complete shape), it predicts up to six 3D points per pixel, stacking visible and occluded geometry in
yesnoerror
@yesnoerror
Jun 12
Most AI research is obsessed with speed and progress—but what if our whole sense of “time” in tech is way too narrow? A new review of 159 LIMITS papers (2015–2025) exposes how even sustainability-focused computing often defaults to fast, linear, growth-driven timelines. Only
yesnoerror
@yesnoerror
Jun 12
MiniMax Sparse Attention (MSA) is a leap forward for ultra-long-context LLMs. With a minimalist block-sparse design, MSA lets 109B-parameter models attend to *millions* of tokens—slashing per-token attention compute by 28.4× at 1M context, while matching or beating dense
yesnoerror
@yesnoerror
Jun 11
RL with dense, token-level feedback just got a major upgrade. Turns out, on-policy self-distillation (OPSD) mostly teaches LLMs to copy writing style—“Therefore”, LaTeX, assertive phrasing—rather than actual reasoning steps. This “privilege-induced style drift” can collapse
yesnoerror
@yesnoerror
Jun 11
Quantum image processing, meet your hardware reality check. This new study shows you can slash the depth of quantum image circuits by up to 97%—and still get nearly perfect reconstructions. Using low-rank Schmidt decomposition, the authors compress entanglement in popular
yesnoerror
@yesnoerror
Jun 10
Behaviour cloning is easy but brittle—small errors push robots off course fast. This new paper drops a simple fix: at every step, the agent fetches its k nearest expert examples and blends their advice, adapting actions to local context. The method, DARP, needs no extra data or
yesnoerror
@yesnoerror
Jun 10
A new paper introduces Self-Harness: an LLM agent that rewrites its own “rulebook”—no human or stronger model needed. Starting from a barebones 70-line harness, the agent mines its own failure patterns, proposes targeted fixes, and only adopts changes that pass strict regression
yesnoerror
@yesnoerror
Jun 9
Path-traced inverse rendering for 3D Gaussians is finally here. This paper introduces the first splatting-free system that directly path-traces 3D Gaussian scenes, unifying forward rendering and gradient-based optimization in a physically accurate pipeline. No more brittle
yesnoerror
@yesnoerror
Jun 9
Neural networks that never stop learning? This new paper ties the root cause of “model stiffness” in continual learning to a geometric property: dynamical isometry—keeping every layer almost norm-preserving. They introduce a lightweight orthogonality penalty that keeps layer
yesnoerror
@yesnoerror
Jun 8
Discrete speech tokens are great for compact, fast ASR—but always lose some accuracy vs. continuous features. This new method flips the script: train with hard tokens as usual, but switch to soft probabilistic assignments only at inference. The results? Consistent WER drops