Sycophancy in LLMs isn’t just an annoyance; it erodes honest reasoning and meaningful discourse.
We’re excited to share Beacon - a step that turns this hidden bias into a measurable, diagnosable, and ultimately fixable signal.
thrilled to be spending the summer at @SarvamAI.
their mission to build foundational ai tailored for india deeply resonates with me.
can’t wait to learn, build, and grow alongside this brilliant team!
excited to announce i’ll be spending some time at @lossfunk working on alignment via post-training, behavioral signals, and a little benchmarking magic.
let’s hit it out of the park @sxohom @itskavins!
we set out to rethink how models allocate reasoning.
today we’re releasing hecto, a modular mixture-of-experts model combining GRU and FFNN to specialize computation per input.
no supervision. no backing. just a team of undergrads building what didn’t exist.
so I… redid attention.
not for scale. not for benchmarks. just to see what happens when you give it a sense of time.
after releasing a benchmark for temporal reasoning,
i started wondering: what if temporal understanding could be built in, not just emerge?
we’ve been working on this paper for almost a year and it’s finally out.
what started as a simple idea to model how long a fact stays true evolved into something much more foundational:
a benchmark to give AI a sense of time.
introducing: chronocept
over the weekend, I got tired of writing cleanup scripts every time I batch-generated outputs from LLMs.
they were always redundant - same idea phrased 10 ways.
so I built SieveAI: a tiny Flask tool powered by sbert, jaccard and frustration.
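a rough sketch of the jaccard half of that idea, not SieveAI's actual code: the function names, the word-level tokenizer, and the 0.6 threshold are all assumptions for illustration, and the sbert embedding step is left out entirely.

```python
import re


def tokens(text):
    """Lowercased word set used for the Jaccard comparison (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    if not a and not b:
        return 1.0  # two empty texts count as identical
    return len(a & b) / len(a | b)


def dedupe(outputs, threshold=0.6):
    """Greedy pass: keep a text only if it is not too similar to anything kept so far.

    The threshold is a hypothetical default, not a value from the tool.
    """
    kept, kept_tokens = [], []
    for text in outputs:
        t = tokens(text)
        if all(jaccard(t, k) < threshold for k in kept_tokens):
            kept.append(text)
            kept_tokens.append(t)
    return kept


if __name__ == "__main__":
    batch = [
        "the cat sat on the mat",
        "The cat sat on the mat!",
        "dogs love long walks",
    ]
    print(dedupe(batch))
```

word-overlap alone misses paraphrases that share few words, which is presumably where the sentence embeddings come in.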
we're happy to see people jump on the chronocept bandwagon, the world's first benchmark for temporal validity.
check the paper out :)
arxiv.org/pdf/2505.07637
ai can be smarter in a more human-like way.
we need to re-introduce the complex temporal processing abilities of biological neurons that were originally left out (or simplified) in artificial ones.
Compute (used effectively), tools & understanding tricks. That last one is important.
I don’t mean tricks in the sense of being deceptive, but rather that tricky questions require a bag of tricks to solve. Anyone skilled at any given subject area knows exactly what I mean.
speech transcripts are messy.
they’re full of names, places, and timestamps hidden in noisy, code-switched text.
if left unchecked, that ends up in your models and outputs.
so i built zentrypii, a redaction layer for speech pipelines.
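a minimal sketch of what a redaction pass over a transcript could look like. this is not zentrypii's implementation: the `redact` function, the timestamp pattern, the placeholder tags, and the hand-supplied entity list are all assumptions, and a real pipeline would need proper entity detection for code-switched text.

```python
import re

# Assumed pattern: clock-style timestamps such as 10:30 or 10:30:15.
TIMESTAMP = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")


def redact(text, known_entities=()):
    """Mask timestamps and any caller-supplied names or places.

    known_entities is a hypothetical stand-in for a real entity detector.
    """
    text = TIMESTAMP.sub("[TIME]", text)
    for entity in known_entities:
        pattern = rf"\b{re.escape(entity)}\b"
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(redact("call Priya at 10:30 in Pune", ["Priya", "Pune"]))
```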
Introducing the Environments Hub
RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down
We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI
tough realisation that just adding temporal awareness in pre-training is sort of useless unless u can expose and fix the failure modes that scale can’t solve.
no longer a believer in this idea, saved myself a lot of time