bycloud (@bycloudai) / X

1,562 posts

bycloud

@bycloudai

I make youtube videos on cool AI research /// AI papers newsletter mail.bycloud.ai /// paper recap @TheAITimeline /// intuitiveai.academy

youtube.com/bycloudAI

Joined January 2020

bycloud
@bycloudai
Feb 27, 2025
the grok-3 benchmark is pretty useful in comparing base models, so I added GPT-4.5
bycloud
@bycloudai
Jul 16, 2024
I got a great trailer for y'all
Mistral AI
@MistralAI
Jul 16, 2024
mistral.ai/news/mathstral/ mistral.ai/news/codestral…
bycloud
@bycloudai
Jun 22, 2025
didn’t know gwern’s comment has the ability to predict the future too
bycloud
@bycloudai
Aug 31, 2025
imagine getting beaten at training an LLM by a Chinese food-delivery company (along with a banger tech report btw)
bycloud
@bycloudai
Feb 27, 2025
Claude 3.7 is cool, but I still ended up using grok-3 somehow. something's off about Claude 3.7 and I just can't pinpoint why
bycloud
@bycloudai
Jun 29, 2025
many such cases
bycloud
@bycloudai
Jan 17, 2025
someone has finally done it: test-time compute + diffusion models. a really interesting one for sure 🧵
bycloud
@bycloudai
Nov 27, 2024
the 4 horsemen of the OpenAI apocalypse have now been assembled
bycloud
@bycloudai
Apr 14, 2025
no model gets past 66% accuracy @ 120k tokens, except Gemini 2.5 Pro, which sits at 90%. even the new GPT-4.1 with 1M ctx is stuck at 60%... (please tell us your secret gemini 🥺)
Fiction.live
@ficlive
Apr 14, 2025
Long Context benchmark updated with GPT-4.1. Looks like it's the "optimus" version instead of the better-performing original quasar. The smaller versions are not usable in long context.
bycloud
@bycloudai
Feb 27, 2025
how does DeepSeek-V3 win against GPT-4.5? (NOT R1 btw) OpenAI claimed that GPT-4.5 is a VERY big model, yet GPT-4.5 falls short of DeepSeek-V3. What.
bycloud
@bycloudai
Oct 21, 2024
super interesting read. maybe we just need to find the rules that are class 4-equivalent when generating synthetic data to get better reasoning performance. making a video on this now 😳
bycloud
@bycloudai
Apr 6, 2025
what also intrigued me is that @ a 120k context window, 2.5 Pro hit 90% accuracy while no one else crossed 66%. everyone else starts to fall off hard @ 4k. what new attention technique did google invent??? (and why is there a sudden dip at 16k???????)
michelle
@michellechen
Apr 6, 2025
Replying to @michellechen
small tangent - people always ask about gemini's context window. yeah it’s big, it probably uses some sliding window-like architecture too (don’t quote me). most notably though, google has its own proprietary accelerators called TPUs. much more GPU memory, so they can fit larger
bycloud
@bycloudai
Jun 17, 2025
is this what anthropic did to make their non-reasoning models so good?
bycloud
@bycloudai
Jun 24, 2025
after making diffusion LMs, you're telling me we're now adding U-Nets?