Fiction.live (@ficlive) / X

449 posts

Fiction.live

@ficlive

Read and control interactive stories. Talk to writers. Suggest your own ideas and debate with other fans. Vote for what happens next.

Joined November 2012

Fiction.live
@ficlive
Apr 17, 2025
OpenAI Strikes Back
346K
Fiction.live
@ficlive
Jun 5, 2025
Wow Google does it again! Gemini 2.5 Pro is super impressive. Amazing 192k result.
53K
Fiction.live
@ficlive
Jul 10, 2025
Grok 4 is at the SOTA on long context up to 192k. Gemini 2.5 Pro still edges out on 192k but Grok 4 was more consistent overall. Very very impressed, it's a GREAT model.
51K
Fiction.live
@ficlive
Apr 6, 2025
Updated Long context benchmark with Llama 4
104K
Fiction.live
@ficlive
Jun 11, 2025
o3-Pro is really good at not making mistakes at lower contexts, solid improvement overall. 192k still belongs to Gemini though.
24K
Fiction.live
@ficlive
May 6, 2025
Gemini 2.5 Pro Preview gives good results, but can't quite match the original experimental version.
97K
Fiction.live
@ficlive
Apr 14, 2025
Long Context benchmark updated with GPT-4.1. Looks like it's the "optimus" version instead of the better performing original quasar. The smaller versions are not usable in long context.
93K
Fiction.live
@ficlive
Jun 21, 2025
minimax-m1 tested on Fiction.liveBench Long Context. @minimax_ai They did it! Competitive with Gemini 2.5 Pro-preview 05-06.
12K
Fiction.live
@ficlive
Jun 19, 2025
Claude 4 Sonnet thinking is close to SOTA at the testable range (thinking tokens put it over the line for 192k). But strangely, Opus is significantly worse. Why? Anyone else see this?
5.4K
Fiction.live
@ficlive
May 23, 2025
Expanded context length to 192k for OpenAI models and Gemini. Gemini is still consistently decent even at that length; o3 falls off dramatically at 192k.
12K
Fiction.live
@ficlive
Apr 17, 2025
Replying to @ficlive
fiction.live/stories/Fictio… Will be working on an even longer context and harder eval. DM me if you wanna sponsor.
7.7K
Fiction.live
@ficlive
Apr 4, 2025
New model on @openrouter tested on long context. Long context performance is legit!
11K
Fiction.live
@ficlive
Apr 7, 2025
Replying to @_arohan_
Re-ran the bench, there was no real improvement.
17K
Fiction.live
@ficlive
Jun 21, 2022
As mentioned previously, migration started about 5 hours ago and was expected to take around 6 hours. Unfortunately, it's going to take longer than expected since the database import is only at around 50%. For some reason, the import today is slower than the test. 🤷‍♂️