Fiction.live
449 posts
Read and control interactive stories
Talk to writers. Suggest your own ideas and debate with other fans. Vote for what happens next.
Joined November 2012
- Wow Google does it again! Gemini 2.5 Pro is super impressive. Amazing 192k result.
- Grok 4 is at the SOTA on long context up to 192k. Gemini 2.5 Pro still edges out on 192k but Grok 4 was more consistent overall. Very very impressed, it's a GREAT model.
- o3-Pro is really good at not making mistakes at lower contexts, solid improvement overall. 192k still belongs to Gemini though.
- Gemini 2.5 Pro Preview gives good results, but can't quite match the original experimental version.
- Long Context benchmark updated with GPT-4.1. Looks like it's the "optimus" version instead of the better performing original quasar. The smaller versions are not usable in long context.
- minimax-m1 tested on Fiction.liveBench Long Context. @minimax_ai They did it! Competitive with Gemini 2.5 Pro-preview 05-06.
- Claude 4 Sonnet thinking is close to SOTA in the testable range (thinking tokens put it over the line for 192k). But strangely, Opus is significantly worse. Why? Anyone else see this?
- Expanded context length to 192k for OpenAI models and Gemini. Gemini is still consistently decent even at that length; o3 falls off dramatically at 192k.
- Replying to @ficlive: fiction.live/stories/Fictio… Will be working on an even longer-context, harder eval. DM me if you wanna sponsor.
- Replying to @_arohan_: Re-ran the bench, there was no real improvement.
- As mentioned previously, migration started about 5 hours ago and was expected to take around 6 hours. Unfortunately, it's going to take longer than expected, since the database import is only at around 50%. For some reason, today's import is slower than the test run. 🤷‍♂️













