“The doom lies in yourself, not in your name.”
Continuation of Wur doomed!.
For longer text chunks or stories, https://pastebin.com works great and helps prevent the thread from slowing down!
🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧
🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛⬛🟧
🟧🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧🟧
⬜🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛🟧⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧⬛🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧⬛⬛⬛⬛🟧⬛⬛⬛⬛🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧⬛⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧⬛⬛⬛⬛🟧🟧🟧⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛🟧🟧🟧⬛⬛🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛⬛⬛⬛🟧🟧⬜🟧🟧⬛⬛⬛⬛⬛⬛🟧🟧🟧⬛⬛⬛⬛⬛⬛🟧🟧⬜🟧⬛⬛🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛⬛⬛🟧🟧⬜⬜⬜🟧🟧⬛⬛⬛⬛🟧🟧⬜🟧🟧⬛⬛⬛⬛🟧🟧⬜⬜🟧🟧⬛🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛⬛🟧🟧⬜⬜⬜⬜⬜🟧🟧⬛⬛🟧🟧⬜⬜⬜🟧🟧⬛⬛🟧🟧⬜⬜⬜⬜🟧🟧🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜🟧🟧🟧🟧⬜⬜⬜⬜⬜🟧🟧🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧🟧⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧⬛⬛🟧⬜
⬜🟧⬛⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧🟧⬛🟧⬜
⬜🟧⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧⬛🟧⬜
⬜🟧🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧🟧🟧⬜
The doom is still buried within Command-A for sure.
A step 601 preview - all with temperature = 0:
- It's still messing up some end of lines, but I can live with that if it works... Likely can be fixed later using the new `class 0` random data if a problem.
- The Grimdark story was noticeably (much!) better compared to the inverse.
- The Battlestar Galactica story showed that even though `Q8_0`, `F16` and `BF16` all diverge slightly from `F32`, it's not clearly making them any worse (I actually liked the `Q8_0` story best!).
| Size | Name |
|---|---|
| 287M | command-a-03-2025-lora-Q8_0.gguf |
| 541M | command-a-03-2025-lora-F16.gguf |
| 541M | command-a-03-2025-lora-BF16.gguf |
| 1.1G | command-a-03-2025-lora-F32.gguf |
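The sizes in the table line up with the nominal bits per weight of each format. Here is a rough sketch of that arithmetic, assuming every LoRA tensor is stored in the named type and ignoring GGUF metadata overhead. The `estimate_size_mb` helper is hypothetical, written just for this illustration; the `Q8_0` figure comes from its block layout of 32 int8 weights plus one 16-bit scale (34 bytes per block).

```python
# Nominal storage cost per weight, in bits.
# Q8_0 packs 32 int8 weights plus one 16-bit scale into a 34-byte block.
BITS_PER_WEIGHT = {
    "Q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "F16": 16.0,
    "BF16": 16.0,
    "F32": 32.0,
}


def estimate_size_mb(known_type, known_size_mb, target_type):
    """Scale a known file size by the ratio of bits per weight.

    Hypothetical helper: assumes all tensors use the named type and
    ignores file metadata, so treat the result as a ballpark figure.
    """
    ratio = BITS_PER_WEIGHT[target_type] / BITS_PER_WEIGHT[known_type]
    return known_size_mb * ratio


if __name__ == "__main__":
    f16_mb = 541  # F16 size taken from the table above
    for name in ("Q8_0", "BF16", "F32"):
        print(f"{name}: ~{estimate_size_mb('F16', f16_mb, name):.0f}M")
```

Starting from the 541M `F16` file, this gives roughly 287M for `Q8_0` and 1082M for `F32`, which matches the 287M and 1.1G listed.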
It still has a way to go before it starts to converge, but I would think by step 1000 it will be pretty close:
566 responses in previous thread! In the future we may be the reason for hf staff to implement multi-page view of discussions.
This was posted on Hacker News today:
Absolutely fascinating!
> This was posted on Hacker News today:
> Absolutely fascinating!

That was really cool. Thanks for sharing!
> > This was posted on Hacker News today:
> > Absolutely fascinating!
>
> That was really cool. Thanks for sharing!

Yeah, and llama-3.1:405b doing so well was quite a surprise too (and makes you a bit sad everything seems to be moving away from large dense models).
I've also been playing with Olmo 2 2503 32B at the pretraining checkpoint and have been getting some funny/surprising plot lines which go in random directions compared to most of the flat stories I'm used to seeing. Anyway, while the nameslop results are going up, here's my own comparison with end-of-stage-1 Olmo 3.
it associates them immediately with a good-guy protagonist who survives no matter what, and when the name is changed it treats it as an NPC with flaws.
I guess I'll change my name to Elara next time I'm applying for jobs :)
Largestral 3 base is a "true" base model:
I wish they'd release the 2407 base model.
they may have tried to make it a thinking model, but failed.
The rumors are that they distilled from Deepseek / couldn't get their RL pipeline working. That would explain it, along with the Nemotron-level name distribution.
I'm not going to bother with this one, I don't need another >700B markdown generator.
I've been using the GLM-4-32B base thanks to your benchmarks btw
Interesting paper linked on Reddit:
https://arxiv.org/abs/2512.05117
Pretty sure I've read something similar about a year ago too though.
Didn't do too well on my test; Mistral peaked with Largestral 2407 here:
> I wish they'd release the 2407 base model.

Yeah! I think they were probably worried at the time that some other company would train a better finetune like Microsoft did with wizard-lm-2.
Overall, I'm finding that the recent model releases keep getting worse and worse at creative writing (barring kimi-k2) 😀
- The Western models all seem to not want to risk using copyrighted material for pre-training.
- The Chinese models are all-in on benchmaxxing for STEM tasks or agentic coding.
The dreaded "not x, but y" slop phrase seems way more painful than the old "shivers down spine" or nameslop to get rid of... 😟
> The dreaded "not x, but y" slop phrase seems way more painful than the old "shivers down spine" or nameslop to get rid of... 😟

Yes, this is so much worse. I believe it was someone in this thread (or the previous one) who referred to this as "structural slop".
And it can't reliably be prompted away. Largestral-2407/2411 simply don't do it.
> I think they were probably worried at the time that some other company would train a better finetune like Microsoft did with wizard-lm-2.

Wouldn't their MRL license protect them from that?
A new slop phrase I've noticed is the constant use of "a beat." really irritating.
I'm pretty gutted about Mistral 3 large. I was excited when I saw it was released given how good Miqu and Mistral Large were.
> A new slop phrase I've noticed is the constant use of "a beat." really irritating.

I first noticed that with the first "Kimi-K2" 1T model. Only seen it a few times, I have no idea what it's supposed to mean lol
> > A new slop phrase I've noticed is the constant use of "a beat." really irritating.
>
> I first noticed that with the first "Kimi-K2" 1T model. Only seen it a few times, I have no idea what it's supposed to mean lol

It's on the new Deepseek, GLM and Mistral. The new Mistral is riddled with it though. I think it's supposed to be a dramatic pause lmao



