Performance evaluation for v1.0.0 Model
#12 opened 14 days ago by woodytse · 3 comments
Bloody hell!! Running perfectly on 3x 3090 at 160k context, speeds between 65 tk/s and 30 tk/s (depending on length), my script:
#11 opened about 2 months ago by groxaxo
Did anyone get speculative decoding working?
#10 opened about 2 months ago by amit864 · 4 comments
Successfully Running Qwen3-Next-80B-A3B-Instruct-AWQ-4bit on 3x RTX 3090s
#9 opened 2 months ago by 8055izham · 7 comments
sorta works on vLLM now
#8 opened 3 months ago by MrDragonFox · 15 comments
Recent update throws error: KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'
#7 opened 3 months ago by itsmebcc · 3 comments
gibberish still persists?
#6 opened 3 months ago by Geximus · 5 comments
MTP Accepted throughput always at 0.00 tokens/s
#5 opened 3 months ago by bpozdena · 4 comments
Experiencing excessive response latency.
#4 opened 3 months ago by JunHowie
Does this quantized version support running on machines like V100 and V100S?
#3 opened 3 months ago by ShaoShuoHe
Error when inputting a large number of prompts
#2 opened 3 months ago by dwaynedu
Error when running in vLLM
#1 opened 3 months ago by d8rt8v · 21 comments