Qwen3 Next Thinking gguf

Make sure you have enough memory (system RAM and/or GPU VRAM) for the quantization you choose; see the size guide under "quantized models" below.

Use the model in ollama

First, download and install Ollama:

https://ollama.com/download

Note: the official Ollama model library does not include Qwen3-Next yet, so you need to pull the GGUF directly from Hugging Face as follows.

Command

In a terminal (Windows Command Prompt, macOS Terminal, or a Linux shell), type:

ollama run hf.co/John1604/Qwen3-Next-80B-A3B-Thinking-gguf:q3_k_m

(q3_k_m is the quantization type; other tags such as q3_k_s, q4_k_m, etc. can also be used)

C:\Users\developer>ollama run hf.co/John1604/Qwen3-Next-80B-A3B-Thinking-gguf:q3_k_m
pulling manifest
...
writing manifest
success

>>> Send a message (/? for help) 

After the command finishes, the model also appears in the Ollama UI: select hf.co/John1604/Qwen3-Next-80B-A3B-Thinking-gguf:q3_k_m from the model list and run it the same way as any other Ollama-supported model.
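Ollama also exposes a local REST API (port 11434 by default), so the pulled model can be called from scripts. Below is a minimal Python sketch, assuming a default local Ollama install and the requests package; the prompt text is just an example:

import requests

# Assumes Ollama is running locally on its default port (11434).
MODEL = "hf.co/John1604/Qwen3-Next-80B-A3B-Thinking-gguf:q3_k_m"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL,
        "prompt": "Explain what a mixture-of-experts model is in one paragraph.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])

With "stream": False the server returns a single JSON object whose "response" field holds the full completion.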

Use the model in LM Studio

Download and install LM Studio:

https://lmstudio.ai/

Discover models

In LM Studio, click the "Discover" icon; the "Mission Control" popup window will be displayed.

In the "Mission Control" search bar, type "John1604/Qwen3-Next-80B-A3B-Thinking-gguf" and check "GGUF", the model should be found.

Download a quantized model.

Load the quantized model.

Ask questions.
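LM Studio can also serve the loaded model through an OpenAI-compatible local server (start it from LM Studio's server/developer view; the default address is http://localhost:1234/v1). A minimal sketch using the openai Python package; the model identifier below is an assumption, so use whatever name LM Studio reports for the model you loaded:

from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is a placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    # Hypothetical identifier; copy the exact name from LM Studio's model list.
    model="qwen3-next-80b-a3b-thinking-gguf",
    messages=[{"role": "user", "content": "What is special about Qwen3-Next?"}],
)
print(completion.choices[0].message.content)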

quantized models

Type     Bits   Quality                  Description
Q2_K     2-bit  🟥 Low                   Minimal footprint; only for tests
Q3_K_S   3-bit  🟧 Low                   "Small" variant (less accurate)
Q3_K_M   3-bit  🟧 Low-Med               "Medium" variant
Q4_K_S   4-bit  🟨 Med                   Small, faster, slightly less quality
Q4_K_M   4-bit  🟩 Med-High              "Medium"; best 4-bit balance
Q5_K_S   5-bit  🟩 High                  Slightly smaller than Q5_K_M
Q5_K_M   5-bit  🟩🟩 High                Excellent general-purpose quant
Q6_K     6-bit  🟩🟩🟩 Very High         Almost FP16 quality, larger size
Q8_0     8-bit  🟩🟩🟩🟩 Near-lossless   Baseline; near-lossless quality
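A quick way to gauge whether a quant fits your memory: file size is roughly params × bits / 8 bytes. A back-of-the-envelope sketch using the nominal bit widths from the table above (real K-quant files run somewhat larger because they mix precisions and store per-block scales):

# Rough GGUF size estimate: params * bits / 8 bytes.
# Nominal bit widths only; actual files are somewhat larger.
PARAMS = 80e9  # 80B parameters

for name, bits in [("Q2_K", 2), ("Q3_K_M", 3), ("Q4_K_M", 4),
                   ("Q5_K_M", 5), ("Q6_K", 6), ("Q8_0", 8)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:7s} at least ~{gib:.0f} GiB")

For example, Q4_K_M works out to roughly 37 GiB at the nominal 4 bits, so plan memory accordingly.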
Model size: 80B params (GGUF) · Architecture: qwen3next

