---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-30B-A3B-Thinking-2507
- Qwen/Qwen3-30B-A3B-Instruct-2507
pipeline_tag: text-generation
tags:
- merge
---

> *This is an auto-thinking-switching model built with model merging and expert-substitution techniques: it answers simple questions directly, gives brief thoughts to moderately difficult ones, and reasons deeply about the hardest ones.*

# *Model Highlights:*

- ***Merge method**: `arcee_fusion`*
- ***Highest precision**: `dtype: float32` + `out_dtype: bfloat16`*
- ***Context length**: `262,144` & `1,010,000`*

# *Parameter Settings*:

## *Auto-Thinking Mode*

> [!NOTE]
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*

## *Step 1: Hybrid Instruct Model and Thinking Model*

*Perform an initial merge of the instruction model and the reasoning model.*

```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
merge_method: arcee_fusion
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: Qwen3-30B-A3B-YOYO-AutoThink-preview
```

## *Step 2: Expert Replacement*

*Inspired by this [paper](https://arxiv.org/abs/2506.14794), we use the regular expression `^model\.layers\.\d+\.mlp\.experts\.\d+\.(down_proj|gate_proj|up_proj)\.weight$` to select experts for replacement: every expert weight in Qwen3-30B-A3B-YOYO-AutoThink-preview that matches the regex is replaced with the corresponding weight from Qwen3-30B-A3B-Thinking-2507.*
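*As an illustrative sketch (not the authors' actual merge script), the Step 2 selection rule can be expressed in a few lines of Python: the regex from the text matches only the per-expert MLP projection weights, leaving the router and attention tensors untouched. The `replace_experts` helper below is a hypothetical name operating on plain state-dict-like mappings.*

```python
import re

# Regex from Step 2: matches per-expert MLP projection weights in every layer.
EXPERT_RE = re.compile(
    r"^model\.layers\.\d+\.mlp\.experts\.\d+\.(down_proj|gate_proj|up_proj)\.weight$"
)

def is_expert_weight(name: str) -> bool:
    """Return True if `name` is an expert tensor that should be swapped."""
    return EXPERT_RE.match(name) is not None

def replace_experts(merged: dict, thinking: dict) -> dict:
    """Copy matching expert weights from the Thinking model into the merged model.

    Both arguments are state-dict-like mappings from tensor name to tensor;
    non-matching tensors (router gate, attention, norms) are kept from `merged`.
    """
    return {
        name: (thinking[name] if is_expert_weight(name) else tensor)
        for name, tensor in merged.items()
    }

# Expert projections match the pattern...
print(is_expert_weight("model.layers.0.mlp.experts.7.up_proj.weight"))   # True
# ...but the shared router gate and attention weights do not.
print(is_expert_weight("model.layers.0.mlp.gate.weight"))                # False
print(is_expert_weight("model.layers.0.self_attn.q_proj.weight"))        # False
```

*In practice the same filtering can be done on safetensors shards without loading the whole model; the sketch above only demonstrates which tensor names the regex selects.*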