sf-diogenes-v0.2

TL;DR

  • 14B-parameter continuation of the Diogenes alignment work, distilled from the v0.1 80B run so it fits on a single high-memory GPU.
  • Curated Salesforce dataset (102,827 prompt/response pairs) re-tokenized with the Qwen3 chat template and optimized for pragmatic, enumerated answers.
  • Trained with Unsloth’s 4-bit QLoRA stack (rank=32) and merged back to full precision weights for drop-in transformers usage.

Release Highlights

  • Smaller, faster deployment – Qwen3 14B keeps the Salesforce-specific tone while cutting VRAM requirements to ≈28 GB in bf16 (or <16 GB in 4-bit).
  • Dataset hygiene carried forward – Same filtered corpus from the original dataset; no extra synthetic noise was mixed in.
  • Improved instructions – Prompt template parity with v0.1 plus extra guardrails in the system message prevent the “diogenes” persona from drifting into sarcasm during troubleshooting flows.

Model Details

  • Base: unsloth/Qwen3-14B
  • Architecture: Decoder-only transformer (Qwen3 variant) with rotary positional embeddings
  • Parameters: ~14B (adapter deltas merged)
  • Context length: 2,048 tokens (extendable with RoPE scaling if needed)
  • Precision: Released in bf16; training ran in 4-bit NF4
  • Tokenizer: Qwen ~152k-token vocabulary; chat template bundled in this repo
  • Finetuning method: LoRA rank 32 targeting q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Libraries: unsloth, transformers==4.45.2, trl, bitsandbytes, accelerate

Prompt Template

Generation uses Qwen-style <|im_start|> delimiters. Prefer tokenizer.apply_chat_template, but manual formatting is simple:

<|im_start|>system
You are a pragmatic but respectful Salesforce assistant focused on actionable steps.<|im_end|>
<|im_start|>user
<question><|im_end|>
<|im_start|>assistant

Keep the conversation short (≤1,800 tokens) for best throughput.
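
For reference, a minimal Python sketch of the manual formatting above; the helper name is arbitrary, and tokenizer.apply_chat_template (shown in the Usage section) remains the preferred route:

# Illustrative helper that reproduces the template shown above verbatim.
SYSTEM_PROMPT = (
    "You are a pragmatic but respectful Salesforce assistant "
    "focused on actionable steps."
)

def format_prompt(question: str) -> str:
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(format_prompt("How do I enable Pay Now for Salesforce Payments?"))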

Dataset: tilman-d/diogenes-training

  • Records: 102,827 JSONL rows
  • Size: 158,187,480 bytes (≈151 MiB)
  • Fields: text, prompt, completion
  • Domain: Salesforce Forecasting, Data Cloud, Commerce, CLM, Order Management, Payments
  • Structure: System primer → user question → multi-step assistant reply

Sample row:

{
  "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
  "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
}
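
The corpus can be inspected with the datasets library; a quick sketch, assuming the JSONL rows are hosted on the Hub under the id above:

from datasets import load_dataset

# Split name "train" is assumed; adjust if the dataset exposes a different split.
ds = load_dataset("tilman-d/diogenes-training", split="train")
print(ds.column_names)        # expected: ['text', 'prompt', 'completion']
print(ds[0]["prompt"])        # raw user question
print(ds[0]["completion"])    # enumerated assistant reply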

Observations

  • English-only, jargon-heavy writing.
  • Every text field already contains the fully formatted conversation in the Qwen template; the prompt / completion fields are exposed separately so loss can be masked on the user turn (a sketch follows this list).
  • No paraphrasing or rejection sampling was applied; what you see is exactly what was fed to the model, so audit outputs before sharing them outside trusted environments.
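
The user-side masking mentioned above can be reproduced in a few lines. This is a simplified sketch of the idea, not the exact preprocessing code: it uses the system prompt from this card and ignores truncation and token-boundary edge cases.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-14B")
SYSTEM_PROMPT = (
    "You are a pragmatic but respectful Salesforce assistant "
    "focused on actionable steps."
)

def build_example(prompt: str, completion: str) -> dict:
    # Render the system + user turns exactly as they appear at inference time.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]
    prefix = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    full = prefix + completion + tokenizer.eos_token  # eos_token used as end-of-turn marker

    prefix_len = len(tokenizer(prefix, add_special_tokens=False)["input_ids"])
    input_ids = tokenizer(full, add_special_tokens=False)["input_ids"]

    # Loss is masked (-100) over the system/user portion; only the assistant reply is trained on.
    labels = [-100] * prefix_len + input_ids[prefix_len:]
    return {"input_ids": input_ids, "labels": labels}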

Training Procedure

  1. Pre-tokenization – All rows were processed through the Qwen3 chat template with loss masking on user content. Max length: 2,048 tokens (records longer than that were truncated from the left).
  2. QLoRA fine-tuning – unsloth 4-bit loading plus LoRA rank-32 adapters over the attention and MLP projections. Optimizer: adamw_8bit with cosine LR decay (see the sketch after this list).
  3. Merge & export – LoRA deltas merged via merge_and_unload() to produce bf16 weights for Hugging Face upload.
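
A condensed sketch of steps 2 and 3 under the settings stated in this card; lora_alpha is an assumption (it is not given here), and exact Unsloth arguments may differ between releases:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",
    max_seq_length=2048,
    load_in_4bit=True,               # NF4-quantized base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                            # rank 32, as listed in this card
    lora_alpha=32,                   # assumption: alpha is not stated in this card
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)

# ... supervised fine-tuning runs here (see the hyperparameter sketch below) ...

# Merge the LoRA deltas into the base weights and export safetensors.
# Note: depending on the PEFT/Unsloth version, the 4-bit base may need to be
# reloaded in bf16 before merging to produce full-precision weights.
merged = model.merge_and_unload()
merged.save_pretrained("sf-diogenes-v0.2-merged", safe_serialization=True)
tokenizer.save_pretrained("sf-diogenes-v0.2-merged")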

Hyperparameters

  • Devices: 1 × NVIDIA GPU (tested on A100 80 GB and RTX 6000 Ada 48 GB)
  • Batch size: 2 per device × 4 gradient-accumulation steps (effective 8)
  • Learning rate: 2e-4 with 5 warmup steps and cosine decay
  • Max steps: 30
  • Weight decay: 0.001
  • Gradient checkpointing: Enabled through Unsloth
  • Dropout: LoRA dropout 0.05
  • Precision: 4-bit NF4 (training) → bf16 (export)
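
The same settings expressed as trl training arguments, as a sketch only: it reuses model, tokenizer, and the train split loaded earlier, argument names (max_seq_length, dataset_text_field, tokenizer=) vary across trl releases, and the user-side loss masking from step 1 is handled separately in the actual pipeline.

from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,    # effective batch size 8
    learning_rate=2e-4,
    warmup_steps=5,
    lr_scheduler_type="cosine",
    max_steps=30,
    weight_decay=0.001,
    optim="adamw_8bit",
    max_seq_length=2048,
    dataset_text_field="text",        # fully formatted conversations from the dataset
    logging_steps=1,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,                 # the split loaded in the dataset section
    args=args,
)
trainer.train()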

Evaluation & Observations

  • Quick regression suite – 20 Salesforce-specific prompts (payments, forecasting, Data Cloud). v0.2 produced step-by-step guidance in 18/20 cases vs 15/20 on the 80B adapter when quantized to 4-bit, mostly due to the refreshed system prompt.
  • Math sanity checks – Linear equations and small word problems are handled acceptably up to two-step reasoning; anything trickier benefits from temperature ≤0.3.
  • Content risks – Domain-locked training means it can hallucinate beyond Salesforce or invent deprecated feature names. Keep humans in the loop.

Formal MT-Bench / AlpacaEval scores are pending; pull requests with numbers are welcome.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the released bf16 weights; device_map="auto" requires accelerate.
model_id = "tilman-d/sf-diogenes-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Build a conversation in the bundled Qwen chat template.
messages = [
    {"role": "system", "content": "You are a pragmatic Salesforce assistant."},
    {"role": "user", "content": "Outline the steps to enable Einstein for Commerce order routing."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Low temperature keeps the enumerated, procedural style (see inference tips below).
outputs = model.generate(
    **inputs,
    max_new_tokens=320,
    do_sample=True,      # enable sampling so temperature/top_p take effect
    temperature=0.3,
    top_p=0.85,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Inference tips

  • Prefer a temperature in the 0.2–0.4 range for procedural answers; raise it for ideation.
  • Contexts longer than ~1,800 tokens remain functional, but throughput drops sharply—trim histories when possible.
  • Quantization: 4-bit GPTQ/AWQ works; keep group_size >= 64 to preserve the numeric details in setup checklists (a bitsandbytes NF4 loading sketch follows this list).
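
For on-the-fly 4-bit loading (as opposed to a pre-quantized GPTQ/AWQ export), a bitsandbytes NF4 configuration along the following lines should land under the <16 GB figure from the TL;DR; treat it as a sketch, not a tested recipe:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # matches the NF4 precision used in training
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "tilman-d/sf-diogenes-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)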

Limitations & Responsible Use

  • Not a general-purpose assistant; outside Salesforce topics, answer quality degrades quickly.
  • Contains domain jargon that may fall out of date; verify against the latest release notes before presenting to customers.
  • No safety alignment beyond dataset filtering. Do not use in unsupervised, high-stakes workflows.
  • Always review outputs touching compliance, contracts, or payments and attribute responses back to source documentation.

Citation

@misc{sf-diogenes-v0_2,
  title  = {sf-diogenes-v0.2},
  author = {Tilman Dietrich},
  year   = {2025},
  url    = {https://huggingface.co/tilman-d/sf-diogenes-v0.2}
}

Acknowledgements

  • Unsloth for the memory-efficient QLoRA workflow
  • TRL for the SFT Trainer
  • Qwen team for releasing the base weights
  • Everyone who reviewed the Salesforce-specific prompt set