sf-diogenes-v0.2
TL;DR
- 14B-parameter continuation of the Diogenes alignment work, distilled from the v0.1 80B run so it fits on a single high-memory GPU.
- Curated Salesforce dataset (102,827 prompt/response pairs) re-tokenized with the Qwen3 chat template and optimized for pragmatic, enumerated answers.
- Trained with Unsloth’s 4-bit QLoRA stack (rank=32) and merged back to full-precision weights for drop-in `transformers` usage.
Release Highlights
- Smaller, faster deployment – Qwen3 14B keeps the Salesforce-specific tone while cutting VRAM requirements to ≈28 GB in bf16 (or <16 GB in 4-bit; see the loading sketch after this list).
- Dataset hygiene carried forward – Same filtered corpus from the original dataset; no extra synthetic noise was mixed in.
- Improved instructions – Prompt template parity with v0.1 plus extra guardrails in the system message prevent the “diogenes” persona from drifting into sarcasm during troubleshooting flows.
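A minimal sketch of the 4-bit loading path mentioned above, using a `bitsandbytes` NF4 config (the same quant type used in training); the double-quantization flag is an assumption, not something stated in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tilman-d/sf-diogenes-v0.2"

# NF4 4-bit weights with bf16 compute keep the 14B model under roughly 16 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # assumption: a common default, not from the card
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```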
Model Details
| Field | Value |
|---|---|
| Base | unsloth/Qwen3-14B |
| Architecture | Decoder-only transformer (Qwen3 variant) with rotary positional embeddings |
| Parameters | ~14B (adapter deltas merged) |
| Context length | 2,048 tokens (extendable with RoPE scaling if needed) |
| Precision | Released in bf16; training ran in 4-bit NF4 |
| Tokenizer | Qwen 152k vocab with chat template bundled in this repo |
| Finetuning method | LoRA rank 32 targeting q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Libraries | unsloth, transformers==4.45.2, trl, bitsandbytes, accelerate |
Prompt Template
Generation uses Qwen-style `<|im_start|>` delimiters. Prefer `tokenizer.apply_chat_template`, but manual formatting is simple:
```
<|im_start|>system
You are a pragmatic but respectful Salesforce assistant focused on actionable steps.<|im_end|>
<|im_start|>user
<question><|im_end|>
<|im_start|>assistant
```
Keep the conversation short (≤1,800 tokens) for best throughput.
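For reference, a small sketch of the manual formatting above; `build_prompt` is a hypothetical helper, and with the bundled tokenizer `tokenizer.apply_chat_template(...)` produces the equivalent string:

```python
# Hypothetical helper reproducing the template above by hand.
SYSTEM = "You are a pragmatic but respectful Salesforce assistant focused on actionable steps."

def build_prompt(question: str) -> str:
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```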
Dataset: tilman-d/diogenes-training
| Metric | Value |
|---|---|
| Records | 102,827 JSONL rows |
| Size | 158,187,480 bytes (≈151 MiB) |
| Fields | text, prompt, completion |
| Domain | Salesforce Forecasting, Data Cloud, Commerce, CLM, Order Management, Payments |
| Structure | System primer → user question → multi-step assistant reply |
Sample row:
```json
{
  "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
  "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
}
```
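A minimal sketch for loading the corpus with the `datasets` library; the `"train"` split name is an assumption:

```python
from datasets import load_dataset

# Load the curated corpus from the Hub; the "train" split name is an assumption.
ds = load_dataset("tilman-d/diogenes-training", split="train")
print(ds.column_names)      # expected: ["text", "prompt", "completion"]
print(ds[0]["prompt"])
```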
Observations
- English-only, jargon-heavy writing.
- Every `text` field already contains the fully formatted conversation with the Qwen template; `prompt`/`completion` are exposed so that losses can be masked on the user side (see the sketch below).
- No paraphrasing or rejection sampling was applied: what you see is what was fed to the model, so audit before sharing outside trusted environments.
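A minimal sketch of the user-side loss masking, assuming `prompt` carries the templated system/user turns and `completion` the assistant reply; this is an illustration, not the released preprocessing code:

```python
# Tokenize prompt and completion separately so prompt tokens can be set to -100
# (ignored by the cross-entropy loss); records are left-truncated to 2,048 tokens.
def build_example(tokenizer, prompt: str, completion: str, max_len: int = 2048):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + completion_ids)[-max_len:]            # truncate from the left
    labels = ([-100] * len(prompt_ids) + completion_ids)[-max_len:]
    return {"input_ids": input_ids, "labels": labels}
```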
Training Procedure
- Pre-tokenization – All rows were processed through the Qwen3 chat template with loss masking on user content. Max length: 2,048 tokens (records longer than that were truncated from the left).
- QLoRA fine-tuning – `unsloth` 4-bit loading plus LoRA rank-32 adapters over the attention and MLP projections. Optimizer: `adamw_8bit` with cosine LR decay.
- Merge & export – LoRA deltas merged via `merge_and_unload()` to produce bf16 weights for Hugging Face upload (sketched below).
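A minimal sketch of this setup with the Unsloth API; the `lora_alpha` value, trainer wiring, and output paths are assumptions, not the released training script:

```python
from unsloth import FastLanguageModel

# 4-bit NF4 base weights plus LoRA rank-32 adapters over attention and MLP projections.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,                      # assumption: alpha is not stated in this card
    lora_dropout=0.05,
    use_gradient_checkpointing="unsloth",
)

# ... fine-tune with TRL's SFTTrainer using the hyperparameters below, then merge.
# (In practice the adapter is often reloaded on a bf16 base before merging.)
merged = model.merge_and_unload()       # fold the LoRA deltas back into the base weights
merged.save_pretrained("sf-diogenes-v0.2-merged", safe_serialization=True)
```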
Hyperparameters
| Setting | Value |
|---|---|
| Devices | 1 × NVIDIA GPU (tests on A100 80 GB and RTX 6000 Ada 48 GB) |
| Effective batch size | 8 (2 per device × 4 gradient accumulation steps) |
| Learning rate | 2e-4, 5 warmup steps, cosine decay |
| Max steps | 30 |
| Weight decay | 0.001 |
| Gradient checkpointing | Enabled through Unsloth |
| Dropout | LoRA dropout 0.05 |
| Precision | 4-bit NF4 (training) → bf16 (export) |
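For reference, a sketch of how the table above maps onto `transformers.TrainingArguments` as consumed by TRL's `SFTTrainer`; the output directory, logging cadence, and bf16 compute flag are assumptions:

```python
from transformers import TrainingArguments

# Mirror of the hyperparameter table; output_dir and logging_steps are placeholders.
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=2e-4,
    warmup_steps=5,
    lr_scheduler_type="cosine",
    max_steps=30,
    weight_decay=0.001,
    optim="adamw_8bit",
    bf16=True,                       # assumption: bf16 compute on A100 / RTX 6000 Ada
    logging_steps=1,
)
```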
Evaluation & Observations
- Quick regression suite – 20 Salesforce-specific prompts (payments, forecasting, Data Cloud). v0.2 produced step-by-step guidance in 18/20 cases vs 15/20 on the 80B adapter when quantized to 4-bit, mostly due to the refreshed system prompt.
- Math sanity checks – Linear equations and small word problems: acceptable up to 2-step reasoning; anything trickier benefits from temperature ≤0.3.
- Content risks – Domain-locked training means it can hallucinate beyond Salesforce or invent deprecated feature names. Keep humans in the loop.
Formal MT-Bench / AlpacaEval scores are pending; pull requests with numbers are welcome.
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tilman-d/sf-diogenes-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a pragmatic Salesforce assistant."},
    {"role": "user", "content": "Outline the steps to enable Einstein for Commerce order routing."}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=320,
    temperature=0.3,
    top_p=0.85,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Inference tips
- Prefer `temperature` between 0.2–0.4 for procedural answers; raise it for ideation.
- Contexts longer than ~1,800 tokens remain functional, but throughput drops sharply; trim histories when possible.
- Quantization: 4-bit GPTQ/AWQ works; keep `group_size >= 64` to preserve numerics for setup checklists (see the sketch below).
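A minimal sketch of post-training GPTQ quantization via `transformers` (requires `optimum` and a GPTQ backend); the `"c4"` calibration set and `group_size=128` are illustrative choices that respect the ≥64 guideline above, not the author's recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "tilman-d/sf-diogenes-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# group_size=128 keeps per-group scales coarse enough for stable numbered checklists.
gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

quantized = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
quantized.save_pretrained("sf-diogenes-v0.2-gptq-4bit")
```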
Limitations & Responsible Use
- Not a general-purpose assistant—outside Salesforce, answers devolve quickly.
- Contains domain jargon that may fall out of date; verify against the latest release notes before presenting to customers.
- No safety alignment beyond dataset filtering. Do not use in unsupervised, high-stakes workflows.
- Always review outputs touching compliance, contracts, or payments and attribute responses back to source documentation.
Citation
```bibtex
@misc{sf-diogenes-v0_2,
  title  = {sf-diogenes-v0.2},
  author = {Tilman Dietrich},
  year   = {2025},
  url    = {https://huggingface.co/tilman-d/sf-diogenes-v0.2}
}
```
Acknowledgements