sf-diogenes-v0.2

TL;DR

  • 14B-parameter continuation of the Diogenes alignment work, distilled from the v0.1 80B run so it fits on a single high-memory GPU.
  • Curated Salesforce dataset (102,827 prompt/response pairs) re-tokenized with the Qwen3 chat template and optimized for pragmatic, enumerated answers.
  • Trained with Unsloth’s 4-bit QLoRA stack (rank=32) and merged back to full precision weights for drop-in transformers usage.

Release Highlights

  • Smaller, faster deployment – Qwen3 14B keeps the Salesforce-specific tone while cutting VRAM requirements to ≈28 GB in bf16 (or <16 GB in 4-bit).
  • Dataset hygiene carried forward – Same filtered corpus from the original dataset; no extra synthetic noise was mixed in.
  • Improved instructions – Prompt template parity with v0.1 plus extra guardrails in the system message prevent the “diogenes” persona from drifting into sarcasm during troubleshooting flows.

Model Details

  • Base: unsloth/Qwen3-14B
  • Architecture: Decoder-only transformer (Qwen3 variant) with rotary positional embeddings
  • Parameters: ~14B (adapter deltas merged)
  • Context length: 2,048 tokens (extendable with RoPE scaling if needed)
  • Precision: Released in bf16; training ran in 4-bit NF4
  • Tokenizer: Qwen ~152k-token vocabulary; chat template bundled in this repo
  • Finetuning method: LoRA rank 32 targeting q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Libraries: unsloth, transformers==4.45.2, trl, bitsandbytes, accelerate

Prompt Template

Generation uses Qwen-style <|im_start|> delimiters. Prefer tokenizer.apply_chat_template, but manual formatting is simple:

<|im_start|>system
You are a pragmatic but respectful Salesforce assistant focused on actionable steps.<|im_end|>
<|im_start|>user
<question><|im_end|>
<|im_start|>assistant

Keep the conversation short (≤1,800 tokens) for best throughput.
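
For reference, a minimal Python sketch of the manual formatting above; the helper name is arbitrary, and tokenizer.apply_chat_template (shown in the Usage section) remains the preferred route:

# Illustrative helper that reproduces the template shown above verbatim.
SYSTEM_PROMPT = (
    "You are a pragmatic but respectful Salesforce assistant "
    "focused on actionable steps."
)

def format_prompt(question: str) -> str:
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(format_prompt("How do I enable Pay Now for Salesforce Payments?"))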

Dataset: tilman-d/diogenes-training

  • Records: 102,827 JSONL rows
  • Size: 158,187,480 bytes (≈151 MiB)
  • Fields: text, prompt, completion
  • Domain: Salesforce Forecasting, Data Cloud, Commerce, CLM, Order Management, Payments
  • Structure: System primer → user question → multi-step assistant reply

Sample row:

{
  "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
  "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
}
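
The corpus can be inspected with the datasets library; a quick sketch, assuming the JSONL rows are hosted on the Hub under the id above:

from datasets import load_dataset

# Split name "train" is assumed; adjust if the dataset exposes a different split.
ds = load_dataset("tilman-d/diogenes-training", split="train")
print(ds.column_names)        # expected: ['text', 'prompt', 'completion']
print(ds[0]["prompt"])        # raw user question
print(ds[0]["completion"])    # enumerated assistant reply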

Observations

  • English-only, jargon-heavy writing.
  • Every text field already contains the fully formatted conversation in the Qwen template; the prompt / completion fields are exposed separately so loss can be masked on the user turn (a sketch follows this list).
  • No paraphrasing or rejection sampling was applied; what you see is exactly what was fed to the model, so audit outputs before sharing them outside trusted environments.
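
The user-side masking mentioned above can be reproduced in a few lines. This is a simplified sketch of the idea, not the exact preprocessing code: it uses the system prompt from this card and ignores truncation and token-boundary edge cases.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-14B")
SYSTEM_PROMPT = (
    "You are a pragmatic but respectful Salesforce assistant "
    "focused on actionable steps."
)

def build_example(prompt: str, completion: str) -> dict:
    # Render the system + user turns exactly as they appear at inference time.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]
    prefix = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    full = prefix + completion + tokenizer.eos_token  # eos_token used as end-of-turn marker

    prefix_len = len(tokenizer(prefix, add_special_tokens=False)["input_ids"])
    input_ids = tokenizer(full, add_special_tokens=False)["input_ids"]

    # Loss is masked (-100) over the system/user portion; only the assistant reply is trained on.
    labels = [-100] * prefix_len + input_ids[prefix_len:]
    return {"input_ids": input_ids, "labels": labels}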

Training Procedure

  1. Pre-tokenization – All rows were processed through the Qwen3 chat template with loss masking on user content. Max length: 2,048 tokens (records longer than that were truncated from the left).
  2. QLoRA fine-tuning – unsloth 4-bit loading plus LoRA rank-32 adapters over the attention and MLP projections. Optimizer: adamw_8bit with cosine LR decay (see the sketch after this list).
  3. Merge & export – LoRA deltas merged via merge_and_unload() to produce bf16 weights for Hugging Face upload.
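
A condensed sketch of steps 2 and 3 under the settings stated in this card; lora_alpha is an assumption (it is not given here), and exact Unsloth arguments may differ between releases:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",
    max_seq_length=2048,
    load_in_4bit=True,               # NF4-quantized base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                            # rank 32, as listed in this card
    lora_alpha=32,                   # assumption: alpha is not stated in this card
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)

# ... supervised fine-tuning runs here (see the hyperparameter sketch below) ...

# Merge the LoRA deltas into the base weights and export safetensors.
# Note: depending on the PEFT/Unsloth version, the 4-bit base may need to be
# reloaded in bf16 before merging to produce full-precision weights.
merged = model.merge_and_unload()
merged.save_pretrained("sf-diogenes-v0.2-merged", safe_serialization=True)
tokenizer.save_pretrained("sf-diogenes-v0.2-merged")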

Hyperparameters

  • Devices: 1 × NVIDIA GPU (tested on A100 80 GB and RTX 6000 Ada 48 GB)
  • Batch size: 2 per device × 4 gradient-accumulation steps (effective 8)
  • Learning rate: 2e-4 with 5 warmup steps and cosine decay
  • Max steps: 30
  • Weight decay: 0.001
  • Gradient checkpointing: Enabled through Unsloth
  • Dropout: LoRA dropout 0.05
  • Precision: 4-bit NF4 (training) → bf16 (export)
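
The same settings expressed as trl training arguments, as a sketch only: it reuses model, tokenizer, and the train split loaded earlier, argument names (max_seq_length, dataset_text_field, tokenizer=) vary across trl releases, and the user-side loss masking from step 1 is handled separately in the actual pipeline.

from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,    # effective batch size 8
    learning_rate=2e-4,
    warmup_steps=5,
    lr_scheduler_type="cosine",
    max_steps=30,
    weight_decay=0.001,
    optim="adamw_8bit",
    max_seq_length=2048,
    dataset_text_field="text",        # fully formatted conversations from the dataset
    logging_steps=1,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,                 # the split loaded in the dataset section
    args=args,
)
trainer.train()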

Evaluation & Observations

  • Quick regression suite – 20 Salesforce-specific prompts (payments, forecasting, Data Cloud). v0.2 produced step-by-step guidance in 18/20 cases vs 15/20 on the 80B adapter when quantized to 4-bit, mostly due to the refreshed system prompt.
  • Math sanity checks – Linear equations and small word problems are handled acceptably up to two-step reasoning; anything trickier benefits from temperature ≤0.3.
  • Content risks – Domain-locked training means it can hallucinate beyond Salesforce or invent deprecated feature names. Keep humans in the loop.

Formal MT-Bench / AlpacaEval scores are pending; pull requests with numbers are welcome.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the released bf16 weights; device_map="auto" requires accelerate.
model_id = "tilman-d/sf-diogenes-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Build a conversation in the bundled Qwen chat template.
messages = [
    {"role": "system", "content": "You are a pragmatic Salesforce assistant."},
    {"role": "user", "content": "Outline the steps to enable Einstein for Commerce order routing."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Low temperature keeps the enumerated, procedural style (see inference tips below).
outputs = model.generate(
    **inputs,
    max_new_tokens=320,
    do_sample=True,      # enable sampling so temperature/top_p take effect
    temperature=0.3,
    top_p=0.85,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Inference tips

  • Prefer a temperature in the 0.2–0.4 range for procedural answers; raise it for ideation.
  • Contexts longer than ~1,800 tokens remain functional, but throughput drops sharply—trim histories when possible.
  • Quantization: 4-bit GPTQ/AWQ works; keep group_size >= 64 to preserve the numeric details in setup checklists (a bitsandbytes NF4 loading sketch follows this list).
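
For on-the-fly 4-bit loading (as opposed to a pre-quantized GPTQ/AWQ export), a bitsandbytes NF4 configuration along the following lines should land under the <16 GB figure from the TL;DR; treat it as a sketch, not a tested recipe:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # matches the NF4 precision used in training
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "tilman-d/sf-diogenes-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)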

Limitations & Responsible Use

  • Not a general-purpose assistant; outside Salesforce topics, answer quality degrades quickly.
  • Contains domain jargon that may fall out of date; verify against the latest release notes before presenting to customers.
  • No safety alignment beyond dataset filtering. Do not use in unsupervised, high-stakes workflows.
  • Always review outputs touching compliance, contracts, or payments and attribute responses back to source documentation.

Citation

@misc{sf-diogenes-v0_2,
  title  = {sf-diogenes-v0.2},
  author = {Tilman Dietrich},
  year   = {2025},
  url    = {https://huggingface.co/tilman-d/sf-diogenes-v0.2}
}

Acknowledgements

  • Unsloth for the memory-efficient QLoRA workflow
  • TRL for the SFT Trainer
  • Qwen team for releasing the base weights
  • Everyone who reviewed the Salesforce-specific prompt set