Update README.md
README.md CHANGED
@@ -162,6 +162,55 @@ Moo Moo the cow would certainly win.

reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consists of math, code, instruction-following, and general chat queries.

Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)
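For a quick look at what the RL mix contains, the dataset can be pulled with the `datasets` library. A minimal sketch; the `train` split name is an assumption, so check the dataset card for the actual splits:

```python
# Minimal sketch: peek at the Dolci-Think-RL-7B mix with the `datasets` library.
# The "train" split name is an assumption; check the dataset card for the actual splits.
from datasets import load_dataset

ds = load_dataset("allenai/Dolci-Think-RL-7B", split="train")
print(ds)      # number of rows and column names
print(ds[0])   # one example from the math / code / instruction-following / chat mix
```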
## Inference & Recommended Settings
We evaluated our models with the following settings, and we recommend using them for generation:

- **temperature:** `0.6`
- **top_p:** `0.95`
- **max_tokens:** `32768`

### transformers Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Olmo-3-7B-Think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

prompt = "Who would win in a fight - a dinosaur or a cow named MooMoo?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=32768,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
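For chat-style use, it is likely better to run the request through the tokenizer's chat template rather than passing a raw string; a minimal sketch continuing from the snippet above, assuming the tokenizer ships a chat template:

```python
# Sketch: same settings, but the prompt is formatted with the chat template.
# Assumes the tokenizer ships a chat template; `tokenizer` and `model` come from the snippet above.
messages = [
    {"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named MooMoo?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=32768,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```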
### vllm Example

```python
from vllm import LLM, SamplingParams

model_id = "allenai/Olmo-3-7B-Think"
llm = LLM(model=model_id)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
)

prompt = "Who would win in a fight - a dinosaur or a cow named MooMoo?"

outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```
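Since vLLM batches requests efficiently, `llm.generate` also accepts a list of prompts; a small usage sketch continuing from the snippet above (the second prompt is only an illustration):

```python
# Sketch: vLLM processes a list of prompts in one batched call.
# `llm` and `sampling_params` come from the snippet above; the prompts are illustrative.
prompts = [
    "Who would win in a fight - a dinosaur or a cow named MooMoo?",
    "Explain, step by step, whether 97 is prime.",
]
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.prompt)
    print(out.outputs[0].text)
```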
## Bias, Risks, and Limitations
Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from OLMo, as from any LLM, are often inaccurate, so facts should be verified.