Clémentine committed · Commit f09f2a7 · 1 Parent(s): dca8525

removing a dumb note by claude
app/src/content/chapters/troubleshooting/troubleshooting-inference.mdx
CHANGED
@@ -48,21 +48,6 @@ And that's it!
 
 I would actually recommend using `<memory (in GB)> = <number of parameters (in G)> * (<precision factor> * 110%)`, to be on the safer side, as inference will require a bit more memory than just loading the model (you'll also need to load the batches).
 
-<Note title="Estimating GPU memory requirements" emoji="💾" variant="info">
-
-**Quick formula:**
-`Memory (GB) = Params (billions) × Precision factor × 1.1`
-
-**Precision factors:**
-- float32: 4
-- float16/bfloat16: 2
-- 8-bit: 1
-- 4-bit: 0.5
-
-The 1.1 multiplier accounts for batch loading overhead. Example: A 7B model in float16 needs ~15.4GB (7 × 2 × 1.1).
-
-</Note>
-
 ### My model does not fit on a GPU
 ➡️ Quantization
 
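For reference, the rule of thumb kept in the file (memory in GB ≈ parameters in billions × precision factor × 1.1, with the factors the removed note listed) is a one-line calculation. Below is a minimal Python sketch; the `PRECISION_FACTORS` table and the `estimate_inference_memory_gb` helper are illustrative names, not code from this repo.

```python
# Rough GPU memory estimate for inference, following the rule of thumb in the
# .mdx file: memory (GB) ≈ params (in billions) × precision factor × 1.1.
# The names and the factor table below are illustrative, not from the repo.

PRECISION_FACTORS = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

def estimate_inference_memory_gb(params_billions: float, precision: str = "float16") -> float:
    """Estimate GPU memory (GB) needed to run inference at a given precision.

    The 1.1 multiplier adds ~10% headroom on top of the raw weight size,
    since inference also has to hold batches and activations.
    """
    factor = PRECISION_FACTORS[precision]
    return params_billions * factor * 1.1

if __name__ == "__main__":
    # Example from the removed note: a 7B model in float16 needs ~15.4 GB.
    print(f"{estimate_inference_memory_gb(7, 'float16'):.1f} GB")  # -> 15.4 GB
```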