Clémentine committed on
Commit f09f2a7
1 Parent(s): dca8525

removing a dumb note by claude

app/src/content/chapters/troubleshooting/troubleshooting-inference.mdx CHANGED
@@ -48,21 +48,6 @@ And that's it!
 
  I would actually recommend using `<memory (in GB)> = <number of parameters (in G)> * (<precision factor> * 110%)`, to be on the safer side, as inference will require a bit more memory than just loading the model (you'll also need to load the batches).
 
- <Note title="Estimating GPU memory requirements" emoji="💾" variant="info">
-
- **Quick formula:**
- `Memory (GB) = Params (billions) × Precision factor × 1.1`
-
- **Precision factors:**
- - float32: 4
- - float16/bfloat16: 2
- - 8-bit: 1
- - 4-bit: 0.5
-
- The 1.1 multiplier accounts for batch loading overhead. Example: A 7B model in float16 needs ~15.4GB (7 × 2 × 1.1).
-
- </Note>
-
  ### My model does not fit on a GPU
  ➡️ Quantization
 
 
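For reference, the rule of thumb kept in the doc above (memory in GB ≈ parameters in billions × precision factor × 1.1, where the factor is 4 for float32, 2 for float16/bfloat16, 1 for 8-bit, and 0.5 for 4-bit) can be written as a short sketch. This is an editor's illustration rather than code from the repository; the function name and precision table are assumptions.

```python
# Illustrative sketch (not from the repository) of the rule of thumb above:
#   memory (GB) ≈ parameters (in billions) × precision factor × 1.1
# The 1.1 factor leaves ~10% headroom for loading batches during inference.

PRECISION_FACTORS = {
    "float32": 4.0,   # bytes per parameter
    "float16": 2.0,
    "bfloat16": 2.0,
    "8bit": 1.0,
    "4bit": 0.5,
}

def estimate_inference_memory_gb(params_billions: float, precision: str = "bfloat16") -> float:
    """Approximate GPU memory needed to load a model for inference, with batch overhead."""
    return params_billions * PRECISION_FACTORS[precision] * 1.1

# Example: a 7B model in float16 needs roughly 7 × 2 × 1.1 ≈ 15.4 GB.
print(f"{estimate_inference_memory_gb(7, 'float16'):.1f} GB")
```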
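For the "My model does not fit on a GPU ➡️ Quantization" pointer, a minimal sketch of loading a model quantized to 4-bit with transformers and bitsandbytes follows. This is an illustration under assumed tooling (transformers, accelerate, and bitsandbytes installed), not the guide's own example; the model id is a placeholder.

```python
# Minimal sketch: load a model in 4-bit so it fits on a smaller GPU.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder, swap in the checkpoint you are evaluating

quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available devices
)
```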