baby can you also make an int8 version for us?

#10
by PussyHut - opened

👄

You can quantize the original model on model load. Quantization doesn't really add any extra overhead or memory usage.

I only upload UINT4 SVD versions because SVD takes a bit longer to quantize, and UINT4 SVD gives the smallest file size with good quality.

I also don't want to waste a terabyte of disk space just for different quantization types of the same model. Use the original model and quantize on load.

But I can make an exception for Z-Image, INT8 only, since the model is still small and INT8 MatMul works without any bit-packing or SVD overhead.
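For reference, quantizing on load looks roughly like this. This is a minimal sketch assuming SDNQ exposes an `SDNQConfig` that plugs into diffusers' `quantization_config` argument, and that the original weights live at `Tongyi-MAI/Z-Image-Turbo`; the exact class, argument names, and repo id may differ, so check the SDNQ docs for the authoritative snippet:

```python
# Minimal quantize-on-load sketch. Assumptions are marked in comments:
# the sdnq import path, the SDNQConfig argument names, and the source
# repo id are illustrative, not verified.
import torch
import diffusers
from sdnq import SDNQConfig  # assumed import path

repo = "Tongyi-MAI/Z-Image-Turbo"  # assumed original (unquantized) repo id

# Quantize the transformer while it is being loaded; the rest of the
# pipeline stays in plain bfloat16.
transformer = diffusers.AutoModel.from_pretrained(
    repo,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    quantization_config=SDNQConfig(weights_dtype="uint4", use_svd=True),  # assumed args
)
pipe = diffusers.DiffusionPipeline.from_pretrained(
    repo,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
```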

Disty0 changed discussion status to closed

https://huggingface.co/Disty0/Z-Image-Turbo-SDNQ-int8
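Loading the prebuilt weights is then just a normal `from_pretrained` call. A rough sketch, with the caveat that the `import sdnq` registration side effect and the step count are assumptions; see the model card for the exact snippet:

```python
import torch
import diffusers
import sdnq  # noqa: F401 -- assumed: importing registers SDNQ with diffusers

# Load the prequantized INT8 repo directly; no quantization_config needed.
pipe = diffusers.DiffusionPipeline.from_pretrained(
    "Disty0/Z-Image-Turbo-SDNQ-int8",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Turbo-style models need only a few steps; 8 is an assumed, typical value.
image = pipe("a corgi wearing sunglasses", num_inference_steps=8).images[0]
image.save("corgi.png")
```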

love you~ and offering prebuilt files is even better ♥️
👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️🔥🤗👊
