baby, can you also make an INT8 version for us?
You can quantize the original model on load; quantization doesn't add any meaningful extra overhead or memory usage.
I only upload UINT4 SVD versions because SVD takes a bit longer to quantize, and UINT4 SVD gives the smallest file size while keeping good quality.
I also don't want to waste a terabyte of disk space on different quantization types of the same model. Use the original model and quantize it on load.
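
Rough idea of what quantize-on-load does, as a simplified PyTorch sketch (an illustration of symmetric per-channel INT8 quantization, not the actual SDNQ code; the helper names here are made up):

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-output-channel quantization: one scale per row.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.round(w / scale).clamp(-128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original weights.
    return q.to(torch.float32) * scale

w = torch.randn(4, 8)            # stand-in for a freshly loaded Linear weight
q, scale = quantize_int8(w)      # only q (int8) + scale need to stay in memory
w_hat = dequantize_int8(q, scale)
print((w - w_hat).abs().max())   # quantization error stays small
```

The point is that the full-precision weight can be dropped right after loading, so you only ever hold the INT8 copy.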
But I can make an exception for Z-Image, INT8 only, since the model is still small and we can use INT8 MatMul without any bit-packing or SVD overhead.
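
To illustrate the bit-packing point: UINT4 has no native dtype, so two 4-bit weights get packed into each byte and must be unpacked before every matmul, while INT8 weights are one element per byte and can feed an INT8 MatMul (e.g. `torch._int_mm` on CUDA) directly. A small sketch:

```python
import torch

w4 = torch.randint(0, 16, (8,), dtype=torch.uint8)  # pretend 4-bit values

# UINT4 path: pack two 4-bit weights into each byte for storage...
packed = (w4[0::2] << 4) | w4[1::2]
# ...then unpack them again at compute time (extra work on every matmul).
unpacked = torch.stack(((packed >> 4) & 0xF, packed & 0xF), dim=1).flatten()
assert torch.equal(w4, unpacked)

# INT8 path: one weight per element, usable as-is, nothing to unpack.
w8 = torch.randint(-128, 128, (8,), dtype=torch.int8)
```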
https://huggingface.co/Disty0/Z-Image-Turbo-SDNQ-int8
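
Loading the prebuilt repo should look roughly like this (a sketch; the exact prerequisites, such as the sdnq package and a diffusers version with Z-Image support, are assumptions, so check the model card):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Disty0/Z-Image-Turbo-SDNQ-int8",
    torch_dtype=torch.bfloat16,
).to("cuda")
```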
love you~ and offering prebuilt files is even better ❤️
❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️