baby can you also make an int8 version for us?

#10
by PussyHut - opened

👄

You can quantize the original model on model load. Quantization doesn't really add any extra overhead or memory usage.

I only upload UINT4 SVD versions because SVD takes a bit longer to quantize, and UINT4 SVD gives the smallest file size with good quality.

I also don't want to waste a terabyte of disk space just for different quantization types of the same model. Use the original model and quantize on load.

But I can make an exception for Z-Image, INT8 only, since the model is still small and INT8 MatMul works without any bit-packing or SVD overhead.
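For reference, quantizing on load looks roughly like this. This is a minimal sketch assuming SDNQ exposes an `SDNQConfig` that plugs into diffusers' `quantization_config` argument, and that the original weights live at `Tongyi-MAI/Z-Image-Turbo`; the exact class, argument names, and repo id may differ, so check the SDNQ docs for the authoritative snippet:

```python
# Minimal quantize-on-load sketch. Assumptions are marked in comments:
# the sdnq import path, the SDNQConfig argument names, and the source
# repo id are illustrative, not verified.
import torch
import diffusers
from sdnq import SDNQConfig  # assumed import path

repo = "Tongyi-MAI/Z-Image-Turbo"  # assumed original (unquantized) repo id

# Quantize the transformer while it is being loaded; the rest of the
# pipeline stays in plain bfloat16.
transformer = diffusers.AutoModel.from_pretrained(
    repo,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    quantization_config=SDNQConfig(weights_dtype="uint4", use_svd=True),  # assumed args
)
pipe = diffusers.DiffusionPipeline.from_pretrained(
    repo,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
```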

Disty0 changed discussion status to closed

https://huggingface.co/Disty0/Z-Image-Turbo-SDNQ-int8
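Loading the prebuilt weights is then just a normal `from_pretrained` call. A rough sketch, with the caveat that the `import sdnq` registration side effect and the step count are assumptions; see the model card for the exact snippet:

```python
import torch
import diffusers
import sdnq  # noqa: F401 -- assumed: importing registers SDNQ with diffusers

# Load the prequantized INT8 repo directly; no quantization_config needed.
pipe = diffusers.DiffusionPipeline.from_pretrained(
    "Disty0/Z-Image-Turbo-SDNQ-int8",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Turbo-style models need only a few steps; 8 is an assumed, typical value.
image = pipe("a corgi wearing sunglasses", num_inference_steps=8).images[0]
image.save("corgi.png")
```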

love you~ and offering prebuilt files is even better ♥️
👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️👄♥️🔥🤗👊
