Generate images from text descriptions and optional input images
Wan2.1-T2V-14B + Fast 4-step with NAG + Automatic Audio