[This is audio-to-latent with image + LoRAs; the start image is AI-generated - AITUBE