Well, if you’re talking about the newest AI-powered UTAU voicebanks, that’s because the developers finally thought about crossing the streams: instead of having the singers merely pronounce syllables at several pitches, they used that data (expanded to also include several syllable clusters) to train an AI. Most trained AI models use voice samples recorded from live performances, so they vary in quality and in the number of data points for each individual syllable. These, by contrast, have the full set of voice training data prerecorded by design, so the quality of every possible combination of phonemes is as clear as possible.
Yeah, but Vocaloids suck, and I’ve recently heard AI singing that made me double-check because it was so good.
Does this suck? To my ears, it doesn’t. Not unmistakably human by any stretch, but still pretty good. And that was 9 years ago.
And by “AI singing” do you mean “a famous voice overlaid on another singer’s performance” or something closer to text-to-speech (text-to-song)?
I don’t understand the language, nor am I familiar with that style, so I couldn’t really judge.
I’m not sure about your second point. I’ll keep this in mind, and the next time I come across an example I will come back here to share it.
That’s very interesting. Where can I read more about it?
https://dreamtonics.com/en/synthesizer-v-ai-announcement/