Podcasting
Advertisers
Enterprise
Pricing
Resources
Discover Discover

Log in
Sign up free

Papers Read on AI

GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models

2022-10-16

Download 61

We propose AudioStyleGAN (ASGAN), a new generative adversarial network (GAN) for unconditional speech synthesis. As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modiﬁcation to adaptive discriminator augmentation to probabilistically skip d...

We propose AudioStyleGAN (ASGAN), a new generative adversarial network (GAN) for unconditional speech synthesis. As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modiﬁcation to adaptive discriminator augmentation to probabilistically skip discriminator updates. ASGAN achieves state-of-the-art results in unconditional speech synthesis on the Google Speech Commands dataset. 2022: Matthew Baas, H. Kamper https://arxiv.org/pdf/2210.05271v1.pdf

View more

Comments (3)

More Episodes

You may also like

Cyber Security Headlines

Babbage from The Economist

Big Technology Podcast

Software Engineering Daily

Cybersecurity Today

Techmeme Ride Home

Bloomberg Technology

Get this podcast on your phone, Free

Creat Yourt Podcast In Minutes

Full-featured podcast site
Unlimited storage and bandwidth
Comprehensive podcast stats
Distribute to Apple Podcasts, Spotify, and more
Make money with your podcast

It is Free

Podcast Services
MONETIZATION & MORE
KNOWLEDGE BASE
Support
Podbean

Privacy Policy
Cookie Policy
Terms of Use
Consent Preferences
Copyright © 2015-2025 Podbean.com