Vocal Synthesizer Wiki
🛠 This subject is work in progress.
Please bear with us while improvements are being made and assume good faith until the edits are complete.
For information on how to help, see the guidelines. More subjects categorized here.


DiffSinger is an open-source, AI-powered singing voice synthesis (SVS) software developed by a research team consisting of Jinglin Liu (刘静林), Chengxi Li (李成蹊), Yi Ren (任意), Feiyang Chen, and Zhou Zhao.

Concept

DiffSinger was initially proposed on the arXiv preprint service as an acoustic model for singing voice synthesis based on the diffusion probabilistic model. DiffSinger is a parameterized Markov chain that iteratively converts noise into a mel-spectrogram conditioned on the music score. By implicitly optimizing the variational bound, DiffSinger can be trained stably and can generate realistic outputs. To further improve voice quality and speed up inference, the team introduced a shallow diffusion mechanism that makes better use of the prior knowledge learned by the simple loss. Specifically, DiffSinger starts generation at a shallow step smaller than the total number of diffusion steps, determined by the intersection of the diffusion trajectories of the ground-truth mel-spectrogram and the one predicted by a simple mel-spectrogram decoder. In addition, the team proposed boundary prediction methods to locate this intersection and determine the shallow step adaptively. Evaluations conducted on a Chinese singing dataset demonstrated that DiffSinger outperformed state-of-the-art SVS work, and extended experiments also showed that the methods generalize to the text-to-speech task (DiffSpeech).[1]
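The shallow diffusion mechanism described above can be illustrated with a minimal sketch. This is not DiffSinger's actual implementation; the noise schedule values, the placeholder noise-prediction network (`eps_model`), and the toy mel-spectrogram shape are all illustrative assumptions. The key idea shown is that the reverse Markov chain starts at a shallow step `k < T`, seeded by diffusing the simple decoder's mel prediction, rather than starting from pure noise at step `T`:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100                                   # total diffusion steps (illustrative)
betas = np.linspace(1e-4, 0.06, T)        # noise schedule (illustrative values)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t):
    """Forward process: diffuse a clean mel-spectrogram x0 to step t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def denoise_step(x_t, t, eps_pred):
    """One reverse step of the Markov chain, given predicted noise eps_pred."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:                             # no noise is added at the final step
        mean += np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

def shallow_diffusion_inference(mel_decoder_pred, k, eps_model):
    """Start the reverse chain at shallow step k < T, seeded by diffusing
    the simple decoder's prediction to the trajectory intersection point."""
    x = q_sample(mel_decoder_pred, k)
    for t in range(k, -1, -1):
        x = denoise_step(x, t, eps_model(x, t))
    return x

# Toy usage: a zero "decoder output" and a dummy noise predictor.
mel_pred = np.zeros((80, 10))                 # 80 mel bins x 10 frames
eps_model = lambda x, t: np.zeros_like(x)     # placeholder for a real network
out = shallow_diffusion_inference(mel_pred, k=30, eps_model=eps_model)
print(out.shape)  # (80, 10)
```

Because the chain only runs `k + 1` reverse steps instead of `T`, inference is faster, and the decoder prediction supplies a better starting point than pure Gaussian noise.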

History

WIP

Requirements

  • OS: Windows, macOS or Linux

External links

References

Navigation