Vocal Synthesizer Wiki


NNSVS, short for Neural Network Singing Voice Synthesizer, is an open-source, neural network-based AI voice synthesizer created by software engineer and researcher Ryuichi Yamamoto (山本 龍一, also known by his social media handle r9y9) and designed for research purposes. Because it is open source, users can freely create and contribute voice models using a variety of community-provided datasets, resources, and tools.[3][4]

NNSVS does not have a user interface of its own. As a result, most users rely on the accompanying UTAU plug-in ENUNU, created by UTAU cover artist CrazY,[5] which is compatible with NNSVS voice models.

Concept

NNSVS was originally released on May 10, 2020, as PyTorch-based singing voice synthesis software for research. The project aimed to provide an open-source library for developing singing voice synthesis engines for research purposes. Its creator, Ryuichi Yamamoto, set out to build a singing voice synthesis engine with NEUTRINO-level quality in a field with few open-source tools, and aimed to make the tool user-friendly.

The first edition of the software used the freely available Tohoku Kiritan voice dataset and, once development had progressed far enough, could build a parametric singing voice synthesizer (SVS) similar in design to Sinsy. NNSVS takes MusicXML as input and produces an audio waveform as output. The resulting singing voice synthesis system consists of three trainable models: a time-lag model, a phoneme duration model, and an acoustic model. Musical and linguistic features are extracted with Sinsy, and WORLD, a high-quality speech analysis, manipulation and synthesis system, is used for speech analysis and synthesis.[6]
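
The flow through the three models can be pictured with a small Python sketch. This is an illustration only and does not use the NNSVS API: every name in it (Note, predict_time_lag, predict_durations, predict_f0, toy_vocoder, synthesize) is hypothetical, the "models" are hand-written rules standing in for trained networks, and a plain sine generator stands in for the WORLD vocoder.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

VOWELS = {"a", "i", "u", "e", "o"}
SAMPLE_RATE = 16000
SAMPLES_PER_FRAME = 80  # assumed 5 ms frames at 16 kHz


@dataclass
class Note:
    """One note of a score (hypothetical stand-in for parsed MusicXML)."""
    midi_pitch: int
    start_frame: int      # onset written in the score, in frames
    num_frames: int       # length written in the score, in frames
    phonemes: List[str]   # phonemes sung on this note


def predict_time_lag(note: Note) -> int:
    """Stand-in time-lag model: sung onset minus score onset, in frames.

    Toy rule: a note that starts with a consonant begins 2 frames early.
    """
    return 0 if note.phonemes[0] in VOWELS else -2


def predict_durations(note: Note) -> List[int]:
    """Stand-in duration model: split the note's frames among its phonemes."""
    n = len(note.phonemes)
    base = note.num_frames // n
    durations = [base] * n
    durations[-1] += note.num_frames - base * n  # remainder goes to the last phoneme
    return durations


def predict_f0(frame_pitches: List[int]) -> List[float]:
    """Stand-in acoustic model: one F0 value in Hz per frame, from the MIDI pitch."""
    return [440.0 * 2.0 ** ((p - 69) / 12.0) for p in frame_pitches]


def toy_vocoder(f0: List[float]) -> List[float]:
    """Stand-in for a vocoder: a phase-continuous sine wave that follows F0."""
    samples, phase = [], 0.0
    for hz in f0:
        for _ in range(SAMPLES_PER_FRAME):
            phase += 2.0 * math.pi * hz / SAMPLE_RATE
            samples.append(math.sin(phase))
    return samples


def synthesize(score: List[Note]) -> Tuple[List[Tuple[str, int, int]], List[float]]:
    """Run time-lag -> duration -> acoustic -> vocoder over a score.

    Returns (alignment, waveform), where alignment holds
    (phoneme, onset_frame, duration_frames) tuples. Rests and overlaps
    between notes are ignored when building the frame sequence.
    """
    alignment, frame_pitches = [], []
    for note in score:
        onset = max(0, note.start_frame + predict_time_lag(note))
        for phoneme, dur in zip(note.phonemes, predict_durations(note)):
            alignment.append((phoneme, onset, dur))
            frame_pitches.extend([note.midi_pitch] * dur)
            onset += dur
    return alignment, toy_vocoder(predict_f0(frame_pitches))
```

Calling synthesize on a list of Note objects returns the phoneme alignment and a list of audio samples. In a real system each stand-in function would be a trained neural network, and the acoustic model would predict the full set of vocoder parameters rather than F0 alone.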

Features

  • Open-source: NNSVS is fully open-source. Users can create their own voice models with their own dataset.
  • Multiple languages: VocalSynth communities have used NNSVS to build singing voice synthesis (SVS) systems for multiple languages (at least eight are known).
  • Research friendly: NNSVS comes with reproducible Kaldi/ESPnet-style recipes. Users can use NNSVS to create baseline systems for research.

Demonstrations

Demonstrations for NNSVS were hosted on the official website for NNSVS.[7]

NNSVS Voice Models

Released

ENUNU Voice Models

Released

Demonstration Models

Note: This is not a comprehensive list. Because users can create their own voice models, the total number of NNSVS libraries is open-ended. For additional releases, see the Publicly Available NNSVS Voice Models page of the NNSVS Practical Guide created by xuu.


External links

Official

Articles

References

