This is the demo page for MWM's open-source repository, an unofficial implementation of NANSY++.

🦜 Zero-shot voice conversion

The following section showcases the zero-shot voice conversion ability of two backbone checkpoints: one trained on HifiTTS with the open-source repo ("OS repo") and another trained on internal data ("our best"). The results were synthesized with the inferencer class. The source and target audio samples are unseen examples from the VCTK corpus. They can either be extracted from the full dataset or found in the static sub-directory.

| Source | Target | NANSY++ (OS repo) | NANSY++ (our best) |
| --- | --- | --- | --- |
| p225.wav | p226.wav | p225-to-p226-400k.wav | high-res-p225-to-p226.wav |
| p225.wav | p227.wav | p225-to-p227-400k.wav | high-res-p225-to-p227.wav |
| p225.wav | p228.wav | p225-to-p228-400k.wav | high-res-p225-to-p228.wav |
| p225.wav | p229.wav | p225-to-p229-400k.wav | high-res-p225-to-p229.wav |

Zero-shot text-to-speech

| Speaker | p238 | p248 | p261 | p326 | p347 |
| --- | --- | --- | --- | --- | --- |
| Reference input | 238_ref.wav | 248_ref.wav | 261_ref.wav | 326_ref.wav | 347_ref.wav |
| GT (ground truth) | 238-gt.wav | 248_gt.wav | 261-gt.wav | 326-gt.wav | 347-gt.wav |
| NANSY-TTS w/ NANSY++ (vctk) | 238-nansy-small.wav | 248-nansy-small.wav | 261-nansy-small.wav | 326-nansy-small.wav | 347-nansy-small.wav |
| NANSY++ (OS repo) | 238-ours.wav | 248-ours.wav | 261-ours.wav | 326-ours.wav | 347-ours.wav |