=============================
ComfyUI_RH_VoxCPM
=============================

ComfyUI_RH_VoxCPM is a community-maintained ComfyUI integration for VoxCPM with native **VoxCPM 2** support, multi-speaker dialogue generation, optional LoRA loading, automatic ASR for cloning workflows, and reference-audio denoising.

- Repo: `HM-RunningHub/ComfyUI_RH_VoxCPM `_
- Try online: `RunningHub `_

.. note::

   This project is maintained by the community, not by OpenBMB. For the other ComfyUI integrations in the ecosystem, see :doc:`comfyui_voxcpm` and :doc:`comfyui_voxcpmtts`.

.. list-table:: Supported VoxCPM Versions
   :widths: 30 70
   :header-rows: 0

   * - VoxCPM 2
     - ✅ Recommended. Full feature set, including Ultimate Cloning and multi-speaker workflows.
   * - VoxCPM 1.5
     - ✅ Supported through the shared loader workflow.
   * - VoxCPM-0.5B
     - ✅ Supported as a lighter-weight option.

Overview
--------

The integration exposes three main nodes:

* **RunningHub VoxCPM Load Model**: loads a local VoxCPM model directory from ``ComfyUI/models/voxcpm/``, with optional LoRA weights
* **RunningHub VoxCPM Generate Speech**: a single generation node covering Voice Design, Controllable Cloning, and Ultimate Cloning
* **RunningHub VoxCPM Multi-Speaker**: generates tagged multi-speaker dialogue for up to five speakers, with per-speaker voice control

Among the ComfyUI integrations covered in these docs, this project currently offers the most feature-complete visual workflow for VoxCPM 2, combining native V2 support, multi-speaker generation, optional LoRA loading, auto-ASR, and ZipEnhancer-based denoising in one package.

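
**RunningHub VoxCPM Generate Speech** decides its mode from the inputs you connect. The rules from *Basic usage* can be restated as a small Python function, which may help when deciding what to wire up. This is a hypothetical sketch, not code from the repository: ``plan_generation``, ``GenerationPlan``, the mode strings, the use of a file path in place of an ``AUDIO`` input, and raising ``ValueError`` on invalid combinations are all assumptions. Only the input names (``reference_audio``, ``ultimate_clone``, ``reference_audio_text``, ``denoise_reference``) and the model directory names come from this page.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class GenerationPlan:
       """Hypothetical summary of what the node would do (not the repo's API)."""
       mode: str          # "voice_design", "controllable_cloning", or "ultimate_cloning"
       run_asr: bool      # transcribe the reference with SenseVoiceSmall first
       run_denoise: bool  # clean the reference with ZipEnhancer first


   def plan_generation(
       model_version: str,                     # e.g. "VoxCPM2", "VoxCPM1.5", "VoxCPM-0.5B"
       reference_audio: Optional[str] = None,  # a path stands in for the AUDIO input (assumption)
       ultimate_clone: bool = False,
       reference_audio_text: str = "",
       denoise_reference: bool = False,
   ) -> GenerationPlan:
       if reference_audio is None:
           # No reference audio: Voice Design, driven only by control_instruction.
           if ultimate_clone:
               # Assumption: this sketch treats the combination as an error.
               raise ValueError("ultimate_clone requires reference_audio")
           return GenerationPlan("voice_design", run_asr=False, run_denoise=False)

       if not ultimate_clone:
           # Reference audio with ultimate_clone off: Controllable Cloning.
           return GenerationPlan("controllable_cloning", run_asr=False,
                                 run_denoise=denoise_reference)

       # Ultimate Cloning is documented as VoxCPM 2 only.
       if model_version != "VoxCPM2":
           raise ValueError("Ultimate Cloning requires VoxCPM 2")
       # Auto-ASR only kicks in when no transcript was supplied.
       needs_asr = not reference_audio_text.strip()
       return GenerationPlan("ultimate_cloning", run_asr=needs_asr,
                             run_denoise=denoise_reference)

For example, ``plan_generation("VoxCPM2", reference_audio="ref.wav", ultimate_clone=True)`` returns an Ultimate Cloning plan with ``run_asr=True``, because no transcript was supplied.
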

Features
--------

* **Voice Design** from a text-only control instruction
* **Controllable Cloning** from reference audio with optional style guidance
* **Ultimate Cloning** for prompt continuation and maximum fidelity (**VoxCPM 2 only**)
* **LoRA loading** from ``ComfyUI/models/voxcpm/loras/``
* **Automatic ASR** via FunASR SenseVoiceSmall when ``reference_audio_text`` is empty in Ultimate Cloning
* **Reference denoising** via ZipEnhancer before cloning
* **Multi-speaker dialogue** from tagged scripts such as ``[spk1]... [spk2]...``
* **Example workflows** included in the repository for both single-speaker and multi-speaker generation

Prerequisites
-------------

* `ComfyUI `_ installed and running
* VoxCPM model files placed under ``ComfyUI/models/voxcpm/``
* For automatic transcription: SenseVoiceSmall under ``ComfyUI/models/SenseVoice/SenseVoiceSmall/``
* For reference denoising: ZipEnhancer under ``ComfyUI/models/voxcpm/speech_zipenhancer_ans_multiloss_16k_base/``

Installation
------------

Via ComfyUI Manager:

* Search for ``ComfyUI_RH_VoxCPM`` and install it.

Manual installation:

.. code-block:: bash

   cd ComfyUI/custom_nodes
   git clone https://github.com/HM-RunningHub/ComfyUI_RH_VoxCPM.git
   cd ComfyUI_RH_VoxCPM
   pip install -r requirements.txt

Model layout
------------

The repository README documents the following layout:

.. code-block:: text

   ComfyUI/
   └── models/
       └── voxcpm/
           ├── VoxCPM2/
           ├── VoxCPM1.5/
           ├── VoxCPM-0.5B/
           ├── loras/
           └── speech_zipenhancer_ans_multiloss_16k_base/

SenseVoiceSmall lives outside the ``voxcpm`` folder, under:

.. code-block:: text

   ComfyUI/models/SenseVoice/SenseVoiceSmall/

Basic usage
-----------

Single-speaker generation
^^^^^^^^^^^^^^^^^^^^^^^^^

1. Add **RunningHub VoxCPM Load Model** and choose your model directory.
2. Optionally enable ``optimize`` in the loader node. It defaults to ``False``, and leaving it off is the safer starting point for compatibility.
3. Connect the loader output to **RunningHub VoxCPM Generate Speech**.
4. Use one of the following modes:

   * **Voice Design**: set ``control_instruction`` and leave ``reference_audio`` empty
   * **Controllable Cloning**: provide ``reference_audio`` and keep ``ultimate_clone`` off
   * **Ultimate Cloning**: provide ``reference_audio`` and turn ``ultimate_clone`` on (VoxCPM 2 only)

5. Connect the resulting ``AUDIO`` output to a preview or save node.

Auto ASR and denoising
^^^^^^^^^^^^^^^^^^^^^^

* In Ultimate Cloning mode, if ``reference_audio_text`` is empty, the node can transcribe the reference automatically with SenseVoiceSmall. This requires the model under ``ComfyUI/models/SenseVoice/SenseVoiceSmall/``.
* If ``denoise_reference`` is enabled, the node cleans the reference audio with ZipEnhancer before generation.

Multi-speaker workflow
^^^^^^^^^^^^^^^^^^^^^^

The **RunningHub VoxCPM Multi-Speaker** node accepts tagged scripts such as:

.. code-block:: text

   [spk1]Hello there.[spk2]Hi, welcome to VoxCPM.[spk1]Let's begin.

Each of up to five speakers can have its own:

* reference audio input (``audio_1`` to ``audio_5``)
* control instruction (``control_1`` to ``control_5``)

This is useful for dialogue demos, podcast-style content, or character-based audio generation inside ComfyUI.

Example workflows
-----------------

The repository currently ships at least two workflow examples:

* a basic VoxCPM 2 workflow for single-speaker generation
* a VoxCPM 2 multi-speaker workflow for tagged dialogue generation

These example JSON files can be imported directly into ComfyUI.

When to choose this integration
-------------------------------

Choose **ComfyUI_RH_VoxCPM** when you want:

* visual VoxCPM 2 workflows inside ComfyUI
* multi-speaker dialogue generation
* built-in auto-ASR for cloning
* optional LoRA loading without leaving the ComfyUI workflow

If you specifically want in-node LoRA **training**, see :doc:`comfyui_voxcpm`. If you prefer a lighter VoxCPM 1.5-focused setup with auto-transcription, see :doc:`comfyui_voxcpmtts`.
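
Before running a script through the *Multi-speaker workflow*, it can help to check how its tags split into per-speaker segments. The Python sketch below does that with a regular expression. It is hypothetical and is not the node's own parser: ``parse_dialogue``, ``TAG_RE``, and the strictness rules (rejecting text before the first tag and tags outside ``spk1`` to ``spk5``) are assumptions. The ``[spkN]`` tag syntax, the five-speaker limit, and the ``audio_N`` input names come from this page.

.. code-block:: python

   import re
   from typing import List, Tuple

   # Matches a speaker tag such as [spk1]; the speaker number is captured.
   TAG_RE = re.compile(r"\[spk(\d+)\]")
   MAX_SPEAKERS = 5  # the Multi-Speaker node documents up to five speakers


   def parse_dialogue(script: str) -> List[Tuple[int, str]]:
       """Hypothetical helper: split a tagged script into (speaker, text) pairs."""
       matches = list(TAG_RE.finditer(script))
       # Assumption: untagged text before the first tag is treated as an error.
       if not matches or script[: matches[0].start()].strip():
           raise ValueError("script must start with a [spkN] tag")

       segments: List[Tuple[int, str]] = []
       for i, match in enumerate(matches):
           speaker = int(match.group(1))
           if not 1 <= speaker <= MAX_SPEAKERS:
               raise ValueError(f"speaker {speaker} is outside spk1..spk{MAX_SPEAKERS}")
           # A segment runs until the next tag, or to the end of the script.
           end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
           text = script[match.end():end].strip()
           if text:  # skip empty segments such as "[spk1][spk2]"
               segments.append((speaker, text))
       return segments


   if __name__ == "__main__":
       demo = "[spk1]Hello there.[spk2]Hi, welcome to VoxCPM.[spk1]Let's begin."
       for speaker, text in parse_dialogue(demo):
           # Each segment would be voiced with the matching audio_N / control_N inputs.
           print(f"audio_{speaker}: {text}")

Running it on the example script prints three lines, ``audio_1: Hello there.``, ``audio_2: Hi, welcome to VoxCPM.``, and ``audio_1: Let's begin.``, showing that a speaker can reappear later in the same script.
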