======================== ComfyUI-VoxCPM ======================== ComfyUI-VoxCPM is a full-featured ComfyUI custom node for VoxCPM, with TTS, voice cloning, and **in-node LoRA training** support. - Repo: `wildminder/ComfyUI-VoxCPM `_ It wraps VoxCPM into ComfyUI's visual node-based workflow, allowing users to integrate speech synthesis into their generation pipelines. .. note:: For a lighter alternative with auto-transcription, see :doc:`comfyui_voxcpmtts`. .. list-table:: Supported VoxCPM Versions :widths: 30 70 :header-rows: 0 * - VoxCPM 1.0 (0.5B) - ✅ Supported (``openbmb/VoxCPM-0.5B``, 16 kHz) * - VoxCPM 1.5 - ✅ Supported (``openbmb/VoxCPM1.5``, 44.1 kHz) * - VoxCPM 2 - ❌ Not supported Features -------- * Supports both **VoxCPM 1.5** (44.1 kHz, ~800M params) and **VoxCPM-0.5B** (16 kHz, ~640M params) * Auto-download models to ``ComfyUI/models/tts/VoxCPM/`` * **LoRA inference and training** directly in ComfyUI: - ``VoxCPM Train Config`` — configure training hyperparameters - ``VoxCPM Dataset Maker`` — prepare fine-tuning data - ``VoxCPM LoRA Trainer`` — run LoRA fine-tuning within ComfyUI * All generation parameters exposed as node inputs * Built-in denoiser (ZipEnhancer) disabled by default to keep dependencies light * Multi-device support: **CUDA**, **CPU**, **MPS** (Apple Silicon), **DirectML**, **HIP** (AMD ROCm, if available) Prerequisites ------------- * `ComfyUI `_ installed and running * PyTorch with appropriate backend (CUDA, MPS, etc.) * Model weights: `openbmb/VoxCPM1.5 `_ or `openbmb/VoxCPM-0.5B `_ (auto-downloaded) Installation ------------ * Via ComfyUI Manager: search for ``ComfyUI-VoxCPM`` and install. Manual installation: .. code-block:: bash cd ComfyUI/custom_nodes/ git clone https://github.com/wildminder/ComfyUI-VoxCPM.git pip install -r ComfyUI-VoxCPM/requirements.txt # Restart ComfyUI Models are auto-downloaded on first use. Basic usage ----------- Text-to-Speech ^^^^^^^^^^^^^^ 1. Add a **VoxCPM TTS** node from the ``audio/tts`` category 2. Connect ``text`` input with the target text to synthesize 3. Adjust ``cfg_value``, ``inference_timesteps``, and ``device`` as needed 4. Connect output to a ``Save Audio`` or ``Preview Audio`` node Voice cloning ^^^^^^^^^^^^^ 1. Add a **Load Audio** node and load your reference audio 2. Connect it to the ``prompt_audio`` input of the VoxCPM TTS node 3. Provide the **verbatim transcript** in ``prompt_text`` (not a description — must match the reference audio exactly) 4. Enter the target text to synthesize in ``text`` .. tip:: For high-quality voice clones, use clear reference audio (5–15 seconds) and ensure ``prompt_text`` is an accurate transcript. LoRA fine-tuning ^^^^^^^^^^^^^^^^ 1. Add ``VoxCPM Train Config`` → ``VoxCPM Dataset Maker`` → ``VoxCPM LoRA Trainer`` nodes 2. Configure your training data and hyperparameters 3. Run the workflow to train a LoRA adapter 4. Load the trained LoRA from ``models/loras`` (use the refresh button and ``lora_name`` dropdown) See ``readme-lora-training.md`` in the repository for detailed training instructions. Parameters ---------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Parameter - Description * - ``model_name`` - Select VoxCPM 1.5 or 0.5B * - ``cfg_value`` - Classifier-free guidance scale (default: 2.0; higher = closer to prompt, may reduce quality) * - ``inference_timesteps`` - LocDiT diffusion steps (default: 10; higher = better quality, slower) * - ``device`` - ``cuda`` / ``cpu`` / ``mps`` / ``directml`` / ``hip`` (varies by hardware) * - ``normalize_text`` - Enable external text normalization (disable for phoneme input) * - ``seed`` - Random seed for reproducibility (default: -1 for random) * - ``lora_name`` - Select a LoRA adapter from ``models/loras`` * - ``min_tokens`` / ``max_tokens`` - Token length range for generation * - ``force_offload`` - Offload model from GPU after inference to save VRAM * - ``retry_max_attempts`` - Maximum retries for bad generation outputs * - ``retry_threshold`` - Threshold for bad-case detection Troubleshooting --------------- Out of memory ^^^^^^^^^^^^^ * Set ``force_offload`` to ``True`` to release GPU memory after each generation * Switch to ``cpu`` device (slower but uses system RAM) * Use VoxCPM-0.5B instead of 1.5 for lower memory usage