FAQ & Troubleshooting
=====================

This page focuses on setup, runtime and deployment problems reported by the VoxCPM community. For prompt strategy, cloning tips, and quality tuning, see :doc:`./usage_guide`.

----

Installation & Environment
****************************

Triton errors on Windows
^^^^^^^^^^^^^^^^^^^^^^^^^

**Symptom:** ``Python int too large to convert to C long`` or Triton-related import failures when loading the model on Windows.

**Cause:** Triton has limited Windows support, and certain PyTorch + Triton version combinations trigger known bugs.

**Solutions:**

1. Install Triton for Windows from the `triton-windows `_ community project (`#36 `_). **Triton and PyTorch versions must match**:

   .. list-table::
      :widths: 50 50
      :header-rows: 1

      * - PyTorch
        - Triton
      * - 2.4 / 2.5
        - 3.1
      * - 2.6
        - 3.2
      * - 2.7
        - 3.3
      * - 2.8
        - 3.4

2. If Triton still doesn't work, you can skip ``torch.compile`` entirely by loading the model with:

   .. code-block:: python

      model = VoxCPM.from_pretrained("openbmb/VoxCPM2", optimize=False)

   This disables ``torch.compile`` acceleration. Inference will be slower but functionally correct.

3. For the ``Python int too large to convert to C long`` error specifically, see the `PyTorch fix `_, which involves patching ``torch_python.dll`` (`#27 `_).

torchcodec / libtorchcodec errors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Symptom:** ``RuntimeError: Could not load libtorchcodec`` or ``ImportError: TorchCodec is required for load_with_torchcodec`` when using voice cloning with a reference audio.

**Cause:** ``torchaudio`` (>= 2.9) uses ``torchcodec`` as its default audio backend, which requires FFmpeg to be properly installed.

**Solutions:**

1. Install FFmpeg system-wide (version 4–7 supported):

   - **Linux:** ``sudo apt install ffmpeg``
   - **macOS:** ``brew install ffmpeg``
   - **Windows:** Download from `ffmpeg.org `_ and add to ``PATH``

2. Install a compatible torchcodec version:

   .. code-block:: sh

      pip install torchcodec

3. If issues persist, force torchaudio to use the ``soundfile`` backend instead:

   .. code-block:: python

      import torchaudio
      torchaudio.set_audio_backend("soundfile")

   Note that ``torchaudio.set_audio_backend`` was deprecated in the torchaudio 2.x series and may raise ``AttributeError`` on recent releases. In that case, rely on solutions 1 and 2.

See also: `HuggingFace discussion on torchcodec `_ (referenced in `#86 `_, `#119 `_, `#123 `_)

torch.compile errors on first run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Symptom:** ``torch._dynamo.exc.Unsupported`` errors (often mentioning ``einops`` or ``symmetric_difference``) during the warm-up phase.

**Cause:** Certain combinations of PyTorch, Triton, and einops versions have incompatibilities with ``torch.compile`` (`#19 `_).

**Solutions:**

1. **Quick fix:** disable ``torch.compile``:

   .. code-block:: python

      model = VoxCPM.from_pretrained("openbmb/VoxCPM2", optimize=False)

2. **Recommended environment** (verified on an RTX 4090 per `#19 `_):

   .. list-table::
      :widths: 30 70
      :header-rows: 1

      * - Package
        - Version
      * - PyTorch
        - 2.5.1+
      * - Triton
        - 3.1.0+
      * - einops
        - 0.8.1
      * - Python
        - 3.10–3.11

3. If you're on an older GPU with limited SMs, you may also see ``Not enough SMs to use max_autotune_gemm mode``. This is a warning and can be ignored if inference completes successfully.

Mac / MPS support
^^^^^^^^^^^^^^^^^^

**Q: Can VoxCPM run on Mac (Apple Silicon)?**

Yes. VoxCPM supports CPU and MPS (Metal Performance Shaders) on Apple Silicon Macs (`#14 `_, `#20 `_, `#41 `_).

- **CPU:** Works out of the box, but inference is slow (`#67 `_).
- **MPS:** ``device="auto"`` will try MPS automatically when CUDA is unavailable.

.. note::

   The denoiser (ZipEnhancer) runs on CPU even when MPS is active. If you don't need prompt speech enhancement, set ``load_denoiser=False`` to save memory.

.. important::

   ``torch.backends.mps.is_available()`` only means the MPS backend exists. It does not guarantee that every VoxCPM inference path will run successfully on that backend.

If you hit a runtime error on MPS, explicitly switch to CPU:

.. code-block:: python

   model = VoxCPM.from_pretrained("openbmb/VoxCPM2", device="cpu", optimize=False)

.. code-block:: sh

   voxcpm design --text "Hello" --device cpu --no-optimize --output out.wav

WSL2 / ROCm community workaround
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Q: Can VoxCPM run on AMD ROCm under WSL2?**

There is a community-reported path for WSL2 + ROCm, but it is not one of the project's primary tested environments.

In the report from `#203 `_, the user needed two workarounds:

1. Monkey-patch ``torchaudio.load_with_torchcodec`` / ``save_with_torchcodec`` back to the standard ``torchaudio`` I/O functions because ``torchcodec`` was not working in that ROCm setup.
2. Disable ``torch.compile`` by setting ``optimize=False``.

If you are experimenting with ROCm or other non-CUDA platforms, start by disabling ``torch.compile``:

.. code-block:: python

   model = VoxCPM.from_pretrained("openbmb/VoxCPM2", optimize=False)

If your environment still fails on the default device path, explicitly fall back to CPU and validate the rest of the pipeline before trying any additional acceleration settings.

Python version compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

VoxCPM is tested most heavily on **Python 3.10–3.11**. For current releases, **Python 3.10–3.12** is the recommended runtime range.

Known issues:

- **Python 3.14+:** Installation may fail due to dependency incompatibilities (`#176 `_). Use Python 3.10–3.12 instead.
- ``No module named 'pkg_resources'``: This happens on newer Python/setuptools versions (`#189 `_). Fix with:

  .. code-block:: sh

     pip install setuptools

----

Performance & Deployment
**************************

VRAM & RTF reference
^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: 25 15 20 20
   :header-rows: 1

   * - Model
     - VRAM
     - RTF (official)
     - RTF (NanoVLLM-VoxCPM)
   * - VoxCPM 1.0 (0.5B)
     - ~5 GB
     - ~0.17
     - ~0.10
   * - VoxCPM 1.5 (0.8B)
     - ~6 GB
     - ~0.15
     - ~0.08
   * - VoxCPM 2 (2B)
     - ~8 GB
     - ~0.3
     - ~0.13

All RTF (Real-Time Factor) values are measured with ``inference_timesteps=10`` and ``torch.compile`` enabled on a single NVIDIA RTX 4090 GPU.

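
For orientation, RTF is conventionally the time spent generating audio divided by the duration of the audio produced, so values below 1 mean faster than real time. The following is a minimal sketch of that calculation, assuming you time a generation call yourself. The function name, sample rate, and timing figures are illustrative placeholders, not part of the VoxCPM API and not measurements:

.. code-block:: python

   def real_time_factor(elapsed_seconds, num_samples, sample_rate):
       """Return generation time divided by the duration of the generated audio."""
       if num_samples <= 0 or sample_rate <= 0:
           raise ValueError("num_samples and sample_rate must be positive")
       audio_seconds = num_samples / sample_rate
       return elapsed_seconds / audio_seconds


   # Hypothetical numbers: 1.5 s of compute for 240000 samples at 48000 Hz (5 s of audio).
   rtf = real_time_factor(elapsed_seconds=1.5, num_samples=240000, sample_rate=48000)
   print(f"RTF = {rtf:.2f}")

When timing your own runs, exclude the first call, since ``torch.compile`` warm-up would otherwise dominate the measurement.
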

- **RTF (official)**: the standard inference pipeline (``VoxCPM.generate``).
- **RTF (NanoVLLM-VoxCPM)**: `NanoVLLM-VoxCPM `_ high-throughput serving, measured at concurrency = 1.

Related issues: `#9 `_, `#67 `_, `#105 `_.

CUDA Graphs and multi-threading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::

   VoxCPM with ``torch.compile`` (default) uses CUDA Graphs, which are **not compatible with multi-threading** (`#97 `_, `#107 `_, `#125 `_). Running inference from a background thread will cause ``AssertionError`` in ``cudagraph_trees``.

**Solutions:**

1. **Disable torch.compile** if you need multi-threading:

   .. code-block:: python

      model = VoxCPM.from_pretrained("openbmb/VoxCPM2", optimize=False)

2. **Use NanoVLLM-VoxCPM** for concurrent serving; it handles batching and threading correctly. See :doc:`./deployment/nanovllm_voxcpm` for setup instructions.

3. **For Gradio apps** (``app.py``), limit concurrency to avoid CUDA Graph conflicts (`#97 `_):

   .. code-block:: python

      interface.queue(max_size=10, default_concurrency_limit=1).launch()

vLLM / lmdeploy compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

VoxCPM is **not compatible** with standard LLM inference frameworks (vLLM, lmdeploy, etc.) because it uses a diffusion-based architecture that generates continuous audio latents rather than discrete tokens (`#6 `_, `#91 `_).

For high-throughput deployment, use `NanoVLLM-VoxCPM `_ instead.

----

Still have questions?
***********************

If your question isn't covered here:

1. Search the `GitHub Issues `_; someone may have already asked.
2. Open a `new issue `_ with details about your environment, error logs, and steps to reproduce.
3. Join the community WeChat group (see the `README `_ for the QR code).
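
The environment details requested when opening an issue can be gathered with a short script. The following is a minimal sketch using only the Python standard library. The list of distribution names is an assumption and may need adjusting for your install, and the function name is hypothetical rather than a VoxCPM utility:

.. code-block:: python

   import platform
   import shutil
   from importlib import metadata

   # Distribution names are assumptions; adjust them to match your install
   # (for example, Triton on Windows may be installed under a different name).
   PACKAGES = ["voxcpm", "torch", "torchaudio", "torchcodec", "triton", "einops", "setuptools"]


   def collect_env_report():
       """Return a dict of environment details useful in a bug report."""
       report = {
           "python": platform.python_version(),
           "platform": platform.platform(),
           # Path to the ffmpeg executable on PATH, or None if it is not found.
           "ffmpeg": shutil.which("ffmpeg"),
       }
       for name in PACKAGES:
           try:
               report[name] = metadata.version(name)
           except metadata.PackageNotFoundError:
               report[name] = "not installed"
       return report


   report = collect_env_report()
   for key, value in report.items():
       print(f"{key}: {value}")

Paste the printed output into the issue alongside the full error log and the steps to reproduce.
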