======================== VoxCPMANE ======================== VoxCPMANE runs VoxCPM on the **Apple Neural Engine (ANE)** via CoreML, packaged as a Python server with a built-in web playground. - Repo: `0seba/VoxCPMANE `_ - CoreML Assets: `seba/VoxCPM-ANE `_ on Hugging Face .. note:: This project requires **macOS with Apple Silicon** (M1/M2/M3/M4). It does not support Intel Macs, Linux, or Windows. .. list-table:: Supported VoxCPM Versions :widths: 30 70 :header-rows: 0 * - VoxCPM 1.0 (0.5B) - ✅ Supported (source code default, 16 kHz) * - VoxCPM 1.5 - ⚠️ Announced in README as beta (``pip install --pre``); source defaults to 0.5B * - VoxCPM 2 - ❌ Not supported Features -------- * Native Apple Neural Engine acceleration via CoreML * Voice cloning with cached compiled voices * Streaming audio generation * Built-in web playground UI (Create Voice tab for uploads) * Server-side playback option * Preset sample voices included * VoxCPM 1.5 support announced in README (beta, via ``pip install --pre``; the checked-in source defaults to VoxCPM-0.5B at 16 kHz) Prerequisites ------------- * macOS with Apple Silicon (M1 or later) * Python 3.9 – 3.12 * ``uv`` package manager (recommended) * (Optional) ``pydub`` for mp3/opus/ogg/aac output formats on ``/v1/audio/speech`` (WAV and FLAC work without it) Installation ------------ .. code-block:: bash # Stable release uv pip install voxcpmane # or pip install voxcpmane # VoxCPM 1.5 beta uv pip install -U --pre voxcpmane CoreML model assets are downloaded automatically on first run via Hugging Face Hub (cached under ``~/.cache/huggingface/``). Custom voices are stored in ``~/.cache/ane_tts/``. Basic usage ----------- Start the server (default host ``0.0.0.0``, port ``8000``): .. code-block:: bash voxcpmane-server # Server starts at http://localhost:8000 # Custom options voxcpmane-server --host 127.0.0.1 --port 9000 --cache-dir /path/to/cache Open ``http://localhost:8000`` in your browser to access the web playground. Voice cloning ^^^^^^^^^^^^^ Register custom voices via the API or web UI. The API accepts a JSON body where ``prompt_wav_path`` refers to a file path **on the server filesystem**: .. code-block:: bash # Register a voice via API (JSON) curl -X POST http://localhost:8000/v1/voices \ -H "Content-Type: application/json" \ -d '{"voice_name":"my_voice","prompt_wav_path":"/path/to/reference.wav","prompt_text":"Reference transcript"}' For uploading audio files, use the **"Create Voice"** tab in the web playground at ``http://localhost:8000``. You can also drop audio files with matching ``.txt`` transcripts into the cache directory (``~/.cache/ane_tts/``); they will be compiled on startup. API reference ------------- Full API documentation is available in the repository at ``docs/API.md``. Key endpoints: * ``POST /v1/voices`` — Register a custom voice (JSON body) * ``POST /v1/audio/speech`` — Generate speech * ``POST /v1/audio/speech/stream`` — Streaming speech generation * ``GET /voices`` — List available voices * ``GET /health`` — Health check * ``POST /v1/audio/speech/playback`` — Server-side audio playback * ``POST /v1/audio/speech/cancel`` — Cancel ongoing generation Limitations ----------- * **macOS Apple Silicon only** — no cross-platform support * Long text inputs may need chunking (roadmap item) * Voice compilation can take several seconds on first use