Community · Abandoned · 9mo · ★ 4.3 · by mberg

Kokoro TTS MCP

Converts text to MP3 using the open-weight Kokoro TTS models locally, with optional S3 upload support.


Safety & Trust

Delv Safety Grade: C

Score 58/100 · assessed 2026-04-28

Maintainer: 40

Permissions: 65

Supply chain: 35

Transparency: 70

Incidents: 100

Kokoro TTS MCP is a solo-maintained community server that runs open-weight text-to-speech models locally. The maintainer (mberg) appears to be an individual developer with limited public profile. Installation requires cloning the repository and running via uv, with no package registry distribution. The server writes MP3 files to the local filesystem and optionally uploads to S3, giving it moderate filesystem and network permissions. Transparency is reasonable with open source code and clear documentation of the Kokoro model integration. The local-first approach avoids sending text to third parties, which is a privacy positive. However, supply chain risk is elevated due to manual installation, no dependency pinning visible in the repository, and lack of versioned releases. No security incidents are known. Suitable for users comfortable evaluating Python dependencies and accepting solo-maintainer risk for non-critical audio generation tasks.

Lethal Trifecta (prompt-injection exposure)

CLEAR

Private data: No

Reads secrets, credentials, private files

Untrusted input: No

Ingests web pages, PRs, issues, emails

External comms: No

Can send data outbound

Local TTS. No I/O.

Green flags

Runs models locally, no third-party API calls for TTS
Open source with clear README and usage examples
Uses established Kokoro TTS models
Optional S3 upload keeps core functionality local-only

Red flags

Solo maintainer with minimal public track record
No package registry distribution, clone-and-run only
S3 upload feature requires AWS credentials in environment
No visible dependency pinning or lock file
No versioned releases or changelog

Permissions requested

Write files
Outbound network
Read env

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Install

uv run mcp-tts.py

Review

Kokoro TTS MCP runs open-weight text-to-speech models locally and spits out MP3 files. It's built on the Kokoro family, which gives you a handful of voices without needing cloud APIs or subscriptions. You invoke it through Claude Desktop or similar hosts, pass in text, and get audio back. Optional S3 upload if you want to stash files somewhere permanent.

I'd reach for this when I need voice output but can't or won't send text to a third-party service. Audiobook chapters, podcast script drafts, or any workflow where you're iterating on narration and want to hear it without leaving your editor. The voices are decent, not studio-grade but far better than the robotic droning of older offline TTS. You get a few personas to choose from, which helps if you're testing different tones.

Setup is straightforward if you're comfortable with uv and Python. The install command fires up the server, and you add it to your Claude Desktop config like any other MCP. Once it's running, you call the text-to-speech tool, specify a voice, and wait a few seconds. Output lands as an MP3, either locally or in S3 if you've configured credentials.

Quirks: it's local, so generation speed depends on your hardware. Longer passages take time, and you'll notice the lag on older machines. The S3 upload is optional but requires manual setup of AWS credentials, which isn't documented in the repo beyond a mention. If you're not already familiar with configuring S3 access, you'll need to sort that separately. The voices are limited to what Kokoro ships with, so if you need custom voice cloning or fine-tuning, this won't help.

Who shouldn't bother: anyone expecting real-time conversational TTS or needing production-quality narration with emotional nuance. This is a drafting tool, not a mastering suite. If you're happy sending text to OpenAI or ElevenLabs and don't care about privacy or offline access, their APIs are faster and more polished.

But if you want local TTS that doesn't leak your content and you're willing to trade some quality for control, Kokoro TTS MCP does the job without fuss.

Verdict

Install this if you need offline text-to-speech for drafting, privacy, or air-gapped workflows. Skip it if you want real-time responses or studio-quality voices. It's a solid local option that respects your data and your budget.

Good at

Runs entirely offline unless you enable S3 upload, so your text stays on your machine by default.
No API costs or subscription fees, just local compute.
Supports multiple Kokoro voices out of the box, enough variety for most drafting needs.
Optional S3 upload for archiving or sharing output without manual file handling.

Watch out

Generation speed depends on your hardware, and longer passages can take noticeable time.
Voice quality is decent but not competitive with commercial APIs like ElevenLabs for final production.
S3 upload setup isn't documented in the repo, so you'll need to configure AWS credentials yourself.
Limited to the voices Kokoro provides, no custom voice cloning or fine-tuning.
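
Since the S3 setup is left to you, a quick preflight check before starting the server can save a failed upload. The sketch below only looks for the two standard AWS SDK environment variables; whether this server reads exactly these names (rather than a profile, a region variable, or its own settings) is an assumption, and `missing_aws_vars` is a hypothetical helper, not something the repo provides.

```python
import os

# Standard AWS SDK variable names. That this server relies on exactly
# these is an assumption; check the repo before trusting the result.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None) -> list[str]:
    """Return the required AWS variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("S3 upload will likely fail; missing:", ", ".join(missing))
    else:
        print("AWS credential variables are set.")
```

Run it in the same shell you plan to launch the server from, since the server inherits that environment.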

Use cases

audiobook generation
voiceovers
podcast drafting
offline voice workflows

Getting started

1. Run `uv run mcp-tts.py` to start the server. This assumes you have uv installed and the repo cloned locally.
2. Add the server to your Claude Desktop config by editing the MCP settings JSON, pointing to the running server endpoint.
3. Test it by asking Claude to convert a short sentence to speech using one of the available Kokoro voices. You should get an MP3 file path in response.
4. If you want S3 upload, configure AWS credentials in your environment before starting the server. The repo doesn't walk you through this, so consult AWS docs if needed.
5. Watch out for generation time on longer texts. Local processing means you'll wait, especially if your machine isn't recent.
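
For the config step, here is a minimal sketch of what the MCP settings entry might look like, generated from Python so the JSON stays valid. It assumes the common Claude Desktop pattern where the host launches the server itself from a `command` and `args` entry under `mcpServers`; whether this server expects that or a separately running endpoint isn't stated here. The server name `kokoro-tts`, the path `/path/to/kokoro-tts-mcp`, and the helper `build_mcp_entry` are all placeholders.

```python
import json
from pathlib import Path


def build_mcp_entry(repo_dir: str, server_name: str = "kokoro-tts") -> dict:
    """Build an mcpServers-style fragment that launches the server with uv.

    repo_dir is wherever you cloned the repository (hypothetical path).
    """
    repo = Path(repo_dir)
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                # `uv --directory <dir> run <script>` runs the script from the repo.
                "args": ["--directory", str(repo), "run", "mcp-tts.py"],
            }
        }
    }


if __name__ == "__main__":
    # Print JSON you can merge into your existing MCP settings file.
    print(json.dumps(build_mcp_entry("/path/to/kokoro-tts-mcp"), indent=2))
```

Merge the printed fragment into your existing settings rather than overwriting the file, so other configured servers stay in place.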

Works with

Claude Desktop
Claude Code
Cursor