Delv
Community · Active · 5d · 4.3 · by Michael Bailey

VoiceMode

Voice interaction server with speech-to-text, text-to-speech, and real-time voice conversations via local mic and LiveKit.

Safety & Trust

Delv Safety Grade: C

Score 58/100 · assessed 2026-04-28

Maintainer: 45
Permissions: 40
Supply chain: 65
Transparency: 70
Incidents: 100

VoiceMode is a community-maintained MCP server that enables voice interaction through speech-to-text, text-to-speech, and real-time conversations via a local microphone and LiveKit. It is maintained by a solo developer (Michael Bailey), with limited visibility into maintenance patterns. It requires OpenAI API credentials and requests significant permissions: microphone access, network connectivity to external services (OpenAI, LiveKit), and environment-variable access for API keys. The supply chain is reasonably standard (uvx/PyPI distribution), though the custom installer package (voice-mode-install) adds a layer of indirection. Documentation appears adequate based on the repository structure. The permission scope is broad, combining desktop audio capture with external API calls, which presents a meaningful attack surface. No security incidents are known, but the combination of microphone access and API-key handling warrants careful consideration in sensitive environments.

Lethal Trifecta (prompt-injection exposure)

TWO OF THREE
Private data (reads secrets, credentials, private files): Yes
Untrusted input (ingests web pages, PRs, issues, emails): No
External comms (can send data outbound): Yes

Local microphone audio is private data, and it is sent outbound to the speech-to-text API.
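The "two of three" label follows the lethal-trifecta model: a prompt-injection exfiltration attack needs private data, untrusted input, and an outbound channel present together. A minimal Python sketch of that counting rule; the class and field names are hypothetical and not part of VoiceMode or Delv:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrifectaProfile:
    """Exposure flags as listed on this page (names are hypothetical)."""
    private_data: bool     # reads secrets, credentials, private files
    untrusted_input: bool  # ingests web pages, PRs, issues, emails
    external_comms: bool   # can send data outbound

    def legs_present(self) -> int:
        # bool is a subclass of int, so sum() counts the True flags
        return sum((self.private_data, self.untrusted_input, self.external_comms))

    def is_lethal(self) -> bool:
        # Only the full set lets injected instructions read private data
        # and ship it out through the outbound channel.
        return self.legs_present() == 3


voicemode = TrifectaProfile(private_data=True, untrusted_input=False, external_comms=True)
print(voicemode.legs_present(), voicemode.is_lethal())  # prints: 2 False
```

Under this model, running VoiceMode in the same agent session as a tool that ingests untrusted content (a web fetcher, an issue reader) would supply the missing leg.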

Green flags

  • Open source repository allows code inspection and community review
  • Standard PyPI distribution via uvx follows Python ecosystem best practices
  • No known security incidents or malicious activity reported
  • Clear documentation of required API credentials upfront

Red flags

  • Microphone access combined with external API calls increases data exfiltration risk
  • Solo maintainer with limited public track record reduces bus factor confidence
  • Custom installer package adds supply-chain complexity compared with direct installation
  • Requires API key storage in environment variables without key rotation guidance
  • LiveKit integration adds third-party service dependency with unclear data handling

Permissions requested

Desktop control · Outbound network · Read env · Access secrets · External LLM call
Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Install

uvx voice-mode-install
Env vars needed: OPENAI_API_KEY
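Because the listing names `OPENAI_API_KEY` as required, a quick pre-flight check before launching can save a confusing first run. A minimal Python sketch; `missing_env` and `mask` are hypothetical helpers, not part of VoiceMode:

```python
import os
from typing import Mapping


def missing_env(names: list[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variable names that are unset or blank in `environ`."""
    return [n for n in names if not environ.get(n, "").strip()]


def mask(secret: str, keep: int = 3) -> str:
    """Show only a short prefix so the full key never lands in logs or scrollback."""
    return secret[:keep] + "..." if len(secret) > keep else "***"


REQUIRED = ["OPENAI_API_KEY"]  # the only variable this listing names

missing = missing_env(REQUIRED)
if missing:
    print("Missing before launch:", ", ".join(missing))
else:
    print("OPENAI_API_KEY =", mask(os.environ["OPENAI_API_KEY"]))
```

Masking matters here because the key sits in plain environment variables, one of the red flags above.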

Review

VoiceMode turns Claude into a voice assistant you can actually talk to. It wires up OpenAI's speech-to-text and text-to-speech APIs, plus LiveKit for real-time audio streaming, so you can hold proper spoken conversations with Claude through your microphone. Setup is straightforward if you already run Claude Desktop, and it works with Claude Code too.

I'd reach for this when I'm debugging something complex and need to think out loud, or when I'm away from the keyboard but still want to work through a problem. The hands-free coding angle is real: you describe what you want, hear Claude's response, and iterate without typing. It's also genuinely useful for accessibility workflows, though the reliance on OpenAI's APIs means you pay for every minute of speech transcribed and every reply spoken back.

The LiveKit integration is what makes this more than a dictation tool: you get real back-and-forth conversation, not just one-shot commands. That said, you'll need an OpenAI API key with credits, and you'll burn through them faster than you might expect if you're chatty. The server handles local mic input cleanly, but speech processing still depends on network latency.

Quirks: it's a community project, so expect some rough edges. The documentation assumes you know your way around MCP server configs, so if you've never set up an MCP server before, this isn't the easiest first one to try. If you're already comfortable with the ecosystem, though, it's a solid addition.

Who shouldn't bother: anyone hoping for offline voice processing, or anyone who finds talking to their computer awkward. This is for people who already think out loud when they code, or who genuinely need hands-free access. If you're happy typing, stick with the keyboard.
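The warning about credits disappearing faster than expected is easy to make concrete with back-of-the-envelope arithmetic. A minimal Python sketch under a simplified billing model (transcription charged per minute of input audio, speech synthesis per character of output text); the function name is hypothetical and the example rates are placeholders, not OpenAI's actual prices, so check the current pricing page before relying on the numbers:

```python
def session_cost(
    minutes_spoken: float,
    reply_chars: int,
    stt_rate_per_minute: float,
    tts_rate_per_million_chars: float,
) -> float:
    """Estimate one session's API spend in dollars.

    Assumed model: speech-to-text billed per minute of your audio,
    text-to-speech billed per character of Claude's spoken replies.
    Claude's own usage and any LiveKit costs are not included.
    """
    stt = minutes_spoken * stt_rate_per_minute
    tts = reply_chars / 1_000_000 * tts_rate_per_million_chars
    return stt + tts


# Placeholder rates for illustration only.
print(f"${session_cost(30, 60_000, 0.01, 20.0):.2f}")  # prints $1.50
```

With these placeholder rates the synthesized replies, not your own speech, dominate the bill, which is why long, chatty sessions add up.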
Verdict

Install this if you code by talking through problems or need hands-free access to Claude. Skip it if you're not already comfortable with MCP servers, or if you'd rather not add another API bill to your stack. It's a niche tool that does its job well for the people who need it.

Good at

  • Real-time voice conversations, not just dictation, thanks to LiveKit integration.
  • Works with both Claude Desktop and Claude Code out of the box.
  • Handles local microphone input cleanly without extra hardware.
  • Genuinely useful for hands-free coding and accessibility workflows.

Watch out

  • Requires an OpenAI API key and burns through credits during extended conversations.
  • Community project with rougher documentation than official MCP servers.
  • Network-dependent for all speech processing, so latency can interrupt flow.
  • Not the easiest first MCP server if you're new to the ecosystem.

Use cases

  • voice conversations with Claude
  • hands-free coding
  • accessibility workflows
  • voice-driven dictation

Getting started

1. Run `uvx voice-mode-install` to install the server.
2. Add your OpenAI API key to your environment variables as `OPENAI_API_KEY`.
3. The installer should configure Claude Desktop or Claude Code automatically, but verify that the server appears in your MCP settings.
4. Test it by asking Claude to listen for voice input, then speak into your mic.
5. Watch your OpenAI usage: speech processing isn't free, and costs add up during long conversations.
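For the verification step, Claude Desktop keeps its MCP servers under the `mcpServers` key of `claude_desktop_config.json`. A minimal Python sketch that looks for a VoiceMode entry; the macOS path and the "voice-mode"/"voicemode" name match are assumptions about how the installer writes the entry, and `find_voice_servers` is a hypothetical helper:

```python
import json
from pathlib import Path

# Assumption: default Claude Desktop config location on macOS.
# On Windows it is usually under %APPDATA%\Claude\ instead.
CONFIG = Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"


def find_voice_servers(config: dict) -> list[str]:
    """Return names of mcpServers entries that look like VoiceMode.

    Heuristic, not an official check: the entry's name, command,
    or args mention "voice-mode" or "voicemode" (case-insensitive).
    """
    hits = []
    for name, entry in config.get("mcpServers", {}).items():
        parts = [name, entry.get("command", ""), *entry.get("args", [])]
        haystack = " ".join(parts).lower()
        if "voice-mode" in haystack or "voicemode" in haystack:
            hits.append(name)
    return hits


def main() -> None:
    if not CONFIG.exists():
        print(f"No Claude Desktop config at {CONFIG}")
        return
    servers = find_voice_servers(json.loads(CONFIG.read_text(encoding="utf-8")))
    print("VoiceMode entries:", servers or "none found")


if __name__ == "__main__":
    main()
```

For Claude Code, `claude mcp list` prints the configured servers directly, which is the quicker check.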

Works with

Claude Desktop · Claude Code
