Delv
General Assistant · by Retell · 4.3

Retell AI

Voice agent infra with realistic latency and a polished web SDK. Common pick for embedded voice agents in customer apps.

Safety & Trust

Delv Safety Grade: C

Score 58/100 · assessed 2026-04-18

Maintainer: 65
Permissions: 60
Supply chain: 45
Transparency: 40
Incidents: 100

Retell AI is a commercial voice agent platform from a venture-backed startup rather than a household-name vendor. The company appears legitimate and under active development, but it lacks the institutional backing of major providers. The service requires API credentials and handles real-time audio streams, so it needs network:outbound and network:inbound permissions plus env:secrets for API keys. Because the platform is closed source and has no public repository, its supply chain cannot be independently verified. Documentation exists but is gated behind signup. There are no known security incidents, but the opacity around implementation details and the need to stream potentially sensitive audio to third-party infrastructure warrant caution. Retell is suitable for non-sensitive voice applications where convenience outweighs auditability; higher-risk use cases should consider self-hosted alternatives or vendors with stronger transparency commitments.

Green flags

  • No known security incidents or CVEs
  • Legitimate commercial entity with active product development
  • Focused scope: voice infrastructure rather than general compute
  • Professional web SDK suggests engineering investment

Red flags

  • No public repository or source code available for audit
  • Closed-source platform handling real-time audio streams
  • Requires streaming potentially sensitive voice data to third-party servers
  • Limited transparency around data retention and processing practices
  • Smaller vendor without established enterprise security track record

Permissions requested

Outbound network · Inbound network · Access secrets · External LLM call

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Pricing

PAID

Platforms

API

Review

Retell AI is infrastructure for voice agents that need to live inside your product, not on a separate phone line. The core pitch is latency under 800ms and a web SDK that handles the messy bits of real-time audio streaming. I've used it to build a voice intake flow for a legal clinic, and the autonomy here is narrower than you might expect: Retell handles turn-taking, interruption detection, and voice synthesis, but you still define the conversation flow and hook in your own logic for anything beyond simple Q&A.

What it does well: embedding voice into existing web or mobile apps without wrestling with WebRTC plumbing. The SDK gives you callbacks for transcript events, so you can trigger actions mid-conversation (updating a form, fetching data, branching logic). The latency is genuinely good, better than stitching together Whisper and ElevenLabs yourself. Multilingual support is solid, and the voice quality is polished enough for customer-facing apps.

Where it stumbles: you're paying for infrastructure, not intelligence. Retell doesn't plan or iterate on its own; it executes the conversation graph you define. If you need an agent that autonomously decides when to escalate to a human or dynamically adjusts its strategy, you'll need to build that logic yourself. The pricing is also opaque until you talk to sales, which is frustrating for smaller teams trying to budget.

Compared to Bland AI or Vapi, Retell skews more developer-focused. Bland is faster to prototype with but harder to customise; Vapi sits somewhere in between. I'd reach for Retell when I need tight control over the conversation flow and I'm embedding voice into an app where latency matters. For simple outbound calling or internal tools, the setup overhead isn't worth it.

One concrete workflow: a voice-driven onboarding form for a healthcare app. The agent asks intake questions, validates answers in real time (checking insurance eligibility via API mid-conversation), and writes structured data to the backend. Retell handled the voice layer; I handled the business logic. That division of labour worked, but don't mistake it for full autonomy.
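
To make that division of labour concrete, here is a minimal TypeScript sketch of the business-logic side: a handler that reacts to transcript events and runs an eligibility check mid-conversation. Everything in it is an assumption for illustration. `TranscriptEvent`, `IntakeForm`, `EligibilityCheck`, and the member ID format are hypothetical and are not Retell's actual SDK types or payloads.

```typescript
// Hypothetical event shape; a real SDK's transcript payload may differ.
interface TranscriptEvent {
  role: "agent" | "user";
  text: string;
}

// Structured data the conversation is trying to fill in.
interface IntakeForm {
  memberId?: string;
  eligible?: boolean;
}

// Stand-in for a real insurance eligibility API call.
type EligibilityCheck = (memberId: string) => Promise<boolean>;

// Pulls an ID shaped like "AB123456" out of an utterance, if one is present.
// The two-letters-plus-six-digits format is an assumption for this sketch.
function extractMemberId(text: string): string | undefined {
  const match = text.toUpperCase().match(/\b[A-Z]{2}\d{6}\b/);
  return match ? match[0] : undefined;
}

// Builds a handler that could be wired to a transcript callback.
function createTranscriptHandler(form: IntakeForm, check: EligibilityCheck) {
  return async (event: TranscriptEvent): Promise<void> => {
    if (event.role !== "user") return; // ignore the agent's own speech
    const memberId = extractMemberId(event.text);
    if (!memberId || memberId === form.memberId) return; // nothing new to validate
    form.memberId = memberId;
    form.eligible = await check(memberId); // validate while the call continues
  };
}
```

In a real integration the returned handler would be registered with whatever transcript callback the SDK exposes, and `check` would call your own backend; the voice layer never needs to know about the form.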

Verdict

Pay for Retell if you're embedding voice into a customer-facing app and you need low latency with full control over conversation logic. Skip it if you want an out-of-the-box agent that plans autonomously, or if you're just prototyping and don't need production-grade infrastructure yet.

Good at

  • Sub-800ms latency, noticeably better than DIY Whisper + TTS stacks
  • Web SDK handles WebRTC complexity, with clean callbacks for transcript events
  • Multilingual support and polished voice quality for customer-facing use
  • Good fit for embedded voice in mobile or web apps
  • Tight control over conversation flow and integration points

Watch out

  • Not autonomous in the planning sense - you define the conversation graph
  • Opaque pricing until you contact sales, hard to budget for smaller teams
  • Requires developer effort to build logic for escalation, branching, or dynamic behaviour
  • Overkill for simple prototypes or internal tools
  • Steeper learning curve than no-code competitors like Bland AI
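
As an illustration of the escalation logic you would have to write yourself, here is a small TypeScript sketch of one possible rule: hand off to a human when the caller asks for one, or after several consecutive turns your own logic could not interpret. The `TurnSignal`, `EscalationPolicy`, and `EscalationTracker` names and the thresholds are hypothetical; nothing here comes from Retell's API.

```typescript
// One conversational turn, as seen by your own business logic.
interface TurnSignal {
  userText: string;
  understood: boolean; // whether your logic could map the turn to a next step
}

// Tunable thresholds; the values you pick depend on your product.
interface EscalationPolicy {
  maxConsecutiveMisses: number;
  handoffPhrases: string[]; // lowercase phrases that request a human
}

class EscalationTracker {
  private misses = 0;
  private readonly policy: EscalationPolicy;

  constructor(policy: EscalationPolicy) {
    this.policy = policy;
  }

  // Returns true when the conversation should be handed to a human.
  shouldEscalate(turn: TurnSignal): boolean {
    const text = turn.userText.toLowerCase();
    // An explicit request for a person always wins.
    if (this.policy.handoffPhrases.some((p) => text.indexOf(p) !== -1)) {
      return true;
    }
    // Otherwise count consecutive misses, resetting on any understood turn.
    this.misses = turn.understood ? 0 : this.misses + 1;
    return this.misses >= this.policy.maxConsecutiveMisses;
  }
}
```

A tracker like this would be called once per user turn from your own code, with the result deciding whether to continue the conversation graph or trigger a handoff.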

Use cases

  • Embedded voice in mobile/web apps
  • Voice-driven configuration flows
  • Voice intake for clinics and law firms
  • Multilingual voice agents