Delv
Coding · by Anthropic · 4.3

Claude Code

Anthropic's terminal agent that plans, edits, runs tests, and commits - fully interactive with your repo. The coding agent with the most MCP depth.

Safety & Trust

Delv Safety Grade: B

Score 72/100 · assessed 2026-04-18

Maintainer: 95
Permissions: 35
Supply chain: 65
Transparency: 60
Incidents: 100
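
As a quick sanity check on how the five sub-scores relate to the headline number, the snippet below computes their plain average. Equal weighting is an assumption made here for illustration; the page does not state how Delv combines the sub-scores, so this is not a reconstruction of the actual formula.

```python
# Sub-scores as published in the grade card.
subscores = {
    "Maintainer": 95,
    "Permissions": 35,
    "Supply chain": 65,
    "Transparency": 60,
    "Incidents": 100,
}

# Assumption: equal weights. The real weighting is not documented on the page.
unweighted_mean = sum(subscores.values()) / len(subscores)
published_score = 72

print(f"unweighted mean: {unweighted_mean:.1f}")                     # 71.0
print(f"gap to published: {published_score - unweighted_mean:+.1f}")  # +1.0
```

The plain average comes out at 71, one point below the published 72, which suggests the headline score uses some weighting or rounding that the page does not spell out.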

Claude Code is Anthropic's official autonomous coding agent with full filesystem and shell access. The maintainer score is excellent given it's a first-party Anthropic product. However, permissions are extremely broad: it writes arbitrary files, executes shell commands, and commits to repos without sandboxing. The supply chain score is middling because there's no public repository or package distribution - it's delivered through Anthropic's infrastructure, which is trustworthy but opaque. Transparency suffers from lack of open source code or detailed technical documentation about safety boundaries. No known incidents, but the autonomy level (multi-file edits, test execution, git operations) means a prompt injection or logic error could cause significant repo damage. Suitable for experienced developers who understand the risk surface, less so for production environments or shared codebases without careful oversight.

Green flags

  • Official Anthropic product with enterprise-grade maintainer
  • MCP-native architecture allows scoped tool integration
  • Interactive approval loops reduce blind automation risk
  • No known security incidents or credential leaks
  • Designed for terminal use where user oversight is natural

Red flags

  • No public repo or source code available for audit
  • Unrestricted filesystem write and shell execution without sandbox
  • Autonomous git commits could push breaking changes
  • Broad permissions with minimal documented safety boundaries
  • Opaque distribution model limits supply chain verification

Permissions requested

Read files, Write files, Delete files, Shell execute, Repo read, Repo write, Outbound network, Read env

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Pricing

FREEMIUM

Platforms

CLI, web

Review

Claude Code is Anthropic's terminal agent that actually stays in the loop while you work. Unlike Cursor or Aider, which hand you diffs and wait for approval, this thing plans a multi-step approach, edits files, runs your test suite, reads the failures, and tries again. The autonomy matters most when you're refactoring across a dozen files or chasing down a flaky test - tasks where context switching kills momentum.

I used it to migrate a Flask API to FastAPI. Gave it the spec, pointed it at the repo, and watched it rewrite route handlers, update imports, and fix type hints across twenty-odd files. It caught itself when a test failed due to a missing async keyword and corrected it without me stepping in. That loop - edit, test, fix, repeat - is where it earns the 'agent' label. You're supervising, not micromanaging.

The MCP integration is the differentiator here. Claude Code can pull live data from your database, hit internal APIs, or query your issue tracker mid-task because it speaks MCP natively. Cursor can't do that without you writing custom tooling. If your workflow involves more than just code - say, checking Sentry for error patterns or validating against a staging environment - this is the only coding agent with first-class support for those actions.

Failure modes: it sometimes overwrites comments you care about, and it's slower than Copilot for single-function edits. The planning phase can feel verbose when you just want a quick fix. It also assumes you're working in a repo with tests - if you don't have a suite, half the value disappears.

Compared to Cursor, Claude Code is better for multi-file work and worse for inline autocomplete. Compared to Aider, it's more autonomous but less transparent about what it's changing. If you're prototyping alone or doing deep refactors, this is the tool. If you're pair programming or want tight control over every diff, stick with something more manual.
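
For readers who want a concrete picture of the edit, test, fix, repeat loop described in this review, here is a minimal sketch of the control flow. It is not Claude Code's implementation: `propose_fix`, `edit_test_fix`, and `max_attempts` are hypothetical names invented for illustration, and the edit step is left as a callback that would stand in for whatever the agent does to the files.

```python
import subprocess
from typing import Callable, Sequence


def run_tests(command: Sequence[str]) -> tuple[bool, str]:
    """Run the test command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def edit_test_fix(
    propose_fix: Callable[[str], None],
    test_command: Sequence[str],
    max_attempts: int = 3,
) -> bool:
    """Alternate between running tests and applying a fix.

    `propose_fix` is a hypothetical stand-in for the agent's edit step:
    it receives the failing test output and is expected to change files
    on disk. Returns True once the test command exits with status 0.
    """
    for _ in range(max_attempts):
        passed, output = run_tests(test_command)
        if passed:
            return True
        # Feed the failure output back into the edit step, then retry.
        propose_fix(output)
    # One last run so the final fix attempt is also checked.
    passed, _ = run_tests(test_command)
    return passed
```

A caller would pass something like `["pytest", "-q"]` as `test_command`. The cap on attempts is a design choice in this sketch: it bounds how long the loop runs unsupervised, which matters given the review's point that the value of the loop depends on having a test suite to act as the stopping condition.
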
Verdict

Pay for this if you do solo refactors, have a test suite, and want an agent that iterates without hand-holding. Skip it if you prefer inline suggestions or work in codebases without tests.

Good at

  • Autonomously loops through edit-test-fix cycles without human approval per step
  • Native MCP support lets it query live systems, databases, and APIs mid-task
  • Handles multi-file refactors with full repo context better than Cursor or Copilot
  • Plans work upfront so you can review the approach before it starts editing
  • Commits changes with meaningful messages after validating tests pass

Watch out

  • Slower than autocomplete tools for single-function edits
  • Planning phase can feel verbose when you just need a quick fix
  • Sometimes overwrites comments or formatting you wanted to keep
  • Value drops sharply if your codebase lacks a test suite
  • Less transparent than Aider about what it's changing before it does it

Use cases

  • Long coding sessions with full repo context
  • Running and fixing failing tests in a loop
  • Codebase refactors that span many files
  • Prototyping from a spec in conversation