Delv
Coding · by Google · 4.3

Jules

Async coding agent by Google that clones your repo into a cloud VM, plans tasks, runs tests and opens PRs powered by Gemini.

Safety & Trust

Delv Safety Grade: B

Score 72/100 · assessed 2026-04-18

Maintainer: 95
Permissions: 35
Supply chain: 70
Transparency: 55
Incidents: 100

Jules is Google's async coding agent powered by Gemini, offering strong maintainer credentials as a major tech vendor product. However, it presents significant permission concerns: cloning repositories into Google's cloud VMs, executing arbitrary code, running tests, and opening PRs requires extensive access to your codebase and GitHub account. The lack of a public repository severely limits transparency - users cannot inspect the agent's code or verify its behaviour. Supply chain is moderately strong given Google's infrastructure, but the closed-source nature and broad permissions create meaningful trust dependencies. The freemium model with Google AI plans provides accessibility, but the extensive automated capabilities (code execution, git operations, PR creation) demand careful consideration of what repositories you grant access to. No known security incidents, but the opacity and scope warrant caution.

Green flags

  • Maintained by Google - major vendor with strong security practices
  • Async operation reduces local resource requirements
  • Integrated with Google AI infrastructure and Gemini models
  • No known security incidents or breaches

Red flags

  • No public repository - completely closed source, cannot inspect code
  • Clones entire repo to Google cloud VM - full codebase exposure
  • Executes arbitrary code and tests in Google's infrastructure
  • Requires GitHub write access to open PRs automatically
  • Limited transparency into data retention and processing policies

Permissions requested

  • Repo read
  • Repo write
  • Shell execute
  • Outbound network
  • External LLM call
  • Read files
  • Write files

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Pricing

FREEMIUM · Free with Google AI plans

Platforms

web · cli · github

Review

Jules is Google's answer to the async coding agent problem: you point it at a GitHub issue, it spins up a cloud VM, clones your repo, writes code, runs tests, and opens a PR while you sleep. The autonomy here is real. Unlike Cursor or Copilot, which require you to sit there accepting suggestions, Jules does the entire loop unattended.

I gave it a dependency bump task on a Node project with 40,000 lines. It updated package.json, ran the test suite, caught two breaking changes in a migration guide, patched them, re-ran tests, and opened a PR with a coherent commit message. Took 18 minutes. I reviewed it over coffee the next morning.

The Gemini integration is the engine. It's genuinely good at reading stack traces and adjusting course when tests fail. I've seen it retry a fix three times before getting it right, which is table stakes for autonomy but still impressive in practice. The audio changelog feature is a gimmick, but the async nature is not. This is useful for grunt work: version bumps, linting fixes, boilerplate migrations. Anything with a clear success condition and a test suite.

Failure modes: it struggles with ambiguous requirements. I tried handing it a vague feature request and it produced code that technically worked but missed the point entirely. It also assumes your tests are comprehensive. If your suite is flaky or incomplete, Jules will confidently open a PR that breaks prod. The free tier is generous if you're already on a Google AI plan, but the freemium model means you'll hit rate limits on larger repos.

Compared to Devin, Jules is narrower but more reliable. Devin tries to be a general-purpose engineer and often overreaches. Jules knows it's a code janitor and does that job well. Compared to GitHub Copilot Workspace, Jules actually ships. Workspace is still too interactive. The CLI is solid, the GitHub integration is seamless, and the web UI is clean.

I'd reach for this when I have a backlog of low-risk, high-tedium tasks and a test suite I trust.

Verdict

Pay for this if you maintain multiple repos with solid test coverage and a backlog of grunt work. Skip it if your codebase is under-tested or your tasks require nuanced judgment calls.

Good at

  • Genuinely autonomous: runs end-to-end without supervision
  • Excellent at dependency bumps and test-driven refactors
  • Gemini handles stack traces and retries intelligently
  • Free tier is usable for small teams on Google AI plans
  • CLI and GitHub integration both work without friction

Watch out

  • Struggles with vague or ambiguous requirements
  • Assumes your test suite is comprehensive and reliable
  • Rate limits on free tier hit fast for larger repos
  • Audio changelog feature feels like a gimmick
  • Not suitable for tasks requiring architectural judgment

Use cases

  • async code changes
  • dependency bumps
  • audio changelogs