Delv
Coding · by Cosine · 4.3

Cosine

Cosine's Genie agent for software engineering — purpose-trained on real PR data. Strong on multi-step bug-fix tasks.

Safety & Trust

Delv Safety Grade: C

Score 58/100 · assessed 2026-04-18

Maintainer: 55
Permissions: 35
Supply chain: 40
Transparency: 35
Incidents: 100

Cosine Genie is a commercial autonomous coding agent from a startup (Cosine) with no public repository or open-source transparency. The agent executes multi-step workflows including writing code, running tests, and opening pull requests, which requires broad repository write access, shell execution for tests, and filesystem write permissions. The closed-source nature means you cannot audit what the agent does with your codebase or credentials. Supply chain is opaque: no public package, no verifiable build process, likely SaaS-only delivery. The company appears legitimate (professional website, paid product) but is small with unknown bus factor. No known security incidents, but the combination of autonomous operation, broad permissions, and zero code transparency creates meaningful risk for production codebases. Best suited for non-critical projects where you can tolerate potential data exposure.

Green flags

  • Purpose-trained on real PR data for practical software engineering tasks
  • Professional paid product with clear commercial backing
  • No known security incidents or credential leaks
  • Specific use case (bug fixes, refactors) with demonstrated value

Red flags

  • No public repository or source code available for audit
  • Autonomous agent with repo write and shell execute permissions
  • Closed-source SaaS with opaque data handling and security practices
  • Small vendor with unknown team size and bus factor
  • No verifiable supply chain or package distribution method

Permissions requested

Repo read · Repo write · Read files · Write files · Shell execute · Outbound network · External LLM call

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Pricing

PAID

Platforms

web · cli

Review

Cosine's Genie is trained on actual pull request data, which shows. Point it at a bug and it doesn't just suggest a fix: it writes the code, runs tests, iterates on failures, and opens a PR. The autonomy matters because you can hand off a GitHub issue at 6pm and wake up to a working branch. I've used it for gnarly multi-file refactors where the context spans five modules and three config files. It held the thread better than I expected.

The sweet spot is tedious but well-scoped work: migrating a deprecated API across twenty call sites, fixing a flaky test that needs log analysis and retry logic, or backporting a feature to an older branch. Genie reads your codebase, proposes a plan, then executes without needing approval at every step. You review the final PR like you would from a junior engineer.

Failure modes are predictable. If the task is vague ('make the app faster'), it flails. If your repo lacks tests, it can't verify its own work and you're back to manual QA. It also struggles with architectural decisions, so don't expect it to choose between microservices and a monolith. The CLI is solid but the web interface feels like an afterthought; I spend most of my time in the terminal.

Compared to Devin, Genie is narrower but more reliable. Devin tries to be a full engineering teammate; Genie is content being a very good intern. It doesn't do product thinking or talk to APIs it hasn't seen before. But for the bug-fix and refactor lane, it's faster and less likely to go off-piste. The PR training data is the differentiator: it writes code that looks like code a human would commit, not like a chatbot transcript.

Pricing is steep for solo devs but reasonable for teams. You're paying for time saved, and on a three-hour refactor that Genie handles in forty minutes, the maths works. The async workflow is the real unlock: queue up tasks, let it churn overnight, review in the morning. If you're drowning in maintenance debt or shipping solo, it's worth the trial.

Verdict

Best for teams with solid test coverage who need to clear backlogs of well-defined bugs and refactors. Skip it if your work is exploratory or your repo is a mess: Genie needs structure to shine.

Good at

  • Trained on real PR data, writes idiomatic code that fits your repo's style
  • Genuine multi-step autonomy: runs tests, iterates on failures, opens PRs without hand-holding
  • Async workflow lets you queue tasks and review later, good for solo founders
  • Strong at tedious but scoped work like API migrations or flaky test fixes
  • CLI-first design fits into existing dev workflows

Watch out

  • Expensive for individual developers compared to standard AI tools
  • Struggles with vague or exploratory tasks, needs clear scope
  • Requires decent test coverage to verify its own work effectively
  • Won't make architectural decisions or handle novel integrations
  • Web interface feels underbaked compared to the CLI

Use cases

  • End-to-end bug-fix tasks
  • Long-running refactor plans
  • Engineering reviews of agent suggestions
  • Async coding work for solo founders