Delv
Task Automation · Active · 5d · by Significant Gravitas · 4.3

AutoGPT

Pioneering open-source autonomous agent platform with Forge for agent creation, AGBenchmark evaluation and a user-friendly UI.

Safety & Trust

Delv Safety Grade: C

Score 58/100 · assessed 2026-04-18

Maintainer: 65
Permissions: 25
Supply chain: 70
Transparency: 85
Incidents: 45

AutoGPT is a well-known open-source autonomous agent framework maintained by Significant Gravitas, with strong transparency through active GitHub development and comprehensive documentation. However, as an autonomous agent platform designed for task automation, it inherently requires extensive permissions including filesystem access, shell execution, network operations, and potential desktop control. The project has a history of security concerns, including credential exposure risks and the inherent dangers of autonomous code execution. Whilst the maintainer is established in the AI agent space, the project is backed by a single organisation rather than a major vendor. Supply-chain risk is reasonable given distribution through standard package managers, but the autonomous nature and broad permission scope create a significant attack surface. Suitable for experienced users who understand the risks of autonomous agents.

Green flags

  • Fully open source with active GitHub repository and community
  • Comprehensive documentation and AGBenchmark evaluation framework
  • Established project with significant community adoption since 2023
  • Standard package distribution via pip and Docker
  • Transparent development with public issue tracking and changelog

Red flags

  • Autonomous agents can execute arbitrary code with minimal human oversight
  • Historical security concerns around credential handling and API key exposure
  • Broad filesystem and shell access required for task automation features
  • Single organisation maintainer creates bus factor risk
  • Desktop control capabilities enable significant system-level access

Permissions requested

Read files · Write files · Shell execute · Outbound network · Access secrets · Desktop control · Browser control · External LLM call

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Pricing

FREEMIUM · Free OSS, paid cloud

Platforms

api · cli · web

Review

AutoGPT was the agent that made everyone realise GPT-4 could do more than chat. I've watched it evolve from a viral GitHub repo into something resembling a platform, and the trajectory matters more than the current state.

The core promise is autonomy: you give it a goal, it breaks that goal into subtasks, executes them, evaluates progress, and iterates. In practice, this works best for research-heavy workflows where you'd otherwise spend an hour clicking between tabs. I've used it to compile competitive intelligence reports, pulling data from multiple sources, cross-referencing claims, and drafting summaries. It's not flawless, but it saves me the tedium of orchestrating each step manually.

Forge is the standout piece. It's a framework for building your own agents without writing boilerplate. You define skills, plug in tools, and AutoGPT handles the planning loop. AGBenchmark gives you a way to measure whether your agent is actually improving, which is rarer than it should be in this space. The UI is functional but forgettable, a web interface that feels like an afterthought compared to the CLI.

Failure modes are predictable: it hallucinates tool outputs, gets stuck in loops when a task is ambiguous, and burns through tokens faster than you'd expect. I've seen it spend 20 API calls trying to parse a PDF that didn't exist.

The freemium model is generous for experimentation, but the cloud version is where you get reliability and speed. Self-hosting is possible but fiddly.

Compared to AgentGPT or BabyAGI, AutoGPT has more institutional weight behind it. The benchmarking suite and Forge framework give it staying power. But it's not the easiest entry point. If you want plug-and-play autonomy, something like Relevance AI or Lindy will get you there faster. AutoGPT is for developers who want to understand how agents work, not just use them.

The real question is whether you need autonomy at all. For most tasks, a well-prompted ChatGPT with a few tool calls is faster and cheaper. AutoGPT earns its place when the task is genuinely multi-step, when you'd otherwise be the orchestrator, and when you're comfortable debugging when it goes sideways.
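
For readers who want to see what that plan, execute, evaluate cycle looks like, here is a rough sketch in Python. It is an illustration only, not AutoGPT's or Forge's actual API: `run_agent`, `plan`, `execute`, `is_done`, `max_steps` and `max_repeats` are all hypothetical names, and the planner and executor are stubs standing in for LLM and tool calls. The two guards target the failure modes described above: a step budget to cap token spend, and a repeat check to catch an agent that keeps retrying the same task.

```python
from typing import Callable, List, Tuple

# (task, result) pairs recorded as the agent works.
History = List[Tuple[str, str]]


def run_agent(
    goal: str,
    plan: Callable[[str, History], str],      # hypothetical: picks the next subtask
    execute: Callable[[str], str],            # hypothetical: runs a subtask via a tool
    is_done: Callable[[str, History], bool],  # hypothetical: evaluates progress
    max_steps: int = 10,                      # assumed cap on iterations (token budget proxy)
    max_repeats: int = 2,                     # assumed cap on identical tasks before bailing out
) -> Tuple[str, History]:
    """Plan -> execute -> evaluate loop with a step budget and a loop guard."""
    history: History = []
    for _ in range(max_steps):
        task = plan(goal, history)

        # Loop guard: stop if the planner keeps proposing a task already tried.
        repeats = sum(1 for past_task, _ in history if past_task == task)
        if repeats >= max_repeats:
            return "stuck", history

        result = execute(task)
        history.append((task, result))

        if is_done(goal, history):
            return "done", history

    # Step budget spent without reaching the goal.
    return "budget_exhausted", history


# Stub planner that keeps asking for a PDF that does not exist.
def stub_plan(goal: str, history: History) -> str:
    return "parse report.pdf"


def stub_execute(task: str) -> str:
    return "error: file not found"


def never_done(goal: str, history: History) -> bool:
    return False


status, history = run_agent("summarise report", stub_plan, stub_execute, never_done)
print(status, len(history))  # prints "stuck 2"
```

With the guards in place, the missing-PDF case ends after two attempts instead of twenty. A real agent would likely compare tasks more loosely than exact string equality and count tokens rather than steps.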

Verdict

Pay for the cloud version if you're running agents in production and need uptime. Stick with the open-source version if you're learning or building custom agents. Skip it entirely if you just want to automate a single workflow; there are simpler tools for that.

Good at

  • Forge framework makes custom agent development less painful
  • AGBenchmark provides rare, objective evaluation metrics
  • Strong open-source foundation with active development
  • Handles multi-step research tasks better than most alternatives
  • Freemium model lets you experiment before committing

Watch out

  • Token usage spirals quickly on complex tasks
  • Gets stuck in loops when goals are vague or contradictory
  • Self-hosting requires non-trivial setup and maintenance
  • UI feels underbaked compared to CLI and framework
  • Autonomy often slower than just doing the task yourself

Use cases

  • autonomous task execution
  • agent experimentation
  • workflow building