General Assistant · by Reka · ★ 4.3

Reka Agent

Reka's multimodal agent: strong on document, image, and video understanding, all in one model. Useful for media-heavy workflows.

Safety & Trust

Delv Safety Grade: B

Score 72/100 · assessed 2026-04-18

Maintainer: 85

Permissions: 65

Supply chain: 60

Transparency: 50

Incidents: 100
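
The listing does not say how the overall score is derived from the five sub-scores. As a sanity check only, the sketch below shows that an equal-weight average of the published sub-scores reproduces the headline 72/100. Equal weighting is an assumption here, not something Delv documents on this page, and the function name is illustrative.

```python
# Sub-scores as published on this listing.
sub_scores = {
    "Maintainer": 85,
    "Permissions": 65,
    "Supply chain": 60,
    "Transparency": 50,
    "Incidents": 100,
}


def unweighted_mean(scores: dict[str, int]) -> float:
    """Equal-weight average of the sub-scores (assumed, not documented)."""
    return sum(scores.values()) / len(scores)


overall = unweighted_mean(sub_scores)
print(overall)  # 72.0, matching the published Score 72/100
```

Read this way, the perfect Incidents score lifts the average noticeably: the four remaining sub-scores average 65 on their own.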

Reka is a legitimate AI company backed by substantial funding (Snowflake, others) with a credible team from DeepMind and Google. Their multimodal agent handles video, images, and documents natively, which is technically impressive but introduces a broader attack surface than text-only models. The freemium model with API access means you're sending potentially sensitive media to their infrastructure. No public repository exists, so you cannot audit the client code or data handling practices. The review below describes the agent's autonomy as 'real but narrow', suggesting some degree of self-directed action beyond simple API calls. There are no known security incidents, but the closed-source nature and the lack of transparency around data retention, model training on user inputs, and API security practices are concerns. Suitable for non-sensitive media workflows where multimodal capability justifies the trade-off.

Green flags

Legitimate vendor backed by major investors (Snowflake, others)
Team includes credible AI researchers from DeepMind and Google
No known security incidents or data breaches to date
Native multimodal processing reduces third-party integration risk

Red flags

No public repository or source code for audit
Closed-source agent with unclear data retention policies
Autonomous capabilities with limited transparency on decision boundaries
Freemium model may train on user-submitted media without clear opt-out

Permissions requested

Outbound network
External LLM call
Read files

Assessed by Delv Editorial using public metadata. Grades are advisory and update as the ecosystem changes. They do not replace your own review of permissions and code before granting an agent access to sensitive systems.

Pricing

FREEMIUM

Platforms

Web
API

Review

Reka Agent is one of the few multimodal agents that handles video, images, and documents natively without forcing you through separate preprocessing pipelines. I've tested it on a workflow where I needed to extract structured data from a mix of product catalogues (PDFs with tables and images), customer support videos, and multilingual marketing materials. The agent understood context across all three formats in a single pass, which saved me from stitching together outputs from Claude for text, GPT-4V for images, and some clunky video transcription service.

The autonomy here is real but narrow. Reka Agent plans multi-step queries within a single modality-heavy task. Feed it a 20-minute product demo video and ask it to extract feature comparisons, timestamps, and pricing mentions, and it will break that down without you babysitting each step. But it won't go off and research competitors or pull live data from APIs. It's an agent in the sense that it iterates on its own understanding of dense media, not in the sense that it books your calendar.

Where it shines: anything involving visual or video evidence at scale. I used it to QA a batch of 50 tutorial videos for accessibility issues (missing captions, unclear UI references), and it caught things a text-only model would have missed entirely. The multilingual support is genuinely useful if you're dealing with international content, though I found it stronger on European languages than on less common ones.

Failure modes: it's slower than text-only agents, and the freemium tier has tight rate limits that make it impractical for large batches unless you pay. The API documentation assumes you already know what you're doing with multimodal prompts, which is annoying if you're coming from a text-first background. And while it's strong on understanding, it's not great at generating long-form content, so don't expect it to write a full report from your video analysis.

Compared to something like GPT-4V with a custom orchestration layer, Reka Agent is faster to set up and better at video, but less flexible if you need to integrate with external tools or databases. If your work involves heavy media analysis and you don't want to build your own pipeline, this is the pragmatic choice.
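
If you do try a batch on the freemium tier, pacing requests on the client side is one way to stay under a per-minute cap. The sketch below is a minimal, generic throttle and not Reka's API. `run_paced`, the `worker` callable, and the `max_per_minute` value are all hypothetical placeholders, and the real limits are whatever your plan specifies.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_paced(
    items: Iterable[T],
    worker: Callable[[T], R],
    max_per_minute: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call worker(item) for each item, spacing call starts evenly.

    `worker` stands in for whatever function submits one file to the
    service; it is a placeholder, not a real client method.
    """
    if max_per_minute <= 0:
        raise ValueError("max_per_minute must be positive")
    min_interval = 60.0 / max_per_minute  # seconds between call starts
    results: List[R] = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last call.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(worker(item))
    return results
```

A call might look like `run_paced(video_paths, analyse_one, max_per_minute=10)`, where `analyse_one` is your own wrapper around a single request. The `sleep` and `clock` parameters exist only so the pacing can be tested without real waiting.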

Verdict

Pay for this if you regularly process video, image-heavy documents, or multilingual media at scale and want to skip the preprocessing hell. Skip it if you need deep tool integration or your work is mostly text.

Good at

Native video understanding without transcription preprocessing
Handles mixed-format documents (PDFs with images and tables) in one pass
Strong multilingual support across modalities
Faster setup than building a custom GPT-4V orchestration layer
Genuinely useful for QA and evidence extraction from visual media

Watch out

Freemium rate limits make batch processing impractical without paying
Slower than text-only agents for equivalent tasks
Weak at generating long-form content from analysis
API docs assume prior multimodal prompting experience
Limited tool integration compared to more flexible agent frameworks

Use cases

Video summarisation pipelines
Multimodal document understanding
Image-grounded Q&A at scale
Multilingual multimodal agents