Reka Agent
Reka's multimodal agent: strong on document, image, and video understanding in a single model. Useful for media-heavy workflows.
Delv Safety Grade: B
Score 72/100 · assessed 2026-04-18
Reka is a legitimate AI company backed by substantial funding (Snowflake, others) with a credible team from DeepMind and Google. Their multimodal agent handles video, images, and documents natively, which is technically impressive but introduces a broader attack surface than text-only models. The freemium model with API access means you're sending potentially sensitive media to their infrastructure. No public repository exists, so you cannot audit the client code or data handling practices. The agent's autonomy is described as 'real but narrow', suggesting some degree of self-directed action beyond simple API calls. There are no known security incidents, but the closed-source nature and the lack of transparency around data retention, model training on user inputs, and API security practices are concerns. Suitable for non-sensitive media workflows where multimodal capability justifies the trade-off.
Green flags
- Legitimate vendor backed by major investors (Snowflake, others)
- Team includes credible AI researchers from DeepMind and Google
- No known security incidents or data breaches to date
- Native multimodal processing reduces third-party integration risk
Red flags
- No public repository or source code for audit
- Closed-source agent with unclear data retention policies
- Autonomous capabilities with limited transparency on decision boundaries
- Media submitted through the freemium tier may be used for model training, with no clear opt-out
Review
Pay for this if you regularly process video, image-heavy documents, or multilingual media at scale and want to skip the preprocessing hell. Skip it if you need deep tool integration or your work is mostly text.
Good at
- Native video understanding without transcription preprocessing
- Handles mixed-format documents (PDFs with images and tables) in one pass
- Strong multilingual support across modalities
- Faster setup than building a custom GPT-4V orchestration layer
- Genuinely useful for QA and evidence extraction from visual media
Watch out
- Freemium rate limits make batch processing impractical without paying
- Slower than text-only agents for equivalent tasks
- Weak at generating long-form content from analysis
- API docs assume prior multimodal prompting experience
- Limited tool integration compared to more flexible agent frameworks
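To judge whether a free-tier rate limit makes a given batch impractical, a back-of-envelope estimate is usually enough. The sketch below is illustrative only: the function name `estimate_batch_minutes` and the numbers in the example are hypothetical, and Reka's actual limits are not stated in this review, so substitute whatever your plan documents.

```python
import math


def estimate_batch_minutes(num_items: int,
                           requests_per_minute: int,
                           requests_per_item: int = 1) -> int:
    """Minimum whole minutes needed to push a batch through a per-minute limit.

    All inputs are assumptions supplied by the caller; nothing here
    reflects a published Reka quota.
    """
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    if num_items < 0 or requests_per_item < 1:
        raise ValueError("num_items must be >= 0 and requests_per_item >= 1")
    total_requests = num_items * requests_per_item
    # Round up: a partially used minute still has to elapse.
    return math.ceil(total_requests / requests_per_minute)


# Hypothetical example: 5,000 clips, one request each, 10 requests per minute.
minutes = estimate_batch_minutes(5000, 10)
print(f"{minutes} minutes (~{minutes / 60:.1f} hours)")
```

If the estimate runs to many hours for a routine job, that is a sign the paid tier (or a smaller sample) is the realistic option.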
Use cases
- Video summarisation pipelines
- Multimodal document understanding
- Image-grounded Q&A at scale
- Multilingual multimodal agents