TwelveLabs' Marengo 3.0 is unlocking the 80% of the world's data that sits unused in video archives, and some of it is probably sitting in yours.

TwelveLabs launched Marengo 3.0 at AWS re:Invent 2025. It enables natural language video search, supports composed queries (image + text), processes hour-long videos 30x faster than competitors, and is now available on Amazon Bedrock.

This week at AWS re:Invent, Corey sat down with TwelveLabs to discuss their newly launched Marengo 3.0, a video foundation model that actually understands video the way humans do—across time, space, and all modalities (visual, audio, text) simultaneously.

Check out the video below, or keep reading to learn all about it.

Here's the problem:

Traditional computer vision treats video like a flipbook, analyzing frame-by-frame images. That means AI misses what's happening between frames—the temporal relationships, audio cues, and cross-modal connections that give video its meaning. Meanwhile, 80% of the world's data sits trapped in video format (often literally on tapes in storage), unusable because there's been no good way to search or understand it.

What makes Marengo 3.0 different:

Marengo 3.0 enables natural language search across your entire video library. Instead of manually scrubbing through hours of footage or relying on manual tagging, you can search with queries like "find the moment when the player in the red jersey scores a jump shot" or "show me all segments where the mechanic points to the engine component."
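
To make that concrete, here's a rough sketch of a text search using TwelveLabs' Python SDK. Exact parameter names vary by SDK version, and the API key and index ID below are placeholders, so treat this as illustrative rather than copy-paste ready:

```python
from twelvelabs import TwelveLabs

# Assumes you have a TwelveLabs API key and an index that already
# contains your ingested videos; both values here are placeholders.
client = TwelveLabs(api_key="tlk_YOUR_API_KEY")

# Natural language search across every video in the index.
results = client.search.query(
    index_id="YOUR_INDEX_ID",
    query_text="the player in the red jersey scores a jump shot",
    options=["visual", "audio"],  # search visual and audio modalities
)

# Each hit points at a specific clip: which video, where, how confident.
for clip in results.data:
    print(clip.video_id, clip.start, clip.end, clip.confidence)
```

The key point: you get back timestamped clips, not whole videos, so "find the moment" really means the moment.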

Even better: it supports composed queries—combining an image plus text (like a photo of a specific player + "scored a three-pointer") to pinpoint exact moments. It handles hour-long videos without breaking, works across 50+ languages natively, and processes everything 30x faster than competitors like Amazon Nova while using 6x less storage space.
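
We haven't verified the final composed-query API surface for Marengo 3.0, so treat this sketch as a guess: it assumes the search endpoint accepts an image file and refining text together, modeled on TwelveLabs' existing media-query parameters (query_media_type / query_media_file):

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="tlk_YOUR_API_KEY")

# Hypothetical composed query: a photo of a specific player, plus text
# narrowing it to a specific action. Combining query_media_file with
# query_text in one call is our assumption about the Marengo 3.0 API.
with open("player_photo.jpg", "rb") as image:
    results = client.search.query(
        index_id="YOUR_INDEX_ID",
        query_media_type="image",
        query_media_file=image,
        query_text="scored a three-pointer",
        options=["visual"],
    )

for clip in results.data:
    print(f"{clip.video_id}: {clip.start:.1f}s-{clip.end:.1f}s ({clip.confidence})")
```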

Real-world use cases other models can't handle:

  • Sports analysis: Search thousands of hours of game footage for specific plays, player actions, or strategic moments without watching everything
  • Media production: Find the exact B-roll footage you need from your archive by describing what you want in natural language
  • Insurance claims: Replace tedious PDF processing by analyzing video evidence of damage, incidents, or repairs
  • Security & compliance: Identify critical events across surveillance footage without human review
  • Automotive diagnostics: Build AI mechanics that can visually identify problems and provide repair guidance

TwelveLabs' platform pairs two models: Marengo (embedding-based search and retrieval with natural language queries) and Pegasus (generating text outputs like summaries, chapters, or classifications from video). Both are now available on Amazon Bedrock and through TwelveLabs' APIs; you can also try them out in the playground here.
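
If you're already on AWS, calling these through Bedrock looks like any other model invocation. In the sketch below, the model ID and request-body fields are our assumptions based on how TwelveLabs' models were exposed on Bedrock at launch (check the Bedrock model catalog for current values), but the boto3 calls themselves are standard:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder model ID; look up the actual TwelveLabs Pegasus ID
# in the Bedrock model catalog for your region.
MODEL_ID = "us.twelvelabs.pegasus-1-2-v1:0"

# Request shape is an assumption: a text prompt plus a pointer to a
# video already sitting in S3.
body = {
    "inputPrompt": "Summarize this game footage and list the key plays.",
    "mediaSource": {
        "s3Location": {"uri": "s3://your-bucket/game-footage.mp4"}
    },
}

response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
print(json.loads(response["body"].read()))
```

Marengo's long-running video embedding jobs may go through Bedrock's async invocation path instead, so check the docs for which invocation style each model uses.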

Bottom line: if your company has video archives gathering digital dust, Marengo 3.0 makes that content searchable, analyzable, and actually useful for the first time.


See you cool cats on X!
