TwelveLabs' Marengo 3.0 is unlocking 80% of the world's unused data—and it's all sitting in your video archives. | The Neuron


TwelveLabs launched Marengo 3.0 at AWS re:Invent 2025. The model enables natural language video search with composed queries (image + text), processes hour-long videos 30x faster than competitors, and is now available on Amazon Bedrock.

Written By
Grant Harvey
Dec 3, 2025
2 minute read

This week at AWS re:Invent, Corey sat down with TwelveLabs to discuss their newly launched Marengo 3.0, a video foundation model that actually understands video the way humans do—across time, space, and all modalities (visual, audio, text) simultaneously.


Here's the problem:

Traditional computer vision treats video like a flipbook, analyzing frame-by-frame images. That means AI misses what's happening between frames—the temporal relationships, audio cues, and cross-modal connections that give video its meaning. Meanwhile, 80% of the world's data sits trapped in video format (often literally on tapes in storage), unusable because there's been no good way to search or understand it.

What makes Marengo 3.0 different:

Marengo 3.0 enables natural language search across your entire video library. Instead of manually scrubbing through hours of footage or relying on manual tagging, you can search with queries like "find the moment when the player in the red jersey scores a jump shot" or "show me all segments where the mechanic points to the engine component."

Even better: it supports composed queries—combining an image plus text (like a photo of a specific player + "scored a three-pointer") to pinpoint exact moments. It handles hour-long videos without breaking, works across 50+ languages natively, and processes everything 30x faster than competitors like Amazon Nova while using 6x less storage space.
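To make the composed-query idea concrete, here is a minimal sketch of how an image-plus-text query could be matched against pre-computed video segment embeddings. It is an illustration under stated assumptions, not TwelveLabs' implementation: the toy 3-dimensional vectors, the weighted-average fusion in `compose`, and the `SEGMENTS` data are all made up for this example. In practice the embeddings would come from the model itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def compose(image_vec, text_vec, image_weight=0.5):
    """Blend an image embedding and a text embedding into one query vector.

    Assumption: a simple weighted average. The fusion Marengo actually
    uses is not described in the article.
    """
    return [image_weight * i + (1 - image_weight) * t
            for i, t in zip(image_vec, text_vec)]

def search(query_vec, segments, top_k=3):
    """Rank video segments by similarity to the query, best first."""
    scored = [(cosine(query_vec, seg["embedding"]), seg) for seg in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), seg["video"], seg["start_s"], seg["end_s"])
            for score, seg in scored[:top_k]]

# Hypothetical segment index. Dimensions loosely mean:
# [is the pictured player, is a three-pointer, crowd noise]
SEGMENTS = [
    {"video": "game1.mp4", "start_s": 120, "end_s": 130, "embedding": [0.9, 0.8, 0.1]},
    {"video": "game1.mp4", "start_s": 300, "end_s": 310, "embedding": [0.9, 0.1, 0.2]},
    {"video": "game2.mp4", "start_s": 45, "end_s": 55, "embedding": [0.1, 0.9, 0.3]},
]

image_vec = [1.0, 0.0, 0.0]  # stand-in for a photo of a specific player
text_vec = [0.0, 1.0, 0.0]   # stand-in for the text "scored a three-pointer"

composed_results = search(compose(image_vec, text_vec), SEGMENTS, top_k=1)
text_only_results = search(text_vec, SEGMENTS, top_k=1)
print(composed_results)
print(text_only_results)
```

With these toy numbers, the text-only query ranks a three-pointer by a different player first, while the composed query ranks the segment where the pictured player makes the shot first. That narrowing is what adding an image to the query is meant to achieve.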


Real-world use cases that other models can't handle:

  • Sports analysis: Search thousands of hours of game footage for specific plays, player actions, or strategic moments without watching everything
  • Media production: Find the exact B-roll footage you need from your archive by describing what you want in natural language
  • Insurance claims: Replace tedious PDF processing by analyzing video evidence of damage, incidents, or repairs
  • Security & compliance: Identify critical events across surveillance footage without human review
  • Automotive diagnostics: Build AI mechanics that can visually identify problems and provide repair guidance

TwelveLabs offers two complementary models: Marengo (for search/retrieval with natural language queries) and Pegasus (for generating text outputs like summaries, chapters, or classifications). Both are now available on Amazon Bedrock and through TwelveLabs' APIs; you can also try them out in the TwelveLabs playground.

Bottom line: if your company has video archives gathering digital dust, Marengo 3.0 makes that content searchable, analyzable, and actually useful for the first time.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.
