On October 6th, 2025, OpenAI dropped a tsunami of developer tools that fundamentally changes how people build AI applications. We're talking about an AI coding agent that wrote 80% of one of the other tools in this release, a visual workflow builder for creating multi-agent systems, and video models that finally feel production-ready.
Here's the wild part: Codex—OpenAI's new AI coding agent—built 80% of the Agent Builder tool in under 6 weeks. If you've noticed the cadence of AI releases accelerating lately, this is why. The tools are now building themselves.
Let's break down everything that shipped:
Codex: Your AI Pair Programmer (Now Generally Available)
The big idea: Codex is an AI agent that codes alongside you—in your terminal, your IDE, or in the cloud. It reads your codebase, writes code, runs tests, and even reviews pull requests.
Three ways to use it:
- Codex CLI: Run it directly from your terminal on Mac, Linux, or Windows (via WSL)
- IDE Extension: Install it in VS Code, Cursor, or Windsurf for inline coding assistance
- Codex Cloud: Delegate entire features to Codex agents that work in background cloud containers
New features that matter:
- Slack integration: Tag @Codex in a Slack thread, and it'll automatically pick up context, spin up an environment, and complete the task. Get a link to review changes in Codex Cloud.
- Codex SDK: Embed the same agent that powers the CLI into your own workflows with just a few lines of code
- Admin tools: ChatGPT workspace admins can now monitor usage, enforce environment controls, and track code review quality across teams
Real results:
Inside OpenAI, nearly all engineers now use Codex (up from half in July). They're merging 70% more pull requests per week, and Codex automatically reviews almost every PR to catch issues before production.
Companies like Cisco cut code review times by 50%. Instacart uses the Codex SDK with their internal agent platform to automatically clean up tech debt like dead code and expired experiments.
Pricing: Included with ChatGPT Plus, Pro, Team, Edu, and Enterprise plans. Starting October 20th, Codex Cloud tasks count toward your usage limits.
How it actually works: Codex runs on GPT-5-Codex, a model specifically optimized for agentic coding. You can also use it with GPT-5 for complex reasoning tasks.
AgentKit: Build Multi-Agent Workflows Without Writing Code
The setup: Until now, building agent workflows meant juggling fragmented tools—complex orchestration with no versioning, custom connectors, manual eval pipelines, and weeks of frontend work.
The solution: AgentKit is OpenAI's complete toolkit for building, deploying, and optimizing agents. Three main pieces:
1. Agent Builder
A visual canvas where you drag-and-drop nodes to create multi-agent workflows. Think: Zapier meets LLM orchestration.
What you can do:
- Connect multiple AI agents that work together.
- Add logic branches (if/else, routing).
- Integrate tools (web search, file search, code interpreter).
- Set up guardrails to detect prompt injection, mask PII, and prevent jailbreaks.
- Preview runs with live data.
- Version control your workflows.
- Export to code when ready.
Templates included: Customer service automation, research agents, support bots, and more. LY Corporation built a work assistant agent in under two hours using Agent Builder.
2. Connector Registry
A central admin panel where enterprises can manage data sources across multiple workspaces. Pre-built connectors for Dropbox, Google Drive, SharePoint, and Microsoft Teams, plus third-party MCP (Model Context Protocol) servers.
3. ChatKit
Drop-in UI components for embedding chat-based agents into your product. No need to build streaming responses, thread management, or thinking indicators from scratch.
Who's using it: Canva saved two weeks of dev time building a support agent. HubSpot deployed a customer support agent. Ramp went from blank canvas to a working buyer agent in "just a couple of hours."
How to use ChatKit:
- Create an agent workflow in Agent Builder (get a workflow ID)
- Set up a backend endpoint to generate ChatKit session tokens
- Install `@openai/chatkit-react` and embed the chat component
- Pass in your workflow ID—done
Full ChatKit integration guide here.
Evals: Now With Datasets, Trace Grading, and Prompt Optimization
OpenAI massively upgraded their Evals platform to help developers measure and improve agent performance:
New capabilities:
- Datasets: Build eval sets from scratch, expand them over time with automated graders and human annotations.
- Trace grading: Run end-to-end assessments of agentic workflows, automate grading to pinpoint weaknesses.
- Automated prompt optimization: Generate improved prompts based on human feedback and grader outputs.
- Third-party model support: Evaluate models from other providers within OpenAI's platform.
Real-world impact: Carlyle (the investment firm) cut development time on their multi-agent due diligence framework by 50% and increased agent accuracy by 30%.
How it works: Follow the Evals quickstart to create eval configs, upload test data, kick off runs, and analyze results programmatically via the API or in the dashboard.
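If you'd rather script it than click through the dashboard, here's a rough sketch with the OpenAI Python SDK. The grader and schema fields below are illustrative assumptions based on the quickstart, so verify the exact shapes there:

```python
from openai import OpenAI

client = OpenAI()

# Define an eval: the shape of a test item plus how model output gets graded.
# The schemas below are illustrative; see the Evals quickstart for exact fields.
ev = client.evals.create(
    name="Ticket triage accuracy",
    data_source_config={
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "ticket": {"type": "string"},
                "expected_label": {"type": "string"},
            },
            "required": ["ticket", "expected_label"],
        },
        "include_sample_schema": True,  # lets graders reference the model's output
    },
    testing_criteria=[
        {
            "type": "string_check",
            "name": "Label matches expected",
            "input": "{{ sample.output_text }}",
            "reference": "{{ item.expected_label }}",
            "operation": "eq",
        }
    ],
)
print("Created eval:", ev.id)  # next: upload test data and kick off runs against it
```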
New Models Across the Board
GPT-5 Pro
The smartest version of GPT-5 that uses more compute to "think harder" and deliver consistently better answers. Only available via the Responses API (not Chat Completions) to support multi-turn thinking before responding.
- Pricing: $15/1M input tokens, $120/1M output tokens.
- Context: 400K tokens.
- Max output: 272K tokens.
- Best for: Tough problems that justify the cost (complex reasoning, detailed analysis, code architecture).
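Calling it looks like any other Responses API request. A minimal sketch with the OpenAI Python SDK, assuming `gpt-5-pro` is the model identifier:

```python
from openai import OpenAI

client = OpenAI()

# GPT-5 Pro is Responses API only; there is no Chat Completions route for it.
response = client.responses.create(
    model="gpt-5-pro",  # assumed identifier for the Pro tier
    input="Review this architecture and list the three riskiest design decisions: ...",
)

print(response.output_text)
```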
gpt-realtime-mini
Cost-efficient version of GPT Realtime for audio and text interactions over WebRTC, WebSocket, or SIP connections.
- Pricing: $0.6/1M input tokens, $2.4/1M output tokens.
- Context: 32K tokens.
- Best for: Voice agents, real-time customer support.
gpt-image-1-mini
Cost-efficient image generation model that accepts text and image inputs, produces image outputs.
- Pricing: $2/1M input tokens, $8/1M output tokens.
- Image generation: Starting at $0.005 per 1024x1024 low-quality image.
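As a sense check on cost and usage, here's a minimal sketch with the OpenAI Python SDK, assuming the model plugs into the standard Images API and accepts the usual `size` and `quality` parameters:

```python
import base64
from openai import OpenAI

client = OpenAI()

# A low-quality 1024x1024 render is the cheapest tier mentioned above (~$0.005/image).
result = client.images.generate(
    model="gpt-image-1-mini",
    prompt="A flat-style illustration of a robot watering a houseplant",
    size="1024x1024",
    quality="low",
)

# The gpt-image-1 family returns base64-encoded image data.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("robot.png", "wb") as f:
    f.write(image_bytes)
```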
Sora 2: Video Generation Goes Production-Ready
OpenAI made video generation feel like a real tool, not a demo. Sora 2 via the API now includes:
Two model options:
- sora-2: Fast iteration model for experimenting with tone and style.
- sora-2-pro: Higher-quality production model (takes longer, costs more).
New features:
- Image input support: Upload a reference image to lock in character design, wardrobe, or aesthetic. Sora uses it as the first frame.
- Remix functionality: Take an existing video and make targeted adjustments without regenerating everything (change color palette, add a character, adjust lighting).
- Better prompting: OpenAI released a comprehensive prompting guide with examples for dialogue, camera movements, lighting, and timing.
- Prompt gallery: Browse working examples to learn effective patterns.
How it works via API:
- Call `POST /videos` with a text prompt and parameters (model, size, duration)
- Poll `GET /videos/{video_id}` for status updates (or use webhooks)
- Download the final MP4 with `GET /videos/{video_id}/content`
- Optional: Download thumbnail and spritesheet assets
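Here's a rough end-to-end sketch with plain `requests`, assuming the endpoints above live under `https://api.openai.com/v1` and that field names (`size`, `status`, and so on) roughly match the API reference; treat it as a starting point, not a reference implementation:

```python
import time
import requests

API_KEY = "sk-..."  # your OpenAI API key
BASE = "https://api.openai.com/v1"  # assumed base URL for the video endpoints
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# 1. Kick off a render. Field names follow the description above; check the
#    video API reference for the exact request schema.
job = requests.post(
    f"{BASE}/videos",
    headers=HEADERS,
    json={
        "model": "sora-2",
        "prompt": "A golden retriever chasing soap bubbles in a sunlit backyard",
        "size": "1280x720",
    },
).json()

# 2. Poll for status (webhooks are the production-friendly alternative).
#    The status values here are assumptions; adjust to whatever the API returns.
while job["status"] not in ("completed", "failed"):
    time.sleep(10)
    job = requests.get(f"{BASE}/videos/{job['id']}", headers=HEADERS).json()

# 3. Download the final MP4.
if job["status"] == "completed":
    video = requests.get(f"{BASE}/videos/{job['id']}/content", headers=HEADERS)
    with open("output.mp4", "wb") as f:
        f.write(video.content)
```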
Pricing: Based on resolution, duration, and model (sora-2 vs sora-2-pro). See video API pricing.
Guardrails: Open-Source Safety Layer
OpenAI released Guardrails, an open-source, modular safety framework for LLM apps. Available for Python and JavaScript.
Built-in checks:
- Prompt injection detection.
- PII masking and detection.
- Jailbreak detection.
- Hallucination detection.
- Content moderation.
- Off-topic prompt filtering.
- URL filtering.
How to use it:
```python
import asyncio
from guardrails import GuardrailsAsyncOpenAI

async def main():
    client = GuardrailsAsyncOpenAI(config="guardrails_config.json")
    response = await client.responses.create(
        model="gpt-5",
        input="Hello",
    )
    # Guardrails run automatically alongside the model call
    print(response.guardrail_results)

asyncio.run(main())
```
You can enable Guardrails directly in Agent Builder workflows or deploy them standalone.
Apps SDK: Build Native Experiences Inside ChatGPT
The idea: Apps SDK lets developers build experiences that live inside ChatGPT—not external links, but native cards, carousels, and fullscreen views that appear directly in the conversation flow.
Think: booking a ride, ordering food, or tracking a delivery without ever leaving ChatGPT. The interface stays conversational, but now you can see rich UI when it actually helps.
How it works: Apps use the Model Context Protocol (MCP) to connect. Your server exposes tools, ChatGPT calls them, and you return structured data plus optional HTML/UI components that render inline.
Four display modes:
- Inline cards: Lightweight widgets embedded directly in conversation (confirmations, quick actions, status updates).
- Inline carousels: Side-by-side cards for browsing options (restaurants, playlists, events).
- Fullscreen: Immersive experiences for multi-step workflows (explorable maps, detailed browsing, rich editing).
- Picture-in-picture (PiP): Persistent floating windows for ongoing sessions (games, videos, live collaboration).
Design principles from OpenAI:
Apps should be conversational (natural extension of chat), intelligent (context-aware), simple (one clear action), responsive (fast and lightweight), and accessible (works for everyone).
Good use cases: Booking rides, ordering food, checking availability, scheduling, quick lookups—tasks that are time-bound, action-oriented, and can be summarized visually.
Bad use cases: Long-form content dumps, complex multi-step workflows that break the chat flow, ads or upsells, or trying to recreate your entire native app inside ChatGPT.
How to Build Apps SDK Apps
Building an app for ChatGPT follows three phases: planning, building, and deploying.
Phase 1: Plan Your App
Research use cases - Start with a crisp understanding of what users are trying to accomplish. Discovery in ChatGPT is model-driven: the assistant chooses your app when your metadata and descriptions align with user prompts.
Gather inputs:
- User interviews and support requests to capture jobs-to-be-done.
- Prompt sampling (direct asks like "show my Jira board" and indirect intents like "what am I blocked on for the launch?").
- System constraints (compliance requirements, offline data, rate limits).
Define evaluation prompts:
- Write at least 5 direct prompts that explicitly reference your product.
- Draft 5 indirect prompts where users state a goal but not the tool.
- Add negative prompts that shouldn't trigger your app (for measuring precision).
Scope the minimum lovable feature:
- What information must be visible inline?
- Which actions require write access and confirmation?
- What state needs to persist between turns?
Phase 2: Build Your App
Set up your MCP server - Your MCP server is the foundation. It exposes tools, handles auth, and packages structured data plus component HTML.
Choose an SDK:
- Python SDK (official) - great for rapid prototyping with FastMCP.
- TypeScript SDK (official) - ideal if your stack is already Node/React.
Example MCP server setup:
```javascript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { readFileSync } from "node:fs";

const server = new McpServer({
  name: "kanban-server",
  version: "1.0.0"
});

// Load built assets
const KANBAN_JS = readFileSync("web/dist/kanban.js", "utf8");
const KANBAN_CSS = readFileSync("web/dist/kanban.css", "utf8");

// Register UI resource
server.registerResource(
  "kanban-widget",
  "ui://widget/kanban-board.html",
  {},
  async () => ({
    contents: [{
      uri: "ui://widget/kanban-board.html",
      mimeType: "text/html+skybridge",
      text: `
        <div id="kanban-root"></div>
        <style>${KANBAN_CSS}</style>
        <script type="module">${KANBAN_JS}</script>
      `.trim()
    }]
  })
);

// Register tool
server.registerTool(
  "kanban-board",
  {
    title: "Show Kanban Board",
    _meta: {
      "openai/outputTemplate": "ui://widget/kanban-board.html",
      "openai/toolInvocation/invoking": "Displaying the board",
      "openai/toolInvocation/invoked": "Displayed the board"
    },
    inputSchema: { tasks: z.string() }
  },
  async () => {
    // loadKanbanBoard() is your own data-access layer, defined elsewhere
    const board = await loadKanbanBoard();
    return {
      structuredContent: {
        columns: board.columns.map(col => ({
          id: col.id,
          title: col.title,
          tasks: col.tasks.slice(0, 5) // keep payload concise
        }))
      },
      content: [{
        type: "text",
        text: "Here's your board. Drag cards to update status."
      }],
      _meta: {
        tasksById: board.tasksById, // full data for component only
        lastSyncedAt: board.lastSyncedAt
      }
    };
  }
);
```
Three fields your tool returns:
- `structuredContent` - Data that hydrates your component (ChatGPT injects this as `window.openai.toolOutput`)
- `content` - Optional text the model receives verbatim
- `_meta` - Data passed only to the component, never shown to the model
Phase 3: Deploy and Connect
Deploy your app - Host your MCP server behind a stable HTTPS endpoint.
Deployment options:
- Managed containers: Fly.io, Render, Railway.
- Cloud serverless: Google Cloud Run, Azure Container Apps.
- Kubernetes for teams with existing clusters.
Local development: Use ngrok to expose your local server:
```bash
ngrok http 2091
# https://<subdomain>.ngrok.app/mcp → http://127.0.0.1:2091/mcp
```
- Enable developer mode in Settings → Connectors → Advanced.
- Navigate to Settings → Connectors → Create.
- Provide connector name, description, and your HTTPS `/mcp` endpoint.
- Click Create - ChatGPT will list your advertised tools.
- Toggle your connector in conversations to start testing.
Best Practices for Apps SDK
Optimize metadata - ChatGPT decides when to call your app based on metadata. Well-crafted names and descriptions increase discovery.
For each tool:
- Name: Pair domain with action (`calendar.create_event`).
- Description: Start with "Use this when..." and call out disallowed cases.
- Parameter docs: Describe each argument with examples.
- Read-only hint: Annotate `readOnlyHint: true` on tools that never mutate state.
Iterate methodically: change one field at a time, track precision/recall with your golden prompt set, and monitor tool-call analytics weekly in production.
Security & privacy - Treat every connector as production software:
- Least privilege: Only request needed scopes and permissions.
- Explicit consent: Users must understand when linking accounts or granting write access.
- Defense in depth: Validate everything, assume prompt injection will happen.
- Data handling: Include only required data in structured content, publish a retention policy.
- Network access: Widgets run in sandboxed iframes with strict CSP - no privileged browser APIs.
Troubleshooting - Common issues and fixes:
- No tools listed: Confirm the server is running and the `/mcp` endpoint is accessible.
- Component doesn't render: Check that `_meta["openai/outputTemplate"]` points to a registered HTML resource.
- Tool never triggers: Rewrite descriptions with "Use this when..." phrasing, update golden prompts.
- 401 errors: Include a `WWW-Authenticate` header so ChatGPT restarts the OAuth flow.
- Streaming breaks: Ensure your load balancer allows server-sent events without buffering.
Key Requirements from the Developer Guidelines
From the developer guidelines:
- Clear purpose: Do something valuable, don't mislead or spam.
- Privacy first: Collect only minimum required data, include a privacy policy.
- Accurate metadata: Tool descriptions must clearly explain what they do and whether they modify data.
- Safety: Comply with OpenAI's usage policies, suitable for ages 13+.
- Verified developers: All submissions require identity verification.
Visual design rules from design guidelines:
- Use system colors, fonts, and spacing (SF Pro on iOS, Roboto on Android).
- Brand accents allowed on logos, icons, and primary buttons only.
- No custom fonts, background colors, or patterns that break ChatGPT's minimal look.
- Maintain WCAG AA contrast ratios and provide alt text for accessibility.
Example Apps and Code
OpenAI released a gallery of example apps on GitHub showing:
- Pizza ordering app (Pizzaz) with multiple display modes in Node.js and Python.
- 3D solar system viewer in Python.
- Working MCP servers you can run locally and test.
Current status: Apps SDK is available in preview today. OpenAI will open for app submissions later this year, with enhanced distribution opportunities (merchandising in the directory, proactive suggestions in conversations) for apps that meet higher design standards.
What this unlocks: Imagine asking ChatGPT "order me lunch" and seeing a carousel of nearby restaurants with photos, ratings, and one-click ordering—all without switching apps. Or "book a ride to the airport" and getting a live map with ETA that updates as you chat. That's the vision.
What Sam Said: The DevDay Keynote in Brief

Alongside everything that shipped online, OpenAI CEO Sam Altman took the keynote stage in San Francisco with a simple message: "This is the best time in history to be a builder."
He wasn't exaggerating. The numbers tell the story:
- 4 million developers now build with OpenAI (up from 2 million in 2023).
- 800 million people use ChatGPT every week (up from 100 million).
- 6 billion tokens per minute processed on the API (up from 300 million).
Some developers in the room had processed over 1 trillion tokens on OpenAI's platform. Let that sink in.
Sam paused to recognize them: "On the screen behind me are the names of developers in the room today who've built apps on our platform that have crossed some incredible milestones—10 billion, 100 billion, even a trillion tokens processed. Let's give them a round of applause."
The four announcements:
"We have four things for you today," Sam announced. "We're going to show you how we're making it possible to build apps inside of ChatGPT. We're going to show you how building agents is going to be much faster and better. You'll see how we're making it easier to write software. And underneath all of this, we'll give you updates to models and APIs to support whatever you're building."
Sam structured the keynote around four major themes, each addressing a specific developer pain point:
1. Apps inside ChatGPT
"We're opening up ChatGPT for developers to build apps inside of ChatGPT. This will enable a new generation of apps that are interactive, adaptive, and personalized that you can chat with."
The demos showed Coursera courses playing picture-in-picture while you chat, Canva generating portfolios and pitch decks inline, and Zillow maps that update in real-time based on your conversation. The magic? Users discover apps by asking for them by name ("Figma, turn this sketch into a diagram") or ChatGPT suggests relevant apps mid-conversation.
The "talking to apps" feature is particularly clever: Apps SDK provides an API to expose context back to ChatGPT from your app, so the model always knows exactly what the user is interacting with. In the Zillow demo, when the user clicked on a specific home, ChatGPT could answer "How close is this to a dog park?" because Zillow exposed that context.
The Coursera demo highlighted this perfectly—when the presenter asked "Can you explain more about what they are saying right now?" while watching a video, ChatGPT immediately understood the video content without any manual explanation because the Coursera app exposed the current timestamp and context.
Sam emphasized the distribution opportunity: "When you build with the Apps SDK, your apps can reach hundreds of millions of ChatGPT users. This will be a big deal for how developers rapidly scale products."
Timeline and monetization: Apps SDK is available in preview today. Later in 2025, developers can submit apps for review and publication, and OpenAI will release a directory users can browse. Apps that meet basic standards in the developer guidelines will be listed; apps that meet higher standards for design and functionality will be featured more prominently and suggested in conversations.
Future monetization plans include an "agentic commerce protocol" for instant checkout inside ChatGPT—meaning users could complete purchases without ever leaving the conversation.
2. Building agents faster
"For all the excitement around agents, very few actually make it into production. It's hard to know where to start, what frameworks to trust. Each layer adds complexity before you even know if the idea is going to work."
Enter AgentKit. Sam introduced it as "a complete set of building blocks designed to help you take agents from prototype to production."
The demo was the most impressive part: Christina built and deployed a working agent in 8 minutes on stage. She used Agent Builder to visually wire up specialized agents, added guardrails for PII protection, attached a custom widget, previewed the workflow, and published it—all without writing backend code.
Real companies are already seeing results. Albertsons uses AgentKit to help store managers analyze sales drops (like a 32% ice cream decline) and get recommendations in seconds instead of "reports, spreadsheets, meetings." HubSpot's Breeze assistant uses it to search knowledge bases and piece together smart answers.
3. Writing software easier
"One of the most exciting things happening with AI is that we're entering a new era changing how software gets written. Anyone with an idea can build apps for themselves, their families, or their communities."
Sam shared stories of an 89-year-old Japanese retiree who taught himself to code with ChatGPT and built 11 iPhone apps for elderly users. Med students at ASU built a virtual patient app for practicing difficult conversations. Versailles lets visitors have conversations with art and sculptures using OpenAI's real-time API.
Codex is now generally available. Key stats:
- Daily messages up 10x since early August
- GPT-5-Codex served over 40 trillion tokens since release
- Engineers using Codex complete 70% more pull requests per week
- Nearly every OpenAI PR gets a Codex review
New features: Slack integration (tag @Codex in team conversations), Codex SDK (embed the agent in your workflows), and admin tools for enterprises.
Cisco rolled out Codex across their entire engineering org and cut code review times by 50%, reducing project timelines from weeks to days.
The demo was wild: Romain controlled a Sony FR7 camera, wired up an Xbox controller, integrated a lighting system with an MCP server, added voice control with the Realtime API, and even had the voice agent call Codex SDK to reprogram the app on the fly—all without writing code by hand.
Here's what made it mind-blowing:
The camera control: Romain asked Codex CLI to "lay out a plan to control a Sony FR7 camera." He admitted "I honestly didn't know how to get started." Codex figured out the VISCA protocol (over 30 years old), translated the C++ SDK to JavaScript, and built a complete control panel in 13 minutes.
The Xbox controller: Mid-demo, Romain asked the IDE extension to "wire up a wireless controller to control the camera." Codex ran in the background, figured out the gamepad API, and mapped the joystick controls—without Romain specifying which buttons should do what.
The voice + lighting integration: Romain integrated OpenAI's Realtime API for voice and asked Codex to create an MCP server for their lighting system. Then he tested it live:
When he asked the voice agent to "shine the lights towards the audience," it just worked. When he asked it to "do something fun with the lights and say hi to people tuning in on the livestream," colored lights started pulsing while the agent spoke: "We've got the fun lights rolling. Some dynamic colorful effects in motion. And to everyone watching the livestream, thanks for joining us."
The finale: he asked the voice agent to "show a credits overlay like at the end of a movie but the cast is the attendees" while simultaneously taking a countdown photo of the entire DevDay audience. The credits rolled in real-time as the photo snapped.
Sam's reaction: "This is the biggest change in how software gets created we've seen."
4. Model and API updates
Sam announced GPT-5 Pro is now available in the API for all developers—the most intelligent model OpenAI has ever shipped, optimized for hard tasks in finance, legal, and healthcare where accuracy matters most.
gpt-realtime-mini launched as a 70% cheaper version of Advanced Voice Mode with the same voice quality and expressiveness. "Personally, I think voice is going to become one of the primary ways people interact with AI," Sam said.
Sora 2 preview is now available in the API with synchronized sound and audio. The demos showed kayaking videos with realistic water sounds, dogs playing with ambient audio, and product concepts for e-commerce. Mattel is already using it to turn toy sketches into shareable video prototypes.
The closing message:
"Software used to take months or years to build. You saw today—it takes minutes now. To build with AI, you don't need a huge team. You don't need a bunch of infrastructure. You just need a good idea."
Sam wrapped up with gratitude: "Our goal is to make AI useful for everyone, and we couldn't do it without you. We're grateful you're building with us."
Then he added one more thought before leaving the stage: "Things are going to get pretty incredible pretty soon."
And remember how I mentioned at the beginning that Codex built 80% of Agent Builder in under 6 weeks? Sam didn't explicitly say this in the keynote, but it's the subtext of everything he announced.
The tools are building themselves. Codex writes the code for Agent Builder. Agent Builder creates agents that use Codex. Those agents deploy apps that reach 800 million people. The feedback loop is tightening, and the velocity is compounding.
That's why Sam kept saying "we're still so early on this journey." If this is early, where does it go next?
The Complete Guide: Building with OpenAI's DevDay Stack
For developers: Building with AI agents just became 10x easier. Instead of wiring up APIs manually, you can now design workflows visually, test them with live data, and deploy with drop-in UI components.
For enterprises: The Connector Registry and admin tools mean you can actually govern how AI accesses company data—not just hope for the best.
For everyone else: The products you use daily are about to get a lot smarter. Customer support agents that actually help. Code reviews that catch bugs before production. Video generation that doesn't look like a fever dream.
The meta point: Agent Builder lets you build agents without code. Those agents can use Codex to write more agents. We're officially in the "AI building AI" phase, and it's happening faster than anyone expected.
Get Started
Want to try Codex?
- Install the CLI: `npm install -g @openai/codex`
- Full Codex quickstart
Want to build agent workflows?
Want to embed chat agents in your product?
Want to generate videos programmatically?
Want to evaluate agent performance?
Want to build apps that live inside ChatGPT?
This is the infrastructure for the next generation of software—where agents are the default, not the exception.
Ready to build with the DevDay stack? Follow our 60-minute hands-on tutorial that walks you through creating a production AI app using Apps SDK, AgentKit, ChatKit, and Codex with complete working code and deployment instructions.