EPISODE #
5

“Is this GPT-5?!”, Inside Microsoft’s $10B Deal with OpenAI, California Bill 1047

May 2, 2024
Apple Podcasts | Spotify | YouTube

Show Notes

A mysterious new chatbot hit the market, blew everyone away, then disappeared. I unpack some of the leading hypotheses behind it.

An email surfaced in court uncovering the motivation behind Microsoft’s deal with OpenAI. I unpack the scene at the time and what happened.

Tech folks are up in arms about a new AI bill in the California State Senate. I show you Bill 1047 and what people are saying about it.

Transcripts: https://www.theneuron.ai/podcast

Subscribe to the best newsletter on AI: https://theneurondaily.com

Watch The Neuron on YouTube: https://youtube.com/@theneuronai

Transcript

Welcome all you cool cats to The Neuron! I’m Pete Huang.

Today,

  • A mysterious new chatbot hit the market, blew everyone away, then disappeared. I unpack some of the leading hypotheses behind it.
  • Next, an email surfaced in court uncovering the motivation behind Microsoft’s deal with OpenAI. I unpack the scene at the time and what happened.
  • Finally, tech folks are up in arms about a new AI bill in the California State Senate. I show you Bill 1047 and what people are saying about it.

It’s Thursday, May 2nd. Let’s dive in! Our first story is the GPT2-chatbot.

This week, a mysterious new chatbot captured the AI community’s attention, stirred the pot with AI speculators, then disappeared.

There’s this site called lmsys.org, standing for Large Model Systems Organization. Their most notable project is the chatbot arena, where you’re given two chatbots at random, you give them the same input, then you pick which output is better.

Then they take those votes and produce a live updating ranking.
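Arena-style leaderboards like this are typically built from pairwise votes using an Elo-style rating system. Here's a minimal sketch of the idea; this is my own simplified version, not LMSYS's actual pipeline (which has used more sophisticated methods like Bradley-Terry fitting):

```python
# Minimal Elo-style rating update from pairwise "which answer is better?" votes.
# Simplified sketch; a real arena pipeline differs in details.

def expected_score(r_a, r_b):
    """Probability that model A beats model B, given current ratings."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings, winner, loser, k=32):
    """Shift both ratings toward the observed vote outcome."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e_w)
    ratings[loser] -= k * (1 - e_w)

ratings = {"model_a": 1000.0, "model_b": 1000.0}

# Suppose voters prefer model_a in 3 of 4 head-to-head battles:
battles = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
for winner, loser in battles:
    update(ratings, winner, loser)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

The nice property is that ratings update continuously as votes stream in, which is what makes a "live updating ranking" possible.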

This, by the way, is roughly how we know which of these chatbots is best. Other approaches that measure models against standardized tests are often unreliable and completely game-able.

Turns out the best way to figure out the best of something is to ask people which ones they like - who knew?!

On Sunday, users of this chatbot arena found a model mysteriously named GPT2-chatbot.

It was really good. Like, it’s probably number one, even better than OpenAI’s GPT-4.

No matter what test you threw at it, this GPT2 chatbot would either get it right when others couldn’t or at least show much stronger reasoning and get closer.

It would solve puzzles and riddles that other models couldn’t.

It could draw images using letters and characters on your keyboard, something that always tripped up other models.

The code that it wrote was noticeably better than GPT-4 and Claude 3.

Whatever you wanted, GPT2-chatbot was better. And it knew a bunch of stuff, too. Obscure references to specific configurations of enterprise software that nobody talks about. Nearly dead internet forums that haven’t been active in 15 years.

So of course, this lights the rumor mill on fire.

What was GPT2-chatbot? Who made it? And why was it released the way it was?

The leading hypothesis is that this is a secret new model under testing by OpenAI.

When you asked GPT2-chatbot to identify itself, it would say it was made by OpenAI. And it would do so in a way that very particularly matches models like GPT-4.

The way that it structured its text seemed to match OpenAI models and only OpenAI models.

The way that it dealt with breaking up words seemed to match OpenAI and only OpenAI.

And then there’s the simple fact that if someone made a model that was so clearly better than GPT-4, why aren’t they screaming from the rooftops about it? Why just whisper it out into the world?

I’d be bragging all day by now.

But what exactly is GPT2-chatbot? And why is it called that?

Here’s where people have landed.

The leading idea is that this is OpenAI silently dropping this for a couple days just to see how people react to it compared to the other models.

By Tuesday, GPT2-chatbot was already taken offline by the LMSys organization. It’s also been revealed that this organization makes some money by working with AI model developers to test special previews of new models.

As for GPT2-chatbot itself, it feels like an upgrade to GPT-4, but not a full step to GPT-5. In other words, it’s an improvement over GPT-4. But OpenAI CEO Sam Altman has hyped GPT-5 by suggesting the jump from GPT-4 to GPT-5 would be as big as the jump from GPT-3 to GPT-4 (which was huge, by the way). If this really was GPT-5… man, that would be a bit disappointing.

Here’s the dark horse hypothesis:

That this is some modification of their old GPT-2 model from way back in 2019. This model is 100 times smaller than GPT-4 and is way, way less capable.

The argument is that OpenAI may have found a way to make even this very old model from 2019 more capable than GPT-4, even though it’s 100x smaller. That size saves on cost. Big models are very expensive to run. If you can make a smaller model as good, which nobody has gotten even close to doing yet, then that’s a big win for anyone trying to use it.

That story sounds very good. It’s unlikely for a number of technical reasons. Though the naming (I mean, it’s literally called GPT2-chatbot) has led a number of people to speculate that this is the explanation.

GPT2-chatbot was just a flash in the pan, but it was enough for a shock to the system. It was nice knowing you.

Your big takeaway on GPT2-chatbot:

OpenAI is up to something. We just don’t know what.

Only OpenAI could have made something like GPT2-chatbot and released it the way it was. If it is OpenAI, we’re not going to know what it is or what’s behind the scenes until it officially gets released.

But the timing feels right. In public appearances, OpenAI leadership have repeatedly said GPT-5 is coming. They’re talking very openly about it.

Reports have said it’s coming this summer and it’s already May! I mean they must be in the final stages of this thing, right?

Even if this GPT2-chatbot isn’t GPT-5 (and many are hoping it’s not, given it wasn’t a huge improvement), Sam Altman has said in interviews that there will be many good models released.

There’s something coming. This was too good to be a nothingburger.

Our second story is a behind the scenes look at Microsoft’s recent plays in AI.

Imagine this. It’s 2019. The magical bull run of the last decade is still running. Microsoft stock is the highest it’s ever been. The vibes are good.

But you’ve been watching a little startup called OpenAI. They’ve released this AI model called GPT-2, which you think is interesting. And GPT-2 is based on a research paper released by Google, your arch nemesis.

Google, in fact, is also looking more primed than ever, especially in AI work. Recently, your team at Microsoft tried to replicate an AI model that Google made. And it took you 6 months to do it, whereas Google did it in a matter of days and weeks.

In fact, the more you look at it, the more you realize how much Google has. It’s rich in every way: talent, money, resources. It has the infrastructure. Part of the reason it took them days and weeks and you took months was that you didn’t have the infrastructure in place yet. And perhaps worst of all, it has the head start.

Imagine what they’ve been up to in the 6 months that it took you just to copy something they did already.

All of this is swirling in the head of Microsoft CTO Kevin Scott, who eventually architected the landmark deal between OpenAI and Microsoft that has led us to today.

That deal converted OpenAI from nonprofit to for-profit and allowed OpenAI to take on outside funding, more than $10 billion of which eventually came from Microsoft. It let Microsoft leapfrog Google as the main player in Big Tech AI. And it armed OpenAI with access to customers and infrastructure that it’d need to make its models.

Here’s a snippet from the email:

“The thing that’s interesting about what OpenAI and DeepMind and Google Brain are doing is the scale of their ambition, and how that ambition is driving everything from datacenter design to compute silicon to networks and distributed systems architectures… As I dug in to try to understand where all the capability gaps were between Google and us for model training, I got very, very worried.”

Kevin Scott goes on to describe that 6 month training effort. And the twist of the knife comes with this: that even copying Google’s model worked tremendously and improved a major metric by 10%. In other words, Google’s got the goods. And they’re ahead. So what next?

Kevin Scott eventually goes hunting for a partner. If Microsoft can’t do it themselves, they need to find another org that can and tie up with them. So he goes to chat with Sam Altman and the OpenAI team. And to both sides, the deal is obvious: Microsoft can resource the heck out of anything they want, and OpenAI has the research direction to produce something amazing.

That all sounds obvious today, knowing what we know. But keep in mind that OpenAI was at that point a nonprofit with zero revenue and an early AI model with no forecast as to just how far it would improve. To make a concentrated bet on a partner like that is not obvious at all.

Your big takeaway on Microsoft:

This deal may end up being the key to Microsoft’s entire future as a business. And it wasn’t a clear move at all.

Even Kevin Scott admits in the email that he was too dismissive of Google, that he wasn’t paying enough attention to the competitive landscape in AI. And he seemed genuinely shocked at the impact it would make on their products.

What he gets endless credit for is a bold, decisive move to fix that. OpenAI seemed like a total risk at the time. The researchers doubted how far OpenAI could actually go. The business people all had questions about the financial payoff.

But at the end of the day, Microsoft was 6 months to a year behind, and the deal let them get 1-2 years ahead by getting advanced access to what was going on in OpenAI every step of the way.

Today, that’s turned into an exploding number of developers using GitHub Copilot, a fast start to integrating AI into the office suite, and a cloud product that was ready for enterprises out of the gate.

And heck, let’s not forget that OpenAI won big as well. Without the Microsoft partnership, OpenAI may have had to spend a lot more time figuring out how to resource their plans. Microsoft just swooped in and handed them all they needed.

It was an incredibly smart deal all around.

Our next story is California’s new AI safety bill.

It’s a classic case of tech and government not seeing eye to eye. This time, they’re taking that battle to new AI models.

State Senator Scott Wiener, who represents San Francisco, of all places, introduced California Bill 1047 in February this year.

And the major news this week is that it’s been fast-tracked for a vote in the State Senate Appropriations committee. Three groups have to vote: that committee, the entire State Senate, then the State Assembly before it becomes law.

Here’s what it says and why it’s generating so much controversy.

First, Bill 1047 only applies to models of a certain capability. They define this in two measures. The first is that the model has to be trained with a certain amount of computing resources. In this particular case, that’s 10 to the 26th power floating point operations.

For reference, that’s just a tick over how much OpenAI used to create GPT-4.

The second is that if the model performs with equal capability as a model trained with that much computing, then that model is also covered.

So in other words, if your model is as capable as GPT-4 but used less computing, you’re still covered by the bill.
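The two-part coverage test boils down to a simple either/or check. Here's a sketch; the function and flag names are my own, and the capability-equivalence flag is a stand-in for the bill's actual legal language, which is far more involved:

```python
# Hypothetical sketch of Bill 1047's two coverage tests.
# Names here are illustrative, not from the bill itself.

COMPUTE_THRESHOLD_FLOPS = 1e26  # the bill's training-compute cutoff (10^26)

def is_covered(training_flops: float, matches_frontier_capability: bool) -> bool:
    """Covered if the model crosses the compute bar OR performs like one that does."""
    return training_flops >= COMPUTE_THRESHOLD_FLOPS or matches_frontier_capability

print(is_covered(2e26, False))   # big frontier-scale run -> True
print(is_covered(5e24, True))    # smaller, but GPT-4-level capability -> True
print(is_covered(5e24, False))   # small and less capable -> False
```

The second branch is what closes the loophole: you can't dodge the bill just by training more efficiently.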

Bill 1047 requires the makers of those models to apply for and get approval from the government before they even start training the model.

If you made a model that is covered by the bill, you have to file annual compliance documents with the government.

What the government is looking for is what they define as hazardous capabilities. Specifically, if the model can enable chemical, nuclear or biological weapons, it has a hazardous capability. If it can cause $500 million of damage, then it has a hazardous capability.

They require model makers to take steps to prevent these capabilities before they’re allowed to train.

All this sounds useful, right? The tech community says the bill has more going on than it seems.

For one, it’s suspect that the amount of computing used to define which models are subject to the bill coincides with OpenAI’s GPT-4. New upstarts have to face all this compliance and regulation, and that’s before they even start doing the work.

It feels like the law is pulling up the ladder for anyone competing with OpenAI.

It also seems to destroy open source models. Part of the bill says they won’t allow training if there’s risk that someone could build off of that model and create those hazardous capabilities.

And since open source means anyone can download it, the government will almost certainly see that as the presence of that risk. To keep operating, those model makers would virtually be forced to close off their models.

I mean let’s just play it out, right? You make a model, you release it out in the open. Someone downloads it and modifies it to start doing pretty bad things. And now you’re on the hook.

That doesn’t seem entirely implausible. And knowing how many people are hacking away at these models, how many people would be trying to experiment with dangerous stuff and how much more powerful these models could get, it seems more likely than not that something like that would happen.

Meanwhile, the author of the bill, State Senator Scott Wiener, responded to some of these claims on Twitter, basically saying that critics are exaggerating.

For example, on the threat to open source, Senator Wiener clarifies that there would only be punishments if there are catastrophic harms that the model makers didn’t reasonably try to prevent.

Still, many in the tech community - founders, investors, pro-tech think tanks, etc. - view this as stifling innovation, using much of the same language that greets any new regulation.

Your big takeaway on California Bill 1047:

Governments are struggling with how to deal with AI safety while promoting innovation.

At the federal level, the Biden administration’s executive order on AI also targeted the same computing resource level that California Bill 1047 does. But it stops short of any requirement to register with the government. Instead, all it does is ask model makers to report the existence of these models to the government.

The European AI Act also requires high-risk AI systems to undergo assessment and certification before they make it to market. But the California version specifically targets frontier models, and people are worried about competition. After all, GPT-5 is on the way, which means we’ll eventually see Llama 4, Claude 4, Gemini 2, etc. And with AI moving as fast as it is, whoever has their work out now might have a structural advantage moving forward.

Nobody knows just how capable AI will get in the future, not even the researchers making them. Which makes it hard for anyone to define what the appropriate level of risk to regulate would be and what the right way to ensure safe approaches would look like.

Some quick hitters to leave you with today:

  • There’s a big rumor going around that OpenAI has an event planned for May 9th. That’s right before Google’s I/O conference on the 14th and 15th. The rumor is that it has something to do with search, but either way, classic OpenAI to try and upstage Google.
  • Anthropic is launching Claude for Teams and their iOS app. The teams plan gets you the top models, up to 200K tokens of context, and more usage than the individual Pro plan. And we all love using AI on our phone.
  • Finally, Yelp and Atlassian are both launching AI chatbots. If you need help with home projects, you can use Yelp’s chatbot to home in on your project spec and find the right provider. And if you use Atlassian products at work like Jira or Confluence, the new Rovo AI helps you find information across the company and chat with it.

This is Pete wrapping up The Neuron for May 2nd. I’ll see you in a couple days.