Welcome all you cool cats to The Neuron! I’m Pete Huang.
Today, we’re talking about Microsoft’s new competitor to OpenAI, Meta and Google, why they’re building it and why some people really dislike the guy leading the effort.
It’s Tuesday, May 7th. Let’s dive in!
When you look across Big Tech, each of these companies has a lot of ways to make money with AI.
In fact, 2023, the year of AI, brought on tremendous growth in share price for the Magnificent Seven, which includes Nvidia, Meta, Amazon, Microsoft, Alphabet, Apple and Tesla.
For Nvidia, AI mostly means chips. Using AI models needs these big data centers with big computers to calculate a bunch of stuff. And each of the computers has chips that do the actual calculating. And Nvidia makes the best chips to do the calculating. And everyone knows Nvidia makes the best chips, so when companies open their wallets with billions of dollars, they just go to Nvidia and say, “Yes please, we want a lot of your very best chips.”
So when they reported financial results in February, their data center division, the one that sells the chips to do all that calculating, grew more than 400% over the last year. And a big portion of that growth was simply jacking up the price. You want the chips? So does everyone else, so pay up. Let’s see how much you actually want it. That’s how Nvidia makes money on AI.
Some of these companies offer cloud services that businesses use to build and run software. Of course, this means Amazon Web Services, Microsoft Azure and Google Cloud. This category is the picks-and-shovels kind of play. Microsoft Azure doesn’t really care what you build, as long as you run it on Azure. The more you use the cloud, the more you pay, the more money they make.
For these companies, they’re maniacally focused on one thing. I don’t really care if you’re building an AI note-taking app, or you’re using AI to do customer support, or if you’re training an AI to accelerate your internal teams. I don’t care if you make money doing it or if you’re selling the thing you build to other people. I just need you to build it and run it on my infrastructure. The more people I get building on my stuff, the more people I get to lock into my stuff, and they’re stuck with me forever.
So as it relates to AI, I just need to make every AI model available to use on my infrastructure. You want GPT-4? You got it. You want Claude? You got it. It’s all available on Amazon Web Services or Google Cloud or wherever you wanna build stuff.
And note that this approach goes the other way too: it also doesn’t rely on any one model being better than another. If Anthropic suddenly makes a way better model, I just make sure I have it available, and you don’t have to leave my service. I’m ready for you to switch. Then GPT-5 drops and you can switch to GPT-5.
And so on and so forth.
This is not necessarily true for a third group of companies who make money on the applications that use AI. For example, if Microsoft is building Microsoft Copilot into Office, their partnership with OpenAI means they’re probably relying on OpenAI's AI models to build Copilot.
And that can be a pretty scary thing when you think about it. I mean, let’s just remove Microsoft and OpenAI from this. If you’re building something and you’re completely relying on this partner to build a very critical component of it, you’re sorta at their whim. Like, what if they decide to jack up the prices? What if they go out of business? At virtually any time, you could be in for a very annoying change of plans.
Up until now, the relationship between Microsoft and OpenAI has been sorta like that, though I’d imagine they wouldn’t actually try to mess with each other. Microsoft Copilot is indeed built on top of OpenAI’s AI models, and Microsoft really hasn’t had an alternative in place.
But the Microsoft parent org is painfully aware of the consequences of being locked into a critical vendor. How? Well, their Microsoft Azure team is benefiting massively from doing exactly that to other organizations! So when they’re on the other side of the table, they know they gotta be careful.
This is the setup for MAI-1, Microsoft’s new large language model effort. While one part of Microsoft is continuing to build out applications like Copilot, another part is building a new model to provide a backup option in case something in the Microsoft-OpenAI relationship goes wrong.
After all, they’re not the same company, even if Microsoft’s massive $10 billion investment into OpenAI may make it seem like they’re an airtight alliance.
Remember that the risk of something bad happening to OpenAI was very painfully real at one point. In mid-November 2023, OpenAI’s board suddenly fired CEO Sam Altman with very little explanation. That kicked off five days of chaos. In the first few days, it wasn’t at all clear that OpenAI was going to come out of it OK.
For one, OpenAI, despite having all this amazing tech built out, saw over 95% of its employees sign a letter demanding that Sam Altman be brought back as CEO and the board resign or else they’d all walk. Like, the entire company of over 700 employees would just leave.
That should not feel very good if you’re Microsoft CEO Satya Nadella. You’re getting calls in the middle of a Friday afternoon saying your closest partner in AI just got fired, you’re watching employees threaten to resign over the weekend, and you’re faced with the real possibility that they can’t figure this conflict out and all the advantage you thought you had with OpenAI is gone.
Of course, it all ended up fine, but not without Satya Nadella himself parachuting in to broker talks. And this is a huge pain for him, too. The drama is completely separate from Microsoft, but once shareholders catch wind of the OpenAI news, the first question they ask is, “How does this affect Microsoft?”
Satya Nadella has to show strength here to placate those shareholders, so he says:
“If OpenAI disappeared tomorrow, we have all the IP rights and all the capability. We have the people, we have the compute, we have the data, we have everything. We are below them, above them, around them.”
In this effort to diversify away from OpenAI, Microsoft has at least two tracks of work going.
The first one is around small models. These models have fewer parameters, meaning they’re faster and cheaper to run, plus they can fit on smaller devices. Think about parameter count kinda like the number of calculations you have to do in order for the model to give you an answer.
Normally, the bigger models are better. The more calculating you do, the better responses you get.
But this track of work wants to see if you can get the same performance with smaller models. This includes models like Phi 3, which has 4 billion parameters but matches the performance of a model with hundreds of billions of parameters.
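To make “parameter count ≈ amount of calculating” a little more concrete, here’s a rough back-of-envelope sketch in Python. The 2-bytes-per-parameter (fp16) and 2-FLOPs-per-parameter-per-token figures are common rules of thumb, and the parameter counts are the reported or rumored numbers from this episode, not official specs:

```python
# Back-of-envelope math for why parameter count matters. Rules of thumb:
# fp16 weights take ~2 bytes per parameter, and generating one token costs
# roughly 2 FLOPs per parameter. Counts below are reported/rumored figures.

MODELS = {
    "Phi-3 (small)": 4e9,
    "MAI-1 (reported)": 500e9,
    "GPT-4 (rumored)": 1e12,
}

for name, params in MODELS.items():
    weight_gb = params * 2 / 1e9    # fp16: 2 bytes per parameter
    flops_per_token = 2 * params    # ~2 FLOPs per parameter per token
    print(f"{name}: ~{weight_gb:,.0f} GB of weights, "
          f"~{flops_per_token:.0e} FLOPs per generated token")
```

Run that and you see the gap: a 4-billion-parameter model fits in roughly 8 GB and can squeeze onto a phone or laptop, while the big models need terabyte-scale weights spread across a data center.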
The second track of work is still to compete with these larger models in the hundreds of billions of parameters. This is where MAI-1 fits in. MAI-1 will reportedly have about 500 billion parameters. It’s competing with Meta’s Llama 3, whose largest version has 400 billion parameters, Google’s Gemini Ultra, which is something like 540 billion parameters, and GPT-4, the big kahuna with over a trillion parameters.
So now, Microsoft’s also in the race to build the best big model, and it’s competing with its own partner to do it.
Ok, here’s another thing about MAI-1. Part of its training data is reportedly generated by GPT-4. And the consequences of this are pretty significant, so let me walk you through it.
Previous efforts to train AI have nearly exclusively used human-generated text. It’s your books and papers and newspaper articles and Reddit comments.
All human-generated content, because we want the AI model to learn to behave like a human.
And this is why OpenAI has started signing all of these deals to partner with media orgs like the Associated Press and Axel Springer so they can feed more data to their AI models.
By the way, as of this week, that also includes Stack Overflow, which is a website where people ask and answer all types of technical questions. It’s a very common destination if you’re learning how to program, for example. And incidentally, the release of ChatGPT did some serious damage to Stack Overflow’s traffic after people realized you could just ask ChatGPT instead of wading through the Stack Overflow archives for the answer.
Ok, so researchers used to use all this human-generated data. And before they started testing AI-generated data, they wondered about this problem called “model collapse.”
It boils down to this: if you assume that an AI model is basically a dumb version of a human, that means it has a narrower range of thought than a human. So whatever it generates, even if you have it generate a million pages, a billion pages, isn’t going to fully represent human thinking and reasoning, just a portion of it.
So what if you take those billions of pages and feed them back into the model? Then the range of thought gets even narrower, right? If the first time you asked a model to generate stuff, it gave you 80% of the range of real human thought, then that missing 20% compounds. As you continue feeding more and more AI-generated data into a model, the problems stack up and the model gets worse. Eventually, the model just starts to repeat itself, it gets super generic, it just gets super dumb.
All of this is called model collapse.
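If you want to actually watch that narrowing happen, here’s a tiny toy simulation in Python. I’m using a Gaussian distribution as a stand-in for a model; it’s an illustration of the compounding effect, not how anyone actually trains an LLM:

```python
# A toy simulation of model collapse. The "model" here is just a Gaussian:
# fit it to data, sample new data from the fit, refit on those samples,
# and repeat for many generations.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=50)  # the "human-generated" data

for generation in range(1, 201):
    mu, sigma = data.mean(), data.std()     # "train" on the current data
    data = rng.normal(mu, sigma, size=50)   # next gen trains on that output
    if generation % 50 == 0:
        print(f"generation {generation}: spread (std) = {sigma:.3f}")

# The spread keeps drifting down from the original 1.0 toward zero: each
# generation captures only part of the previous one's range, and the loss
# compounds. That's the collapse.
```

The punchline is that nothing in the loop ever adds range back; every generation can only lose a little, so the distribution ratchets narrower.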
Here’s the thing: model collapse could’ve ended up happening to any AI company building a model even if researchers weren’t trying to use AI-generated data to train the model.
The way it happens is just the Internet. Once ChatGPT comes out, people use it to generate a bunch of AI-generated articles to put on the Internet. So now when you scrape the Internet, you’re scooping up AI-generated articles. And whoever decides to use a bank of Internet data to train a model will now have AI-generated data in it.
So now, without intending to, someone making an AI model could face this model collapse problem. That’s a big issue for AI development.
The opposite answer to this question is just as impactful, in the other direction.
If it turns out that AI-generated data does work and there are no notable consequences for using it to train models, then you have unlimited training data. Right? Just use ChatGPT! If all of humanity has produced, say, a billion pages of text (I don’t know the actual number), then at the press of a button, ChatGPT can produce more.
And you just feed an endless firehose of AI-generated text of all kinds that looks and feels like human text but isn’t, right into the AI model.
AI training AI.
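For a sense of what that firehose looks like in practice, here’s a minimal sketch. It assumes the OpenAI Python SDK and an API key in your environment; the topics, prompt, and file name are made up for illustration:

```python
# A minimal sketch of "AI training AI": use one model's output as training
# text for the next. Requires the OpenAI Python SDK (openai>=1.0) and an
# OPENAI_API_KEY environment variable.

from openai import OpenAI

client = OpenAI()

TOPICS = ["how mortgages work", "the history of tea", "basic graph theory"]

with open("synthetic_corpus.txt", "w") as corpus:
    for topic in TOPICS:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Write a short, factual article about {topic}.",
            }],
        )
        # Append the generated "article" to the synthetic training corpus.
        corpus.write(resp.choices[0].message.content + "\n\n")

# A real pipeline would generate millions of documents, filter them for
# quality, and mix them with human-written text before training.
```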
So this synthetic training data thing is in fact what researchers are doing with MAI-1. And it’s what Meta researchers did with Llama 3. In both cases, part of the training data, not the whole thing, but part of it, is AI-generated. And as we saw with Llama 3, the performance is very good, very competitive with the leading models.
The larger effect is kinda crazy: if we have all the ingredients for AI to basically endlessly improve on itself…well I’ll leave that for another episode.
The MAI-1 efforts are being led by Mustafa Suleyman, the CEO of Microsoft’s consumer AI division. Suleyman used to be CEO of Inflection AI and co-founder of DeepMind, the AI lab that eventually got acquired by Google and is now Google’s primary AI research center.
This is Suleyman’s first big project at Microsoft. He’s bringing in a resume that is pristine at first glance: DeepMind is the bedrock of all things Google AI now, and Inflection AI built some pretty impressive models before Microsoft took all of its staff in a pseudo-acquisition.
But I always found the reactions during that Inflection/Microsoft deal very interesting. Some people have pretty negative reactions to him, so here’s one:
mustafa suleyman, the non technical of deepmind's founders, getting continually funded, promoted and appointed to positions where he has influence over ai is demoralizing. he's a corporatist statist anti-open source doomer.
Ok here’s another one:
Mustafa Suleyman is the ultimate AI grifter.
He has ZERO technical skills. ZERO hardcore STEM background.
This is Satya Nadella's worst AI move yet.
Very strong reactions. They have a point: Suleyman is nontechnical; he studied philosophy and theology at Oxford before dropping out. And Suleyman’s cofounder at DeepMind, Demis Hassabis, who’s still at Google DeepMind as CEO of the division, basically brushes him aside. His quote to The New York Times:
“I don’t think there is much to say. Most of what he has learned about A.I. comes from working with me over all these years.”
The good thing is that Suleyman is working with basically the same exact team he had at Inflection, his previous startup. Microsoft didn’t formally acquire Inflection because they wanted to avoid any chance of an acquisition getting held up or blocked. So instead they just gave blanket offers to everyone on the team, and nearly all of them decided to join. They left behind just a couple of people at Inflection, and since then, those folks have left too; the only one remaining is the newly installed CEO, who I’m sure is just there to wind things down.
Now, it’s Mustafa Suleyman in yet another race to build a leading model. This time, with Microsoft’s MAI-1, which will compete against OpenAI, Meta, Google and all the others.
Some quick hitters to leave you with:
- Apple is working on its own AI chips. Much like Microsoft wants to diversify away from OpenAI, every Big Tech company wants a backup option for chips instead of relying too heavily on Nvidia. Apple is just the latest to announce it’s doing this. The good news for them is that they’re already in the chip game, having moved Apple products to their own chips starting in 2020.
- AI images continue to trick people, this time with Katy Perry at the Met Gala. An AI-generated image of Katy Perry blew up on social media and even tricked her mom into believing she was there, before Perry had to clarify she couldn’t make it due to work.
- The youngest generation is well known to be the most eager to adopt new technology. For AI, that would be Gen Z. But Gen Z is also turning out to use it in pretty clumsy ways, including submitting cover letters that sound exactly the same, word for word. Heads up, don’t just ask ChatGPT to “write a cover letter”, you gotta mix it up a bit.
This is Pete wrapping up The Neuron for May 7th. I’ll see you in a couple days.