We sat down with Andrew Hsu, CTO of Speak, to demo their new Winter Release and explore how Voice AI is revolutionizing language learning. From real-time pronunciation feedback to "Bad Bunny" roleplay scenarios, learn how Speak is building the world's most advanced AI tutor.
Ever feel like you’ve been "learning" a language for a decade but still freeze up when a waiter actually asks you a question? You’re not alone. The gap between knowing vocabulary and having functional conversational ability is massive—and it’s where most apps fail.
We just wrapped a deep-dive livestream with Andrew Hsu, Co-founder and CTO of Speak, and walked away with a completely new perspective on how AI is finally solving the "silent learner" problem.
What follows is everything we learned: the philosophy behind their "Learn, Drill, Apply" method, the exact mechanics of their new Winter Release, and why major global enterprises are ditching traditional language classes for AI tutors.
Fair warning: we geek out on voice AI tech here, too (it's the best part IMO!).
Also, a shout-out to Andrew for handling our live tech hiccups like a pro! You’re a legend for that, Andrew!
IMO, the most important takeaway from the stream wasn't just the tech—it was the pedagogy. Andrew introduced us to the concept of "Functional Fluency" over perfection. It’s not about getting the grammar 100% right; it’s about reducing the latency between your brain and your mouth.
If you only have a few minutes or want to jump to the key moments, here are our favorite parts.
The Tech & The Winter Release
Business & Future Vision
Let’s dive into the new features dropped in the Winter Release (available now). Andrew walked us through the three pillars of the new experience:
1. The Learning Path
This isn't your standard "tree" of lessons. Speak has reorganized its curriculum into Units containing Learning Loops.
"We think sometimes users want to see a human face... but a lot of times, especially for custom generative courses... that has to be AI generated." — Andrew Hsu
2. Immersive Roleplay (The "Free Talk" Tab)
This was the coolest demo. Andrew jumped into a scenario to practice slang.
3. Smart Review
Speak tracks every mistake you make across every lesson. The Smart Review tab uses a spaced-repetition algorithm (similar to Anki but powered by LLMs) to surface the exact phrases you’re struggling with, right before you forget them.
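Speak hasn't published how Smart Review schedules reviews, so the snippet below is only a sketch of the general idea it resembles: the classic SM-2 spaced-repetition rule (the family Anki's scheduler descends from), as it is commonly implemented. The `ReviewItem` model, the 0 to 5 quality grades, and the `review`/`due_today` helpers are hypothetical, and the LLM side (deciding which phrases you're struggling with) is left out entirely.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ReviewItem:
    """A phrase the learner has practiced (hypothetical model, not Speak's)."""
    phrase: str
    easiness: float = 2.5    # SM-2 "E-factor"; lower means the phrase is harder for you
    repetitions: int = 0     # consecutive successful recalls
    interval_days: int = 0   # gap until the next review
    due: date = field(default_factory=date.today)


def review(item: ReviewItem, quality: int, today: date) -> ReviewItem:
    """Reschedule an item after a review graded 0 (blank) to 5 (perfect), SM-2 style."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the easiness factor.
        if item.repetitions == 0:
            item.interval_days = 1
        elif item.repetitions == 1:
            item.interval_days = 6
        else:
            item.interval_days = round(item.interval_days * item.easiness)
        item.repetitions += 1
    else:
        # Failed recall: restart the streak so the phrase comes back tomorrow.
        item.repetitions = 0
        item.interval_days = 1
    # Nudge easiness up for easy recalls, down for hard ones, never below 1.3.
    penalty = 5 - quality
    item.easiness = max(1.3, item.easiness + 0.1 - penalty * (0.08 + penalty * 0.02))
    item.due = today + timedelta(days=item.interval_days)
    return item


def due_today(items: list[ReviewItem], today: date) -> list[ReviewItem]:
    """Phrases to surface now, hardest (lowest easiness) first."""
    return sorted((i for i in items if i.due <= today), key=lambda i: i.easiness)
```

Each successful recall stretches the gap (1 day, then 6, then roughly the previous interval times the easiness factor), while a miss sends the phrase back for tomorrow. That is the "right before you forget" behavior in miniature.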
We talked a lot about how users interact with the app. Andrew described a phenomenon we're calling "Vibe Learning." In the Drill phase, the app forces you to repeat variations of a sentence ("How's the family?", "How is your mother?", "How are they?") dozens of times. The goal isn't intellectual understanding; it's muscle memory.
"We’re actually optimizing for communication... We're trying to get you to speak out loud and have that feel as automatic as possible, even if you're making mistakes."
This aligns perfectly with the "Vibe Coding" concept we see in engineering—getting into a flow state where the tool (or the language) becomes an extension of your thought process, rather than a syntax puzzle you have to solve manually.
One of the most surprising insights was Speak's massive traction with enterprises like Samsung and LG. Standard corporate language training is notoriously bad: generic videos about "business meetings" that nobody watches. Speak for Business changes the game by ingesting context, such as an employee's job role or meeting calendar, and generating a custom curriculum from it.
We tried to press Andrew on when we'll have a universal translator or a perfect AI tutor. His answer was grounded in engineering reality: Voice AI is still lagging behind Text AI, with audio models one or two generations behind their text counterparts.
The results speak for themselves. Speak has hit $100M ARR with 15M+ downloads. They are proving that AI isn't just a "feature" in education—it's the entire product.
"We basically realized that what we'd be able to build using these powerful tools was a better way of helping people get to fluency... [it] allows us to create a totally new product category here of AI native language learning tools."
If You’re a User Wanting to Learn a Language:
If You’re in EdTech / AI:
Where to Learn More
The following transcript has been lightly edited for clarity and length.
Question: What was the thesis when you started the company back in 2016?
Andrew Hsu: My co-founder and I were really interested in deep learning. This was pre-LLM, but speech recognition systems were starting to come back. We realized that if you look at the curve of progress, models were clearly going to become superhuman in 5 or 10 years. We wanted to use these tools to build a better way of helping people get to fluency—actually speaking out loud and getting immediate feedback, which previously required hiring a human.
Question: Does it switch to "easy mode" if it detects you're having trouble, like if you have a lisp?
Andrew Hsu: That's a good question. What we're optimizing for is communication, not harsh judgment. We want you to speak out loud and have it feel automatic. We tune the speech systems so that it is quite easy to match the words if you are communicating in good faith. We do have a separate pronunciation coach for phone-level feedback, but the drills are designed to get you into a flow state.
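Andrew doesn't say how the matching is tuned. As an illustration only, one simple way to make a drill forgiving is to compare the recognized words against the target phrase and accept anything above a similarity threshold. The `normalize` and `matches_target` helpers and the 0.7 threshold are hypothetical, and word-level matching like this is deliberately coarse; phone-level feedback is a separate job (the pronunciation coach Andrew mentions).

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop apostrophes, and split into words ("How's" -> "hows")."""
    cleaned = text.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"\w+", cleaned)


def matches_target(transcript: str, target: str, threshold: float = 0.7) -> bool:
    """Accept the attempt if enough target words were recognized, in order.

    Hypothetical leniency rule: SequenceMatcher.ratio() is 2*matches / total words,
    so one slipped word in a short phrase can still pass.
    """
    heard, expected = normalize(transcript), normalize(target)
    ratio = difflib.SequenceMatcher(a=expected, b=heard).ratio()
    return ratio >= threshold
```

For example, a transcript of "how is you mother" against "How is your mother?" matches three of four words (ratio 0.75) and passes, while an unrelated sentence fails. Raising the threshold makes the drill stricter.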
Question: Can it help non-native speakers of English speak more comfortably in professional settings?
Andrew Hsu: Yes, absolutely. We started as a consumer app, but in South Korea, big companies like Samsung and LG started using us for employee language training. Our B2B product, Speak for Business, allows us to generate custom curriculum based on your specific job role or even your meeting calendar. It’s hyper-customized training that wasn’t possible before AI.
Question: As someone building voice AI, what are the current limits? What hasn't been unlocked yet?
Andrew Hsu: Audio model intelligence is still one or two generations behind text models. The really hard thing is turn-taking—knowing when to respond. Even humans talk over each other, but the latency in AI makes it noticeable. Also, "code-switching"—speaking two languages in one sentence—is a frontier problem for both speech generation and recognition.
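To make the turn-taking problem concrete, here is the naive baseline many voice agents start from (an assumption for illustration, not a description of Speak's stack): wait for a fixed stretch of silence after the user speaks, then respond. The function name, the 20 ms frames, and the 700 ms timeout are hypothetical parameters, and the per-frame `voiced` flags are assumed to come from some voice-activity detector.

```python
from typing import Iterable, Optional


def end_of_turn_frame(
    voiced: Iterable[bool],
    frame_ms: int = 20,
    silence_ms: int = 700,
) -> Optional[int]:
    """Return the frame index at which the agent may start talking, or None.

    `voiced` holds one voice-activity flag per frame. The turn is considered over
    once the user has spoken and then stayed silent for `silence_ms`.
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            heard_speech = True
            silent_run = 0  # any speech resets the silence clock
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None  # user never spoke, or never paused long enough
```

With a short mid-sentence pause in the input, the default timeout waits it out and answers only after the user finishes; drop `silence_ms` to 100 and the sketch cuts in during that pause. Every millisecond of timeout is also added to response latency, so a fixed silence threshold forces a trade-off between interrupting and lagging.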
Question: Would you ever expand beyond language to teach other subjects?
Andrew Hsu: The long-term vision is to build a superhuman tutor. Our adaptive learning engine isn't specific to language; it tracks what you know and what you forget. Theoretically, we could expand to other subjects. But right now, there is such an enormous prize in winning language learning that we are hyper-focused on that.
Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved