GPT-4 Turbo is changing the game for everyone building products.
Three specific ways GPT-4 Turbo impacts product development today.
The moment OpenAI announced GPT-4 Turbo at DevDay, I was pinging my team about it. I’ve seen a lot of conversation about some of the other updates and the new Assistants API, but personally, I believe GPT-4 Turbo is going to be huge, and I thought I’d share why.
If you haven’t read OpenAI’s announcement yet, check it out here first. Otherwise, I’ve laid out three things that have me hooked on Turbo already, and that I think you should be excited about too if you’re building a next-gen product.
Predictable output
The first major change I’ve noticed with GPT-4 Turbo is much more deterministic output. Until now, building on top of LLMs has felt a bit like working with magic… but not always in a good way.
Seed
With the new fixed seed feature (currently in beta) and a temperature of 0, you’re much more likely to get the same output returned each time. Now that the output is more deterministic, you no longer need 10 runs to have confidence that your prompt works as intended.
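To make that concrete, here’s a minimal sketch using OpenAI’s Node SDK; the model name, seed value, and prompt are just placeholders for illustration:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Pin a seed and set temperature to 0 so repeated runs of the same prompt
// are far more likely to return identical output.
async function runDeterministic(prompt: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-1106-preview", // the GPT-4 Turbo preview model
    messages: [{ role: "user", content: prompt }],
    temperature: 0,
    seed: 42, // any fixed integer; keep it constant across runs
  });

  // If system_fingerprint changes between runs, the backend configuration
  // changed and outputs may differ even with the same seed.
  console.log(completion.system_fingerprint);
  return completion.choices[0].message.content;
}
```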
For developers, who thrive on pure logic-based work, predictable results are key: they make AI more straightforward to integrate into your product and they simplify prompt iteration. With more deterministic results, when you tweak your prompt and get a different output, you can be confident the change came from your tweak rather than random chance.
Improved predictability also means you can actually add regression tests for your AI-powered features, testing against GPT to make sure your prompts haven’t degraded over time. Up until now, the unpredictability of AI responses meant a test might randomly fail one run in ten. This might sound small, but to my mind it’s a big turning point: reliable, non-flaky tests are a baseline requirement for building dozens, and soon hundreds, of AI features into a product you sell to real, paying customers.
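As an illustration of what such a regression test could look like: the runDeterministic helper from the sketch above and the assertions are made up, and vitest is just one possible test runner.

```typescript
import { test, expect } from "vitest";
import { runDeterministic } from "./llm"; // the helper sketched above

// With a fixed seed and temperature 0, the same prompt should keep
// satisfying the same assertions across CI runs instead of flaking.
test("standup prompt still surfaces blockers", async () => {
  const output = await runDeterministic(
    "List the blockers in this update: 'Waiting on design review for the login page.'"
  );

  expect(output).toBeTruthy();
  expect(output!.toLowerCase()).toContain("design review");
});
```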
JSON mode
It would be remiss of me not to also mention that GPT-4 Turbo comes with natively supported JSON output (they call it JSON mode). This matters because the response is now guaranteed to be valid JSON; before, you might ask for JSON and get something that didn’t parse, forcing you to retry the same query until it did. With this move, OpenAI has effectively committed to native JSON support and made it clear that this is the best path for developers to take (vs YAML).
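Here’s roughly what that looks like in practice. The system prompt and the little task schema are purely illustrative, and note that the prompt itself still has to mention JSON somewhere for JSON mode to be accepted:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4-1106-preview",
  // JSON mode: the response is guaranteed to be syntactically valid JSON.
  response_format: { type: "json_object" },
  messages: [
    {
      role: "system",
      content:
        "Extract the task title and assignee from the user's message. " +
        'Respond in JSON shaped like {"title": string, "assignee": string}.',
    },
    { role: "user", content: "Michael should fix the flaky standup test." },
  ],
});

// No retry-until-it-parses loop: this parse won't fail on malformed output.
const task = JSON.parse(completion.choices[0].message.content ?? "{}");
console.log(task);
```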
The most frustrating part of developing with GPT-4 has been how unpredictable it can be. These new Turbo features mean that, as product builders, we can start building on AI more like we’re used to building a regular product: in a robust, deterministic way.
More than 2x faster
From my own runs using our internal data, GPT-4 Turbo is more than twice as fast as GPT-4. This isn’t scientific, and varies depending on the day, but I haven’t seen any stats from OpenAI about the speedup, so I thought I’d share what I saw on a real-world use case.
Specifically, in my testing, I found Turbo running at 26 tokens/second vs 13 tokens/second for GPT-4 (if you’ve run your own tests, I’d love to hear what you’re seeing too).
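If you want to reproduce this kind of rough measurement, here’s one way I’d sketch it with streaming. It counts streamed chunks as a proxy for tokens, so treat the numbers as ballpark figures, not a benchmark:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Rough throughput check: stream a response and divide the number of chunks
// received by wall-clock time. Chunks roughly map to tokens, so this gives
// a ballpark tokens/second figure, nothing more.
async function roughTokensPerSecond(model: string, prompt: string) {
  const start = Date.now();
  let chunks = 0;

  const stream = await openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) chunks++;
  }

  return chunks / ((Date.now() - start) / 1000);
}

console.log("gpt-4:", await roughTokensPerSecond("gpt-4", "Summarize our week."));
console.log("turbo:", await roughTokensPerSecond("gpt-4-1106-preview", "Summarize our week."));
```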
In terms of product impact, this is big. Big-big. Many AI-based features handle a user query in two steps: reason first (think about the problem), then produce the user-facing output. Because the reasoning step is typically hidden from the end user, they’re waiting through it before seeing anything; with Turbo’s speed, the latency to the first word or output the user sees drops drastically.
Up until now, reasoning has made real-time AI features hard, forcing developers to figure out which features can be fast enough to run in real time versus which need to run in the background and alert the user when the output is ready. With Turbo’s increased speed, we can all start skewing more and more towards real-time AI features and spend less mental energy figuring out what’s doable in real time. For Height, this means features like AI-generated automated standups and release notes can move from scheduled to real-time.
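For the curious, the two-step pattern looks roughly like this; the prompts and model choice are illustrative, not exactly what we run at Height:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Two-step pattern: a hidden "reasoning" call followed by a user-facing call.
// The faster the first call finishes, the sooner the user sees the first
// streamed word of the second.
async function answerWithReasoning(question: string) {
  // Step 1: hidden reasoning; the user never sees this output.
  const reasoning = await openai.chat.completions.create({
    model: "gpt-4-1106-preview",
    messages: [
      { role: "system", content: "Think through the problem step by step." },
      { role: "user", content: question },
    ],
  });

  // Step 2: the user-facing answer, streamed so the first word shows up as
  // soon as possible once the reasoning step is done.
  return openai.chat.completions.create({
    model: "gpt-4-1106-preview",
    stream: true,
    messages: [
      { role: "system", content: "Answer concisely using these notes." },
      { role: "assistant", content: reasoning.choices[0].message.content ?? "" },
      { role: "user", content: question },
    ],
  });
}
```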
Larger and cheaper context
The last key feature of GPT-4 Turbo I want to call attention to is just how much more context you can send at a time, and the price difference. Compared to GPT-4, Turbo supports 4x more context and is 6x cheaper. This is a real, meaningful increase: we have to think less about being miserly with our token counts and can shift our attention back to building real, streamlined product experiences. Effectively, there’s no longer a practical limit on your system and logic prompts, and what until now had to be done in multiple roundtrips can now be done in a single one. Back to our automated standups feature: today they run as one query per person, but with the larger context window, a team of 5 can be handled in a single query.
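Here’s a rough sketch of that consolidation; the update format, prompt, and function names are made up for illustration, not how our standups feature actually works:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

type Update = { person: string; notes: string };

// With a 128K context window, one request can cover a whole team's updates
// instead of one request per person.
async function generateTeamStandup(updates: Update[]) {
  const combined = updates
    .map((u) => `## ${u.person}\n${u.notes}`)
    .join("\n\n");

  const completion = await openai.chat.completions.create({
    model: "gpt-4-1106-preview",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Write a standup summary for each person, as JSON shaped like " +
          '{"standups": [{"person": string, "summary": string}]}.',
      },
      { role: "user", content: combined },
    ],
  });

  return JSON.parse(completion.choices[0].message.content ?? "{}");
}
```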
Until now, prompt engineering meant getting creative with tooling to split features into multiple queries and calculating whether a feature would even fit within the context limit, but today’s context window is big enough for the majority of use cases. Of course, I’d always welcome a bump to 1 million tokens; I just think the jump from 32K to 128K crosses a threshold that’s more significant than any increase still to come. That’s really worth calling out as a major milestone for all AI-powered products.
Clearly, I’m downright giddy about what OpenAI announced last week, and I’m really excited to see how these changes will let people build what would have been just a twinkle in our eyes a mere moment ago.
Selfishly, these announcements are huge for what we’re doing over at Height, as our upcoming 2.0 launch is heavily built on AI, and it makes me even more confident we can actually build what I’ve been envisioning the past few months.
Ping me (me@michaelvillar.com) if Turbo has unlocked any major new product features; I’d genuinely love to hear about what is now possible for you and your team.