Building Great AI Products Means Being a Great PM
Mark Zuckerberg shows how being a great PM means seizing opportunity, learning from users & prioritizing for success.
Whether or not you use a product from Meta, there's no doubt they have built the most successful social products ever created. A large portion of that success can be attributed to the investments Meta has made in AI since the early days: from social graphs feeding recommender systems, to monetization (ads), to content moderation - AI powers much of the core functionality across Instagram, Facebook and WhatsApp. It's no surprise that Meta FAIR (Fundamental AI Research) has been investing heavily in LLMs - the most recent public release being Llama 3, an open-source model with leading benchmark results (at the time of writing).
To share the news and more about the model, Zuck went on Dwarkesh Patel's podcast. If you haven't yet watched the video, I'll link it at the bottom of this article. Since it's over an hour long, I wanted to share some takeaways on why this conversation is a great case study in how building great AI products fundamentally means being a great PM.
Find opportunity in unexpected ways
One of the key reasons the Meta team is able to train and serve LLMs is their access to a (currently) scarce resource: GPUs. Zuckerberg talks about ordering GPUs back in 2022, when his team wanted the capacity to train large models for recommending unconnected content. His response to the order was "Let's double that" - there might be something on the horizon we don't know about.
“Part of running the company is knowing there’s always another thing”
This is much like building a great product - being able to look around corners is how you handle unexpected risk, bake in the potential for unexpected gains and ultimately capture opportunity. At the end of the day, Zuckerberg knew the GPUs would be valuable for his team for either inference or training recommender systems - so why not take the risk and capture the potential upside?
“This is going to change all of the products”
Right now, the most common application we see for LLMs is chatbots, and most of them are still not very good. That's because conversations are multi-turn interactions - just responding isn't enough. Figuring out what the RIGHT response should be, instead of just responding, is the hard (and meaningful) part. Chatbots will need to reason - to think, not just talk.
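To make the "think, not just talk" point concrete, here is a toy sketch of a multi-turn chatbot that keeps conversation history and runs a separate planning step before answering. This is purely illustrative - the `Chatbot` class, `_plan`, and `_generate` are hypothetical stand-ins for real LLM calls, not how any Meta product works.

```python
from dataclasses import dataclass, field

@dataclass
class Chatbot:
    """Toy multi-turn chatbot: keeps history and 'plans' before replying."""
    history: list = field(default_factory=list)

    def _plan(self, user_msg: str) -> str:
        # Stand-in for a reasoning pass over the full conversation.
        # A real system would ask the model: given the history, what is
        # the user actually trying to accomplish?
        if "?" in user_msg:
            return "answer_question"
        return "acknowledge"

    def _generate(self, intent: str, user_msg: str) -> str:
        # Stand-in for an LLM call conditioned on history + intent.
        if intent == "answer_question":
            return f"Considering our {len(self.history)} prior turns: ..."
        return "Got it."

    def reply(self, user_msg: str) -> str:
        intent = self._plan(user_msg)                 # think first
        response = self._generate(intent, user_msg)   # then talk
        self.history.append((user_msg, response))     # multi-turn state
        return response
```

The key design choice is that state (history) and deliberation (the plan step) are separate from generation - "just responding" would collapse all three into one call.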
Learn from your early users
Zuckerberg delves further into Llama 3's investment in coding capabilities - it turns out, the team (and really the industry as a whole) realized that LLMs are not only good at next-token code prediction, but that strong coding ability actually strengthens the LLM's performance on other reasoning-based tasks. Llama 3 is also designed to use tools well, a feature that was not available in Llama 2. For Llama 4, Zuckerberg hints the model may be able to create its own tools.
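As a rough illustration of what "using tools well" means in practice, here is a minimal tool-use loop: the model emits a structured tool call, the runtime executes it, and the result is fed back for a final answer. Everything here (`fake_llm`, the tool registry, the prompt format) is a hypothetical sketch, not Llama's actual interface.

```python
import json

# Toy tool registry; a real system would expose many more tools.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def fake_llm(prompt: str) -> str:
    # Stand-in for a tool-capable model: when it needs a tool, it emits
    # a structured call instead of a final answer.
    if "RESULT:" not in prompt:
        return json.dumps({"tool": "calculator", "input": "6 * 7"})
    return "The answer is " + prompt.split("RESULT:")[-1].strip()

def run_with_tools(user_msg: str) -> str:
    prompt = user_msg
    for _ in range(3):  # bounded tool-use loop
        out = fake_llm(prompt)
        try:
            call = json.loads(out)
        except json.JSONDecodeError:
            return out  # plain text means a final answer
        result = TOOLS[call["tool"]](call["input"])  # execute the tool
        prompt = f"{user_msg}\nRESULT: {result}"     # feed result back
    return out
```

The loop structure is what matters: the model decides *when* to call a tool, and the runtime closes the loop by returning results to it.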
The Right Strategy Wins
Building successful products means prioritizing well - you have a budget (time, headcount, energy). That holds especially true when the model is the product. Zuckerberg talks about this in the context of energy constraints vs. capital constraints vs. resource constraints. When I heard this part of the conversation, I couldn't believe I was hearing the CEO of one of the most valuable companies in the world talk about getting energy permits for his data centers. Though Zuckerberg mentions that powering data centers is expensive, his fundamental point is that the bottleneck here is time.
Another revelation is the Llama team's use of synthetic data generation to train the model. Rotating high-value tokens back in (i.e. recycling data during training) is one way to maximize performance gains. However, risk mitigation matters here as well - which behaviors that create product risk won't the team be able to mitigate? In the case of models, the bigger risk is falling behind on the next architecture. Zuckerberg flags a key product tenet: ship the MVP and know what to put in the next rev. For instance, though the current architecture of Llama 3 was able to scale up to 15 trillion tokens, the team still didn't max out performance. Knowing the architecture still had room for improvement, and that a company has a finite set of resources, Meta had to ship the model that worked well enough today in order to reallocate GPUs for training Llama 4.
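To give a flavor of "rotating high-value tokens back in", here is a toy sketch of one epoch of data curation: score synthetic examples, keep the top fraction, and mix them back in with the real data. The `score` function and `keep_ratio` are hypothetical placeholders - real pipelines use far more sophisticated (often model-graded) quality signals.

```python
import random

def score(example: str) -> float:
    # Stand-in for a quality/utility score (e.g. model-graded).
    # Here: fraction of distinct words, so repetitive text scores low.
    words = example.split()
    return len(set(words)) / max(len(words), 1)

def build_next_epoch(real_data, synthetic_pool, keep_ratio=0.5, seed=0):
    """Mix the highest-value synthetic examples back into the real data."""
    ranked = sorted(synthetic_pool, key=score, reverse=True)
    kept = ranked[: int(len(ranked) * keep_ratio)]  # recycle top examples
    mixed = list(real_data) + kept
    random.Random(seed).shuffle(mixed)
    return mixed
```

The point of the sketch is the trade-off the article describes: recycling good data buys performance now, but every GPU-hour spent squeezing the current architecture is one not spent on the next one.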
Check out the whole interview below👇
If you liked this, consider subscribing, sharing and leaving any feedback!