
At 620 million monthly users, calling a frontier model for every image recommendation isn’t a strategy, it’s a bill. Pinterest CTO Matt Madrigal solved this by gutting Qwen3-VL’s vision layer and rebuilding it with custom embeddings, reducing costs by 90% and increasing accuracy by 30%.
Madrigal’s team invests heavily in customizing open source models, “mostly in-house.”
“If you have really unique data that you can fine-tune with an open source model, the quality of the data will literally exceed the size of the model,” Madrigal said on a recent episode of VB’s Beyond the Pilot podcast.
How Pinterest customized Qwen for visual discovery
With nearly 620 million monthly active users, Pinterest has long used open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP. The company fine-tuned its PinCLIP on the latter, incorporating unique visual embeddings and image metadata.
Pinterest’s conversational shopping assistant, Navigator 1, is built on Qwen3-VL and customized in “significant” ways. Madrigal’s team essentially “unwrapped” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings. This allowed them to capture metadata around pins and images that could then be precomputed offline and regularly retrained on new data to deliver personalized experiences.
“With the open source models, especially the open Apache licenses, where you can really tune a lot of open weights and customize them for unique use cases – that’s where we found open source to be very powerful for us,” Madrigal said.
Bringing in their own embeddings allows his team to capture context around metadata, pins and images; in particular, the model performs better at runtime and inference. Without these embeddings, developers would have to call the model and encode each returned image at runtime, one by one. That results in latency that is “20 times worse” in terms of outcomes, Madrigal said.
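To make the runtime difference concrete, here is a minimal sketch of precomputing image representations (embeddings) offline versus encoding each image when a request arrives. It is an illustration under stated assumptions, not Pinterest’s implementation: `encode_image` is a hypothetical stand-in for an expensive vision-encoder call, and the store is a plain dictionary.

```python
import hashlib

DIM = 8  # toy embedding size; real encoders produce far larger vectors


def encode_image(image_id: str) -> list[float]:
    """Hypothetical stand-in for an expensive vision-encoder call."""
    encode_image.calls += 1
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


encode_image.calls = 0  # counts how often the "model" is invoked


def precompute_offline(image_ids):
    """Offline batch job: encode every pin once and store the result."""
    return {image_id: encode_image(image_id) for image_id in image_ids}


def embeddings_at_request_time(image_ids, store):
    """Serving path: look up stored vectors, encoding only unseen pins."""
    results = []
    for image_id in image_ids:
        vector = store.get(image_id)
        if vector is None:
            vector = encode_image(image_id)  # slow path, one call per image
        results.append(vector)
    return results


catalog = ["pin-1", "pin-2", "pin-3"]
store = precompute_offline(catalog)
calls_after_batch = encode_image.calls

served = embeddings_at_request_time(["pin-2", "pin-3"], store)
calls_during_serving = encode_image.calls - calls_after_batch
```

In this toy run the serving path makes no encoder calls, because every requested pin was already encoded in the batch step. Without the store, each request would pay for one encoder call per returned image.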
“If it’s something that’s going to be critical to our end users, something that’s going to reach over 600 million monthly active users, something that’s going to drive engagement, we’re probably going to build it or open source it and customize its capabilities,” he said.
How the taste graph captures evolving interests
To guide users from inspiration to purchase, Madrigal’s team uses a “taste graph”: a dynamic representation of what individual users actually like, not just what they click. “It’s this representation of the evolving tastes of billions of people,” he said.
When people have a clear idea of what they want, they turn to Google or other search engines; Madrigal said Pinterest is for when they are still in the discovery phase. Pinterest’s goal is to encourage “lateral exploration” and convert discovery into intent (i.e., clicking ads or making a purchase).
Under the hood, the architecture combines graph structure with image learning. User embeddings capture each user’s evolving tastes and are constantly updated based on activity, new content and other signals. “It’s not a social graph,” Madrigal said. “It’s more of a preference graph: What will inspire you? What are you trying to do next?”
For example, one user may engage with mid-century modern designs; another may prefer a Nantucket aesthetic. These preferences are captured in the user embeddings, and the resulting taste graph surfaces specific, relevant products.
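As a rough illustration of how a preference vector could track evolving taste, the sketch below nudges a user vector toward each item the user engages with and then ranks products by cosine similarity. Everything here is an assumption made for the example: the three-dimensional vectors, the product names, the `rate` parameter and the moving-average update are hypothetical, and the graph structure Madrigal describes is not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_taste(user_vec, item_vec, rate=0.3):
    """Move the user vector a fraction of the way toward an engaged item."""
    return [(1 - rate) * u + rate * i for u, i in zip(user_vec, item_vec)]


def rank_products(user_vec, products):
    """Return product names ordered from most to least similar."""
    return sorted(
        products,
        key=lambda name: cosine(user_vec, products[name]),
        reverse=True,
    )


# Hypothetical item vectors: [mid-century, coastal, industrial]
products = {
    "walnut sideboard": [0.9, 0.1, 0.2],
    "rope-handled basket": [0.1, 0.9, 0.0],
    "steel pipe shelf": [0.2, 0.0, 0.9],
}

user = [0.0, 0.0, 0.0]  # no known preferences yet
for engaged in ("walnut sideboard", "walnut sideboard", "steel pipe shelf"):
    user = update_taste(user, products[engaged])

ranking = rank_products(user, products)
```

After two mid-century engagements and one industrial one, the sideboard ranks first and the coastal basket last. A higher `rate` would make the vector react faster to recent activity at the cost of forgetting older preferences sooner.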
“You go from upper funnel, inspiration discovery, lower funnel intent to completion,” Madrigal said.
Listen to the full podcast to hear more about:
- How Pinterest uses sandboxes to encourage creativity in a safe and inclusive way;
- Why a continuous feedback loop can prevent visual AI from drifting;
- The importance of constant benchmarking to measure user engagement, performance, latency, and other factors.
You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.





