
Enterprise AI teams face a dilemma: The best models today may not be the best models a year from now. MassMutual’s answer is to hedge long-term bets and build infrastructure that can change models as the market changes.
“The world of artificial intelligence today is extremely dynamic,” Sears Merritt, MassMutual CIO, explained in the new VB Beyond the Pilot podcast. “We wanted to make sure we were positioned to ride that wave of dynamism.”
The strategy is paying off in a big way. MassMutual has measured a nearly 30% increase in developer productivity, and its AI-powered contact center workflows have cut resolution times to under 10 minutes while reducing costs from dollars to cents.
But the broader lesson for IT leaders may be less about outcomes and more about how companies thoughtfully build their AI infrastructure and keep users at the center.
Maintaining optionality for tomorrow’s opportunities
MassMutual works with leading vendors, but keeps those relationships on an hourly basis. “These relationships are limited so we’ve maintained an optionality for best-of-breed tools because things have matured in that space and settled and stabilized at a certain point,” Merritt said.
This philosophy also applies to open source models. Merritt says his team is “100%” looking at open source tools and sees the technology playing a big role in how MassMutual (and similar companies) use AI.
“Of course, we will need frontier models and advanced capabilities to do what is impossible today, and possible tomorrow,” he said.
Measuring results from the start
MassMutual’s AI efforts fall into two broad categories.
The first focuses on enablement: Putting productivity-enhancing tools like Copilot and virtual assistants in the hands of all employees. The second involves what Merritt describes as “deepening and focusing” initiatives, where teams target a specific workflow or business process that will have a strong impact on advisors, underwriters or employees.
Instead of focusing on adoption metrics, these projects start with predefined success criteria. “Everything we do is measured,” Merritt said. “There’s always a benchmark of success that we set beforehand to determine whether or not we’re going to expand some of these.”
The company also encourages deliberate experimentation, giving employees access to a number of best-in-class models, “token-consumer workflows” and other capabilities so they can measure the benefits against “simpler, lower-cost” large language models (LLMs).
At the same time, MassMutual is collecting increasingly detailed analytics on usage patterns, developer workflows, model performance and costs. The goal is to build operational intelligence that routes each workload to the right model based on cost, response quality, and user experience.
These insights will ultimately drive optimization decisions around model routing, operational selection, response times, and infrastructure design.
“We’re getting access to analytics that allow us to look at usage patterns in great detail, developer workflows, and start to understand who’s using what, when, and for what kinds of tasks,” Merritt said.
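One way to picture routing on cost, response quality and response time is the minimal Python sketch below. It is an illustration only, not MassMutual's implementation: the model names, prices, quality scores and latencies are hypothetical, and the rule (pick the cheapest model that clears a quality floor and a latency ceiling) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model stats gathered from usage analytics."""
    name: str
    cost_per_request: float  # assumed unit: US dollars
    quality: float           # assumed scale: 0.0 (worst) to 1.0 (best)
    latency_s: float         # typical response time in seconds


def route(models, min_quality, max_latency_s):
    """Return the cheapest model meeting the quality floor and latency ceiling."""
    eligible = [
        m for m in models
        if m.quality >= min_quality and m.latency_s <= max_latency_s
    ]
    if not eligible:
        raise ValueError("no model satisfies the quality and latency limits")
    return min(eligible, key=lambda m: m.cost_per_request)


# Illustrative numbers only; not taken from the article.
MODELS = [
    ModelProfile("simple-llm", cost_per_request=0.002, quality=0.70, latency_s=0.5),
    ModelProfile("frontier-llm", cost_per_request=0.030, quality=0.92, latency_s=2.5),
]

# A routine task tolerates lower quality, so the cheaper model is selected.
routine = route(MODELS, min_quality=0.6, max_latency_s=1.0)

# A demanding task raises the quality floor, so the costlier model is selected.
demanding = route(MODELS, min_quality=0.9, max_latency_s=5.0)
```

In a setup like this, the thresholds per workload would come from the predefined success criteria, and the profile values from the collected analytics.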
Why MassMutual sometimes chooses the more expensive model
Another interesting aspect of MassMutual’s approach is how it evaluates the quality of its AI. Instead of focusing solely on benchmarks or token spending, the company uses what Merritt calls a “trust account” framework.
The process combines user feedback with operational metrics to understand how employees perceive AI-generated responses and whether those responses actually improve results.
The contact center redesign tested this framework. During development, staff were given access to two different LLMs. One produced real-time responses, but the quality was noisier. The other, more expensive option took a few extra seconds to respond, but consistently provided higher-quality responses.
Merritt’s team asked users about response quality, their preferred model, and their overall opinion of the experience. Conventional wisdom and the pace of business might suggest users would prefer the faster model, but they overwhelmingly chose quality.
Often users said, “We want the more expensive one. We’re willing to wait, but the quality difference is so great that the extra two seconds is actually worth it to us.”
This feedback ultimately determined which model MassMutual deployed.
“We incorporated that experience into our decision-making process and it led us to say that on a relative basis the costs are insignificant, so we’re going to use a more sophisticated model,” Merritt said.
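The decision described above, weighing user preference against extra cost and wait time, can be sketched roughly as follows. This is a hypothetical illustration, not the “trust account” framework itself: the vote counts, ratings, latencies, costs and tolerance thresholds are all invented for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PilotResult:
    """Hypothetical pilot data for one candidate model."""
    name: str
    quality_ratings: list[float]  # user ratings, assumed 1-5 scale
    preferred_votes: int          # users who named this model as their favorite
    latency_s: float              # typical response time in seconds
    cost_per_request: float       # assumed unit: US dollars


def choose(a, b, max_extra_cost, max_extra_latency_s):
    """Pick the model users prefer, unless its premium exceeds the stated limits."""
    preferred, other = (a, b) if a.preferred_votes >= b.preferred_votes else (b, a)
    extra_cost = preferred.cost_per_request - other.cost_per_request
    extra_latency = preferred.latency_s - other.latency_s
    if extra_cost <= max_extra_cost and extra_latency <= max_extra_latency_s:
        return preferred
    return other


# Illustrative numbers only; not taken from the article.
fast = PilotResult("fast-llm", [3.0, 3.5, 2.5], preferred_votes=12,
                   latency_s=0.5, cost_per_request=0.01)
slow = PilotResult("quality-llm", [4.5, 5.0, 4.0], preferred_votes=38,
                   latency_s=2.5, cost_per_request=0.05)

# Average rating per model, for reporting alongside the decision.
avg_quality = {r.name: mean(r.quality_ratings) for r in (fast, slow)}

# Users accept a few extra seconds and a small premium: the preferred model wins.
tolerant = choose(fast, slow, max_extra_cost=0.10, max_extra_latency_s=3.0)

# With a strict one-second latency budget, the faster model is kept instead.
strict = choose(fast, slow, max_extra_cost=0.10, max_extra_latency_s=1.0)
```

The point of the sketch is that user preference drives the choice, while cost and latency act only as guardrails.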
Listen to the full podcast to hear more about:
- Why Mythos has “completely changed” the cybersecurity landscape, not in the type of threats, but in the speed at which these threats appear;
- How a team of AI engineers modernized MassMutual’s core framework in 7 days (previously a process that would have taken 3 months);
- Why MassMutual eschewed tokenmaxxing specifically to curb AI usage and costs and went “unlimited” to protect against cost shocks;
- How a “multi-environment type” agent will support AI.
You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.





