
The One Big Beautiful Bill (OBBB) arrived as an unstructured 900-page document with no standardized layout, no published IRS forms, and a punishing shipping deadline. Intuit's TurboTax team had one question: could AI compress a months-long implementation into days without losing accuracy?
What they built to answer it is a workflow that combines commercial AI tools, a domain-specific language, and a custom unit-testing framework, an approach any domain-constrained development team can learn from.
Intuit’s director of taxation, Joy Shaw, has spent more than 30 years with the company and has lived through both the Tax Cuts and Jobs Act and OBBB. "There was a lot of noise in the law itself, and we were able to extract the tax implications and narrow it down to the individual tax provisions that matter to our clients," Shaw told VentureBeat. "It was really fast using these kinds of distillation tools, and it allowed us to start coding before the forms and instructions came in."
How OBBB raised the bar
When the Tax Cuts and Jobs Act of 2017 passed, the TurboTax team implemented the legislation without AI assistance. It took months, and the exacting accuracy demands left no room for shortcuts.
"We used to have to go through the law and codify sections that refer to other code sections and try to figure it out ourselves," Shaw said.
OBBB came with the same accuracy requirements but with a different profile. At over 900 pages, it was structurally more complex than the TCJA. It came as an unstructured document with no standardized schema. The House and Senate versions used different language to describe the same provisions. And the team had to begin implementation before the IRS published official forms or instructions.
The question was whether AI tools could compress the timeline without compromising the output. The answer required a workflow and tooling that did not yet exist.
From unstructured document to domain-specific code
OBBB was still moving through Congress when the TurboTax team started working on it. Using large language models, the team summarized the House version, then the Senate version, and then reconciled the differences. Both chambers cited the same underlying tax code sections, a consistent anchor point that let the models compare structurally disparate documents.
By signing day, the team had already filtered the provisions affecting TurboTax customers, narrowing them down to specific tax situations and customer profiles. Analysis, reconciliation, and provision filtering went from weeks to hours.
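Intuit has not published its reconciliation pipeline, but the anchoring idea the article describes (group House and Senate language by the tax code sections each paragraph cites, then compare the two chambers section by section) can be sketched with nothing but the standard library. Everything below is illustrative: the regex, the sample text, and the similarity scoring are assumptions, not Intuit's method.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical pattern for IRC citations such as "Section 24".
SECTION_RE = re.compile(r"[Ss]ection (\d+[A-Za-z]?)")

def index_by_section(provisions):
    """Group provision paragraphs by the code sections they cite."""
    index = defaultdict(list)
    for text in provisions:
        for sec in SECTION_RE.findall(text):
            index[sec].append(text)
    return index

def reconcile(house, senate):
    """Pair House and Senate language that cites the same code
    section and score how similar the wording is (1.0 = identical),
    flagging sections where the chambers diverge."""
    h, s = index_by_section(house), index_by_section(senate)
    return {
        sec: SequenceMatcher(None, " ".join(h[sec]),
                             " ".join(s[sec])).ratio()
        for sec in sorted(set(h) & set(s))
    }
```

In a real pipeline, the low-similarity sections are the ones handed to an LLM (or a human analyst) for substantive reconciliation; the section-number anchor just keeps the comparison apples-to-apples.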
ChatGPT and other general-purpose LLMs handled these tasks. But once the work moved from analysis to implementation, those tools hit their limits. TurboTax is not written in a standard programming language: its tax calculation engine is built on a domain-specific language maintained internally by Intuit. Any model generating code for this codebase must translate legal text into a syntax it has never been taught and work out how new provisions interact with decades of existing code.
Claude became the main tool for those translations and for dependency mapping. Shaw said it could identify what had changed and what hadn’t, letting developers focus only on new provisions.
"It was able to integrate with constants and define dependencies on variables," Shaw said. "This accelerated the development process and allowed us to focus only on the things that changed."
Building tools for near-zero error tolerance
General-purpose LLMs got the team to working code. Two purpose-built tools created during the OBBB cycle made that code shippable.
The first tool auto-generated TurboTax product screens directly from the law changes. Previously, developers designed these screens individually for each provision. The new tool handled most of that automatically, with manual customization only where necessary.
The second was a purpose-built unit test framework. Intuit has always run automated tests, but the previous system only returned pass/fail results. When the test failed, developers had to manually open the underlying tax return data file to trace the cause.
"Automation will let you know if you pass or fail, but you’ll have to dig into the actual tax data file to see what could be wrong," Shaw said. The new framework identifies the specific code segment responsible, generates an explanation, and allows the fix to be made within the framework itself.
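The framework itself is proprietary, but the shift it represents, from a bare pass/fail to a diagnosis that names the responsible code, can be sketched. The field-to-rule mapping and the rule name `rule_ctc_2025` below are hypothetical placeholders, not Intuit's identifiers.

```python
from dataclasses import dataclass

@dataclass
class Diagnosis:
    field: str          # output field that mismatched
    expected: float
    actual: float
    rule: str           # calc rule believed responsible
    explanation: str    # human-readable pointer for the fix

# Hypothetical mapping from output fields to the rule computing them.
FIELD_TO_RULE = {"child_tax_credit": "rule_ctc_2025"}

def run_case(engine, case):
    """Run one return through a calc engine and, on mismatch,
    return diagnoses pointing at the responsible rule instead of
    a bare failure. An empty list means the case passed."""
    result = engine(case["inputs"])
    diagnoses = []
    for field, expected in case["expected"].items():
        actual = result.get(field)
        if actual != expected:
            rule = FIELD_TO_RULE.get(field, "<unknown rule>")
            diagnoses.append(Diagnosis(
                field, expected, actual, rule,
                f"{field}: expected {expected}, got {actual}; "
                f"check {rule}"))
    return diagnoses
```

The design point is that the test harness, not the developer, does the tracing from a wrong number back to a code segment.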
Shaw said accuracy for the consumer tax product has to be close to 100 percent. Sarah Aerni, vice president of technology for Intuit’s Consumer Group, said the architecture is designed to deliver deterministic results.
"Having the kinds of capabilities around determinism and verifiable correctness through tests – that leads to that kind of confidence," Aerni said.
The tooling controls the speed, but validation controls the release. Intuit also uses LLM-based evaluation tools to check the AI-generated product, and a tax expert must still assess whether the results of those tools are correct. "Almost anything requires human expertise to confirm and validate it," Aerni said.
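One way to picture this two-stage gate (an automated LLM judge followed by mandatory human sign-off) is as a simple function. The verdict shape and callback signatures below are assumptions for illustration, not Intuit's interfaces.

```python
def review_gate(artifact, llm_judge, human_review):
    """Two-stage validation: an automated LLM judge screens the
    artifact first, then a human expert must confirm before it
    ships. Neither stage alone is sufficient."""
    verdict = llm_judge(artifact)          # assumed {"ok": bool, "notes": str}
    if not verdict["ok"]:
        return ("rejected", verdict["notes"])
    if not human_review(artifact, verdict):
        return ("needs_rework", verdict["notes"])
    return ("approved", verdict["notes"])
```

The ordering matters: the cheap automated check filters the obvious failures so scarce expert time is spent only on plausible candidates.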
Four components that any regulated industry team can use
OBBB was a tax problem, but the underlying conditions are not tax specific. Healthcare, financial services, legal technology, and government contracting teams routinely face the same combination: complex regulatory documents, tight deadlines, proprietary codebases, and near-zero error tolerance.
Based on Intuit’s implementation, four elements of the workflow transfer to other domain-constrained development environments:
- Use commercial LLMs for document analysis. General-purpose models handle parsing, reconciliation, and provision filtering well. This is where they add speed without risking accuracy.
- Switch to domain-aware tools once analysis is done. In a proprietary environment, general-purpose models that do not understand the codebase will produce results that cannot be trusted at scale.
- Build the evaluation infrastructure before the deadline, not during the sprint. Generic automated testing produces pass/fail outputs. Domain-specific testing tools that pinpoint failures and provide in-context fixes are what make AI-generated code shippable.
- Deploy AI tools across the organization, not just engineering. Intuit trains and monitors usage across all functions, Shaw said, so AI fluency is distributed throughout the organization rather than concentrated among early adopters.
"We continue to rely on the capabilities of both artificial intelligence and human intelligence to ensure that our customers get what they need from the experiences we build," Aerni said.




