No Surprise: The Volume of NSA Disputes Was Predictable
How the No Surprises Act’s Volume Crisis Was Baked In From the Start
“Flooding the system” is a line you read in almost every piece of journalism about the No Surprises Act (NSA) and efforts by doctors to obtain fair reimbursement under its Independent Dispute Resolution (IDR) process. Recent academic and insurer studies have tried to recast use of the IDR process as profiteering, presenting the discrepancy between initial estimates and real-world usage as “abuse.” HaloMD has repeatedly faced these claims. While this talking point appears academic and evidence-based, it is premised on a foundation of bad data.
A few years ago, in Health Affairs Forefront, I argued that one of the most persistent implementation challenges for the NSA has been volume — not because the law is flawed, but because the methodology the federal government used to estimate its use was. This is not my subjective opinion but objective fact: avoidable mistakes in methodology led to a dramatic underestimation of how many disputes the system would actually need to process.
That miscalculation continues to shape the debate today. It’s been repeated by well-intentioned observers, but weaponized by disingenuous opportunists.
If we want to improve the functionality of the NSA and the IDR process, we have to start by revisiting how the tri-departments — Health and Human Services, Labor, and Treasury — arrived at their original projections. There will be math, but stick with me.
A Modeling Error That Built the Backlog
Before the NSA went into effect, the tri-departments estimated that the federal IDR system would see roughly 17,000 disputes annually.
In reality, the system received 489,000 disputes in its first 14.5 months — an annualized run rate of roughly 405,000 disputes per year. It’s currently on track to surpass 2 million annual disputes.
That is not a marginal miss. It is an orders-of-magnitude error. The question is: why?
The answer is that the departments based their estimate entirely on New York’s IDR experience. They took the approximately 1,000 annual disputes observed in New York and scaled them nationally by New York’s share of the employer-sponsored and privately insured population, roughly 5.7 percent of the national total, yielding the 17,000 figure.
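That extrapolation can be reproduced in a few lines. This is a sketch using the approximate figures cited above, not the departments’ actual model:

```python
# Reproducing the tri-departments' New York-based extrapolation,
# using the approximate figures cited in the text.

ny_annual_disputes = 1_000    # annual IDR disputes observed in New York
ny_share_of_insured = 0.057   # New York's ~5.7% share of the privately insured population

national_estimate = ny_annual_disputes / ny_share_of_insured
print(round(national_estimate))  # ~17,500 — the basis for the ~17,000 figure
```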
So, what’s wrong with this approach? First, when setting the denominator for the volume estimate, the departments failed to distinguish New Yorkers enrolled in state-regulated plans (and therefore subject to the state’s balance billing law) from those in plans regulated by the federal Department of Labor, which cover the majority of Americans and are NOT subject to state law. But much worse, the methodology ignored the fact that New York’s law was not an apples-to-apples model.
New York’s law required the use of FAIR Health data as the benchmark for out-of-network reimbursement and effectively carved out emergency medicine disputes by tying those out-of-network payments to an independent external standard known as “usual and customary” rates. The federal NSA did not adopt that structure. Instead, disputes across all relevant medical specialties, including emergency medicine, are resolved by an independent arbitrator that must consider a range of factors when making its decision.
That difference matters.
New York’s system structurally eliminated disputes from the specialty most likely to result in an out-of-network visit: emergency medicine. The federal system does not, and unsurprisingly, over half of all federal disputes involve emergency medicine. Forget apples to oranges; this is more like comparing apples to omelets.
The Model That Was Ignored
Texas offered a more relevant model, and usage data was readily available.
Texas’ balance billing law had:
An arbitration process,
No carve-out or anchor for emergency medicine,
A structure more generally analogous to the federal NSA.
In its first year after enactment in 2020, the Texas Department of Insurance received nearly 49,000 dispute resolution requests. That law applied to approximately 5.8 million Texans.
Let’s walk through the math, using the exact same underlying methodology used by the departments, but substituting Texas’ data for New York’s. Texas’ law applied to about 5.8 million Texans enrolled in state-regulated insurance plans. At the time, the national employer-sponsored and privately insured population was approximately 183 million people, so:
5.8 million covered Texans ÷ 183 million insured nationally ≈ 0.0317, meaning Texas’ experience represents about 3.17% of the relevant national insured population
49,000 disputes ÷ 0.0317 ≈ 1,546,000 estimated annual disputes
That is the volume the federal government might have anticipated had it selected Texas as its baseline rather than New York.
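The same two-step scaling can be run with Texas as the baseline. Again, this is a sketch using the approximate figures above:

```python
# The same scaling methodology, with Texas substituted as the baseline.

tx_annual_disputes = 49_000     # dispute requests in the Texas law's first year
tx_covered = 5_800_000          # Texans in state-regulated plans
national_insured = 183_000_000  # national privately insured population at the time

tx_share = tx_covered / national_insured           # ~0.0317 (about 3.17%)
national_estimate = tx_annual_disputes / tx_share
print(round(national_estimate))                    # ~1.55 million annual disputes
```

The only input that changed is the baseline state, yet the projection moves from roughly 17,000 to roughly 1.55 million — two orders of magnitude.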
Instead, the system was built for 17,000.
The Consequences of Underestimating Volume
When a system is built for 17,000 disputes and receives hundreds of thousands, predictable failures follow:
Misaligned administrative cost projections
Uninformed policymaking
Eligibility delays
Backlogs measured in months, not days
These failures are then mischaracterized as the result of “abuse” rather than modeling error. Volume has become a controversy. But the volume was predictable.
Overall Impact
The debate around the IDR process has often framed high volume as evidence of provider overuse or something even more nefarious.
But if the system was underbuilt from the beginning because it relied on a structurally suppressed model, then volume is not an aberration; it is something we could have forecast.
Texas should have been the focus because:
It applied to all of the same specialties as the NSA
It utilized an IDR process structured similarly to the NSA’s
Usage data was readily and publicly available
The proposed reforms to strengthen IDR infrastructure — including enhanced eligibility determination and improved portal transparency — are necessary. But they also reflect a recognition that the original capacity assumptions were flawed.
We cannot fix what we misdiagnose. Those who continue to rely on a misdiagnosis to treat a problem should be viewed with significant skepticism, as they are either uninformed or intentionally attempting to deceive.
If we continue treating volume as abuse rather than as a foreseeable outcome of flawed modeling, we risk undermining the very reforms the NSA created.
