I knew it. I just knew it. I knew I couldn’t get through October, a.k.a. Breast Cancer Awareness Month, without a controversial mammography study to sink my teeth into. And I didn’t. I suppose I should just be used to this now. I’m referring to the latest opus from H. Gilbert Welch and colleagues that appeared in the New England Journal of Medicine last night, Breast-Cancer Tumor Size, Overdiagnosis, and Mammography Screening Effectiveness. Yes, it’s about overdiagnosis, something I’ve blogged about more times than I can remember now, but it’s actually a rather interesting take on the issue.
Before 2008 or so, I never gave that much thought to the utility of mammographic screening as a means of early detection of breast cancer and, more or less, accepted the paradigm that early detection was always a good thing. Don’t get me wrong. I knew that the story was more complicated than that, but not so much more complicated that I had any significant doubts about the overall paradigm. Then, in 2009, the United States Preventive Services Task Force (USPSTF) dropped a bombshell with its recommendation that mammographic screening begin at age 50 rather than age 40 for women at average risk of breast cancer. Ever since then, there have been a number of studies that have led to a major rethinking of screening, in particular screening mammography and PSA testing for prostate cancer. It’s a rethinking that affects discussions even up to today. After all, it was only six days ago that I pointed out how Ben Stiller almost certainly gave too much credit to PSA testing for having saved his life from prostate cancer. Basically, screening is not the panacea that we had once hoped for, and the main reason is the phenomenon of overdiagnosis. Before I go on, though, remember that we are talking about screening asymptomatic populations. If a woman has symptoms or a palpable lump, none of this discussion applies. That woman should undergo mammography.
Basically, overdiagnosis is a phenomenon that can confound any screening program in which large populations of asymptomatic patients are subjected to a diagnostic test to screen for a disease. The basic concept is that there can be preclinical disease that either does not progress or progresses so slowly that it would never threaten the life of the patient within that patient’s lifetime, yet the test picks it up. Because we don’t have tests that can predict which lesions picked up by such a screening test will or will not progress to endanger the patient, physicians are left with little choice but to treat each screen-detected lesion as though it will progress, resulting in overtreatment. This situation is very much the case for mammography and breast cancer, for which there is evidence that as many as one in five to one in three screen-detected breast cancers (as opposed to cancers detected by symptoms or a mass) are overdiagnosed. As a result, physicians are much less confident in traditional recommendations for screening mammography than we once were. Add to that the phenomenon of lead time bias, in which earlier detection doesn’t actually impact survival but only gives the appearance of prolonged survival, as I described recently in more detail. Similarly, due to the phenomenon of length bias (also described by yours truly recently), mammography also tends to preferentially detect slower growing tumors.
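To make lead time bias concrete, here is a minimal sketch with entirely made-up numbers. The ages below are hypothetical and come from no study; the only point is that moving the date of diagnosis earlier lengthens "survival after diagnosis" even when the date of death does not budge.

```python
def years_survived_after_diagnosis(age_at_diagnosis: int, age_at_death: int) -> int:
    """Survival as it is usually reported: time from diagnosis to death."""
    return age_at_death - age_at_diagnosis

# Hypothetical patient (illustrative assumption, not data from any trial):
AGE_AT_DEATH = 65           # death from the cancer, the same in both scenarios
AGE_SYMPTOM_DETECTED = 60   # age at diagnosis if she waits for a palpable lump
AGE_SCREEN_DETECTED = 56    # age at diagnosis if screening finds it first

survival_symptom_detected = years_survived_after_diagnosis(AGE_SYMPTOM_DETECTED, AGE_AT_DEATH)
survival_screen_detected = years_survived_after_diagnosis(AGE_SCREEN_DETECTED, AGE_AT_DEATH)
lead_time = AGE_SYMPTOM_DETECTED - AGE_SCREEN_DETECTED

print(f"Survival after symptom detection: {survival_symptom_detected} years")
print(f"Survival after screen detection:  {survival_screen_detected} years")
print(f"Apparent gain (pure lead time):   {lead_time} years, with death still at {AGE_AT_DEATH}")
```

In this toy scenario the screened patient appears to survive four years longer, yet she dies at exactly the same age. That is why mortality in the screened population, not survival time after diagnosis, is the fairer yardstick for a screening test.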
With that background in mind, let’s take a look at Welch’s latest. The basic idea behind the study is rooted in the key assumption behind mammography, which is that the detection of small tumors that have not yet become palpable will prevent progression and, over time, lead to fewer large or more advanced tumors being diagnosed. Welch et al also note the difference between efficacy (how well a treatment or screening test works in randomized clinical trials) and effectiveness (how well an intervention works when “unleashed” in the community):
Although it may be possible to show the efficacy of screening mammography in reducing cancer-specific mortality in the relatively controlled setting of randomized trials, those trials may not accurately reflect the actual effectiveness of screening when it is used in clinical practice. Differences between efficacy and effectiveness with respect to the benefit of screening may be particularly stark when the treatments administered in practice have markedly changed from those administered in the trials that led to the implementation of widespread screening. Furthermore, although trial data may provide an assessment of some negative consequences of screening, such as false positive results and associated diagnostic procedures, such assessments may understate what actually occurs when screening is implemented in the general community. The collection of data regarding other harms, such as overdiagnosis (i.e., tumors detected on screening that never would have led to clinical symptoms), requires additional long-term follow-up of trial participants, and those data are often either not available or they reflect patient follow-up and testing practices from decades earlier.
This actually reflects a key controversy in breast cancer treatment. We know that breast cancer mortality has been steadily declining since 1990 or so, roughly 30% since then. The controversy is not over whether breast cancer mortality is declining. It is. The controversy is over what’s the cause: screening, better treatment, or some combination of the two. Indeed, Welch et al even note that in models used by the Cancer Intervention and Surveillance Modeling Network the estimates of the contribution of screening to the observed reduction in breast-cancer mortality range from as little as 28% to as much as 65%. To approach this question, Welch et al decided to take a very simple approach. At least, the question is simple. They decided to look at a metric that’s been measured for many years: The size of breast cancer tumors at the time of diagnosis. The hypothesis, of course, is that mammography should produce a shift towards smaller tumors. So Welch looked at breast cancer diagnoses in the SEER Database from 1975 to 2012, which encompasses the time period before the advent of mass mammographic screening in the US, the period during which screening programs were implemented, and the period after. Size at diagnosis was recorded and divided into the following groups:
There were a fair number of complexities, the main one being the need to correct for missing tumor sizes in the database, which were common decades ago but became less common as time went on. Without going into the details, I can point out that the results were as follows:
At first glance, this looks as though mammography is doing exactly what it’s supposed to be doing. Notice how, beginning in the early 1980s, there was a shift in the distribution of tumor size at diagnosis from larger tumors to smaller tumors. For example, the combination of in situ and tumors less than 1 cm increased from 11% to 40%, while the percentage over 3 cm in size decreased from 38% to 18%. So far, so good, right?
The observation here is that the increase in the number of small tumors was considerably greater than the decrease in the number of large tumors. Indeed, the results look very much like the results of Welch’s last study, which compared the incidence of advanced versus early cancers and found basically the same thing: The introduction of mammographic screening was associated with a greater increase in the incidence of early cancers than there was a decrease in the incidence of more advanced cancers. Thus, we have fairly consistent results showing in two different studies that, while the introduction of mammographic screening appears to have resulted in a decrease in the incidence of larger/more advanced tumors, it resulted in a far larger increase in the diagnosis of smaller/less advanced tumors. In the last study, Welch estimated the rate of overdiagnosis to be around 30%. What about in this study?
This was the magnitude of the shift:
However, this shift in size distribution was less the result of a substantial decrease in the incidence of large tumors and more the result of substantial increases in the detection of small tumors (Figure 2B). Nevertheless, modest decreases were seen in the incidence of large tumors. The changes in size-specific incidence of breast cancer after the introduction of screening mammography are shown in Table 1. The incidence of large tumors decreased by 30 cases of cancer per 100,000 women (from 145 to 115 cases of cancer per 100,000 women), and the incidence of small tumors increased by 162 cases of cancer per 100,000 women (from 82 to 244 cases of cancer per 100,000 women). Assuming that the underlying burden of clinically meaningful breast cancer was unchanged, these data suggest that 30 cases of cancer per 100,000 women were destined to become large but were detected earlier, and the remaining 132 cases of cancer per 100,000 women were overdiagnosed (i.e., 30 subtracted from 162).
This is an estimate of overdiagnosis even greater than what previous studies have found: roughly 80% (132 of 162) of the additionally detected tumors were overdiagnosed. The estimated decrease in mortality attributable to mammography was 12 per 100,000 women in the earlier time period after mammographic screening was introduced. In more recent years, with better treatment, the estimated reduction in mortality was smaller, around 8 per 100,000. In comparison, the estimated reduction in mortality due to better treatment was 17 per 100,000. Thus, overall, better treatment has reduced mortality from breast cancer more than screening has. However, that’s not to say that mammography doesn’t save lives. It does. It’s just that the effect is more modest than previously believed.
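For anyone who wants to check the arithmetic, here is a short sketch that reproduces the overdiagnosis estimate from the incidence figures quoted above. The variable names are mine, and the calculation rests on the same assumption Welch et al state: that the underlying burden of clinically meaningful breast cancer did not change over the period.

```python
# Size-specific incidence per 100,000 women, as quoted from Welch et al above.
large_before, large_after = 145, 115
small_before, small_after = 82, 244

decrease_in_large = large_before - large_after   # 30
increase_in_small = small_after - small_before   # 162

# Assumption (the authors'): stable underlying disease burden, so each large
# tumor that "went missing" is one small tumor that was caught earlier.
detected_earlier = decrease_in_large
overdiagnosed = increase_in_small - detected_earlier   # 132

overdiagnosis_fraction = overdiagnosed / increase_in_small

print(f"Detected earlier: {detected_earlier} per 100,000")
print(f"Overdiagnosed:    {overdiagnosed} per 100,000")
print(f"Share of additional small tumors overdiagnosed: {overdiagnosis_fraction:.0%}")
```

Note that the result is a fraction of the additional small tumors found after screening was introduced, not of all breast cancers diagnosed, and it is only as good as the stable-burden assumption behind it.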
Here’s a video in which Welch explains his results:
Given the randomized controlled clinical trials that show a much larger reduction in breast cancer mortality due to screening mammography, why is it that more recent studies like this one show a much more modest effect of screening? Well, it’s frequently the case that “real world” effectiveness is less than what is found in clinical trials; so this observation should not come as a surprise. It should also not be a surprise that breast cancer treatment has been getting better and that that might make mammography less useful than it was 30 years ago. It’s also very complicated, as Welch points out:
There is no perfectly precise method to assess the population effects of cancer screening. Screening mammography performed in an asymptomatic population that has an average risk of cancer can, at best, have only a small absolute effect on cancer-specific mortality because the vast majority of women are not destined to die from the target cancer. Because the mortality effect is necessarily delayed in time, the availability of improving cancer treatment over time further complicates the assessment of the contribution of screening. Inferences regarding overdiagnosis are equally imprecise since overdiagnosis cannot be measured directly.
One notes that this study is also imprecise. As Joann G. Elmore, MD, MPH notes in an accompanying editorial, Welch et al rely on data with extensive missing values, forcing them to make assumptions about underlying disease burden that cannot be verified, which is why they acknowledge that their estimates are imprecise. She also notes:
We are using archaic disease-classification systems with inadequate vetting and defective nosologic boundaries. Diagnostic thresholds for “abnormality” need to be revised because the middle and lower boundaries of these classification systems have expanded without a clear benefit to patients. Disease-classification systems are often developed by experts on the basis of a small number of ideal cases and are then adopted broadly into clinical practice — a system that is antithetical to the scientific process. The National Academies of Sciences, Engineering, and Medicine recently deemed improvement of the diagnostic process “a moral, professional, and public health imperative.”10 Rigorous analytic methods are required for the development of disease nosologies, and physicians need more sophisticated tools to improve diagnostic precision and accuracy. At the patient level, we need better methods of distinguishing biologically self-limited tumors from harmful tumors that progress.
In cancer, I consider that last need to be the most critical of all. We require biological markers that tell us which of these tumors detected by mammography are safe to keep an eye on through “watchful waiting” and which are dangerous. Until we have those tools, overdiagnosis will remain a problem.
Here’s the thing. People will see what they want to see in this study. Those who believe screening saves lives will argue that the small benefit observed is worth the cost of overdiagnosis or will try to argue that Welch greatly overestimates how much overdiagnosis there is. Even so, there is a broad consensus that overdiagnosis is a problem. Indeed, new suggested mammography guidelines that have been recommended in the last few years have come about because of a desire to decrease overdiagnosis and overtreatment while maintaining early detection of potentially dangerous cancers. In contrast, the mammography “nihilists,” as I like to call them, will point to this study as saying that screening mammography is useless. However, science can never really fully answer whether a woman should undergo mammography or whether mass screening programs are worthwhile. The reason is that it boils down to a value judgment: Is a small decrease in one’s chance of dying of breast cancer worth the risk of overdiagnosis and harm from overtreatment? Many women will answer yes. Different women will make different choices, and different people will have different opinions.