Yesterday a new article was published in the prestigious Journal of the American Medical Association (JAMA). As a statistician, I find the fact that this article was published in such a prestigious journal in its current form somewhat surprising and definitely disturbing.

The key result reported in the article is that male physicians taking multivitamins seem to develop cancer less often than similar male physicians taking a placebo. In the multivitamin group, 1290 of 7317 (17.6%) subjects developed cancer, compared to 1379 of 7324 (18.8%) subjects in the placebo group. The difference is 1.2%, with a 95% confidence interval of [-0.05%, 2.45%] (note that my simple calculation of the difference in proportions is not statistically significant, unlike the survival analysis in the manuscript).
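As a sanity check on the quoted interval, here is a minimal calculation using the Wald (normal-approximation) confidence interval for a difference in proportions; the counts are taken from the paragraph above, while the Wald method is my choice for illustration, not necessarily the method used in the paper.

```python
from math import sqrt

# Observed cancer counts from the trial
placebo_events, placebo_n = 1379, 7324
vitamin_events, vitamin_n = 1290, 7317

p_placebo = placebo_events / placebo_n   # ≈ 0.188
p_vitamin = vitamin_events / vitamin_n   # ≈ 0.176

diff = p_placebo - p_vitamin             # ≈ 0.012, i.e. 1.2 percentage points

# Wald 95% confidence interval for the difference in proportions
se = sqrt(p_placebo * (1 - p_placebo) / placebo_n
          + p_vitamin * (1 - p_vitamin) / vitamin_n)
lower, upper = diff - 1.96 * se, diff + 1.96 * se
print(f"difference = {diff:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval straddles zero (it reproduces the quoted [-0.05%, 2.45%]), which is why the simple difference in proportions is not statistically significant on its own.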

The issue I have with the paper is one of multiplicity. A simple comparison with the planned objectives and analyses of the study on clinicaltrials.gov indicates that there were 3 primary objectives in this study. Four secondary objectives are additionally stated, one of which is whether multivitamins reduce the risk of cancer. It seems fair to conclude that since the publication is addressing one of the secondary endpoints, all seven outcomes were considered in the study. It is not entirely clear from the publication how many of the seven outcomes were statistically significant at the 5% level. The stated p-value is 0.04 for the main analysis in the publication. This must be interpreted in the light of all of the planned analyses in the study. A p-value of 0.04 indicates that the probability of seeing this result, if there is no actual difference between the treatments, is one in twenty-five. However, if there were 7 such analyses, then the probability of observing at least one result as impressive as the one reported would be much higher – possibly as high as one in four (i.e. quite likely).
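The "one in four" figure can be sketched as follows, under the simplifying assumption that the seven analyses are independent:

```python
# Probability of at least one p-value as small as 0.04 across 7
# independent tests when no true treatment effect exists anywhere
p_single = 0.04
n_tests = 7
fwer = 1 - (1 - p_single) ** n_tests
print(f"{fwer:.3f}")   # ≈ 0.249, i.e. roughly one chance in four
```

The real analyses are correlated (total cancer includes prostate cancer, for example), so this is only a rough guide, but it shows how quickly a "1 in 25" result becomes unremarkable with seven chances to find it.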

To make matters worse, the authors have also reported adverse events based on statistical significance tests. I assume that with such a large group of patients there would have been a multitude of different types of adverse events reported (hundreds?). Were all of these different event types tested? In that case, you would expect to find many statistically significant differences just by chance (a statistically significant result only indicates that such an extreme observation would arise by chance one time in twenty).

It is clear that this publication would not have demonstrated a statistically significant result if the fact that the study had so many objectives had been taken into account. As such, the conclusions of the study should not be considered conclusive, and in my opinion should not encourage anyone to take multivitamins to avoid developing cancer. I would, however, highly recommend this paper to those teaching statistics in universities as a case study in what not to do.

The clinical trials entry that you link to states that the 3 primary objectives were to measure 1. prostate cancer, 2. total cancer and 3. cardiovascular events. This JAMA report is therefore concerned with primary objectives 1 & 2, not solely a secondary objective as your article above claims. Primary objective 3 was already reported (result negative) in a separate article in JAMA http://jama.jamanetwork.com/article.aspx?articleid=1389595.

Colin,

Thanks for your reply. I would like to clarify the trial objectives. From the web entry linked to above, they are:

Primary

•To determine whether vitamin E every other day reduces the risk of developing prostate cancer in older healthy male physicians.

•To determine whether daily vitamin C and/or a multivitamin reduces the risk of total cancer in these participants.

•To determine whether vitamin E every other day, vitamin C daily, or a multivitamin daily reduces the risk of major cardiovascular events in these participants.

Secondary

•To determine whether vitamin E and/or multivitamins reduce the risk of developing total cancer, colon cancer, and colon polyps in these participants.

•To determine whether vitamin E, vitamin C, or multivitamins reduce the risk of myocardial infarction and stroke in these participants.

•To determine whether vitamin E, vitamin C, or multivitamins reduce the risk of age-related macular degeneration or cataract in these participants.

•To determine whether vitamin E, vitamin C, or multivitamins reduce the risk of early cognitive decline in participants aged 65 and over.

Even if it were the case that there were only three objectives, this gives me an opportunity to explain the type I error further: incorrectly rejecting the null hypothesis (or more simply, concluding a treatment effect when one doesn’t actually exist). The scientific community frequently declares something to be a “true” effect if the observed outcome would be unlikely by chance alone. In practice, a cutoff of 1 in 20 is usually used – that corresponds to a probability, or p-value, of 0.05. Imagine a trial with 3 objectives, where each objective is declared a “true” treatment effect if the observed result is more extreme than would occur 1 time in 20 when there is actually no difference between the treatments. Taking an extreme case, if you had 100 outcomes being tested, you would likely find around 5 that you would – falsely – determine to have a “true” effect. In our case, even with three objectives, it isn’t appropriate to use a simple cutoff of 1 in 20 (or p<0.05). A simple adjustment (“Bonferroni”) is to divide the cutoff by the number of tests – here giving a cutoff of 0.05/3 ≈ 0.0167, which would change the conclusion of this paper. There are other methods available, though it is my opinion that any method of accounting for the multiplicity would render this result statistically “non-significant”.
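The Bonferroni adjustment described above amounts to a one-line calculation:

```python
# Bonferroni adjustment: divide the significance cutoff by the number of tests
alpha = 0.05
n_objectives = 3
bonferroni_cutoff = alpha / n_objectives   # ≈ 0.0167

reported_p = 0.04                          # the main result in the paper
print(reported_p < bonferroni_cutoff)      # False: not significant after adjustment
```

The reported p-value of 0.04 clears the unadjusted 0.05 threshold but not the Bonferroni-adjusted one, which is exactly why the adjustment changes the paper's conclusion.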

The other point of note in the objectives is that there are multiple treatments, as well as placebo in the trial. This causes further concern in terms of the number of significance tests that are being carried out.

To clarify further, if this were a trial of a new drug, there is no way that the government regulators around the world would accept this result without the multiple tests being taken into account. Here's a statement from the European regulators, which has some useful information…

http://www.ema.europa.eu/ema/pages/includes/document/open_document.jsp?webContentId=WC500003640

Note the quote in the conclusion "concern is focused on the opportunity to choose favourable results from multiple analyses".

It would seem to me intuitively that the adjustment for multiple initial objectives is necessary to counter a claim that one drug tested on 1000 people for 20 different conditions will likely turn up one positive which may be false, the positive then being trumpeted and the 19 negatives ignored.

In this case, however, 3 different treatments were used for 3 different conditions. Two were negative and one positive but all were published with equal prominence. I don’t see why the positive result is any less valid than if you did three trials consecutively each with one objective.

Of course, to do single objective trials would be incredibly wasteful when you have 14,000 good subjects and it takes 14 years for each?

If you could explain where I am wrong, in layman’s terms, I would be very grateful.

Hi Colin,

Just a clarification: there were 4 treatments tested in 3 diseases (and that was only the primary objectives).

If a trial has 3 primary objectives, and for each objective you will conclude that a treatment has a “real effect” whenever the results are so extreme that, were there no real effect, you would observe such a big difference only one time in twenty (i.e. the p-value is less than 0.05), then the chance of finding at least one “real effect” somewhere in the trial when there is no underlying treatment difference is considerably higher than 1 in 20. If the chances of getting each of the diseases were completely unrelated to one another, the probability of at least one “statistically significant” result would be close to 3 in 20 – much more likely than 1 in 20. In practice, the chance of experiencing any cancer is related to prostate cancer, but accounting for that would require some more complex statistics.
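Under the (simplifying) independence assumption, the exact figure sits just below the 3-in-20 bound:

```python
# Chance of at least one "significant" result across the 3 primary
# objectives when no treatment has any real effect, assuming the
# tests are independent
alpha = 0.05
n_objectives = 3
p_at_least_one = 1 - (1 - alpha) ** n_objectives
print(f"{p_at_least_one:.4f}")   # ≈ 0.1426, just under the 3/20 = 0.15 bound
```

The Bonferroni figure of 3/20 is an upper bound; the independent-tests calculation gives roughly 1 in 7, still far short of the 1-in-20 standard.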

The bottom line is that if all of these vitamin treatments have exactly no effect, the chance of seeing the results that were observed is actually quite high.

Thanks for the clarification. I think I am beginning to get it.

What bothers me a little is that the “skeptic statistician community” likes to say: “one in twenty results at a 95% CI will be wrong”. Then, when a positive result turns up, they criticize the method, criticize the research team, criticize the publishers etc. Why not just accept that it is the 1 in 20, point to another 19 which are negative (including 6 or 7 in this study itself), and move on?

I wouldn’t be at all skeptical if:

1) this study had one primary objective that would be tested with one p-value in order to make a decision if a treatment worked

OR

2) if there were more than one objective, the proper statistics were applied.

Since neither of these happened, and there are actually a LOT of comparisons of treatments, groups of patients, and subgroups of patients, it is unfortunately pretty much impossible to know whether to believe the result that was observed. I am involved in designing trials on a daily basis to attempt to prove that drugs work, and these are bread-and-butter issues for any statistician. It is my opinion that this trial is poorly designed rather than poorly reported.

Great discussion! I have found myself having very similar discussions before.

I found this article an interesting read.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1112991/