Comments on Statistical Analysis

When we’re looking at data, it is natural for us to find patterns. However, we have a tendency to find patterns even in randomness. Because of biological and measurement variability, one group may appear different from another even when there is no real difference, and we may mistake this for a true effect.

To determine which patterns are real, we can use statistical hypothesis testing: we assess whether a pattern is supported by the data strongly enough to say that it is unlikely to have occurred by chance alone.

The key word is unlikely, not impossible. We can still get false positives, so we choose an acceptable probability of a false positive (called the significance level, traditionally 5%) that does not put an unnecessarily heavy burden of proof on ourselves.

This is all well and good, but a problem emerges when we test multiple patterns at once; for example, when testing whether many SNPs are associated with a disease. Since we’re testing each one at a false positive rate of 5%, the probability of at least one false positive grows with the number of tests. That means that if we test our data in this manner and find some associations, we won’t be able to tell which ones are real patterns and which ones are false positives, and our conclusions are unreliable.
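To get a feel for how quickly this grows, here is a minimal sketch in Python. It assumes the tests are independent and that every null hypothesis is true, which is a simplification (nearby SNPs are often correlated); the function name is made up for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent tests,
    each run at significance level alpha, assuming every null hypothesis is true."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all m tests avoid one with probability (1 - alpha) ** m.
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(familywise_error_rate(0.05, m), 3))
```

Under these assumptions, 20 tests at the 5% level already give roughly a 64% chance of at least one false positive.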

To rectify this problem and separate the real associations from the false positives, there are several procedures we can use, called multiple comparison procedures. The simplest is to test each hypothesis at a more stringent significance level, chosen so that the probability of at least one false positive across all the tests is no more than 5%.
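One common way to pick that more stringent level is the Bonferroni correction, which divides the overall significance level by the number of tests. The sketch below is only an illustration of that idea: the function name and the p-values are made up, and it is not the procedure used in any particular paper.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether its null hypothesis is rejected
    after a Bonferroni correction at overall significance level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Testing each hypothesis at alpha / m keeps the probability of at
    # least one false positive across all m tests at or below alpha.
    threshold = alpha / m
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for five SNPs (illustrative only).
    p_values = [0.001, 0.012, 0.03, 0.2, 0.6]
    print(bonferroni_reject(p_values))
```

With five tests the per-test threshold becomes 0.01, so of the three p-values below the uncorrected 0.05 cutoff, only the first survives the correction.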

I bring up this rather simple and dry point of statistics because some authors seem to think that an appropriate solution to this problem is to simply ignore it and report dishonest conclusions. In one such paper, the authors report that a proper multiple testing procedure found that none of the tested SNPs were associated with diabetes or hypertension, but they still conclude that their (likely spurious) results are statistically significant and even title their paper “Aryl hydrocarbon receptor nuclear translocator-like (BMAL1) is associated with susceptibility to hypertension and type 2 diabetes”.

Science attempts to find the truth and is based on careful examination of good evidence. Papers like these do not.

