“Fisher first assumed that fertilizer caused no difference — the “no effect” or “null” hypothesis. He then calculated a number called the P value, the probability that an observed yield in a fertilized field would occur if fertilizer had no real effect. If P is less than .05 — meaning the chance of a fluke is less than 5 percent — the result should be declared “statistically significant,” Fisher arbitrarily declared, and the no effect hypothesis should be rejected, supposedly confirming that fertilizer works.”
Yet fertilizer does work. Turns out Fisher was right in his conclusions. DOE!
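For anyone curious what that recipe actually looks like, here's a toy sketch in Python: made-up yield numbers and a simple permutation test, not Fisher's actual data or method.

```python
import random

# Made-up plot yields (bushels/acre), purely illustrative numbers
fertilized   = [44, 47, 46, 50, 45, 48, 49, 46]
unfertilized = [41, 43, 40, 44, 42, 39, 43, 42]

observed_diff = (sum(fertilized) / len(fertilized)
                 - sum(unfertilized) / len(unfertilized))

# Null hypothesis: fertilizer makes no difference, so the group labels are
# arbitrary. Shuffle the labels many times and see how often chance alone
# produces a difference at least as large as the one observed.
pooled = fertilized + unfertilized
n_fert = len(fertilized)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = (sum(pooled[:n_fert]) / n_fert
            - sum(pooled[n_fert:]) / (len(pooled) - n_fert))
    if diff >= observed_diff:
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
# p < .05 puts it past Fisher's (admittedly arbitrary) cutoff
```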
“But in fact, there’s no logical basis for using a P value from a single study to draw any conclusion. If the chance of a fluke is less than 5 percent, two possible conclusions remain….”
Uhhh, sure, flukes can happen. I mean, I suppose one year the crop yield could be mysteriously better for no known reason; on the other hand, maybe it was just the fertilizer. You know, we should really stop using fertilizer, because Fisher’s tests were really inconclusive.
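That said, the fluke rate is easy to put a number on. Simulate a pile of experiments where the treatment truly does nothing and the .05 cutoff still flags roughly 5% of them. A quick sketch, with a crude z-style test and made-up numbers:

```python
import math
import random
import statistics

random.seed(1)

def two_sample_p(a, b):
    """Crude two-sided, z-style p-value; good enough for a toy demo."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return math.erfc(abs(diff / se) / math.sqrt(2))

experiments = 2_000
false_positives = 0
for _ in range(experiments):
    # Both groups come from the SAME distribution: the null is true by construction.
    control = [random.gauss(42, 3) for _ in range(50)]
    treated = [random.gauss(42, 3) for _ in range(50)]
    if two_sample_p(control, treated) < 0.05:
        false_positives += 1

print(f"'significant' results with no real effect: {false_positives / experiments:.1%}")
# Lands around 5%: flukes show up at about the rate the cutoff allows.
```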
“That test itself is neither necessary nor sufficient for proving a scientific result,” asserts Stephen Ziliak, an economic historian at Roosevelt University in Chicago.
So um, where’s the proof then? How do you demonstrate that adding fertilizer actually did anything? Your scientific result may be that the crops yielded more, but how do you know it was significant, or just a fluke?
“A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: “This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.”
Unfortunate, sure, but this isn’t a textbook on statistics. On the other hand, this is muddied a bit, because it’s completely OK to say that there is a statistically significant difference between two distributions at 95% confidence. Which again doesn’t prove anything definitively; isn’t that why they call it statistics, you know, “probability”?
“That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. A dog may bark 5 percent of the time even if it is well-fed all of the time.”
OK, OK. You’re a bonehead, dude. Let’s suppose that a dog is awake for 12 hours a day (that’s according to statistics), and 5% of that waking time is spent barking. So the dog spends 0.6 hours barking (that’s a lot of barking if you ask me; why don’t you try feeding him?). I think this guy is trying to say that if a dog barks 5% of the time even when it’s well fed, and there’s a 5% error in the test, then there’s no way to determine with confidence whether or not we can attribute the barking to food. But that’s not how the test would work, because 5% of its “time” spent barking is 100% of its total bark time. So the question is, what percentage of that time is spent barking over the food bowl (for example)? In other words, 100% of the area under the bell curve (in this case) represents 5% of its day; but we don’t care that it’s only 5% of its day, we just care about the total bark time. But hey, nice try…
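For what it’s worth, here is what the “transposed conditional” complaint cashes out to with completely made-up numbers for the dog; it’s just Bayes’ rule, and the hypothetical probabilities below are mine, not the article’s.

```python
# Made-up numbers: the point is only that the two conditionals can differ a lot.
p_bark_given_fed    = 0.05   # a well-fed dog still barks now and then
p_bark_given_hungry = 0.80   # a hungry dog barks a lot
p_hungry            = 0.10   # prior: the dog is rarely hungry

p_fed  = 1 - p_hungry
p_bark = p_bark_given_fed * p_fed + p_bark_given_hungry * p_hungry

# Bayes' rule: flipping the conditional requires the prior.
p_hungry_given_bark = p_bark_given_hungry * p_hungry / p_bark

print(f"P(bark | well-fed) = {p_bark_given_fed:.2f}")
print(f"P(hungry | bark)   = {p_hungry_given_bark:.2f}")
# With these numbers a bark means about a 64% chance of hunger, not 95%;
# the answer depends on the prior, which the 5% figure alone doesn't give you.
```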
“Another common error equates statistical significance to “significance” in the ordinary use of the word. Because of the way statistical formulas work, a study with a very large sample can detect “statistical significance” for a small effect that is meaningless in practical terms.”
That’s not a “common error,” gimme a break. It’s just drug companies trying to sell drugs. It’s Statistics 101 to know about practical vs. statistical significance, and Marketing 101 to know how to take advantage of the word “significant.”
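Still, the large-sample point itself is easy to demonstrate: crank the sample size up far enough and even a microscopic difference clears the .05 bar. A rough sketch with an invented 0.1-point effect on a noisy 100-point scale:

```python
import math
import random
import statistics

random.seed(2)

def two_sample_p(a, b):
    """Crude two-sided, z-style p-value; fine for illustration."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return math.erfc(abs(diff / se) / math.sqrt(2))

# A truly tiny effect: a 0.1-point shift on a measurement that averages 100
# with a standard deviation of 10.
for n in (100, 10_000, 1_000_000):
    control = [random.gauss(100.0, 10.0) for _ in range(n)]
    treated = [random.gauss(100.1, 10.0) for _ in range(n)]
    print(f"n = {n:>9,}: p = {two_sample_p(control, treated):.4f}")
# At n = 1,000,000 the p-value drops far below .05, yet a 0.1-point shift on
# a 100 +/- 10 scale is meaningless in practical terms.
```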
I especially liked Milbank’s interview. Thanks a bunch!