Originally posted by revmic
Destructive testing on a sample of ONE is statistically meaningless!
Very true. It is statistically meaningless.
Few of us can buy 30 Fowler Pronghorns, made close in time, as a statistical sample. Few of us can buy even 30 AG Russell Deer Hunters to accumulate the bare minimum sample size for statistical significance.
Anyone up for buying 30 Opinels to compare with 30 Kershaw Vapors? Only about $300-$450 for one sample group, and $600 for the other sample group.
You should ask someone who can afford to test like this, say, Sal Glesser, whether they test 30 mules at each Rockwell hardness on the CATRA to ensure their results are statistically significant. I suspect they don't. But you could ask. I won't ask, because I don't think it's a reasonable hurdle to demand of this kind of testing.
Originally posted by revmic
Your desire for factual knowledge regarding the limit capabilities of a design, a material, or a process simply cannot be satisfied from a SINGLE sample. The suggestion or the inference that results from a sample of ONE can lead to CONCLUSIONS regarding limit capabilities is ludicrous, and THAT point is the issue with Cliff's testing AND his reports. Hype is hype....whether it's promulgated by the knife maker OR the knife tester!
Meaningful results regarding limit capabilities MUST be arrived at statistically, using a statistically valid sample size. To infer performance, good or bad, from an invalid sample is wrong and should be so acknowledged by the tester in no uncertain terms.
What you state is true in a scientific sense.
I will say that people who hold Cliff, or other testers, out as being "very scientific" have overstated or misstated the case. Really, that's a choice of words that could have been better. Performing tests that have some modicum of repeatability is usually most of what is required for reasonable decision making.
I personally have no desire to test one product for statistical significance (but revmic, if you do, please do so and post your results for BF readers to review) ... that would get more to the consistency of that maker's work process, not to the statistical significance of whether a 10V blade at Rc63 was REALLY, statistically, more brittle than a 1095 blade at Rc58. Not whether CPM3V at Rc60 seemed to be a lot tougher than O1 at Rc60, to pick arbitrary examples. (I can go to metallurgical tables and make an educated guess to answer this question anyway, given a decent heat treat.) Not whether a thinly ground Stellite blade would stand up to ripping through carpet and plastic jugs vs. a medium-thick grind in 3V.
But save your typing... I have no desire to debate statistical significance here. I don't hold the testers on this forum to scientific hurdles, and don't think that I implied that I did, or that anyone on these forums DOES test to statistical significance. I could be wrong... anyone up for large sample sizes? Any manufacturers maybe?
There tends to be a big gap between an engineering approach and a scientific approach. Both approaches have their place. They can be complementary, or each can be bandied about as superior to the other, and in the end, some common sense goes a long way.
4-1/2 quick examples to illustrate:
Example 1, from a different industry: An industrial control valve in a high-pressure-drop, particulate-laden service (say, lime slurry) seems to last for about 2-3 months before it's worn out... it won't control to set point, the trim is eroded, and the process is out of control, cycling around. We have run this valve past its limits. It's shot. Would you demand that I test 20-30 of these same valves in this service to be sure I had a statistically significant sample size... to make sure that, e.g., the Masoneilan Camflex valve's 316SS trim set wasn't just having problems originating with some manufacturing flaw? I think not. If I suggested such an approach, I'd be offered a severance package at the next economic downturn. No... based on reasonable knowledge, and not statistical samples, I'd go to Stellite trim, then maybe on to ceramic trim, and we'd have a solution that got me where I needed to be... longer reliability in this service. That's an engineer's approach. It isn't wrong. It isn't stupid. It isn't laden with the chance of making a goof based on insignificant sample sizes. It makes use of knowledge to solve problems.
Example 2, yet another industry: A drug maker invents a new drug. They test it on 3 people, and all seem to feel like their arthritis is better. Good test? Of course not. You need a big sample size, double-blind testing, and careful statistical analysis to sort out the placebo effect and to attempt to quantify the effectiveness AND safety (side effects).
Example 3: You buy a Chrysler and live in Phoenix. You drive it for 1 year; it sucks, with poor reliability, overheating monthly. You sell it and buy another very similar Chrysler, drive it for 1 year, and it also irritates you due to reliability problems and overheating. Now what? Insist on driving 28 more Chryslers before making up your mind to buy a Toyota for Phoenix driving? This could be a tire example... the tires wear out faster than you care for, so you try 29 more pairs on the front-wheel-drive car to make sure you weren't duped out of a great tire design by a statistically insignificant manufacturing flaw in the first place.
Example 4: A tool bit is only lasting through 30 machined parts before it breaks (let's assume it's a new part design). You install another bit. It goes 35 parts, then breaks. Do I test 28 more of these if 30-35 parts isn't adequate or reasonable? No, I go to a tougher and/or stronger tool bit, depending on the failure mechanism. I ran the tool to its limits... must I run 28 more also?
The BF reader will have to decide for themselves what constitutes reasonable testing, as always. And, as always, YOMV.
I'll quote myself from another thread, for revmic and others with a statistical-significance-oriented bent, and/or a bent toward criticizing people who offer their views and test results on these forums:
Bluntly: if you can do a better job of testing and reporting your findings on any knife, please do so, and post your results here for others to review.
Goose <==> Gander, and so forth.
I'm done. Type away, anyone/everyone... I won't be back to comment on statistical significance.