Ways to attack Cliff Stamp's reviews - A guide for the thin-skinned

Originally posted by Roadrunner
If you think American beer is bad, try Korean beer sometime :barf: :barf: :barf:. I agree that Budweiser sucks, as does Coors, but MGD isn't bad, nor is Michelob. Corona is a personal favorite of mine, although not many people seem to agree with me on that.

Wait a minute, what was this thread about :confused:? Oh, yeah, Cliff rocks! No, wait, he chops up rocks. Hold on... screw it, I'm gonna go drink a beer.

BEER? Come on, people, you guys should know the only way to go is vodka or rum!
 
Originally posted by mschwoeb
Actually Buck does their heat treat according to the methods that Paul Bos sets out, and for the stuff they want to make real sure of, he does it himself.

Paul Bos might advise Buck on how to heat treat, but he doesn't actually work directly on the heat treating for most of Buck's stuff. Plus, Bos doesn't mark all of Buck's stuff. When I think of a Bos heat treat, I mean something done by Bos (or his employees, I know he doesn't do all of the stuff personally) with the Bos flame logo.
 
Rob,
Your points re: destructive testing are valid, and your examples regarding the professional engineering approach to component selection are valid too! That said, however, you are missing one critical point: destructive testing on a sample of ONE is statistically meaningless!

Your desire for factual knowledge regarding the limit capabilities of a design, a material, or a process simply cannot be satisfied from a SINGLE sample. The suggestion or inference that results from a sample of ONE can lead to CONCLUSIONS with respect to limit capabilities is ludicrous, and THAT point is the issue with Cliff's testing AND his reports. Hype is hype... whether it's promulgated by the knife maker OR the knife tester!

Meaningful results regarding limit capability MUST be arrived at statistically, using a statistically valid sample size. To infer performance, good or bad, from an invalid sample is wrong and should be so acknowledged by the tester in no uncertain terms.
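The sample-size point can be sketched in a few lines of Python. All the numbers are invented for illustration: pretend each draw is the breaking load, in pounds, of one knife off a production line averaging 200 lb with a 20 lb spread.

```python
import random
import statistics

random.seed(42)  # repeatable "experiments"

# Hypothetical production line: breaking loads ~ normal(mean=200 lb, sd=20 lb)
def sample_mean(n):
    """Average breaking load measured by destructively testing n knives."""
    return statistics.mean(random.gauss(200, 20) for _ in range(n))

# Repeat the whole test 1000 times at two sample sizes
means_n1 = [sample_mean(1) for _ in range(1000)]
means_n30 = [sample_mean(30) for _ in range(1000)]

spread_n1 = statistics.stdev(means_n1)    # close to 20 lb: one knife tells you little
spread_n30 = statistics.stdev(means_n30)  # close to 20/sqrt(30), under 4 lb
print(f"spread of n=1 results:  {spread_n1:.1f} lb")
print(f"spread of n=30 results: {spread_n30:.1f} lb")
```

The n=1 "tests" scatter over the full width of the production distribution, which is exactly the objection above; the n=30 averages cluster within a few pounds of the true mean.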
 
Originally posted by revmic
Destructive testing on a sample of ONE is statistically meaningless!
Very true. It is statistically meaningless.

Few of us can buy 30 Fowler Pronghorns, made close in time, as a statistical sample. Few of us can buy even 30 AG Russell Deer Hunters to accumulate a bare minimum statistically significant sample size.

Anyone up for buying 30 Opinels to compare with 30 Kershaw Vapors? Only about $300-$450 for one sample group, and $600 for the other sample group.

You should ask someone who can afford to test like this, say, Sal Glesser, if they test 30 mules at each Rockwell hardness on the CATRA to ensure they are seeing scientifically significant results. I suspect they don't. But you could ask. I won't ask, because I don't think it's a reasonable hurdle to demand for such testing.
Originally posted by revmic
Your desire for factual knowledge regarding the limit capabilities of a design, a material, or a process simply cannot be satisfied from a SINGLE sample. The suggestion or inference that results from a sample of ONE can lead to CONCLUSIONS with respect to limit capabilities is ludicrous, and THAT point is the issue with Cliff's testing AND his reports. Hype is hype... whether it's promulgated by the knife maker OR the knife tester!

Meaningful results regarding limit capability MUST be arrived at statistically, using a statistically valid sample size. To infer performance, good or bad, from an invalid sample is wrong and should be so acknowledged by the tester in no uncertain terms.
What you state is true in a scientific sense.

I will say that people who hold Cliff, or other testers, out as being "very scientific" have overstated or misstated the case. Really, a choice of words that could have been better. Performing tests that have some modicum of repeatability is usually most of what is required for reasonable decision making.

I personally have no desire to test one product for statistical significance (but revmic, if you do, please do so and post your results for BF readers to review)... that would get more to the consistency of that maker's work process, not to the statistical significance of whether a 10V blade at Rc63 was REALLY, statistically, more brittle than a 1095 blade at Rc58. Not whether CPM3V at Rc60 seemed to be a lot tougher than O1 at Rc60, to pick arbitrary examples (I can go to metallurgical tables and make an educated guess to answer this question, given a decent heat treat, anyway). Not whether a thinly ground Stellite blade would stand up to ripping through carpet and plastic jugs vs. a medium-thick grind in 3V.
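There's a blunt numerical side to this: from a sample of one you cannot even compute a spread, let alone compare two steels. A toy sketch, with all edge-retention scores invented purely for illustration:

```python
import math
import statistics

# One knife per steel: no way to estimate variability at all
one_10v_blade = [12.5]  # invented score for a single 10V blade
try:
    statistics.stdev(one_10v_blade)
    spread = "computable"
except statistics.StatisticsError:
    spread = "undefined"  # stdev needs at least two data points
print("spread from one blade:", spread)

# With a few blades per steel you can at least form a crude t-like statistic
scores_10v = [12.5, 13.1, 11.8, 12.9, 12.4]   # invented samples
scores_1095 = [10.2, 11.0, 10.6, 9.8, 10.9]
diff = statistics.mean(scores_10v) - statistics.mean(scores_1095)
se = math.sqrt(statistics.variance(scores_10v) / 5 +
               statistics.variance(scores_1095) / 5)
print(f"difference = {diff:.2f}, about {diff / se:.1f} standard errors")
```

With only one data point, the `StatisticsError` branch fires; with five per group, the difference can at least be weighed against its standard error.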

But save your typing... I have no desire to debate statistical significance here. I don't hold the testers on this forum to scientific hurdles, and don't think that I implied that I did, or that anyone on these forums DOES test to statistical significance. I could be wrong... anyone up for large sample sizes? Any manufacturers maybe?

There tends to be a big gap between an engineering approach and a scientific approach. Both approaches have their place. They can be complementary, or they can both be bandied about as superior to the other, and in the end, some common sense goes a long way.

4-1/2 quick examples to illustrate:

Example 1, from a different industry: An industrial control valve in a high pressure drop, particulate-laden service (say, lime slurry) seems to last for about 2-3 months before it's worn out... won't control to set point, trim is eroded, process is out of control, cycling around. We have run this valve past its limits. It's shot. Would you demand that I test 20-30 of these same valves in this service to be sure I had a statistically significant sample size... to make sure that, e.g., the Masoneilan Camflex valve's 316SS trim set wasn't just having problems originating with some manufacturing flaw? I think not. If I suggested such an approach, I'd be offered a severance package at the next economic downturn. No... based on reasonable knowledge, and not statistical samples, I'd go to Stellite trim, then maybe on to ceramic trim, and we'd have a solution that got me where I needed to be... longer reliability in this service. That's an engineer's approach. It isn't wrong. It isn't stupid. It isn't laden with the chance of making a goof based on insignificant sample sizes. It makes use of knowledge to solve problems.

Example 2, yet another industry: A drug maker invents a new drug. They test it on 3 people; all seem to feel like their arthritis is better. Good test? Of course not. You need a big sample size, double-blind testing, and careful statistical analysis to sort out the placebo effect and to attempt to quantify the effectiveness AND safety (side effects).

Example 3: You buy a Chrysler, live in Phoenix. Drive for 1 year; it sucks, poor reliability, overheats monthly. You sell it and buy another very similar Chrysler, drive for 1 year, and it also irritates you due to reliability problems and overheating. Now what? Insist on driving 28 more Chryslers before making up your mind to buy a Toyota for Phoenix driving? This could be a tire example... tires wear out faster than you care for, so you try 29 more pairs on the front-wheel-drive car to make sure you weren't duped out of a great tire design by a statistically insignificant manufacturing flaw in the first place.

Example 4: A tool bit is only lasting through 30 machined parts before it breaks (let's assume it's a new part design). You install another bit. It goes 35 parts, breaks. Do I test 28 more of these if 30-35 parts isn't adequate or reasonable? No, I go to a tougher and/or stronger tool bit, depending on the failure mechanism. I ran the tool to its limits... must I run 28 more also?

The BF reader will have to decide for themselves what constitutes reasonable testing, as always. And, as always, YOMV.

I'll quote myself from another thread, for revmic and others with a statistical-significance-oriented bent, and/or a bent towards criticism of a person who offers their views and test results on these forums:
Bluntly: if you can do a better job of testing and reporting your findings on any knife, please do so, and post your results here for others to review.
Goose <==> Gander, and so forth.

I'm done. Type away, anyone/everyone... I won't be back to comment on statistical significance.
 
Of course it is unreasonable to expect anyone to purchase 10 Fowlers or 10 Busses or whatever. BUT that fact alone does not lend any credibility whatsoever to the destructive test results of a SINGLE sample. If your premise were correct, companies all over this country, and the world for that matter, would immediately save themselves huge bucks by eliminating their multiple-sample testing!

The fact is, ANY manufactured line, custom or otherwise, produces a normal distribution (bell curve) of results. Given a single sample from the distribution, the probability of that sample being on the high end, the low end, or the middle of the distribution of limits cannot be known with any certainty. As an engineer recommending valve line replacement on the basis of a SINGLE failure, your judgement would indeed be suspect and questioned mightily by the original designers, the valve manufacturer, the bean counters in your own firm, et al. You would not win your case until you could demonstrate the valve's inferiority with a larger sampling.

Opinions are in plentiful supply. They're like belly buttons; everybody has one. Opinions have a very real place in these forums. However, when AN opinion takes precedence because of supposedly valid test results, then those test results should be VALID! My only quarrel, my only point with respect to Cliff's efforts, is that his results receive (by many) the status of FACT when indeed they are not. Cliff has an obligation to make his readers aware of the downside of his data!
 
Originally posted by revmic
My only quarrel, my only point wrt Cliff's efforts is that his results receive (by many) the status of FACT when indeed they are not. Cliff has an obligation to make his readers aware of the down side of his data!

Would there be any persuading you to rephrase that statement?

You are saying that it is not true that Cliff tests out knives and shares his data with others. If I understand the spirit of your phrase, you're saying that many readers are taking Cliff's reviews to be the equivalent of peer-reviewed studies, and that Cliff is under a special obligation to know how others are interpreting his data and to assure the folks who think he's writing peer-reviewed, scientific-journal-quality studies that their beliefs are unfounded. That's not what you wrote, but is that the intent of your phrasing?

So what of experiments? We see some guy take some equipment, make some predictions, perform his experiments, and record his data. How are the lone man's findings less valid? How is the lack of others jumping up to duplicate his experiments his fault? It's not his fault, of course. He lives free enough from regalism and communism to have the opportunity to perform these tests in the first place, and therefore can't impose his knife-testing will on others. Those who would dismiss his findings because they lack independent validation should also be busy duplicating these tests and reporting here whether the duplicated tests repeated those findings or found something else. Since this is the stated desire, the talk should be walked.

The scientific method (see something happen; guess why it happened; use your guess to predict what will happen under similar conditions; perform a test based on your guess; compare found results with predicted results; go back to the drawing board if results disprove the guess) can be performed by a single person with or without flaws, and can be performed by limitless groups with or without flaws (look into the history of isokinetics research, or even nutrition, if you're still enamored with group consensus). An experiment can be scientifically valid without confirmation from outside parties, and facts in general can certainly be valid without external confirmation.

Statistics do have their place. Rob's example about medicine is a perfect one. If you are uncertain about the outcome of a decision, are unable to obtain the data on your own, and can't or won't pass on making the decision, statistics are your friend. Otherwise, statistics tend to be used to convince people their thoughts and ideas aren't valid unless the group agrees. Even here we're seeing at least one writer saying that Cliff Stamp isn't factually testing knives and documenting his findings because not enough of the group has done the same.

So, please, is there any way to convince you to rephrase or reconsider your statement?
 
Thom
The fact that Cliff is a "lone" tester is NOT the point of my comments. My concern is based upon the fact that his tests to destruction involve a "lone" specimen! There is no intention to impugn Cliff's integrity or his opinions or the truth of his results! BUT... testing a SINGLE specimen TO DESTRUCTION proves nothing, and consequently, if performed, should be so presented in his report. The reporter DOES bear some responsibility regarding the impact of his statements. That's why shouting "Fire" in a crowded theater is not protected by free speech laws.
 
Could someone please tell me how many beers I need to drink to be statistically significant?

I want to find out for once and for all what beer is the best, but I want to be scientific.

Hey Cliff, feel free to chime in with some charts and graphs or at least some ~ +/- numbers so that I can figure this stuff out! :eek:

:cool:
 
Originally posted by revmic
The fact that Cliff is a "lone" tester is NOT the point of my comments. My concern is based upon the fact that his tests to destruction involve a "lone" specimen!

Here's the thing, though:

You know it's a lone specimen. With the comments on the out-of-box knife's sharpness, fit and finish, and performance, you know it's a single knife. Where two copies of the same knife were used, you are told right there in black and white (or grey in some older browsers) that two copies of the knife were used.

Originally posted by revmic
There is no intention to impugn Cliff's integrity or his opinions or the truth of his results! BUT... testing a SINGLE specimen TO DESTRUCTION proves nothing, and consequently, if performed, should be so presented in his report.

We're talking about the fascinating world of sharpened mineral composites with handles, are we not? Knives can be flat, concave, or convex ground. Knives can be made of certain types of steels that are hardened and tempered in certain ways. Containing mass and form, knives can have neutral, forward, or rearward balance. The cumulative data gathered from these single and double samples can allow the tester a fair degree of knowledge about what may be expected from a blade made out of a certain metal with a certain grind and center of balance. Read the Pronghorn review either here on Bladeforums or on Cliff's site if you'd like to skip 16+ pages of commentary.

Also, with the single specimens, even though they are often compared and contrasted with similarly designed blades in Cliff's reviews, the performance data gathered on a single specimen is still valuable. You know how that one blade performed under a certain set of conditions. So a 1/4" ribbon cutting force test performed before and after heavy cutting will allow you to gauge the edge retention of that particular knife. It's all typed out and formatted in good old HTML for anyone to see.

Originally posted by revmic
The reporter DOES bear some responsibility regarding the impact of his statements. That's why shouting "Fire" in a crowded theater is not protected by free speech laws.

Cliff's reviews state what Cliff did with the knife or knives Cliff had and Cliff's opinions regarding those knives and tests. Nowhere in Cliff's review of the Busse Ergo Battle Mistress did he say "and the reader can confidently assume that most every other Busse brand Ergo Battle Mistress will perform in this same manner". I can't find similar statements in his other reviews, either. Since he hasn't been making such claims, how would he be irresponsible for not writing their disclaimers? You didn't pen "The Communist Manifesto", and no one in this thread is faulting you for not writing your disclaimer regarding the problems inherent in centralized planning.

Lets go back to an earlier statement you made:

Originally posted by revmic
Meaningful results regarding limit capability MUST be arrived at statistically, using a statistically valid sample size. To infer performance, good or bad, from an invalid sample is wrong and should be so acknowledged by the tester in no uncertain terms.

Let's scrap 'meaningful' as you didn't mention to whom and why it would be meaningful. Since Cliff was testing the limits of the knives in his possession and didn't say that the results carry over to a large percentage of similar knives from the same maker, let's scrap the idea that the limits of the tested knives must be arrived at in a statistical manner. Let's even forget that what constitutes a valid or invalid sample size is arrived at by caprice. What does that leave us with?

It leaves me with acknowledging that you're misrepresenting Mr. Stamp's reviews.

You didn't say:
"I don't enjoy Cliff's knife reviews because he doesn't take what I consider to be a statistically valid sample of a particular knife and record the results of those tests in manner that explicitly states his test results are believed to be indicative of the performance of most of these knives."

Your criticism was that his reviews imply statistically valid performance expectations on a model-for-model basis without using an adequate sample, even though such is never stated in his reviews. Well, at least in the ones I've read. Please point me to the ones that do generalize model performance from insignificant samples if I am incorrect.
 
More often than not, material strength distribution actually follows Weibull statistics, not normal. This is a skewed distribution, so the high side has a steeper slope. From this it follows that a high point is less likely to be a far flyer than a low point.

The pragmatic engineering approach seems to work quite well in practice with knives. I do have a strong background in materials testing, and still, when reading CL's reviews, I don't have a feeling of missing statistics.
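That skew is easy to check with a quick sketch. The scale and shape values below are arbitrary illustrative picks (a Weibull modulus around 10 is in the range often quoted for brittle materials); a high shape value gives the left-skewed curve, with the long tail on the weak side, so the sample mean lands below the median:

```python
import random
import statistics

random.seed(1)

# Weibull strength model: scale (characteristic strength) 100, shape (modulus) 10
strengths = [random.weibullvariate(100, 10) for _ in range(20000)]

mean_s = statistics.mean(strengths)
median_s = statistics.median(strengths)
print(f"mean {mean_s:.1f} < median {median_s:.1f}: the long tail is on the LOW side")
```

In other words, the occasional weak flyer is more likely than an equally extreme strong one, which is why a single surprisingly bad specimen is less informative than it looks.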

His wording could in some cases be clearer (PC statement), but compare his with a politician's and be happy :)

TLM
 
Originally posted by revmic
As an engineer recommending valve line replacement on the basis of a SINGLE failure, your judgement would indeed be suspect and questioned mightily by the original designers, the valve manufacturer, the bean counters in your own firm, et al. You would not win your case until you could demonstrate the valve's inferiority with a larger sampling.
If you truly believe what you wrote, revmic, then it really is best for you, and for our collective society, that you landed in academia. Trust me on this. You'd struggle out in industry where practicality, learning through experience, and common sense (which is after all not so common) prevail over dogma every day.

The statistical toolkit is a useful one. I use basic statistical tools quite frequently. [caution: analogy follows... requires understanding]: But just like the guy who only owns one big and one small hammer in his toolkit, you'll struggle in the real world to build very many things with only one type of tool with which to clobber things.
 
Originally posted by xxo33
Could someone please tell me how many beers I need to drink to be statistically significant?

xxo33, you are missing the point. The number of beers you need to drink to be a statistically significant person isn't the question (you are already significant... you have voting rights, dontcha?)... a better statistical question is "how many times do you need to get drunk on 10 beers (one drunk = one item in the sample) before we can be absolutely certain that 10 beers will get your Blood Alcohol Content (BAC) to, say, the 0.20 drunk level?" ;) And of course you'd have to track BAC to make sure we were hitting near our target, what with variations in food consumption and metabolism and all... and we'd have to calculate the average/mean, median, mode, and standard deviation bands of your BAC and have you perform some kind of standardized, repeatably scored test (say, a driving simulation video game?) to be sure of our results.

And... (drum roll please)... the correct answer is... 30! Yes, sports fans, 30 drinking binges are needed to prove you get drunk after 10 beers (but don't vary the consumption qty or you'll have to build another data set!).

At least, according to a rule of thumb which emanates from the statistical field (of which such rules are rare!), 30 is kind of the minimum sample size. So there are your marching orders! That's 300 beers, or roughly 12-1/2 cases. So get out there! You've got a month of partying ahead to prove you get drunk on 10 beers! Hey, it's cheaper and less stressful than testing Opinels and Kershaw Vapors... stop complaining.
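Keeping the joke going, here's what those marching orders look like as an actual calculation. Every number is invented: assume each binge yields one measured BAC near the 0.20 target with a made-up 0.02 spread.

```python
import math
import random
import statistics

random.seed(7)

# 30 simulated binges, each producing one BAC reading
bacs = [random.gauss(0.20, 0.02) for _ in range(30)]

mean_bac = statistics.mean(bacs)
sd_bac = statistics.stdev(bacs)
# Rough 95% confidence band for the long-run average BAC on 10 beers
half_width = 1.96 * sd_bac / math.sqrt(len(bacs))
print(f"mean BAC {mean_bac:.3f} +/- {half_width:.3f} over {len(bacs)} binges")
```

With only one binge on record, `sd_bac` (and hence the whole interval) is undefined; that's the n=30 rule of thumb in a nutshell.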

Originally posted by xxo33
I want to find out for once and for all what beer is the best, but I want to be scientific.
BRAAHHP. Sorry. This one can't be proven by statistics, since it would be a matter of personal preference... uh, ... well... or can it?... revmic? Maybe we feed the thousands of beers to millions of people and let them vote... now THAT would require some statistical analysis!
 
Yo, Thom Brogan... I joined late here, but think there is another class of argument you might add to your list. This category is somewhat near your category 5, but let's be more scientific and precise about this while we are developing your "Ways to attack Cliff Stamp's reviews - A guide for the thin-skinned" guidelines document:

6. "Cliff isn't qualified to test knives."

This line of argument can manifest itself in numerous ways:

  • 6a. Cliff doesn't use scientific methods in his testing.
  • 6b. Cliff is too scientific. He uses too many non-real-world tests. He oughta use these knives in "real world" ways... that's all that counts.
  • 6c. Cliff doesn't use "real" statistical analysis... his results that show test averages (means) and range bands don't pass statistical muster (sample size, repeatability) for him to generate any kind of standard deviation bands.
  • 6d. Cliff doesn't test a statistically significant number of knives in any given setting to be able to make any pronouncements whatsoever. Statistical dogma requires a BARE minimum of 20, and better 30 or more, knives to be tested in each test before we can draw any real, irrefutable (!?) conclusions.
  • 6e. Cliff isn't a knife maker, and is therefore unqualified to test the performance of any knife.
  • 6f. Cliff doesn't test to any accepted industry standards, e.g. ISO, ANSI, ASME, NIST, etc.

...and one of my personal favorites, although it's more about being *nice* than about being qualified:

  • 6g. Cliff didn't ask permission of the maker/production house, and therefore isn't allowed (qualified?!) to post his testing results.
 
Rob,

I like some of those. Of course, Cold Steel did send Cliff some test knives and I believe that Mr. Glesser is sending him a Temperance fixed blade before too long, but I guess that we could add an addendum attacking them for sending knives made outside of their corporate offices and submitting samples to a lone tester in a country with a statistically invalid number of provinces.

I have a feeling that the man I was attempting to honor and defend is embarrassed as all heck and would rather face a thousand angry critics disparaging his knowledge of cut than read a single post I've made.
 
And then there is item 7:

7. Cliff doesn't know anything about the nature of cut.

I guess that could be "6h" also... i.e. it's about qualifications. But then, who do we look to... who can bestow a B.S. degree in "Nature of Cut", the BSNC, upon the student of cut?

Originally posted by thombrogan
... disparaging his knowledge of cut than read a single post I've made.

Thanks Thom. You may not know cut either, but you know quote... and that is some good quote. ;) :D

Ok... that'll be quite enough fertilizer out of me for one morning. Back to the salt mine...


P.S. Wally and Homer mutated.
 
Statistically insignificant? Bwahahaha! I challenge our didactically inclined brethren to buy a $600 knife that won't cut cold butter unless heated properly (no pun intended) and tell us how insignificant that is. I propose it is statistically likely that a random sample could be representative of what one might expect from a second random sampling, + or - 3% to zero significant digits. For example, I have half a dozen knives and axes, more or less exactly, that Cliff has reviewed and find my experiences mirror his almost similarly, and yet are significantly deviant in use. For instance, Cliff found that a Becker Combat Bowie will break doing a 250# pull-up, + or - his margin of uncalibrated measure, while I found a 185# handsome devil with Durashocks on could only do 4 pull-ups, while a 160# whippersnapper, in tennis shoes for cryin' out loud, started suspecting he could clean my clock after completing 12 reps. That could become statistically significant indeed, if one were to extrapolate the results beyond the inherently obvious, which is, of course, that certain knives are not designed, as such, to cut butter. I wonder how many experiments with different digits one needs to conclude with statistical significance that pissing into the wind yields unpleasant results. I submit, however unscientifically, that a reasonable conclusion could be inferred on one hand while the other was still full, however insignificantly.
 
Did someone say.........BEER?!?!?!?

(In your best Homer Simpson voice) "Mmmmmmmmmm.....beer."
 
Mmmm...
[attached image: homer gotbeer.jpg]
 

Rob,

Do you think you could glue a miniature toy pistol so that your new avatar can be armless, but not unarmed?

Drinking beer on Mondays. What a horrible idea now that the stores are closed! :)
 