What's The Forum's Opinion Of Mad Dog?

Originally posted by Cliff Stamp:
As for the standard of 30 test items, one of the first things you learn in statistical analysis is how to adjust for non-normal behavior in small samples, because large samples are not practical for day-to-day work. A while back we ordered a small cylinder of very expensive gas; when it was cooled, it ruptured at the nozzle. We did not buy an extra 29 bottles and see if they all failed. This one failure was reported, which resulted in a change of the design to handle low-temperature cooling. They also replaced the gas we lost because of the rupture with a new bottle.

-Cliff

The adjustment you made is not a statistical adjustment but a common-sense adjustment. There is a difference. If you have a zero-fail policy, then it doesn't take a math maven to figure out that one failure is unacceptable. Likewise, if something fails for an obvious reason, it does not take a stat whiz to determine that something is wrong and needs to be changed. And just because something is cheaper to do, that doesn't make it statistically valid. Let's not confuse economic priorities with the mathematics of statistics.



------------------
Hoodoo

The low, hoarse purr of the whirling stone—the light-press’d blade,
Diffusing, dropping, sideways-darting, in tiny showers of gold,
Sparkles from the wheel.

Walt Whitman
 
Hoodoo, you are correct that a sample of 2 would not normally be of value, in regular testing. However, we are not talking about population or typical probabilities. We have many knowns here that enter the equation. First, flat grind: while excellent for edge geometry, it is very poor in the strength department. Second, very low-tech carbon steel, which can actually be a plus in some cases. Third, differential temper, very tricky for the reasons I have stated umpteen times. Fourth, very high hardness at the edge. Fifth, hard chrome, bla, bla, bla. While a plus in some cases, most of these factors are negatives. You then have two failures at well below standard stress levels. TWO in a row. You then have similar damage on a smaller scale on two additional ATAKs. So now we have 5 negative points and 4 failures, two of which were catastrophic (for lack of a better term) and two of which were moderate, most likely due to the user backing off to avoid a scene similar to what Cliff went through (my guess only, of course). Man, I'm starting to feel like a broken record.

Like I said, there is plenty of proof. Hoodoo, have you gone back and read all the old posts from a year ago or more in both forums?

Like I said before, I really like the design of the MD knives and would love to own one. However, I would want to back off on the hardness, no HC, full tang screwed in, and maybe some better steel. Is that really too much to ask?
 
Originally posted by Cobalt:
Hoodoo, you are correct that a sample of 2 would not normally be of value, in regular testing. However, we are not talking about population or typical probabilities.

Cobalt,
I think you have misread my posts. My argument is purely a statistical one. I'm not saying that Cliff's data are not of value. Quite the contrary. They're very useful for making inferences, and if you check some of my previous posts, I stated that I tend to agree with his inferences. But statistical analysis is a funny thing: it has fairly rigid rules. If the rules aren't met, you can't make a statistical claim of validity. (I say fairly rigid because statisticians speak in terms of "robustness." A particular test is robust if you can violate some of its assumptions and still get valid results.)

Here is an example of a statistical claim: "It is clear from the data (p<0.05) that MD knives are junk." Note the p value. It says that the probability that the results of the test occurred by chance alone is less than 5%. In other words, the results are statistically valid and not likely due to chance events. This is the only point I'm trying to make. Cliff's tests cannot make this claim because the p value is a calculated value. Without a p value, you have no statistical claim to fame. It's the statistical law, and if you violate it, the statistical police will come to your house in the middle of the night, spirit you away, and force you to do derivatives until you die.
:)
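The p-value arithmetic being described can be sketched concretely. Below is a minimal exact one-sided binomial test; the "acceptable" per-knife failure rate of 5% is an invented number for illustration, not anything claimed in this thread:

```python
from math import comb

# Hypothetical sketch of a p value calculation: an exact one-sided binomial
# test. The assumed "acceptable" failure rate p0 = 0.05 is invented for
# illustration; it is not a number from this thread.
def binom_pvalue(failures, trials, p0):
    """P(at least `failures` failures in `trials` trials if the true rate is p0)."""
    return sum(comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
               for k in range(failures, trials + 1))

p = binom_pvalue(2, 2, 0.05)  # 0.05 ** 2 = 0.0025, well under the 0.05 cutoff
```

Under that assumed rate, two failures in two trials would itself be a very improbable (p = 0.0025) event - though, as the post says, no such value was actually computed for the knife tests.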


But again, just because we can't do statistical analysis on data does not make the data useless. It just means that we have to interpret the data in a subjective manner. And, depending on the data and supporting evidence, there is nothing wrong with that. Science does not always move forward based on p values.
:)


Maybe I'm getting nitpicky here, but this is bread and butter to me. I don't publish unless my data show p<0.05. That's the scientific standard for a claim to fame, and it's the standard I have to live by in my own profession. But I'm starting to regret I brought it up because I think some people are misinterpreting what I'm saying.

------------------
Hoodoo

 
Originally posted by Cobalt:
Hoodoo, I think we agree on the general idea.

I think you're right.
:D


------------------
Hoodoo

 
Ok, it seems we have some question of whether or not Cliff having two knives fail is enough to prove anything. It appears that some say he would have to test a lot more the same way to have any meaningful results. Now, on the flip side of that, what if Cliff had said, "WOW! Those two Tusks I tested are the best, toughest knives around"? Would the very same people say, "Two is not enough to say they are good knives"?

How many knives from anybody should you have to test to draw a conclusion that's not faulty? Over the years, I have had about 5 Spyderco Enduras. To me, they were all fine knives, and quite durable for a knife of that type. Are the 5 that I have owned not enough to draw the conclusion that the Endura is a good knife? What if I had broken them all the same way? Then would 5 be enough to say they were bad?

Richard
 
Hoodoo, I also think you are right. I think you have taken the flavor of most of the posts on statistics and distilled them down to "sense." Thanks.

Bruce Woodbury
 
Richard, you have hit on an important point: never once have I heard McClung or anyone else question a commentary that was positive. It is never done. However, when the results are not promotional in nature, well, let's change our viewpoint on interpretation. This of course is severely biased and is done to skew the resulting population of reviews. McClung himself has often sent out one prototype for Hilton to comment on (which he does very well). Has it ever been suggested to him that he needs to make 30 and have 30 different people comment on them? What about when Jerry Hossom recently switched to CPM-3V over A2, with very good results described in the review forum? I never saw one post telling Jerry or Mario that their reasoning was flawed and they needed to make quite a few more blades.

As for the statistics, several important points are being overlooked. First of all, the population is non-normal; it is extremely non-normal. This is controlled by several factors, the major one of which is the QC at every step along the way. McClung has described that he does work on all the blades out of his shop, including flex work on all the big blades. Of course, besides the actual QC, there is the inherent examination during the making of the blade. I would strongly bet that if you looked at the behavior of blade performance, it would be distributed roughly normally in the immediate vicinity of the mean, say one standard deviation, but extreme narrowing would occur after that.

To be more specific, let's look at my uncles cutting boards. They are all carpenters with 40+ years of experience. I have worked many times with them, sheeting up houses and other such work. Never once have I had to return a board to be recut ("measure twice, cut once"). Not once. The simple fact is that it will not happen, because they do checks themselves automatically, as it has been drilled into them for over 40 years. Now there is variation, of course: pencil lines are of varying thickness, saw widths differ, so does placement of the cut on a pencil mark. But it is non-normal; there is no data at all outside of the immediate range of the mean. None. If you took one or two examples of their measuring skills, you would see exactly the same thing as you would see in 100 examples.

As an even simpler task, ask a class full of students to measure a rod with a ruler calibrated in mm, so that they have to estimate the 1/10 division. Once again you will see a normal distribution in the 1/10 region, but assuming competent, careful measurement, no one will be off by more than a mm. It simply will not happen. If it does, it indicates incompetence in the individual - which, as I noted before, is a worse status marker than a particular blade's performance anyway.

So yes, sample from a normal distribution and you can see very skewed results with small sample sizes - not probable, but it can happen. Sampling from extremely variance-restricted populations does not have this problem.
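Cliff's "cropped population" argument can be illustrated with a quick simulation: if QC rejects (or reworks) anything outside a fixed window, no sample, however large, can land outside that window. The target value, spread, and QC limit below are arbitrary made-up numbers:

```python
import random

# Quick simulation of a "cropped" population: QC rejects anything outside a
# fixed window around the target, so no draw can fall outside it. The target
# value (60), spread, and QC limit are arbitrary illustrative numbers.
def qc_sample(n, mean=60.0, sd=1.0, limit=1.5, seed=42):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(mean, sd)
        if abs(x - mean) <= limit:   # outside the window -> rejected/reworked
            out.append(x)
    return out

draws = qc_sample(1000)
spread = max(draws) - min(draws)     # can never exceed 2 * limit = 3.0
```

Even 1000 draws stay inside the QC window by construction, which is the sense in which one or two samples from such a population already tell you most of the story.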

One further note: the one restriction that needs to be added to the above is that the maker needs to be able to exclude the possibility of extreme outliers - steel flaws and such. This is why I am in full support of makers when they request that blades being criticized for flaws be turned in for inspection. You have to allow them this right. Their commentary must be used along with the data you collect for a final interpretation. To be specific to the TUSK: McClung has already stated that the behavior of the second TUSK was to be expected - he examined it and found no flaw in the materials or the work. Based on this and the restricted variance of the population, it does represent the abilities of the TUSKs.

-Cliff

[This message has been edited by Cliff Stamp (edited 04-25-2000).]
 
What does my post have to do with knives? Maybe nothing, but here goes
:)


There's another type of statistics, called "Bayesian," that is of very different mathematical origin than the so-called "classical" statistics. Bayesian analysis doesn't use hypothesis tests or p values. But what seems to me its greatest practical advantage is that it allows (actually, requires) you to quantify your previous knowledge into something called a "prior distribution." Test observations are combined in a mathematically formal way with the prior distribution to produce the, you guessed it, "posterior distribution." Inferences about the population are then based on the posterior.

So what? Well, Bayesian statistics doesn't force you to pretend you don't know anything about the test subject before you collect data. We already know, or strongly believe, that hard metals are also brittle. So when we observe a knife that is both hard and brittle, it adds to what we already know. We strongly believe that experienced metal artists like Mad Dog can achieve consistent results. So when we observe two strongly similar articles, it confirms our prior beliefs. Notice that we don't have to start out by saying, "Suppose we don't know anything about the hardness-brittleness relationship or quality knife makers. Now collect 30 samples..." You start with knowledge, observe, and then update your knowledge.
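The prior-to-posterior update described above can be sketched with the simplest conjugate pair: a Beta prior on a failure rate combined with binomial observations. The prior Beta(1, 9) (a prior mean failure rate of 10%) and the observed data are invented purely for illustration:

```python
# Minimal Beta-Binomial sketch of the prior -> posterior update described
# above. The prior Beta(1, 9) (a prior mean failure rate of 10%) and the
# observation of 2 failures in 2 tests are invented for illustration.
def update_beta(alpha, beta, failures, survivals):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + failures, beta + survivals

a, b = update_beta(1, 9, failures=2, survivals=0)
posterior_mean = a / (a + b)   # 3 / 12 = 0.25, up from the prior mean of 0.10
```

You start with knowledge (the prior), observe (two failures), and the posterior mean shifts accordingly - no 30-sample rule required.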

Would I buy a Mad Dog? Heck yeah! Would I pry with it? Not now! Would I chop all day and expect it to hold a great edge? You betcha!

BTW - Two guys I work with just bought BM710s with M2 steel. They were downright dull out of the box. Forget the toilet paper test. These knives couldn't pass the heavy catalog cover test! I dropped one on the concrete and the blade chipped in 3 places. Does Benchmade make their M2 too hard?

So don't use your Benchmade M2 or Mad Dog TUSK for prying. That's what God invented screwdrivers for!



------------------
David

Life is good in Hollywood, Maryland!
 
I would pry with my Mad Dogs. That's one of the main reasons I bought a 3/16 Pack Rat to replace my 1/8 Lab Rat, more lateral strength. I won't pry with the 62 RcH edge though, I'll use the 54 RcH spine.
 
Point taken. Thank you. Hey, is this the longest Bladeforums thread ever?


------------------
David

 
Cliff,

By non-normal, I'm assuming you mean there should be zero or next to zero variance in the population. But I don't think the analogy holds up. Cut a board. That's one task. Build a knife. That's many tasks, each with its own associated variance. Thus we would expect more variance in any task that requires multiple steps compared to a single task.

Granted, from experienced knifemakers you would expect a high degree of repeatability (is Kevin the only one involved in making his knives?), i.e., low variance, and you may be right in your assumption of this. What you are seeing in two samples may be what you are getting in every piece. Your data are compelling. But... I stand by my previous statements. You need a few more samples to truly estimate variance in a small sample. Unless you have a priori data, variance in a population is something you have to calculate, not assume. In this case, your fundamental assumption is that there is no variance. Or maybe more specifically, you are stating the opinion that there should be no variance in a custom-made knife. Well, maybe in a perfect world.
:)
I guess maybe the real question is, is that a reasonable assumption? Isn't that what we were discussing in the custom knife thread?


------------------
Hoodoo


[This message has been edited by Hoodoo (edited 04-25-2000).]
 
Man this thread is like herpes, just when you thought it was gone, it pops back up.

 
Hoodoo :

By non-normal, I'm assuming you mean there should be zero or next to zero variance in the population.

Outside of the immediate vicinity of the mean. It is a cropped population.

Cut a board. That's one task. Build a knife. That's many tasks, each with its own associated variance. Thus we would expect more variance in any task that requires multiple steps compared to a single task.

I was simplifying with the board remark to illustrate the cropped behavior - the principle holds for more complex tasks. While uncertainties certainly combine, as you noted, the propagation of errors cannot exceed a given range because of the maker's QC. Once again the cropped population is created.

you are stating the opinion that there should be no variance in a custom-made knife.

No, I am saying it should be of no significance. Either the maker is competent and it is of no consequence, or he is not and his work is of no consequence. For reference, note the comments by Hossom and Martin in the following thread:

http://www.bladeforums.com/ubb/Forum54/HTML/000608.html

To get really specific: if R.J. Martin sharpened 100 blades and I did, or Jerry Hossom ground 100 blades and I did, you would only need to take one blade from each to reach a definite conclusion. Any more data would just give you a repeat conclusion. R.J. can sharpen knives better than I can, and Jerry is much better at hollow grinding.

While there is variance in each activity in all of the cases, it would be so low compared to the separation of the means that no overlap will be present, and the inherent QC will further prevent this - though this is not even needed in this case, because on their worst days the above two makers could easily out-grind me on my best.

One further remark on statistics: the t distribution can be used to generate confidence intervals on sample sizes with one degree of freedom (2 data points).
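A minimal sketch of that remark: with two measurements you have one degree of freedom, and the two-sided 95% critical value of t with 1 df is about 12.706 (a standard table value). The two data points below are made up:

```python
import statistics

# Sketch of a 95% confidence interval from exactly two measurements, using
# the t distribution with 1 degree of freedom (two-sided critical value
# ~12.706, from standard tables). The data points 10.0 and 12.0 are made up.
def ci_two_points(x1, x2, t_crit=12.706):
    mean = (x1 + x2) / 2
    s = statistics.stdev([x1, x2])        # sample std dev, n - 1 = 1 df
    half_width = t_crit * s / (2 ** 0.5)  # t * s / sqrt(n), with n = 2
    return mean - half_width, mean + half_width

lo, hi = ci_two_points(10.0, 12.0)  # roughly (-1.7, 23.7)
```

The interval is enormous - that is the honest price of n = 2 - but it is a perfectly legitimate confidence interval.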

By the way, concerning the above remarks about prying: prying is mainly controlled by strength, not toughness, assuming you don't pass the plastic deformation barrier. If you can fail a blade at 62 RC by prying, you will easily deform it at 50 RC. Assuming a proper heat treat on both, of course.

-Cliff

[This message has been edited by Cliff Stamp (edited 04-26-2000).]
 
Originally posted by Cliff Stamp:
Hoodoo :
To get really specific: if R.J. Martin sharpened 100 blades and I did, or Jerry Hossom ground 100 blades and I did, you would only need to take one blade from each to reach a definite conclusion. Any more data would just give you a repeat conclusion. R.J. can sharpen knives better than I can, and Jerry is much better at hollow grinding.

[This message has been edited by Cliff Stamp (edited 04-26-2000).]

Actually, I would assume that after sharpening 100 blades, you would learn something. Thus, there should be considerable variance in your sample as you move up the learning curve - unless you are incapable of learning. You may find your 100th knife to be indistinguishable from Martin's. The same could be said for custom knives. I would expect that there would be bugs in some new models (i.e., prototypes) and that they would improve as the maker perfected them. I've already seen this stated by a well-known maker in reference to one of his popular knives.
And again, sharpening, for instance, is one step in a multistep process of knifemaking. The analogy doesn't hold up. For instance, suppose you had some bad flux and it starts leaking out of the bolsters 6 months after you sold the knife. Whose fault is that? It seems like you are expecting machine-like precision from people. Go to the bolt bin of any hardware store and take a good look at the nuts and bolts there. Are they all perfect? If they are, I'd be amazed. When I worked for Detroit Diesel, we bought nothing but the best fasteners, and there were always a few bad apples in every lot. The same was true for pistons, oil coolers, even brand-new engines (especially new models), etc. I remember buying a brand-new Snap-On ratchet, and the first day I used it, all the chrome peeled off.

I guess if we expect our custom knifemakers to be perfect, then we should expect no less from our knife testers. Can you say that your tests are perfect, Cliff? If they are, then you should be able to analyze them statistically. But you can't, because they are not perfect.

Useful, informative, but not perfect.


------------------
Hoodoo

 
Hoodoo,

It isn't a matter of not learning, since he already is accomplished in the skills listed.

To expand an example offered by Cobalt (from page 2):
Do you sit on your toilet seat the same way every time?
If so, does that indicate that you have stopped learning how to take a dump?
If not, does that indicate that you have forgotten how to sit on a toilet seat since it has been so long (Possibly indicating anal retentiveness?) and are relearning the act with each placement of the buttocks to the ceramic?

To phrase it another way:
When you sit on the toilet seat, do you rest comfortably knowing that you are now an accomplished toilet sitter, or do you wonder how much more you need to learn to be one with the toilet?

And when you sit, is your left cheek closer to the back of the toilet than the right cheek? Does either cheek hang off the side more than the other? If so to either of the last two questions, is this by your design, preference or toilet-incompetence?

Hmmmmmm
 
Hey burnhamwow, thank you very much for bringing this discussion to the least (and I do mean least) common denominator! I think you should test your theory and give us 30 data points on the subject. More if you like!

Bruce Woodbury
 
Wow! This thread reminds me of grad school. In all the years I was there, though, I never heard anyone define a perfect test as one that could be analyzed statistically, whatever that means. I suppose that means you have at least 30 samples. That's an interesting definition of perfect, especially considering that the number 30 is used because it's the nearest nice round number of samples yielding a Student's t distribution with enough degrees of freedom that it's "statistically close enough" to a normal distribution. "Statistically close enough" being defined by professional statisticians as, "Well, you can't tell by eye-ballin' the graph that it ain't normal!"
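The "close enough to normal" point can be seen directly in standard t tables. The critical values below are well-known constants from those tables, laid out here only to show where the "about 30 samples" rule of thumb comes from:

```python
# Well-known two-sided 95% critical values of Student's t (standard tables),
# next to the normal distribution's 1.96. By ~29 degrees of freedom the gap
# is already under a tenth - the home of the "about 30 samples" rule of thumb.
t_crit_95 = {1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228, 29: 2.045, 120: 1.980}
z_crit_95 = 1.960

gap_at_29 = t_crit_95[29] - z_crit_95   # about 0.085
```

Note how brutal the penalty is at 1 degree of freedom (12.706 vs 1.96) and how quickly it evaporates - which is exactly why 30 is a convention, not a law of nature.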

I don't think I'm too far out on a limb believing that Cliff has sharpened more than 100 blades already. Heck, I did 5 already this week, and I'm a newbie. If he were as good as Martin, we'd see threads titled "Stamp v Hartsfield" instead of "Martin v Hartsfield." If Cliff says he's not as good as Martin, ya gotta believe him. Another hundred blades probably won't close the gap all that much.

And what about needing "more samples to truly estimate variance"? Truly... estimate. No matter how "true" an estimate is, it's still an estimate, not some "true" value of nature. More samples might give a better estimate, but a true estimate?

Even if Cliff had 30 thousand samples, his data still wouldn't be normal. Normal data just don't pop up all that often in nature. When was the last time you measured a negative population density, for instance? Well, it just isn't possible. But in normally distributed data it's always possible to observe negative values.

And as for hypothesis tests proving anything, they don't. At best they show correlation, not cause and effect. If you're just a knife nut and not a math nut to boot, just consider that so-called statistically valid hypothesis tests usually result in statements like "The data do not show that the null hypothesis is not true." Huh? How's that for a double negative?

Non-normal does not necessarily mean zero variance. Far from it. What about lognormal, t, chi-squared, binomial, Poisson, Rayleigh, gamma, Weibull, etc.? All non-normal, all non-zero variance.

What does that have to do with knives? Well, after conducting our usual round of office hair and paper shaving tests, I am confident at the p=.01 level that my knife is the darned sharpest piece of metal in the office. But being a Bayesian, I could have told you that without shredding 30 pieces of paper.


------------------
David

 
Let me clarify my last post. I'm not claiming that Cliff doesn't know how to sharpen a knife. And perhaps comparing Cliff and Martin is an unfair comparison. I was assuming that Cliff was implying that because Martin does it for a living, he has become highly proficient. Cliff, on the other hand, is not a professional knifemaker (or am I wrong here?) and therefore would benefit from practice, whether he is sharpening a knife or hollow grinding one. If you do it day in and day out, you surely improve. As I read the posts in the knifemaking forum, I see even the best of the knifemakers constantly improving their skills.

Thus my point about learning. So assume we are talking about a novice and not Cliff. After 100 blades of sharpening and grinding, they would be much improved, unless they just can't get the hang of it. Thus, their 100 knives would have much greater variance in sharpness and in the grind. Depending on how good they are, or even how lucky they are, some of the blades could even rival the pros' (even a blind squirrel will find a nut once in a while).

Now the gods of parametric statistics require that the variances of two samples be statistically equal before you can test them. In fact, the statistics police require that you test for heterogeneity of variance. It's the law. Now there are ways of getting around this, but that's not the point. The point is that if you randomly take 1 blade from a pool of 100 and compare it to another blade from a pool of 100, you can make subjective comparisons but not statistical ones, for the reasons I've already stated.
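A bare-bones version of the variance check described above: compute each sample's variance and look at their ratio, which is the idea behind the classic F test stripped of the table lookup. Both samples are invented numbers:

```python
import statistics

# Bare-bones variance-ratio check of the kind described above: compare the
# two sample variances before trusting a parametric comparison (the idea
# behind the classic F test). Both samples are invented numbers.
def variance_ratio(xs, ys):
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    return max(vx, vy) / min(vx, vy)   # a ratio far above 1 flags trouble

f = variance_ratio([10, 11, 12, 13], [5, 9, 13, 17])  # 16.0: very unequal
```

A full F test would compare this ratio against a critical value for the two samples' degrees of freedom; the sketch only shows the quantity being tested.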

As for normal distributions, for small samples it's not required. That's why we use the t distribution or even nonparametric tests like the Mann-Whitney U.
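For reference, the Mann-Whitney U statistic mentioned here reduces to a simple pair count: how often a value from one sample beats a value from the other, with ties counting a half. The two samples of "scores" below are invented for illustration:

```python
# The Mann-Whitney U statistic mentioned above, computed as a plain pair
# count: how often a value in `xs` beats a value in `ys`, ties counting 1/2.
# The two samples of invented "scores" are for illustration only.
def mann_whitney_u(xs, ys):
    return sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)

u = mann_whitney_u([7, 8, 9], [4, 5, 6])  # complete separation: U = 9
```

A full test would then compare U against its null distribution; the point here is only that the statistic itself needs no normality assumption at all.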

No statistical tests being performed on the data have been mentioned here. That's why it's not a statistical analysis. You can make up all the analogies you want, but statistics is more than just hand waving, surmise, assumption, and subjective comparison. I've tried to make this point, but evidently it's not getting through. Statistical testing is just that: testing that follows the basic statistical methods. Can you make claims of significance? YES! Just not statistical significance. There is a difference. It's rigorously defined, no matter what kind of distribution you have. If you have a non-normal distribution, there are all kinds of transformations you can use to normalize your data, or you can use what are called nonparametric statistical tests. Etc., etc., etc.

Let me reiterate: I've pretty much agreed with Cliff's analysis of his results. At least I've tentatively accepted them. But not because of a rigorous statistical argument or analysis.

I see it as highly unlikely that many custom knifemakers aren't constantly improving their skills. The knives they made a few years ago are most likely not as good as the ones they make today. Thus, you have variance, no matter which way you sit on the toilet.




------------------
Hoodoo

 