Rope Slicing Test, Part Deux

No, actually I'm going beyond what Cliff thinks is necessary, in an effort to make this as repeatable as possible. I appreciate everyone's comments, and although there will be a report, the raw data will be available to anyone who wants it. I figure the more standardized I can make this, the easier it will be for someone to duplicate this test. I plan on testing more knives than just these 2, and I want to make sure that it's as fair a test as possible.

I don't have any preconceived ideas as to which will hold an edge longer. D2 is one of my all time favorite knife steels for edge holding, so this will be fun. After I'm done with these 2 and send them on their way to the next tester, I'm going to pull out some of my M2 HSS blades, and have some fun - and probably some carpal tunnel damage....

In all seriousness, though, I'm trying to be as transparent here as possible. If you feel that I'm drifting, please let me know. I don't expect everyone to duplicate or even agree with the way that I do this, but I want it clearly understood so that any differences can be accounted for.
 
No, actually I'm going beyond what Cliff thinks is necessary, in an effort to make this as repeatable as possible.

Not exactly; what I think is necessary is actually quite rigorous. I place significant demands on the data I collect to minimize both the random and systematic variances. However, I will not constrain anyone else to those limits; just be clear about what you are doing and we will discuss the results and make sure the conclusions are supported. Just work to the precision and accuracy that is comfortable for you. When I start paying you for your time I will request that it is done in a certain manner; until then, whatever you care to provide honestly is appreciated.

After I'm done with these 2 and send them on their way to the next tester, I'm going to pull out some of my M2 HSS blades, and have some fun - and probably some carpal tunnel damage....

We should really get M2 tested against the cold work counterpart.

-Cliff
 
Well, all righty then... Enough talking, and more cutting... May the best steels win! :D
 
SODAK, truthfully, whatever method and tests you use and the results you get are fine with me and will be very interesting to read. No test is perfect, and I don't give any test, no matter who does it, very much importance, or try to draw broad conclusions from it. I do, however, find the clear influence on the test made so far very interesting, as the goal of forming the group was to prevent any influence and minimize bias.
 
Well said, Sodak.

There is a huge problem in this thread with understanding the difference between precision and accuracy, what the causes of each are, and how to determine the functional tolerance for both.

I may have misused the term accuracy, from a statistical POV, but all the potential issues with this test are repeatability, or precision, issues.

The only accuracy issue you will have is how accurate you are in transcribing an inch scale onto your knife.

And I think it is obvious that repeatability is what you want to shoot for in this test. No matter what you do, you will never know how accurate the results are, because there is no reference (or true value) for them.

Sorry for the interruption. Cut away.
 
Here's a contraption that could maybe give consistent force:
http://www.bladeforums.com/forums/showpost.php?p=3248712&postcount=35

Maybe a slow motor, cordless screwdriver, or such could be used to rig a power-fed slicing tool, driving a screw/worm gear or winding a bit of cord to give the lateral movement.

I still like seeing it done by hand, since that's how they get used. If the variability of the process sits outside the average performance difference of the steels, I guess it's a wash. Stuff like ease of HT and other manufacturing considerations would be more important, imo.
 
I still like seeing it done by hand, since that's how they get used.

The sharpness testing could be done by machine; ideally it consists of three things: up-close magnification, a direct measurement of very shallow cutting ability, and some common, very qualitative tests (shaving, paper cutting). If the cutting is done by machine, it has to be clear you are testing edge retention for use by machines. You cannot extrapolate machine cutting to cutting by people. This was demonstrated in the fifties in Germany, where such testing had been done since the twenties.

If the variability of the process sits outside the average performance difference of the steels, I guess it's a wash.

Mike obtained 1.5-3% precision by hand; if the difference in the steels is smaller than this it is obviously not significant. There is more variation than that in steel composition from batch to batch.

-Cliff
 
This really is not necessary; it is a small random variance and not even the major one. Errors in measurement do not simply add, as in 25% + 10% + 5% + 5% = 45%. The actual summation of all these errors (they combine in quadrature) produces only about 28%, which means you can ignore the smaller ones completely. The dominant random errors will be in the materials, meaning what is cut and the actual steel itself, especially for D2, which looks even more inconsistent in composition than this ATS-34:
[attached image: ATS-34 micrograph]

D2 vs CPM D2 would be similar, but even more exaggerated, with the carbides being twice as large. Now imagine testing an edge which intersects those large carbides vs one that does not. Even more important is not to get caught up in precision, because the real focus should be on accuracy, which is determined by the systematic errors. Accuracy is how close your value is to the population value, while precision is just how much your value changes from one measurement to the next.
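The "errors combine in quadrature" arithmetic above can be checked directly; a quick sketch in Python (the 25/10/5/5 figures are just the ones from the post):

```python
import math

def combine_random_errors(errors_pct):
    # Independent random errors combine in quadrature,
    # not by simple addition.
    return math.sqrt(sum(e * e for e in errors_pct))

errors = [25, 10, 5, 5]  # percent, from the post above
print(sum(errors))                              # naive linear sum: 45
print(round(combine_random_errors(errors), 1))  # quadrature: 27.8
```

Note how the 25% term dominates: dropping both 5% terms only moves the total from 27.8% to 26.9%, which is why the smaller errors can be ignored completely.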

I disagree. What is the point of the test? To my mind, it is a way for someone at home to quantify sharpness. This allows him to compare knives, in a relative way, from one day to the next.

Since the only thing actually measured is the length of cut, the rest of the setup should be made as repeatable as possible. Especially when you might do testing one day, and then put it away for a month. When you do the testing again next month, with a completely different knife, are you really going to lift the string to exactly the same point as you did last month? If not, this is a systematic error, not a random error that will average out. And systematic error is exactly what you need to avoid as you are doing a relative comparison between knives.

It is good that Mike is showing that in a single test he can get repeatable results. But if he now does a test on a different knife (say a big bowie, with a very different blade shape and relationship between handle and edge)... what do his previous numbers mean in relationship to the new test?
 
What if you had a knife you cut with during each test, to make sure you got the same numbers from it each time? Say you took a knife from the collection and set it aside just for this task, performed a set sharpening regimen with it before each test, and cut with it first to test your consistency.
 
Since the only thing actually measured is the length of cut, the rest of the setup should be made as repeatable as possible.

No argument from me as a point of theory; I would just consider carefully the phrase "as possible". In general, in any experiment, the more you increase precision the more time/effort it takes, and there is always some point at which it is not functional to go further. What would be more valuable: two knives compared with a 5% spread, or five knives with a 10% spread? This is why I mentioned functional precision in the above. A reasonable starting point for any experiment is to ask yourself what precision you need to make meaningful decisions.

So I will ask: what precision would a tester need in order for the results to be useful as a general indication of how knives will perform in actual (uncalibrated) work? I will propose that it does not need to be anywhere close to 10%. Any difference of this scale will never be noticed anyway. Consider one knife which needs sharpening every month and another which needs it every 33 days, assuming the work is exactly the same. How many people do you think would ever notice that difference? And of course the work is never going to be exactly the same from month to month, so will you see that difference - no way.

Note as well that if you take multiple sharpness measurements as the blade blunts, then these also reduce the spread by averaging, assuming you use the multi-point comparison I developed to calculate the edge retention ratios. So now you are not simply looking at 15 points to reduce the spread at each stage; multiply that by the number of stages at which you measure the sharpness. Say you measure at 15 intervals as well, so the total number of data points is 15*15 = 225. Now the final average result gets its spread reduced by a factor of 15 over the one-shot measurements. Even if the individual measurements vary by a wild 100%, the final average spread due to the measurement inconsistencies will still be only about 6.7%. This is why random variances are never an important concern unless they are truly massive or you cannot collect much data.
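The root-N claim in that paragraph is easy to verify; a minimal sketch, using the same 15x15 layout as above:

```python
import math

def spread_of_mean(individual_spread_pct, n):
    # Averaging N independent measurements shrinks the random
    # spread by a factor of sqrt(N) ("root-N smoothing").
    return individual_spread_pct / math.sqrt(n)

n = 15 * 15  # 15 cuts per sharpness check x 15 checks as the blade blunts
print(round(spread_of_mean(100, n), 1))  # 100% per-cut spread -> 6.7% on the average
```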

When you do the testing again next month, with a completely different knife, are you really going to lift the string to exactly the same point as you did last month? If not, this is a systematic error, not a random error that will average out. And systematic error is exactly what you need to avoid as you are doing a relative comparison between knives.

Exactly right, this is why I would carefully consider the shape of the knife in such a setup. The way I do the cord testing there is no influence because I am not lifting anything to a specific height. The results are the same within the small random variance (5%) no matter when I do them. What is much larger is the change due to blunting being an inherently random process which is why I would recommend doing the work at least three times and I am never comfortable until five.

It is good that Mike is showing that in a single test he can get repeatable results. But if he now does a test on a different knife (say a big bowie, with a very different blade shape and relationship between handle and edge)... what do his previous numbers mean in relationship to the new test?

As long as he is careful that the shape of the knife does not significantly influence the results in a systematic way, he can be safe that the results are unbiased. If you feel the height really is a large concern for systematic deviations, then just put a cross bar at a specific height, raise the knife to that, and then draw. But I maintain that will have no significant effect on the results and is in fact a waste of time, because you are reducing the non-critical random error. It would be much more useful to spend that time on another run instead.

Similarly, for example, I would always recommend that instead of doing one run with 10 sharpness measurements you are much better off doing two runs with five each. You have the same amount of data, but the second approach is going to be far more representative of the population. My main point in the above is to not get carried away wasting time on precision which gains nothing, as it is below the sample variance anyway. Start by asking yourself "How much precision do I really need?" and proceed accordingly. The true focus should be on the systematic deviations, which will limit accuracy. Be very careful there.

If you have the time/inclination, these are not theoretical quantities; you can measure them directly. I do myself all the time in the work I do, and I have outlined many times how this can be done by anyone else who wishes to ensure they are being unbiased.


-Cliff
 
...

As long as he is careful that the shape of the knife does not significantly influence the results in a systematic way, he can be safe that the results are unbiased. If you feel the height really is a large concern for systematic deviations, then just put a cross bar at a specific height, raise the knife to that, and then draw. But I maintain that will have no significant effect on the results and is in fact a waste of time, because you are reducing the non-critical random error. It would be much more useful to spend that time on another run instead.
...
If you have the time/inclination, these are not theoretical quantities; you can measure them directly. I do myself all the time in the work I do, and I have outlined many times how this can be done by anyone else who wishes to ensure they are being unbiased.


-Cliff

I still disagree. Presumably the length of cut (the only measurement) scales linearly with the applied force. The applied force scales linearly with the height to which the knife is lifted. The height the knife is lifted is small, so even small changes in this height could produce significant errors. As I stated before, small changes in one testing session could average out. However, the variation from test to test, from month to month, could be a significant systematic bias. This is an easy thing to measure and/or fix. Sodak says he has already done this by taping a ruler in. I think this will go a long way towards making this a useful comparison over the long term.

By the way, Sodak, take this as constructive criticism. :) I think this is a great setup you have, and I applaud your actually spending the time to make all of these measurements. If we still had the chiclets, I'd be sending some your way.
 
As I stated before, small changes in one testing session could average out.

Let's not drift into the realm of "could"; we are dealing with simple issues of math here. When you record hundreds of measurements, even large random errors will be reduced to insignificant amounts because of root-N smoothing.

However, the variation from test to test, from month to month, could be a significant systematic bias.

I would be really surprised if the mean height was so different systematically from one run to the next that it would be comparable to the changes you would find in the material cut, let alone from one knife to another in the same steel. This is what I meant when I said functional precision. It actually gains you nothing to measure beyond that precision; all you are in fact measuring is noise.

[attached chart: card_1_4_rough.png]


This shows four S30V blades used for edge retention on cardboard. Note that the precision in the individual results is much higher than the actual change from one S30V knife to the next. This extra precision is not really functional if the goal is to benchmark the steel. All of it gets lost when those four runs are themselves averaged.
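The point about nonfunctional precision can be illustrated numerically. The figures below are invented for illustration (not the actual S30V data); what matters is only the comparison between the within-run spread and the knife-to-knife spread:

```python
from statistics import mean, stdev

# Hypothetical per-run cutting scores for four knives of the same steel
runs = {
    "knife_1": [102, 99, 101],
    "knife_2": [95, 97, 96],
    "knife_3": [108, 106, 107],
    "knife_4": [90, 92, 91],
}

# Average spread within a single knife's runs: the measurement precision
within_run = mean(stdev(scores) for scores in runs.values())
# Spread of the per-knife averages: the sample-to-sample variation
between_knives = stdev(mean(scores) for scores in runs.values())

print(round(within_run, 1))      # ~1.1
print(round(between_knives, 1))  # ~6.8
```

Once the knife-to-knife spread dominates, tightening the per-run precision further only resolves noise in the individual sample, not the steel.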

I could have taken much more data and cut the noise on the individual runs in half. But would this have actually produced useful information? No. Again, think very carefully about what exact conclusion you are trying to reach and measure accordingly. Make sure your data is meaningful first and increase precision as time/effort allows. Here is an example from Cashen during a discussion about the benefits, or lack thereof, of hammer forging by makers:

"I have impacted various damascus patterns to verify that there is a difference from something like random and a twist pattern, under controlled testing it is measurable but in knife use it just is not significant."

Ref :

http://forums.swordforum.com/showpost.php?p=801926&postcount=41

This is an example of measurement precision which is nonfunctional, and Cashen is directly aware of it; rather than use this to hype some particular damascus pattern (that he just happens to specialize in) as the toughest of all patterns, he is straightforward that the difference is meaningless. This is the danger of very high precision: it can lead to false confidence.

In terms of precision I would argue that if you are trying to determine the performance of a steel the precision is nonfunctional if it is significantly greater than the variability in the individual samples of the knife. If you are trying to judge the performance of a specific knife, the precision is nonfunctional if it is significantly greater than the variability which will be seen in actual use. In both these cases the increased precision just measures noise.

The only time I would promote very high precision would be if a maker was changing his heat treating, trying to refine his product, and you were looking to refine the knives by small amounts over a long period of time. This may be the case later on. Right now I would prefer not to get so focused on precision but more on accuracy, and then on how to actually use the data and draw conclusions.

But as noted, it is up to the individual user, feel free to make the data as precise as you want with the time/effort you are willing to dedicate.

-Cliff
 
Let's not drift into the realm of "could"; we are dealing with simple issues of math here. When you record hundreds of measurements, even large random errors will be reduced to insignificant amounts because of root-N smoothing.
This is the crux of my disagreement. If one systematically lifts the string up an extra 1/16", one is systematically increasing the force by 25%. This is not a random error. Taking hundreds of measurements will not get rid of this error.

1/16" is not much, and it would surprise me if one could lift the string consistently, on different knives, over months, without some visual reference. It sounds like Sodak is now doing this.

It would be straightforward to test this directly.

I would be really surprised if the mean height was so different systematically from one run to the next that it would be comparable to the changes you would find in the material cut, let alone from one knife to another in the same steel. This is what I meant when I said functional precision. It actually gains you nothing to measure beyond that precision; all you are in fact measuring is noise.
This is a different issue. I still see no reason not to control for those systematic errors that you can control. If I was doing this, and I was changing the material I cut, I would save some to do comparison cutting with the new material as well. This is not a question of precision, but a question of understanding systematic changes in the setup.

I understand this is all a lot of work, and I thank you, Cliff, and the others who are doing this. As I said, take this as constructive criticism. :)
 
1/16" is not much, and it would surprise me if one could lift the string consistently, on different knives, over months, without some visual reference.

Yes, I would expect a random variance. Note that you can reduce the error trivially: just lift up more. If you cut at one inch then you reduce that projected error to about 6%.
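Assuming, as argued upthread, that the cutting force scales roughly linearly with the lift height, the relative error from a fixed placement slop falls off as the height grows. A quick sketch (the 1/4" baseline is inferred from the 25% figure earlier in the thread; it is an assumption, not a stated measurement):

```python
def height_error_pct(lift_height_in, slop_in=1/16):
    # If force is proportional to lift height, a fixed placement
    # slop contributes a relative error of slop/height.
    return 100 * slop_in / lift_height_in

print(height_error_pct(0.25))  # 1/16" slop at a 1/4" lift -> 25.0 (%)
print(height_error_pct(1.0))   # same slop at a 1" lift  -> 6.25 (%)
```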

I still see no reason not to control for those systematic errors that you can control.

Any reduction in uncertainty comes at the cost of time and effort, which basically always means less total information. Instead of trying to minimize these variances, it would be possible in the same time to add another angle, grit, or even knife. I would argue that would be a more useful effort. This even assumes that what you are doing is affecting the dominant errors; if not, you achieve nothing.

If I was doing this, and I was changing the material I cut, I would save some to do comparison cutting with the new material as well.

Unless you buy stock materials you cannot expect them to be consistent to anywhere close to 25%. If you want to compare from run to run independently, then you would buy a selection of the same material, randomly sample from it, and just add to it to keep the "population" at the same level.

-Cliff
 
Can someone please post a YouTube video of Wayne Goddard reading this thread for the first time. :D I'd like to see his reaction to a few of the last couple of posts.
 
... Here is an example from Cashen during a discussion about the benefits, or lack thereof, of hammer forging by makers:

"I have impacted various damascus patterns to verify that there is a difference from something like random and a twist pattern, under controlled testing it is measurable but in knife use it just is not significant."

Ref :

http://forums.swordforum.com/showpost.php?p=801926&postcount=41

This is an example of measurement precision which is nonfunctional, and Cashen is directly aware of it; rather than use this to hype some particular damascus pattern (that he just happens to specialize in) as the toughest of all patterns, he is straightforward that the difference is meaningless. This is the danger of very high precision: it can lead to false confidence...
-Cliff

You are very much correct here, Cliff. Not only can we get a false idea of the significance of gains in some properties from our precision if we don't keep a level head about it, but how significant it is will also depend seriously upon the perception of the end user. Let's face it, some people use their knives a whole heck of a lot harder than others. Combining my research with the descriptions of many of the heat treatments used by some very popular makers, I am very confident that the average knife user can use a blade riddled with fine pearlite and sub-par hardness and still be surprisingly impressed with the performance.

I can't stress enough that there is testing and there is advertising, and when the two get combined the consumer had better beware. Testing is done to gain insight into your production methods and how to improve them. To this end it is more often desirable, and more useful, to find what is wrong with your blades in order to improve them. If you only impress yourself with your greatness in testing, there will be little growth except in your delusions of grandeur.

Intentionally selecting tests that are tailored to make your methods look better can only be called P.R., not valid research and testing.
 
Not only can we get a false idea of the significance of gains in some properties from our precision if we don't keep a level head about it, but how significant it is will also depend seriously upon the perception of the end user.

This is why the first step should always be a clear written goal. In statistics it is called the hypothesis statement; all it means is: what exactly are you trying to prove or disprove? Be very clear and as specific as possible. Trying to determine whether a Charpy machine can spot a difference in toughness is not the same as asking whether users will notice the difference.

On Swordforums you made a comment about bainite vs tempered martensite not having the increased value for the effort and being more of a fad/hype. Paraphrasing, but basically your point was that functionally you did not think that properly tempered martensite in a suitable steel would have durability problems with large knives when used properly. Now while I do not actually agree with your conclusion, the approach is perfect.

You cannot think of the gains themselves, but of the functional effect. All the time and effort that went into the "superior" heat treatment or steel: if these are not actual performance gains that will be seen by the customer, and you sell them as such, then this is just hype and, quite frankly, I would call it outright lies.

How many people would be willing to have blind trials done on these "superior" blades? How much improvement would actually be seen when the users did not know which was the "superior" blade at the start of the test?

Let's face it, some people use their knives a whole heck of a lot harder than others. Combining my research with the descriptions of many of the heat treatments used by some very popular makers, I am very confident that the average knife user can use a blade riddled with fine pearlite and sub-par hardness and still be surprisingly impressed with the performance.

Indeed. There are simply no benchmarks. I have seen people praise high-end custom knives for something a $5 machete will do if it is properly sharpened. You can cut a dowel with this $500 custom? Excellent, so can this 420J2 fantasy knife, and its edge is <15 degrees per side. Are you still impressed? I think I am going to start an online version of the PCC with actual functional cutting events. An open contest: anyone can create their own events and I will collect the records. If there is interest, maybe we can organize informal get-togethers.

I can't stress enough that there is testing and there is advertising, and when the two get combined the consumer had better beware. Testing is done to gain insight into your production methods and how to improve them. To this end it is more often desirable, and more useful, to find what is wrong with your blades in order to improve them. If you only impress yourself with your greatness in testing, there will be little growth except in your delusions of grandeur.

One of the sure signs of this is the constant, insanely dramatic improvement, and the total ignorance of any opposition. When opposing references are cited and ignored, there is no testing, just hype. You do not run from data if you are interested in performance; you embrace it, and you have to determine why there is contention, not pretend it does not exist.

Intentionally selecting tests that are tailored to make your methods look better can only be called P.R., not valid research and testing.

Yes, for any aspect of performance there is always something else that will decrease by pretty much the same amount. If you are testing, you measure both; if you are educating, you tell people about both. If you are a salesman, you only tell the positive.

-Cliff
 