Testing for edge retention: sample quantity for proper statistical analysis

I have read through Mr. Ankerson's very comprehensive list of edge retention testing. I work in a lab in which strain testing is done on various metals (many in their final shape) for the aerospace industry. The materials provided undergo a series of tests with a minimum sample quantity required to arrive at a statistically significant number. Plotted run by run, many of these sample results look like a diminishing sinusoidal wave (not really, but it is the closest picture I can give): the running average oscillates, and the oscillations shrink as samples accumulate. I have seen hundreds of these results, and the apparent pattern is that somewhere between 27 and 35 samples the results level off and a quite accurate representative average appears. Of course, the higher the number of samples, the smaller the deviation.

What does this mean? I am posing this as a suggestion for edge retention testing. If you wish to get an accurate result, one or even two samples is simply not enough. At 10 samples the error is still large; at 20 samples we get closer; and at 30 samples it becomes statistically meaningful. I am not any kind of authority on testing, yet I feel that even though the tests done so far provide a clue, without a larger sampling the results are relatively insignificant. It is understandably difficult for a single individual to run large samplings due to time and cost, yet it would be necessary for accurate results.
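To picture that convergence, here is a minimal Python sketch; the mean, spread, and sample counts are invented purely for illustration. The standard error of the mean shrinks as 1/sqrt(n), which is why a couple of samples tell you little and around 30 start to pin the average down.

[code]
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" edge retention score and test-to-test spread for one
# steel -- made-up numbers, illustration only.
true_mean, true_sd = 500.0, 50.0
samples = rng.normal(true_mean, true_sd, size=40)

# The running mean oscillates and settles as samples accumulate, while the
# standard error of the mean shrinks as 1/sqrt(n).
for n in (1, 2, 5, 10, 20, 30, 40):
    running_mean = samples[:n].mean()
    std_error = true_sd / np.sqrt(n)
    print(f"n={n:2d}  running mean={running_mean:6.1f}  SE={std_error:5.1f}")
[/code]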

Has anyone here performed CATRA or other edge testing on one type of steel to obtain a statistically significant result? Is there any other standardized method of edge testing that others could perform to add to the results already obtained?
 
I think that depends on the level of accuracy you wish to achieve. The results also seem to correlate logically: a poor steel never outperforms an exceptional steel. Of course it would be interesting to have a larger sample size, but if the variation were that large, it would be irrelevant in real life anyway.
 
Greater sampling size becomes useful when the difference in performance between samples is relatively small. I would think a sample variation of 5 percent or less is insignificant for everyday knife application. However, if you have several knife steels performing within 5 percent of each other and you want to see which is best, then a larger sample size would help single out the winner (see the sketch below). In reality, though, most knives are judged in relation to the task at hand: a knife is adequate to the task, falls below that level, or exceeds it significantly. A large sample size is not needed to demonstrate adequacy, especially when the spread between high and low edge-retention performance is so wide (much greater than 5 percent). You just need a steel that falls in that middle 33 percent for most tasks. That's why I think the Ankerson rope (it stopped being a thread long ago) is very useful despite the small sampling rate per steel type. Is more data better? Sure, but eventually you hit the point of diminishing returns: too labor-intensive, time-consuming, and expensive to be practical.
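For a rough sense of what separating two steels within 5 percent would actually take, here is a normal-approximation power calculation in Python; the 10 percent run-to-run scatter and the alpha/power targets are all assumptions, not measured values.

[code]
from scipy.stats import norm

# Assumed scenario: two steels differ in mean edge retention by 5%, with
# run-to-run scatter (coefficient of variation) of 10%. Both numbers are
# invented for illustration.
diff, cv = 0.05, 0.10
effect_size = diff / cv  # Cohen's d = mean difference / standard deviation

# Per-group sample size, normal approximation, two-sided alpha = 0.05,
# power = 0.80.
z_alpha = norm.ppf(1 - 0.05 / 2)
z_beta = norm.ppf(0.80)
n_per_group = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
print(f"~{n_per_group:.0f} runs per steel")  # ~63 with these assumptions
[/code]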
 
To the OP - I actually teach statistics and research design.
You have a good point, but there are a lot of reasons why a larger sample size is not practical, nor even that useful given Jim's approach. First, it would be very difficult to standardize. You would need to retest the same knife to establish how much measurement error there is. Even testing one knife is very laborious and time-consuming, and it is done by hand, as is the blade prep, so there is going to be some measurement error. Then there is blade-to-blade error for a given knife model. Then there is knife-to-knife variation for a given steel, which can be huge given different heat treats, designs, blade profiles, etc. He is not testing just Spyderco Mules!
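Those layers stack. A toy simulation in Python shows how independent error sources add in quadrature; every number below is invented, purely to illustrate the structure of the problem, not anyone's real data.

[code]
import numpy as np

rng = np.random.default_rng(0)

# Invented standard deviations for the three error layers: knife-to-knife
# (HT, geometry), blade-to-blade within a model, and per-test measurement
# error. Illustration only.
sd_knife, sd_blade, sd_measure = 40.0, 20.0, 15.0
steel_mean = 500.0

scores = []
for _ in range(50):                      # knife models of one steel
    knife = steel_mean + rng.normal(0, sd_knife)
    for _ in range(3):                   # blades per model
        blade = knife + rng.normal(0, sd_blade)
        for _ in range(2):               # repeated tests per blade
            scores.append(blade + rng.normal(0, sd_measure))

# Independent error layers add in quadrature.
theory_sd = (sd_knife**2 + sd_blade**2 + sd_measure**2) ** 0.5
print(f"simulated SD: {np.std(scores):.1f}  theoretical SD: {theory_sd:.1f}")
[/code]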
And in the end, what question is he trying to answer? He is getting a ballpark sense of the range of performance you can get from a particular steel, NOT a precisely estimated point measure of cutting ability, nor even a precise estimate of the relationship between any measure of knife form and cutting function. His work gives us a broad, fuzzy sense of how steels compare to each other across a range of uncontrolled variables. How much would it be worth to narrow the measurement variation to a small value? What would we know that we did not know before, but that we need to know?
Not trying to be ornery; let's just figure out what question we are trying to answer first, then design the study to do that. :)
 
NBHog, there aren't all that many CATRA machines around. Only a couple of knife companies have them, and they generally don't release much data; it's considered proprietary for a few reasons. There is one guy who has compiled all the officially released numbers into one chart, but that is limited. I believe there were three testers, with Bohler-Uddeholm being the one that had tested the most steels. Even they don't release all the details, and they use generic names for steels that might be 3V, S35VN, and some more. Who knows.

It takes time and a bigger budget than most have to do a lot of CATRA testing. It is expensive and time-consuming for Jim Ankerson to do what he does, and I respect that he does it at all, and that on top of that he freely shares his results.

I wouldn't suggest any changes to Jim's work unless I offered not only to reimburse him for his costs and pay for his time, but also to contribute my own time and work to help him. And I consider him a friend.

There are a lot of people who make suggestions that seem really good. We usually try not only to encourage them to do what they are suggesting, but will even help with getting knives for testing that have been tested elsewhere, so we can see how the changes made to the tests affect results.

Everybody around here has a life, and usually a job and family to contend with, including Jim. So if you can pick up where Jim left off, or even retest what Jim has done to add to the knowledge base, please do so.

A lot of us love to learn anything we can.

Joe
 
Thank you all. I understand this is a monumental task to attempt. If Ankerson's methods can be duplicated, would it not be a good idea for others who own knives in the steels tested to attempt the tests themselves?
 
If Ankerson's methods can be duplicated, would it not be a good idea for others who own knives in the steels tested to attempt the tests themselves?

"If." Are you going to do it?
 
My rope testing thread is a general guide, nothing really more than that.

I had to add the geometry specs, knife models, etc. into the coarse edge section, along with the HRC numbers, to provide more information.

In general it does take a lot of time just to test one knife; it could be done in one day or over two days depending on how the knife performs.

The actual method is standardized: same edge angles, sharpened on the same stone on the Edge Pro, and to the same sharpness level.

The same rope is used across the board, ordered from the same company. It's high-quality rope, so it's not exactly cheap.

The hard part is getting the method down: getting used to being accurate in the actual cutting and taking one's time making good cuts. It's very labor-intensive, to say the least, so being consistent can prove difficult at first, until a person gets used to it and realizes what it really takes.

I have no illusions about what can be done by hand, and that's why I state it's a general guide.

As far as changing things up, as Joe stated, I agree with him. I haven't gotten any emails yet from rich guys with large amounts of disposable money sending me large checks. ;)

And yeah, this stuff isn't free. My time is actually worth something, and a lot of those knives are customs, which aren't cheap at all.

Haven't seen any four- and five-figure donations coming my way yet, only ideas, and ideas don't pay for anything.

So until that happens I will have to stick with what I've got.
 
And we ALL (or most, anyway) appreciate what you have done, Jim. I think most folks understand that it is what it is, but I bet a lot of us have gleaned useful insights into the potential of steels, the effects of HT and edge geometry, etc. Nothing like an actual knife cutting actual things to make an impression (pun intended).

bob
 

Thanks Bob. :)

It gets interesting, because once I change geometry and start doing multiple runs like some suggest, 10-plus runs for each knife times each geometry, the numbers explode.

The way I test, it could take me a month just to test one knife, and that's actual cutting time, so overall it would be longer.

Then start adding different edge finishes on top of that.

I am actually doing real cutting, not just making up numbers or doing things half-@ssed and guessing.
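For a back-of-the-envelope sense of why, multiply it out in Python; all the counts below are guesses for illustration, not Jim's actual protocol.

[code]
# All counts below are invented for illustration only.
runs_per_config = 10   # "10 plus runs" per knife per configuration
geometries = 3         # e.g. three different edge angles
edge_finishes = 2      # e.g. coarse and polished
hours_per_run = 2      # rough guess for one full cut-until-dull run

total_runs = runs_per_config * geometries * edge_finishes
total_hours = total_runs * hours_per_run
print(f"{total_runs} runs, roughly {total_hours} hours of actual cutting "
      f"for a single knife")  # 60 runs, ~120 hours with these guesses
[/code]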
 
Can you imagine? Here are just some of the variables and issues involved in knife testing:

Category 1 (The knife itself) - Blade material and geometry, edge geometry and level of polish, heat treat, HRC.
Category 2 (The test medium) - Type, consistency, hardness, thickness, temperature, corrosive or not.
Category 3 (The test methodology) - Consistency, level of effort, time, subjective versus objective, reproducible.
Category 4 (Measurement) - Accuracy, quantifiability, meaningful results, reproducible.

I am sure I am missing several. Perhaps this is why steel companies resist publishing their own test results: fear of peer review. :)
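For illustration only, a fixed record per test run is one way to keep those categories straight; these Python field names are hypothetical, not anyone's actual protocol.

[code]
from dataclasses import dataclass

@dataclass
class EdgeTestRecord:  # hypothetical structure, illustration only
    # Category 1: the knife itself
    steel: str
    hrc: float
    edge_angle_deg: float
    edge_finish: str        # e.g. "coarse" or "polished"
    # Category 2: the test medium
    medium: str             # e.g. rope type and diameter
    medium_source: str      # same supplier keeps the medium consistent
    # Categories 3 and 4: methodology and measurement
    cuts_to_threshold: int  # objective, reproducible stopping point
    notes: str = ""
[/code]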
 

People just don't realize how much work real testing is, and how much time it really takes.
 
Perhaps this is why steel companies resist publishing their own test results: fear of peer review

Peer review is one reason. Another is cost: buying a CATRA and running tests can run into six figures, depending on how much testing is done. If a company puts that many resources and that much time into something that helps them improve steel performance by tweaking heat treats and geometries, why would they give that research away? On top of that, they would then have other manufacturers, who have a lot of resources invested in their own knife manufacturing, as well as their customer base, arguing about why their steel isn't doing what they know it does. Some manufacturers have attorneys on speed dial. :) Never underestimate large egos and the factor they bring into business.

There are CATRA owners who have done research for other manufacturers who paid for it; that information belongs to the company that paid for the research and can't be released.
 
OP, here are a few problems with your idea. I fully understand where you are coming from, but this is what I see as the problems with what you are thinking.

1. First is the fact that heat treat and final Rc make a huge difference for each steel. So you can have 10 knives of the same steel and there will be variation due to Rc and variation due to HT (in addition to Rc).

2. Second is blade and edge geometry. A thicker knife may have a longer-lasting edge than a thin knife; however, as they both wear, the thicker knife will blunt faster simply because you are wearing into a thicker secondary grind.

3. What this means is that you would need to get literally 30 knives from one manufacturer, and you could only claim the test adequate for that manufacturer. Thirty knives made of CPM S35VN by Spyderco can and will give you a different result from 30 knives made by Kershaw, because of the two points above. There are too many variables and too much variation within steels processed by different manufacturers.

4. Keeping the testing procedure the same is nearly impossible. There is no way to perform the exact same test every time; unless you are using a machine that puts the exact same force on every knife, no two tests will be the same. This is another source of variation. For example, Ankerson can do his rope cut test on 1" manila rope. I can try to duplicate it, but I slightly change the angle and put in more force. I end up doing many more cuts on the same knife because of my force, because I am so much stronger than Ankerson. ;)

I think it is better for people to test their own knives and report the results. When you add this to others' results, you get a feel for the properties of a certain brand, not so much just the steel.



It is a great idea, but nearly impossible to implement.
 
The Mastiff - I agree. I was just joshing about the peer review, because lord knows testers around here certainly get plenty of it. :)
 
There is some truth to it, though. An example is one of Sal G.'s posts here on BF:
1. Spyderco does have a CATRA and we do extensive testing on all steels. We also test steels for foundries, like Crucible and Carpenter. We tell others that since we are not an "accredited testing agency", it would not be proper for us to be used as an information source.

2. Any results that are published can be challenged by anyone for any reason, e.g., their particular steel did not perform as they wanted.

3. We generally test at the optimal usage hardness. We believe that CPM-S60V is not effective at RC61. I dropped an RC62 CPM-S60V blade on concrete and it literally cracked in half.

4. We had different results than those posted. Now what? I think you can see why Spyderco, Buck, Case, Leatherman, etc. do not publish their results. (I believe there are fewer than 30 CATRA machines in use. They are quite expensive.)

5. We learned after 10 years of testing that CATRA results are not the be-all and end-all of testing. "Real world" testing will sometimes yield different results than the CATRA for unforeseen reasons.

6. We've always shared our findings with our customers as to our results. We just don't quote numbers.

sal

http://www.bladeforums.com/forums/showthread.php/769910-Oct-2010-Knives-Illustrated-CATRA-results-for-six-Crucible-steels

There have been other threads about this, but this one has some of the most basic reasons. He even mentioned others getting different results, which is peer review. Rather than argue with other companies, he keeps his results in house.

Joe
 

I have had a few discussions with Sal about this very topic over the years myself. :)
 
Peer review is a very important component of the scientific process. It becomes most valuable when experimentation produces new or unexpected results leading to new or modified theories. This is why researchers are very careful about publishing test results: they want their own house in order before someone else comes through for an inspection.

Testers on Bladeforums take a great risk when they post results, especially if the tests lean toward the objective side of the process. Other posters can take issue with the results, the methodology, or the value of the test premise itself. Flame wars can and do arise in these threads, but after years of reading test results from different testers, I think we begin to see a kind of consensus emerge about knife materials, designs, and manufacturing techniques. It is almost like crowdsourcing for startups: if people think something can or will work, they reach agreement and support it. I would call it a kind of pragmatic consensus, and although it does not follow the scientific method exactly, real knowledge can be gained through this process.

Think about it: the Japanese sword makers of yore did not understand the scientific method, but they did understand peer review and the pragmatic value of what worked and what did not. After all, if a sword design failed, the user of that sword was much the worse for it, and I am sure the review was very harsh. On the other hand, if the design succeeded, the maker had the opportunity to continue making and improving swords, and to pass on what worked to the next generation. Forums like this create several generations within one lifetime.
 
For the most part, by now a lot of people on BF know BS when they read it, so some of those so-called tests are ignored these days.

Most of those are nothing more than flame bait, really, so it is best to ignore them.
 