My apologies for the long delay in replying. I have posted much of what follows in other threads on this same topic:
When I watch Noss go to town on a knife, that tells me bupkis about the limits of MY knife. The simple fact of the matter is that his "tests" are done on ONE single knife, and it's not my knife. We don't know whether it was a lemon or not, we don't know what batch it was, etc. Does that help me know how far I can take my knife? Of course not. It's a different knife. Maybe we can get some loose extrapolations, but any scientist should know that you need a sample size larger than 1 to draw any meaningful conclusions. And that, I think, is the true measure of repeatability. It's not repeated hits with a hammer; it's repeating the test on multiple subjects. It matters not at all to the scientific validity of the test whether other people would do the same. What matters is whether the same test is done to multiple knives. That doesn't happen.
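To put a rough number on how little a single test tells you (my own back-of-envelope illustration, not anything from Noss' videos): if a knife survives n independent trials with zero failures, the exact one-sided 95% upper confidence bound on its true failure rate is 1 - 0.05^(1/n), which is roughly 3/n for larger n (the classic "rule of three").

```python
# Illustration only: how much do n passing tests constrain the failure rate?
# If n independent trials all pass, the exact one-sided 95% upper confidence
# bound on the true failure probability p solves (1 - p)**n = 0.05.

def failure_rate_upper_bound(n, confidence=0.95):
    """95% upper bound on the failure rate after n trials with zero failures."""
    return 1 - (1 - confidence) ** (1 / n)

# One passing test (n=1) only tells us p < 95% -- almost nothing.
print(failure_rate_upper_bound(1))    # ~0.95
# Thirty passing tests tighten that to roughly p < 9.5% (~3/n).
print(failure_rate_upper_bound(30))   # ~0.095
```

In other words, a single survived beating rules out almost nothing about the failure rate of the model as a whole, which is exactly the sample-size objection above.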
Just because his knife does or doesn't break in one given test doesn't mean that mine will behave in the same way.
At the end of the day, you should probably do SOME basic tests to find out if it's a lemon and has bad heat treat. Aside from that, the true test as far as I'm concerned is whether you treat your tools with respect or not. If you're the kind of person to bang on a knife with a hammer until it breaks, you deserve what's coming to you in my opinion. But without a broader sample size, those videos are of strictly limited value.
I do not disagree with any of this, and I doubt Noss would argue it either - in fact, he has said so many times. EVERY user should put their own tool through basic testing (i.e. use) up to their expected level of stress. Destruction tests are meant to go beyond that, to test the limits of the tool rather than the limits of the user's common activities. If a Noss-tested knife performs poorly in a task where your own knife excelled, submit that data and feel confident in your own tool :thumbup:
But regarding the request for replicates:
Quality Control is the job of the manufacturer, NOT the user or the independent tester. In any large batch production there will be poor samples, and it is the manufacturer's job to minimize these (by improving manufacturing processes) as well as to test random samples from each batch themselves. How many should they test?
In production procedures in my laboratory, I routinely sample cultures for bacterial contamination. How many samples do I take? ONE. Why do I not take more? Because previous experience (experiments) has refined our aseptic production techniques to the point that a single test is sufficient to catch a mistake that may permeate the entire batch (unless we have specific reason to suspect an error elsewhere), and taking more samples is wasteful. SO, with n=1 we qualify an entire batch/lot of samples produced by a uniform procedure expected (based on previous internal experimentation) to evince identical (at our level of precision) quality. If that ONE sample fails, we scrap the entire lot, assuming the one is representative of the whole.
Now, that is one specific area of production, and producers/manufacturers should establish their own QC policies regarding how many samples must be tested (and in what manner) to ensure sufficient quality of the lot for distribution, whether it be <1% or 100%. If they test 5 random samples out of 100 and 1 fails to meet specifications, does that mean 1% or 20% of the lot is below specifications? Only that 1 unit (1% of the lot) is confirmed below spec, but the sample proportion (1 in 5, i.e. 20%) is the best estimate of the defect rate among the untested remainder. There may not be any more lemons at all, but it is up to the manufacturer to decide how representative that single lemon is, what effort should be exerted to remove any other lemons from the lot, and what (if anything) should be done to prevent them in the future.
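For what it's worth, that sampling arithmetic can be made concrete with a quick hypergeometric sketch (my own illustration; the lot of 100 and sample of 5 are just the numbers from the example above):

```python
from math import comb

def p_sample_catches_a_lemon(lot, lemons, sample):
    """Hypergeometric: P(a random sample of `sample` units drawn from a lot
    of `lot` units containing `lemons` defects includes at least one defect)."""
    return 1 - comb(lot - lemons, sample) / comb(lot, sample)

# If the lot truly held only the ONE lemon we found, a random sample of 5
# had only a 5% chance of catching it at all...
print(p_sample_catches_a_lemon(100, 1, 5))    # ~0.05
# ...whereas at the 20% rate the sample itself suggests, catching at least
# one lemon was better than a coin flip.
print(p_sample_catches_a_lemon(100, 20, 5))   # ~0.68
```

The point of the sketch: finding a lemon in such a small sample is fairly unlikely if lemons are rare, so a hit is weak but real evidence that more lemons exist, and the manufacturer has to decide how to act on it.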
Bottom-line: every individual sample that the manufacturer decides to distribute SHOULD BE representative of every other (n=1). What a user experiences in ONE tool he can expect to experience in every other (n=1). THAT is good QC. In the case of "extreme knives", the makers should be "extreme"-testing their knives themselves before releasing advertising regarding their proposed use, especially if the product is relatively expensive and marketed for 'hard-use'. Then they will have data to present when an n=1 challenge comes along, lest they lose business should the n=1 test go against them. It is wiser for a maker to under-estimate the capabilities of his product and suppress hyperbolic claims. Let the data speak for itself; n=1 is data, however limited.
Noss receives a single random sample to 'test', n=1. That single sample SHOULD BE representative of the entire batch if the individuals in that batch were produced in reliably repeatable fashion - every other sample from the same batch subjected to the same procedure by the same technician should give the same result (within a reasonable variance to cover the noise of day, temperature, and other factors which should not drastically affect the performance of sample or technician). If that single sample performs badly, why should Noss bother with another one? QC is not his job (nor is testing knives, for that matter).
Since it is the manufacturer's job to produce reasonably identical samples, we should give them the benefit of the doubt and assume it is not a lemon (poor QC). So how repeatable is this test on another identical sample? Given the level of precision involved in generating data for each test (i.e. minimal), the level of stress to which each sample is subjected (i.e. maximal), and the number of repetitions performed by this technician on entirely different samples, coupled with extensive (rigorous) video documentation of each test, the tests
appear to be quite repeatable in regards to the results they would generate. Would the testing methodology be more rigorous if more samples were tested? Of course.
But it shouldn't be necessary (and no matter how many knives Noss tests he still will not have tested
your knife). These are NOT precise experiments and should not be expected to be so. They demonstrate over the course of an hour what a user might expect to experience over the course of many years of such (ab)use without touching up the blade-edge (signs of poor care for an important tool). Such experiences are never precisely repeatable, esp. for different users, so
the acceptable level of variance between each test result is quite large. These are
destruction tests; the entire point is to find the general level of stress the tool can endure before fracture, i.e. when fracture/failure can be expected, and fracture in a knife is not a small event requiring precise measurement of the stresses involved at the moment of incidence.
The take-away message is 'YMMV
but don't be surprised if you have this experience'. Is the data of limited value? Yes, n=1. The danger of carrying conclusions from an n=1 demonstration too far -
"All generalizations are false, including this one." But what makes these tests so valuable is that they go beyond the simple (light handling) user reviews full of hyperbole regarding
hypothetical performance which comprise the VAST majority of knife reviews. Noss' 'tests' are not hypothetical. Need to know the limits to which your knife can be subjected? N=1 is a great deal more valuable than n=0.