I often read the claim that there are "too many variables" and therefore it's impossible to "prove" anything. However, that's what makes a problem suited to the scientific method, rather than the trial & error, engineering approach.
In my opinion, the first step to understanding stropping is to set aside the concept of "grit" or at least the assumption that grit and keenness/sharpness are correlated.
Well, you can certainly prove some factors, but without context they don't have a lot of predictive value - this is my gripe when folk draw blanket conclusions re stropping (and some other means and techniques, for that matter). Even when talking about media that are relatively stable, like a Norton Econo stone, there is going to be a large range of user observations based on individual understanding and execution. For the scientific method to work, we need a question to be answered, a hypothesis based on observable phenomena, and a means to quantify and influence those phenomena.
If we start with the question "what does stropping do to a cutting edge?", we then need to make some observations, but of what - surely just looking at the edge isn't going to cut it. The edge, the strop density and surface characteristics, the abrasive, the amount of applied force, what are we forgetting...? Oh yeah, the type and HT of the steel might be helpful.
The variables are many, especially if your goal is to come up with tests that other people can replicate with accuracy. Lacking that, it is still interesting to note what's happening with impromptu noodling, but at best one can only spot gross trends and not anything predictive. Leaving terms like "grit" aside, one could at least build a model of one abrasive sample in a handful of applications and the observable effects, whatever they might be.
I tossed most of my notes with the exception of the formulas, but when mixing up the compound for my Washboard project I noticed even small tweaks to the abrasive mix and binder ratios made a difference, and that was all on the same test surface. On top of that, every additional sheet of paper changes the unit pressure and amount of surface give - changing the finish on the edge, the amount of edge wrap, rate of loading, contribution of burnishing...
To put it together I'd have to begin with Shore A and D durometers. Then one can figure out a comparative means of measuring/estimating unit pressure, leading (I would think) to a very good predictive model of edge rounding at given geometries per edge width/applied force/speed (all controlled for by steel type and RC of course!). We're already getting into it pretty deep at this point and haven't conducted a single test, but how else to put the results in context? I am certain that if you checked a handful of commercial and homemade strops the values would be all over the map, as would the abrasive holding force of different compound binders and their abrasive content. Sounds crazy, but these are all variables that don't exist to the same extent with most sharpening stones yet have a lot of influence.
In the micrographs above, that is the same abrasive but with very different unit pressures, although IIRC the applied force was nearly the same.
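To make the unit pressure point concrete, here is a rough back-of-the-envelope sketch in Python. It simply treats unit pressure as applied force divided by contact area. The force, edge length and contact-band widths are made-up numbers for illustration only, not measurements of anything, and the function name is my own invention.

```python
def unit_pressure_mpa(force_n: float, contact_length_mm: float, contact_width_mm: float) -> float:
    """Mean contact pressure in MPa (N/mm^2), assuming a rectangular contact patch."""
    area_mm2 = contact_length_mm * contact_width_mm
    if area_mm2 <= 0:
        raise ValueError("contact area must be positive")
    return force_n / area_mm2


# Hypothetical values: same applied force and edge length in both cases.
FORCE_N = 10.0
EDGE_LENGTH_MM = 50.0

# Assumption: a firm backing gives a narrow contact band at the bevel,
# while a backing with more give lets the surface wrap and widens the band.
firm = unit_pressure_mpa(FORCE_N, EDGE_LENGTH_MM, contact_width_mm=0.1)
soft = unit_pressure_mpa(FORCE_N, EDGE_LENGTH_MM, contact_width_mm=0.5)

print(f"firm backing: {firm:.2f} MPa")
print(f"soft backing: {soft:.2f} MPa")
print(f"ratio: {firm / soft:.1f}x")
```

The only point is that with force held constant, the contact width set by the backing drives the unit pressure, so two strops can behave very differently under the "same" hand pressure. A real model would need measured contact widths and durometer values rather than guesses like these.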
This is why I don't consider most of what's out there on the topic to be research so much as observations based on tinkering - and therefore not worth drawing conclusions from outside the specifics of the actual tinkering and the individual doing it.
I suspect anyone who really delves into this without being paid good money to do so might rapidly lose their mind.