Changing the Ratings

Back to the spreadsheet! I bet you never thought you’d see so many entries about an Excel sheet in a primarily shaving blog…

I live by rating the shave products I use, along with the comfort, quality and overall ratings of each shave. It was the only way I could think of to track what I used and how they worked for me. And they have worked.

But it may be time for a change.

Let’s first talk about what exists now. Every one of the products I use, both hardware and software, with the exception of blades, gets one rating on a five-point scale, 1 being lowest, 5 being highest. If something works supremely well, it gets a 5. If it just sucks at everything it does, it gets a 1. Nothing has scored a 1.

Blades get two ratings from me, and one rating from elsewhere. The rating from elsewhere is just the Razor Emporium aggressiveness rating for blades, again on a five-point scale, from 1 (mild) to 5 (aggressive). The other two ratings are comfort and quality, and are also on a five-point scale from 1 (bad) to 5 (perfect).

I think for the products, these rating scales work just fine. I don’t need to refine them because they already do their job.

But for the shaves, there’s a different need. I give each of the shaves a comfort, quality, and overall rating on the same five-point scale as the blade comfort and quality. The overall is just an average of comfort and quality.

The problem with these is that I’ve reached a point where I’d like to be able to differentiate shaves at the top end better. As you’ve probably noticed in the SOTDs, the “almost perfect” shaves have the same overall ratings as great shaves that aren’t almost perfect. And that bothers me. And while I’ve been using a .25 decimal in each ratings step (for example, 4.25, 4.5, etc.), this leads to an average that’s in .125 point steps, and somehow it feels imprecise and limiting the more I think about it.

For instance, I’ve rated a shave with a 5.000 rating for comfort, even if there’s some sting from the alum. That shouldn’t happen, but the next step down is 4.750, which would kick the shave into a different category. And a 5.000 rating for quality should be absolutely perfect, and from experience, that would not be a very frequent result.

One thing I’ve learned in analyzing data and reporting on the data is that you need to decide what story you want the data to tell. Clearly, I’m wanting this to tell the story of what’s good and what’s bad, but also how good is this item or shave’s “good,” or just how bad is it. And is the scale illustrative enough to really show the difference between ratings? If my worst shave ever is the lowest rating, and the best shave ever is the best rating, what happens to everything in between, and what do I do if something falls outside of those marks?

So here’s how my thinking went.

I created a problem or vision statement to describe what I wanted: I want a more precise measuring scale to capture more absolutely accurate and more relatively precise shave ratings in a way that is clear and easy to understand.

So then, I had to look at considerations. First, my current shave measuring scale has 17 possible ratings (and roughly double that for overall shave ratings, with the possibility of averaging down to .125 point steps).
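As a quick check on those counts, here is a minimal Python sketch that enumerates the current scale. It assumes comfort and quality each run from 1 to 5 in .25 steps and that the overall is their plain average, as described above; the variable names are mine, purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed: comfort and quality each run from 1 to 5 in .25 steps.
STEP = Fraction(1, 4)
ratings = [1 + STEP * i for i in range(17)]

# The overall rating is the plain average of comfort and quality.
overalls = sorted({(c + q) / 2 for c, q in product(ratings, repeat=2)})

print(len(ratings))                      # 17
print(len(overalls))                     # 33
print(float(overalls[1] - overalls[0]))  # 0.125
```

Using Fraction instead of floats keeps the .25 and .125 steps exact, so duplicate averages collapse cleanly when counted.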

Second, I want something precise but clear–a scale that recognizes that a rating of, say, 36, is better than 35 by just a small amount. But at the same time, how much differentiation do I need between a step? Could I quantify how much better the comfort is of a shave that’s 36 versus 35? In other words, I need the scale to reflect a difference between each point but still not be so large as to make the gap between rating points unimportant.

Third, it needs to be realistically representative of the shaves. That is, I’ve had to carefully consider whether to give a shave a 5.000 or a 4.750 for comfort and quality just to make sure the overall score matches where I think the shave belongs. At the same time, bad shaves sometimes still get scores over the midpoint, and they shouldn’t. So I want to have the individual comfort and quality ratings to stand on their own and not be influenced by what I think the overall score should be.

Fourth, this scale should have a midpoint or somewhere that is a clear point for an average shave, a point that will be an absolutely perfect shave with no discomfort and no stubble, and a point for shaves that weren’t finished due to intense discomfort. And there need to be arbitrary but definable steps within the scale. In addition, the scale needs clear percentiles so that almost perfect shaves can occupy the top 5 or 10%, and other ratings of shaves can fall into specific percentile ranges (great shaves at, say, 81-90%). At the same time, though, the scale should fit into the concept of our base-10 counting, so it should probably have a top end that’s a multiple of 5, because if the scale has a top end at 41, 26, or 16, that isn’t intuitively a high point.

Fifth, if I decide to do the work to migrate existing ratings to the new rating scale, it needs to be fairly easy to do so, based on notes and the existing rating given.

Here’s what I considered, but ultimately rejected:

Doubling the existing rating scale. This probably would have been the easiest to transition existing ratings to, but it didn’t provide enough clarity or steps to capture the range of ratings I’d like to capture. For instance, a 10% difference in shaves would only be about 3 steps in the scale.
Expanding to a 13 point scale (0-12). This would have required half step increments, which I ended up deciding I didn’t want. I decided from this consideration that ratings will be a whole number, and averages will round down primarily so that an almost perfect shave cannot round up to a perfect shave rating.
Using a 40 or 50 point scale. This provided enough room, certainly, but as I thought about it more, there just was too much room on both sides of the midpoint. I didn’t need 20 or 25 steps of “good” shave.
A 100 point scale (or actually 101, with a 0 for unfinishable, and 100 for absolutely perfect). This just felt too grandiose, and one where I’d find myself debating between an 89 comfort versus a 90. It would have just made me frustrated at figuring out ratings.

So I settled a bit. 25 started feeling like a sweet spot. Percentiles would be 4% apart with each step. A 23/24 shave would average to a 23, and would still be over the 90th percentile, which feels like it’s in the range of an almost perfect shave. A perfect shave would be a 25, but I’d still use 0 as a shave that could not be finished as-is. There isn’t a perfect midpoint, and I need to accept that. But I think the step differentiation is such that a shave wouldn’t have to be exactly on the midpoint, and just okay. It could be just either side of just okay.
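The arithmetic behind the 25-point scale can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the overall rating is the average of comfort and quality rounded down, per the whole-number rule I settled on above.

```python
TOP = 25  # a perfect shave; 0 is a shave that could not be finished

def overall(comfort: int, quality: int) -> int:
    """Average two whole-number ratings, rounding down."""
    return (comfort + quality) // 2

def percent(rating: int) -> float:
    """Express a rating as a percentage of the top of the scale."""
    return 100 * rating / TOP

print(overall(23, 24))           # 23
print(percent(overall(23, 24)))  # 92.0
print(overall(24, 25))           # 24
print(percent(1))                # 4.0
```

Rounding down means a 24/25 shave lands on 24 rather than 25, so only a shave that is perfect on both counts earns the top rating.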

I know that most of my shaves going forward will be on the good side, so that’s really what I was considering here. While right now, they all cluster at 4.750 to 4.875 overall ratings, I’d like to see great, excellent, almost perfect, and perfect shaves spanning between about 70-100%, and this scale seems to accommodate that.

I should be able to track back and update existing ratings to match the new scale, with the new midpoint (or just above) tracking back to a shave around 4.0 on the old scale, which seems right. And the Kai shaves, one of which I didn’t finish with that blade, would rate a 0.

So, how do I implement this? I’ll finish defining the rating steps, and I hope to start using the new scale with next week’s SOTDs. I’ll slowly go back in my spreadsheet and update the ratings there, but I don’t think I’ll update the past posts with the updated scale.

Stay tuned and watch for the new ratings!
