Evaluating Objectives and Key Results (OKRs), past, present and future.
Part 1: Binary Scoring
When I first got going with OKRs about five years ago, we didn’t apply a scoring scale. Each Key Result was either achieved or not. Things were simple. It was binary. If your Key Result was “10 new customers by end of quarter” and you ended the quarter with nine new customers, the Key Result was not met.
In fact, it was assumed that you’d hit 10 customers midway through the quarter, cross out the 10 and raise the bar to 15 and then end the quarter with 20. This approach is sometimes referred to as “set the bar high and overachieve.”
It was an unwritten rule that if your team achieved its Objectives, your team would be more celebrated and more likely to get promotions. Your team was successful to the extent that the OKRs were achieved.
To be clear, individuals on the team that achieved its Objectives were more likely to get a bonus. After all, shouldn’t a bonus be tied to success?
This system didn’t always work well nor did it claim to be perfect. Suppose a team actually had the Key Result “10 new customers” but ended up with nine. There would be a sense of failure given that we had a binary scoring system. In other words, nine was interpreted as “falling short.” Not by much, but still, the feeling was one of losing.
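To make the binary rule concrete, here is a minimal Python sketch. The function name `binary_score` is hypothetical and used only for illustration; it is not taken from any OKRs tool.

```python
def binary_score(actual: int, target: int) -> int:
    """Return 1 if the target was met or exceeded, otherwise 0."""
    return 1 if actual >= target else 0


# Nine of ten customers counts as "not met" under binary scoring.
print(binary_score(9, 10))   # 0
# Hitting the target exactly counts as "met".
print(binary_score(10, 10))  # 1
# Overachieving scores the same as just hitting the target.
print(binary_score(20, 10))  # 1
```

Note that the score carries no information about how close the team came, which is exactly the limitation described above.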
To combat the black-and-white nature of a binary scoring system, more refined scoring systems were developed in the OKRs space. If we look at Google, for example, we can see how the introduction of a scaled scoring system adds a degree of sophistication that works well with their culture.
Part 2: Google-Style Grading on a 0-1 Scale
I first learned about how Google grades OKRs about the time the Google Ventures video came out in 2013. The idea was to standardize how all OKRs are scored across the organization, with 1 representing complete achievement, 0.5 as “pretty good” and 0 as “no progress.” At Google, the culture values stretch goals. So much so that scoring all 1s on your OKRs means you didn’t set your goals high enough. I recently heard a story about a Googler who set goals very high and then went on to achieve all of them. Apparently, everyone assumed he sandbagged. I’m not 100% sure this story is true, but given the number of people working at Google, it’s likely that this scenario has occurred multiple times.
I see how Google’s normalized scoring model can be effective to an extent. It gives everyone a way of knowing how to measure success. While it may not be perfect, it is certainly more effective than a binary 0/1 (yes/no) method of scoring. If nothing else, it standardizes the conversations and streamlines communication about performance on Objectives.
As an OKRs coach, I find most organizations that implement a scoring system score the Key Results either at the end of the quarter only or at several intervals during the quarter. However, they generally do not define scoring criteria as part of the definition of the Key Result. If you want to use a standardized scoring system, the scoring criteria for each Key Result MUST be defined as part of the creation of the Key Result. In these cases, I would argue that a Key Result is not finalized until the team agrees on the scoring criteria. The conversation about what makes a “.3” or a “.7” is also not very interesting unless we define the “.3” and the “.7”. After discussing this with Vincent Drucker, I’ve arrived at the following guidelines that my clients are finding very useful:
Key insight from OKRs coaching: Gamify OKRs by including a consistent scoring system (aka grades) for every Key Result
Here’s an example showing the power of defining scoring criteria upfront for a Key Result.
Key Result: Launch new product ABC with 10 active users by end of Q3
- Grade 0.3 = Prototype tested by 3 internal users
- Grade 0.7 = Prototype tested and approved with launch date in Q4
- Grade 1.0 = Product launched with 10 active users
This forces a conversation about what is aspirational versus realistic. The Engineering team may come back and say that even the 0.3 score is going to be difficult. Having these conversations before finalizing the Key Result ensures everyone’s on the same page from the start.
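The idea that a Key Result is not finalized until its scoring criteria are agreed can be sketched in Python. This is only an illustration under my own assumptions: the `KeyResult` class, its `is_finalized` and `grade` methods, and the rule "the grade is the highest criterion met" are hypothetical and not a description of how Google or any OKRs tool implements grading.

```python
from dataclasses import dataclass, field


@dataclass
class KeyResult:
    description: str
    # Maps each grade (0.3, 0.7, 1.0) to the criterion the team agreed on.
    criteria: dict[float, str] = field(default_factory=dict)

    def is_finalized(self) -> bool:
        """Assumption: finalized means 0.3, 0.7 and 1.0 are all defined."""
        return all(g in self.criteria for g in (0.3, 0.7, 1.0))

    def grade(self, criteria_met: set[float]) -> float:
        """Return the highest grade whose criterion was met, or 0.0 if none."""
        unknown = criteria_met - set(self.criteria)
        if unknown:
            raise ValueError(f"No criteria defined for grades: {sorted(unknown)}")
        return max(criteria_met, default=0.0)


kr = KeyResult(
    "Launch new product ABC with 10 active users by end of Q3",
    {
        0.3: "Prototype tested by 3 internal users",
        0.7: "Prototype tested and approved with launch date in Q4",
        1.0: "Product launched with 10 active users",
    },
)

print(kr.is_finalized())     # True
print(kr.grade({0.3, 0.7}))  # 0.7
```

Because the criteria are stored alongside the Key Result, a grade of 0.7 at the end of the quarter points back to a sentence the team wrote down at the start, rather than to a judgment made after the fact.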
An alternative approach to scoring Key Results, which I first heard about from a super-cool colleague*, emphasizes the future rather than the past, adding another layer of sophistication to the scoring model.
Part 3: Predictive Scoring
Most organizations that approach me for help with OKRs do have some form of scoring OKRs. However, their scores focus exclusively on “progress to date.” This generates a data point for each Key Result in the form of “X% complete.”
This data may have some value; however, more and more OKRs users are starting to include a predictive element in their scoring. Let’s go back to the “10 new customers” Key Result to analyze why predictive scoring is gaining so much traction.
Say you signed six customers in the first month of the quarter. Great, you’re 60% complete! However, say you do not believe your team will sign any more customers because the pipeline is dry or a key sales representative just left for another job. If you had a way of communicating that you’ve lost confidence and feel this Key Result will stay at 60% and will not be met, you could alert your colleagues. Predictive scoring serves as an early warning system, driving earlier communication to better manage expectations and help leadership avoid surprises.
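The difference between “progress to date” and a predictive score can be sketched in Python using the six-of-ten customers example. Everything beyond the numbers 6 and 10 is an assumption made for illustration: the `KeyResultStatus` class, the 0.0 to 1.0 confidence scale, and the 0.5 warning threshold are hypothetical choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class KeyResultStatus:
    target: int
    actual: int
    # Assumed scale: 0.0 (no chance) to 1.0 (certain) that the target
    # will be met by the end of the quarter, as judged by the team.
    confidence: float

    def percent_complete(self) -> float:
        """Historical view: progress to date as a percentage."""
        return 100 * self.actual / self.target

    def needs_early_warning(self, threshold: float = 0.5) -> bool:
        """Predictive view: flag unmet Key Results with low confidence."""
        return self.actual < self.target and self.confidence < threshold


# Six customers signed, but the pipeline is dry, so confidence is low.
status = KeyResultStatus(target=10, actual=6, confidence=0.1)

print(status.percent_complete())     # 60.0
print(status.needs_early_warning())  # True
```

The two numbers tell different stories: 60% complete looks healthy one month in, while the low confidence value is what triggers the conversation with colleagues.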
I predict scoring systems will continue to evolve. While some organizations getting started with OKRs may not grasp the importance of scoring, my prediction is that scoring will continue to be one of the most critical variables to get right for your organization to ensure a successful OKRs deployment for the long term. Some organizations may wind up taking a hybrid approach that combines predictive and historical scoring for OKRs.
However you score OKRs, remember that the intent is to communicate targets, manage expectations and enable continuous learning.