Taming the Metrics Boogeyman: A Data-Driven Approach to Elevating Engineering Productivity

Georges Akouri-Shan
9 min read · Apr 20, 2024
Struggling to decide between data complacency and data-driven decision-making

As an engineering manager, I faced the recurring challenge of measuring team productivity effectively. This narrative traces my journey from the all-too-common refusal to measure engineering productivity quantitatively to a data-driven approach that transformed our decision-making processes.

The Challenge: Quantity over Quality

Several years ago, I was at the helm of two engineering teams tasked with building out a new platform. I found myself grappling with a significant constraint: a complete lack of control over our recruitment process. I was given engineers and tasked with making the best of the situation. Amid budgetary constraints, our program team eyed the allure of low-cost vendors, subscribing to the maxim that more is better: the old ‘quantity over quality’ perspective. This belief, however, ran counter to the spirit of Brooks’ Law (“adding manpower to a late software project makes it later”), which reminds us that efficiency and productivity hinge not on the size of the team but on its cohesion, skill, and the strategic allocation of tasks.

Turning Data into Insights

Initially, the program team relied solely on story points to quantify productivity, highlighting a critical gap for me (more on this later): this metric alone couldn’t capture the qualitative aspects of our team’s output. To demonstrate the nuanced differences in quality across our vendors, I dove deeper into the world of measuring developer productivity. My inquiries into how our engineering managers approached this subject were met with deafening silence, a testament to the prevailing skepticism towards metrics within our organization. Given my urgency to take control of our recruitment funnel, I had to find something fast, so I leveraged Bitbucket and JIRA to identify additional key performance indicators and developed the following rudimentary yet revealing scoring system:

  1. Number of merged pull requests (PRs): +10 points
  2. Number of comments received on my PRs: -3 points
  3. Number of comments given on others’ PRs: +3 points
  4. Number of times “Needs Work” is marked on my PRs: -3 points
  5. Number of times “Needs Work” is marked on others’ PRs: +3 points
  6. Number of story points completed: +1 point
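
To make the arithmetic concrete, here is a minimal sketch of how such a score could be computed. The Activity record and its field names are hypothetical, not actual Bitbucket or JIRA API fields; in practice, the counts came from Bitbucket and JIRA exports.

```python
from dataclasses import dataclass

# Hypothetical per-engineer activity record; field names are illustrative.
@dataclass
class Activity:
    merged_prs: int
    comments_received: int
    comments_given: int
    needs_work_received: int
    needs_work_given: int
    story_points: int

# Weights mirror the scoring system above.
WEIGHTS = {
    "merged_prs": 10,
    "comments_received": -3,
    "comments_given": 3,
    "needs_work_received": -3,
    "needs_work_given": 3,
    "story_points": 1,
}

def score(activity: Activity) -> int:
    """Composite productivity score: a weighted sum of activity counts."""
    return sum(getattr(activity, field) * weight for field, weight in WEIGHTS.items())

# Example: 12 merged PRs, 8 comments received, 15 given, 2 "Needs Work"
# received, 4 given, 30 story points completed.
print(score(Activity(12, 8, 15, 2, 4, 30)))  # 120 - 24 + 45 - 6 + 12 + 30 = 177
```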

Drawing on four months of data across our teams, I confirmed my suspicion that the preferred, cost-effective vendor was a net negative for our productivity, whereas the more premium options delivered up to ten times the value. The table below uses mock data to show how I put this together:

Sample scoring system to differentiate performance across engineering teams

Armed with this information, I walked our head of engineering and our head of program management through my logic; both found the evidence overwhelming. This pivotal moment marked the start of a new era, one where I took the reins of the recruitment process for my organization to ensure a quality bar was met with each hire.

The Dark Side of Metrics

While the inception of this scoring system marked a significant step toward a simple way of quantifying differences in productivity, it simultaneously revealed the complexities associated with metric fixation. The cautionary tales echoed by experts, including Patrick Kua in his aptly titled piece An Appropriate Use of Metrics, serve as a sobering reminder of the pitfalls of a narrow focus on quantitative measures. Kua adeptly elucidates the perils of over-optimization, encapsulating the essence of Goodhart’s Law: once a measure becomes a target, it ceases to be an effective measure.

It would be irresponsible to continue talking about metrics without highlighting a fundamental paradox in their use: the very act of targeting specific metrics can lead to behaviors that undermine other essential aspects of performance. Reflecting on a past episode with our primary vendor, which was contractually obligated to maintain a certain story-point velocity, we encountered unintended consequences that served as a potent lesson in the pitfalls of metric fixation:

  1. The Story Points Bubble: Our vendor relentlessly pushed its engineers to uphold points velocity at all costs, leading to an overestimation of each task and, over time, creating a facade of increased productivity. Program and engineering managers alike reveled in the glory of repeatedly hitting unprecedented productivity levels, unknowingly perpetuating this bubble. The discrepancy between reported success and actual progress exemplifies the perilous disconnect that can arise from misaligned metric incentives.
  2. Chasing Numbers Over Impact: Engineers, cognizant of the emphasis on maintaining a certain velocity, started to eschew smaller, potentially impactful tasks in favor of larger ones with inflated estimates. This behavior was not borne of a desire to shirk responsibility but was a rational response to the system’s incentives. The pursuit of “hitting targets” eclipsed the pursuit of meaningful contributions, leading to a skewed distribution of effort that favored the appearance of productivity over tangible outcomes.

This narrative underscores a broader spectrum of metrics-related pitfalls that organizations often grapple with:

  1. Sacrificing Innovation and Quality for Metrics: A narrow focus on metrics can deter teams from engaging in innovative and creative work, often leading to a compromise in the overall quality of outcomes.
  2. Misalignment of Incentives: Metrics fixation can cause a shift away from broader organizational objectives, leading teams to adopt behaviors that fulfill metric targets at the expense of strategic goals and genuine improvements.
  3. Risk of Data Misrepresentation: The pressure to showcase positive metrics can lead to the alteration or misreading of data, resulting in decisions informed by skewed or incorrect information.

Reflecting on these pitfalls, I moved forward (and I hope you will too) with cautious optimism. While productivity data is invaluable for measuring and guiding progress, its adoption necessitates a balanced approach that values the underlying objectives of improvement and growth over simply hitting a target.

Navigating Organization Growth with Visibility

Once I gained control of our recruitment process, I quietly let my scoring system die, fearing its misuse. If you’re reading this, then perhaps you’ve reached a point of growth in your organization that is challenging your ability to guarantee a quality bar. Are you finding yourself overwhelmed by the immediate demands of daily operations, leaving little time to stay ahead of performance concerns?

I hope you will come to a similar realization: hiding from data is no longer sustainable. It was during this time that I was reminded of a key lesson from The Phoenix Project, a novel about a manager’s challenges in implementing DevOps principles. One of the many lessons it offers is the importance of making work visible in order to escape the relentless cycle of busyness.

Drawing on this for inspiration, I knew we had to bring some aspect of the scoring system back. Keeping the dark side of metrics in mind, my focus was on cultivating a culture of recognition and motivation, not one of fear. I collaborated with one of our top engineers to put together a leaderboard showcasing only two metrics: the number of merged PRs and the number of comments made on others’ PRs. I believed these metrics were safe enough that if engineers decided to “game” them, it wouldn’t be such a terrible thing. Lastly, we tied financial incentives to the board, rewarding the top three performers in each metric every month.
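
Here is a minimal sketch of how such a leaderboard might be assembled, assuming a flat stream of (author, event_type) tuples pulled from Bitbucket; the event names and data are illustrative, not Bitbucket API values.

```python
from collections import Counter

# Hypothetical activity events; each tuple is (author, event_type).
events = [
    ("alice", "pr_merged"), ("alice", "pr_merged"), ("alice", "pr_comment"),
    ("bob", "pr_merged"), ("bob", "pr_comment"), ("bob", "pr_comment"),
    ("carol", "pr_merged"),
]

def leaderboard(events, event_type, top_n=3):
    """Rank engineers by the count of a single event type, highest first."""
    counts = Counter(author for author, kind in events if kind == event_type)
    return counts.most_common(top_n)

# One board per metric: merged PRs, and comments given on others' PRs.
print(leaderboard(events, "pr_merged"))   # e.g. [('alice', 2), ('bob', 1), ('carol', 1)]
print(leaderboard(events, "pr_comment"))  # e.g. [('bob', 2), ('alice', 1)]
```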

Data as a Tool For Recognition

The integration of a leaderboard system into our engineering operations offered a unique vantage point on motivation, performance, and growth within the team. This initiative emphasized an important lesson about motivation as a driver: the power of recognition far surpasses that of fear. It also reaffirmed that leadership’s role in metric implementation is crucial; without the proper incentives and oversight, such endeavors risk faltering under less attuned management. Engineering leadership must spearhead these initiatives; better to choose complacency than to let non-engineering managers take the lead and risk total failure.

Optimizing Leaderboards to Maximize Participation

Leaderboards, while effective in driving motivation, present their own set of challenges. In a prior engagement consulting for Sky TV, a premier subscription TV provider in Central America, I tapped into Yu-kai Chou’s gamification strategies to explore the complex dynamics of leaderboards.

One of many insights from Chou’s work is the dichotomy in response among different performance tiers: while those who enjoy competitive environments excel, those in the mid to lower tiers face the risk of disengagement. To address these challenges and ensure a balanced, inclusive experience, we refined our approach by:

  1. Adjusting for Role-Specific Advantages: To correct the imbalance stemming from team leads’ roles, which naturally involve more frequent commenting on PRs, we excluded them from the primary leaderboard. This change leveled the playing field and ensured a fairer competition among individual contributors.
  2. Supporting Personal Growth: Acknowledging the diverse motivational drivers within our team, we implemented a feature allowing less competitive members to monitor their personal progress. This allowed team members to set and pursue individual growth goals, fostering a culture of self-improvement irrespective of peer competition.

Leaderboard example of filters including datepicker and role-based filtering
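
In code, the filtering behind that view might look something like this minimal sketch; the row shape, roles, and dates are hypothetical.

```python
from datetime import date

# Hypothetical leaderboard rows: (engineer, role, month, merged_prs, comments_given).
rows = [
    ("alice", "ic",   date(2024, 3, 1), 12, 15),
    ("bob",   "lead", date(2024, 3, 1),  9, 40),  # leads naturally comment more
    ("carol", "ic",   date(2024, 2, 1),  7, 10),
]

def filter_rows(rows, role=None, start=None, end=None):
    """Apply the role and date-range filters before ranking."""
    filtered = []
    for engineer, row_role, month, prs, comments in rows:
        if role is not None and row_role != role:
            continue  # e.g. exclude team leads from the primary board
        if start is not None and month < start:
            continue
        if end is not None and month > end:
            continue
        filtered.append((engineer, row_role, month, prs, comments))
    return filtered

# Primary board: individual contributors only, March 2024.
print(filter_rows(rows, role="ic", start=date(2024, 3, 1), end=date(2024, 3, 31)))
# [('alice', 'ic', datetime.date(2024, 3, 1), 12, 15)]
```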

Cultivating Engagement and Growth Among Junior Talent

The leaderboard, more than just a metric tracker, emerged as an essential tool for junior engineers, mirroring their engagement with our codebases and their evolution over time.

By providing a means to quantitatively monitor their involvement, the leaderboard highlights the superior value of engaging directly in coding and peer review practices over passive learning methods like reading documentation or watching tutorials. It’s in analyzing and modifying real code that the most valuable learning occurs.

We also watched the leaderboard help foster a growth mindset. With each metric tracked, junior engineers could see tangible evidence of their progress. This visibility transformed their metrics into milestones and motivators, inspiring engineers to continually surpass their past achievements and cultivating an ethos of relentless self-improvement.

Personal leaderboard showing a developer’s stats month-over-month

Using Leading Indicators to Proactively Recognize Patterns

As an adamant supporter of Ben Horowitz’s Law of Crappy People, which states that “for any title level in a large organization, the talent on that level will eventually converge to the crappiest person with the title”, I believe every engineering manager should root out and resolve performance issues as soon as possible or risk them spreading like a virus.

Armed with data, our engineering managers can now gain foresight into potential challenges and opportunities within the team before they develop into deep-rooted issues. Here are a few notable insights:

  1. Disengagement or burnout: A decline in activity levels can signal the need for timely intervention, preventing burnout before it takes root.
  2. Collaboration and communication gaps: A consistent lack of comments from specific team members may signal collaboration or communication issues, indicating a need for targeted coaching to enhance team engagement and peer review participation.
  3. Spotting potential: On the positive side, engineers demonstrating a significant uptick in contributions can be noticed more quickly, allowing engineering managers to appropriately recognize them or assign them more challenging work.
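
As a minimal sketch of the first and third signals, one could compare each engineer’s latest month of activity against their trailing average; the names, counts, and thresholds below are illustrative starting points, not tuned recommendations.

```python
# Hypothetical month-over-month activity counts per engineer
# (e.g. merged PRs plus review comments); values are illustrative.
history = {
    "alice": [14, 15, 13, 6],  # sharp recent decline
    "bob":   [8, 9, 8, 9],     # steady; no flag
    "carol": [4, 6, 9, 14],    # steady uptick
}

def flag_trends(history, drop=0.5, rise=1.5):
    """Compare each engineer's latest month against their trailing average."""
    flags = {}
    for name, counts in history.items():
        baseline = sum(counts[:-1]) / len(counts[:-1])
        latest = counts[-1]
        if baseline and latest < baseline * drop:
            flags[name] = "check in: activity dropped sharply"
        elif baseline and latest > baseline * rise:
            flags[name] = "recognize: activity is rising sharply"
    return flags

print(flag_trends(history))
# {'alice': 'check in: activity dropped sharply',
#  'carol': 'recognize: activity is rising sharply'}
```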

Final Thoughts

Our deep dive into the integration of data-driven methodologies within engineering management has distilled some crucial insights. First, metrics are indispensable in navigating and fine-tuning our operations but must be employed judiciously to support rather than dictate our strategic goals.

Second, our exploration has illuminated not only the benefits but also the inherent risks associated with data-driven decision-making. The key lies in fostering a productive environment without slipping into the pitfalls of metric fixation, which can stifle creativity and promote a culture of fear.

Third, we’ve learned that vigilance and adaptability are essential. Adjusting our approach in response to feedback and changing conditions ensures that our use of data evolves to meet ongoing challenges effectively.

For those of you considering the shift towards more data-centric management strategies, the directive is clear: sidestepping the use of informed, analytic decision-making is no longer viable.

What’s Next

The following initiatives aim to further our use of data. Each is designed with the dual aim of enhancing our operational efficiency and nurturing a positive, productive work culture.

  1. Future state of the leaderboard: We’re currently expanding the leaderboard to more teams and envisioning an enhanced version that celebrates team achievements and emphasizes quality metrics such as bugs squashed and technical-debt reduction, including code smells and linting improvements.
  2. DevX Surveys: Implementing regular Developer Experience (DevX) surveys to gather feedback directly from engineers. These insights will guide improvements in our processes and tools, ensuring that our environment continuously evolves to meet the needs and preferences of our developers.
  3. Integration of DORA Metrics: Adopting the DORA (DevOps Research and Assessment) metrics as a comprehensive framework to evaluate our development practices. This will allow us to measure performance in areas critical to DevOps success, including deployment frequency, lead time for changes, change failure rate, and time to restore service, aligning our metrics with industry best practices for software development (a rough sketch of these calculations follows this list).
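
For a sense of what this entails, here is a rough sketch of how the four DORA metrics could be derived from a deployment log; the record shape and numbers are hypothetical, and a real implementation would pull from CI/CD and incident tooling.

```python
from datetime import datetime

# Hypothetical deployment log. Each entry:
# (deployed_at, first_commit_at, failed, restored_at).
deploys = [
    (datetime(2024, 4, 1, 10), datetime(2024, 3, 30, 10), False, None),
    (datetime(2024, 4, 3, 15), datetime(2024, 4, 2, 15), True,
     datetime(2024, 4, 3, 17)),
    (datetime(2024, 4, 8, 9), datetime(2024, 4, 5, 9), False, None),
]
days_observed = 7
failures = [d for d in deploys if d[2]]

def hours(delta):
    return delta.total_seconds() / 3600

deployment_frequency = len(deploys) / days_observed    # deploys per day
lead_time_for_changes = sum(                           # avg hours, commit to deploy
    hours(deployed - committed) for deployed, committed, _, _ in deploys
) / len(deploys)
change_failure_rate = len(failures) / len(deploys)     # fraction of failed deploys
time_to_restore = sum(                                 # avg hours to recover
    hours(restored - deployed) for deployed, _, _, restored in failures
) / len(failures)

print(deployment_frequency, lead_time_for_changes, change_failure_rate, time_to_restore)
# ~0.43 deploys/day, 48.0h lead time, ~0.33 failure rate, 2.0h to restore
```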

If you’re ready to bring data into your engineering organization and are seeking guidance or insights, don’t hesitate to reach out!
