
Measuring Quality - It's Not Counting Bugs


“How’s the quality looking?”

“Well, we’ve got 47 open bugs, down from 52 last week.”

I’ve heard this conversation countless times - I’ve had this conversation countless times. And it really irritates me.

Not because the numbers are wrong - they might be perfectly accurate. But because measuring quality by counting bugs is like measuring the health of a restaurant by counting the number of complaints. Sure, it tells you something, but it completely misses the point.

The Bug Count Trap

There are three big problems with using bug counts as a quality metric:

Firstly, it encourages all the wrong behaviours. Teams start rejecting valid bugs with “That’s not really a bug, it’s Working As Designed/User Error/Insert Excuse Here”. They close bugs quickly without proper fixes. They avoid testing risky areas to prevent finding issues that would make the numbers look bad.

Secondly, it tells you almost nothing about what users actually experience. A system with zero bugs that takes 30 seconds to load a page isn’t high quality - it’s just slow. An application with five minor cosmetic issues that never crashes and processes transactions flawlessly? That might actually be pretty good.

But the real problem is that bug counts completely ignore everything else that matters.

What Actually Matters

Remember our definition from last time: quality is value to some person who matters. So let’s think about who those people are and what they actually care about.

From a quick brainstorm I did with a few teams a while back, here’s who typically matters for software quality. You’ll likely come up with more for your context: customers and their users, support teams, development teams, operations folk, sales teams, compliance people, and your executives or board.

And what do they care about? Everything from function and reliability to diagnostics, security, documentation, maintainability, how you’ll make money, and whether the system can scale when that big customer signs up.

Measuring just bugs captures maybe 20% of this picture. We can do better.

Better Quality Metrics

All quality metrics are proxies for some aspect of quality. Here are some proxies for aspects of quality that I’ve found much more useful than bug counts. You don’t need to use all of them - pick ones that matter for your context.

Function and Reliability: Customer-reported functional issues (severity matters more than count), percentage of features shipped that need immediate follow-up fixes, system availability and performance metrics.

Complexity and Maintainability: Average time from issue identified to fix released, how often bug fixes actually solve the problem (vs creating new ones), velocity trends (teams slow down when working with poor quality code).

Diagnostics and Supportability: Average time to reproduce reported issues, percentage of support calls resolved without escalation to development, number of issues resolved just by updating documentation.

User Experience: Time to complete key user workflows, support ticket volume (not just bug reports), customer satisfaction scores.

Technical Debt: Code review cycle time, test coverage trends (not just absolute percentage), deployment success rate.
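Some of these proxies are cheaper to compute than they sound. As a sketch, here's how two of them - average time to fix, and the percentage of fixes needing immediate follow-up - might fall out of a ticket export. The field names (`opened`, `fixed`, `needed_followup`) are illustrative assumptions, not from any real tracker; substitute whatever yours exports.

```python
from datetime import datetime

# Hypothetical ticket export - field names are illustrative, not from any real tracker.
tickets = [
    {"opened": datetime(2024, 3, 1), "fixed": datetime(2024, 3, 4), "needed_followup": False},
    {"opened": datetime(2024, 3, 2), "fixed": datetime(2024, 3, 10), "needed_followup": True},
    {"opened": datetime(2024, 3, 5), "fixed": datetime(2024, 3, 6), "needed_followup": False},
]

def mean_time_to_fix(tickets):
    """Average time from issue identified to fix released, in days."""
    deltas = [(t["fixed"] - t["opened"]).days for t in tickets]
    return sum(deltas) / len(deltas)

def followup_rate(tickets):
    """Fraction of shipped fixes that needed an immediate follow-up."""
    return sum(t["needed_followup"] for t in tickets) / len(tickets)

print(mean_time_to_fix(tickets))  # 4.0 days
print(followup_rate(tickets))     # 0.33...
```

The point isn't the arithmetic - it's that once the raw events are captured somewhere queryable, most of these metrics are a few lines of code each.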

The magic happens when you track several of these together. A spike in “average time to fix” might warn you of accumulating technical debt weeks before bug counts would show an issue. A drop in deployment success rate combined with increasing support tickets tells a clear story about quality degradation.
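To make the "spike warns you early" idea concrete, here's one simple way to flag when a metric drifts above its recent baseline. The 1.5× threshold and four-week window are arbitrary illustrations, not recommendations - tune them to your own data.

```python
def flag_spike(history, window=4, factor=1.5):
    """Return True if the latest value exceeds `factor` times the
    average of the previous `window` values."""
    if len(history) < window + 1:
        return False
    baseline = sum(history[-(window + 1):-1]) / window
    return history[-1] > factor * baseline

# Weekly "average days to fix" values - a gradual rise, then a jump.
time_to_fix = [3.0, 3.2, 3.1, 3.4, 6.5]
print(flag_spike(time_to_fix))  # True: 6.5 > 1.5 * 3.175
```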

The Inconvenient Truth About Implementation

Now here’s the bit that everyone knows but nobody talks about: you probably won’t implement most of these metrics.

Why? Because they’re harder to measure than counting bugs. Bug counts come free from your ticket system. Performance numbers come from that test suite you already run. They’re easy, so we measure them, then convince ourselves they’re “good enough.”

“Average time to reproduce issues” sounds like a great metric until you realise you’d need to instrument your support process to track it. “Percentage of features needing immediate fixes” requires someone to categorise follow-up work. “Customer satisfaction scores” means setting up surveys and analysing responses - and while you probably do have this one already, it likely lives in some support or sales system kept safely away from engineering.

So engineering teams stick with bug counts and tell themselves it’s pragmatic and anyway in their particular circumstance it’s a really good metric and besides it’s worked just fine for the last decade, and, and, and…

Build Quality Measurement Into Your Architecture

The solution isn’t to measure everything poorly or to ignore the cost of tracking. It’s to decide what you want to measure at the start of your project and build the ability to track it into your system.

Want to track user workflow completion times? Design telemetry into those workflows from day one. Worried about support escalations? Build structured logging that helps you categorise issues automatically. Care about feature stability? Design your deployment pipeline to track which features generate follow-up tickets.
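On the structured-logging point: if every event carries a machine-readable category from day one, support tooling can group issues automatically later instead of someone reading prose. A minimal sketch - the `log_event` helper and its field names are assumptions for illustration, not any standard:

```python
import json
import logging

logger = logging.getLogger("app")
logging.basicConfig(level=logging.INFO)

def log_event(category, message, **context):
    """Emit a JSON log line with a category that tooling can group on."""
    record = json.dumps({"category": category, "message": message, **context})
    logger.info(record)
    return record

# Tag the failure so support can bucket it without reading free text.
log_event("payment_timeout", "gateway did not respond",
          order_id="A-1234", elapsed_ms=30000)
```

The design choice that matters is emitting structured data, not strings - the category field is what lets you later ask "how many payment timeouts this week?" without a regex archaeology project.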

This isn’t just about tooling - it’s about making conscious architectural decisions that enable the quality insights you’ll want later.

I worked with one team that identified early on that “time to reproduce customer issues” would be their key quality metric. So they built detailed session replay and structured error logging into their product from the start. When customers reported issues, the support team could often reproduce them in minutes rather than hours. What started as a way to make a service metric easy to measure became an incredibly valuable tool in its own right, and the investment in measurement infrastructure paid for itself within months.

Making It Practical

Start small. Pick three metrics that represent different aspects of quality for your specific context: one that measures user-facing value, one that measures technical health, one that measures operational effectiveness.

But here’s the key: don’t just pick them and hope for the best. Spend a day figuring out exactly how you’d measure each one. If it’s too hard with your current setup, either simplify the metric or build the measurement capability.

Track them for a month or two. See what stories they tell you. Adjust based on what you learn.

And the critical piece is to then take action. Make sure that these metrics actually drive better decisions. If tracking something doesn’t change how you work, stop tracking it!

A Word of Warning

Measurement isn’t quality. You can have perfect metrics and still build rubbish if you’re measuring the wrong things or not acting on what you learn.

Also, anything you measure will be gamed. That’s not necessarily bad - if people are gaming your deployment success rate by being more careful about releases, that might be exactly what you want. Just be aware of it.

The Bottom Line

Bug counts tell you almost nothing about quality. Quality is multi-dimensional, and your measurements should reflect that.

Choose metrics that represent what different stakeholders actually value. More importantly, build the ability to measure these things into your system from the start - don’t wait until you need the data to figure out how to collect it.

Track just enough to make better decisions, but not so much that you’re spending more time measuring than improving.

Most importantly, remember that the goal isn’t perfect metrics - it’s building software that people value and enjoy using. Everything else is just a means to that end.


Originally published on Edmund Pringle’s Substack. Follow Ed for more on software quality and engineering leadership.