You're here when: You're trying to measure something inherently multi-dimensional, like user engagement, account health, or lead quality, and you're debating whether a single number or a composite score is the right approach.
The Heuristic
Start simple. Stay simple until a simple metric has provably led you to a wrong decision.
- Who consumes this metric? Executives and non-technical stakeholders need one number. If the audience is a data team doing deep analysis, maybe a composite adds value. Maybe.
- Can you debug it? When a simple metric drops, you know what changed. When a composite score drops, you have to decompose it to figure out which input moved. That extra step kills response time.
- Has the simple version actually misled you? Not theoretically. Actually. If DAU told you users were healthy when they were churning, that's a case for something richer. If nobody can point to a real decision that went wrong, the simple metric is working.
- Can you explain it in one sentence? If describing your metric requires a paragraph about weights, normalization, and thresholds, nobody outside your team will use it.
Decision Tree
- Do executives or other non-technical stakeholders consume this metric? → Yes: ship one number.
- Has a simple metric actually led you to a wrong decision? → No: keep the simple metric.
- Can you explain the candidate composite in one sentence, and decompose a change in it quickly? → No: keep the simple metric.
- All of the above pass? → Build the composite, but keep every input tracked and monitored on its own.
Quick Example
A SaaS company built a "Customer Health Score" combining login frequency (30%), feature adoption breadth (25%), support ticket volume (20%), NPS response (15%), and contract renewal date proximity (10%). It took a data scientist two weeks to build. When the score dropped for a key account, the CSM couldn't tell what was wrong. Was the customer logging in less? Using fewer features? Filing more tickets? She had to decompose the score manually every time, which meant she stopped using it. They replaced it with a single metric: weekly active usage of the core feature. It wasn't perfect, but the CSM could act on it immediately.
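The score described above can be sketched in a few lines. The weights are the ones from the example; the input names, the 0-100 normalization, and the specific values are illustrative assumptions, not the company's actual implementation. The point of the sketch is that two quite different accounts can land on the same number:

```python
# Illustrative sketch of a weighted "Customer Health Score".
# Input names and normalization (each input pre-scaled to 0-100)
# are assumptions for the example; weights come from the text.
WEIGHTS = {
    "login_frequency": 0.30,
    "feature_breadth": 0.25,
    "ticket_volume": 0.20,   # assumed already inverted: fewer tickets = higher value
    "nps_response": 0.15,
    "renewal_proximity": 0.10,
}

def health_score(inputs: dict) -> float:
    """Weighted sum of pre-normalized (0-100) inputs; returns a 0-100 score."""
    return sum(WEIGHTS[k] * inputs[k] for k in WEIGHTS)

# Two very different accounts, same score: one barely logs in but uses
# many features, the other logs in constantly but uses few.
a = health_score({"login_frequency": 50, "feature_breadth": 80,
                  "ticket_volume": 70, "nps_response": 80, "renewal_proximity": 60})
b = health_score({"login_frequency": 75, "feature_breadth": 50,
                  "ticket_volume": 70, "nps_response": 80, "renewal_proximity": 60})
# a == b == 67.0, yet the right CSM action differs completely.
```

This is exactly the CSM's problem in the anecdote: the number alone cannot tell her which intervention the account needs.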
Composites Hide Causality
The fundamental problem with composite scores is that they compress information. That's the whole point, but it's also the whole problem.
When your engagement score goes from 72 to 65, you've lost information about what happened. Maybe login frequency dropped. Maybe feature adoption stayed flat. Maybe two inputs moved in opposite directions and partially canceled out. To understand the change, you have to reverse-engineer the score, which means you're doing the work the score was supposed to save you.
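The reverse-engineering step can be made concrete. A minimal sketch, assuming a two-input composite with hypothetical equal weights chosen so the score falls from 72 to 65 while the inputs move in opposite directions:

```python
# Hypothetical two-input composite; weights and values are chosen for
# illustration so the score drops 72 -> 65 with partial cancellation.
WEIGHTS = {"login_frequency": 0.5, "feature_adoption": 0.5}

def composite(inputs: dict) -> float:
    return sum(WEIGHTS[k] * inputs[k] for k in WEIGHTS)

def explain_change(before: dict, after: dict) -> dict:
    """Attribute the score change to each input: weight * (after - before)."""
    return {k: WEIGHTS[k] * (after[k] - before[k]) for k in WEIGHTS}

before = {"login_frequency": 80, "feature_adoption": 64}  # composite: 72.0
after  = {"login_frequency": 60, "feature_adoption": 70}  # composite: 65.0

deltas = explain_change(before, after)
# {'login_frequency': -10.0, 'feature_adoption': 3.0}
# Logins fell hard while adoption rose; the headline "-7" hides both facts.
```

Note that this decomposition is itself the extra work: with a simple metric, the "-10 in logins" would have been the headline number in the first place.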
Simple metrics are debuggable by default. "Weekly active users dropped 12%" tells you exactly what happened. The investigation starts immediately: which segments? which cohorts? what changed last week? With a composite, the investigation starts with "which component of the score caused this?", an extra layer of indirection before you get to the actual problem.
The rare cases where composites earn their complexity are when you've exhausted simple metrics and can demonstrate that they consistently mislead. A lead scoring model that combines firmographic data, behavioral signals, and engagement metrics might genuinely outperform any single input. But even then, the composite should be built from simple metrics that are independently tracked and monitored.
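One way to honor that last constraint is to make the components first-class in the code, so the composite can never be reported without its inputs. A hypothetical sketch of the lead-scoring case, with placeholder weights that would still need validation:

```python
# Hypothetical lead score that keeps its inputs visible. Field names
# and weights are illustrative placeholders, not validated values.
from dataclasses import dataclass

@dataclass
class LeadScore:
    firmographic_fit: float   # 0-100, tracked as its own simple metric
    behavioral_signal: float  # 0-100, tracked as its own simple metric
    engagement: float         # 0-100, tracked as its own simple metric

    def composite(self) -> float:
        return (0.40 * self.firmographic_fit
                + 0.35 * self.behavioral_signal
                + 0.25 * self.engagement)

    def components(self) -> dict:
        """Expose the raw inputs so dashboards show them beside the score."""
        return {"firmographic_fit": self.firmographic_fit,
                "behavioral_signal": self.behavioral_signal,
                "engagement": self.engagement}
```

Because each component is also monitored on its own, a drop in the composite comes with its explanation attached instead of requiring a day of digging.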
The Anti-Pattern
The Black Box Score. Someone builds a 7-variable weighted health score. The weights were chosen by "intuition" and never validated. Nobody outside the data team can explain what the score means. When it surfaces a problem, the response is always "let me dig into the components," which takes a day. When it doesn't surface a problem, nobody knows if the score is working or just wrong. It lives on a dashboard that gets glanced at weekly and acted on never. Six months later someone asks "does anyone use the health score?" and the answer is silence.
Written with ❤️ by a human (still)