Should I centralize data ownership or keep it distributed?

You're here when: Multiple teams are building their own dashboards, defining their own metrics, and occasionally getting conflicting numbers. Someone asks "should we centralize this?" and the answer isn't obvious.

The Heuristic

The answer is almost always "both." Centralize the things that need to be consistent. Distribute the things that need to be fast.

Centralize definitions. If two teams can define "active user" differently, they will. One shared definition, one source of truth.
Centralize infrastructure. One warehouse, one transformation layer, one set of core data models. Not because centralization is inherently better, but because maintaining parallel infrastructure is expensive and divergence is inevitable.
Distribute analysis. The marketing team shouldn't need to file a ticket to explore campaign performance. Self-serve access to clean, modeled data is the whole point of good infrastructure.
Distribute dashboards. Team-specific views, ad-hoc queries, and exploratory analysis should live close to the team that uses them. Centralized doesn't mean every chart goes through a review board.
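
To make the split concrete, here is a minimal sketch of "one shared definition, many team views." Everything in it is an assumption for illustration: the `Event` record, its field names, the 30-day window, and the two team view functions are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical event record; field names are assumptions for illustration.
@dataclass(frozen=True)
class Event:
    user_id: str
    day: date
    is_internal: bool = False  # e.g. employee or test accounts

# --- Centralized: one definition, owned in one place ---
ACTIVE_WINDOW_DAYS = 30  # assumed window; use whatever your org agrees on

def active_users(events: list[Event], as_of: date) -> set[str]:
    """Users with at least one non-internal event in the trailing window."""
    start = as_of - timedelta(days=ACTIVE_WINDOW_DAYS - 1)
    return {
        e.user_id
        for e in events
        if start <= e.day <= as_of and not e.is_internal
    }

# --- Distributed: each team slices the shared definition its own way ---
def product_view(events: list[Event], as_of: date) -> int:
    """Total active users."""
    return len(active_users(events, as_of))

def marketing_view(events: list[Event], as_of: date, campaign_users: set[str]) -> int:
    """Active users who arrived through a campaign."""
    return len(active_users(events, as_of) & campaign_users)

# Made-up sample data
as_of = date(2024, 3, 31)
events = [
    Event("u1", date(2024, 3, 10)),
    Event("u2", date(2024, 3, 20)),
    Event("u3", date(2024, 1, 5)),                     # outside the window
    Event("u4", date(2024, 3, 25), is_internal=True),  # excluded as internal
]

print(product_view(events, as_of))                  # 2
print(marketing_view(events, as_of, {"u2", "u3"}))  # 1
```

Both views can change freely without a review, but neither can redefine what "active" means, because that logic exists in exactly one function.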

Decision Tree

[Figure: decision tree]

What to Centralize vs. What to Distribute

[Figure: what to centralize vs. what to distribute]

Quick Example

A Series B marketplace had a classic conflict. Marketing reported MRR as $520K. Finance reported it as $445K. The CEO saw both numbers in the same board meeting and lost confidence in both teams. The root cause: Marketing counted annual contracts at their monthly rate starting from the signature date. Finance counted them starting from the activation date and excluded contracts in a grace period. Neither was wrong; they were using different definitions. The fix wasn't picking one team's number. It was defining MRR once, in a shared data model, with explicit inclusion and exclusion criteria. Both teams could still build their own dashboards, but the underlying metric came from one place.
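
The mechanics of that conflict can be sketched in a few lines. This is an illustration under stated assumptions, not the company's actual model: the `Contract` and `MrrDefinition` types, the sample contracts, and the criteria chosen for `SHARED_MRR` are all hypothetical, and the amounts are made up rather than derived from the $520K and $445K figures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical contract record; field names are assumptions for illustration.
@dataclass(frozen=True)
class Contract:
    annual_value: float            # dollars per year
    signed_on: date
    activated_on: Optional[date]   # None if not yet activated
    in_grace_period: bool = False

# Explicit inclusion and exclusion criteria, written down as data.
@dataclass(frozen=True)
class MrrDefinition:
    start_from: str                # "signature" or "activation"
    include_grace_period: bool

def compute_mrr(contracts: list[Contract], as_of: date, definition: MrrDefinition) -> float:
    """Sum the monthly rate of every contract that counts under `definition`."""
    if definition.start_from not in ("signature", "activation"):
        raise ValueError(f"unknown start_from: {definition.start_from!r}")
    total = 0.0
    for c in contracts:
        start = c.signed_on if definition.start_from == "signature" else c.activated_on
        if start is None or start > as_of:
            continue  # has not started counting yet
        if c.in_grace_period and not definition.include_grace_period:
            continue  # excluded by the grace-period rule
        total += c.annual_value / 12
    return total

# The two implicit definitions from the story
marketing_rule = MrrDefinition(start_from="signature", include_grace_period=True)
finance_rule = MrrDefinition(start_from="activation", include_grace_period=False)

# One agreed definition that every dashboard imports (criteria here are an assumed choice)
SHARED_MRR = MrrDefinition(start_from="activation", include_grace_period=True)

# Made-up sample contracts
contracts = [
    Contract(120_000, date(2024, 1, 10), date(2024, 1, 15)),
    Contract(60_000, date(2024, 3, 20), None),  # signed, not yet activated
    Contract(24_000, date(2024, 2, 1), date(2024, 2, 5), in_grace_period=True),
]
as_of = date(2024, 3, 31)

print(compute_mrr(contracts, as_of, marketing_rule))  # 17000.0
print(compute_mrr(contracts, as_of, finance_rule))    # 10000.0
print(compute_mrr(contracts, as_of, SHARED_MRR))      # 12000.0
```

The same three contracts produce three different totals depending only on the criteria. Writing the criteria down as an explicit object makes the disagreement visible and gives both teams a single value to build on.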

The Data Mesh Trap

Zhamak Dehghani's data mesh framework is compelling on paper: treat data as a product, give domain teams ownership, build federated governance. For companies with hundreds of engineers and dozens of data producers, it's a legitimate architecture.

For startups with fewer than 100 people, it's a recipe for chaos. Domain ownership works when each domain has the staffing to maintain data quality, documentation, and SLAs. At a startup, "domain ownership" usually means "the one engineer on that team who sort of knows SQL is now responsible for data quality." The result is inconsistency, duplication, and the exact metric conflicts you were trying to eliminate.

The pragmatic path for startups: centralize infrastructure and definitions with a small data team (or a single analytics engineer), then distribute access and analysis as broadly as possible. You can always decentralize ownership later when you have the headcount to support it. You can't easily re-centralize after every team has built their own incompatible data stack.

The Anti-Pattern

The Metric Civil War. Marketing says MRR is $500K. Finance says it's $420K. Product says DAU is 15,000. Growth says it's 12,000. Nobody knows which definition is right because each team built their own queries against raw data with their own filters, their own date logic, and their own inclusion criteria. Meetings devolve into arguments about whose number is correct instead of discussions about what to do. The CEO starts asking "which number should I believe?" and nobody has a good answer.
