The skill: Data documentation translates what your tables contain into what they mean. Without it, every new person who touches the data has to reverse-engineer the same answers from scratch, and they'll get some of them wrong.
In a Nutshell
- For every table, document six things: what it represents in business terms, what one row equals (the grain), key columns and what they mean, how often it updates, known caveats or limitations, and one example query.
- Put docs next to the code: in dbt yml files, as inline comments in the model SQL, or in a README in the same directory. Anything that lives in a separate wiki will drift within weeks.
- Business context beats technical detail. "This table has one row per completed purchase, excluding refunds and test transactions" is more useful than "columns: id (bigint), user_id (uuid), amount (numeric)."
- Document metric definitions where they're computed. If monthly_active_users is defined in a dbt model, the definition belongs in that model's yml file, not in a Notion page three clicks away.
- Include known limitations. "This table doesn't include mobile app events before March 2024" saves someone a week of debugging.
- Prune quarterly. Dead documentation is worse than no documentation. If a table is deprecated, remove the docs or mark them clearly. Stale docs erode trust in all docs.
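To make the metric and limitations points concrete, here is a minimal sketch of a dbt yml entry that keeps a metric definition and its caveat next to the model that computes it. The model name and the wording of the definition are placeholders for illustration, not a recommended definition of monthly active users.

```yaml
version: 2

models:
  - name: monthly_user_activity   # hypothetical model name
    columns:
      - name: monthly_active_users
        # The definition lives here, beside the SQL that computes it.
        description: >
          Count of distinct users with at least one qualifying event in the
          calendar month. (Placeholder definition; replace with your own.)
          Known limitation: does not include mobile app events before
          March 2024.
```

Because the description sits in the same directory as the model, a change to the metric logic and a change to its definition can go through the same review.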
What Good Looks Like
Here's a template for documenting a table. You don't need anything fancier than this.
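One possible shape for such a template, sketched as a dbt yml entry that covers the six items from the list above. The file path, model name, columns, schedule, and example query are all assumptions made for illustration; swap in your own.

```yaml
# models/marts/schema.yml (hypothetical path)
version: 2

models:
  - name: fct_completed_purchases   # hypothetical model name
    description: |
      What it represents: completed customer purchases, excluding refunds
      and test transactions.

      Grain: one row per completed purchase.

      Update frequency: rebuilt daily (placeholder; state your real schedule).

      Known caveats: (placeholder) list gaps, late-arriving data, or
      excluded sources here.

      Example query:

          select date_trunc('month', purchased_at) as month,
                 sum(amount) as revenue
          from fct_completed_purchases
          group by 1
    columns:
      - name: id
        description: Unique identifier for the purchase (one per row).
      - name: user_id
        description: The customer who made the purchase.
      - name: amount
        description: Purchase amount. State the currency and whether tax is included.
      - name: purchased_at
        description: Timestamp when the purchase completed. State the time zone.
```

The same six headings work equally well in a README or as a comment block at the top of the model SQL if you aren't using dbt.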
The goal isn't perfection. It's reducing the time between "I have a question about this data" and "I found the answer" from hours to minutes.
Do's and Don'ts
Written with ❤️ by a human (still)