Writing SQL queries

The skill: Writing SQL that other people (including future you) can read, debug, and trust. Readable queries catch bugs. Unreadable queries create them.

In a Nutshell

CTEs over subqueries, always. CTEs (Common Table Expressions) are named building blocks that let you break a complex query into readable steps, like chapters in a book. Subqueries are queries nested inside other queries, and they become impossible to read fast. CTEs read top-to-bottom. Name each one like a chapter title.
SELECT only the columns you need. SELECT * means "give me every column in this table" — convenient, but dangerous. Explicit column lists act as documentation and prevent breakage when tables change.
COALESCE your NULLs. NULLs are the database's way of saying "no value here." COALESCE is a function that provides a fallback value when data is missing — so instead of a blank, you get a zero or a default label. Every join that might miss a match should have a fallback value.
Filter early, aggregate late. WHERE clauses are filters. Push them as close to the source tables as possible so you're doing less work on smaller data before combining it.
One CTE, one job. If a CTE is doing filtering AND aggregation AND joining, split it.
LIMIT first, then remove it. Validate logic on 100 rows before running on 100 million.
Comment the business rule, not the syntax. -- Exclude free trial users who never activated beats -- filter where status != trial.
Consistent formatting. Pick a style (lowercase SQL keywords, uppercase, whatever) and stick with it across the team.

The CTE pattern that scales

The cleanest analytics queries follow a three-layer pattern:

Import CTEs pull from source tables with minimal filtering. Name them after the source: raw_events, raw_users. This is where you apply date filters and basic WHERE clauses (your first round of filters).

Logic CTEs do the heavy lifting. Joins, aggregations, window functions (calculations that look across multiple rows at once, like ranking or running totals), business logic. Name them after what they produce: daily_active_users, revenue_by_plan. Each one should be independently testable — you should be able to SELECT * (grab all columns) from any CTE and verify it makes sense.

Final SELECT is just a simple query against your logic CTEs. If your final SELECT needs a complex CASE statement or a 5-way join, you probably need one more CTE.

This pattern makes debugging trivial. When a number looks wrong, you comment out the final SELECT, replace it with SELECT * FROM [suspect_cte], and check each layer independently. No scrolling through nested subqueries trying to match parentheses.

Do's and Don'ts

Loading visualization...

Written with ❤️ by a human (still)