Format a SQL GROUP BY with Aggregates
GROUP BY queries with aggregate functions are the foundation of every data analytics and business reporting system. Revenue by region, orders per customer, average ticket size by product category, monthly active users — all of these are GROUP BY patterns. Understanding how to write, read, and debug these queries is an essential skill for any developer who works with databases. This example calculates e-commerce reporting metrics: for each country and month combination, it counts distinct completed orders, sums revenue, and calculates average order value. It filters to only completed orders in 2024, groups by country and computed month, filters groups with less than $1000 revenue using HAVING, and sorts by revenue descending to surface the highest-value segments first. The critical distinction between WHERE and HAVING: WHERE filters individual rows before they are grouped and aggregated. HAVING filters the resulting groups after aggregation. You cannot use SUM(o.total) in a WHERE clause because the sum doesn't exist until after rows are grouped. HAVING EXISTS precisely for this post-aggregation filtering. A common bug is writing WHERE revenue > 1000 (where revenue is an alias) when the correct expression is HAVING SUM(o.total) > 1000. DATE_FORMAT(o.created_at, '%Y-%m') deserves explanation: this MySQL function formats a timestamp as "2024-01", extracting just the year and month. PostgreSQL uses DATE_TRUNC('month', created_at) for the same purpose. Using the alias month in the GROUP BY clause rather than repeating the full expression is a MySQL extension; standard SQL requires you to repeat the full expression. COUNT(DISTINCT o.id) counts unique order IDs within each group, which is important when a JOIN multiplies rows. Without DISTINCT, if an order can appear multiple times due to a join, your order count will be inflated. Always ask whether your JOIN can produce row duplication and whether DISTINCT is needed in your aggregates. Real-world scenarios: a business intelligence dashboard showing monthly revenue by geography; a product analytics report comparing feature adoption by user segment; a financial report calculating ARPU (average revenue per user) by subscription tier. Tips: add indexes on the columns used in WHERE, JOIN, and GROUP BY clauses. For frequently run aggregate queries on large tables, consider materialized views or pre-aggregated summary tables that update periodically.
SELECT c.country, DATE_FORMAT(o.created_at, '%Y-%m') AS month, COUNT(DISTINCT o.id) AS order_count, SUM(o.total) AS revenue, AVG(o.total) AS avg_order_value FROM orders o INNER JOIN customers c ON o.customer_id = c.id WHERE o.status = 'completed' AND o.created_at >= '2024-01-01' GROUP BY c.country, month HAVING revenue > 1000 ORDER BY revenue DESC;
FAQ
- What is the difference between WHERE and HAVING?
- WHERE filters rows before aggregation. HAVING filters groups after aggregation. You cannot use aggregate functions like SUM in a WHERE clause — use HAVING instead.
- Why must SELECT columns appear in GROUP BY?
- Aggregate functions collapse multiple rows into one. Any non-aggregated column must be in the GROUP BY so the database knows which value to use when multiple source rows map to one output row.
- What does COUNT(DISTINCT) do?
- COUNT(DISTINCT column) counts unique values in a column within each group, ignoring duplicates. COUNT(*) counts all rows including duplicates.
Related Examples
Multi-table JOIN queries are the workhorse of relational database development, b...
Format a SQL UPDATE with WHEREUPDATE statements are among the most dangerous SQL commands to write incorrectly...
Format a SQL CREATE TABLE StatementCREATE TABLE statements are the foundation of relational database design, and th...