🎓 # PostgreSQL Aggregate Functions
PostgreSQL offers a rich set of aggregate functions that go beyond the standard COUNT
, SUM
, AVG
, MIN
, and MAX
. These specialized functions can help you write more expressive and efficient queries. Let’s explore some of the most useful ones with practical examples.
Setting Up Our Example Data
For our examples, we’ll use a simple comments
table:
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
post_id INTEGER,
status VARCHAR(20),
rating INTEGER,
content TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO comments (post_id, status, rating, content) VALUES
(1, 'published', 5, 'Great post!'),
(1, 'draft', 3, 'Needs work'),
(2, 'published', 4, 'Very helpful'),
(3, 'published', 5, 'Love it'),
(3, 'published', 4, 'Good content'),
(4, 'draft', 2, 'Not ready yet');
BOOL_OR
- Check if Any Condition is True
The BOOL_OR
function returns true
if at least one row in the group matches your condition. It’s like a logical OR operation across multiple rows.
Basic usage:
SELECT BOOL_OR(status = 'draft')
FROM comments;
Output:
bool_or |
---|
true |
This tells us that at least one comment has draft status.
Grouped usage:
SELECT
post_id,
BOOL_OR(status = 'draft') as has_drafts
FROM comments
GROUP BY post_id;
Output:
post_id | has_drafts |
---|---|
1 | true |
2 | false |
3 | false |
4 | true |
Practical use case: Check which posts have pending moderation items or incomplete content.
BOOL_AND
- Check if All Conditions are True
The counterpart to BOOL_OR
, BOOL_AND
returns true
only if ALL rows in the group match your condition.
SELECT
post_id,
BOOL_AND(status = 'published') as all_published
FROM comments
GROUP BY post_id;
Output:
post_id | all_published |
---|---|
1 | false |
2 | true |
3 | true |
4 | false |
Practical use case: Verify that all components of a workflow are complete before marking something as ready.
ARRAY_AGG
- Collect Values into Arrays
ARRAY_AGG
collects all values from a column into a PostgreSQL array. This is incredibly useful for denormalizing data or creating structured outputs.
Basic usage:
SELECT ARRAY_AGG(status)
FROM comments;
Output:
array_agg |
---|
{published,draft,published,published,published,draft} |
Grouped usage:
SELECT
post_id,
ARRAY_AGG(status) as all_statuses
FROM comments
GROUP BY post_id;
Output:
post_id | all_statuses |
---|---|
1 | {published,draft} |
2 | {published} |
3 | {published,published} |
4 | {draft} |
Remove duplicates with DISTINCT:
SELECT
post_id,
ARRAY_AGG(DISTINCT status) as unique_statuses
FROM comments
GROUP BY post_id;
Output:
post_id | unique_statuses |
---|---|
1 | {published,draft} |
2 | {published} |
3 | {published} |
4 | {draft} |
Ordered aggregation:
SELECT
post_id,
ARRAY_AGG(rating ORDER BY rating DESC) as ratings_desc
FROM comments
GROUP BY post_id;
Output:
post_id | ratings_desc |
---|---|
1 | {5,3} |
2 | {4} |
3 | {5,4} |
4 | {2} |
STRING_AGG
- Concatenate Strings
Similar to ARRAY_AGG
but creates a delimited string instead of an array.
SELECT
post_id,
STRING_AGG(content, ' | ' ORDER BY rating DESC) as combined_content
FROM comments
GROUP BY post_id;
Output:
post_id | combined_content |
---|---|
1 | Great post! | Needs work |
2 | Very helpful |
3 | Love it | Good content |
4 | Not ready yet |
Practical use case: Create comma-separated lists of tags, categories, or user names.
JSON_AGG
and JSONB_AGG
- Create JSON Arrays
These functions collect rows into JSON arrays, perfect for creating nested JSON responses.
SELECT
post_id,
JSON_AGG(
JSON_BUILD_OBJECT(
'status', status,
'rating', rating,
'content', content
)
) as comment_details
FROM comments
GROUP BY post_id;
Output:
post_id | comment_details |
---|---|
1 | [{“status”:”published”,”rating”:5,”content”:”Great post!”},{“status”:”draft”,”rating”:3,”content”:”Needs work”}] |
2 | [{“status”:”published”,”rating”:4,”content”:”Very helpful”}] |
3 | [{“status”:”published”,”rating”:5,”content”:”Love it”},{“status”:”published”,”rating”:4,”content”:”Good content”}] |
4 | [{“status”:”draft”,”rating”:2,”content”:”Not ready yet”}] |
Statistical Aggregate Functions
PostgreSQL includes several statistical functions:
Standard deviation:
SELECT
post_id,
AVG(rating) as avg_rating,
STDDEV(rating) as rating_stddev
FROM comments
GROUP BY post_id;
Output:
post_id | avg_rating | rating_stddev |
---|---|---|
1 | 4.0 | 1.4142135623730951 |
2 | 4.0 | null |
3 | 4.5 | 0.7071067811865476 |
4 | 2.0 | null |
Percentiles:
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rating) as median_rating,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY rating) as p95_rating
FROM comments;
Output:
median_rating | p95_rating |
---|---|
4.0 | 5.0 |
Window Functions vs Aggregate Functions
Remember that many aggregate functions can also be used as window functions:
SELECT
post_id,
status,
rating,
AVG(rating) OVER (PARTITION BY post_id) as post_avg_rating,
ARRAY_AGG(status) OVER (PARTITION BY post_id) as post_statuses
FROM comments;
Output:
post_id | status | rating | post_avg_rating | post_statuses |
---|---|---|---|---|
1 | published | 5 | 4.0 | {published,draft} |
1 | draft | 3 | 4.0 | {published,draft} |
2 | published | 4 | 4.0 | {published} |
3 | published | 5 | 4.5 | {published,published} |
3 | published | 4 | 4.5 | {published,published} |
4 | draft | 2 | 2.0 | {draft} |
Performance Tips
- Use indexes: Ensure columns used in
GROUP BY
and aggregate conditions are indexed - Filter early: Use
WHERE
clauses before aggregation when possible - Consider partial indexes: For frequent conditional aggregations like
BOOL_OR(status = 'draft')
- DISTINCT can be expensive: Use judiciously with
ARRAY_AGG(DISTINCT ...)
Real-World Examples
E-commerce order analysis:
SELECT
customer_id,
COUNT(*) as total_orders,
BOOL_OR(status = 'cancelled') as has_cancellations,
ARRAY_AGG(DISTINCT product_category) as purchased_categories,
STRING_AGG(order_id::text, ', ') as order_list
FROM orders
GROUP BY customer_id;
Example Output:
customer_id | total_orders | has_cancellations | purchased_categories | order_list |
---|---|---|---|---|
123 | 3 | true | {electronics,books} | 1001, 1002, 1003 |
456 | 2 | false | {clothing} | 1004, 1005 |
Content moderation dashboard:
SELECT
DATE(created_at) as date,
COUNT(*) as total_comments,
BOOL_OR(status = 'flagged') as needs_attention,
AVG(CASE WHEN status = 'published' THEN rating END) as avg_published_rating
FROM comments
GROUP BY DATE(created_at)
ORDER BY date DESC;
Example Output:
date | total_comments | needs_attention | avg_published_rating |
---|---|---|---|
2025-08-07 | 4 | false | 4.5 |
2025-08-06 | 2 | true | 4.0 |
Conclusion
PostgreSQL’s aggregate functions provide powerful tools for data analysis and reporting. Functions like BOOL_OR
, ARRAY_AGG
, and STRING_AGG
can significantly simplify complex queries and reduce the need for application-level data processing. The key is understanding when and how to use each function effectively in your specific use cases.
These functions shine particularly well when combined with GROUP BY
clauses and can often replace multiple queries or complex application logic with a single, efficient SQL statement.