🎓 # PostgreSQL Aggregate Functions

PostgreSQL offers a rich set of aggregate functions that go beyond the standard COUNT, SUM, AVG, MIN, and MAX. These specialized functions can help you write more expressive and efficient queries. Let’s explore some of the most useful ones with practical examples.

Setting Up Our Example Data

For our examples, we’ll use a simple comments table:

CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    status VARCHAR(20),
    rating INTEGER,
    content TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO comments (post_id, status, rating, content) VALUES
(1, 'published', 5, 'Great post!'),
(1, 'draft', 3, 'Needs work'),
(2, 'published', 4, 'Very helpful'),
(3, 'published', 5, 'Love it'),
(3, 'published', 4, 'Good content'),
(4, 'draft', 2, 'Not ready yet');

BOOL_OR - Check if Any Condition is True

The BOOL_OR function returns true if at least one row in the group matches your condition. It’s like a logical OR operation across multiple rows.

Basic usage:

SELECT BOOL_OR(status = 'draft')
FROM comments;

Output:

bool_or
true

This tells us that at least one comment has draft status.

Grouped usage:

SELECT 
    post_id,
    BOOL_OR(status = 'draft') as has_drafts
FROM comments
GROUP BY post_id;

Output:

post_id has_drafts
1 true
2 false
3 false
4 true

Practical use case: Check which posts have pending moderation items or incomplete content.

BOOL_AND - Check if All Conditions are True

The counterpart to BOOL_OR, BOOL_AND returns true only if ALL rows in the group match your condition.

SELECT 
    post_id,
    BOOL_AND(status = 'published') as all_published
FROM comments
GROUP BY post_id;

Output:

post_id all_published
1 false
2 true
3 true
4 false

Practical use case: Verify that all components of a workflow are complete before marking something as ready.

ARRAY_AGG - Collect Values into Arrays

ARRAY_AGG collects all values from a column into a PostgreSQL array. This is incredibly useful for denormalizing data or creating structured outputs.

Basic usage:

SELECT ARRAY_AGG(status)
FROM comments;

Output:

array_agg
{published,draft,published,published,published,draft}

Grouped usage:

SELECT 
    post_id,
    ARRAY_AGG(status) as all_statuses
FROM comments
GROUP BY post_id;

Output:

post_id all_statuses
1 {published,draft}
2 {published}
3 {published,published}
4 {draft}

Remove duplicates with DISTINCT:

SELECT 
    post_id,
    ARRAY_AGG(DISTINCT status) as unique_statuses
FROM comments
GROUP BY post_id;

Output:

post_id unique_statuses
1 {published,draft}
2 {published}
3 {published}
4 {draft}

Ordered aggregation:

SELECT 
    post_id,
    ARRAY_AGG(rating ORDER BY rating DESC) as ratings_desc
FROM comments
GROUP BY post_id;

Output:

post_id ratings_desc
1 {5,3}
2 {4}
3 {5,4}
4 {2}

STRING_AGG - Concatenate Strings

Similar to ARRAY_AGG but creates a delimited string instead of an array.

SELECT 
    post_id,
    STRING_AGG(content, ' | ' ORDER BY rating DESC) as combined_content
FROM comments
GROUP BY post_id;

Output:

post_id combined_content
1 Great post! | Needs work
2 Very helpful
3 Love it | Good content
4 Not ready yet

Practical use case: Create comma-separated lists of tags, categories, or user names.

JSON_AGG and JSONB_AGG - Create JSON Arrays

These functions collect rows into JSON arrays, perfect for creating nested JSON responses.

SELECT 
    post_id,
    JSON_AGG(
        JSON_BUILD_OBJECT(
            'status', status,
            'rating', rating,
            'content', content
        )
    ) as comment_details
FROM comments
GROUP BY post_id;

Output:

post_id comment_details
1 [{“status”:”published”,”rating”:5,”content”:”Great post!”},{“status”:”draft”,”rating”:3,”content”:”Needs work”}]
2 [{“status”:”published”,”rating”:4,”content”:”Very helpful”}]
3 [{“status”:”published”,”rating”:5,”content”:”Love it”},{“status”:”published”,”rating”:4,”content”:”Good content”}]
4 [{“status”:”draft”,”rating”:2,”content”:”Not ready yet”}]

Statistical Aggregate Functions

PostgreSQL includes several statistical functions:

Standard deviation:

SELECT 
    post_id,
    AVG(rating) as avg_rating,
    STDDEV(rating) as rating_stddev
FROM comments
GROUP BY post_id;

Output:

post_id avg_rating rating_stddev
1 4.0 1.4142135623730951
2 4.0 null
3 4.5 0.7071067811865476
4 2.0 null

Percentiles:

SELECT 
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rating) as median_rating,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY rating) as p95_rating
FROM comments;

Output:

median_rating p95_rating
4.0 5.0

Window Functions vs Aggregate Functions

Remember that many aggregate functions can also be used as window functions:

SELECT 
    post_id,
    status,
    rating,
    AVG(rating) OVER (PARTITION BY post_id) as post_avg_rating,
    ARRAY_AGG(status) OVER (PARTITION BY post_id) as post_statuses
FROM comments;

Output:

post_id status rating post_avg_rating post_statuses
1 published 5 4.0 {published,draft}
1 draft 3 4.0 {published,draft}
2 published 4 4.0 {published}
3 published 5 4.5 {published,published}
3 published 4 4.5 {published,published}
4 draft 2 2.0 {draft}

Performance Tips

  1. Use indexes: Ensure columns used in GROUP BY and aggregate conditions are indexed
  2. Filter early: Use WHERE clauses before aggregation when possible
  3. Consider partial indexes: For frequent conditional aggregations like BOOL_OR(status = 'draft')
  4. DISTINCT can be expensive: Use judiciously with ARRAY_AGG(DISTINCT ...)

Real-World Examples

E-commerce order analysis:

SELECT 
    customer_id,
    COUNT(*) as total_orders,
    BOOL_OR(status = 'cancelled') as has_cancellations,
    ARRAY_AGG(DISTINCT product_category) as purchased_categories,
    STRING_AGG(order_id::text, ', ') as order_list
FROM orders
GROUP BY customer_id;

Example Output:

customer_id total_orders has_cancellations purchased_categories order_list
123 3 true {electronics,books} 1001, 1002, 1003
456 2 false {clothing} 1004, 1005

Content moderation dashboard:

SELECT 
    DATE(created_at) as date,
    COUNT(*) as total_comments,
    BOOL_OR(status = 'flagged') as needs_attention,
    AVG(CASE WHEN status = 'published' THEN rating END) as avg_published_rating
FROM comments
GROUP BY DATE(created_at)
ORDER BY date DESC;

Example Output:

date total_comments needs_attention avg_published_rating
2025-08-07 4 false 4.5
2025-08-06 2 true 4.0

Conclusion

PostgreSQL’s aggregate functions provide powerful tools for data analysis and reporting. Functions like BOOL_OR, ARRAY_AGG, and STRING_AGG can significantly simplify complex queries and reduce the need for application-level data processing. The key is understanding when and how to use each function effectively in your specific use cases.

These functions shine particularly well when combined with GROUP BY clauses and can often replace multiple queries or complex application logic with a single, efficient SQL statement.