You already know that I'm a big fan of custom aggregates. What's even more fun is when you combine them with the ordered aggregate feature from PostgreSQL 9.0. For example, let's take a fairly common task: selecting the item with the maximum importance.
This is a challenge for standard aggregation because you want to display a value which relates to an importance score contained in another table. For example, the table could be a lookup list of error types, and you want to display the most severe error. This is simple to do if you only want one row, but what if you want to group by other columns? Well, here's a relatively neat way using ordered aggregates and custom aggregates.
First, we want to create a generic aggregate called "first", which simply returns the first non-null value in the column:
CREATE FUNCTION first_state (
anyelement, anyelement )
RETURNS anyelement
LANGUAGE SQL
IMMUTABLE STRICT
AS $f$
SELECT $1;
$f$;
CREATE AGGREGATE first ( anyelement ) (
SFUNC = first_state,
STYPE = anyelement
);
This is a generally useful aggregate to have around, btw, so I suggest adding it to your general library.
Now, let's look at our tables:
table errors (
error_code text primary key,
severity int not null default 0,
description text
)
table error_log (
lo_time timestamptz,
error_code text
)
So, say I wanted to see the count of errors for each hour in the log and the most severe error during that hour? Well, what I need to do is to select the first error code which shows up when I order by severity. For good measure, I order by error_code as well, in case there's two codes with the same severity:
SELECT date_trunc('hour',log_time) as log_hour,
count(*) as error_count,
first(error_code order by severity desc, error_code)
as most_severe
FROM error_log
JOIN errors USING (error_code)
GROUP BY date_trunc('hour',log_time)
ORDER BY date_trunc('hour',log_time);
And that works a treat:
log_hour | error_count | most_severe
-----------------------+-------------+-------------
2013-06-17 00:00:00+00 | 6 | CX56B
2013-06-17 01:00:00+00 | 16 | CX56B
2013-06-17 02:00:00+00 | 4 | PPPMQ5
2013-06-17 03:00:00+00 | 2 | PPPMQ5
2013-06-17 04:00:00+00 | 24 | CX56B
2013-06-17 05:00:00+00 | 15 | CX56B
2013-06-17 06:00:00+00 | 5 | FM229A
I'm sure you can figure out ways to adapt this to your needs. Enjoy!
We really should have first() (and possibly last()) bundled in core PostgreSQL.
ReplyDeletecount_true() is also a surprisingly handy general custom aggregate.
DeleteWhat's the difference between first() above and first_value() already in PG core?
ReplyDeletehttp://www.postgresql.org/docs/current/static/functions-window.html
first_value() """returns value evaluated at the row that is the first row of the window frame"""
(sorry if this is posted multiple times - no feedback at all on whether it worked or not)
first_value() requires a window declaration; it doesn't work as an aggregate.
DeleteTechnically, we should support first_value(x order by y), which is effectively the same thing. But we don't.