One way to get this kind of mode is to do a windowing function, but as mentioned this works poorly with other aggregates in the same result set. So let's take the same custom aggregate approach and see if we can do better.
Now, for types other than boolean, mode() is a "two-pass aggregate". That means it's impossible to calculate in one pass; you need two passes, one to sort the set, and one to count the items and pick the MCV. Since we know we'll need two passes, we'll construct our aggregate to assume that it's receiving sorted data going in, and make sure it gets sorted data when we use it. Given 9.2's new ORDER BY clause for aggregates, that's easy to ensure.
For a state type, we'll need a 4-part register. This register will include:
- the last seen value
- the count of the last seen value
- the most common value so far
- the count of the most common value so far
create or replace function sorted_mode_state(
tally bigint[], nextval bigint
)
returns bigint[]
language sql
immutable
as $f$
-- have we seen this value before?
SELECT CASE WHEN nextval = tally[1] THEN
-- if weve seen it before, does it out-count the prior MCV?
CASE WHEN tally[2] = tally[4] THEN
-- if so, swap the MCV
ARRAY[ nextval, tally[2] + 1, nextval, tally[2] + 1
ELSE
-- if not, keep counting
ARRAY[ nextval, tally[2] + 1, tally[3], tally[4] ]
END
-- is the register uninitialized? then take the first value
WHEN tally[1] IS NULL THEN
ARRAY [ nextval, 1, nextval, 1 ]
-- skip nulls
WHEN nextval IS NULL THEN
tally
ELSE
-- if it's a new value, count 1
ARRAY [ nextval, 1, tally[3], tally[4] ]
END;
$f$;
The final function just picks out tally[3], returning NULL if no MCV has been selected:
create function sorted_mode_final(
tally bigint[]
)
returns bigint
language sql
immutable
as $f$
SELECT CASE
-- do we have an MCV?
WHEN tally[4] > 1 THEN tally[3]
-- otherwise, do we have a single-value set?
WHEN tally[1] = tally[3] THEN tally[3]
-- is there no clear mcv? return null
ELSE NULL END;
$f$;
Then we can build our aggregate:
CREATE AGGREGATE sorted_mode ( bigint ) (
SFUNC = sorted_mode_state,
STYPE = BIGINT[],
FINALFUNC = sorted_mode_final,
INITCOND = '{}'
);
Usage:
SELECT server, date_trunc('hour', poll_time) as thehour,
sorted_mode(status ORDER BY status) as status_mode
FROM server_status
GROUP BY server, date_trunc('hour', poll_time)
ORDER BY date_trunc('hour', poll_time), server;
This approach has some inherent limitations, of course:
- if no value appears more than once, it returns NULL, unless there's only one value in the set;
- if several values appear the same number of times, it returns the first one in the sort order;
- it can't do estimated mode or other mode constructions.
No comments:
Post a Comment