Database Soup: More on 9.4 percentiles

Thursday, August 14, 2014

More on 9.4 percentiles

Some of you may be following the discussion on JSONB compression on -hackers, which is the remaining major issue for 9.4. As part of this, I needed to run some statistics on average column sizes for JSON vs JSONB tables. This makes a good example of how incredibly useful WITHIN GROUP will be in 9.4.

First, I created two versions of the data, one called "jsonic" which has text-JSON data, and one called "jsonbish" which has the JSONB version. Then I ran some comparisons using pg_column_size on the JSON columns.

with colsizes_b as (
    select pg_column_size(jsonbish.jcol) as colsize
    from jsonbish
),
distrib_b as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_b
),
colsizes_j as (
    select pg_column_size(jsonic.jcol) as colsize
    from jsonic
),
distrib_j as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_j
)
select 'json' as thetype,
    colsize_distribution from distrib_j
union all
select 'jsonb',
    colsize_distribution
    from distrib_b;

So I'm taking the column size of each row, then sorting them by size, and then doing a percentile distribution using the new WITHIN GROUP. There's no group there because the group is actually the whole table. I've chosen the usual box plot percentages: minimum, 25%, median, 75%, and maximum.

And the results:

thetype |    colsize_distribution
---------+----------------------------
json    | {1741,1767,1854,1904,2292}
jsonb   | {3551,5866,5910,5958,6168}

... which demonstrates that we do indeed have a major issue with compressability.

Thursday, August 14, 2014

More on 9.4 percentiles

No comments:

Post a Comment