Some of you may be following the discussion on JSONB compression on -hackers, which is the remaining major issue for 9.4. As part of this, I needed to run some statistics on average column sizes for JSON vs JSONB tables. This makes a good example of how incredibly useful WITHIN GROUP will be in 9.4.
First, I created two versions of the data, one called "jsonic" which has text-JSON data, and one called "jsonbish" which has the JSONB version. Then I ran some comparisons using pg_column_size on the JSON columns.
with colsizes_b as (
select pg_column_size(jsonbish.jcol) as colsize
from jsonbish
),
distrib_b as (
select percentile_cont(array [0,0.25,0.5,0.75,1])
within group (order by colsize) as colsize_distribution
from colsizes_b
),
colsizes_j as (
select pg_column_size(jsonic.jcol) as colsize
from jsonic
),
distrib_j as (
select percentile_cont(array [0,0.25,0.5,0.75,1])
within group (order by colsize) as colsize_distribution
from colsizes_j
)
select 'json' as thetype,
colsize_distribution from distrib_j
union all
select 'jsonb',
colsize_distribution
from distrib_b;
So I'm taking the column size of each row, then sorting them by size, and then doing a percentile distribution using the new WITHIN GROUP. There's no group there because the group is actually the whole table. I've chosen the usual box plot percentages: minimum, 25%, median, 75%, and maximum.
And the results:
thetype | colsize_distribution
---------+----------------------------
json | {1741,1767,1854,1904,2292}
jsonb | {3551,5866,5910,5958,6168}
... which demonstrates that we do indeed have a major issue with compressability.
No comments:
Post a Comment