At stake is the question of whether the key locations should be saved as successive offsets, or as lengths. The advantage of the former is that it speeds up extraction by making it only two lookups to locate a top-level key regardless of the number of keys you have. The disadvantage is that a series of increasing offsets isn't very compressible.
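To make that concrete, here is a simplified sketch in C (illustrative only, not the actual jsonb code) of how value lookup differs between the two layouts: with stored end offsets, finding the i-th value is a couple of array reads no matter how many keys there are; with stored lengths, every preceding length has to be summed first.

```c
/* Simplified illustration -- not the actual PostgreSQL jsonb code. */
#include <stdint.h>

typedef struct
{
    uint32_t start;     /* byte position of the value in the data area */
    uint32_t len;       /* length of the value in bytes */
} ValueSpan;

/*
 * Layout A: the header stores cumulative end offsets.
 * Locating value i is O(1): two array reads, independent of key count.
 */
ValueSpan
locate_by_offset(const uint32_t *end_offsets, int i)
{
    uint32_t start = (i == 0) ? 0 : end_offsets[i - 1];
    ValueSpan span = { start, end_offsets[i] - start };

    return span;
}

/*
 * Layout B: the header stores only lengths.
 * Locating value i is O(i): all preceding lengths must be summed, which
 * is what slows extraction when there are 150+ keys.  The upside is that
 * a run of small lengths compresses far better than a run of steadily
 * increasing offsets.
 */
ValueSpan
locate_by_length(const uint32_t *lengths, int i)
{
    uint32_t start = 0;
    ValueSpan span;

    for (int j = 0; j < i; j++)
        start += lengths[j];
    span.start = start;
    span.len = lengths[i];
    return span;
}
```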
For JSONB fields which consist of a few top-level keys and large values, this question makes no difference at all. However, for the worst case of 150+ top-level keys with short (under 10 bytes) values, the difference is quite dramatic. For example, I constructed a test with 183 keys, of which 175 were NUMERIC. I checked both table size and time to extract key #160 from 100,000 rows:
| Metric | Offsets | Lengths |
|---|---|---|
| Table Size | 1147 MB | 541 MB |
| Extract key #160 (100K rows) | 2.8s | 5.2s |
This is a "worst case" scenario for the difference between these two designs. Note that the extraction slowdown affects only retrieving the value to the client; it does not affect index lookups of JSONB rows, which are speedy no matter which patch is employed.
However, we're undecided on this "fix" because we don't know a couple of things:
- How likely are users to have 150+ top-level keys in one field (or that many keys together at any single level) with short values?
- Is up to 60% space savings in return for up to 80% extraction slowdown a good tradeoff? Or a bad one?
Here are my test results with github archive data, as posted in the mailing list.
The main difference in this dataset is that the PushEvent objects I'm querying in this case have only 8 top-level keys (thus the minimal overhead).
----------------
Test data: 610MB of Json -- 341969 items
Index size (jsonb_ops): 331MB
Test query 1: SELECT data->'url', data->'actor' FROM t_json WHERE data @> '{"type": "PushEvent"}'
Test query 1 items: 169732
Test query 2: SELECT data FROM t_json WHERE data @> '{"type": "PushEvent"}'
Test query 2 items:
----------------
HEAD (aka, all offsets) EXTENDED
Size: 374MB
Toast Size: 145MB
Test query 1 runtime: 680ms
Test query 2 runtime: 405ms
----------------
HEAD (aka, all offsets) EXTERNAL
Size: 366MB
Toast Size: 333MB
Test query 1 runtime: 505ms
Test query 2 runtime: 350ms
----------------
All Lengths (Tom Lane patch) EXTENDED
Size: 379MB
Toast Size: 108MB
Test query 1 runtime: 720ms
Test query 2 runtime: 420ms
----------------
All Lengths (Tom Lane patch) EXTERNAL
Size: 366MB
Toast Size: 333MB
Test query 1 runtime: 525ms
Test query 2 runtime: 355ms
Disk space is cheap. If you have that many keys, you want speed, not space. /twocents
The cost of disk space is not so important here; what matters is the likelihood of the needed data being in the filesystem disk cache or not, since reading from disk is orders of magnitude slower than reading from memory. If the data takes up less space, then more of it can fit in RAM and fewer reads will end up hitting the disk.
I would counter Laurence's rebuttal by saying both disk and RAM are cheap. Speed matters. Hardware changes frequently; architectural designs not so much.
Fixing current pglz deficiencies by making jsonb's storage format less optimal does not sound like a good basis for a data layout design decision.
I would stick to absolute offsets. If a tuning of the space vs. time tradeoff is needed, this should be handled by pglz compression options (e.g. a delta compression pre-stage).
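For what it's worth, that delta pre-stage is easy to sketch (hypothetical code; pglz offers no such hook today): convert the monotonically increasing offsets into differences before compression and rebuild them after decompression, so the compressor sees small, repetitive numbers while the stored format keeps O(1) offsets.

```c
/* Hypothetical delta pre-stage around compression -- not part of pglz. */
#include <stddef.h>
#include <stdint.h>

/* Before compressing: replace each offset with its delta from the previous one. */
void
offsets_to_deltas(uint32_t *vals, size_t n)
{
    for (size_t i = n; i-- > 1; )
        vals[i] -= vals[i - 1];
}

/* After decompressing: restore absolute offsets with a running sum. */
void
deltas_to_offsets(uint32_t *vals, size_t n)
{
    for (size_t i = 1; i < n; i++)
        vals[i] += vals[i - 1];
}
```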
Any possibility of making the tradeoff a runtime configuration, with the default probably being speed, since storage is generally cheap?
There's some room to improve the jsonb format; the question is whether it's worth having another beta.
It's possible, for example, to store small integers (up to 28 bits) inline instead of as an 8-12 byte Decimal. This will easily shave a lot of bytes in real-world data, with negligible runtime cost for most operations.
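As a rough illustration of that idea (a hypothetical layout, not the actual jsonb entry format), a 32-bit entry word could reserve a few tag bits and carry any integer that fits in the remaining 28 bits directly inline, instead of pointing at a separate numeric datum:

```c
/* Hypothetical inline small-integer encoding -- not the real jsonb entry layout. */
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT       28
#define TAG_MASK        (0xFu << TAG_SHIFT)
#define TAG_INLINE_INT  0x1u                 /* arbitrary tag value for this sketch */
#define INT28_MAX       ((1 << 27) - 1)
#define INT28_MIN       (-(1 << 27))

bool
fits_inline(int32_t v)
{
    return v >= INT28_MIN && v <= INT28_MAX;
}

uint32_t
pack_inline_int(int32_t v)
{
    /* keep only the low 28 bits of the two's-complement value, plus the tag */
    return (TAG_INLINE_INT << TAG_SHIFT) | ((uint32_t) v & ~TAG_MASK);
}

int32_t
unpack_inline_int(uint32_t entry)
{
    uint32_t raw = entry & ~TAG_MASK;

    /* sign-extend from 28 bits back to 32 */
    if (raw & (1u << 27))
        raw |= TAG_MASK;
    return (int32_t) raw;
}
```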
I'd also be in favour of speed over space. A relational database already uses a lot more storage than would be strictly necessary to represent the same information in a more minimal or compact format.
It would be interesting to separate out the slowdown inherent in the new format from the slowdown resulting from decompression now being required, as Arthur's figures do with STORAGE EXTENDED vs EXTERNAL.
If the majority of that slowdown is due to the decompression (which is certainly conceivable) then people can choose the appropriate column storage for their workload.
Greg,
> Disk space is cheap
Not so cheap, in the case of SSDs (still). But I wonder whether this trade-off picture would be the same with SSDs (I assume yes, if those timings were measured for a "warm" database).
As a postings-list format, an 80% slowdown in decoding time for a 60% space savings is not a good trade-off.
Also, I would expect that the format will end up changing in the medium/long term to some kind of group var-byte encoding, just as practically all other postings-list formats from other systems have done, which will change the storage-efficiency picture again (for the better).
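For readers who haven't run into the term, here is a minimal sketch of group-varint encoding (illustrative only, not code from PostgreSQL or any other system): four integers share one tag byte whose 2-bit fields record how many bytes each value occupies, so small numbers (such as delta-encoded offsets) shrink to a byte or two while decoding stays simple.

```c
/* Minimal group-varint sketch: one tag byte describes four values. */
#include <stddef.h>
#include <stdint.h>

static unsigned
byte_width(uint32_t v)
{
    if (v < (1u << 8))  return 1;
    if (v < (1u << 16)) return 2;
    if (v < (1u << 24)) return 3;
    return 4;
}

/* Encode exactly four values; returns the number of bytes written (5..17). */
size_t
group_varint_encode(const uint32_t in[4], uint8_t *out)
{
    uint8_t *p = out + 1;              /* byte 0 is the tag */
    uint8_t  tag = 0;

    for (int i = 0; i < 4; i++)
    {
        unsigned w = byte_width(in[i]);

        tag |= (uint8_t) ((w - 1) << (i * 2));
        for (unsigned b = 0; b < w; b++)
            *p++ = (uint8_t) (in[i] >> (8 * b));   /* little-endian bytes */
    }
    out[0] = tag;
    return (size_t) (p - out);
}

/* Decode one group of four values; returns the number of bytes consumed. */
size_t
group_varint_decode(const uint8_t *in, uint32_t out[4])
{
    const uint8_t *p = in + 1;
    uint8_t tag = in[0];

    for (int i = 0; i < 4; i++)
    {
        unsigned w = ((tag >> (i * 2)) & 0x3u) + 1;
        uint32_t v = 0;

        for (unsigned b = 0; b < w; b++)
            v |= (uint32_t) *p++ << (8 * b);
        out[i] = v;
    }
    return (size_t) (p - in);
}
```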
Also, starting from the standpoint of doing "row-by-row" compression in PostgreSQL is simply Doing Things Wrong, so Don't Force It (DFI). In other words: the problem here is with the pglz compression. Don't try to "force" the format of your data into something uglier or slower just to compensate for pglz.
Also, IIRC, isn't the current space usage still much better than what you'd typically see when comparing to MongoDB? In that case, why all the hair-splitting and fuss?
Because it's considerably bigger than the space requirements for JSON-text, and users will be surprised by that.
If the idea is that JSON (text) is very compressible but slow, then it seems almost redundant to make that same trade-off again with JSONB. It would make JSONB less compelling as an alternative. If people bother to choose JSONB instead of JSON, then it's because they're fully intending to optimize for speed rather than space (otherwise just use JSON). If JSONB allows better indexing and that's one reason why you'd use it, then that's already a speed/space trade-off anyway.