Thursday, October 30, 2014
Upcoming Seattle Visit
I will be in Seattle soon on business. This will include two opportunities to do PostgreSQL community stuff:
- A meetup with SEAPUG on the 12th, where I will talk about the upcoming 9.4 features.
- A BOF at Usenix LISA, where I will talk about the same, only to a different crowd.
Tuesday, October 21, 2014
Introducing Flexible Freeze
One of the things I mentioned in my series on VACUUM FREEZE was that we really needed a Postgres utility which would opportunistically freeze tables during low traffic periods. Today I'm announcing the Flexible Freeze project, our first attempt at designing such a utility.
All that's there right now is a simple Python script. However, that script is already a useful tool, installed at multiple production sites. Here's how the script works:
- identify your active databases and daily/weekly low traffic periods.
- create a cron job which calls flexible_freeze.py with a time limit to keep it inside your low traffic window.
- flexible_freeze.py will loop through your tables with the oldest XIDs, freezing them until it runs out of time or out of tables.
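The steps above can be sketched as a minimal loop (hypothetical function names; the real flexible_freeze.py may be organized differently):

```python
import time

def freeze_until_deadline(tables, time_limit_secs, freeze_fn, clock=time.monotonic):
    """Freeze tables oldest-XID-first, stopping when the time budget runs out.

    tables: table names, pre-sorted by age(relfrozenxid) descending
    freeze_fn: callable that runs VACUUM FREEZE on one table
    """
    deadline = clock() + time_limit_secs
    frozen = []
    for table in tables:
        if clock() >= deadline:
            break  # out of time; remaining tables wait for the next window
        freeze_fn(table)  # e.g. cur.execute('VACUUM FREEZE ' + quoted_name)
        frozen.append(table)
    return frozen
```

The cron job then invokes the script inside the low-traffic window, passing the window length as the time limit.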
Of course, I have a tanker-truck full of desired improvements/expansions to this. So, pull requests welcome.
If you're more into Ruby, Wanelo has rewritten flexible freeze for Rails and incorporated it into their Postmodern tool.
Friday, October 10, 2014
New Table Bloat Query
To accompany the New Index Bloat Query, I've written a New Table Bloat Query. This also involves the launch of the pgx_scripts project on GitHub, which will include most of the "useful scripts" I talk about here, as well as some scripts from my co-workers.
The new table bloat query is different from the check_postgres.pl version in several ways:
- Rewritten to use WITH statements for better maintainability and clarity
- Conditional logic for old Postgres versions and 32-bit platforms taken out
- Index bloat removed, since we have a separate query for that
- Columns formatted to be more immediately comprehensible
It also has a couple of known limitations:
- Some attributes (such as JSON and polygon fields) have no stats, so those tables can't be estimated.
- There's no good way to estimate bloat for compressed (TOAST) attributes and rows.
Results look like this:
databasename | schemaname | tablename | pct_bloat | mb_bloat | table_mb
--------------+------------+-------------------+-----------+----------+----------
members_2014 | public | current_member | 92 | 16.98 | 18.547
members_2014 | public | member_response | 87 | 17.46 | 20.000
members_2014 | public | archive_member | 84 | 35.16 | 41.734
members_2014 | public | survey | 57 | 28.59 | 50.188
pct_bloat is how much of the table (0 to 100) is estimated to be dead space. mb_bloat is how many megabytes of bloat are estimated to exist. table_mb is the actual size of the table in megabytes.
The suggested criteria are to list tables which are either more than 50% bloat and bigger than 10MB, or more than 25% bloat and bigger than 1GB. However, you should calibrate these thresholds according to your own database.
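As an illustration, the suggested thresholds can be expressed as a simple predicate (a sketch of the criteria above, not part of pgx_scripts):

```python
def worth_listing(pct_bloat, table_mb):
    # Suggested defaults from above: >50% bloat on tables over 10MB,
    # or >25% bloat on tables over 1GB. Calibrate for your own database.
    return (pct_bloat > 50 and table_mb > 10) or \
           (pct_bloat > 25 and table_mb > 1024)
```

Applied to the sample output, current_member (92%, 18.5MB) qualifies, while a 30%-bloated 500MB table would not.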
Thursday, October 2, 2014
JSONB and 9.4: Move Slow and Break Things
If you've been paying any attention at all, you're probably wondering why 9.4 isn't out yet. The answer is that we had to change JSONB at the last minute, in a way that breaks compatibility with earlier betas.
In August, a beta-testing user reported that we had an issue with JSONB not compressing well. This was because of the binary structure of key offsets at the beginning of the JSONB value, and the effects were dramatic: in the worst cases, JSONB values were 150% larger than comparable JSON values. We spent August through September revising the data structure; Heikki and Tom eventually developed one which gives better compressibility without sacrificing extraction speed.
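To see why absolute offsets hurt compression, here's a toy illustration (plain zlib on synthetic arrays, not PostgreSQL's actual on-disk format or its pglz compressor): cumulative offsets rarely repeat, while per-key lengths repeat heavily.

```python
import struct
import zlib

n = 1000
lengths = [7] * n                          # 1000 keys, each value 7 bytes long
offsets = [7 * (i + 1) for i in range(n)]  # cumulative offsets: 7, 14, 21, ...

raw_lengths = struct.pack('<%dI' % n, *lengths)
raw_offsets = struct.pack('<%dI' % n, *offsets)

# The repetitive length array compresses far better than the
# ever-increasing offset array, though both encode the same layout.
small = len(zlib.compress(raw_lengths))
large = len(zlib.compress(raw_offsets))
assert small < large
```

The revised data structure moves roughly in this direction, storing mostly lengths so that TOAST compression has repeating patterns to work with.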
I did a few benchmarks on the various JSONB types. We're getting a JSONB which is both faster and smaller than competing databases, so it'll be worth the wait.
However, this means that we'll be releasing a 9.4beta3 next week, whose JSONB type will be incompatible with prior betas; you'll have to dump and reload if you were using Beta 1 or Beta 2 and have JSONB data. It also means a delay in the final release of 9.4.