Database Soup: October 2013

Thursday, October 24, 2013

SFPUG Wants Your Feedback

The San Francisco PostgreSQL User Group is looking for feedback and recommendations for what we should do in 2014. As such, I've set up a survey here. We are also looking for speakers, sponsors and venue hosts; contact me or fill out the survey. Thanks!

Friday, October 18, 2013

Adjusting defaults for PostgreSQL.conf

For PostgreSQL 9.4, we've been discussing adjusting defaults for postgresql.conf to give users a better out-of-box experience. Our existing defaults were set at a time when we used SysV memory, and when the average developer laptop or low-end server had 512MB of RAM. So, a survey.

Here's some things contributors have already adjusted:

shared_buffers now goes up to 128MB automatically depending on available RAM reservation (we did this for 9.3)
effective_cache_size is automatically set to shared_buffers X 3 (might be X 4 by the time we release)

Now we are discussing increasing the defaults and/or autotuning for work_mem and maintenance_work_mem. Suggested new defaults are 4X the current: 4MB and 64MB. These should allow PostgreSQL to run on any machine of 1GB or greater memory a bit better.

However, one thing which came up during this discussion is that many people may be running PostgreSQL on virtual machines which have less than 1GB of RAM available to PostgreSQL. I don't think they are, but we lack data. So I put up a survey to see what amounts of RAM our users really have on their smallest machines/VMs. Please fill out the survey so that we can have reasonable input for development, and if you have comments, please put them below.

Wednesday, October 16, 2013

SFPUG: Stephen Frost Hacking Live

Tommorrow night (October 17th) we will have live streaming video of Stephen Frost presenting on how to hack PostgreSQL. Video will start around 7:40PM PDT, hopefully. This is for the October Surprise meetup.

Thursday, October 10, 2013

How urgent is that Update?

We've released a cumulative bugfix release today for all supported versions of PostgreSQL, in case you somehow missed it. This update doesn't contain anything time-critical for most users. You'll want to add it to your upgrade queue for your next scheduled downtime, but you don't need to schedule an update for this weekend. Unless, of course, you are one of the two dozen or so users who reported the bugs fixed.

However, you don't really need me to tell you that, because it's in the language of the release. More on this below.

A couple of notable fixes in this release:

Hstore-JSON extension fix: if you are using both Hstore and JSON in Postgres 9.3, you should run ALTER EXTENSION hstore UPDATE after updating your server. Due to a mistake in extension versioning, users missed out on some JSON-hstore conversion functions in 9.3.

Guarantee transmission of all WAL files: users with replication setups and stringent data retention requirements will want to apply the update sooner rather than later because of this fix. It fixes an issue where under some circumstances a promoted replica might ignore a few transactions it actually received from the master before failover, considering them lost.

Over the last dozen or so update releases, we've standardized the language we use to explain how critical the fixes in the release are for most users.

First, there's the question of what kind of update it is:

Bugfix Release or Update Release: this usually contains two to four months of accumulated bug fixes for various reported bugs. Some of these bugs may be critical data loss bugs; see the release contents to check this.

Security Release: the update contains a patch for one or more security holes. This makes the update at least somewhat urgent, because as soon as the security patch is posted online, the black hats start working on exploits (assuming they didn't before then).

Then we use some standard language about the general criticality of bugs fixed in the release:

All users of the affected versions are strongly urged to apply the update immediately: release contains a high-exposure security hole or critical data loss bug which affects all users. Stop reading, declare a downtime, and apply the update now. Fortunately, we've only had one of these in the last 6 years.

All users should apply this update as soon as possible: the release contains one or more security holes and/or major data loss bugs expected to affect most users. Schedule a downtime tonight or this weekend and apply the update.

All users of _________ should apply this update immediately/as soon as possible: the same as above, but only for users of a particular feature or extension.

Users should apply this update at the next available opportunity: the release contains some low-risk security holes and/or serious data loss bugs which only occur under certain circumstances. Updating to this release shouldn't be put off for more than a couple weeks, but you can wait for a good low-traffic window to take the downtime.

Users should update at the next regular maintenance window/scheduled downtime: this update contains nothing critical or even serious for most users; it's largely a cumulative collection of bugfixes. The update should be added to the general queue of OS, library and application updates for the production servers.

Note, however, that I said most users. Every update we push out is urgent for at least a few users who are hit by the specific bugs involved. For example, for the current release, users who depend heavily on the following features should consider updating sooner rather than later:

Range indexes
SP-GiST
JSON
regular expressions
prepared query plans
SSL connections
more than 24GB (!) of work memory

That's why we're reluctant to adopt any system which rates update releases on some absolute scale. It doesn't matter how critical the update is in general; it matters how critical it is to you.

Monday, October 7, 2013

Fancy SQL Monday: Finding broken keys

One of the things which usually happens on the rare occasions when you have PostgreSQL database corruption is that indexes become corrupted so that index entries no longer match the underlying table. For regular indexes, this can be fixed with a REINDEX or pg_repack. But it becomes a real problem when the index was used for a primary key, unique index or foreign key, because then duplicate or broken data gets inserted into the tables, and attempts to rebuild the index will fail.

Database-wide, the easiest way to find these broken indexes is to do a pg_dump and pg_restore of your whole database; data which violates keys will show up as errors on the pg_restore.

The second thing we need to do is find the duplicate or missing data so that we can "clean" it and then rebuild the index. First, we'll need to disable index access to make sure that we don't hit the index and get erroneous information back:

SET enable_indexscan = off;
SET enable_bitmapscan = off;
SET enable_indexonlyscan = off; -- 9.2+ only!

So for our first SQL trick, we want to query a table with a simple SERIAL surrogate key and find duplicate IDs. Then, for each of these IDs, we want to get the entire referenced row so we can look at it and decide how to fix the data. One way to do that is using a WITH clause and an aggregate:

WITH dups AS (
    SELECT ticketid, count(*)
    FROM tickets
    GROUP BY ticketid
    HAVING count(*) > 1 )
SELECT *
FROM tickets
   JOIN dups USING ( ticketid )
ORDER BY ticketid;

And that gives us a list of duplicate ticketids. From there, we can choose to delete rows, assign new IDs, or do something else, depending on the nature of the duplication.   Of course, if you never created any real keys (in addition to the surrogate key), you're going to have a devil of a time figuring things out.

However, the above query format isn't the only way we can search for duplicate IDs. We can also use a windowing query, which is often easier to type if we have multicolumn unique constraints:

WITH dupcount AS (
   SELECT comments.*,
       count(*) OVER (
            PARTITION BY post, commentor, commented_at)
AS entrycount
   FROM comments )
SELECT * FROM dupcount
WHERE entrycount > 1
ORDER BY post, commentor, commented_at;

Finding broken foreign keys is a bit more straightforwards. In this case, you just want to use an outer join to find the missing parent records:

SELECT shoppingcart_item.*
FROM shoppingcart_item
   LEFT OUTER JOIN shoppingcart
   ON shoppingcart_item.shoppingcart_id
          = shoppingcart.shoppingcart_id
WHERE shoppingcart.shoppingcart_id IS NULL
ORDER BY shoppingcart_item.shoppingcart_id;

Once you've found those, though, you can't do much more than dump them to a file and delete the records, since there's no way to find out what the missing parent record data was.

Happy de-corrupting!

Friday, October 4, 2013

De-corrupting TOAST Tables

In a database which has been mildly corrupted, the second issue you're liable to run into is corrupt TOAST tables. You'll know you have one because you'll be running a SELECT against the table or a pg_dump, and you'll get an error message like this one:

ERROR: unexpected chunk size 226 (expected 175) in final chunk 1 for toast value 8846932 in pg_toast_2877446

TOAST tables are where PostgreSQL stores offline compressed data for large values, like big text fields, geometries, large arrays, and BLOBs. What the server is telling you in that error message is that the size of the compressed block in the TOAST table doesn't match its metadata, and as a result Postgres can't decompress it.

Among other things, this is generally an indication that either you've found a PostgreSQL bug (which you should report, if you're running on the latest update version), or you've had a serious corruption event, if you didn't know already. So you should definitely do a full corruption check on the database.

But, what to do about the bad TOAST record?

Obviously, if you have a backup from just before the corruption occurred the answer is to restore that backup. But if you don't: this is fortunately hand-correctable.

The first thing to do is to find the actually row or rows where the corruption is. I do that with a simple DO script like this:

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
FOR badid IN SELECT message_id FROM messages LOOP
    curid = curid + 1;
    if curid % 100000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT contents
        INTO vcontent
        FROM messages where message_id = badid;
        vcontent := substr(vcontent,1000,2000);
    exception
        when others then
            raise notice 'data for message % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;

The goal here is to reach into the TOASTed value for each row (the large-text contents field) and de-compress it, which should raise any errors you're going to get. Of course, you need to tailor the above for your own tables.

Once you've identified one or more rows, there are 3 steps you need to take to clean up the corrupted TOAST record. Note that this does not involve recovering the corrupted value: that's already lost, and you'll do yourself a favor by not trying to hang onto it. Get that from your backups.

Once you've located the row or rows which have corrupt TOAST data, you need to do three things to make sure that the DB is clean:

Re-index the TOAST table (in case it's the index which is corrupt)
Break the link between the live row and the corrupt TOAST record.
Rebuild the regular table in order to expunge the bad TOAST pointer.

All TOAST tables are located in the pg_toast schema. And the error message already gave us the name of the bad TOAST table. So we reindex by:

REINDEX TABLE pg_toast.pg_toast_2877446;

Breaking the link is the most obvious step:

UPDATE messages SET contents = '' WHERE message_id = 432212;

By setting the TOASTED column to a new value, we make sure that the "current" version of the row does not link to the bad TOAST entry. I did say we'd be destroying data here, yes?

But that's not all we have to do. The old row version still exists, and will continue to produce errors every time VACUUM visits it (preventing VACUUMs of that table from completing). So we need to build a new version of the table which does not include that old row version.

Fortunately, there is a utility to do this: the super-awesome pg_repack, by Daniele Varrazzo, Josh Kupershmidt and Itagaki Takahiro. This utility will save your life someday, so buy Daniele, Josh and/or Itagaki a drink of their choice if you see them.

pg_repack -n -t messages messagedb

Now, while pg_repack does not need an exclusive lock for most of the rebuild, it does need three things, which you need to be prepared for:

enough disk space to make a full copy of the table and its indexes.
a short exclusive lock at the very end of processing, including on related tables by FK.
to wait for all long-running transactions to complete

However, once you've done this, you should have a clean table free of TOAST corruption.

Wait, though, what's this?

ERROR: could not create unique index "messages_pkey"
DETAIL: Key (message_id)=(795437) is duplicated.

Oh-oh. Looks like we have more work to do. See you Monday!

Thursday, October 3, 2013

First thing you do is take a backup

Last week seems to have been a week for "pilot error": not just on SouthWest, but among our clientele at PGX. Three customers in two days managed to corrupt their databases through sysadmin mistakes, such as remounting the SAN with PostgreSQL running. Which reminded me of an important tip for what you should do whenever anything goes really wrong with your database server:

Shut down PostgreSQL if it's running.
Check if you have current valid backups and/or replicas which are not already corrupt. If so, make sure these are protected.
If there are no backups, make a full copy of the database files and all transaction logs.

You'll be tempted do everything you can to get the main production server up and running as soon as possible, rather than waiting for backups, especially if your database is quite large. Resist that temptation.

Many things you will try to rescue a corrupt database or a damaged server can cause further destruction of data. Worse, certain kinds of corruption (caused by bad storage or bad RAM, for example) will spread to the replicas and/or your backups given time. In a worst case, you can fiddle with your damaged DB server long enough that the regularly scheduled backup kicks in ... and overwrites a good backup with a corrupt one.

Of course, maintaining a few days or weeks of rotational backups is an even better idea, and protects you against many forms of data loss. I recommend Barman or pgRollDump for this purpose.