First, let me give you the good news. The following operations affect metadata only and require neither rewriting the table nor any per-row verification of values:
- DROP COLUMN
- ADD COLUMN with no DEFAULT
ADD COLUMN can be free too, provided that the new column allows NULLs and doesn't have a DEFAULT value set. You might think the following two transactions are almost identical:
ALTER TABLE users ADD COLUMN hat_size TEXT DEFAULT 'L';
BEGIN;
ALTER TABLE users ADD COLUMN hat_size TEXT;
ALTER TABLE users ALTER COLUMN hat_size SET DEFAULT 'L';
COMMIT;
... but the first one will rewrite the table entirely, whereas the second one will not. If you try both of those, and look at the results, it becomes clear why:
Version 1:
user_id | hat_size
---------+----------
1021 | L
1022 | L
1023 | L
1025 | L
1026 | L
Version 2:
user_id | hat_size
---------+----------
1021 |
1022 |
1023 |
1025 |
1026 |
As you see, if you do the ADD and the DEFAULT in a single statement, Postgres helpfully pre-populates the column for you. Whereas if you do them in two statements, it doesn't, and therefore doesn't rewrite the whole table. Yes, I realize that this behavior is fairly arbitrary, but changing it at this point would break a lot of backwards compatibility. And the "auto-populate the default" feature was added in the first place (in 7.2 IIRC) because people wanted it.
Now, above I said that certain actions were "free". The reason for the quotation marks is that these ALTERS do actually have costs. More on that in my next post.
Crisp and concise, Josh. Thank you.
ReplyDeleteVery Good
ReplyDeleteExcellent post! Thanks for sharing.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteJust an example of the full "non blocking" column adding cycle.
ReplyDeleteCreating a column.
ALTER TABLE public.comments ADD source_type integer;
ALTER TABLE public.comments ALTER source_type SET DEFAULT 0;
Creating the assisting index to ease defaults migration.
CREATE INDEX CONCURRENTLY i_comments_migration_tmp
ON comments (id) WHERE source_type IS NULL;
Now migrating the unset defaults by small chunks in shell.
PSQL=psql
total_updated=0
updated=1
while [ $updated -gt 0 ]; do
updated=$(($PSQL -X sports_ua <<EOF
UPDATE comments SET source_type = 0
WHERE id IN (
SELECT id FROM comments
WHERE source_type IS NULL LIMIT 5000);
EOF
) | cut -d ' ' -f 2)
(( total_updated+=updated ))
echo -ne "\r$total_updated"
done
Dropping the assisting index.
DROP INDEX i_comments_migration_tmp;
And adding NOT NULL if it is needed.
ALTER TABLE public.comments ALTER source_type SET NOT NULL;
thank you so much for taking the time to write this comment. does that SET NOT NULL require an exclusive lock on the table? I'm using version 9.1
DeleteIt does, but in this case it will be very fast because it doesn't need to update the table.
DeleteThank you for the write up Josh!
ReplyDeleteAnd Sergey, very nice too. If I understood it correctly, the key is to make UPDATE run in short amount of time during the deployment. This would be the one way.
I would also like to mention that 1st statement in Josh's example:
ALTER TABLE users ADD COLUMN hat_size TEXT DEFAULT 'L';
This will lock the table which makes it even less preferable for aiming no-downtime schema change deployment :-)
- Miho
Ah, just realized that you published part 2 regarding the lock condition!
Delete