Here's the New New Index Bloat Query.
Here's what I changed:
- Had it pull btree indexes only, because the calculations don't work for GIN/GiST indexes.
- Moved all of the nested subqueries up into WITH clauses for easier readability and maitainability. Since all supported Postgres versions now do CTEs, I didn't see any reason to work around them.
- Removed the \set statements, because they don't work for embedding the query in a script. And they only need setting dynamically if you've compiled PostgreSQL with nonstandard options, anyway.
- Removed a couple of CASE statements aimed at 7.X support; who cares?
- Added some extra informational columns for using this in interactive mode: table size, pretty index size, and index scans. This helps folks figure out whether to rebuild an index, ignore the bloat or to drop it.
- In the example query, filtering down to indexes with bloat over 50% and 50MB, which is our threshold for "significant bloat"
-[ RECORD 27 ]+-----------------------------
database_name | prod-web
schema_name | partitions
table_name | transactions_201308
index_name | transactions_201308_uuid_idx
bloat_pct | 52.9
bloat_bytes | 351649792
bloat_size | 335 MB
index_bytes | 664788992
index_size | 634 MB
table_bytes | 4570447872
table_size | 4359 MB
index_scans | 263
So this index would be a good candidate for deletion, since it's 50% bloated and seldom scanned.
Now, there's still more to do on this. We need a similar query for GIN/GiST/SPGiST. The query could use some more cleanup; removing one-letter table aliases, and GROUP BY 1,2,3,4 stuff. But it's vastly improved for checking which of your indexes need VACUUM/REINDEX. Thanks, Ioguix!
Very interesting, but one side note: the old query uses pg_stats and thus enables check_postgres to query the statistics without being a superuser for example.
ReplyDeleteWhich bits of my version require superuser perms? I don't see any.
DeleteThe JOIN on pg_statistics. I get this (sorry for the german) when I execute your query as a non-superuser:
DeleteFEHLER: keine Berechtigung für Relation pg_statistic
Bernd rewrote it here: http://pgsql.privatepaste.com/36c6206434
Which works fine for me as non-superuser.
Please, again, fix this piece of the query from:
Deletemaxalign - CASE /*...*/
WHEN 2%maxalign = 0 THEN maxalign
ELSE 2%maxalign
END
To:
maxalign - CASE /*...*/
WHEN index_tuple_hdr%maxalign = 0 THEN maxalign
ELSE index_tuple_hdr%maxalign
END
See my previous comment here and on my blog.
Cheers,
This comment has been removed by the author.
DeleteBernd,
DeleteHmmm. Seems like we should be able to get what we need out of pg_stats. Lemme play around with it.
Patches welcome, Josh, but plenty of users *do* care about 7.x and 8.x, so we cannot support WITH quite yet. :)
ReplyDeleteI care about 7.x and 8.x as well :)
DeleteNote that my query is 8.2+ only. It's because pg_index.indkey (int2vector) can not be cast to anything before 8.2. I'm afraid we'll be forced to deal with that on the app side using two queries. Hadn't time to work on that yet, but this is quite easy to do anyway.
This comment has been removed by the author.
DeleteGreg, I'm just solving the 90% case. I figure if a Postgres version is out-of-support, the user probably isn't updating their monitoring either.
DeleteNah, upgrading your monitoring is easy. Upgrading your large production database on the other hand,,,
DeleteThanks for improvements Josh !
ReplyDelete1. I hadn't time to work or test on GiST/GIN, and it probably doesn't work properly anyway. +1 for limiting explicitly to B-Tree
2. and 4. as Greg explained, I care about 7.x and 8.x as well, so I'll keep the sub-query version
3. \set was actually to make the query more "readable" and understandable. Obviously, it's easy to serialize them back in query for tools :-)
WARNING: You directly use value "2" in CTE "index_aligned" for "nulldatahdrwidth" instead of "index_tuple_hdr" computed in CTE "index_item_sizes". So your stats are not good enough with index referencing null-able fields. I answered you about this on my blog.
Cheers,
Fixed on gist, thanks!
DeleteWhat about comparison with pgstattuple? How different the results are?
ReplyDeleteThis part is covered in my blog post about the original query and how it compare to reality. See:
Deletehttp://blog.ioguix.net/index.php?post/2014/03/28/Better-bloat-estimation-on-indexes
Other comment hasn't been approved so I can't post this as a reply to it, but here's the table definition:
ReplyDeleteTable "public.transfers"
Column | Type | Modifiers
----------------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('transfers_id_seq'::regclass)
edited | integer |
external_id_organization | character varying(255) |
accountnumber_from | character varying(255) |
edited | character varying(255) |
edited | character varying(255) |
edited | character varying(255) |
edited | character varying(255) |
status | character varying(255) |
bj_job_id | integer |
edited | timestamp without time zone |
edited | timestamp without time zone |
edited | date |
edited | character varying(255) |
edited | integer |
edited | integer |
edited | integer |
edited | integer |
edited | integer |
edited | boolean | default true
edited | character varying(255) |
schedule_id | character varying(255) |
edited | character varying(255) |
edited | integer |
edited | timestamp without time zone |
edited | timestamp without time zone |
edited | character varying(255) |
edited | numeric | not null
Indexes:
"transfers_pkey" PRIMARY KEY, btree (id)
"index_transfers_on_schedule_id" UNIQUE, btree (schedule_id) WHERE schedule_id IS NOT NULL
"index_transfers_on_bj_job_id" btree (bj_job_id) WHERE bj_job_id IS NOT NULL
"index_transfers_on_external_id_organization_accountnumber_from" btree (external_id_organization, accountnumber_from)
"index_transfers_on_success_by_organization" btree (external_id_organization) WHERE transfer_status_indicates_success(status)
Triggers:
transfers_set_amount_as_decimal_trigger BEFORE INSERT OR UPDATE ON transfers FOR EACH ROW EXECUTE PROCEDURE set_amount_as_decimal()
...and some more info on the ID column and bloat query output:
ReplyDeleteproduction=# select count(1), min(id), max(id) from transfers;
count | min | max
---------+-----+---------
4453403 | 163 | 4534588
production=# \i bloat.sql
database_name | schema_name | table_name | index_name | bloat_pct | bloat_bytes | bloat_size | index_bytes | index_size | table_bytes | table_size | index_scans
-----------------+-------------+------------+----------------------------------------------------------------+-----------+-------------+------------+-------------+------------+-------------+------------+-------------
production | public | transfers | index_transfers_on_bj_job_id | 63.3 | 720896 | 704 kB | 1138688 | 1112 kB | 2030804992 | 1937 MB | 6
production | public | transfers | index_transfers_on_external_id_organization_accountnumber_from | 6.6 | 16277504 | 16 MB | 248111104 | 237 MB | 2030804992 | 1937 MB | 212895
production | public | transfers | index_transfers_on_schedule_id | 0.0 | 0 | 0 bytes | 16384 | 16 kB | 2030804992 | 1937 MB | 17
production | public | transfers | index_transfers_on_success_by_organization | 19.4 | 21061632 | 20 MB | 108298240 | 103 MB | 2030804992 | 1937 MB | 6451
production | public | transfers | transfers_pkey | 69.4 | 200253440 | 191 MB | 288636928 | 275 MB | 2030804992 | 1937 MB | 36690