Thursday, May 31, 2012
As I mentioned, I'm back to writing for Linux Weekly News. My latest, a summary of interesting developments at pgCon, is now up. Note that reading this article requires an LWN subscription; since they pay me, I don't want to give away the content for free. If you don't want to subscribe, wait two weeks and the article will be free.
I also wrote a summary of what's in the PostgreSQL 9.2 Beta for LWN, which does not require a subscription. Enjoy!
Monday, May 21, 2012
Testing 9.2: Autovacuum Logging
Since PostgreSQL 9.2 is in beta now, I've been using some generously donated cloud server time on GoGrid to test various features in the new release. One feature that is working brilliantly is the new, more verbose logging for autovacuum.
Greg Smith and Noah Misch added additional information to the logging you get when you set log_autovacuum_min_duration. This includes information on the work done by autovacuum (buffers, pages, and tuples read and written), as well as information on when autovacuum "skips" a table because it can't get a lock. Here's a sample:
LOG: automatic analyze of table "bench2.public.pgbench_tellers" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: automatic vacuum of table "bench2.public.pgbench_branches": index scans: 0
pages: 0 removed, 2 remain
tuples: 114 removed, 200 remain
buffer usage: 44 hits, 0 misses, 3 dirtied
avg read rate: 0.000 MiB/s, avg write rate: 45.334 MiB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: automatic analyze of table "bench2.public.pgbench_branches" system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
LOG: automatic analyze of table "bench2.public.pgbench_tellers" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
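To get output like this yourself, set log_autovacuum_min_duration: -1 (the default) disables the extra logging, 0 logs every autovacuum and autoanalyze run, and a positive value logs only runs that take at least that many milliseconds. Here's a minimal sketch; the setting lives in postgresql.conf, so you reload rather than SET it:
# postgresql.conf: log every autovacuum/autoanalyze run, regardless of duration
log_autovacuum_min_duration = 0
-- then, from psql, reload the configuration without a restart
SELECT pg_reload_conf();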
This will go a long way towards helping users troubleshoot autovacuum in the future. Great work, Greg and Noah!
Monday, May 7, 2012
Amazonstravaganza tomorrow, on Ustream
Tomorrow we're going to have two back-to-back SFPUG presentations about running PostgreSQL on AWS: Scalr presenting their stuff, and "Amazon With Reduced Tears" by Christophe Pettus. The event is sold out at 100 RSVPs, though, so I suggest you watch it on video!
Barring severe technical difficulties, it will be presented live on SFPUG Ustream. Festivities should start around 7:15 PM PDT.
Sunday, May 6, 2012
Random Page Cost Revisited
Because the planner has limited information about the host system PostgreSQL is running on, we offer several planner configuration variables, which can be set either dynamically or in the postgresql.conf file. One of these, which has been a source of much contention in the past, is Random Page Cost (or random_page_cost, hereafter RPC). For the last couple of years, RPC was not a variable you needed to worry about, but now, thanks to advances in storage technology, it has become one of the essential parameters again.
Random Page Cost is meant to represent the ratio between the cost of looking up one row (out of many) via sequential reads and the cost of looking up a single row individually using random access (disk seeks). This factor strongly influences the planner's decisions about using indexes vs. table scans, and composite vs. simple indexes, when executing queries.
For many generations of spinning hard drives, this ratio remained about the same, around 4.0.
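You can watch this factor at work by changing it per session and comparing query plans. Here's a quick sketch, assuming a standard pgbench database; the exact point at which the plan flips depends on your table and index sizes:
-- the same range query under two different cost assumptions
SET random_page_cost = 4.0;
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid BETWEEN 1 AND 50000;
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid BETWEEN 1 AND 50000;
-- with the lower estimate, the planner is far more willing to use the index on aid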
Things have changed recently, though. In fact, there are five now-common types of storage where you will want to change RPC to match their different scan/seek ratios.
1. High-End NAS/SAN: when working well, your large storage box combines a large volatile cache with many, many disks, improving the number of concurrent seeks you can do. However, if storage is not direct-attached, you usually have limited throughput. This means that a random_page_cost of 2.5 or 3.0 is more appropriate.
2. Amazon EBS and Heroku: EBS, especially when "RAIDed", is an extreme example of the NAS/SAN case. Seeks are fast with virtually unlimited concurrency, while storage bandwidth is erratic and slow, limiting full table scans. As such, we often use a random_page_cost of 1.1 to 2.0.
3. Bad SANs: the advent of iSCSI seems to have inspired a large variety of very poor-performing, inexpensive storage units, such as the lower-end Dell EqualLogic models and various SATA-based Promise units. These boxes are cheaper for a reason, folks! Their quirks often give them unique performance characteristics which strongly affect database access and prompt you to modify planner parameters. For example, we discovered through testing that one Promise box had decent throughput, but seek rates slower than a single drive due to a combination of architecture and driver issues. As such, we set RPC to 6.0. You'll have to trust the results of bonnie++ and fio to figure out where you should set things, though; each inferior unit is inferior in a different way.
4. SSDs: in general, SSDs have not lived up to their early performance hype; random writes and sequential scans on most SSDs are only incrementally better than on HDDs. One area where SSDs shine, though, is random page access; a good array of SSDs can make even large index scans faster than sequential scans. Therefore, use a random_page_cost of 1.5 to 2.5 (see the configuration sketch after this list).
5. NVRAM (or NAND): durable memory devices, such as FusionIO and Virident drives, do deliver on most of the performance promises originally made about SSDs. Everything is faster than on HDDs, sometimes by orders of magnitude, and random access most of all. While the concept of NVRAM might make you inclined to set RPC to 1.0, forcing almost 100% index access, that's going a little far, because PostgreSQL still has a disk page structure. Therefore, I suggest 1.0 to 1.5.
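To apply one of the numbers above, you can change the setting cluster-wide or, if only some of your storage is fast, on just the tablespace that lives on it. A sketch, where "ssd_data" is a hypothetical tablespace name used purely for illustration:
# postgresql.conf: cluster-wide default, e.g. for an SSD-backed server
random_page_cost = 1.5
-- or override it only for a tablespace on the fast storage (PostgreSQL 9.0 and later)
ALTER TABLESPACE ssd_data SET (random_page_cost = 1.5);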
Note that two other variables, effective_cache_size and effective_io_concurrency, also affect seek/scan behavior; possibly I will blog about them at a different time. Further, I have only been able to test with relatively few models and configurations of the above types of storage, so test, monitor, and check your results before committing to a long-term configuration change.
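For reference, here is roughly what those two settings look like; the values below are only common rules of thumb (cache size around half to three-quarters of RAM, concurrency near the number of spindles in the array), not tested recommendations:
# postgresql.conf, for a hypothetical machine with 16GB of RAM and a 4-drive array
effective_cache_size = 12GB      # planner's estimate of memory available for caching data
effective_io_concurrency = 4     # concurrent I/O requests, roughly one per spindle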
Thursday, May 3, 2012
SELECT fire_proton_torpedoes();
At pgCon this year you will have the chance to reduce your fellow attendees to their component elements, and win prizes for it!
We will be having a Schemaverse tournament for the length of the conference. Schemaverse is an amazing game which demonstrates that PostgreSQL isn't just a database, it's an entire development environment. You build a fleet of starships, fight battles, take planets, trade, and conquer the galaxy ... all from the psql command line. Write PL/pgSQL functions to command your armies!
To make things even more interesting, if you can hack into the Postgres server, you can win the game.
The Schemaverse tournament will begin Wednesday night and end shortly before the closing session at pgCon. The winner will get some one-of-a-kind PostgreSQL and Schemaverse swag, and even runners-up will get posters or T-shirts.
If you've already registered for pgCon, look for email from Dan Langille about how to sign up for the tournament. In the meantime, get over to the Schemaverse website and practice your space battle skills to get ready for the big event.