In fact, you really need to avoid every kernel between 3.0 and 3.8. While RHEL has been sticking to the 2.6 kernels (which have their own issues, but not as bad as this), Ubuntu has released various 3.X kernels for 12.04. Why is this an issue? Well, let me give you two pictures.
Here's a private benchmark workload running against PostgreSQL 9.3 on Ubuntu 12.04 with kernel 3.2.0. This is the IO utilization graph for the replica database, running a read-only workload:
Sorry for the cropping; I had to eliminate some proprietary information. The X-axis is time. This graph shows MB/s data transfers -- in this case, reads -- for the main database disk array. As you can see, it goes from 150MB/s to over 300MB/s. If this weren't an SSD array, this machine would have fallen over.
Then we upgraded it to kernel 3.13.0, and ran the same exact workload as the previous test. Here's the new graph:
Bit of a difference, eh? Now we're between 40 and 60MB/s for the exact same workload: an 80% reduction in IO. We can thank the smart folks in the Linux FS/MM group for hammering down a whole slew of performance issues.
So, check your Postgres servers and make sure you're not running a bad kernel!
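One way to do that check from a script is sketched below. It is only a rough guide: it reads "between 3.0 and 3.8" as inclusive on both ends (an assumption), the helper names `parse_kernel_release` and `is_bad_kernel` are made up for illustration, and distribution kernels may carry backported fixes that the version number alone does not reveal.

```python
import platform
import re

# Assumption: "between 3.0 and 3.8" is read as inclusive on both ends.
BAD_RANGE = ((3, 0), (3, 8))


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '3.5.0-54-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_bad_kernel(release):
    """True if major.minor falls in the range the post says to avoid."""
    low, high = BAD_RANGE
    return low <= parse_kernel_release(release) <= high


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string as `uname -r`
    if is_bad_kernel(release):
        verdict = "in the range to avoid"
    else:
        verdict = "outside the 3.0-3.8 range"
    print(f"kernel {release}: {verdict}")
```

Run on each database host, this prints the running kernel and whether its major.minor version lands in the 3.0 to 3.8 window.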
>> Linux ubuntu 3.5.0-54-generic #81~precise1-Ubuntu SMP Tue Jul 15 04:02:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>Distributor ID: Ubuntu
>>Description: Ubuntu 12.04.4 LTS
>>Release: 12.04
>>Codename: precise
Is this version good for postgres?
Nope. 3.5 falls in the range of versions to skip.
For that matter, 3.5 has some segfault crash bugs in the virtualization system.
Is this version 3.5 good or bad for PostgreSQL?
3.5 has known major performance problems, per multiple citations above.
Do you really mean between 3.0 and 3.8? Because 3.13 then falls in between.
I run the latest CentOS Linux release 7.0.1406 -- 3.10.0-123.6.3.el7.x86_64; is that OK?
Cypisek: that's not how release numbering works. 3.13 is *after* 3.8.
Kernel 3.10, however, should be fine.
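The numbering point is easy to trip over, so here is a minimal sketch of it. The helper `version_key` is a hypothetical name; it simply compares the major and minor components as integers rather than reading the version as a decimal number.

```python
def version_key(version):
    """Split 'major.minor' into integers so each component compares numerically."""
    return tuple(int(part) for part in version.split(".")[:2])


# Read as a decimal number, 3.13 looks smaller than 3.8 ...
assert float("3.13") < float("3.8")
# ... but compared component by component, 13 > 8, so 3.13 is the later release.
assert version_key("3.13") > version_key("3.8")
# 3.10 is likewise later than 3.8, so it sits outside the 3.0-3.8 range.
assert version_key("3.10") > version_key("3.8")

# Sorting by the integer components gives the release order.
print(sorted(["3.13", "3.2", "3.8", "3.10"], key=version_key))
# prints ['3.2', '3.8', '3.10', '3.13']
```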
Thank you, Josh.
It is implied but not clearly spelled out in the post; you mean to say that the database yields the same, or maybe better, TPS even with reduced load on the I/O subsystem, right?
Well, the kernel 3.2 run actually had worse TPS because of the IOwaits. So 3.13 gave better TPS with reduced IO.
Ubuntu 14.04 LTS has been out quite a while now, and has even had its first point release. I certainly recommend it over 12.04 LTS, and it ships with the 3.13 kernel.
May I ask how you got those figures and created the graph?
It's the internal performance monitoring from one of my clients. As such, I can't share details.
Does this problem mainly affect Postgres? Or would you recommend that any system (even those not running PG) avoid these kernels?
I'd recommend against using them for any service which needs to do concurrent IO. Besides, you can't safely run Docker without upgrading to 3.9 or later anyway.
I didn't see this, but for posterity, the kernel devs suggested this is due to the 3.2-3.8 memory managers being overly aggressive about stale cache purging. Basically they weren't properly promoting inactive cache into the active set, so data was being repeatedly invalidated while it was being loaded from disk, leading to a ceaseless IO cycle.
There were several patches that corrected this behavior, but some of the more subtle ones didn't make it in until 3.12 or so. 3.8 is the bare minimum for running a stable Linux server, IMO.
3.2.0 kernel shows 150-350MB/s read - faster IO
3.13.0 kernel shows 40-60MB/s read - much slower IO
Your comment in your blog says "Kernel 3.10, however, should be fine." but this falls between 3.2 and 3.13.
If we should avoid everything between 3.0 and 3.8, then why does your graph suggest the slowdown in IO comes at 3.13, which you point out "is *after* 3.8"?
Slightly confused here.
You really need to read the text as well as looking at the graphs. You missed the two places where I explain that this is a fixed size workload; that is, it's the exact same number of queries and data output in both runs.
The kernel 3.2 run is doing the exact same amount of work as the kernel 3.13 run. This shows how 3.2 has memory management issues; it's doing 140MB/s in completely unnecessary IO (on top of the 50MB/s of necessary IO).
Those issues were fixed in kernels 3.9 and 3.10, depending on your distribution.
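For anyone still reading the graphs backwards, the arithmetic in that reply can be worked through in a few lines. This is only a sketch using the two round figures quoted above (50MB/s necessary, 140MB/s unnecessary); the variable names are illustrative and the numbers are not fresh measurements.

```python
# Figures quoted in the reply above, in MB/s.
necessary_io = 50     # IO the fixed-size workload actually requires
unnecessary_io = 140  # extra IO attributed to the 3.2 memory management issues

total_io_3_2 = necessary_io + unnecessary_io     # what the 3.2 graph shows
wasted_fraction = unnecessary_io / total_io_3_2  # share of IO that was avoidable

print(f"kernel 3.2 total: {total_io_3_2} MB/s")  # prints "kernel 3.2 total: 190 MB/s"
print(f"avoidable share: {wasted_fraction:.0%}")  # prints "avoidable share: 74%"
```

So the higher number on the 3.2 graph means more disk traffic for the same work, not faster IO, and the avoidable share is in the same ballpark as the roughly 80% reduction cited in the post.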