I've been doing some comparative testing of different cloud platforms' hosting of PostgreSQL. One of the deficiencies in this effort is that the only benchmarking tool I have is pgbench, which doesn't reflect the kinds of workloads people would want to run on cloud hosting. Don't get me wrong, pgbench does everything you could imagine with its simple TPC-B-style benchmark, including statistics and sampling. But the core benchmark is still something which doesn't look much like the kind of Rails and Django apps I deal with on a daily basis.
There's also TPCC-js, which is more sophisticated, but is ultimately still a transactional, back-office OLTP benchmark.
So I'm thinking of developing a "webapp" benchmark. Here's what I see as concepts for such a benchmark:
- Read-mostly
- No multi-statement transactions
- Defined "users" concept with logins and new user registration
- Needs a "sessions" table which is frequently updated
- Read-write, read-only and session database connections should be separable, in order to test load-balancing optimization.
- Queries counting, sorting and modifying content
- Measured unit of work is the "user session", which would contain some content lookups and minor updates ("likes"); a rough sketch of what I have in mind follows below.
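To make that concrete, here is a minimal sketch of the sort of schema and per-session work I'm imagining. This is purely illustrative; every table, column, and parameter name below is a placeholder, not part of pgbench or any existing tool, and the details would certainly change in a real spec.

    -- Hypothetical schema sketch; all names are illustrative.
    CREATE TABLE users (
        user_id    bigserial PRIMARY KEY,
        login      text NOT NULL UNIQUE,
        pwhash     text NOT NULL,
        created_on timestamptz NOT NULL DEFAULT now()
    );

    CREATE TABLE sessions (
        session_id uuid PRIMARY KEY,
        user_id    bigint NOT NULL REFERENCES users (user_id),
        last_seen  timestamptz NOT NULL DEFAULT now()
    );

    CREATE TABLE content (
        content_id bigserial PRIMARY KEY,
        author_id  bigint NOT NULL REFERENCES users (user_id),
        posted_on  timestamptz NOT NULL DEFAULT now(),
        body       text NOT NULL
    );

    CREATE TABLE likes (
        content_id bigint NOT NULL REFERENCES content (content_id),
        user_id    bigint NOT NULL REFERENCES users (user_id),
        liked_on   timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (content_id, user_id)
    );

    -- One "user session" unit of work, as separate single-statement
    -- transactions ($n are driver-supplied parameters):
    UPDATE sessions SET last_seen = now() WHERE session_id = $1;

    SELECT c.content_id, c.body, count(l.user_id) AS like_count
      FROM content c
      LEFT JOIN likes l USING (content_id)
     WHERE c.author_id = $2
     GROUP BY c.content_id, c.body, c.posted_on
     ORDER BY c.posted_on DESC
     LIMIT 20;

    INSERT INTO likes (content_id, user_id) VALUES ($3, $4)
        ON CONFLICT DO NOTHING;  -- ignore repeat likes (PostgreSQL 9.5+)

The point of keeping each statement in its own transaction is that the read-only lookups, the session touch, and the writes could then be routed to different connections, per the load-balancing item above.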
Now, one of the big questions is whether we should base this benchmark on the idea of a social networking (SN) site. I think we should; SN sites test a number of things, including locking and joins which might not be exercised by other types of applications (and aren't by pgbench). What do you think? Does anyone other than me want to work on this?
What about a spec-compliant TPC-W?
Jan published one written in PHP several years ago; it can be obtained on pgfoundry: http://pgfoundry.org/projects/tpc-w-php/
Yah, this is even worse than pgbench as far as being a real benchmark is concerned. No FKs, no joins. And yet uses multi-statement transactions.
There are reasons TPC-W has been an obsolete benchmark for a decade now.
I think this website/project can be helpful, Josh: http://oltpbenchmark.com/wiki/index.php?title=Main_Page, especially the Epinions and SEATS workloads.
Oh! Thank you! I'll dig into that; it looks like exactly what we need.
Feh. Java! Why did it have to be Java?
Also, sadly, the benchmarks I'm interested in seem to be incomplete. For example, there's no Wikipedia trace file.
What do you mean by trace file? I remember running various workloads against PG a year or so ago.
Yeah, I'll need to give it some testing time; apparently the file structure is just very opaque. My real concern is that if I don't like some of the workloads (that is, if they're unbalanced or don't scale well), I won't be able to fix them. But test first, fix later ...
The structure isn't optimal, as it's designed to be generic both workload- and database-wise. The good stuff is usually at workloads/*/procedures/*.java
Maybe some wiki made with Django? I'm thinking about using an existing open source project, something like https://github.com/django-wiki/django-wiki. Using Django would also allow comparing PostgreSQL with other DBMSes.
I think that using a framework with an ORM may be a little unfair (since it uses only basic features), but it will give better results, closer to more common, real-world workloads.
Yeah, I started down the road of building a benchmark out of some common Django projects. The problem with that is that unless I want to devote 5-6 webservers to running the benchmark, it becomes a benchmark *of Django* rather than of the Postgres service.
In an age when we can have servers with hundreds of gigabytes of memory, terabytes of SSD, and dozens of cores for about $10K (the lower end of the possibilities, yes) talking to browsers on broadband with persistent connections, perhaps we should contemplate designing apps to the VT-100/*nix database/RS-232 paradigm of times past. After all, all those so-called OO coders remain in love with the dataObject/actionObject structure, which is not a bit different from the COBOL/VSAM structure their granddaddies used. Some kinds of Back to the Future actually make sense; we can now do what Dr. Codd had in mind with hardware he didn't have.
A test suite should acknowledge such a structure, rather than promoting the disconnected structure of vanilla HTTP.
Um, did you have a point with this?
Yeah. Conventional web benchmarks are really measuring the latency among JavaScript, the webserver, and the database server. We need not do it that way any more, at least for a significant number of application types. HTTP was always a bad paradigm for OLTP applications, and with current-day infrastructure, we can do away with the disconnectedness.
Building a benchmark for yesterday's paradigm only serves to support yesterday's paradigm.
Um ... yeah. You do that. Post a link when you have something.
I think you should look into linkbench.
Thanks, Baron! However, I'm not mainly looking to benchmark graph performance.
I'm not convinced your suggested set of measurements reflects typical web apps.
* Server-side sessions are rare in my experience, with signed or encrypted cookies being the preferred alternative for almost any situation. (Sites with server-side sessions always seem frustratingly unusable, with the session timing out at inopportune times; the implementation goes against the grain of HTTP.)
* "likes" should generally be implemented asynchronously, perhaps via a webserver log which is read in in small batches, so a 1 like per transaction under load would be unusual.
* The login action needs to reflect a usual user table rather than a database user. Very, very few webapps map app users to db users.
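For what it's worth, the batching described in the second point can be a single statement. This is just a sketch under the assumption that the webserver log has already been loaded into a staging table; likes_staging and the column names are made up for illustration and match nothing real:

    -- Hypothetical batched ingestion of likes from a staging table.
    WITH batch AS (
        DELETE FROM likes_staging
        RETURNING content_id, user_id
    )
    INSERT INTO likes (content_id, user_id)
    SELECT content_id, user_id FROM batch
    ON CONFLICT DO NOTHING;  -- skip duplicate likes (PostgreSQL 9.5+)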
Lawrence: you seem to have an axe to grind which has little to do with my post. No thanks.
DeleteI think it's a great idea. One thing to add to the list: common pagination techniques, with abusing limit/offset being number 1. That's where most of my unexpected performance variation (and corresponding headaches) comes from.
If I end up writing my own benchmark, I'd want to hold off on pagination until the second version. It's definitely a real problem, but it's also very hard to set up paginated data in a synthetically generated benchmark.
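For concreteness, the pattern being flagged above and the usual keyset alternative look roughly like this. The queries are illustrative only, written against the hypothetical content table sketched earlier, and aren't taken from any existing benchmark:

    -- OFFSET pagination: the cost grows with the page number, because all
    -- of the skipped rows still have to be read and sorted.
    SELECT content_id, body
      FROM content
     ORDER BY posted_on DESC, content_id DESC
     LIMIT 20 OFFSET 4000;

    -- Keyset ("seek") pagination: filter on the last row of the previous
    -- page instead, so each page is a small index range scan.
    SELECT content_id, body
      FROM content
     WHERE (posted_on, content_id) < ($1, $2)
     ORDER BY posted_on DESC, content_id DESC
     LIMIT 20;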
You can use something like tcpcopy: https://github.com/session-replay-tools/tcpcopy and a real log from a Rails/Django MySQL intercept: https://github.com/session-replay-tools/mysql-replay-module
Note: I haven't used it yet, but it seems promising!
Came across this, perhaps it has something useful:
https://github.com/oltpbenchmark/oltpbench