To get information regarding the performance of databases on the AS/400, Joe Pluta set up some benchmarks on the System i. This tip includes this test of MySQL engines (InnoDB and MyISAM), JT400, native RPG in DB2, and SQL.The results are in no way definitive, and only represent one use-case scenario.
In my ongoing pursuit of knowledge concerning the performance characteristics of MySQL, it became clear to me that running part of the benchmark on the PC and part on the i was comparing apples to aardvarks. In order to create a more uniform battleground, I decided to resurrect my old performance benchmarks framework and update it to support MySQL. This article will give you my results.
The original framework was created years ago for something I called the iSeries Advanced Architecture Initiative, or IAAI. While the IAAI never really took off, the benchmark software worked quite well, and I've dredged it up every so often to address specific issues.
My first benchmarks compared native I/O to embedded SQL, and the results were pretty dramatic. Native I/O outperformed SQL in nearly every test, often by wide margins. The only place where SQL was faster was when I purposely designed the test to allow inserts or updates of more than one record at a time. My next set of tests pitted basic Java arithmetic processes to their RPG counterparts, and in those benchmarks RPG won even more impressively than in the database tests. That's why I prefer my business logic to be written in RPG using native file I/O, and stick to using SQL for complex queries and Java for user interface work.
The second set of tests was actually an interesting challenge. It took a bit of plumbing to get a framework in place that could run both RPG and Java classes within the same test suite. This was written in the early days of RPG's support for Java method prototyping, and I stumbled more than once. But once I got the basic concepts in place it became relatively easy to add additional Java-based tests.
JDBC vs. RLA benchmarks
The next thing I did was run some comparisons to see how the record-level access (RLA) classes in IBM's Java toolbox would stack up to standard JDBC. The results were what you might expect: RLA falls somewhere between JDBC and native I/O. Although quite a bit faster than the corresponding JDBC code, the RLA classes weren't really fast enough to displace RPG business logic. Combine that with the one-off nature of the RLA classes (no other database had anything equivalent) and the benchmark didn't give me any new information. RPG was still by far the best performing business language.
So I didn't really get any great increase in knowledge from writing the test programs for the RLA vs. JDBC benchmarks. But as a nice little side benefit I ended up with a solid foundation upon which to test JDBC databases against one another. I had already built a number of flags into the test program configuration files; I simply used those to trigger the differences in configuration required to use the various database, things like the driver class name and the URL. One of the nice things about JDBC is that, actual SQL syntax differences aside, how uniform the API is for the various databases.
For my database comparisons, I have a small but varied set of benchmarks. Each one tests a slightly different behavior of the database. I have a read using a result set, an insert, and several varieties of updates (the primary difference being how many records are updated at a time).
The way the benchmark framework works is that I can define a test program and then group a set of test programs into a test suite. The test program definition allows me to specify the number of iterations to run for each test to avoid creating a test scenario that either runs for a few milliseconds or for several hours. This design allows me to run test suites multiple times and assign different Run IDs to each run. The results for each are stored in a database that I can then query for results. It's far from perfect; people have said the tests are too simple, too complicated, not enough structure, not enough control, and on and on. That's fine, I never said I was a statistics guru. I just wanted some baseline data and I got it -- and I'll share it with you!
MyISAM vs. InnoDB: The two primary storage engines for MySQL are MyISAM and InnoDB. Others exist, but these two are by far the most popular. I did most of my testing using the MyISAM engine. MyISAM is the default storage engine, and it's really pretty fast. The primary downside to MyISAM is that it is really a much more of a single-user database, or at least not a highly transactional database. Locks occur at the table (file) level rather than the row (record) level, and MyISAM has no support for transactions. So if you're looking for a highly robust database for a typical enterprise application, you may run into issues with MyISAM, especially when scaling to large numbers of users.
The InnoDB engine, on the other hand, has very good transaction support. Unfortunately that very support tends to cause some overhead especially when updating individual records in a database. But, let me give you the statistics. Among the many things I was able to do was to derive the number of seconds per 1,000 executions of a given database task.
|Read||Write||Update x 1||Update x 10||Update x 100|
Table 1: This shows the time to read/write/update 1,000 records.
Several very striking trends should be immediately visible from this simple table. First, native RPG code outperforms everything -- and not by small amounts. RPG is up to six times faster than embedded SQL, and 10 to 100 times faster than any form of JDBC. Again, I know that this simple benchmark is hardly proof beyond a reasonable doubt, but at the same time it's some pretty powerful circumstantial evidence, don't you think?
Another observation is how SQL can be made faster by processing multiple records at a time. This should come as no surprise to true SQL advocates; SQL is after all designed to deal with sets of records rather than individual rows. So the fact that SQL gets better with larger sets is hardly unusual. But what I find most interesting is that the effect is far more pronounced on the JDBC interfaces than on the embedded SQL. In fact, the embedded SQL is much closer to native I/O for single record processing than the corresponding JDBC code. That fact rather strongly suggests that when you use SQL, you should use it inside of RPG.
The last real in-your-face bit of analysis is that the InnoDB engine -- which is the version presented as the enterprise-level version of MySQL -- performed pretty miserably in the row-at-a-time world of traditional business logic. All that support for transactions and row-level locking seems to put a rather high burden on single row processing. Unfortunately, that is just the kind of function that begs for commitment control.
One thing I don't show in my table is a rather painful lesson I learned early on in my testing. All of my update tests use a WHERE clause that tests the field key2, which is the second field in the primary key. That is, although there is a primary key defined, it is over key1 and key2, and so a predicate testing key2 alone cannot use that primary key index. I forgot to create the index over the key2 field, and paid for it dearly. Updating 1,000 records in a 500,000 record file one record at a time took nearly an hour. Now, once I added the index, the time dropped to a little over a minute, but still it just goes to show that indexes are hugely important.
A grain of salt
The last thing I want to do is to try to present this as a definitive case either for or against MySQL, JDBC or SQL in general for that matter. My primitive testing methodology is far from sufficient to base any true business decisions (well, except for the fact that RPG absolutely rocks, but we knew that). To be fair about this, you'd have to first expand the breadth of the tests dramatically. You'd want to have a wider variety of table layouts and data types, and a more varied workload. And then you'd have to try it out on a range of different machines, or at the very least on your own machine, since I'm reasonably sure that hardware will play a significant role in the real-world performance of the database.
And even though the InnoDB engine has some clear performance issues, that is offset by the fact that you don't necessarily have to use either MyISAM or InnoDB exclusively. You can mix them, although obviously you won't be able have commitment control. It doesn't hurt that choosing one or the other is quite easy: specifying ENGINE=INNODB on the CREATE TABLE statement is all it takes to create an InnoDB table as opposed to a MyISAM table.
My overall impression? Well, I'm pretty biased but I think this only reinforces the idea that we really take DB2 on the i for granted. While some databases can approach the speed of DB2, it's usually at the expense of other things we take for granted, whether it's integrated row-level locking or just the rock-solid stability of the system under heavy load. MySQL is a popular database and may provide an option for prepackaged applications with somewhat less stringent database requirements such as content managements system, but I don't think MySQL will replace DB2 anytime soon.
ABOUT THE AUTHOR: Joe Pluta is the founder and chief architect of Pluta Brothers Design Inc. Joe uses the Rational tool suite extensively, and is on the beta teams for most of its i-related products. He performs onsite mentoring and proof-of-concept development and speaks at user groups around the country. Joe can be reached at firstname.lastname@example.org.