IBM and MySQL: The way they were
I found it particularly interesting that IBM announced future support for DB2 would be tightly integrated into MySQL. It's a little abstract, but think of MySQL as just an API that lets you access data. Originally MySQL had only one underlying physical structure, but over the years support for other physical layouts has been added by writing interface layers. MySQL calls these interface layers storage engines, and provides a full specification for third-parties to write their own. Last year's announcement stated that IBM and MySQL would together develop a storage engine that would directly access DB2 tables.
Why is this important? Because today the only way you can access a MySQL table on the i, is by using Java (e.g., JDBC and the MySQL driver). There is no way to directly access a traditional DB2 physical or logical file via MySQL, nor is there any way for an ILE program to directly access MySQL tables.
So where is my data?
Let's look at this in a little more detail. The default storage engine for MySQL is the MyISAM engine. While the InnoDB storage engine has more features, the MyISAM database is supposed to be at least as fast and it's easy to use (although there is some debate on the performance issue). When you create a table such as MYDATA in MyISAM, you end up with three files in your IFS, called MYDATA.frm, MYDATA.MYI and MYDATA.MYD. I'll avoid going into detail about the contents of these files except to say that they're not readable by anything but MySQL, and certainly not accessible to RPG.
So, when you access the MySQL driver through JDBC using the MySQL connector JAR file supplied by MySQL, you will actually be reading and writing those files on the IFS. On the other hand, if you use IBM's JDBC driver for DB2 (contained in the IBM Java Toolbox JAR file jt400.jar), you will instead directly access standard physical files in the QSYS file system on the i. A picture might help at this point:
In Figure 1, the IFS components are all in blue, while the QSYS components are in orange. On the left-hand side of the figure you'll notice that the Java programs (the classes Setup1 and Test1) are exactly the same when accessing either the MySQL data or the DB2 data. They do check a runtime parameter to identify which driver to use (either the MySQL driver located in mysql-516.jar or the IBM driver found in jt400.jar). But otherwise the programs execute exactly the same code regardless of the driver. In fact, the programs access a table named TEST.MYDATA2, in which MySQL is stored in several stream files in the IFS folder "/opt/mysqldata/test", while in DB2 the data is stored in a physical file named MYDATA2 in library TEST. The drivers -- and the servers accessed by the drivers -- are the ones that translate those requests directly.
I wrote a little RPGLE program named TEST1 that also resides in the library TEST.
What do these programs do?
Well, I wrote two different Java programs. Setup1 drops the existing table, recreates it, and then fills it with 1,000 records with different keys. In SQL terms, it creates the table with a primary key, like this:
CREATE TABLE TEST.MYDATA2
( KEY1 CHAR(10) NOT NULL,
DATA1 CHAR(30) NOT NULL,
PRIMARY KEY (KEY1) )
This creates a table with two fields, KEY1 and DATA1, with KEY1 being the primary key. When run in MySQL using the default MyISAM storage engine, this statement creates the three IFS files (.frm, .MYD and .MYI) shown in Figure 1. When run using IBM's Java Toolbox driver, the same statement creates a keyed physical file named MYDATA2 in the library TEST. It's actually pretty amazing how all of this works, although I find that I can get a little bit confused.
Note that if a MySQL storage engine for DB2 is created it would mean that executing the same instruction against MySQL using that storage engine would theoretically create a standard QSYS physical file in library TEST, exactly the same as using the IBM Toolbox driver. To be honest, I'm not sure whether this would be more or less confusing, but it would obviously be more consistent.
After throwing in a little bit of logging which allowed me to ignore the vagaries of connection time, I ended up with some interesting results. For creation of the table MySQL won handily. It took the toolbox driver a lot longer to create the physical file in library TEST than it took the MySQL driver to build its IFS files. But that's not terribly surprising; MySQL has complete control over the IFS files while the toolbox driver has to conform to the IBM procedure for creating a file.
The performance on the actual adds, though, was a little unexpected. It took 2.6 seconds to add 1,000 records via MySQL, but it took nearly three seconds to do so with the IBM Toolbox driver! That surprised me; I didn't realize there would be as much overhead in the Toolbox as there would be in MySQL. The difference decreased, however, when I upped the number of records inserted to 10,000; at that number, the two drivers ended in a dead heat of 23 seconds.
The numbers were also pretty interesting when it came to reading the data. In order to try and simulate "random" reads, I wrote a little loop that would read the first record and then the 501st, followed by the second and then the 502nd, and so on. I ran this on both drivers, and I further expanded the test pool by running first on my workstation and then natively on the i (I never ran the Setup1 program on the i; I was more interested in read performance). Once again, MySQL eked out a win over the Toolbox driver:
|Seconds for 1000 reads||MySQL Driver||IBM Toolbox Driver|
Two things should jump out. First, the read times on Windows are nearly as long as the write times. That's frightening. I'm doing this very painfully, performing a SELECT on a single record at a time, rather than doing a block read, but still the performance is abysmal. Imagine trying to read a million records, or 100 million (the latter would take over two days!). The other thing to note is that you actually get better performance running on the PC than on the i. That's got a lot to do with Java tuning; the Java processes should really be running in their own subsystem with their own memory pool so as not to conflict with the other non-Java workloads on the system. And of course the SQL folks will beat me about the head and shoulders saying I should be processing a set at a time. But even so, the performance is still just horrid.
Or is it? Maybe my machine is just woefully underpowered. Maybe I shouldn't be worried about read times that aren't a whole lot faster than my old PC XT (2 ms per record -- that's awfully high!). So I wrote the little RPG program TEST1 to read the records, doing basically the same thing: reading record 1, then 501, then 2, then 502, and so on. The results?
It took 0.12 seconds (that is, about a tenth of a second) to read all 1,000 records. That's more than 50 times the speed of either MySQL or the toolbox. I wasn't surprised by that number, really; Java is simply nowhere near as fast as RPG when it comes to accessing data a record at a time.
So, the upshot of all this is that, yes, MySQL over its own database performs roughly as well as IBM's Java Toolbox over DB2 data, although that performance isn't exactly something to write home about. But, it seems to be at least adequate to run applications that deal with hundreds or thousands of records rather than millions. To that end, I think the next test will be to install a real program as I hinted I was going to do in the previous article. I haven't gotten a chance to, but the performance numbers here lead me to want to at least give it a shot.
If you'd like to run the test for yourself, I'll include the two Java programs (the Setup1 and Test1 programs) as well as the RPGLE program (TEST1). If you want to test against MySQL, you'll have to install MySQL for i5/OS and also include the MySQL connector JAR file on your classpath. To test against DB2, you'll need the jt400.jar JAR file, which you can usually find on your own IFS in the following folder:
ABOUT THE AUTHOR: Joe Pluta is the founder and chief architect of Pluta Brothers Design Inc. Joe uses the Rational tool suite extensively, and is on the beta teams for most of its i-related products. He performs onsite mentoring and proof-of-concept development and speaks at user groups around the country. Joe can be reached at firstname.lastname@example.org. This was first published in December 2008
This was first published in December 2008