Saturday, February 6, 2010

The data cruncher rides again

As you may remember from my previous post, I simply wanted to import three spreadsheets into an MS-Access equivalent, use the "Query By Example" (QBE) mode to create a simple report and export the result to another spreadsheet. I did the whole thing in MS-Access as well and it took me about twenty minutes. The spreadsheets are relatively small: 2367 rows, 5149 rows and 27804 rows. Last time we saw how OOo Base 2.4.0 fell through. Not that it lacked functionality, but because it was so badly executed.

This time I decided to give Kexi 1.1.3 and Knoda 0.8.3 a go. Not that I really needed to, but just to see if they were up to it if I needed to execute another odd job like this. The first part concerned reading the spreadsheets. Both packages do not support MS-Excel files out of the box, so I needed to convert those to CSV. That wasn't too hard. OOo Calc did the job. Reading them was quite similar in both packages, since both provide beautiful wizards which make it quite easy. However, I was unable to set the primary key and I knew I had to pay for that once I executed the query.

In Kexi, I decided to open up the design table window and set the primary key. The program warned me however, when I did that I would lose all my data. Well, that was not an option. So I decided to skip that and get right to designing the query using the QBE window. I was unable to specify an INNER JOIN or OUTER JOIN, so that was a bummer. When I opened up the SQL window I saw why: it had created an old skool JOIN. These kinds of JOIN create a Carthesian product and use the WHERE clause to select the appropriate rows. Well, it could work. As expected, it ran for a while and then came back with 2690 rows. That was not quite what I expected. I expected 5149 rows. I decided to leave it there and continue with Knoda.

It is a shame that Knoda hasn't been developed any further, because it was a nice package. Nobody seems to know what has happened to its creator Horst Knorr. The last modifications are over 900 days ago, so consequently Knoda has not been ported to KDE 4.x. Anyway, I decided to use the dBase driver and read in the CSV files. I quicky found out that was not quite what I wanted, since every field with a name longer than 10 characters was ignored. Then I picked the SQLite3 driver. Reading in the data was much slower now. The largest table was progressing very slowly. I got nervous. In half an hour I had to catch the tramway to go to work.

(I know I have to explain this to my American audience. If you use public transport in Europe, you're not necessarily poor. I don't even have a drivers license. That doesn't mean I'm physically or mentally challenged in any way. I just means I never needed a car that badly that I thought the investment in time and money was worth it.)

So I decided to leave the thing be, take a shower and come back afterwards. Dripping wet I checked the screen. 67%. I dressed while keeping an eye on the progressbar. 96%, 97%, 98%. Come on! 99%, 100%! I quickly shut down the computer and ran outside.

Several hours later, I fired it up again and continued. The QBE was nice, I could even set the JOIN properties. I could not switch to the SQL window, though. That was only possible after I saved the query - and then I couldn't get back to the QBE window. Neither could I select multiple fields and drag them to the query bar. There was no way to make aggregations by using GROUP BY or COUNT(). Still, it did the job. Since I was unable to set primary keys, it took a while, but Ok. It came back with 5156 rows. That were 7 more than expected, but wait a minute: I was working with the uncorrected datasets. I knew that there were 7 duplications I had corrected in the report I had created yesterday. Alright!

(Yes, by this time I had already reported these inconsistencies to the Configuration Administrators and told them to figure out what was wrong and correct the appropriate entries in the Configuration Management System. I know what an "exception" is and I've been in this ballgame for twelve straight years, thank you.)

Then I had to export the result to an MS-Excel file. Knoda doesn't support this format, so I decided to export it to CSV. It took a bit longer than expected, but there it was in all its 1.2 MB beauty. I fired up OOo Calc and opened the file. OOo Calc showed a wizard, I filled in the correct values and off we went. FAIL!! OOo Calc quickly started to eat through my 2 GB RAM and then decided to consume all swap space. I frantically hit the KSysguard icon and then went for the terminal. I had only seconds to spare!
killall soffice
soffice: no process killed
ps -eaf

Help! Fortunately, KSysguard came up and although it was slow I was able to kill OOo Calc. Pfff! That was close!

I fired up KSpread 1.6.3 and decided to repeat the procedure. It didn't even blink. Duh? I saved the spreadsheet in .SXC format, fired up OOo Calc again and read the .SXC file. No problem. It was on the screen in mere seconds. WT.. is that?

The verdict
OOo Base has a horrible way to import and export data. I also think that choosing a Java SQL engine wasn't a good idea. But all in all, it offers all the functionality you need for this kind of job. It's beautifully designed and could be a real replacement for MS-Access, if only it wasn't so badly executed. Yes, it's so badly executed that it becomes unworkable. I'm not prepared to risk a freeze for doing an odd job on OOo Base that I can painlessly execute on MS-Access.

Kexi is nice, but seriously lacking. The QBE does not provide several key functionalities, ALTERing tables is virtually impossible, so it's not really a replacement for MS-Access. I've also taken a look at Kexi 2.0, but I don't see too much improvements in this area.

Knoda lacks several minor features, has its quirks and is not particularly fast (I would call it slow, but that could also be the fault of SQLite3), but it gets the job done! If you don't have MS-Access at your disposal and want something similar to do the job on your Linux machine, this would be my pick. Unfortunately, Knoda is slowly dying and that's a shame! Isn't there somebody who can pick this project up and continue it - or at least salvage the parts that can be integrated in Kexi?

Some final thoughts
What I thought to be a minor job turned into a three day quest to find the best MS-Access replacement. I automatically assumed that there would be a viable FOSS replacement for what I consider to be one of the least interesting and most buggy parts of the MS-Office suite. I was kind of shocked that we don't have it. That doesn't mean you can't get the job done - there is always the CLI of course - but it will take more time than the MS-Access way.

I use the QBE to generate most of the SQL and then edit it later. I find SQL rather verbose and don't particularly like the syntax - it reminds me of COBOL. But since SQL is the standard, it is hard to avoid.

Several readers pointed me to a bunch of major data crunching suites. But I just want to hang up a painting - not drill the Tokio subway. And I have a dislike for Java programs - don't ask me why, it's just a feeling.

I also have to admit I haven't used OpenOffice too much until now. I write my documents with LyX, hardly ever use spreadsheets and didn't have much use for OOo Base. To me OpenOffice was a way to view and modify MS-Office documents - keep a link with the madding MS-Office crowd, if you prefer. This was the first time I gave it a full workout and frankly: I was shocked. I can't have a program (any program) that brings my rather reliable machine to its knees - for whatever reason. In this small quest OpenOffice did that almost twice. That's disturbing. I find that disturbing, especially since the basic functionality is good and sometimes unequaled in FOSS.

On the other hand, I was pleasantly surprised by KOffice. It may not provide all the compatibility you need with the MS-Office world, but the functionality it has is well-executed and reliable. Next time I need a spreadsheet, but don't need absolute MS-Office compatibility, I definitely will use KSpread.

Wednesday, February 3, 2010

A data cruncher bites the dust

I know this post is gonna get me into trouble, but frankly - I don't care. I've been called a zealot, a bigot and lots of other names I didn't know about. I thank the people who threw those curses to my head, because without them I wouldn't have been able to expand my vocabulary with all these lofty insults.

Ok, why this lengthy introduction. Well, it's easy: FOSS programs didn't work for me. Ok, now I've said it. But it's the truth. You have to know that I'm a Configuration Manager and those guys do a lot of data crunching because their main job is to produce reports. And like it or not, but the main format they're processing and producing is MS-Excel. Users that want to have their data imported provide MS-Excel sheets and that's also the format they want to have in return.

The backend is an entirely different story. I've worked with Oracle, MySQL and SQL server backends. I use my own PHP toolkit and changing the backend is pretty trivial. I don't even care whether they're running IIS or Apache, Linux or Windows. Hell, if it doesn't perform there's always more hardware. Not my problem. I'm very particular about PHP for obvious reasons, since I don't want to throw away my code nor want to be tied to a single platform - don't get me started on Mono.

Anyway, where I'm currently working I have an entire LAMP stack at my disposal. Today, my schedule was empty so I decided to work from the comfort of my home. There were several odd jobs I had to do. The first one was to create three Change Requests on the CMDB datamodel. I always use LyX, so that job was quickly done. The next one was to produce two reports from four tables. I had dumped those into MS-Excel using PHPMyAdmin the previous day. PHPMyAdmin may have its quirks, but it works fast and reliable. No problems there.

Now I usually use MS-Access to produce those reports, but I don't have that one on my Linux box. So I decided to give OpenOffice Base 2.4.0 a go. It wanted me to load the spreadsheets in OpenOffice Calc and then paste the sheets into OOo Base. I found that a rather peculiar and somewhat slow procedure, but it worked alright. Making the reports was a little awkward, because I was not too familiar with the program, but went very well. When I was satisfied, I wanted to dump the reports into an MS-Excel sheet. And there it went terribly wrong..

My KDE 3.5 desktop froze. It froze even so bad that I had no control whatsoever. I waited for several minutes. Nothing happened. You have to know that I was a Configuration Manager at a major bank in the Netherlands for several years. I processed datasets with hundreds of thousands of records. This was a tiny dataset in comparison. Just a few thousand records. that's nothing. In the end I killed it - I still have an old skool Wyse serial terminal connected to my machine - and did something I never wanted to do: I put the OOo database on a memory stick and booted my employers WinXP laptop. That's the truth.

I hoped that maybe OOo on the WinXP desktop was a little better. No way. It froze again. I still saw some activity and waited. I waited a long time. It came back with 4096 rows. That's not enough, I needed 5150 rows (including the header). This would not do. In the end, I started MS-Access, read in the tables and produced the report. Nuff said. 20 minutes later I was done. It was infuriating!

First of all, if you spend so much time imitating MS-Access why can't you produce something that doesn't need a DDE server to export its files. It's easy! Even my 4tH compiler offers simple native MS-Excel export functions! Make it CVS, like Kexi or Knoda, I don't care! But just that? Impossible! Who produced that? Sun? IBM? Aren't they ashamed? If I were to work for government agencies that only offer OOo I couldn't get my work done! My goodness.. Microsoft saved the day.. I thought I would never see the day..

In the end, WinXP wouldn't release the memory stick. I found out why when I turned off the machine: the DDE server was still running for some reason. That's Microsoft behavior, folks! No, I won't blame Microsoft for that, I'm sure it was OOo which left that thing running.

On the other hand, I also had to create a presentation and used OOo Impress for that. It worked pretty nice, I exported the thing to PDF (I don't like presentation programs anyway, Acrobat Reader is so much easier) and I was done.

The verdict? I won't touch OOo Base with a pole anymore. It looks very nice, but it doesn't handle real world workloads. I have to try Kexi and Knoda and give this post a followup. I sincerely hope it will be much better than my latest experiences, because I can't allow myself to lose valuable time over this. When I'm working at home, I simply don't have the time for experiments: I need things that just work. Like LyX. Like OOo Impress. The only way to beat the system is to do better than the system. OOo Base just falls short.

Ok, it's true, I admit: I'm a Microsoft cshill. Sigh..