Archive for August, 2008
Compass (from Hibernate Search)
Since I’ve been exploring JPA, TopLink, Hibernate and Hibernate Search over the last couple of days, I thought I’d move on and check out Compass. I’ve used Compass previously through Grails but never as a standalone library directly from Java. With the Grails Searchable plugin, Compass is completely transparent to you.
One of my motivations for trying Compass was that I found Hibernate Search to be a little slow. Indexing was fine, but bringing back the results was quite painful. I wondered if Compass was faster.
Out of the box Compass was pretty easy to get going (though I seem to have problems creating/dropping/updating the database when the Compass session listener is enabled in the persistence.xml). I moved back to TopLink for no other reason than to use the JPA reference implementation rather than Hibernate (which I know Compass works with because of Grails). The help documentation for Compass is pretty complete, which means that once you read it you are good to go. My simple app didn’t require cascades, inheritance, etc but it’s all in there and available to use.
One of the differences between Compass and Hibernate Search is that when HS returns query results they are objects from Hibernate itself. With Compass you are getting Compass’ own object back, which might not have all the fields of the JPA version. If you want the JPA version you have to go and fetch it yourself (using the primary key shared between Compass and JPA).
On the surface this might seem like Hibernate has an advantage, and to an extent it does. However the reason HS is “slow” is that it needs to pull all that text from the database and into entity beans when you get the results list (you could use Projections to do something about that). For my data set of 20000 docs that’s 75 MB of text, or perhaps more relevantly 7 MB to process for every 1000 documents’ worth of search results.
Compass, in contrast, is effectively maintaining two data repositories - one JPA and one Compass/Lucene index. When you get results from a Compass search you aren’t hitting the database, so you only get back the fields you annotated as @Searchable*.
So comparing on a level playing field (ie getting the JPA persisted object for each result) Compass and Hibernate Search seemed to perform around the same speed, though Compass was maybe a little faster (10%). But if you just want to do a search through Compass and not grab the original JPA object (say because all the fields you need are stored in the Lucene index), then it takes about 400ms (vs 19 seconds)!
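The two result styles can be sketched with in-memory maps standing in for the Lucene index and the database. Everything here is hypothetical (the `FullDocument`, `IndexHit` and `TwoStoreSearch` classes, their fields, the maps); none of it is Compass or JPA API. It only illustrates the difference between returning the stored index fields and fetching the full entity for each hit by the shared primary key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical full entity as the database would hold it (large body text).
final class FullDocument {
    final long id;
    final String title;
    final String body;

    FullDocument(long id, String title, String body) {
        this.id = id;
        this.title = title;
        this.body = body;
    }
}

// Hypothetical search hit: only the small fields stored in the index.
final class IndexHit {
    final long id;
    final String title;

    IndexHit(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

final class TwoStoreSearch {
    private final Map<Long, FullDocument> database = new LinkedHashMap<>();
    private final Map<Long, IndexHit> storedFields = new LinkedHashMap<>();
    private final Map<Long, Set<String>> indexedTerms = new LinkedHashMap<>();
    int databaseReads = 0; // counts how often a search touches the "database"

    // Saving writes to both repositories, keyed by the same primary key.
    void save(FullDocument doc) {
        Set<String> terms = new HashSet<>(Arrays.asList(doc.body.toLowerCase().split("\\W+")));
        database.put(doc.id, doc);
        storedFields.put(doc.id, new IndexHit(doc.id, doc.title));
        indexedTerms.put(doc.id, terms);
    }

    // Index-only search: matches on body terms, returns stored fields only.
    List<IndexHit> search(String term) {
        List<IndexHit> hits = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> entry : indexedTerms.entrySet()) {
            if (entry.getValue().contains(term.toLowerCase())) {
                hits.add(storedFields.get(entry.getKey()));
            }
        }
        return hits;
    }

    // "Level playing field": fetch the full entity for every hit by primary key.
    List<FullDocument> searchAndLoad(String term) {
        List<FullDocument> docs = new ArrayList<>();
        for (IndexHit hit : search(term)) {
            databaseReads++;
            docs.add(database.get(hit.id));
        }
        return docs;
    }
}
```

In this toy version `search` never increments `databaseReads`, while `searchAndLoad` does one read per hit, which is where the large text gets pulled in.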
I was hoping to get the best of both worlds by lazy loading the large string field of the document class. However that didn’t work (with TopLink at least, others not tried). The alternative is to put the content in another dedicated class, use lazy fetching for JPA and SearchableComponent / SearchableCascade in Compass. That way you would search on the text but return the owning class (eg the metadata of a document) rather than the content. Speed would improve at the cost of complexity. I’ve tried implementing this without success in the last few minutes, but TopLink seems to be eager fetching the content class despite my lazy request!
One of the failures of Lucene is that you can’t update a document without deleting it from the index and then adding it again (and hence re-indexing). Thus you have to have the original content around if you have entries that change - sadly that means you either store it in the database (and take advantage of HS or Compass) or you leave it on the file system and then write your own code to re-read it as necessary.
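To illustrate why the original content has to stay around, here is a toy in-memory inverted index in plain Java. It is not Lucene and does not use Lucene’s API; the `ToyIndex` class and its methods are made up for this sketch. The point is only that an update is a delete followed by a full re-add, so the caller must supply the complete new content.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Toy inverted index (term -> document ids). Hypothetical, NOT Lucene.
final class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index every word of the content under the given document id.
    void add(int docId, String content) {
        for (String term : content.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
            }
        }
    }

    // Remove the document id from every posting list.
    void delete(int docId) {
        Iterator<Set<Integer>> it = postings.values().iterator();
        while (it.hasNext()) {
            Set<Integer> ids = it.next();
            ids.remove(docId);
            if (ids.isEmpty()) {
                it.remove(); // drop terms that no longer match anything
            }
        }
    }

    // No in-place update: delete, then re-index the full new content.
    void update(int docId, String fullNewContent) {
        delete(docId);
        add(docId, fullNewContent);
    }

    // Returns a copy of the ids of all documents containing the term.
    Set<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase());
        return ids == null ? new HashSet<Integer>() : new HashSet<>(ids);
    }
}
```

Because the index only keeps terms and ids, `update` cannot patch a single field; whoever calls it needs the whole document text, from the database or from the file system.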
Recreated post… JPA, Hibernate and Toplink
I failed to recover my original post so here’s a somewhat terse set of points from it:
- TopLink is bundled with Netbeans, so that’s really easy to get started with. However Hibernate has Lucene integration through Hibernate Search, which I wanted to try out.
- Hibernate is very difficult to get started with, requiring you to download multiple Hibernate zip files and combine a massive set of jars.
- Additional JARs need to be downloaded, such as an implementation JAR for SLF4J (you should download the whole SLF4J zip and pick the one applicable to you - though that’s not obvious from the SLF4J site)
- You still need commons-logging for Hibernate Search.
- My code from TopLink didn’t work with Hibernate out of the box (though that’s probably my assumptions about JPA rather than a comment on either TopLink or Hibernate). Having a Singleton EntityManagerFactory sorted this out.
- Hibernate was faster than TopLink (25% reduction in execution time) and also more robust. It didn’t run “out of heap space” even with large transactions (which I’d had to batch up for TopLink).
- The additional (non-standard) Hibernate annotations (such as Index) reduced the execution time to 25% of the original TopLink time. Of course I could have added indexes manually (either through the database command line or as JPA native SQL queries) but the annotation method is much cleaner.
Main points captured but not in the nicest form nor with the context which justifies them - sorry for that!
Hibernate Search
[Edit: This was a follow up to a post on JPA, Hibernate and Toplink, which has disappeared. I know I posted it because Google has it, but sadly not in cache! I must have deleted it but no idea how… ]
Following up on my previous post around an hour ago, Hibernate Search (that is Hibernate integrated with Lucene) does indeed work perfectly. Setting up entity beans for indexing and performing search is easy.
Almost perfect - another couple of out of heap space messages due to the single transaction which batches up 20000 documents (around 75 MB on disk), all of which is held in memory for Lucene to index on commit. Hibernate seems to do fine persisting this to the database, though obviously Lucene has more work to do at commit point and runs out of memory constructing the index.
I could of course increase the heap size but instead I simply batch up the transaction to commit 1000 documents at a time. There is another way of doing that but it’s similar in results/method. I chose to explicitly batch the transaction simply so that I keep the database and index in harmony (and don’t have to worry about rollbacks to different stages).
Might be worth getting the Hibernate Search in Action book, though I think that you’d need to be doing some complex search and entity mapping in order to require more than the basics.
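The explicit batching described above can be sketched in plain Java. This is a minimal illustration, not the code from my app: `BatchWork` and its `begin`, `persist` and `commit` methods are hypothetical stand-ins for whatever transaction API is in use, and `BatchCommitter` is a made-up helper. The batch size of 1000 is the one mentioned above.

```java
import java.util.List;

// Hypothetical stand-in for a transactional store; the method names are
// assumptions for illustration, not a real JPA or Hibernate interface.
interface BatchWork<T> {
    void begin();
    void persist(T item);
    void commit();
}

final class BatchCommitter {
    private BatchCommitter() {}

    // Persists all items, committing after every batchSize items so that no
    // single transaction holds more than batchSize items.
    // Returns the number of commits performed.
    static <T> int persistInBatches(List<T> items, int batchSize, BatchWork<T> work) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int inBatch = 0;
        for (T item : items) {
            if (inBatch == 0) {
                work.begin(); // start a new transaction for each batch
            }
            work.persist(item);
            inBatch++;
            if (inBatch == batchSize) {
                work.commit();
                commits++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) { // commit the final, partially filled batch
            work.commit();
            commits++;
        }
        return commits;
    }
}
```

With 20000 documents and a batch size of 1000 this would commit 20 times, and a failure only ever rolls back the current batch.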
As an aside, I seem to be coming across JMS more and more in ‘everyday applications’. For example, Hibernate Search uses JMS for dedicated indexing, Glen Smith talks about his use of JMS for his Groovy Blog app and LinkedIn presented their JMS based architecture at JavaOne 2008. Makes me wonder if JMS is the pragmatic, bloat-free ESB?
I’m confused about JavaFX
I’m very confused by JavaFX - just what is it for? I thought it was Sun’s answer to RIA/Air/Flash/Flex/Silverlight but every demo I see is about producing Swing frames for bouncing balls or fancy graphic text. I know this isn’t the final release, but I can’t get to the bottom of the hype!
I don’t understand the overlap with Groovy Swing/GraphicsBuilder, which seems to offer a large majority of the demonstrated functionality of JavaFX and offers more power and better integration with Java itself. (Admittedly there are additional media, etc elements to JavaFX).
All this isn’t helped by the fact I can’t get my hands on it. Very disappointing that there’s no Linux distribution (even a feature limited version). This is particularly frustrating as it has been demonstrated that it is possible to hack the current Mac JavaFX SDK to run on Linux (just not supported officially).
Everyone’s a web designer
Continuing development in Grails I’ve been really struck today by just how much time you spend as a ‘web designer’ (aka ‘front end guy’) vs a ‘programmer’ (‘back end guy’). It really takes zero time before the majority of the coding is behind you and you need to start hacking away to make the app look good and (more importantly) be user friendly.
Strange that Grails is the first time I’ve felt this way - having used CakePHP, Django, etc in the past. I think this is because Grails is the first framework which feels integrated to me. Django for example seems very disjointed going between the template language and Python. I wonder if this ‘disjointedness’ meant that I had to ‘put my web designer’s hat on’. In Grails I’m just wearing my Groovy hat (so to speak) - though the template language isn’t Groovy (but it feels closer thanks to easy to write taglibs and “${it}” everywhere).
So I need to check out some JavaScript/Ajax HTML libraries to help me speed up the front end development. I confess that I would have been tempted to go for Ext JS were it not for the licence approach. In some sense, I don’t have a problem with the new GPL nature (though LGPL was better) but I find the whole concept quite confusing, as do many others.
Still, since my work is usually open source I might give it a go. Alternatively jQuery is by far the best when ‘starting from scratch’ and Yahoo UI is certainly impressive.
Remember PyCon UK
Just booked my place at PyCon UK (13/14 September, with tutorial and sprint days around). It’s been a good 6 months since I last coded any Python, so I hope I can remember what it’s all about.
This week I’ve been developing some Grails applications. Unfortunately nothing too “bloggable” due to the nature of the work. Tomorrow I’m going to work on a streaming video application in Grails which I’ll hopefully release (as example code).
I must also check out the new Drools plugin which could be an excellent way of implementing business rules.