Compass (from Hibernate Search)
Since I’ve been exploring JPA, TopLink, Hibernate and Hibernate Search over the last couple of days, I thought I’d move on and check out Compass. I’ve used Compass previously through Grails but never as a standalone library directly from Java. With the Grails Searchable plugin, Compass is completely transparent to you.
One of my motivations for trying Compass was that I found Hibernate Search to be a little slow. Indexing was fine, but bringing back the results was quite painful. I wondered if Compass was faster.
Out of the box Compass was pretty easy to get going (though I seem to have problems creating/dropping/updating the database when the Compass session listener is enabled in persistence.xml). I moved back to TopLink for no other reason than to use the JPA reference implementation rather than Hibernate (which I know Compass works with because of Grails). The Compass documentation is pretty complete, which means that once you’ve read it you are good to go. My simple app didn’t require cascades, inheritance, etc. but it’s all in there and available to use.
One of the differences between Compass and Hibernate Search is that when HS returns query results, they are objects loaded by Hibernate itself. With Compass you get Compass’ own object back, which might not have all the fields of the JPA version. If you want the JPA version you have to go and fetch it yourself (using the primary key shared between Compass and JPA).
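To make the two-step pattern concrete, here is a minimal plain-Java sketch. It does not use the Compass or JPA APIs at all: `TwoStepSearchSketch`, `Document`, `Hit`, `save`, `search` and `load` are all hypothetical names, with a map standing in for the database and another for the index. The point is only that a search returns lightweight hits, and the full record is fetched by primary key when you actually need it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins only: nothing here is the Compass or JPA API.
final class TwoStepSearchSketch {

    /** Full record as the database holds it, including the large text body. */
    static final class Document {
        final long id;
        final String title;
        final String content;

        Document(long id, String title, String content) {
            this.id = id;
            this.title = title;
            this.content = content;
        }
    }

    /** Lightweight search hit: only the fields kept in the index. */
    static final class Hit {
        final long id;
        final String title;

        Hit(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    private final Map<Long, Document> database = new HashMap<>();      // stands in for JPA
    private final Map<Long, Hit> storedFields = new LinkedHashMap<>(); // stands in for the index
    private final Map<Long, Set<String>> terms = new HashMap<>();      // searchable terms per id
    int databaseReads = 0;

    /** Persist the document and index it under the same primary key. */
    void save(Document doc) {
        database.put(doc.id, doc);
        storedFields.put(doc.id, new Hit(doc.id, doc.title));
        terms.put(doc.id, new HashSet<>(Arrays.asList(doc.content.toLowerCase().split("\\W+"))));
    }

    /** Step 1: search the index only; the database is never touched. */
    List<Hit> search(String term) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<Long, Hit> entry : storedFields.entrySet()) {
            if (terms.get(entry.getKey()).contains(term.toLowerCase())) {
                hits.add(entry.getValue());
            }
        }
        return hits;
    }

    /** Step 2: fetch the full record by the shared primary key, only if needed. */
    Document load(Hit hit) {
        databaseReads++;
        return database.get(hit.id);
    }

    public static void main(String[] args) {
        TwoStepSearchSketch store = new TwoStepSearchSketch();
        store.save(new Document(1L, "Intro", "Compass wraps Lucene"));
        for (Hit hit : store.search("lucene")) {
            System.out.println(hit.id + " " + hit.title + " -> " + store.load(hit).content);
        }
    }
}
```

If the title is all you need for a results page, you stop after step 1 and the database read count stays at zero.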
On the surface this might seem like Hibernate has an advantage, and to an extent it does. However, the reason HS is “slow” is that it needs to pull all that text from the database and into entity beans when you get the results list (you could use Projections to do something about that). For my data set of 20000 docs that’s 75mb of text, or perhaps more relevantly 7mb to process for every 1000 documents’ worth of search results.
Contrast that with Compass, which is effectively maintaining two data repositories - one in JPA and one in the Compass/Lucene index. When you get results from a Compass search you aren’t hitting the database, so you only get back the fields you annotated as @Searchable*.
So comparing on a level playing field (i.e. getting the JPA persisted object for each result), Compass and Hibernate Search seemed to perform at around the same speed, though Compass was maybe a little faster (10%). But if you just want to do a search through Compass and not grab the original JPA object (say because all the fields you need are stored in the Lucene index), then it takes about 400ms (vs 19 seconds)!
I was hoping to get the best of both worlds by lazy loading the large string field of the document class. However that didn’t work (with TopLink at least, others not tried). The alternative is to put the content in another dedicated class, use lazy fetching for JPA and SearchableComponent / SearchableCascade in Compass. That way you would search on the text but return the owning class (e.g. the metadata of a document) rather than the content. Speed would improve at the cost of complexity. I’ve tried implementing this without success in the last few minutes; TopLink seems to be eager fetching the content class despite my lazy request!
One of the failings of Lucene is that you can’t update a document in place: you have to delete it from the index and then add it again (and hence re-index it). Thus you have to keep the original content around if you have entries that change. Sadly that means you either store it in the database (and take advantage of HS or Compass) or you leave it on the file system and write your own code to re-read it as necessary.
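Here is a toy illustration of why an update needs the full original content. This is not Lucene code: `ToyInvertedIndex` and its methods are hypothetical, and the index is just a map from term to document ids. In real Lucene, `IndexWriter.updateDocument` behaves along the same lines, deleting the matching document and then adding the new one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for an inverted index; not the Lucene API.
final class ToyInvertedIndex {

    // term -> ids of the documents that contain it
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Tokenise the content and record the id under every term. */
    void add(long id, String content) {
        for (String term : content.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(id);
            }
        }
    }

    /** Remove the id from every posting list and drop lists that become empty. */
    void delete(long id) {
        postings.values().forEach(ids -> ids.remove(id));
        postings.values().removeIf(Set::isEmpty);
    }

    /**
     * There is no in-place edit: an update is a delete followed by a full
     * re-add, so the caller must supply the complete new content.
     */
    void update(long id, String fullNewContent) {
        delete(id);
        add(id, fullNewContent);
    }

    /** Ids of the documents containing the term, in ascending order. */
    Set<Long> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(), Collections.<Long>emptySet()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1L, "old draft text");
        index.update(1L, "new final text");
        System.out.println(index.search("old"));
        System.out.println(index.search("final"));
    }
}
```

Notice that `update` takes the whole new text, not a diff. That is exactly why the source content has to live somewhere you can read it back from.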