Tuesday, February 19, 2013

Solr in ColdFusion 9

Some tips for using ColdFusion 9 Solr search:
  • Lesson Number 1: Make sure you review ALL the files under your collection's \conf folder; many of them contain sample values that need to be commented out or changed from the defaults. Here is an incomplete list of what I changed:
    • Commented out words in protwords.txt
    • Deleted content in spellings.txt
    • Updated synonyms.txt, deleted the test words, and added new synonyms based on the vocabulary specific to my application's domain
    • schema.xml:
      • Updated default query operator from AND to OR
    • solrconfig.xml
      • commented out "solr", "rocks" and other similar configurations
    • Enable term highlighting (see http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSe9cbe5cf462523a0-5bf1c839123792503fa-8000.html), but it's still not quite there. I'm still looking for a complete solution for this one.
  • In ColdFusion Admin, change the Solr buffer limit from the default value of 40 to 80 (http://bloggeraroundthecorner.blogspot.com/2009/08/tuning-coldfusion-solr-part-1.html)
  • Ensure the "ColdFusion 9 Solr Service" is started and its startup type is set to "Automatic"
  • To access a Solr collection from remote ColdFusion servers:
    • For the remote CF servers: change the Solr server name in ColdFusion Admin, under Data & Services > Solr Server > Solr Host Name
    • For the Solr host: allow remote servers to access its Solr instance by configuring Jetty to listen on all incoming IP addresses. ColdFusion's default configuration listens on localhost only, which blocks remote CF servers from reaching the hosted Solr instance. To change this, open jetty.xml in {coldfusion home}\solr\etc and comment out the host line that restricts listening to localhost only. (Interestingly, I found this solution at http://helpx.adobe.com/coldfusion/kb/coldfusion-9-limit-access-solr.html, which claims ColdFusion's default configuration is to listen on all IP addresses.)
  • Web access to Solr admin: http://{Your CF Server Name}:8983/solr/
  • Last and a big one: when you try to update multiple indexes, you will see an error like this:

Stack Trace:
org.apache.solr.common.SolrException: Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later


request: http://localhost:8983/solr/.../update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin&version=1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at coldfusion.tagext.search.SolrUtils.commitToServer(SolrUtils.java:1024)
    at coldfusion.tagext.search.SolrUtils.addDocument(SolrUtils.java:664)
    at coldfusion.tagext.search.IndexTag.doQueryUpdate(IndexTag.java:929)
    at coldfusion.tagext.search.IndexTag.doStartTag(IndexTag.java:254)
    at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2722)

This error shows up consistently in a production environment, even under very light traffic, and became a big show stopper for using the Solr collections provided by ColdFusion (it does not seem to be a Solr problem, but rather a problem with ColdFusion's integration). 

The root cause seems to be that ColdFusion commits on every index update, causing excessive Solr searcher warming. Combined with a poorly tuned default Solr configuration, the problem starts to appear for a collection of only a few hundred entries.
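For reference, the limit named in the error message comes from the maxWarmingSearchers setting in solrconfig.xml, and an autoCommit block in the same file lets Solr batch commits itself instead of opening a new searcher on every update. A sketch of the relevant fragment (the values here are illustrative, not taken from my setup):

```xml
<!-- solrconfig.xml -->

<!-- raise the warming-searcher limit as a band-aid for the error above -->
<maxWarmingSearchers>8</maxWarmingSearchers>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- let Solr decide when to commit instead of committing per update -->
  <autoCommit>
    <maxDocs>100</maxDocs>   <!-- commit after 100 pending documents -->
    <maxTime>30000</maxTime> <!-- or after 30 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>
```

On CF 9 this only softens the symptom, since ColdFusion still issues its own commit on every cfindex call; it becomes genuinely useful once that per-update commit can be suppressed.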

In ColdFusion 10, this problem seems easier to solve with the newly introduced "autoCommit" attribute, which should always be set to "no" (i.e., never use the default value). Then configuring autoCommit in the Solr config file should solve the above problem in ColdFusion 10.
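The CF10 update call would then look something like this (a sketch only; the collection name, query name, and column names are made up, and the other attributes are standard cfindex attributes):

```cfm
<!--- CF10 only: suppress the per-update commit and let the autoCommit
      settings in solrconfig.xml decide when Solr actually commits --->
<cfindex action="update"
         collection="myCollection"
         key="id"
         type="custom"
         query="qryDocs"
         title="title"
         body="body"
         autoCommit="no">
```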

However, a production environment that is already on CF 9 is stuck. My solution for it is a combination attack:
    • Manually throttle cfindex updates by adding a sleep after every update
    • Wrap a try/catch block around the cfindex update, add a longer sleep in the catch block, and add retry logic to attempt the update again after the sleep
    • Keep a database flag for indexed entries, set only after the index is updated successfully. A scheduled CF task scans this flag to identify missed updates (due to the above error) and tries to update the index again
So far, this approach has allowed me to update thousands of index entries without any problems.
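The combination above can be sketched in CFML roughly like this (the collection, datasource, table, and column names are all hypothetical, and the sleep durations are just starting points to tune):

```cfm
<!--- qryPending: entries whose "indexed" flag is still 0 (hypothetical table "documents") --->
<cfquery name="qryPending" datasource="myDSN">
    SELECT id FROM documents WHERE indexed = 0
</cfquery>

<cfloop query="qryPending">
    <!--- re-query one row so each cfindex call updates a single entry --->
    <cfquery name="qryOne" datasource="myDSN">
        SELECT id, title, body FROM documents
        WHERE id = <cfqueryparam value="#qryPending.id#" cfsqltype="cf_sql_integer">
    </cfquery>

    <cfset success = false>
    <cftry>
        <cfindex action="update" collection="myCollection" key="id"
                 type="custom" query="qryOne" title="title" body="body">
        <cfset success = true>
        <cfcatch type="any">
            <!--- back off to let the warming searchers drain, then retry once --->
            <cfset sleep(2000)>
            <cftry>
                <cfindex action="update" collection="myCollection" key="id"
                         type="custom" query="qryOne" title="title" body="body">
                <cfset success = true>
                <cfcatch type="any">
                    <!--- still failing: leave the flag unset so the scheduled task retries later --->
                </cfcatch>
            </cftry>
        </cfcatch>
    </cftry>

    <cfif success>
        <cfquery datasource="myDSN">
            UPDATE documents SET indexed = 1
            WHERE id = <cfqueryparam value="#qryPending.id#" cfsqltype="cf_sql_integer">
        </cfquery>
    </cfif>

    <!--- throttle every iteration so commits don't pile up --->
    <cfset sleep(200)>
</cfloop>
```

The scheduled task is then just the same loop run periodically: since it selects on indexed = 0, any entry that failed both attempts is picked up automatically on the next run.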


Anonymous said...

Thanks for writing this article. I have the same Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later problem. I put in a "sleep" of 500 and managed to get the rest of my files to index successfully. Thank you.

Guogang Hu said...

You are welcome. Great to know this post helped.