Tuesday, February 19, 2013

Solr in ColdFusion 9

Some tips for using ColdFusion 9 Solr search:
  • Lesson Number 1: Make sure you review ALL the files under your collection's \conf folder; many of them contain sample values that need to be commented out or changed from the defaults. Here is an incomplete list of what I changed:
    • Commented out words in protwords.txt
    • Deleted content in spellings.txt
    • Updated synonyms.txt, deleted the test words, and added new synonyms based on the vocabulary specific to my application's domain
    • schema.xml:
      • Updated default query operator from AND to OR
    • solrconfig.xml
      • commented out "solr", "rocks" and other similar configurations
    • Enable term highlighting (see http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSe9cbe5cf462523a0-5bf1c839123792503fa-8000.html), but it's still not quite there. I'm still looking for a complete solution for this one.
  • In ColdFusion Admin, change the Solr buffer limit from the default value of 40 to 80 (http://bloggeraroundthecorner.blogspot.com/2009/08/tuning-coldfusion-solr-part-1.html)
  • Ensure the "ColdFusion 9 Solr Service" is started and its startup type is set to "Automatic"
  • To access a Solr collection from remote ColdFusion servers:
    • For the remote CF servers: change the Solr server name in ColdFusion Admin, under Data & Services > Solr Server > Solr Host Name
    • For the Solr host: allow remote servers to access its Solr instance by configuring Jetty to listen on all incoming IP addresses. ColdFusion's default configuration listens on localhost only, which blocks remote CF servers from reaching the hosted Solr instance. To change this, open jetty.xml in {coldfusion home}\solr\etc and comment out the host line that restricts listening to localhost only. (Interestingly, I found this solution at http://helpx.adobe.com/coldfusion/kb/coldfusion-9-limit-access-solr.html, which claims ColdFusion's default configuration is to listen on all IP addresses.)
  • Web access to Solr admin: http://{Your CF Server Name}:8983/solr/
  • Last and a big one: when you try to update multiple indexes, you will see an error like this:

Stack Trace:
org.apache.solr.common.SolrException: Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later


request: http://localhost:8983/solr/.../update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin&version=1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at coldfusion.tagext.search.SolrUtils.commitToServer(SolrUtils.java:1024)
    at coldfusion.tagext.search.SolrUtils.addDocument(SolrUtils.java:664)
    at coldfusion.tagext.search.IndexTag.doQueryUpdate(IndexTag.java:929)
    at coldfusion.tagext.search.IndexTag.doStartTag(IndexTag.java:254)
    at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2722)

This error shows up consistently in a production environment, even under very light traffic, and became a big show stopper for using the Solr collections provided by ColdFusion (it does not seem to be a Solr problem, but rather a problem with ColdFusion's integration). 

The root cause seems to be that ColdFusion commits on every index update, causing excessive Solr searcher warming. Combined with a poorly tuned default Solr configuration, the problem starts to appear for a collection of only a few hundred entries.
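For reference, the limit named in the error message comes from the maxWarmingSearchers setting in solrconfig.xml, and an autoCommit block in the same file lets Solr batch commits itself instead of opening a new searcher on every update. A sketch of the relevant fragment (the values here are illustrative, not taken from my setup):

```xml
<!-- solrconfig.xml -->

<!-- raise the warming-searcher limit as a band-aid for the error above -->
<maxWarmingSearchers>8</maxWarmingSearchers>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- let Solr decide when to commit instead of committing per update -->
  <autoCommit>
    <maxDocs>100</maxDocs>   <!-- commit after 100 pending documents -->
    <maxTime>30000</maxTime> <!-- or after 30 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>
```

On CF 9 this only softens the symptom, since ColdFusion still issues its own commit on every cfindex call; it becomes genuinely useful once that per-update commit can be suppressed.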

In ColdFusion 10, this problem seems easier to solve with the newly introduced "autoCommit" attribute, which should always be set to "no" (i.e., never use the default value). Then configuring autoCommit in the Solr config file should solve the above problem in ColdFusion 10.
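The CF10 update call would then look something like this (a sketch only; the collection name, query name, and column names are made up, and the other attributes are standard cfindex attributes):

```cfm
<!--- CF10 only: suppress the per-update commit and let the autoCommit
      settings in solrconfig.xml decide when Solr actually commits --->
<cfindex action="update"
         collection="myCollection"
         key="id"
         type="custom"
         query="qryDocs"
         title="title"
         body="body"
         autoCommit="no">
```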

However, a production environment that is already on CF 9 is stuck. My solution for it is a combination attack:
    • Manually throttle cfindex updates by adding a sleep after every update
    • Wrap a try/catch block around the cfindex update, add a longer sleep in the catch block, and add retry logic to attempt the update again after the sleep
    • Keep a database flag for indexed entries, set only after the index is updated successfully. A scheduled CF task scans this flag to identify missed updates (due to the above error) and tries to update the index again
So far, this approach has allowed me to update thousands of index entries without any problems.
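The combination above can be sketched in CFML roughly like this (the collection, datasource, table, and column names are all hypothetical, and the sleep durations are just starting points to tune):

```cfm
<!--- qryPending: entries whose "indexed" flag is still 0 (hypothetical table "documents") --->
<cfquery name="qryPending" datasource="myDSN">
    SELECT id FROM documents WHERE indexed = 0
</cfquery>

<cfloop query="qryPending">
    <!--- re-query one row so each cfindex call updates a single entry --->
    <cfquery name="qryOne" datasource="myDSN">
        SELECT id, title, body FROM documents
        WHERE id = <cfqueryparam value="#qryPending.id#" cfsqltype="cf_sql_integer">
    </cfquery>

    <cfset success = false>
    <cftry>
        <cfindex action="update" collection="myCollection" key="id"
                 type="custom" query="qryOne" title="title" body="body">
        <cfset success = true>
        <cfcatch type="any">
            <!--- back off to let the warming searchers drain, then retry once --->
            <cfset sleep(2000)>
            <cftry>
                <cfindex action="update" collection="myCollection" key="id"
                         type="custom" query="qryOne" title="title" body="body">
                <cfset success = true>
                <cfcatch type="any">
                    <!--- still failing: leave the flag unset so the scheduled task retries later --->
                </cfcatch>
            </cftry>
        </cfcatch>
    </cftry>

    <cfif success>
        <cfquery datasource="myDSN">
            UPDATE documents SET indexed = 1
            WHERE id = <cfqueryparam value="#qryPending.id#" cfsqltype="cf_sql_integer">
        </cfquery>
    </cfif>

    <!--- throttle every iteration so commits don't pile up --->
    <cfset sleep(200)>
</cfloop>
```

The scheduled task is then just the same loop run periodically: since it selects on indexed = 0, any entry that failed both attempts is picked up automatically on the next run.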


Anonymous said...

Thanks for writing this article. I have the same Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later problem. I put in a "sleep" of 500 and managed to get the rest of my files to index successfully. Thank you.

Guogang Hu said...

You are welcome. Great to know this post helped.