Wednesday, February 25, 2015

ColdFusion 10 Solr Indexing a Zip File that Contains PDF Files

It seems I should be surprised if I don't find a surprise in ColdFusion every week. :) Here is another one that took me a few hours to solve. Hopefully this post will save someone else a few hours.

ColdFusion 10 Update 15
Windows Server 2012

While indexing a batch of files, ColdFusion stopped without throwing an exception or entering any error state. It simply stopped in the middle of indexing. If I had not been watching closely, I would not have noticed that it had failed.

Again, ColdFusion halting execution without making a fuss was a big surprise to me. When I ran it in the browser, there was none of the usual 500 server error. Everything was just hunky-dory as far as ColdFusion was concerned!?

After some digging, I found out the following:
  • It stopped on a zip file
  • The zip file has some PDF files in it
  • Coldfusion-error.log has the following message
Feb 25, 2015 8:30:28 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [CfmServlet] in context with path [/] threw exception [ROOT CAUSE: 
java.lang.NoClassDefFoundError: org/apache/pdfbox/pdmodel/PDDocument
 at org.apache.tika.parser.pdf.PDFParser.parse(
 at org.apache.tika.parser.CompositeParser.parse(
 at org.apache.tika.parser.AutoDetectParser.parse(
 at org.apache.tika.parser.DelegatingParser.parse(
 at org.apache.tika.parser.pkg.PackageParser.parseArchive(
 at org.apache.tika.parser.pkg.ZipParser.parse(
 at org.apache.tika.parser.CompositeParser.parse(
 at org.apache.tika.parser.AutoDetectParser.parse(

So, obviously, our ColdFusion distribution is missing some libraries.

Short answer: find the jar file for PDFBox, drop it into the ColdFusion lib folder, and restart ColdFusion. I got the jar file from here: pdfbox-0.8.0-incubating.jar
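To confirm whether the class named in the stack trace is actually visible on a given classpath, a small Java check can help. This is only a sketch: the class name ClasspathCheck and the helper isClassAvailable are my own names for illustration, and running it standalone tests the classpath you launch it with, not ColdFusion's. You would need to put the jars from the ColdFusion lib folder on that classpath yourself.

```java
// Minimal sketch: report whether a class can be loaded from the current classpath.
// ClasspathCheck and isClassAvailable are hypothetical names, not part of ColdFusion or Tika.
final class ClasspathCheck {

    // Returns true if the named class can be located and loaded, false otherwise.
    static boolean isClassAvailable(String className) {
        try {
            // initialize = false: locate and load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, the error seen in the log above.
            return false;
        }
    }

    public static void main(String[] args) {
        // The class that Tika's PDFParser failed to find in the stack trace.
        String pdfBoxClass = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(pdfBoxClass + " available: " + isClassAvailable(pdfBoxClass));
    }
}
```

If this reports false before the jar is added and true afterwards, the jar at least supplies the missing class. It does not prove that the jar's version is compatible with the bundled Tika.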

Long answer: as with many open source projects, there is not much consideration for backward compatibility or for an officially supported bundled distribution. I tried the latest version of PDFBox, and it simply did not work. So I needed to find the originally bundled version, and here is the journey (without the detours I took :( ) to the right jar file:


  • Google "pdfbox 0.8.0 incubating"
  • voila

Another Challenge (unsolved)
There is still an unsolved challenge with Solr. For example, Verity can index our PDF files correctly, but Solr's PDF reader seems to be sub-par. It extracts only fragmented text from our PDF files and misses a lot of their keywords.

Monday, February 23, 2015

CFScript bug?

An extra semicolon at the end of an "if" block has caused me a lot of head scratching recently. Please see the code snippet below. The expected output is:
Figure 1. Expected Output
However, below is the actual output:
Figure 2. Actual Output
Now, please notice the extra ";" after the closing brace of the inner "if" block (marked with the comment). If I remove it, everything works as expected.
    private struct function test() {
        if(1 == 1) {
            if(1 == 0) {
            }; //<- look at here
            return {data = 1};
        }
        return {data = 2};
    }
Due to the lack of a specification for the CFScript language, I cannot tell whether this syntax is even allowed. But I can tell you this: tracking down the root cause of this problem was not fun at all!
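For comparison only, here is the same shape in Java, the platform ColdFusion runs on. In Java a stray ";" after a block is an empty statement and is legal. This says nothing about what the CFScript grammar permits; it only shows the behavior I expected. The class name StraySemicolon is made up for this sketch, and plain integers stand in for the structs.

```java
// Comparison sketch in Java (not CFScript): a stray semicolon after a block
// is an empty statement here and does not change control flow.
final class StraySemicolon {

    static int test() {
        if (1 == 1) {
            if (1 == 0) {
            }; // stray semicolon: parsed as a separate empty statement
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        System.out.println(test()); // prints 1
    }
}
```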


CFScript falls through the "if-else" statement and does not return to the caller from the expected branch.


An extra semicolon at the end of a block is the root cause.