Friday, July 31, 2015

Monday, July 06, 2015

JPA Join Fetch

A couple of key points to have a successful join fetch in JPA (backed by Hibernate):
  • Why do we want join fetch? JPA's default fetch strategy for collections is lazy, which performs better under most circumstances. But sometimes we do want to retrieve all children (and sometimes the children's children). In that case, join fetch lets us get everything in one database round trip instead of N (or N*N) round trips. This change can easily make those queries 100 times faster (in my case a query went from 20 seconds to 0.2 seconds).
  • Make sure the child collections in your entity classes are declared as "Set", not "List". If you get the error message "Hibernate cannot simultaneously fetch multiple bags", this is the root cause.
  • How to join multiple levels: 
        root
          .fetch([childAttributeName], JoinType.LEFT)
          .fetch([grandChildAttributeName], JoinType.LEFT);
  • A fetch join returns duplicate rows for the parent entity, which is usually undesirable. Wrapping the result list in a LinkedHashSet gives the unique parent entities in the original select order. (See this Stack Overflow post)
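
The de-duplication step in the last point can be sketched in plain Java. This is a minimal illustration, not code from the original project: the class name `DistinctParents`, the helper `distinctInOrder`, and the string values standing in for parent entities are all hypothetical. With real JPA entities the result depends on the entity's equals()/hashCode() (or on object identity if they are not overridden).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: collapses repeated parents while keeping first-seen order.
final class DistinctParents {

    // LinkedHashSet drops duplicates and remembers insertion order,
    // so the first occurrence of each element decides its position.
    static <T> List<T> distinctInOrder(List<T> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Strings stand in for the parent entities returned by a fetch join:
        // one repeated entry per joined child row.
        List<String> rows = Arrays.asList("parentB", "parentB", "parentA", "parentB", "parentC");
        System.out.println(distinctInOrder(rows)); // prints [parentB, parentA, parentC]
    }
}
```

A plain HashSet would also remove the duplicates, but it would not preserve the order produced by the query's ORDER BY clause.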

Monday, March 16, 2015

Troubleshoot Networking Problems in Google Chrome Browser

* Developer Tools > Network

* chrome://net-internals/
Socket view: chrome://net-internals/#sockets
* Wireshark (hopefully you do not need to go that far)

Wednesday, February 25, 2015

Coldfusion 10 Solr Indexing Zip File that Contains PDF files

It seems I should be surprised if I don't find some surprise in Coldfusion every week. :) Here is another one that took me a few hours to solve, and hopefully this post will save someone else a few hours.

Environment
Coldfusion 10 Update 15
Windows Server 2012

Symptom
When indexing a bunch of files, Coldfusion stopped indexing without throwing any exception or entering any error state. It just stopped in the middle of indexing. If I had not been watching closely, I would not have noticed that it had failed.

Again, Coldfusion stopping execution without making a fuss was a big surprise to me. If I run it in a browser, there is no usual 500 server error. Everything is just hunky-dory as far as Coldfusion is concerned!?

Cause
After some digging, I found out the following:
  • It stopped on a zip file
  • The zip file has some PDF files in it
  • Coldfusion-error.log has the following message:
Feb 25, 2015 8:30:28 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [CfmServlet] in context with path [/] threw exception [ROOT CAUSE: 
java.lang.NoClassDefFoundError: org/apache/pdfbox/pdmodel/PDDocument
 at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
 at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:52)
 at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:78)
 at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
 at coldfusion.tagext.search.SolrUtils.getMetadata(SolrUtils.java:599)
 at coldfusion.tagext.search.SolrUtils.getSolrDocument(SolrUtils.java:753)
 at coldfusion.tagext.search.SolrUtils.addDocument(SolrUtils.java:1339)
 at coldfusion.tagext.search.IndexTag.doUpdate(IndexTag.java:651)
 at coldfusion.tagext.search.IndexTag.doStartTag(IndexTag.java:340)

So, obviously, our Coldfusion distribution is missing some libraries.
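
One way to confirm this kind of diagnosis is to ask the JVM whether the class named in the stack trace can be loaded at all. The sketch below is an illustration only: the class `ClasspathCheck` and the method `isLoadable` are hypothetical names, and the check only reflects the classpath of the JVM that runs it, so it would need to run with the same classpath as Coldfusion to say anything about the server.

```java
// Hypothetical diagnostic: reports whether a class is visible to this JVM's class loader.
final class ClasspathCheck {

    // Returns true if the named class can be found, false otherwise.
    // The class is looked up without running its static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so a missing dependency is covered too.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running the same check again after adding a jar is a quick way to see whether the class loader actually picked it up.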

Solution
Short answer: find the jar file for PDFBox, drop it into the Coldfusion lib folder, and restart Coldfusion. I got the jar file from here: pdfbox-0.8.0-incubating.jar

Long answer: as with many open source projects, there is not much consideration of backward compatibility, nor an officially supported bundled distribution. I tried downloading the latest version of PDFBox, and it just did not work. So I needed to find the originally bundled version, and here is the journey (without the detours I took :( ) to the right jar file:

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>0.8.0-incubating</version>
</dependency>

  • Google "pdfbox 0.8.0 incubating"
  • voila


Another Challenge (unsolved)
There is still an unsolved challenge with Solr. For example, Verity can index our PDF files correctly, but Solr's PDF reader seems to be sub-par. It extracted only fragmented text from our PDF files, and it is missing a lot of their keywords.

Monday, February 23, 2015

CFScript bug?

An extra semicolon at the end of an "if" block has caused me a lot of head scratching recently. Please see the code snippet below. The expected output should be:
Figure 1. Expected Output
However, below is the actual output:
Figure 2. Actual Output
Now, please notice the extra ";" at the end of line 6. If I remove it, everything works as expected.
<cfscript>
    private struct function test() {
        if(1 == 1) {
            if(1 == 0) {
                writeOutput("1==0");
            }; //<- look at here
            writeOutput("true");
            return {data = 1};
        }
        writeOutput("false");
        return {data = 2};
    }
    writeDump(test());
</cfscript>
Because there is no specification for the CFScript language, I cannot tell whether this grammar is even allowed. But I can tell you this: the journey of discovering the root cause of this problem was not fun at all!

Symptom

CFScript falls through the "if-else" statement and does not return to the caller from the expected branch.

Cause

An extra semicolon at the end of a block is the root cause.
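
For comparison, here is a rough Java port of the same function. It is only a sketch to show how another C-style language treats the construct: the class name `StraySemicolon` is hypothetical, a StringBuilder stands in for writeOutput, and a Map stands in for the CFScript struct. In Java a lone ";" after a block is an empty statement, so the stray semicolon changes nothing and the method returns from the first branch.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical Java port of the CFScript test() function above.
final class StraySemicolon {

    static Map<String, Integer> test(StringBuilder out) {
        if (1 == 1) {
            if (1 == 0) {
                out.append("1==0");
            }; // stray semicolon: parsed as an empty statement, has no effect
            out.append("true");
            return Collections.singletonMap("data", 1);
        }
        out.append("false");
        return Collections.singletonMap("data", 2);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Map<String, Integer> result = test(out);
        System.out.println(out + " " + result); // prints true {data=1}
    }
}
```

This says nothing about what CFScript is supposed to do, but it is the behavior one would expect from reading the code.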