Thursday, July 05, 2007

Boost.Regex with MFC/ATL in Visual Studio 2003 or Above

Searching, parsing, and validating text input is a big part of everyday programming. Regular expressions are a powerful tool for this, and they are available out of the box in C#, Java, Perl, and other languages. Unfortunately, they do not come with standard C++ or the MFC library. One of the best regular expression libraries for C++ is Boost.Regex, which has been accepted into the next release of the C++ standard library. Boost.Regex provides direct support for MFC/ATL string types, so integration with an MFC/ATL project is straightforward. However, integrating Boost into Visual Studio 6.0 or older is very hard because those compilers lack standard C++ support.

Using Boost.Regex in an MFC/ATL project in Visual Studio 2003 or above is easy. Just follow these steps:
  1. Download the installation package created by Boost Consulting: follow the links Products > Free Downloads to find the installer for the latest Boost distribution. Run the installer, and it will automatically download the selected Boost library source code and pre-compiled binaries from a mirror site;
  2. Set up the VC++ include and library directories for the Boost libraries. Detailed instructions can be found in: Boost--Getting Started on Windows;
  3. If, like me, you are very comfortable using MFC/ATL string types such as CString, CStringW, TCHAR, _T, ..., note that Boost libraries normally support only the standard string types. Please read this before you start writing code: Boost.Regex--Working with MFC/ATL String Types. Boost.Regex does provide a set of overloaded functions that accept MFC/ATL types; to use them, include the header <boost/regex/mfc.hpp>, which defines all the overloaded Boost.Regex APIs shown in that document.
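To make the string-type point in step 3 more concrete, here is a minimal, portable sketch of how TCHAR-dependent regex typedefs work in principle. It deliberately uses std::regex (the standardized descendant of Boost.Regex) instead of Boost and MFC so that it compiles without either. The names tchar_t, TXT, tregex_t, tmatch_t, and IsKeyName are hypothetical stand-ins for illustration only; they are not part of Boost or MFC.

```cpp
#include <regex>

// Hypothetical stand-ins for TCHAR and _T(): the character type is chosen
// at compile time, the same way an ANSI or Unicode MFC build chooses it.
#ifdef UNICODE
typedef wchar_t tchar_t;
#define TXT(s) L##s
#else
typedef char tchar_t;
#define TXT(s) s
#endif

// Similar in spirit to the tregex/tmatch typedefs from <boost/regex/mfc.hpp>:
// one regex type and one match type that follow the build's character type.
typedef std::basic_regex<tchar_t> tregex_t;
typedef std::match_results<const tchar_t*> tmatch_t;

// Returns true if the whole input looks like a plain identifier-style key name.
bool IsKeyName(const tchar_t* text)
{
    static const tregex_t re(TXT("[A-Za-z_][A-Za-z0-9_]*"));
    return std::regex_match(text, re);
}
```

The same source builds for either character width, which is the convenience the MFC/ATL overloads in Boost.Regex are meant to provide for CString-based code.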
Sample code:

    // Boost regular expression library
    #include <boost/regex.hpp>
    // MFC/ATL string type support for Boost.Regex
    #include <boost/regex/mfc.hpp>

    // Extract a string value from XML by the given key.
    // Example: given the string below and key="LogFile"
    //   <add key="LogFile" value="c:\temp\error.txt" />
    // the return value is: c:\temp\error.txt
    // Note: key is inserted into the pattern as-is, so it must not
    // contain regex metacharacters.
    CString GetXMLValue(LPCTSTR xmlContent, LPCTSTR key)
    {
        CString strRegEx;
        // _T() keeps the format string valid in both ANSI and Unicode builds
        strRegEx.Format(_T("<add\\s+key\\s*=\\s*\"%s\"\\s+value\\s*=\\s*\"([^\"]*)\"\\s*/>"), key);
        boost::tregex re(strRegEx);
        boost::tmatch matches;

        // We expect no more than one match in the file
        if (boost::regex_search(xmlContent, matches, re))
        {
            // Extract the value from the first sub-expression
            return CString(matches[1].first, static_cast<int>(matches[1].length()));
        }
        return CString();
    }
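
The sample inserts key into the pattern verbatim, so a key containing regex metacharacters (a dot, for example) would change what the pattern matches. The sketch below shows one possible way to escape the key first. It is written against std::regex and std::string rather than Boost.Regex and CString so that it is self-contained; EscapeForRegex and GetXmlValue are hypothetical names, and the approach is an assumption about how one might harden the sample, not part of the Boost.Regex API.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape the characters that are special in
// the default (ECMAScript) regex grammar, so the text matches literally.
std::string EscapeForRegex(const std::string& text)
{
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    // "$&" stands for the matched character; the leading backslash is literal.
    return std::regex_replace(text, special, R"(\$&)");
}

// Portable counterpart of the sample: returns the value attribute of the
// <add key="..." value="..." /> element with the given key, or an empty
// string if no such element is found.
std::string GetXmlValue(const std::string& xmlContent, const std::string& key)
{
    const std::regex re("<add\\s+key\\s*=\\s*\"" + EscapeForRegex(key) +
                        "\"\\s+value\\s*=\\s*\"([^\"]*)\"\\s*/>");
    std::smatch matches;
    if (std::regex_search(xmlContent, matches, re))
    {
        return matches[1].str();
    }
    return std::string();
}
```

With the escaping in place, a key such as "Log.File" only matches an element whose key is literally Log.File, rather than any key with an arbitrary character in place of the dot.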

Statically linking the Boost.Regex library into the sample code above adds about 80 KB to the compiled binary.
Official documentation of Boost.Regex library:
