Thursday, April 19, 2012

java XSLT processing with Saxon

Recently, I was looking for an efficient way to reduce the size of XML based clinical documents (CCD) that I am sending from a java REST API to mobile devices. These healthcare HL7 XML documents contain textual narrative blocks that are not used by the applications residing on the mobile devices.

On solution was to use Saxon, an open source XSLT processor in conjunction with a transformation style sheet to strip the Continuity of Care Documents of their narrative blocks.

In a CCD, each section under ClinicalDocument/component/structuredBody/component can have a text section that usually contains a narrative block similar to HTML content that can typically add 30% or more to the size of the CCD document:

This has the advantage to avoid creating a full DOM representation, and is more robust and flexible than an event based sequential SAX processor.

The first step is to create a XSLT stylesheet that will transform (in my case strip the text sections) from the XML document.
In my case, since I wanted to remove all text sections, I was able to use an identity transform mechanism that I modified to strip the text sections (uses the hl7 namespace):

<xsl:stylesheet xmlns:xsl='' version='2.0'

<xsl:output indent="no"/>
<xsl:strip-space elements="*"/>

<!-- identity template -->
<xsl:template match="@*|node()">
<xsl:apply-templates select="@*|node()"/>

<!--  remove descriptive sections -->
<xsl:template match="hl7:text"/>  


Then I placed this file (I named it ccd_no_text_sections.xsl)  under the /src/main/resources folder.

To use this XSLT stylesheet with Saxon, you need to use one of the Saxon Java libaries.
If you using maven, for example, just add the dependencies to your pom file based on the version you want to use:


or for the latest free version:


Then you need to create a transformer and call your style sheet for the transformation.
In my case, since I was planning other transformations, I decided to cash the XSLT transformer into a HashTable as follow:

import java.util.HashMap;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

public class XSLTProcessor {

 private static final HashMap<String,Transformer> TRANSFORMER_CACHE = new HashMap<String,Transformer>();

 final static String CCD_NO_TEXT_SECTIONS = "/ccd_no_text_sections.xsl";
 private XSLTProcessor() {
   // no instantiation

 private static Transformer getCCDTransformer() throws TransformerConfigurationException {

  Transformer transformer = TRANSFORMER_CACHE.get(CCD_NO_TEXT_SECTIONS);
  if (transformer == null) { 
   TransformerFactory tsf = TransformerFactory.newInstance();
         InputStream is = XSLTProcessor.class.getResourceAsStream(CCD_NO_TEXT_SECTIONS);
   transformer = tsf.newTransformer(new StreamSource(is));
   return transformer;
  return transformer;
 public static String stripTextSections(final String xmlString) throws TransformerConfigurationException,
        TransformerFactoryConfigurationError {

        final StringReader xmlReader = new StringReader(xmlString);
        final StringWriter xmlWriter = new StringWriter();
        final Transformer ccdTransformer = getCCDTransformer();
        ccdTransformer.transform(new StreamSource(xmlReader),
            new StreamResult(xmlWriter));
        return xmlWriter.toString();

In this way, you can easily add new transformers if needed and refer to them by their stylesheet name located under the resource folder.
In addition to this, the transformer is created only once and always available for a transformation.

The transformation can be called directly on the document as follow:

try {
    String new_ccd = XSLTProcessor.stripTextSections(ccd);
catch (TransformerConfigurationException e) {
catch (TransformerException e) {
catch (TransformerFactoryConfigurationError e) {

No comments: