Merging XML Documents

08 Dec 2000 09:51

Introduction

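Merging two XML documents is a common operation.  This document works through one example and describes three ways to accomplish it with XSLT.

As an example, let's merge documents A and B.  Document B lists the ids of the books we want, and Document A holds the full record for each book.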

Document A:

<Book id="b1"><Name>The wizard of OZ</Name></Book>
<Book id="b2"><Name>Java Servlet Programming</Name></Book>
<Book id="b3"><Name>John Coltrane Rage</Name></Book>

Document B:

<BookList>
   <Book id="b1"/>
   <Book id="b2"/>
</BookList>

The result should be:

<BookList>
   <Book id="b1"><Name>The wizard of OZ</Name></Book>
   <Book id="b2"><Name>Java Servlet Programming</Name></Book>
</BookList>

To test the examples, you should install XT, the XSLT processor invoked as "xt" in the command lines below.

Solution I - Concatenate the Documents

Create a new XML document that holds both documents, each wrapped in its own element under a single top-level element.

<Doc>
   <Doc1>
       -- insert Document A here --
   </Doc1>
   <Doc2>
       -- insert Document B here --
   </Doc2>
</Doc>

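The problem with this solution is that name collisions are likely (here both documents contain Book elements), so every XSLT pattern has to tell the two apart by their Doc1 or Doc2 wrapper.  The combined document is therefore still awkward to process with XSLT.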
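
A minimal sketch of a style sheet for the concatenated document, assuming Documents A and B were pasted under Doc1 and Doc2 exactly as shown above.  It is an illustration, not part of the original test files.  It declares the XSLT 1.0 Recommendation namespace, which is an assumption; an older XT build may expect the namespace URI used in the other listings in this document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/Doc">
   <BookList>
     <!--walk the list (Document B) under Doc2-->
     <xsl:for-each select="Doc2/BookList/Book">
       <xsl:variable name="id" select="string(@id)"/>
       <!--fetch the matching full record (Document A) under Doc1-->
       <xsl:copy-of select="/Doc/Doc1/Book[@id=$id]"/>
     </xsl:for-each>
   </BookList>
</xsl:template>

</xsl:stylesheet>
```

Note how both paths must name the wrapper element, because a bare Book pattern would match the elements from both documents.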

Solution II - Modify the Style Sheet

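This solution is the best one: everything lives in the style sheet, so no extra file has to be read.  Place one of the documents inside a top-level variable in the style sheet.  Then call document(""), which returns the style sheet itself as a document, and select the Book elements inside that variable.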

T:\ftemp>type doc2.xml
<BookList>
    <Book id="b1"/>
    <Book id="b2"/>
</BookList>

T:\ftemp>type list4.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">

<xsl:output method="xml" indent="yes"/>

<xsl:variable name="first">
   <Book id="b1"><Name>The wizard of OZ</Name></Book>
   <Book id="b2"><Name>Java Servlet Programming</Name></Book>
   <Book id="b3"><Name>John Coltrane Rage</Name></Book>
</xsl:variable>

<xsl:variable name="second">
   <Book id="b1"><Name>An Uninteresting Book</Name></Book>
   <Book id="b2"><Name>Another Uninteresting Book</Name></Book>
   <Book id="b3"><Name>Yet Another Uninteresting Book</Name></Book>
</xsl:variable>

  <!--source of data; default can be overridden on command line-->
<xsl:param name="source" select="'first'"/>

<xsl:template match="/BookList">          <!--document element-->
   <BookList>
     <xsl:for-each select="Book">
       <Book id="{@id}">
         <xsl:variable name="id" select="string(@id)"/>
         <xsl:for-each select='document("")'><!--the stylesheet-->
           <xsl:copy-of select="//xsl:variable[@name=$source]
                                 /Book[@id=$id]
                                 /*"/>
         </xsl:for-each>
       </Book>
     </xsl:for-each>
   </BookList>
</xsl:template>

</xsl:stylesheet>

T:\ftemp>xt doc2.xml list4.xsl result1.xml

T:\ftemp>type result1.xml
<BookList>
<Book id="b1">
<Name xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">The wizard of OZ</Name>
</Book>
<Book id="b2">
<Name xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">Java Servlet
Programming</Name>
</Book>
</BookList>

T:\ftemp>xt doc2.xml list4.xsl result2.xml source=second

T:\ftemp>type result2.xml
<BookList>
<Book id="b1">
<Name xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">An Uninteresting
Book</Name>
</Book>
<Book id="b2">
<Name xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">Another Uninteresting
Book</Name>
</Book>
</BookList>
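
The xmlns:xsl declaration on each Name element in the results appears because xsl:copy-of copies an element together with its namespace nodes, and every element inside the style sheet has the xsl namespace in scope.  If that declaration is unwanted, one option is to rebuild the elements rather than copy them.  The following is a sketch only: the mode name "rebuild" is made up for this illustration, the data is cut down to two books, it handles only elements that are in no namespace, and it declares the XSLT 1.0 Recommendation namespace rather than the URI used in list4.xsl.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:variable name="first">
   <Book id="b1"><Name>The wizard of OZ</Name></Book>
   <Book id="b2"><Name>Java Servlet Programming</Name></Book>
</xsl:variable>

<xsl:template match="/BookList">
   <BookList>
     <xsl:for-each select="Book">
       <Book id="{@id}">
         <xsl:variable name="id" select="string(@id)"/>
         <xsl:for-each select='document("")'><!--the stylesheet-->
           <!--rebuild the children instead of copying them-->
           <xsl:apply-templates mode="rebuild"
                select="//xsl:variable[@name='first']/Book[@id=$id]/*"/>
         </xsl:for-each>
       </Book>
     </xsl:for-each>
   </BookList>
</xsl:template>

<!--create a fresh element of the same name; namespace nodes
    of the original element are not carried over-->
<xsl:template match="*" mode="rebuild">
   <xsl:element name="{local-name()}">
     <xsl:copy-of select="@*"/>
     <xsl:apply-templates mode="rebuild"/>
   </xsl:element>
</xsl:template>

</xsl:stylesheet>
```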

Solution III - Two Physical Files

The document() function in XSLT allows two documents to be merged while each stays in its own file.  The drawback of this solution is the extra I/O required to retrieve the second document.

To try it, extract the files in test.zip, run go.bat, and inspect the file result.xml.

Note:

You needn't use <xsl:param> to store the name of the external document; you could just use:

           <xsl:for-each select="document('doc1.xml')">

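But that wouldn't be very flexible.  The stylesheet below instead takes the file name from the "source" parameter, which is supplied on the command line (source=doc1.xml).  As written, the parameter defaults to an empty string; changing its select to "'doc1.xml'" would make doc1.xml the default while still letting you override it on the command line.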

T:\ftemp>type doc1.xml
<?xml version="1.0"?>
<!DOCTYPE BookSet [
<!ATTLIST Book id ID #IMPLIED>
]>
<BookSet>
   <Book id="b1"><Name>The wizard of OZ</Name></Book>
   <Book id="b2"><Name>Java Servlet Programming</Name></Book>
   <Book id="b3"><Name>John Coltrane Rage</Name></Book>
</BookSet>

T:\ftemp>type doc2.xml
<BookList>
    <Book id="b1"/>
    <Book id="b2"/>
</BookList>

T:\ftemp>type list.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">

<xsl:output method="xml" indent="yes"/>

<xsl:param name="source" select="''"/>    <!--source of data-->

<xsl:template match="/BookList">        <!--document element-->
   <BookList>
     <xsl:for-each select="Book">
       <Book id="{@id}">
         <xsl:variable name="id" select="string(@id)"/>
            <!--note you cannot use document($source)/id($id)-->
         <xsl:for-each select="document($source)">
           <xsl:copy-of select="id($id)/*"/>
         </xsl:for-each>
       </Book>
     </xsl:for-each>
   </BookList>
</xsl:template>

</xsl:stylesheet>

T:\ftemp>xt doc2.xml list.xsl result.xml source=doc1.xml

T:\ftemp>type result.xml
<BookList>
<Book id="b1">
<Name>The wizard of OZ</Name>
</Book>
<Book id="b2">
<Name>Java Servlet Programming</Name>
</Book>
</BookList>
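
The listing above depends on the DTD in doc1.xml declaring the id attribute as type ID, since that declaration is what lets id($id) find a Book.  When the external document has no such DTD, a key can serve the same purpose.  The following is a sketch under stated assumptions: the key name "book-by-id" is made up for this illustration, the processor must implement xsl:key (worth checking for the XT version in use), and the style sheet declares the XSLT 1.0 Recommendation namespace rather than the URI used in list.xsl.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!--source of data; default can be overridden on command line-->
<xsl:param name="source" select="'doc1.xml'"/>

<!--index Book elements by their id attribute; no DTD required-->
<xsl:key name="book-by-id" match="Book" use="@id"/>

<xsl:template match="/BookList">        <!--document element-->
   <BookList>
     <xsl:for-each select="Book">
       <Book id="{@id}">
         <xsl:variable name="id" select="string(@id)"/>
         <!--key() searches the document that contains the context
             node, so switch to the external document first-->
         <xsl:for-each select="document($source)">
           <xsl:copy-of select="key('book-by-id', $id)/*"/>
         </xsl:for-each>
       </Book>
     </xsl:for-each>
   </BookList>
</xsl:template>

</xsl:stylesheet>
```

As with id(), the inner xsl:for-each is there only to change the context node to the external document before the lookup runs.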