Populating Word 2007 Templates Through Open XML

Posted on December 23, 2009

I recently had a client at work interested in populating contracts from the information stored in their task tracking tool.  Today this is a manual process where the user opens up a Microsoft Word template and retypes the data points stored in their primary application.

I first looked at a few commercial options, and then got some recommendations from Microsoft to look deeper into the Open XML SDK and leverage the native XML formats of the Office 2007 document types.  I found a few articles and blog posts that explained some of the steps, but couldn’t find a single source covering the whole end-to-end process.  So, I figured that I’d demonstrate the prototype solution that I built.

First, we need a Word 2007 document.  Because I’m not feeling particularly frisky today, I will fill this document with random text using the ever-useful “=rand(9)” command to make Word put 9 random paragraphs into my document.

Next, I switch to the Developer tab to find the Content Controls I want to inject into my document.  Don’t see the Developer tab?  In Word 2007, enable it from the Office Button under Word Options > Popular > “Show Developer tab in the Ribbon.”

Now I’m going to sprinkle a few Text Content Controls throughout my document.  The text of each control should indicate the type of content that goes there.  For each content control on the page, select it and choose the Properties button on the ribbon so that you can provide the control with a friendly name. 

At this point, I have four Content Controls in my document and each has a friendly title.  Now we can save and close the document. As you probably know by now, the Office 2007 document formats are really just zip files.  If you change the extension of our just-saved Word doc from .docx to .zip, you can see the fun inside.
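
If you would rather not rename the file, the same peek can be done in code.  The following is a minimal sketch, not part of the original prototype: it assumes .NET 4.5 or later (where the System.IO.Compression.ZipArchive class is available), and the DocxInspector class name and the default file name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class DocxInspector
{
    // Returns the full name of every entry in the zip package,
    // for example "word/document.xml".
    public static List<string> ListEntries(Stream package)
    {
        var names = new List<string>();
        // leaveOpen: true so the caller keeps ownership of the stream
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }

    static void Main(string[] args)
    {
        // Hypothetical default; pass the path to your own .docx as the first argument.
        string path = args.Length > 0 ? args[0] : "ContractSample.docx";
        using (FileStream fs = File.OpenRead(path))
        {
            foreach (string name in ListEntries(fs))
            {
                Console.WriteLine(name);
            }
        }
    }
}
```

Running it against a Word document should print the same parts you see when browsing the renamed .zip file.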

I looked at a few options for manipulating the underlying XML content and finally settled on the easiest way to update my Content Controls with data from outside.  First, download the Word 2007 Content Control Toolkit from CodePlex.  Then install and launch the application.  After browsing to our Word document, we see our friendly-named Content Controls in the list.

You’ll notice that the XPath column is empty.  What we need to do next is define a Custom XML Part for this Word document, and tie the individual XML nodes to each Content Control.  On the right hand side of the Word 2007 Content Control Toolkit you’ll see a window that tells us that there are currently no custom XML parts in the document.

The astute among you may now guess that I will click the “Click here to create a new one” link.  I have smart readers.  After choosing to create a new part, I switched to the Edit view so that I could easily hand craft an XML data structure.

For a more complex structure, I could have also uploaded an existing XML structure.  The values I put inside each XML node are the values that the Word document will display in each content control.  Switch to the Bind view and you should see a tree structure.

Click each node, and then drag it to the corresponding Content Control.  When all four are complete, the XPath column in the Content Controls should be populated.

Go ahead and save the settings and close the tool.  Now, if we once again peek inside our Word doc by changing its extension to .zip, we’ll see a new folder called customXml that holds our XML definition.
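
To make the binding idea concrete, here is a small sketch of what each Content Control effectively does: it evaluates its XPath against the custom XML part and displays the text of the node it finds.  The element names match the ones used in this post, but the exact XPath format ("/root[1]/Location[1]" and so on) is an assumption about what the toolkit writes, and the BindingPreview class is purely illustrative.

```csharp
using System;
using System.Xml;

static class BindingPreview
{
    // Returns the text a content control bound to 'xpath' would show,
    // or null when the XPath matches nothing in the custom XML.
    public static string Resolve(string customXml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(customXml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    static void Main()
    {
        // Sample custom XML part; the values are placeholders.
        string customXml = "<root>" +
            "<Location>Outer Space</Location>" +
            "<DocType>Contract</DocType>" +
            "<MenuOption>Start</MenuOption>" +
            "<GalleryName>Photos</GalleryName>" +
            "</root>";

        // Assumed XPath format, one per content control.
        string[] xpaths =
        {
            "/root[1]/Location[1]",
            "/root[1]/DocType[1]",
            "/root[1]/MenuOption[1]",
            "/root[1]/GalleryName[1]"
        };

        foreach (string xpath in xpaths)
        {
            Console.WriteLine(xpath + " -> " + (Resolve(customXml, xpath) ?? "(no match)"));
        }
    }
}
```

A quick check like this is handy for confirming that a replacement XML payload still has a node for every bound control before you write it into the document.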

For my real prototype I built a WCF service that created the Word documents out of the templates and loaded them into SharePoint.  For this blog post, I’ll resort to a Console application which reads the template and emits the resulting Word document to my Desktop.  You’ll get the general idea though.

If you haven’t done so already, download and install the Open XML Format SDK 1.0 from Microsoft.  After you’ve done that, create a new VS.NET Console project and add a reference to DocumentFormat.OpenXml.  Mine was found here: C:\Program Files\OpenXMLSDK\1.0.1825\lib\DocumentFormat.OpenXml.dll. I then added the following “using” statements to my console class.

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.IO;

Next I have all the code which makes a copy of my template, loads up the Word document, removes the existing XML part, and adds a new one which has been populated with the values I want within the Content Controls.

        static void Main(string[] args)
        {

            Console.WriteLine("Starting up Word template updater ...");

            //get path to template and instance output
            string docTemplatePath = @"C:\Users\rseroter\Desktop\ContractSample.docx";
            string docOutputPath = @"C:\Users\rseroter\Desktop\ContractSample_Instance.docx";

            //create copy of template so that we don't overwrite it;
            //passing true replaces any output file left over from a previous run
            File.Copy(docTemplatePath, docOutputPath, true);

            Console.WriteLine("Created copy of template ...");

            //stand up object that reads the Word doc package
            using (WordprocessingDocument doc = WordprocessingDocument.Open(docOutputPath, true))
            {
                //create XML string matching custom XML part
                string newXml = "<root>" +
                    "<Location>Outer Space</Location>" +
                    "<DocType>Contract</DocType>" +
                    "<MenuOption>Start</MenuOption>" +
                    "<GalleryName>Photos</GalleryName>" +
                    "</root>";

                //remove the custom XML part(s) that were bound at design time
                MainDocumentPart main = doc.MainDocumentPart;
                main.DeleteParts<CustomXmlPart>(main.CustomXmlParts);

                //add and write new XML part
                CustomXmlPart customXml = main.AddNewPart<CustomXmlPart>();
                using (StreamWriter ts = new StreamWriter(customXml.GetStream()))
                {
                    ts.Write(newXml);
                }

                //closing WordprocessingDocument automatically saves the document
            }

            Console.WriteLine("Done");
            Console.ReadLine();
        }

When I run the console application, I can see a new file added to my Desktop, and when I open it, I find that my Content Controls now have the values that I set from within my Console application.

Not bad.  As you can imagine, it’s now pretty simple to take this Console app and turn it into a service that accepts an object containing the data points we want added to our document.  While this is hardly a replacement for a rich content management or contract authoring tool, it is a quick and easy way to do a programmatic mail merge and update existing documents.  Heck, you could even call this from a BizTalk application or custom application to generate documents based on message payloads.  Fun stuff.
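
If you do go the service route, one thing worth changing is how the XML payload is built.  String concatenation works for fixed values, but real data can contain characters such as & or < that would break the XML.  Here is a rough sketch of one way to build the payload from an object using LINQ to XML (.NET 3.5 or later), which escapes text for you.  The ContractData and CustomXmlBuilder types are hypothetical names invented for this example; only the element names come from the template above.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical data object; the property names mirror the custom XML nodes.
class ContractData
{
    public string Location { get; set; }
    public string DocType { get; set; }
    public string MenuOption { get; set; }
    public string GalleryName { get; set; }
}

static class CustomXmlBuilder
{
    // Builds the custom XML payload; XElement escapes special characters in the values.
    public static string Build(ContractData data)
    {
        var root = new XElement("root",
            new XElement("Location", data.Location),
            new XElement("DocType", data.DocType),
            new XElement("MenuOption", data.MenuOption),
            new XElement("GalleryName", data.GalleryName));

        // DisableFormatting keeps the output on one line with no added whitespace.
        return root.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        var data = new ContractData
        {
            Location = "Outer Space",
            DocType = "Contract",
            MenuOption = "Start",
            GalleryName = "Photos & Videos"
        };

        Console.WriteLine(Build(data));
    }
}
```

The string returned by Build could then be written to the new CustomXmlPart in place of the hard-coded newXml value.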

Posted in: BizTalk