Populating Word 2007 Templates Through Open XML

I recently worked with a client who wanted to populate contracts from the information stored in their task-tracking tool. Today this is a manual process: the user opens a Microsoft Word template and retypes data points that already live in their primary application.

I first looked at a few commercial options, and then Microsoft recommended that I look deeper into the Open XML SDK and leverage the native XML formats of the Office 2007 document types. I found a few articles and blog posts that explained some of the steps, but couldn't find a single source covering the whole end-to-end process. So I figured I'd demonstrate the prototype solution that I built.

First, we need a Word 2007 document. Because I'm not feeling particularly frisky today, I will fill this document with random text using the ever-useful "=rand(9)" command, which makes Word insert 9 random paragraphs into my document.

[Screenshot: 2009.12.23word01]

Next, I switch to the Developer tab to find the Content Controls I want to inject into my document. Don't see the Developer tab? In Word 2007, click the Office Button, choose Word Options, and on the Popular page check "Show Developer tab in the Ribbon."

[Screenshot: 2009.12.23word02]

Now I'm going to sprinkle a few Text Content Controls throughout my document. The text of each control should indicate the type of content that goes there. For each content control on the page, select it and click the Properties button on the ribbon so that you can give the control a friendly name (its Title).

[Screenshot: 2009.12.23word03]

At this point, I have four Content Controls in my document and each has a friendly title.  Now we can save and close the document. As you probably know by now, the Office 2007 document formats are really just zip files.  If you change the extension of our just-saved Word doc from .docx to .zip, you can see the fun inside.
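You can take the same peek from code without renaming the file, because a .docx is simply a zip package. Below is a minimal sketch using System.IO.Compression (available in .NET 4.5 and later; it is not part of the setup this post otherwise uses). PackageInspector is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

// Lists the part names stored inside a .docx package.
// A .docx is an Open Packaging Conventions zip, so the plain zip API can read it.
public static class PackageInspector
{
    public static List<string> ListPackageParts(string docxPath)
    {
        // ZipFile.OpenRead does not care about the file extension.
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // FullName includes the folder, e.g. "word/document.xml".
            return archive.Entries
                .Select(e => e.FullName)
                .OrderBy(n => n, StringComparer.Ordinal)
                .ToList();
        }
    }
}
```

Calling PackageInspector.ListPackageParts on the saved template should list entries such as [Content_Types].xml and word/document.xml, the same files you see after renaming to .zip.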

[Screenshot: 2009.12.23word04]

I looked at a few options for manipulating the underlying XML content and settled on what I found to be the easiest way to update my Content Controls with data from outside. First, download the Word 2007 Content Control Toolkit from CodePlex, then install and launch the application. After browsing to our Word document, we see our friendly-named Content Controls in the list.

[Screenshot: 2009.12.23word05]

You'll notice that the XPath column is empty. What we need to do next is define a Custom XML Part for this Word document and tie the individual XML nodes to each Content Control. On the right-hand side of the Word 2007 Content Control Toolkit, you'll see a window telling us that there are currently no custom XML parts in the document.

[Screenshot: 2009.12.23word06]

The astute among you may now guess that I will click "Click here to create a new one." I have smart readers. After choosing to create a new part, I switched to the Edit view so that I could easily hand-craft an XML data structure.

[Screenshot: 2009.12.23word07]

For a more complex structure, I could have also uploaded an existing XML structure.  The values I put inside each XML node are the values that the Word document will display in each content control.  Switch to the Bind view and you should see a tree structure.
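When the values come from another application instead of being typed in by hand, building this structure with string concatenation breaks as soon as a value contains a character like & or <. Below is a minimal sketch that builds the data document with System.Xml.Linq, assuming the element names match the nodes defined in the toolkit (Location, DocType, MenuOption and GalleryName in this post's example). CustomXmlBuilder is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Builds the custom XML data document from ordered name/value pairs.
// Element names must match the nodes the content controls are bound to.
public static class CustomXmlBuilder
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var root = new XElement("root");
        foreach (var field in fields)
        {
            // XElement escapes special characters such as '&' and '<' in the value.
            root.Add(new XElement(field.Key, field.Value));
        }
        // DisableFormatting keeps the output on one line with no added whitespace.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

A value such as "R&D Contract" comes out as R&amp;D Contract, so the resulting part stays well-formed XML.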

[Screenshot: 2009.12.23word08]

Click each node, and then drag it onto the corresponding Content Control. When all four are complete, the XPath column in the Content Controls list should be populated.

[Screenshot: 2009.12.23word09]
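Each XPath just selects a node in the custom XML part, and the bound control displays that node's text. Below is a small sketch that resolves a binding the same way with System.Xml. It assumes XPaths of the form /root[1]/Location[1] and a data document without a default namespace. BindingResolver is a hypothetical name.

```csharp
using System;
using System.Xml;

// Resolves a content control's binding XPath against the custom XML data
// and returns the text the bound control would display.
public static class BindingResolver
{
    public static string Resolve(string customXml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(customXml);
        XmlNode node = doc.SelectSingleNode(xpath);
        // null means the binding points at an element that is not in the data.
        return node == null ? null : node.InnerText;
    }
}
```

This is also a quick way to check, before generating documents, that the XML you plan to inject still contains every node the template's controls are bound to.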

Go ahead and save the settings and close the tool. Now, if we once again peek inside our Word doc by changing its extension to .zip, we'll see a new folder called CustomXml that contains our XML definition.

[Screenshot: 2009.12.23word10]
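The data in that folder can also be read back programmatically. Below is a minimal sketch with System.IO.Compression. It assumes Word's usual layout, with the data in customXml/itemN.xml and the part properties in customXml/itemPropsN.xml, and it matches the folder name case-insensitively. CustomXmlReader is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Reads the first custom XML data item out of a .docx package.
public static class CustomXmlReader
{
    public static string ReadFirstItem(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Keep customXml/itemN.xml and skip customXml/itemPropsN.xml.
            ZipArchiveEntry entry = archive.Entries
                .Where(e => e.FullName.StartsWith("customXml/item", StringComparison.OrdinalIgnoreCase)
                         && !e.FullName.StartsWith("customXml/itemProps", StringComparison.OrdinalIgnoreCase)
                         && e.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                .OrderBy(e => e.FullName, StringComparer.Ordinal)
                .FirstOrDefault();
            if (entry == null)
            {
                return null; // no custom XML data in this package
            }
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```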

For my real prototype I built a WCF service that created the Word documents out of the templates and loaded them into SharePoint.  For this blog post, I’ll resort to a Console application which reads the template and emits the resulting Word document to my Desktop.  You’ll get the general idea though.

If you haven’t done so already, download and install the Open XML Format SDK 1.0 from Microsoft.  After you’ve done that, create a new VS.NET Console project and add a reference to DocumentFormat.OpenXML.  Mine was found here: C:\Program Files\OpenXMLSDK\1.0.1825\lib\DocumentFormat.OpenXml.dll. I then added the following “using” statements to my console class.

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.IO;

Next is the code that makes a copy of my template, opens the copy, removes the existing custom XML part, and adds a new one populated with the values I want to show in the Content Controls.

static void Main(string[] args)
{
    Console.WriteLine("Starting up Word template updater ...");

    //get path to template and instance output
    string docTemplatePath = @"C:\Users\rseroter\Desktop\ContractSample.docx";
    string docOutputPath = @"C:\Users\rseroter\Desktop\ContractSample_Instance.docx";

    //create copy of template so that we don't overwrite it
    //(true = replace output left over from a previous run instead of throwing)
    File.Copy(docTemplatePath, docOutputPath, true);

    Console.WriteLine("Created copy of template ...");

    //stand up object that reads the Word doc package
    using (WordprocessingDocument doc = WordprocessingDocument.Open(docOutputPath, true))
    {
        //create XML string matching custom XML part
        string newXml = "<root>" +
            "<Location>Outer Space</Location>" +
            "<DocType>Contract</DocType>" +
            "<MenuOption>Start</MenuOption>" +
            "<GalleryName>Photos</GalleryName>" +
            "</root>";

        //remove the custom XML part(s) created by the toolkit
        MainDocumentPart main = doc.MainDocumentPart;
        main.DeleteParts<CustomXmlPart>(main.CustomXmlParts);

        //add and write new XML part
        //(Open XML SDK 2.0: main.AddCustomXmlPart(CustomXmlPartType.CustomXml))
        CustomXmlPart customXml = main.AddNewPart<CustomXmlPart>();
        using (StreamWriter ts = new StreamWriter(customXml.GetStream()))
        {
            ts.Write(newXml);
        }

        //closing WordprocessingDocument automatically saves the document
    }

    Console.WriteLine("Done");
    Console.ReadLine();
}

When I run the console application, I can see a new file added to my Desktop, and when I open it, I find that my Content Controls now have the values that I set from within my Console application.

[Screenshot: 2009.12.23word11]

Not bad. As you can imagine, it's now pretty simple to turn this Console app into a service that takes in an object containing the data points we want added to our document. While this is hardly a replacement for a rich content management or contract authoring tool, it is a quick and easy way to do a programmatic mail merge and update existing documents. Heck, you could even call this from a BizTalk application or custom application to generate documents based on message payloads. Fun stuff.
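For a service that should not depend on the Open XML SDK, the same idea can be sketched with only System.IO.Compression. This is a different technique from the SDK code above: rather than deleting the custom XML part and adding a new one, it overwrites the data part in place. It assumes the toolkit saved the data as customXml/item1.xml (check the actual name inside your template) and relies on Word refreshing the bound Content Controls from that part when the document opens, which is the same behavior the SDK version depends on. TemplateFiller and Fill are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Copies a template and overwrites its custom XML data part in place.
public static class TemplateFiller
{
    // Assumed part name; verify it inside your own template.
    public const string DataPartName = "customXml/item1.xml";

    public static void Fill(string templatePath, string outputPath, string newXml)
    {
        // Overwrite any previous output so the tool can be re-run.
        File.Copy(templatePath, outputPath, true);

        using (ZipArchive archive = ZipFile.Open(outputPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry oldEntry = archive.GetEntry(DataPartName);
            if (oldEntry == null)
            {
                throw new InvalidOperationException("Template has no " + DataPartName + " part.");
            }
            // Re-creating the entry under the same name leaves the relationships,
            // content type and itemProps part that refer to it untouched.
            oldEntry.Delete();
            ZipArchiveEntry newEntry = archive.CreateEntry(DataPartName);
            using (var writer = new StreamWriter(newEntry.Open()))
            {
                writer.Write(newXml);
            }
        }
    }
}
```

Keeping the part name unchanged means nothing else in the package has to be touched; the trade-off is that the code is tied to a specific part name, whereas the SDK version finds the parts through the package relationships.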

Categories: BizTalk

36 replies

  1. Hey Richard,
    This is a really interesting article and I like the approach you took. I did some work several months ago to generate Word 2007 documents in BizTalk (http://www.modhul.com/2008/01/17/generating-microsoft-word-documents-natively-using-biztalk-2006/), but took more of a ‘map into Office Xml’ approach; your method is much more succinct and cleaner (but then again I don’t have your Microsoft contacts ;-)

    Excellent work!

  2. I liked your work as well. The pipeline approach works well, but figured that an encapsulated service might give me more reuse and better exception control.

  3. Is this the Custom XML part of Word that's being banned from the US market (with the lawsuit from i4i?)

  4. Erik, I think you’re right. Darn it. I guess I can only recommend it for people who already have Word installed!

  5. Looks like Erik beat me to the question – but I also think this seems to be the feature that is being removed in Word. Are we correct?

  6. Hi richard, this is a nice article. Can u please explain how to bind repeated data like a table?

  7. Hmmm, I haven’t tried this with repeating data elements like a table and don’t know how the content controls handle this either.

  8. Hi Richard, i am getting this error when compiled “The type ‘DocumentFormat.OpenXml.Packaging.CustomXmlPart’ must be convertible to ‘DocumentFormat.OpenXml.Packaging.IFixedContentTypePart’ in order to use it as parameter ‘T’ in the generic type or method ‘DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPart()'”. Can u please help me where i am wrong.

  9. I have been searching for almost 2 days to find a simple example on how to use an excel list to populate choices for a drop-down list in word 2007. I really think you may have the answer above – but after downloading the Content Control ToolKit I have no prompt to install and launch – nor can i find it anywhere. I would love to proceed with your example… can u help???

    Desperately seeking answers…

    Madelyn

  10. Bhanu, are you using the same version of the SDK that I did?

  11. Madelyn,
    I seem to recall some challenges there. I had to download and watch where I put it since it was hard to find. I thought it ended up on the Start menu though, and it should produce an exe in the Program Files directory. You have neither?

  12. Bhanu,
    It looks like the AddNewPart() method is now AddCustomXmlPart() with a zillion of overloaded flavors. Check out: http://social.msdn.microsoft.com/Forums/en-US/oxmlsdk/thread/95db2ea1-aa48-4b7d-99ae-86b3bad1bdd2 for more info. =)

  13. Great work.

    I have another question that is there a way you can define table with repeating rows in this approach?

  14. I modified the code to use SDK 2.0

    CustomXmlPart customXml = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    byte[] byteArray = Encoding.UTF8.GetBytes(newXml);
    MemoryStream stream = new MemoryStream(byteArray);
    customXml.FeedData(stream);

    And the Docx output file does not contain data defined in new Xml.

    Any clue?

  15. Hi, if you find answer to add repeating data to word document using content conrol toolkit then do send d sollution 2 ma email address:
    vijaypant@live.com
    Thanx in advance

  16. Hi,

    I have noticed strange behaviour! When I insert Rich text control in word document and then bind it to xml node using word content control toolkit – when I open the document in Word again – the control is PlainText !!!

    Don’t know if this has to do with Word or with Word CC toolkit…

  17. This is very interesting information, but i found it less useful for me. I want to create an XML Template to create a Help Document where i can define my styles and then use this template to send output to PDF file.

    I got very useful information from this post.

    Thanks a lot for help.

  18. Hi, first of all….great post.
    I’m having trouble adding html content to the text content controls. Any idea?
    Tried using HtmlConverter (http://www.notesfor.net/post/2010/03/05/Html-to-OpenXml-converter.aspx) but that’s not working using templates, you have to add content to main part.

  19. Hi Richard,

    Great post.. and Thank you.

    I have a problem with the merged document. When we try to edit the content and save the merged document, the data is not saved. The merged data itself shows up.
    Even tried unchecking ‘Remove content control when contents are edited’ property (in the word template for a content control field). But doing so, none of the data was merged into the template.

    Can you please help me on this?
    KC

  20. Richard,

    After spending half a day staring at dcom config settings and trying to get word to automate creation of a template document working on a server (worked fine in my dev environment) I eventually thought there must be a better way nowadays.

    Of course there was and it is all here in your wonderful post, a couple of hours later and it’s all working using Open XML…. I owe you a pint ;-)

    For v2 open XML SDK the only code change as already explained was to change this:
    CustomXmlPart customXml = main.AddNewPart<CustomXmlPart>();
    To this:
    CustomXmlPart customXml = main.AddCustomXmlPart(CustomXmlPartType.CustomXml);

    Much Karma,
    Mark.

  21. It was a really useful article and tutorial.

    I have a document with numbers in complex mode, but when I replace the content with numbers, it is formatted as Latin numbers. Is there way to assign a style for each control. I tried right clicking on content properties and checking ‘Use a style to format…’ but it was useless.

    Regards.

  22. Thank you for the great Post. I have the following question I don’t know if it’s possible to implement. In case a certain section is empty is there a possible way to hide the Content Control related to it.

  23. It is very interesting post. I took a somewhat different approach. I designed an app to generate Invoices as Word doc. I inserted bookmarks in a doc and used it as template. The made copy of the same and modified word doc, by locating the book marks and inserting the data . But this process is taking lot of time

  24. Thank you very much for this article! It’s been a time saver even some many years after its original posting. I was wondering if you or any other reader might know how to replace the static newXML text below:
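    string newXml = "<root>" +
    "<Location>Outer Space</Location>" +
    "<DocType>Contract</DocType>" +
    "<MenuOption>Start</MenuOption>" +
    "<GalleryName>Photos</GalleryName>" +
    "</root>";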

    with a .xml file instead? So, I know what xml file I want to use and have an XSD, just not sure how to use the file instead of specifying the answers in code (e.g. “Outer Space”).

    I tried this, but it’s not populating the Word doc:
    string newXML = @"C:\Users\Christopher\Desktop\BookData\TestReport.xml";

    Thanks
    Chris

    • Was able to answer my own question, but thanks:

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using DocumentFormat.OpenXml;
      using DocumentFormat.OpenXml.Packaging;
      using DocumentFormat.OpenXml.Wordprocessing;
      using System.IO;

      namespace BookData
      {
          class Program
          {
              static void Main(string[] args)
              {
                  string template = @"C:\Users\Christopher\Desktop\BookData\TestReportBeta.docx";
                  string outFile = @"C:\Users\Christopher\Desktop\BookData\TestReportBetaEND.docx";
                  string xmlPath = @"C:\Users\Christopher\Desktop\BookData\TestReport.xml";

                  // convert template to document
                  File.Copy(template, outFile);

                  using (WordprocessingDocument doc = WordprocessingDocument.Open(outFile, true))
                  {
                      MainDocumentPart mdp = doc.MainDocumentPart;
                      if (mdp.CustomXmlParts != null)
                      {
                          mdp.DeleteParts(mdp.CustomXmlParts);
                      }
                      CustomXmlPart cxp = mdp.AddCustomXmlPart(CustomXmlPartType.CustomXml);
                      // dispose the file stream once its contents are copied into the part
                      using (FileStream fs = new FileStream(xmlPath, FileMode.Open))
                      {
                          cxp.FeedData(fs);
                      }
                      mdp.Document.Save();
                  }
              }
          }
      }

      also here: http://social.msdn.microsoft.com/Forums/en-US/oxmlsdk/thread/cd778b36-e003-4b96-bddb-c87d5b4b25eb/

  25. I have followed this tutorial exactly as stated accounting for the sdk 2.0 changes indicated in the comments:

    CustomXmlPart customXml = main.AddCustomXmlPart(CustomXmlPartType.CustomXml);

    I have created a simple docx file that contains a single content control called test1.
    Here is my xml:

    string newXml = "<root><test1>test2</test1></root>";

    I am implementing this in a wcf web service that is opening the test docx file from SharePoint and then copying the file to a SharePoint document library after inserting the xml.

    Whenever I try to open the docx file that has been updated and copied, I get the following error:

    Microsoft Office cannot open this file because some parts are missing or invalid.

    Anyone have any thoughts on what could be wrong?

  26. comment removed xml…it looks like this:

    <root><test1>test2</test1></root>

  27. If I open the file from the file system in a console application, it works properly. If I open the file in SharePoint from a wcf web service, I get the error indicated above.

    Here is my code for opening the docx file from SharePoint:


    SPFile sourceFile = myweb.GetFile("/mylibrary/test1.docx");

    using (Stream sourceFileStream = sourceFile.OpenBinaryStream())
    {
    using (WordprocessingDocument doc = WordprocessingDocument.Open(sourceFileStream, true))
    {

    Richard, you mentioned in your post that you implemented this in a wcf web service and SharePoint. Can you possibly post some code indicating how you did it? Did you open the source docx file from the file system or from a SharePoint document library?

  28. Nice tutorial. I’m writing a PHP- based XML parser to extract info from templated .docx files. This is a great place to understand how to use MS word to create custom XML namespaces for parsing later. Had to buy MS VS pro though to use this feature, which costs a bomb for individual developers. Wonder if there is a way to do it in VS express(free version).
