XML, Web Services and Special Characters

Posted on November 9, 2007


If you’ve worked with XML technologies for any reasonable amount of time, you’re aware of the considerations when dealing with “special” characters. This recently came up at work, so I thought I’d share a few quick thoughts.

One of the developers was doing an HTTP POST of XML content to a .NET web service. However, we discovered that a few of the records coming across contained invalid characters.

Now you probably know that the following message is not well-formed XML:

<Person>
	<Name>Richard</Name>
	<Nickname>Thunder & Lightning</Nickname>
</Person>

A bare ampersand (“&”) isn’t allowed within a node’s text, and neither is “<”; both must be escaped (as &amp; and &lt;). A “>” is tolerated in text, though it is usually escaped as well, and the quote character that delimits an attribute value must be escaped inside that value. Now if you call a web service by first doing an “Add Web Reference” in Visual Studio.NET, you are using a proxy class that covers up all the XML/SOAP stuff going on underneath. The proxy class (Reference.cs) inherits from System.Web.Services.Protocols.SoapHttpClientProtocol, which you can see (using Reflector) takes care of proper serialization using the XmlWriter object. So setting my web service parameters like so …

When this actually goes across the wire to my web service, the payload has been appropriately encoded and the ampersand has been replaced …
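To make the escaping concrete, here is a minimal sketch of what an XmlWriter does with the Person message from earlier. It is not the proxy's actual code; the class and method names (EscapingDemo, BuildPerson) are made up for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class EscapingDemo
{
    // Writes the Person document through an XmlWriter, which escapes
    // markup characters in text content on our behalf.
    public static string BuildPerson(string name, string nickname)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // keep the output to just the element

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Person");
            writer.WriteElementString("Name", name);
            writer.WriteElementString("Nickname", nickname);
            writer.WriteEndElement(); // Person
        }
        return sb.ToString();
    }
}
```

Calling EscapingDemo.BuildPerson("Richard", "Thunder & Lightning") yields a Nickname element whose text reads Thunder &amp;amp; Lightning on the wire, which any XML parser turns back into the original string.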

However, if I decided to do my own HTTP POST to the service and bypass a proxy, this is NOT the way to do it:

// Needs: using System.IO; using System.Net; using System.Text;
//        using System.Windows.Forms;
HttpWebRequest webRequest =
   (HttpWebRequest)WebRequest.Create("http://localhost/bl/sv.asmx");
webRequest.Method = "POST";
webRequest.ContentType = "text/xml";

using (Stream reqStream = webRequest.GetRequestStream())
{
    // Hand-built SOAP envelope. Nothing here escapes special characters,
    // so the raw "&" in the Name element makes the payload malformed XML.
    string body = "<soap:Envelope xmlns:soap=" +
        "\"http://schemas.xmlsoap.org/soap/envelope/\">" +
        "<soap:Body><Operation_1 xmlns=\"http://tempuri.org/\">" +
        "<ns0:Person xmlns:ns0=\"http://testnamespace\">" +
        "<ns0:Name>Richard & Amy</ns0:Name>" +
        "<ns0:Age>10</ns0:Age>" +
        "<ns0:Address>411 Broad Street</ns0:Address>" +
        "</ns0:Person>" +
        "</Operation_1></soap:Body></soap:Envelope>";

    byte[] bodyBytes = Encoding.UTF8.GetBytes(body);
    reqStream.Write(bodyBytes, 0, bodyBytes.Length);
}

// The using block closes the response even if showing the status throws.
using (HttpWebResponse webResponse =
   (HttpWebResponse)webRequest.GetResponse())
{
    MessageBox.Show("submitted, " + webResponse.StatusCode);
}

Why is this bad? This may work for most scenarios, but in the case above, I have a special character (“&”) that is about to go unmolested across the wire …

Instead, the code above should be augmented to use an XmlTextWriter to build up the XML payload. These types of errors are such a freakin’ pain to debug since no errors actually get thrown when the receiving service fails to deserialize the bad XML into a .NET object. In a BizTalk world, this means no SOAP exception to the caller, no suspended message, no error in the Event Log. Virtually no trace (outside of the IIS logs). Not good.
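As a rough sketch of that change, the same envelope can be produced with an XmlWriter instead of string concatenation. I'm using XmlWriter.Create rather than constructing an XmlTextWriter directly, which is a swap on my part; the namespaces and element names are copied from the hand-built string above, and the SoapBodyBuilder class itself is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class SoapBodyBuilder
{
    // Namespaces taken from the hand-built envelope in the post.
    const string SoapNs = "http://schemas.xmlsoap.org/soap/envelope/";
    const string OperationNs = "http://tempuri.org/";
    const string PersonNs = "http://testnamespace";

    // Returns the UTF-8 bytes of the SOAP envelope, with every value
    // escaped by the writer rather than by hand.
    public static byte[] BuildEnvelope(string name, string age, string address)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Encoding = new UTF8Encoding(false); // UTF-8 without a BOM

            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                writer.WriteStartElement("soap", "Envelope", SoapNs);
                writer.WriteStartElement("soap", "Body", SoapNs);
                writer.WriteStartElement("", "Operation_1", OperationNs);
                writer.WriteStartElement("ns0", "Person", PersonNs);
                writer.WriteElementString("ns0", "Name", PersonNs, name);
                writer.WriteElementString("ns0", "Age", PersonNs, age);
                writer.WriteElementString("ns0", "Address", PersonNs, address);
                writer.WriteEndElement(); // ns0:Person
                writer.WriteEndElement(); // Operation_1
                writer.WriteEndElement(); // soap:Body
                writer.WriteEndElement(); // soap:Envelope
            }
            return buffer.ToArray();
        }
    }
}
```

In the request code, the result of SoapBodyBuilder.BuildEnvelope("Richard & Amy", "10", "411 Broad Street") would take the place of bodyBytes in the reqStream.Write call. The writer also emits an XML declaration at the top of the payload, which the string-built version did not have.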

BizTalk itself doesn’t like poorly constructed XML either. The XmlReceive pipeline, in addition to “typing” the message (http://namespace#root), also parses the message. So while everyone says that the default XmlReceive pipeline doesn’t validate the structure (meaning XSD structure) of the message, it DOES check that the message is well-formed XML. Keep that in mind. If I try to pass an invalid XML document (unescaped special characters, unclosed tags), it WILL bomb out in the pipeline layer.
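The difference between "parses" and "validates against a schema" is easy to see outside of BizTalk. The sketch below is plain .NET, not the pipeline's own code, and the WellFormedness class is a made-up name: it simply reads the text with an XmlReader and reports whether parsing succeeded, with no XSD involved.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedness
{
    // Returns true if the text parses as XML. No schema is consulted,
    // so any well-formed document passes regardless of its structure.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { } // walk the whole document
            }
            return true;
        }
        catch (XmlException)
        {
            return false; // bare "&", unclosed tag, and so on
        }
    }
}
```

A document like <ns0:Name xmlns:ns0="http://testnamespace">Richard & Amy</ns0:Name> fails this check, while a well-formed document that matches no schema at all still passes.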

If you try to cheat, and do pass-through pipelines and use XmlDocument as your initial orchestration message (thus bypassing any peeking at the message by BizTalk), you will still receive errors when you try to interact with the message later on. If you set the XmlDocument to the actual message variable in the orchestration, the message gets parsed at that time and fails if the structure is invalid.

So, this is probably elementary for you smart people, but it’s one of those little things that you might forget about. Be careful about generating XML content via string building and instead consider using XmlDocuments or XmlWriters to make sure that your content passes XML parsing rules.
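For the XmlDocument route, a minimal sketch might look like the following. The PersonDocument class is hypothetical; the element names and namespace are borrowed from the example payload earlier in the post.

```csharp
using System;
using System.Xml;

public static class PersonDocument
{
    const string PersonNs = "http://testnamespace";

    // Builds the Person fragment as a DOM and returns its serialized form.
    public static string Build(string name)
    {
        XmlDocument doc = new XmlDocument();

        XmlElement person = doc.CreateElement("ns0", "Person", PersonNs);
        doc.AppendChild(person);

        XmlElement nameNode = doc.CreateElement("ns0", "Name", PersonNs);
        nameNode.InnerText = name; // held as plain text, escaped when serialized
        person.AppendChild(nameNode);

        return doc.OuterXml;
    }
}
```

Because the value is assigned through InnerText rather than pasted into markup, PersonDocument.Build("Richard & Amy") serializes the ampersand as &amp;amp; without any manual escaping.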

Posted in: .NET, BizTalk