Wednesday, September 23, 2009

XML Default Namespaces and XPath Queries

Namespaces in XML schemas are useful and even necessary, but they cause a lot of confusion! This is compounded (confounded?) by the XPath API applying namespaces by different rules than XML documents do.

The biggest confusion for me came from the use of the default namespace. A default namespace simplifies writing an XML document because it limits or eliminates the need to prefix element names (unprefixed attributes are never placed in the default namespace, so it does not change them). However, it places a bigger burden on the reader, because any namespace in the document means you must pass a namespace manager to the API calls that execute queries. Every namespace your queries touch must be added to the namespace manager's map, including the default namespace. Be aware that the default namespace is not the same as no namespace; this is a vitally important principle of XPath queries.
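To make the difference between "default namespace" and "no namespace" concrete, here is a minimal sketch that inspects the NamespaceURI property on a few nodes. The inline documents are trimmed-down stand-ins for the samples in this post, and the code assumes C# top-level statements (C# 9 or later); on an older compiler, put the body inside a Main method.

```csharp
using System;
using System.Xml;

// Two documents that differ only in the xmlns declaration on the root.
XmlDocument plain = new XmlDocument();
plain.LoadXml("<parent><child id=\"1\"/></parent>");

XmlDocument defaulted = new XmlDocument();
defaulted.LoadXml(
    "<parent xmlns=\"http://tempuri.org/sample.xsd\"><child id=\"1\"/></parent>");

// No xmlns declaration: the element is in no namespace (empty URI).
string plainUri = plain.DocumentElement.NamespaceURI;

// Default namespace: unprefixed child elements inherit the URI.
XmlElement child = (XmlElement)defaulted.DocumentElement.FirstChild;
string childUri = child.NamespaceURI;

// Unprefixed attributes do not pick up the default namespace.
string attrUri = child.Attributes["id"].NamespaceURI;

Console.WriteLine("plain parent:    '" + plainUri + "'");
Console.WriteLine("defaulted child: '" + childUri + "'");
Console.WriteLine("id attribute:    '" + attrUri + "'");
```

The two parent elements look identical in the markup apart from the xmlns attribute, but as far as XPath is concerned they have different names.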

Additionally, the query itself must include a namespace prefix for every namespaced element it names. Regardless of whether the XML document prefixes the element or relies on the default namespace, the XPath query must qualify it. An unqualified element in the query searches the "no namespace" or "null" namespace, not the default namespace; XPath has no concept of a default namespace. This short blurb is found in the Microsoft documentation: XPath treats the empty prefix as the null namespace. So if your schema specifies a target namespace, an unqualified query element will never match anything in it.
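A quick way to see this for yourself is to run the same query with and without a prefix against a document that uses a default namespace. This is only a sketch: the XML is loaded from an inline string rather than a file, and it assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<parent xmlns=\"http://tempuri.org/sample.xsd\">" +
    "<child id=\"1\"><item>Item 1</item><item>Item 2</item></child>" +
    "</parent>");

// Unqualified: looks for <parent> in no namespace, so nothing matches.
XmlNode unqualified = doc.SelectSingleNode("/parent");

// Qualified: "default" is mapped to the document's namespace URI.
XmlNamespaceManager nsmanager = new XmlNamespaceManager(doc.NameTable);
nsmanager.AddNamespace("default", "http://tempuri.org/sample.xsd");
XmlNode qualified = doc.SelectSingleNode("/default:parent", nsmanager);

// SelectNodes reports "no match" as an empty list rather than null.
XmlNodeList noItems = doc.SelectNodes("//item");

Console.WriteLine(unqualified == null);  // True
Console.WriteLine(qualified.Name);       // parent
Console.WriteLine(noItems.Count);        // 0
```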

What's the solution? Here are two.

1. Remove all namespaces from your XML document. For simple documents you don't plan on validating, this may be an acceptable alternative. But it's probably not a good idea for complex documents or documents where the construction will be validated; in these cases, the use of namespaces is almost mandated.

Sample XML document with no namespaces:

<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child id="1">
    <item>Item 1</item>
    <item>Item 2</item>
  </child>
</parent>


Sample reader code:

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("/hasnodefault.xml");

// Query some elements. This is what you would naturally expect.
// Namespace prefixes cannot be used in these queries.
XmlNode root = doc.SelectSingleNode("/parent");
foreach (XmlNode node in root.SelectNodes("child/item"))
{
  // Do something with the child items.
  ;
}


2. Use a namespace manager when you are parsing your documents. If you use a default namespace, you must map it to some prefix (something other than an empty string); I will often use "default", which makes it obvious where I'm looking. Your XPath queries, unfortunately, will necessarily be more complicated, as you will have to include the namespace prefix on all elements you will be querying.

Sample XML document using a default namespace (xmlns=...):

<?xml version="1.0" encoding="utf-8"?>
<parent xmlns="http://tempuri.org/sample.xsd">
  <child id="1">
    <item>Item 1</item>
    <item>Item 2</item>
  </child>
</parent>


Sample reader code:

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("/hasdefault.xml");
XmlNamespaceManager nsmanager = new XmlNamespaceManager(doc.NameTable);

// Map the default namespace. Not optional.
// Note that the namespace URI is significant, not the prefix.
nsmanager.AddNamespace("default", "http://tempuri.org/sample.xsd");

// Query some elements.
// The namespace prefixes are required. Without them, SelectSingleNode
// returns null and SelectNodes returns an empty list.
XmlNode root = doc.SelectSingleNode("/default:parent", nsmanager);
foreach (XmlNode node in root.SelectNodes("default:child/default:item", nsmanager))
{
  // Do something with the child items.
  ;
}
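The comment above about the URI being significant, not the prefix, is worth checking. In the sketch below the document binds the namespace to the prefix "s" while the query uses "q"; both prefixes are arbitrary names picked for illustration. The query still matches because the two prefixes resolve to the same URI. As before, this assumes C# top-level statements (C# 9 or later) and an inline document.

```csharp
using System;
using System.Xml;

// This document uses an explicit prefix ("s") instead of a default namespace.
XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<s:parent xmlns:s=\"http://tempuri.org/sample.xsd\">" +
    "<s:child id=\"1\"><s:item>Item 1</s:item><s:item>Item 2</s:item></s:child>" +
    "</s:parent>");

// The query side registers a different prefix ("q") for the same URI.
XmlNamespaceManager nsmanager = new XmlNamespaceManager(doc.NameTable);
nsmanager.AddNamespace("q", "http://tempuri.org/sample.xsd");

// Matching is done on (namespace URI, local name), so the prefixes need not agree.
XmlNodeList items = doc.SelectNodes("/q:parent/q:child/q:item", nsmanager);
int itemCount = items.Count;

Console.WriteLine(itemCount);  // 2
```

This is also why one reader can handle documents that use a default namespace and documents that use an explicit prefix, as long as the URI is the same.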


The short version of this story is: if you use namespaces in your document, even if it is a default namespace, you must qualify the elements used in your XPath queries and use a namespace manager with them.

Obviously, things can get more complicated than these simple examples. Understanding these fundamental concepts about namespaces is critical to maintaining your sanity.

For .NET users new to XML: XPath is a W3C standard for addressing parts of XML documents. In the .NET framework you run XPath queries through the XmlNode.SelectNodes() and XmlNode.SelectSingleNode() methods, or through their counterparts built on the XPathNavigator and XPathExpression classes.
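The same namespace rules apply on the XPathNavigator side. Here is a rough sketch of the default-namespace sample queried through XPathExpression.Compile and XPathNavigator.Select, reusing the same XmlNamespaceManager as the resolver. The inline XML and the "default" prefix are carried over from the sample above, and the code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

XmlDocument doc = new XmlDocument();
doc.LoadXml(
    "<parent xmlns=\"http://tempuri.org/sample.xsd\">" +
    "<child id=\"1\"><item>Item 1</item><item>Item 2</item></child>" +
    "</parent>");

XmlNamespaceManager nsmanager = new XmlNamespaceManager(doc.NameTable);
nsmanager.AddNamespace("default", "http://tempuri.org/sample.xsd");

XPathNavigator navigator = doc.CreateNavigator();

// Compile the expression once with the namespace resolver attached.
XPathExpression expr = XPathExpression.Compile(
    "/default:parent/default:child/default:item", nsmanager);

// Walk the matching nodes and count them.
XPathNodeIterator iterator = navigator.Select(expr);
int count = 0;
while (iterator.MoveNext())
{
    Console.WriteLine(iterator.Current.Value);
    count++;
}
```

Compiling the expression up front is mainly useful when the same query runs many times; for a one-off query, SelectNodes with the namespace manager is simpler.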

More information about XPath 1.0 is available at http://www.w3.org/TR/xpath.
