Wednesday, September 23, 2009

XML Default Namespaces and XPath Queries

Namespaces in XML schemas are useful and even necessary, but they cause a lot of confusion! This is compounded (confounded?) by the XPath API applying namespaces by different rules than XML documents do.

The biggest confusion for me came from the use of the default namespace. A default namespace simplifies writing an XML document because it limits or eliminates the need to prefix element names. (Unprefixed attributes are never placed in the default namespace, so they are unaffected.) However, it places a bigger burden on the reader, because the use of any namespace requires a namespace manager in the API calls that execute queries. Every namespace you intend to query must be added to the namespace manager's map, including the default namespace. Be aware that the default namespace is not the same as no namespace; this is a vitally important principle of XPath queries.

Additionally, the query itself must include a namespace prefix for every element that belongs to a namespace. Regardless of whether the XML document explicitly qualifies an element, the XPath query must qualify it. An unqualified element in the query will search the "no namespace" or "null" namespace, not the default namespace; XPath 1.0 has no concept of a default namespace. This short blurb is found in the Microsoft documentation: "XPath treats the empty prefix as the null namespace." So if your schema specifies a target namespace and your document declares it as the default, an unqualified query element will never match anything in it.
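
To make the difference concrete, here is a minimal sketch (not taken from any official sample) that runs the same query twice against a document with a default namespace: once unqualified and once with a mapped prefix. The class name, the "d" prefix, and the inline document are assumptions made up for illustration.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    // Hypothetical sample document; the namespace URI is only an example.
    public const string Xml =
        "<parent xmlns=\"http://tempuri.org/sample.xsd\">" +
        "<child id=\"1\"><item>Item 1</item><item>Item 2</item></child>" +
        "</parent>";

    public static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc;
    }

    // Unqualified name: XPath searches the null namespace and finds nothing.
    public static XmlNode QueryUnqualified(XmlDocument doc)
    {
        return doc.SelectSingleNode("/parent");
    }

    // Qualified name: "d" is an arbitrary prefix mapped to the default namespace URI.
    public static XmlNode QueryQualified(XmlDocument doc)
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("d", "http://tempuri.org/sample.xsd");
        return doc.SelectSingleNode("/d:parent", ns);
    }

    public static void Main()
    {
        XmlDocument doc = Load();
        Console.WriteLine(QueryUnqualified(doc) == null); // True: no match
        Console.WriteLine(QueryQualified(doc) != null);   // True: match found
    }
}
```

The document never uses a prefix, yet the only query that finds the root element is the prefixed one.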

What's the solution? Here are two.

1. Remove all namespaces from your XML document. For simple documents you don't plan on validating, this may be an acceptable alternative. But it's probably not a good idea for complex documents or documents whose structure will be validated; in these cases, the use of namespaces is almost mandated.

Sample XML document with no namespaces:

<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child id="1">
    <item>Item 1</item>
    <item>Item 2</item>
  </child>
</parent>


Sample reader code:

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("/hasnodefault.xml");

// Query some elements. This is what you would naturally expect.
// Namespace prefixes cannot be used in these queries.
XmlNode root = doc.SelectSingleNode("/parent");
foreach (XmlNode node in root.SelectNodes("child/item"))
{
  // Do something with the child items.
  ;
}


2. Use a namespace manager when you are parsing your documents. If you use a default namespace, you must map it to some prefix (something other than an empty string); I will often use "default", which makes it obvious where I'm looking. Your XPath queries, unfortunately, will necessarily be more complicated, as you will have to include the namespace prefix on all elements you will be querying.

Sample XML document using a default namespace (xmlns=...):

<?xml version="1.0" encoding="utf-8"?>
<parent xmlns="http://tempuri.org/sample.xsd">
  <child id="1">
    <item>Item 1</item>
    <item>Item 2</item>
  </child>
</parent>


Sample reader code:

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("/hasdefault.xml");
XmlNamespaceManager nsmanager = new XmlNamespaceManager(doc.NameTable);

// Map the default namespace. Not optional.
// Note that the namespace URI is significant, not the prefix.
nsmanager.AddNamespace("default", "http://tempuri.org/sample.xsd");

// Query some elements.
// The namespace prefixes are required; without them SelectSingleNode
// returns null and SelectNodes returns an empty list.
XmlNode root = doc.SelectSingleNode("/default:parent", nsmanager);
foreach (XmlNode node in root.SelectNodes("default:child/default:item", nsmanager))
{
  // Do something with the child items.
  ;
}
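
As a quick check on the comment that the URI is significant and the prefix is not, here is a small sketch under the same assumptions as the sample above (same document, inlined as a string). The class and method names are made up for illustration. It counts the items using whatever prefix you hand it, and also shows what an unqualified SelectNodes call gives back.

```csharp
using System;
using System.Xml;

public static class PrefixDemo
{
    private const string Uri = "http://tempuri.org/sample.xsd";

    private static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<parent xmlns=\"" + Uri + "\">" +
            "<child id=\"1\"><item>Item 1</item><item>Item 2</item></child>" +
            "</parent>");
        return doc;
    }

    // Maps the given prefix to the document's namespace URI and counts the items.
    public static int CountItems(string prefix)
    {
        XmlDocument doc = Load();
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace(prefix, Uri);
        string xpath = string.Format("/{0}:parent/{0}:child/{0}:item", prefix);
        return doc.SelectNodes(xpath, ns).Count;
    }

    // Same path without prefixes: the result is an empty list, not null.
    public static int CountItemsUnqualified()
    {
        return Load().SelectNodes("/parent/child/item").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountItems("default"));    // 2
        Console.WriteLine(CountItems("x"));          // 2
        Console.WriteLine(CountItemsUnqualified());  // 0
    }
}
```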


The short version of this story is: if you use namespaces in your document, even if it is a default namespace, you must qualify the elements used in your XPath queries and use a namespace manager with them.

Obviously, things can get more complicated than these simple examples. Understanding these fundamental concepts about namespaces is critical to maintaining your sanity.

For .NET users new to XML: XPath is a W3C standard for addressing parts of XML documents. In the .NET Framework, XPath queries are executed by the XmlNode.SelectNodes() and XmlNode.SelectSingleNode() methods and by their counterparts built on the XPathNavigator and XPathExpression classes.
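
The same namespace rules apply to the navigator-based counterparts. Here is a minimal sketch, assuming the default-namespace sample document from earlier inlined as a string; the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorDemo
{
    public static int CountItems()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<parent xmlns=\"http://tempuri.org/sample.xsd\">" +
            "<child id=\"1\"><item>Item 1</item><item>Item 2</item></child>" +
            "</parent>");

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("default", "http://tempuri.org/sample.xsd");

        // The namespace manager is passed in when the expression is compiled.
        XPathExpression expr = XPathExpression.Compile(
            "/default:parent/default:child/default:item", ns);

        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator items = nav.Select(expr);

        int count = 0;
        while (items.MoveNext())
        {
            Console.WriteLine(items.Current.Value); // text content of each item
            count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountItems());
    }
}
```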

More information about XPath 1.0 is available at http://www.w3.org/TR/xpath.

Tuesday, September 15, 2009

Convincing Visual Studio 2005 that SqlClient is a valid namespace

While getting a project started to connect to and use a SQL Server database from a mobile device, I ran into the same problem many others apparently have, too: you had to force Visual Studio to include some framework components before you were allowed to use them. I happened to be using Visual Studio 2005 (aka Visual Studio 8.0) to build an application for a .NET Compact Framework 2.0 platform that exchanged data with a Microsoft SQL Server 2005 database over a wireless network.

Executing a query directly against the remote database, without using the SQL Compact (aka SQL Mobile aka SQL CE) SDK and without using dataset objects, required SqlConnection, SqlCommand, SqlParameter, and several other classes that existed in the System.Data.SqlClient namespace. Unfortunately you couldn't just type your using statements and have VS magically link them to your application. You expected Microsoft to make it automatic? No way. That's where the Add Reference... feature came in, and it was a source of confusion (days!) in my early development with the VS .NET IDE.

Without adding a proper reference, the statement:

using System.Data.SqlClient;

caused the following compilation error and would not complete the build:

The type or namespace name 'SqlClient' does not exist in the namespace 'System.Data' (are you missing an assembly reference?)

Part of the problem is the multiplicity of .NET CF versions. As is standard with Microsoft, things get moved, renamed, and otherwise mangled between versions and platforms. In the desktop .NET Framework 2.0, the System.Data.SqlClient namespace lives in System.Data.dll; in the .NET Compact Framework it ships in a separate System.Data.SqlClient.dll. To allow this necessary namespace to be used in your project, you must add a reference to the System.Data.SqlClient component. The trick is learning a couple of things about the components listed on the .NET tab of the Add Reference dialog. The "Runtime" column identifies the framework version a component belongs to; the "Version" column is only the version of the component itself. Also, the "Path" column refers to the component as used by the IDE, not by the target device; the path does not even exist on the target device. There were a few clues, but putting them together to arrive at a coherent solution took experience or lots of trial and error.

At the time I wrote this, the SqlClient namespace for .NET CF 2.0 was added by referencing the component named System.Data.SqlClient, version 3.0.3600.0, runtime version v2.0.50727, whose (IDE) path was C:\Program Files\Microsoft Visual Studio 8\SmartDevices\SDK\\SQL Server\Client\v2.0\System.Data.SqlClient.dll. And before you point out my "typo", yes, that was the real path.

Adding this component as a reference finally allowed me to compile without VS coughing up the error.
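
For the curious, here is a rough sketch of the kind of direct query the reference made possible, using SqlConnection, SqlCommand, and SqlParameter. It is not my actual application code: the server address, database, credentials, table, and column names are all made up for illustration, and it needs a reachable SQL Server to actually run.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class DirectQuerySketch
{
    // Hypothetical connection string; substitute your own server and credentials.
    public const string ConnectionString =
        "Data Source=192.168.0.10;Initial Catalog=SampleDb;User ID=appuser;Password=secret;";

    // Builds a parameterized command; the Items table and its columns are assumptions.
    public static SqlCommand BuildCommand(SqlConnection conn, int childId)
    {
        SqlCommand cmd = new SqlCommand(
            "SELECT Name FROM Items WHERE ChildId = @childId", conn);
        cmd.Parameters.Add("@childId", SqlDbType.Int).Value = childId;
        return cmd;
    }

    // Opens the connection, runs the query, and prints the first column of each row.
    public static void PrintItems(int childId)
    {
        using (SqlConnection conn = new SqlConnection(ConnectionString))
        using (SqlCommand cmd = BuildCommand(conn, childId))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader.GetString(0));
                }
            }
        }
    }
}
```

Without the System.Data.SqlClient reference, none of this compiles; the very first using line produces the error quoted above.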