| << 18.3.1- XML Parsers | Chapter18 | 18.3.3- Creating XML Documents from a Web Page >> |
The W3C Document Object Model
The W3C Document Object Model is a set of object-oriented Application Programming Interfaces (APIs) for HTML and XML documents. These APIs define the logical structure of documents. They set out the objects, properties and methods that allow us to access the parts of any XML document (such as its elements, attributes, and their values) and to manipulate its structure from a programming language (for example, by adding elements and attributes and changing their values and order).
Note that Document Object Model is a term that has been applied to browsers for some time, with intrinsic objects such as window, document and history being considered part of the browser object model. The W3C DOM is an attempt to standardize these differing browser implementations.
Here we are not particularly interested in the part of the model that holds information about the browser environment or the HTML pages that it loads. Rather, we are interested in the part that contains information about an XML document that a browser hosts. The browser may load the document directly, but in IE5 it is often exposed through the <XML> element, which we shall meet later.
The part of the model that deals with our XML document is known as DOM Level 1 (version 1.0), and can be seen at http://www.w3.org/TR/REC-DOM-Level-1/. The W3C recommendation does not actually use the term objects; it uses the term interfaces instead, although objects can be an easier way of thinking about them for ASP developers. There are also now DOM Level 2 and Level 3 recommendations. You can see the complete list at http://www.w3.org/DOM/DOMTR.
If we look at our books.xml file, we can see that its structure is hierarchical. As we have already seen, we have a root element, which is <books>, under which there are a number of other elements, known as child elements.
<books>
  <book>
    <title>Beginning ASP 3.0</title>
    <quantity>20</quantity>
    <ISBN>1-861002-11-0</ISBN>
    <authors>
      <author_name>Brian Francis</author_name>
      <author_name>Chris Ullman</author_name>
      <author_name>Dave Sussman</author_name>
      <author_name>John Kauffman</author_name>
      <author_name>Jon Duckett</author_name>
      <author_name>Juan Llibre</author_name>
    </authors>
    <description> ASP is a powerful technology for dynamically creating web site content. Learn how to create exciting pages that are tailored to your audience. Enhance your web/intranet presence with powerful web applications.</description>
    <price US="$49.99"/>
  </book>
</books>
We could actually show this in terms of a diagram, like so:
You can think of this as a tree, drawn in much the same way as a family tree. The root of the tree grows from the <books> element, which then branches out into child elements. The more child elements there are, the more the tree branches out: if there were more <book> elements, the tree would get wider, and if there were more elements under each <title> or <authors> element, the tree would become deeper.
Each of the elements is referred to as a node in the W3C XML DOM, and there is an important reason for this. The DOM for HTML is actually quite different from the XML DOM. This is mainly because we are using XML to create our own languages, whose structure is unknown in advance. In contrast, HTML is a fixed language. In HTML you always have an images collection, whether or not your HTML page includes images. Likewise, in HTML you always have one forms collection, no matter how many form elements there may be in an HTML document instance. Each form is accessed using the document.forms collection, and has its own elements collection. However, when we are creating our own markup languages, we do not have this prior knowledge. We can be certain that there is a root element, but we cannot be sure what it is called or what lies underneath it.
To counter this problem, each item within the tree is referred to as a generic node. Earlier we said that you could think of the tree in terms of a family tree. If a node has a node underneath it, the node underneath is called a child, and the node that sprouts the child is known as its parent. So, book is the parent of title, quantity, ISBN, etc., and they are children of book. A node that has no children of its own is known as a leaf node.
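To see the parent, child, and leaf relationships in running code, here is a small sketch in Python using the standard library's xml.dom.minidom module, which implements the W3C DOM interfaces (it is not MSXML, and the trimmed copy of books.xml below is ours). It walks the element tree and labels each element as a parent or a leaf. For simplicity it treats an element with no child elements as a leaf, although strictly the text inside <title> is itself a node.

```python
from xml.dom import minidom

# A trimmed copy of books.xml, written without whitespace between tags
# so that only element and text nodes appear in the tree.
BOOKS_XML = (
    "<books><book>"
    "<title>Beginning ASP 3.0</title>"
    "<quantity>20</quantity>"
    "<ISBN>1-861002-11-0</ISBN>"
    "<authors><author_name>Brian Francis</author_name>"
    "<author_name>Chris Ullman</author_name></authors>"
    "</book></books>"
)


def walk(node, depth=0, lines=None):
    """Describe each element, its parent, and whether it has child elements."""
    if lines is None:
        lines = []
    # Only element children count here; text nodes are ignored for simplicity
    children = [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]
    kind = "leaf" if not children else "parent"
    lines.append(f"{'  ' * depth}{node.nodeName} (child of {node.parentNode.nodeName}, {kind})")
    for child in children:
        walk(child, depth + 1, lines)
    return lines


doc = minidom.parseString(BOOKS_XML)
for line in walk(doc.documentElement):
    print(line)
```

The root element's parent is the Document node itself, which minidom names #document.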
In this section, we will see how we can use the DOM to retrieve elements from the DOM tree, and how to add elements to the tree. Thus we will see how we can use the data in our XML files. But, before we can look at how to retrieve them, we should have a look at the Node object and its properties that make nodes available.
The Base Object
As we have said, the DOM provides a set of objects, methods and properties, that allow us to access and manipulate the DOM, and which represent the hierarchical nature of the tree. We will not be covering all of the DOM objects here – however, since we have been referring to Nodes, we should look at the base objects:
| Object | Description |
| Node | A single node in the hierarchy. |
| NodeList | An ordered collection of nodes, accessed by index. |
| NamedNodeMap | A collection of nodes that can be accessed by name as well as by index. |
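As a rough illustration of these three objects, the following Python sketch uses the standard xml.dom.minidom module (a W3C DOM implementation, not MSXML) on a cut-down <book> fragment. It reads a NodeList by index, and reads the attributes of <price>, which are held in a NamedNodeMap, both by name and by index.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<book><title>Beginning ASP 3.0</title><price US="$49.99"/></book>'
)
book = doc.documentElement                  # a Node (specifically an Element)

# NodeList: an ordered collection, accessed by index
children = book.childNodes
print(children.length)                      # how many child nodes <book> has
print(children.item(0).nodeName)            # the first child, fetched by index

# NamedNodeMap: the attributes of <price>, accessible by name or by index
price = children.item(1)
attrs = price.attributes
print(attrs.getNamedItem("US").value)       # look up an attribute by name
print(attrs.item(0).name)                   # look up an attribute by position
```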
The Node object has a number of properties that allow us to navigate through the tree:
| Property | Description |
| childNodes | Returns a NodeList containing the children of the node. |
| firstChild | Returns a Node that is a reference to the first child. |
| lastChild | Returns a Node that is a reference to the last child. |
| parentNode | Returns a Node that is a reference to the parent node. |
| previousSibling | Returns a Node that is a reference to the previous sibling, i.e., the previous node at the same level in the hierarchy. |
| nextSibling | Returns a Node that is a reference to the next sibling, i.e., the next node at the same level in the hierarchy. |
| nodeName | The name of the node: the tag name for an element, or a fixed name such as #text for a text node. |
| nodeValue | The value of the node, for example the text of a text node or the value of an attribute. For an element node it is null. |
This isn't a complete list, but it gives you an idea of what's possible. If we refer back to our diagram, and then add in some of the relationships, we will be able to see how these work:
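The same relationships can be followed in code. This Python sketch uses the standard xml.dom.minidom module (not MSXML). The XML is written without whitespace between tags, because minidom keeps whitespace-only text nodes, whereas MSXML discards them by default. The sketch also shows that an element's nodeValue is empty (None in Python), while its text lives in a child text node.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<books><book>"
    "<title>Beginning ASP 3.0</title>"
    "<quantity>20</quantity>"
    "<ISBN>1-861002-11-0</ISBN>"
    "</book></books>"
)
book = doc.documentElement.firstChild      # <book>, first child of <books>
title = book.firstChild                    # <title>, first child of <book>
isbn = book.lastChild                      # <ISBN>, last child of <book>
quantity = title.nextSibling               # <quantity>, next node at the same level

# Walk back and up again
print(quantity.previousSibling.nodeName)   # the sibling before <quantity>
print(isbn.parentNode.nodeName)            # the parent of <ISBN>

# An element has no value of its own; its text is held in a child text node
print(title.nodeName, title.nodeValue)
print(title.firstChild.nodeName, title.firstChild.nodeValue)
```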
Specific DOM Objects
Because XML is designed to be extensible, in contrast to the fixed structure of HTML, there are also specific objects for different types of node. Most inherit the properties and methods of the Node object, as well as adding specific methods and properties relevant to the particular node type.
| Object | Description |
| Document | The root object for an XML document. |
| Element | An XML element. |
| Attr | An XML attribute. |
| CharacterData | The base object for text information in a document. |
| CDATASection | A CDATA section (<![CDATA[ ... ]]>), whose contents are not parsed as markup. |
| Text | The text contents of an element or attribute node. |
| Comment | An XML comment (<!-- ... -->). |
| ProcessingInstruction | A processing instruction, written between <? and ?>. |
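To show which object each kind of node becomes, this Python sketch builds one of each with the factory methods of a Document in the standard xml.dom.minidom module (not MSXML), and prints its class, nodeType and nodeName. The genre attribute and the books.xsl stylesheet are hypothetical placeholders.

```python
from xml.dom import minidom, Node

doc = minidom.Document()                               # Document: the root object
root = doc.appendChild(doc.createElement("books"))    # Element (becomes the root element)
root.setAttribute("genre", "programming")             # hypothetical attribute
attr = root.getAttributeNode("genre")                  # Attr
text = doc.createTextNode("some text")                 # Text
cdata = doc.createCDATASection("<b>not parsed</b>")   # CDATASection
comment = doc.createComment(" a comment ")             # Comment
pi = doc.createProcessingInstruction(                  # ProcessingInstruction
    "xml-stylesheet", 'type="text/xsl" href="books.xsl"'  # hypothetical stylesheet
)

for node in (doc, root, attr, text, cdata, comment, pi):
    print(type(node).__name__, node.nodeType, node.nodeName)

# Text, CDATASection and Comment all share the CharacterData base object
print(all(isinstance(n, minidom.CharacterData) for n in (text, cdata, comment)))
```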
So, having seen some of the objects and properties of the DOM, let's use them to discover values of an XML document programmatically.
To do this we will use the MSXML parser (provided by Microsoft) that, as we said, exposes the W3C XML DOM.
Retrieving Values from an XML Document
Having said that MSXML exposes the W3C DOM, let's see how our tree would look in terms of nodes in the parser. MSXML also adds an error object to the document object model to help us troubleshoot any problems in our application.
So, the root node represents the <books> element, and its child node the <book> element. Underneath that we have the elements <title>, <quantity>, <ISBN>, <authors>, <description> and <price>. The <authors> element in turn has a set of <author_name> child nodes, which can be referred to as a NodeList.
MSXML exposes its error object through the document's parseError property, which holds information about the most recent parsing error. This object provides a lot of useful information for debugging and error handling within ASP pages, through the following properties:
| Property | Description |
| errorCode | The error code |
| filepos | The absolute position of the error in the XML file |
| line | The line number of the line that caused the error |
| linepos | The character position of the error within that line |
| reason | The cause of the error |
| srcText | The text of the line where the error occurred |
| url | The URL of the XML document containing the error |
This is where the information came from when we loaded our badly formed XML example earlier.
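Python's standard xml.dom.minidom module has no parseError object; instead, a malformed document raises xml.parsers.expat.ExpatError. In this sketch the try_load helper and the parseError-style key names are our own: it copies the error's code, line and column into a dictionary so that you can compare them with the properties in the table above.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError, errors

# <title> is never closed, so the parser meets a mismatched </book> on line 2
BAD_XML = "<books>\n<book><title>Beginning ASP 3.0</book>\n</books>"


def try_load(xml_text):
    """Return (document, None) on success, or (None, error details) on failure."""
    try:
        return minidom.parseString(xml_text), None
    except ExpatError as err:
        details = {
            "errorCode": err.code,                 # Expat's numeric error code
            "line": err.lineno,                    # 1-based line number
            "linepos": err.offset,                 # 0-based column within that line
            "reason": errors.messages[err.code],   # human-readable description
        }
        return None, details


doc, details = try_load(BAD_XML)
if details is None:
    print("Loaded", doc.documentElement.nodeName)
else:
    print("Sorry, an error occurred:", details)
```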
Using the MSXML parser, we can get to the values of any of these nodes, navigating the tree using the DOM. We shall now do precisely this. However, as this is an ASP book, we shall use the parser on the server in an ASP page, and write the values of the title, ISBN and description to the client.
Try It Out – Walking the DOM
1. Fire up your favorite text editor and type in the following code.
<%
'create an instance of MSXML to retrieve the book details
set objXML = Server.CreateObject("microsoft.XMLDOM")
'load synchronously, so the document is fully loaded before we read it
objXML.async = False
'load the XML document containing the book details
objXML.load("C:\Inetpub\wwwroot\BegASPFiles\XMLFiles\books.xml")
'see if it loaded OK, i.e. is a well-formed XML file
If objXML.parseError.errorCode = 0 Then
  'children of <book>: 0 = title, 1 = quantity, 2 = ISBN, 3 = authors, 4 = description
  strTitle = objXML.documentElement.firstChild.firstChild.text
  strISBN = objXML.documentElement.firstChild.childNodes(2).text
  strDescription = objXML.documentElement.firstChild.childNodes(4).text
  Response.Write (strTitle & "<BR>")
  Response.Write (strISBN & "<BR>")
  Response.Write (strDescription & "<BR>")
'write out if an error occurred
Else
  Response.Write ("Sorry, an error occurred retrieving information.")
End If
Set objXML = nothing
%>
2. Save the file with the name DomExample1.asp
3. When the results are written to the screen they should look like this, displaying the title, ISBN and the description:
How It Works
Firstly, we create an instance of the MSXML parser when the page is executed, and assign it to a variable called objXML.
<%
'create an instance of MSXML to retrieve the book details
set objXML = Server.CreateObject("microsoft.XMLDOM")
Next we have to load the XML file into the DOM. Because MSXML loads documents asynchronously by default, we first set its async property to False, so that the load method does not return until the whole document has been loaded. We then call the load method, using the same dot notation as with any other component, and pass it the path (or URL) of the file that holds the XML.
'load synchronously, so the document is fully loaded before we read it
objXML.async = False
'load the XML document containing the book details
objXML.load("C:\Inetpub\wwwroot\BegASPFiles\XMLFiles\books.xml")
Next we use MSXML's parseError object to check that the file loaded properly. We do this by checking whether the errorCode property of the parseError object is 0, which indicates that no error was reported. Using an If...Then statement, we continue processing only if everything went well (if not, we write an error message back to the client, as you will see soon):
'see if it loaded OK, i.e. is a well-formed XML file
If objXML.parseError.errorCode = 0 Then
Provided no error occurred, we set the variable strTitle to the text of the first child of the <book> element (which is itself the first child of the document element, <books>). The text property returns the text content of a node; note that it is an MSXML extension rather than part of the W3C DOM:
strTitle = objXML.documentElement.firstChild.firstChild.text
strTitle now holds the value Beginning ASP 3.0, since this is the title.
We then use the childNodes property, which returns a NodeList of the node's children, to read the ISBN and the description into strings. The NodeList index is zero-based, so <title> is 0 and <quantity> is 1 (MSXML discards whitespace-only text nodes by default, so only the elements are counted). We want the <ISBN> and <description> elements, which are 2 and 4 respectively:
strISBN = objXML.documentElement.firstChild.childNodes(2).text
strDescription = objXML.documentElement.firstChild.childNodes(4).text
Next we use a simple Response.Write to write out the value of the strings:
Response.Write (strTitle & "<BR>")
Response.Write (strISBN & "<BR>")
Response.Write (strDescription & "<BR>")
To cover ourselves in case an error occurred, we finish our If...Then statement with an Else. If the document failed to load, we write a simple message back to the client.
'write out if an error occurred
Else
Response.Write ("Sorry, an error occurred retrieving information.")
End If
Finally we clean up our resources by setting our XML parser to nothing.
Set objXML = nothing
%>
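For comparison, here is the same walk sketched in Python with the standard xml.dom.minidom module (not MSXML). Two helpers, which are our own, stand in for MSXML behavior: drop_whitespace_nodes mimics MSXML's default of discarding whitespace-only text nodes, so that the child indexes match (title 0, ISBN 2, description 4), and text_of gathers a node's text, roughly as MSXML's text property does. The XML is a shortened books.xml.

```python
from xml.dom import minidom

BOOKS_XML = """<books>
  <book>
    <title>Beginning ASP 3.0</title>
    <quantity>20</quantity>
    <ISBN>1-861002-11-0</ISBN>
    <authors>
      <author_name>Brian Francis</author_name>
    </authors>
    <description>ASP is a powerful technology.</description>
    <price US="$49.99"/>
  </book>
</books>"""


def drop_whitespace_nodes(node):
    """Recursively remove whitespace-only text nodes (hypothetical helper)."""
    for child in list(node.childNodes):       # copy, because we remove while looping
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            drop_whitespace_nodes(child)


def text_of(node):
    """Concatenate the data of all descendant text nodes (hypothetical helper)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


doc = minidom.parseString(BOOKS_XML)
drop_whitespace_nodes(doc.documentElement)

book = doc.documentElement.firstChild
title = text_of(book.firstChild)              # index 0: <title>
isbn = text_of(book.childNodes[2])            # index 2: <ISBN>
description = text_of(book.childNodes[4])     # index 4: <description>
print(title, isbn, description, sep="\n")
```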
This example relies on you knowing the structure of the document so that you can navigate it. However, if you know an element's name but not necessarily its position, you can use the getElementsByTagName() method instead. It can be called on the document or on any element, and it searches all of that node's descendants (not just its direct children) for elements with the given name. It returns a NodeList containing the matching elements, in the order in which they appear in the document.
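A short Python sketch of getElementsByTagName(), using the standard xml.dom.minidom module (not MSXML) on a shortened books.xml. Searching from the document finds every <author_name> at any depth, in document order, while searching from <authors> only looks below that element.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<books><book><title>Beginning ASP 3.0</title><authors>"
    "<author_name>Brian Francis</author_name>"
    "<author_name>Chris Ullman</author_name>"
    "<author_name>Dave Sussman</author_name>"
    "</authors></book></books>"
)

# Search the whole document: matches at any depth, in document order
names = [n.firstChild.data for n in doc.getElementsByTagName("author_name")]
print(names)

# Search below one element only: <title> is not a descendant of <authors>
authors = doc.getElementsByTagName("authors")[0]
print(authors.getElementsByTagName("title").length)
```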
OK, what we have done here is fine for displaying selected nodes of an XML document. However, for the DOM to be truly useful programmatically, we need not only to retrieve nodes but also to change the structure of a document. This is done using the Node object's methods.
Node Object Methods
To change the content of a loaded XML document through the XML DOM, we use the methods that the Node object exposes for editing a document.
For example, the cloneNode() method copies an existing node and creates a new Node object to hold it. Its Boolean argument controls whether all of the node's descendants are copied as well:
| Method | Description |
| cloneNode(recurse_children) | Creates a new Node object that is an exact copy of this node. If recurse_children is True, all descendant nodes are copied too; if it is False, only the node itself is copied (for an element, this includes its attributes but not its children). |
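The difference between a shallow and a deep clone is easy to see in a Python sketch with the standard xml.dom.minidom module (not MSXML). The id attribute is a hypothetical addition, so that the attribute-copying behavior is visible.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<books><book id="b1"><title>Professional XML</title></book></books>'
)
book = doc.documentElement.firstChild

shallow = book.cloneNode(False)   # <book> and its attributes only, no children
deep = book.cloneNode(True)       # <book> plus all descendants (<title> and its text)

print(shallow.toxml())
print(deep.toxml())
print(deep.parentNode)            # a fresh clone is not attached to the tree yet
```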
There are also four methods that allow us to add, replace, insert or remove existing nodes:
| Method | Description |
| appendChild(new_node) | Appends new_node to the end of the list of child nodes for this node |
| replaceChild(new_node, old_node) | Replaces the child node old_node with new_node, and returns old_node |
| insertBefore(new_node, this_node) | Inserts new_node into the list of child nodes for this node, before this_node, or at the end of the list if this_node is null |
| removeChild(this_node) | Removes the child node this_node from the list of child nodes for this node, and returns it |
Don't let the emphasis on child nodes confuse you. Because these methods work on the children of the node you call them on, you change a node by calling the method on its parent. This is not a real limitation, as you would not normally expect to alter the root element itself.
We can also check whether the current node has any children by calling the hasChildNodes() method, which returns True if the node has at least one child node.
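This Python sketch, using the standard xml.dom.minidom module rather than MSXML, applies each of these methods to a <book> element in turn. The <price> and <cost> elements are hypothetical, used only to demonstrate replacement; passing None as the reference node to insertBefore() appends to the end.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>Professional XML</title></book>")
book = doc.documentElement

isbn = book.appendChild(doc.createElement("ISBN"))                 # added at the end
quantity = book.insertBefore(doc.createElement("quantity"), isbn)  # placed before <ISBN>
book.insertBefore(doc.createElement("price"), None)                # no reference node: goes last

old = book.replaceChild(doc.createElement("cost"), book.lastChild)  # swap <price> for <cost>
removed = book.removeChild(quantity)                                # detach <quantity>

print([child.nodeName for child in book.childNodes])
print(old.nodeName, removed.nodeName)
print(book.hasChildNodes(), isbn.hasChildNodes())
```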
In the following Try It Out we will be using the DOM methods to create the following XML document:
<?xml version="1.0"?>
<books>
  <book>
    <title>Professional XML</title>
    <quantity>30</quantity>
    <ISBN>1-861003-11-0</ISBN>
  </book>
</books>
Try It Out – Creating an XML Document on the Server
1. Open your text editor and type in the following code.
<%@LANGUAGE="VBScript"%>
<%
Response.Buffer = False
Response.ContentType = "text/xml"
%>
<?xml version="1.0"?>
<%
Response.Write makeXML()
Function makeXML()
Dim objParser
Dim book
Set objParser = Server.CreateObject("Microsoft.XMLDOM")
' Build an XML document using the DOM.
' Create the root node
Set objParser.documentElement = objParser.createElement("books")
' Create the book element and child elements
Set book = objParser.createElement("book")
book.appendChild objParser.createElement("title")
book.appendChild objParser.createElement("quantity")
book.appendChild objParser.createElement("ISBN")
' Set the PCDATA values
book.childNodes(0).text = "Professional XML"
book.childNodes(1).text = "30"
book.childNodes(2).text = "1-861003-11-0"
' Append a clone to the document
objParser.documentElement.appendChild book.cloneNode(true)
makeXML = objParser.xml
End Function
%>
2. Save the file as CreateXML.asp and view it in your web browser.
How It Works
We create an ASP page called CreateXML.asp and set the language to VBScript:
<%@LANGUAGE="VBScript"%>
We will be sending the resulting XML document that we create in this example back to the client, so we set the ContentType property of the Response object to "text/xml" to ensure that the proper HTTP headers are sent back to the client:
<%
Response.Buffer = False
Response.ContentType = "text/xml"
%>
Because we have told the browser that we are sending it XML, we can now write the XML declaration. We do this outside the ASP delimiters, because it is static text that we want sent to the client as it is, not processed on the server:
<?xml version="1.0"?>
We then use Response.Write to send the string returned by the function makeXML() to the browser:
<%
Response.Write makeXML()
All of the functionality required to create the XML is held in the function makeXML(). We start by declaring two variables, objParser and book, and then set objParser to an instance of the DOM, as implemented by MSXML:
Function makeXML()
Dim objParser
Dim book
Set objParser = Server.CreateObject("Microsoft.XMLDOM")
We start building up the document by creating a root element in the DOM exposed by the parser, which is held by the variable objParser, using the createElement method. We call it books to form the root tag <books>:
' Build an XML document using the DOM.
' Create the root node
Set objParser.documentElement = objParser.createElement("books")
We can now continue to build up the document. We set the book variable we declared earlier to a new <book> element, created with the createElement() method of objParser, to hold the details about this book. After this we can simply create the three child elements and append them to the book element using the appendChild() method:
' Create the book element and child elements
Set book = objParser.createElement("book")
book.appendChild objParser.createElement("title")
book.appendChild objParser.createElement("quantity")
book.appendChild objParser.createElement("ISBN")
At this point, book holds three empty child elements to which we need to add values.
We add the values to the <title>, <quantity> and <ISBN> elements:
' Set the PCDATA values
book.childNodes(0).text = "Professional XML"
book.childNodes(1).text = "30"
book.childNodes(2).text = "1-861003-11-0"
Having set the values, we append a copy of the <book> element to the <books> root element held by objParser, using the appendChild() method. Calling cloneNode(true) makes a deep copy, so the <title>, <quantity> and <ISBN> children are copied along with it:
' Append a clone to the document
objParser.documentElement.appendChild book.cloneNode(true)
Finally we return the newly created document as a string, using the xml property of the parser held by objParser:
makeXML = objParser.xml
End Function
%>
As you can see from the result on your browser, we have created a proper XML document on the server.
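For comparison, here is a sketch of the same document built in Python with the standard xml.dom.minidom module. Assigning documentElement directly, and MSXML's text and xml properties, are MSXML extensions to the W3C DOM. In this version the root element is appended to the document instead, each value is added as a separate Text node with createTextNode(), and toxml()/toprettyxml() serialize the result. The make_xml name is our own.

```python
from xml.dom import minidom


def make_xml():
    """Build the same <books> document as CreateXML.asp, using DOM calls."""
    doc = minidom.Document()
    books = doc.appendChild(doc.createElement("books"))   # root element

    book = doc.createElement("book")
    for name, value in (("title", "Professional XML"),
                        ("quantity", "30"),
                        ("ISBN", "1-861003-11-0")):
        child = book.appendChild(doc.createElement(name))
        child.appendChild(doc.createTextNode(value))     # text is a separate Text node

    books.appendChild(book.cloneNode(True))              # append a deep copy, as the ASP page does
    return doc


doc = make_xml()
print('<?xml version="1.0"?>')
print(doc.documentElement.toprettyxml(indent="  "))
```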
Having looked at how to create an XML document using the DOM, let's now see how we can provide an interface for users to create XML documents over the web.