18.1.2- Examples of Markup Languages

Created by Brendan Doss.
Last Updated by Brendan Doss.  

Categorized as 18. An Introduction to XML.

Let's take a quick look at three markup languages, so that we can see where XML fits into the picture. We'll look at SGML, HTML and of course XML.

SGML

Standard Generalized Markup Language (SGML) is a markup language that is used to create other markup languages. The most famous language written in SGML is HTML, which we all know and love for its use on the Web. HTML is known as an application of SGML. The problem with SGML is that it is very complicated, hence our interest in XML. XML is a simplified version of SGML, retaining much of SGML's functionality, yet designed for use on the Web.

 

SGML became an international standard (ISO 8879) for defining markup languages back in 1986, before the Web was even conceived (in fact, its roots go back to the late 1960s, when its predecessor GML was developed at IBM). Its purpose was to describe markup languages by allowing the author to provide formal definitions for each of the elements and attributes in the language, thus allowing authors to create their own tags that related to their content. In effect, they could write their own markup language using SGML, which is exactly what happens when a new version of HTML is created: the World Wide Web Consortium (W3C) defines the new tags, and it is up to browser manufacturers to implement them.

 

As a language SGML is very powerful, but with its power comes complexity, and many of its features are rarely used. SGML was adopted in particular in industries where large amounts of documentation had to be marked up. It is very difficult to interpret an SGML document without the definition of the markup language, which is kept in a Document Type Definition (DTD). The DTD is where all the rules for the language are kept in SGML; after all, you cannot make up your own markup language without specifying how it should be used. The DTD has to be sent with, or included in, the SGML document so that the custom-created tags can be understood.
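To make the idea of a DTD travelling with its document concrete, here is a minimal sketch in Python. It uses XML rather than full SGML, since XML inherits its DTD syntax from SGML and Python's standard library can read it. The `memo` language, its element names, and the `describe` helper are all invented for illustration. Note that `xml.dom.minidom` reads the declarations but does not validate the document against them.

```python
from xml.dom import minidom

# Hypothetical "memo" language. The DTD sits inside the document itself
# (the part between [ and ]) and declares which elements exist and how
# they may nest.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Production team</to>
  <body>The manual is ready for review.</body>
</memo>
"""


def describe(text):
    """Return the declared document type name and the child element names used."""
    doc = minidom.parseString(text)
    root = doc.documentElement
    children = [
        node.tagName
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]
    return doc.doctype.name, children


doctype_name, children = describe(DOCUMENT)
print(doctype_name, children)
```

A validating parser would go one step further and reject a `memo` whose children did not match the declared `(to, body)` sequence.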

HTML

As we just saw, HTML was originally an SGML application. It describes how information is to be prepared for the World Wide Web. HTML is just a set of SGML rules, and as such it also has a DTD. In fact there are several DTDs: loose and strict variants, ones for different versions of HTML, and so on. Being far simpler than SGML, and a fraction of its size, HTML is very easy to learn, a factor that quickly made it popular and widely adopted by all sorts of people.

 

It is common knowledge that Tim Berners-Lee created HTML in 1991 as a way of marking up technical papers so that they could easily be organized and transferred across different platforms for the scientific community. This is not meant to be a history lesson, but it is important to understand the concepts behind HTML if we are to appreciate the power of XML. The idea was to create a set of tags that could be transferred between computers so that others could render the document in a useful format. For example:

 

<H1> This is a primary heading</H1>

<H2>This is a secondary heading</H2>

<PRE>This is text whose formatting should be preserved</PRE>

<P>The text between these two tags is a paragraph</P>

 

Back then the scientific community had little concern over the aesthetic appearance of their documents. What mattered to them was that they could transfer their documents and that the meaning would be preserved. They weren't worried about the color of their fonts or the exact size of their primary heading!
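As a rough illustration of how a receiving program can recognize such tags, the following Python sketch feeds two of the sample lines shown earlier to the standard library's `html.parser`. The `TagLister` class and the `list_tags` helper are names made up for this example. The parser reports tag names in lower case.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect (tag, text) pairs so the structure of the markup is visible."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        # Remember which element we are inside.
        self._current = tag

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.found.append((self._current, text))

    def handle_endtag(self, tag):
        self._current = None


SAMPLE = (
    "<H1>This is a primary heading</H1>"
    "<P>The text between these two tags is a paragraph</P>"
)


def list_tags(markup):
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.found


for tag, text in list_tags(SAMPLE):
    print(tag, "->", text)
```

The tags say only what role each piece of text plays (heading, paragraph). How a heading actually looks is left to whatever program renders it.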

HTML documents are transferred across the Internet using a protocol called HTTP (Hypertext Transfer Protocol). It is one of a number of protocols used on the Internet, which are collectively known as the Internet Protocol Suite.

 

What gives HTTP the edge over other protocols is the relative ease with which it can be used to retrieve another document. The combination of an easy-to-use protocol and a simple-to-learn language is an attractive proposition, and one that has ensured the rapid spread of systems implementing HTML and HTTP.
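To give a feel for how little is needed to ask for a document, this Python sketch builds the text of a simple HTTP GET request without sending it anywhere. The host `www.example.org`, the path `/index.html`, and the helper name `build_get_request` are placeholders chosen for illustration.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.0 GET request for one document."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, document, protocol version
        f"Host: {host}",         # which server the document is requested from
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


request = build_get_request("www.example.org", "/index.html")
print(request)
```

A client would write this text to a network connection to the server and read the document back as the reply.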

 

As HTML usage exploded and web browsers became readily available, non-scientific users soon started to create their own pages en masse. These users became increasingly concerned with the aesthetic presentation of their material. Browser manufacturers were all too ready to offer new tags that would allow web page authors to display their documents with more creativity than was possible using plain ASCII text. Netscape was the first, adding the familiar <FONT> tag, which allowed authors to change the appearance of the text, such as its size and color. This triggered a rapid expansion in the number of tags that browsers would support.

 

With the new tags, however, came new problems. Different browsers implemented the new tags inconsistently. Today we have sites that display signs saying that they are Best Viewed Through Netscape Navigator or are Designed For Internet Explorer. On top of all this, we now expect to be able to produce web pages that resemble documents created on the most sophisticated Desktop Publishing systems.

 

Meanwhile the browser's potential as a new application platform was quickly recognized, and web developers started creating distributed applications for businesses, using the Internet as a medium for information and financial transactions.

Drawbacks of HTML

While the widespread adoption of HTML propelled the rise in the number of people on the web, users wanted to do an ever-increasing variety of new and more complex things, and weaknesses with HTML became apparent:

 

  • HTML has a fixed tag set. You cannot create your own tags that can be interpreted by others.
  • HTML is a presentation technology. It doesn't carry information about the meaning of the content held within its tags.
  • HTML is "flat". You cannot specify the importance of tags, so a hierarchy of data cannot be represented.
  • Browsers are being used as an application platform. HTML does not provide the power needed to create advanced web applications, at least not at the level at which developers are currently aiming. For example, it does not readily offer the ability for advanced retrieval of information from documents marked up in HTML and it is not easy to process the data within the document, because the text is only marked up for display.
  • High traffic volumes. HTML documents that are used as applications clog up the Internet with high volumes of client-server traffic, for example when large general sets of data are sent across a network although only small amounts are required.
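The difference between markup for display and markup for meaning can be sketched in a few lines of Python. The product data, the element names (`order`, `item`, `name`, `price`), and the helper functions are all invented for this example. The HTML version yields only presentation tags, while the XML version can be queried by what the data is and where it sits in the hierarchy.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The same facts, marked up for display (HTML) and by meaning (XML).
HTML_VERSION = "<P><B>Widget</B> <I>9.99</I></P>"
XML_VERSION = """
<order>
  <item>
    <name>Widget</name>
    <price currency="USD">9.99</price>
  </item>
</order>
"""


class TagCollector(HTMLParser):
    """Record the names of the tags encountered."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_price(markup):
    """Find the price by following the hierarchy order -> item -> price."""
    root = ET.fromstring(markup)
    price = root.find("./item/price")
    return float(price.text), price.get("currency")


print(html_tags(HTML_VERSION))  # paragraph, bold, italic: nothing says "price"
print(xml_price(XML_VERSION))
```

With the HTML version a program has to guess that the italic text is a price. With the XML version it can simply ask for it.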

While HTML has proven a very useful way of marking up documents for display in a web browser, a document marked up in HTML tells us very little about its actual content. For most documents to be useful in a business environment, there is a need to know about the document's content. When a document contains content details, then it is possible to perform generalized processing and retrieval on that file. This means that it is no longer suitable for just one purpose – rather than just being used for display on the web, it can also be used as part of an application. Marking up data in a way that tells us about its content makes it self-describing. This means that the data can be re-used in different situations. SGML made this possible, but it is now also possible with XML – which is far simpler and rapidly gaining in popularity.
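As a small sketch of that re-use, the Python below takes one self-describing document and uses it twice: once to produce an HTML list for display and once to compute a total. The catalog, its titles and prices, and the function names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog whose tags describe the content, not its appearance.
CATALOG = """<catalog>
  <book><title>Learning Markup</title><price>20.00</price></book>
  <book><title>Data on the Web</title><price>15.50</price></book>
</catalog>"""


def as_html_list(root):
    """Use 1: render the titles for display in a browser."""
    items = "".join(f"<LI>{book.findtext('title')}</LI>" for book in root.iter("book"))
    return f"<UL>{items}</UL>"


def total_price(root):
    """Use 2: process the same data as part of an application."""
    return sum(float(book.findtext("price")) for book in root.iter("book"))


root = ET.fromstring(CATALOG)
print(as_html_list(root))
print(total_price(root))
```

Neither function needed the data to be prepared specially for it, which is what makes the document useful for more than one purpose.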

How XML Came About

The major players in the browser market made it clear that they had no intention of fully supporting SGML. Furthermore, its complexity prevented many people from learning it. So, moves were made to create a simplified version for use on the Web, signaling a return to documents being marked up according to their content. In the same way that HTML was designed for technical papers, with tags such as the heading and paragraph tags, there was a move to allow people greater flexibility. They wanted to create their own tags and markup languages so that they could mark up whatever they wanted, however they wanted, with the intention of making it self-describing.

 

The W3C saw the worth of creating a simplified version of SGML for use on the Web, and agreed to sponsor the project. When SGML was put under the knife, several of the non-essential parts were cut, thus molding it into a new language called XML. This lean alternative is far more accessible, its specification running to around a fifth of the size of the specification that defined SGML.

Copyright © 2003 by Wiley Publishing, Inc.