Markup Languages
by Quinn Stewart
In the beginning-
Although HTML is the best known application of markup languages, the
concept is nothing new. The development of the movable-type printing press
created new industries in printing and publishing, and brought with
it the introduction of "markup" tags or languages. These
were originally marks written on manuscripts and drafts giving instructions
to the printer for styles and sizes of font to use, instructions for
italic or boldface type, or other instructions on how the finished document
should appear. This is the basic idea behind markup languages: a
set of instructions that indicate how a final document will look.
As computers and computer-based text processing have increased in importance
and scope, markup languages have kept pace. Word-processing programs
use hidden markup languages; the keystrokes or toolbar commands for
boldface text or different type sizes operate in much the same way as
old-fashioned printers' marks, even though they are not visible to the user.
These kinds of markup tags became known as "procedural markup,"
a term still in use. Web-based documents, however, have provided
the greatest exposure for markup languages, and brought them into the
sphere of public knowledge.
GML and SGML
Beginning in the 1960s, a group of programmers at IBM conceived of
a standardized method of creating markup tags for computers, a common
standard that could accommodate different types of documents, created
on different types of computer platforms (a significant problem until
very recently). Their result was the Generalized Markup
Language (GML). GML was initially focused on legal documents,
but over time evolved into the Standard Generalized Markup Language
(SGML).
SGML is "a set of rules for defining and expressing the logical
structure of documents thereby enabling software products to control
the searching, retrieval, and structured display of those documents"
(http://lcweb.loc.gov/ead/eadback.html),
and is the foundation of much of what later became the World-Wide Web.
SGML is intended to focus on document structure rather than simply appearance,
which gives it the potential for future expansion as computer-based
communications continue to evolve. It was officially approved by the
International Organization for Standardization (ISO) in 1986, and remains
an accepted standard. However, SGML is intended as a basis for creating
languages rather than as a means of Web publishing. An apt analogy is:
"Think about building a model airplane. If that airplane were a
document, a markup language would be used to put it together. SGML would
be used as the basis for the assembly instructions, not the assembly
process itself" (Navarro
et al., http://corpitk.earthweb.com/reference/pro/0782122663/ch01/01-04.html).
Hypertext Markup Language (HTML) is a greatly simplified application of
SGML, intended specifically for the Web. When the World-Wide Web
was first created, many different computer platforms
and document encoding standards existed. Web creators recognized
the need for a common descriptive language to bridge these differences.
HTML was developed from SGML to facilitate the use of the "hypertext"
environment, where links can be made from one document to another without
the need to navigate any hierarchical organization of documents.
HTML was deliberately envisioned to be simple, to allow its widespread
use. Although HTML was originally intended for use in UNIX environments,
it lent itself well to the newly created Mosaic graphical interface
"browser," and soon became the standard for the Web.
As the Web has grown in size and sophistication, however, the limitations
of HTML are becoming apparent. To address these limitations and
to return to the structural focus of the original SGML standard, the
Extensible Markup Language (XML) is being developed.
XML differs from HTML in at least one fundamental way. With HTML, all
the tags used in a document must be introduced to and approved by the
World-Wide Web Consortium, or W3C. In the early days of HTML,
the main focus was on publishing documents on the Web. As the Web evolved,
developers wanted more control over the layout and format of documents,
and there have been at least three major revisions of HTML since 1992. In
their quest to fulfill the needs of users and gain market share, both
Netscape and Microsoft introduced tags specific to their HTML browsers,
in the hope that these "improvements" would then be adopted
by the W3C. This has led to an increasingly bloated set of tags for
HTML, as well as ongoing browser incompatibilities.
XML returns to a more SGML-like approach. Rather than fixing what
each tag and attribute means, as HTML does, XML uses tags to describe the
structure of the document, not its display characteristics, and leaves
their interpretation to the application that reads the document. For example,
the <b> tag in HTML indicates that the enclosed text should
be rendered in bold. In XML, depending upon the defined context, <b>
may mean the enclosed text is broken, brown, bent or borrowed, or anything
else the developer defines for the application that processes it. The definition and
use of the tags are left up to the developer. This makes XML extensible
to far more uses than HTML, since its tags can be used for many
more applications than just Web publishing. (For more information, see
the W3C's "XML in 10 Points.")
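The point that an XML tag means whatever the developer defines can be illustrated with a short Python sketch. The inventory document, its element names, and the reading of <b> as "borrowed" are all hypothetical, invented here for illustration; only the standard-library xml.etree.ElementTree parser is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: in this vocabulary <b> is assumed to mean
# "borrowed", not "bold". Nothing in XML itself assigns that meaning.
INVENTORY_XML = """<inventory>
  <item><name>Ladder</name><b>yes</b></item>
  <item><name>Drill</name><b>no</b></item>
</inventory>"""


def borrowed_items(xml_text):
    """Return the names of items whose <b> (borrowed) element is 'yes'."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("name")
        for item in root.findall("item")
        if item.findtext("b") == "yes"
    ]


print(borrowed_items(INVENTORY_XML))
```

The parser only recovers the structure of the document. The decision that <b> marks a borrowed item lives entirely in the application code, which is the division of labor described above.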
Transitioning from HTML to XML-
Although it is inevitable that XML will replace HTML eventually, the transition
will take time. In the interim, Extensible HyperText Markup Language (XHTML)
will provide the "bridge" between them. HTML will not disappear,
but the World-Wide Web Consortium (W3C) will no longer update it. XHTML is
based on XML, and permits site
designers to easily add new tags and extensions to their languages. Additionally,
XHTML is intended to facilitate Web access by nontraditional agents, such
as personal digital assistants. Since the transition from HTML to XML
will evolve over time and must accommodate platforms and interfaces not
yet in existence, the W3C expects to update XHTML for some time. As a
hybrid, XHTML will be compatible both with HTML-based Web pages and those
created in XML. This will allow designers to begin creating sites which
are XML-based, but still compatible with the current generation of Web
browsers.
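Because XHTML is based on XML, a well-formed XHTML page can be read by a generic XML parser, while loosely written HTML cannot. The following Python sketch is one way to see this, using the standard-library xml.etree.ElementTree module. The sample page and the helper names is_well_formed and link_targets are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace URI that XHTML documents declare on <html>.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page: every element is closed, including <br />.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Markup Languages</title></head>
  <body>
    <p>See the <a href="http://lcweb.loc.gov/ead/">EAD site</a>.</p>
    <br />
  </body>
</html>"""


def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def link_targets(xhtml_text):
    """Collect the href of every <a> element in an XHTML document."""
    root = ET.fromstring(xhtml_text)
    return [a.get("href") for a in root.iter("{" + XHTML_NS + "}a")]


print(is_well_formed(PAGE))                  # XHTML parses as XML
print(is_well_formed("<p>one<br>two</p>"))   # unclosed <br> is rejected
print(link_targets(PAGE))
```

The second check fails because HTML habits such as an unclosed <br> are not well-formed XML, which is the kind of discipline XHTML asks of page authors.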
Synchronized Multimedia Integration Language (SMIL) is an XML markup
language created to allow independent multimedia objects to be synchronized
into a multimedia presentation. These objects can be audio, video, animation,
images, text, etc. The development of SMIL by the W3C is an interesting
story, and a microcosm of how many W3C specifications have
developed. When the SMIL 1.0 specification was under development in
1997, Microsoft was one of the members of the working group developing
the specification, along with RealNetworks, Macromedia, and Apple. Then
as now, these companies were the major competitors
for distributing audio and video content on the Web. By the time the
first specification was released in 1998, Macromedia and Microsoft had
discontinued their participation in favor of their own proprietary formats.
However, once RealNetworks and Apple implemented the SMIL specification
in their products, the world-wide development community rapidly began
to adopt the specification. Realizing the error of their ways, Macromedia
and Microsoft returned to collaborate with the W3C working group.
Why all the fuss? SMIL was designed to be a simple text-based markup
language similar to HTML. It has two major tags, <par> and <seq>,
which stand for parallel and sequential: a <par> tag plays its
multimedia objects in parallel, and a <seq> tag plays
them in sequence. This simple language can create elaborate multimedia
presentations with minimal programming, and because it is based on XML,
different vendors can extend the language to fit their needs.
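How <par> and <seq> combine can be sketched in a few lines of Python. This is a deliberately simplified model, not a SMIL player: it assumes every media element carries a dur attribute written in whole seconds (such as "3s"), ignores the rest of the SMIL timing model, and uses made-up file names.

```python
import xml.etree.ElementTree as ET

# Hypothetical presentation: a title image, then a video with background
# audio playing together, then a credits image.
PRESENTATION = """<smil>
  <body>
    <seq>
      <img src="title.gif" dur="3s"/>
      <par>
        <video src="talk.mpg" dur="10s"/>
        <audio src="music.wav" dur="8s"/>
      </par>
      <img src="credits.gif" dur="2s"/>
    </seq>
  </body>
</smil>"""


def schedule(element, start=0.0, timeline=None):
    """Walk a <seq>/<par> tree and return (end_time, timeline).

    timeline is a list of (src, start, end) tuples in document order.
    """
    if timeline is None:
        timeline = []
    if element.tag == "seq":
        # Children play one after another: each starts when the last ends.
        current = start
        for child in element:
            current, _ = schedule(child, current, timeline)
        return current, timeline
    if element.tag == "par":
        # Children start together; the group ends with its longest child.
        end = start
        for child in element:
            child_end, _ = schedule(child, start, timeline)
            end = max(end, child_end)
        return end, timeline
    # Media object: assumed to have a dur attribute such as "3s".
    duration = float(element.get("dur").rstrip("s"))
    timeline.append((element.get("src"), start, start + duration))
    return start + duration, timeline


root_seq = ET.fromstring(PRESENTATION).find("body").find("seq")
total, timeline = schedule(root_seq)
for src, begin, end in timeline:
    print(src, begin, end)
print("total length:", total)
```

In this sketch the video and audio both start at 3 seconds because they share a <par>, and the credits wait for the longer of the two to finish, so the whole presentation runs 15 seconds.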
Multimedia is big business, especially interactive multimedia. As technologies
improve and bandwidth increases, SMIL could easily evolve into a multimedia-on-demand
system, with the ability to combine many different multimedia objects
around the world into interactive presentations. Once RealNetworks and
Apple demonstrated the power of the technology, Microsoft and Macromedia
realized that while they could retain their proprietary technologies,
the global community might not choose to use them, and instead embrace
the work of the W3C.
As of late 2000, both RealNetworks' RealPlayer and Apple's QuickTime Player
are SMIL-compliant, and Microsoft has begun implementing the SMIL 2.0
recommendations in Internet Explorer 5.5.
Encoded Archival Description (EAD) is a markup language developed for
the library and information science discipline, and for archives in particular.
It is based on SGML, which provides both flexibility in document description
and an open-ended standard that can accommodate both technological advances
and the specific needs of archivists. It has not yet been formally adopted
as the definitive standard for archival description, but is expected
to be in the near future. (For more information, see the Library of
Congress Website: http://lcweb.loc.gov/ead/.)
Sources:
Goldfarb, Charles F. (1996). The Roots of SGML -- A Personal Recollection.
http://www.sgmlsource.com/history/roots.htm
The Library of Congress. Encoded Archival Description Official Web
Site. http://lcweb.loc.gov/ead/
Navarro, Ann, White, Chuck & Burman, Linda. (1999). Mastering
XML. Sybex, Inc. http://corpitk.earthweb.com/reference/pro/0782122663/
Richmond, Alan. (2000). Introduction to XHTML, with eXamples. http://wdvl.com/Authoring/Languages/XML/XHTML/
Society of American Archivists. EAD Help Pages. http://jefferson.village.virginia.edu/ead/
The World-Wide Web Consortium. http://www.w3c.org