Anatomy of a Topic Map

Kam McEvoy
kamcevoy@mail.utexas.edu
LIS385T
October 21, 2002



Table of Contents:

1.0 Introduction
2.0 Topic Maps and their Roots
3.0 Topic Map Elements
3.1 Topics
3.2 Occurrences
3.3 Associations
3.4 Scope, Merging, and Published Subjects: The Problem of Context
4.0 Topic Map Standards
5.0 Practical Applications of Topic Maps
6.0 XML Syntax: Some sample code
7.0 Topic Map Implementation
8.0 Current Issues with Topic Map Standards
8.1 RDF and its Relation to XTM and Topic Maps
9.0 Conclusion
10.0 Example Topic Maps
References


1.0 Introduction

Navigating through massive amounts of unorganized digital information residing in dissimilar formats can be a nightmare for users. Results from search engines often lack the depth or context necessary to satisfy information needs. Topic maps respond to these search limitations by adding dimensionality to the information landscape. Touted as the "GPS" of the information universe, topic maps are a standardized way to represent structured information as: 1) a set of resources grouped around topics and 2) relationships between those topics (Ahmed, 2002, para. 1). This paper will examine topic maps and their origins, the elements that make up a topic map, and the standards that shape them. It will also discuss their application and viability within real-world environments, briefly review XTM syntax, and look at some challenges that face the topic map standards community.

2.0 Topic Maps and their Roots

Topic maps have roots both in the traditional navigational tools of the print world and in the semantic networks, based on symbolic logic, that were designed for artificial intelligence. Eric Freese posits that topic maps are "a powerful way to manage link information, much as glossaries, cross references, thesauri, and catalogs do in the paper world" (2000, sec. Topic Maps - an Introduction). Like traditional tools, the topic map can link important concepts together "independently of what is said about them in the information being indexed" (Garshol, 2002a, para. 4). The topic map, like the index, is a separate entity that works outside of the content it indexes. The topic map has an advantage over its print counterparts in that it can draw information about those key concepts from dissimilar digital formats, such as databases and documents. Topic maps are thus "used to organize information in a way that can be optimized for navigation" (Freese, 2000, sec. Topic Maps - an Introduction) in voluminous and disorganized environments.

In the early 1990s, the Davenport Group, which was trying to enable the exchange of different types of computer documentation, recognized that indexes "conform to models of the structure of the knowledge available in the materials that they index. But the models are implicit, and they are nowhere to be found!" (Pepper, 2000, sec. TAO of Topic Maps). If the group could reify these models, that is, make them explicit, it could merge indexes more easily.

Topic maps emerged from this idea of formalized models, consisting of three main elements: 1) topics, 2) associations between those topics, and 3) occurrences. Secondary concepts include scope, merging, and published subjects, which will be discussed later in this paper.

3.0 Topic Map Elements

3.1 Topics

In the XML Topic Maps (XTM) 1.0 specification (discussed further in the Standards section), a topic is described as "A resource that acts as a proxy for some subject; the topic map system's representation of that subject" (TopicMaps.org, 2001, sec. 1.3). In practice, the terms subject and topic are used almost interchangeably, but formally, the topic is a reification of the subject. Topics are assigned names: one is a base name, and the others are variants, much like the entry terms in a thesaurus. For example, "LIS 385T" might be a base name for the subject of this course, while "Information Architecture and Design" and "LIS 385T: Information Architecture and Design" might be variant names describing the same topic.
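As a rough sketch of how a base name and a variant name might be written in XTM 1.0 syntax, consider the fragment below. The topic id lis385t is hypothetical, the xlink prefix is assumed to be declared on the enclosing topicMap element, and the variant is marked for display using what is assumed to be the "display" published subject from the XTM 1.0 core topic map.

```xml
<!-- Hypothetical topic for the course; the id is invented for illustration -->
<topic id="lis385t">
  <baseName>
    <!-- The base name of the topic -->
    <baseNameString>LIS 385T</baseNameString>
    <!-- A variant of that base name, intended for display -->
    <variant>
      <parameters>
        <subjectIndicatorRef
            xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#display"/>
      </parameters>
      <variantName>
        <resourceData>Information Architecture and Design</resourceData>
      </variantName>
    </variant>
  </baseName>
</topic>
```

In XTM, a variant is always nested inside the base name it is a variant of, and its parameters indicate the context (such as display or sorting) in which it should be used.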

Topics can be grouped into types, which are topics themselves. For instance, LIS 385T is a type of course offered by the GSLIS program. While "LIS 385T" is the topic at hand, "Courses offered by the GSLIS program" is also considered a topic/subject.

3.2 Occurrences

An occurrence is an information resource that pertains to the subject/topic at hand. For example, a cookbook index has a topic, "pumpkin," and a list of occurrences: pages 237, 238, and 429. In the IA course example, the topic "LIS 385T" could have a list of occurrences including links to the course description web page, the course's web site, and the digital list of curriculum offered by the GSLIS program. Sometimes, occurrences can be topics in their own right. Pepper (2000) uses the example of a digital picture of William Shakespeare, which would be an occurrence of the subject William Shakespeare. The picture might also be a famous work studied by art students, or a good example of digital graphics, which would make it a subject in those contexts.

3.3 Associations

An association asserts any kind of relationship between one or more topics. These are the links that in some ways mimic semantic networks. Pepper explains that "one of the ground-breaking aspects of topic maps…was the use of independent (or out-of-line) linking and addressing mechanisms. This frees the index from the resource it indexes and made it possible to create indexes for resources to which the indexer does not have write access." (Pepper, 2000, sec. The BUTs of TAO). Associations can also be grouped by type, and these types are defined in terms of topics as well. For instance, a topic map can retrieve all relationships of the type "is taught by", yielding the professor and course topics, or any other topics, linked by that association type. Associations have no inherent direction: "Don Turnbull teaches LIS 385T" and "LIS 385T is taught by Don Turnbull" describe the same association. "The question of how to label a relationship is one of naming, not of direction" (TopicMaps.org, 2001, sec. 2.2.4).

3.4 Scope, Merging, and Published Subjects: The Problem of Context

Scope specifies the context in which a topic characteristic (a name, an occurrence, or a role played in an association) is valid. It can be used not only to clarify topics, but also to help users navigate "by dynamically altering the view on a topic map based on the user profile and the way in which the map is used" (Pepper, 2000, sec. Scope). Each characteristic has a scope, either implicitly (the unconstrained scope) or explicitly (as a set of topics) (TopicMaps.org, 2001).
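As a minimal sketch of explicit scope in XTM 1.0 syntax, the fragment below gives one topic two base names, each valid in a different language context. The topic ids pumpkin, english, and spanish are hypothetical; the two language topics are assumed to be defined elsewhere in the same topic map, and the xlink prefix is assumed to be declared on the enclosing topicMap element.

```xml
<!-- Hypothetical topic with two names, each scoped by a language topic -->
<topic id="pumpkin">
  <baseName>
    <!-- This name is valid in the scope "english" -->
    <scope>
      <topicRef xlink:href="#english"/>
    </scope>
    <baseNameString>Pumpkin</baseNameString>
  </baseName>
  <baseName>
    <!-- This name is valid in the scope "spanish" -->
    <scope>
      <topicRef xlink:href="#spanish"/>
    </scope>
    <baseNameString>Calabaza</baseNameString>
  </baseName>
</topic>
```

An application could then display whichever name matches the user's language. A name with no scope element at all is in the unconstrained scope.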

Any number of subject identifiers, in the form of URIs, can be assigned to a topic, which allows "subjects to be uniquely identified across topic maps and the entire web" (Garshol, 2002a, sec. But Wait, There's More). Using these identifiers, topics that represent the same subject in different places can be merged to pool resources for the user. With this merging rule, the user can perform one-stop searching. For instance, if topics named in Spanish, Portuguese, and English all carry the same subject identifier for South America, they can be merged, so that a user searching for "South America" also gets the hits, or occurrences, attached to the Spanish and Portuguese topics. This spares the user from having to type the Spanish and Portuguese equivalents of "South America," cutting three searches down to one. OASIS has created three teams devoted to creating a framework for common subject indicators, or "published subjects." These are "URIs and descriptions for concepts considered important by some publisher" (ISO SC34, 2002, sec. Meanwhile, at OASIS…).
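The merging idea can be sketched in XTM 1.0 syntax as follows. The two topics are imagined to come from two separate topic maps; the topic ids and the subject indicator URI (under example.org) are invented for illustration, and the xlink prefix is assumed to be declared on each enclosing topicMap element.

```xml
<!-- From a hypothetical English-language topic map -->
<topic id="south-america">
  <subjectIdentity>
    <subjectIndicatorRef
        xlink:href="http://psi.example.org/geo/south-america"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>South America</baseNameString>
  </baseName>
</topic>

<!-- From a hypothetical Spanish-language topic map -->
<topic id="america-del-sur">
  <subjectIdentity>
    <subjectIndicatorRef
        xlink:href="http://psi.example.org/geo/south-america"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>América del Sur</baseNameString>
  </baseName>
</topic>
```

Because both topics point to the same subject indicator, a topic map processor that merges the two maps would treat them as a single topic carrying both names and the occurrences of each.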

4.0 Topic Map Standards

Two standard topic map syntaxes currently exist: HyTime Topic Maps (HyTM), defined in the ISO 13250:2000 standard, and the more recent XML Topic Maps (XTM), developed by TopicMaps.org and released in 2001. One of the problems facing topic maps is the incongruence of syntax between HyTM and XTM. The scope element, for example, is treated differently and nested differently within the markup. There also needs to be some correlation between HyTM's display names and sort names and XTM's variant names (ISO SC34, 2002, sec. The Present). In addition, a technical report by Ontopia specifies the Linear Topic Map notation (LTM), which is intended to be easier to read and write than XML. Its goal is to fill the "need for a simple textual format that can be used to concisely and clearly express topic map constructs in emails, discussions and similar contexts" (Garshol, 2002b, sec. 1, para. 2).

5.0 Practical Applications of Topic Maps

The lack of interoperability among B2B vocabularies in XML makes topic maps a welcome invention in the business environment. Pepper notes that "the close similarity to semantic nets gives an idea of how topic maps, even without any occurrences connecting them to an information pool, can become valuable resources in their own right. This in turn opens up new business opportunities for creating and selling 'portable topic maps' that can be overlaid on multiple information pools." (2000, sec. Conclusion). Also, as copyrighted information becomes cheaper to come by, the publishing world can leverage its expertise by offering topic map services. On the flip side, the supporting infrastructure must still be created, and naming conventions will always involve some ambiguity.

6.0 XML Syntax: Some sample code

This fragment of code defines four topic types: person, deliverable, course, and website:

<topic id="person">
  <baseName>
    <baseNameString>Person</baseNameString>
  </baseName>
</topic>

<topic id="deliverable">
  <baseName>
    <baseNameString>Deliverable</baseNameString>
  </baseName>
</topic>

<topic id="course">
  <baseName>
    <baseNameString>Course</baseNameString>
  </baseName>
</topic>

<topic id="website">
  <baseName>
    <baseNameString>Website</baseNameString>
  </baseName>
</topic>

Next, we would populate the topic types with topic instances, such as Don Turnbull, Research Topic Paper, and LIS 385T: Information Architecture, and specify an occurrence for the course:

<topic id="don-turnbull">
  <instanceOf>
    <topicRef xlink:href="#person"/>
  </instanceOf>
  <baseName>
    <baseNameString>Don Turnbull</baseNameString>
  </baseName>
</topic>

<topic id="research-topic-paper">
  <instanceOf>
    <topicRef xlink:href="#deliverable"/>
  </instanceOf>
  <baseName>
    <baseNameString>Research Topic Paper</baseNameString>
  </baseName>
</topic>

<topic id="lis385t-info-architecture">
  <instanceOf>
    <topicRef xlink:href="#course"/>
  </instanceOf>
  <baseName>
    <baseNameString>LIS 385T: Information Architecture</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
      <topicRef xlink:href="#website"/>
    </instanceOf>
    <resourceRef xlink:href="http://www.gslis.utexas.edu/~l385tdt/"/>
  </occurrence>
</topic>

An example of an association is below, where the LIS 385T course is linked to Don Turnbull. The association type (taught-by) and the professor role are themselves topics, assumed here to be defined elsewhere in the topic map:

<association id="turnbull-teaches-class">
  <instanceOf>
    <topicRef xlink:href="#taught-by"/>
  </instanceOf>
  <member>
    <roleSpec>
      <topicRef xlink:href="#professor"/>
    </roleSpec>
    <topicRef xlink:href="#don-turnbull"/>
  </member>
  <member>
    <roleSpec>
      <topicRef xlink:href="#course"/>
    </roleSpec>
    <topicRef xlink:href="#lis385t-info-architecture"/>
  </member>
</association>

7.0 Topic Map Implementation

Topic maps can be generated manually or automatically. While maps created manually by information specialists are richer and more accurate, budgetary and time constraints might encourage automation. Well-structured existing source data, such as XML documents or the output of other applications, can serve as the basis for automatic generation. Although a topic map author needs nothing more than a text editor, specialized software for topic map editing and automatic generation exists, with products from topicmap.com and ontopia.com. Omnigator is a free topic map browser, and the source for topic map engines like TM4J, Perl XTM, and tmproc is available for viewing. Most topic map applications use a topic map engine to import and export XTM and to store, update, and query topic maps. "For example, applications that implement a topic map-driven portal will sit on top of the engine and use it to access the topic map" (Garshol, 2002a, sec. How to Use Topic Maps).

8.0 Current Issues with Topic Map Standards

ISO SC34 is in the process of creating two new topic map standards: 1) ISO 18048: Topic Maps Query Language (TMQL), a SQL-like query language designed to "greatly simplify topic map application development by making it much easier to extract information from topic maps" (ISO SC34, 2002), and 2) ISO 19756: Topic Maps Constraint Language (TMCL), which will set constraints on the objects and relations within a topic map, such as "every invoice number must be related to one customer number."

ISO SC34 intends to break ISO 13250 into a multi-part standard and add the Standard Application Model (SAM), which will act as a formal data model for topic maps. "The SAM is what will allow SC34 to solve the problems with the interpretations of the specifications, relate HyTM and XTM to one another, and create a foundation for TMQL and TMCL" (ISO SC34, 2002).

The new ISO 13250 will also include a Reference Model, an abstract graph model of topic maps that shows how the knowledge representations of XTM, the Resource Description Framework (RDF), and the Knowledge Interchange Format (KIF) relate to one another. "In this model, names and occurrence resources turn into nodes on the same level as topics, and they are related to their topics using an association-like structure of nodes and arcs" (ISO SC34, 2002).

8.1 RDF and its Relation to XTM and Topic Maps

RDF and XTM share similar structures and semantics but have evolved from very different communities that see a sharp distinction between the two. XTM features base names, occurrences, and associations as well as scope and merging, features that RDF lacks. Martin Lacher and Stefan Decker argue that because the two data models share the central goal of defining an interchangeable format for Web-based knowledge exchange, the two paradigms should be integrated. This could be accomplished by representing Topic Map information as RDF information, thus allowing Topic Map information to be queried by an RDF-aware infrastructure (2001, para. 1).

9.0 Conclusion

Assuming that topic map browsers incorporate information architecture principles into their design, accurate and well-integrated topic maps have the potential to profoundly alter search strategy and facilitate information retrieval. But even with a GPS for the information universe, it might take longer than expected to reach an information destination. Well-designed information still begets more information, and topic maps may pave the way for an increasing amount of dynamically created information, which, if not properly labeled and organized, may lead to more traffic.

10.0 Example Topic Maps

Techquila's Topic Map World

Publicly Available Topic Maps

The V Topic Map Browser

 

References


Ahmed, Kal. (2002, May 21). Topic Maps - A Practical Introduction with Case Studies. Paper presented at the XML Europe 2002 Conference. Retrieved October 14, 2002 from http://62.231.133.220/idea-eks-nav/papers/03-05-01/03-05-01.html

de Grauw, Marc. (2002, August 21). Business Maps: Topic Maps go B2B. xml.com. Retrieved October 14, 2002 from http://www.xml.com/pub/a/2002/08/21/topicmapb2b.html

Freese, Eric. (2000). Using Topic Maps for the representation, management and discovery of knowledge. Paper presented at the XML Europe 2000 Conference. Retrieved October 10, 2002 from http://www.gca.org/papers/xmleurope2000/papers/s22-01.html

Garshol, Lars M. (2002a, September 11). What are Topic Maps? xml.com. Retrieved October 17, 2002 from http://www.xml.com/pub/a/2002/09/11/topicmaps.html

Garshol, Lars M. (2002b, May 15). Linear Topic Map Notation: Definition and introduction, version 1.2. Retrieved October 20, 2002 from http://www.ontopia.net/download/ltm.html

ISO/IEC JTC 1/SC34. (2002, June 25). Guide to the topic map standards. Retrieved October 14, 2002 from http://www.y12.doe.gov/sgml/sc34/document/0323.htm

Lacher, Martin and Decker, Stefan. (2001). On the Integration of Topic Maps and RDF Data. Retrieved October 14, 2002 from http://www.semanticweb.org/SWWS/program/full/paper53.pdf

Pepper, Steve. (2000, July). The TAO of Topic Maps: Finding the Way in the Age of Infoglut. Retrieved October 10, 2002 from http://www.ontopia.net/topicmaps/materials/tao.html

TopicMaps.org. (2001). XML Topic Maps (XTM) 1.0. Retrieved October 14, 2002 from http://www.topicmaps.org/xtm/1.0/