Graduate School of Library and Information Science, UT Austin
Information Technologies and the Information Professions

Overview of Metadata
R. E. Wyllys

Introduction

This lesson discusses "metadata." As a word, "metadata" is a combination. One component is "data," the plural of "datum." The Merriam-Webster Collegiate Dictionary Online (MWCDO) provides three meanings for "data":

1 : factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation. . . .
2 : information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
3 : information in numerical form that can be digitally transmitted or processed

The other component is "meta," which is used here in the sense that the MWCDO describes as follows:

3 [metaphysics] : more comprehensive : transcending <metapsychology> — used with the name of a discipline to designate a new but related discipline designed to deal critically with the original one <metamathematics>

Thus "metadata" means "data that deal with other data," or "data that deal with original data," or, more casually and briefly, "data about data."

Use of "Metadata" in Library and Information Science

Within the library- and information-science (LIS) community, the most frequent use of "metadata" is to refer to data produced in the process of cataloging materials in libraries and other information agencies. Cataloging data are, by their very nature, data about other things, such as books and other information-bearing entities (InBEs).

A less frequent but still important use of "metadata" in LIS is to refer to those parts of the structure of a relational database that describe the contents of the various tables (files) and columns (fields) that make up the database. For example, one might describe a certain column in a certain table in a database as: "This column specifies the employee's Social Security Number; it contains 9 bytes; the bytes must be numeric; and any row in the table that lacks data in this column is not a valid row in the table." These statements are metadata concerning the data that are stored in the database by using that table and that column.
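
The column description above can be sketched in code. The following is a minimal illustration, not a real database schema: the names `ColumnMetadata`, `is_valid_value`, and `ssn_column` are hypothetical, and the three rules (9 bytes, numeric, required) are taken directly from the example statement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMetadata:
    """Describes a column; this is data about the data stored in it."""
    name: str
    description: str
    length: int          # exact number of bytes (characters) required
    numeric_only: bool   # True if every byte must be a digit
    required: bool       # True if a row lacking this value is invalid


def is_valid_value(meta: ColumnMetadata, value: Optional[str]) -> bool:
    """Check one stored value against the column's metadata."""
    if value is None or value == "":
        return not meta.required
    if len(value) != meta.length:
        return False
    if meta.numeric_only and not (value.isascii() and value.isdigit()):
        return False
    return True


# Hypothetical column, mirroring the Social Security Number example.
ssn_column = ColumnMetadata(
    name="ssn",
    description="Employee's Social Security Number",
    length=9,
    numeric_only=True,
    required=True,
)

print(is_valid_value(ssn_column, "123456789"))  # True
print(is_valid_value(ssn_column, "12345678"))   # False: only 8 bytes
print(is_valid_value(ssn_column, None))         # False: column is required
```

Note that `ssn_column` holds no employee's number at all; it only states what any such number must look like, which is what makes it metadata.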

Other uses of metadata in our field clearly exist because, in a general sense, any statement about the nature of an item or items in a collection of InBEs can be viewed as a metadata statement. For example, the Website of UT-Austin's Nettie Lee Benson Collection begins with the following description of the collection; the description is a metadata statement:

The Nettie Lee Benson Latin American Collection, a unit of the General Libraries at the University of Texas at Austin, is a specialized research library focusing on materials from and about Latin America, and on materials relating to Spanish-speaking peoples in the United States. Latin America is here defined to include Mexico, Central America, the Caribbean island nations, South America, and areas of the United States during the period they were a part of the Spanish Empire or Mexico. Named in honor of its former director (1942-1975), the Nettie Lee Benson Collection contains over 800,000 books, periodicals, and pamphlets, 2,500 linear feet of manuscripts, 19,000 maps, 21,000 microforms, 11,500 broadsides, 93,500 photographs, and 38,000 items in a variety of other media (sound recordings, drawings, video tapes and cassettes, slides, transparencies, posters, memorabilia, and electronic media).

In this lesson, we concentrate on the use of "metadata" to refer to cataloging data.

Metadata as Cataloging Data

From the beginnings of their history, libraries (and other information agencies, such as archives) have provided various kinds of descriptions of, i.e., metadata about, the materials included in their collections. A description might be as brief as: "This room stores scrolls dealing with the plans of Pharaoh Rameses for his pyramid" or "Box containing our treaties with Sparta." Or a description might be as lengthy as a printed list of the works owned by a library, with each work described in terms of its author(s), title, and various other data about the work that the compilers of the list considered important.

In modern times, a widespread way of providing descriptions of materials in libraries has been the printed catalog card and its computerized successors, especially the MARC record. During the 20th century, a great deal of attention was given by librarians to the question of just what kinds of metadata should be employed in standard practice, i.e., to the question of how to set standards for the kinds of data that should be recorded—in the form of a catalog card or its equivalent—about the various materials that libraries and other information agencies collect. In Anglophone countries, an important embodiment of standardized metadata practices has been the various versions, beginning in 1908, of the Anglo-American Cataloging Rules (AACR), developed cooperatively by the American Library Association, the Canadian Library Association, and the Library Association (which is, of course, the association of libraries and librarians in the United Kingdom; the association's founders felt that any educated person would be able to supply the missing adjective, "British").

The principal elements used in providing metadata descriptions of typical library materials, such as books and other InBEs, include (see Endnote 1):

  • Main Entry
    A main entry specifies a personal author or creator; a corporate body author or creator; or the title of an InBE.
  • Added Entries
    Added entries are made for
    • Collaborators
    • Editors, compilers, revisers, etc.
    • Original authors (e.g., of works that have been extensively revised by others)
    • Adapters (e.g., an author of an adaptation of a work)
    • Performers (e.g., a conductor of a recorded musical work)
    • Corporate bodies (if their responsibility extends beyond mere publication)
    • Translators
    • Illustrators
    • Series
  • Subject headings
    Subject headings provide clues to the subject(s) with which the InBE deals.
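
The three kinds of elements listed above can be pictured as a simple record structure. This is a rough sketch only, not an implementation of AACR or MARC: the class names, the `access_points` method, and the subject heading "Metadata" are illustrative assumptions, while the title and editor come from Endnote 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AddedEntry:
    role: str   # e.g. "editor", "translator", "illustrator", "series"
    name: str


@dataclass
class CatalogRecord:
    main_entry: str  # a personal or corporate author, or the title
    added_entries: List[AddedEntry] = field(default_factory=list)
    subject_headings: List[str] = field(default_factory=list)

    def access_points(self) -> List[str]:
        """All headings under which a reader could look up this item."""
        points = [self.main_entry]
        points += [entry.name for entry in self.added_entries]
        points += self.subject_headings
        return points


# An edited work, entered here under its title, with the editor as an
# added entry. The subject heading is illustrative, not an authorized one.
record = CatalogRecord(
    main_entry="Introduction to Metadata: Pathways to Digital Information",
    added_entries=[AddedEntry(role="editor", name="Baca, Murtha")],
    subject_headings=["Metadata"],
)

for point in record.access_points():
    print(point)
```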

The Dublin Core Initiative

Recent years have seen the widespread development of online public-access catalogs (OPACs) and, especially, the explosion of information resources available via the World-Wide Web. These developments sparked an effort to define a minimally sufficient set—a "core"—of cataloging data, i.e., metadata, that would be useful as a standard for OPACs and, in particular, for catalogs, guides, search engines, etc., aimed at providing access to "Document-Like Objects" (DLOs) available via the Web. This effort has become known as the Dublin Core Initiative because it began with a workshop held in March 1995 in Dublin, Ohio, sponsored by OCLC, Inc., and the National Center for Supercomputing Applications (NCSA).

The Dublin Core Initiative defined, in December 1996, a set of 15 metadata elements to be used as the minimally sufficient set, or core. This set has become known as the "Dublin Core." Here are the elements of the Dublin Core, as condensed from Dublin Core Metadata Element Set, Version 1.1: Reference Description:

 
  • Title
    Definition: A name given to the resource.
    Comment: Typically, a Title will be a name by which the resource is formally known.
  • Creator
    Definition: An entity primarily responsible for making the content of the resource.
    Comment: Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.
  • Subject
    Definition: The topic of the content of the resource.
    Comment: Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
  • Description
    Definition: An account of the content of the resource.
    Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
  • Publisher
    Definition: An entity responsible for making the resource available.
    Comment: Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.
  • Contributor
    Definition: An entity responsible for making contributions to the content of the resource.
    Comment: Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.
  • Date
    Definition: A date associated with an event in the life cycle of the resource.
    Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
  • Type
    Definition: The nature or genre of the content of the resource.
    Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core types [DCT1]). To describe the physical or digital manifestation of the resource, use the FORMAT element.
  • Format
    Definition: The physical or digital manifestation of the resource.
    Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).
  • Identifier
    Definition: An unambiguous reference to the resource within a given context.
    Comment: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
  • Source
    Definition: A reference to a resource from which the present resource is derived.
    Comment: The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
  • Language
    Definition: A language of the intellectual content of the resource.
    Comment: Recommended best practice for the values of the Language element is defined by RFC 1766 [RFC1766] which includes a two-letter Language Code (taken from the ISO 639 standard [ISO639]), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard [ISO3166]). For example, 'en' for English, 'fr' for French, or 'en-uk' for English used in the United Kingdom.
  • Relation
    Definition: A reference to a related resource.
    Comment: Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
  • Coverage
    Definition: The extent or scope of the content of the resource.
    Comment: Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges.
  • Rights
    Definition: Information about rights held in and over the resource.
    Comment: Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.
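
To make the element set concrete, here is a small sketch that describes this lesson itself with a few Dublin Core elements and renders them as HTML `<meta>` tags. Several things are assumptions rather than part of the element set: the function name `to_meta_tags`, the `DC.` prefix (a common convention for embedding Dublin Core in HTML), the choice of which elements to fill in, and the Format value `text/html`. The Date check enforces only the YYYY-MM-DD form mentioned in the Date comment.

```python
import re
from datetime import date
from html import escape

# The 15 elements of the Dublin Core, in the order listed above.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)

_DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def check_date(value: str) -> None:
    """Raise ValueError unless value is a real date written YYYY-MM-DD."""
    match = _DATE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"Date is not in YYYY-MM-DD form: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    date(year, month, day)  # raises ValueError for dates like 2001-02-30


def to_meta_tags(record: dict) -> list:
    """Render a record as HTML <meta> tags, one per element."""
    tags = []
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"Not a Dublin Core element: {element!r}")
        if element == "Date":
            check_date(value)
        content = escape(value, quote=True)
        tags.append(f'<meta name="DC.{element.lower()}" content="{content}">')
    return tags


# A partial description of this lesson; no element is mandatory here.
lesson = {
    "Title": "Overview of Metadata",
    "Creator": "R. E. Wyllys",
    "Date": "2001-08-21",
    "Language": "en",
    "Format": "text/html",  # assumed media type of the Web page
}

for tag in to_meta_tags(lesson):
    print(tag)
```

A record may use any subset of the 15 elements, which is why the sketch rejects unknown element names but does not insist that every element be present.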

Summary

This lesson has provided an introduction to the idea of "metadata" and to how this term is used in LIS, with respect to both library-cataloging practice and the provision of access to information on the World-Wide Web.

For a deeper look at metadata and its many aspects, I strongly recommend that you read at least one of the following discussions:

  • Metadata by Dr. Francis L. Miksa, a note prepared for course LIS 384K.17, Organizing and Providing Access to Information, Graduate School of Library and Information Science, The University of Texas at Austin.
  • Setting the Stage by Dr. Anne J. Gilliland-Swetland, a book chapter that has been made available on the Web through the courtesy of the Getty Information Institute, a part of the J. Paul Getty Trust (see Endnote 2).

Endnotes

1. You will learn more about library-cataloging principles and practices when you take such GSLIS courses as LIS 384K.17, Organizing and Providing Access to Information (Gateway II), LIS 384K.8, Organization of Materials I, and LIS 384K.3, Subject Cataloging, Indexing and Categorization of Informational Materials.

2. "Setting the Stage" is part of an excellent short book:
Baca, Murtha, ed. Introduction to Metadata: Pathways to Digital Information. Los Angeles, CA: Getty Information Institute, 1998. ISBN 0-89236-533-1.

Course emailbox: l38613dw@gslis.utexas.edu
GSLIS Website: www.gslis.utexas.edu

Last updated 2001 Aug 21 by R. E. Wyllys