MARC Records and Variable-Length Record Structures
R. E. Wyllys
Introduction
This
lesson discusses the system known as MARC, the principal means in many
countries for computer-assisted handling of the bibliographic records
of InBEs (information-bearing entities) such as books, serials, and
other materials used in libraries and other information agencies. Our
main emphasis herein is the structure of the MARC record, i.e., the
computer files by which MARC information is actually handled.
The
MARC System
MARC
stands for MAchine-Readable Cataloging.
The MARC system originated at the Library of Congress (LC) in 1965,
as a part (probably the single most important part) of the beginnings
of the automation of libraries in the U.S. and elsewhere. These beginnings
coincided with the arrival of business-oriented computers that could
perform various tasks at costs low enough to attract a much wider audience
of consumers than in years prior to the mid-1960s.
Before
we consider MARC in detail, we pause for a moment to put MARC into the
time-frame of one human being's professional career, a career that
spanned vast changes in the world of libraries. In "The Library
Bulletin" of the UT-Austin General Libraries for February 23, 2001
(Vol. XX, No. 2, pp. 7-8) appeared an item, written by Peggy Mueller,
that stated in part:
Fleetwood
Giles, Cataloger, Cataloging Department, retired January 31, 2001, after
forty-six years of University of Texas employment. Fleetwood received
a B.A. (pre-med) from UT Austin in 1950, studied toward a B.S. in secondary
education and then completed the M.L.S. in 1958 from UT-Austin. He joined
the General Libraries in 1954 and was promoted to a professional librarian
position in 1957. . . . As a cataloger Fleetwood's career began with
3"x5" paper cards, electric erasers, and manual typewriters
and progressed through work forms to online cataloger workstations,
international bibliographic databases and much, much more.
Having
noted the extent of change in cataloging over the past four decades,
let us return to MARC itself. Building on a pilot project during 1965-1967,
the Library of Congress settled in 1968 on a form of computerized recording
of cataloging information, the MARC II record. This was the foundation
for the current record, called MARC 21, which is essentially the MARC
II record with some added features.
The
MARC record is a computer-readable and -manipulable record of cataloging
information for information-bearing entities (InBEs), such as books,
serials, etc. The MARC system moves cataloging information among the
institutions that participate in the system, which consists of:
- The Library of Congress, which is the largest single originator of cataloging
information in the world. LC also maintains and updates the
definitions of the various types of MARC records used in the U.S.,
in cooperation with advisory groups of librarians and other bibliographic
specialists and with national and international standards organizations.
- A substantial
number of research libraries in the U.S. and elsewhere, which also
produce original cataloging records. The General Libraries of The
University of Texas at Austin are regularly among the top few libraries
in the U.S. in terms of the numbers of cataloging records they originate.
- Bibliographic utilities, which help to disseminate and supplement MARC information
and to provide network support to cooperating libraries and information
agencies.
  - The bibliographic utilities include OCLC
  (originally the Ohio College Library Center, but now known formally
  just as OCLC, Inc.) and RLIN,
  the Research Libraries Information Network.
  - Affiliated with the OCLC network are regional
  networks that help transmit information to and from OCLC. The
  regional networks include AMIGOS,
  which is based in Dallas and serves over 650 libraries in Arizona,
  Arkansas, New Mexico, Oklahoma, and Texas.
- Thousands
of participating libraries.
MARC
systems exist in several countries, with minor differences to accommodate
local needs. What is used in the U.S. is often called USMARC to identify
it with this country. Examples of other MARC systems are CAN/MARC and
UKMARC, the Canadian and U.K. systems, respectively.
The
Nature of the MARC Record
Fixed-Length
Computer Records
A
typical computer file consists of a set of records of equal length:
i.e., records such that
- each record
consists of the same fixed number of bytes,
- each record
is made up of the same number of fields, and
- each different
type of field has the same number of bytes in every record.
Note
that these restrictions allow different types of fields to have different
numbers of bytes. For example, suppose that each record contains a Social
Security Number (SSN) field and a telephone-number field (plus other
fields that we ignore here). Each SSN field will consist of 9 bytes;
each telephone-number field, of 10 bytes. The kind of structure we have
just described is called a "fixed-length" record.
Fixed-length
records work well for many applications. For example, consider a file
used in a company's accounting department to contain the information
needed to prepare employees' paychecks. The records in such a file would
store information for the individual employees of the company. Each
record would need fields that contain such information as SSN, hourly
wage rate, number of income-tax withholding deductions claimed, number
of hours worked in the current week, number of hours of overtime worked
in the current week, total of wages paid to date in the calendar year,
and total withheld to date in the calendar year. Each such field will
have a fixed length appropriate to the nature of the information in
the field; each record will contain the same number of fields; and,
hence, each record will be of the same fixed length as every other record
in the file.
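As a minimal sketch of the fixed-length idea, the following Python fragment packs a 9-byte SSN and a 10-byte telephone number into 19-byte records and then locates the third record by simple arithmetic on the record length. The sample values and the choice of Python's `struct` module are illustrative assumptions, not part of any real payroll system.

```python
import struct

# Hypothetical layout: a 9-byte SSN followed by a 10-byte telephone number.
# "=" asks for standard sizes with no alignment padding.
RECORD = struct.Struct("=9s10s")   # 19 bytes per record

people = [
    (b"123456789", b"5125550100"),   # made-up values
    (b"987654321", b"5125550101"),
    (b"555443333", b"5125550102"),
]

# Build the "file" as one contiguous run of equal-length records.
data = b"".join(RECORD.pack(ssn, phone) for ssn, phone in people)

def read_record(buffer, index):
    """Return (ssn, phone) for record number `index` (0-based)."""
    offset = index * RECORD.size     # no scanning needed
    return RECORD.unpack_from(buffer, offset)

third = read_record(data, 2)
```

Because every record has the same size, the program can jump directly to any record without reading the ones before it.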
However,
it should be clear that there are types of information that do not fit
neatly in fixed-length fields. For example, the title of a book can
vary in length anywhere from one byte to hundreds (or even thousands)
of bytes; the surname and the first name(s) of an author can vary ("Ann
Lee" is much shorter than "Gustaf-Adolphus von Sachsen und Coburg"),
and a book (in LC cataloging practice) can have 1, 2, or 3 authors (i.e.,
there can be a need for multiple fields for authors' names). As a moment's
reflection will show, these examples indicate that there can be serious
problems in using fixed-length records to handle certain types of data.
How
could one design a title field for a fixed-length record for book data?
Suppose we know that some book titles can be as long as, say, 1492 characters.
If we decide to provide 1492 bytes in a fixed-length field for titles,
then the vast majority of titles, being much shorter than 1492 bytes,
will occupy only a small portion of the title field, the rest of which
will have to be filled with space characters. For most records, this
would be a great waste of computer-storage space and communications
time. On the other hand, if we decide to provide fewer than 1492 bytes
for the title field, say 100 bytes, then we encounter another problem:
viz., although most titles will fit into a 100-byte field, there will
still be some wasted space with many titles, and, worse, some titles
will have to be truncated to their first 100 characters (including space
characters). (Furthermore, even the space-wasting 1492-byte field might
turn out to be too short for an extraordinary title.)
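The trade-off just described can be made concrete with a short sketch. It assumes a hypothetical 100-character title field and, as the lesson does, treats one character as one byte; the function name and the sample titles are invented for illustration.

```python
FIELD_WIDTH = 100   # hypothetical fixed width for the title field

def to_fixed(title, width=FIELD_WIDTH):
    """Pad with spaces or truncate so the result is exactly `width` characters."""
    return title[:width].ljust(width)

short_title = "Emma"
long_title = "A" * 150            # stand-in for an unusually long title

stored_short = to_fixed(short_title)
stored_long = to_fixed(long_title)

wasted = len(stored_short) - len(short_title)   # padding characters stored
lost = len(long_title) - len(stored_long)       # characters cut off
```

With these values, the short title wastes 96 of its 100 positions, while the long title loses its last 50 characters.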
The
same problem, and a related one, arise with the author field. First,
it is clear that the varying lengths of authors' names present the same
problem as that of varying lengths of titles. But there is a second
problem, which stems from the fact that there can be 1, 2, or 3 authors
of a book. If we include 3 author fields in every fixed-length record
for a book, then much of the time, there will be nothing in the 2nd
author field and the 3rd author field but space characters.
Variable-Length
Computer Records
When
the staff of the MARC pilot project began, in 1965, to consider how
to handle catalog data in computers, they immediately encountered the
problems we have just outlined. Furthermore, at that time, almost all
computer files that had ever been designed or used were of the fixed-length-record
type. The MARC designers came up with a then-novel solution: the variable-length
record.
There
are two basic ways of designing a variable-length record for computer
use. The first way is to mark, or delimit, the beginnings and endings
(or, at a minimum, either the beginning or the ending) of fields and
records by special characters that are reserved for that purpose. (Note:
Almost all computer files, whether of fixed-length or variable-length
type, employ a special character to mark the end of the file. And many
fixed-length-record computer files use special end-of-record characters
for convenience and as a safety measure against error.) In order for
a computer program to use a file of variable-length records with variable-length
fields (and, possibly, of varying numbers of occurrences of a given
field), the program must, as it reads the file, examine each successive
character in the file to determine whether the character is one of the
special end-of-field or end-of-record delimiters. Whenever a character
is found to be a delimiter, the program knows it has finished inputting
a field or a record, and the program must take steps to handle the field
or record appropriately.
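The character-by-character scan described above might look roughly like the following sketch. The two delimiter characters and the sample string are assumptions chosen for illustration; the only requirement is that the delimiters never occur inside the data.

```python
END_OF_FIELD = "^"    # hypothetical delimiter choices for this sketch
END_OF_RECORD = "§"

def scan(stream):
    """Split `stream` into records, each a list of field strings."""
    records, fields, current = [], [], []
    for ch in stream:                      # examine each successive character
        if ch == END_OF_FIELD:
            fields.append("".join(current))   # a field is complete
            current = []
        elif ch == END_OF_RECORD:
            records.append(fields)            # a record is complete
            fields = []
        else:
            current.append(ch)                # ordinary data character
    return records

records = scan("Peter Rob^Course Technology^§Paul Cassel^Sams^§")
```

Note that the program cannot skip ahead: it learns where a field ends only by reaching the delimiter.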
The
second way of designing a variable-length record is to include, at the
beginning of each record, a special field, of fixed length, in which
the lengths of all the variable-length fields in the record are specified,
but to use no special end-of-field or end-of-record characters. This
special field, usually called the "header", must itself be of fixed
length so that the program can quickly establish the nature of the structure
of the whole record, including its variable-length parts, by examining
the contents of the header. Often the header, since it is of fixed length,
will also include certain fields that are known always to be of a fixed
length (e.g., 4 digits for a year).
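A minimal sketch of writing such a record follows. It assumes a header made of a 4-digit year plus one 3-digit length per variable-length field; the function name, the slot width, and the sample values are illustrative choices, not a prescribed layout.

```python
def build_record(year, fields, width=3):
    """Return header + data. The header holds a 4-digit year followed by
    one zero-padded length per variable-length field."""
    for text in fields:
        if len(text) >= 10 ** width:
            raise ValueError("field too long for the header's length slot")
    header = f"{year:04d}" + "".join(f"{len(text):0{width}d}" for text in fields)
    return header + "".join(fields)       # no delimiters between the fields

record = build_record(1996, ["Teach Yourself Access 97 in 14 Days", "Paul Cassel"])
```

As long as every record carries the same number of length slots, the header has a fixed size (here 4 + 2 × 3 = 10 characters), so a program always knows where the header stops and the data begin.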
The
MARC record uses both these ways of dealing with records of catalog
information. Before we consider the MARC record format, however,
we shall look at an example of each way of handling variable-length records.
Example
of Variable-Length Record Structure Using Delimiters
Suppose
we have some information about three companies, including their addresses,
and our contacts in the companies. Here are the data as we might write
them on pages in an address book.
IBM Corporation
11400 Burnet Road
Building A1
Austin, Texas 78758
Contacts: Sam Robertson

Big-Bang Startup Company
10 W. Martin Luther King Jr. Boulevard
Austin, Texas
Contacts: Stephen Hawking

ABC Company
123 Main Street
Pocahontas, Iowa 50574
Contacts: Joe Smith, Jane Roe, Mary Fulano, John A. Doe
Next, suppose
we decide to store these data in a computer file using a variable-length
structure. First, we display the overall structure of each record,
then the delimiters we shall use, and, finally, the foregoing data
after being placed in the file.
Record Structure
| COMPANY_NAME | a variable-length field |
| ADDRESS | a variable-length field that may be repeated as many times as necessary |
| CITY | a variable-length field |
| STATE | a variable-length field (state names are used, not their abbreviations) |
| ZIP | a variable-length field (since it can be either 5 or 9 digits in length) |
| CONTACT_NAME | a variable-length field that may be repeated as many times as needed |
Delimiters
| « | beginning of file |
| » | end of file |
| ƒ | beginning of field |
| ^ | end of field |
| ‡ | beginning of subfield, i.e., beginning of one occurrence of a repeatable field |
| † | end of subfield, i.e., end of one occurrence of a repeatable field |
| ~ | beginning of record |
| § | end of record |
Sample File of Data Stored as Variable-Length Records Using Both Beginning and Ending Delimiters
«~ƒIBM Corporation^ƒ‡11400 Burnet Road†‡Building A1†^ƒAustin^ƒTexas^ƒ78758^ƒ‡Sam Robertson†^§
~ƒBig-Bang Startup Company^ƒ‡10 W. Martin Luther King Jr. Boulevard†^ƒAustin^ƒTexas^ƒ^ƒ‡Stephen Hawking†^§
~ƒABC Company^ƒ‡123 Main Street†^ƒPocahontas^ƒIowa^ƒ50574^ƒ‡Joe Smith†‡Jane Roe†‡Mary Fulano†‡John A. Doe†^§»
Note: In the second record, that for Big-Bang Startup Company, there is no ZIP code.
Its absence is shown by the use of adjacent beginning-of-field and
end-of-field delimiters, "ƒ^".
Next, we observe
that there are actually some unnecessary delimiters in the above example.
For instance, the physical beginning of a file will be identified
by whatever computer operating system is being used, so that our use
of an explicit beginning-of-file delimiter is superfluous, and we
may omit it. But, of course, once a program starts looking at the
contents of a file, it is important for the program to be able to
identify the end of the file, so we will not omit the end-of-file
delimiter.
In similar fashion,
we can observe that it is really not necessary to mark both the beginning
and the ending of each record. The beginning of the very first record
in the file must coincide with the beginning of the file itself; and
the beginnings of second and later records in the file must occur
immediately after an end-of-record mark. Thus, we may omit the beginning-of-record
delimiters provided that we retain the end-of-record delimiters.
Again in similar fashion, we can note that it is unnecessary to mark both
the beginning and ending of each field. The beginning of the first field
in a record must coincide with the beginning of the record itself, and
the beginnings of second and later fields in the record must occur immediately
after an end-of-field mark. Thus, we may omit the beginning-of-field
delimiters provided that we retain the end-of-field marks.
Finally,
in somewhat similar fashion, we can note that it is unnecessary to mark
both the beginning and ending of each subfield. We could reason, in
the fashion we have been using, that the beginning of the first subfield
in a field must coincide with the beginning of the field itself, and
that the beginnings of second and later subfields in the field must
occur immediately after an end-of-subfield mark. However, we could also
reason that the ending of the first subfield in a field must occur immediately
before the beginning of the second subfield; that the ending of the
second subfield in a field must occur immediately before the beginning
of the third subfield; and so on for further subfields. This indicates
that it would be sufficient to use just beginning-of-subfield delimiters
and to omit end-of-subfield delimiters. (In fact, this is what the MARC
record format does.)
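The reasoning above can be checked mechanically. The sketch below removes the four redundant delimiters from one fully delimited record and keeps the rest. It assumes that every occurrence of a repeatable field is wrapped in ‡ and † (including single occurrences, for uniformity) and that none of the delimiter characters ever appears inside the data; the function name is invented for illustration.

```python
# Delimiters from the full set that the reasoning shows to be redundant:
# beginning of file, beginning of record, beginning of field, end of subfield.
REDUNDANT = "«~ƒ†"

def minimize(text):
    """Drop the redundant delimiters, keeping ^ (end of field),
    ‡ (beginning of subfield), § (end of record), and » (end of file)."""
    return text.translate(str.maketrans("", "", REDUNDANT))

full = ("«~ƒIBM Corporation^ƒ‡11400 Burnet Road†‡Building A1†^"
        "ƒAustin^ƒTexas^ƒ78758^ƒ‡Sam Robertson†^§»")
minimal = minimize(full)
saved = len(full) - len(minimal)   # delimiter characters no longer stored
```

For this one record, 11 delimiter characters disappear, and every field and subfield boundary can still be recovered from the delimiters that remain.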
Here is the example we used above, except that this time, in keeping with
the foregoing reasoning, we have omitted the beginning-of-file delimiters,
beginning-of-record delimiters, beginning-of-field delimiters, and
end-of-subfield delimiters, with the result shown below.
Minimal Set of Delimiters
| » | end of file |
| ^ | end of field |
| ‡ | beginning of subfield, i.e., beginning of one occurrence of a repeatable field |
| § | end of record |
Sample File of Data Stored as Variable-Length Records Using a Minimal Set of Delimiters
IBM Corporation^‡11400 Burnet Road‡Building A1^Austin^Texas^78758^‡Sam Robertson^§
Big-Bang Startup Company^‡10 W. Martin Luther King Jr. Boulevard^Austin^Texas^^‡Stephen Hawking^§
ABC Company^‡123 Main Street^Pocahontas^Iowa^50574^‡Joe Smith‡Jane Roe‡Mary Fulano‡John A. Doe^§»
The
above example uses delimiters in a fashion quite similar to that of
the MARC record format.
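To show that the minimal set of delimiters is enough to recover the data, here is a sketch of a reader for this layout. The field order follows the record structure given earlier; the function name and the dictionary output are illustrative assumptions. The reader accepts an occurrence of a repeatable field whether or not it carries a leading ‡.

```python
FIELD_NAMES = ["COMPANY_NAME", "ADDRESS", "CITY", "STATE", "ZIP", "CONTACT_NAME"]
REPEATABLE = {"ADDRESS", "CONTACT_NAME"}

def parse_file(text):
    """Return a list of dictionaries, one per record."""
    body = text.split("»", 1)[0]          # stop at the end-of-file mark
    records = []
    for raw in body.split("§"):
        if not raw:
            continue                       # nothing follows the last §
        values = raw.split("^")[:-1]       # every field ends with ^
        record = {}
        for name, value in zip(FIELD_NAMES, values):
            if name in REPEATABLE:
                # each occurrence begins with ‡; drop the empty leading piece
                record[name] = [part for part in value.split("‡") if part]
            else:
                record[name] = value
        records.append(record)
    return records

sample = (
    "IBM Corporation^‡11400 Burnet Road‡Building A1^Austin^Texas^78758^‡Sam Robertson^§"
    "Big-Bang Startup Company^‡10 W. Martin Luther King Jr. Boulevard^Austin^Texas^^‡Stephen Hawking^§"
    "»"
)
companies = parse_file(sample)
```

The missing ZIP code of the second record comes back as an empty string, because two end-of-field marks sit side by side.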
Example
of Variable-Length Record Structure Using Header Blocks
Suppose
that we have (partial) cataloging data for two books.
Rob, Peter; Coronel, Carlos. Database Systems: Design, Implementation, and Management. Course Technology; 1997. ISBN:0-7600-4904-1.
Cassel, Paul. Teach Yourself Access 97 in 14 Days. Sams; 1996. ISBN:0-672-30969-6.
Next,
suppose we decide to store these data in a computer file using a variable-length
structure that employs the header-block approach. First, we display
the overall structure of each record and then the foregoing data after
being placed in the file.
Database Structure
| Header Block | | By design, known to be 29 characters long |
| | RECORD_ID | The ISBN is used in this example. |
| | COPYRIGHT_DATE | |
| | TITLE_LENGTH | |
| | LENGTH_OF_FIRST_AUTHOR_FIELD | By LC design, no more than 3 authors |
| | LENGTH_OF_SECOND_AUTHOR_FIELD | |
| | LENGTH_OF_THIRD_AUTHOR_FIELD | |
| | LENGTH_OF_PUBLISHER_FIELD | |
| Data Block | | |
| | TITLE | |
| | FIRST_AUTHOR | |
| | SECOND_AUTHOR | |
| | THIRD_AUTHOR | |
| | PUBLISHER | |
Sample File of Data Stored Using a Header Block
07600490411997056009014000017Database Systems: Design, Implementation, and ManagementPeter RobCarlos CoronelCourse Technology§
06723096961996035011000000004Teach Yourself Access 97 in 14 DaysPaul CasselSams§»
Translation of Sample for Humans
As an example, the header block of the first record is parsed as though it read:
0760049041 1997 056 009 014 000 017
where
the first ten characters are the ISBN (0760049041); the next four characters,
the copyright date (1997); the next three, the number of characters
in the title (56); the next three, the number of characters in the first
author's name (9); the next three, the number of characters in the second
author's name (14); the next three, the number of characters in the
third author's name (0); and the last three, the number of characters
in the publisher's name (17). The second header-data string is parsed
in an analogous way.
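The parsing just described can be sketched in a few lines. The slot names below are shorthand invented for this sketch; the widths (10, 4, and five slots of 3) are those of the 29-character header in the example. The end-of-record and end-of-file marks are left out so that only the header does the work.

```python
HEADER_LENGTH = 29
# (name, width) of each header slot, in order
SLOTS = [("isbn", 10), ("year", 4), ("title", 3), ("author1", 3),
         ("author2", 3), ("author3", 3), ("publisher", 3)]

def parse_record(text, start=0):
    """Parse one record beginning at `start`; return (record, next_start)."""
    header = text[start:start + HEADER_LENGTH]
    pos, slots = 0, {}
    for name, width in SLOTS:
        slots[name] = header[pos:pos + width]
        pos += width
    record = {"isbn": slots["isbn"], "year": int(slots["year"])}
    cursor = start + HEADER_LENGTH            # the data block starts here
    for name in ("title", "author1", "author2", "author3", "publisher"):
        length = int(slots[name])
        record[name] = text[cursor:cursor + length]
        cursor += length
    return record, cursor

sample = ("07600490411997056009014000017"
          "Database Systems: Design, Implementation, and Management"
          "Peter RobCarlos CoronelCourse Technology")
book, end = parse_record(sample)
```

The returned position `end` is where the next record's header would begin, so a file of such records can be read without any delimiters at all.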
Note:
This example is a simplified analog of the MARC record structure. It
shows how, in principle, header blocks of a fixed length can furnish
all the information needed for records of varying lengths. The actual
MARC record structure combines the header-block structure with field
delimiters. The resulting redundancy helps to reduce data errors.
The
MARC Format
The
actual MARC 21 format (the current version of USMARC) is based on the
header-block approach plus some use of delimiters. A MARC record begins
with a block that is always 24 characters long and is called the "leader".
The characters are numbered starting at 0, so that the leader occupies
position numbers 00 through 23 (we use an initial "0" in the numbers
of the first ten positions to minimize ambiguity).
Here
is what the various positions in the leader of a MARC record mean:
| 00-04 | Length of the entire record in bytes |
| 05 | Record status (e.g., n = new; c = changed; d = deleted) |
| 06 | Type of record (e.g., a = bibliographic; c = music, printed or microform) |
| 07 | Bibliographic level (e.g., m = monograph; s = serial) |
| 08-09 | These positions are always blank in USMARC records |
| 10 | Indicator count (2 in USMARC records, since all such records use 2 indicators per field) |
| 11 | Subfield code count (2 = number of characters used to identify subfields, the first such character always being ‡) |
| 12-16 | Base address of data (i.e., the location of the first character of the first data field, normally the control number field, 001; e.g., a base address of 00277 would mean that the first data character was in the 278th position from the start of the record, where the character's position number is 277, as noted earlier) |
| 17 | Encoding level (i.e., the level of completeness of the record; "blank" denotes "complete") |
| 18 | Descriptive cataloging form (e.g., a = according to AACR2 [i.e., Anglo-American Cataloguing Rules, Second Edition]) |
| 19 | Linked record code (blank if no related record; r if a related record exists) |
| 20-23 | Entry map (always 4500 in USMARC; the final "0" has no current meaning, but this position is reserved for possible future use) |
You
can see that the leader contains two portions (positions 00-04 and 12-16)
that deal with lengths: the first, with the length of the entire
record; the second with the number of characters to be negotiated before
the beginning of the actual cataloging data. The rest of the leader
is given over to codes that can be used in searching for various types
of bibliographic data and/or cataloging data. The fixed positions of
these codes at the beginning of each MARC record facilitate rapid searching
of a file of MARC records.
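Because every leader position is fixed, reading a leader is a matter of slicing. The sketch below follows the positions listed above; the dictionary keys and the sample leader are made up for illustration, and the sketch performs no validation beyond checking the length.

```python
def parse_leader(leader):
    """Split a 24-character leader into its parts (positions 00-23)."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is always 24 characters long")
    return {
        "record_length": int(leader[0:5]),       # positions 00-04
        "record_status": leader[5],
        "type_of_record": leader[6],
        "bibliographic_level": leader[7],
        "indicator_count": int(leader[10]),
        "subfield_code_count": int(leader[11]),
        "base_address": int(leader[12:17]),      # positions 12-16
        "encoding_level": leader[17],
        "cataloging_form": leader[18],
        "linked_record_code": leader[19],
        "entry_map": leader[20:24],              # positions 20-23
    }

# Made-up leader, assembled piece by piece so the blanks are easy to see.
example = "01041" + "nam" + "  " + "22" + "00277" + " a " + "4500"
info = parse_leader(example)
```

Note that Python slices exclude their upper bound, so `leader[12:17]` covers positions 12 through 16.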
The
leader is followed immediately by a section called the "directory".
The directory consists of a number of blocks of data, one for each field
in the portion of the MARC record that contains the actual cataloging
data. These blocks are all of the same length, but different records
can have different numbers of blocks, depending on the numbers of fields
in the actual cataloging data. (Typical records often have around a
dozen such blocks.) Each block shows the tag of its field, the length
of its field, and the starting position of its field relative to the
first field in the record. Thus the directory permits rapid location
of the start of each field. The end of the directory is indicated by
a "^" (the end-of-field delimiter).
Finally,
following the directory, come the fields containing the actual cataloging
data. These fields start in the character position specified in positions
12-16 of the leader. For the details of how these fields look in the
record and how they are used, see the MARC guide by Betty Furrie cited
in the second paragraph below.
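To tie the leader, directory, and data fields together, the following sketch builds a toy record and reads it back. Several things here are assumptions rather than statements from this lesson: the 12-character directory entry (3-character tag, 4-digit field length, 5-digit starting position) is taken from the MARC 21 specification and corresponds to the "4" and "5" of the entry map; "^" stands in for the end-of-field character, which in real MARC 21 files is a control character; and indicators, subfield codes, and the end-of-record mark are omitted. The function names and the control number are invented.

```python
END_OF_FIELD = "^"   # the lesson's stand-in for the real end-of-field character
ENTRY_LENGTH = 12    # tag (3) + field length (4) + starting position (5)

def build_record(fields):
    """fields: list of (tag, text). Return leader + directory + data."""
    directory, data = "", ""
    for tag, text in fields:
        body = text + END_OF_FIELD
        directory += f"{tag}{len(body):04d}{len(data):05d}"
        data += body
    directory += END_OF_FIELD                    # end of the directory
    base = 24 + len(directory)                   # base address of data
    total = base + len(data)
    leader = (f"{total:05d}" + "nam" + "  " + "22"
              + f"{base:05d}" + " a " + "4500")
    return leader + directory + data

def read_fields(record):
    """Use the leader and directory to locate every field."""
    base = int(record[12:17])
    directory = record[24:base - 1]              # drop the end-of-directory mark
    found = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])     # relative to the base address
        field = record[base + start:base + start + length]
        found.append((tag, field[:-1]))          # strip the end-of-field mark
    return found

source = [("001", "demo0001"), ("245", "Teach Yourself Access 97 in 14 Days")]
record = build_record(source)
fields = read_fields(record)
```

The reader never scans for delimiters: each field is located from its directory entry alone, which is the rapid access that the directory is meant to provide.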
Closing Remarks
The
purpose of this lesson has been to familiarize you with the MARC system
in quite general terms, and especially with the sophisticated nature
of the structure of computer files used to store and communicate cataloging
data for InBEs. You are not expected to learn the fine details of such
matters as the detailed uses of the various positions in the leader
of a MARC record, but you are expected to learn what the overall structure
is like: viz., that it is a combination of the header-block approach
and the delimiter approach to handling variable-length records. You
are also expected to understand the differences between computer files
made up of fixed-length records and those made up of variable-length
records, as well as why the storage and communication of cataloging
data demands the use of variable-length records.
For
further details and examples of how MARC 21 records are used and constructed,
I strongly recommend your looking at Understanding
MARC Bibliographic: Machine-Readable Cataloging, written by Betty
Furrie and made available online by the MARC Office of the Library of
Congress. The section of Ms. Furrie's work entitled MARC
21 Reference Materials presents a detailed example of a MARC 21
record.