THE UNIVERSITY OF TEXAS AT AUSTIN
SCHOOL OF INFORMATION


LIS 384K.11 (known as INF 385M, beginning with the Fall Semester 2003)
DATABASE-MANAGEMENT PRINCIPLES AND APPLICATIONS
R. E. Wyllys

Relational-Algebraic Operations on Database Tables


Introduction

This Course Note discusses (1) the operations of relational algebra on tables (relations) and (2) the relational database operations of "union," "intersection," "difference," "product," "projection," "selection," and "join."

Tables are arrays of elements arranged in rows and columns. In that respect, they are analogous to the mathematical concept of "matrix," but they differ from matrices in several ways, most notably in that the elements of a table need not be numbers, whereas the elements of a matrix must be numbers. Hence, the mathematical operations that can be performed on matrices, such as addition, subtraction, multiplication, and inversion, all of which require numbers, cannot be performed on tables. Instead, on tables we can perform only operations like those that we perform on sets, operations which have names like "union," "intersection," "product," etc. (whose exact nature we explore herein) and which are called "relational-algebraic" operations.

Relations (Tables) Used in this Course Note

We use several relations as examples herein. This section summarizes them in terms of: the names of the relations, their attributes, and the domains of the contents of the attributes. Remember that the domain of an attribute is the type of entry (or entries) that the attribute can have: alphabetic character, numeral, logical value, etc.

Relation GRAD where the attributes are SSN, Fname, Lname, and Major, and where
SSN = Social Security Number, with domain NNN"-"NN"-"NNNN
Fname = personal name, with domain 15A
Lname = family name, with domain 20A
Major = degree-program name, with domain 35A

Following the usual notational convention, we can call the above relation GRAD(SSN, Fname, Lname, Major) where the underscore indicates that we regard attribute SSN as the primary key.

Relation HONOR_STUDENT(Number, Fname, Lname, Interest) where
Number = Social Security Number, NNN"-"NN"-"NNNN
Fname = personal name, 15A
Lname = family name, 20A
Interest = degree-program name, 35A

Relation FACULTY(Fnum, Dept, Fname, Lname) where
Fnum = Social Security Number, NNN"-"NN"-"NNNN
Dept = Department name without the words "Department of", so that it has the same possible entries as the degree-program name, with domain 35A
Fname = personal name, 15A
Lname = family name, 20A

Relation STUDENT(SID, Fname, Lname, Major, Level, Birthdate, GPA) where
SID = Social Security Number, NNN"-"NN"-"NNNN
Fname = personal name, 15A
Lname = family name, 20A
Major = degree-program name, 35A
Level = FR, SO, JU, SE, or GR, 2A
Birthdate = month, day, and year, in pattern MMDDYY, with domain NN"/"NN"/"NN
GPA = grade-point average, N.NN

Relation CLASS(Cnum, Cname, Time, Room) where
Cnum = department abbreviation and class number, AAANNN
Cname = formal name of class, 30A
Time = hours and minutes, concatenated with "A" or "P", with domain HH":"MMA
Room = building abbreviation and room number, with domain 3A3N [=AAANNN]

Relation ENROLLMENT(SID, Cnum, RosterNum) where
SID = Social Security Number, NNN"-"NN"-"NNNN
Cnum = department abbreviation and class number, AAANNN
RosterNum = position of student in alphabetized list, 3N

Union-Compatibility

Two relations can be put together (added together, so to speak) only if (1) they have the same number of columns and (2) corresponding columns have the same domains. Technically speaking, the columns must be in the same order in the two original relations, but since the content of a table is not affected by the order of its columns or rows, columns that actually correspond in the nature of their entries can always be arranged (or re-arranged) so as to be in the same order. On the other hand, corresponding columns in the two relations may or may not have the same name; what really matters is the nature of the entries.

For example, we might have two tables each containing a column of Social Security Numbers, yet the column might be named "Student ID" in one table and Faculty ID in the other table. If, in other respects, the tables appeared similar, then to decide whether they were actually union-compatible, you would have to make a careful examination of the definitions of the domains of their respective attributes to verify that these domains are, in fact, the same. For example, the relation FACULTY is almost union-compatible with HONOR_STUDENT. After checking the definitions of the domains, you can see that the only real problem with FACULTY is that its Column 2, "Dept", would have to be moved to Column 4 in order to make union-compatibility possible. Since insofar as the information content of a relation is concerned, the order of its columns is immaterial, it is possible to define a new relation, FACULTY1(Fnum, Fname, Lname, Dept), which differs from FACULTY(Fnum, Dept, Fname, Lname) only in the order of its attributes. Then FACULTY1 and HONOR_STUDENT would be union-compatible.

Projection

A "projection"of a table is a choice of some or all of its columns. The selection may or may not involve changing the order of the columns. For example, we can describe the change from FACULTY to FACULTY1 as consisting of the "projection of FACULTY onto FACULTY1". This is so because the change involves selecting columns of FACULTY (all of them, as it happens) for incorporation into FACULTY1 in (as it happens) a slightly different order.

To illustrate these ideas, we can work with the following examples of the content of the relations defined as above, using fictitious names and courses from Grandiose State University (GSU), a fictional institution.

Contents of the Relations

Here are the full contents of the relations that we shall use in the examples of relational-algebraic operations.

GRAD
123-45-6789 Jack Jones LIS
222-33-4444 Lynn  Lee LIS
987-65-4321 Mary Ruiz History
555-66-7777 Mae East Drama

HONOR_STUDENT
888-99-1234 Raquel Ruiz Management
222-33-4444 Lynn  Lee LIS
543-21-0987 Ada Byron Computer Science

FACULTY
333-22-4567 Music Fanny Mendelssohn
444-34-7654 Mathematics Bernhard Riemann
332-11-3690 LIS Shera Jesse
345-65-8642 LIS Sarah Vann

FACULTY1
333-22-4567 Fanny Mendelssohn Music
444-34-7654 Bernhard Riemann Mathematics
332-11-3690 Jesse Shera LIS
345-65-8642 Sarah Vann LIS

STUDENT
435-69-3739 Martha Johnson Biology SE 03/11/73
3.00
454-67-8876 Henry Aldridge Psychology FR 06/07/74
1.95
123-45-6789 Jack Jones LIS GR 11/17/71
3.25
667-83-9631 Elmer Gantry Theology JU 07/04/72
2.25
222-33-4444 Lynn  Lee LIS GR 10/09/70
3.90
987-65-4321 Mary Ruiz History GR 05/15/69
2.90
888-99-1234 Raquel Ruiz Management SO 08/01/73
3.90
777-66-5432 John Doe Philosophy FR 09/16/73
1.75
543-21-0987 Ada Byron Computer Science SE 02/29/72
3.95

CLASS
LIS301 Library Administration 09:30A DEW123
LIS203 Information Management 08:00A DEW456
LIS456 Database Design 01:30P COD367
PHI123 Introduction to Philosophy 04:00P PLA552

ENROLLMENT
123-45-6789 LIS301
001
222-33-4444 LIS301
002
543-21-0987 LIS203
001
888-99-1234 LIS203
004
123-45-6789 LIS203
002
222-33-4444 LIS203
003
543-21-0987 LIS456
002
222-33-4444 LIS456
006
123-45-6789 LIS456
005
435-69-3739 LIS456
004
454-67-8876 LIS456
001
667-83-9631 LIS456
003
987-65-4321 LIS456
007
667-83-9631 PHI123
002
543-21-0987 PHI123
001

Relational-Algebraic Operations

Here are some examples of relational-algebraic operations performed on these tables.

Union

The first step in forming the union of tables GRAD and HONOR_STUDENT is to put together all the rows that occur in either or both of these tables:
 
GRAD union HONOR_STUDENT, Step 1
123-45-6789 Jack Jones LIS
222-33-4444 Lynn  Lee LIS
987-65-4321 Mary Ruiz History
888-99-1234 Raquel Ruiz Management
222-33-4444 Lynn  Lee LIS
543-21-0987 Ada Byron Computer Science

After the necessary operation of eliminating any and all rows that are duplicates of another row (here, the second Lynn Lee row), we have the final step in the union of GRAD and HONOR_STUDENT. It is a table that contains entries for all students who are either graduate students or honor students or both. (Note: We use "union" to denote the operation because most browsers cannot yet handle the mathematical symbol for the union of two sets, even though the World-Wide Web Consortium's standard for HTML 4.0 includes this union symbol along with many other mathematical symbols. For the same reason, we shall spell out some of the other relational-algebraic symbols in what follows.)
 
GRAD union HONOR_STUDENT
123-45-6789 Jack Jones LIS
222-33-4444 Lynn  Lee LIS
987-65-4321 Mary Ruiz History
888-99-1234 Raquel Ruiz Management
543-21-0987 Ada Byron Computer Science

Difference

Next, we can consider the difference of tables GRAD and HONOR_STUDENT. The difference table is what results when we subtract HONOR_STUDENT from GRAD:, viz., the set consisting of those rows of GRAD that are left after we remove from GRAD any row that occurs also in HONOR_STUDENT. In words, it is the set of those graduate students who are not honor students. It is written GRAD - HONOR_STUDENT.
 
GRAD - HONOR_STUDENT
123-45-6789 Jack Jones LIS
987-65-4321 Mary Ruiz History


Note that if we took the difference the other way round, viz., HONOR_STUDENT - GRAD, we would get a different result, the set of those honor students who are not graduate students.

Intersection

The intersection of tables GRAD and HONOR_STUDENT is the set of only those rows that occur in both GRAD and HONOR_STUDENT. In words, it is the set of the graduate students who are also honor students.
 
GRAD intersection HONOR_STUDENT
222-33-4444 Lynn  Lee LIS

Note:: The foregoing operations (union, difference, and intersection) can be performed on tables GRAD and HONOR_STUDENT only because these tables are union-compatible.

Product

The product (also called the "cross-product" and the "Cartesian product") of GRAD and ENROLLMENT is the combination of each row of GRAD with every row of ENROLLMENT:
 
GRAD cross ENROLLMENT
123-45-6789 Jack Jones LIS 123-45-6789 LIS301
001
123-45-6789 Jack Jones LIS 222-33-4444 LIS301
002
123-45-6789 Jack Jones LIS 543-21-0987 LIS203
001
123-45-6789 Jack Jones LIS 888-99-1234 LIS203
004
123-45-6789 Jack Jones LIS 123-45-6789 LIS203
002
123-45-6789 Jack Jones LIS 222-33-4444 LIS203
003
123-45-6789 Jack Jones LIS 543-21-0987 LIS456
002
123-45-6789 Jack Jones LIS 222-33-4444 LIS456
006
123-45-6789 Jack Jones LIS 123-45-6789 LIS456
005
123-45-6789 Jack Jones LIS 435-69-3739 LIS456
004
123-45-6789 Jack Jones LIS 454-67-8876 LIS456
001
123-45-6789 Jack Jones LIS 667-83-9631 LIS456
003
123-45-6789 Jack Jones LIS 987-65-4321 LIS456
007
123-45-6789 Jack Jones LIS 667-83-9631 PHI123
002
123-45-6789 Jack Jones LIS 543-21-0987 PHI123
001
222-33-4444 Lynn  Lee LIS 123-45-6789 LIS301
001
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS301
002
222-33-4444 Lynn  Lee LIS 543-21-0987 LIS203
001
222-33-4444 Lynn  Lee LIS 888-99-1234 LIS203
003
222-33-4444 Lynn  Lee LIS 123-45-6789 LIS203
002
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS203
003
222-33-4444 Lynn  Lee LIS 543-21-0987 LIS456
002
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS456
006
222-33-4444 Lynn  Lee LIS 123-45-6789 LIS456
005
222-33-4444 Lynn  Lee LIS 435-69-3739 LIS456
004
222-33-4444 Lynn  Lee LIS 454-67-8876 LIS456
001
222-33-4444 Lynn  Lee LIS 667-83-9631 LIS456
003
222-33-4444 Lynn  Lee LIS 987-65-4321 LIS456
007
222-33-4444 Lynn  Lee LIS 667-83-9631 PHI123
002
222-33-4444 Lynn  Lee LIS 543-21-0987 PHI123
001
987-65-4321 Mary Ruiz History 123-45-6789 LIS301
001
987-65-4321 Mary Ruiz History 222-33-4444 LIS301
002
987-65-4321 Mary Ruiz History 543-21-0987 LIS203
001
987-65-4321 Mary Ruiz History 888-99-1234 LIS203
004
987-65-4321 Mary Ruiz History 123-45-6789 LIS203
002
987-65-4321 Mary Ruiz History 222-33-4444 LIS203
003
987-65-4321 Mary Ruiz History 543-21-0987 LIS456
002
987-65-4321 Mary Ruiz History 222-33-4444 LIS456
006
987-65-4321 Mary Ruiz History 123-45-6789 LIS456
005
987-65-4321 Mary Ruiz History 435-69-3739 LIS456
004
987-65-4321 Mary Ruiz History 454-67-8876 LIS456
001
987-65-4321 Mary Ruiz History 667-83-9631 LIS456
003
987-65-4321 Mary Ruiz History 987-65-4321 LIS456
007
987-65-4321 Mary Ruiz History 667-83-9631 PHI123
002
987-65-4321 Mary Ruiz History 543-21-0987 PHI123
001

The above table shows in full the product of tables GRAD and ENROLLMENT (which is read as "GRAD cross ENROLLMENT"). However, the meaningful rows in the product are merely those in which the SSN attribute of GRAD matches the SID attribute of ENROLLMENT. (Note: It is possible to talk about "matching" these attributes only because--as you can verify--they have the same domain.)

Selection and the Equijoin of Two Tables

We can perform the operation of selection on the product table, GRAD cross ENROLLMENT; i.e., we can select rows from this product. If we use as our criterion for selection the match of SSN and SID, and if we discard all rows in which these attributes do not match, we will obtain a relation that provides information about the classes in which the grads are enrolled. This relation is a kind of join, the kind called an equijoin. We shall call this particular example "GRAD_COURSES" (the underscore serves only to provide a string of characters that lacks a space but is easily read by people).
 
GRAD_COURSES
123-45-6789 Jack Jones LIS 123-45-6789 LIS301
001
123-45-6789 Jack Jones LIS 123-45-6789 LIS203
002
123-45-6789 Jack Jones LIS 123-45-6789 LIS456
005
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS301
002
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS203
003
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS456
006
987-65-4321 Mary Ruiz History 987-65-4321 LIS456
007

Note: In a join, the attributes used in the selection condition must be from a common domain, as SSN and SID were in the preceding example.

Projection and the Natural Join (Inner Join) of Two Tables

Clearly, GRAD_COURSES still has some redundant information in it: viz., the two attributes with matching contents, SSN and SID. To polish the above result, we can perform a projection operation on GRAD_COURSES, selecting all columns except the SID column. We shall call the result GRAD_COURSES1.
 
GRAD_COURSES1
123-45-6789 Jack Jones LIS LIS301
001
123-45-6789 Jack Jones LIS LIS203
002
123-45-6789 Jack Jones LIS LIS456
005
222-33-4444 Lynn  Lee LIS LIS301
002
222-33-4444 Lynn  Lee LIS LIS203
003
222-33-4444 Lynn  Lee LIS LIS456
006
987-65-4321 Mary Ruiz History LIS456
007

This last table, GRAD_COURSES1, is an example of what we ordinarily call a "join" operation.

It began with (1) the formation of the product of GRAD and ENROLLMENT, continued with (2) the selection from GRAD _ ENROLLMENT of certain rows to yield GRAD_COURSES, and finished with (3) the projection of GRAD_COURSES onto GRAD_COURSES1 by the deletion of the attribute SID.

It is sometimes useful to distinguish between the result of steps (1) and (2) in a join operation, exemplified by GRAD_COURSES, which, as we noted earlier, can be called the equijoin of GRAD and ENROLLMENT, and the result of step (3), exemplified by GRAD_COURSES1, which can be called the natural join (or inner join) of GRAD and ENROLLMENT. The word "join" is ordinarily used to mean the "natural (inner) join" of two tables. To put it another way, the join (or natural join, or inner join) is the equijoin without the duplication of the common field on which the join is based.

It is interesting to note that an RDBMS actually goes through all of the steps above when the user asks it to join two tables. That is, the RDBMS must take each row in the first table in turn, examine all the rows in the second table to see whether any of the latter rows match the selected row from the first table, and keep only those rows in the second table that do match. That process is repeated for every row in the first table. Of course, an RDBMS, operating at computer speeds, can do this fairly quickly even for large tables, especially if the programmers of the RDBMS have worked hard at incorporating useful shortcuts.

The Left and Right Outer Joins of Two Tables

Still another type of join is the outer join, which consists of all the rows of one table together with matching rows from another table. Here "matching" means "having the same value in the common field" of the two tables. Outer joins come in two types: left and right.

Briefly, if A and B are two tables, then we say that the right outer join of A and B consists of all distinct rows of A, with each distinct row of A paired with all its matching (i.e., joining) rows in B. If there are rows in A that lack any matching rows in B, then those rows from A will appear with empty cells to the right of them in the right outer join. One way to think of this is that the right outer join displays those rows in the right-hand table that can be joined with the rows of the left-hand table.

Similarly, we say that the left outer join of A and B consists of all distinct rows of B, with each distinct row of B paired with all its matching rows in A. If there are rows in B that lack any matching rows in A, then those rows from B will appear with empty cells to the left of them in the left outer join. Thus we can think of this as meaning that the left outer join displays those rows in the left-hand table that can be joined with the rows of the right-hand table.

For example, the left outer join of tables GRAD and ENROLLMENT is
 
Left Outer Join of GRAD and ENROLLMENT
123-45-6789 Jack Jones LIS 123-45-6789 LIS301
001
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS301
002
        543-21-0987 LIS203
001
        888-99-1234 LIS203
004
123-45-6789 Jack Jones LIS 123-45-6789 LIS203
002
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS203
003
        543-21-0987 LIS456
002
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS456
006
123-45-6789 Jack Jones LIS 123-45-6789 LIS456
005
        435-69-3739 LIS456
004
        454-67-8876 LIS456
001
        667-83-9631 LIS456
003
987-65-4321 Mary Ruiz History 987-65-4321 LIS456
007
        667-83-9631 PHI123
002
        543-21-0987 PHI123
001

Thus a left outer join shows two things well: (1) what are all the distinct rows in the right-hand table (i.e., the second of the two original tables); and (2) which rows in the right-hand original table are not matched by rows in the left-hand (i.e., the first) of the two original tables.

What a left outer join does not show directly is whether there are rows in the left-hand table that fail to match any rows in the right-hand table. For example, suppose suppose we have a table GRAD1, a slightly modified version of GRAD containing a row for an imaginary fourth graduate student, Mae East, a drama major who is not enrolled in any course during the current semester.

GRAD1
123-45-6789 Jack Jones LIS
222-33-4444 Lynn  Lee LIS
987-65-4321 Mary Ruiz History
555-66-7777 Mae East Drama

The left outer join of GRAD1 and ENROLLMENT would look exactly like the above left outer join of GRAD and ENROLLMENT, i.e., it would lack even a hint of the existence of Mae East.

Like a mirror image of a left outer join, a right outer join shows two analogous things well: (1) what are all the distinct rows in the left-hand table (i.e., the first of the two original tables); and (2) which rows in the left-hand table fail to be matched by rows in the right-hand table. What a right outer join does not show directly is whether there are rows in the right-hand table that fail to match any rows in the left-hand table.

The right outer join of tables GRAD and ENROLLMENT is the same table as that shown earlier with the title GRAD_COURSES. This is because in our example every row in the left-hand table, GRAD, happens to be matched by at least one row in the right-hand table, ENROLLMENT, as is shown in GRAD_COURSES, so that there are no rows with empty cells in the last three columns.

Slightly different from the right outer join of GRAD and ENROLLMENT (viz., GRAD_COURSES) is the right outer join of GRAD1 and ENROLLMENT. This latter right outer join has a row for Mae East, with blanks in the last three columns. However, the right outer join of GRAD1 and ENROLLMENT fails to show whether there are any courses offered in Grandiose State University in addition to those shown; for example, this right outer join lacks even a hint of the existence of course PHIL123.

Right Outer Join of GRAD1 and ENROLLMENT
123-45-6789 Jack Jones LIS 123-45-6789 LIS301
001
123-45-6789 Jack Jones LIS 123-45-6789 LIS203
002
123-45-6789 Jack Jones LIS 123-45-6789 LIS456
005
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS301
002
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS203
003
222-33-4444 Lynn  Lee LIS 222-33-4444 LIS456
006
987-65-4321 Mary Ruiz History 987-65-4321 LIS456
007
555-66-7777 Mae East Drama      

Notations for Joins and Projections

The join process is often notated by the use of a parenthetical expression to specify the selection criterion. For example, the join in the preceding example could be written as "GRAD join(SSN = SID) ENROLLMENT". It is true that this notation does not clarify the projection operation. But the notation is often satisfactory because ordinarily the selection operation is the most critical part of the join process, and the deletion (in the projection part) of a duplicative attribute or attributes is relatively easy and/or obvious.

The notation can be elaborated by specifying the attributes to be contained in the result of the projection operation. The following example indicates how this is done. We start with:

GRAD join(SSN = SID) ENROLLMENT = GRAD_COURSES

Then we use brackets to indicate the projection operation and to specify the attributes remaining in the final result, after the projection:

GRAD_COURSES[SSN, Fname, Lname, Major, Cnum, RosterNum] = GRAD_COURSES1

Note that the bracket notation also specifies the order of the attributes in the result. For example, we can express the projection of FACULTY onto FACULTY1, illustrated earlier, as follows:

FACULTY[Fnum, Fname, Lname, Dept] = FACULTY1

Now, suppose we are interested in what courses Jack Jones is taking. We can find that out by selecting those rows of GRAD_COURSES1 for which SSN is equal to 123-45-6789, Mr. Jones's Social Security Number. If we call the resulting table JONES_COURSES, we can notate this selection as follows:

GRAD_COURSES1 where SSN = "123-45-6789" = JONES_COURSES

The table is
 
JONES_COURSES
123-45-6789 Jack Jones LIS LIS301
001
123-45-6789 Jack Jones LIS LIS203
002
123-45-6789 Jack Jones LIS LIS456
005

Finally, these notations can be combined in ways that are suggested by the following example:

(GRAD join(SSN = SID) ENROLLMENT)[SSN, Fname, Lname, Major, Cnum, RosterNum] where SSN = "123-45-6789" = JONES_COURSES


Last revised 2004 Feb 23