Graduate School of Library and Information Science, UT Austin
Information Technologies
and the
Information Professions
 
Introduction to TCP/IP
R. E. Wyllys

Introduction

This lesson discusses the Transmission Control Protocol (TCP), one of a pair of protocols that together handle the movement of a large proportion of the actual bits and bytes on the Internet. The other is the Internet Protocol (IP). The TCP/IP combination provides highly reliable transmission of messages (i.e., data) over the Internet.

Another important Internet protocol, User Datagram Protocol (UDP), is also used in conjunction with IP, but is of interest primarily to network administrators rather than to most end-users of the Internet. The UDP/IP combination is used in networks in which reliability can ordinarily be assumed to be very high, so that the costly checking and re-checking processes provided by the TCP/IP combination can safely be dispensed with. Aside from this mention of UDP, we will not treat it here.

What is TCP/IP?

The combination of protocols known as TCP/IP was developed, starting in 1974, as part of research, sponsored by the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense (DoD), aimed toward developing the computer network known as ARPANET.

TCP/IP consists of two different types of protocol: The Transmission Control Protocol handles both the breaking up of a message into small individual packages (packets) at the transmission end of a communication link, and also the reassembling of the packets into the original message at the receiving end. The Internet Protocol handles the movement of the individual packets over Internet communication channels.
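The division of labor described above can be illustrated with a minimal sketch in Python. It is only an illustration, not how any real TCP software is written: the function names `split_message` and `reassemble`, the sample message, and the 8-byte payload size are all assumptions chosen to keep the example short.

```python
def split_message(message: bytes, payload_size: int) -> list[tuple[int, bytes]]:
    """Break a message into (offset, chunk) pairs of at most payload_size bytes."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        (offset, message[offset:offset + payload_size])
        for offset in range(0, len(message), payload_size)
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Join the chunks back together in the order in which they are listed."""
    return b"".join(chunk for _, chunk in packets)


# Hypothetical message and payload size, chosen only for illustration.
message = b"Good morning, and welcome to the lesson on TCP/IP."
packets = split_message(message, payload_size=8)
print(len(packets), "packets")
print(reassemble(packets) == message)
```

Each chunk is paired with the offset of its first byte in the original message, which is the piece of information that lets the receiving end put the chunks back in place.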

On your microcomputer you have software that implements the TCP/IP protocols, i.e., software that is the processor for TCP and IP. In general these programs are referred to as "sockets" or "TCP/IP stacks." On Windows machines the program is called "Winsock"; on Macintoshes, "MacTCP."

Packets and Their Structure

In this lesson we try to remove some of the mystery surrounding the workings of the Internet by explaining how the packet-communication process works. This involves taking a close look at the structure of a TCP packet, in the expectation that when you have read through this discussion you will understand the essence of the process (even though no one who does not work daily with the packet structure can be expected to remember all the details precisely).

At the transmission end TCP breaks up a message into small packages of bytes called packets. The table below shows the parts of each packet in terms of its component fields. The table displays the name of each field and the numbers, within the packet, of the beginning bit and the ending bit of each field. For example, the Destination Address field begins with bit number 17 (i.e., the 17th bit in the packet) and ends with bit number 32 (i.e., the 32nd bit in the packet). The final bit in the packet, the last bit in the Data field, has a number that corresponds to the actual size of the packet. Packet sizes may vary, but are generally between 1000 and 1500 bytes.

  Field Name                    Beginning Bit   Ending Bit
  Source Address                1               16
  Destination Address           17              32
  Sequence                      33              64
  Acknowledgment                65              96
  Data Offset                   97              100
  Reserved                      101             106
  Urgent Flag (URG)             107             107
  Acknowledge Flag (ACK)        108             108
  Push Flag (PSH)               109             109
  Reset Flag (RST)              110             110
  Synchronization Flag (SYN)    111             111
  Finish Flag (FIN)             112             112
  Window                        113             128
  Checksum                      129             144
  Urgent Marker                 145             160
  Options                       161             184
  Pad                           185             192
  Data                          193             Packet Size
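To make the bit positions in the table concrete, here is a minimal sketch that decodes the first 160 bits (20 bytes) of a packet using Python's standard `struct` module. The function name `parse_header`, the dictionary keys, and the sample values are assumptions made for illustration; the sketch ignores the Options, Pad, and Data fields.

```python
import struct

# The six 1-bit flags, in the order in which they appear in the table.
FLAG_NAMES = ["URG", "ACK", "PSH", "RST", "SYN", "FIN"]


def parse_header(packet: bytes) -> dict:
    """Decode bits 1-160 of a packet laid out as in the table."""
    (source, destination, sequence, acknowledgment,
     offset_and_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", packet[:20])
    flag_bits = offset_and_flags & 0x3F            # lowest 6 of the 16 bits
    return {
        "source": source,                          # bits 1-16
        "destination": destination,                # bits 17-32
        "sequence": sequence,                      # bits 33-64
        "acknowledgment": acknowledgment,          # bits 65-96
        "data_offset": offset_and_flags >> 12,     # bits 97-100
        "reserved": (offset_and_flags >> 6) & 0x3F,  # bits 101-106
        "flags": [name for i, name in enumerate(FLAG_NAMES)
                  if flag_bits & (0x20 >> i)],     # bits 107-112
        "window": window,                          # bits 113-128
        "checksum": checksum,                      # bits 129-144
        "urgent": urgent,                          # bits 145-160
    }


# A made-up header with only the SYN flag set and a data offset of 5.
sample = struct.pack("!HHIIHHHH", 1025, 80, 12345678, 0,
                     (5 << 12) | 0x02, 8192, 0, 0)
print(parse_header(sample))
```

The format string `"!HHIIHHHH"` asks for network byte order, with `H` standing for a 16-bit field and `I` for a 32-bit field, which matches the field widths in the table.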

The Fields in the Packet

The Source Address field contains the Internet address of the source of the packet, in digital form. That is, it stores an address like "www.gslis.utexas.edu" as the actual number, e.g., "128.83.145.81", for which "www.gslis.utexas.edu" is simply a semi-mnemonic substitute, more easily understood and remembered by humans than the underlying digits of the actual address. In similar fashion, the Destination Address field contains the digital Internet address to which the packet is to be sent.
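The relation between the dotted form of an address and the single number behind it can be shown with Python's standard `ipaddress` module. This is a small sketch; the helper names `to_number` and `to_dotted` are assumptions, and the step of looking up a name such as "www.gslis.utexas.edu" is not shown because it requires a live network connection.

```python
import ipaddress


def to_number(dotted: str) -> int:
    """Convert a dotted address such as '128.83.145.81' to one integer."""
    return int(ipaddress.IPv4Address(dotted))


def to_dotted(number: int) -> str:
    """Convert the integer form back to the dotted form."""
    return str(ipaddress.IPv4Address(number))


number = to_number("128.83.145.81")
print(number)               # the address as a single integer
print(f"{number:032b}")     # the same address written as 32 binary digits
print(to_dotted(number))    # back to the dotted form
```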

For example, suppose that you enter "http://www.gslis.utexas.edu" in the address slot of your browser and hit the "Enter" (or "Return") key. Your browser works with the software in your computer (about which more later) to send a signal over the Internet to the Web server at the GSLIS Information Technology Laboratory (I. T. Lab), asking that server to send back to your browser a particular Webpage (which you know as the GSLIS home page: "Welcome to GSLIS . . .", etc.). The signal from your browser to the GSLIS Web server is contained in a packet that begins with a source address, viz., the Internet address assigned to you (at least temporarily, during your current login) by your Internet Service Provider (ISP). If you are using a machine in the IT Lab, the source address will be one assigned semi-permanently by the GSLIS Web server to the particular machine you are using. In either case it will be a digital address consisting of four different parts (whose significance we can ignore here) separated by three dots (periods). The destination address will, of course, be that of the GSLIS server.

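You will notice that the table allots 16 bits to each of the two address fields, although a complete Internet address currently occupies 32 bits. Strictly speaking, the 16-bit fields in the TCP header hold port numbers, which identify the communicating programs at each end, and the 32-bit source and destination addresses described above travel in the IP header that accompanies each packet. The current Internet uses addresses that can be expressed by no more than 32 bits. However, the Internet has grown so fast that the world is starting to run out of such addresses, and the next version of the Internet (see below) will use 128-bit addresses, which should suffice for quite a while. This means that the packet structure will need to be modified as the new version of the Internet is phased in.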

The Sequence field displays the relative byte offset of the first byte of the data in the packet. The reason it is a relative offset is that the original message is considered, by the transmitting and receiving TCP processors, as beginning at an arbitrary 32-bit sequence number. The transmitting and receiving TCP processors agree on this number as part of the process, called "handshaking," by which they set themselves up as a transmitting-receiving pair for the time needed to send the message. This temporary relationship is called a "virtual circuit" because it is not a permanent circuit, such as the permanent circuit composed of the pair of copper wires from your telephone to the nearest telephone-exchange computer.

The arbitrary nature of the 32-bit sequence number helps to provide a "unique" identifier for the group of packets that will handle the original message. The identifier is "unique" in the sense that it is highly unlikely that any other message being transmitted during the same time period will, by chance, have the same identifier. (The likelihood of two messages having the same identifier within the same reasonably short time period is extremely small; for 2^32 = 4,294,967,296, which means that a 32-bit sequence number can be anything from 0 to 4,294,967,295, inclusive.)

The sequence number of a given packet thus is the sum of (i) the unique identifier and (ii) the offset of the first byte in the packet from the first byte in the original message. For example, if the unique identifier were 12,345,678, if the message began with "Good morning," and if the second packet began at the "m", then the sequence number would equal

12,345,678 + 5 = 12,345,683

because the "m" is in position 6 in the message, which is 5 positions to the right of the first byte, the "G". (In this example, I made the size of the first packet unrealistically small in order to simplify the counting process.)
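The same arithmetic can be written as a short Python sketch. The function name `sequence_number` is an assumption; the wraparound at 2^32 is an addition not discussed in the text, included because a 32-bit field cannot hold a larger number.

```python
SEQUENCE_SPACE = 2 ** 32    # number of distinct values a 32-bit field can hold


def sequence_number(identifier: int, byte_offset: int) -> int:
    """Add the byte offset to the identifier, wrapping around at 2**32."""
    return (identifier + byte_offset) % SEQUENCE_SPACE


message = "Good morning"
offset_of_m = message.index("m")    # 5: the "m" is 5 positions after the "G"
print(sequence_number(12_345_678, offset_of_m))    # prints 12345683
```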

The virtual circuit operates in "full duplex" mode; i.e., data can go in both directions simultaneously (see Endnote 1). As the receiving TCP processor receives packets, it sends back acknowledgments to the transmitting TCP processor. If the ACK flag (see below) is set on (as it usually is in practice), then the Acknowledgment field contains the relative byte position of the last byte, sent by the transmitting TCP processor, that has been acknowledged by the receiving TCP processor.

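The Data Offset field contains the offset from the beginning of the packet to the beginning of the Data field. Because the field is only 4 bits wide, the offset is counted in 32-bit (4-byte) words rather than in single bytes; for the layout in the table above, in which the Data field begins at bit 193, the value is 6.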

The Reserved field is not currently used and, hence, is always set to 0; it is there for possible future use.

The Flag fields are 1-bit fields that are used to send connection-state information between the two TCP processors. Since connection processes are quite technical, we will not go into the details of these flags in this lesson, aside from the already mentioned ACK flag.

The Window field is needed because TCP uses what is called a "sliding-window" process for sending packets. In theory, the simplest way to send the packets making up a message would be for the transmitting TCP processor to send the first packet, wait for the receiving processor to acknowledge it, then send the second packet, wait for it to be acknowledged, and so on. Furthermore, if the first packet was not acknowledged within a reasonable period of time, the transmitting processor would send it again, and once again await acknowledgment of it from the receiving processor. In practice, this simplest way would take an intolerably long time, and the sliding-window process is used instead.

In the sliding-window process, the transmitting processor sends several packets before awaiting acknowledgment of any of them, and the receiving processor acknowledges several packets at a time by sending to the transmitter the relative byte position of the last byte of the message that it has received successfully. The number of packets to be sent before the wait for acknowledgment is set dynamically; i.e., it can change from time to time depending on network conditions. Essentially what happens is that if the transmitting TCP processor is told by the receiving processor that several successive groups of a given number of packets have been received successfully, the transmitting processor will conclude that network conditions are good and will decide to increase the number of packets per group. If, on the other hand, the receiving processor reports difficulties in successful reception, the transmitting processor will conclude that network conditions are poor and will decrease the number of packets per group. In either case, the transmitting processor will communicate its decision to the receiving processor via the entry in the Window field.
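The behavior just described can be imitated with a toy simulation in Python. It is a deliberately simplified sketch, not the algorithm used by real TCP software: the function name `simulate`, the rule of adding one packet per group after a success and halving the group after a failure, and the assumption that a lost packet always gets through on its second try are all choices made for illustration.

```python
def simulate(total_packets: int, window: int,
             lost_once: set[int]) -> list[tuple[int, int]]:
    """Return (group size used, packets acknowledged so far) for each round."""
    lost = set(lost_once)
    acknowledged = 0    # packets 0 .. acknowledged-1 have arrived in order
    history = []
    while acknowledged < total_packets:
        group = range(acknowledged, min(acknowledged + window, total_packets))
        delivered = 0
        for number in group:
            if number in lost:
                lost.discard(number)    # assume the re-sent copy gets through
                break                   # nothing past the gap is acknowledged
            delivered += 1
        acknowledged += delivered
        history.append((window, acknowledged))
        if delivered == len(group):
            window += 1                     # conditions look good: send more
        else:
            window = max(1, window // 2)    # conditions look poor: send fewer
    return history


# Ten packets, starting with groups of two; packet 5 is lost on its first try.
for size, acked in simulate(10, 2, {5}):
    print(f"group size {size}: {acked} packets acknowledged so far")
```

In this run the group size grows from 2 to 4, drops back to 2 when packet 5 goes missing, and then begins to grow again.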

The Checksum field contains a number that the transmitting TCP processor forms by summing (using modular arithmetic) the contents of each 16-bit sequence (a "double byte") in the header and data portion of the message. The receiving TCP processor checks the checksum against the contents of the header and data portions; if the transmitted checksum and the checksum calculated by the receiving processor fail to agree, the receiving processor knows that an error has occurred during the travels of the packet. In this case, the receiving processor will request that the packet (or the group of packets to which it belongs) be re-sent by the transmitting processor.
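A checksum of this general kind can be sketched in a few lines of Python. The sketch sums 16-bit values with the carry folded back in and then inverts the bits, which is one common form of the modular arithmetic mentioned above; the function names `checksum16` and `verify` are assumptions, and real TCP software attends to further details not covered in this lesson.

```python
def checksum16(data: bytes) -> int:
    """Sum the data as 16-bit values, folding carries back in, then invert."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)    # fold any carry back in
    return ~total & 0xFFFF


def verify(data: bytes, received_checksum: int) -> bool:
    """Recompute the checksum at the receiving end and compare."""
    return checksum16(data) == received_checksum


packet = b"Good morning"
sent = checksum16(packet)
print(verify(packet, sent))             # the packet arrived intact
print(verify(b"Good mourning", sent))   # the packet was altered in transit
```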

If the URG flag has been set, the Urgent Marker field is used to initiate certain technical procedures.

The Options field is used to contain various possible options, such as the security level of the packet.

The Pad field is used, when necessary, to pad out the TCP header to a multiple of 4 bytes.
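The amount of padding needed is a one-line calculation, sketched here in Python. The function name `pad_header` and the use of zero bytes as filler are assumptions for illustration.

```python
def pad_header(header: bytes) -> bytes:
    """Append zero bytes until the header length is a multiple of 4 bytes."""
    padding = (-len(header)) % 4    # 0, 1, 2, or 3 bytes of padding
    return header + b"\x00" * padding


# 20 bytes of fixed fields plus 3 bytes of options, as in the table above.
header = bytes(20) + b"\x01\x02\x03"
padded = pad_header(header)
print(len(header), "->", len(padded))
```

With the 24-bit Options field shown in the table, one byte of padding brings the header to 24 bytes, which agrees with the 8-bit Pad field in bits 185 through 192.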

Finally, the Data field contains the actual data of the portion of the original message that is being conveyed in the particular individual packet.

Summary

There are a number of further details involved in the sending and receiving of packets of information over the Internet, but what we have discussed thus far should suffice to provide you with a general idea of how packets are put together. It remains to consider how they are used.

The beauty of packet-based communications is that the packets travel individually and are small enough so that if a few of the packets that make up a whole message are lost or garbled (i.e., corrupted), there will be only a small penalty in time and costs to re-send the packets needed to make the whole message complete. Further, each individual packet travels between source and target along its own path, which need not be the same as that of other packets making up a given message. If a congested or broken link in one path is encountered by a packet, the packet can be re-directed (or even re-sent) over a different link. At every node where links come together, each arriving packet is examined by a device known as a "router," which endeavors to pick the best link over which to send that individual packet next so as to move it closer to its target. Still further, the packets need not arrive in order at the receiver; they contain within them the information needed for sorting their data contents into the correct order once all the packets have been received successfully.
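The sorting of packets that arrive out of order, and the detection of a packet that has not arrived at all, can be sketched as follows in Python. The function name `assemble`, the representation of a packet as an (offset, chunk) pair, and the sample message are assumptions made for illustration.

```python
import random


def assemble(received: list[tuple[int, bytes]], total_length: int):
    """Return (message, None) if complete, else (None, offset of first gap)."""
    by_offset = dict(received)      # duplicate packets collapse into one entry
    position = 0
    parts = []
    while position < total_length:
        chunk = by_offset.get(position)
        if not chunk:
            return None, position   # a re-send is needed starting at this byte
        parts.append(chunk)
        position += len(chunk)
    return b"".join(parts), None


message = b"Good morning, Austin"
packets = [(i, message[i:i + 5]) for i in range(0, len(message), 5)]

random.Random(0).shuffle(packets)               # packets arrive in any order
print(assemble(packets, len(message)))

incomplete = [p for p in packets if p[0] != 10]  # one packet never arrives
print(assemble(incomplete, len(message)))
```

Because each chunk carries the offset of its first byte, the order of arrival does not matter, and a missing chunk shows up as a gap at a known position that can be filled by re-sending only that packet.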

The result of these capabilities is that TCP/IP communications achieve extremely high accuracy of transmission of messages at very low cost and, most of the time, with excellent speed.

Note on Standards and the Future

The specifications for TCP/IP are maintained by an international group, the Internet Engineering Task Force (IETF), a part of the Internet Architecture Board. The IETF Website includes a link to interesting information on the Internet Standards Process.

Standards groups are currently working on the next version of the Internet Protocol, known as IPv6. Here is what the IPng Working Group (Internet Protocol next generation) has to say about the current state of the new version:

What is IPv6?

IPv6 is short for "Internet Protocol Version 6". IPv6 is the "next generation" protocol designed by the IETF to replace the current version Internet Protocol, IP Version 4 ("IPv4").

Most of today's Internet uses IPv4, which is now nearly twenty years old. IPv4 has been remarkably resilient in spite of its age, but it is beginning to have problems. Most importantly, there is a growing shortage of IPv4 addresses, which are needed by all new machines added to the Internet.

IPv6 fixes a number of problems in IPv4, such as the limited number of available IPv4 addresses. It also adds many improvements to IPv4 in areas such as routing and network autoconfiguration. IPv6 is expected to gradually replace IPv4, with the two coexisting for a number of years during a transition period.

Some introductory information about the protocol can be found in our IPv6 FAQ. For those interested in the technical details, we have a list of IPv6 related specifications.

Endnote

1. In "half duplex" mode, data can go in only one direction at a time. Full-duplex operation is like using a telephone, where each person can talk and, even while talking, hear the person at the other end. Half-duplex operation is like using walkie-talkie radios, where the talker needs to say something like "Over to you" to indicate to the listener that the talker has finished his or her thought and is ready to listen.

Course emailbox: l38613dw@gslis.utexas.edu
GSLIS Website: www.gslis.utexas.edu

Last updated 2001 Aug 21 by R. E. Wyllys