The history of TCP/IP began in the mid-1970s, when the Advanced Research Projects Agency (ARPA) started working on the protocols and architecture that today form the foundation of the Internet [2]. TCP/IP took its current form around 1977-79. By 1979 so many researchers were involved in the TCP/IP project that ARPA formed an internal committee to coordinate and guide the design of the protocols and architecture of the emerging Internet. The committee, called the Internet Control and Configuration Board (ICCB), was active until 1983, when it was reorganized.
The Internet Architecture Board (IAB) was created from the reorganization of the ICCB. The primary task of the IAB is to set official policies. Another of its duties is to decide which protocols are a required part of the TCP/IP suite.
In 1992, as the Internet moved away from the U.S. government, the Internet Society was created. It is an international organization formed to encourage participation in the Internet around the world.
The Internet we have today began around 1980, when ARPA started converting its machines to the TCP/IP protocols. To encourage use of the new protocols, ARPA made a low-cost implementation available to university researchers. ARPA was able to reach over 90% of the university computer science departments in the US. The timing for the protocol suite was perfect, because the departments were just acquiring second and third computers and connecting them together. The departments needed protocols for communication, and no others were available.
The success of the TCP/IP technology and the Internet among computer science researchers led other groups to adopt it. The National Science Foundation soon realized the importance of computer communication and took an active role in expanding the use of the TCP/IP Internet among scientists. In 1985 networks were established around six supercomputer centers. In 1986 the effort expanded with the creation of a new wide area backbone network, the NSFNET. The NSFNET eventually reached all the supercomputer centers and tied them to the ARPANET.
Adoption of the TCP/IP protocols and the growth of the Internet are no longer limited to scientists and government-funded projects. By 1994 the global Internet reached over 3 million computers in 61 countries. Today the Internet is growing very rapidly; it connects computers almost everywhere, and all of them use TCP/IP.
Why use TCP/IP? Most networks are independent entities, established to serve the needs of a single group. Each group uses hardware that suits its needs. Some may want a high-speed network to connect their machines, but such networks do not work over long distances. Others may want to communicate over long distances and settle for a slower network. TCP/IP provides a set of communication conventions that makes it possible to interconnect all these different kinds of networks.
The name TCP/IP refers to many different protocols, not only TCP and IP. Some of the better-known ones are UDP, ICMP, and ARP.
The overview below (see figure 1) shows the general layout of how these protocols are related to each other. The parts comprising the protocol stack are the protocols found on the transport and network layers.
Figure 1: General Overview of TCP/IP
The Internet Protocol (IP) provides unreliable transmission of blocks of data, called datagrams, from one host to another [4]. IP is unreliable in the sense that it makes no attempt to recover when a packet is lost, duplicated, or corrupted. It relies on higher level protocols to ensure reliable transmission.
The primary tasks of IP are addressing, routing and fragmentation. IP uses the address carried in the internet header to transmit the datagram towards its destination. The selection of the path for transmission is called routing. Fragmentation is needed when a packet is too large to fit the network below IP. If a large packet arrives at the IP level, IP divides the packet into smaller fragments before sending them. When a packet fragment arrives, IP is responsible for the reassembly of the packet.
Another duty of the IP protocol is to remove old packets from the network. Each packet has a time to live (TTL) field, which indicates the maximum number of router hops the packet is allowed to make before it is discarded. Without this limit, unroutable packets would remain on the network forever, filling it with useless data.
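The TTL rule can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and it assumes the common router behaviour of decrementing the TTL by one per hop and discarding the packet when the value reaches zero.

```python
def forward_ttl(ttl):
    """Return the TTL to send onward, or None if the packet must be discarded."""
    remaining = ttl - 1
    return remaining if remaining > 0 else None


def routers_traversed(initial_ttl):
    """Count how many routers forward a packet that never reaches its destination."""
    forwarded = 0
    ttl = forward_ttl(initial_ttl)
    while ttl is not None:
        forwarded += 1
        ttl = forward_ttl(ttl)
    return forwarded
```

Under these assumptions, a packet caught in a routing loop with an initial TTL of 64 is forwarded 63 times and then dropped by the next router it reaches.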
The best way to get a better understanding of the IP protocol is to take a look at the format of an IP packet (see figure 2).
Some additional explanation might be required, in particular of the fragmentation/reassembly process and the checksum computation.
When a datagram arrives from a higher level protocol, IP first checks that the datagram is not too large for the underlying network. If it is, IP has to split it into a number of separate datagrams. The fragmentation process starts by checking the Don't Fragment (DF) flag. If the DF flag is set, fragmentation of this datagram is not allowed; the datagram is discarded and an error message is returned to the sender. If fragmentation is allowed, the data is divided into smaller chunks and a header is created for each chunk. The header fields affected by fragmentation are the More Fragments (MF) flag, the Fragment Offset, the Internet Header Length, the Total Length, the Checksum and the options (if present).
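The fragmentation steps can be sketched as follows. The function name and parameters are hypothetical. Following RFC 791, the sketch assumes that the Fragment Offset is counted in 8-byte units, so every fragment except the last carries a multiple of 8 data bytes, and it assumes a 20-byte header without options.

```python
def fragment(payload_len, mtu, header_len=20, dont_fragment=False):
    """Return one (offset_in_8_byte_units, data_length, more_fragments) tuple per fragment."""
    if header_len + payload_len <= mtu:
        return [(0, payload_len, False)]      # fits as is, no fragmentation
    if dont_fragment:
        # A real IP layer would discard the datagram and report an error.
        raise ValueError("datagram too large and DF flag is set")
    max_data = (mtu - header_len) // 8 * 8    # largest multiple of 8 that fits
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len    # MF is clear only on the last fragment
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments
```

For example, 4000 data bytes sent over a network with a 1500-byte limit are split into three fragments of 1480, 1480 and 1040 data bytes, with offsets 0, 185 and 370.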
If the IP level receives a fragmented datagram from the lower level protocol it has to reassemble it before passing it to the protocol above. When a fragment arrives, IP checks the Identification, Source and Destination fields. If those fields match the fields in another already received fragment then they belong to the same datagram. Due to the way IP routes packets (i.e. each packet is routed separately, and may take a different route), the packets may arrive out of order. The Offset field in the header tells IP where in the original datagram this fragment belongs. IP just inserts the arriving fragments at their correct positions in the datagram until all fragments have arrived. When all fragments have arrived the reassembly is complete, and the datagram is passed on to the next higher protocol level.
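A minimal sketch of the reassembly logic is given below. The class and method names are hypothetical. It assumes that offsets are counted in 8-byte units (as in RFC 791), and it ignores overlapping fragments and reassembly timeouts, which a real implementation has to handle.

```python
from collections import defaultdict


class Reassembler:
    def __init__(self):
        self.pending = defaultdict(dict)  # key -> {byte offset: fragment data}
        self.total = {}                   # key -> payload length, known once the last fragment arrives

    def add(self, ident, src, dst, offset_units, data, more_fragments):
        """Store one fragment; return the whole payload once it is complete, else None."""
        key = (ident, src, dst)           # fragments with equal keys belong together
        start = offset_units * 8
        self.pending[key][start] = data
        if not more_fragments:
            self.total[key] = start + len(data)
        return self._try_complete(key)

    def _try_complete(self, key):
        if key not in self.total:
            return None                   # last fragment not seen yet
        payload = bytearray()
        for start in sorted(self.pending[key]):
            if start != len(payload):
                return None               # a hole remains
            payload += self.pending[key][start]
        if len(payload) != self.total[key]:
            return None
        del self.pending[key], self.total[key]
        return bytes(payload)
```

Because fragments are stored by offset, the order of arrival does not matter: the datagram is handed upward as soon as the byte range is contiguous and the last fragment has been seen.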
The IP checksum is computed on the header only. The checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header; the checksum field itself is taken to be zero while the sum is computed.
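The computation can be sketched as below. The function name is hypothetical. The sketch assumes the header is passed as bytes with its checksum field set to zero, and it pads odd-length input with a zero byte (an IP header always has even length, so the padding only matters if the function is reused elsewhere).

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF
```

A receiver can verify a header by running the same function over the header as received, checksum field included; an undamaged header yields zero.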
The Transmission Control Protocol (TCP) is designed to provide a reliable connection between two hosts across networks of varying reliability using the Internet Protocol [5]. The data sent with TCP is treated as a stream of bytes, even though it has to be split into smaller packets because of the underlying protocols. TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. TCP also provides a way for the receiver to govern the amount of data sent by the sender, i.e. flow control.
Another responsibility of the TCP protocol is the connection handling. TCP can manage a large number of connections at the same time. The TCP header contains all the necessary information to ensure safe transfer, to handle flow control and to handle connections.
The description of the TCP header is in figure 3 with the fields described as follows.
Connection handling and establishment is an important part of the TCP protocol. A connection can be opened either actively or passively. A passive open waits for incoming connections rather than attempting to initiate one. In either case the connection is opened by a three-way handshake. The handshake is important because it synchronizes the sequence numbers on both sides of the connection. The sequence numbers are a fundamental part of TCP. Every byte sent over a TCP connection has a sequence number. Since every byte is sequenced, each of them can be acknowledged. The acknowledgement mechanism is cumulative, so an acknowledgement of sequence number X indicates that all bytes up to, but not including, X have been received.
The three-way handshake is the procedure used to establish a connection. It is normally initiated by one TCP and responded to by another, but it also works if two TCPs initiate it simultaneously. The simplest form of the handshake occurs when one TCP is listening and another connects to it. TCP A sends a SYN to TCP B, which is listening. TCP B sends its own SYN together with an acknowledgement of the SYN from TCP A, and finally TCP A acknowledges the SYN sent by TCP B. The connection is now open, and data transfer can begin.
Figure 4: The Connection Procedure
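The exchange of sequence numbers during the handshake can be sketched as follows. The function name and tuple layout are hypothetical. The sketch assumes, as RFC 793 specifies, that a SYN occupies one sequence number, so each side acknowledges the other's initial sequence number plus one, and that sequence numbers wrap at 2^32.

```python
def three_way_handshake(isn_a, isn_b):
    """Return the three segments as (sender, flags, seq, ack) tuples.

    isn_a and isn_b are the initial sequence numbers chosen by TCP A and TCP B.
    """
    mod = 2 ** 32
    syn = ("A", "SYN", isn_a, None)                              # A -> B
    syn_ack = ("B", "SYN+ACK", isn_b, (isn_a + 1) % mod)         # B -> A
    ack = ("A", "ACK", (isn_a + 1) % mod, (isn_b + 1) % mod)     # A -> B
    return [syn, syn_ack, ack]
```

With initial sequence numbers 100 and 300, A sends SYN with sequence number 100, B answers with sequence number 300 and acknowledgement 101, and A completes the handshake with sequence number 101 and acknowledgement 301.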
The amount of data transferred at a time is controlled by the window mechanism. The window sent in each segment indicates the range of sequence numbers that the sender of the window (the data receiver) is prepared to accept. If the receiving TCP is busy, the window shrinks, and when the receiving TCP cannot accept any more data the window is zero. If more data arrives than can be accepted, the excess is discarded.
The sending TCP must be prepared to send at least one byte of new data even when the window is zero, so that it can find out when the window reopens. When the receiving TCP has a zero window and a segment arrives, it must still send an acknowledgement showing its next expected sequence number and its current window (zero).
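The behaviour of the receive window can be sketched as follows. The class is hypothetical and heavily simplified: it handles only segments that arrive in order, models the window as the free space in a fixed-size buffer, and ignores sequence number wraparound.

```python
class Receiver:
    def __init__(self, next_seq, buffer_size):
        self.next_seq = next_seq          # next sequence number expected
        self.buffer_size = buffer_size
        self.buffered = 0                 # bytes accepted but not yet read by the application

    @property
    def window(self):
        return self.buffer_size - self.buffered

    def receive(self, seq, length):
        """Accept what fits in the window; return the (ack, window) to send back."""
        if seq == self.next_seq:
            accepted = min(length, self.window)   # excess bytes are discarded
            self.buffered += accepted
            self.next_seq += accepted
        return self.next_seq, self.window

    def read(self, n):
        """The application consumes data, which reopens the window."""
        n = min(n, self.buffered)
        self.buffered -= n
        return n
```

Even with a zero window, `receive` still returns an acknowledgement carrying the next expected sequence number and a window of zero, which is what lets the sender notice when the window reopens.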
When TCP sends data, every segment carries a sequence number, and the receiving TCP acknowledges (ACK) the data it receives. If a segment is not acknowledged within a timeout period, the sending TCP assumes that it has been lost and resends it. If two segments arrive with the same sequence number, the later one is a duplicate and is discarded.
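The sender's side of this bookkeeping can be sketched as below. The function names and the dictionary layout are hypothetical. The sketch assumes a single fixed timeout measured in integer milliseconds, whereas real TCP implementations adapt the timeout to the measured round-trip time.

```python
def process_ack(unacked, ack):
    """Drop every segment fully covered by a cumulative acknowledgement.

    unacked maps a segment's first sequence number to (length, time_sent_ms).
    """
    return {seq: seg for seq, seg in unacked.items() if seq + seg[0] > ack}


def due_for_retransmit(unacked, now_ms, timeout_ms):
    """Return the sequence numbers of segments whose timeout has expired."""
    return sorted(
        seq for seq, (length, sent_at) in unacked.items()
        if now_ms - sent_at >= timeout_ms
    )
```

Because acknowledgements are cumulative, a single ACK can clear several outstanding segments at once; only what remains in `unacked` is ever a candidate for retransmission.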
The checksum on the TCP level is calculated on the whole packet, i.e. the TCP header and all the data. That way the receiving TCP can tell if some of the data was damaged during the transfer. The checksum also covers a pseudo header. This pseudo header contains the Source Address, the Destination Address, the Protocol, and TCP length (these values are taken from the IP header). This gives TCP protection against misrouted segments. The checksum field is otherwise calculated in the same way as in IP, i.e. it is the 16-bit one's complement of the sum of all 16-bit words in the header, the pseudo header, and the data.
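A sketch of the TCP checksum with its pseudo header follows. The function names are hypothetical. It assumes IPv4 addresses in dotted notation, the 12-byte pseudo header layout of RFC 793 (source address, destination address, a zero byte, protocol number 6, TCP length), and a segment whose checksum field is zero while the value is being computed.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the pseudo header, the TCP header and the data."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 6, len(segment))  # zero, protocol (TCP = 6), TCP length
    )
    return internet_checksum(pseudo_header + segment)
```

When a receiver runs `tcp_checksum` over a segment as received, with the checksum field filled in, the result is zero for an intact segment. Because the addresses are part of the sum, the same segment delivered to the wrong destination address gives a non-zero result.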
The User Datagram Protocol (UDP) provides an unreliable connectionless delivery service using IP to transport messages between machines [6]. It does not use acknowledgements to assure that messages arrive, it does not order incoming messages, and it does not provide feedback to control the rate at which incoming information flows between machines. Thus UDP messages can be lost, duplicated, or arrive out of order. This means that it is up to the application using UDP to make the transfer reliable. Furthermore, packets can arrive faster than the recipient can process them. The UDP header reflects the simplicity of the protocol (see figure 5).
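As an illustration of how small the UDP header is, the following sketch packs and unpacks it. The function names are hypothetical. The field layout is taken from RFC 768 (source port, destination port, length, checksum, each 16 bits), and the checksum is left at zero, which in UDP over IPv4 means that no checksum was computed.

```python
import struct

UDP_HEADER_LEN = 8


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prefix the payload with the 8-byte UDP header."""
    length = UDP_HEADER_LEN + len(payload)    # the length field covers header and data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Return (src_port, dst_port, length, checksum, payload)."""
    src_port, dst_port, length, checksum = struct.unpack(
        "!HHHH", datagram[:UDP_HEADER_LEN]
    )
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER_LEN:length]
```

There are no sequence numbers, acknowledgements or window fields to pack, which is exactly why reliability is left to the application.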
The Address Resolution Protocol (ARP) allows a host to find the hardware address of a target host on the same physical network, given only the IP address of the target [7]. Unlike most protocols, the data in an ARP packet does not have a fixed-format header. Because ARP supports a variety of network technologies, the length of the fields that contain addresses depends on the type of network. The header below (see figure 6) shows the header in the case of IP over Ethernet.
The IP address is assigned to a host independent of the machine's hardware address. To send an internet packet across a hardware network from one computer to another, a way is needed to map the IP address to a hardware address. The Address Resolution Protocol (ARP) performs dynamic address resolution, using low level network communication. ARP permits a machine to resolve addresses without keeping a permanent record.
A machine uses ARP to find the hardware address of another machine by broadcasting an ARP request. The request contains the IP address of the machine for which a hardware address is needed. All machines on the network receive the ARP request. If the request matches a machine's IP address, the machine responds by sending an ARP reply that contains the needed hardware address. The reply is sent directly to the asking computer; no need for a broadcast.
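The request/reply exchange can be simulated in a few lines. Everything here is hypothetical (the `Host` class, the `resolve` function, the addresses); the broadcast is modelled as a loop over all hosts on the network, and the asking host keeps a temporary cache of answers rather than a permanent record.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.cache = {}                       # IP address -> hardware address

    def handle_request(self, target_ip):
        """Reply with our hardware address only if the request is for our IP address."""
        return self.mac if target_ip == self.ip else None


def resolve(asker, network, target_ip):
    """Return the hardware address for target_ip, or None if nobody answers."""
    if target_ip in asker.cache:
        return asker.cache[target_ip]         # answered from the cache, no broadcast
    for host in network:                      # every host sees the broadcast request
        if host is asker:
            continue
        mac = host.handle_request(target_ip)
        if mac is not None:                   # the reply goes directly to the asker
            asker.cache[target_ip] = mac
            return mac
    return None
```

Only the host that owns the requested IP address answers; all other hosts receive the request and ignore it.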
Normal communication across an internet involves sending messages from an application on one host to an application on another host. Routers, however, may need to communicate directly with the network software on a particular host, for example to report abnormal conditions or to send the host new routing information.
The Internet Control Message Protocol (ICMP) is a required part of IP, and it provides for extranormal communication among routers and hosts [8]. An ICMP message travels in the data field of an IP datagram. Each message begins with three fixed-length fields: an ICMP message type field, a code field, and an ICMP checksum field. The message type defines the format of the rest of the message as well as its meaning. There are several types of ICMP messages: Echo Reply, Destination Unreachable, Source Quench, Redirect, Echo, Time Exceeded, Parameter Problem, Timestamp, Timestamp Reply, Information Request and Information Reply.
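Reading the three fixed fields can be sketched as follows. The function name and the returned dictionary are hypothetical. The field sizes (8-bit type, 8-bit code, 16-bit checksum) and the type numbers are taken from RFC 792, and the sketch does not verify the checksum.

```python
import struct

# Message type numbers as assigned in RFC 792.
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    4: "Source Quench",
    5: "Redirect",
    8: "Echo",
    11: "Time Exceeded",
    12: "Parameter Problem",
    13: "Timestamp",
    14: "Timestamp Reply",
    15: "Information Request",
    16: "Information Reply",
}


def parse_icmp(message: bytes) -> dict:
    """Split an ICMP message into its three fixed fields and the type-specific rest."""
    msg_type, code, checksum = struct.unpack("!BBH", message[:4])
    return {
        "type": ICMP_TYPES.get(msg_type, "Unknown"),
        "code": code,
        "checksum": checksum,
        "rest": message[4:],                  # layout depends on the message type
    }
```

How `rest` is interpreted depends on the type; for an Echo message, for instance, it starts with an identifier and a sequence number.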
In addition to the protocols already mentioned, there are a few more that are less commonly used. A couple of protocols handle communication between routers, such as EGP (Exterior Gateway Protocol) and RIP (Routing Information Protocol).
Multicast messages (messages that are sent to a specific set of computers) are controlled with IGMP (Internet Group Management Protocol).
There are also quite a number of protocols used on the application level above TCP and UDP. They may also be considered part of the TCP/IP suite. These protocols include:
The TCP/IP technology has worked well for two decades, but it is beginning to grow old.
The Internet has experienced many years of exponential growth, doubling in size every nine months or faster. By early 1994, a new host appeared on the Internet on average every 30 seconds, and the rate has increased since then. Traffic on the Internet has increased even faster than the number of hosts. The increased traffic can be attributed to several causes. First, the Internet population is shifting from academics and scientists to the general public. Consequently, people now use the Internet after business hours for activities such as shopping and entertainment. Second, new applications that transfer images and real-time video generate more traffic than plain text. Third, automated search tools generate a substantial amount of traffic as they relentlessly probe Internet sites to find data.
The growth of the Internet is what has forced a change in IP: the address space in the IP protocol is too small. Other factors have contributed as well, notably real-time applications and the need for secure communication.
IPv6 is the next generation of IP. It retains many of the basic concepts of IPv4 but changes most details. Like IPv4, IPv6 provides a connectionless, best-effort delivery service. The format of IPv6, however, is completely different.
An IPv6 address is 128 bits long, making the address space so large that each person on earth could have his or her own internet as large as the current Internet. IPv6 divides addresses into types analogous to the way IPv4 divides addresses into classes.
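The size comparison can be checked with a little arithmetic. The sketch below uses Python's standard `ipaddress` module; the function names are hypothetical, the example address is from the documentation range 2001:db8::/32, and the comparison assumes that "the current Internet" means the full 32-bit IPv4 address space.

```python
import ipaddress


def address_bits(text):
    """Return the width in bits of the IPv4 or IPv6 address written in text."""
    return ipaddress.ip_address(text).max_prefixlen


def ipv4_sized_internets():
    """How many complete IPv4 address spaces fit into the IPv6 address space."""
    return 2 ** address_bits("2001:db8::1") // 2 ** address_bits("192.0.2.1")
```

The result is 2^96, roughly 7.9 x 10^28, which is vastly more than the number of people on earth.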
An IPv6 datagram consists of a series of headers followed by data. A datagram always begins with a 40-byte base header, which contains the source and destination addresses and a flow identifier. The base header may be followed by zero or more extension headers, followed by data. Extension headers are optional; IPv6 uses them to hold much of the information that IPv4 encoded in options.
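A sketch of how the 40-byte base header can be assembled is shown below. The function name and default values are hypothetical. The field layout is an assumption taken from the current IPv6 specification (RFC 8200): a 4-bit version, an 8-bit traffic class, a 20-bit flow label, a 16-bit payload length, an 8-bit next header, an 8-bit hop limit, and two 128-bit addresses. Earlier drafts of IPv6 divided the first 32 bits differently.

```python
import ipaddress
import struct


def build_ipv6_base_header(src, dst, payload_len, next_header,
                           hop_limit=64, traffic_class=0, flow_label=0):
    """Return the 40-byte IPv6 base header as bytes."""
    first_word = (6 << 28) | (traffic_class << 20) | flow_label   # version is always 6
    return (
        struct.pack("!IHBB", first_word, payload_len, next_header, hop_limit)
        + ipaddress.IPv6Address(src).packed      # 16-byte source address
        + ipaddress.IPv6Address(dst).packed      # 16-byte destination address
    )
```

The next header field names what follows the base header, either an extension header or the upper-layer protocol (for example 6 for TCP), which is how the chain of optional headers is linked together.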