Summary

No traffic jams on the next-generation Internet

The Transmission Control Protocol (TCP), which governs how data are launched into the network, has been phenomenally successful and today controls most of the traffic on the Internet. However, there is a growing feeling that a complete overhaul is needed to cope with ever-increasing traffic. Theoretical models and mathematical proofs now show that alternatives to TCP exist and that they do not lead to congestion collapse. Work is under way to demonstrate this experimentally.

Since the birth of the Internet, the Transmission Control Protocol (TCP) has been phenomenally successful. Today TCP controls the transmission of most traffic on the Internet - everything from downloading web pages to peer-to-peer file sharing. TCP's development began in 1974, it was overhauled in 1988, and since then its design has served well with barely any changes, even though data rates have rocketed from 30 kbit/s to 40 Gbit/s. There is now, however, a growing feeling in the network-research community that it is time for another major overhaul. We now have a better understanding of the mathematics of how data networks work, and hope to overhaul TCP so that it remains fit for purpose for a long time to come.

To understand why TCP has been so successful, and to explain why an overhaul is needed, it is useful to know more about what TCP does. TCP code sits on every computer and device connected to the Internet. It is built into Windows, Linux, mobile phones, etc. When one computer has to transmit data to another, the data are split into packets. TCP decides when to send packets and how many to send. It has two duties: to resend any packets that may have been lost on the way (for example, because of congestion, or signal interference on a wireless link), and to limit the sending rate so that the network does not become congested.

The second duty was not part of TCP's original specification. It was added in 1988 by Van Jacobson at the Lawrence Berkeley National Laboratory (LBNL). In October 1986, the data rate between the University of California at Berkeley and LBNL - 400 m apart - collapsed from 32 kbit/s to just 40 bit/s. Jacobson and colleagues identified the problem: the network was congested, which caused packets to be lost in transit; TCP then sent those packets again, which made the congestion worse. Jacobson's grand idea was that TCP itself could control congestion (Jacobson 1988). He proposed an extension to TCP: it should steadily increase its transmission rate while the network seems uncongested, and reduce the rate whenever it detects a lost packet (figure 1).
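The increase-and-back-off rule can be illustrated with a minimal Python sketch. It is not Jacobson's implementation: the function names, the single-link loss model (a packet is lost whenever the sender exceeds a fixed capacity) and the parameter values are assumptions made for illustration. The step sizes, one packet per round trip upwards and a halving on loss, are the ones this article quotes for standard TCP.

```python
def aimd_step(window: float, loss_detected: bool,
              increase: float = 1.0, decrease: float = 0.5,
              min_window: float = 1.0) -> float:
    """Return the next sending window (in packets per round trip).

    Additive increase when no loss is seen, multiplicative decrease
    when a loss is detected. The window never drops below min_window.
    """
    if loss_detected:
        return max(min_window, window * decrease)
    return window + increase


def simulate(capacity: float, rounds: int, start: float = 1.0) -> list[float]:
    """Record the window used in each round trip.

    Assumed loss model (hypothetical): a packet is lost in any round
    in which the window exceeds the link capacity.
    """
    window = start
    history = []
    for _ in range(rounds):
        history.append(window)
        loss = window > capacity
        window = aimd_step(window, loss)
    return history


if __name__ == "__main__":
    # Prints the sawtooth pattern: a slow climb, then a sharp drop.
    for rtt_index, w in enumerate(simulate(capacity=10, rounds=25)):
        print(rtt_index, w)
```

Under these assumptions the window climbs steadily to just above the capacity, is halved, and climbs again, producing a sawtooth pattern.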

Arguably, Jacobson's congestion control is what has made the Internet succeed. In 1988, "proper" networks like national telephone networks were very different from the Internet. Someone sitting in the network control centre monitored traffic; when links came close to overload, traffic was rerouted - and when this was not enough your telephone call was simply blocked. In the Internet there is no central control, yet Jacobson's TCP achieves a similar effect, using the collected decentralized intelligence of all the computers connected to the Internet.

Congestion is a fundamentally difficult problem, and the Internet was the first large-scale demonstration that it could be solved without central control. This allowed the Internet to grow to become a global network. Nevertheless, network control centres are not out of business yet; Jacobson's TCP does not attempt to balance traffic across different parts of a network, nor to provide the speedy delivery that real-time voice and video need. These are still problems for which we need central control.

However, Jacobson's TCP is beginning to show its age. A pressing problem is the difficulty of achieving high data rates, even when capacity is available. Consider a physicist at CERN trying to send data to SLAC in California. The round-trip time (RTT), i.e. the time it takes to send a packet from CERN to SLAC and to receive an acknowledgement that the packet arrived, is a little over 200 ms. When the network is working smoothly and no packets are lost, TCP increases its data rate by one packet per RTT, every RTT. With packets of about 1500 bytes, a data rate of 100 Mbit/s corresponds to roughly 1670 packets in flight per RTT, so reaching that rate takes 333 s. On the other hand, every time a packet is lost in transit, TCP cuts its data rate by half, which means that it takes 167 s to recover from a single packet loss. To achieve a sustained average data rate of 100 Mbit/s, no more than 1 in 1.9 million packets can be lost. But cosmic rays and imperfections in the optical fibre links are likely to corrupt at least one packet in every 1 million! TCP needs to change if it is to support such high data rates - but injudicious changes could bring about another congestion collapse.
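The figures in this example can be checked with a short calculation, sketched here in Python. Two inputs are assumptions rather than values stated in the text: a packet size of 1500 bytes and an RTT of exactly 0.2 s. The loss estimate uses the standard idealized sawtooth model of TCP, in which the average window is roughly the square root of 1.5/p for a loss probability p; this is a textbook approximation and not necessarily the calculation the author used.

```python
PACKET_BITS = 1500 * 8   # assumed packet size: 1500 bytes
RTT_S = 0.2              # assumed round-trip time: 200 ms
TARGET_BPS = 100e6       # target data rate: 100 Mbit/s


def window_packets(rate_bps: float, rtt_s: float, packet_bits: float) -> float:
    """Packets that must be in flight per RTT to sustain rate_bps."""
    return rate_bps * rtt_s / packet_bits


def ramp_up_seconds(rate_bps: float, rtt_s: float, packet_bits: float) -> float:
    """Time to grow the window from about zero, adding one packet per RTT."""
    return window_packets(rate_bps, rtt_s, packet_bits) * rtt_s


def recovery_seconds(rate_bps: float, rtt_s: float, packet_bits: float) -> float:
    """Time to regain the half of the window dropped after one loss."""
    return window_packets(rate_bps, rtt_s, packet_bits) / 2 * rtt_s


def packets_per_loss(rate_bps: float, rtt_s: float, packet_bits: float) -> float:
    """Packets that must be delivered per loss to sustain the average rate.

    Idealized sawtooth model: average window = sqrt(1.5 / p),
    so 1 / p = window**2 / 1.5.
    """
    w = window_packets(rate_bps, rtt_s, packet_bits)
    return w * w / 1.5


if __name__ == "__main__":
    args = (TARGET_BPS, RTT_S, PACKET_BITS)
    print(f"window:        {window_packets(*args):.0f} packets")
    print(f"ramp-up time:  {ramp_up_seconds(*args):.0f} s")
    print(f"recovery time: {recovery_seconds(*args):.0f} s")
    print(f"tolerable loss: 1 in {packets_per_loss(*args):,.0f} packets")
```

With these assumptions the sketch gives a ramp-up time of about 333 s, a recovery time of about 167 s, and a tolerable loss rate of roughly 1 in 1.85 million packets, in line with the figures quoted above.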

The scientific approach is to devise mathematical models for congestion control, and to test them experimentally. The approach that physicists have developed for studying complex systems seems to work well for the Internet: first explore the detailed rules of interaction between atomic entities (like TCP connections), then formulate high-level laws about the behaviour of large collections of these entities (e.g. with differential equations), then investigate the consequences of these laws.

Indeed, Richard Feynman himself studied communications systems in this spirit. Daniel Hillis describes Feynman's work at the Thinking Machines Corporation (Hillis 1989): "By the end of that summer of 1983, Richard had completed his analysis of the behaviour of the router, and much to our surprise and amusement, he presented his answer in the form of a set of partial differential equations. To a physicist this may seem natural, but to a computer designer, treating a set of Boolean circuits as a continuous, differentiable system is a bit strange. Feynman's router equations were in terms of variables representing continuous quantities such as 'the average number of 1 bits in a message address.'"

In the past few years our theoretical understanding of Internet congestion control has blossomed. We now have a variety of mathematical models that help us evaluate how the network behaves, and what the consequences will be of rapidly growing communications capacity. Frank Kelly from the Statistical Laboratory in Cambridge has made a major contribution, which won him the IEEE Koji Kobayashi prize (Kelly 2000). Research groups in computer science, engineering and mathematics are pushing the boundary at Berkeley, Caltech, Stanford, Cambridge, Turin, Paris, the University of Massachusetts, University College London, MIT and elsewhere.

Theoretical models now exist for alternatives to TCP that can be mathematically proved not to cause congestion collapse, and work is ongoing to test these models experimentally. Some of these alternatives promise to balance traffic across the Internet, and to deliver high data rates and much better quality of service for services like voice-over-Internet-Protocol (VoIP) and live video - all at the same time. It seems likely that these theories will find their way into the mainstream Linux and Windows kernels within the next five years or so, and we hope that they will serve for many years to come.

Further reading

For further information see Sally Floyd's High Speed TCP website at www.icir.org/floyd/longpaths.html.

W D Hillis 1989 Physics Today 42 (2) 78.

V Jacobson 1988 ACM SIGCOMM Computer Communication Review 18 (4) 314.

F P Kelly 2000 Phil. Trans. Roy. Soc. A358 2335.