"Because, sometimes, the Internet doesn't quite work..."
The MIT RON (Resilient Overlay Networks) project is a DARPA-funded effort
motivated by the desire to improve the robustness and availability of Internet
paths between hosts by an order of magnitude over today's wide-area Internet
routing infrastructure. The key design goal in RON is to develop techniques
to allow end-hosts and applications to cooperatively gain improved reliability
and performance from the Internet. At a glance, RON nodes examine the condition
of the Internet between themselves and the other nodes, and, based upon
how the network looks, decide if they should let packets flow directly to
other nodes, or if they should send them indirectly via other RON nodes. For
instance, a group of cooperating systems can mutually provide a more
available and better-performing routing service than what vanilla Internet
routing can provide.
RON is an architecture that allows a small group of distributed Internet
applications to detect and recover from path outages and periods of degraded
performance within several seconds, improving over today's wide-area routing
protocols that take at least several minutes to recover. A RON is an application-layer
overlay on top of the existing Internet routing substrate. The RON nodes
monitor the functioning and quality of the Internet paths among themselves,
and use this information to decide whether to route packets directly over
the Internet or by way of other RON nodes, optimizing application-specific
routing metrics.
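The direct-versus-indirect decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the RON implementation: the function name `best_path`, the latency table, and the restriction to at most one intermediate node are all assumptions made for the example, and latency stands in for whatever application-specific metric a real deployment would use.

```python
from typing import Dict, List, Tuple

# Hypothetical probe results: measured latency in ms for each directed pair.
# A missing entry is treated as an unreachable path (an outage).
LatencyTable = Dict[Tuple[str, str], float]


def best_path(src: str, dst: str, nodes: List[str],
              latency: LatencyTable) -> Tuple[List[str], float]:
    """Pick the lowest-latency path among the direct Internet path and
    every path that detours through exactly one other overlay node."""
    unreachable = float("inf")
    # Start with the direct path as the candidate to beat.
    best = ([src, dst], latency.get((src, dst), unreachable))
    for hop in nodes:
        if hop in (src, dst):
            continue
        # Cost of an indirect path is the sum of its two legs.
        cost = (latency.get((src, hop), unreachable)
                + latency.get((hop, dst), unreachable))
        if cost < best[1]:
            best = ([src, hop, dst], cost)
    return best


if __name__ == "__main__":
    # The direct A-B path is badly degraded, so the detour through C wins.
    probes = {("A", "B"): 500.0, ("A", "C"): 20.0, ("C", "B"): 30.0}
    path, cost = best_path("A", "B", ["A", "B", "C"], probes)
    print(path, cost)  # ['A', 'C', 'B'] 50.0
```

Swapping the latency table for loss-rate or throughput estimates (with a matching way to combine the two legs) would give a different application-specific metric under the same selection logic.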
The RON project has several components, including:
Data analysis and understanding wide-area routing and fault-tolerance
behavior; BGP interactions
Simulations of RON behavior
RON is part of a larger research agenda on large-scale, robust,
Internet-based distributed systems, which spans areas ranging from
resilient routing (as in RON) to emerging peer-to-peer systems. Our
work on peer-to-peer systems is based on Chord, a scalable p2p lookup
service.
RON is also closely related to other current projects at LCS in the
area of robust Internet infrastructures and uses some of the ideas
from these projects: CM, the
Internet Congestion Manager; and Click-SMP, a modular
PC-based router.
RON data, Internet experiments
RON1 and RON2 datasets - several million
latency and loss samples, with thousands of throughput samples
taken on the RON testbed
Since early 2001, we have run a real-life RON, which now has 17 sites located
around the Internet. Our deployment is international. We have
also collected extensive data sets and analyzed them. They will soon
be made publicly available on this page.
DNS Performance and the Effectiveness of Caching
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris
Proc. 1st ACM SIGCOMM Internet Measurement Workshop, San Francisco,
CA, November 2001.
Resilient Overlay Networks
David G. Andersen, SM Thesis, Massachusetts Institute of Technology, May 2001.
[Postscript (8.9 MB)] [ps.gz (1.2 MB)] [PDF (2.2 MB)] (86 pages)
The Detour Project at the University of Washington. They developed
"sting", which uses TCP to determine forward and reverse path packet
loss rates. There has also been a small follow-on project to Detour by
some of David Wetherall's students to test Detour. They simulated some
algorithms for forming the routing topology:
[Orig ps][Local Mirror]
The projects list is also available.
There are some important differences between RON and
Detour. First, RON seeks to prevent disruptions in end-to-end
communication in the face of failures. RON takes advantage of
underlying Internet path redundancy on time-scales of a few seconds,
reacting quickly to path outages and performance failures.
Second, RON is designed as an application-controlled routing overlay;
because each RON is more closely tied to the application using it, RON
more readily integrates application-specific path metrics and path
selection policies. Third, we present and analyze experimental
results from a real-world deployment of a RON to demonstrate fast
recovery from failure and improved latency and loss-rates even over
short time-scales.
The Berkeley SPAND project.
The Shared Passive Network Performance Discovery (SPAND) toolkit lets
applications measure and share performance information with other local
clients to make better guesses about which (for example) mirror site to
use.
The SPAND paper contains more information [ps] [local ps],
as does Mark Stemm's thesis [html] [ps] [local ps].
RAMP
Reliable Adaptive Multipath Routing, from UCSD.
Network Characterization
Craig Labovitz's BGP and network stability information
and Delayed Internet Routing Convergence paper. (30-second to
3-minute outages from BGP fluctuations.)
The IDMaps Project
(Internet Distance Maps). Creating a "server" that can provide
pairwise Internet distance information.
Commercial tools:
VisualRoute
measures per-hop loss and delays.
VitalSignsNetMedic
uses bprobes and application-specific metrics to report network
performance.
Overlay Networks
The X-Bone Project
provides a toolkit for rapid deployment of overlay
networks for things like IPv6.
A follow-on project, Dynabone, plans to add dynamic overlay adaptation.
We gratefully acknowledge funding for RON from DARPA under the
Fault-Tolerant Networking (FTN) program of the ATO; it is being
supported by DARPA and the Space and Naval Warfare Systems Center
(SPAWAR), San Diego, under contract N66001-00-1-8933.