We are developing a streaming supercomputer (SS) that is scalable
from a single chip to thousands of chips and that, we estimate, will
achieve a 100x improvement in performance per unit cost over
conventional cluster-based supercomputers on a wide range of demanding
numerical computations. The SS achieves this by combining stream
processing with a high-performance network that provides access to a
globally shared memory.
Imagine is a programmable signal and image processor that provides
the performance and performance density of a special-purpose processor.
Imagine achieves a peak performance of 20 GFLOPS (single-precision
floating point) and 40 GOPS (16-bit fixed point) and sustains over
12 GFLOPS and 20 GOPS on key signal processing benchmarks. On these same
benchmarks Imagine sustains a power efficiency of 3.7 GFLOPS/W, a factor
of 20 better than the most efficient conventional signal processors.
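The quoted figures can be related with a little arithmetic. The short Python sketch below uses only the numbers stated above. Because the sustained rates are stated as lower bounds, the utilization values it computes are lower bounds as well, and the conventional-processor efficiency is only what the stated factor of 20 implies, not a measured value. The constant and function names are illustrative.

```python
# Figures quoted in the text for Imagine (sustained values are lower bounds).
PEAK_GFLOPS, SUSTAINED_GFLOPS = 20.0, 12.0  # single-precision floating point
PEAK_GOPS, SUSTAINED_GOPS = 40.0, 20.0      # 16-bit fixed point
IMAGINE_GFLOPS_PER_W = 3.7
EFFICIENCY_ADVANTAGE = 20.0                 # "a factor of 20 better"


def utilization(sustained: float, peak: float) -> float:
    """Fraction of peak throughput achieved on the benchmarks."""
    return sustained / peak


flops_util = utilization(SUSTAINED_GFLOPS, PEAK_GFLOPS)
ops_util = utilization(SUSTAINED_GOPS, PEAK_GOPS)
# Efficiency of the best conventional processors implied by the stated ratio.
implied_conventional_gflops_per_w = IMAGINE_GFLOPS_PER_W / EFFICIENCY_ADVANTAGE

print(f"floating-point utilization >= {flops_util:.0%}")
print(f"fixed-point utilization    >= {ops_util:.0%}")
print(f"implied conventional efficiency ~ {implied_conventional_gflops_per_w:.3f} GFLOPS/W")
```

Running it reports utilizations of at least 60% (floating point) and 50% (fixed point), and an implied conventional efficiency of about 0.185 GFLOPS/W.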
Scalable Network Fabrics
We are developing architectures and technologies to enable large,
scalable high-performance interconnection networks to be used in
parallel computers, network switches and routers, and high-performance
I/O systems. Recent results include the development of a hierarchical
network topology that makes efficient use of a combination of
electrical and optical links, a locality-preserving randomized
oblivious routing algorithm, a method for scheduling constrained
crossbar switches, new speculative and reservation-based flow control
methods, and a method for computing the worst-case traffic pattern for
any oblivious routing function.
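The last result rests on a property that is easy to state: for an oblivious routing function, the load on each channel is a linear function of the traffic matrix, so the worst-case permutation can be sought one channel at a time. The Python sketch below illustrates this on a hypothetical unidirectional ring. The helper names and both routing functions (minimal clockwise routing, and Valiant-style randomized routing through a uniformly random intermediate node) are illustrative choices, not the algorithms developed in this work. It enumerates permutations by brute force, which is only practical for tiny networks; a scalable version would instead solve a maximum-weight bipartite matching (an assignment problem) for each channel.

```python
from itertools import permutations
from typing import Callable

# Hypothetical setup: a routing function returns the fraction of s -> d
# traffic that crosses the directed link c -> (c + 1) % n on a
# unidirectional ring of n nodes.
RouteFrac = Callable[[int, int, int, int], float]


def on_path(n: int, c: int, s: int, d: int) -> bool:
    """True if the clockwise path from s to d traverses link c -> c+1."""
    return (c - s) % n < (d - s) % n


def minimal(n: int, c: int, s: int, d: int) -> float:
    """Deterministic minimal (clockwise) routing."""
    return 1.0 if on_path(n, c, s, d) else 0.0


def valiant(n: int, c: int, s: int, d: int) -> float:
    """Randomized routing: go to a uniformly random node m, then on to d."""
    return sum(on_path(n, c, s, m) + on_path(n, c, m, d) for m in range(n)) / n


def channel_load(n: int, route: RouteFrac, c: int, perm: tuple) -> float:
    """Load on link c when each node s sends one unit to perm[s]."""
    return sum(route(n, c, s, perm[s]) for s in range(n))


def worst_case_load(n: int, route: RouteFrac) -> float:
    """Maximum channel load over all permutations (brute force, tiny n only)."""
    return max(channel_load(n, route, c, perm)
               for c in range(n)
               for perm in permutations(range(n)))


print(worst_case_load(4, minimal), worst_case_load(4, valiant))
```

On a 4-node ring both routing functions give a worst-case channel load of 3. For Valiant routing on this ring every permutation loads every channel equally, which illustrates how randomization makes channel load independent of the traffic pattern, here without improving on the minimal router's worst case.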
We are investigating combined processor/memory architectures that
are best able to exploit 2009 semiconductor technologies. We envision
these architectures being composed of tens to hundreds of processors and
memory banks on a single semiconductor chip. Our research addresses
the design of the processors and memories, the architecture of the
interconnection network that ties them together, and mechanisms to
simplify programming of such machines.
We are developing methods and circuits that stretch the
performance bounds of electrical signalling between chips, boards, and
cabinets in a digital system. We have developed a prototype 4 Gb/s
transceiver in 0.25 µm CMOS that dissipates only 130 mW, making it
suitable for large-scale integration. Future chips include a 20 Gb/s
transceiver in 0.13 µm CMOS.
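A common way to compare signalling circuits is energy per bit, which is simply power divided by bit rate. The Python helper below (a hypothetical name, not from the original work) applies this to the prototype figures quoted above; milliwatts divided by gigabits per second comes out directly in picojoules per bit.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per transmitted bit, in picojoules.

    1 mW / 1 Gb/s = 1e-3 J/s / 1e9 b/s = 1e-12 J/b = 1 pJ/bit,
    so the plain ratio is already expressed in pJ/bit.
    """
    return power_mw / rate_gbps


# Prototype 4 Gb/s transceiver dissipating 130 mW (figures from the text).
print(energy_per_bit_pj(130.0, 4.0))  # 32.5
```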
We also built an experimental parallel computer that demonstrated highly
efficient mechanisms for parallelism, including two-level multithreading,
efficient network interfaces, fast communication and synchronization, and
support for efficient shared-memory protocols.
Another project produced a high-performance multicomputer router that
demonstrates new technologies ranging from architecture to circuit
design. At the architecture level, the router uses a novel adaptive
routing algorithm, a link-level retry protocol, and a unique token
protocol. Together, the two protocols greatly reduce the cost of
providing reliable, exactly-once end-to-end communication. At the
circuit level, the router demonstrates the latest version of our
simultaneous bidirectional pads and a new method for plesiochronous
synchronization.
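The router's own link-level retry protocol is not specified here, so the Python sketch below shows a generic go-back-N scheme instead, under simplifying assumptions: a hypothetical link model in which selected transmissions suffer a single bit flip, a CRC-32 check (via `zlib.crc32`) at the receiver, and acknowledgments that are never lost. The sender keeps each flit buffered until it is acknowledged, and the receiver accepts only error-free flits carrying the next expected sequence number, so each flit crosses the link exactly once and in order.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Flit:
    seq: int        # link-level sequence number
    payload: bytes
    crc: int        # CRC-32 computed by the sender


def send_over_link(payloads, corrupt_on=frozenset(), window=4):
    """Go-back-N retry over a hypothetical link that flips one bit in every
    transmission whose index appears in `corrupt_on`.
    Returns the delivered payloads and the number of transmissions used."""
    base = 0        # sender: oldest unacknowledged (still buffered) flit
    expected = 0    # receiver: next in-order sequence number
    delivered = []
    transmissions = 0
    while base < len(payloads):
        # Sender (re)transmits every buffered flit in the window.
        for seq in range(base, min(base + window, len(payloads))):
            data = payloads[seq]
            flit = Flit(seq, data, zlib.crc32(data))
            if transmissions in corrupt_on:
                # Model a bit error: flip the low bit of the first byte.
                flit.payload = bytes([data[0] ^ 0x01]) + data[1:]
            transmissions += 1
            # Receiver: drop corrupted or out-of-order flits.
            if zlib.crc32(flit.payload) == flit.crc and flit.seq == expected:
                delivered.append(flit.payload)
                expected += 1
        # Cumulative acknowledgment: flits below `expected` leave the buffer.
        base = expected
    return delivered, transmissions


flits = [b"head", b"body1", b"body2", b"body3", b"tail"]
print(send_over_link(flits, corrupt_on={1}))
```

With one corrupted transmission, all five flits still arrive once and in order, at the cost of three extra transmissions (8 instead of 5). Go-back-N keeps the receiver simple because it never buffers out-of-order flits; the price is that every flit after an error is resent.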
A further experimental parallel computer, in operation since July
1991, demonstrates mechanisms that greatly reduce the overhead of
inter-processor interaction.
Brucek Khailany, William J. Dally, Scott Rixner, Ujval J. Kapasi,
Peter Mattson, Jin Namkoong, John D. Owens, Brian Towles, and Andrew
Chang. "Imagine: Media Processing with Streams." IEEE Micro,
March/April 2001.

William J. Dally, Andrew Chang, Andrew Chien, Stuart Fiske, Waldemar
Horwat, John Keen, Richard Lethin, Michael Noakes, Peter Nuth, Ellen
Spertus, Deborah Wallach, and D. Scott Wills. "The J-Machine."
Retrospective in 25 Years of the International Symposia on Computer
Architecture: Selected Papers, pp. 54-58.

William J. Dally. "Virtual Channel Flow Control." IEEE Transactions
on Parallel and Distributed Systems, March 1992, pp. 194-205.
William J. Dally
Stanford University
Computer Systems Laboratory
Gates Room 314
Stanford, CA 94305
(650) 725-8945
FAX: (650) 725-6949