digital technical journal online
foreword - Volume 7 Number 3

Jean C. Bonney
Director, External Research

The Information Utility, the Information Highway, the Internet, the Infobahn, the Information Economy --- the sound bites of the 1990s. To make these concepts reality, a robust technology infrastructure is necessary. In 1990, Digital's research organization saw this need and set out to develop an experimental test bed that would examine assumptions and provide a basis for a technology edge in the '90s. The resulting project was Sequoia 2000, a three-year research collaboration between Digital, five campuses of the University of California, and several other industry and government organizations. The Sequoia 2000 vision is

Petabytes (i.e., thousands of trillions of bytes) of data in a distributed archive, transparently managed, and logically viewed over a high-speed network with isochronous capabilities, easily accessed by end users via a host of tools--in other words, a big, fast, easy-to-use system.

Although the vision is still not reality today, our more than three years of participation in Sequoia 2000 research gave us the knowledge base we sought.

After a rigorous process of proposal development and review by experts at Digital and the University of California, Sequoia 2000 began in June 1991. The focus of the research was a high-speed, broadband network spanning University of California campuses from Berkeley to Santa Barbara, Los Angeles, and San Diego; a massive database; storage; a visualization system; and electronic collaboration. Earth scientists drove the research requirements. The computing needs of these scientists push the state of the art. Current computing technologies lack the capabilities earth scientists need to assimilate and interpret the vast quantities of information collected from satellites. Once the data are collected and organized, there is the challenge of massive simulations, simulations that forecast world climate ten or even one hundred years from now. These were exactly the kinds of challenges the computer scientists needed.

Among the major results of three years of work on Sequoia 2000 was a set of product requirements for large data applications. These requirements have been validated through discussions with customers in financial, healthcare, and communications industries and in government. The requirements include

  • A computing environment built on an object relational database, i.e., a data-centric computing system
  • A database that handles a wide variety of non-traditional objects such as text, audio, video, graphics, and images
  • Support for a variety of traditional databases and file systems
  • The ability to perform necessary operations from computing environments that are intuitive and have the same look and feel; the interface to the environment should be generic, very high level, and easily tailored to the user application
  • High-speed data migration between secondary and tertiary storage with the ability to handle very large data transfers
  • Network bandwidth capable of handling image transmission across networks in an acceptable time frame with quality guarantees for the data
  • High-quality remote visualization of any relevant data regardless of format; the user must be able to manipulate the visual data interactively
  • Reliable, guaranteed delivery of data from tertiary storage to the desktop

Sequoia 2000 was also a catalyst for maturing the POSTGRES research database software to the point where it was ready for commercialization. The commercial version, Illustra, is available on Alpha platforms and is enjoying success in the banking industry and in geographic information system (GIS) applications, as well as in other government applications with massive data requirements. Illustra is also making inroads into the Internet where it is used by on-line services.

Yet another major result of Sequoia 2000 was a grant from the National Aeronautics and Space Administration (NASA) to develop an alternate architecture for the Earth Observing System Data and Information System (EOSDIS). EOSDIS will process the petabytes of real-time data from the Earth Observing System (EOS) satellites to be launched at the end of the decade. The alternate information architecture proposed by the University of California faculty was the Sequoia 2000 architecture. It will have a major influence on the EOSDIS project.

For the earth scientists, gains were made in simulation speeds and in access to large stores of organized data. These scientists used some of Digital's first Alpha workstation farms and software prototypes for their climate simulations. An eight-processor Alpha workstation farm provided a two-to-one price/performance advantage over the powerful, multimillion-dollar CRAY C90 machine. In another earth science application, scientists using Alpha and hierarchical storage systems could simulate two years' worth of climate data over the weekend without operator intervention; formerly, two months' worth of data took one day to simulate and required considerable operator intervention. Thus many more simulations could be processed in a fixed time, and "time to discovery" was decreased considerably.
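To put the climate simulation figures in perspective, the arithmetic can be sketched roughly as follows. This is only an illustrative calculation: the length of "the weekend" is not stated above, so the two-day value is an assumption, and the function and variable names are hypothetical.

```python
def simulated_months_per_day(months_simulated: float, wall_clock_days: float) -> float:
    """Return simulation throughput in simulated months per wall-clock day."""
    return months_simulated / wall_clock_days


# Former setup: two months' worth of data took one day to simulate.
old_rate = simulated_months_per_day(2, 1)

# Alpha setup: two years (24 months) simulated over a weekend.
# ASSUMPTION: a weekend is treated as 2 wall-clock days; the text gives no exact figure.
WEEKEND_DAYS = 2.0
new_rate = simulated_months_per_day(24, WEEKEND_DAYS)

# Relative throughput gain under the weekend assumption.
speedup = new_rate / old_rate

# Wall-clock days the former setup would have needed for the same two years.
old_days_for_two_years = 24 / old_rate

print(f"former rate: {old_rate} simulated months/day")
print(f"Alpha rate:  {new_rate} simulated months/day")
print(f"speedup:     {speedup}x")
print(f"former time for two years: {old_days_for_two_years} days")
```

Under that assumption, a run that formerly would have occupied about twelve days of attended operation fits into a single unattended weekend, roughly a sixfold gain in throughput. A longer or shorter weekend changes the ratio proportionally.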

Now that we can look at Sequoia 2000 in retrospect, would we do such a project again? The answer is a resounding "yes" from all of us involved. It was a complex project that included 12 University of California faculty members, 25 graduate students, and 20 staff. Another 8 faculty members and students provided additional expertise. Four of Digital's engineers worked on site, and a variety of support personnel from other industry sponsors participated, including SAIC, the California Department of Water Resources, Hewlett-Packard, Metrum, United States Geological Survey (USGS), Hughes Application Information Services, and the Army Corps of Engineers.

But as is the case with such ambitious projects, there were unanticipated and difficult lessons for all to learn. To experiment with real-life test beds means considerably more than writing a rigorous set of hypotheses in a proposal. Michael Stonebraker, in his paper, notes a number of challenges we faced and the lessons learned. One of the issues that kept surfacing was the "grease and glue" for the infrastructure, that is, the interoperability of the pieces of software and hardware that composed the end-to-end system. This remains a challenge that needs research if we are going to achieve the promised goals of internetworking. Another of the sticky points was that of scalability. On the one hand, it is difficult to build a very large networked system from scratch. On the other hand, as we slowly built the mass storage system to the point of minimal critical mass, we found that the current off-the-shelf technologies for mass storage were not ready to be put to use for our purposes. So yes, we believe the project was worthwhile, with some caveats. We gained critical knowledge about the technology, but we also came a long way in learning the art of directing and leading the type of project that is necessary to assist the Information Technology industry in its quest for the ubiquitous distributed information system.

How else are we going to get insight into the critical issues of building and reliably operating a robust information infrastructure without building a large test bed with real end users whose needs push the state of the art at each point along the way? We believe that large projects similar to Sequoia are crucial. The papers that follow attest to the important knowledge gained. We have focused specifically on the end-to-end system --- from the scientists' desktops to the mass storage system, the challenge of building and using a large data repository, the timely and fast movement of very large objects over the network, and browsing and visualizing data from networked sources.

