The Wide Area Information Server (WAIS): A Case Study

Case Studies and Communications Management II
Prof. Joseph Helgert
School of Communications
Grand Valley State University

v. 3.0.1
by Henry E. Hardy
1993 March 8

Abstract

The Wide Area Information Server programs were developed as a joint project by Thinking Machines Corporation (TMC), Apple Computer, and Dow Jones, Inc. Since 1989, WAIS servers have been established in the United States and many other countries. WAIS is currently implemented on the Internet and its associated connected TCP/IP regional and commercial networks. The Internet is a large collection of regional and local networks linked by a common high speed data network called the NSF backbone. The purpose of the WAIS client/server architecture is to provide an interface for answering questions posed by a user based on the information in large databases which may be physically remote from one another and the user. Although WAIS was first proposed as a commercial information provider interface, it is currently primarily provided free of charge to authorized Internet users. The WAIS provides ample opportunities for further research. The WAIS system is unlikely to spur large sales of TMC computers to large commercial database providers in the near future. WAIS is a useful and innovative technology for interactively accessing very large databases.

Thinking Machines Corporation

Thinking Machines Corporation manufactures a relatively small, relatively inexpensive (c. $4 million) supercomputer called "The Connection Machine." The Connection Machine uses multiple processors configured in a unique geometry (a "virtual hypercube"). [Brand, 1988, p. 182] The corporation was founded in 1983. During the last year for which data was available, 1991, the company reported sales of $75,022,017. The company has approximately 500 employees. Main offices are in Cambridge, MA, with branch offices in Santa Fe, NM, Chesterfield, MO, Bethesda, MD and Menlo Park, CA.
The President and CEO is Sheryl Handler. The reported average of 55 days beyond terms for payment of outstanding obligations owed by the company may indicate some financial instability. Many items of capital equipment are currently being used as collateral on debt. The company is estimated to sell 80-100 units of the Connection Machine yearly. Because of the specialized nature of the equipment the company produces, and the lack of diversity in its product line, future financial prospects for the company are unclear. Despite these financial uncertainties, the Connection Machine is well regarded in the industry for its innovative design, computing power and relatively low price. The company is closely held.

Overview of WAIS

One potential problem in marketing the Connection Machine is the relatively small amount of software available for the machine. A possible application for supercomputing techniques is fast access to very large databases. Thinking Machines Corporation has led the way in the supercomputer industry in this area with its "Wide Area Information Server" or WAIS (rhymes with "ways" or "weighs"). Development of WAIS appears to have begun in 1986. Brewster Kahle of Thinking Machines Corporation developed the system with Craig Stanfill, Steve Smith and others. Further improvements in the system are continuing as of this writing.

The term "server" in WAIS has a particular meaning in the computer industry. Client/server architecture refers to a two-part computer program. The main program, or "server," resides on a large mainframe or supercomputer. The terminal or "client" program runs on another computer which is used as a "smart" terminal for access to the server. Examples of client/server architecture include FTP, File Transfer Protocol; archie, the software archive server; and many Multi-User Domains (MUDs). WAIS is one example of a class of servers which have recently become available to improve access speed and ease of use of large online databases.
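
The two-part arrangement described above can be illustrated with a minimal sketch in Python. It is not the WAIS protocol or any Thinking Machines code: the document titles, the one-line text exchange, and the names DOCUMENTS, serve_one_query, ask and demo are all assumptions made for illustration. The server half holds the data and answers a question; the client half only sends the question and shows the reply.

```python
import socket
import threading

# Hypothetical in-memory "database"; titles and text are invented for illustration.
DOCUMENTS = {
    "parallel-search": "parallel free text search on large databases",
    "mac-client": "a point and click client for the macintosh",
}


def serve_one_query(listener: socket.socket) -> None:
    """Server half: accept one connection, answer one question, then close."""
    conn, _addr = listener.accept()
    with conn:
        with conn.makefile("r", encoding="utf-8") as reader:
            question = reader.readline().split()
        # A document is a hit if it contains any word of the question.
        hits = [title for title, text in DOCUMENTS.items()
                if any(word.lower() in text.split() for word in question)]
        conn.sendall((",".join(hits) + "\n").encode("utf-8"))


def ask(host: str, port: int, question: str) -> list[str]:
    """Client half: send a question and return the titles the server reports."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((question + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            reply = reader.readline().strip()
    return reply.split(",") if reply else []


def demo(question: str) -> list[str]:
    """Run both halves in one process, on a local port chosen by the OS."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_query, args=(listener,))
    worker.start()
    try:
        return ask("127.0.0.1", port, question)
    finally:
        worker.join()
        listener.close()
```

Calling demo("macintosh") starts the server half, asks the question from the client half, and returns the matching titles. In a real deployment the two halves would run on different machines, which is the point of the architecture.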
Thinking Machines Corp. has developed a Macintosh client program called WAISstation for WAIS in cooperation with Apple Computer. Users may point-and-click to access both text and graphical information stored at a local or remote WAIS site. [Dern, 1992] A Unix client software package called SWAIS is also available from Thinking Machines free of charge. This client may also be accessed from the Thinking Machines Corp. WAIS server ("quake.think.com") via the telnet protocol. Currently there are reported to be more than 260 Internet WAIS servers in 28 countries. [Anthes, 1992b]

Strong points of the WAIS system

The strongest point of the WAIS system is the innovative use of supercomputing techniques to provide very fast access to information in very large databases. The fact that a weighted score is produced helps somewhat in prioritizing the documents identified by the server program. The interface is fairly easy to use for an experienced database searcher, but it still lacks true "natural language" comprehension.

Items successfully retrieved ("hits") from the SWAIS client may be displayed, stored for later retrieval, or electronically mailed to the user's account elsewhere. The latter is a unique and very powerful feature. Some care must be exercised by the user not to flood their own mailbox or the connecting networks with huge amounts of documentation, as this author discovered during the process of examining the WAIS server at Thinking Machines.

The underlying NISO standard for WAIS has been adopted by Dow Jones, Peat Marwick, Apple Computer, the Library of Congress and other influential organizations.

The WAIS telnet client software is probably not simple enough for someone unfamiliar with computers and networked computing to use easily. However, the Macintosh interface for WAIS has received positive reviews for ease of use. [anon, 1991 February; Dern, 1992]

Deficiencies of WAIS

WAIS has the ability to retrieve data stored at a remote site.
However, there is currently no way for a given WAIS to access data indexed at another WAIS site. Consequently, one must connect to each of the 260 or so WAIS servers in turn in order to learn what is indexed on them all. An index of some of the other WAIS servers is maintained online on the Thinking Machines WAIS server, however. Some of the competing protocols do have this ability to automatically connect to and search for data on remote servers, as is discussed briefly below.

The SWAIS client at Thinking Machines Corp. ("quake.think.com") is configured to return only the top 40 hits from a given search. When this author used the documented feature to try to change this number to 100, the SWAIS program crashed.

There are potentially serious security problems with WAIS because of its ability to enter a remote system and transfer the contents of files from that system to a database user. It would be inadvisable at this time to use WAIS for documents or systems requiring a high degree of security.

The relatively small installed base (a few hundred) of Connection Machines makes the potential number of WAIS servers quite small compared to server technologies based on the VMS or Unix operating systems.

Competing technologies

Several other servers exist which provide the same kind of services as WAIS. These include Archie, Gopher, World Wide Web, and Prospero. The criteria for comparing these technologies include cost, maintenance, efficiency and speed, size of the database, and ease of use.

Archie, the FTP archive server, allows access to more than 800 FTP sites worldwide from one location. Archie was developed at McGill University in Canada, and is now implemented at a number of sites in the US and Europe as well. Archie may be run from a telnet session or from a free Unix client program.

Gopher (go-fer) is a client-server application developed at the University of Minnesota.
Gopher offers the opportunity to search multiple databases at remote locations, and the ability to do full text searches on them. Clients are available for Mac, DOS, Unix, and VAX/VMS. [Snyder, 1992]

World Wide Web (W3) is a project of the European nuclear research facility in Geneva, CERN. W3 is a program for accessing hypertext documents and searchable indexes. The goal of W3 is to provide the user with a single point-and-click graphical user interface to all the information available in the world. There are W3 gateways to WAIS, Archie, and Gopher. Client programs exist for a number of platforms.

Prospero is an application similar to archie. It is based on a traditional file system model rather than on hypertext like W3. It is apparently being developed at the Information Sciences Institute at the University of Southern California. [Berners-Lee, 1992; whois database query by the author]

A number of applications are being built on the X.500 standard. Among these is the "white pages" application of Performance Systems International (PSI). The white pages, like the older Unix "yellow pages," provides a way to search an online database of personal directory information. The British Library is currently making so far largely unsuccessful efforts to place previously unpublished research papers, available online in a MARC database, into an online X.500 directory. There is currently some discussion of extending WAIS to make it conformant with, or at least more conversant with, the X.500 standard. [Berners-Lee, 1992]

Areas for further study

There are a number of interesting questions about the WAIS system which do not seem to be addressed adequately in the currently available literature. Usage patterns, a content analysis of the queries, and a network level of analysis of the WAIS system might yield interesting and useful results. Some users might find such research to be an unwelcome intrusion on their privacy.
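
Three points from the preceding sections lend themselves to a short sketch in Python: the weighted score attached to each hit, the cap of 40 hits per search, and the need to visit each server in turn. The scoring rule below (the share of a document's words that match the question) is an assumption chosen for brevity and is not the weighting WAIS actually uses. The server names, the documents, and the function names score, search and search_all are likewise hypothetical.

```python
from collections import Counter

# Hypothetical servers, each mapping a document title to its text.
SERVERS = {
    "site-a": {"poetry": "poems about the sea and the sea birds",
               "chem": "chemistry abstracts"},
    "site-b": {"tides": "sea tides",
               "news": "daily business news"},
}


def score(question: str, text: str) -> float:
    """Assumed weighting: fraction of the document's words found in the question."""
    words = text.lower().split()
    if not words:
        return 0.0
    wanted = set(question.lower().split())
    counts = Counter(words)
    return sum(counts[word] for word in wanted) / len(words)


def search(documents: dict[str, str], question: str,
           limit: int = 40) -> list[tuple[float, str]]:
    """Rank one server's documents and keep at most `limit` hits."""
    hits = [(score(question, text), title) for title, text in documents.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # best score first, then by title
    return hits[:limit]


def search_all(servers: dict[str, dict[str, str]], question: str,
               limit: int = 40) -> list[tuple[float, str]]:
    """Visit every server in turn and merge the ranked hits into one list."""
    merged = []
    for name, documents in servers.items():
        for weight, title in search(documents, question, limit):
            merged.append((weight, f"{name}:{title}"))
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged[:limit]
```

Here search_all(SERVERS, "sea") queries both hypothetical sites and returns one list ordered by weight. The loop over servers is the step a WAIS user must currently perform by hand, and the limit parameter plays the role of the 40-hit cutoff.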
Conclusions

WAIS is an innovative and fascinating technology. It is unlikely to provide Thinking Machines Corporation with the vehicle to drive the sales of their Connection Machine to commercial database providers as was apparently hoped at the start of the project. However, wider acceptance of the underlying standard and work in conjunction with developers of X.500, W3, Gopher, Archie and other systems may make the Connection Machine the vehicle of choice for accessing very large databases. WAIS is likely to be used mainly by very large governmental and non-commercial educational service providers in the near future.

Copyright (C) 1992, 1993 Henry Edward Hardy.

Selected Bibliography

The literature on WAIS falls into six categories. First, there are the technical reports from Thinking Machines Corporation. These are designated TMC-nnn where nnn is the document number. Many of these reports have also been published in magazines such as Online. Second, there are computer-mediated communications sources such as Usenet and Bitnet mailing lists. Third are articles from the 'popular' technical press such as Byte, Digital Review, Computerworld and Datamation. Fourth, many articles have appeared in publications related to library science, such as Special Libraries and Academic and Library Computing. Fifth, some articles have appeared in news for the financial community in such publications as The Economist, Far Eastern Economic Review and Wall Street Computer Review. Finally, this author has made extensive use of online database servers such as DIALOG, archie, gopher, FTP and WAIS itself.

Alexander, Michael. (1991, July 15). Fast Performance, Slow Sell. Computerworld, 25(28) 63-64.

Alexander, Michael. (1991, November 4). Thinking Machines Thinks Big. Computerworld, 25(44).

Ambrosio, Johanna. (1992, October 19). Super CPU Maker Widens Aim. Computerworld, 26(42).

anon. (1991, September). AN INFORMATION SYSTEM FOR CORPORATE USERS: Wide Area Information Servers.
Dow Jones jointly creates information retrieval system prototype. Online, pp. 56-60.

anon. (1992, March 14). Artificial Intelligence: Cogito, Ergo Something. The Economist, 322(7750) S5-S24.

anon. (1991, December). ASIS Sponsors Symposium on Full-Text Retrieval. Information Today, 8(11) 13-15.

anon. (1992, June 20). The fruitful, tangled trees of knowledge. The Economist, 323(n7724) 85 et passim.

anon. (1992, September/October). INFORMATION -- the commodity of the future. Link Letter, 5(2).

anon. (1991, December 9). The Promise Of The WAIS Protocol. Unix Today, p. 44.

anon. (1992, May 2). Supercomputers: Little and Large. The Economist, 323(7757) 102-103.

anon. (1992, March 16). WAIS: Is It the Lotus 1-2-3 of the Internet? Communications Week, p. 17.

anon. (1991, July 1). WAIS promises easy text retrieval; prototype links Mac, Connection Machine. ELECTRONIC SERVICES UPDATE, p. 1.

anon. (1991, February). WAIStation, A User Interface for WAIS. TMC-203.

anon. (1992, March). WAIS: Wide Area Information Services. Online Libraries & Microcomputers, 10(3) 1.

Anthes, Gary H. (1992a, January 22). Small Firms Unite Through Net. Computerworld, 26(3) 59,62.

Anthes, Gary H., et al. (1992b, July 20). A trio of navigation aids. Computerworld, p. 51.

Bell, Gordon. (1992, August). Ultracomputers: A Teraflop Before Its Time. Communications of the ACM, 35(8) 26-47.

Berners-Lee, Tim. (1992, April 16). WAIS-W3-X.500 BOF MINUTES.

Brand, Stewart. (1991). The Media Lab.

Brandt, D. Scott. (1992, April). The library in academic computing: UNIX revisited. Academic and Library Computing, 9(4) 15 et passim.

Carabine, Laura. (1992, March). Back to the Future. CAE, 11(3) 28-33.

Caruso, Denise. (1992, February 17). Where there's a will there's a WAIS: public domain system solves many sticky info retrieval problems. Digital Media: A Seybold Report, 1(9) 5 et passim.

Churbuck, David. (1991, December 23). The Computer as Detective. Forbes, 148(14) 150-155.

Cisler, Steve. (1991, November).
Future Visions. Online, 15(6) 90 et passim.

December, John. (1992, September 26). Information Sources: the Internet and Computer-Mediated Communication.

Dern, Daniel P. (1992, October 26). Index everything, share it company wide with WAIS; the publicly developed Wide Area Information Server is now being applied to corporate information. MacWEEK, 6(38) 24 et passim.

DeWitt, David; Gray, Jim. (1992, June). Parallel Database Systems: The Future of High Performance Database Systems. Communications of the ACM, 35(6) 85-98.

Dun & Bradstreet. Online information -- Thinking Machines Corp. DIALOG.

Ensor, Pat; Begg, Karin. (1992, May). Controversies of the information age: NISO's annual meeting. CD-ROM Professional, 5(3) 28 et passim.

Farley, Laine, ed. (1992, September 26). Library Resources on the Internet: Strategies for Selection and Use. Database query to wais@quake.think.com.

Fuochi, Andre. (1992, January 20). Parallel Processing: Is It All Promises? Computing Canada, 18(2) 1,6.

Hoffman, Thomas. (1992, October 12). Financial Services Firms Move in Parallel. Computerworld, 26(41) 28.

Jacobs, Sally. (1984, July 2). Corporate Geniuses. New England Business, 6(12) 16-24.

Johnstone, Bob. (1992, September 10). Research & Innovation: The Number Guzzler. Far Eastern Economic Review, 155(36) 86.

Johnstone, Bob. (1992, May 2). Supercomputers for the Home. The Economist, 323(7757) 102-103.

Jul, Erik. (1992, May). FTP: full-text publishing? Computers in Libraries, 12(1) 41 et passim.

Kahle, Brewster. An Information System for Corporate Users: Wide Area Information Servers. TMC-199. cf. Online Magazine, August 1991. Menlo Park, CA: Thinking Machines Corporation.

Kahle, Brewster. (1992, May 8). Re: WAIS FAQ part 0 of n: getting started. Included document: (1991, September 16). New Unix Internet Release (Beta 3 Release) Available.

Kahle, Brewster. (c. 1992). WAIS -- Making It Easier to Access Internet Resources. CERFnet News, 3(6).

Kahle, Brewster. (1992, March 2).
WAIS-discussion digest #45: WAIS on Campus. Roles of Electronic Publishing on Campus. Usenet Newsgroups: alt.wais, list.wais-discussion.

Kahle, Brewster. (1991, February). WAIStation, A User Interface for WAIS. Menlo Park, CA: Thinking Machines Corporation.

Kahle, Brewster. (1989, November). Wide Area Information Servers Concepts. TMC-202. Menlo Park, CA: Thinking Machines Corporation.

Kellem, Jeff. (1991, September 25). WAIS, A Sketch of an Overview. Usenet Newsgroup: alt.wais.

Keyes, Jessica. (1992, February). Living in parallel. AI Expert, 7(2) 42 et passim.

Klein, Elizabeth. (1992, February). "Thinking" Machines Add New Expertise to Banking Functions. Savings Institutions, 113(2) 37-38.

Kozel, Edward R. Commercializing Internet: impact on corporate users. Telecommunications, 26(1) S11 et passim.

Malamud, Carl. (1992, March 16). WAIS: Is it the Lotus 1-2-3 of the Internet? Communications Week, no. 394, p. 17.

Markoff, John. (1991, July 3). For the PC User, Vast Libraries. NY: New York Times, national edition, 140, p. C1 et passim.

National Information Standards Organization (Z39). (Z39.50-1988). Z39.50-1988: Information Retrieval Service Definition and Protocol Specification for Library Applications. Bethesda, MD: NISO.

National Information Standards Organization (Z39). (Z39.50-1991 v. 2). Z39.50-1991: Information Retrieval Service Definition and Protocol Specification for Library Applications version 2.

National Science Foundation (NSF). (ND). NSF STIS - Science and Technology Information System.

Norr, Henry. (1991, May 14). Peat Marwick tries 'partner-friendly' system. MacWEEK, 5(19) 22.

Norr, Henry. (1991, May 14). WAIS Promises Easy Text Retrieval. MacWEEK, p. 22.

Penczer, Peter. (1989, November). Supercomputers: Era Dawns on Wall Street. Wall Street Computer Review, 7(2) 40-50.

Press, Larry. (1992, June). Collective dynabases. Communications of the ACM, 35(6) 26.

Rifkin, Glenn. (1986, December 22). Parallel Processing: The Next Generation Is Already Under Way.
Computerworld, 20(51) 35-40. Wadsworth.

Ryan, Alan J. (1991, December 23/1992, January 2). Bright Lights, Hot Companies: News in Industry Not All Bad. Computerworld, 25(51) 33.

Savage, J. A. (1991, November 25). Fast Systems No Lure for Commercial Users. Computerworld, 25(47) 93.

Smith, S. (1987). Extracting Content Bearing Terms in Parallel on the Connection Machine. TMC-71. Menlo Park, CA: Thinking Machines Corporation.

Snyder, Joel. (1992, May). TCP/IP for the Mac. LAN Magazine, 7(5) 93 et passim.

Standard & Poor's. Online Information -- Thinking Machines Corp. DIALOG.

Stanfill, C. (1991, October). Massively Parallel Information Retrieval for Wide Area Information Servers. Paper presented at the IEEE International Conference on Systems, Man, and Cybernetics. Charlottesville, Virginia.

Stanfill, C. (1988, January). Parallel Computing for Information Retrieval: Recent Developments. TMC-69. Menlo Park, CA: Thinking Machines Corporation.

Stanfill, C. & Kahle, Brewster. (1986, October). Parallel Free-Text Search on the Connection Machine System. TMC-72. cf. Communications of the ACM, 29(12). Menlo Park, CA: Thinking Machines Corporation.

Stanfill, C. & Thau, R. (1990, December). Information Retrieval on the Connection Machine: 1 to 8192 Gigabytes. TMC-66. Menlo Park, CA: Thinking Machines Corporation.

Stanfill, C. & Thau, R. (1990). A Parallel Indexed Algorithm for Information Retrieval. TMC-67. cf. Proceedings of the 12th International Conference on Research and Development in Information Retrieval SIGIR-89. Menlo Park, CA: Thinking Machines Corporation.

Stein, Richard. (1991, May). Browsing Through Terabytes: Wide-area information servers open a new frontier in personal and corporate information services. Byte, 16(5) 157-164.

Tillman, Hope N.; Ladner, Sharyn J. (1992, Spring). Special librarians and the INTERNET. Special Libraries, 83(2) 127 et passim.

Touby, Laurel. (1990, November). The Thinking Machines: How to Have Experts at Your Fingertips.
Working Woman, 15(11) 87-96.

TRW, Inc. Credit Reports on Thinking Machines Corp. DIALOG.

Wagers, Robert. (1992, November). DowQuest and Dow Jones Text-Search: Which Works Best and When? Online, 16(6) 35-42.

Wallach, Steve. (1989, April). Commercial Parallelism's Progenitors. UNIX Review, 7(4) 44-45.

Waltz, D., Smith, S., & Stanfill, C. (1987, July). Very Large Database Applications of the Connection Machine System. TMC-70. cf. AFIPS/1987 NCC Proceedings, July 1987. Menlo Park, CA: Thinking Machines Corporation.

Westin, et al. (1992, March 20). COMPUTERS IN THE WORKPLACE: ELYSIUM or PANOPTICON? Transcript of conference panel presentation and discussion.

Wiegner, Kathleen K. (1992, February 3). Parallel Thinking. Forbes, 149(3) 92-93.

Zorpette, Glenn. (1992, January). Technology 1992: Large Computers. IEEE Spectrum, 29(1) 33-35.