Introduction
One solution is a new class of data structures called Distributed Scalable Data Structures (SDDSs). First proposed in 1992-1993, [LNS93], SDDSs gave rise to important research effort, materialized by several algorithms, papers and implementations. It was shown that SDDS files are able to scale to thousands of sites, and terabytes in distributed RAM, with constant access performance, and search times under a millisecond. Multi-key searches requiring an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file [LN95], [L96]. All these properties should be of prime importance for the applications, especially in the DBMS design arena, [ASS94]. They open new perspective for the VLDB design, for multimedia databases, for real-time and high-availability databases, for decision support systems, and for high performance computing in general.
Most documents, presentations etc. referred to below, define or present various types of SDDSs. Some present applications of SDDSs or discuss implementation issues, especially under Windows NT. Finally, some references point to the related work, especially major research projects on multicomputers, e.g., the NOW, and Castle project at UCB, [C94], and Legion at U. of Virginia [GW97].
The types of SDDSs known are:
SDDSs for ordered files. These extend the traditional ordered data structures, B-trees or binary trees, to the multicomputers, [LNS94], [KW94].
SDDSs for multi-attribute (multi-key) files. These algorithms extend the traditional k-d data structures [S89] to the multicomputers, [LN95], [N95], [L96]. Performance of multi-attribute search may get improved by several orders of magnitude.
high-availability SDDSs. These structures have no known counterpart among the traditional data structures. They are designed to transparently survive failures of server sites. They apply principles of mirroring, or of striping, or of grouping of records, revised to support the file scalability [LN95], [LN96], [LR97], [LMR98].
References
[A97] Aly, W., D. Organisation interne des SDDS (RP*). Mémoire de DEA. U. Dakar, 1998.
[ASS94] Amin, M., Schneider, D., and Singh, V., An Adaptive, Load Balancing Parallel Join Algorithm. 6th International Conference on Management of Data, Bangalore, India, December, 1994.
[B96] Bennour, F. Une Architecture pour un Gestionnaire des SDDS sous Windows NT. Mémoire de DEA. U. Paris 9, 1996.
[B97] Birnbaum, J. ACM97 Speakers Corner. Comm. of ACM, Jan. 1997.
[C94] Culler, D. NOW: Towards Everyday Supercomputing on a Network of Workstations. EECS Tech. Rep. UC Berkeley.
[CACM97] Comm. Of ACM. Special Issue on High-Performance Computing.(Oct. 1997).
[D93] Devine, R. Design and Implementation of DDH: Distributed Dynamic Hashing. Int. Conf. On Foundations of Data Organizations, FODO-93. Lecture Notes in Comp. Sc., Springer-Verlag (publ.), Oct. 1993.
[G96] Gray, J. Super-Servers: Commodity Computer Clusters Pose a Software Challemge. Microsoft, 1996. http://www.research.microsoft.com/
[GW96] Grimshaw, A, Wulf, W. The Legion Vision of a Worldwide Virtual Computer. Comm. of ACM, Jan. 1997.
[KLR96] Karlsson, J. Litwin, W., Risch, T. LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers. Intl. Conf. on Extending Database Technology, EDBT-96, Avignon, March 1996.
[KW94] Kroll, B., Widmayer, P. Distributing a Search Tree Among a Growing Number of Processors. ACM-SIGMOD Int. Conf. On Management of Data, 1994.
[K98] Knuth D. The Art of Computer Programming. 3rd Ed. Addison Wesley, 1998.
[L80] Linear Hashing : a new tool for file and tables addressing. Reprinted from VLDB-80 in READINGS IN DATABASES. 2-nd ed. Morgan Kaufmann Publishers, Inc., 1994. Stonebraker , M.(Ed.).
[L88] Larson, P. Dynamic Hash Tables, CACM, 31 (4), 1988.[L98] Lindberg., R. A Java Implementation of a Highly Available Scalable and Distributed Data Structure LH*g. Master Th. LiTH-IDA-Ex-97/65. U. Linkoping, 1997, 62.
[LNS93] Litwin, W. Neimat, M-A., Schneider, D. LH* : Linear Hashing for Distributed Files. ACM-SIGMOD Intl. Conf. On Management of Data, 1993.
[LNS94] Litwin, W., Neimat, M-A., Schneider, D. RP* : A Family of Order-Preserving Scalable Distributed Data Structures. 20th Intl. Conf on Very Large Data Bases (VLDB), 1994.
[LN95] Litwin, W., Neimat. k-RP* : a Family of High Performance Multi-attribute Scalable Distributed Data Structure. IEEE Intl. Conf. on Par. & Distr. Systems, PDIS-96, (Dec. 1996).
[LN95] W. Litwin, M-A Neimat. LH*s : a high-availability and high-security Scalable Distributed Data Structure. IEEE Workshop on Research Issues in Data Engineering. IEEE Press, 1997.
[LN96] Litwin, W., Neimat M-A. High-Availability LH* Schemes with Mirroring. With M-A, Neimat. Intl. Conf. on Coope. Inf. Syst. COOPIS-96, Brussels, 1996.
[LNS96] Litwin, W., Neimat, M-A., Schneider, D. LH*: A Scalable Distributed Data Structure. ACM Transactions on Database Systems ACM-TODS, (Dec. 1996).
[LR97] Litwin, W. Risch, T. LH*g : a high-availability Scalable Distributed Data Structure through record grouping. U-Paris 9 Tech. Rep. (May, 1997). Submitted for publ.
[LMR98] Litwin, W. Menon, J., Risch, T. LH* Schemes with Scalable Availability. IBM Almaden Research Rep. (May, 1998).
[L96] Lomet, D. Replicated Indexes for Distributed Data to app. in IEEE Intl. Conf. on Par. & Distr. Systems, PDIS-96, (Dec. 1996).
[M96] Microsoft Windows NT Server Cluster Strategy: High Availability and Scalability with Industry- Standard Hardware. A White Paper from the Business Systems Division. Microsoft, 1996.
[MS97] Microsoft Scalability Day: Coverage. http://204.203.124.10/backoffice/scalability/coverage.htm
[N95]Nardelli, E. Distributed Searching of Multi-dimensional Data: A Performance Evaluation Study. Journal of Par. & Distr. Computing 49, 11-134, 1998.
[P97] Pheng, P. Gestion d'une case de LH*. Mémoire de DEA. U. Paris 9, 1997.
[R98] Ramakrishnan, K. Database Management Systems. McGraw Hill, 1998.
[R98a] Ronstrom, M. Design and Modelling of a Parallel Data Server for Telecom Applications. Ph.D, Thesis U. Linkoping, 1998, 250.
[S89] Samet, H. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1989.
[SPW90] Severance, C., Pramanik, S. Wolberg, P. Distributed linear hashing and parallel projection in main memory databases. VLDB-90.
[S95] Souleiman, R. Communication Protocol for the Scalable Distributed Data Structures. Tech. Rep. Univ. Paris 9, 1995 (in French)
[T95] Tanenbaum, A., S. Distributed Operating Systems. Prentice Hall, 1995, 601.
[TZK96] Tung, S, Zha, H, Kefe, T. Concurrent Scalable Distributed Data Structures, Proceedings of the ISCA Intl. Conf. on Parallel and Distributed Computing Systems , pp. 131-136, Dijon, France, (Sept., 1996). Edited by K. Yetongnon and S. Harini
[VBWY94] Vingralek, R., Breitbart, Y., Weikum, G. Distributed File Organization with Scalable Cost/Performance. ACM-SIGMOD Int. Conf. On Management of Data, 1994.
[U94] Ullman, J. New Frontiers in Database System Research. Future Tendencies in Computer Science, Control, and Applied Mathematics. Lecture Notes in Computer Science 653, Springer-Verlag, 1994. A. Bensoussan, J. P. Verjus, ed. 87-101.