SEMINAR
Université Paris-Dauphine
Place du Maréchal de Lattre de Tassigny - Paris 16ème
Métro: Porte Dauphine - RER Line C: Avenue Foch - Bus PC: Porte Dauphine
Wednesday, 14 November 2001, 15:30
Room A 703, 7th floor, Building A (new wing of the university building)
Evaluation of Join Strategies for Distributed Mediation
by
Vanja Josifovski, Timour Katchaounov, Tore Risch
Three join algorithms are described and evaluated for an environment composed of distributed
main-memory based mediators and data sources. First, a streamed ship-out join is
presented where bulks of tuples are shipped to a mediator close to a data
source, followed by post-processing in the client mediator. The second join
algorithm is an extended streamed semi-join that in addition incrementally builds
a main-memory hash index in the client mediator. The third is a ship-in
algorithm where the data is materialized in the client mediator before the join
is executed there. The first two algorithms are suitable for sources that
require parameters to execute a query, such as web search engines and computational
software. The last algorithm is used with sources not supporting parameterized
queries. For the algorithms we compare the execution times for obtaining the
last and the first N tuples and analyze the portion of the time spent in the different
subsystems. We varied the speed of the communication network, bulk size,
duplicates in shipped data, and the size of the mediator's main memory. The
study shows that the choice of a join algorithm can lead to orders-of-magnitude
differences in execution times across different mediation environments.
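
To make the contrast between the strategies concrete, here is a minimal Python sketch of the second and third ideas from the abstract. It is not the authors' implementation. The remote source is simulated by in-process functions, and every name in it (bulks, semi_join_with_index, ship_in_join, remote_lookup, remote_scan, SOURCE, calls) is hypothetical. The semi-join ships bulks of join keys to a parameterized source and caches the answers in a client-side hash index, so duplicate keys are not shipped again. The ship-in join first materializes the whole source in the client and joins there.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); an assumed toy schema


def bulks(items: Iterable, size: int) -> Iterator[list]:
    """Split a stream into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def semi_join_with_index(
    outer: Iterable[Row],
    lookup: Callable[[List[int]], Dict[int, List[str]]],
    bulk_size: int,
) -> Iterator[Tuple[int, str, str]]:
    """Streamed semi-join that incrementally builds a client-side hash index."""
    index: Dict[int, List[str]] = {}
    for chunk in bulks(outer, bulk_size):
        # Ship only distinct keys that the index has not seen yet.
        missing = list(dict.fromkeys(k for k, _ in chunk if k not in index))
        if missing:
            answers = lookup(missing)
            for k in missing:
                index[k] = answers.get(k, [])  # remember empty answers too
        # Post-process in the client: combine outer rows with cached matches.
        for k, payload in chunk:
            for match in index[k]:
                yield (k, payload, match)


def ship_in_join(
    outer: Iterable[Row],
    scan: Callable[[], Iterable[Row]],
) -> Iterator[Tuple[int, str, str]]:
    """Materialize the source in the client, then run a hash join there."""
    table: Dict[int, List[str]] = {}
    for k, value in scan():
        table.setdefault(k, []).append(value)
    for k, payload in outer:
        for match in table.get(k, []):
            yield (k, payload, match)


# Hypothetical stand-ins for a remote data source.
SOURCE: List[Row] = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
calls: List[List[int]] = []  # records each bulk of keys that was "shipped"


def remote_lookup(keys: List[int]) -> Dict[int, List[str]]:
    """Parameterized source: answers only for the keys it is given."""
    calls.append(list(keys))
    wanted = set(keys)
    out: Dict[int, List[str]] = {}
    for k, value in SOURCE:
        if k in wanted:
            out.setdefault(k, []).append(value)
    return out


def remote_scan() -> Iterable[Row]:
    """Non-parameterized source: can only be read in full."""
    return iter(SOURCE)
```

Calling list(semi_join_with_index(outer, remote_lookup, bulk_size=2)) on an outer stream with repeated keys yields the same rows as list(ship_in_join(outer, remote_scan)), while the calls list shows that each distinct key crosses the simulated network at most once. Network speed, bulk size, duplicate rate, and memory limits, which the study varies, are not modeled here.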