|
Abstract:
|
In this paper we study the performance of a distributed search engine from a data caching point of view. We compare and combine two different approaches to achieve better hit rates: (a) send the queries to the node which currently has the related data in its local memory (cache-aware load balancing), and (b) send the cached contents to the node where a query is being currently processed (cooperative caching). Furthermore, we study the best scheduling points in the query computation in which they can be reassigned to another node, and how this reassignation should be performed. Our analysis is guided by statistical tools on a real question answering system for several query distributions, which are typically found in query logs. |