Scalable Document Collection Clustering
RI K-tree combines two algorithms, Random Indexing (RI) and K-tree, for document clustering. Document clustering presents many large-scale problems. RI K-tree scales well to large document collections because of its low algorithmic complexity and efficient document encoding. K-tree is also suited to managing volatile collections, overcoming the limitations of clustering algorithms that rely on global collection analysis. Random Indexing addresses the problems that arise from unconstrained feature sets and sparse document vectors when cluster trees are used. The algorithms and data structures will be motivated, presented and explained, and the results of experiments will be presented and discussed.
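To illustrate how Random Indexing can handle an unconstrained feature set, here is a minimal sketch of the general technique. It is not the RI K-tree implementation. Each term receives a sparse random "index vector" of fixed dimension with a few +1/-1 entries, and a document vector is the sum of its terms' index vectors. A previously unseen term therefore never enlarges the vector space. The dimension `DIM`, the sparsity `NONZERO`, seeding from a CRC32 of the term, and raw term-frequency weighting are all illustrative assumptions and are not taken from the talk.

```python
import random
import zlib
from collections import Counter

DIM = 512      # assumed reduced dimensionality (illustrative)
NONZERO = 8    # assumed number of +1/-1 entries per index vector (illustrative)

def index_vector(term, dim=DIM, nonzero=NONZERO):
    """Sparse ternary index vector for a term, as {position: +1 or -1}.

    Seeding the generator from a CRC32 of the term makes the vector
    reproducible, so no vocabulary table has to be stored or grown.
    """
    rng = random.Random(zlib.crc32(term.encode("utf-8")))
    positions = rng.sample(range(dim), nonzero)  # distinct positions
    return {p: rng.choice((1, -1)) for p in positions}

def document_vector(tokens, dim=DIM):
    """Fixed-length document vector: sum of index vectors weighted by term frequency."""
    vec = [0.0] * dim
    for term, tf in Counter(tokens).items():
        for pos, sign in index_vector(term, dim).items():
            vec[pos] += sign * tf
    return vec

if __name__ == "__main__":
    doc = "random indexing maps any vocabulary to a fixed dimension".split()
    v = document_vector(doc)
    print(len(v))  # always DIM, however many distinct terms the collection has
```

Every document maps to a dense vector of the same fixed length, which is the kind of input a cluster tree can compare with ordinary distance computations. The sketch uses raw term frequency only for simplicity. A real system would more likely use a tuned weighting such as TF-IDF.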
Shlomo Geva is the Head of Computer Science at the Queensland University of Technology, Faculty of Science and Technology. Shlomo is the co-chair of the INEX evaluation forum, dedicated to the evaluation of XML Information Retrieval systems. His research interests cover Information Retrieval, Computational Intelligence and Data Mining.