Paper abstract

Parallel Spectral Clustering

Yangqiu Song - Tsinghua University, China
Wen-Yen Chen - University of California, Santa Barbara, USA
Hongjie Bai - Google Research, USA/China
Chih-Jen Lin - National Taiwan University, Taipei, Taiwan
Edward Y. Chang - Google Research, USA/China

Session: Clustering 1
Springer Link: http://dx.doi.org/10.1007/978-3-540-87481-2_25

Spectral clustering algorithms have been shown to be more effective at finding clusters than most traditional algorithms. However, spectral clustering suffers from a scalability problem in both memory use and computation time when the dataset is large. To perform clustering on large datasets, we propose to parallelize both memory use and computation across distributed computers. Through an empirical study on a large document dataset of 193,844 instances and a large photo dataset of 637,137 instances, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.
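
To make the scalability problem concrete, here is a minimal single-machine sketch of textbook normalized spectral clustering in Python with NumPy. It is not the parallel algorithm proposed in the paper. The function name `spectral_clustering`, the Gaussian similarity with bandwidth `sigma`, and the farthest-point k-means initialization are all assumptions chosen for illustration. The dense n x n similarity matrix is the memory bottleneck: at 637,137 instances it would need about 3 TB in double precision.

```python
import numpy as np


def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Illustrative dense spectral clustering (hypothetical helper)."""
    n = X.shape[0]

    # Dense n x n Gaussian similarity matrix: O(n^2) memory.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}.
    deg = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

    # Full eigendecomposition, O(n^3) time; eigh sorts eigenvalues ascending,
    # so the first k columns belong to the k smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :k]
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization for k-means (an assumption).
    centers = [V[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((V - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)

    # Plain Lloyd iterations on the rows of the spectral embedding.
    for _ in range(n_iter):
        dists = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = V[labels == j].mean(axis=0)
    return labels


# Usage on two synthetic, well-separated blobs (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(10.0, 0.3, (20, 2))])
labels = spectral_clustering(X, k=2)
print(labels)
```

Both the similarity matrix and the eigendecomposition are held on one machine in this sketch, which is why memory and computation time grow so quickly with dataset size.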