Paper abstract

Client-Friendly Classification over Random Hyperplane Hashes

Shyamsundar Rajaram - Hewlett Packard Laboratories, USA
Martin Scholz - Hewlett Packard Laboratories, USA

Session: Classification 1
Springer Link: http://dx.doi.org/10.1007/978-3-540-87481-2_17

We introduce a powerful and general feature representation based on a locality-sensitive hash scheme called random hyperplane hashing (RHH) to address the problem of centrally learning (linear) classification models from data that is distributed across a number of clients, and subsequently deploying these models on the same clients. Our main goal is to balance classifier accuracy against the various costs of deployment, including communication costs and computational complexity. We study how well schemes designed for sparse high-dimensional data adapt to the much denser representations produced by RHH, how much data has to be transmitted to preserve enough of the semantics of each document, and how the representations affect the overall computational complexity. We provide theoretical results in the form of error bounds and margin-based bounds to analyze the performance of classifiers learnt over RHH, and empirically illustrate attractive properties of RHH over conventional representations.
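To make the representation concrete, here is a minimal sketch of random hyperplane hashing in its commonly used form (the sign-random-projection hash): each bit is h_r(x) = 1 if r . x >= 0 and 0 otherwise, with r drawn from a standard Gaussian, so that Pr[h_r(u) = h_r(v)] = 1 - theta(u, v)/pi, where theta is the angle between u and v. The abstract does not specify the number of bits, the hyperplane distribution, the bit encoding, or how the classifier is trained, so all of those choices here are assumptions, and the function names (make_hyperplanes, rhh_bits, estimated_angle, linear_score) are hypothetical. The sketch shows the client-side view: a sparse document is reduced to a fixed number of bits that is independent of vocabulary size, and a centrally learned linear model is then applied to those bits.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits hyperplane normals with i.i.d. standard Gaussian entries.

    Sharing the seed lets a server and its clients regenerate identical
    hyperplanes (an assumption of this sketch, not a detail from the paper).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def rhh_bits(doc, hyperplanes):
    """Hash a sparse document {term_index: weight} to one bit per hyperplane.

    Bit i is 1 if the dot product with normal i is non-negative, else 0.
    Only the document's non-zero terms are touched.
    """
    bits = []
    for r in hyperplanes:
        dot = sum(r[j] * w for j, w in doc.items())
        bits.append(1 if dot >= 0.0 else 0)
    return bits


def estimated_angle(bits_a, bits_b):
    """Estimate the angle between two documents from their bit signatures.

    For one random hyperplane, Pr[bits differ] = theta / pi, so the fraction
    of differing bits, scaled by pi, estimates theta.
    """
    hamming = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.pi * hamming / len(bits_a)


def linear_score(weights, bias, bits):
    """Score hashed bits with a linear model, encoding each bit as -1 or +1.

    The -1/+1 encoding is a hypothetical choice for illustration.
    """
    return bias + sum(w * (1 if b else -1) for w, b in zip(weights, bits))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=5, n_bits=2000, seed=42)
    doc_a = {0: 1.0, 3: 2.0}  # hypothetical term weights
    doc_b = {1: 1.0, 4: 0.5}  # shares no terms with doc_a, so the vectors are orthogonal
    a, b = rhh_bits(doc_a, planes), rhh_bits(doc_b, planes)
    print("estimated angle:", estimated_angle(a, b), "true angle:", math.pi / 2)
```

Because each bit depends only on the sign of r . x, the signature is invariant to rescaling a document, and the Hamming distance between signatures tracks the angle between documents. The trade-off the abstract studies shows up directly here: more bits give a tighter angle estimate but cost more to compute and transmit.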