Paper abstract

StreamKrimp: Detecting Change in Data Streams

Matthijs van Leeuwen - Universiteit Utrecht, The Netherlands
Arno Siebes - Universiteit Utrecht, The Netherlands

Session: Mining Sequences and Streams
Springer Link: http://dx.doi.org/10.1007/978-3-540-87479-9_62

Data streams are ubiquitous. Examples range from sensor networks to financial transactions and website logs. In fact, even market basket data can be seen as a stream of sales. Detecting changes in the distribution a stream is sampled from is one of the most challenging problems in stream mining, as only limited storage can be used. In this paper we analyse this problem for streams of transaction data from a Minimum Description Length (MDL) perspective. Based on this analysis we introduce the StreamKrimp algorithm, which uses the Krimp algorithm to characterise probability distributions with code tables. With these code tables, StreamKrimp partitions the stream into a sequence of substreams. Each switch of code table indicates a change in the underlying distribution. Experiments on both real and artificial streams show that StreamKrimp detects the changes while using only a very limited amount of data storage.
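
The idea of partitioning a stream by switching compression models can be illustrated with a minimal sketch. This is not the authors' algorithm: in place of Krimp code tables it uses a much simpler stand-in, per-item code lengths derived from Laplace-smoothed item frequencies. The function names (`build_model`, `encoded_size`, `detect_changes`) and the parameters `block_size` and `min_gain` are hypothetical and chosen only for this illustration. For brevity the stream is passed as a list, whereas a real stream would be read block by block.

```python
import math
from collections import Counter


def build_model(block, alphabet):
    """Stand-in for a code table: a code length in bits for every item,
    estimated from Laplace-smoothed item frequencies in the block."""
    counts = Counter(item for transaction in block for item in transaction)
    total = sum(counts.values()) + len(alphabet)
    return {item: -math.log2((counts[item] + 1) / total) for item in alphabet}


def encoded_size(block, model):
    """Total number of bits needed to encode the block with the given model."""
    return sum(model[item] for transaction in block for item in transaction)


def detect_changes(stream, alphabet, block_size=50, min_gain=0.1):
    """Return the stream positions at which the model is switched.

    Assumption for this sketch: a new model replaces the current one when it
    compresses the latest block at least `min_gain` (relative) better.
    """
    blocks = [stream[i:i + block_size] for i in range(0, len(stream), block_size)]
    if not blocks:
        return []
    current = build_model(blocks[0], alphabet)
    changes = []
    for index, block in enumerate(blocks[1:], start=1):
        candidate = build_model(block, alphabet)
        size_current = encoded_size(block, current)
        size_candidate = encoded_size(block, candidate)
        if size_current > 0 and (size_current - size_candidate) / size_current > min_gain:
            # The current model no longer fits: treat this as a change point.
            current = candidate
            changes.append(index * block_size)
    return changes


# Usage: 100 transactions over items a and b, then 100 over items c and d.
stream = [("a", "b")] * 100 + [("c", "d")] * 100
print(detect_changes(stream, alphabet="abcd"))
```

Only the current model and the latest block are needed to make each decision, which mirrors the limited-storage setting described in the abstract. The switching rule here is a deliberate simplification and should not be read as the criterion used in the paper.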