Yongzhe Chang, Zhidong Li, Bang Zhang, Ling Luo, Arcot Sowmya, Yang Wang, Fang Chen
Pacific-Asia Conference on Knowledge Discovery and Data Mining
Unmarked event data is increasingly popular in temporal modeling. It contains only the timestamp of each event occurrence, without specifying the class or description of the event. A sequence of events is usually modeled as a realization of a latent intensity series. When the intensity varies over time, the events follow a Non-Homogeneous Poisson Process (NHPP). An important task in analyzing such sequences is to measure the similarity between two sequences based on their intensities. To avoid the difficulty of estimating the latent intensities, we measure the similarity directly on the timestamps using Dynamic Time Warping (DTW), which also resolves the issue that observations in two sequences are not aligned in time. Furthermore, real event data always has superposed noise. For example, when comparing the purchase behaviour of two customers, we can be misled if one customer visits the market more often because of some occasional shopping events. We therefore aim to recover the DTW distance between two noise-superposed NHPP sequences in order to evaluate the similarity between them. We propose two strategies: removing noise events over all possibilities before calculating the DTW distance, and integrating the noise removal into the dynamic programming of the DTW calculation. We compare the empirical performance of all the methods and show quantitatively that the proposed methods recover the DTW distance effectively and efficiently.
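To make the first strategy concrete, the following is a minimal sketch, not the authors' implementation. It assumes the local cost is the absolute difference between timestamps, that noise is superposed on only one of the two sequences, and that the recovered distance is the smallest DTW distance over all ways of dropping at most `max_noise` events. The function names `dtw_distance` and `dtw_after_removal`, the parameter `max_noise`, and the selection rule are all hypothetical choices made for illustration.

```python
from itertools import combinations
import math


def dtw_distance(a, b):
    """Classic DTW between two timestamp lists, using |x - y| as local cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0 if n == m else math.inf
    # D[i][j] holds the cost of the best warping path aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_after_removal(a, b, max_noise):
    """Brute force (assumed criterion): drop up to max_noise events from a
    and keep the smallest resulting DTW distance to b."""
    best = dtw_distance(a, b)
    # Never drop every event of a; at least one must remain.
    for k in range(1, min(max_noise, len(a) - 1) + 1):
        for drop in combinations(range(len(a)), k):
            dropped = set(drop)
            kept = [t for i, t in enumerate(a) if i not in dropped]
            best = min(best, dtw_distance(kept, b))
    return best


clean = [1.0, 2.0, 3.0]
noisy = [1.0, 2.0, 2.5, 3.0]  # one extra event at t = 2.5
print(dtw_distance(noisy, clean))
print(dtw_after_removal(noisy, clean, max_noise=1))
```

The exhaustive search grows combinatorially with the number of candidate noise events, which is the motivation for the second strategy of folding the removal decision into the dynamic program itself.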