Unsupervised online detection and prediction of outliers in streams of sensor data

Niko Reunanen, Tomi Räty (Corresponding Author), Juho J. Jokinen, Tyler Hoyt, David Culler

    Research output: Contribution to journal › Article › Scientific › peer-review

    Abstract

    Outliers are unexpected observations that deviate from the majority of observations. Outlier detection and prediction are challenging tasks because outliers are rare by definition. A stream is an unbounded source of data that has to be processed promptly. This article proposes novel methods for outlier detection and outlier prediction in streams of sensor data. The outlier detection is an independent, unsupervised process implemented using an autoencoder. It continuously evaluates whether the latest data point x_i from a stream is an inlier or an outlier. This distinction is based on the reconstruction cost combined with Chebyshev’s inequality and the EWMA (exponentially weighted moving average) model. The outlier prediction uses the results of the outlier detection to form the required training data. It utilizes LR (logistic regression), SGD (stochastic gradient descent) and the hidden representation provided by the autoencoder to predict outliers in streams. The results of the experiments show that the proposed methods (1) provide accurate results, (2) run in reduced computation time and (3) use a low amount of memory. The proposed methods are suitable for analyzing streams of sensor data and providing results with low latency. The experiments also indicate that the outlier prediction is able to anticipate the occurrence of outliers in streams of sensor data.
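The detection rule named in the abstract (reconstruction cost, Chebyshev's inequality, EWMA) can be illustrated with a minimal sketch. This is not the authors' implementation: the autoencoder is omitted and the reconstruction costs are assumed to arrive as plain numbers. The class name and the parameters `alpha`, `tail_prob` and `warmup` are hypothetical, and updating the statistics only on inliers is a design choice made for this example. Chebyshev's inequality states that P(|X - mu| >= k*sigma) <= 1/k^2 for any distribution with finite variance, so a target tail probability p gives the threshold k = 1/sqrt(p).

```python
import math


class EwmaChebyshevDetector:
    """Flag a cost as an outlier when it lies more than k EWMA standard
    deviations from the EWMA mean (illustrative sketch, not the paper's code)."""

    def __init__(self, alpha=0.05, tail_prob=0.01, warmup=30):
        self.alpha = alpha                    # EWMA smoothing factor (assumed value)
        self.k = 1.0 / math.sqrt(tail_prob)   # Chebyshev bound: 1/k^2 = tail_prob
        self.warmup = warmup                  # points to observe before flagging
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, cost):
        """Consume one reconstruction cost and return True if it is an outlier."""
        self.n += 1
        if self.n == 1:
            self.mean = cost                  # initialize the mean with the first cost
            return False
        deviation = cost - self.mean
        std = math.sqrt(self.var)
        is_outlier = self.n > self.warmup and abs(deviation) > self.k * std
        if not is_outlier:
            # Incremental EWMA updates of mean and variance, applied only to
            # inliers so that a flagged point does not shift the baseline.
            self.mean += self.alpha * deviation
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_outlier


if __name__ == "__main__":
    detector = EwmaChebyshevDetector()
    stream = [1.0, 1.1, 0.9] * 40 + [50.0]    # synthetic costs with one large spike
    for i, cost in enumerate(stream):
        if detector.update(cost):
            print(f"outlier at index {i}: cost={cost}")
```

The state per stream is three numbers (mean, variance, count), which is consistent with the low memory use and low latency the abstract emphasizes, although the actual thresholds and update rules in the article may differ.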
    Original language: English
    Journal: International Journal of Data Science and Analytics
    DOIs: 10.1007/s41060-019-00191-3
    Publication status: E-pub ahead of print - 3 Jun 2019
    MoE publication type: A1 Journal article-refereed

    Fingerprint

    Sensors
    Logistics
    Experiments
    Data storage equipment
    Costs

    Cite this

    @article{6fbb42522ba845e196d260fe863211de,
    title = "Unsupervised online detection and prediction of outliers in streams of sensor data",
    abstract = "Outliers are unexpected observations, which deviate from the majority of observations. Outlier detection and prediction are challenging tasks, because outliers are rare by definition. A stream is an unbounded source of data, which has to be processed promptly. This article proposes novel methods for outlier detection and outlier prediction in streams of sensor data. The outlier detection is an independent, unsupervised process, which is implemented using an autoencoder. The outlier detection continuously evaluates if the latest data point xi from a stream is an inlier or an outlier. This distinction is based on the reconstruction cost accompanied with Chebyshev’s inequality and the EWMA (exponentially weighted moving average) model. The outlier prediction uses the results of the outlier detection to form the required training data. The outlier prediction utilizes LR (logistic regression), SGD (stochastic gradient descent) and the hidden representation provided by the autoencoder to predict outliers in streams. The results of the experiments show that the proposed methods (1) provide accurate results, (2) are calculated in reduced computation time and (3) use a low amount of memory. Our proposed methods are suitable for analyzing streams of sensor data and providing results with low latency. The experiments also indicated that the outlier prediction is able to anticipate the occurrence of outliers in streams of sensor data.",
    author = "Niko Reunanen and Tomi R{\"a}ty and Jokinen, {Juho J.} and Tyler Hoyt and David Culler",
    year = "2019",
    month = "6",
    day = "3",
    doi = "10.1007/s41060-019-00191-3",
    language = "English",
    journal = "International Journal of Data Science and Analytics",
    issn = "2364-415X",
    publisher = "Springer",

    }

    Unsupervised online detection and prediction of outliers in streams of sensor data. / Reunanen, Niko; Räty, Tomi (Corresponding Author); Jokinen, Juho J.; Hoyt, Tyler; Culler, David.

    In: International Journal of Data Science and Analytics, 03.06.2019.


    TY - JOUR
    T1 - Unsupervised online detection and prediction of outliers in streams of sensor data
    AU - Reunanen, Niko
    AU - Räty, Tomi
    AU - Jokinen, Juho J.
    AU - Hoyt, Tyler
    AU - Culler, David
    PY - 2019/6/3
    Y1 - 2019/6/3

    AB - Outliers are unexpected observations, which deviate from the majority of observations. Outlier detection and prediction are challenging tasks, because outliers are rare by definition. A stream is an unbounded source of data, which has to be processed promptly. This article proposes novel methods for outlier detection and outlier prediction in streams of sensor data. The outlier detection is an independent, unsupervised process, which is implemented using an autoencoder. The outlier detection continuously evaluates if the latest data point xi from a stream is an inlier or an outlier. This distinction is based on the reconstruction cost accompanied with Chebyshev’s inequality and the EWMA (exponentially weighted moving average) model. The outlier prediction uses the results of the outlier detection to form the required training data. The outlier prediction utilizes LR (logistic regression), SGD (stochastic gradient descent) and the hidden representation provided by the autoencoder to predict outliers in streams. The results of the experiments show that the proposed methods (1) provide accurate results, (2) are calculated in reduced computation time and (3) use a low amount of memory. Our proposed methods are suitable for analyzing streams of sensor data and providing results with low latency. The experiments also indicated that the outlier prediction is able to anticipate the occurrence of outliers in streams of sensor data.

    U2 - 10.1007/s41060-019-00191-3
    DO - 10.1007/s41060-019-00191-3
    M3 - Article
    JO - International Journal of Data Science and Analytics
    JF - International Journal of Data Science and Analytics
    SN - 2364-415X
    ER -