Two-way analysis of high-dimensional collinear data

Ilkka Huopaniemi (Corresponding Author), Tommi Suvitaival, Janne Nikkilä, Matej Orešič, Samuel Kaski

Research output: Contribution to journal, Article, Scientific, peer-review

14 Citations (Scopus)

Abstract

We present a Bayesian model for two-way ANOVA-type analysis of high-dimensional, small sample-size datasets with highly correlated groups of variables. Modern cellular measurement methods are a main application area; typically the task is differential analysis between diseased and healthy samples, complicated by additional covariates requiring a multi-way analysis. The main complication is the combination of high dimensionality and low sample size, which renders classical multivariate techniques useless. We introduce a hierarchical model which does dimensionality reduction by assuming that the input variables come in similarly-behaving groups, and performs an ANOVA-type decomposition for the set of reduced-dimensional latent variables. We apply the methods to study lipidomic profiles of a recent large-cohort human diabetes study.
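The abstract describes two stages: the input variables are collapsed into a few latent variables, one per group of similarly behaving variables, and an ANOVA-type decomposition is then applied to those latent variables. The Python sketch below illustrates only that overall idea on simulated data. It is not the authors' method. The grouping of variables is taken as known, whereas the paper's hierarchical Bayesian model infers it. Latent values are estimated by ordinary least squares rather than by posterior inference. The effects come from a classical sum-to-zero decomposition of the 2x2 cell means. All function names, effect sizes and noise levels (`simulate_two_way`, `two_way_decomposition`, `latent_sd`, `noise_sd`) are hypothetical.

```python
import numpy as np


def simulate_two_way(n_per_cell=50, n_groups=3, vars_per_group=20,
                     latent_sd=0.3, noise_sd=0.3, seed=0):
    """Simulate a balanced 2x2 design (factors A and B) in which each latent
    variable drives one block of highly correlated observed variables.
    Effect sizes are hypothetical; n_groups must be at least 3."""
    rng = np.random.default_rng(seed)
    n_vars = n_groups * vars_per_group
    # Cluster-structured loadings: each observed variable loads on exactly one latent variable.
    membership = np.repeat(np.arange(n_groups), vars_per_group)
    V = np.zeros((n_vars, n_groups))
    V[np.arange(n_vars), membership] = rng.uniform(0.5, 1.5, size=n_vars)

    # True effects on the latent variables (first index = factor level).
    alpha = np.zeros((2, n_groups))
    alpha[1, 0] = 1.0                    # factor A shifts latent 0
    beta = np.zeros((2, n_groups))
    beta[1, 1] = 1.0                     # factor B shifts latent 1
    gamma = np.zeros((2, 2, n_groups))
    gamma[1, 1, 2] = 1.0                 # A x B interaction on latent 2

    a_lab, b_lab, Z = [], [], []
    for a in range(2):
        for b in range(2):
            mean = alpha[a] + beta[b] + gamma[a, b]
            Z.append(mean + latent_sd * rng.standard_normal((n_per_cell, n_groups)))
            a_lab += [a] * n_per_cell
            b_lab += [b] * n_per_cell
    Z = np.vstack(Z)
    # Observed data: latent values mapped through the loadings plus variable-wise noise.
    X = Z @ V.T + noise_sd * rng.standard_normal((Z.shape[0], n_vars))
    return X, V, np.array(a_lab), np.array(b_lab)


def two_way_decomposition(X, V, a_lab, b_lab):
    """Estimate latent values by least squares (grouping V assumed known), then
    split the 2x2 cell means into grand mean, main effects and interaction."""
    Z_hat = np.linalg.lstsq(V, X.T, rcond=None)[0].T        # (n_samples, n_groups)
    cells = np.array([[Z_hat[(a_lab == a) & (b_lab == b)].mean(axis=0)
                       for b in range(2)] for a in range(2)])  # (2, 2, n_groups)
    grand = cells.mean(axis=(0, 1))
    alpha_hat = cells.mean(axis=1) - grand                    # factor A levels
    beta_hat = cells.mean(axis=0) - grand                     # factor B levels
    inter = cells - grand - alpha_hat[:, None, :] - beta_hat[None, :, :]
    return grand, alpha_hat, beta_hat, inter


X, V, a_lab, b_lab = simulate_two_way()
grand, alpha_hat, beta_hat, inter = two_way_decomposition(X, V, a_lab, b_lab)
effect_A = alpha_hat[1] - alpha_hat[0]
effect_B = beta_hat[1] - beta_hat[0]
contrast_AB = inter[1, 1] - inter[1, 0] - inter[0, 1] + inter[0, 0]
print(np.round(effect_A, 2), np.round(effect_B, 2), np.round(contrast_AB, 2))
```

With these settings the estimated factor-A effects should come out close to 1, 0 and 0.5 for the three latent variables, the factor-B effects close to 0, 1 and 0.5, and the interaction contrast close to 0, 0 and 1. The 0.5 main effects on the third latent variable appear because its interaction is averaged into the marginal means. Summarizing each block of correlated variables by one latent variable means the decomposition is carried out on 3 quantities rather than 60, which is the kind of reduction the abstract motivates for small-sample settings.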

Original language: English
Pages (from-to): 261-276
Journal: Data Mining and Knowledge Discovery
Volume: 19
Issue number: 2
DOI: 10.1007/s10618-009-0142-5
Publication status: Published - 2009
MoE publication type: A1 Journal article-refereed

Keywords

  • ANOVA
  • factor analysis
  • hierarchical model
  • metabolomics
  • multi-way analysis
  • small sample-size

Cite this

Huopaniemi, I., Suvitaival, T., Nikkilä, J., Orešič, M., & Kaski, S. (2009). Two-way analysis of high-dimensional collinear data. Data Mining and Knowledge Discovery, 19(2), 261-276. https://doi.org/10.1007/s10618-009-0142-5
@article{9d651885b1954c6faa5b6c14147e69c3,
title = "Two-way analysis of high-dimensional collinear data",
abstract = "We present a Bayesian model for two-way ANOVA-type analysis of high-dimensional, small sample-size datasets with highly correlated groups of variables. Modern cellular measurement methods are a main application area; typically the task is differential analysis between diseased and healthy samples, complicated by additional covariates requiring a multi-way analysis. The main complication is the combination of high dimensionality and low sample size, which renders classical multivariate techniques useless. We introduce a hierarchical model which does dimensionality reduction by assuming that the input variables come in similarly-behaving groups, and performs an ANOVA-type decomposition for the set of reduced-dimensional latent variables. We apply the methods to study lipidomic profiles of a recent large-cohort human diabetes study.",
keywords = "ANOVA, factor analysis, hierarchical model, metabolomics, multi-way analysis, small sample-size",
author = "Ilkka Huopaniemi and Tommi Suvitaival and Janne Nikkil{\"a} and Matej Ore{\v{s}}i{\v{c}} and Samuel Kaski",
year = "2009",
doi = "10.1007/s10618-009-0142-5",
language = "English",
volume = "19",
pages = "261--276",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer",
number = "2",

}