Multivariate multi-way analysis of multi-source data

Ilkka Huopaniemi (Corresponding Author), Tommi Suvitaival, Janne Nikkilä, Matej Orešič, Samuel Kaski (Corresponding Author)

Research output: Contribution to journal › Article › Scientific › peer-review

31 Citations (Scopus)

Abstract

Motivation: Analysis of variance (ANOVA)-type methods are the default tool for analyzing data with multiple covariates. These tools have been generalized to the multivariate analysis of high-throughput biological datasets, where the main challenge is the combination of small sample size and high dimensionality. However, the existing multi-way analysis methods are not designed for the increasingly important experiments in which data are obtained from multiple sources. Common examples of such settings include integrated analysis of metabolic and gene expression profiles or, as in our case, of metabolic profiles from several tissues, in a controlled multi-way experimental setup where disease status, medical treatment, gender and time are typical covariates.

Results: We extend multivariate, multi-way ANOVA-type methods to multi-source cases by introducing a novel Bayesian model. The method is capable of finding covariate-related dependencies between the sources. It assumes that the measurements consist of groups of similarly behaving variables, and it estimates the multivariate covariate effects and their interaction effects for the discovered groups of variables. In particular, the method partitions the effects into those shared between the sources and those specific to a single source. The method is specifically designed for datasets with small sample sizes and high dimensionality.

We apply the method to a lipidomics dataset from a lung cancer study with a two-way experimental setup, in which measurements have been taken from several tissues with mostly distinct lipids. The method is also directly applicable to gene expression and proteomics data.
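To make the decomposition described in the Results paragraph concrete, the following is a minimal sketch of a two-way, two-source effect split on simulated data. It is not the Bayesian model of the paper. It assumes the groups of variables are already known (the paper's method discovers them), uses plain cell means instead of posterior inference, and defines the "shared" effect simply as the across-source average. All names (`two_way_effects`, `split_shared_specific`, `simulate_source`) and all numeric settings are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def two_way_effects(y, a, b):
    """Cell-mean effect estimates for a balanced 2x2 design.

    y: one score per sample; a, b: binary covariate labels (0 or 1).
    Returns (grand mean, main effect of a, main effect of b, interaction).
    """
    cell = [[mean(v for v, ai, bi in zip(y, a, b) if ai == i and bi == j)
             for j in (0, 1)] for i in (0, 1)]
    grand = mean(cell[i][j] for i in (0, 1) for j in (0, 1))
    alpha = [mean(cell[i]) - grand for i in (0, 1)]
    beta = [mean(cell[i][j] for i in (0, 1)) - grand for j in (0, 1)]
    inter = [[cell[i][j] - grand - alpha[i] - beta[j] for j in (0, 1)]
             for i in (0, 1)]
    return grand, alpha, beta, inter


def split_shared_specific(effects_by_source):
    """Average an effect across sources (shared part) and keep each
    source's deviation from that average (source-specific part)."""
    n = len(effects_by_source)
    shared = [sum(level) / n for level in zip(*effects_by_source)]
    specific = [[e - s for e, s in zip(eff, shared)]
                for eff in effects_by_source]
    return shared, specific


# Balanced two-way design: 5 samples per cell, 20 samples in total.
random.seed(0)
a = [0] * 10 + [1] * 10          # e.g. disease status (assumed labels)
b = ([0] * 5 + [1] * 5) * 2      # e.g. treatment (assumed labels)


def simulate_source(n_vars, effect_a, effect_b):
    """One group of n_vars variables that all respond alike to a and b."""
    return [[effect_a * ai + effect_b * bi + random.gauss(0, 0.5)
             for _ in range(n_vars)] for ai, bi in zip(a, b)]


# Two sources with different variables; both respond to a, only one to b.
source_1 = simulate_source(30, effect_a=2.0, effect_b=0.0)
source_2 = simulate_source(40, effect_a=2.0, effect_b=1.0)

# Collapse each group of variables to one score per sample.
scores_1 = [mean(row) for row in source_1]
scores_2 = [mean(row) for row in source_2]

_, alpha_1, beta_1, _ = two_way_effects(scores_1, a, b)
_, alpha_2, beta_2, _ = two_way_effects(scores_2, a, b)

shared_a, specific_a = split_shared_specific([alpha_1, alpha_2])
shared_b, specific_b = split_shared_specific([beta_1, beta_2])

print("shared effect of a:", shared_a)
print("source-specific effect of a:", specific_a)
print("shared effect of b:", shared_b)
print("source-specific effect of b:", specific_b)
```

In this toy setting the effect of `a` is the same in both sources, so its source-specific parts should be close to zero, while the effect of `b` exists in only one source and therefore leaves a clear source-specific remainder. Averaging many variables per group is what keeps the estimates stable with only five samples per cell, which loosely mirrors the small-sample, high-dimensional motivation stated in the abstract.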
Original language: English
Pages (from-to): i391-i398
Journal: Bioinformatics
Volume: 26
Issue number: 12
DOI: https://doi.org/10.1093/bioinformatics/btq174
Publication status: Published - 2010
MoE publication type: A1 Journal article-refereed

Fingerprint

Multiway Analysis
Information Storage and Retrieval
Multivariate Analysis
Analysis of variance (ANOVA)
Gene expression
Tissue
Lipids
Covariates
Time series
Throughput
Small Sample Size
Sample Size
Dimensionality
Experiments
Gene Expression Profile
Metabolome
Interaction Effects
Lung Cancer

Cite this

Huopaniemi, I., Suvitaival, T., Nikkilä, J., Orešič, M., & Kaski, S. (2010). Multivariate multi-way analysis of multi-source data. Bioinformatics, 26(12), i391-i398. https://doi.org/10.1093/bioinformatics/btq174
@article{b3c57c8bb80c4478a765805b2498be5b,
title = "Multivariate multi-way analysis of multi-source data",
author = "Ilkka Huopaniemi and Tommi Suvitaival and Janne Nikkil{\"a} and Matej Ore{\v{s}}i{\v{c}} and Samuel Kaski",
year = "2010",
doi = "10.1093/bioinformatics/btq174",
language = "English",
volume = "26",
pages = "i391--i398",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY  - JOUR
T1  - Multivariate multi-way analysis of multi-source data
AU  - Huopaniemi, Ilkka
AU  - Suvitaival, Tommi
AU  - Nikkilä, Janne
AU  - Orešič, Matej
AU  - Kaski, Samuel
PY  - 2010
Y1  - 2010
U2  - 10.1093/bioinformatics/btq174
DO  - 10.1093/bioinformatics/btq174
M3  - Article
VL  - 26
SP  - i391-i398
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 12
ER  -
