Functional prediction of unidentified lipids using supervised classifiers

Laxman Yetukuri (Corresponding Author), Jarkko Tikka, Jaakko Hollmén, Matej Orešič

Research output: Contribution to journal › Article › Scientific › peer-review

11 Citations (Scopus)

Abstract

Mass spectrometry (MS)-based metabolomics studies often require handling of both identified and unidentified metabolite data. To avoid bias in data interpretation, the data analysis should ideally include all available data. A practical challenge in exploratory metabolomics analysis is therefore how to interpret changes in unidentified peaks. In this paper, we address this challenge by predicting the class membership of unknown peaks, applying and comparing multiple supervised classifiers on selected lipidomics datasets. The classifiers employed are k-nearest neighbours (k-NN), support vector machines (SVM), partial least squares discriminant analysis (PLS-DA) and Naive Bayes, methods known to be effective and efficient at predicting labels for unseen data. The class label predictions are sought for unidentified lipid profiles from high-throughput global screening in an Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC™/MS) experimental setup. Our investigation reveals that the k-NN and SVM classifiers outperform both the PLS-DA and Naive Bayes classifiers. The Naive Bayes classifier performs worst of all the models, which seems logical: lipids are highly co-regulated and therefore violate the Naive Bayes assumption that features are conditionally independent given the class. Labels predicted in common by k-NN and SVM can serve as a good starting point for exploring the full data, thereby facilitating exploratory studies in which label information is critical for data interpretation.
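The label-prediction step described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes each peak is represented by its intensity profile across samples, uses Euclidean distance with a majority vote for k-NN, and keeps only the labels on which k-NN agrees with a second classifier. The training profiles, the class labels (`PC`, `TG`), the peak identifiers and the `other_preds` dictionary (a stand-in for SVM output) are all invented for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k=3):
    """Predict a class for `query` by majority vote of its k nearest
    identified profiles (assumption: Euclidean distance, no preprocessing)."""
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def consensus(pred_a, pred_b):
    """Keep only the peaks to which both classifiers assign the same label."""
    return {peak: lab for peak, lab in pred_a.items() if pred_b.get(peak) == lab}

# Hypothetical toy data: identified lipids (rows) profiled across four samples.
train = [
    [1.0, 1.1, 0.9, 1.0], [1.1, 1.2, 1.0, 0.9], [0.9, 1.0, 1.0, 1.1],  # class "PC"
    [3.0, 2.8, 3.1, 2.9], [3.2, 3.0, 2.9, 3.1], [2.9, 3.1, 3.0, 3.0],  # class "TG"
]
labels = ["PC", "PC", "PC", "TG", "TG", "TG"]

# Hypothetical unidentified peaks to be labelled.
unknown = {
    "peak_17": [1.0, 1.0, 1.0, 1.0],
    "peak_42": [3.0, 3.0, 3.0, 3.0],
    "peak_58": [2.6, 2.5, 2.7, 2.6],
}

knn_preds = {pid: knn_predict(train, labels, prof) for pid, prof in unknown.items()}
# Invented stand-in for a second classifier's output (e.g. an SVM).
other_preds = {"peak_17": "PC", "peak_42": "TG", "peak_58": "PC"}

agreed = consensus(knn_preds, other_preds)
print(knn_preds)  # {'peak_17': 'PC', 'peak_42': 'TG', 'peak_58': 'TG'}
print(agreed)     # {'peak_17': 'PC', 'peak_42': 'TG'}
```

Here `peak_58` drops out of the consensus because the two classifiers disagree, so it would be a candidate for closer inspection. A fuller treatment would typically also scale the profiles and choose `k` by cross-validation, which this sketch omits.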
Original language: English
Pages (from-to): 18-26
Journal: Metabolomics
Volume: 6
Issue number: 1
DOIs: https://doi.org/10.1007/s11306-009-0179-x
Publication status: Published - 2010
MoE publication type: A1 Journal article-refereed

Keywords

  • Lipidomics
  • mass spectrometry
  • machine learning
  • k-NN
  • SVM
  • PLS-DA
  • Naive Bayes
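The abstract's explanation for the poor Naive Bayes result, that co-regulated lipids violate the assumption of conditional independence given the class, can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not taken from the paper: it uses a two-class Gaussian model with equal priors and unit variance, and represents extreme co-regulation as exact copies of a single feature. Because Naive Bayes adds per-feature log-likelihood ratios, three copies of the same measurement triple the log-odds even though they carry no more information than one.

```python
import math

def gaussian_logpdf(x, mean, sd=1.0):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def naive_bayes_log_odds(features, means1, means0, sd=1.0):
    """Log-odds of class 1 versus class 0 with equal priors. Naive Bayes treats
    features as conditionally independent, so per-feature log-likelihood
    ratios are simply added."""
    return sum(gaussian_logpdf(x, m1, sd) - gaussian_logpdf(x, m0, sd)
               for x, m1, m0 in zip(features, means1, means0))

def posterior(log_odds):
    """Convert log-odds to the probability of class 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

x = 1.5  # one hypothetical measurement; class means are 1.0 (class 1) and 0.0 (class 0)
single = naive_bayes_log_odds([x], [1.0], [0.0])
# Three perfectly co-regulated features: exact copies of the same measurement.
copies = naive_bayes_log_odds([x, x, x], [1.0] * 3, [0.0] * 3)

print(round(posterior(single), 3))  # 0.731
print(round(posterior(copies), 3))  # 0.953
```

Since the copies add no information, the correct posterior in the three-copy case equals the single-feature value of 0.731; Naive Bayes reports 0.953 and is therefore overconfident.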

Cite this

Yetukuri, Laxman ; Tikka, Jarkko ; Hollmén, Jaakko ; Orešič, Matej. / Functional prediction of unidentified lipids using supervised classifiers. In: Metabolomics. 2010 ; Vol. 6, No. 1. pp. 18-26.
@article{5b99c615531f440a81d44b34315d5a03,
title = "Functional prediction of unidentified lipids using supervised classifiers",
keywords = "Lipidomics, mass spectrometry, machine learning, k-NN, SVM, PLS-DA, Naive Bayes",
author = "Laxman Yetukuri and Jarkko Tikka and Jaakko Hollm{\'e}n and Matej Orešič",
year = "2010",
doi = "10.1007/s11306-009-0179-x",
language = "English",
volume = "6",
pages = "18--26",
journal = "Metabolomics",
issn = "1573-3882",
publisher = "Springer",
number = "1",
}

Yetukuri, L, Tikka, J, Hollmén, J & Orešič, M 2010, 'Functional prediction of unidentified lipids using supervised classifiers', Metabolomics, vol. 6, no. 1, pp. 18-26. https://doi.org/10.1007/s11306-009-0179-x

Functional prediction of unidentified lipids using supervised classifiers. / Yetukuri, Laxman (Corresponding Author); Tikka, Jarkko; Hollmén, Jaakko; Orešič, Matej.

In: Metabolomics, Vol. 6, No. 1, 2010, p. 18-26.

TY  - JOUR
T1  - Functional prediction of unidentified lipids using supervised classifiers
AU  - Yetukuri, Laxman
AU  - Tikka, Jarkko
AU  - Hollmén, Jaakko
AU  - Orešič, Matej
PY  - 2010
Y1  - 2010
KW  - Lipidomics
KW  - mass spectrometry
KW  - machine learning
KW  - k-NN
KW  - SVM
KW  - PLS-DA
KW  - Naive Bayes
U2  - 10.1007/s11306-009-0179-x
DO  - 10.1007/s11306-009-0179-x
M3  - Article
VL  - 6
SP  - 18
EP  - 26
JO  - Metabolomics
JF  - Metabolomics
SN  - 1573-3882
IS  - 1
ER  -