Bioinformatics approaches for the analysis of lipidomics data

Dissertation

Laxmana Rao Yetukuri

Research output: Thesis › Dissertation › Collection of Articles

Abstract

The potential impact of lipid research on both disease treatment and prevention has been increasingly recognised. Recent advances in soft ionization mass spectrometry (MS), such as electrospray ionization (ESI), permit parallel monitoring of several hundred lipids in a single experiment and have thus facilitated lipidomics-level studies. These advances, however, challenge bioinformaticians to handle massive amounts of information-rich MS data from modern analytical instruments in order to understand the complex functions of lipids. The main aims of this thesis were to 1) develop bioinformatics approaches for lipid identification based on ultra performance liquid chromatography coupled to mass spectrometry (UPLC/MS) data, 2) predict functional annotations for unidentified lipids, 3) understand the omics data in the context of pathways and 4) apply existing chemometric methods for exploratory data analysis as well as biomarker discovery. A bioinformatics strategy for constructing a lipid database covering the major lipid classes is presented, based on the simplified molecular input line entry system (SMILES). The database was annotated with relevant information such as lipid names (including short names), SMILES information, scores, molecular weight, monoisotopic mass and isotope distribution. It was tailored for UPLC/MS experiments by incorporating retention time ranges, adduct information and main fragments to screen for potential lipids. This database information facilitated the building of experimental tandem mass spectrometry libraries for different biological tissues. Non-targeted metabolomics screening is often plagued by unknown peaks, which present an additional challenge for data interpretation.
Multiple supervised classification methods were employed and compared for predicting functional class labels of unidentified lipids, both to facilitate further exploratory analysis and to ease the identification process. As lipidomics goes beyond the complete characterization of lipids, new strategies were developed to understand lipids in the context of pathways, thereby providing insights for phenotype characterization. Chemometric methods such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS/DA) were utilised for exploratory analysis as well as biomarker discovery in the context of different disease phenotypes.
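
The database screening step described in the abstract (matching an observed peak against monoisotopic mass, adduct information and a retention time range) might look roughly like the sketch below. It is an illustration only, not the thesis implementation: the names `LipidEntry` and `screen`, the two example entries, their retention time windows and the 10 ppm tolerance are all assumptions made for this example. The element masses and adduct shifts are standard monoisotopic values.

```python
from dataclasses import dataclass

# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
}

# Mass shifts for common positive-mode adducts (Da), electron mass included.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
}


@dataclass
class LipidEntry:
    """Hypothetical database record; field names are illustrative."""
    short_name: str
    formula: dict      # element symbol -> atom count
    rt_range: tuple    # (min, max) retention time in minutes, made up here

    @property
    def monoisotopic_mass(self) -> float:
        return sum(MONOISOTOPIC[el] * n for el, n in self.formula.items())


def screen(db, mz, rt, ppm=10.0):
    """Return (short_name, adduct) pairs consistent with an observed peak."""
    hits = []
    for entry in db:
        lo, hi = entry.rt_range
        if not lo <= rt <= hi:
            continue  # peak elutes outside the window expected for this entry
        for adduct, shift in ADDUCT_SHIFT.items():
            theoretical = entry.monoisotopic_mass + shift
            if abs(mz - theoretical) / theoretical * 1e6 <= ppm:
                hits.append((entry.short_name, adduct))
    return hits


# Two illustrative entries; the retention time windows are placeholders.
db = [
    LipidEntry("PC(16:0/18:1)", {"C": 42, "H": 82, "N": 1, "O": 8, "P": 1}, (4.0, 7.0)),
    LipidEntry("TG(52:2)", {"C": 55, "H": 102, "O": 6}, (9.0, 12.0)),
]

print(screen(db, mz=760.5851, rt=5.0))
print(screen(db, mz=876.8015, rt=10.5))
```

Filtering on retention time before comparing masses keeps the candidate list short, which matters when isobaric lipids from different classes share nearly the same m/z.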
Original language: English
Qualification: Doctor Degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Taskinen, M., Supervisor, External person
  • Jauhiainen, Matti, Supervisor, External person
  • Orešič, Matej, Supervisor, External person
Award date: 11 Jun 2010
Place of Publication: Espoo
Publisher: VTT Technical Research Centre of Finland
Print ISBNs: 978-951-38-7402-5
Electronic ISBNs: 978-951-38-7403-2
Publication status: Published - 2010
MoE publication type: G5 Doctoral dissertation (article)

Fingerprint

Bioinformatics
Lipids
Mass spectrometry
Liquid chromatography
Biomarkers
Electrospray ionization
Discriminant analysis
Isotopes
Principal component analysis
Ionization
Labels
Screening
Information systems
Experiments
Molecular weight
Tissue

Keywords

  • Lipids
  • Lipidomics
  • Bioinformatics
  • Lipid pathways
  • High density lipoproteins
  • k-nearest neighbours
  • Liquid chromatography/mass spectrometry
  • Principal component analysis
  • Partial least squares and discriminant analysis
  • Obesity
  • Support vector machines
  • LipidDB

Cite this

Yetukuri, L. R. (2010). Bioinformatics approaches for the analysis of lipidomics data: Dissertation. Espoo: VTT Technical Research Centre of Finland.
Yetukuri, Laxmana Rao. / Bioinformatics approaches for the analysis of lipidomics data : Dissertation. Espoo : VTT Technical Research Centre of Finland, 2010. 168 p.
@phdthesis{43db710974a6487e9c7b200741abcb54,
title = "Bioinformatics approaches for the analysis of lipidomics data: Dissertation",
keywords = "Lipids, Lipidomics, Bioinformatics, Lipid pathways, High density lipoproteins, k-nearest neighbours, Liquid chromatography/mass spectrometry, Principal component analysis, Partial least squares and discriminant analysis, Obesity, Support vector machines, LipidDB",
author = "Yetukuri, {Laxmana Rao}",
year = "2010",
language = "English",
isbn = "978-951-38-7402-5",
series = "VTT Publications",
publisher = "VTT Technical Research Centre of Finland",
number = "741",
address = "Finland",
school = "Aalto University",

}

Yetukuri, LR 2010, 'Bioinformatics approaches for the analysis of lipidomics data: Dissertation', Doctor Degree, Aalto University, Espoo.

TY - THES

T1 - Bioinformatics approaches for the analysis of lipidomics data

T2 - Dissertation

AU - Yetukuri, Laxmana Rao

PY - 2010

Y1 - 2010

KW - Lipids

KW - Lipidomics

KW - Bioinformatics

KW - Lipid pathways

KW - High density lipoproteins

KW - k-nearest neighbours

KW - Liquid chromatography/mass spectrometry

KW - Principal component analysis

KW - Partial least squares and discriminant analysis

KW - Obesity

KW - Support vector machines

KW - LipidDB

M3 - Dissertation

SN - 978-951-38-7402-5

T3 - VTT Publications

PB - VTT Technical Research Centre of Finland

CY - Espoo

ER -

Yetukuri LR. Bioinformatics approaches for the analysis of lipidomics data: Dissertation. Espoo: VTT Technical Research Centre of Finland, 2010. 168 p.