Data integration, pathway analysis and mining for systems biology

Dissertation

Venkata Gopalacharyulu Peddinti

Research output: Thesis › Dissertation › Collection of Articles

Abstract

Post-genomic molecular biology relies on high-throughput experimental techniques and is therefore a data-rich field. The goal of this thesis is to develop bioinformatics methods that utilise publicly available data to produce knowledge and to aid the mining of newly generated data. As an example of knowledge or hypothesis generation, consider function prediction of biological molecules. Assigning protein function is a non-trivial task because the same protein may be involved in different biological processes, depending on the state of the biological system and on protein localisation. The function of a gene or gene product may be provided as a textual description in a gene or protein annotation database. Such textual descriptions do not convey the contextual meaning of the gene function. Therefore, we need ways to represent that meaning formally. Here we apply a data integration approach to provide a rich representation that enables context-sensitive mining of biological data in terms of integrated networks and conceptual spaces. Context-sensitive gene function annotation follows naturally from this framework as a particular application. Next, knowledge that is already publicly available can be used to aid the mining of new experimental data. We developed an integrative bioinformatics method that utilises publicly available knowledge of protein-protein interactions, metabolic networks and transcriptional regulatory networks to analyse transcriptomics data and predict altered biological processes. We applied this method to a study of the dynamic response of Saccharomyces cerevisiae to oxidative stress. The application of our method revealed dynamically altered biological functions in response to oxidative stress, which were validated by comprehensive in vivo metabolomics experiments. The results provided in this thesis indicate that integration of heterogeneous biological data facilitates advanced mining of the data. The methods can be applied to gain insight into the functions of genes, gene products and other molecules, as well as to offer functional interpretation of transcriptomics and metabolomics experiments.
Original language: English
Qualification: Doctor Degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Kaski, Kimmo, Supervisor, External person
Award date: 14 May 2010
Place of Publication: Espoo
Publisher: VTT Technical Research Centre of Finland
Print ISBNs: 978-951-38-7385-1
Electronic ISBNs: 978-951-38-7386-8
Publication status: Published - 2010
MoE publication type: G5 Doctoral dissertation (article)

Fingerprint

Systems Biology
Genes
Molecular Sequence Annotation
Biological Phenomena
Proteins
Data Mining
Oxidative Stress
Protein Databases
Metabolomics
Gene Regulatory Networks
Computational Biology
Saccharomyces cerevisiae
Molecular Biology

Keywords

  • systems biology
  • high-throughput data
  • data integration
  • data mining
  • visualisation
  • bioinformatics
  • conceptual spaces
  • network topology

Cite this

Peddinti, V. G. (2010). Data integration, pathway analysis and mining for systems biology: Dissertation. Espoo: VTT Technical Research Centre of Finland.
Peddinti, Venkata Gopalacharyulu. / Data integration, pathway analysis and mining for systems biology : Dissertation. Espoo : VTT Technical Research Centre of Finland, 2010. 119 p.
@phdthesis{aa1cdf46ddd54c9c819490e0e564d3c0,
title = "Data integration, pathway analysis and mining for systems biology: Dissertation",
keywords = "systems biology, high-throughput data, data integration, data mining, visualisation, bioinformatics, conceptual spaces, network topology",
author = "Peddinti, {Venkata Gopalacharyulu}",
note = "Project code: 70407",
year = "2010",
language = "English",
isbn = "978-951-38-7385-1",
series = "VTT Publications",
publisher = "VTT Technical Research Centre of Finland",
number = "732",
address = "Finland",
school = "Aalto University",

}

Peddinti, VG 2010, 'Data integration, pathway analysis and mining for systems biology: Dissertation', Doctor Degree, Aalto University, Espoo.

TY - THES

T1 - Data integration, pathway analysis and mining for systems biology

T2 - Dissertation

AU - Peddinti, Venkata Gopalacharyulu

N1 - Project code: 70407

PY - 2010

Y1 - 2010

KW - systems biology

KW - high-throughput data

KW - data integration

KW - data mining

KW - visualisation

KW - bioinformatics

KW - conceptual spaces

KW - network topology

M3 - Dissertation

SN - 978-951-38-7385-1

T3 - VTT Publications

PB - VTT Technical Research Centre of Finland

CY - Espoo

ER -

Peddinti VG. Data integration, pathway analysis and mining for systems biology: Dissertation. Espoo: VTT Technical Research Centre of Finland, 2010. 119 p.