GTI

A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets

John Mpindi (Corresponding Author), Henri Sara, Saija Haapa-Paananen, Sami Kilpinen, T. Pisto, Elmar Bucher, K. Ojala, Kristiina Iljin, Paula Vainio, M. Björkman, Santosh Gupta, Pekka Kohonen, Matthias Nees, Olli Kallioniemi

Research output: Contribution to journal › Article › Scientific › peer-review

22 Citations (Scopus)

Abstract

Background: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes.

Methodology: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods (COPA, the OS statistic, the t-test and ORT) using simulated data. The GTI performed as well as existing methods in a single-study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target.

Conclusions/Significance: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is implemented in an R package (Text S1).
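
The abstract does not give the GTI formula, so it is not reproduced here. As a rough illustration of what an outlier statistic of this kind computes, the following is a minimal sketch of a COPA-style score for a single gene, based on the commonly described form of COPA (median-centre and MAD-scale the gene across all samples, then take a high percentile of the scaled tumour values). The function names, the default percentile and the toy expression values are assumptions made for this example and are not taken from the paper or its R package.

```python
from statistics import median


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def copa_score(normal, tumour, q=90):
    """COPA-style outlier score for one gene (illustrative, hypothetical helper).

    normal, tumour: expression values of the gene in each sample group.
    q: percentile of the scaled tumour values reported as the score
       (assumed default; published descriptions use several choices).
    """
    pooled = list(normal) + list(tumour)
    centre = median(pooled)
    # Median absolute deviation over all samples, used as a robust scale.
    mad = median(abs(x - centre) for x in pooled)
    if mad == 0:
        raise ValueError("MAD is zero; the gene shows no variation to scale by")
    scaled_tumour = [(x - centre) / mad for x in tumour]
    return percentile(scaled_tumour, q)


# Toy data (invented): two of six tumour samples over-express the gene.
normal = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8]
tumour_with_outliers = [5.0, 5.1, 4.9, 5.0, 9.0, 9.5]
tumour_flat = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8]

print(copa_score(normal, tumour_with_outliers))
print(copa_score(normal, tumour_flat))
```

A gene that is over-expressed in only a subset of tumour samples receives a much higher score than one whose tumour values match the normal range, which is the pattern the abstract calls an outlier gene. A group-mean comparison such as the t-test can miss this pattern when most tumour samples look normal.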
Original language: English
Article number: e17259
Number of pages: 12
Journal: PLoS ONE
Volume: 6
Issue number: 2
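DOIs: https://doi.org/10.1371/journal.pone.0017259, https://doi.org/10.1371/annotation/7d571883-faf0-4f66-86a2-806c36c4741c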
Publication status: Published - 2011
MoE publication type: A1 Journal article-refereed

Fingerprint

Microarrays
Transcriptome
Gene expression
Genes
Tissue
gene expression
genes
Glioblastoma
statistical analysis
Statistical methods
oncogenes
Datasets
tissues
Oncogenes
Gene Expression
Tumors
Staining and Labeling
therapeutics
gene overexpression
neoplasms

Cite this

Mpindi, John; Sara, Henri; Haapa-Paananen, Saija; Kilpinen, Sami; Pisto, T.; Bucher, Elmar; Ojala, K.; Iljin, Kristiina; Vainio, Paula; Björkman, M.; Gupta, Santosh; Kohonen, Pekka; Nees, Matthias; Kallioniemi, Olli. / GTI: A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets. In: PLoS ONE. 2011; Vol. 6, No. 2.
@article{8c40d70a482040b28add63893e39bd0e,
title = "GTI: A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets",
abstract = "Background: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes. Methodology: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65{\%} (19 of 29) of these genes, and 17 of these 19 genes (90{\%}) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target. Conclusions/Significance: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. 
The algorithm is implemented in an R package (Text S1).",
author = "John Mpindi and Henri Sara and Saija Haapa-Paananen and Sami Kilpinen and T. Pisto and Elmar Bucher and K. Ojala and Kristiina Iljin and Paula Vainio and M. Bj{\"o}rkman and Santosh Gupta and Pekka Kohonen and Matthias Nees and Olli Kallioniemi",
year = "2011",
doi = "10.1371/journal.pone.0017259",
language = "English",
volume = "6",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "2",

}

Mpindi, J, Sara, H, Haapa-Paananen, S, Kilpinen, S, Pisto, T, Bucher, E, Ojala, K, Iljin, K, Vainio, P, Björkman, M, Gupta, S, Kohonen, P, Nees, M & Kallioniemi, O 2011, 'GTI: A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets', PLoS ONE, vol. 6, no. 2, e17259. https://doi.org/10.1371/journal.pone.0017259, https://doi.org/10.1371/annotation/7d571883-faf0-4f66-86a2-806c36c4741c

GTI : A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets. / Mpindi, John (Corresponding Author); Sara, Henri; Haapa-Paananen, Saija; Kilpinen, Sami; Pisto, T.; Bucher, Elmar; Ojala, K.; Iljin, Kristiina; Vainio, Paula; Björkman, M.; Gupta, Santosh; Kohonen, Pekka; Nees, Matthias; Kallioniemi, Olli.

In: PLoS ONE, Vol. 6, No. 2, e17259, 2011.


TY - JOUR

T1 - GTI

T2 - A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets

AU - Mpindi, John

AU - Sara, Henri

AU - Haapa-Paananen, Saija

AU - Kilpinen, Sami

AU - Pisto, T.

AU - Bucher, Elmar

AU - Ojala, K.

AU - Iljin, Kristiina

AU - Vainio, Paula

AU - Björkman, M.

AU - Gupta, Santosh

AU - Kohonen, Pekka

AU - Nees, Matthias

AU - Kallioniemi, Olli

PY - 2011

Y1 - 2011

AB - Background: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes. Methodology: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target. Conclusions/Significance: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. 
The algorithm is implemented in an R package (Text S1).

U2 - 10.1371/journal.pone.0017259

DO - 10.1371/journal.pone.0017259

M3 - Article

VL - 6

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 2

M1 - e17259

ER -