Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations

Reija Autio (Corresponding Author), Sami Kilpinen, Matti Saarela, Olli Kallioniemi, Sampsa Hautaniemi, Jaakko Astola

Research output: Contribution to journal › Article › Scientific › peer-review

36 Citations (Scopus)

Abstract

Background

Gene expression microarray technologies are widely used across most areas of biological and medical research. Comparing and integrating microarray data from different experiments would be very useful, but is currently very challenging because experimental and hybridization conditions, as well as data preprocessing and normalization methods, differ between experiments. Furthermore, even in the case of the widely used, industry-standard Affymetrix oligonucleotide microarrays, the various array generations have different probe sets representing different genes, which hinders data integration.

Results

In this study, our objective is to find systematic approaches to normalize the data emerging from different Affymetrix array generations and from different laboratories. We compare and assess the accuracy of five normalization methods for Affymetrix gene expression data using 6,926 Affymetrix experiments from five array generations. The methods that we compare are 1) standardization, 2) housekeeping-gene-based normalization, 3) equalized quantile normalization, 4) Weibull-distribution-based normalization and 5) array-generation-based gene centering. Our results indicate that the best results are achieved when the data are normalized first within a sample and then between samples with Array Generation based gene Centering (AGC) normalization.
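The two-stage idea described above (within-sample normalization followed by gene centering per array generation) can be illustrated with a minimal sketch. The abstract does not specify how either step is computed, so this sketch assumes a z-score within each sample and, for AGC, subtraction of each gene's mean over the samples of the same array generation. The function names, the toy data and the generation labels "A" and "B" are hypothetical and are not taken from the article.

```python
import numpy as np


def standardize_within_sample(x):
    """Z-score each sample (column) across its genes (rows).

    Assumption: this stands in for the within-sample step; the abstract
    does not state which within-sample method is used.
    """
    return (x - x.mean(axis=0)) / x.std(axis=0)


def center_genes_by_generation(x, generations):
    """Subtract, for each gene, its mean over the samples of one generation.

    Assumption: a plain mean-centering reading of "array generation based
    gene centering"; the published method may differ in detail.
    """
    generations = np.asarray(generations)
    out = np.empty_like(x, dtype=float)
    for g in np.unique(generations):
        cols = generations == g  # boolean mask of samples in generation g
        out[:, cols] = x[:, cols] - x[:, cols].mean(axis=1, keepdims=True)
    return out


# Toy data: 4 genes x 6 samples, three samples per hypothetical generation.
rng = np.random.default_rng(0)
expr = rng.normal(loc=8.0, scale=1.0, size=(4, 6))
generations = ["A", "A", "A", "B", "B", "B"]

# Simulate a gene-specific bias that affects only generation "B".
expr[:, 3:] += np.array([[2.0], [-1.0], [0.5], [3.0]])

normalized = center_genes_by_generation(
    standardize_within_sample(expr), generations
)
```

After the second step, every gene has zero mean within each generation, so a generation-specific shift of a gene no longer dominates comparisons across generations. The sketch assumes that genes have already been matched across generations, which the differing probe sets make a separate problem.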

Conclusion

We conclude that integrating different Affymetrix datasets with the AGC method results in values that are significantly more comparable across the array generations than when no array generation based normalization is used. The AGC method was found to be the best method for normalizing the data from several different array generations and for achieving comparable gene values across thousands of samples.

Original language: English
Article number: S24
Number of pages: 12
Journal: BMC Bioinformatics
Volume: 10
Issue number: Suppl. 1
DOIs: https://doi.org/10.1186/1471-2105-10-S1-S24
Publication status: Published - 2009
MoE publication type: A1 Journal article-refereed
Event: The 7th Asia Pacific Bioinformatics Conference, APBC 2009 - Beijing, China
Duration: 13 Jan 2009 → 16 Jan 2009

Keywords

  • gene expression
  • cDNA microarrays
  • microarray
  • bioinformatics
  • Affymetrix

Cite this

Autio, R., Kilpinen, S., Saarela, M., Kallioniemi, O., Hautaniemi, S., & Astola, J. (2009). Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations. BMC Bioinformatics, 10(Suppl. 1), [S24]. https://doi.org/10.1186/1471-2105-10-S1-S24
@article{708080d878164c698309232f451ed471,
title = "Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations",
keywords = "gene expression, cDNA microarrays, microarray, bioinformatics, affymetrix",
author = "Reija Autio and Sami Kilpinen and Matti Saarela and Olli Kallioniemi and Sampsa Hautaniemi and Jaakko Astola",
year = "2009",
doi = "10.1186/1471-2105-10-S1-S24",
language = "English",
volume = "10",
journal = "BMC Bioinformatics",
issn = "1471-2105",
number = "Suppl. 1",

}
