Comparison of protein coding gene content of yeast and other fungal genomes

Mikko Arvas, Teemu Kivioja, A. Mitchell, Markku Saloheimo, S. Oliver, Merja Penttilä

Research output: Chapter in Book/Report/Conference proceeding › Conference abstract in proceedings › Scientific

Abstract

Despite extensive research, the exact function of many yeast genes remains unknown. Comparisons to other fungal genomes can add power to genomic analysis by providing the evolutionary context of genes. Our goal is to compare the protein-coding gene contents of fungal genomes and relate the differences to the physiological differences between species and taxonomic groups. We have produced consistent InterPro annotations and clustering of the protein-coding sequences of 16 sequenced fungal genomes, of which 8 are yeasts. Our computational system is based on BioPerl scripts and the BioSQL schema for storing the sequences and annotations in a relational database. The protein sequences are clustered with the Tribe-MCL graph clustering software, using distances based on BLAST E-values. We have discovered that the number of genes belonging to protein clusters having members from all fungal species studied is negatively correlated with genome size, i.e. larger fungal genomes are likely to have more specialized functions not present in species with smaller genomes. In contrast, based on protein clustering, Pezizomycotina and Saccharomycotina seem to differ in their level of paralogy, i.e. in the number of duplicated genes. In Saccharomycotina, the average level of paralogy is positively correlated with the size of the genome. In Pezizomycotina, possibly due to repeat-induced point mutation (RIP), no clear correlation exists. Using the Generic Genome Browser, we have created a web-based system that allows scientists at VTT to easily use comparative information in their work. In particular, we link the InterPro entries and clusters so that, for a particular protein family, a user can browse the neighborhood of the family members detected by InterPro to assess the true extent of the family. In addition, a user can easily find the clusters and InterPro entries that have interesting species distributions. We also link the S. cerevisiae metabolic model iND750 to the comparative data.
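The two per-species quantities discussed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BioPerl/BioSQL pipeline: the function name `summarize_clusters`, the `(cluster_id, species, gene_id)` input layout, the toy data, and the use of mean genes per cluster as a stand-in for the "level of paralogy" are all assumptions made for illustration only.

```python
from collections import defaultdict


def summarize_clusters(assignments):
    """Summarize protein-cluster membership per species.

    assignments: iterable of (cluster_id, species, gene_id) tuples, e.g. the
    output of a clustering step (hypothetical layout, not from the abstract).

    Returns {species: (core_gene_count, mean_genes_per_cluster)} where
    - core_gene_count counts genes in clusters that have members from
      every species in the input, and
    - mean_genes_per_cluster is an assumed, simple proxy for paralogy.
    """
    # cluster_id -> species -> number of member genes
    members = defaultdict(lambda: defaultdict(int))
    all_species = set()
    for cluster_id, species, _gene_id in assignments:
        members[cluster_id][species] += 1
        all_species.add(species)

    core_genes = {s: 0 for s in all_species}
    cluster_sizes = {s: [] for s in all_species}
    for per_species in members.values():
        # A cluster is "core" if every species contributes at least one gene.
        is_core = set(per_species) == all_species
        for species, n_genes in per_species.items():
            cluster_sizes[species].append(n_genes)
            if is_core:
                core_genes[species] += n_genes

    return {
        s: (core_genes[s], sum(cluster_sizes[s]) / len(cluster_sizes[s]))
        for s in all_species
    }


# Toy input: two made-up species and three made-up clusters.
toy = [
    ("c1", "sp_a", "a1"), ("c1", "sp_a", "a2"), ("c1", "sp_b", "b1"),
    ("c2", "sp_a", "a3"),
    ("c3", "sp_b", "b2"), ("c3", "sp_b", "b3"), ("c3", "sp_b", "b4"),
]
result = summarize_clusters(toy)
```

With real cluster assignments, the per-species values could then be compared against genome sizes to look for the kinds of correlation the abstract reports; the abstract does not state which statistic was used, so that step is left out here.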
Original language: English
Title of host publication: International Specialised Symposium on Yeasts ISSY25
Subtitle of host publication: Systems Biology of Yeasts - from Models to Applications
Place of Publication: Espoo
Publisher: VTT Technical Research Centre of Finland
Pages: 57
ISBN (Electronic): 951-38-6308-5
ISBN (Print): 951-38-6307-7
Publication status: Published - 2006
Event: International Specialised Symposium on Yeasts, ISSY 25 - Espoo, Finland
Duration: 18 Jun 2006 → 21 Jun 2006

Publication series

Name: VTT Symposium
Publisher: VTT
Number: 242
ISSN (Print): 0357-9387
ISSN (Electronic): 1455-0873

Conference

Conference: International Specialised Symposium on Yeasts, ISSY 25
Abbreviated title: ISSY 25
Country: Finland
City: Espoo
Period: 18/06/06 → 21/06/06

Cite this

Arvas, M., Kivioja, T., Mitchell, A., Saloheimo, M., Oliver, S., & Penttilä, M. (2006). Comparison of protein coding gene content of yeast and other fungal genomes. In International Specialised Symposium on Yeasts ISSY25: Systems Biology of Yeasts - from Models to Applications (pp. 57). [P5] Espoo: VTT Technical Research Centre of Finland. VTT Symposium, No. 242