A unified multitask architecture for predicting local protein properties

Yanjun Qi, Merja Oja, Jason Weston, William Stafford Noble (Corresponding Author)

Research output: Contribution to journal › Article › Scientific › peer-review

30 Citations (Scopus)

Abstract

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.
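As a rough illustration of the joint, multitask idea described above — all layer sizes, task names, and encodings below are invented for the sketch and are not the paper's actual architecture — a single shared hidden layer can feed several task-specific softmax outputs, including an auxiliary head trained to distinguish natural residue windows from shuffled ("synthetic") ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: each residue is labeled from a window of
# surrounding residues, one-hot encoded over the 20 amino acids,
# feeding one hidden layer whose parameters are shared by all tasks.
WINDOW, ALPHABET, HIDDEN = 11, 20, 32

# Shared parameters, reused by every labeling task.
W_shared = rng.normal(0.0, 0.1, (WINDOW * ALPHABET, HIDDEN))
b_shared = np.zeros(HIDDEN)

# One small task-specific output head per task, including the auxiliary
# semi-supervised task of telling natural windows from synthetic ones.
head_sizes = {
    "secondary_structure": 3,    # e.g. helix / strand / coil
    "solvent_accessibility": 2,  # e.g. buried / exposed
    "is_natural": 2,             # natural vs. synthetic window
}
heads = {task: (rng.normal(0.0, 0.1, (HIDDEN, k)), np.zeros(k))
         for task, k in head_sizes.items()}

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(window_onehot):
    """One shared hidden layer, then one softmax output per task."""
    h = np.tanh(window_onehot @ W_shared + b_shared)
    return {task: softmax(h @ W + b) for task, (W, b) in heads.items()}

# A toy residue window, plus a "synthetic" copy made by shuffling it.
idx = rng.integers(0, ALPHABET, WINDOW)
natural = np.zeros((WINDOW, ALPHABET))
natural[np.arange(WINDOW), idx] = 1.0
synthetic = natural[rng.permutation(WINDOW)]

out = predict(natural.ravel())
```

In joint training, each task's cross-entropy loss backpropagates through its own head and through the shared layer, which is how the tasks regularize one another; the untrained forward pass above only shows the shared structure.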
Original language: English
Article number: e32235
Number of pages: 11
Journal: PLoS ONE
Volume: 7
Issue number: 3
DOI: 10.1371/journal.pone.0032235
Publication status: Published - 2012
MoE publication type: A1 Journal article-refereed


Cite this

Qi, Yanjun; Oja, Merja; Weston, Jason; Noble, William Stafford. / A unified multitask architecture for predicting local protein properties. In: PLoS ONE. 2012; Vol. 7, No. 3.
@article{3935f6a71fed4f27bc54643b732e95d3,
title = "A unified multitask architecture for predicting local protein properties",
abstract = "A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.",
author = "Yanjun Qi and Merja Oja and Jason Weston and Noble, {William Stafford}",
year = "2012",
doi = "10.1371/journal.pone.0032235",
language = "English",
volume = "7",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "3",

}

A unified multitask architecture for predicting local protein properties. / Qi, Yanjun; Oja, Merja; Weston, Jason; Noble, William Stafford (Corresponding Author).

In: PLoS ONE, Vol. 7, No. 3, e32235, 2012.

TY - JOUR

T1 - A unified multitask architecture for predicting local protein properties

AU - Qi, Yanjun

AU - Oja, Merja

AU - Weston, Jason

AU - Noble, William Stafford

PY - 2012

Y1 - 2012

N2 - A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.

AB - A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.

U2 - 10.1371/journal.pone.0032235

DO - 10.1371/journal.pone.0032235

M3 - Article

VL - 7

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 3

M1 - e32235

ER -