Abstract
Original language | English |
---|---|
Pages (from-to) | 2423-2431 |
Number of pages | 9 |
Journal | Bioinformatics |
Volume | 30 |
Issue number | 17 |
DOIs | 10.1093/bioinformatics/btu320 |
Publication status | Published - 1 Sep 2014 |
MoE publication type | A1 Journal article-refereed |
Cite this
SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition. / Chatterjee, Saikat (Corresponding Author); Koslicki, David; Dong, Siyuan; Innocenti, Nicolas; Cheng, Lu; Lan, Yueheng; Vehkaperä, Mikko; Skoglund, Mikael; Rasmussen, Lars K.; Aurell, Erik; Corander, Jukka.
In: Bioinformatics, Vol. 30, No. 17, 01.09.2014, p. 2423-2431.
Research output: Contribution to journal › Article › Scientific › peer-review
TY - JOUR
T1 - SEK
T2 - Sparsity exploiting k-mer-based estimation of bacterial community composition
AU - Chatterjee, Saikat
AU - Koslicki, David
AU - Dong, Siyuan
AU - Innocenti, Nicolas
AU - Cheng, Lu
AU - Lan, Yueheng
AU - Vehkaperä, Mikko
AU - Skoglund, Mikael
AU - Rasmussen, Lars K.
AU - Aurell, Erik
AU - Corander, Jukka
PY - 2014/9/1
Y1 - 2014/9/1
N2 - Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consuming in a desktop computing environment. Results: Using sparsity enforcing methods from the general sparse signal processing field (such as compressed sensing), we derive a solution to the community composition estimation problem by a simultaneous assignment of all sample reads to a pre-processed reference database. A general statistical model based on kernel density estimation techniques is introduced for the assignment task, and the model solution is obtained using convex optimization tools. Further, we design a greedy algorithm solution for a fast solution. Our approach offers a reasonably fast community composition estimation method, which is shown to be more robust to input data variation than a recently introduced related method.
UR - http://www.scopus.com/inward/record.url?scp=84907029456&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btu320
DO - 10.1093/bioinformatics/btu320
M3 - Article
C2 - 24812337
AN - SCOPUS:84907029456
VL - 30
SP - 2423
EP - 2431
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 17
ER -
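
The abstract describes estimating community composition by matching k-mer statistics of the sample against a pre-processed reference database under non-negativity and sparsity constraints. The sketch below only illustrates that general idea and is not the SEK algorithm from the paper: it builds normalized k-mer frequency vectors and fits non-negative mixing proportions with a plain multiplicative-update least-squares solver, which is a technique swapped in here for brevity. The function names `kmer_profile` and `estimate_composition`, the choice of `k`, and the iteration count are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Normalized k-mer frequency vector of seq over the ACGT alphabet.

    k-mers containing other symbols (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def estimate_composition(ref_profiles, sample_profile, iters=500, eps=1e-12):
    """Fit non-negative proportions x so that sum_i x[i] * ref_profiles[i]
    approximates sample_profile in the least-squares sense.

    Hypothetical helper: uses multiplicative updates, which keep x >= 0,
    then rescales x to sum to one.
    """
    n = len(ref_profiles)
    dim = len(sample_profile)
    x = [1.0 / n] * n  # uniform starting guess
    # Precompute A^T b and the Gram matrix A^T A (A has the profiles as columns).
    atb = [sum(r[j] * sample_profile[j] for j in range(dim)) for r in ref_profiles]
    ata = [[sum(ri[j] * rj[j] for j in range(dim)) for rj in ref_profiles]
           for ri in ref_profiles]
    for _ in range(iters):
        x = [x[i] * atb[i] / (sum(ata[i][j] * x[j] for j in range(n)) + eps)
             for i in range(n)]
    s = sum(x)
    return [v / s for v in x] if s else x


if __name__ == "__main__":
    # Toy reference "genomes" (made up for illustration, not real data).
    refs = ["ACACACACAC", "GTGTGTGTGT", "AAAAAAAAAA"]
    profiles = [kmer_profile(r, 2) for r in refs]
    # Synthetic sample profile: 70% of the first reference, 30% of the second.
    sample = [0.7 * a + 0.3 * b for a, b in zip(profiles[0], profiles[1])]
    print(estimate_composition(profiles, sample))
```

In this toy setting the third reference receives a proportion of zero, which is the kind of sparse solution the abstract refers to; real data would need longer k-mers, noise handling, and a solver closer to the convex or greedy methods the authors describe.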