SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition

Saikat Chatterjee (Corresponding Author), David Koslicki, Siyuan Dong, Nicolas Innocenti, Lu Cheng, Yueheng Lan, Mikko Vehkaperä, Mikael Skoglund, Lars K. Rasmussen, Erik Aurell, Jukka Corander

Research output: Contribution to journal › Article › Scientific › peer-review

6 Citations (Scopus)

Abstract

Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consuming in a desktop computing environment.

Results: Using sparsity-enforcing methods from the general sparse signal processing field (such as compressed sensing), we derive a solution to the community composition estimation problem by a simultaneous assignment of all sample reads to a pre-processed reference database. A general statistical model based on kernel density estimation techniques is introduced for the assignment task, and the model solution is obtained using convex optimization tools. Further, we design a greedy algorithm that gives a fast solution. Our approach offers a reasonably fast community composition estimation method, which is shown to be more robust to input data variation than a recently introduced related method.
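The general idea in the abstract, expressing a sample's k-mer statistics as a non-negative mixture of reference k-mer statistics, can be illustrated with a toy sketch. This is not the SEK algorithm: it omits the kernel density model and the paper's greedy solver, and substitutes plain non-negative least squares solved by projected gradient descent. The function names `kmer_profile` and `estimate_composition`, the choice k = 2, and the three reference strings are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector of a DNA string (ACGT alphabet only)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total for m in kmers]


def estimate_composition(profiles, sample, steps=500):
    """Projected gradient descent for min 0.5 * ||A x - b||^2 subject to x >= 0.

    `profiles` holds the columns of A (one k-mer profile per reference) and
    `sample` is b. This is a simplified stand-in, not the paper's solver.
    """
    n, d = len(profiles), len(sample)
    # A step of 1 / trace(A^T A) never exceeds 1 / lambda_max, so the
    # iteration cannot diverge.
    step = 1.0 / sum(a * a for col in profiles for a in col)
    x = [1.0 / n] * n
    for _ in range(steps):
        # Residual A x - b, then gradient A^T (A x - b).
        residual = [sum(profiles[j][i] * x[j] for j in range(n)) - sample[i]
                    for i in range(d)]
        grad = [sum(a * r for a, r in zip(profiles[j], residual))
                for j in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n)]
    total = sum(x)
    return [v / total for v in x]  # rescale so the proportions sum to one


# Hypothetical reference "genomes" with clearly different 2-mer content.
references = ["ACGTACGTACGTACGT", "AAAACCCCAAAACCCC", "GGGGTTTTGGGGTTTT"]
profiles = [kmer_profile(r) for r in references]
# Synthetic, noise-free sample: 70% of reference 0, 30% of reference 1.
sample = [0.7 * p0 + 0.3 * p1 for p0, p1 in zip(profiles[0], profiles[1])]
estimate = estimate_composition(profiles, sample)
print([round(v, 3) for v in estimate])
```

With a noise-free sample and well-separated reference profiles, the estimate should recover the mixing proportions and assign close to zero weight to the absent reference. Real read data would need a larger k, a noise model, and a far larger reference database, which is where the sparsity-exploiting formulation in the paper becomes relevant.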
Original language: English
Pages (from-to): 2423-2431
Number of pages: 9
Journal: Bioinformatics
Volume: 30
Issue number: 17
DOIs: https://doi.org/10.1093/bioinformatics/btu320
Publication status: Published - 1 Sep 2014
MoE publication type: A1 Journal article-refereed

Fingerprint

Sparsity
Chemical analysis
Task Assignment
Metagenomics
Statistical Data Interpretation
Kernel Density Estimation
Spatial Analysis
Compressed Sensing
Statistical Models
Greedy Algorithm
Convex Optimization
High Throughput
Statistical Analysis
Signal Processing
Noise
Assignment

Cite this

Chatterjee, S., Koslicki, D., Dong, S., Innocenti, N., Cheng, L., Lan, Y., ... Corander, J. (2014). SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition. Bioinformatics, 30(17), 2423-2431. https://doi.org/10.1093/bioinformatics/btu320
Chatterjee, Saikat ; Koslicki, David ; Dong, Siyuan ; Innocenti, Nicolas ; Cheng, Lu ; Lan, Yueheng ; Vehkaperä, Mikko ; Skoglund, Mikael ; Rasmussen, Lars K. ; Aurell, Erik ; Corander, Jukka. / SEK : Sparsity exploiting k-mer-based estimation of bacterial community composition. In: Bioinformatics. 2014 ; Vol. 30, No. 17. pp. 2423-2431.
@article{197b6ddd801f44b382c2c5b0d2473527,
title = "SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition",
author = "Saikat Chatterjee and David Koslicki and Siyuan Dong and Nicolas Innocenti and Lu Cheng and Yueheng Lan and Mikko Vehkaper{\"a} and Mikael Skoglund and Rasmussen, {Lars K.} and Erik Aurell and Jukka Corander",
year = "2014",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/btu320",
language = "English",
volume = "30",
pages = "2423--2431",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",

}

Chatterjee, S, Koslicki, D, Dong, S, Innocenti, N, Cheng, L, Lan, Y, Vehkaperä, M, Skoglund, M, Rasmussen, LK, Aurell, E & Corander, J 2014, 'SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition', Bioinformatics, vol. 30, no. 17, pp. 2423-2431. https://doi.org/10.1093/bioinformatics/btu320

SEK : Sparsity exploiting k-mer-based estimation of bacterial community composition. / Chatterjee, Saikat (Corresponding Author); Koslicki, David; Dong, Siyuan; Innocenti, Nicolas; Cheng, Lu; Lan, Yueheng; Vehkaperä, Mikko; Skoglund, Mikael; Rasmussen, Lars K.; Aurell, Erik; Corander, Jukka.

In: Bioinformatics, Vol. 30, No. 17, 01.09.2014, p. 2423-2431.


TY - JOUR

T1 - SEK

T2 - Sparsity exploiting k-mer-based estimation of bacterial community composition

AU - Chatterjee, Saikat

AU - Koslicki, David

AU - Dong, Siyuan

AU - Innocenti, Nicolas

AU - Cheng, Lu

AU - Lan, Yueheng

AU - Vehkaperä, Mikko

AU - Skoglund, Mikael

AU - Rasmussen, Lars K.

AU - Aurell, Erik

AU - Corander, Jukka

PY - 2014/9/1

Y1 - 2014/9/1

UR - http://www.scopus.com/inward/record.url?scp=84907029456&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu320

DO - 10.1093/bioinformatics/btu320

M3 - Article

C2 - 24812337

AN - SCOPUS:84907029456

VL - 30

SP - 2423

EP - 2431

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

ER -

Chatterjee S, Koslicki D, Dong S, Innocenti N, Cheng L, Lan Y et al. SEK: Sparsity exploiting k-mer-based estimation of bacterial community composition. Bioinformatics. 2014 Sep 1;30(17):2423-2431. https://doi.org/10.1093/bioinformatics/btu320