On Enabling Techniques for Personal Audio Content Management

Tommi Lahti, Marko Helén, Olli Vuorinen, Eero Väyrynen, Juha Partala, Johannes Peltola, Satu-Marja Mäkelä

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

1 Citation (Scopus)

Abstract

This paper discusses state-of-the-art automatic analysis tools for personal audio content management. Our main goal is to create a system with several co-operating management tools for an audio database that improve each other's results. A Bayesian network based audio classification algorithm classifies audio into four main classes (silence, speech, music, and noise) and serves as the first step for the subsequent analysis tools. For speech analysis, we propose an improved Bayesian information criterion based speaker segmentation and clustering algorithm, which also applies a combined gender and emotion detection algorithm utilizing prosodic features. For the other main classes, it is often hard to devise any general, well-functioning pre-categorization that would fit the unforeseeable types of user-recorded data. To compensate for the absence of analysis tools for these classes, we propose the use of an efficient audio similarity measure and a query-by-example algorithm with database clustering capabilities. The experimental results show that the combined use of the algorithms is feasible in practice.
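
The abstract names Bayesian information criterion (BIC) based speaker segmentation without giving details. As a rough illustration only, the sketch below shows the classic delta-BIC change-point test on a one-dimensional feature sequence. It is not the improved algorithm proposed in the paper. The function names (`ml_variance`, `delta_bic`, `best_change_point`), the single-Gaussian-per-segment model, the univariate feature, and the `margin` and `penalty_weight` parameters are all assumptions made for this example.

```python
import math


def ml_variance(xs, floor=1e-8):
    """Maximum-likelihood variance, floored so the logarithm stays finite."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), floor)


def delta_bic(xs, split, penalty_weight=1.0):
    """Delta-BIC for splitting a 1-D feature sequence at index `split`.

    Positive values favour two separate Gaussians (a candidate speaker
    change); negative values favour one Gaussian for the whole sequence.
    """
    n, n1, n2 = len(xs), split, len(xs) - split
    left, right = xs[:split], xs[split:]
    gain = 0.5 * (n * math.log(ml_variance(xs))
                  - n1 * math.log(ml_variance(left))
                  - n2 * math.log(ml_variance(right)))
    # A 1-D Gaussian has two free parameters (mean and variance), so the
    # extra model costs 0.5 * 2 * log(n), scaled by the penalty weight.
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


def best_change_point(xs, margin=10, penalty_weight=1.0):
    """Return (index, delta_bic) of the strongest change point, or None.

    `margin` (an assumed parameter) keeps each side long enough for a
    stable variance estimate.
    """
    scored = [(delta_bic(xs, i, penalty_weight), i)
              for i in range(margin, len(xs) - margin)]
    if not scored:
        return None
    score, index = max(scored)
    return (index, score) if score > 0 else None
```

In a real system the input would be multi-dimensional frame features (for example cepstral coefficients), the variances would become covariance determinants, and the test would slide over a growing window. The univariate form is used here only to keep the sketch dependency-free.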
Original language: English
Title of host publication: MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval
Publisher: Association for Computing Machinery ACM
Pages: 113-120
ISBN (Print): 978-1-60558-312-9
DOIs: https://doi.org/10.1145/1460096.1460116
Publication status: Published - 2008
MoE publication type: A4 Article in a conference publication
Event: 1st ACM International Conference on Multimedia Information Retrieval, MIR 2008 - Vancouver, Canada
Duration: 30 Oct 2008 to 31 Oct 2008

Conference

Conference: 1st ACM International Conference on Multimedia Information Retrieval, MIR 2008
Abbreviated title: MIR 2008
Country: Canada
City: Vancouver
Period: 30/10/08 to 31/10/08

Fingerprint

Speech analysis
Bayesian networks
Clustering algorithms

Keywords

  • Personal audio content management
  • audio classification
  • speaker segmentation
  • emotion detection
  • query-by-example

Cite this

Lahti, T., Helén, M., Vuorinen, O., Väyrynen, E., Partala, J., Peltola, J., & Mäkelä, S-M. (2008). On Enabling Techniques for Personal Audio Content Management. In MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval (pp. 113-120). Association for Computing Machinery ACM. https://doi.org/10.1145/1460096.1460116
Lahti, Tommi ; Helén, Marko ; Vuorinen, Olli ; Väyrynen, Eero ; Partala, Juha ; Peltola, Johannes ; Mäkelä, Satu-Marja. / On Enabling Techniques for Personal Audio Content Management. MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval. Association for Computing Machinery ACM, 2008. pp. 113-120
@inproceedings{1d61f1936c6f4500995217cb13ec0286,
title = "On Enabling Techniques for Personal Audio Content Management",
abstract = "This paper discusses state-of-the-art automatic analysis tools for personal audio content management. Our main goal is to create a system with several co-operating management tools for an audio database that improve each other's results. A Bayesian network based audio classification algorithm classifies audio into four main classes (silence, speech, music, and noise) and serves as the first step for the subsequent analysis tools. For speech analysis, we propose an improved Bayesian information criterion based speaker segmentation and clustering algorithm, which also applies a combined gender and emotion detection algorithm utilizing prosodic features. For the other main classes, it is often hard to devise any general, well-functioning pre-categorization that would fit the unforeseeable types of user-recorded data. To compensate for the absence of analysis tools for these classes, we propose the use of an efficient audio similarity measure and a query-by-example algorithm with database clustering capabilities. The experimental results show that the combined use of the algorithms is feasible in practice.",
keywords = "Personal audio content management, audio classification, speaker segmentation, emotion detection, query-by-example",
author = "Tommi Lahti and Marko Hel{\'e}n and Olli Vuorinen and Eero V{\"a}yrynen and Juha Partala and Johannes Peltola and Satu-Marja M{\"a}kel{\"a}",
year = "2008",
doi = "10.1145/1460096.1460116",
language = "English",
isbn = "978-1-60558-312-9",
pages = "113--120",
booktitle = "MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval",
publisher = "Association for Computing Machinery ACM",
address = "United States",

}

Lahti, T, Helén, M, Vuorinen, O, Väyrynen, E, Partala, J, Peltola, J & Mäkelä, S-M 2008, On Enabling Techniques for Personal Audio Content Management. in MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval. Association for Computing Machinery ACM, pp. 113-120, 1st ACM International Conference on Multimedia Information Retrieval, MIR 2008, Vancouver, Canada, 30/10/08. https://doi.org/10.1145/1460096.1460116

On Enabling Techniques for Personal Audio Content Management. / Lahti, Tommi; Helén, Marko; Vuorinen, Olli; Väyrynen, Eero; Partala, Juha; Peltola, Johannes; Mäkelä, Satu-Marja.

MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval. Association for Computing Machinery ACM, 2008. p. 113-120.

TY - GEN
T1 - On Enabling Techniques for Personal Audio Content Management
AU - Lahti, Tommi
AU - Helén, Marko
AU - Vuorinen, Olli
AU - Väyrynen, Eero
AU - Partala, Juha
AU - Peltola, Johannes
AU - Mäkelä, Satu-Marja
PY - 2008
Y1 - 2008
AB - This paper discusses state-of-the-art automatic analysis tools for personal audio content management. Our main goal is to create a system with several co-operating management tools for an audio database that improve each other's results. A Bayesian network based audio classification algorithm classifies audio into four main classes (silence, speech, music, and noise) and serves as the first step for the subsequent analysis tools. For speech analysis, we propose an improved Bayesian information criterion based speaker segmentation and clustering algorithm, which also applies a combined gender and emotion detection algorithm utilizing prosodic features. For the other main classes, it is often hard to devise any general, well-functioning pre-categorization that would fit the unforeseeable types of user-recorded data. To compensate for the absence of analysis tools for these classes, we propose the use of an efficient audio similarity measure and a query-by-example algorithm with database clustering capabilities. The experimental results show that the combined use of the algorithms is feasible in practice.
KW - Personal audio content management
KW - audio classification
KW - speaker segmentation
KW - emotion detection
KW - query-by-example
U2 - 10.1145/1460096.1460116
DO - 10.1145/1460096.1460116
M3 - Conference article in proceedings
SN - 978-1-60558-312-9
SP - 113
EP - 120
BT - MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval
PB - Association for Computing Machinery ACM
ER -

Lahti T, Helén M, Vuorinen O, Väyrynen E, Partala J, Peltola J et al. On Enabling Techniques for Personal Audio Content Management. In MIR '08 Proceeding of the 1st ACM international conference on multimedia information retrieval. Association for Computing Machinery ACM. 2008. p. 113-120 https://doi.org/10.1145/1460096.1460116