TRECVID 2003 experiments at MediaTeam Oulu and VTT

Mika Rautiainen, Jani Penttilä, Paavo Pietarila, Kai Noponen, Matti Hosio, Timo Koskela, Satu-Marja Mäkelä, Johannes Peltola, Jialin Liu, Timo Ojala, Tapio Seppänen

Research output: Contribution to conference › Conference article › Scientific › peer-review

Abstract

MediaTeam Oulu and VTT Technical Research Centre of Finland participated jointly in the semantic feature extraction, manual search and interactive search tasks of TRECVID 2003. We participated in semantic feature extraction by submitting results for 15 of the 17 defined semantic categories. Our approach utilized spatio-temporal visual features based on correlations of quantized gradient edges and color values, together with several physical features from the audio signal. The most recent version of our Video Browsing and Retrieval System (VIRE) contains an interactive cluster-temporal browser of video shots that exploits three semantic levels of similarity: visual, conceptual and lexical. The informativeness of the browser was enhanced by incorporating automatic speech recognition (ASR) transcription texts into the visual views based on shot key frames. The experimental results for the interactive search task were obtained by conducting a user experiment with eight people and two system configurations: browsing by (I) visual features only (visual and conceptual browsing was allowed, but no browsing with ASR text) or (II) visual features and ASR text (all semantic browsing levels were available and ASR-text content was visible). The interactive results using ASR-based features were better than the results using only visual features, which indicates the importance of successfully integrating both visual and textual features for video browsing. In contrast to the previous version of VIRE, which performed early feature fusion by training unsupervised self-organizing maps, the newest version capitalises on late fusion of feature queries, which was evaluated in the manual search task. This paper gives an overview of the developed system and summarises the results.
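The late fusion of feature queries mentioned above can be illustrated with a minimal sketch: each modality (e.g. visual similarity, ASR-text relevance) scores video shots independently, the per-modality scores are normalised, and a weighted sum produces the final ranking. This is a hypothetical illustration of the general late-fusion technique, not the authors' actual implementation; all shot identifiers, scores and weights below are invented for the example.

```python
# Hypothetical late-fusion sketch: each modality returns its own relevance
# scores for video shots; scores are min-max normalised per modality and
# merged with a weighted sum into one ranked shot list.

def min_max_normalise(scores):
    """Scale a {shot_id: score} map into [0, 1]; a constant map becomes all 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}

def late_fusion(modality_scores, weights):
    """Merge per-modality score maps into one ranked list of shot ids.

    modality_scores: {modality_name: {shot_id: raw_score}}
    weights:         {modality_name: float}
    """
    fused = {}
    for name, scores in modality_scores.items():
        w = weights.get(name, 0.0)
        for shot, s in min_max_normalise(scores).items():
            fused[shot] = fused.get(shot, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

# Invented example query: visual colour/edge similarity vs. ASR-text relevance.
visual = {"shot_01": 0.9, "shot_02": 0.4, "shot_03": 0.7}
asr = {"shot_02": 3.1, "shot_03": 0.5, "shot_04": 2.2}
ranking = late_fusion({"visual": visual, "asr": asr},
                      {"visual": 0.5, "asr": 0.5})
```

The key design point of late fusion, in contrast to the early SOM-based fusion of the previous VIRE version, is that each modality is queried and ranked separately, so modality weights can be adjusted per query without retraining a joint feature model.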
Original language: English
Publication status: Published - 2003
MoE publication type: Not Eligible
Event: TRECVID Workshop at Text Retrieval Conference, TREC 2003 - Gaithersburg, United States
Duration: 1 Jan 2003 - 1 Jan 2003

Conference

Conference: TRECVID Workshop at Text Retrieval Conference, TREC 2003
Country: United States
City: Gaithersburg
Period: 1/01/03 - 1/01/03


Cite this

Rautiainen, M., Penttilä, J., Pietarila, P., Noponen, K., Hosio, M., Koskela, T., ... Seppänen, T. (2003). TRECVID 2003 experiments at MediaTeam Oulu and VTT. Paper presented at TRECVID Workshop at Text Retrieval Conference, TREC 2003, Gaithersburg, United States.