Keyword Extraction from Short Documents Using Three Levels of Word Evaluation

Mika Timonen, Timo Toivanen, M. Kasari, Y. Teng, C. Cheng, L. He

Research output: Chapter in Book/Report/Conference proceeding › Chapter or book article › Professional

1 Citation (Scopus)

Abstract

In this paper we propose a novel approach for keyword extraction from short documents where each document is assessed on three levels: corpus level, cluster level and document level. We focus on documents that contain fewer than 100 words. The main challenge comes from the defining characteristic of short documents: each word usually occurs only once within the document. Therefore, traditional approaches based on term frequency do not perform well with short documents. To tackle this challenge we propose a novel unsupervised keyword extraction approach called Informativeness-based Keyword Extraction (IKE). We compare the performance of the proposed approach against other keyword extraction methods, such as CollabRank, KeyGraph, Chi-squared, and TF-IDF. In the experimental evaluation IKE shows promising results by outperforming the competing methods.
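The abstract's point about term frequency can be illustrated with a small sketch. The code below is not IKE and is not taken from the chapter; it is a plain TF-IDF baseline written for illustration, and the function name `tf_idf` and the three toy documents are assumptions. It shows that when every word occurs once in a document, the term-frequency factor is identical for all of that document's words, so the ranking is decided by the corpus-level IDF factor alone.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document using plain TF-IDF.

    Illustrative baseline only (hypothetical helper, not the IKE method).
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy short documents (made up): every word occurs exactly once per document.
docs = [
    "cheap flights helsinki",
    "cheap hotels helsinki",
    "cheap car rental",
]

scores = tf_idf(docs)

# Rank the words of the first document by score, highest first.
ranking = sorted(scores[0], key=scores[0].get, reverse=True)
print(ranking)
```

In the first document each word has the same term frequency (1/3), so the within-document evidence cannot separate the words. "cheap" appears in every document and therefore scores zero, and the remaining order comes only from how rare each word is across the corpus.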
Original language: English
Title of host publication: Knowledge Discovery, Knowledge Engineering and Knowledge Management
Publisher: Springer
Pages: 130-146
ISBN (Electronic): 978-3-642-54105-6
ISBN (Print): 978-3-642-54104-9
DOIs: https://doi.org/10.1007/978-3-642-54105-6_9
Publication status: Published - 2013
MoE publication type: D2 Article in professional manuals or guides or professional information systems or text book material

Publication series

Series: Knowledge Discovery, Knowledge Engineering and Knowledge Management
Volume: 415

Cite this

Timonen, M., Toivanen, T., Kasari, M., Teng, Y., Cheng, C., & He, L. (2013). Keyword Extraction from Short Documents Using Three Levels of Word Evaluation. In Knowledge Discovery, Knowledge Engineering and Knowledge Management (pp. 130-146). Springer. Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol. 415. https://doi.org/10.1007/978-3-642-54105-6_9
@inbook{7e7452e6cfd94476a99550eb9e1c0b34,
title = "Keyword Extraction from Short Documents Using Three Levels of Word Evaluation",
abstract = "In this paper we propose a novel approach for keyword extraction from short documents where each document is assessed on three levels: corpus level, cluster level and document level. We focus on documents that contain fewer than 100 words. The main challenge comes from the defining characteristic of short documents: each word usually occurs only once within the document. Therefore, traditional approaches based on term frequency do not perform well with short documents. To tackle this challenge we propose a novel unsupervised keyword extraction approach called Informativeness-based Keyword Extraction (IKE). We compare the performance of the proposed approach against other keyword extraction methods, such as CollabRank, KeyGraph, Chi-squared, and TF-IDF. In the experimental evaluation IKE shows promising results by outperforming the competing methods.",
author = "Mika Timonen and Timo Toivanen and M. Kasari and Y. Teng and C. Cheng and L. He",
year = "2013",
doi = "10.1007/978-3-642-54105-6_9",
language = "English",
isbn = "978-3-642-54104-9",
series = "Knowledge Discovery, Knowledge Engineering and Knowledge Management",
publisher = "Springer",
pages = "130 -- 146",
booktitle = "Knowledge Discovery, Knowledge Engineering and Knowledge Management",
address = "Germany",

}
