Keyword Extraction from Short Documents Using Three Levels of Word Evaluation

Mika Timonen, Timo Toivanen, Melissa Kasari, Yue Teng, Chao Cheng, Liang He

Research output: Chapter in Book/Report/Conference proceeding › Chapter or book article › Professional

4 Citations (Scopus)

Abstract

In this paper we propose a novel approach for keyword extraction from short documents where each document is assessed on three levels: corpus level, cluster level and document level. We focus our efforts on documents that contain fewer than 100 words. The main challenge comes from the main characteristic of short documents: each word usually occurs only once within the document. Therefore, traditional approaches based on term frequency do not perform well with short documents. To tackle this challenge we propose a novel unsupervised keyword extraction approach called Informativeness-based Keyword Extraction (IKE). We compare the performance of the proposed approach against other keyword extraction methods, such as CollabRank, KeyGraph, Chi-squared, and TF-IDF. In the experimental evaluation IKE shows promising results by outperforming the competition.
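The abstract's central observation, that term frequency carries little signal when each word occurs only once, can be illustrated with a minimal sketch. This is not the IKE method, whose details are not given here; it is a plain TF-IDF baseline written for illustration. The `tf_idf` helper, the toy `docs`, and the choice of raw counts with `idf = log(N / df)` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score terms per document with raw term frequency and idf = log(N / df).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


# Toy short documents (assumed data): every word occurs exactly once per document.
docs = [
    "cheap flights to barcelona this weekend",
    "barcelona hotel deals near the beach",
    "weekend deals on flights and hotel packages",
]

scores = tf_idf(docs)

# With tf == 1 everywhere, each score is just the term's idf, so a content
# word such as "cheap" and a function word such as "to" are tied.
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {score:.3f}")
```

Because the term-frequency factor is constant, the ranking inside a document is decided by corpus-level rarity alone, which is one way to read the abstract's motivation for evaluating words on additional levels.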
Original language: English
Title of host publication: Knowledge Discovery, Knowledge Engineering and Knowledge Management
Subtitle of host publication: 4th International Joint Conference, IC3K 2012
Editors: Ana Fred, Jan L.G. Dietz, Kecheng Liu, Joaquim Filipe
Publisher: Springer
Pages: 130-146
ISBN (Electronic): 978-3-642-54105-6
ISBN (Print): 978-3-642-54104-9
Publication status: Published - 2013
MoE publication type: D2 Article in professional manuals or guides or professional information systems or text book material
Event: 4th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2012 - Barcelona, Spain
Duration: 4 Oct 2012 to 7 Oct 2012

Publication series

Series: Communications in Computer and Information Science
Volume: 415
ISSN: 1865-0929

Conference

Conference: 4th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2012
Abbreviated title: IC3K 2012
Country/Territory: Spain
City: Barcelona
Period: 4/10/12 to 7/10/12
