Keyword Extraction from Short Documents Using Three Levels of Word Evaluation

Mika Timonen, Timo Toivanen, M. Kasari, Y. Teng, C. Cheng, L. He

Research output: Chapter in Book/Report/Conference proceeding; Chapter or book article; Professional

3 Citations (Scopus)


In this paper we propose a novel approach for keyword extraction from short documents where each document is assessed on three levels: corpus level, cluster level and document level. We focus on documents that contain fewer than 100 words. The main challenge comes from the defining characteristic of short documents: each word usually occurs only once within the document. Therefore, traditional approaches based on term frequency do not perform well with short documents. To tackle this challenge we propose a novel unsupervised keyword extraction approach called Informativeness-based Keyword Extraction (IKE). We compare the performance of the proposed approach against other keyword extraction methods, such as CollabRank, KeyGraph, Chi-squared, and TF-IDF. In the experimental evaluation IKE shows promising results, outperforming the competing methods.
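The term-frequency problem described in the abstract can be illustrated with a minimal sketch. It is not the IKE method and is not taken from the paper: the `tf_idf` helper, the toy corpus, and the unsmoothed IDF formula are all assumptions chosen for illustration. When every word occurs once in a short document, the term-frequency factor is the same constant for all words, so a TF-IDF ranking is decided by the corpus-level IDF alone.

```python
import math
from collections import Counter

def tf_idf(doc_tokens, corpus):
    """Return {term: tf * idf} for one tokenized document.

    Hypothetical helper: plain relative term frequency times
    unsmoothed log inverse document frequency.
    """
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    scores = {}
    for term, count in counts.items():
        # Number of documents in the corpus that contain the term.
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log(n_docs / df)
        scores[term] = (count / len(doc_tokens)) * idf
    return scores

# Toy corpus of short documents (assumed data); every word occurs once per document.
corpus = [
    ["cheap", "flights", "to", "rome"],
    ["cheap", "hotel", "in", "rome"],
    ["train", "tickets", "to", "paris"],
]

scores = tf_idf(corpus[0], corpus)

# Every term has tf = 1/4, so the ranking depends only on idf.
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this sketch "cheap", "to" and "rome" receive identical scores because they share the same document frequency; nothing inside the document itself separates them. This is the gap that additional evaluation levels, such as the cluster level named in the abstract, are meant to address.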
Original language: English
Title of host publication: Knowledge Discovery, Knowledge Engineering and Knowledge Management
ISBN (Electronic): 978-3-642-54105-6
ISBN (Print): 978-3-642-54104-9
Publication status: Published - 2013
MoE publication type: D2 Article in professional manuals or guides or professional information systems or text book material

Publication series

Series: Communications in Computer and Information Science
