In this paper we propose a novel approach for keyword extraction from short documents, where each document is assessed on three levels: corpus level, cluster level and document level. We focus on documents that contain fewer than 100 words. The main challenge arises from the defining characteristic of short documents: each word usually occurs only once within the document. Therefore, traditional approaches based on term frequency do not perform well with short documents. To tackle this challenge we propose a novel unsupervised keyword extraction approach called Informativeness-based Keyword Extraction (IKE). We compare the performance of the proposed approach against other keyword extraction methods, such as CollabRank, KeyGraph, Chi-squared, and TF-IDF. In the experimental evaluation IKE shows promising results, outperforming the other methods.
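The term-frequency problem described in the abstract can be illustrated with a minimal sketch. It is not the IKE method and is not taken from the paper: the toy documents, the function name `tf_idf`, and the choice of the plain `log(N / df)` IDF variant are all assumptions made for illustration. The sketch shows that when every word occurs once, the TF factor is identical for all words in a document.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf score} dict per tokenized document.

    Assumption: TF is the relative frequency within the document and
    IDF is log(N / df), one common variant among several.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical short documents; every word occurs exactly once per document.
docs = [
    "great pizza friendly staff".split(),
    "slow service cold pizza".split(),
    "friendly staff quick service".split(),
]

scores = tf_idf(docs)

# The set of distinct TF values in the first document has a single element,
# so TF cannot separate one word from another.
tf_values = {count / len(docs[0]) for count in Counter(docs[0]).values()}
```

In this toy corpus the TF factor is 0.25 for every word of the first document, so the ranking inside that document depends only on the corpus-level IDF term. Words with the same document frequency, such as "pizza", "friendly" and "staff", receive identical scores.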
|Title of host publication||Knowledge Discovery, Knowledge Engineering and Knowledge Management|
|Publication status||Published - 2013|
|MoE publication type||D2 Article in professional manuals or guides or professional information systems or text book material|
|Series||Communications in Computer and Information Science|
Timonen, M., Toivanen, T., Kasari, M., Teng, Y., Cheng, C., & He, L. (2013). Keyword Extraction from Short Documents Using Three Levels of Word Evaluation. In Knowledge Discovery, Knowledge Engineering and Knowledge Management (pp. 130-146). Springer. Communications in Computer and Information Science, Vol. 415. https://doi.org/10.1007/978-3-642-54105-6_9