Abstract
Topic modeling is a statistical, machine learning approach to
discovering the latent "topics" that occur in a
collection of documents.
Latent Dirichlet allocation (LDA) is currently a widely used
modeling approach. In this paper, we
investigate methods, including LDA and its extensions,
for separating a set of scientific publications into
several clusters. To evaluate the results, we generate a
collection of documents that contain academic papers from
several different fields and see whether papers in the
same field will be clustered together. We explore
potential scientometric applications of such text
analysis capabilities.
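
The evaluation idea described above (check whether papers from the same field end up in the same cluster) can be illustrated with a small sketch. It assumes cluster purity as the agreement measure, which is a common choice but is not stated in the abstract as the paper's own metric. The function name `cluster_purity` and the toy field labels and cluster assignments are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(fields, clusters):
    """Return the fraction of documents that belong to the majority field of their cluster."""
    if len(fields) != len(clusters):
        raise ValueError("fields and clusters must have the same length")
    if not fields:
        raise ValueError("need at least one document")

    # Count how many documents of each field fall into each cluster.
    by_cluster = defaultdict(Counter)
    for field, cluster in zip(fields, clusters):
        by_cluster[cluster][field] += 1

    # For every cluster, keep only the size of its largest field group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(fields)


# Hypothetical toy data: the known field of each paper and the cluster
# that some topic model assigned it to (not results from the paper).
fields = ["physics", "physics", "biology", "biology", "economics", "economics"]
clusters = [0, 0, 1, 1, 1, 2]

purity = cluster_purity(fields, clusters)
print(f"purity = {purity:.3f}")
```

A purity of 1.0 means every cluster contains papers from a single field. Purity alone rewards having many small clusters, so in practice it would likely be read alongside the number of clusters.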
Original language | English |
---|---|
Pages (from-to) | 767-786 |
Journal | Scientometrics |
Volume | 100 |
Issue number | 3 |
Publication status | Published - 2014 |
MoE publication type | A1 Journal article-refereed |
Keywords
- Topic modeling
- text analysis
- latent Dirichlet allocation