Abstract
Topic modeling is a statistical machine learning technique for
discovering the latent "topics" that occur in a
collection of documents.
Latent Dirichlet allocation (LDA) is currently a widely used
approach to topic modeling. In this paper, we
investigate methods, including LDA and its extensions,
for separating a set of scientific publications into
several clusters. To evaluate the results, we construct a
collection of documents consisting of academic papers from
several different fields and check whether papers in the
same field are clustered together. We explore
potential scientometric applications of such text
analysis capabilities.
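
The evaluation idea described above, checking whether papers from the same field end up in the same cluster, can be illustrated with a simple purity score. This is only a minimal sketch and not the paper's actual evaluation procedure: the purity metric, the function name `cluster_purity`, and the toy data are all assumptions made for illustration. It also assumes each paper has already been assigned to a single cluster, for example by taking its dominant topic.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, field_labels):
    """Return the fraction of documents that share the majority field of their cluster.

    cluster_ids: one cluster id per document (hypothetical output of a topic model).
    field_labels: the known field of each document, in the same order.
    """
    if len(cluster_ids) != len(field_labels):
        raise ValueError("cluster_ids and field_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how often each field occurs inside each cluster.
    fields_by_cluster = defaultdict(Counter)
    for cluster, field in zip(cluster_ids, field_labels):
        fields_by_cluster[cluster][field] += 1

    # For every cluster, keep only the count of its most common field.
    majority_total = sum(max(counts.values()) for counts in fields_by_cluster.values())
    return majority_total / len(cluster_ids)


# Toy data (invented for illustration): 8 papers, 3 clusters, 3 fields.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
fields = ["physics", "physics", "biology", "biology", "biology",
          "economics", "economics", "physics"]
purity = cluster_purity(clusters, fields)
print(f"purity = {purity:.2f}")
```

A purity of 1.0 would mean every cluster contains papers from a single field. Note that purity alone rewards many small clusters, so in practice it is usually read alongside the number of clusters.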
| Original language | English |
|---|---|
| Pages (from-to) | 767-786 |
| Journal | Scientometrics |
| Volume | 100 |
| Issue number | 3 |
| Publication status | Published - 2014 |
| MoE publication type | A1 Journal article-refereed |
Keywords
- Topic modeling
- text analysis
- latent Dirichlet allocation
Fingerprint
Dive into the research topics of 'Clustering scientific documents with topic modeling'. Together they form a unique fingerprint.