Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification

Arho Suominen, Hannes Toivanen

    Research output: Contribution to journal › Article › Scientific › peer-review

    40 Citations (Scopus)

    Abstract

    The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-à-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification frameworks of scientific knowledge, as they typically try to fit new-to-the-world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large-scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large-scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.
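    The abstract does not name a specific algorithm, so the following is only a minimal sketch of how a classification can be derived from a text corpus alone. It assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling, one common topic-modeling technique. The function name `fit_lda`, the hyperparameters, and the four-document toy corpus are all hypothetical and are not taken from the article.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA topic model with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (theta, top_words), where
    theta[d][k] is the estimated share of topic k in document d and
    top_words[k] lists up to three of the most frequent words in topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Conditional weight of each topic given all other tokens.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + alpha * num_topics)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return theta, top_words


# Hypothetical toy corpus standing in for publication abstracts.
docs = [
    "laser optics photon laser".split(),
    "photon optics laser beam".split(),
    "gene protein cell gene".split(),
    "cell protein gene genome".split(),
]
theta, top_words = fit_lda(docs, num_topics=2)
for d, dist in enumerate(theta):
    print(d, [round(p, 2) for p in dist])
print(top_words)
```

    No subject categories are supplied to the sampler: the topics emerge only from word co-occurrence in the corpus, which is the property the abstract contrasts with human-assigned classification. The number of topics is still chosen by the analyst.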
    Original language: English
    Pages (from-to): 2464-2476
    Journal: Journal of the Association for Information Science and Technology
    Volume: 67
    Issue number: 10
    DOI: 10.1002/asi.23596
    Publication status: Published - 2016
    MoE publication type: A1 Journal article-refereed

    Fingerprint

    Unsupervised learning
    science
    knowledge
    learning
    cartography
    realism
    data analysis
    Scientific knowledge
    Modeling

    Keywords

    • machine learning
    • automatic classification
    • text mining
    • science

    Cite this

    @article{9444cb3e2f1f422b96c5077954b28614,
    title = "Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification",
    abstract = "The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-{\`a}-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification frameworks of scientific knowledge, as they typically try to fit new-to-the-world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large-scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large-scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.",
    keywords = "machine learning, automatic classification, text mining, science",
    author = "Arho Suominen and Hannes Toivanen",
    note = "Project code: 101488",
    year = "2016",
    doi = "10.1002/asi.23596",
    language = "English",
    volume = "67",
    pages = "2464--2476",
    journal = "Journal of the Association for Information Science and Technology",
    issn = "2330-1635",
    publisher = "Wiley",
    number = "10",

    }

    Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification. / Suominen, Arho; Toivanen, Hannes.

    In: Journal of the Association for Information Science and Technology, Vol. 67, No. 10, 2016, p. 2464-2476.

    TY - JOUR

    T1 - Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification

    AU - Suominen, Arho

    AU - Toivanen, Hannes

    N1 - Project code: 101488

    PY - 2016

    Y1 - 2016

    N2 - The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-à-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification frameworks of scientific knowledge, as they typically try to fit new-to-the-world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large-scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large-scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.

    KW - machine learning

    KW - automatic classification

    KW - text mining

    KW - science

    U2 - 10.1002/asi.23596

    DO - 10.1002/asi.23596

    M3 - Article

    VL - 67

    SP - 2464

    EP - 2476

    JO - Journal of the Association for Information Science and Technology

    JF - Journal of the Association for Information Science and Technology

    SN - 2330-1635

    IS - 10

    ER -