Quality management architecture for social media data

Pekka Pääkkönen (Corresponding Author), Juha Jokitulppo

Research output: Contribution to journal › Article › Scientific › peer-review

1 Citation (Scopus)

Abstract

Social media data has provided various insights into the behaviour of consumers and businesses. However, extracted data may be erroneous, or could have originated from a malicious source. Thus, the quality of social media data should be managed. It should also be understood how data quality can be managed across a big data pipeline, which may consist of several processing and analysis phases. The contribution of this paper is an evaluation of a data quality management architecture for social media data. Theoretical concepts from previous work have been implemented for data quality evaluation of Twitter-based data sets. In particular, a reference architecture for quality management in social media data has been extended and evaluated based on the implementation architecture. Experiments indicate that 150-800 tweets/s can be evaluated with two cloud nodes, depending on the configuration.
Original language: English
Article number: 6
Number of pages: 26
Journal: Journal of Big Data
Volume: 4
Issue number: 6
DOIs: 10.1186/s40537-017-0066-7
Publication status: Published - 1 Dec 2017
MoE publication type: A1 Journal article-refereed

Fingerprint

Quality management
Information management
Pipelines
Processing
Industry
Experiments
Social media
Data quality
Big data

Keywords

  • quality attribute
  • quality metric
  • quality policy
  • spark
  • Cassandra
  • Word2Vec

Cite this

@article{f49b10c6c99040769758e019f8f150cb,
title = "Quality management architecture for social media data",
abstract = "Social media data has provided various insights into the behaviour of consumers and businesses. However, extracted data may be erroneous, or could have originated from a malicious source. Thus, quality of social media should be managed. Also, it should be understood how data quality can be managed across a big data pipeline, which may consist of several processing and analysis phases. The contribution of this paper is evaluation of data quality management architecture for social media data. The theoretical concepts based on previous work have been implemented for data quality evaluation of Twitter-based data sets. Particularly, reference architecture for quality management in social media data has been extended and evaluated based on the implementation architecture. Experiments indicate that 150-800 tweets/s can be evaluated with two cloud nodes depending on the configuration.",
keywords = "quality attribute, quality metric, quality policy, spark, Cassandra, Word2Vec",
author = "Pekka P{\"a}{\"a}kk{\"o}nen and Juha Jokitulppo",
year = "2017",
month = "12",
day = "1",
doi = "10.1186/s40537-017-0066-7",
language = "English",
volume = "4",
journal = "Journal of Big Data",
issn = "2196-1115",
publisher = "Springer",
number = "6",

}

Quality management architecture for social media data. / Pääkkönen, Pekka (Corresponding Author); Jokitulppo, Juha.

In: Journal of Big Data, Vol. 4, No. 6, 6, 01.12.2017.

TY - JOUR
T1 - Quality management architecture for social media data
AU - Pääkkönen, Pekka
AU - Jokitulppo, Juha
PY - 2017/12/1
Y1 - 2017/12/1
AB - Social media data has provided various insights into the behaviour of consumers and businesses. However, extracted data may be erroneous, or could have originated from a malicious source. Thus, quality of social media should be managed. Also, it should be understood how data quality can be managed across a big data pipeline, which may consist of several processing and analysis phases. The contribution of this paper is evaluation of data quality management architecture for social media data. The theoretical concepts based on previous work have been implemented for data quality evaluation of Twitter-based data sets. Particularly, reference architecture for quality management in social media data has been extended and evaluated based on the implementation architecture. Experiments indicate that 150-800 tweets/s can be evaluated with two cloud nodes depending on the configuration.
KW - quality attribute
KW - quality metric
KW - quality policy
KW - spark
KW - Cassandra
KW - Word2Vec
UR - http://www.scopus.com/inward/record.url?scp=85016150777&partnerID=8YFLogxK
U2 - 10.1186/s40537-017-0066-7
DO - 10.1186/s40537-017-0066-7
M3 - Article
VL - 4
JO - Journal of Big Data
JF - Journal of Big Data
SN - 2196-1115
IS - 6
M1 - 6
ER -