Abstract
Social media data has provided various insights into the
behaviour of consumers and businesses. However, extracted
data may be erroneous, or could have originated from a
malicious source. Thus, the quality of social media data should be
managed. It should also be understood how data quality
can be managed across a big data pipeline, which may
consist of several processing and analysis phases. The
contribution of this paper is an evaluation of a data quality
management architecture for social media data. The
theoretical concepts based on previous work have been
implemented for data quality evaluation of Twitter-based
data sets. In particular, a reference architecture for
quality management in social media data has been extended
and evaluated based on the implementation architecture.
Experiments indicate that 150-800 tweets/s can be
evaluated with two cloud nodes, depending on the
configuration.
Original language | English |
---|---|
Article number | 6 |
Number of pages | 26 |
Journal | Journal of Big Data |
Volume | 4 |
Issue number | 6 |
DOIs | |
Publication status | Published - 1 Dec 2017 |
MoE publication type | A1 Journal article-refereed |
Keywords
- quality attribute
- quality metric
- quality policy
- Spark
- Cassandra
- Word2Vec