Analysis of large sparse graphs using regular decomposition of graph distance matrices

Hannu Reittu, Lasse Leskelä, Marco Fiorucci, Tomi Räty

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review


Abstract

Statistical analysis of large and sparse graphs is a challenging problem in data science because of its high dimensionality and nonlinearity. This paper presents a fast and scalable algorithm for partitioning such graphs into disjoint groups based on observed graph distances from a set of reference nodes. The resulting partition provides a low-dimensional approximation of the full distance matrix, which helps to reveal global structural properties of the graph using only small samples of the distance matrix. The presented algorithm is inspired by the information-theoretic minimum description length (MDL) principle. We investigate the performance of this algorithm on selected real data sets and on synthetic graph data sets generated using stochastic block models and power-law random graphs, together with analytical considerations for sparse stochastic block models with bounded average degrees.
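The idea of grouping nodes by their distances to a few reference nodes can be illustrated with a minimal sketch. This is not the authors' regular decomposition algorithm: the MDL-based cost is replaced here by a plain Lloyd-style k-means on the distance profiles, purely as a stand-in. All function names (`sample_sbm`, `bfs_distances`, `distance_profiles`, `partition_by_profile`) and all parameter values are assumptions made for illustration.

```python
import random
from collections import deque


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample an undirected planted-partition graph as an adjacency list."""
    block = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(block)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if block[u] == block[v] else p_out
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj, block


def bfs_distances(adj, source):
    """Hop distances from source; unreachable nodes keep the cap len(adj)."""
    n = len(adj)
    dist = [n] * n  # n exceeds any real distance, so it doubles as "unvisited"
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == n:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_profiles(adj, refs):
    """Entry i is the tuple of distances from node i to each reference node."""
    columns = [bfs_distances(adj, r) for r in refs]
    return [tuple(col[i] for col in columns) for i in range(len(adj))]


def partition_by_profile(profiles, k, rng, iters=20):
    """Lloyd-style k-means on distance profiles (stand-in for the MDL step)."""
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assign every node to the nearest center in squared Euclidean distance.
        for i, p in enumerate(profiles):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Recompute each center as the mean profile of its current members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a group becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical parameters: two blocks of 60 nodes and 10 reference nodes.
rng = random.Random(0)
adj, block = sample_sbm([60, 60], p_in=0.10, p_out=0.01, rng=rng)
refs = rng.sample(range(len(adj)), 10)
profiles = distance_profiles(adj, refs)
labels = partition_by_profile(profiles, k=2, rng=rng)
print("group sizes:", [labels.count(c) for c in range(2)])
```

Only one breadth-first search per reference node is needed, so the sketch touches a small sample of columns of the distance matrix rather than the full matrix. How well the resulting groups match the planted blocks depends on the graph and on the choice of reference nodes; no particular accuracy is claimed here.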
Original language: English
Title of host publication: Proceedings 2018 IEEE International Conference on Big Data, Big Data 2018
Publisher: IEEE Institute of Electrical and Electronic Engineers
Pages: 3784-3792
Number of pages: 9
ISBN (Electronic): 978-1-5386-5035-6
ISBN (Print): 978-1-5386-5036-3, 978-1-5386-5034-9
DOIs: https://doi.org/10.1109/BigData.2018.8622118
Publication status: Published - 22 Jan 2019
MoE publication type: A4 Article in a conference publication
Event: Advances in High Dimensional Big Data: Workshop in conjunction with the 2018 IEEE International Conference on Big Data (IEEE Big Data 2018) - Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018

Publication series

Series: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Workshop

Workshop: Advances in High Dimensional Big Data
Country: United States
City: Seattle
Period: 10/12/18 to 13/12/18

Fingerprint

Decomposition
Structural properties
Statistical methods

Keywords

  • graph theory
  • statistical analysis
  • Big Data
  • peer-to-peer computing
  • partitioning
  • approximation algorithms
  • stochastic processes

Cite this

Reittu, H., Leskelä, L., Fiorucci, M., & Räty, T. (2019). Analysis of large sparse graphs using regular decomposition of graph distance matrices. In Proceedings 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 3784-3792). [8622118] IEEE Institute of Electrical and Electronic Engineers. (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). https://doi.org/10.1109/BigData.2018.8622118