Regular Decomposition of Large Graphs

Foundation of a Sampling Approach to Stochastic Block Model Fitting

Hannu Reittu, Ilkka Norros, Tomi Räty, Marianna Bolla, Fülöp Bazsó

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

We analyze the performance of regular decomposition, a method for compressing large and dense graphs. The method is inspired by Szemerédi’s regularity lemma (SRL), a generic structural result about large and dense graphs. In our method, the stochastic block model (SBM) is used in maximum likelihood fitting to find a regular structure similar to the one predicted by SRL. Another ingredient of our method is Rissanen’s minimum description length (MDL) principle. We consider scaling the algorithms to extremely large graphs by sampling a small subgraph. We continue our previous work on the subject by proving some claims that were previously found experimentally. Our theoretical setting does not assume that the graph is generated from an SBM. The task is to find an SBM that is optimal for modeling the given graph in the sense of MDL. This setting matches real-life situations in which no random generative model is appropriate. Our aim is to show that regular decomposition is a viable and robust method for large graphs emerging, say, in the Big Data area.
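The abstract combines three ingredients: an SBM, maximum likelihood fitting, and MDL model selection. The Python sketch below only illustrates how these ingredients can fit together under our own assumptions; it is not the authors' regular decomposition algorithm. The hypothetical `fit_sbm` alternates between estimating block densities and greedily moving each node to the block where its links are most likely. The hypothetical `description_length` uses a crude two-part code (data bits plus label and parameter bits) whose penalty terms are assumed rather than taken from the paper.

```python
import math
import random

EPS = 1e-9  # keep block densities inside (0, 1) so every log() is finite


def block_densities(adj, labels, k):
    """Bernoulli edge density for every block pair (a, b) under `labels`."""
    n = len(adj)
    sizes = [labels.count(b) for b in range(k)]
    edges = [[0] * k for _ in range(k)]
    for u in range(n):
        for v in range(u + 1, n):
            if adj[u][v]:
                a, b = labels[u], labels[v]
                edges[a][b] += 1
                if a != b:
                    edges[b][a] += 1
    dens = [[EPS] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            # node pairs inside one block vs. between two blocks
            pairs = sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]
            if pairs:
                dens[a][b] = min(max(edges[a][b] / pairs, EPS), 1 - EPS)
    return dens


def log_likelihood(adj, labels, dens):
    """Log-likelihood (nats) of the graph under the SBM (labels, dens)."""
    n = len(adj)
    ll = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            p = dens[labels[u]][labels[v]]
            ll += math.log(p) if adj[u][v] else math.log(1 - p)
    return ll


def node_cost(adj, labels, dens, v, i):
    """Negative log-likelihood of node v's links if v were placed in block i."""
    k = len(dens)
    links, sizes = [0] * k, [0] * k
    for u, lab in enumerate(labels):
        if u != v:
            sizes[lab] += 1
            links[lab] += adj[v][u]
    return -sum(links[j] * math.log(dens[i][j])
                + (sizes[j] - links[j]) * math.log(1 - dens[i][j])
                for j in range(k))


def fit_sbm(adj, k, labels=None, iters=50, seed=0):
    """Greedy ML fit (hypothetical): re-estimate densities, then reassign nodes."""
    n = len(adj)
    if labels is None:
        rng = random.Random(seed)
        labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(iters):
        dens = block_densities(adj, labels, k)
        new = [min(range(k), key=lambda i: node_cost(adj, labels, dens, v, i))
               for v in range(n)]
        if new == labels:  # fixed point reached
            break
        labels = new
    return labels


def description_length(adj, labels, k):
    """Crude two-part MDL code length in bits (assumed coding scheme)."""
    n = len(adj)
    dens = block_densities(adj, labels, k)
    data_bits = -log_likelihood(adj, labels, dens) / math.log(2)
    label_bits = n * math.log2(k)  # one block label per node
    param_bits = k * (k + 1) / 2 * 0.5 * math.log2(n * (n - 1) / 2)  # one density per block pair
    return data_bits + label_bits + param_bits
```

For example, on a graph made of two disjoint 4-cliques, the planted two-block partition is a fixed point of `fit_sbm`, and `description_length` assigns it fewer bits than the one-block model. In practice one would run several random restarts for each `k` and keep the `k` with the smallest code length.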

Original language: English
Pages (from-to): 44-60
Number of pages: 17
Journal: Data Science and Engineering
Volume: 4
Issue number: 1
DOIs: 10.1007/s41019-019-0084-x
Publication status: Published - 7 Mar 2019
MoE publication type: A1 Journal article-refereed

Keywords

  • Community detection
  • Consistency
  • Martingales
  • Sampling

Cite this

@article{5b6a27dac7dc4a90aa00d9d14f19dd3d,
title = "Regular Decomposition of Large Graphs: Foundation of a Sampling Approach to Stochastic Block Model Fitting",
keywords = "Community detection, Consistency, Martingales, Sampling",
author = "Hannu Reittu and Ilkka Norros and Tomi R{\"a}ty and Marianna Bolla and F{\"u}l{\"o}p Bazs{\'o}",
year = "2019",
month = "3",
day = "7",
doi = "10.1007/s41019-019-0084-x",
language = "English",
volume = "4",
pages = "44--60",
journal = "Data Science and Engineering",
issn = "2364-1185",
publisher = "Springer",
number = "1",

}

Regular Decomposition of Large Graphs: Foundation of a Sampling Approach to Stochastic Block Model Fitting. / Reittu, Hannu; Norros, Ilkka; Räty, Tomi; Bolla, Marianna; Bazsó, Fülöp.

In: Data Science and Engineering, Vol. 4, No. 1, 07.03.2019, p. 44-60.

TY - JOUR

T1 - Regular Decomposition of Large Graphs

T2 - Foundation of a Sampling Approach to Stochastic Block Model Fitting

AU - Reittu, Hannu

AU - Norros, Ilkka

AU - Räty, Tomi

AU - Bolla, Marianna

AU - Bazsó, Fülöp

PY - 2019/3/7

Y1 - 2019/3/7

KW - Community detection

KW - Consistency

KW - Martingales

KW - Sampling

UR - http://www.scopus.com/inward/record.url?scp=85065022122&partnerID=8YFLogxK

U2 - 10.1007/s41019-019-0084-x

DO - 10.1007/s41019-019-0084-x

M3 - Article

VL - 4

SP - 44

EP - 60

JO - Data Science and Engineering

JF - Data Science and Engineering

SN - 2364-1185

IS - 1

ER -