TY - JOUR
T1 - Regular Decomposition of Large Graphs
T2 - Foundation of a Sampling Approach to Stochastic Block Model Fitting
AU - Reittu, Hannu
AU - Norros, Ilkka
AU - Räty, Tomi
AU - Bolla, Marianna
AU - Bazsó, Fülöp
PY - 2019/3/7
Y1 - 2019/3/7
N2 - We analyze the performance of regular decomposition, a method for the compression of large and dense graphs. The method is inspired by Szemerédi’s regularity lemma (SRL), a generic structural result for large and dense graphs. In our method, a stochastic block model (SBM) is used in maximum likelihood fitting to find a regular structure similar to the one predicted by the SRL. Another ingredient of our method is Rissanen’s minimum description length (MDL) principle. We consider scaling the algorithms to extremely large graphs by sampling a small subgraph. We continue our previous work on the subject by proving some experimentally observed claims. Our theoretical setting does not assume that the graph is generated from an SBM; the task is to find an SBM that is optimal for modeling the given graph in the sense of MDL. This setting matches real-life situations in which no random generative model is appropriate. Our aim is to show that regular decomposition is a viable and robust method for large graphs emerging, say, in the Big Data area.
AB - We analyze the performance of regular decomposition, a method for the compression of large and dense graphs. The method is inspired by Szemerédi’s regularity lemma (SRL), a generic structural result for large and dense graphs. In our method, a stochastic block model (SBM) is used in maximum likelihood fitting to find a regular structure similar to the one predicted by the SRL. Another ingredient of our method is Rissanen’s minimum description length (MDL) principle. We consider scaling the algorithms to extremely large graphs by sampling a small subgraph. We continue our previous work on the subject by proving some experimentally observed claims. Our theoretical setting does not assume that the graph is generated from an SBM; the task is to find an SBM that is optimal for modeling the given graph in the sense of MDL. This setting matches real-life situations in which no random generative model is appropriate. Our aim is to show that regular decomposition is a viable and robust method for large graphs emerging, say, in the Big Data area.
KW - Community detection
KW - Consistency
KW - Martingales
KW - Sampling
UR - http://www.scopus.com/inward/record.url?scp=85065022122&partnerID=8YFLogxK
U2 - 10.1007/s41019-019-0084-x
DO - 10.1007/s41019-019-0084-x
M3 - Article
AN - SCOPUS:85065022122
SN - 2364-1185
VL - 4
SP - 44
EP - 60
JO - Data Science and Engineering
JF - Data Science and Engineering
IS - 1
ER -