This paper studies the estimation of stochastic block models with Rissanen's minimum description length (MDL) principle in the dense-graph asymptotic regime. We focus on the problem of model specification, i.e., identification of the number of blocks. Refinements of the true partition always decrease the code part corresponding to the edge placement, so the corresponding increase in the code part specifying the model must outweigh that gain for the minimum to be attained at the true partition. The balance between these two effects turns out to be delicate. We show that the MDL principle identifies the true partition among models whose relative block sizes are bounded away from zero. The results are extended to models with Poisson-distributed edge weights.
- Minimum description length principle
- random graph
- regular decomposition
- Szemerédi's regularity lemma
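
The trade-off described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's coding scheme: it assumes a plug-in (empirical entropy) code for the edge placement, and the names `edge_code_length` and `sample_sbm`, the graph size, and the densities are hypothetical choices made for illustration. It checks that splitting a true block does not lengthen the edge part of the code. The model part of the code, whose growth must outweigh this gain, is deliberately left out because its exact form is not stated here.

```python
import math
import random
from itertools import combinations


def binary_entropy(p):
    """Shannon entropy of a Bernoulli(p) variable, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def edge_code_length(adj, labels):
    """Plug-in code length (bits) of the edge placement given a partition.

    Assumption: node pairs in each block pair are coded with the empirical
    edge density of that block pair, costing n_pairs * H(density) bits.
    """
    counts = {}  # (block_a, block_b) -> [number of node pairs, number of edges]
    for i, j in combinations(range(len(labels)), 2):
        key = tuple(sorted((labels[i], labels[j])))
        entry = counts.setdefault(key, [0, 0])
        entry[0] += 1
        entry[1] += adj[i][j]
    return sum(m * binary_entropy(e / m) for m, e in counts.values())


def sample_sbm(labels, density, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model."""
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < density[labels[i]][labels[j]]:
            adj[i][j] = adj[j][i] = 1
    return adj


rng = random.Random(0)
true_labels = [0] * 30 + [1] * 30      # two true blocks of equal size
density = [[0.7, 0.2], [0.2, 0.6]]     # illustrative block densities
adj = sample_sbm(true_labels, density, rng)

# Refinement: split true block 0 arbitrarily into two halves (labels 0 and 2).
refined_labels = [0] * 15 + [2] * 15 + [1] * 30

coarse_bits = edge_code_length(adj, true_labels)
refined_bits = edge_code_length(adj, refined_labels)
gain = coarse_bits - refined_bits      # bits saved on the edge part

print(f"edge code, true partition:    {coarse_bits:.1f} bits")
print(f"edge code, refined partition: {refined_bits:.1f} bits")
print(f"gain from refinement:         {gain:.1f} bits")
```

Under this plug-in code the gain is non-negative for any refinement, by concavity of the entropy function. Any model-selection criterion of this kind therefore has to charge the extra block at least that many bits, which is the balance the abstract refers to.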