MSTN: A Multi-granular Spatial–Temporal Network for video-based person re-identification

Wei Zhao (Corresponding Author), Bo Zhang, Cong Yang, Xianfu Chen, Hui Chen

Research output: Contribution to journal › Article › Scientific › peer-review


With the popularization of surveillance cameras, public-safety applications that rely on video-based person re-identification (Re-ID) are thriving. Re-ID aims to accurately identify a person of interest across video sequences from multiple cameras. Existing methods usually focus either on spatially salient regions or on temporal features among frames at fixed intervals (i.e., either short- or long-term temporal features), which leaves the remaining features under-utilized and limits identification accuracy. To achieve high Re-ID accuracy, we propose a novel framework termed Multi-granular Spatial–Temporal Network (MSTN), which facilitates full utilization of spatial–temporal features for video-based person Re-ID. Within MSTN, a Temporal Kernel Attention (TKA) module is proposed to adaptively capture both short- and long-term temporal relationships; a Feature Disentanglement Spatial Attention (FDSA) module is further proposed to mine both spatially salient and subtle features. Extensive experiments on the MARS dataset demonstrate that MSTN achieves high identification accuracy, reaching 86.1% mAP and 91.0% Rank-1, notably higher than the state-of-the-art schemes it is compared against.
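
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Rank-1 and mAP are commonly computed from a query-to-gallery distance matrix. It is not the authors' evaluation code: the function names, the toy distances and the identity labels are all hypothetical. Re-ID benchmarks typically also exclude gallery samples that share both identity and camera with the query, and that step is omitted here for brevity.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_matches: Sequence[bool]) -> float:
    """AP for one query: mean of the precision values at each true-match rank."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1, mAP) over all queries that have a match in the gallery.

    Assumption: smaller distance means more similar. Camera-based filtering
    is deliberately left out of this sketch.
    """
    rank1_hits = 0
    ap_values: List[float] = []
    for q_idx, q_id in enumerate(query_ids):
        # Sort gallery indices from most to least similar for this query.
        order = sorted(range(len(gallery_ids)), key=lambda g: dist[q_idx][g])
        matches = [gallery_ids[g] == q_id for g in order]
        if not any(matches):
            continue  # a query with no true match cannot be scored
        rank1_hits += int(matches[0])
        ap_values.append(average_precision(matches))
    if not ap_values:
        raise ValueError("no query has a matching identity in the gallery")
    return rank1_hits / len(ap_values), sum(ap_values) / len(ap_values)


# Toy data (hypothetical): 2 queries, 3 gallery tracklets.
toy_dist = [[0.5, 0.1, 0.4], [0.8, 0.2, 0.3]]
toy_query_ids = [1, 2]
toy_gallery_ids = [1, 2, 1]
rank1, mean_ap = evaluate(toy_dist, toy_query_ids, toy_gallery_ids)
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

In the toy data the first query ranks a wrong identity first, so it contributes nothing to Rank-1 but still earns partial credit in mAP for retrieving both true matches at ranks 2 and 3.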

Original language: English
Article number: 100633
Journal: Internet of Things (Netherlands)
Publication status: Published - Nov 2022
MoE publication type: A1 Journal article-refereed


  • 3D convolution
  • Attention module
  • Video person Re-ID

