VSMI2-PANet: Versatile Scale-Malleable Image Integration and Patch Wise Attention Network With Transformer for Lung Tumour Segmentation Using Multi-Modal Imaging Techniques

  • Nayef Alqahtani
  • Arfat Ahmad Khan
  • Rakesh Kumar Mahendran
  • Muhammad Faheem*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Lung cancer (LC) accounts for some of the highest mortality rates among cancers worldwide. Clinicians use several imaging modalities to identify lung tumours and assess their severity at an early stage. Machine learning (ML) and deep learning (DL) methods are now widely applied to the robust detection and prediction of lung tumours, and multi-modal imaging has recently emerged as an effective technique that combines features from multiple imaging sources. Building on this, we propose a novel multi-modal imaging technique named the versatile scale-malleable image integration and patch-wise attention network (VSMI2-PANet), which adopts three imaging modalities: computed tomography (CT), magnetic resonance imaging (MRI) and single photon emission computed tomography (SPECT). The designed model accepts CT and MRI images as input and passes them to the VSMI2-PANet module, which is composed of three sub-modules: an image cropping module, a scale-malleable convolution layer (SMCL) and a PANet module. The CT and MRI images pass through the image cropping module in parallel, which crops meaningful image patches and provides them to the SMCL module. The SMCL module is composed of adaptive convolutional layers that examine those patches in parallel while preserving spatial information. The outputs of the SMCL are then fused and provided to the PANet module, which examines the fused patches along the height, width and channel dimensions of each image patch. As a result, it produces high-resolution spatial attention maps indicating the locations of suspicious tumours. These attention maps are then provided as input to the backbone module, which uses a light wave transformer (LWT) to segment the lung tumours into three classes: normal, benign and malignant. In addition, the LWT accepts a SPECT image as input to capture subtle variations precisely when segmenting the lung tumours.
The performance of the proposed model is validated using several metrics, including accuracy, precision, recall, F1-score and the area under the ROC curve (AUC), and the results show that the proposed work outperforms existing approaches.
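The patch-wise attention idea described above, weighting a fused CT/MRI feature patch along its channel, height and width dimensions to produce a spatial attention map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the element-wise averaging fusion, the mean-pooling per axis and the function names here are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along one axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_wise_attention(fused):
    """Weight a fused (C, H, W) feature patch along channel,
    height and width, as the PANet module is described to do.
    The pooling/broadcast scheme here is an assumption."""
    ch = softmax(fused.mean(axis=(1, 2)), axis=0)  # (C,) channel weights
    h = softmax(fused.mean(axis=(0, 2)), axis=0)   # (H,) height weights
    w = softmax(fused.mean(axis=(0, 1)), axis=0)   # (W,) width weights
    # outer product of the three weight vectors -> (C, H, W) attention
    attn = ch[:, None, None] * h[None, :, None] * w[None, None, :]
    return fused * attn  # spatial attention map over the patch

rng = np.random.default_rng(0)
ct_feat = rng.standard_normal((8, 16, 16))   # hypothetical SMCL output (CT)
mri_feat = rng.standard_normal((8, 16, 16))  # hypothetical SMCL output (MRI)
fused = (ct_feat + mri_feat) / 2  # simple element-wise fusion (assumption)
maps = patch_wise_attention(fused)
print(maps.shape)  # (8, 16, 16)
```

In the full model, these attention maps would be produced per cropped patch and fed to the LWT backbone for segmentation; here the sketch only shows the per-axis weighting step.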

Original language: English
Pages (from-to): 1376-1393
Number of pages: 18
Journal: CAAI Transactions on Intelligence Technology
Volume: 10
Issue number: 5
DOIs
Publication status: Published - Oct 2025
MoE publication type: A1 Journal article-refereed

Funding

The work of Muhammad Faheem is supported by the VTT Technical Research Centre of Finland and the work of Nayef Alqahtani is supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant KFU251882).

Keywords

  • computational intelligence
  • computer vision
  • data fusion
  • deep learning
  • feature extraction
  • image segmentation
