A regression subset-selection strategy for fat-structure data

Christian Gatu*, Marko Sysi-Aho, Matej Orešič

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

A strategy is proposed for finding the most significant linear regression submodel for fat-structure data, that is, data in which the number of variables n exceeds the number of available observations m. The method consists of two stages. In the first stage, a heuristic preselects nS variables such that nS ≤ m. The second stage performs an exhaustive search on the reduced list of variables, using a regression tree structure that generates all possible subset models. Non-optimal subtrees are pruned using a branch-and-bound device. Cross-validation experiments on a real biomedical dataset are presented and analyzed.
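The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the preselection heuristic, so ranking variables by their normalized inner product with the response is an assumption made here. Searching for a single subset size p, fitting without an intercept, and the function names rss, preselect and best_subset are likewise hypothetical choices for illustration. The pruning rule relies only on the fact that deleting variables from a least-squares model can never lower its residual sum of squares (RSS).

```python
def rss(X, y, cols):
    """RSS of the least-squares fit of y on the columns `cols` of X.

    No intercept is fitted; the chosen columns are assumed linearly independent.
    """
    k = len(cols)
    if k == 0:
        return float(sum(v * v for v in y))
    # Normal equations (Xs'Xs) b = Xs'y, stored as an augmented matrix.
    M = [[sum(row[i] * row[j] for row in X) for j in cols]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in cols]
    # Gaussian elimination with partial pivoting.
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, k):
            f = M[r][p] / M[p][p]
            for q in range(p, k + 1):
                M[r][q] -= f * M[p][q]
    # Back substitution.
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (M[p][k] - sum(M[p][q] * b[q] for q in range(p + 1, k))) / M[p][p]
    return sum((yi - sum(row[j] * bj for j, bj in zip(cols, b))) ** 2
               for row, yi in zip(X, y))


def preselect(X, y, n_s):
    """Stage 1 (assumed heuristic): keep the n_s columns with largest |x_j'y| / ||x_j||."""
    def score(j):
        num = abs(sum(row[j] * yi for row, yi in zip(X, y)))
        den = sum(row[j] ** 2 for row in X) ** 0.5
        return num / den if den else 0.0
    ranked = sorted(range(len(X[0])), key=score, reverse=True)
    return sorted(ranked[:n_s])


def best_subset(X, y, candidates, p):
    """Stage 2: branch-and-bound search for the size-p subset with the smallest RSS."""
    if not 0 <= p <= len(candidates):
        raise ValueError("p must lie between 0 and the number of candidates")
    best = {"rss": float("inf"), "cols": None}

    def visit(cols, k):
        # Positions before k are fixed; children delete one variable at position >= k.
        r = rss(X, y, cols)
        if r >= best["rss"]:
            return  # bound: every subset of `cols` has RSS >= r, so prune this subtree
        if len(cols) == p:
            best["rss"], best["cols"] = r, cols
            return
        # Deleting at a position beyond p would fix more than p variables.
        for i in range(k, min(len(cols), p + 1)):
            visit(cols[:i] + cols[i + 1:], i)

    visit(list(candidates), 0)
    return best["cols"], best["rss"]
```

In this sketch X is a list of m rows with n entries each, and a typical call would be best_subset(X, y, preselect(X, y, n_s), p) with n_s ≤ m. Because deletions are applied in increasing position order, each subset is generated at most once, and a subtree is skipped as soon as its root model is already no better than the best size-p model found so far.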
Original language: English
Title of host publication: COMPSTAT 2008
Subtitle of host publication: Proceedings in Computational Statistics
Editors: Paula Brito
Place of Publication: Heidelberg
Publisher: Physica Verlag
Pages: 349-358
ISBN (Electronic): 978-3-7908-2084-3
ISBN (Print): 978-3-7908-2083-6
Publication status: Published - 2008
MoE publication type: A4 Article in a conference publication
Event: 18th Symposium on Computational Statistics - Porto, Portugal
Duration: 24 Aug 2008 to 29 Aug 2008

Conference

Conference: 18th Symposium on Computational Statistics
Country/Territory: Portugal
City: Porto
Period: 24/08/08 to 29/08/08

Keywords

  • regression tree
  • branch-and-bound
  • model selection
  • fat-structure data
