Abstract
A strategy is proposed for finding the most significant linear regression submodel for fat-structure data, that is, data in which the number of variables n exceeds the number of available observations m. The method consists of two stages. In the first stage, a heuristic preselects n_S variables such that n_S ≤ m. The second stage performs an exhaustive search over the reduced list of variables, using a regression tree structure that generates all possible subset models. Non-optimal subtrees are pruned using a branch-and-bound device. Cross-validation experiments on a real biomedical dataset are presented and analyzed.
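The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the preselection heuristic or the selection criterion, so the sketch assumes a simple ranking by absolute (uncentered) correlation with the response for stage one, and the residual sum of squares (RSS) of a no-intercept least-squares fit as the criterion for stage two. All function names (`preselect`, `best_subsets`, `rss`, `exhaustive_rss`) and the toy data are hypothetical. The pruning rule relies on the fact that adding variables never increases the RSS, so the RSS of the largest model in a subtree is a lower bound for every submodel in that subtree.

```python
import random
from itertools import combinations

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def rss(columns, y):
    """RSS of the least-squares fit of y on `columns` (no intercept),
    computed by modified Gram-Schmidt orthogonalization."""
    basis, r = [], list(y)
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm2 = dot(v, v)
        if norm2 < 1e-12:          # linearly dependent column adds nothing
            continue
        q = [vi / norm2 ** 0.5 for vi in v]
        basis.append(q)
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return dot(r, r)

def preselect(X_cols, y, n_s):
    """Stage 1 (assumed heuristic): keep the n_s columns most correlated with y."""
    def score(col):
        d = dot(col, col) * dot(y, y)
        return abs(dot(col, y)) / d ** 0.5 if d > 0 else 0.0
    ranked = sorted(range(len(X_cols)), key=lambda j: score(X_cols[j]), reverse=True)
    return sorted(ranked[:n_s])

def best_subsets(X_cols, y, candidates):
    """Stage 2: best (RSS, subset) for every subset size, with bound-based pruning."""
    best = {k: (float("inf"), None) for k in range(1, len(candidates) + 1)}

    def visit(chosen, remaining):
        # This node covers every subset S with chosen <= S <= chosen + remaining.
        full = chosen + remaining
        bound = rss([X_cols[j] for j in full], y)   # lower bound for the subtree
        sizes = range(max(1, len(chosen)), len(full) + 1)
        if all(bound >= best[k][0] for k in sizes):
            return                                   # prune: nothing here can improve
        if chosen:
            r = rss([X_cols[j] for j in chosen], y)
            if r < best[len(chosen)][0]:
                best[len(chosen)] = (r, tuple(chosen))
        for i, j in enumerate(remaining):
            visit(chosen + [j], remaining[i + 1:])

    visit([], list(candidates))
    return best

def exhaustive_rss(X_cols, y, candidates, k):
    """Unpruned reference: smallest RSS over all size-k subsets."""
    return min(rss([X_cols[j] for j in S], y) for S in combinations(candidates, k))

# Toy fat-structure data: m = 8 observations, n = 20 variables (made up).
random.seed(0)
m, n = 8, 20
X_cols = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
y = [2 * X_cols[3][i] - X_cols[11][i] + 0.1 * random.gauss(0, 1) for i in range(m)]

selected = preselect(X_cols, y, n_s=6)       # stage 1: n_S <= m
best = best_subsets(X_cols, y, selected)     # stage 2: pruned exhaustive search
for k in sorted(best):
    print(k, best[k][1], round(best[k][0], 4))
```

Because the depth-first traversal reaches the largest model early, a finite incumbent exists for every subset size almost immediately, which is what makes the bound useful for pruning. The `exhaustive_rss` helper is included only as an unpruned reference for checking that pruning does not change the result.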
| Original language | English |
|---|---|
| Title of host publication | COMPSTAT 2008 |
| Subtitle of host publication | Proceedings in Computational Statistics |
| Editors | Paula Brito |
| Place of Publication | Heidelberg |
| Publisher | Physica Verlag |
| Pages | 349-358 |
| ISBN (Electronic) | 978-3-7908-2084-3 |
| ISBN (Print) | 978-3-7908-2083-6 |
| DOIs | |
| Publication status | Published - 2008 |
| MoE publication type | A4 Article in a conference publication |
| Event | 18th Symposium on Computational Statistics, Porto, Portugal (24 Aug 2008 → 29 Aug 2008) |
Conference
| Conference | 18th Symposium on Computational Statistics |
|---|---|
| Country/Territory | Portugal |
| City | Porto |
| Period | 24/08/08 → 29/08/08 |
Keywords
- regression tree
- branch-and-bound
- model selection
- fat-structure data