### Abstract

The efficiency of the otherwise expedient decision tree learning can be impaired in processing data-mining-sized data if superlinear-time processing is required in attribute selection. An example of such a technique is optimal multisplitting of numerical attributes. Its efficiency is hit hard even by a single troublesome attribute in the domain. Analysis shows that there is a direct connection between the ratio of the numbers of boundary points and training examples and the maximum goodness score of a numerical attribute. Class distribution information from preprocessing can be applied to obtain tighter bounds for an attribute's relevance in class prediction. These analytical bounds, however, are too loose for practical purposes. We experiment with heuristic methods which postpone the evaluation of attributes that have a high number of boundary points. The results show that substantial time savings can be obtained in the most critical data sets without having to give up on the accuracy of the resulting classifier.
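The abstract does not spell out the heuristics themselves, so the following is only a rough illustration of the quantity they rely on. It assumes the common definition of a boundary point (a cut between two adjacent distinct attribute values whose examples do not all belong to one and the same class) and a hypothetical ordering rule that simply ranks attributes by their boundary-point ratio. The function names `count_boundary_points` and `evaluation_order` are invented for this sketch and are not taken from the paper.

```python
from collections import defaultdict


def count_boundary_points(values, labels):
    """Count cuts between adjacent distinct values that separate mixed classes.

    Assumption: a cut is a boundary point unless the examples on both
    neighbouring values all share one single class.
    """
    classes_at = defaultdict(set)
    for value, label in zip(values, labels):
        classes_at[value].add(label)

    ordered = sorted(classes_at)
    count = 0
    for low, high in zip(ordered, ordered[1:]):
        # More than one class across the two neighbours means a boundary point.
        if len(classes_at[low] | classes_at[high]) > 1:
            count += 1
    return count


def evaluation_order(columns, labels):
    """Return attribute names sorted by boundary-point ratio, lowest first.

    Hypothetical rule: attributes late in the list are the ones whose
    evaluation would be postponed.
    """
    n = len(labels)
    ratio = {
        name: count_boundary_points(values, labels) / n
        for name, values in columns.items()
    }
    return sorted(ratio, key=ratio.get)
```

Under this reading, an attribute whose sorted values alternate between classes has nearly as many boundary points as examples and would be evaluated last, while an attribute with a few long single-class runs would be evaluated first.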

Original language | English |
---|---|
Title of host publication | Principles of Data Mining and Knowledge Discovery |
Subtitle of host publication | Second European Symposium, PKDD ’98 |
Publisher | Springer |
Pages | 221-229 |
ISBN (Electronic) | 978-3-540-49687-8 |
ISBN (Print) | 978-3-540-65068-3 |
Publication status | Published - 1998 |
MoE publication type | A4 Article in a conference publication |
Event | 2nd Eur. Symp., PKDD'98. Principles of Data Mining and Knowledge Discovery. Nantes, 23 - 26 Sept. 1998 - Duration: 1 Jan 1998 → … |

### Publication series

Series | Lecture Notes in Computer Science |
---|---|
Volume | 1510 |
ISSN | 0302-9743 |

### Conference

Conference | 2nd Eur. Symp., PKDD'98. Principles of Data Mining and Knowledge Discovery. Nantes, 23 - 26 Sept. 1998 |
---|---|
Period | 1/01/98 → … |

## Cite this

Elomaa, T., & Rousu, J. (1998). Postponing the evaluation of attributes with a high number of boundary points. In *Principles of Data Mining and Knowledge Discovery: Second European Symposium, PKDD ’98* (pp. 221-229). Springer. Lecture Notes in Computer Science, Vol. 1510