SeerCuts: Explainable Attribute Discretization

Lai, Eugenie; Croitoru, Inbal; Bitton, Noam; Shalem, Ariel; Youngmann, Brit; Galhotra, Sainyam; Rezig, El Kindi; Cafarella, Michael

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/

Abstract

This demonstration showcases SeerCuts, a tool that suggests useful and semantically meaningful discretization strategies (partitions) for numerical attributes. SeerCuts is a generic, interactive framework in which users specify the attributes to discretize and a utility measure for a downstream task of their choice. It uses GPT-4o to assess the semantic meaningfulness of candidate partitions and employs an efficient search strategy to explore the vast space of discretization options. By using hierarchical clustering to group related partitions and a multi-armed bandit policy to identify useful partitions from only a few samples, SeerCuts quickly finds partitions that are both meaningful and useful. In the demo, we will provide an overview of SeerCuts and allow the audience to explore various datasets and tasks, including data visualization and comprehensive modeling. Users will be able to evaluate how SeerCuts identifies meaningful discretization strategies and compare the tradeoffs among different discretization options.
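
The search idea in the abstract (group candidate partitions, then let a bandit decide which group to sample next) can be illustrated with a minimal sketch. This is not the SeerCuts implementation: the abstract does not name the bandit policy, so UCB1 is assumed here; the clusters are grouped by hand rather than by hierarchical clustering; the GPT-4o semantic score is omitted; and the data, the `utility` function, and all identifiers are hypothetical.

```python
import math
import random
from collections import Counter

def discretize(values, cuts):
    """Map each value to the index of the bin defined by sorted cut points."""
    return [sum(v >= c for c in cuts) for v in values]

def ucb1_search(clusters, utility, budget, seed=0):
    """UCB1 over clusters of candidate partitions (one arm per cluster).

    clusters: list of lists of partitions (a partition is a tuple of cuts)
    utility:  hypothetical callable that scores a partition
    budget:   total number of utility evaluations allowed
    """
    rng = random.Random(seed)
    pulls = [0] * len(clusters)
    total = [0.0] * len(clusters)
    best, best_score = None, float("-inf")
    for t in range(1, budget + 1):
        if t <= len(clusters):
            arm = t - 1  # play every arm once before using the UCB rule
        else:
            arm = max(
                range(len(clusters)),
                key=lambda a: total[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        partition = rng.choice(clusters[arm])  # sample one partition from the cluster
        score = utility(partition)
        pulls[arm] += 1
        total[arm] += score
        if score > best_score:
            best, best_score = partition, score
    return best, best_score, pulls

# Toy data (assumed): ages and a binary label.
ages = [5, 12, 17, 25, 34, 45, 58, 63, 71, 80]
labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

def utility(cuts):
    """Toy utility: majority-label accuracy per bin, minus a small penalty per cut."""
    by_bin = {}
    for b, y in zip(discretize(ages, cuts), labels):
        by_bin.setdefault(b, Counter())[y] += 1
    correct = sum(max(c.values()) for c in by_bin.values())
    return correct / len(ages) - 0.02 * len(cuts)

# Hand-made clusters of related partitions (stand-in for hierarchical clustering).
clusters = [
    [(18, 65), (20, 60)],              # life-stage style cuts
    [(10, 20, 30, 40, 50, 60, 70)],    # fine-grained decade cuts
    [(40,), (50,)],                    # single-threshold cuts
]

best, best_score, pulls = ucb1_search(clusters, utility, budget=30)
print(best, round(best_score, 2), pulls)
```

Treating each cluster as an arm means that an evaluation of one partition also informs the estimate for its neighbors, which is one plausible reason a small number of samples can suffice. A real system would replace the toy utility with the user's downstream measure and combine it with a semantic score.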

Description

SIGMOD-Companion ’25, Berlin, Germany

Date issued

2025-06-22

URI

https://hdl.handle.net/1721.1/164768

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory

Publisher

ACM | Companion of the 2025 International Conference on Management of Data

Citation

Eugenie Lai, Inbal Croitoru, Noam Bitton, Ariel Shalem, Brit Youngmann, Sainyam Galhotra, El Kindi Rezig, and Michael Cafarella. 2025. SeerCuts: Explainable Attribute Discretization. In Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25). Association for Computing Machinery, New York, NY, USA, 143–146.

Version: Final published version

ISBN

979-8-4007-1564-8

Collections

MIT Open Access Articles