SeerCuts: Explainable Attribute Discretization
Author(s)
Lai, Eugenie; Croitoru, Inbal; Bitton, Noam; Shalem, Ariel; Youngmann, Brit; Galhotra, Sainyam; Rezig, El Kindi; Cafarella, Michael
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
This demonstration showcases SeerCuts, a tool that suggests useful and semantically meaningful discretization strategies (partitions) for numerical attributes. SeerCuts is a generic, interactive framework in which users specify the attributes to discretize and a utility measure for a downstream task of their choice. It uses GPT-4o to assess the semantic meaningfulness of candidate partitions and employs an efficient search strategy to explore the vast space of discretization options. By using hierarchical clustering to group related partitions and a multi-armed bandit policy to identify useful partitions from only a few samples, SeerCuts quickly finds meaningful and useful partitions. In the demo, we will provide an overview of SeerCuts and allow the audience to explore various datasets and tasks, including data visualization and comprehensive modeling. Users will be able to evaluate how SeerCuts identifies meaningful discretization strategies and compare the tradeoffs among different discretization options.
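The search idea in the abstract (group related partitions, then let a bandit policy spend a small sampling budget on the most promising groups) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes UCB1 as the bandit policy, which the abstract does not specify; the clusters are hand-written rather than produced by hierarchical clustering; and `toy_score`, `pull`, and `ucb1_select` are hypothetical names, with `toy_score` standing in for the combination of task utility and the GPT-4o semantic rating.

```python
import math
import random

# Hypothetical clusters of candidate partitions for an "age" attribute.
# Each partition is a list of bin edges; the grouping is hand-made here,
# whereas SeerCuts derives it with hierarchical clustering.
clusters = {
    "coarse": [[0, 50, 100], [0, 40, 100]],
    "decades": [[0, 20, 40, 60, 80, 100], [0, 30, 60, 100]],
    "life_stages": [[0, 18, 35, 65, 100], [0, 18, 65, 100]],
}

def toy_score(edges):
    """Placeholder score in [0, 1] (assumption): rewards edges at 18 and 65."""
    return 0.3 + 0.3 * (18 in edges) + 0.3 * (65 in edges)

def pull(name, rng):
    """Sample one partition from a cluster and return a noisy score."""
    edges = rng.choice(clusters[name])
    noise = rng.uniform(-0.1, 0.1)  # stands in for sampling variance
    return min(1.0, max(0.0, toy_score(edges) + noise))

def ucb1_select(arms, pull_fn, budget, seed=0):
    """Run UCB1 over `arms`; return the arm with the best mean and pull counts."""
    if budget < len(arms):
        raise ValueError("budget must allow one pull per arm")
    rng = random.Random(seed)
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    for t in range(1, budget + 1):
        if t <= len(arms):
            i = t - 1  # initial round: pull every arm once
        else:
            # Pick the arm with the highest mean plus exploration bonus.
            i = max(
                range(len(arms)),
                key=lambda k: totals[k] / counts[k]
                + math.sqrt(2 * math.log(t) / counts[k]),
            )
        totals[i] += pull_fn(arms[i], rng)
        counts[i] += 1
    best = max(range(len(arms)), key=lambda k: totals[k] / counts[k])
    return arms[best], counts

best, counts = ucb1_select(list(clusters), pull, budget=200)
print(best, counts)
```

In this toy setup the `life_stages` cluster is selected, because the placeholder score favors bin edges at 18 and 65. A real utility measure and semantic rating would replace `toy_score`.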
Description
SIGMOD-Companion ’25, Berlin, Germany
Date issued
2025-06-22
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM|Companion of the 2025 International Conference on Management of Data
Citation
Eugenie Lai, Inbal Croitoru, Noam Bitton, Ariel Shalem, Brit Youngmann, Sainyam Galhotra, El Kindi Rezig, and Michael Cafarella. 2025. SeerCuts: Explainable Attribute Discretization. In Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25). Association for Computing Machinery, New York, NY, USA, 143–146.
Version: Final published version
ISBN
979-8-4007-1564-8