MatSub: A performance-oriented subgroup discovery framework for materials informatics

Published: 12 May 2026 | Version 1 | DOI: 10.17632/9633bmtj4x.1

Description

This manuscript introduces MatSub, an open-access software package for applying Subgroup Discovery (SGD) algorithms in machine learning and data-driven scientific discovery. Its key contribution is a set of novel quality functions tailored to materials informatics. Existing SGD algorithms for numerical targets often emphasize statistical exceptionality, whereas materials research typically seeks subgroups with extreme or optimal property values. To address this gap, MatSub incorporates quality functions that (1) guide the search toward subgroups that maximize or minimize a target property, (2) enforce performance-based boundary constraints to filter out undesired materials, (3) promote orthogonal subgroup discovery to reveal multiple, physically distinct mechanisms affecting material behavior, and (4) enable multitask subgroup discovery to capture subgroups that simultaneously satisfy multiple property requirements. We demonstrate these quality functions in a case study on the segregation energies of single-atom alloy catalysts (SAACs), in which MatSub identifies diverse and interpretable subgroups linked to distinct electronic and bonding characteristics. These results highlight the software’s ability to support mechanism-aware analysis and accelerate hypothesis generation in materials science and beyond.
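
As an illustration of items (1) and (2) above, the following is a minimal Python sketch of a quality function that rewards subgroups with extreme target values and rejects subgroups that violate a performance bound. It is not MatSub's API or its actual quality functions: the function name extreme_value_quality, the parameters alpha and bound, the coverage-weighted mean-shift formula, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def extreme_value_quality(targets, mask, alpha=0.5, minimize=True, bound=None):
    """Hypothetical quality function (not taken from MatSub).

    Scores a subgroup by coverage**alpha times the shift of its mean target
    away from the population mean in the desired direction. If `bound` is
    given, any member on the wrong side of it sets the score to zero.
    """
    sub = [t for t, selected in zip(targets, mask) if selected]
    if not sub:
        return 0.0

    # Assumed form of a performance-based boundary constraint.
    if bound is not None:
        if minimize and any(t > bound for t in sub):
            return 0.0
        if not minimize and any(t < bound for t in sub):
            return 0.0

    # Positive shift means the subgroup mean lies in the desired direction.
    shift = mean(targets) - mean(sub)
    if not minimize:
        shift = -shift

    coverage = len(sub) / len(targets)
    return coverage ** alpha * max(shift, 0.0)


# Toy data: invented target values and an invented binary selector.
energies = [-0.8, -0.6, 0.1, 0.4, 0.9, -0.7]
selector = [True, True, False, False, False, True]

q_unbounded = extreme_value_quality(energies, selector)
q_bounded = extreme_value_quality(energies, selector, bound=-0.75)
```

In this sketch the exponent alpha trades subgroup size against how extreme the subgroup mean is: alpha = 0 ignores size entirely, while larger values favor broader subgroups. With the toy data, the bounded call returns zero because one selected value (-0.6) lies above the assumed bound of -0.75.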

Categories

Materials Science, Physical Chemistry, Condensed Matter Physics, Molecular Physics, Computational Physics
