Abstract
In this paper, we introduce the general framework of knowledge-enriched machine learning for encoding and leveraging problem-specific deterministic knowledge, such as column descriptions in the tabular setting. We focus on a paradigmatic use case: supervised learning on tabular data. As a first step in this direction, we introduce a simple yet flexible encoding of such deterministic information in the form of concept kernels, and describe meta-algorithms that leverage this particular encoding of prior knowledge. To ground future research, we introduce KE-TALENT, a novel benchmarking suite for kernel-enriched supervised learning on tabular data. It is adapted from the recently introduced TALENT benchmark to include concept kernels and other metadata for each dataset. Finally, to demonstrate the benefits of concept kernels, we report results for several kernel-enriched versions of existing algorithms, which are also intended as baselines for future research. Code is publicly available.
The International Conference on Neuro-symbolic Systems