The last decade has witnessed the rise of a black-box society in which classification models that hide the logic of their internal decision processes are widely adopted because of their high accuracy. In this paper, we propose FEHAN, a modularized Framework for Explaining HiErarchical Attention Networks trained to classify text data. Given a document, FEHAN extracts the sentences most relevant to the assigned class. It then generates a set of similar sentences using a Markov chain text generator and replaces the salient sentences with the synthetic ones, resulting in a new set of semantically similar documents in the vicinity of the given instance. The generated documents are used to train an interpretable decision tree that identifies the words explaining the classification outcome. A quick inspection of these synthetic documents and their salient words helps explain why the black box assigned a given class to a document. We performed qualitative and quantitative evaluations of FEHAN and a baseline on four different datasets to show the effectiveness of our proposal.
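The neighborhood-generation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the FEHAN implementation: the function names (`build_bigram_chain`, `generate_sentence`, `make_neighbors`) are hypothetical, the Markov chain is a simple first-order word model, and the indices of the salient sentences are taken as given (in FEHAN they would come from the hierarchical attention network). The decision-tree step is not shown.

```python
import random
from collections import defaultdict


def build_bigram_chain(sentences):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_sentence(chain, start_word, max_words, rng):
    """Random walk over the chain, stopping at max_words or a dead end."""
    words = [start_word]
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


def make_neighbors(document, salient_indices, corpus_sentences,
                   n_neighbors=5, seed=0):
    """Build synthetic documents by replacing only the salient sentences.

    `document` is a list of non-empty sentence strings; `salient_indices`
    is assumed to be supplied by the black-box model's attention weights.
    """
    rng = random.Random(seed)
    chain = build_bigram_chain(corpus_sentences)
    neighbors = []
    for _ in range(n_neighbors):
        variant = list(document)  # copy so the original stays untouched
        for i in salient_indices:
            original_words = document[i].split()
            variant[i] = generate_sentence(
                chain, original_words[0], len(original_words), rng
            )
        neighbors.append(variant)
    return neighbors


# Toy usage: sentence 0 is assumed to be the salient one.
document = ["the movie was great", "i went on monday"]
corpus = ["the movie was great fun", "the plot was dull", "the acting was great"]
neighbors = make_neighbors(document, [0], corpus, n_neighbors=3)
```

Each synthetic document keeps the non-salient sentences unchanged, so the resulting set stays close to the original instance. These documents, labeled by the black-box classifier, would then serve as training data for the interpretable decision tree.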
Article ID: 2021L18
Venue: Canadian Conference on Artificial Intelligence
Publisher: Canadian Artificial Intelligence Association