A Modularized Framework for Explaining Black Box Classifiers for Text Data

Mahtab Sarvmaili; Riccardo Guidotti; Anna Monreale; Amilcar Soares; Zahra Sadeghi; Fosca Giannotti; Dino Pedreschi; Stan Matwin

doi:10.21428/594757db.7e75dcdf

The sheer volume of textual data produced on social media and in modern digital life makes automatic decision systems necessary for acting on text. The most widely adopted natural language processing approaches achieve high accuracy but are black-box systems that hide the logic of their internal decision processes. Because many applications require the reasons behind the classification of a text to be revealed, the need to explain black-box behaviour is growing among scientists. We therefore propose a local, model-agnostic method for interpreting text classifiers. Our method explains the decision of a text classifier on a given document by generating similar samples in its vicinity. The new samples are generated by replacing words of the document under analysis with their synonyms, antonyms, hyponyms, hypernyms, and definitions. These synthetic texts are then used to train a decision tree that enables the user to identify the important words explaining the classification outcome. An inspection of the synthetic documents generated by our proposal, together with a set of appropriately highlighted words, explains why the black box assigns a certain label to a given document. Extensive experimentation on various datasets and classifiers shows the effectiveness of our proposal and that it outperforms state-of-the-art methods.
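The pipeline described in the abstract (perturb the document by word replacement, label the perturbed texts with the black box, fit an interpretable surrogate) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the tiny `REPLACEMENTS` lexicon stands in for a real lexical resource, `black_box` is a toy classifier, the neighbourhood is enumerated exhaustively rather than sampled, and a single information-gain split (the root of a decision tree) stands in for the full decision tree.

```python
import itertools
import math

# Hypothetical toy lexicon standing in for a lexical database of
# synonyms, antonyms, hyponyms, and hypernyms (not from the paper).
REPLACEMENTS = {
    "great": ["excellent", "terrible"],  # a synonym and an antonym
    "movie": ["film", "show"],           # a synonym and a hypernym
    "plot": ["story", "storyline"],      # two synonyms
}


def black_box(text):
    """Toy stand-in for the classifier being explained (1 = positive)."""
    return 1 if {"great", "excellent"} & set(text.split()) else 0


def generate_neighbors(text):
    """Build every text obtained by keeping or replacing each word.

    Assumption: exhaustive enumeration is used only because the toy
    example is small; a realistic neighbourhood would be sampled.
    """
    options = [[tok] + REPLACEMENTS.get(tok, []) for tok in text.split()]
    return [" ".join(combo) for combo in itertools.product(*options)]


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain(word, texts, labels):
    """Gain from splitting the neighbourhood on presence of `word`."""
    present = [y for t, y in zip(texts, labels) if word in t.split()]
    absent = [y for t, y in zip(texts, labels) if word not in t.split()]
    n = len(labels)
    return (entropy(labels)
            - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))


def explain(text):
    """Return the most informative original word and all gains."""
    neighbors = generate_neighbors(text)
    labels = [black_box(t) for t in neighbors]
    gains = {w: information_gain(w, neighbors, labels) for w in text.split()}
    return max(gains, key=gains.get), gains


top_word, gains = explain("the movie had a great plot")
print(top_word)
for word, gain in gains.items():
    print(f"{word}: {gain:.3f}")
```

In this toy setup only replacing "great" with its antonym can change the black-box label, so that word receives the highest gain, while words that are never replaced receive a gain of zero. A fuller version would fit a multi-level decision tree over the same word-presence features.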

Article ID: 2022L35

Month: May

Year: 2022

Address: Online

Venue: Graduate Student Symposium - Canadian Conference on Artificial Intelligence

Publisher: Canadian Artificial Intelligence Association

URL: https://caiac.pubpub.org/pub/71c292m6