When Natural Language Processing is applied to support mental health care, gathering annotated data is difficult. Recent work has pointed to shortcomings in approximate annotation schemes. While studying gaps in prediction accuracy can offer some information about these shortcomings, a more careful look is needed. Influence Functions make it possible to quantify the relevance of training examples according to their type of annotation. Using a corpus for suicide risk assessment containing both crowdsourced and expert annotations, we examine the effect that each type of annotation has on model behavior at test time. Our results indicate that, while expert annotations are more helpful, the difference with respect to crowdsourced annotations is slight. Moreover, most of the globally helpful examples are crowdsourced, pointing to their potential.
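The abstract does not specify the exact formulation used, but the standard influence-function recipe of Koh and Liang scores a training example z by I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the training loss at the fitted parameters; negative scores mean upweighting z would lower the test loss (a "helpful" example). The following is a minimal, self-contained sketch on a small synthetic logistic-regression problem (the data, regularization strength, and model are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Synthetic binary classification data (illustrative stand-in for annotated text features).
rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (X @ w_true + rng.normal(scale=0.1, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

# Fit L2-regularized logistic regression by Newton's method.
lam = 0.1
w = np.zeros(d)
for _ in range(50):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n + lam * w
    H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)  # Hessian of training loss
    w -= np.linalg.solve(H, grad)

# Influence of each training example on a test example's loss:
#   I(z_i, z_test) = -grad(z_test)^T H^{-1} grad(z_i)
# Here a training point doubles as the "test" point purely for illustration.
x_test, y_test = X[0], y[0]
p_test = sigmoid(x_test @ w)
grad_test = (p_test - y_test) * x_test
h_inv_grad_test = np.linalg.solve(H, grad_test)

p = sigmoid(X @ w)
grads = (p - y)[:, None] * X           # per-example loss gradients
influence = -grads @ h_inv_grad_test   # negative = helpful for this test point
```

Ranking training examples by these scores, and comparing the ranks of crowdsourced versus expert annotations, is the kind of per-annotation-type analysis the abstract describes; for large models the explicit Hessian inverse is typically replaced by Hessian-vector-product approximations.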
Article ID: 2022S05
Publisher: Canadian Artificial Intelligence Association