In the past decade, deep learning models have achieved impressive performance on a wide range of tasks. However, they still face challenges in many high-stakes problems. In this paper, we study Legal Judgment Prediction (LJP), an important high-stakes task that uses fact descriptions obtained from court cases to predict final judgments. We investigate the state of the art on the LJP task by leveraging one of the most recent deep learning models, Longformer, and demonstrate that we obtain state-of-the-art performance even with a limited amount of training data, benefiting from pretraining and Longformer's long-sequence modeling capability. However, our analyses suggest that the improvement is due to the model fitting spurious correlations: the model makes correct decisions based on information irrelevant to the task itself. We therefore advocate exercising serious caution when interpreting such results. A second challenge in many high-stakes problems is the interpretability required of models. The final predictions made by deep learning models are useful only if the evidence supporting the models' decisions is consistent with that used by subject-matter experts. We demonstrate that, with post-hoc interpretation, the conventional method XGBoost can provide explainable results with performance comparable to that of the Longformer model, while not being subject to the spurious correlation issue. We hope our work contributes to the line of research on understanding the advantages and limitations of deep learning for high-stakes problems.
Article ID: 2021S15
Publisher: Canadian Artificial Intelligence Association