
Class-wise Calibration: A Case Study on COVID-19 Hate Speech

Published on Jun 08, 2021


Proper calibration of deep-learning models is critical for many high-stakes problems. In this paper, we show that existing calibration metrics fail to account for miscalibration on individual classes, and therefore overlook minority classes, which causes significant problems in imbalanced classification. Using a COVID-19 hate-speech dataset, we first show that in imbalanced datasets the miscalibration error varies greatly across individual classes, and that the error on minority classes can be many times worse than the overall calibration performance suggests. To address this issue, we propose a new metric based on the expected calibration error, named Contraharmonic Expected Calibration Error (CECE), which penalizes severe miscalibration on individual classes. We further devise a novel variant of temperature scaling for imbalanced data to reduce class-wise miscalibration: it re-weights the loss function by the inverse class count when tuning the scaling parameter, so as to reduce the worst-case calibration error on minority classes. Our case study on a benchmark COVID-19 hate-speech task shows the effectiveness of both our calibration metric and our temperature scaling strategy.
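The two ideas in the abstract can be illustrated with a minimal sketch. The paper's exact formulas are not reproduced on this page, so everything below is an assumption: the metric is taken to be the contraharmonic mean (sum of squares divided by sum) of per-class expected calibration errors, and the temperature is tuned by a simple grid search over a negative log-likelihood weighted by inverse class count. All function names, the bin count, and the grid are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of one logit vector after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classwise_ece(probs, labels, num_classes, n_bins=10):
    """Per-class ECE (assumed form): for class k, bin samples by the
    predicted p_k and compare mean p_k with the frequency of label k."""
    n = len(labels)
    eces = []
    for k in range(num_classes):
        conf_sum = [0.0] * n_bins
        hit_sum = [0.0] * n_bins
        for p, y in zip(probs, labels):
            b = min(int(p[k] * n_bins), n_bins - 1)
            conf_sum[b] += p[k]
            hit_sum[b] += 1.0 if y == k else 0.0
        # sum_b (n_b / n) * |mean_conf_b - freq_b| == sum_b |conf_sum_b - hit_sum_b| / n
        eces.append(sum(abs(c - h) for c, h in zip(conf_sum, hit_sum)) / n)
    return eces


def contraharmonic_mean(values):
    """sum(v^2) / sum(v); dominated by the largest entries."""
    total = sum(values)
    return sum(v * v for v in values) / total if total > 0 else 0.0


def weighted_nll(logits, labels, temperature, class_weights):
    """Negative log-likelihood with a per-class weight on each sample."""
    loss, weight_sum = 0.0, 0.0
    for z, y in zip(logits, labels):
        p = softmax(z, temperature)
        loss -= class_weights[y] * math.log(max(p[y], 1e-12))
        weight_sum += class_weights[y]
    return loss / weight_sum


def fit_temperature(logits, labels, num_classes, grid=None):
    """Grid search (an assumption; any 1-D optimizer would do) for the
    temperature minimizing the inverse-class-count weighted NLL."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    weights = [1.0 / c if c else 0.0 for c in counts]
    if grid is None:
        grid = [0.05 * i for i in range(1, 101)]  # 0.05 .. 5.0
    return min(grid, key=lambda t: weighted_nll(logits, labels, t, weights))
```

Under these assumptions, the contraharmonic mean equals the arithmetic mean when all classes are equally miscalibrated and moves toward the worst class as the per-class errors diverge, which is one way a single score can reflect severe miscalibration on a minority class.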

Article ID: 2021L24

Month: May

Year: 2021

Address: Online

Venue: Canadian Conference on Artificial Intelligence

Publisher: Canadian Artificial Intelligence Association

