Unsupervised pretraining has recently achieved significant success on a wide variety of natural language processing tasks. An important problem that remains understudied is how to effectively adapt such models to specific domains, which is critical for many real-life applications. In this paper, we explore how to enhance pretraining by leveraging two typical sources of domain knowledge: unstructured domain-specific text and structured (often human-curated) domain knowledge. We propose models that jointly utilize these different sources of knowledge and achieve state-of-the-art results on two tasks in different domains, stock price movement prediction and software bug duplication detection, by adapting publicly available pretrained models trained on generic, domain-free corpora such as book corpora and news articles.
Article ID: 2021S14
Publisher: Canadian Artificial Intelligence Association