Unsupervised pretraining has recently achieved significant success on a wide variety of natural language processing tasks. An important problem that remains understudied is how to effectively adapt such models to specific domains, which is critical for many real-life applications. In this paper, we explore how to enhance pretraining by leveraging two typical sources of domain knowledge: unstructured domain-specific text and structured (often human-curated) domain knowledge. We propose models that jointly utilize these different sources of knowledge and, by adapting publicly available pretrained models built on generic, domain-free corpora such as books and news articles, achieve state-of-the-art results on two tasks from different domains: stock price movement prediction and duplicate software bug detection.
Article ID: 2021S14
Month: May
Year: 2021
Address: Online
Venue: Canadian Conference on Artificial Intelligence
Publisher: Canadian Artificial Intelligence Association
URL: https://caiac.pubpub.org/pub/q5d3twsd/