Membership Query Synthesis (MQS) is an active learning paradigm in which generated artificial examples, rather than genuine ones, are labelled to extend a dataset. Despite prodigious advances in the power of generative models, an essential component of MQS, the field remains severely under-studied, especially in the textual domain. We found only one other paper, which selects examples in a latent space close to the decision boundary and shows good results on a curated dataset of short sentences. We show that this approach performs poorly on a real dataset. We propose instead to randomly generate artificial data from a variational auto-encoder and to apply a simple set of filtering mechanisms, and we report better results than random selection of unlabelled genuine data. This provides an improvement of 31.1% over the previous MQS state of the art on the SST-2 dataset, and of 2.7% over random active learning. To the best of our knowledge, this is the first time MQS has been reported to work on a textual task with no constraint on the size of the input sentences.
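As a rough illustration of the generate-then-filter idea described in the abstract, the following minimal sketch draws latent vectors from a standard normal prior, decodes them, and keeps only candidates that pass a few simple checks before they would be sent to an annotator. Everything here is an assumption made for illustration: `toy_decode` is a hypothetical stand-in for a trained variational auto-encoder decoder, and the two filters (duplicate rejection and rejection of immediately repeated tokens) are placeholders, since the abstract does not specify the filtering mechanisms actually used.

```python
import random

# Toy vocabulary; a real system would use the decoder's own vocabulary.
VOCAB = ["the", "film", "is", "great", "dull", "a", "truly", "mess", "plot", "works"]


def sample_latent(dim, rng):
    """Draw one latent vector from a standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_decode(z):
    """Hypothetical stand-in for a VAE decoder: one token per latent coordinate."""
    return [VOCAB[int(abs(v) * 3) % len(VOCAB)] for v in z]


def passes_filters(tokens, seen):
    """Placeholder filters (assumed, not the paper's): no duplicates, no stutters."""
    sentence = " ".join(tokens)
    if sentence in seen:
        return False  # already proposed to the annotator
    if any(a == b for a, b in zip(tokens, tokens[1:])):
        return False  # immediately repeated token, likely degenerate output
    return True


def synthesize_queries(n_queries, dim=6, seed=0, max_tries=10_000):
    """Randomly generate candidates and keep the first n_queries that pass the filters."""
    rng = random.Random(seed)
    seen, queries = set(), []
    for _ in range(max_tries):
        if len(queries) == n_queries:
            break
        tokens = toy_decode(sample_latent(dim, rng))
        if passes_filters(tokens, seen):
            sentence = " ".join(tokens)
            seen.add(sentence)
            queries.append(sentence)
    return queries


if __name__ == "__main__":
    # Each surviving sentence would then be labelled and added to the training set.
    for query in synthesize_queries(5):
        print(query)
```

In this sketch the selection step is deliberately uninformed: candidates are not ranked by proximity to a decision boundary, which mirrors the random-generation strategy the abstract contrasts with the earlier latent-space approach.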
Article ID: 2022L33
Month: May
Year: 2022
Address: Online
Venue: Canadian Conference on Artificial Intelligence
Publisher: Canadian Artificial Intelligence Association
URL: https://caiac.pubpub.org/pub/f6b0scvi