Deep learning models have been used successfully to solve many Natural Language Processing problems, but less is known about the mechanisms that make them work. Unlike simpler models, which can be understood mathematically, deep neural models are studied through experimental comparisons and probing tasks applied to their attributes, in an attempt to determine how syntactic and semantic information is represented. In this work, we focus on models that produce contextual word vectors and sentence vectors. To determine how the information in these embeddings is represented, we use a class of feed-forward neural networks to design a set of experiments, which we run on the standard downstream tasks of two common Natural Language Inference benchmarks and one Sentiment Analysis benchmark, together with the classification tasks of a set of synthetic datasets that we have created. A quantitative analysis of these experiments shows that, in many cases, the behaviour of the models trained on the natural language benchmark data matches that of the models trained on the synthetic data. We suggest that this similar behaviour implies that the decomposition of semantic and syntactic information in the models trained on the language data resembles the known structures of the synthetic data.
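To make the probing methodology concrete, the following is a minimal sketch (not the authors' code, and simpler than their feed-forward family) of the general idea: a small classifier is trained on frozen embedding vectors to test whether some property, here a synthetic binary label, is recoverable from them. The data generator and all names are hypothetical stand-ins.

```python
# Minimal probing-classifier sketch: train a simple feed-forward probe
# (here, the degenerate single-layer case: logistic regression) on
# frozen "embedding" vectors and measure how well a label is recoverable.
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_embeddings(n=200, dim=16):
    """Hypothetical stand-in for frozen contextual embeddings:
    two Gaussian clusters with binary labels 0 and 1."""
    half = n // 2
    x0 = rng.normal(loc=-1.0, scale=0.5, size=(half, dim))
    x1 = rng.normal(loc=+1.0, scale=0.5, size=(half, dim))
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(half), np.ones(half)])
    return X, y

def train_probe(X, y, lr=0.1, epochs=200):
    """Train the probe with full-batch gradient descent on
    the logistic (cross-entropy) loss; embeddings stay fixed."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        grad_w = X.T @ (p - y) / len(y)         # gradient of the loss w.r.t. w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def probe_accuracy(w, b, X, y):
    """High accuracy suggests the probed property is encoded
    (close to linearly) in the embedding space."""
    preds = (X @ w + b) > 0
    return float(np.mean(preds == y))

X, y = make_synthetic_embeddings()
w, b = train_probe(X, y)
acc = probe_accuracy(w, b, X, y)
```

In the paper's setting, the synthetic-data labels play the role of `y` above: because their generative structure is known, the probe's behaviour on them can be compared against its behaviour on embeddings from the natural language benchmarks.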
Article ID: 2021S08
Publisher: Canadian Artificial Intelligence Association