Deep Averaging Networks (DANs) perform strongly on several key Natural Language Processing (NLP) tasks. However, their chief drawback is that they do not account for the position of tokens when encoding sequences. We study how existing position encodings might be integrated into the DAN architecture. In addition, we propose a novel position encoding built specifically for DANs, which generalizes better to unseen sequence lengths. We demonstrate this on decision tasks over binary sequences. Finally, we compare the resulting architecture against unordered aggregation on sentiment analysis with both word- and character-level tokenization, with mixed results.
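The following is a minimal sketch of why integrating a position encoding into a DAN is not trivial. It assumes a Transformer-style sinusoidal encoding (one existing encoding, not necessarily one studied here) and hypothetical toy embeddings for a binary vocabulary, and it covers only the averaging step (the feed-forward layers are omitted). If the encoding is simply added before averaging, the mean splits into the mean of the token embeddings plus the mean of the encodings for positions 0..n-1, so reordering the tokens leaves the result unchanged. Combining by elementwise multiplication (a hypothetical alternative chosen only for illustration, not the encoding proposed in this paper) does not decompose this way and therefore is sensitive to order.

```python
import math


def sinusoidal_encoding(pos, dim):
    """Transformer-style sinusoidal position encoding for a single position."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def dan_average(tokens, embeddings, combine=None):
    """Averaging step of a DAN (feed-forward layers omitted).

    combine: None (no position information), "add", or "mul".
    The "mul" mode is a hypothetical illustrative choice.
    """
    dim = len(embeddings[tokens[0]])
    total = [0.0] * dim
    for pos, tok in enumerate(tokens):
        vec = embeddings[tok]
        if combine is not None:
            pe = sinusoidal_encoding(pos, dim)
            if combine == "add":
                vec = [v + p for v, p in zip(vec, pe)]
            elif combine == "mul":
                vec = [v * p for v, p in zip(vec, pe)]
            else:
                raise ValueError(f"unknown combine mode: {combine}")
        total = [t + v for t, v in zip(total, vec)]
    # Unweighted mean over all positions.
    return [t / len(tokens) for t in total]


def close(a, b, tol=1e-9):
    """Compare two vectors up to floating-point rounding."""
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))


# Toy embeddings for a binary vocabulary (hypothetical values).
EMB = {"0": [1.0, 0.0, 0.0, 0.0], "1": [0.0, 1.0, 0.0, 0.0]}

for mode in (None, "add", "mul"):
    same = close(dan_average(["0", "1"], EMB, mode),
                 dan_average(["1", "0"], EMB, mode))
    print(mode, "order-invariant" if same else "order-sensitive")
```

Running the loop reports that both the plain average and the additive variant are order-invariant on the sequences `01` and `10`, while the multiplicative variant distinguishes them.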
Article ID: 2021L28
Publisher: Canadian Artificial Intelligence Association