Meeting of May 7, 2021
In the past few years, contextualized language models (such as BERT) have yielded substantial improvements in ad-hoc re-ranking. Though very effective, these models are computationally expensive at query time. In this talk, I present a technique that reduces query-time computation by delaying query-document cross-attention, so that document representations can be pre-computed at indexing time (PreTTR). Building on this idea, I then show how neural re-ranking architectures can be designed to exploit pre-computation and improve both efficiency and interpretability by predicting term salience scores (EPIC).
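To make the PreTTR idea concrete, here is a minimal sketch built from torch.nn.TransformerEncoderLayer: the lower layers process query and document independently (no cross-attention), so the document's intermediate states can be cached offline, and only the upper layers attend across both texts at query time. The layer split point, dimensions, and the pooling-based scorer are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

D_MODEL, N_HEAD, N_LAYERS, SPLIT = 128, 4, 4, 2  # SPLIT = lower layers run separately

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
    for _ in range(N_LAYERS)
)
score_head = nn.Linear(D_MODEL, 1)  # hypothetical relevance scorer

def encode_separately(x: torch.Tensor) -> torch.Tensor:
    """Run the lower layers on one text alone, with no cross-attention."""
    for layer in layers[:SPLIT]:
        x = layer(x)
    return x

def precompute_document(doc_emb: torch.Tensor) -> torch.Tensor:
    # Offline, at indexing time: cache the document's layer-SPLIT states.
    return encode_separately(doc_emb)

def rerank_score(query_emb: torch.Tensor, doc_cache: torch.Tensor) -> torch.Tensor:
    # Query time: only the (short) query passes through the lower layers;
    # the cached document states are concatenated, and the upper layers
    # apply full query-document cross-attention.
    q = encode_separately(query_emb)
    joint = torch.cat([q, doc_cache], dim=1)
    for layer in layers[SPLIT:]:
        joint = layer(joint)
    return score_head(joint[:, 0])  # score from the first token's state

# Toy usage with random "embeddings" of shape (batch, seq_len, d_model).
doc_cache = precompute_document(torch.randn(1, 50, D_MODEL))
print(rerank_score(torch.randn(1, 8, D_MODEL), doc_cache))
```

The savings come from moving most of the document-side transformer work off the query's critical path: at query time only the query tokens and the upper joint layers are computed.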
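And here is a minimal sketch of EPIC-style term salience scoring, assuming contextualized token vectors are already available. The salience head, the vocabulary-expansion head, the softplus activation, and the dot-product match are illustrative choices rather than the exact published design.

```python
import torch
import torch.nn as nn

D_MODEL, VOCAB = 128, 1000

salience_head = nn.Linear(D_MODEL, 1)    # hypothetical per-term importance
expand_head = nn.Linear(D_MODEL, VOCAB)  # hypothetical document expansion head

def query_vector(q_hidden: torch.Tensor, q_ids: torch.Tensor) -> torch.Tensor:
    # Each query term gets a single interpretable salience weight,
    # scattered into a sparse vocabulary-sized vector.
    w = nn.functional.softplus(salience_head(q_hidden)).squeeze(-1)
    vec = torch.zeros(VOCAB)
    vec.scatter_add_(0, q_ids, w)
    return vec

def doc_vector(d_hidden: torch.Tensor) -> torch.Tensor:
    # Document terms are expanded over the vocabulary and max-pooled, so
    # this vector can be pre-computed once at indexing time.
    return nn.functional.softplus(expand_head(d_hidden)).max(dim=0).values

# Relevance is a plain dot product, so each matching term's contribution
# to the score is directly inspectable.
q_vec = query_vector(torch.randn(8, D_MODEL), torch.randint(0, VOCAB, (8,)))
d_vec = doc_vector(torch.randn(50, D_MODEL))
print(torch.dot(q_vec, d_vec))
```

Because the final score decomposes term by term, the predicted salience weights double as an explanation of why a document was ranked highly, which is the interpretability benefit mentioned above.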