Look Both Ways and No Sink: Converting LLMs into Text Encoders without Training
ACL · 2025
Abstract
Recent advancements have demonstrated the advantage of converting pretrained large language models into powerful text encoders by enabling bidirectional attention in transformer layers. However, existing methods often require extensive training on large-scale datasets, posing challenges in domain-specific scenarios. In this work, we show that a domain-specific pretrained large language model can be converted into a strong domain-specific text encoder without additional training. We first conduct a comprehensive empirical study to investigate different conversion strategies and identify the impact of the attention sink phenomenon on the performance of converted encoder models. Based on our findings, we propose a novel approach that enables bidirectional attention and suppresses the attention sink phenomenon, resulting in superior performance. Extensive experiments on multiple domains demonstrate the effectiveness of our approach. Our work provides new insights into the training-free conversion of text encoders in low-resource scenarios and contributes to the advancement of domain-specific text representation generation.
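The conversion step at the heart of the abstract — switching a causal LLM to bidirectional attention — can be sketched in a few lines. This is an illustrative NumPy toy, not the authors' method: a single-head scaled dot-product attention where a `causal` flag toggles the future-position mask that decoder-only LLMs normally apply.

```python
import numpy as np

def attention(q, k, v, causal=True):
    """Single-head scaled dot-product attention.

    causal=True  : each token attends only to itself and earlier tokens
                   (standard decoder-only LLM behavior).
    causal=False : bidirectional attention, as used when converting an
                   LLM into a text encoder.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # Mask strictly-upper-triangular entries so token i cannot
        # attend to tokens j > i.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 tokens, dimension 8
_, w_causal = attention(x, x, x, causal=True)
_, w_bidir = attention(x, x, x, causal=False)
# Under causal masking, early tokens place zero weight on later tokens;
# bidirectionally, every token can attend to the full sequence.
```

In the bidirectional case, earlier tokens incorporate information from later ones, which is what makes the pooled representations stronger for encoding. The attention-sink phenomenon the abstract refers to would show up in `weights` as a disproportionate share of each row's mass landing on the first (beginning-of-sequence) token; the paper's approach additionally suppresses that mass, which this toy does not model.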
Citation
@inproceedings{lin2025converting,
title={Look Both Ways and No Sink: Converting LLMs into Text Encoders without Training},
author={Li, Jiaqi and Wang, Mengmeng and Zheng, Zilong and Zhang, Muhan},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year={2025}
}