arXiv (Cornell University)
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
December 2024 • Wanlong Liu, J Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang
Retrieval-Augmented Generation (RAG) has emerged as a key paradigm for enhancing large language models (LLMs) by incorporating external knowledge. However, current RAG methods face two limitations: (1) they cover only a limited range of RAG scenarios, and (2) they suffer from limited task diversity due to the lack of a general RAG dataset. To address these limitations, we propose RAG-Instruct, a general method for synthesizing diverse and high-quality RAG instruction data based on any source corpus. Our approach leverages (1) fi…