Towards Retrieval-Augmented Large Language Models (2024)

Yujuan Ding¹, Wenqi Fan¹∗, Liangbo Ning¹, Shijie Wang¹, Hengyun Li¹, Dawei Yin², Tat-Seng Chua³, and Qing Li¹
¹The Hong Kong Polytechnic University, ²Baidu Inc, ³National University of Singapore.

Abstract.

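As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than relying solely on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research on RA-LLMs, covering three primary technical perspectives: architectures, training strategies, and applications. As preliminary knowledge, we briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, we categorize mainstream relevant work by application area, detailing the challenges of each area and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, we discuss current limitations and several promising directions for future research.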

Keywords: Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), pre-training, fine-tuning, in-context learning, prompting.

1. Introduction

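As one of the most fundamental data mining techniques, retrieval aims to understand the input query and extract relevant information from external data sources (Kobayashi and Takeda, 2000; Singhal et al., 2001). It is widely applied in fields such as search, question answering, and recommender systems (Buttcher et al., 2016; Yin et al., 2016; O'Hare et al., 2016). For instance, search engines (e.g., Google, Bing, and Baidu) are the most successful applications of retrieval in industry; they filter and retrieve the web pages or documents most relevant to a user's query (Croft et al., 2010; Yin et al., 2016), enabling users to find the desired information effectively. Meanwhile, through effective maintenance of external databases, retrieval models can provide faithful and timely external knowledge, and thus play a vital role in various knowledge-intensive tasks. Owing to these capabilities, retrieval techniques have been successfully incorporated into advanced generative models in the era of AI-Generated Content (AIGC) (Li et al., 2023a; Wu et al., 2024; Sheynin et al., 2023; Zhang et al., 2023a). Notably, the integration of retrieval models with language models has given rise to Retrieval-Augmented Generation (RAG) (Lewis et al., 2020c), which has become one of the most representative techniques in generative AI and aims to enhance the quality of generated text (Li et al., 2023a; Lewis et al., 2020c; Borgeaud et al., 2022).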

[Figure 1]

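To advance generation models and enhance the generated results, RAG incorporates information or knowledge from external data sources as a supplement to the input query or the generated output (Min et al., 2020; Khandelwal et al., 2020). Specifically, RAG first invokes the retriever to search for and extract relevant documents from external databases, which are then used as context to enhance the generation process (Izacard and Grave, 2021b). In practice, RAG techniques are feasible and efficient to apply to various generation tasks by simply adapting the retrieval component, requiring minimal or even no additional training (Ram et al., 2023). Recent studies show that RAG is useful not only for knowledge-intensive tasks such as open-domain question answering (OpenQA) (Borgeaud et al., 2022; Guu et al., 2020; Petroni et al., 2020), but also for general language tasks (Khandelwal et al., 2020; He et al., 2021a; Xu et al., 2020) and various downstream applications (Wu et al., 2024; Liu et al., 2023).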

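Recent years have witnessed the rapid development of pre-trained foundation models, particularly Large Language Models (LLMs), which have demonstrated impressive performance across various tasks (Chowdhery et al., 2023; Achiam et al., 2023), including recommender systems (Zhao et al., 2024a), molecule discovery (Li et al., 2023a), and report generation (Ding et al., 2024). Technically, the success of LLMs can be attributed to advanced architectures with billion-level parameters pre-trained on massive training corpora. These advances give LLMs (Zhao et al., 2023b, 2024a) remarkable emergent abilities, particularly in language understanding and generation and in in-context learning. For instance, GPT-FAR introduces detailed prompts to teach GPT-4 to perform image tagging, statistical analysis, and text analysis for multi-modal fashion report generation (Ding et al., 2024). LLMs have also achieved promising performance in recommender systems by understanding users' preferences towards items (Wang et al., 2024a; Zhao et al., 2024a). Despite this success, LLMs still suffer from intrinsic limitations (Zhao et al., 2024a, 2023b), such as a lack of domain-specific knowledge, the problem of "hallucination", and the substantial computational resources required to update the models. These problems are particularly notable in domain-specific fields such as medicine and law. For instance, a recent study shows that legal hallucinations are pervasive and disturbing, with hallucination rates ranging from 69% to 88% in response to specific legal queries for state-of-the-art LLMs (Dahl et al., 2024). Moreover, tackling hallucination becomes even harder because fine-tuning LLMs with domain-specific or up-to-date data requires substantial computational resources. This, in turn, significantly hinders the widespread adoption of LLMs in real-world applications.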

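To address these limitations, recent efforts have leveraged RAG to enhance the capabilities of LLMs in various tasks (Shi et al., 2023; Khandelwal et al., 2020; Borgeaud et al., 2022; Izacard and Grave, 2021a), especially those with high demand for up-to-date and reliable knowledge, such as question answering (QA), AI4Science, and software engineering. For example, Lozano et al. (2023) introduce a scientific-specific QA system based on dynamically retrieving scientific literature. MolReGPT leverages RAG to enhance the in-context learning ability of ChatGPT for molecular discovery (Li et al., 2023a). As illustrated in Figure 1, an LLM-based dialogue system cannot answer out-of-scope queries well. In contrast, with the help of RAG, which retrieves relevant knowledge from an external data source and integrates it into the generation process, the dialogue system gives the user a correct answer. Given the remarkable progress in advancing LLMs with RAG, there is an imperative need for a systematic review of recent advances in Retrieval-Augmented Large Language Models (RA-LLMs).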

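This survey aims to provide a comprehensive overview of retrieval-augmented large language models (RA-LLMs) by summarizing representative methods in terms of architecture, training, and applications. More specifically, after briefly introducing the background knowledge of LLMs in Section 2, we review existing research on RA-LLMs in Section 3 from the primary perspectives of retrieval, generation, and augmentation, together with the necessity and application frequency of retrieval in RAG. We then summarize the main training techniques of RA-LLMs in Section 4 and their various applications in Section 5. Finally, in Section 6, we discuss key challenges and potential directions for future exploration.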

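Concurrent with our survey, several related surveys focus on different aspects of RAG and LLMs. For example, Zhao et al. (2023a) specifically review RAG techniques based on multi-modal information, and Zhao et al. (2024b) discuss RAG for AIGC. Gao et al. (2023b) provide a relatively comprehensive overview of RAG for LLMs. Our survey differs from these in that we concentrate on technical perspectives and systematically review models according to the architecture and training paradigm of RA-LLMs, as well as their application tasks.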

2. Background

In this section, we briefly present the background of large language models and prompt learning.

2.1. Large Language Models (LLMs)

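Recently, significant breakthroughs in LLMs have revolutionized the field of artificial intelligence (Zhao et al., 2023b; Brown et al., 2020; Fan et al., 2024). Advanced LLMs are typically pre-trained on massive data with billion-level parameters and have demonstrated the ability to understand and generate human-like text, leading to advances in natural language processing tasks such as text generation and information retrieval (Zhao et al., 2023b, 2024a). LLMs can be adapted to a variety of downstream tasks by fine-tuning on specific datasets, allowing them to specialize in particular domains or applications. In general, most existing LLMs fall into three broad categories: encoder-only, decoder-only, and encoder-decoder models.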

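Encoder-only models, such as the BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019) family, process input text by encoding it into a high-dimensional space. Their key feature is bidirectionality: when encoding, they take into account both the left and right context of each token. This allows encoder-only models to better understand the meaning of words in context, which is crucial for tasks like sentiment analysis, review reading, and text classification (Xu et al., 2019; Devlin et al., 2019). In contrast, decoder-only models generate text in a left-to-right fashion. As a representative decoder-only model, GPT (Generative Pre-trained Transformer) (Radford et al., 2018) predicts the next token in a sequence based on the context provided by the previous tokens. This architecture makes such models particularly effective for language generation, code generation, and creative writing. Encoder-decoder models, such as T5 (Text-To-Text Transfer Transformer) (Raffel et al., 2020), uniquely cast a variety of NLP tasks as text generation problems. More specifically, the encoder in T5 processes the input sequence to capture its meaning, while the decoder generates the output sequence based on the encoded information. This architecture is well suited to tasks that convert one sequence into another, such as machine translation, summarization, and conversational response generation.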

2.2. Prompt Learning

2.2.1. Prompt Engineering

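Owing to the massive number of parameters in LLMs, prompt learning has emerged as a paradigm that leverages the power of LLMs for various tasks (Zhao et al., 2023b, 2024a) instead of extensively fine-tuning them. Prompt learning carefully designs the input that guides the model to perform downstream tasks. For example, early methods (Petroni et al., 2019; Brown et al., 2020) provide manually crafted templates to handle various NLP tasks. Specifically, encoder-only models like BERT typically adopt cloze prompts, since these closely match the form of their pre-training task (Petroni et al., 2019; Cui et al., 2021). For other models such as GPT, prefix prompts tend to be more suitable because they mesh well with generation tasks (Brown et al., 2020). However, manually designed prompts rely on human experience and come with no guarantee of effectiveness. To address this limitation, soft prompt tuning was developed to learn trainable continuous prompt embeddings (Li and Liang, 2021; Vu et al., 2022; Tu et al., 2022). For example, Prefix-Tuning (Li and Liang, 2021) prepends a series of prefix embeddings to the input, which can be trained and updated. Because the prompts need not be real text, this gives more flexibility in prompt generation. Nevertheless, lacking domain-specific knowledge, the model may still fail to generate accurate responses when facing new tasks.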

2.2.2. In-Context Learning (ICL)

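To overcome the limitations of vanilla prompt learning, recent studies (Liu et al., 2022a; Kim et al., 2022; Zhang et al., 2023d) have developed in-context learning (ICL). ICL is a specific form of prompt learning that gives the model a few demonstrations of the task within the prompt. This paradigm allows pre-trained LLMs to grasp the pattern conveyed by the demonstrations and solve novel tasks without fine-tuning. For example, by carefully selecting a few demonstrations, GPT-3 (Brown et al., 2020) has shown the capability to perform few-shot tasks (Liu et al., 2022a). This success indicates that LLMs have a remarkable ability to adapt rapidly to new tasks based on task-specific knowledge.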

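Despite its effectiveness, ICL usually relies heavily on the quality of the provided demonstrations, which may lead to sub-optimal outputs. Even worse, ICL may lack the necessary information or prior knowledge to guide LLMs in generating accurate responses. To address these limitations, more recent studies introduce Retrieval-Augmented Generation (RAG) techniques (Lewis et al., 2020c; Ram et al., 2023; Shi et al., 2023). By integrating retrieval with generation, RAG models offer a promising direction for enhancing the performance and adaptability of LLMs across various tasks.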

3. Retrieval-Augmented Large Language Models (RA-LLMs)

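The RAG framework in the era of LLMs generally consists of three major processes, namely retrieval, generation, and augmentation, together with a mechanism that determines whether retrieval is needed. In this section, we introduce the important techniques involved in each component.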

[Figure 2]

[Figure 3]

3.1. Retrieval

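Given the query from the LLM input, the retrieval process in RAG aims to provide relevant information from external knowledge sources, which can be open-sourced or closed-sourced, as shown in Figure 2. The key component, the retriever (further detailed in Figure 3), consists of several procedures that function as a whole to measure the relevance between the query and documents in the database for effective information retrieval. The specific retrieval pipeline is further determined by whether pre-retrieval and post-retrieval processes are included. In this subsection, we introduce the major techniques involved in retrieval for both traditional and LLM-based RAG, including retriever type, retrieval granularity, pre- and post-retrieval enhancement, and database construction.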

3.1.1. Retriever Type

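Based on how information is encoded, retrieval methods can generally be divided into two types: sparse and dense. Sparse retrieval is word-based and is applied mostly to text retrieval, whereas dense retrieval embeds queries and external knowledge into vector spaces and is readily applied to various data formats.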

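As a straightforward approach, sparse retrieval, e.g., TF-IDF and BM25 (Sparck Jones, 1972; Robertson et al., 2009), usually relies on inverted index matching over the raw data input. For example, many studies directly apply BM25 for passage-level retrieval to support their RAG (Chen et al., 2017; Ram et al., 2023; Zhong et al., 2022; Jiang et al., 2023; Zhou et al., 2022; Xu et al., 2023b), where passages are represented as bags of words and ranked by term and inverse document frequencies (Izacard and Grave, 2021b). Beyond supplementing the generator's input, sparse retrieval has also been used to find examples that serve as ICL demonstrations for RA-LLMs (Ye et al., 2023b; Luo et al., 2023b; Rubin et al., 2022; Agrawal et al., 2023; Sia and Duh, 2023). The main limitation of sparse retrieval in RAG is its training-free nature, which makes retrieval performance rely heavily on the quality of database construction and query generation. Moreover, such fixed term-based methods support only similarity retrieval and cannot be adapted to other retrieval considerations demanded in LLM applications, such as diversity (Drozdov et al., 2022).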
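To make the term-frequency and inverse-document-frequency ranking concrete, the following is a minimal sketch of Okapi BM25 scoring over whitespace-tokenized passages. The function name `bm25_scores`, the toy passages, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions, not settings taken from any of the cited systems.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized passage against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer passages are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy passages (hypothetical data), each represented as a bag of words.
docs = [
    "retrieval augmented generation uses external knowledge".split(),
    "large language models generate text".split(),
    "sparse retrieval relies on inverted index matching".split(),
]
scores = bm25_scores("sparse retrieval".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Because the scoring function has no trainable parameters, ranking quality depends entirely on how the passages and the query are tokenized and phrased, which mirrors the limitation noted above.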

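In contrast, dense retrieval embeds the query and documents into a continuous vector space according to certain criteria, for example semantic similarity (Karpukhin et al., 2020). Dense retrieval methods can usually be trained and therefore offer more flexibility and potential for adaptation. As the key component of a dense retriever, the embedding model is designed in subtly different ways across existing RAG models. A simple design (Yogatama et al., 2021; Khandelwal et al., 2020; Lewis et al., 2020a; Wu et al., 2022) directly uses part of the generation model as the retriever, which may improve the alignment between the retrieval and generation processes. BERT-based backbones (Devlin et al., 2019) are widely used in retrieval models. A common retriever design is a two-stream encoder with the BERT structure (one encoder for the query and the other for the documents), also called a bi-encoder (Wu et al., 2020; Shi et al., 2023). Early RAG methods tend to freeze (Borgeaud et al., 2022; Ram et al., 2023) or partially freeze (Lewis et al., 2020c) the retriever parameters to extract relevant knowledge at a general level, and pay more attention to knowledge utilization and generator fine-tuning. Large-scale specialized pre-training further enhances RAG models on knowledge-intensive tasks. One typical success is the Dense Passage Retriever (DPR) (Karpukhin et al., 2020), which uses a BERT-based backbone and is pre-trained specifically for the OpenQA task with question-answer pair data. DPR has shown strong capacity as a pre-trained retriever, helping many RAG models succeed on various downstream tasks (Lewis et al., 2020c; Izacard and Grave, 2021b; Siriwardhana et al., 2023; Singh et al., 2021; Shi et al., 2023). It has also been regarded as the first step in the RAG paradigm for improving LLM performance, and fine-tuning can further enhance the alignment of the embeddings between queries and relevant textual data (Cheng et al., 2023). A recent study (Reichman and Heck, 2024) also finds that DPR training decentralizes how knowledge is stored in the network, creating multiple access pathways to the same information. With effective fine-tuning, bi-encoder retrievers are also widely applied in ICL-based RAG (Rubin et al., 2022; Poesia et al., 2022; Lu et al., 2023; Ye et al., 2023b; Milios et al., 2023; Li and Qiu, 2023). Specifically, they are more often used for retrieval based on sentence-embedding similarity, as well as for special ICL requirements such as diverse example retrieval (Ye et al., 2023b).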
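The retrieval step of a bi-encoder can be sketched as an inner-product search over precomputed document embeddings. In the sketch below the vectors are hand-written stand-ins for encoder outputs (an assumption for illustration; a real system would obtain them from trained query and document encoders), and `dense_top_k` is a hypothetical helper name.

```python
def inner_product(u, v):
    """Relevance score between two embeddings."""
    return sum(a * b for a, b in zip(u, v))

def dense_top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k documents with the highest inner product."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: inner_product(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings standing in for document-encoder outputs.
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
# Hypothetical embedding standing in for the query-encoder output.
query_vec = [1.0, 0.0, 0.0]
top = dense_top_k(query_vec, doc_vecs, k=2)
```

Since document embeddings do not depend on the query, they can be computed once and indexed offline; only the query is encoded at inference time.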

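Another family of dense retrievers widely applied in RA-LLMs has a single-encoder structure, which may be based on Transformer, BERT, or other off-the-shelf sequence modeling backbones. These one-encoder retrievers are generally pre-trained on large-scale unaligned documents via contrastive learning (Reichman and Heck, 2024) and may therefore excel in versatility, meaning that they transfer and generalize better to new domains or tasks. Such general-purpose pre-trained retrievers, e.g., Contriever (Gautier et al., 2022) and Spider (Ram et al., 2022), are more flexible to use with LLMs across tasks and have demonstrated their effectiveness in many RA-LLM methods, such as In-Context RALM (Ram et al., 2023), Atlas (Izacard et al., 2023), Self-RAG (Asai et al., 2023b), and others (Shi et al., 2023). According to experimental results in existing studies (Yu et al., 2023a), for open-domain QA tasks in cooperation with InstructGPT (Ouyang et al., 2022), a general-purpose pre-trained retriever (Contriever) applied without fine-tuning achieves performance comparable to a sparse retriever (BM25). However, both are worse than a DPR model fine-tuned on the target datasets, showing the effectiveness of fine-tuning on the target task and data.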

3.1.2. Retrieval Granularity

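Retrieval granularity denotes the retrieval unit in which the corpus is indexed, e.g., document, passage, token, or other levels such as entity. For RAG, the choice of retrieval granularity can significantly affect overall performance in terms of both effectiveness and efficiency, since it determines the storage space of the database as well as the computational cost of search (Asai et al., 2023a). An early retrieval-augmented language model (Chen et al., 2017) retrieves whole documents and then applies a machine comprehension model trained to detect answer spans in the returned documents, focusing on language reading and locating key information within the document. In generative language models, chunk retrieval (also called passage retrieval in some references (Karpukhin et al., 2020; Guu et al., 2020; Jiang et al., 2023)) is common and has been used in both traditional and LLM-based RAG models such as REALM (Guu et al., 2020), RAG (Lewis et al., 2020c), and Atlas (Izacard et al., 2023). A finer-grained option, token retrieval, is faster to search but places a greater burden on database storage. Token retrieval is more suitable for cases requiring rare patterns or out-of-domain data (Khandelwal et al., 2020), and it works well with the every-token retrieval strategy applied in kNN-LM and similar work (Yogatama et al., 2021; He et al., 2021b; Min et al., 2023). In comparison, a text chunk can contain compact and complete information with less redundancy and irrelevancy, and has therefore become the mainstream retrieval granularity in RAG.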

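Another major retrieval granularity proposed in RAG is entity retrieval. Unlike the granularities above, entity retrieval is designed from the perspective of knowledge rather than language. Févry et al. (2020) introduce the Entities as Experts (EAE) model, which divides the parameter space of a language model according to entity identity. The EAE model learns entity representations from text, along with the other model parameters, using the Wikipedia database, and represents knowledge with an entity memory. At a finer-grained level, de Jong et al. (2022) propose building the knowledge base by learning and retrieving mentions rather than entities. Overall, entity- or mention-level retrieval in RAG is more effective for entity-centric tasks and more space-efficient than token-wise retrieval.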

3.1.3. Pre-retrieval and Post-retrieval Enhancement

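To ensure retrieval quality, i.e., to increase the accuracy and relevance of the retrieved results, various pre- and post-retrieval strategies have been proposed to further enhance the input and output of the retriever. Wang et al. (2023f) propose a query expansion approach, Query2doc, which generates pseudo-documents by few-shot prompting LLMs and expands the query with the relevant information in those pseudo-documents, helping to disambiguate the query and guide the retriever. They show empirically that the method boosts the performance of both sparse and dense retrievers (Karpukhin et al., 2020) on ad-hoc information retrieval datasets. Similarly, Gao et al. (2023a) propose Hypothetical Document Embedding (HyDE), which instructs an LLM to generate a hypothetical document for the given query. The hypothetical document is then used as a new query: it is embedded and used to search for neighbors with the dense retriever.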
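The HyDE-style flow described above (generate a hypothetical document, embed it, then search with that embedding instead of the raw query) can be sketched as follows. Everything here is a toy assumption: `toy_generate` returns a canned string in place of an LLM call, `toy_embed` is a bag-of-words counter over a tiny vocabulary in place of a dense encoder, and `hyde_retrieve` is a hypothetical helper name.

```python
def hyde_retrieve(query, generate, embed, corpus, k=1):
    """Search with the embedding of a generated hypothetical document."""
    hypothetical = generate(query)          # the LLM writes a plausible answer passage
    q_vec = embed(hypothetical)             # embed the passage, not the raw query
    def score(doc):
        return sum(a * b for a, b in zip(q_vec, embed(doc)))
    return sorted(corpus, key=score, reverse=True)[:k]

VOCAB = ["paris", "france", "capital", "berlin", "germany"]

def toy_embed(text):
    # Stand-in for a dense encoder: term counts over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def toy_generate(query):
    # Stand-in for an LLM call; returns a canned hypothetical document.
    return "paris is the capital of france"

corpus = [
    "berlin is the capital of germany",
    "paris is the capital city of france",
]
result = hyde_retrieve("what is the capital of france", toy_generate, toy_embed, corpus)
```

The hypothetical document may contain factual errors; the approach relies on it still landing near relevant real documents in embedding space.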

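Another pre-retrieval strategy, query rewriting (Ma et al., 2023a), aims to close the gap between the input text and the knowledge needed for retrieval by reformulating the original question into a version more conducive to retrieval. Specifically, Ma et al. (2023a) propose the Rewrite-Retrieve-Read framework, which prompts an LLM to generate the query for the retrieval function. The rewriting step is motivated by the need to clarify the retrieval intent in the new query, easing the burden on the retrieval function of comprehending the input and improving its output, i.e., the retrieved relevant information. They test both a frozen LLM and a trainable model as the rewriter; both outperform naive RAG or generation models, though with varying performance across the tested QA datasets.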

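Yu et al. (2023c) propose query augmentation, which combines the original query with the initially generated output into a new query that is then used to retrieve relevant information from the external database. The retrieved results can prompt the language model to rethink and refine its generated output. Compared with using only the original query, this augmentation can contribute more relevant information from the corpus that directly clarifies the query-output relationship. Including the initial output in the new query also increases the lexical and semantic overlap between the question and the supporting documents to be retrieved. Query augmentation achieves better overall performance among these query enhancement strategies, since it may process all retrieved knowledge collectively while generating answers (Wang et al., 2024c).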

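Post-retrieval enhancement denotes processing the top-k documents extracted by the retriever before feeding them to the generator, for better alignment between the retrieval and generation stages (Yang et al., 2023b), particularly for closed-source generators such as LLMs. For example, Yang et al. (2023b) propose the Pluggable Reward-driven Context Adapter (PRCA), which enables fine-tuning a lightweight adapter, instead of the generator, on specific datasets. It also distills the retrieved documents through reinforcement learning, using rewards produced by the generator. Glass et al. (2022) propose the Retrieve-Rerank-Generate (Re²G) method, which assembles documents retrieved by different retrieval approaches through a rerank operation to improve the robustness of the retrieval results. Another reason to apply post-retrieval enhancement is that the retrieved information may sometimes be irrelevant or noisy, which might not help the generation model on the task or, worse, might harm the generation process (Wang et al., 2023a). Wang et al. (2023a), Asai et al. (2023b), and Yu et al. (2023c) propose different strategies to mitigate noise in retrieved knowledge documents. However, Xiong et al. (2023) find empirically that these methods depend on the LLM's confidence levels, which may not be as precise as expected. To address this, Wang et al. (2024c) propose BlendFilter, which considers pre-retrieval query generation blending and post-retrieval knowledge filtering together. The method can handle both complex questions and noisy retrieved knowledge, thereby improving RA-LLM performance comprehensively.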

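More recently, advanced RAG pipelines have been proposed that use LLMs to generate reasoning paths and plans, with an Information Retrieval (IR) module iteratively retrieving knowledge to enhance LLM-based generation (Yao et al., 2023; Xu et al., 2023a; Shao et al., 2023). However, Zhu et al. (2023) point out that if the outputs of the IR module and the LLM are of low quality, the retrieval and generation processes hinder each other in such an iterative guidance pipeline. To overcome this barrier, they propose a new reasoning approach for query and retrieved-knowledge enhancement. Post-retrieval strategies can also improve compatibility between the retrieved results and the generation model. For example, one main limitation of existing LLMs is the length of the input tokens, which prevents long retrieved documents from being incorporated directly into existing RA-LLMs. To address this, Xu et al. (2023b) propose Retrieve, Compress, Prepend (RECOMP), which adds an intermediate step that condenses the retrieved documents into a textual summary before in-context augmentation in the generation process.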

3.1.4. Database

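Retrieval in RAG is conducted over an external knowledge source, which can be closed-sourced or open-sourced (Ma et al., 2023a; Menick et al., 2022), as illustrated in Figure 2. A closed-sourced database generally stores key-value pairs of knowledge, which can be constructed in various ways. The keys are used primarily for similarity matching, either as sparse vectors such as in BM25 or as dense embeddings from the retrieval encoder. The value depends on the specific retrieval target and in most cases is raw text (Guu et al., 2020; Lewis et al., 2020c; Izacard and Grave, 2021b; Borgeaud et al., 2022; Lewis et al., 2020a; Seo et al., 2019). For example, in early RAG (Lewis et al., 2020c) each Wikipedia article is split into disjoint 100-word chunks, yielding a total of 21M documents. Each document is encoded into a dense embedding, and the document and its embedding are saved in the database as the value and the key, respectively. The value can also store tokens, one per entry, as applied in kNN-LM and Spalm (Yogatama et al., 2021; Khandelwal et al., 2020). The source of the database depends on the specific application domain and task. Wikipedia is one of the most commonly used general retrieval sets in previous RAG work; it stores factual structured information and comes in several versions of different scale, from the billion-token level (Khandelwal et al., 2020; Yogatama et al., 2021; Lewis et al., 2020c; Guu et al., 2020; de Jong et al., 2022; Xu et al., 2023; Ram et al., 2023) to the trillion-token level (Borgeaud et al., 2022). Domain-specific databases are also used for downstream tasks. For example, for code generation, Zan et al. (2022) collect API information and code files of public libraries to build their APIretriever database. In addition, Zhou et al. (2022) propose using a documentation pool that is frequently updated with new content (newly released libraries) in their model.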
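The key-value construction described above can be sketched in a few lines: split each article into disjoint fixed-size word chunks, then store each chunk (the value) under its embedding (the key). The helper names and the placeholder `toy_embed` function are assumptions for illustration; a real datastore would use a trained encoder and an approximate nearest-neighbor index rather than a Python list.

```python
def chunk_words(text, size=100):
    """Split text into disjoint chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_datastore(articles, embed, size=100):
    """Return (key, value) pairs: key = chunk embedding, value = raw chunk text."""
    return [
        (embed(chunk), chunk)
        for article in articles
        for chunk in chunk_words(article, size)
    ]

def toy_embed(chunk):
    # Placeholder for a dense encoder (hypothetical): a 1-d "embedding".
    return [len(chunk.split())]

article = " ".join(f"w{i}" for i in range(250))  # synthetic 250-word article
store = build_datastore([article], toy_embed, size=100)
chunk_lengths = [len(value.split()) for _, value in store]
```

With 100-word chunks, a 250-word article yields two full chunks and one 50-word remainder; whether to keep, pad, or merge such short trailing chunks is a design choice the source does not specify.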

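Applying Internet search engines (Luo et al., 2023a) such as Bing and Google avoids the maintenance of a search index and gives access to up-to-date knowledge (Lazaridou et al., 2022). It also provides a broader knowledge base than a closed-sourced database (Asai et al., 2023b; Lazaridou et al., 2022). Internet search has been widely combined with black-box LLMs and shown to be effective for different functions, such as knowledge augmentation (Lazaridou et al., 2022), fact-checking (Menick et al., 2022), and LLM agent enhancement (Yao et al., 2023). Compared with traditional RAG, Internet search is used more often as the retriever in RA-LLMs, owing to the extraordinary capability of LLMs as readers that comprehend the search results (i.e., the retrieved documents) and their ability to use tools to process and analyze them (Ma et al., 2023a). Existing studies (Yu et al., 2023a) have shown that leveraging search engines is particularly effective for LLMs (e.g., InstructGPT) on zero-shot knowledge-intensive tasks such as OpenQA and fact-checking.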

3.2. Generation

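The design of the generator depends heavily on the downstream task. For most text generation tasks, decoder-only and encoder-decoder are the two dominant structures (Zhao et al., 2023b). The recent development of commercial closed-source large foundation models has made black-box generation models mainstream in RA-LLMs. In this part, we briefly review studies with these two types of generators: parameter-accessible (white-box) and parameter-inaccessible (black-box).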

3.2.1. Parameter-Accessible Generators (White-box)

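The encoder-decoder structure processes the input and the target independently with different sets of parameters, and a cross-attention component connects input tokens to target tokens. Representative encoder-decoder models include T5 (Raffel et al., 2020) and BART (Lewis et al., 2020b). In comparison, decoder-only models process the input and target after concatenation, so the representations of the two parts are built concurrently, layer by layer, as they propagate up the network. Both types of generators are widely used in existing RAG work. For example, RAG (Lewis et al., 2020c) and Re²G (Glass et al., 2022) employ BART, while FID (Izacard and Grave, 2021b) and EMDR² use T5. Other models (Borgeaud et al., 2022; Li et al., 2022a) use Transformer-based encoder-decoder architectures with some customized design. Generators in RAG differ from general ones in that they incorporate retrieved data to improve generation accuracy and relevance. Furthermore, white-box generators allow parameter optimization and can be trained to adapt to different retrieval and augmentation approaches for better generation performance.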

3.2.2. Parameter-Inaccessible Generators (Black-box)

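A certain proportion of LLMs are released without disclosing their internal structure or giving access to their parameters, especially the particularly large-scale ones such as the GPT series (Achiam et al., 2023), Codex (Chen et al., 2021), and Claude; these are called black-box generation models. Such generators only allow feeding queries (input) and receiving responses (output); they do not allow altering the internal structure or updating parameters. From another perspective, LLMs, even those open for fine-tuning, are large in scale and difficult to tune for downstream domain-specific tasks with only a limited amount of data. Black-box RA-LLMs therefore focus more on the retrieval and augmentation processes, trying to enhance the generator by augmenting the input (also called the prompt in the context of LLMs) with better knowledge, guidance, or examples for generation. For example, Rubin et al. (2022) propose training a prompt retriever with data labeled by the language model itself, which provides better examples for in-context learning and thus improves final generation performance. Xu et al. (2023b) propose compressing the retrieved documents before in-context integration, which reduces computational cost and relieves the LM of the burden of identifying relevant information in long retrieved documents.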

3.3. Retrieval Integration for Generation Augmentation

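Augmentation describes the technical process that integrates the retrieval and generation parts, and it is an essential component of RA-LLMs. In this subsection, we introduce three main augmentation designs, conducted at the input, output, and intermediate layers of the generator, respectively, as illustrated in Figure 2.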

3.3.1. Input-Layer Integration

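A common way to integrate retrieved information or documents is to combine them with the original input or query and pass them jointly to the generator, which is called input-layer integration. For example, In-Context RALM (Ram et al., 2023) applies input-layer integration by concatenating the original input and all retrieved documents into a single sequence that serves as the new input to the generation model. Despite its effectiveness, such integration is limited in the number of retrieved documents it can handle, since the concatenated input may be too long for the generation model to process. In-Context RALM alleviates this limitation by removing tokens from the beginning of the new input. To avoid the information loss of this token-removal strategy, FID (Izacard and Grave, 2021b) takes a different approach and processes each retrieved document independently in the encoder. This strategy scales to a large number of contexts because self-attention is performed over only one context at a time. Atlas (Izacard et al., 2023) and REPLUG (Shi et al., 2023) apply a similar parallel integration by concatenating the query with one retrieved document at a time. In general, most black-box generation-based RAG methods apply input-layer integration, since neither the intermediate layers of the generation model nor its output distribution is accessible.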
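Concatenation with truncation from the beginning, as described for In-Context RALM above, can be sketched as follows. Whitespace tokenization and the helper name `build_input` are simplifying assumptions; an actual system would count tokens with the generator's own tokenizer.

```python
def build_input(query, retrieved_docs, max_tokens):
    """Prepend retrieved documents to the query, then keep only the last
    `max_tokens` tokens so that the query at the end always survives."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = " ".join(retrieved_docs + [query]).split()
    # Truncate from the beginning: the earliest document tokens are dropped first.
    return tokens[-max_tokens:]

docs = ["a b c", "d e"]
short_input = build_input("q1 q2", docs, max_tokens=4)
full_input = build_input("q1 q2", docs, max_tokens=100)
```

The information loss is visible in `short_input`: the whole first document is discarded, which is the drawback that per-document encoding in FID is described as avoiding.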

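More specifically for LLMs, input-layer integration may use the retrieved content as (additional) prompts or demonstrations, in addition to using it as a supplement to the original input as in traditional RAG (Rubin et al., 2022). Prompt retrieval aims to automatically find suitable natural-language prompts through retrieval, in order to teach the LLM to learn in context (Brown et al., 2020) or to induce the LLM to reason (Wei et al., 2022). It can improve the zero-shot ability of LLMs without delicate prompt engineering. For example, Cheng et al. (2023) propose learning a prompt retriever from input-prompt pair data with score labels obtained from a frozen LLM.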

3.3.2. Output-Layer Integration

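Another kind of augmentation is post-hoc, i.e., output-layer integration, which combines retrieval and generation results. For example, kNN-LM (Khandelwal et al., 2020) interpolates two next-token distributions at prediction time: one induced by the LM and the other induced by the nearest neighbors from the retrieval corpus. Output-layer linear integration (Grave et al., 2017; Zhong et al., 2022) is flexible to apply, since it can be plugged into most generation models without additional training. However, the simplicity of output-layer integration also limits the model's ability to reason about the retrieved text. To address this limitation, Yogatama et al. (2021) propose adding an extra gating network to post-process the retrieved data, achieving comparatively better performance. For LLMs, output-layer integration is as reasonable and adaptive as input-layer integration. REFEED (Yu et al., 2023c) proposes an answer-refining mechanism that applies an LLM to evaluate the retrieved information and adjust the initial answer accordingly, improving the accuracy of the response. Similarly, Zhang et al. (2023b) propose the COMBO framework, which matches LLM-generated passages with retrieved counterparts into compatible pairs based on pre-trained discriminators. The passage pairs are then processed by a Fusion-in-Decoder-based model (Izacard and Grave, 2021b) to derive the final answer.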

3.3.3. Intermediate-Layer Integration

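Compared with the two non-parametric methods above, a more engaging form of augmentation is to design a semi-parametric module that integrates the retrieved results through the internal layers of the generation model, which is called intermediate-layer integration. Such integration may add complexity, but it is promising for enhancing the capability of the generation model through effective training. Typically, a Transformer module is introduced to bring the retrieved information (mostly encoded as dense representations) into the generation model, where it interacts with the representations at intermediate stages of generation. For example, RETRO (Borgeaud et al., 2022) introduces a Chunked Cross Attention (CCA) layer to process the retrieved chunks in the generator blocks, and Wu et al. (2022) introduce the kNN-Augmented Attention Layer. Similarly, EAE (Févry et al., 2020) and TOME (de Jong et al., 2022) use an Entity Memory and a MemoryAttention layer to incorporate retrieved entities and entity mentions, respectively. Intermediate-layer integration can use many blocks frequently and efficiently to enhance the capability of the whole RAG model. It offers an efficient alternative for incorporating a large number of frequently retrieved text chunks, which are challenging for input-layer integration because of the input length limit of LMs (Borgeaud et al., 2022). It should be noted, however, that intermediate-layer integration requires a high degree of access to the generation model, which is not feasible for most LLMs that are accessed through inference APIs (Ma et al., 2023a).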

3.4. Retrieval Augmentation Necessity and Frequency

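The retrieval operation in LLM-based generation generally aims to supplement knowledge to enhance generation. Although retrieval-augmented models are promising, they have been criticized for not being a universal solution (Li et al., 2022b; Petroni et al., 2020), because indiscriminately augmenting LLMs with irrelevant passages can override potentially correct knowledge the LLM already possesses and lead to incorrect responses (Maekawa et al., 2024). Thakur et al. (2023) contribute a human-annotated dataset to help evaluate the robustness of LLMs against errors in externally retrieved knowledge, and observe that LLMs may double their hallucination rate on non-relevant retrieved passages compared with relevant ones. It is therefore critical for RA-LLMs to accurately recall prior knowledge while selectively incorporating retrieved information only when necessary, which is the path to robust RA-LLMs.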

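Most existing methods determine the necessity of retrieval based on the LLM's preliminary answers or its internal reasoning results (Ram et al., 2023; Min et al., 2022). For example, Self-RAG (Asai et al., 2023b) introduces special tokens to assess the necessity of retrieval and control retrieval behavior. Other methods design iterative prompts to decide whether extra information is required during generation, and hence whether retrieval or other actions need to be invoked for the LLM (Yao et al., 2023; Wei et al., 2022). In traditional RAG, retrieval necessity judgment has also been explored and addressed with intuitive approaches, such as assessing the confidence of the logits produced by the generation model (Jiang et al., 2021; Kadavath et al., 2022; He et al., 2021b). Such a solution is also applicable to RA-LLMs; for example, FLARE (Jiang et al., 2023) dynamically triggers RAG if the logits fall below a specific threshold. More flexibly, Tan et al. (2024) introduce SlimPLM, a collaborative approach that detects missing knowledge in LLMs with a slim proxy model whose function is to generate a "heuristic answer". The "heuristic answer" is used to assess the necessity of retrieval and, when retrieval is needed, to facilitate the retrieval process through query rewriting.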
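A confidence-based trigger of the kind described above can be sketched as a simple threshold test on the generator's per-token confidence. Using token probabilities as the confidence signal, the threshold value 0.4, and the function name `needs_retrieval` are all assumptions made for illustration; they are not taken from FLARE or any other cited method.

```python
def needs_retrieval(token_probs, threshold=0.4):
    """Return True if any token of a tentative continuation is low-confidence."""
    return any(p < threshold for p in token_probs)

# Hypothetical per-token probabilities for two tentative continuations.
confident = [0.91, 0.85, 0.77]
uncertain = [0.91, 0.22, 0.77]

decisions = [needs_retrieval(confident), needs_retrieval(uncertain)]
```

Only the second continuation would trigger retrieval here; a lower threshold retrieves less often and leans more on the model's internal knowledge.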

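In traditional RAG, which rarely considers retrieval necessity, retrieval frequency (also called retrieval stride) is an important design aspect that determines the degree to which retrieval is used in generation, and it greatly affects the overall performance of RAG models (Ram et al., 2023). Retrieval frequency controls how much the model relies on the retrieved results, thereby affecting both efficiency and effectiveness. When the necessity of retrieval is not considered, retrieval frequency is usually pre-defined and fixed, with three common settings: one-time, every-n-token, and every-token. One-time retrieval invokes the retrieval function only once and tries to find all the desired information in that single operation. It is usually performed at the beginning of generation, and all retrieved documents are then provided to the generation model along with the original input, as in REALM (Guu et al., 2020). One-time retrieval is more applicable when the information needs from the external database are obvious to the LLM (Jiang et al., 2023). However, for language tasks requiring long-form output, such as open-domain summarization, the dependency among output tokens matters more during generation. In these cases, documents retrieved in advance (through one-time retrieval) might not be enough to support generation of the whole output sequence, which calls for in-generation retrieval. To this end, In-Context RALM (Ram et al., 2023) and RETRO (Borgeaud et al., 2022) apply every-n-token retrieval during generation for better augmentation. In comparison, kNN-LM (Khandelwal et al., 2020) adopts a more frequent strategy, retrieving information for the prediction of every token during generation. Overall, different retrieval frequencies affect both the effectiveness and the efficiency of the whole RAG method. For example, more frequent retrieval leads to better performance but also increases the computing cost (Ram et al., 2023). Choosing a retrieval frequency is essentially a trade-off between computing cost and performance.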
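The three fixed settings can be expressed with a single stride parameter in a generation loop: a stride of 1 gives every-token retrieval, a stride of n gives every-n-token retrieval, and a stride at least as large as the output length gives one-time retrieval. The loop below is a sketch under that assumption; `next_token` and `retrieve` are hypothetical callables standing in for the generator and the retriever.

```python
def generate_with_stride(prompt_tokens, next_token, retrieve, stride, max_new_tokens):
    """Generate tokens, refreshing the retrieved context every `stride` steps.
    Returns the generated tokens and the number of retrieval calls made."""
    generated, context, calls = [], [], 0
    for step in range(max_new_tokens):
        if step % stride == 0:
            # Re-query the retriever with everything produced so far.
            context = retrieve(prompt_tokens + generated)
            calls += 1
        generated.append(next_token(context, prompt_tokens + generated))
    return generated, calls

# Stubs (hypothetical) so the loop can run without a model or an index.
def stub_retrieve(tokens):
    return ["doc"]

def stub_next_token(context, tokens):
    return "x"

_, calls_every_token = generate_with_stride(["q"], stub_next_token, stub_retrieve, 1, 8)
_, calls_every_4 = generate_with_stride(["q"], stub_next_token, stub_retrieve, 4, 8)
_, calls_one_time = generate_with_stride(["q"], stub_next_token, stub_retrieve, 8, 8)
```

The retrieval call counts (8, 2, and 1 for eight generated tokens) make the cost side of the trade-off explicit.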

4. RA-LLMs Training

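Based on whether training is required, existing RAG methods fall into two main classes: training-free approaches and training-based approaches. Training-free methods usually leverage the retrieved knowledge directly at inference time, without any extra training, by inserting the retrieved text into the prompt, which is computationally efficient. One potential challenge, however, is that the retriever and generator components are not specifically optimized for downstream tasks, which can easily lead to sub-optimal use of the retrieved knowledge. To fully exploit external knowledge, extensive methods have been proposed to fine-tune the retriever and generator, guiding large language models to effectively adapt to and integrate retrieved information (Sarto et al., 2022; Wang et al., 2023c; Schick et al., 2024; Zhu et al., 2024; Shao et al., 2023; Shi et al., 2023).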

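According to the training strategy, we divide these training-based approaches into three classes: 1) independent training approaches train each component of the RAG procedure independently; 2) sequential training methods train one module first and freeze the well-trained component to guide the tuning of the other part; and 3) joint training approaches train the retriever and generator simultaneously. In the following sections, we comprehensively review the training-free, independent training, sequential training, and joint training methods. A comparison of these training methods is shown in Figure 4.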

[Figure 4]

4.1. Training-free

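With their huge number of parameters, LLMs have exhibited human-level intelligence and achieved promising prediction performance on various downstream tasks. However, frequently fine-tuning and updating the knowledge stored in the model parameters (Lewis et al., 2020c) is extremely challenging because of the considerable time and computational resources required. Recently, numerous studies have proposed enhancing LLMs with retrieval mechanisms, enabling them to dynamically acquire new knowledge from external sources without an extra training process (i.e., training-free) (Izacard and Grave, 2021b; Jiang et al., 2023; Khattab et al., 2022), instead of relying solely on the implicit knowledge encoded in the model's parameters. These approaches have shown significant performance improvements on various knowledge-intensive tasks, such as open-domain question answering (Lewis et al., 2020c) and document summarization (Song et al., 2023). According to how LLMs use the retrieved information, we categorize these training-free methods into two types: 1) prompt engineering-based methods integrate retrieved knowledge directly into the original prompt, and 2) retrieval-guided token generation methods retrieve information to calibrate the token generation process.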

4.1.1. Prompt Engineering-based Methods

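As the generation performance of LLMs depends heavily on the input query, numerous training-free RAG approaches employ external knowledge by refining the original prompt (Jiang et al., 2023; Khattab et al., 2022; Li et al., 2023c). Specifically, the retrieved texts are usually used as contextual information and combined with the original prompt to guide the generation of LLMs (Izacard and Grave, 2021b; Jiang et al., 2023; Khattab et al., 2022; Purwar and Sundar, 2023; Li et al., 2023c; Wang et al., 2023g; Kim et al., 2023). For example, In-Context RALM (Ram et al., 2023) keeps the LLM parameters unchanged and directly prepends the retrieved document to the original prompt to augment the generation process. IRCoT (Trivedi et al., 2023) interleaves chain-of-thought (CoT) generation and knowledge retrieval steps, which retrieves more relevant information for subsequent reasoning steps than standard retrieval methods that use only the question as the query. Instead of retrieving knowledge from a large corpus, GENREAD (Yu et al., 2023a) first prompts an LLM to generate contextual documents based on the query, and then generates the answer from the given context and question. SKR (Wang et al., 2023a) guides LLMs to determine, based on their internal knowledge, whether they can answer a given question, making flexible use of internal and external knowledge by calling the retriever selectively. TOC (Kim et al., 2023) first retrieves relevant knowledge for an ambiguous question and recursively constructs a tree structure by clarifying the ambiguous question into multiple disambiguated questions, which are then aggregated to generate a long-form answer.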

4.1.2. Retrieval-Guided Token Generation Methods

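Besides integrating external knowledge directly into the original prompt, auxiliary information can be used to adjust the token generation process. For example, kNN-LMs (Khandelwal et al., 2020) first retrieve the $k$ most relevant contexts from the datastore based on the given query and compute a neighbor distribution based on distance. The output distribution is then calibrated by interpolating the neighbor distribution with the original model's output distribution. REST (He et al., 2023) proposes replacing the parametric draft model with a non-parametric retrieval datastore and retrieving relevant tokens based on the current context for speculative decoding (Chen et al., 2023a; Leviathan et al., 2023; Sun et al., 2024).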
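The two steps described for kNN-LMs, turning neighbor distances into a distribution over next tokens and interpolating it with the model's own distribution, can be sketched as follows. The softmax over negative distances follows the kNN-LM formulation; the interpolation weight `lam`, the toy neighbors, and the toy LM distribution are assumptions for illustration.

```python
import math

def knn_distribution(neighbors, vocab):
    """neighbors: list of (distance, next_token) pairs from the datastore.
    Apply a softmax over negative distances and sum the mass per token."""
    weights = [math.exp(-dist) for dist, _ in neighbors]
    total = sum(weights)
    probs = {tok: 0.0 for tok in vocab}
    for weight, (_, tok) in zip(weights, neighbors):
        probs[tok] += weight / total
    return probs

def interpolate(p_lm, p_knn, lam=0.25):
    """Calibrated distribution: lam * p_knn + (1 - lam) * p_lm."""
    return {tok: lam * p_knn[tok] + (1 - lam) * p_lm[tok] for tok in p_lm}

vocab = ["paris", "rome"]
# Hypothetical retrieved neighbors, all at the same distance.
neighbors = [(1.0, "paris"), (1.0, "paris"), (1.0, "rome")]
p_knn = knn_distribution(neighbors, vocab)
# Hypothetical LM distribution that initially prefers the other token.
p_lm = {"paris": 0.4, "rome": 0.6}
p_final = interpolate(p_lm, p_knn, lam=0.5)
```

In this toy case the neighbors shift the prediction from "rome" to "paris" without touching any model parameter, which is why the approach is training-free.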

4.2. Independent Training

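Independent training refers to training the retriever and the LLM as two entirely independent processes, with no interaction between them during training (Karpukhin et al., 2020; Zhou et al., 2022; Lan et al., 2022). Compared with training-free methods, the performance of RAG models can be effectively enhanced by training the LLM to leverage the retrieved knowledge, or the retriever to bridge the gap between information retrieval and language generation. For LLM training, the negative log-likelihood loss is the most representative training objective (Radford et al., 2019; Touvron et al., 2023), which aims to guide the LLM to generate the desired output based on the given input. Retrievers can be categorized into two types: 1) sparse retrievers (Ramos et al., 2003; Robertson et al., 2009) and 2) dense retrievers (Lan et al., 2022; Karpukhin et al., 2020; Zhou et al., 2022). Sparse retrievers usually exploit sparse features, e.g., word frequencies, to represent documents and compute relevance scores based on task-specific metrics (Li et al., 2023a; Ramos et al., 2003; Robertson et al., 2009) such as TF-IDF and BM25. Dense retrievers employ deep neural networks to encode the query and documents into dense representations, and then typically use the inner product to compute relevance scores and retrieve the relevant external knowledge. For example, DPR (Karpukhin et al., 2020) adopts two independent BERT (Devlin et al., 2019) networks to encode the query and passages, respectively, and trains these models with contrastive learning. CoG (Lan et al., 2022) proposes training a prefix encoder and a phrase encoder for retrieval, and reformulates text generation as multiple copy-and-paste operations from an existing collection of source text.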
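A common instance of the contrastive objective for a bi-encoder is the negative log-likelihood of the positive passage under a softmax over inner-product scores, as used in DPR. The sketch below computes that loss for one query with hand-written toy vectors; the vectors and the function name `contrastive_nll` are illustrative assumptions, and no gradient step is shown.

```python
import math

def contrastive_nll(q, pos, negs):
    """-log( exp(q.pos) / (exp(q.pos) + sum_j exp(q.neg_j)) )"""
    def sim(u, v):
        return sum(a * b for a, b in zip(u, v))
    logits = [sim(q, pos)] + [sim(q, n) for n in negs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

q = [1.0, 0.0]            # toy query embedding
good = [1.0, 0.0]         # toy passage aligned with the query
bad = [0.0, 1.0]          # toy passage orthogonal to the query

loss_aligned = contrastive_nll(q, pos=good, negs=[bad])
loss_misaligned = contrastive_nll(q, pos=bad, negs=[good])
```

The loss is lower when the positive passage scores higher than the negatives, so minimizing it pulls query and positive embeddings together and pushes negatives apart.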

4.3. Sequential Training

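Independent training is an efficient way to exploit external knowledge during generation, since the retriever and generator can be trained offline and any off-the-shelf models can be used, avoiding extra training costs. To better enhance the synergy between the retriever and generator, several methods have been proposed to train the retriever and the LLM sequentially. In these sequential training methods, the process typically begins with independent pre-training of either the retriever or the generator, after which the pre-trained module is fixed while the other module is trained. Note that various existing models (e.g., BERT (Devlin et al., 2019; Reimers and Gurevych, 2019; Khattab and Zaharia, 2020), CLIP (Radford et al., 2021), T5 (Raffel et al., 2020)) can be used directly as the fixed retriever or generator, thereby bypassing the first pre-training step. Compared with independent training, sequential training involves coordinated training of the retriever and generator, in which the trainable module benefits from the assistance of the fixed module. Based on the training order of the retriever and generator, sequential training can be categorized into two classes: 1) Retriever First (Sarto et al., 2022; Wang et al., 2023c; Schick et al., 2024; Zhu et al., 2024; Asai et al., 2023b), and 2) LLMs First (Shi et al., 2023; Wang et al., 2024b; Shao et al., 2023).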

4.3.1. Retriever First

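These methods first train the retrieval model and then fix it. The LLM is then trained using the retrieved knowledge. For instance, RETRO (Borgeaud et al., 2022) adopts an independently pre-trained BERT model as the retriever and trains an encoder-decoder architecture to integrate retrieved chunks into the model's predictions. RALMs (Yoran et al., 2023) adopt Google Search and the open-source COLBERTV2 (Khattab and Zaharia, 2020) as pre-trained retrievers and fine-tune the LLM to effectively leverage the retrieved passages. ITER-RTGEN (Ren et al., 2023) uses a pre-trained S-BERT (Reimers and Gurevych, 2019) as the retriever and introduces an adaptive hybrid retrieval strategy for retrieving demonstrations. It also uses T5 (Raffel et al., 2020) as the generator, which is further fine-tuned on the target label and an input that combines the original prompt with the retrieved demonstrations. SMALLCAP (Ramos et al., 2023) uses CLIP (Radford et al., 2021), a powerful pre-trained multi-modal network, to encode the input image and the textual data of the external datastore, and retrieves the most relevant items based on cosine similarity. A cross-attention layer is trained, and GPT-2 (Radford et al., 2019) is used as the decoder to produce captions.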

4.3.2. LLMs First

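Similarly, one can pre-train the LLM first and then tune the retriever under the supervision of the well-trained LLM. For example, DKRR (Izacard and Grave, 2021a) shows that the attention scores of a sequence-to-sequence model can indicate document relevance. They therefore propose using the attention scores of a reader model to produce synthetic labels for training the retriever. AAR (Yu et al., 2023b) proposes using a small language model to generate the supervised signal for training retrievers. The well-trained retriever can then be used to improve the performance of black-box LLMs. RA-DIT (Lin et al., 2023) first fine-tunes the LLM to enhance its ability to leverage retrieved knowledge, and then trains the retriever to better align its output with the LLM. UPRISE (Cheng et al., 2023) proposes a lightweight method to enhance the zero-shot performance of LLMs on unseen tasks by introducing a prompt retriever. A frozen LLM guides the fine-tuning of the prompt retriever, which then retrieves prompts for different tasks with various LLMs during inference.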

4.4. Joint Training

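Joint training methods (Zhong et al., 2022; Kang et al., 2023; Li et al., 2023b; Xu et al., 2023c; Hu et al., 2023; Cheng et al., 2024) employ an end-to-end paradigm to optimize the retriever and generator simultaneously. Instead of training each module sequentially, joint training effectively enhances both the retriever's ability to locate external knowledge for generation and the generator's capacity to use the retrieved information. For instance, RAG (Lewis et al., 2020c) minimizes the negative log-likelihood to jointly train the retriever and generator. REALM (Guu et al., 2020) adopts a training paradigm similar to that of RAG (Lewis et al., 2020c), and uses the Maximum Inner Product Search (MIPS) technique (Ram and Gray, 2012; Chen et al., 2019; Shen et al., 2015; Ding et al., 2020) to locate the most relevant documents. To employ MIPS, all external documents are embedded first and a search index is built over the embeddings. An asynchronous index-updating strategy (Guu et al., 2020; Izacard et al., 2023; Siriwardhana et al., 2023; Huang et al., 2023) is proposed to refresh the index every several hundred training steps, avoiding the time cost of re-indexing all documents.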
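The joint negative log-likelihood objective can be written out as follows, in the sequence-level form used by RAG (Lewis et al., 2020c), where the retrieved document $z$ is treated as a latent variable and marginalized over the top-$k$ retrieved documents. The symbols $\eta$ (retriever parameters) and $\theta$ (generator parameters) are notation chosen here for illustration.

```latex
p(y \mid x) \;\approx\; \sum_{z \,\in\, \operatorname{top-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x)\, p_\theta(y \mid x, z),
\qquad
\mathcal{L}(\eta, \theta) \;=\; -\sum_{j} \log p(y_j \mid x_j)
```

Because $p_\eta(z \mid x)$ appears inside the likelihood, gradients of $\mathcal{L}$ reach the retriever as well as the generator, which is what makes the training joint rather than sequential.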

5. Applications

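In this section, we introduce some representative applications of retrieval-augmented large language models (RA-LLMs). To give a clear overview, we review them from three perspectives: NLP applications, downstream tasks, and domain-specific applications. The studies mentioned in this section are summarized and categorized in Figure 5.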

5.1. NLP Applications

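Owing to their intrinsic capability in text generation, RA-LLMs have various applications in NLP, such as question answering (QA) systems, chatbots, and fact verification.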

5.1.1. QA Systems

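QA systems aim to provide precise answers to users' queries. However, even when trained on extensive data, these systems may lack the latest information or domain-specific knowledge that is absent from their training data (Izacard and Grave, 2021b; Liu et al., 2022b). To address this limitation, the integration of RA-LLMs has played a crucial role in advancing QA systems by strengthening their ability to retrieve and synthesize relevant information (Borgeaud et al., 2022; Izacard and Grave, 2021b). Specifically, RA-LLMs can use their retrieval component to access a vast knowledge base and provide coherent, contextually relevant answers. For example, REALM (Guu et al., 2020) integrates a knowledge retriever that can retrieve information from a large corpus during pre-training, fine-tuning, and inference. This allows REALM to retrieve effectively from a vast knowledge corpus and thereby improve the accuracy of its responses. Similarly, Fusion-in-Decoder (Izacard and Grave, 2021b) retrieves passages from support documents and then fuses them with the question to generate the answer, achieving higher accuracy. In addition, Borgeaud et al. (2022) indicate that the quality of the answers may rely more on the output of the retrieval encoder.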

5.1.2. ChatBot

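A ChatBot is designed to interact with users in a natural, conversational manner (Liu et al., 2020). Unlike QA systems, ChatBots focus on maintaining a coherent and contextually rich conversation with the user. To enhance these capabilities, recent methods focus on integrating RA-LLMs (Komeili et al., 2022; Zhang et al., 2020; Kang et al., 2023), which can augment the ChatBot with relevant external knowledge and facilitate more engaging and context-rich interactions with users. For example, some studies (Ghazvininejad et al., 2018; Chen et al., 2020) retrieve relevant knowledge from static databases (e.g., a Wikipedia dump) to augment conversation. Komeili et al. (2022) propose retrieving information from Internet search to further improve conversation performance. Considering the dynamic nature of world knowledge, another model (Wang et al., 2023d) further accesses large amounts of dynamic information in search engines to generate responses.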

5.1.3. Fact Verification

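Fact verification is a critical task of verifying the accuracy and reliability of information. Given the need for trusted evidence, RA-LLMs are used to enhance fact verification capabilities (Lewis et al., 2020c; Izacard et al., 2023). Lewis et al. (2020c) first propose retrieving external knowledge to augment a range of knowledge-intensive tasks, including fact verification. Atlas (Izacard et al., 2023), on the other hand, examines the performance of RA-LLMs for fact verification under few-shot learning. More recently, Self-RAG (Asai et al., 2023b) has made a notable impression by incorporating a self-reflective mechanism. Specifically, Self-RAG reflects on whether the retrieved information is helpful and judges its reliability, thereby greatly improving verification accuracy.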

5.2. Downstream Tasks

In addition to NLP applications, RA-LLMs can be applied to various downstream tasks, such as recommendations and software engineering.

5.2.1. Recommendations

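Recommender systems play an important role in modeling users' preferences and providing personalized recommendations (Zhang et al., 2024; Wang et al., 2024a; Fan et al., 2019; Zhao et al., 2024a; Fan et al., 2020, 2022a). Recently, RA-LLMs have shown great potential for providing personalized and contextually relevant recommendations by integrating retrieval and generation processes (Di Palma, 2023; Wu et al., 2024; Lu et al., 2021). For example, Di Palma (2023) proposes a simple retrieval-augmented recommendation model that leverages knowledge from movie or book datasets to enhance recommendations. Lu et al. (2021) further retrieve from reviews to enrich item information in recommender systems. CoRAL (Wu et al., 2024) uses reinforcement learning to retrieve collaborative information from the dataset and align it with semantic information for more accurate recommendations.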

5.2.2. Software Engineering

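The rise of RA-LLMs has influenced many aspects of software engineering (Zhou et al., 2022; Nashid et al., 2023; Ye et al., 2023a). For example, some studies propose the retrieval-augmented generation paradigm for code generation (Zhou et al., 2022) and program repair (Nashid et al., 2023). Similarly, Parvez et al. (2021) retrieve top-ranked code or summaries from the codebase and aggregate them with the input to enhance code generation and summarization. RA-LLMs also show potential in tabular data processing (Ye et al., 2023a; Li et al., 2024b) and Text-to-SQL semantic parsing (Shi et al., 2022; Poesia et al., 2022).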

5.3. Domain-specific Applications

RA-LLMs have been widely adopted for various domain-specific tasks, such as AI for Science and Finance.

5.3.1. AI for Science

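RA-LLMs have proven beneficial in scientific fields such as molecules and proteins. Molecular tasks include identifying the properties of molecules and predicting new molecules, which benefits drug discovery. Currently, some RA-LLMs have been applied to molecules by integrating retrieval of molecular structures and of biomedical entities such as proteins, molecules, and diseases (Wang et al., 2023b; Liu et al., 2023; Yang et al., 2023a; Wang et al., 2023e). Wang et al. (2023b) and Li et al. (2023a) propose retrieval-based frameworks that guide molecule generation by retrieving from a database. Liu et al. (2023) introduce a multi-modal molecule structure-text model for molecular property prediction by retrieving textual knowledge from a large-scale dataset. RA-LLMs also significantly influence protein representation and generation (Sun et al., 2023; Ma et al., 2023b). For instance, RSA (Ma et al., 2023b) queries protein sequences associated with a collection of structurally or functionally similar sequences in the database to enhance protein representations. Furthermore, Lozano et al. (2023) present a clinical QA system based on retrieving published review articles.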

5.3.2. Finance

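In the highly data-driven and information-intensive field of finance, RA-LLMs have proven to be a significant technology for enhancing decision-making (Zhang et al., 2023c; Yepes et al., 2024; Li et al., 2024a). For example, Zhang et al. (2023c) retrieve financial information from external sources, such as news platforms (e.g., Bloomberg and Reuters) and social media platforms (e.g., Twitter, Reddit), and combine it with the original query to improve the accuracy of financial sentiment analysis. Financial QA is another primary task of financial analysis, which usually extracts relevant knowledge from financial documents. As professional documents are usually stored in PDF format, Lin (2024) introduces a PDF parser combined with RA-LLMs to retrieve knowledge from financial reports. Yepes et al. (2024), on the other hand, propose a document chunking method based on document structure rather than on paragraphs, further improving the quality of RA-LLM outputs.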

6. Future Challenges and Opportunities

Since research on RA-LLMs is still at an early stage, we present some potential research directions that can be explored in the future.

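Trustworthy RA-LLMs. The essential objective of developing RAG-empowered LLMs is to enhance the capability of language models, thereby benefiting users and society by reducing redundant and meaningless labor, increasing convenience, and spurring social progress. However, recent research indicates that RA-LLMs can be maliciously or unintentionally manipulated into making unreliable decisions and harming humans (Deng et al., 2024; Zou et al., 2024), which may have serious consequences in safety-critical scenarios (Liu et al., 2021; Fan et al., 2022b, 2021; Chen et al., 2023b, 2022). In addition, private retrieval databases carry a risk of leakage, raising concerns about the privacy of RA-LLMs (Zeng et al., 2024). Developing trustworthy RA-LLMs is therefore of great importance, as it can significantly mitigate the potential negative impacts of LLM technology and provide people with powerful AI models they can fully trust. Specifically, ideal trustworthiness in RA-LLM systems should possess the following characteristics: 1) robustness, 2) fairness, 3) explainability, and 4) privacy. Robustness means that a trustworthy RA-LLM system should withstand malicious or inadvertent perturbations introduced by attackers. Fairness means that a trustworthy RA-LLM system should avoid discrimination in its decision-making process. Explainability requires a complete understanding of the intrinsic workings of RA-LLM systems, i.e., their predictions should be explainable and transparent. Privacy entails safeguarding the private information housed in the datastore when establishing trustworthy RA-LLM systems.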

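Multi-Lingual RA-LLMs. The ability to leverage knowledge from multiple languages can greatly enhance the capabilities of retrieval-augmented language models. As the world becomes increasingly interconnected, there is a growing need for AI systems that can understand and communicate across languages. By combining multilingual knowledge retrieval and generation, these models can access and synthesize information from diverse linguistic sources, enabling more comprehensive and nuanced understanding and generation. Moreover, multilingual models can facilitate cross-cultural communication and knowledge sharing and break down language barriers, bringing convenience to people in different regions of the world, especially those in areas with minority languages (Kabra et al., 2023; Li et al., 2023c). For example, users from countries with less widely spoken languages can draw on abundant English and Chinese corpora for knowledge retrieval, enhancing the performance of large language models on downstream tasks.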

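Multi-modal RA-LLMs. Multi-modal retrieval-augmented generation extends the knowledge sources beyond text to include data modalities such as images, videos, and audio. By integrating multiple modalities, LLMs can leverage richer contextual information than single-modal RAG and develop a more comprehensive understanding of users' needs, leading to precise, fine-grained, and high-quality generation. For instance, an image or video can provide valuable visual cues that complement textual information, resulting in more precise language generation (Zhu et al., 2024; Hu et al., 2023). By fusing multiple modalities, multi-modal RA-LLMs can develop a more comprehensive understanding of the world and produce more accurate and insightful outputs, benefiting a wide range of domains, including healthcare (Zhu et al., 2024), drug discovery (Shtar, 2021), and molecular analysis (Liu et al., 2023; Andrews et al., 2022).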

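Quality of External Knowledge. As a datastore commonly used in current RAG systems, Wikipedia (Zhu et al., 2024; Karpukhin et al., 2020) serves as a vast repository of external textual knowledge for augmenting the generation process, containing millions of articles covering various disciplines. However, the reliability and accuracy of individual Wikipedia articles vary significantly, and introducing text that deviates from the facts might even mislead the model's generation process. It is therefore crucial to improve the quality of the external knowledge corpus and mitigate the negative impact of low-quality knowledge on LLM performance. By improving the quality of external knowledge and tailoring robust mechanisms for filtering out low-quality or unreliable information, RAG-empowered LLM systems can produce more accurate and reliable outputs, improving their effectiveness in real-world applications.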

7. Conclusion

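Retrieval-augmented generation (RAG), a cutting-edge AI technique, has achieved remarkable success across applications including recommendation, molecule generation, protein representation, and software engineering, owing to the powerful ability of retrieval to provide supplementary information that enhances generation. In recent years, increasing efforts have been made to alleviate the limitations of LLMs, such as hallucination and out-of-date internal knowledge, by leveraging retrieval to provide the latest auxiliary information and by teaching LLMs to harness the retrieved external knowledge. With the rapid advances in retrieval-augmented large language models (RA-LLMs), there is a pressing need for a comprehensive and systematic overview. To bridge this gap, in this paper we comprehensively review RA-LLMs from the perspectives of architecture, training strategy, and application, providing researchers with an in-depth understanding. Moreover, since research on RA-LLMs is still at an early stage, we also discuss current limitations and several potential directions for future research.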

References

Achiam et al. (2023)Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023.Gpt-4 technical report.arXiv preprint arXiv:2303.08774 (2023).
Agrawal et al. (2023)Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, and Marjan Ghazvininejad. 2023.In-context Examples Selection for Machine Translation. In ACL (Findings). Association for Computational Linguistics, 8857–8873.
Andrews et al. (2022)Miles C Andrews, Junna Oba, Chang-Jiun Wu, Haifeng Zhu, Tatiana Karpinets, Caitlin A Creasy, Marie-Andrée Forget, Xiaoxing Yu, Xingzhi Song, Xizeng Mao, et al. 2022.Multi-modal molecular programs regulate melanoma cell state.Nature communications 13, 1 (2022), 4000.
Asai et al. (2023a)Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. 2023a.Retrieval-based language models and applications. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 6: Tutorial Abstracts). 41–46.
Asai et al. (2023b)Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2023b.Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. In The Twelfth International Conference on Learning Representations.
Borgeaud et al. (2022)Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022.Improving language models by retrieving from trillions of tokens. In International conference on machine learning. PMLR, 2206–2240.
Brown et al. (2020)Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020.Language models are few-shot learners.Advances in neural information processing systems 33 (2020), 1877–1901.
Buttcher et al. (2016)Stefan Buttcher, Charles LA Clarke, and Gordon V Cormack. 2016.Information retrieval: Implementing and evaluating search engines.Mit Press.
Chen et al. (2023a)Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, and John Jumper. 2023a.Accelerating large language model decoding with speculative sampling.arXiv preprint arXiv:2302.01318 (2023).
Chen et al. (2017)Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017.Reading Wikipedia to Answer Open-Domain Questions. In ACL (1). Association for Computational Linguistics, 1870–1879.
Chen et al. (2022)Jingfan Chen, Wenqi Fan, Guanghui Zhu, Xiangyu Zhao, Chunfeng Yuan, Qing Li, and Yihua Huang. 2022.Knowledge-enhanced Black-box Attacks for Recommendations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 108–117.
Chen et al. (2021)Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021.Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374 (2021).
Chen et al. (2023b)Xiao Chen, Wenqi Fan, Jingfan Chen, Haochen Liu, Zitao Liu, Zhaoxiang Zhang, and Qing Li. 2023b.Fairly adaptive negative sampling for recommendations. In Proceedings of the ACM Web Conference 2023. 3723–3733.
Chen et al. (2020)Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, and Jie Zhou. 2020.Bridging the gap between prior and posterior knowledge selection for knowledge-grounded dialogue generation. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP). 3426–3437.
Chen et al. (2019)Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, and Wai Keung Wong. 2019.Deep supervised hashing with anchor graph. In Proceedings of the IEEE/CVF international conference on computer vision. 9796–9804.
Cheng et al. (2023)Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Weiwei Deng, and Qi Zhang. 2023.UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 12318–12337.
Cheng et al. (2024)Xin Cheng, Di Luo, Xiuying Chen, Lemao Liu, Dongyan Zhao, and Rui Yan. 2024.Lift yourself up: Retrieval-augmented text generation with self-memory.Advances in Neural Information Processing Systems 36 (2024).
Chowdhery et al. (2023)Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2023.Palm: Scaling language modeling with pathways.Journal of Machine Learning Research 24, 240 (2023), 1–113.
Croft et al. (2010)W Bruce Croft, Donald Metzler, and Trevor Strohman. 2010.Search engines: Information retrieval in practice. Vol. 520.Addison-Wesley Reading.
Cui et al. (2021)Leyang Cui, Yu Wu, Jian Liu, Sen Yang, and Yue Zhang. 2021.Template-Based Named Entity Recognition Using BART. In ACL/IJCNLP (Findings) (Findings of ACL, Vol. ACL/IJCNLP 2021). Association for Computational Linguistics, 1835–1845.
Dahl et al. (2024) Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E Ho. 2024. Large legal fictions: Profiling legal hallucinations in large language models. arXiv preprint arXiv:2401.01301 (2024).
de Jong et al. (2022) Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, and William W. Cohen. 2022. Mention Memory: incorporating textual knowledge into Transformers through entity mention attention. In ICLR. OpenReview.net.
Deng et al. (2024) Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, and Yang Liu. 2024. Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning. arXiv preprint arXiv:2402.08416 (2024).
Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1). Association for Computational Linguistics, 4171–4186.
Di Palma (2023) Dario Di Palma. 2023. Retrieval-augmented recommender system: Enhancing recommender systems with large language models. In Proceedings of the 17th ACM Conference on Recommender Systems. 1369–1373.
Ding et al. (2024) Yujuan Ding, Yunshan Ma, Wenqi Fan, Yige Yao, Tat-Seng Chua, and Qing Li. 2024. FashionReGen: LLM-Empowered Fashion Report Generation. arXiv preprint arXiv:2403.06660 (2024).
Ding et al. (2020) Yujuan Ding, Wai Keung Wong, Zhihui Lai, and Zheng Zhang. 2020. Discriminative dual-stream deep hashing for large-scale image retrieval. Information Processing & Management 57, 6 (2020), 102288.
Drozdov et al. (2022) Andrew Drozdov, Nathanael Schärli, Ekin Akyürek, Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, and Denny Zhou. 2022. Compositional semantic parsing with large language models. In The Eleventh International Conference on Learning Representations.
Fan et al. (2021) Wenqi Fan, Tyler Derr, Xiangyu Zhao, Yao Ma, Hui Liu, Jianping Wang, Jiliang Tang, and Qing Li. 2021. Attacking black-box recommendations via copying cross-domain user profiles. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1583–1594.
Fan et al. (2022a) Wenqi Fan, Xiaorui Liu, Wei Jin, Xiangyu Zhao, Jiliang Tang, and Qing Li. 2022a. Graph Trend Filtering Networks for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 112–121.
Fan et al. (2019) Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social recommendation. In The World Wide Web Conference. 417–426.
Fan et al. (2020) Wenqi Fan, Yao Ma, Qing Li, Jianping Wang, Guoyong Cai, Jiliang Tang, and Dawei Yin. 2020. A graph neural network framework for social recommendations. IEEE Transactions on Knowledge and Data Engineering (2020).
Fan et al. (2024) Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, et al. 2024. Graph Machine Learning in the Era of Large Language Models (LLMs). arXiv preprint arXiv:2404.14928 (2024).
Fan et al. (2022b) Wenqi Fan, Xiangyu Zhao, Xiao Chen, Jingran Su, Jingtong Gao, Lin Wang, Qidong Liu, Yiqi Wang, Han Xu, Lei Chen, et al. 2022b. A Comprehensive Survey on Trustworthy Recommender Systems. arXiv preprint arXiv:2209.10117 (2022).
Févry et al. (2020) Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, and Tom Kwiatkowski. 2020. Entities as Experts: Sparse Memory Access with Entity Supervision. In EMNLP (1). Association for Computational Linguistics, 4937–4951.
Gao et al. (2023a) Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan. 2023a. Precise Zero-Shot Dense Retrieval without Relevance Labels. In ACL (1). Association for Computational Linguistics, 1762–1777.
Gao et al. (2023b) Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. 2023b. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023).
Gautier et al. (2022) Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised dense information retrieval with contrastive learning. Transactions on Machine Learning Research (2022).
Ghazvininejad et al. (2018) Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, and Michel Galley. 2018. A knowledge-grounded neural conversation model. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
Glass et al. (2022) Michael R. Glass, Gaetano Rossiello, Md. Faisal Mahbub Chowdhury, Ankita Naik, Pengshan Cai, and Alfio Gliozzo. 2022. Re2G: Retrieve, Rerank, Generate. In NAACL-HLT. Association for Computational Linguistics, 2701–2715.
Grave et al. (2017) Edouard Grave, Armand Joulin, and Nicolas Usunier. 2017. Improving Neural Language Models with a Continuous Cache. In ICLR (Poster). OpenReview.net.
Guu et al. (2020) Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. Retrieval augmented language model pre-training. In International Conference on Machine Learning. PMLR, 3929–3938.
He et al. (2021b) Junxian He, Graham Neubig, and Taylor Berg-Kirkpatrick. 2021b. Efficient Nearest Neighbor Language Models. In EMNLP (1). Association for Computational Linguistics, 5703–5714.
He et al. (2021a) Qiuxiang He, Guoping Huang, Qu Cui, Li Li, and Lemao Liu. 2021a. Fast and accurate neural machine translation with translation memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 3170–3180.
He et al. (2023) Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, and Di He. 2023. REST: Retrieval-based speculative decoding. arXiv preprint arXiv:2311.08252 (2023).
Hu et al. (2023) Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A Ross, and Alireza Fathi. 2023. REVEAL: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 23369–23379.
Huang et al. (2023) Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, and Bryan Catanzaro. 2023. RAVEN: In-context learning with retrieval augmented encoder-decoder language models. arXiv preprint arXiv:2308.07922 (2023).
Izacard and Grave (2021a) Gautier Izacard and Edouard Grave. 2021a. Distilling Knowledge from Reader to Retriever for Question Answering. In ICLR 2021-9th International Conference on Learning Representations.
Izacard and Grave (2021b) Gautier Izacard and Edouard Grave. 2021b. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In EACL 2021-16th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 874–880.
Izacard et al. (2023) Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot Learning with Retrieval Augmented Language Models. Journal of Machine Learning Research 24, 251 (2023), 1–43.
Jiang et al. (2021) Zhengbao Jiang, Jun Araki, Haibo Ding, and Graham Neubig. 2021. How can we know when language models know? On the calibration of language models for question answering. Transactions of the Association for Computational Linguistics 9 (2021), 962–977.
Jiang et al. (2023) Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Active Retrieval Augmented Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 7969–7992.
Kabra et al. (2023) Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, and Graham Neubig. 2023. Multi-lingual and Multi-cultural Figurative Language Understanding. In The 61st Annual Meeting of the Association for Computational Linguistics.
Kadavath et al. (2022) Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, et al. 2022. Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221 (2022).
Kang et al. (2023) Minki Kang, Jin Myung Kwak, Jinheon Baek, and Sung Ju Hwang. 2023. Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv preprint arXiv:2305.18846 (2023).
Karpukhin et al. (2020) Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick S. H. Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In EMNLP (1). Association for Computational Linguistics, 6769–6781.
Khandelwal et al. (2020) Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Generalization through Memorization: Nearest Neighbor Language Models. In International Conference on Learning Representations.
Khattab et al. (2022) Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, and Matei Zaharia. 2022. Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv preprint arXiv:2212.14024 (2022).
Khattab and Zaharia (2020) Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 39–48.
Kim et al. (2023) Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, Joonsuk Park, and Jaewoo Kang. 2023. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. In The 2023 Conference on Empirical Methods in Natural Language Processing.
Kim et al. (2022) Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, and Sang-goo Lee. 2022. Self-generated in-context learning: Leveraging auto-regressive language models as a demonstration generator. arXiv preprint arXiv:2206.08082 (2022).
Kobayashi and Takeda (2000) Mei Kobayashi and Koichi Takeda. 2000. Information retrieval on the web. ACM Computing Surveys (CSUR) 32, 2 (2000), 144–173.
Komeili et al. (2022) Mojtaba Komeili, Kurt Shuster, and Jason Weston. 2022. Internet-Augmented Dialogue Generation. In ACL (1). Association for Computational Linguistics, 8460–8478.
Lan et al. (2022) Tian Lan, Deng Cai, Yan Wang, Heyan Huang, and Xian-Ling Mao. 2022. Copy is All You Need. In The Eleventh International Conference on Learning Representations.
Lazaridou et al. (2022) Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, and Nikolai Grigorev. 2022. Internet-augmented language models through few-shot prompting for open-domain question answering. arXiv preprint arXiv:2203.05115 (2022).
Leviathan et al. (2023) Yaniv Leviathan, Matan Kalman, and Yossi Matias. 2023. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning. PMLR, 19274–19286.
Lewis et al. (2020a) Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. 2020a. Pre-training via paraphrasing. Advances in Neural Information Processing Systems 33 (2020), 18470–18481.
Lewis et al. (2020b) Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020b. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In ACL. Association for Computational Linguistics, 7871–7880.
Lewis et al. (2020c) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020c. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
Li et al. (2022b) Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, and Sanjiv Kumar. 2022b. Large language models with controllable working memory. arXiv preprint arXiv:2211.05110 (2022).
Li et al. (2024b) Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, and Zhaoxiang Zhang. 2024b. SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models. Advances in Neural Information Processing Systems 36 (2024).
Li et al. (2023a) Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, and Qing Li. 2023a. Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. arXiv preprint arXiv:2306.06615 (2023).
Li et al. (2024a) Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, and Wei Lin. 2024a. AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework. arXiv preprint arXiv:2403.12582 (2024).
Li et al. (2023b) Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yu Gu, Zhiyuan Liu, and Ge Yu. 2023b. Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data. In The 61st Annual Meeting of the Association for Computational Linguistics.
Li et al. (2023c) Xiaoqian Li, Ercong Nie, and Sheng Liang. 2023c. From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL. In NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following.
Li and Qiu (2023) Xiaonan Li and Xipeng Qiu. 2023. MoT: Pre-thinking and recalling enable ChatGPT to self-improve with memory-of-thoughts. arXiv preprint arXiv:2305.05181 (2023).
Li and Liang (2021) Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In ACL/IJCNLP (1). Association for Computational Linguistics, 4582–4597.
Li et al. (2022a) Zonglin Li, Ruiqi Guo, and Sanjiv Kumar. 2022a. Decoupled context processing for context augmented language modeling. Advances in Neural Information Processing Systems 35 (2022), 21698–21710.
Lin (2024) Demiao Lin. 2024. Revolutionizing Retrieval-Augmented Generation with Enhanced PDF Structure Recognition. arXiv preprint arXiv:2401.12599 (2024).
Lin et al. (2023) Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Richard James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, et al. 2023. RA-DIT: Retrieval-Augmented Dual Instruction Tuning. In The Twelfth International Conference on Learning Representations.
Liu et al. (2020) Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu, and Jiliang Tang. 2020. Does Gender Matter? Towards Fairness in Dialogue Systems. In Proceedings of the 28th International Conference on Computational Linguistics. 4403–4416.
Liu et al. (2021) Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil K Jain, and Jiliang Tang. 2021. Trustworthy AI: A computational perspective. arXiv preprint arXiv:2107.06641 (2021).
Liu et al. (2022a) Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. 2022a. What Makes Good In-Context Examples for GPT-3? In DeeLIO@ACL. Association for Computational Linguistics, 100–114.
Liu et al. (2023) Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, and Animashree Anandkumar. 2023. Multi-modal molecule structure–text model for text-based retrieval and editing. Nature Machine Intelligence 5, 12 (2023), 1447–1457.
Liu et al. (2022b) Ye Liu, Semih Yavuz, Rui Meng, Dragomir Radev, Caiming Xiong, and Yingbo Zhou. 2022b. Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database. In EMNLP. Association for Computational Linguistics, 8858–8869.
Lozano et al. (2023) Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, and Nigam Shah. 2023. Clinfo.ai: An open-source retrieval-augmented large language model system for answering medical questions using scientific literature. In Pacific Symposium on Biocomputing 2024. World Scientific, 8–23.
Lu et al. (2023) Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, and Ashwin Kalyan. 2023. Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning. In ICLR. OpenReview.net.
Lu et al. (2021) Yu Lu, Junwei Bao, Yan Song, Zichen Ma, Shuguang Cui, Youzheng Wu, and Xiaodong He. 2021. RevCore: Review-Augmented Conversational Recommendation. In Findings of ACL (ACL/IJCNLP 2021). Association for Computational Linguistics, 1161–1173.
Luo et al. (2023a) Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, and James Glass. 2023a. SAIL: Search-augmented instruction learning. arXiv preprint arXiv:2305.15225 (2023).
Luo et al. (2023b) Man Luo, Xin Xu, Zhuyun Dai, Panupong Pasupat, Mehran Kazemi, Chitta Baral, Vaiva Imbrasaite, and Vincent Y Zhao. 2023b. Dr.ICL: Demonstration-retrieved in-context learning. arXiv preprint arXiv:2305.14128 (2023).
Ma et al. (2023b) Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, and Lingpeng Kong. 2023b. Retrieved Sequence Augmentation for Protein Representation Learning. bioRxiv (2023), 2023–02.
Ma et al. (2023a) Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. 2023a. Query rewriting for retrieval-augmented large language models. arXiv preprint arXiv:2305.14283 (2023).
Maekawa et al. (2024) Seiji Maekawa, Hayate Iso, Sairam Gurajada, and Nikita Bhutani. 2024. Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models. arXiv preprint arXiv:2402.13492 (2024).
Menick et al. (2022) Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, et al. 2022. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147 (2022).
Milios et al. (2023) Aristides Milios, Siva Reddy, and Dzmitry Bahdanau. 2023. In-context learning for text classification with many labels. In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP. 173–184.
Min et al. (2022) Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? In EMNLP. Association for Computational Linguistics, 11048–11064.
Min et al. (2020) Sewon Min, Julian Michael, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2020. AmbigQA: Answering Ambiguous Open-domain Questions. In EMNLP (1). Association for Computational Linguistics, 5783–5797.
Min et al. (2023) Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2023. Nonparametric Masked Language Modeling. In ACL (Findings). Association for Computational Linguistics, 2097–2118.
Nashid et al. (2023) Noor Nashid, Mifta Sintaha, and Ali Mesbah. 2023. Retrieval-based prompt selection for code-related few-shot learning. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2450–2462.
O’Hare et al. (2016) Neil O’Hare, Paloma De Juan, Rossano Schifanella, Yunlong He, Dawei Yin, and Yi Chang. 2016. Leveraging user interaction signals for web image search. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 559–568.
Ouyang et al. (2022) Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730–27744.
Parvez et al. (2021) Md. Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Retrieval Augmented Code Generation and Summarization. In EMNLP (Findings). Association for Computational Linguistics, 2719–2734.
Petroni et al. (2020) Fabio Petroni, Patrick S. H. Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. 2020. How Context Affects Language Models’ Factual Predictions. In AKBC.
Petroni et al. (2019) Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. 2019. Language models as knowledge bases? arXiv preprint arXiv:1909.01066 (2019).
Poesia et al. (2022) Gabriel Poesia, Alex Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable Code Generation from Pre-trained Language Models. In ICLR. OpenReview.net.
Purwar and Sundar (2023) Anupam Purwar and Rahul Sundar. 2023. Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface. arXiv preprint arXiv:2310.04205 (2023).
Radford et al. (2021) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.
Radford et al. (2018) Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. (2018).
Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1–67.
Ram et al. (2023) Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics 11 (2023), 1316–1331.
Ram et al. (2022) Ori Ram, Gal Shachaf, Omer Levy, Jonathan Berant, and Amir Globerson. 2022. Learning to Retrieve Passages without Supervision. In NAACL-HLT. Association for Computational Linguistics, 2687–2700.
Ram and Gray (2012) Parikshit Ram and Alexander G Gray. 2012. Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 931–939.
Ramos et al. (2003) Juan Ramos et al. 2003. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Vol. 242. Citeseer, 29–48.
Ramos et al. (2023) Rita Ramos, Bruno Martins, Desmond Elliott, and Yova Kementchedjhieva. 2023. SmallCap: Lightweight image captioning prompted with retrieval augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2840–2849.
Reichman and Heck (2024) Benjamin Z. Reichman and Larry Heck. 2024. Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving? CoRR abs/2402.11035 (2024).
Reimers and Gurevych (2019) Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3982–3992.
Ren et al. (2023) Yubing Ren, Yanan Cao, Ping Guo, Fang Fang, Wei Ma, and Zheng Lin. 2023. Retrieve-and-sample: Document-level event argument extraction via hybrid retrieval augmentation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 293–306.
Robertson et al. (2009) Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389.
Rubin et al. (2022) Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2022. Learning To Retrieve Prompts for In-Context Learning. In NAACL-HLT. Association for Computational Linguistics, 2655–2671.
Sarto et al. (2022) Sara Sarto, Marcella Cornia, Lorenzo Baraldi, and Rita Cucchiara. 2022. Retrieval-augmented transformer for image captioning. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing. 1–7.
Schick et al. (2024) Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2024. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2024).
Seo et al. (2019) Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P Parikh, Ali Farhadi, and Hannaneh Hajishirzi. 2019. Real-time open-domain question answering with dense-sparse phrase index. arXiv preprint arXiv:1906.05807 (2019).
Shao et al. (2023) Zhihong Shao, Yeyun Gong, Minlie Huang, Nan Duan, Weizhu Chen, et al. 2023. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. In The 2023 Conference on Empirical Methods in Natural Language Processing.
Shen et al. (2015) Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, and Heng Tao Shen. 2015. Learning binary codes for maximum inner product search. In Proceedings of the IEEE International Conference on Computer Vision. 4148–4156.
Sheynin et al. (2023) Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, and Yaniv Taigman. 2023. kNN-Diffusion: Image Generation via Large-Scale Retrieval. In ICLR. OpenReview.net.
Shi et al. (2022) Peng Shi, Rui Zhang, He Bai, and Jimmy Lin. 2022. XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing. In EMNLP (Findings). Association for Computational Linguistics, 5248–5259.
Shi et al. (2023) Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. REPLUG: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652 (2023).
Shtar (2021) Guy Shtar. 2021. Multimodal machine learning for drug knowledge discovery. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 1115–1116.
Sia and Duh (2023) Suzanna Sia and Kevin Duh. 2023. In-context learning as maintaining coherency: A study of on-the-fly machine translation using large language models. arXiv preprint arXiv:2305.03573 (2023).
Singh et al. (2021) Devendra Singh, Siva Reddy, Will Hamilton, Chris Dyer, and Dani Yogatama. 2021. End-to-end training of multi-document reader and retriever for open-domain question answering. Advances in Neural Information Processing Systems 34 (2021), 25968–25981.
Singhal et al. (2001) Amit Singhal et al. 2001. Modern information retrieval: A brief overview. IEEE Data Eng. Bull. 24, 4 (2001), 35–43.
Siriwardhana et al. (2023) Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, and Suranga Nanayakkara. 2023. Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering. Transactions of the Association for Computational Linguistics 11 (2023), 1–17.
Song et al. (2023) Mingyang Song, Yi Feng, and Liping Jing. 2023. Hisum: Hyperbolic interaction model for extractive multi-document summarization. In Proceedings of the ACM Web Conference 2023. 1427–1436.
Sparck Jones (1972) Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 1 (1972), 11–21.
Sun et al. (2023) Fang Sun, Zhihao Zhan, Hongyu Guo, Ming Zhang, and Jian Tang. 2023. GraphVF: Controllable protein-specific 3D molecule generation with variational flow. arXiv preprint arXiv:2304.12825 (2023).
Sun et al. (2024) Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, and Felix Yu. 2024. SpecTr: Fast speculative decoding via optimal transport. Advances in Neural Information Processing Systems 36 (2024).
Tan et al. (2024) Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, and Ji-Rong Wen. 2024. Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs. arXiv preprint arXiv:2402.12052 (2024).
Thakur et al. (2023) Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, et al. 2023. NoMIRACL: Knowing When You Don’t Know for Robust Multilingual Retrieval-Augmented Generation. arXiv preprint arXiv:2312.11361 (2023).
Touvron et al. (2023) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
Trivedi et al. (2023) Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2023. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. In The 61st Annual Meeting of the Association for Computational Linguistics.
Tu et al. (2022) Lifu Tu, Caiming Xiong, and Yingbo Zhou. 2022. Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models. In EMNLP (Findings). Association for Computational Linguistics, 5478–5485.
Vu et al. (2022) Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou’, and Daniel Cer. 2022. SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer. In ACL (1). Association for Computational Linguistics, 5039–5059.
Wang et al. (2023d) Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, and Dong Yu. 2023d. Search-engine-augmented dialogue response generation with cheaply supervised query production. Artificial Intelligence 319 (2023), 103874.
Wang et al. (2023c) Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, et al. 2023c. Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 7763–7786.
Wang et al. (2024a) Hanbing Wang, Xiaorui Liu, Wenqi Fan, Xiangyu Zhao, Venkataramana Kini, Devendra Yadav, Fei Wang, Zhen Wen, Jiliang Tang, and Hui Liu. 2024a. Rethinking Large Language Model Architectures for Sequential Recommendations. arXiv preprint arXiv:2402.09543 (2024).
Wang et al. (2024c) Haoyu Wang, Tuo Zhao, and Jing Gao. 2024c. BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering. arXiv preprint arXiv:2402.11129 (2024).
Wang et al. (2023f) Liang Wang, Nan Yang, and Furu Wei. 2023f. Query2doc: Query Expansion with Large Language Models. In EMNLP. Association for Computational Linguistics, 9414–9423.
Wang et al. (2024b) Liang Wang, Nan Yang, and Furu Wei. 2024b. Learning to Retrieve In-Context Examples for Large Language Models. In EACL (1). Association for Computational Linguistics, 1752–1767.
Wang et al. (2023g) Xintao Wang, Qianwen Yang, Yongting Qiu, Jiaqing Liang, Qianyu He, Zhouhong Gu, Yanghua Xiao, and Wei Wang. 2023g. KnowledGPT: Enhancing large language models with retrieval and storage access on knowledge bases. arXiv preprint arXiv:2308.11761 (2023).
Wang et al. (2023a) Yile Wang, Peng Li, Maosong Sun, and Yang Liu. 2023a. Self-Knowledge Guided Retrieval Augmentation for Large Language Models. In The 2023 Conference on Empirical Methods in Natural Language Processing.
Wang et al. (2023b) Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard G. Baraniuk, and Anima Anandkumar. 2023b. Retrieval-based Controllable Molecule Generation. In ICLR. OpenReview.net.
Wang et al. (2023e) Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N Ioannidis, Huzefa Rangwala, and Rishita Anubhai. 2023e. BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph. arXiv preprint arXiv:2310.03320 (2023).
Wei et al. (2022) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
Wu et al. (2024) Junda Wu, Cheng-Chun Chang, Tong Yu, Zhankui He, Jianing Wang, Yupeng Hou, and Julian McAuley. 2024. CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation. arXiv preprint arXiv:2403.06447 (2024).
Wu et al. (2020) Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Scalable Zero-shot Entity Linking with Dense Entity Retrieval. In EMNLP (1). Association for Computational Linguistics, 6397–6407.
Wu et al. (2022) Yuhuai Wu, Markus Norman Rabe, DeLesley Hutchins, and Christian Szegedy. 2022. Memorizing Transformers. In ICLR. OpenReview.net.
Xiong et al. (2023) Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, and Bryan Hooi. 2023. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs. arXiv preprint arXiv:2306.13063 (2023).
Xu et al. (2023c) Benfeng Xu, Chunxu Zhao, Wenbin Jiang, PengFei Zhu, Songtai Dai, Chao Pang, Zhuo Sun, Shuohuan Wang, and Yu Sun. 2023c. Retrieval-augmented domain adaptation of language models. In Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023). 54–64.
Xu et al. (2023b) Fangyuan Xu, Weijia Shi, and Eunsol Choi. 2023b. RECOMP: Improving retrieval-augmented LMs with context compression and selective augmentation. In The Twelfth International Conference on Learning Representations.
Xu et al. (2019) Hu Xu, Bing Liu, Lei Shu, and Philip S. Yu. 2019. BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis. In NAACL-HLT (1). Association for Computational Linguistics, 2324–2335.
Xu et al. (2020) Jitao Xu, Josep-Maria Crego, and Jean Senellart. 2020. Boosting neural machine translation with similar translations. In Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1570–1579.
Xu et al. (2023a) Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng, and Tat-Seng Chua. 2023a. Search-in-the-chain: Towards the accurate, credible and traceable content generation for complex knowledge-intensive tasks. arXiv preprint arXiv:2304.14732 (2023).
Yang et al. (2023b) Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, and Jing Xiao. 2023b. PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter. In EMNLP. Association for Computational Linguistics, 5364–5375.
Yang et al. (2023a) Ling Yang, Zhilin Huang, Xiangxin Zhou, Minkai Xu, Wentao Zhang, Yu Wang, Xiawu Zheng, Wenming Yang, Ron O Dror, Shenda Hong, et al. 2023a. Prompt-based 3D molecular diffusion models for structure-based drug design. (2023).
Yao et al. (2023) Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In ICLR. OpenReview.net.
Ye et al. (2023b) Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, and Lingpeng Kong. 2023b. Compositional exemplars for in-context learning. In International Conference on Machine Learning. PMLR, 39818–39833.
Ye et al. (2023a) Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, and Yongbin Li. 2023a. Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning. In SIGIR. ACM, 174–184.
Yepes et al. (2024) Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, and Leah Li. 2024. Financial Report Chunking for Effective Retrieval Augmented Generation. arXiv preprint arXiv:2402.05131 (2024).
Yin et al. (2016) Dawei Yin, Yuening Hu, Jiliang Tang, Tim Daly, Mianwei Zhou, Hua Ouyang, Jianhui Chen, Changsung Kang, Hongbo Deng, Chikashi Nobata, et al. 2016. Ranking relevance in Yahoo search. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 323–332.
Yogatama et al. (2021) Dani Yogatama, Cyprien de Masson d’Autume, and Lingpeng Kong. 2021. Adaptive semiparametric language models. Transactions of the Association for Computational Linguistics 9 (2021), 362–373.
Yoran et al. (2023) Ori Yoran, Tomer Wolfson, Ori Ram, and Jonathan Berant. 2023. Making Retrieval-Augmented Language Models Robust to Irrelevant Context. In The Twelfth International Conference on Learning Representations.
Yu et al. (2023a) Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, and Meng Jiang. 2023a. Generate rather than Retrieve: Large Language Models are Strong Context Generators. In ICLR. OpenReview.net.
Yu et al. (2023c) Wenhao Yu, Zhihan Zhang, Zhenwen Liang, Meng Jiang, and Ashish Sabharwal. 2023c. Improving language models via plug-and-play retrieval feedback. arXiv preprint arXiv:2305.14002 (2023).
Yu et al. (2023b) Zichun Yu, Chenyan Xiong, Shi Yu, and Zhiyuan Liu. 2023b. Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2421–2436.
Zan et al. (2022) Daoguang Zan, Bei Chen, Zeqi Lin, Bei Guan, Yongji Wang, and Jian-Guang Lou. 2022. When Language Model Meets Private Library. In EMNLP (Findings). Association for Computational Linguistics, 277–288.
Zeng et al. (2024) Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, et al. 2024. The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG). arXiv preprint arXiv:2402.16893 (2024).
Zhang et al. (2023c) Boyu Zhang, Hongyang Yang, Tianyu Zhou, Muhammad Ali Babar, and Xiao-Yang Liu. 2023c. Enhancing financial sentiment analysis via retrieval augmented large language models. In Proceedings of the Fourth ACM International Conference on AI in Finance. 349–356.
Zhang et al. (2020) Houyu Zhang, Zhenghao Liu, Chenyan Xiong, and Zhiyuan Liu. 2020. Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs. In ACL. Association for Computational Linguistics, 2031–2043.
Zhang et al. (2024) Jiahao Zhang, Rui Xue, Wenqi Fan, Xin Xu, Qing Li, Jian Pei, and Xiaorui Liu. 2024. Linear-Time Graph Neural Networks for Scalable Recommendations. arXiv preprint arXiv:2402.13973 (2024).
Zhang et al. (2023a) Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, and Ziwei Liu. 2023a. ReMoDiffuse: Retrieval-augmented motion diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 364–373.
Zhang et al. (2023b) Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, and Lu Wang. 2023b. Merging generated and retrieved knowledge for open-domain QA. arXiv preprint arXiv:2310.14393 (2023).
Zhang et al. (2023d) Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2023d. Automatic Chain of Thought Prompting in Large Language Models. In ICLR. OpenReview.net.
Zhao et al. (2024b) Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, and Bin Cui. 2024b. Retrieval-Augmented Generation for AI-Generated Content: A Survey. arXiv preprint arXiv:2402.19473 (2024).
Zhao et al. (2023a) Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Xuan Long Do, Chengwei Qin, Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, et al. 2023a. Retrieving multimodal information for augmented generation: A survey. arXiv preprint arXiv:2303.10868 (2023).
Zhao et al. (2023b) Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023b. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
Zhao et al. (2024a) Zihuai Zhao, Wenqi Fan, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, et al. 2024a. Recommender systems in the era of large language models (LLMs). IEEE Transactions on Knowledge and Data Engineering (2024).
Zhong et al. (2022) Zexuan Zhong, Tao Lei, and Danqi Chen. 2022. Training Language Models with Memory Augmentation. In 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022.
Zhou et al. (2022) Shuyan Zhou, Uri Alon, Frank F Xu, Zhengbao Jiang, and Graham Neubig. 2022. DocPrompting: Generating code by retrieving the docs. In The Eleventh International Conference on Learning Representations.
Zhu et al. (2023) Yin Zhu, Zhiling Luo, and Gong Cheng. 2023. Furthest Reasoning with Plan Assessment: Stable Reasoning Path with Retrieval-Augmented Large Language Models. arXiv preprint arXiv:2309.12767 (2023).
Zhu et al. (2024) Yinghao Zhu, Changyu Ren, Shiyun Xie, Shukai Liu, Hangyuan Ji, Zixiang Wang, Tao Sun, Long He, Zhoujun Li, Xi Zhu, et al. 2024. REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models. arXiv preprint arXiv:2402.07016 (2024).
Zou et al. (2024) Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. 2024. PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models. arXiv preprint arXiv:2402.07867 (2024).
