从原理到两大主流实现方法

重排序在检索增强生成（RAG）过程中扮演着至关重要的角色。在简单的RAG方法中，可能会检索到大量上下文，但并非所有上下文都与问题相关。通过重排序，可以对文档进行重新排序和筛选，将相关文档置于前列，从而提升RAG的效果。

本文介绍RAG的重排序技术，并展示如何通过两种方法实现重排序功能。

重排序简介

如图1所示，重排序任务如同一个智能筛选器。当检索器从索引集合中检索出多个上下文时，这些上下文与用户查询的相关性可能各不相同。有些上下文可能非常相关（如图1中红色框所示），而其他上下文可能只是略有关系甚至无关（如图1中绿色和蓝色框所示）。

重排序的任务是对这些上下文的相关性进行评估，并优先考虑那些最有可能提供准确且相关答案的上下文。这样，LLM在生成答案时可以优先考虑这些排名靠前的上下文，从而提高回答的准确性和质量。

简而言之，重排序就像是在开卷考试中帮助你从一堆学习材料中挑选出最相关的参考资料，以便你能更高效、准确地回答问题。

本文介绍的重排序方法主要可分为以下两种类型：

重排序模型：这些模型考虑文档与查询之间的交互特征，以更准确地评估它们的相关性。
LLM：LLM的出现为重排序开辟了新的可能性。通过深入理解整个文档和查询，可以更全面地捕捉语义信息。

使用重排序模型作为重排器

与嵌入模型不同，重排序模型接受查询和上下文作为输入，并直接输出相似度分数，而不是嵌入向量。需要注意的是，重排序模型使用交叉熵损失进行优化，允许相关性分数不受特定范围限制，甚至可以是负值。

目前，可用的重排序模型并不多。一种选择是通过API访问的Cohere 在线模型。还有开源模型，如bge-reranker-base和bge-reranker-large 等。

图2展示了使用命中率（Hit Rate）和平均倒数排名（Mean Reciprocal Rank, MRR）指标的评估结果：

从这一评估结果中，我们可以看到：

无论使用哪种嵌入模型，重排序都显示出更高的命中率和MRR，表明重排序具有显著影响。
目前，最佳的重排序模型是Cohere ，但它是一项付费服务。开源的bge-reranker-large 模型与Cohere具有相似的能力。
嵌入模型和重排序模型的组合也会产生影响，因此开发人员可能需要在实际过程中尝试不同的组合。

在本文中，将使用bge-reranker-base模型。

环境配置

导入相关库，设置环境和全局变量

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.schema import QueryBundle

dir_path = "YOUR_DIR_PATH"

目录中仅包含一个PDF文件，即论文“TinyLlama: 一个开源的小型语言模型 ”。

(py) Florian:~ Florian$ ls /Users/Florian/Downloads/pdf_test/
tinyllama.pdf

使用 LlamaIndex 构建简单检索器

documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k = 3)

基本检索

query = "Can you provide a concise description of the TinyLlama model?"
nodes = retriever.retrieve(query)
for node in nodes:
    print('----------------------------------------------------')
    display_source_node(node, source_length = 500)

display_source_node 函数改编自 llama_index 源代码。原函数为 Jupyter 笔记本设计，因此已作如下修改：

from llama_index.schema import ImageNode, MetadataMode, NodeWithScore
from llama_index.utils import truncate_text

def display_source_node(
    source_node: NodeWithScore,
    source_length: int = 100,
    show_source_metadata: bool = False,
    metadata_mode: MetadataMode = MetadataMode.NONE,
) -> None:
    """Display source node"""
    source_text_fmt = truncate_text(
        source_node.node.get_content(metadata_mode=metadata_mode).strip(), source_length
    )
    text_md = (
        f"Node ID: {source_node.node.node_id} \n"
        f"Score: {source_node.score} \n"
        f"Text: {source_text_fmt} \n"
    )
    if show_source_metadata:
        text_md += f"Metadata: {source_node.node.metadata} \n"
    if isinstance(source_node.node, ImageNode):
        text_md += "Image:"

    print(text_md)
    # display(Markdown(text_md))
    # if isinstance(source_node.node, ImageNode) and source_node.node.image is not None:
    #     display_image(source_node.node.image)

基本检索结果如下，代表重排序前的 top 3 节点：

----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662 
Score: 0.8706055408845863 
Text: 4 结论
本文介绍了 TinyLlama，一个开源的小型语言模型。为了促进开源 LLM 预训练社区的透明度，我们发布了所有相关信息，包括我们的预训练代码、所有中间模型检查点以及数据处理步骤的详细信息。凭借其紧凑的架构和有前景的性能，TinyLlama 可以在移动设备上实现终端用户应用，并作为一个轻量级平台用于测试...

----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 
Score: 0.8624531691777889 
Text: TinyLlama：一个开源的小型语言模型
张培源∗曾广涛∗王天多 陆伟
StatNLP 研究小组
新加坡科技设计大学
{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sg
[email protected]
摘要
我们提出 TinyLlama，一个在约 1 万亿个 token 上预训练了约 3 个 epoch 的 1.1B 语言模型。基于 Llama 2（Touvron 等人，2023b）的架构和分词器，TinyLlama 利用了各种先进的...

----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f 
Score: 0.8346160605298356 
Text: 尽管这些工作明显偏好大型模型，但用大型数据集训练小型模型的潜力仍未得到充分探索。Touvron 等人（2023a）强调了推理预算的重要性，而不是仅仅关注训练计算最优的语言模型。推理最优的语言模型旨在在特定推理约束下实现最优性能，这是通过用更多 token 训练模型来实现的...

重排序

要重排上述节点，请使用 bge-reranker-base 模型。

print('------------------------------------------------------------------------------------------------')
print('开始重排序...')

reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
)

query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker._postprocess_nodes(nodes, query_bundle = query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length = 500)

重排序后的结果如下：

------------------------------------------------------------------------------------------------
开始重排序...
----------------------------------------------------
节点ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 
得分: -1.584416151046753 
文本: TinyLlama: 一个开源的小型语言模型
张培源∗曾广涛∗王天铎 陆伟
StatNLP 研究小组
新加坡科技设计大学
{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sg
[email protected]
摘要
我们介绍 TinyLlama，一个在约 1 万亿个令牌上预训练了大约 3 个周期的 1.1B 小型语言模型。基于 Llama 2 的架构和分词器（Touvron 等人，2023b），TinyLlama 利用了各种先进技术...

----------------------------------------------------
节点ID: e2d97411-8dc0-40a3-9539-a860d1741d4f 
得分: -1.7028117179870605 
文本: 尽管这些工作明显偏好大型模型，但使用大型数据集训练小型模型的潜力仍未得到充分探索。Touvron 等人（2023a）强调了推理预算的重要性，而不是仅仅关注训练计算最优的语言模型。推理最优的语言模型旨在在特定推理约束下实现最佳性能。这是通过使用更多令牌训练模型来实现的...

----------------------------------------------------
节点ID: 438b9d91-cd5a-44a8-939e-3ecd77648662 
得分: -2.904750347137451 
文本: 4 结论
在本文中，我们介绍了 TinyLlama，一个开源的小型语言模型。为了促进开源 LLM 预训练社区的透明度，我们发布了所有相关信息，包括我们的预训练代码、所有中间模型检查点以及我们的数据处理步骤的详细信息。凭借其紧凑的架构和有前景的性能，TinyLlama 可以在移动设备上启用终端用户应用程序，并作为一个轻量级平台用于测试...

显然，重排序后，ID 为 ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 的节点排名从 2 变为 1。这意味着最相关的内容被排在首位。

使用LLM作为重排序器

利用LLM进行重排序的方法大致可分为三类：通过重排序任务微调LLM、通过提示LLM进行重排序，以及在训练过程中使用LLM进行数据增强。

通过提示LLM进行重排序的方法成本较低。以下是使用RankGPT 的演示，该方法已集成到LlamaIndex中。

RankGPT的思想是利用LLM（如ChatGPT或GPT-4或其他LLM）进行零样本列表式段落重排序。它采用排列生成方法和滑动窗口策略，高效地对段落进行重排序。

如图3所示，该论文提出了三种可行方法。

前两种方法是传统方法，即对每个文档给出一个分数，然后根据这个分数对所有段落进行排序。

第三种方法，排列生成，是本文提出的。具体来说，模型不是依赖外部分数，而是直接进行端到端的段落排序。 换言之，它直接利用LLM的语义理解能力对所有候选段落进行相关性排序。

然而，通常候选文档数量非常庞大，而LLM的输入受限。因此，往往无法一次性输入所有文本。

因此，如图4所示，引入了滑动窗口方法，借鉴了冒泡排序的思想。每次只对前**4**个文本进行排序，然后移动窗口，对后续的**4**个文本进行排序。遍历整个文本后，即可获得性能最佳的顶部文本。

请注意，为了使用RankGPT，您需要安装较新版本的LlamaIndex。我之前安装的版本（0.9.29）不包含RankGPT所需的代码。因此，我创建了一个新的conda环境，安装了LlamaIndex版本0.9.45.post1。

代码很简单，基于上一节的代码，只需将RankGPT设置为重排序器。

from llama_index.postprocessor import RankGPTRerank
from llama_index.llms import OpenAI
reranker = RankGPTRerank(
    top_n = 3,
    llm = OpenAI(model="gpt-3.5-turbo-16k"),
    # verbose=True,
)

总体结果如下：

(llamaindex_new) Florian:~ Florian$ python /Users/Florian/Documents/rerank.py 
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb 
Score: 0.8703492815379594 
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w... 

----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd 
Score: 0.8621633467539512 
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sg
[email protected]
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr... 

----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59 
Score: 0.8343984516104476 
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens... 

------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd 
Score: 0.8621633467539512 
Text: TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sg
[email protected]
Abstract
We present TinyLlama, a compact 1.1B language model pretrained on around 1
trillion tokens for approximately 3 epochs. Building on the architecture and tok-
enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances
contr... 

----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59 
Score: 0.8343984516104476 
Text: Although these works show a clear preference on large models, the potential of training smaller
models with larger dataset remains under-explored. Instead of training compute-optimal language
models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing
solely on training compute-optimal language models. Inference-optimal language models aim for
optimal performance within specific inference constraints This is achieved by training models with
more tokens... 

----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb 
Score: 0.8703492815379594 
Text: 4 Conclusion
In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote
transparency in the open-source LLM pre-training community, we have released all relevant infor-
mation, including our pre-training code, all intermediate model checkpoints, and the details of our
data processing steps. With its compact architecture and promising performance, TinyLlama can
enable end-user applications on mobile devices, and serve as a lightweight platform for testing a
w...

请注意，由于使用了LLM，重排序后的分数并未更新。 当然，这并不关键。

从结果来看，经过重排序后，排名第一的结果是包含正确答案的文本，这与之前使用重排序模型得到的结果一致。

评估

我们可以采用本系列前一篇文章中描述的方法：

具体流程在前一篇文章中有详细说明。修改后的代码如下：

reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
    use_fp16 = False
)

# 或者使用LLM作为重排序器
# from llama_index.postprocessor import RankGPTRerank
# from llama_index.llms import OpenAI
# reranker = RankGPTRerank(
#     top_n = 3,
#     llm = OpenAI(model="gpt-3.5-turbo-16k"),
#     # verbose=True,
# )

query_engine = index.as_query_engine(       # 将重排序器添加到查询引擎中
    similarity_top_k = 3, 
    node_postprocessors=[reranker]
)
# query_engine = index.as_query_engine()    # 原始查询引擎

感兴趣的读者可以自行测试。

结论

总的来说，本文介绍了重排序的原理及两种主流方法。

其中，采用重排序模型的方法轻量且开销较小。

另一方面，使用LLM的方法在多个基准测试中表现优异，但成本较高，且仅在使用ChatGPT和GPT-4时表现出色，而使用其他开源模型如FLAN-T5和Vicuna-13B时性能则不尽如人意。

因此，在实际项目中需要进行具体权衡。

最后，如有任何疑问，请在评论区指出。

Barry's Home

高级RAG 04重排序