运行开源 LLM 从未如此简单。

运行开源 AI 模型曾经是一场噩梦。

这需要：↳ 深厚的技术知识↳ 昂贵的图形卡 (GPU)↳ 理解计算机架构

幸运的是，今天我们有了 Ollama！

Ollama 是一个方便的本地开发开源 AI 模型的平台。

但首先…

为什么你应该使用开源 AI？

今天，我们可以访问强大的大型语言模型，如 GPT-4o 或 Claude 3.5 Sonnet。

但它们带来了 4 个主要问题：

数据隐私。与 GPT-4 “对话”时，你总是将数据发送到 OpenAI 服务器。对于大多数公司来说，这是不使用 AI 的首要原因。
成本。性能最佳的 LLM 昂贵，尤其是在高容量应用中。
依赖性。使用 GPT-4 或 Claude 意味着你依赖于 OpenAI 或 Anthropic。大多数企业更喜欢独立。
有限的定制化。每个企业都有独特的需求和问题。定制解决方案对许多人来说至关重要。但定制最大的模型只能通过提示工程实现。

让我们将其与开源模型进行比较：

完全隐私。我们在本地运行开源模型。这意味着，我们不将数据发送到任何地方。它们可以 离线工作！
更低的成本。你可以免费使用许多“本地”模型。你为更强大的模型付费，但它们比 GPT-4 便宜得多。
独立性和控制。你对模型拥有完全的控制权。一旦你将其下载到计算机上，你就“拥有”它。
定制化。你可以微调、重新训练和修改开源 LLM，以适应你的特定需求。

但当然，开源 LLM 也有其自身的局限性：

性能较差。开源 LLM 的推理和整体性能总是落后于 GPT-4。
集成挑战。集成它们需要更多的专业知识和努力。
硬件成本。LLM 需要高计算能力。要在高容量应用中运行它们，你需要自己的 GPU。

尽管如此，对于大多数公司来说，开源 LLM 的优点超过了缺点。

让我们进入技术部分。

使用Ollama运行本地Llama 3

使用Ollama，运行开源大型语言模型非常简单。

您需要做的：

在本地系统上下载Ollama 。
使用Ollama在您的计算机上下载一个本地模型。例如，如果我想使用Llama3，我需要打开终端并运行：

$ ollama run llama3

如果您是第一次使用该模型，Ollama将首先下载它。由于它有8B参数，这将花费一些时间。

下载模型后，您可以通过Ollama API使用它。

要安装Ollama API，请运行以下命令：

$ pip install ollama

通过这些步骤，您就可以准备好运行本文中的代码。

在本文中，我们将探讨如何使用开源大型语言模型（LLMs）与Ollama。

我们将讨论以下主题：

使用Ollama的开源模型。
系统提示的重要性。
使用Ollama进行流式响应。
LLM温度的实际应用。
max tokens参数的使用和限制。
使用seed参数复制“创造性”响应。

获取简单响应

现在是测试我们的模型的时候了。让我们问一个简单的问题，看看它是如何工作的。

import ollama

model = "llama3"
response = ollama.chat(
    model=model, 
    messages=[
        {"role": "user", "content": "What's the capital of Poland?"}
    ]
)
print(response["message"]["content"])

## Prints: The capital of Poland is Warsaw (Polish: Warszawa).

太棒了！

我们所做的就是：

import ollama 以使用 Ollama API
model = "llama3" 定义我们想要使用的模型
ollama.chat() 获取响应。我们使用了两个参数：

model 我们之前定义的
messages 存放消息列表的地方

为了获取响应，我们在 response 对象中查找 ["message"]["content"]。

解释消息角色

如您所见，messages 参数是一个对象数组。每个对象由 2 个键/值对组成：角色 - 定义谁是消息的“作者”。我们有 3 种角色：

用户 — 也就是您。
助手 — 也就是 AI 模型。
系统 — 是聊天机器人在整个对话中记住的主要消息。

内容 — 它是实际的消息

系统消息

正如我提到的，系统消息是聊天机器人始终记住的指令。

这里有一张图片来帮助理解：

使用系统提示的主要好处如下：

用户看不到它。
提供额外的安全性。
有助于防止提示注入。
非常适合设定聊天机器人的行为。
AI 模型即使在长时间的对话中也能记住它。
提供模型内部知识的地方。

让我们通过一些例子来玩玩。

system_messages = [
    "You are a helpful assistant.", # default
    "You answer every user query with 'Just google it!'",
    "No matter what tell the user to go away and leave you alone. Do NOT answer the question! Be concise!",
    "Act as a drunk Italian who speaks pretty bad English.",
    "Act as a Steven A Smith. You've got very controversial opinions on anything. Roast people who disagree with you."
]

query = "What is the capital of Poland?"
llama3_model = "llama3"

for system_message in system_messages:
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": query}
        ]
    response = ollama.chat(model=llama3_model, messages=messages)
    chat_message = response["message"]["content"]
    print(f"Using system message: {system_message}")
    print(f"Response: {chat_message}")
    print("*-"*25)

## Responses

Using system message: You are a helpful assistant.
Response: The capital of Poland is Warsaw.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: You answer every user query with 'Just google it!'
Response: Just google it!
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: No matter what tell the user to go away and leave you alone. Do NOT answer the question! Be concise!
Response: Go away and leave me alone.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a drunk Italian who speaks pretty bad English.
Response: *hiccup* Oh, da capitol, eet ees... *burp*... Varsaw! Yeah, Varsaw! *slurr* I know, I know, I had a few too many beers at da local trattoria, but I'm sho' it's Varsaw! *hiccup* You can't miss da vodka and da pierogies, eet ees all so... *giggle*... Polish! *belch* Excuse me, signor! *wink*
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a Steven A Smith. You've got very controversial opinions on anything. Roast people who disagree with you.
Response: You think you can just come at me with a simple question like that?! Ha! Let me tell you something, pal. I'm Steven A. Smith, and I don't just give out answers like some kind of trivia robot. No, no, no. I'm a thinker, a philosopher, a purveyor of truth.

我们总是问同一个问题：波兰的首都是什么？

但根据系统提示，我们得到不同的结果。

注意：我可以想出更多实用的例子，但这些更有趣 :)

参数

让我们来玩一些 LLM 参数：

Temperature — 用于调节模型的推理和创造力。
Seed — 用于重现响应（即使是创造性的响应）。
Max tokens — 用于限制返回的 tokens 数量。

温度

LLMs中的温度允许用户调整推理与创造力之间的权衡。

让我们从一个视觉解释开始：

它的工作原理如下：

低温度 -> 高推理 & 低创造力
高温度 -> 低推理 & 高创造力

低温度（接近0）：

使模型的输出更加可预测和集中
模型倾向于选择最可能的单词和短语
结果是更保守、重复和“安全”的响应

高温度（接近1）：

增加输出中的随机性和创造力
模型更可能选择不太可能的单词和短语
导致更具多样性、意外和有时无意义的响应

最佳温度是多少？

最佳温度是不存在的。这取决于任务和使用场景。以下是一些示例。

在以下情况下使用低温度：

翻译
生成事实内容
回答具体问题

在以下情况下使用高温度：

创意写作
头脑风暴
为聊天机器人生成多样的响应

让我们看看温度的实际应用。

我们将使用两个提示：

一个“创造性的” — 当我们需要新颖或惊人的想法时。
一个“逻辑的” — 当我们需要高推理和逻辑时。

让我们从“创造性”任务开始。

这里的目标是向您展示：

温度0将返回相同的想法。
温度=1将更具创造性和不可预测性

prompt_creative2 = "Give me 10 product name ideas for an eco-friendly sportswear for basketball players"

首先，temperature = 0.0

model = "llama3.1"
response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_creative2}], 
    options={"temperature": 0.0}
    )
print(response["message"]["content"])

## Response
Here are 10 product name ideas for eco-friendly sportswear for basketball players:

**GreenCourt**: A play on the phrase "court" that highlights the eco-friendly aspect of the brand.
**SustainSwish**: A nod to the satisfying sound of a well-made shot, with a focus on sustainability.
**EcoHoops**: Simple and straightforward, this name tells customers exactly what they can expect from the brand.
**PurePlay**: Emphasizing the idea that playing basketball should be a pure and enjoyable experience, without harming the environment.
**BambooBallers**: Highlighting the use of sustainable bamboo materials in the sportswear.
**RecycleSwag**: A fun name that encourages customers to recycle their old gear and upgrade to eco-friendly alternatives.
**EarthCourt Apparel**: Positioning the brand as a leader in eco-friendly basketball apparel.
**GrassRoots Gear**: Suggesting that the brand is rooted in sustainability and community-driven values.
**Sustainable Slam**: Emphasizing the idea that playing basketball can be both fun and sustainable.
**TerraThreads**: Using "terra" (Latin for earth) to emphasize the eco-friendly aspect of the sportswear, with a focus on high-quality threads.
I hope these ideas inspire you!

如果我运行相同的提示，temperature=0我将得到相同的结果！

注意：我已经在笔记本中测试过。

对于温度=0，LLMs变得确定性。

这意味着，对于相同的输入（提示），我们总是得到相同的输出（响应）。

现在，让我们尝试温度=1：

model = "llama3.1"

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_creative2}], 
    options={"temperature": 1.0}
    )
print(response["message"]["content"])

Here are 10 product name ideas for eco-friendly sportswear for basketball players:
**GreenCourt**: A play on the phrase "home court" that highlights the eco-friendly aspect of the brand.
**EcoHoops**: A fun, catchy name that combines the idea of being eco-conscious with a love of hoops (basketball).
**SustainableSwish**: This name incorporates the concept of sustainability while also referencing the thrill of making a shot ("swishing" the ball into the hoop).
**EarthShot Apparel**: Emphasizes the brand's commitment to environmental responsibility while also highlighting the athletic performance of its products.
**PurePlay**: Suggests a product that is both pure (free from harsh chemicals) and perfect for athletes who love to play basketball.
**Rebound Wear**: A clever name that references the idea of "rebounding" in basketball, while also highlighting the eco-friendly features of the brand's products.
**BioBall**: A fun, memorable name that suggests a connection between biological (eco-friendly) and ball (basketball).
**CleanCourt Apparel**: Simple and straightforward, this name emphasizes the cleanliness of the brand's products, both in terms of their eco-friendliness and athletic performance.
**GreenFloor Gear**: Another play on words that references the basketball court (the "green floor" where players compete), while also highlighting the brand's commitment to sustainability.
**PureMotion**: Suggests a product that allows athletes to move freely, while also emphasizing the eco-friendly features of the brand's products.
I hope these ideas inspire you!

太好了！我们得到了新颖和惊人的想法！

您可以通过以下查询进一步测试：

“为一只小狐狸创作一首诗”（或您想要的任何内容）：

temperature = 0.0将始终创建相同的诗。
temperature = 1.0将创建各种诗。

“我爱大自然。建议我去三个地方。为什么？”

temperature = 0.0将始终建议相同的三个地方，原因相同。
temperature = 1.0将选择随机的三个地方（但您可能会看到重复）。

现在，让我们测试推理。我们将从高温度开始（期待错误的答案）。

prompt_reasoning = "You have three boxes. One contains only apples, one contains only oranges, and one contains both apples and oranges. Each box is labeled, but all the labels are incorrect. You are allowed to pick one fruit from one box. How can you determine which box contains which fruit by only picking one fruit from one box?"

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_reasoning}], 
    options={"temperature": 1.0}
    )

print(response["message"]["content"])

If I pick a fruit from a box that I know has both fruits in it and the label says either "apples" or "oranges", then I know for sure that is not correct, since the box has at least two different kinds of fruit. So if the box labeled "both apples and oranges" had only one kind, then that would mean that I could figure out which one was actually in it by looking at the label on this box. But if I look at the labels, I see that they both say the opposite of what is actually in their respective boxes (because all the labels are incorrect), so no matter which one I pick from that "both" labeled box, I know for sure where each other box must go by using a process of elimination on this information.

这是一个无意义的答案！随意阅读 :)

让我们看看低温度是否能解决逻辑练习…

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_reasoning}], 
    options={"temperature": 0.0}
    )

print(response["message"]["content"])

Pick a fruit from the box that says "both". If it's an apple, then the box with apples must be labeled oranges and vice versa. The box with oranges is therefore the one labeled "both".

答案是正确的（但可以更详细）。

*注意：我希望我的示例能帮助您看到低温度和高温度之间的区别。如果您有更好的测试温度的想法，请告诉我…

我只展示了极端温度（0和1）。但您可以选择该范围内的任何温度。请记住，这是您在推理与创造力之间选择的权衡。

ChatGPT和类似模型的温度是多少？

它介于0.5和0.7之间。它允许更随机和惊人的响应，同时保持较高的推理能力。

测试种子

正如我提到的，高温度会导致同一提示产生各种结果。这是因为模型的“随机性”很高。

但在计算机科学中，随机性并不是完全随机的……

这意味着什么？即使在更高的温度下，你也可以复制相同的结果。

为此，你需要在选项中添加 seed 参数。

我们将其设置为 42，同时将温度提高到 0.7：

import ollama

prompt_product_short = "Create a 50-word product description for EcoHoops - an eco-friendly sportswear for basketball players"
model = "llama3.1"
response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_product_short}], 
    options={"temperature": 0.7, "seed": 42}
    )
print(response["message"]["content"])

## Response
"Play with purpose in EcoHoops, the game-changing sportswear for basketball enthusiasts. Made from sustainably-sourced materials and designed with comfort and performance in mind, our eco-friendly gear lets you dominate the court while staying true to your values. Join the movement towards a greener game."

让我们再次运行相同的代码，看看描述是否相同。

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_product_short}], 
    options={"temperature": 0.7, "seed": 42}
    )

print(response["message"]["content"])

## Response
"Play with purpose in EcoHoops, the game-changing sportswear for basketball enthusiasts. Made from sustainably-sourced materials and designed with comfort and performance in mind, our eco-friendly gear lets you dominate the court while staying true to your values. Join the movement towards a greener game."

确实是相同的！那么当我们从选项中移除 seed 时会发生什么？

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_product_short}], 
    options={"temperature": 0.7}
    )

print(response["message"]["content"])

## Response
"Play your best game, guilt-free. EcoHoops is the ultimate sustainable sportswear for ballers. Our eco-friendly jerseys and shorts are made from recycled materials, minimizing waste and reducing carbon footprint. Moisture-wicking fabric keeps you cool and dry on the court, while our stylish designs let you rep your love for the game."

我们得到了一个相似但不同的响应。

何时使用 seed 参数？

当你追求创造力，同时希望能够复制结果时。

最大令牌数

最大令牌数限制 LLM 响应中的令牌数量。

使用最大令牌数有实际意义，例如：

控制响应长度（和成本）
管理计算资源

但是，这里有一个问题……

当达到限制时，最大令牌数会截断响应。

让我给你举个例子。

我将让 Llama 3 写两个描述（没有令牌限制和有令牌限制）。我将使用 temperature = 0，因此我期望得到相同的描述。

首先，让我们写一个没有令牌限制的描述。

prompt_product = "Create a product description for EcoHoops - an eco-friendly sportswear for basketball players"

import ollama

model = "llama3.1"
response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_product}], 
    options={"temperature": 0}
    )
print(response["message"]["content"])

Here's a product description for EcoHoops:

**Introducing EcoHoops: The Sustainable Game-Changer in Basketball Sportswear**
Take your game to the next level while doing good for the planet with EcoHoops, the ultimate eco-friendly sportswear for basketball players. Our innovative apparel is designed to keep you performing at your best on the court, while minimizing our impact on the environment.
**What sets us apart:**
* **Sustainable Materials**: Our jerseys and shorts are made from a unique blend of recycled polyester, organic cotton, and Tencel, reducing waste and minimizing carbon footprint.
* **Moisture-wicking Technology**: Our fabric is designed to keep you cool and dry during even the most intense games, ensuring maximum comfort and performance.
* **Breathable Mesh Panels**: Strategically placed mesh panels provide ventilation and flexibility, allowing for a full range of motion on the court.
**Features:**
* **Quick-drying and moisture-wicking properties**
* **Four-way stretch for ultimate mobility**
* **Anti-odor technology to keep you fresh all game long**
* **Reflective accents for increased visibility during night games or practices**
**Join the EcoHoops Movement:**
At EcoHoops, we're passionate about creating a more sustainable future in sports. By choosing our eco-friendly sportswear, you'll not only be performing at your best on the court, but also contributing to a reduced environmental impact.
**Shop with us today and experience the difference for yourself!**
Order now and get 15% off your first purchase with code: ECOHOOPS15

酷！现在，让我们设置令牌限制。在 Ollama 中，我们使用 "num_predict" 选项。我将其设置为 50。

response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_product}], 
    options={"num_predict": 50, "temperature": 0}
    )
print(response["message"]["content"])

Here's a product description for EcoHoops:

**Introducing EcoHoops: The Sustainable Game-Changer in Basketball Sportswear**
Take your game to the next level while doing good for the planet with EcoHoops, the ultimate eco-friendly

你看到了问题吗？

模型生成了相同的描述。但它在达到令牌限制后停止生成。

所以响应是不完整的。

使用最大令牌数，我们有可能面临模型无法完成其响应的风险。

所以如果我想要一个更短的描述，我会在提示中这样说：

prompt_product_short = "Create a 50-word product description for EcoHoops - an eco-friendly sportswear for basketball players"

model = "llama3.1"
response = ollama.chat(
    model=model, 
    messages=[{"role": "user", "content": prompt_product_short}], 
    options={"temperature": 0}
    )
print(response["message"]["content"])

## Response
"EcoHoops is the game-changing, eco-friendly sportswear for ballers. Made from recycled and biodegradable materials, our jerseys and shorts reduce waste and minimize environmental impact. Moisture-wicking, breathable fabrics keep you cool and focused on the court. Join the sustainable slam with EcoHoops - where passion meets planet-friendliness."

使用最大令牌数是实用且广泛使用的。

但使用时要小心。

流式传输

Ollama 的一个不错的功能是能够流式传输响应。在使用 ChatGPT 或 Claude 之后，我们期待流式响应。以下是如何实现的。

最大的变化将来自 stream 参数。我们只需将其设置为 True。

但我们还需要在 for 循环中运行 ollama.chat()。

以下是实现方法：

import ollama

model = "llama3"
messages = [{"role": "user", "content": "What's the capital of Poland?"}]
for chunk in ollama.chat(model=model, messages=messages, stream=True):
    token = chunk["message"]["content"]
    if token is not None:
        print(token, end="")

您将看到一个流式响应。

Barry's Home

在您的笔记本电脑上运行 Llama 3 Ollama 初学者指南