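Multi-Agent Systems
Multi-agent systems enable specialized agents to collaborate on complex tasks, improving modularity, scalability, and robustness. Instead of relying on a single agent, tasks are distributed among agents with distinct capabilities.
In smolagents, different agents can be combined to generate Python code, call external tools, perform web searches, and more. By orchestrating these agents, we can create powerful workflows.
A typical setup might include:
- A Manager Agent for task delegation
- A Code Interpreter Agent for code execution
- A Web Search Agent for information retrieval
The diagram below illustrates a simple multi-agent architecture in which a Manager Agent coordinates a Code Interpreter Tool and a Web Search Agent, which in turn uses tools such as DuckDuckGoSearchTool and VisitWebpageTool to gather relevant information.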

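1. Multi-Agent Systems in Action
A multi-agent system consists of multiple specialized agents working together under the coordination of an orchestrator agent. This approach enables complex workflows by distributing tasks among agents with distinct roles.
For example, a multi-agent RAG system can integrate:
- A Web Agent for browsing the internet
- A Retriever Agent for fetching information from knowledge bases
- An Image Generation Agent for producing visual content
All of these agents operate under an orchestrator that manages task delegation and interaction.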
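As a rough, library-free sketch of this orchestration pattern, the snippet below models each specialist as a plain function and lets an orchestrator delegate requests by role. The names Orchestrator, register, and delegate, as well as the three stub agents, are hypothetical and are not part of smolagents; a real system would replace the stubs with LLM-backed agents.

```python
from typing import Callable, Dict


class Orchestrator:
    """Routes each request to the specialist registered for a given role."""

    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[str], str]] = {}

    def register(self, role: str, agent: Callable[[str], str]) -> None:
        self._agents[role] = agent

    def delegate(self, role: str, request: str) -> str:
        if role not in self._agents:
            raise KeyError(f"No agent registered for role {role!r}")
        return self._agents[role](request)


# Stub specialists standing in for real agents (illustrative only).
def web_agent(request: str) -> str:
    return f"[web] results for: {request}"


def retriever_agent(request: str) -> str:
    return f"[retriever] documents for: {request}"


def image_agent(request: str) -> str:
    return f"[image] picture of: {request}"


orchestrator = Orchestrator()
orchestrator.register("web", web_agent)
orchestrator.register("retriever", retriever_agent)
orchestrator.register("image", image_agent)

print(orchestrator.delegate("web", "Batman filming locations"))
```

The point of the design is that each specialist only ever sees the request handed to it, while the orchestrator alone decides who does what.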
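2. Solving a complex task with a multi-agent hierarchy
The reception is approaching! With your help, Alfred has nearly finished the preparations.
But now there's a problem: the Batmobile has disappeared. Alfred needs to find a replacement, and quickly.
Fortunately, a few biopics have been made about Bruce Wayne's life, so Alfred may be able to pick up a car left behind on one of the movie sets and re-engineer it to modern standards, which would of course include a full self-driving option.
But the car could be at any of the filming locations around the world, and there could be many of them.
So Alfred needs your help. Can you build an agent able to solve this task?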
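👉 Find all Batman filming locations in the world, calculate the time to transfer there by cargo plane, and represent them on a map, with a color that varies with the transfer time. Also represent some supercar factories with the same cargo plane transfer time.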
This example needs some additional packages, so let's install them first:
pip install 'smolagents[litellm]' plotly geopandas shapely kaleido -q
2.1. Creating a tool to get the cargo plane transfer time
import math
from typing import Optional, Tuple

from smolagents import tool


@tool
def calculate_cargo_travel_time(
    origin_coords: Tuple[float, float],
    destination_coords: Tuple[float, float],
    cruising_speed_kmh: Optional[float] = 750.0,  # Average speed for cargo planes
) -> float:
    """
    Calculate the travel time for a cargo plane between two points on Earth using great-circle distance.

    Args:
        origin_coords: Tuple of (latitude, longitude) for the starting point
        destination_coords: Tuple of (latitude, longitude) for the destination
        cruising_speed_kmh: Optional cruising speed in km/h (defaults to 750 km/h for typical cargo planes)

    Returns:
        float: The estimated travel time in hours

    Example:
        >>> # Chicago (41.8781° N, 87.6298° W) to Sydney (33.8688° S, 151.2093° E)
        >>> result = calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093))
    """

    def to_radians(degrees: float) -> float:
        return degrees * (math.pi / 180)

    # Extract coordinates
    lat1, lon1 = map(to_radians, origin_coords)
    lat2, lon2 = map(to_radians, destination_coords)

    # Earth's radius in kilometers
    EARTH_RADIUS_KM = 6371.0

    # Calculate great-circle distance using the haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    c = 2 * math.asin(math.sqrt(a))
    distance = EARTH_RADIUS_KM * c

    # Add 10% to account for non-direct routes and air traffic controls
    actual_distance = distance * 1.1

    # Calculate flight time
    # Add 1 hour for takeoff and landing procedures
    flight_time = (actual_distance / cruising_speed_kmh) + 1.0

    # Round the result to two decimals
    return round(flight_time, 2)


print(calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093)))
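For reference, the computation in the tool above can be written out as a formula. Here the latitudes φ and longitudes λ are in radians, R = 6371 km is the Earth radius used in the code, v is the cruising speed in km/h, the factor 1.1 is the 10% routing margin, and the added 1 is the hour allowed for takeoff and landing; the symbols themselves are notation introduced here, not part of the code.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \,
      \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\sqrt{a},
\qquad
t = \frac{1.1\, d}{v} + 1
```

Because of the constant one-hour term, even two points in the same city yield a travel time of about 1 hour.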
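2.2. Setting up the agent
For the model provider, we use Together AI, one of the new inference providers on the Hub! GoogleSearchTool searches the web through either SerpAPI or Serper, so you need to either set the environment variable SERPAPI_API_KEY and pass provider="serpapi", or set SERPER_API_KEY and pass provider="serper".
If you don't have either of these search API providers set up, you can use DuckDuckGoSearchTool instead, but beware that it is rate-limited.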
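One way to make this choice explicit is to inspect the environment before building the search tool. The helper below is a minimal sketch: pick_search_provider is a hypothetical name, and it assumes the environment variable names SERPAPI_API_KEY (for provider "serpapi") and SERPER_API_KEY (for provider "serper").

```python
import os
from typing import Mapping, Optional


def pick_search_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "serpapi" or "serper" depending on which API key is set, else None."""
    if env.get("SERPAPI_API_KEY"):
        return "serpapi"
    if env.get("SERPER_API_KEY"):
        return "serper"
    return None


provider = pick_search_provider()
if provider is None:
    # No key available: fall back to the rate-limited DuckDuckGo tool.
    print("No search API key found; consider DuckDuckGoSearchTool instead.")
else:
    print(f"Use GoogleSearchTool(provider={provider!r})")
```

Passing the environment as a parameter keeps the helper easy to test without touching real credentials.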
import os
from PIL import Image
from smolagents import CodeAgent, GoogleSearchTool, HfApiModel, VisitWebpageTool
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
Let's start by creating a simple agent as a baseline that gives us a simple report.
task = """Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W), and return them to me as a pandas dataframe.
Also give me some supercar factories with the same cargo plane transfer time."""
agent = CodeAgent(
    model=model,
    tools=[GoogleSearchTool("serper"), VisitWebpageTool(), calculate_cargo_travel_time],
    additional_authorized_imports=["pandas"],
    max_steps=20,
)
result = agent.run(task)
result
In our case, it generates this output:
| | Location | Travel Time to Gotham (hours) |
|--|------------------------------------------------------|------------------------------|
| 0 | Necropolis Cemetery, Glasgow, Scotland, UK | 8.60 |
| 1 | St. George's Hall, Liverpool, England, UK | 8.81 |
| 2 | Two Temple Place, London, England, UK | 9.17 |
| 3 | Wollaton Hall, Nottingham, England, UK | 9.00 |
| 4 | Knebworth House, Knebworth, Hertfordshire, UK | 9.15 |
| 5 | Acton Lane Power Station, Acton Lane, Acton, UK | 9.16 |
| 6 | Queensboro Bridge, New York City, USA | 1.01 |
| 7 | Wall Street, New York City, USA | 1.00 |
| 8 | Mehrangarh Fort, Jodhpur, Rajasthan, India | 18.34 |
| 9 | Turda Gorge, Turda, Romania | 11.89 |
| 10 | Chicago, USA | 2.68 |
| 11 | Hong Kong, China | 19.99 |
| 12 | Cardington Studios, Northamptonshire, UK | 9.10 |
| 13 | Warner Bros. Leavesden Studios, Hertfordshire, UK | 9.13 |
| 14 | Westwood, Los Angeles, CA, USA | 6.79 |
| 15 | Woking, UK (McLaren) | 9.13 |
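We can improve this by adding a dedicated planning step and more prompting.
Planning steps allow the agent to think ahead and plan its next actions, which is useful for more complex tasks.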
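To get a feel for how often planning would happen, the sketch below lists the steps at which a planning step would run for a given interval. It assumes planning occurs on step 1 and then every planning_interval steps; that schedule is an assumption about the library's behavior and may differ between smolagents versions. The helper name planning_steps is hypothetical.

```python
from typing import List, Optional


def planning_steps(max_steps: int, planning_interval: Optional[int]) -> List[int]:
    """List the 1-indexed steps at which a planning step would run.

    Assumption: planning happens on step 1 and then every
    `planning_interval` steps; the real schedule may differ.
    """
    if planning_interval is None:
        return []
    return [s for s in range(1, max_steps + 1) if (s - 1) % planning_interval == 0]


# An interval of 4 with a budget of 20 steps, the values used in this walkthrough.
print(planning_steps(max_steps=20, planning_interval=4))  # [1, 5, 9, 13, 17]
```

A smaller interval means more frequent re-planning, at the price of extra model calls.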
agent.planning_interval = 4
detailed_report = agent.run(f"""
You're an expert analyst. You make comprehensive reports after visiting many websites.
Don't hesitate to search for many queries at once in a for loop.
For each data point that you find, visit the source url to confirm numbers.
{task}
""")
print(detailed_report)
detailed_report
In our case, it generates this output:
| | Location | Travel Time (hours) |
|--|--------------------------------------------------|---------------------|
| 0 | Bridge of Sighs, Glasgow Necropolis, Glasgow, UK | 8.6 |
| 1 | Wishart Street, Glasgow, Scotland, UK | 8.6 |
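Thanks to these quick changes, we obtained a much more concise report simply by giving our agent a detailed prompt and planning capabilities!
However, the model's context window fills up quickly. If we ask the agent to combine the results of a detailed search with those of another one, it becomes slower, and token usage and cost climb quickly.
Let's improve the structure of our system.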
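2.3. ✌️ Splitting the task between two agents
Multi-agent structures make it possible to separate memories between different sub-tasks, which brings two main benefits:
- Each agent is more focused on its core task, and therefore performs better
- Separating memories reduces the number of input tokens at each step, which lowers latency and cost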
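The second benefit can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a simple model in which every step re-reads the prompt plus everything earlier steps added to memory; all token counts are hypothetical and chosen only for illustration.

```python
def total_input_tokens(num_steps: int, prompt_tokens: int, tokens_per_step: int) -> int:
    """Sum of input tokens over a run where every step re-reads all earlier steps."""
    return sum(prompt_tokens + i * tokens_per_step for i in range(num_steps))


# Hypothetical numbers, not measurements.
PROMPT = 1_000
SEARCH_STEP = 3_000   # tokens one web-search step adds to memory
MANAGER_STEP = 500    # tokens one reporting step adds to memory

# Single agent: 10 search steps, then 4 reporting steps that still carry
# the full search history in their context.
single = total_input_tokens(10, PROMPT, SEARCH_STEP) + sum(
    PROMPT + 10 * SEARCH_STEP + i * MANAGER_STEP for i in range(4)
)

# Two agents: the web agent keeps its 10 search steps in its own memory,
# and the manager runs its 4 steps without that history.
split = total_input_tokens(10, PROMPT, SEARCH_STEP) + total_input_tokens(
    4, PROMPT, MANAGER_STEP
)

print(single, split)  # 272000 152000
```

Under these assumptions the saving comes entirely from the manager no longer re-reading the raw search results at each of its steps.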
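Let's create a team with a dedicated web search agent, managed by another agent.
The manager agent should have plotting capabilities so it can write its final report, so we give it access to additional imports, including plotly and geopandas + shapely for spatial plotting.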
model = HfApiModel(
    "Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=8096
)

web_agent = CodeAgent(
    model=model,
    tools=[
        GoogleSearchTool(provider="serper"),
        VisitWebpageTool(),
        calculate_cargo_travel_time,
    ],
    name="web_agent",
    description="Browses the web to find information",
    verbosity_level=0,
    max_steps=10,
)
The manager agent will need to do some heavy mental lifting.
So we give it the stronger model DeepSeek-R1 and add a planning_interval parameter.
from smolagents import OpenAIServerModel
from smolagents.utils import encode_image_base64, make_image_url


# Relies on `os` and `Image` imported earlier in this walkthrough.
def check_reasoning_and_plot(final_answer, agent_memory):
    """Ask a multimodal model to review the agent's reasoning and the saved plot."""
    multimodal_model = OpenAIServerModel("gpt-4o", max_tokens=8096)
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    # Each fragment ends with a space so the concatenated sentences stay separated.
    prompt = (
        f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "To pass, a plot should be made using px.scatter_map and not any other method (scatter_map looks nicer)."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {"url": make_image_url(encode_image_base64(image))},
                },
            ],
        }
    ]
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    # Raising makes the check fail, so the answer is rejected.
    if "FAIL" in output:
        raise Exception(output)
    return True


manager_agent = CodeAgent(
    model=HfApiModel("deepseek-ai/DeepSeek-R1", provider="together", max_tokens=8096),
    tools=[calculate_cargo_travel_time],
    managed_agents=[web_agent],
    additional_authorized_imports=[
        "geopandas",
        "plotly",
        "shapely",
        "json",
        "pandas",
        "numpy",
    ],
    planning_interval=5,
    verbosity_level=2,
    final_answer_checks=[check_reasoning_and_plot],
    max_steps=15,
)
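Since the model-based check above costs an API call, a cheaper deterministic check can catch the most basic failure first. The sketch below is an assumption-laden illustration: check_png_saved is a hypothetical helper that mirrors the (final_answer, agent_memory) signature used above and only verifies that the file exists and starts with the standard PNG signature.

```python
import os
import tempfile

# The 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png_saved(final_answer, agent_memory, filepath: str = "saved_map.png") -> bool:
    """Raise if `filepath` is missing or does not start with the PNG signature."""
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"Make sure to save the plot under {filepath}!")
    with open(filepath, "rb") as f:
        if f.read(len(PNG_SIGNATURE)) != PNG_SIGNATURE:
            raise ValueError(f"{filepath} is not a valid PNG file")
    return True


# Demonstration with a throwaway file instead of a real plot.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved_map.png")
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE + b"placeholder bytes")
    print(check_png_saved(None, None, filepath=path))  # True
```

If such a check were placed in final_answer_checks ahead of check_reasoning_and_plot, a missing or corrupt file could presumably be rejected before the multimodal model is called.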
Let's inspect what this team looks like:
manager_agent.visualize()
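This generates something like the following, which helps us understand the structure and relationships between the agents and the tools they use: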
CodeAgent | deepseek-ai/DeepSeek-R1
├── ✅ Authorized imports: ['geopandas', 'plotly', 'shapely', 'json', 'pandas', 'numpy']
├── 🛠️ Tools:
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
│ ┃ Name ┃ Description ┃ Arguments ┃
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ │ calculate_cargo_travel_time │ Calculate the travel time for a cargo │ origin_coords (`array`): Tuple of │
│ │ │ plane between two points on Earth │ (latitude, longitude) for the │
│ │ │ using great-circle distance. │ starting point │
│ │ │ │ destination_coords (`array`): Tuple │
│ │ │ │ of (latitude, longitude) for the │
│ │ │ │ destination │
│ │ │ │ cruising_speed_kmh (`number`): │
│ │ │ │ Optional cruising speed in km/h │
│ │ │ │ (defaults to 750 km/h for typical │
│ │ │ │ cargo planes) │
│ │ final_answer │ Provides a final answer to the given │ answer (`any`): The final answer to │
│ │ │ problem. │ the problem │
│ └─────────────────────────────┴───────────────────────────────────────┴───────────────────────────────────────┘
└── 🤖 Managed agents:
└── web_agent | CodeAgent | Qwen/Qwen2.5-Coder-32B-Instruct
├── ✅ Authorized imports: []
├── 📝 Description: Browses the web to find information
└── 🛠️ Tools:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ Description ┃ Arguments ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ web_search │ Performs a google web search for │ query (`string`): The search │
│ │ your query then returns a string │ query to perform. │
│ │ of the top search results. │ filter_year (`integer`): │
│ │ │ Optionally restrict results to a │
│ │ │ certain year │
│ visit_webpage │ Visits a webpage at the given url │ url (`string`): The url of the │
│ │ and reads its content as a │ webpage to visit. │
│ │ markdown string. Use this to │ │
│ │ browse webpages. │ │
│ calculate_cargo_travel_time │ Calculate the travel time for a │ origin_coords (`array`): Tuple of │
│ │ cargo plane between two points on │ (latitude, longitude) for the │
│ │ Earth using great-circle │ starting point │
│ │ distance. │ destination_coords (`array`): │
│ │ │ Tuple of (latitude, longitude) │
│ │ │ for the destination │
│ │ │ cruising_speed_kmh (`number`): │
│ │ │ Optional cruising speed in km/h │
│ │ │ (defaults to 750 km/h for typical │
│ │ │ cargo planes) │
│ final_answer │ Provides a final answer to the │ answer (`any`): The final answer │
│ │ given problem. │ to the problem │
└─────────────────────────────┴───────────────────────────────────┴───────────────────────────────────┘
manager_agent.run("""
Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W).
Also give me some supercar factories with the same cargo plane transfer time. You need at least 6 points in total.
Represent this as spatial map of the world, with the locations represented as scatter points with a color that depends on the travel time, and save it to saved_map.png!
Here's an example of how to plot and return a map:
import plotly.express as px
df = px.data.carshare()
fig = px.scatter_map(df, lat="centroid_lat", lon="centroid_lon", text="name", color="peak_hour", size=100,
color_continuous_scale=px.colors.sequential.Magma, size_max=15, zoom=1)
fig.show()
fig.write_image("saved_map.png")
final_answer(fig)
Never try to process strings using code: when you have a string to read, just print it and you'll see it.
""")
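I don't know how it went in your run, but in mine the manager agent skillfully split the work given to the web agent into two searches: first the Batman filming locations, then the supercar factories. It then aggregated the two result lists and plotted the map.
Let's look at the resulting map by inspecting it directly from the agent's state: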
manager_agent.python_executor.state["fig"]
The output map looks like this:
