Prompt Patterns · Crawlbase Documentation

Extract data from a single page

Use this when you want structured data from one specific URL. It is a direct, single-pass task with no agent loop.

You will be given a URL. Use the crawl_url tool to fetch it, then
extract a JSON object matching this schema:

{
  "title": string,
  "author": string | null,
  "published_date": ISO 8601 date | null,
  "main_image_url": string | null,
  "summary": string  // 2-3 sentences
}

Return ONLY the JSON object, no commentary.

URL: {url}

Tip: If your client supports it, pin the model to JSON output mode. Otherwise, use a JSON parser that tolerates leading and trailing whitespace.
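When JSON output mode is not available, the reply can be parsed and checked on the client side. The following is a minimal sketch, not part of any Crawlbase SDK: the function name parse_extraction and the validation rules are assumptions derived from the schema in the prompt above.

```python
import json
from datetime import date

# Keys taken from the schema in the prompt above.
REQUIRED_KEYS = {"title", "author", "published_date", "main_image_url", "summary"}


def parse_extraction(raw: str) -> dict:
    """Parse the model's reply and check it against the prompt's schema."""
    # json.loads ignores surrounding whitespace itself; strip() makes that explicit.
    data = json.loads(raw.strip())
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    # title and summary are non-nullable strings in the schema.
    for key in ("title", "summary"):
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")

    published = data["published_date"]
    if published is not None:
        if not isinstance(published, str):
            raise ValueError("published_date must be a string or null")
        # Raises ValueError if the value is not an ISO 8601 calendar date.
        date.fromisoformat(published)

    return data
```

A reply that fails validation can be retried or logged rather than passed downstream.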

Multi-source research

Use this for "what does the web say about X" tasks. It combines search with crawling.

You are a research assistant. Given a topic, you must:

1. Use search_web to find 5-8 high-quality recent sources.
2. Use crawl_url on the top 3-4 to read them in full.
3. Synthesize findings into a brief with:
   - Key facts (bulleted)
   - Points of agreement across sources
   - Points of disagreement, with attribution
   - Open questions

Always cite sources by URL. Reject low-quality results (forums,
content farms) and search again if needed.

Topic: {topic}
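Placeholders such as {topic} and {url} have to be filled in before a template is sent. Some templates on this page also contain literal JSON braces, which Python's str.format would reject, so the sketch below substitutes only named placeholders. The helper name fill_prompt is hypothetical.

```python
import re

# Matches {identifier} only, so JSON braces in a template are left alone.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def fill_prompt(template: str, **values: str) -> str:
    """Replace {name} placeholders that have a matching keyword argument."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Unknown placeholders are kept verbatim so they are easy to spot.
        return values[key] if key in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


prompt = fill_prompt("Topic: {topic}", topic="solid-state batteries")
```

Using a replacement function rather than a replacement string also means backslashes in the supplied values are inserted literally.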

Change detection

Use this for "notify me when X changes" workflows. It pairs well with scheduled jobs.

You are monitoring this URL: {url}
The previous snapshot is in ... tags below.

Use crawl_url to fetch the current version. Compare them and report:

- Has the page changed in any meaningful way? (Ignore timestamps,
  view counts, ad rotations.)
- If yes, summarize what changed in 1-3 bullet points.
- If no, respond with the single word "UNCHANGED".


{previous_snapshot}
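A scheduled job needs to turn the model's reply into a decision about whether to send a notification. The sketch below assumes the reply follows the prompt above: either the single word UNCHANGED or a short bulleted list. The function name parse_change_report is hypothetical.

```python
from typing import List, Optional


def parse_change_report(reply: str) -> Optional[List[str]]:
    """Return None if the page is unchanged, else the reported change bullets."""
    text = reply.strip()

    # Tolerate quotes or a trailing period around the sentinel word.
    if text.strip("\"'.").upper() == "UNCHANGED":
        return None

    bullets = [
        line.strip()[1:].strip()
        for line in text.splitlines()
        if line.strip().startswith(("-", "*"))
    ]
    # Fall back to the whole reply if the model did not use bullets.
    return bullets or [text]
```

The caller would notify only when the result is not None, and store the freshly crawled page as the snapshot for the next run.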

Visual QA

Combine the screenshot tool with the model's vision capabilities to review page layout.

Use the screenshot tool with mode=fullpage on this URL: {url}.

Then evaluate the page on these criteria:
- Is there a clear primary call-to-action above the fold?
- Is the hero text scannable in under 3 seconds?
- Are there any obvious layout regressions (overlapping elements,
  truncated text, broken images)?

Be specific - point to coordinates or sections, not vague feelings.

Lead enrichment

Use this for sales and marketing scenarios: start from a name and a company, and end up with a complete profile.

You will receive a name and company. Your job is to enrich them
into a structured profile.

1. search_web for "{name} {company} linkedin" - find the LinkedIn URL.
2. scrape_structured with scraper=linkedin-profile on that URL.
3. search_web for "{company}" to find their domain.
4. crawl_url the company homepage and extract a 1-line description.

Return:
{
  "name": ..., "title": ..., "linkedin": ...,
  "company": ..., "company_domain": ..., "company_description": ...
}

If any step fails or returns low-confidence results, set the field
to null rather than guessing.

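Always include a failure fallback path

Models fail more gracefully when you tell them what to do if a tool fails. "Set the field to null rather than guessing" is far better than letting the model silently fabricate an answer from its training data.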
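The same rule can be enforced on the client so that downstream code always sees every field, with None wherever the model had nothing reliable. This is a sketch under assumptions: the field list comes from the lead enrichment prompt above, while the helper name normalize_profile and the set of placeholder strings treated as null are illustrative choices.

```python
import json
from typing import Any, Dict

# Field names from the lead enrichment prompt above.
PROFILE_FIELDS = (
    "name", "title", "linkedin",
    "company", "company_domain", "company_description",
)

# Assumed placeholder values that should count as "no answer".
NULL_LIKE = {"", "n/a", "unknown", "none", "null"}


def normalize_profile(raw: str) -> Dict[str, Any]:
    """Return a profile with every expected field, using None for gaps."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    profile: Dict[str, Any] = {}
    for field in PROFILE_FIELDS:
        value = data.get(field)  # a missing key becomes None
        if isinstance(value, str) and value.strip().lower() in NULL_LIKE:
            value = None
        profile[field] = value
    return profile
```

Fields outside PROFILE_FIELDS are dropped, which keeps unrequested guesses out of the stored record.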

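General tips

Be explicit about the schema. Don't ask for "the data on this page"; describe the specific fields you want.
Limit recursive crawling. Tell the agent the maximum number of URLs it may fetch in a single turn.
Use caching where possible. Use store=true to avoid re-crawling the same URLs across turns.
Set page_wait for SPAs. Say so in the prompt: "For client-side rendered sites, use page_wait=2000."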
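These tips can be appended to any of the prompts above as a short block of constraints. A minimal sketch, assuming the wording below suits your agent; the function name crawl_guardrails and its parameters are hypothetical, while store=true and page_wait=2000 are the values mentioned in the tips.

```python
def crawl_guardrails(max_urls: int, spa: bool = False, use_cache: bool = True) -> str:
    """Build a block of crawl constraints to append to a prompt."""
    if max_urls < 1:
        raise ValueError("max_urls must be at least 1")

    lines = [f"Fetch at most {max_urls} URLs in this turn."]
    if use_cache:
        lines.append("Pass store=true so repeated URLs are not re-crawled.")
    if spa:
        lines.append("For client-side rendered sites, use page_wait=2000.")
    return "\n".join(lines)


research_prompt = "Topic: pricing pages\n\n" + crawl_guardrails(4, spa=True)
```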