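Prompt patterns
A small library of prompts that get reliable results from the Crawlbase tools. Use them as system prompts, agent instructions, or starting templates.
Extract data from a single page
Use this when you want structured data from a specific URL. It is direct: no agent loop required.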
You will be given a URL. Use the crawl_url tool to fetch it, then
extract a JSON object matching this schema:
{
"title": string,
"author": string | null,
"published_date": ISO 8601 date | null,
"main_image_url": string | null,
"summary": string // 2-3 sentences
}
Return ONLY the JSON object, no commentary.
URL: {url}
Tip: If your client supports it, pin the model to JSON output mode. Otherwise, use a JSON parser that tolerates leading and trailing whitespace.
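When JSON output mode is not available, the reply can be parsed defensively on the client side. The following is a minimal sketch, not part of the Crawlbase tooling: the function name `parse_page_json` is hypothetical, and the field names are copied from the schema in the prompt. Python's `json.loads` already accepts surrounding whitespace; this version goes one step further and also ignores stray text before or after the object, such as a code fence.

```python
import json

# Field names taken from the schema in the extraction prompt.
EXPECTED_KEYS = {"title", "author", "published_date", "main_image_url", "summary"}


def parse_page_json(reply: str) -> dict:
    """Parse a model reply into a dict, tolerating text around the JSON object."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value starting at `start` and ignores the rest.
    obj, _end = json.JSONDecoder().raw_decode(reply, start)
    missing = EXPECTED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"reply is missing fields: {sorted(missing)}")
    return obj
```

Rejecting replies with missing fields keeps schema drift visible instead of letting partial objects flow downstream.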
Multi-source research
Use this for "what does the web say about X" tasks. It combines search with crawling.
You are a research assistant. Given a topic, you must:
1. Use search_web to find 5-8 high-quality recent sources.
2. Use crawl_url on the top 3-4 to read them in full.
3. Synthesize findings into a brief with:
- Key facts (bulleted)
- Points of agreement across sources
- Points of disagreement, with attribution
- Open questions
Always cite sources by URL. Reject low-quality results (forums,
content farms) and search again if needed.
Topic: {topic}
Change detection
Use this for "notify me when X changes" workflows. Pairs well with a scheduled job.
You are monitoring this URL: {url}
The previous snapshot is in ... tags below.
Use crawl_url to fetch the current version. Compare them and report:
- Has the page changed in any meaningful way? (Ignore timestamps,
view counts, ad rotations.)
- If yes, summarize what changed in 1-3 bullet points.
- If no, respond with the single word "UNCHANGED".
{previous_snapshot}
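A scheduled job needs to turn the model's reply into a decision. The sketch below is one possible way to do that in Python; `parse_change_reply` is a hypothetical helper, and it assumes the model follows the prompt and answers either with the single word UNCHANGED or with a short bulleted list.

```python
def parse_change_reply(reply: str) -> dict:
    """Interpret the reply to the change-detection prompt."""
    text = reply.strip()
    # Accept minor variations such as quotes or a trailing period.
    if text.strip("\"'.").upper() == "UNCHANGED":
        return {"changed": False, "bullets": []}
    bullets = [
        line.lstrip("-*• ").strip()
        for line in text.splitlines()
        if line.strip().startswith(("-", "*", "•"))
    ]
    # If the model ignored the bullet format, keep the whole reply as one item.
    return {"changed": True, "bullets": bullets or [text]}
```

A job could send a notification only when `changed` is true and store the freshly crawled page as the next snapshot.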
Visual QA
Combines the screenshot tool with the model's vision capabilities for layout review.
Use the screenshot tool with mode=fullpage on this URL: {url}.
Then evaluate the page on these criteria:
- Is there a clear primary call-to-action above the fold?
- Is the hero text scannable in under 3 seconds?
- Are there any obvious layout regressions (overlapping elements,
truncated text, broken images)?
Be specific - point to coordinates or sections, not vague feelings.
Lead enrichment
Use this for sales and marketing scenarios: start from a name and company, end with a complete profile.
You will receive a name and company. Your job is to enrich them
into a structured profile.
1. search_web for "{name} {company} linkedin" - find the LinkedIn URL.
2. scrape_structured with scraper=linkedin-profile on that URL.
3. search_web for "{company}" to find their domain.
4. crawl_url the company homepage and extract a 1-line description.
Return:
{
"name": ..., "title": ..., "linkedin": ...,
"company": ..., "company_domain": ..., "company_description": ...
}
If any step fails or returns low-confidence results, set the field
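to null rather than guessing.
Models fail more gracefully when you tell them what to do when a tool fails. "Set to null rather than guess" is far better than letting the model quietly invent an answer from its training data.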
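The same null-over-guess rule can be enforced after the fact. The following Python sketch is illustrative only: `normalize_profile` and the `PLACEHOLDERS` set are assumptions, while the field names come from the return shape in the prompt. It fills missing fields with `None` and treats common placeholder strings as missing.

```python
# Field names taken from the return shape in the lead enrichment prompt.
PROFILE_FIELDS = (
    "name", "title", "linkedin",
    "company", "company_domain", "company_description",
)

# Strings a model might emit instead of a real null (an assumed list).
PLACEHOLDERS = {"", "n/a", "unknown", "null", "none"}


def normalize_profile(raw: dict) -> dict:
    """Return a profile with every expected field present, using None for gaps."""
    profile = {}
    for field in PROFILE_FIELDS:
        value = raw.get(field)
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in PLACEHOLDERS:
                value = None
        profile[field] = value
    return profile
```

Downstream code can then check for `None` in one place rather than handling several spellings of "no answer".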
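General tips
- Specify the schema explicitly. Instead of asking for "the data on this page", describe the exact fields you want.
- Limit recursive crawling. Tell the agent the maximum number of URLs it may fetch in a single turn.
- Use caching wherever possible. Pass store=true to avoid re-crawling the same URLs across turns.
- Set page_wait for SPAs. Say so in the prompt: "For client-rendered sites, use page_wait=2000."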
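These tips can be appended to any of the prompts above as explicit constraints. The helper below is a rough Python sketch: `with_guardrails` and its parameters are hypothetical, and the default of 5 URLs is an arbitrary example value. Only store=true and page_wait=2000 come from the tips themselves.

```python
def with_guardrails(prompt: str, max_urls: int = 5,
                    use_store: bool = True, spa: bool = False) -> str:
    """Append crawl-limit, caching, and SPA instructions to a prompt."""
    rules = [f"Fetch at most {max_urls} URLs in a single turn."]
    if use_store:
        rules.append("Pass store=true so URLs already fetched are not re-crawled.")
    if spa:
        rules.append("For client-rendered sites, use page_wait=2000.")
    bullet_list = "\n".join(f"- {rule}" for rule in rules)
    return f"{prompt.rstrip()}\n\nConstraints:\n{bullet_list}"
```

Plain string concatenation is used instead of `str.format` because several of the prompts contain literal braces in their JSON schemas.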