What is Crawlbase Cloud Storage?

A scalable cloud store for your crawled and scraped data. Add store=true to a crawl and the rendered HTML, JSON and screenshots are kept for you, ready to retrieve by RID or search, with no database of your own to run.

How do I store a page?

Add store=true to your Crawling API call, or use Cloud Storage as the webhook target for a Crawler. The response returns a storage RID that identifies the stored page.
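As a minimal sketch of what such a call might look like, the helper below builds a Crawling API URL with the store=true flag set. The base endpoint `https://api.crawlbase.com/` and the parameter names `token` and `url` are assumptions made for illustration; only `store=true` comes from the text above, so check the Crawlbase documentation before relying on the rest.

```python
from urllib.parse import urlencode

# Assumed base endpoint for the Crawling API; confirm it in the Crawlbase docs.
CRAWLING_API = "https://api.crawlbase.com/"


def build_store_request(token: str, target_url: str) -> str:
    """Return a Crawling API URL that asks Crawlbase to keep the crawl."""
    query = urlencode({
        "token": token,     # assumed parameter name for the account token
        "url": target_url,  # assumed parameter name; urlencode escapes the value
        "store": "true",    # the flag described above
    })
    return f"{CRAWLING_API}?{query}"


if __name__ == "__main__":
    print(build_store_request("YOUR_TOKEN", "https://example.com/page?id=1"))
```

Building the query with `urlencode` keeps the target URL safely escaped even when it carries its own query string.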

How do I get my data back?

Send a GET request to the storage endpoint with the RID, run a full-text search across your stored pages, or browse and download from the storage dashboard.
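Retrieval by RID could be sketched as follows. The storage endpoint path `https://api.crawlbase.com/storage` and the `token` and `rid` parameter names are assumptions for illustration, not confirmed by this page; the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed storage endpoint; confirm the real path in the Crawlbase docs.
STORAGE_API = "https://api.crawlbase.com/storage"


def build_retrieve_url(token: str, rid: str) -> str:
    """Return the GET URL for one stored page, identified by its RID."""
    # "token" and "rid" are assumed parameter names.
    return f"{STORAGE_API}?{urlencode({'token': token, 'rid': rid})}"


def fetch_stored_page(token: str, rid: str, timeout: float = 30.0) -> bytes:
    """Send the GET request and return the stored body as raw bytes."""
    with urlopen(build_retrieve_url(token, rid), timeout=timeout) as response:
        return response.read()
```

Calling `fetch_stored_page("YOUR_TOKEN", rid)` with an RID saved from an earlier crawl would return the stored page without crawling the site again, assuming the endpoint behaves as described.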

Does it store screenshots too?

Yes. Stored crawls can include the rendered HTML, structured JSON and full-page screenshots, all retrievable by the same RID.

Do I have to manage scaling or backups?

No. Crawlbase handles scaling, backups and cleanup of your storage space, so you can stop maintaining an S3 bucket or a database of your own.

How much does it cost?

It is free for developers: the free tier stores up to 10,000 documents for 14 days, with no credit card required. Beyond that it scales inexpensively with your crawl volume, and the same token works across the Crawling API, the Crawler and every scraper.

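Cloud Storage.
Every crawl kept, with nothing to re-fetch.

Add store=true to any crawl and Crawlbase keeps the rendered page, JSON and screenshots in the cloud.
Retrieve by RID, run full-text search, and skip the database, S3 bucket and backups.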

10,000 documents free · One flag: store=true · Full-text search

[Live feed: storage writes, 1.24M req/min, streaming. Each entry shows HTTP status, target URL, country code and latency.]

01 Live demo

Store with one flag. Pull by RID.

Cloud Storage, live demo. Store a crawl with one parameter, then retrieve it later by its RID.

Store your first crawl in minutes. 10,000 documents free, no credit card.

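02 Features

Storage built for crawling.

Keep, find and pull your crawled data without building storage of your own.

store

Store with one flag

Add store=true to any Crawling API call, or point a Crawler at Cloud Storage, and the rendered page is kept automatically.

retrieve

Retrieve by RID

Every stored crawl gets an RID. Send a GET request with it and the page comes straight back, with no re-crawl.

Full-text search

Search across everything you have crawled to find the exact page you need, without scanning a database of your own.

capture

Pages and screenshots

Keep the rendered HTML, structured JSON and full-page screenshots together, all retrievable by the same RID.

scale

Scaling handled

Crawlbase manages scaling, backups and cleanup of your space, so you can retire your S3 bucket and its upkeep.

pipe

Crawler webhook target

Use storage as the delivery target for an asynchronous Crawler, so large crawls are ready to pull as soon as they land.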

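03 How it works

One flag to store, one call to pull.

Keep your crawls exactly as they are. Add one parameter to store, and send one GET to retrieve.

Add store=true

Crawl any URL with store=true, or set Cloud Storage as your Crawler webhook target.

We store and index

The rendered page, JSON and screenshots are stored and indexed for full-text search, and we handle the scaling.

Save the RID

The response returns a storage RID that uniquely identifies the page you just kept.

Retrieve or search

Send a GET to the storage endpoint with the RID, search across your crawls, or pull from the dashboard.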
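The four steps above can be sketched end to end as below. This is an illustrative outline under stated assumptions, not Crawlbase's client: the base URL, the `/storage` path, the `token`, `url` and `rid` parameter names, and especially the idea that the RID arrives in a response header named `rid` are all hypothetical. The HTTP transport is passed in as a function so the flow can be exercised without a network.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# A transport takes a URL and returns (lower-cased headers, body bytes).
Transport = Callable[[str], Tuple[Mapping[str, str], bytes]]

API_BASE = "https://api.crawlbase.com"  # assumed base URL


@dataclass
class StoredCrawl:
    rid: str     # storage identifier returned by the crawl
    body: bytes  # page body returned by the crawl itself


def urllib_transport(url: str) -> Tuple[Mapping[str, str], bytes]:
    """Default transport: a plain GET using the standard library."""
    with urlopen(url, timeout=30) as response:
        headers = {k.lower(): v for k, v in response.headers.items()}
        return headers, response.read()


def crawl_and_store(transport: Transport, token: str, target_url: str,
                    rid_header: str = "rid") -> StoredCrawl:
    """Steps 1-3: crawl with store=true, then save the RID from the response."""
    query = urlencode({"token": token, "url": target_url, "store": "true"})
    headers, body = transport(f"{API_BASE}/?{query}")
    rid = headers.get(rid_header)  # assumed location of the RID
    if not rid:
        raise RuntimeError("response did not include a storage RID")
    return StoredCrawl(rid=rid, body=body)


def retrieve(transport: Transport, token: str, rid: str) -> bytes:
    """Step 4: one GET against the (assumed) storage endpoint."""
    query = urlencode({"token": token, "rid": rid})
    _, body = transport(f"{API_BASE}/storage?{query}")
    return body
```

In real use you would pass `urllib_transport`; in tests a fake transport that returns canned headers and bodies is enough to check that the RID is carried from the store step to the retrieve step.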

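04 Use cases

What teams keep in Cloud Storage.

USE / 01 Data pipelines

A buffer for your stack

Land crawled pages in storage and pull them into your warehouse, index or model at your own pace.

USE / 02 History

Snapshots over time

Keep every crawl so you can compare a page across dates without crawling it again.

USE / 03 Search

Find within your crawls

Run full-text search over everything you have stored to locate the exact pages and fields you need.

USE / 04 Cost

Retire your own storage

Drop the S3 bucket and the database. Storage scales, backs up and cleans up for you automatically.

USE / 05 AI

A corpus for training and RAG

Build and re-pull large, clean page sets straight from storage for training and retrieval.

USE / 06 Async

Crawler delivery

Pair storage with the Crawler so high-volume asynchronous crawls are retrievable as soon as they arrive.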
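For the history use case, comparing two stored snapshots of the same page might look like the sketch below. It assumes both snapshots have already been pulled from storage as text; the function name `diff_snapshots` and the labels are hypothetical, and the comparison uses Python's standard `difflib` rather than anything Crawlbase provides.

```python
import difflib
from typing import List


def diff_snapshots(older_html: str, newer_html: str,
                   older_label: str = "older",
                   newer_label: str = "newer") -> List[str]:
    """Return a unified diff of two snapshots, one string per diff line."""
    diff = difflib.unified_diff(
        older_html.splitlines(),
        newer_html.splitlines(),
        fromfile=older_label,
        tofile=newer_label,
        lineterm="",  # inputs have no trailing newlines, so add none
    )
    return list(diff)


if __name__ == "__main__":
    before = "<p>Price: $10</p>\n<p>In stock</p>"
    after = "<p>Price: $12</p>\n<p>In stock</p>"
    print("\n".join(diff_snapshots(before, after, "2024-01-01", "2024-02-01")))
```

An empty list means the two snapshots are identical line for line.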

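05 Pricing

Start free, scale cheaply.

Cloud Storage is free for developers and inexpensive for companies. You pay for crawl volume, not for running a database.

Free
Start building AI and web data workflows
$0/mo
10,000 requests
14-day retention
Get API
Delete API
Bulk API

Developer
Reliable storage for AI and automation workflows
$29/mo
100,000 requests
30-day retention
Get API
Delete API
Bulk API

Most popular

Business
Built for scalable data operations
$249/mo
1,000,000 requests
30-day retention
Get API
Delete API
Bulk API

Enterprise
Custom infrastructure for large-scale data systems
Contact us
Custom capacity
Custom retention
Get API
Delete API
Bulk API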

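06 Why Crawlbase

Storage you don't have to run yourself.

Cloud Storage runs on the same network that serves 70,000+ developers. There is no S3 to provision, no database to back up, and nothing to clean up when it fills.

99%

Average request success rate

70K+

Customers on the network

10K

Documents kept free for 14 days

99.99%

Network uptime

Add store=true once, and every crawl is kept, indexed and ready to pull at any time.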

Keep every page you crawl.
Skip the database.

Start free with up to 10,000 documents stored for 14 days. One token works across Cloud Storage and every product.