How do I scrape The Washington Post?

Send the Washington Post URL to the Crawlbase Crawling API with your token. Crawlbase rotates a residential proxy, renders the page in a real browser, clears bot checks, and returns the fully rendered HTML. Add scraper=generic-extractor to get structured JSON instead.
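As a rough illustration of the request described above, the sketch below builds the request URL and fetches the page using only the Python standard library. The endpoint `https://api.crawlbase.com/` and the parameter names `token` and `url` are assumptions not stated on this page, and the helper names are hypothetical; check them against the Crawlbase documentation before relying on them.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint; confirm it against the Crawlbase documentation.
API_ENDPOINT = "https://api.crawlbase.com/"


def build_request_url(token: str, page_url: str, scraper: Optional[str] = None) -> str:
    """Return the request URL for one target page (hypothetical helper)."""
    # "token" and "url" are assumed parameter names; "scraper" is named on this page.
    params = {"token": token, "url": page_url}
    if scraper:
        params["scraper"] = scraper
    # urlencode percent-encodes the target URL so it survives as a single parameter.
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch(token: str, page_url: str, scraper: Optional[str] = None,
          timeout: float = 90.0) -> str:
    """Fetch a page through the API and return the response body as text."""
    request_url = build_request_url(token, page_url, scraper)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example (performs a network call, so it is left commented out):
# html = fetch("YOUR_TOKEN", "https://www.washingtonpost.com/politics/")
# data = fetch("YOUR_TOKEN", "https://www.washingtonpost.com/politics/",
#              scraper="generic-extractor")
```

Keeping URL construction separate from the network call makes the encoding step easy to check without spending a request.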

Can I get Washington Post articles as JSON?

Yes. By default the Crawling API returns rendered HTML; add the generic extractor (scraper=generic-extractor) to receive title, meta, content, images and links as JSON, or parse the HTML yourself.

Does it render JavaScript on article pages?

Yes. A real browser executes the page, so headlines, bylines, timestamps and full body text that load through JavaScript are captured, not just the initial HTML.

How do I avoid getting blocked scraping The Washington Post?

Crawlbase routes each request through rotating residential IPs across 30 geographies and clears bot detection automatically. You do not manage proxies or solve CAPTCHAs, and there is nothing to maintain when the site changes its setup.

What about the paywall?

The Crawling API reads publicly visible pages as a logged-out visitor sees them. On metered or paywalled articles you receive the public markup the page serves, with no login and no subscriber credentials.

Which Washington Post pages can I crawl?

Any public URL: the front page, section fronts like politics or business, individual article pages, and search results. The same API works on any other site too.

How much does it cost?

Start free with up to 20,000 requests and no credit card. Paid plans scale with usage, and the same token works across the Crawling API and every Crawlbase scraper.


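Washington Post scraper.
Any article, fully rendered.

Send any Washington Post URL and get back fully rendered HTML, delivered through residential proxies with anti-bot handling built in.
Use the generic extractor to convert it to JSON.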


99% success rate · 140M residential IPs · 30 geographies

01 Live demo

Any Washington Post URL in, HTML or JSON out.

The Crawling API, shown live. Get the rendered HTML, or switch to the generic extractor for JSON.

Run your first request in minutes: up to 20,000 free requests, no credit card required.

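02 Features

One API for everything The Washington Post throws at you.

The Washington Post puts articles behind a metered paywall, renders stories with JavaScript, and monitors article and section pages for bots. The Crawling API renders the page in a real browser, requests it through residential IPs, and delivers clean HTML or JSON.

render

Full JavaScript rendering

A real browser executes the page, so headlines, bylines, timestamps and body text that load through JavaScript are captured, not just the initial HTML.

proxies

140M residential IPs

Each request rotates through residential IPs across 30 geographies, so you reach The Washington Post the way a real local reader does.

anti-bot

We handle the blocks

Bot detection, the metered paywall and rate limits are handled automatically. Nothing to solve, nothing to maintain.

format

HTML or JSON

Get the fully rendered HTML, or add scraper=generic-extractor to receive the title, content, images and links as structured JSON.

extras

Screenshots and async

Capture a full-page screenshot in the same call, or run asynchronously with webhooks and cloud storage.

one token

One API for every site

The Crawling API works on any URL, so the same token covers The Washington Post and everything else you crawl.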

03 Output

Rendered HTML, or clean JSON.

By default you get rendered HTML. Add generic-extractor and the same page comes back as typed JSON.

{
  "title": "Sample Story | The Washington Post",
  "favicon": "https://www.washingtonpost.com/favicon.ico",
  "meta": {
    "description": "The latest news and analysis from The Washington Post.",
    "keywords": "..."
  },
  "content": "Article headline, byline, timestamp and body for the story...",
  "canonical": "https://www.washingtonpost.com/technology/2026/01/15/sample-story/",
  "images": [ "..." ],
  "og_images": [ "..." ],
  "links": [ "..." ]
}

Page

title · string
canonical · string
favicon · string

Meta

meta.description · string
meta.keywords · string

Content

content · string

Media

images · array
og_images · array

Links

links · array
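The field list above can be mirrored in a small typed container on the client side. The sketch below assumes the response body is exactly the JSON object shown in the sample (no wrapping envelope) and treats every field as optional; the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExtractedPage:
    """Typed view of the generic extractor fields listed on this page."""
    title: str = ""
    canonical: str = ""
    favicon: str = ""
    description: str = ""
    keywords: str = ""
    content: str = ""
    images: List[str] = field(default_factory=list)
    og_images: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


def parse_extracted_page(raw_json: str) -> ExtractedPage:
    """Parse the JSON text into an ExtractedPage, tolerating missing fields."""
    data: Dict[str, Any] = json.loads(raw_json)
    meta = data.get("meta") or {}  # description and keywords are nested under "meta"
    return ExtractedPage(
        title=data.get("title") or "",
        canonical=data.get("canonical") or "",
        favicon=data.get("favicon") or "",
        description=meta.get("description") or "",
        keywords=meta.get("keywords") or "",
        content=data.get("content") or "",
        images=list(data.get("images") or []),
        og_images=list(data.get("og_images") or []),
        links=list(data.get("links") or []),
    )
```

Defaulting missing fields to empty values keeps downstream code free of None checks, at the cost of not distinguishing an absent field from an empty one.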

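04 How it works

One call, from URL to data.

Every Washington Post request follows the same path. You send the URL; we handle everything in between.

Send the URL

Pass any public Washington Post URL with your token: the front page, a section, an article, or search.

Rotate the proxy

A residential IP and geography that reaches The Washington Post cleanly is selected from 140M IPs across 30 geographies.

Render the page

A real browser loads the page so headlines, bylines, timestamps and full body text finish rendering before capture.

Clear the anti-bot checks

The metered paywall, bot detection and rate limits on article and section pages are handled automatically. Nothing to solve, nothing to maintain.

Return HTML or JSON

Get back fully rendered HTML, or typed JSON when you add the generic extractor.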

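05 Use cases

What teams build with Washington Post data.

USE / 01 · Monitoring

News monitoring

Track the front page, section fronts and article pages to capture breaking stories and updates as they are published.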
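One simple way to spot newly published stories, assuming each poll of a section page yields a list of links (for instance the links array of the generic extractor output), is to diff each poll against the set of links already seen. The function name below is hypothetical and the URLs are placeholders.

```python
from typing import Iterable, List, Set


def new_links(seen: Set[str], current: Iterable[str]) -> List[str]:
    """Return links not seen in earlier polls, in page order, and record them."""
    fresh: List[str] = []
    for link in current:
        if link not in seen:
            seen.add(link)  # record immediately so repeats within one poll are skipped
            fresh.append(link)
    return fresh


# Usage: keep one `seen` set per monitored page and call new_links on every poll.
seen_politics: Set[str] = set()
first_poll = new_links(seen_politics, ["https://example.com/a", "https://example.com/b"])
second_poll = new_links(seen_politics, ["https://example.com/b", "https://example.com/c"])
```

In a long-running monitor the seen set would normally be persisted between runs; that part is left out here.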

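USE / 02 · Narrative

Media and narrative analysis

Track how topics, people and policies are presented across politics, business and opinion coverage.

USE / 03 · Sentiment

Sentiment and tone analysis

Extract headlines and body text to score sentiment and tone by section over time.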
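As a toy illustration of scoring tone from extracted text, the sketch below counts words from two tiny hand-picked word lists. The word lists and the scoring formula are illustrative assumptions, not part of the API; a real analysis would use a proper sentiment model or lexicon.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = frozenset({"gain", "growth", "win", "record", "improve"})
NEGATIVE = frozenset({"loss", "crisis", "decline", "fail", "risk"})


def tone_score(text: str) -> float:
    """Return (positive hits - negative hits) / word count, or 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return (positive_hits - negative_hits) / len(words)
```

Scores from headline and body text can then be averaged per section and per day to follow tone over time.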

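USE / 04 · Research

Research and archiving

Capture clean article text and metadata for research datasets and long-term archives.

USE / 05 · Training

Training data and RAG

Feed clean article text into models, RAG pipelines and agents through one API.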
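Before article text goes into a RAG pipeline it is usually split into overlapping chunks. The sketch below is one minimal character-based approach; the function name and the default sizes are illustrative assumptions, and production pipelines often split on sentences or tokens instead.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, each sharing `overlap` characters with the previous one."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    chunks: List[str] = []
    step = max_chars - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.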

USE / 06 · Coverage

Any URL, one API

Crawl the front page, sections, articles and search, plus any other site you need.

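06 Good to know

What is worth knowing when scraping The Washington Post.

Renders like a real browser

The Washington Post renders articles with JavaScript; the Crawling API runs a real browser so headlines, bylines, timestamps and body text finish loading before capture.

HTML by default, JSON on demand

You get fully rendered HTML. Add scraper=generic-extractor to receive the parsed title, content, images and links, or parse the HTML yourself.

Metered paywall, public view

Articles sit behind a metered paywall; the Crawling API reads publicly visible pages without logging in, so you get what a logged-out reader sees.

Reach The Washington Post from anywhere

Geo-targeting across 30 geographies and 140M residential IPs means stable access with no proxies to manage.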

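07 Why Crawlbase

Built for crawling The Washington Post at scale.

The Crawling API runs on the same network that serves 46,000+ paying customers and 70,000+ developers. There are no proxies to buy, no browsers to run, and nothing to patch when The Washington Post changes.

99%

Average request success rate

140M

Residential IPs, plus 98M datacenter IPs

30

Geographies for precise local results

20/s

Default requests per second, raised on request

One token, official SDKs for Python, Node and Ruby, and a network with 99.99% uptime underneath.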
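To stay under the default of 20 requests per second quoted above, a client can space out its own calls. The sketch below is a minimal single-threaded throttle; the class name is hypothetical, and the injectable clock and sleep functions exist only to make it easy to test.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space calls at least 1 / max_per_second seconds apart (single-threaded)."""

    def __init__(self, max_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            # No pending slot: send immediately and reserve the next one.
            self._next_allowed = now + self._interval
            return 0.0
        delay = self._next_allowed - now
        self._sleep(delay)
        self._next_allowed += self._interval
        return delay


# Usage: call limiter.wait() before each request.
# limiter = RateLimiter(20)
# for url in urls:
#     limiter.wait()
#     ...send the request for url...
```

This version is not thread-safe; concurrent workers would need a lock around wait() or a shared token bucket.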

Start scraping The Washington Post.
Skip the paywalls and the blocks.

Start free with up to 20,000 requests. One token works across the Crawling API and every scraper.
