SUEN

WeChat → HTML

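The most nauseating thing about WeChat comes down to one word: closed! And WeChat Official Accounts are even worse!
No oEmbed API for embedding, no way to recover the full text, opening in an iframe is forbidden, and all kinds of so-called image anti-hotlinking…
Seen from what the Internet is really about, an Official Account (公眾號, literally "public account") is an account all right, and very WeChat, but not public at all.
Earlier, when I needed to move Official Account articles to the forum, I wrote a script just for my own machine that used getwc to grab the images directly and then repost to the forum. Once it was written, though, I never actually used it.
Today a student posted Official Account content to the forum and asked: "How do I move a WeChat Official Account article to the forum? Just pasting the link is really inconvenient to read…" I replied to the student with this: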

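Most normal websites and links support embedding, but WeChat… has never been a normal thing.
With Official Account articles, copying and pasting works for the text, but the images will not display because of restrictions WeChat itself has put in place.
Also because of WeChat's own settings, the link itself does not render properly on the forum.
This is one of the reasons I personally loathe WeChat.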

So, leading by example: beyond complaining, you have to solve the problem yourself.

The result of two hours' work: with the https://wx.bdfz.net/ web page, students can now move Official Account content to the forum directly and quickly.

Giving both the fish and the way to fish, here is the manual:

1) Overview

WeChat → HTML is a Cloudflare Worker that turns a WeChat article URL (mp.weixin.qq.com/s/...) into a permanent, fast‑loading web page with all text and images preserved. It also provides one‑click copy for Discourse and a Markdown export.


2) Features


3) Architecture

```text
Browser → Worker (/api/ingest) → Fetch WeChat page
        → Extract #js_content
        → Upload images to R2 (bucket: blog-images)
        → Render final HTML, write to R2 (bucket: wx-articles)
        → Save URL→slug index for future instant hits
        → Return { ok, url: https://wx.bdfz.net/<slug> } + render preview
```
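The image-rehosting step of this pipeline can be sketched as below. This is a minimal illustration under stated assumptions, not the Worker's actual code. It assumes WeChat marks lazy-loaded images with a `data-src` attribute. It also scans with regular expressions, whereas a real Worker would more robustly use Cloudflare's `HTMLRewriter`. The helper names and the `wx/<slug>/<n>` key layout are hypothetical. The upload itself (fetching each source and writing it to blog-images) is left out; `uploads` lists what would be uploaded.

```typescript
// Sketch only: regex scanning stands in for a real HTML parser / HTMLRewriter.
// Assumption: WeChat lazy-loads images via data-src, sometimes plain src.

// Collect absolute image URLs from <img> tags, preferring data-src over src.
function collectImageUrls(html: string): string[] {
  const seen = new Set<string>();
  const tagRe = /<img\b[^>]*>/gi;
  let tag: RegExpExecArray | null;
  while ((tag = tagRe.exec(html)) !== null) {
    const m =
      /\sdata-src\s*=\s*"([^"]+)"/i.exec(tag[0]) ??
      /\ssrc\s*=\s*"([^"]+)"/i.exec(tag[0]);
    if (m && /^https?:\/\//i.test(m[1])) seen.add(m[1]);
  }
  return Array.from(seen);
}

// Point every collected image at its rehosted copy and list what to upload.
// The "wx/<slug>/<n>" key layout is hypothetical.
function rehostImages(html: string, slug: string, publicBase: string) {
  const uploads: { source: string; key: string }[] = [];
  let out = html;
  collectImageUrls(html).forEach((source, i) => {
    const key = `wx/${slug}/${i}`;
    uploads.push({ source, key });
    // Replace the quoted attribute value so a URL that prefixes another is not clobbered.
    out = out.split(`"${source}"`).join(`"${publicBase}/${key}"`);
  });
  // Promote data-src to src so images render without WeChat's lazy-load script
  // (assumes the tag carries no separate placeholder src).
  out = out.replace(/\sdata-src\s*=/gi, " src=");
  return { html: out, uploads };
}
```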

R2 keys


4) Requirements


5) Cloudflare Resources (bindings)

Create two R2 buckets and two environment variables:

These are bound inside the Worker as:

```text
env.WX_ART          -> R2 bucket wx-articles
env.WX_IMG          -> R2 bucket blog-images
env.PUBLIC_IMG_BASE -> "https://img.bdfz.net"
env.SITE_TITLE      -> "WeChat → HTML"
```

Routing: add a Worker route wx.bdfz.net/* for the zone bdfz.net.
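Here is a sketch of how these bindings might be typed inside the Worker. `R2BucketLike` is a hypothetical stand-in for the `R2Bucket` type from `@cloudflare/workers-types`. It is trimmed to the `get`, `put` and `delete` methods, which exist with these shapes in R2's Workers API. `WxEnv` and `publicImageUrl` are illustrative names, not taken from the original code.

```typescript
// Hypothetical, trimmed stand-in for R2Bucket from @cloudflare/workers-types.
interface R2BucketLike {
  get(key: string): Promise<{ text(): Promise<string> } | null>;
  put(
    key: string,
    value: string | ArrayBuffer,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
  delete(keys: string | string[]): Promise<void>;
}

// Shape of `env` given the bindings listed above.
interface WxEnv {
  WX_ART: R2BucketLike; // R2 bucket wx-articles
  WX_IMG: R2BucketLike; // R2 bucket blog-images
  PUBLIC_IMG_BASE: string; // "https://img.bdfz.net"
  SITE_TITLE: string; // "WeChat → HTML"
}

// Public URL of an object stored in WX_IMG; tolerates stray slashes on either side.
function publicImageUrl(env: Pick<WxEnv, "PUBLIC_IMG_BASE">, key: string): string {
  return `${env.PUBLIC_IMG_BASE.replace(/\/+$/, "")}/${key.replace(/^\/+/, "")}`;
}
```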


6) Install & Deploy

Login once (OAuth)

```bash
# Clear API-token variables so wrangler uses the OAuth login instead
unset CF_API_TOKEN CLOUDFLARE_API_TOKEN CLOUDFLARE_ACCOUNT_ID
npx wrangler login
npx wrangler whoami
```

Create (or confirm) buckets

```bash
npx wrangler r2 bucket create wx-articles
npx wrangler r2 bucket create blog-images
```

Deploy

```bash
# From the project root
npx wrangler deploy
```

You should see bindings listed and the route wx.bdfz.net/* applied.


7) Using the Service

A) Web UI

Open:

```text
https://wx.bdfz.net/
```

Auto‑submit shortcut

Open with a prefilled URL (the form auto-submits once, then the query string is removed to prevent flicker):

```text
https://wx.bdfz.net/?url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2F...
```
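The shortcut link is just the article URL percent-encoded into the `url` query parameter. The two helpers below are hypothetical; their names and structure are assumptions. They show how such a link can be built. They also show how a page script could read the parameter once and compute the query-free address to restore with `history.replaceState`.

```typescript
// Hypothetical helpers for the ?url= shortcut; names are illustrative.
const WX_HOME = "https://wx.bdfz.net/";

// Percent-encode the article URL into the url query parameter.
function buildShortcut(articleUrl: string): string {
  return `${WX_HOME}?url=${encodeURIComponent(articleUrl)}`;
}

// Read ?url= once and compute the same address without its query string,
// which a page script could pass to history.replaceState.
function takeShortcut(href: string): { articleUrl: string | null; cleanHref: string } {
  const u = new URL(href);
  const articleUrl = u.searchParams.get("url");
  u.search = "";
  return { articleUrl, cleanHref: u.toString() };
}
```

In the page, this might run as `const { articleUrl, cleanHref } = takeShortcut(location.href)`, followed by `history.replaceState(null, "", cleanHref)` and the form submission when `articleUrl` is present.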

If the same URL was processed before, you’ll see “Already existed — served instantly.”.
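The "already existed" fast path implies a lookup keyed on the article URL. Below is a sketch of one plausible way to build that key, under assumptions. It recognizes the two common `mp.weixin.qq.com` URL shapes. It drops other query parameters so that re-shared links map to the same entry. It uses a hypothetical `index/` key prefix. The real Worker's normalization may well differ.

```typescript
// Hypothetical index-key builder for the URL→slug lookup.
// Assumed URL shapes on mp.weixin.qq.com:
//   /s/<id>                              (short link)
//   /s?__biz=...&mid=...&idx=...&sn=...  (long link)
// Other query parameters (scene, chksm, ...) are treated as tracking noise.

function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw.trim());
  } catch {
    return null;
  }
}

function indexKeyFor(raw: string): string | null {
  const u = tryParseUrl(raw);
  if (!u || u.hostname !== "mp.weixin.qq.com") return null;

  const short = /^\/s\/([A-Za-z0-9_-]+)$/.exec(u.pathname);
  if (short) return `index/s/${short[1]}`;

  if (u.pathname === "/s") {
    const parts = ["__biz", "mid", "idx", "sn"].map((k) => u.searchParams.get(k));
    if (parts.every((p) => p !== null && p !== "")) return `index/p/${parts.join("/")}`;
  }
  return null; // not a recognized article URL
}
```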

B) API Endpoints

POST /api/ingest

The request body can be form data or JSON:

```bash
# Form
curl -X POST -F 'url=https://mp.weixin.qq.com/s/...' https://wx.bdfz.net/api/ingest

# JSON
curl -X POST -H 'content-type: application/json' \
  -d '{"url":"https://mp.weixin.qq.com/s/..."}' \
  https://wx.bdfz.net/api/ingest
```

Success:

```json
{
  "ok": true,
  "url": "https://wx.bdfz.net/<slug>",
  "slug": "<slug>",
  "title": "Article Title",
  "existed": false
}
```

Already exists:

```json
{ "ok": true, "url": "...", "slug": "...", "title": "...", "existed": true }
```
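For scripting, the same call can be made from TypeScript. This sketch types the documented response. The `ok: false` error shape is an assumption, since only successful responses are shown above. `fetchImpl` is injectable so that the function can be exercised without network access.

```typescript
// Documented success shape; the error variant is an assumption.
type IngestOk = { ok: true; url: string; slug: string; title: string; existed: boolean };
type IngestErr = { ok: false; error?: string };
type IngestResult = IngestOk | IngestErr;

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// POST the article URL as JSON (the second curl form above) and parse the reply.
async function ingest(
  articleUrl: string,
  fetchImpl: FetchLike = fetch,
  endpoint = "https://wx.bdfz.net/api/ingest",
): Promise<IngestResult> {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ url: articleUrl }),
  });
  // Non-2xx replies are reported without assuming anything about their body.
  if (!res.ok) return { ok: false, error: `HTTP ${res.status}` };
  return (await res.json()) as IngestResult;
}
```

For example, `ingest(articleUrl).then((r) => console.log(r.ok ? r.url : r.error))` logs either the permanent link or the error.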

GET /<slug>

Returns the rendered HTML page with sanitized content and rehosted images.

GET /api/export/<slug>.md

Downloads Markdown built from the saved HTML.
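The Markdown export amounts to an HTML-to-Markdown conversion of the stored page. The converter below is an illustrative subset written with regular expressions, not the Worker's actual exporter. It covers headings, paragraphs, line breaks, bold, italics, links and images. A production version would more likely use a proper parser or a library such as turndown.

```typescript
// Illustrative subset converter; not the Worker's real exporter.
function htmlToMarkdown(html: string): string {
  const md = html
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_m: string, level: string, text: string) =>
      `\n\n${"#".repeat(Number(level))} ${text}\n\n`)
    .replace(/<img\b[^>]*\ssrc="([^"]+)"[^>]*>/gi, "![]($1)")
    .replace(/<a\b[^>]*\shref="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    .replace(/<(strong|b)\b[^>]*>([\s\S]*?)<\/\1>/gi, "**$2**")
    .replace(/<(em|i)\b[^>]*>([\s\S]*?)<\/\1>/gi, "_$2_")
    .replace(/<\/p>/gi, "\n\n")
    .replace(/<[^>]+>/g, "") // drop every remaining tag
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
  // Tidy whitespace: no trailing spaces, at most one blank line in a row.
  return md.replace(/[ \t]+\n/g, "\n").replace(/\n{3,}/g, "\n\n").trim();
}
```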

GET /img?u=<http-url>

Fallback image proxy if some uploads fail or exceed the per‑article cap.
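A proxy that accepts any URL is an open proxy, so it is prudent to validate `u`. The sketch below is an illustration built on assumptions. The host allow-list contains only `mmbiz.qpic.cn`, the host WeChat article images are commonly served from. The upstream fetch sends only the URL, so the visitor's `Referer` is not forwarded. This assumes WeChat's hotlink protection checks that header.

```typescript
// Hypothetical allow-list: only proxy WeChat's image CDN, never arbitrary hosts.
const ALLOWED_IMG_HOSTS = ["mmbiz.qpic.cn"];

// Extract and validate the u= target from a /img request URL.
function proxyTarget(requestUrl: string): URL | null {
  const raw = new URL(requestUrl).searchParams.get("u");
  if (!raw) return null;
  try {
    const target = new URL(raw);
    const httpish = target.protocol === "http:" || target.protocol === "https:";
    return httpish && ALLOWED_IMG_HOSTS.includes(target.hostname) ? target : null;
  } catch {
    return null; // u is not a parseable URL
  }
}

// Fetch the image server-side. Only the URL is sent, so the visitor's
// Referer is not forwarded; the body is streamed back with a cache header.
async function proxyImage(
  requestUrl: string,
  fetchImpl: (input: string) => Promise<Response> = fetch,
): Promise<Response> {
  const target = proxyTarget(requestUrl);
  if (!target) return new Response("bad image url", { status: 400 });
  const upstream = await fetchImpl(target.toString());
  if (!upstream.ok) return new Response("upstream error", { status: 502 });
  const headers = new Headers({ "cache-control": "public, max-age=86400" });
  const type = upstream.headers.get("content-type");
  if (type) headers.set("content-type", type);
  return new Response(upstream.body, { status: 200, headers });
}
```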

GET /health

Basic health check: { ok: true, service: "wx-ingest" }.
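The original does not document how the Worker tells these endpoints apart. A hypothetical dispatcher over method and path might look like the sketch below, alongside the documented health payload. The slug character set `[A-Za-z0-9_-]` is an assumption.

```typescript
// Hypothetical route table; the real Worker's dispatch is not documented.
type WxRoute =
  | { kind: "home" }
  | { kind: "ingest" }
  | { kind: "health" }
  | { kind: "image" }
  | { kind: "export"; slug: string }
  | { kind: "article"; slug: string }
  | { kind: "not-found" };

// Map a request's method and path to one of the endpoints listed above.
function matchRoute(method: string, pathname: string): WxRoute {
  if (method === "POST") {
    return pathname === "/api/ingest" ? { kind: "ingest" } : { kind: "not-found" };
  }
  if (method !== "GET" && method !== "HEAD") return { kind: "not-found" };
  if (pathname === "/") return { kind: "home" };
  if (pathname === "/health") return { kind: "health" };
  if (pathname === "/img") return { kind: "image" };
  // Assumed slug alphabet: letters, digits, "_" and "-".
  const exported = /^\/api\/export\/([A-Za-z0-9_-]+)\.md$/.exec(pathname);
  if (exported) return { kind: "export", slug: exported[1] };
  const article = /^\/([A-Za-z0-9_-]+)$/.exec(pathname);
  return article ? { kind: "article", slug: article[1] } : { kind: "not-found" };
}

// Documented payload of GET /health.
function healthResponse(): Response {
  return new Response(JSON.stringify({ ok: true, service: "wx-ingest" }), {
    headers: { "content-type": "application/json; charset=utf-8" },
  });
}
```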


8) Technical Notes


9) Discourse Workflow

No Discourse config changes needed.


10) Operations

Re‑ingest / Update

Delete the article and index keys from wx-articles, then re‑run /api/ingest for the URL.
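If the deletion is done from inside a Worker (for instance, behind an admin-only route), the wx-articles binding can remove both keys in one call, because R2's `delete()` accepts an array of keys. The key names below are hypothetical, since the original does not list the actual R2 key layout. After deleting, re-run `/api/ingest` as in the curl examples in section 7.

```typescript
// Same shape as R2Bucket.delete(): one key or an array of keys.
interface DeletableBucket {
  delete(keys: string | string[]): Promise<void>;
}

// Hypothetical key layout; check the Worker source for the real names.
function storedKeys(slug: string, indexKey: string): string[] {
  return [`articles/${slug}.html`, indexKey];
}

// Drop the rendered page and its URL→slug index entry in one call, so the
// next /api/ingest for that URL is processed from scratch.
async function forgetArticle(bucket: DeletableBucket, slug: string, indexKey: string): Promise<string[]> {
  const keys = storedKeys(slug, indexKey);
  await bucket.delete(keys);
  return keys;
}
```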

Purge Completely

Logs & Tail

```bash
npx wrangler tail
```

11) Configuration & Tuning


12) Known Limitations


That’s it. Paste a WeChat URL → get a permanent link, full‑text preview, and one‑click copy/Markdown. Enjoy!

Anti-flooding protection has also been added in the WAF.