WeChat → HTML
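The single most disgusting thing about WeChat comes down to one word: closed! And WeChat Official Accounts even more so!
No oEmbed embedding API, no way to recover the full text, iframes are blocked, and all kinds of so-called image hotlink protection...
Seen from what the internet was meant to be, a "public account" (公眾號) is an account all right, and very WeChat, but not public at all.
Some time ago, when I needed to repost WeChat Official Account articles to the forum, I wrote a script for my own machine that grabbed the images directly via getwc and then reposted them to the forum; after writing it, though, I never used it.
Today a student posted Official Account content to the forum and asked, "How do we move WeChat Official Account articles to the forum? Pasting only the link is too inconvenient to read." I replied to the student with this:
Most normal websites and links support embedding, but WeChat... has never been a normal thing.
With an Official Account article, copying and pasting works for the text, but the images will not display properly because of restrictions WeChat itself has put in place.
Because of those same restrictions, the link as a whole cannot be displayed properly in the forum either.
This is one of the reasons I personally detest WeChat.
Then, to practice what I preach: beyond complaining, you have to solve the problem yourself.
The result of two hours of work: through the page at https://wx.bdfz.net/, students can from now on quickly move WeChat Official Account content to the forum.
To hand over both the fish and the fishing rod, here is the manual: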
1) Overview
WeChat → HTML is a Cloudflare Worker that turns a WeChat article URL (mp.weixin.qq.com/s/...) into a permanent, fast‑loading web page with all text and images preserved. It also provides one‑click copy for Discourse and a Markdown export.
- Frontend: a minimal home page where you paste a WeChat URL. It shows a permanent link, a live preview, and copy buttons.
- Backend: fetches the article, sanitizes HTML, rehosts images to Cloudflare R2, stores the final HTML in R2, and returns a canonical slug URL like https://wx.bdfz.net/<slug>.
- Instant hit (“秒回”, an instant reply): repeated requests for the same URL return immediately via a short-hash URL index.
2) Features
- Full-text extraction of #js_content
- Image normalization & rehosting to R2
- Clean, mobile-friendly article page (dark-mode aware)
- Markdown export: GET /api/export/<slug>.md
- One-click copy: “full text + link” (HTML + plaintext) and “Markdown”
- Instant return if a URL has been ingested before
- Health check endpoint
- Fallback image proxy: /img?u=
3) Architecture
Browser → Worker (/api/ingest) → Fetch WeChat page
↓
Extract #js_content
↓
Upload images to R2 (bucket: blog-images)
↓
Render final HTML, write to R2 (bucket: wx-articles)
↓
Save URL→slug index for future instant hits
↓
Return { ok, url: https://wx.bdfz.net/<slug> } + render preview
R2 keys
- Article HTML: articles/<slug>.html
- URL index: index/byurl/<sha1(url).slice(0,8)>.json
- Images: wechat/<slug>/<date>_<sha1>.<ext>
4) Requirements
- Cloudflare account + zone for your domain (e.g. bdfz.net)
- Cloudflare Workers & R2 enabled
- DNS record for subdomain (e.g. wx.bdfz.net) proxied through Cloudflare
- Node.js on your machine (commands use npx wrangler)
5) Cloudflare Resources (bindings)
Create two R2 buckets and two environment variables:
- R2 Buckets
  - wx-articles → stores rendered article HTML & URL index
  - blog-images → stores rehosted images
- Environment variables
  - PUBLIC_IMG_BASE: e.g. https://img.bdfz.net (public host for blog-images)
  - SITE_TITLE: UI title, e.g. WeChat → HTML
These are bound inside the Worker as:
env.WX_ART -> R2 bucket wx-articles
env.WX_IMG -> R2 bucket blog-images
env.PUBLIC_IMG_BASE -> "https://img.bdfz.net"
env.SITE_TITLE -> "WeChat → HTML"
Routing: add a Worker route wx.bdfz.net/* for the zone bdfz.net.
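To make the bindings concrete, here is a hedged sketch of how a handler might read a stored article through env.WX_ART. The function name serveArticle is hypothetical and the real Worker may be organized differently. What it relies on is that an R2 bucket binding's get() resolves to an object with a body, or to null when the key is missing; the Cache-Control value is the one listed under Technical Notes.

```javascript
// Serve a stored article from the wx-articles bucket bound as env.WX_ART.
async function serveArticle(env, slug) {
  const object = await env.WX_ART.get(`articles/${slug}.html`);
  if (object === null) {
    // No such key in the bucket: the slug was never ingested (or was purged).
    return new Response("Not found", { status: 404 });
  }
  return new Response(object.body, {
    headers: {
      "content-type": "text/html; charset=utf-8",
      "cache-control": "public, max-age=3600",
    },
  });
}
```

Inside the Worker's fetch handler, a request for GET /<slug> could then be answered by calling serveArticle(env, slug).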
6) Install & Deploy
Login once (OAuth)
unset CF_API_TOKEN CLOUDFLARE_API_TOKEN CLOUDFLARE_ACCOUNT_ID
npx wrangler login
npx wrangler whoami
Create (or confirm) buckets
npx wrangler r2 bucket create wx-articles
npx wrangler r2 bucket create blog-images
Deploy
# From the project root
npx wrangler deploy
You should see bindings listed and the route wx.bdfz.net/* applied.
7) Using the Service
A) Web UI
Open:
https://wx.bdfz.net/
- Paste a WeChat URL → click Convert
- You’ll get:
  - Permanent link: https://wx.bdfz.net/<slug>
  - Live preview rendered from the saved page
  - Copy full text + link (rich HTML + plaintext fallback)
  - Copy Markdown
  - Download .md
Auto-submit shortcut
Open with a prefilled URL (it auto-submits once; the query is then removed to prevent flicker):
https://wx.bdfz.net/?url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2F...
If the same URL was processed before, you’ll see “Already existed — served instantly.”
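The prefilled link is simply the home page plus a percent-encoded url parameter. A small sketch for generating such links (the function name autoSubmitUrl is hypothetical, and the article path in the usage line is a placeholder):

```javascript
// Build the auto-submit link for a WeChat article URL.
function autoSubmitUrl(wechatUrl, base = "https://wx.bdfz.net/") {
  // encodeURIComponent escapes ":" and "/" so the whole URL fits in one query value.
  return `${base}?url=${encodeURIComponent(wechatUrl)}`;
}

console.log(autoSubmitUrl("https://mp.weixin.qq.com/s/abc"));
// → https://wx.bdfz.net/?url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2Fabc
```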
B) API Endpoints
POST /api/ingest
Body can be form or JSON:
# Form
curl -X POST -F 'url=https://mp.weixin.qq.com/s/...' https://wx.bdfz.net/api/ingest
# JSON
curl -X POST -H 'content-type: application/json' \
-d '{"url":"https://mp.weixin.qq.com/s/..."}' \
https://wx.bdfz.net/api/ingest
Success:
{
  "ok": true,
  "url": "https://wx.bdfz.net/<slug>",
  "slug": "<slug>",
  "title": "Article Title",
  "existed": false
}
Already exists:
{ "ok": true, "url": "...", "slug": "...", "title": "...", "existed": true }
GET /<slug>
Returns the rendered HTML page with sanitized content and rehosted images.
GET /api/export/<slug>.md
Downloads Markdown built from the saved HTML.
GET /img?u=<http-url>
Fallback image proxy if some uploads fail or exceed the per‑article cap.
GET /health
Basic health check: { ok: true, service: "wx-ingest" }.
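For scripted use, the ingest endpoint can be called from JavaScript as well as from curl. The sketch below is an illustration under assumptions: the function name ingest is hypothetical, and since the manual does not document the error response, the code treats a non-2xx status or a reply without ok: true as a failure. The fetch implementation is injectable so the function can be tried without network access.

```javascript
// POST a WeChat article URL to /api/ingest and return the parsed JSON reply.
async function ingest(wechatUrl, fetchImpl = fetch, base = "https://wx.bdfz.net") {
  const res = await fetchImpl(`${base}/api/ingest`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ url: wechatUrl }),
  });
  const data = await res.json();
  // Assumption: failures show up as a non-2xx status or a missing ok flag.
  if (!res.ok || !data.ok) {
    throw new Error(`ingest failed (HTTP ${res.status})`);
  }
  return data; // { ok, url, slug, title, existed }
}
```

A caller would typically read url for the permanent link and existed to tell an instant hit from a fresh conversion.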
8) Technical Notes
- Extraction: finds inner HTML of <div id="js_content">...</div>.
- Sanitization: strips <script>/<style>, event handlers (on*), and javascript: URLs.
- Images:
  - Reads data-src or src, normalizes to HTTPS.
  - Uploads up to 40 images per article to R2 in parallel (concurrency 8).
  - Excess images fall back to /img?u=... proxy.
- Slug: slugify(title) + '-' + sha1(url).slice(0,8).
- Index for instant hits: index/byurl/<sha1(url).slice(0,8)>.json.
- Caching:
  - Articles: Cache-Control: public, max-age=3600
  - Proxy images: Cloudflare cache ~1 hour
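The three sanitization rules listed above can be illustrated with a regex-based pass. This is only a sketch of the idea under stated assumptions: the function name sanitize is hypothetical, the real Worker may sanitize differently, and regular expressions are not a robust HTML sanitizer in general (for example, the second rule can also match on*= text that appears outside a tag).

```javascript
// Apply the three documented rules to an HTML fragment.
function sanitize(html) {
  return html
    // 1. Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // 2. Drop inline event-handler attributes such as onclick="..." or onload='...'.
    .replace(/\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "")
    // 3. Neutralize javascript: URLs in quoted href/src attributes.
    .replace(/\b(href|src)\s*=\s*(["'])\s*javascript:[^"']*\2/gi, '$1="#"');
}

console.log(sanitize('<p onclick="x()">hi</p><script>alert(1)</script>'));
// → <p>hi</p>
```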
9) Discourse Workflow
No Discourse config changes needed.
- Use Copy full text + link and paste into the Discourse editor. The clipboard contains HTML (rich) and plaintext (fallback).
- Or just paste the permanent link https://wx.bdfz.net/<slug>.
- Prefer Markdown? Use Copy Markdown or Download .md.
10) Operations
Re‑ingest / Update
Delete the article and index keys from wx-articles, then re‑run /api/ingest for the URL.
Purge Completely
- Remove articles/<slug>.html
- Remove index/byurl/<hash8>.json
- (Optional) remove images under wechat/<slug>/ from blog-images
Logs & Tail
npx wrangler tail
11) Configuration & Tuning
- Max images per article: MAX_FETCH = 40
- Parallelism: CONCURRENCY = 8
- Timeout: TIMEOUT_MS = 15000
- Site title: SITE_TITLE env var
- Image CDN base: PUBLIC_IMG_BASE env var
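To show what the CONCURRENCY setting means in practice, here is a minimal sketch of a bounded-parallelism helper. The name mapWithConcurrency is hypothetical and the Worker's own upload loop may look different; the sketch only demonstrates the technique of keeping at most a fixed number of tasks in flight at once.

```javascript
const CONCURRENCY = 8; // value taken from the tuning list above

// Run worker(item, index) over all items with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each runner repeatedly claims the next unprocessed item until none remain.
  async function run() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

With this helper, uploading a list of image URLs would be a call such as mapWithConcurrency(urls, CONCURRENCY, uploadOne), where uploadOne stands for whatever function fetches one image and stores it in R2.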
12) Known Limitations
- Dynamic components (audio/video) may not render identically after sanitization.
- Very large articles may proxy some images instead of rehosting them.
- If WeChat hotlink protections change, update fetch headers (we already send mobile UA & Referer).
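The second limitation follows from the per-article image cap. As a rough sketch (the function name planImages is hypothetical, and it is an assumption that the first MAX_FETCH images in document order are the ones rehosted), the split between rehosted and proxied images could look like this:

```javascript
const MAX_FETCH = 40; // per-article cap from the tuning list

// Split image URLs into those to rehost on R2 and those served via the /img proxy.
function planImages(urls, maxFetch = MAX_FETCH) {
  return {
    rehost: urls.slice(0, maxFetch),
    // Images beyond the cap keep working through the fallback proxy endpoint.
    proxied: urls.slice(maxFetch).map((u) => "/img?u=" + encodeURIComponent(u)),
  };
}
```

An article with 45 images would thus end up with 40 rehosted images and 5 served through /img?u=...; raising MAX_FETCH moves more of them into R2 at the cost of longer ingest time.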
That’s it. Paste a WeChat URL → get a permanent link, full‑text preview, and one‑click copy/Markdown. Enjoy!
Anti-abuse rules have also been added in the WAF.