+++
date = "2025-09-12T00:30:00+08:00"
draft = false
title = "Discourse migration log"
slug = "rn2ovh"
layout = "single"
type = "blog"
+++
This is a post-mortem/runbook of a real migration. I skip common Discourse prep (official docs cover it). I focus on the exact switches, Cloudflare/R2 gotchas, the rails/rake one-liners that mattered, what failed, and how to make the same move low-risk next time.
## Target end-state
- Discourse runs on the new host (Docker, single app container).
- TLS via Let’s Encrypt.
- Traffic proxied by Cloudflare for `forum.example.com` (orange cloud).
- Uploads + front-end assets live on Cloudflare R2:
  - Bucket `discourse-uploads` (public)
  - Bucket `discourse-backups` (private)
- Custom domain: `https://files.example.com` (R2 "Custom domain", not a manual CNAME)
## 0) Before you start (on the old host)
Announce maintenance → enable read-only: `discourse enable_readonly` (inside the app container).
Take a DB-only backup (no uploads) and verify it:
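A minimal sketch of this step, assuming the stock standalone container (database name `discourse` and the `/shared` volume are the defaults; adjust to your setup):

```bash
# inside the app container: cd /var/discourse && ./launcher enter app
# dump only the database (no uploads), compressed
sudo -u postgres pg_dump discourse | gzip > /shared/db-$(date +%F).sql.gz

# verify: archive is intact and the dump ends with pg_dump's completion marker
gunzip -t /shared/db-*.sql.gz
zcat /shared/db-*.sql.gz | tail -n 3
```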
Copy to the new host:
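For example (host IP and paths are placeholders):

```bash
# on the old host; /shared inside the container maps to shared/standalone on the host
rsync -avP /var/discourse/shared/standalone/db-*.sql.gz root@NEW_HOST_IP:/root/
```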
If you need an almost-zero content gap, you can repeat the DB-only dump/copy right before DNS cutover.
## 1) New host bootstrap
Install dependencies and Docker:
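A typical bootstrap, assuming a Debian/Ubuntu host:

```bash
apt-get update && apt-get install -y git curl
curl -fsSL https://get.docker.com | sh
```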
Set up Discourse:
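That is, clone discourse_docker; the setup wizard is skipped here because `app.yml` is written by hand in the next step:

```bash
git clone https://github.com/discourse/discourse_docker.git /var/discourse
cd /var/discourse
chmod 700 containers
```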
Create `containers/app.yml` with your production values. Until DNS points here, keep the SSL templates commented out to avoid Let's Encrypt failures. The key `env` you must set:
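A hedged sketch of the `env:` block (account ID, keys, and domains are placeholders; the two `AWS_*` checksum variables are my reading of the "checksum flags" this post relies on, see §7.1):

```yaml
env:
  DISCOURSE_HOSTNAME: forum.example.com
  ## uploads + backups on R2 via the S3 API
  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: auto
  DISCOURSE_S3_ENDPOINT: https://<ACCOUNT_ID>.r2.cloudflarestorage.com
  DISCOURSE_S3_ACCESS_KEY_ID: <R2_ACCESS_KEY_ID>
  DISCOURSE_S3_SECRET_ACCESS_KEY: <R2_SECRET_ACCESS_KEY>
  DISCOURSE_S3_BUCKET: discourse-uploads
  DISCOURSE_S3_BACKUP_BUCKET: discourse-backups
  DISCOURSE_BACKUP_LOCATION: s3
  DISCOURSE_S3_CDN_URL: https://files.example.com
  ## assumed checksum flags (§7.1): only compute/validate checksums when an
  ## operation requires them, so the AWS SDK stops sending the CRC headers R2 rejects
  AWS_REQUEST_CHECKSUM_CALCULATION: when_required
  AWS_RESPONSE_CHECKSUM_VALIDATION: when_required
```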
Add asset-publish `hooks` so CSS/JS/fonts get pushed to R2 during rebuild:
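The standard pattern from the Discourse object-storage guide looks roughly like this:

```yaml
hooks:
  after_assets_precompile:
    - exec:
        cd: $home
        cmd:
          - sudo -E -u discourse bundle exec rake s3:upload_assets
          - sudo -E -u discourse bundle exec rake s3:expire_missing_assets
```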
Bring the container up (HTTP-only for now):
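With the SSL templates still commented out:

```bash
cd /var/discourse
./launcher rebuild app
```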
## 2) Restore DB-only (fast cutover, tiny backup)
Important: a `.sql.gz` file is not a standard Discourse restore input. You must import it with `psql` inside the container.
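Roughly like this (dump filename and database name are assumptions; stopping the web/Sidekiq workers first avoids writes mid-import):

```bash
# make the dump visible inside the container via the shared volume
mv /root/db-*.sql.gz /var/discourse/shared/standalone/
cd /var/discourse && ./launcher enter app

sv stop unicorn                      # stops web + Sidekiq inside the container
sudo -u postgres dropdb discourse    # replace the seed database created at rebuild
sudo -u postgres createdb -O discourse discourse
zcat /shared/db-*.sql.gz | sudo -u postgres psql discourse
sv start unicorn
```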
If you still host local uploads for now and want a one-time copy (optional safety net before moving to R2):
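For example, from the old host (paths are the stock standalone layout):

```bash
rsync -avP /var/discourse/shared/standalone/uploads/ \
      root@NEW_HOST_IP:/var/discourse/shared/standalone/uploads/
```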
## 3) Cloudflare R2: the knobs that actually matter
### 3.1 Buckets + token
- Create `discourse-uploads` (public) and `discourse-backups` (private).
- Create an Account API Token, initially with Admin Read & Write scoped to those two buckets (this allows `Get/PutBucketCors`).
- After successful bootstrap, rotate to Object Read & Write and rebuild to drop excess permissions.
### 3.2 Custom domain (no 1014s)
- Do it in R2 → Custom domains, inside the same Cloudflare account as your DNS zone. Add `files.example.com` there and wait for Status: Active.
- Don't hand-roll cross-account CNAMEs (this causes a 1014 error).
- Keep `files.example.com` proxied (orange cloud), TLS min 1.2, Brotli on.
### 3.3 CORS on `discourse-uploads`
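A minimal rule set, assuming the forum origin only needs to read objects (adjust the origins to your domains):

```json
[
  {
    "AllowedOrigins": ["https://forum.example.com"],
    "AllowedMethods": ["GET", "HEAD"],
    "AllowedHeaders": ["*"],
    "MaxAgeSeconds": 3600
  }
]
```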
## 4) Discourse + R2: publish front-end assets to CDN
With the `env` and `hooks` in place (see §1), rebuild so assets land on R2:
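That is, on the host:

```bash
cd /var/discourse
./launcher rebuild app
# watch for the s3:upload_assets hook output near the end of the rebuild log
```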
Now CSS/JS/fonts serve from `https://files.example.com/...`.
## 5) Migrate historical uploads to R2 (one-time)
Run inside the container, always with bundler + the `discourse` user:
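The one-liner, as I'd expect it to look (standard Discourse rake task):

```bash
cd /var/discourse && ./launcher enter app
cd /var/www/discourse
sudo -E -u discourse bundle exec rake uploads:migrate_to_s3
```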
What you should see: "Listing local files → Listing S3 files → Syncing files", "Updating the URLs in the database…", "Flagging posts for rebake…", then `Done`.
If it says “N posts are not remapped…”, see §7.2.
## 6) Switch production domain to the new host
In `containers/app.yml` ensure:
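At minimum, something like this (the email is a placeholder):

```yaml
env:
  DISCOURSE_HOSTNAME: forum.example.com
  LETSENCRYPT_ACCOUNT_EMAIL: admin@example.com
```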
- Cloudflare DNS: point the `forum.example.com` A record to the new IP, orange cloud ON, SSL/TLS Full (Strict), Brotli ON. Do not cache forum HTML.
- Enable the SSL templates in `app.yml` and rebuild:
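That is, uncomment the SSL templates (names as in the stock `app.yml`) and rebuild:

```yaml
templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.ssl.template.yml"
  - "templates/web.letsencrypt.ssl.template.yml"
# then on the host: cd /var/discourse && ./launcher rebuild app
```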
Sanity checks:
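For example:

```bash
# forum answers through the Cloudflare proxy (a 403 here can just be login_required, see below)
curl -sI https://forum.example.com/ | head -n 3
# asset host answers from R2
curl -sI https://files.example.com/ | head -n 3
# check the origin certificate directly, bypassing Cloudflare (IP is a placeholder)
curl -sI --resolve forum.example.com:443:NEW_HOST_IP https://forum.example.com/ | head -n 3
```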
Seeing `HTTP/2 403` for anonymous is often the `login_required` setting, not an outage.
## 7) Things that actually broke (and the fixes)
### 7.1 R2 checksum conflict (hard blocker)
Symptom: `Aws::S3::Errors::InvalidRequest: You can only specify one non-default checksum at a time.`

Fix: keep both envs set permanently (already in §1):
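The pair I'd expect here is the same two shown in the §1 sketch; labelling them as the fix is my assumption, and they are AWS SDK settings rather than Discourse ones:

```yaml
  ## only compute/validate checksums when an operation requires them
  AWS_REQUEST_CHECKSUM_CALCULATION: when_required
  AWS_RESPONSE_CHECKSUM_VALIDATION: when_required
```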
### 7.2 "X posts are not remapped to new S3 upload URL" (soft blocker)
Reason: some `cooked` HTML still references `/uploads/<db>/original/...` even after DB URL updates.

Fix: target only those posts (no full-site rebake).
List offenders:
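One way to do it, assuming the stock `default` database name in the upload paths:

```bash
cd /var/www/discourse
sudo -E -u discourse bundle exec rails runner \
  'puts Post.where("cooked LIKE ?", "%/uploads/default/original/%").pluck(:id).join(", ")'
```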
Option A: rebake only those IDs:
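A sketch, substituting the IDs from the previous step:

```bash
sudo -E -u discourse bundle exec rails runner \
  'Post.where(id: [111, 222, 333]).find_each(&:rebake!)'
```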
Option B: if you truly have static strings, remap then rebake:
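A sketch using the built-in `posts:remap` / `posts:rebake_match` tasks (patterns are illustrative; test against a copy first):

```bash
sudo -E -u discourse bundle exec rake "posts:remap[/uploads/default/original/,https://files.example.com/original/]"
sudo -E -u discourse bundle exec rake "posts:rebake_match[files.example.com/original]"
```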
Re-run the migration (fast, to confirm it’s clean):
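That is, the same task as in §5:

```bash
sudo -E -u discourse bundle exec rake uploads:migrate_to_s3
```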
### 7.3 Tasks "missing" / `rake -T` empty (false trails)
Always run with bundler and the correct environment:
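For example:

```bash
cd /var/www/discourse
sudo -E -u discourse RAILS_ENV=production bundle exec rake -T | grep -E 's3|uploads'
```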
To print effective S3 settings:
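One hedged way to dump what Discourse actually resolved from the env:

```bash
sudo -E -u discourse bundle exec rails runner '
  puts "use_s3:   #{GlobalSetting.use_s3?}"
  puts "bucket:   #{GlobalSetting.s3_bucket}"
  puts "endpoint: #{GlobalSetting.s3_endpoint}"
  puts "cdn_url:  #{GlobalSetting.s3_cdn_url}"
  puts "backups:  #{GlobalSetting.s3_backup_bucket}"
'
```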
### 7.4 `s3:upload_assets` AccessDenied (permissions)
Bootstrap with an Admin Read & Write token (for bucket-level ops). After assets publish, rotate the token to Object Read & Write and rebuild.
## 8) Verification
**Inside the container**
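A couple of quick, illustrative checks:

```bash
cd /var/www/discourse
# the active store should be the S3 store, and the newest upload should point at the CDN domain
sudo -E -u discourse bundle exec rails runner 'puts Discourse.store.class'
sudo -E -u discourse bundle exec rails runner 'puts Upload.order(:created_at).last&.url'
```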
**Browser**

- View-source / Network → assets from `files.example.com`.
- Old topics show images under `https://files.example.com/original/...`.
**Backups**

- Admin → Backups → "Create backup"; confirm a new object appears under `discourse-backups` on R2.
## 9) Cleanup (after you're sure)
When `cooked` references to local paths are essentially zero and old topics look good:
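A conservative sketch: archive first, delete much later (paths assume the standalone layout):

```bash
# on the host
tar -czf /root/uploads-local-final.tar.gz -C /var/discourse/shared/standalone uploads
# only after the cooling-off period:
# rm -rf /var/discourse/shared/standalone/uploads/*
```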
Rotate secrets:
- Create a new R2 token (Object RW), update `app.yml`, rebuild, then revoke the old token.
- Rotate your SMTP app password if it ever touched logs.
## 10) Next time (playbook): R2-first path
- Old → New (DB-only): Set old to read-only, make a DB-only dump; import the `.sql.gz` via `psql` on the new host.
- Wire R2 before DNS: Create buckets, Account API Token (Admin RW → later Object RW), custom domain in the R2 UI (same CF account), and CORS.
- Discourse `env` + `hooks`: Set S3/R2 env vars + checksum flags; add `after_assets_precompile` with `s3:upload_assets`; rebuild to push assets to R2.
- DNS Cutover: Point `forum.example.com` to the new IP (orange cloud, Full/Strict).
- Migrate Uploads: Run the `uploads:migrate_to_s3` one-liner from §5.
- Fix Stragglers: Use targeted rebake/remap for any remaining local URLs; re-run the migration check.
- Let Sidekiq process the rebake queue, or run `posts:rebake_uncooked_posts` to accelerate.
- Backups to R2: Create a backup and verify the new object appears in the `discourse-backups` bucket.
- Permissions Hardening: Rotate the R2 token to Object RW and keep the checksum flags.
- Final Cleanup: Archive/remove local uploads after a cooling-off period.
## Appendix: run commands the right way
Inside the container, always use the full context:
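That is:

```bash
cd /var/discourse && ./launcher enter app
cd /var/www/discourse
sudo -E -u discourse RAILS_ENV=production bundle exec rake -T
sudo -E -u discourse RAILS_ENV=production bundle exec rails c
```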
Running `rake`/`rails` as root or without bundler hides tasks and causes false errors.
This is everything I actually had to touch. No theory, just the levers that moved.
Two days, three datacenters, back and forth...

If I ever touch OVH again, I'm a :dog_face:!

Every problem over these two days came from their machines. The IP of this US-West box had even been blacklisted by Gemini...

Because I never expected that, I migrated straight over, and in the same move, on a whim, I also did the extremely complex job of moving every forum attachment to S3, and then...

Once I confirmed the IP was blacklisted and that forcing IPv6 didn't help, I had no choice but to back up the forum data and move on to the next datacenter...

Then OVH started acting up in every possible way... the forum backup file turned out to be incomplete, incomplete, incomplete, incomplete... and because the backup was made from the web UI, I was completely unaware of that pitfall...

If I ever run a server from the web UI again, I'm a dumb :dog_face:!

After tracking it down to the incomplete backup, finally...

I'm back.

The R2 setup stays as-is. For any migration after this, I'm a skilled physician. Why?

Break your arm three times and you become a good doctor...

It hurts!