Suen

+++
date = "2025-09-12T00:30:00+08:00"
draft = false
title = "Discourse migration log"
slug = "rn2ovh"
layout = "single"
type = "blog"
+++

This is a post-mortem/runbook of a real migration. I skip common Discourse prep (official docs cover it). I focus on the exact switches, Cloudflare/R2 gotchas, the rails/rake one-liners that mattered, what failed, and how to make the same move low-risk next time.


## Target end-state


## 0) Before you start (on the old host)

Announce maintenance → enable read-only: `discourse enable_readonly` (inside the app container).
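As run from the host, that step looks like this (a sketch; `discourse enable_readonly` / `disable_readonly` are the in-container CLI commands):

```shell
cd /var/discourse
./launcher enter app
# inside the container:
discourse enable_readonly
# if you need to back out of maintenance:
# discourse disable_readonly
exit
```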

Take a DB-only backup (no uploads) and verify it:

```bash
ls -lh /var/discourse/shared/standalone/backups/default/
gzip -t /var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz   # archive integrity check
sha256sum /var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz > /tmp/backup.sha256
```
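For completeness, a sketch of producing the DB-only dump in the first place. I'm assuming the `backup_with_uploads` site setting name here — verify it in your admin UI before relying on it; when uploads are excluded, the backup comes out as a plain `.sql.gz`:

```shell
cd /var/discourse
./launcher enter app
# inside the container: skip uploads in the archive (setting name is an assumption)
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
  'SiteSetting.backup_with_uploads = false'
discourse backup
exit
```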

Copy to the new host:

```bash
rsync -avP -e "ssh -o StrictHostKeyChecking=no" \
  root@OLD:/var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz \
  /var/discourse/shared/standalone/backups/default/
rsync -avP -e "ssh -o StrictHostKeyChecking=no" \
  root@OLD:/tmp/backup.sha256 \
  /var/discourse/shared/standalone/backups/default/
```

If you need an almost-zero content gap, you can repeat the DB-only dump/copy right before DNS cutover.


## 1) New host bootstrap

Install dependencies and Docker:

```bash
# bare minimum
apt-get update && apt-get install -y git curl tzdata
curl -fsSL https://get.docker.com | sh
systemctl enable --now docker
```

Set up Discourse:

```bash
# discourse_docker
git clone https://github.com/discourse/discourse_docker /var/discourse
```

Create `containers/app.yml` with your production values. Until DNS points here, keep the SSL templates commented out to avoid Let’s Encrypt failures. The key env vars you must set:

```yaml
env:
  DISCOURSE_HOSTNAME: forum.example.com

  # R2 / S3
  DISCOURSE_USE_S3: "true"
  DISCOURSE_S3_REGION: "auto"
  DISCOURSE_S3_ENDPOINT: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
  DISCOURSE_S3_FORCE_PATH_STYLE: "true"
  DISCOURSE_S3_BUCKET: "discourse-uploads"
  DISCOURSE_S3_BACKUP_BUCKET: "discourse-backups"
  DISCOURSE_S3_ACCESS_KEY_ID: "<R2_KEY>"
  DISCOURSE_S3_SECRET_ACCESS_KEY: "<R2_SECRET>"
  DISCOURSE_S3_CDN_URL: "https://files.example.com"
  DISCOURSE_BACKUP_LOCATION: "s3"

  # R2 checksum knobs (prevent conflicts)
  AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
  AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"

  # SMTP / Let's Encrypt email
  DISCOURSE_SMTP_ADDRESS: smtp.gmail.com
  DISCOURSE_SMTP_PORT: 587
  DISCOURSE_SMTP_USER_NAME: you@example.com
  DISCOURSE_SMTP_PASSWORD: "<app-password>"
  DISCOURSE_SMTP_DOMAIN: example.com
  DISCOURSE_NOTIFICATION_EMAIL: you@example.com
  LETSENCRYPT_ACCOUNT_EMAIL: you@example.com
```

Add asset-publish hooks so CSS/JS/fonts get pushed to R2 during rebuild:

```yaml
hooks:
  after_assets_precompile:
    - exec:
        cd: $home
        cmd:
          - sudo -E -u discourse bundle exec rake s3:upload_assets
          - sudo -E -u discourse bundle exec rake s3:expire_missing_assets
```

Bring the container up (HTTP-only for now):

```bash
cd /var/discourse
./launcher rebuild app
```

## 2) Restore DB-only (fast cutover, tiny backup)

Important: a `.sql.gz` file is not a standard Discourse restore input. You must import it with `psql` inside the container.

```bash
cd /var/discourse
./launcher enter app

# inside the container:
sv stop unicorn || true; sv stop sidekiq || true

# make sure the DB is clean (prevents "already exists" spam)
sudo -u postgres psql -c "REVOKE CONNECT ON DATABASE discourse FROM public;"
sudo -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='discourse';"
sudo -u postgres psql -c "DROP DATABASE IF EXISTS discourse;"
sudo -u postgres psql -c "CREATE DATABASE discourse WITH OWNER discourse TEMPLATE template0 ENCODING 'UTF8';"
sudo -u postgres psql -d discourse -c "CREATE EXTENSION IF NOT EXISTS citext;"
sudo -u postgres psql -d discourse -c "CREATE EXTENSION IF NOT EXISTS hstore;"

# import
zcat /shared/backups/default/<DB_ONLY>.sql.gz | sudo -u postgres psql discourse

# bring web back
sv start unicorn
[ -d /etc/service/sidekiq ] && sv start sidekiq || true
exit
```

If you still host local uploads for now and want a one-time copy (optional safety net before moving to R2):

```bash
rsync -avP -e "ssh -o StrictHostKeyChecking=no" \
  root@OLD:/var/discourse/shared/standalone/uploads/ \
  /var/discourse/shared/standalone/uploads/
chown -R 1000:1000 /var/discourse/shared/standalone/uploads
```

## 3) Cloudflare R2: the knobs that actually matter

### 3.1 Buckets + token
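The buckets can be created in the dashboard, or scripted; a sketch with the `wrangler` CLI (bucket names from §1; assumes a wrangler login or API token with R2 permissions):

```shell
npx wrangler r2 bucket create discourse-uploads
npx wrangler r2 bucket create discourse-backups
```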

### 3.2 Custom domain (no 1014s)

### 3.3 CORS on `discourse-uploads`

```json
[
  {
    "AllowedOrigins": ["https://forum.example.com", "https://files.example.com"],
    "AllowedMethods": ["GET", "HEAD"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["*"],
    "MaxAgeSeconds": 86400
  }
]
```

## 4) Discourse + R2: publish front-end assets to CDN

With the env and hooks in place (see §1), rebuild so assets land on R2:

```bash
cd /var/discourse
./launcher rebuild app
```

Now CSS/JS/fonts serve from `https://files.example.com/...`.
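To spot-check that the publish actually happened, list the bucket's `assets/` prefix (Discourse puts compiled assets under it). This assumes an `aws` CLI configured with the R2 key from §1:

```shell
aws s3 ls "s3://discourse-uploads/assets/" \
  --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
```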


## 5) Migrate historical uploads to R2 (one-time)

Run inside the container, always with bundler + the discourse user:

```bash
./launcher enter app

# one-time migration (auto-answers the prompt)
yes "" | AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED AWS_RESPONSE_CHECKSUM_VALIDATION=WHEN_REQUIRED \
sudo -E -u discourse RAILS_ENV=production bundle exec rake uploads:migrate_to_s3
```

What you should see: “Listing local files → Listing S3 files → Syncing files”, “Updating the URLs in the database…”, “Flagging posts for rebake…”, then Done.

If it says “N posts are not remapped…”, see §7.2.


## 6) Switch production domain to the new host

In containers/app.yml ensure:

```yaml
DISCOURSE_HOSTNAME: forum.example.com
LETSENCRYPT_ACCOUNT_EMAIL: you@example.com
```

Re-enable the SSL templates you commented out in §1, then rebuild:

```bash
cd /var/discourse
./launcher rebuild app
```

Sanity checks:

```bash
curl -I https://forum.example.com
./launcher logs app | tail -n 200
```

Seeing HTTP/2 403 for anonymous requests is often the `login_required` setting, not an outage.
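A quick way to confirm that instead of guessing (run inside the container):

```shell
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
  'puts SiteSetting.login_required'
```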


## 7) Things that actually broke (and the fixes)

### 7.1 R2 checksum conflict (hard blocker)

Symptom: `Aws::S3::Errors::InvalidRequest: You can only specify one non-default checksum at a time.`

Fix — keep both envs set permanently (already in §1):

```yaml
AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"
```

### 7.2 “X posts are not remapped to new S3 upload URL” (soft blocker)

Reason: some cooked HTML still references `/uploads/<db>/original/...` even after the DB URL updates.

Fix — target only those posts (no full-site rebake):

List offenders:

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rails r '
db = RailsMultisite::ConnectionManagement.current_db
puts Post.where("cooked LIKE ?", "%/uploads/#{db}/original%").pluck(:id,:topic_id).map{|id,tid| "#{id}:#{tid}"}
'
```

Option A: rebake only those IDs:

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rails r '
db  = RailsMultisite::ConnectionManagement.current_db
ids = Post.where("cooked LIKE ?", "%/uploads/#{db}/original%").pluck(:id)
ids.each { |pid| Post.find(pid).rebake! }
puts "rebaked=#{ids.size}"
'
```

Option B: if you truly have static strings, remap then rebake:

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec \
rake "posts:remap[/uploads/default/original,https://files.example.com/original]"
```

Re-run the migration (fast, to confirm it’s clean):

```bash
yes "" | AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED AWS_RESPONSE_CHECKSUM_VALIDATION=WHEN_REQUIRED \
sudo -E -u discourse RAILS_ENV=production bundle exec rake uploads:migrate_to_s3
```

### 7.3 Tasks “missing” / `rake -T` empty (false trails)

Always run with bundler and the correct environment:

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rake -T s3
sudo -E -u discourse RAILS_ENV=production bundle exec rake -T uploads
```

To print effective S3 settings:

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
'puts({enable_env: ENV["DISCOURSE_USE_S3"], bucket: ENV["DISCOURSE_S3_BUCKET"], endpoint: ENV["DISCOURSE_S3_ENDPOINT"], cdn: ENV["DISCOURSE_S3_CDN_URL"]})'
```

### 7.4 `s3:upload_assets` AccessDenied (permissions)

Bootstrap with an Admin Read & Write token (for bucket-level ops). After assets publish, rotate the token to Object Read & Write and rebuild.


## 8) Verification

### Inside the container

```bash
# a few URLs that now use the CDN
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
'puts Upload.where("url LIKE ?", "%files.example.com%").limit(5).pluck(:url)'

# remaining cooked references to local uploads (should trend to 0)
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
'db=RailsMultisite::ConnectionManagement.current_db; puts Post.where("cooked LIKE ?", "%/uploads/#{db}/original%").count'
```

### Browser

### Backups


## 9) Cleanup (after you’re sure)

When cooked references to local paths are essentially 0 and old topics look good:

```bash
mv /var/discourse/shared/standalone/uploads /var/discourse/shared/standalone/uploads.bak
mkdir -p /var/discourse/shared/standalone/uploads
chown -R 1000:1000 /var/discourse/shared/standalone/uploads

# after a few stable days:
rm -rf /var/discourse/shared/standalone/uploads.bak
```

Rotate secrets:


## 10) Next time (playbook) — R2-first path

  1. Old → New (DB-only): Set old to read-only, make DB-only dump; import .sql.gz via psql on new host.
  2. Wire R2 before DNS: Create buckets, Account API Token (Admin RW → later Object RW), custom domain in R2 UI (same CF account), and CORS.
  3. Discourse env + hooks: Set S3/R2 env vars + checksum flags; add after_assets_precompile with s3:upload_assets; rebuild to push assets to R2.
  4. DNS Cutover: Point forum.example.com to new IP (orange cloud, Full/Strict).
  5. Migrate Uploads: Run the uploads:migrate_to_s3 one-liner from §5.
  6. Fix Stragglers: Use targeted rebake/remap for any remaining local URLs; re-run the migration check.
  7. Let Sidekiq process the rebake queue, or run posts:rebake_uncooked_posts to accelerate.
  8. Backups to R2: Create a backup and verify the new object appears in the discourse-backups bucket.
  9. Permissions Hardening: Rotate R2 token to Object RW and keep checksum flags.
  10. Final Cleanup: Archive/remove local uploads after a cooling-off period.
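Step 8 can be verified from the shell as well as the UI; a sketch assuming an `aws` CLI configured with the R2 key from §1:

```shell
# inside the container: trigger a backup
discourse backup

# from anywhere with the R2 credentials: confirm the new object exists
aws s3 ls "s3://discourse-backups/" --recursive \
  --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
```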

## Appendix — run commands the right way

Inside the container, always use the full context:

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec <rake|rails ...>
```

Running `rake`/`rails` as root or without bundler hides tasks and causes false errors.
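If you live in the container for a while, a tiny wrapper saves typos. This is a hypothetical helper (not part of Discourse), added to root's shell inside the container:

```shell
# drun = "discourse run": full context for every rake/rails invocation
drun() {
  sudo -E -u discourse RAILS_ENV=production bundle exec "$@"
}

# usage:
# drun rake -T uploads
# drun rails r 'puts SiteSetting.title'
```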

This is everything I actually had to touch. No theory, just the levers that moved.


Three data centers in two days, back and forth...
If I ever touch OVH again, call me a :dog_face:!

Every problem over these two days came from their machines. The IP of this US-West box had even been blacklisted by Gemini...
I never anticipated that, so I migrated straight over; in the same move, on a whim, I also did the extremely fiddly job of moving every forum attachment to S3, and then...
Once I confirmed the IP was blacklisted and that forcing IPv6 didn't help, I had no choice but to back up the forum data and move on to the next data center...

Then OVH started acting up... The forum backup file turned out to be incomplete, incomplete, incomplete, incomplete... And since the backup was made from the web UI, I was completely blind to that trap...
If I ever operate a server from a web UI again, call me a stupid :dog_face:!

After pinning it on the incomplete backup, finally...
I'm back.

The R2 setup stayed unchanged. For any migration after this, I am the good physician. Why?
A thrice-broken arm makes the doctor...
It hurts!