This is a post-mortem/runbook of a real migration. I skip the common Discourse prep (the official docs cover it) and focus on the exact switches, Cloudflare R2 gotchas, the rails/rake one-liners that mattered, what failed, and how to make the same move low-risk next time. All real IPs and secrets have been removed; `forum.example.com`, `files.example.com`, and placeholders are used throughout.
Target end-state
- Discourse runs on the new host (Docker, a single `app` container).
- TLS via Let’s Encrypt.
- Traffic optionally proxied by a lightweight front proxy for `forum.example.com` (or direct DNS to the origin).
- Uploads + front-end assets live on Cloudflare R2:
  - Bucket `discourse-uploads` (public)
  - Bucket `discourse-backups` (private)
- R2 custom domain: `https://files.example.com` (created in R2 → Custom domains, not a manual cross-account CNAME).
0) DB backups that actually work (nightly and cutover)
Nightly backups are for disaster recovery. A last-minute backup is for migration cutover. Keep both.
0.1 Policy
- Nightly: DB-only backup (`.sql.gz`, no uploads) → verify locally → upload to R2. Keep ≥7 copies, or let an R2 lifecycle rule handle retention (see the sketch below).
- Cutover: right before the DNS switch, make another DB-only backup and restore that to the new host to minimize the content gap.
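R2 honors the basic S3 lifecycle API, so retention can be a bucket rule instead of a cron job. A minimal sketch, assuming the AWS CLI is pointed at your R2 endpoint and that a 14-day window fits your policy (verify lifecycle support against current R2 docs):

```bash
# Sketch: expire old DB dumps automatically (the 14-day window is an assumption).
cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-db-dumps",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 14 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket discourse-backups \
  --lifecycle-configuration file:///tmp/lifecycle.json \
  --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
```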
0.2 Make a DB-only backup and verify
Inside the container:
```bash
# Optional but nice: reduce writes while snapshotting
discourse enable_readonly

# Trigger a DB-only backup from the Admin UI (uncheck "with uploads")
# or via CLI:
discourse backup

# Verify the artifact
ls -lh /var/discourse/shared/standalone/backups/default/
gzip -t /var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz
```
Deep verify (best): restore to a temporary DB and count rows:
```bash
cd /var/discourse && ./launcher enter app
sudo -E -u postgres psql -tc "DROP DATABASE IF EXISTS verifydb;"
sudo -E -u postgres createdb verifydb
zcat /shared/backups/default/<DB_ONLY>.sql.gz | sudo -E -u postgres psql verifydb
sudo -E -u postgres psql -d verifydb -c "select count(*) from topics where deleted_at is null;"
sudo -E -u postgres psql -d verifydb -c "select count(*) from posts where post_type=1 and deleted_at is null;"
sudo -E -u postgres dropdb verifydb
exit
```
If the gzip test or the temporary restore fails, do not upload that file to R2—fix and re-backup.
0.3 Push to R2 only after it passes
```bash
# Assumes the AWS CLI is configured for R2 (e.g. --endpoint-url
# https://<ACCOUNT_ID>.r2.cloudflarestorage.com, or an equivalent profile).
aws s3 cp /var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz \
  s3://discourse-backups/
```
0.4 Why sizes differ (1–4 GB is normal)
Both the Admin nightly job and a manual `pg_dump` produce a DB-only `.sql.gz`. Size differences usually come from which tables are included and from compression, not from “missing posts”. If you want to see what’s inside:
```bash
# Which tables have data in the dump?
zcat <DB_ONLY>.sql.gz | grep -E '^COPY public\.' | awk '{print $2}' | sort -u | head

# Quick line-count approximation for key tables
zcat <DB_ONLY>.sql.gz | awk '/^COPY public.posts /{c=1;next}/^\\\./{c=0} c' | wc -l
zcat <DB_ONLY>.sql.gz | awk '/^COPY public.topics /{c=1;next}/^\\\./{c=0} c' | wc -l
```
If those counts match expectations, the backup contains all posts/topics regardless of the file size.
1) Old host: prepare and copy the (verified) DB-only backup
Announce maintenance → enable read-only:
```bash
cd /var/discourse && ./launcher enter app
discourse enable_readonly
exit
```
Copy the verified `.sql.gz` to the new host:
```bash
# Run on the NEW host: pull the dump from the old box.
rsync -avP -e "ssh -o StrictHostKeyChecking=no" \
  root@OLD:/var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz \
  /var/discourse/shared/standalone/backups/default/
```
If you want an almost-zero content gap, repeat this step right before DNS cutover.
2) New host bootstrap
Install Docker + discourse_docker:
```bash
apt-get update && apt-get install -y git curl tzdata
curl -fsSL https://get.docker.com | sh
systemctl enable --now docker
git clone https://github.com/discourse/discourse_docker /var/discourse
```
Create `containers/app.yml` with production values. Keep the SSL templates (`templates/web.ssl.template.yml`, `templates/web.letsencrypt.ssl.template.yml`) commented out until DNS points here. Minimum `env` set:
```yaml
env:
  DISCOURSE_HOSTNAME: forum.example.com
  # R2 / S3
  DISCOURSE_USE_S3: "true"
  DISCOURSE_S3_REGION: "auto"
  DISCOURSE_S3_ENDPOINT: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
  DISCOURSE_S3_FORCE_PATH_STYLE: "true"
  DISCOURSE_S3_BUCKET: "discourse-uploads"
  DISCOURSE_S3_BACKUP_BUCKET: "discourse-backups"
  DISCOURSE_S3_ACCESS_KEY_ID: "<R2_KEY>"
  DISCOURSE_S3_SECRET_ACCESS_KEY: "<R2_SECRET>"
  DISCOURSE_S3_CDN_URL: "https://files.example.com"
  DISCOURSE_BACKUP_LOCATION: "s3"
  # R2 checksum knobs (prevent conflicts)
  AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
  AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"
  # SMTP / Let’s Encrypt email
  DISCOURSE_SMTP_ADDRESS: smtp.gmail.com
  DISCOURSE_SMTP_PORT: 587
  DISCOURSE_SMTP_USER_NAME: you@example.com
  DISCOURSE_SMTP_PASSWORD: "<app-password>"
  DISCOURSE_SMTP_DOMAIN: example.com
  DISCOURSE_NOTIFICATION_EMAIL: you@example.com
  LETSENCRYPT_ACCOUNT_EMAIL: you@example.com
```
Publish assets to R2 during rebuild:
```yaml
hooks:
  after_assets_precompile:
    - exec:
        cd: $home
        cmd:
          - sudo -E -u discourse bundle exec rake s3:upload_assets
          - sudo -E -u discourse bundle exec rake s3:expire_missing_assets
```
Bring the container up (HTTP-only for now):
```bash
cd /var/discourse && ./launcher rebuild app
```
3) Restore the DB-only dump (`.sql.gz` via psql)
```bash
cd /var/discourse && ./launcher enter app
sv stop unicorn || true; sv stop sidekiq || true

# ensure a clean DB
sudo -E -u postgres psql -c "REVOKE CONNECT ON DATABASE discourse FROM public;"
sudo -E -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='discourse';"
sudo -E -u postgres psql -c "DROP DATABASE IF EXISTS discourse;"
sudo -E -u postgres psql -c "CREATE DATABASE discourse WITH OWNER discourse TEMPLATE template0 ENCODING 'UTF8';"
sudo -E -u postgres psql -d discourse -c "CREATE EXTENSION IF NOT EXISTS citext;"
sudo -E -u postgres psql -d discourse -c "CREATE EXTENSION IF NOT EXISTS hstore;"

# import the dump
zcat /shared/backups/default/<DB_ONLY>.sql.gz | sudo -E -u postgres psql discourse
sv start unicorn
[ -d /etc/service/sidekiq ] && sv start sidekiq || true
exit
```
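Before opening traffic, re-run the §0.2 row counts against the restored live DB; the numbers should match what the old host reported (this is exactly the check that would have caught the incomplete web-UI backup described in the epilogue):

```bash
# Same queries as §0.2, now against the restored production DB
cd /var/discourse && ./launcher enter app
sudo -E -u postgres psql -d discourse -c "select count(*) from topics where deleted_at is null;"
sudo -E -u postgres psql -d discourse -c "select count(*) from posts where post_type=1 and deleted_at is null;"
exit
```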
If you’re still carrying local uploads pre-R2, you can rsync them once as a safety net; we’ll migrate them to R2 next.
4) R2 knobs that mattered
Buckets & token: create `discourse-uploads` (public) and `discourse-backups` (private). Bootstrap with an Account API Token scoped to those two buckets with Admin Read & Write (so `PutBucketCors` works), then rotate to Object Read & Write after success.
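Before handing the keys to Discourse, a ten-second smoke test of the token is cheap. A sketch, assuming the R2 key pair is exported as the standard AWS env vars:

```bash
export AWS_ACCESS_KEY_ID="<R2_KEY>"
export AWS_SECRET_ACCESS_KEY="<R2_SECRET>"
# Both buckets should list without an auth error
aws s3 ls s3://discourse-uploads/ --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
aws s3 ls s3://discourse-backups/ --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
```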
Custom domain: add `files.example.com` in R2 → Custom domains under the same Cloudflare account as your DNS zone (avoids 1014 cross-account CNAME errors).
CORS on `discourse-uploads`:
```json
[
  {
    "AllowedOrigins": ["https://forum.example.com", "https://files.example.com"],
    "AllowedMethods": ["GET", "HEAD"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["*"],
    "MaxAgeSeconds": 86400
  }
]
```
Rebuild so CSS/JS/fonts publish to R2:
```bash
cd /var/discourse && ./launcher rebuild app
```
5) One-time migration of historical uploads to R2
```bash
cd /var/discourse && ./launcher enter app
yes "" | AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED AWS_RESPONSE_CHECKSUM_VALIDATION=WHEN_REQUIRED \
  sudo -E -u discourse RAILS_ENV=production bundle exec rake uploads:migrate_to_s3
```
If you get “X posts not remapped…”, see §7.2 for targeted fixes.
6) Switch production domain
Set in `app.yml`:
```yaml
DISCOURSE_HOSTNAME: forum.example.com
LETSENCRYPT_ACCOUNT_EMAIL: you@example.com
```
DNS: point `forum.example.com` to the new front (or origin) IP, enable the SSL templates, then:
```bash
cd /var/discourse && ./launcher rebuild app
```
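If certificate issuance fails during the rebuild, first confirm the record has actually flipped; Let’s Encrypt can’t validate while the name still points at the old box:

```bash
dig +short forum.example.com   # should print the new front/origin IP
```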
Sanity:
```bash
curl -I https://forum.example.com
./launcher logs app | tail -n 200
```
Seeing `HTTP/2 403` for anonymous requests usually means `login_required`, not an outage.
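A quick way to tell the difference, assuming the stock `/srv/status` health endpoint (in my experience it stays reachable even with `login_required` on):

```bash
curl -s https://forum.example.com/srv/status   # expect: ok
```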
7) Things that actually broke (and fixes)
7.1 R2 checksum conflict
`Aws::S3::Errors::InvalidRequest: You can only specify one non-default checksum at a time.`
Fix (keep permanently):
```yaml
AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"
```
7.2 “X posts are not remapped to new S3 upload URL”
Reason: some `cooked` HTML still points at `/uploads/<db>/original/...`.
Targeted rebake:
```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rails r '
  db = RailsMultisite::ConnectionManagement.current_db
  ids = Post.where("cooked LIKE ?", "%/uploads/#{db}/original%").pluck(:id)
  ids.each { |pid| Post.find(pid).rebake! }
  puts "rebaked=#{ids.size}"
'
```
Or remap a static prefix then rebake touched posts:
```bash
sudo -E -u discourse RAILS_ENV=production bundle exec \
  rake "posts:remap[/uploads/default/original,https://files.example.com/original]"
```
Re-run the migration to confirm it comes back clean:

```bash
yes "" | AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED AWS_RESPONSE_CHECKSUM_VALIDATION=WHEN_REQUIRED \
  sudo -E -u discourse RAILS_ENV=production bundle exec rake uploads:migrate_to_s3
```
7.3 Tasks “missing”
Always run with bundler + env:
```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rake -T s3
sudo -E -u discourse RAILS_ENV=production bundle exec rake -T uploads
```
Print effective S3 settings:
```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
  'puts({ use_s3: ENV["DISCOURSE_USE_S3"], bucket: ENV["DISCOURSE_S3_BUCKET"], endpoint: ENV["DISCOURSE_S3_ENDPOINT"], cdn: ENV["DISCOURSE_S3_CDN_URL"] })'
```
7.4 `s3:upload_assets` AccessDenied
Use an Admin RW token for bootstrap (bucket-level CORS ops), then rotate to Object RW.
8) Verification
Inside the container
```bash
# URLs now using the CDN
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
  'puts Upload.where("url LIKE ?", "%files.example.com%").limit(5).pluck(:url)'

# Remaining cooked references to local uploads (should trend to 0)
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
  'db=RailsMultisite::ConnectionManagement.current_db; puts Post.where("cooked LIKE ?", "%/uploads/#{db}/original%").count'
```
Browser
- Network tab shows assets loading from `files.example.com`.
- Old topics show images under `https://files.example.com/original/...`.
Backups
- Admin → Backups → create one; confirm a new object appears in `discourse-backups` on R2 (shell check below).
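The same check from the shell (endpoint assumption as in §0.3; Discourse backup filenames embed a timestamp, so the newest sorts last):

```bash
aws s3 ls s3://discourse-backups/ \
  --endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com" | tail -n 3
```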
9) Cleanup
When cooked references are essentially 0:
```bash
mv /var/discourse/shared/standalone/uploads /var/discourse/shared/standalone/uploads.bak
mkdir -p /var/discourse/shared/standalone/uploads
chown -R 1000:1000 /var/discourse/shared/standalone/uploads

# after a few stable days
rm -rf /var/discourse/shared/standalone/uploads.bak
```
Rotate secrets (R2 token → Object RW; SMTP app password if it ever hit logs).
10) Next time (playbook) — R2-first path
- Old → New (DB-only): read-only → backup → restore `.sql.gz` via `psql`.
- Wire R2 before DNS: buckets, token (Admin RW → later Object RW), custom domain, CORS.
- `env` + `hooks`: checksum flags + `s3:upload_assets`; rebuild.
- DNS cutover to the new host.
- Migrate uploads to R2.
- Fix stragglers (targeted rebake/remap) → quick re-run of the migration.
- Sidekiq finishes background rebakes (or `posts:rebake_uncooked_posts`); see the queue check after this list.
- Backups to R2 verified.
- Permissions hardening and secret rotation.
- Cleanup local uploads after a cooling-off period.
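For the Sidekiq step, a one-liner to watch the queues drain, using Sidekiq’s standard stats API (run inside the container):

```bash
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
  'require "sidekiq/api"; s = Sidekiq::Stats.new; puts({ enqueued: s.enqueued, retries: s.retry_size })'
```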
Appendix A — “verify-before-upload” nightly (pseudo-cron)
```bash
LATEST=$(ls -1t /var/discourse/shared/standalone/backups/default/*.sql.gz | head -n1)
BASE=$(basename "$LATEST")

# 1) gzip integrity
gzip -t "$LATEST" || exit 1

# 2) temporary-DB row counts (unquoted heredoc so $BASE expands;
#    note `launcher enter` expects a TTY, so a real cron job would
#    use `docker exec app bash -c '...'` instead)
cd /var/discourse && ./launcher enter app <<EOS
sudo -E -u postgres psql -tc "DROP DATABASE IF EXISTS verifydb;"
sudo -E -u postgres createdb verifydb
zcat /shared/backups/default/$BASE | sudo -E -u postgres psql verifydb
sudo -E -u postgres psql -d verifydb -c "select count(*) as topics from topics where deleted_at is null;"
sudo -E -u postgres psql -d verifydb -c "select count(*) as posts from posts where post_type=1 and deleted_at is null;"
sudo -E -u postgres dropdb verifydb
exit
EOS

# 3) only then upload to R2
aws s3 cp "$LATEST" s3://discourse-backups/
```
Appendix B — Minimal front proxy (optional)
A tiny reverse proxy VM in front can terminate TLS and forward to the origin over HTTPS. Replace IPs with your own.
Upstream: `/etc/nginx/conf.d/upstream.conf`
```nginx
upstream origin_forum {
    server <ORIGIN_IP>:443;
    keepalive 64;
}
```
Site: `/etc/nginx/sites-available/forum.conf`
```nginx
server {
    listen 80;
    listen [::]:80;
    server_name forum.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name forum.example.com;

    ssl_certificate     /etc/letsencrypt/live/forum.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/forum.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_session_timeout 1d;

    client_max_body_size 100m;
    add_header Strict-Transport-Security "max-age=31536000" always;

    location / {
        proxy_pass https://origin_forum;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host forum.example.com;
        proxy_ssl_server_name on;
        proxy_ssl_name forum.example.com;
        # optional verification:
        # proxy_ssl_verify on;
        # proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_buffering off;
        proxy_read_timeout 360s;
        proxy_send_timeout 360s;
        proxy_connect_timeout 60s;
        add_header X-Relay relay-min always;
    }

    location /message-bus/ {
        proxy_pass https://origin_forum;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host forum.example.com;
        proxy_ssl_server_name on;
        proxy_ssl_name forum.example.com;
        proxy_buffering off;
        proxy_read_timeout 3600s;
    }
}
```
Enable & reload:

```bash
ln -sf /etc/nginx/sites-available/forum.conf /etc/nginx/sites-enabled/forum.conf
rm -f /etc/nginx/sites-enabled/default
nginx -t && systemctl reload nginx
```
Quick check:
```bash
curl -I https://forum.example.com   # expect HTTP/2 200/302 and the X-Relay header
```
Epilogue
Two days, three data centers, back and forth…
If I ever touch OVH again, call me a 🐶! Nearly every problem over those two days came from their machines: the US-West box’s IP turned out to be blacklisted by Gemini…
I hadn’t anticipated that at all and migrated straight onto it. Worse, in the heat of the moment I also took on the extremely fiddly job of moving every forum attachment to S3 (R2) at the same time. And then…
Once I confirmed the IP was blacklisted and that forcing IPv6 didn’t help, I had no choice but to back up the forum data and move on to the next data center…
Then the DB-only backup turned out to restore with posts missing… It had been made from the web UI, and I had no idea that pitfall existed… If I ever administer a server from a web page again, call me stupid :dog_face:!
After confirming the backup was incomplete, I re-made a full backup from the terminal, and the forum finally came back online.
The R2 setup carried over unchanged, so for any migration after this one, I’m the good physician. Why?
A thrice-broken arm makes a good doctor, as the saying goes…
And it really hurt!
After going live, students felt the new machine was noticeably slower. As an origin its routing isn’t on an optimized line, so that’s fair. The silver lining: with R2 in place, adding a front proxy was trivial. Take an existing box with optimized routes to all three major Chinese ISPs, drop it in front, and everything turns green…