Suen

Discourse migration log

This is a post-mortem/runbook of a real migration. I skip common Discourse prep (official docs cover it). I focus on the exact switches, Cloudflare/R2 gotchas, the rails/rake one-liners that mattered, what failed, and how to make the same move low-risk next time.


Target end-state


1) Restore DB-only first (fast cutover, tiny backup)

On the old machine (RN):

On the new machine:

# copy the DB-only backup
rsync -avP -e "ssh -o StrictHostKeyChecking=no" \
  root@OLD:/var/discourse/shared/standalone/backups/default/<DB_ONLY>.sql.gz \
  /var/discourse/shared/standalone/backups/default/

cd /var/discourse
./launcher enter app
discourse enable_restore
discourse restore <DB_ONLY>.sql.gz
exit

If you need “almost zero” content gap, repeat this DB-only hop right before DNS cutover.
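When repeating the hop, it helps to be sure you are restoring the newest dump rather than an earlier one. A minimal sketch, assuming the backups sit in the default directory used above; the helper name `latest_db_backup` is made up for illustration:

```shell
# Hypothetical helper: print the most recently modified *.sql.gz in a backup directory.
# The default path is the one used in this post; pass another directory as $1 if yours differs.
latest_db_backup() {
  local dir="${1:-/var/discourse/shared/standalone/backups/default}"
  # ls -t sorts newest first; errors (e.g. no matching file) are silenced and yield empty output.
  ls -t "$dir"/*.sql.gz 2>/dev/null | head -n 1
}
```

Run `latest_db_backup` on the new machine after the rsync and pass the basename of the result to `discourse restore`.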


2) (Optional) bring local uploads once, before switching to R2

rsync -avP -e "ssh -o StrictHostKeyChecking=no" \
  root@OLD:/var/discourse/shared/standalone/uploads/ \
  /var/discourse/shared/standalone/uploads/
chown -R 1000:1000 /var/discourse/shared/standalone/uploads

This is just a safety net; we will move all uploads to R2 shortly.


3) Switch production domain

DISCOURSE_HOSTNAME: forum.example.com
LETSENCRYPT_ACCOUNT_EMAIL: you@example.com

Rebuild:

cd /var/discourse
./launcher rebuild app

Cloudflare DNS:

Sanity:

curl -I https://forum.example.com
./launcher logs app | tail -n 200

An HTTP/2 403 for anonymous requests often just means the login_required site setting is on; it is not necessarily a failure.
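To make that sanity check repeatable, the status code can be classified in a small script. This is only a sketch: the function names and the labels it prints are my own, and treating 301/302 as fine is an assumption about your redirect setup.

```shell
# Hypothetical helper: map an HTTP status code to a rough verdict.
classify_status() {
  case "$1" in
    200|301|302) echo "ok" ;;
    403)         echo "check-login-required" ;;  # often the login_required setting, not an outage
    5??)         echo "server-error" ;;
    *)           echo "unexpected" ;;
  esac
}

# Fetch only the status code with a HEAD request; forum.example.com is the placeholder used above.
forum_status() {
  curl -s -o /dev/null -I -w '%{http_code}' "https://${1:-forum.example.com}"
}
```

Usage: `classify_status "$(forum_status forum.example.com)"`. If curl cannot connect at all it reports `000`, which lands in the "unexpected" branch.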


4) R2: all the knobs that actually matter

4.1 Create buckets + Account API Token

After it works, rotate to Object Read & Write (least privilege) and rebuild.

4.2 Custom domain for R2 (how to avoid CF 1014)

The 1014 (“CNAME Cross-User Banned”) happens when a hostname on Cloudflare tries to CNAME to a target that is also on Cloudflare but belongs to another account. Two safe patterns:

Checklist to stay clean:

4.3 Bucket CORS

R2 → discourse-uploads → CORS:

[
  {
    "AllowedOrigins": ["https://forum.example.com", "https://files.example.com"],
    "AllowedMethods": ["GET", "HEAD"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["*"],
    "MaxAgeSeconds": 86400
  }
]
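One way to confirm the policy took effect is to request an object with an Origin header and inspect the response headers. A rough sketch, assuming the bucket answers with Access-Control-Allow-Origin when an allowed Origin is sent; the function name is hypothetical:

```shell
# Hypothetical helper: read HTTP response headers on stdin and succeed only if
# Access-Control-Allow-Origin equals the expected origin (or is the wildcard "*").
cors_allows_origin() {
  local expected="$1" value
  value=$(tr -d '\r' | awk '
    !found && tolower($0) ~ /^access-control-allow-origin:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name, keep the value (which itself contains ":")
      print
      found = 1
    }')
  [ "$value" = "$expected" ] || [ "$value" = "*" ]
}
```

Usage, with `<OBJECT_PATH>` standing in for any object you know exists in the bucket: `curl -s -o /dev/null -D - -H "Origin: https://forum.example.com" "https://files.example.com/<OBJECT_PATH>" | cors_allows_origin https://forum.example.com && echo "CORS ok"`.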

5) Discourse: enable S3 (R2) + push assets there

In containers/app.yml, under env:, add:

# R2 endpoint
DISCOURSE_USE_S3: "true"
DISCOURSE_S3_REGION: "auto"
DISCOURSE_S3_ENDPOINT: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
DISCOURSE_S3_FORCE_PATH_STYLE: "true"

# uploads
DISCOURSE_S3_BUCKET: "discourse-uploads"
DISCOURSE_S3_ACCESS_KEY_ID: "<R2_KEY>"
DISCOURSE_S3_SECRET_ACCESS_KEY: "<R2_SECRET>"
DISCOURSE_S3_CDN_URL: "https://files.example.com"

# backups to R2
DISCOURSE_BACKUP_LOCATION: "s3"
DISCOURSE_S3_BACKUP_BUCKET: "discourse-backups"

# critical for R2 (avoid double checksums, see §7.1)
AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"
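Because a rebuild is slow, it can be worth checking that none of these keys was forgotten before launching it. A minimal sketch that only greps for key names (it does not validate values or YAML structure); the function name and the exact key list are my choice, taken from the block above:

```shell
# Hypothetical pre-rebuild check: print each expected key missing from the given app.yml.
# Returns non-zero if at least one key is absent. Commented-out lines do not count as present.
missing_r2_env() {
  local file="$1" key missing=0
  for key in DISCOURSE_USE_S3 DISCOURSE_S3_REGION DISCOURSE_S3_ENDPOINT \
             DISCOURSE_S3_BUCKET DISCOURSE_S3_ACCESS_KEY_ID DISCOURSE_S3_SECRET_ACCESS_KEY \
             DISCOURSE_S3_CDN_URL DISCOURSE_S3_BACKUP_BUCKET \
             AWS_REQUEST_CHECKSUM_CALCULATION AWS_RESPONSE_CHECKSUM_VALIDATION; do
    if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `missing_r2_env /var/discourse/containers/app.yml && echo "all keys present"`.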

Add hooks so front-end assets publish to R2 during rebuild:

hooks:
  after_assets_precompile:
    - exec:
        cd: $home
        cmd:
          - sudo -E -u discourse bundle exec rake s3:upload_assets
          - sudo -E -u discourse bundle exec rake s3:expire_missing_assets

Rebuild:

cd /var/discourse
./launcher rebuild app

CSS, JS, and fonts are now served from https://files.example.com/....


6) Migrate historical uploads to R2 (one-time)

Run inside the container, always with bundler + discourse user:

./launcher enter app

# one-time migration (auto-answers the prompt)
yes "" | AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED AWS_RESPONSE_CHECKSUM_VALIDATION=WHEN_REQUIRED \
sudo -E -u discourse RAILS_ENV=production bundle exec rake uploads:migrate_to_s3

What to expect:

If it screams about posts “not remapped to new S3 upload URL”, see §7.2.
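If you capture the migration output in a file (for example by appending `2>&1 | tee <LOG_FILE>` to the rake command), the two failures described in §7 can be spotted mechanically. A small sketch; the function name and output labels are hypothetical, and the matched strings are the error messages quoted in §7.1 and §7.2:

```shell
# Hypothetical helper: scan a saved migration log for the two known failure messages.
diagnose_migration_log() {
  local log="$1"
  if grep -q "You can only specify one non-default checksum at a time" "$log"; then
    echo "checksum-conflict: see 7.1"
  elif grep -Eq "[0-9]+ posts are not remapped to new S3 upload URL" "$log"; then
    echo "unremapped-posts: see 7.2"
  else
    echo "no-known-error"
  fi
}
```

A "no-known-error" result only means neither message appeared; it is not proof that the migration finished, so still read the tail of the log.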


7) What actually broke (and how we fixed it)

7.1 R2 checksum conflict (hard blocker)

Symptom

Aws::S3::Errors::InvalidRequest:
You can only specify one non-default checksum at a time.

Root cause: newer AWS SDKs automatically add x-amz-checksum-* headers, Discourse sometimes adds Content-MD5 as well, and R2 rejects a request that carries both.

Fix (keep permanently in env):

AWS_REQUEST_CHECKSUM_CALCULATION: "WHEN_REQUIRED"
AWS_RESPONSE_CHECKSUM_VALIDATION: "WHEN_REQUIRED"

(Alternative older switch: AWS_S3_DISABLE_CHECKSUMS=true, but the two above are the modern knobs.)

7.2 “X posts are not remapped to new S3 upload URL” (soft blocker)

Symptom (at the end of the migration):

FileStore::ToS3MigrationError:
35 posts are not remapped to new S3 upload URL

Why

Fix — do NOT full-site rebake. Do targeted work:

7.3 “s3:info” doesn’t exist / rake -T shows nothing (false trails)

7.4 s3:upload_assets AccessDenied on CORS (permissions)


8) Verification

Inside container

# a few URLs now on CDN:
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
'puts Upload.where("url LIKE ?", "%files.example.com%").limit(5).pluck(:url)'

# remaining cooked references to local paths (should trend to 0)
sudo -E -u discourse RAILS_ENV=production bundle exec rails r \
'db=RailsMultisite::ConnectionManagement.current_db; puts Post.where("cooked LIKE ?", "%/uploads/#{db}/original%").count'

Browser

Backups


9) Cleanup (only after you’re sure)

When cooked references are effectively 0 and random old topics look good:

# keep a safety copy
mv /var/discourse/shared/standalone/uploads /var/discourse/shared/standalone/uploads.bak
mkdir -p /var/discourse/shared/standalone/uploads
chown -R 1000:1000 /var/discourse/shared/standalone/uploads

# after a few days without regressions:
rm -rf /var/discourse/shared/standalone/uploads.bak

Rotate secrets:


10) Next time (playbook) — R2-first version

  1. Old → New (DB-only):
    • Old: enable read-only; make DB-only backup.
    • New: restore DB-only.
  2. R2 wiring before DNS:
    • Create discourse-uploads (public), discourse-backups (private).
    • Account API Token (Admin RW, scoped to these buckets).
    • Custom domain files.example.com in R2 UI (same CF account as the DNS zone). Wait for Active.
    • Add CORS (GET/HEAD from forum + files).
  3. Discourse env + hooks:
    • In app.yml, under env:, set the S3/R2 vars + checksum flags:
      
      DISCOURSE_USE_S3: true
      DISCOURSE_S3_ENDPOINT: https://<ACCOUNT_ID>.r2.cloudflarestorage.com
      DISCOURSE_S3_BUCKET: discourse-uploads
      DISCOURSE_S3_CDN_URL: https://files.example.com
      DISCOURSE_BACKUP_LOCATION: s3
      DISCOURSE_S3_BACKUP_BUCKET: discourse-backups
      AWS_REQUEST_CHECKSUM_CALCULATION: WHEN_REQUIRED
      AWS_RESPONSE_CHECKSUM_VALIDATION: WHEN_REQUIRED
      
    • Add after_assets_precompile with s3:upload_assets + s3:expire_missing_assets.
    • ./launcher rebuild app (assets go to R2).
  4. DNS cutover for forum.example.com (orange cloud, Full/Strict).
  5. Migrate uploads to R2:
    
    ./launcher enter app
    yes "" | AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED AWS_RESPONSE_CHECKSUM_VALIDATION=WHEN_REQUIRED \
    sudo -E -u discourse RAILS_ENV=production bundle exec rake uploads:migrate_to_s3
    
  6. Fix stragglers (if any):
    • List posts whose cooked still references /uploads/<db>/original.
    • Either targeted rebake those posts, or posts:remap legacy strings.
    • Re-run the migration check once; expect Done! without complaints.
  7. Speed up rebake (optional):
    
    sudo -E -u discourse RAILS_ENV=production bundle exec rake posts:rebake_uncooked_posts
    
    Or let Sidekiq process in background; forum stays live.
  8. Backups to R2: trigger one; verify an object appears in discourse-backups.
  9. Permissions hardening:
    • Rotate the R2 token to Object RW; rebuild.
    • Leave the checksum flags in env for good.
  10. Final cleanup after 1–2 days of smooth traffic:
    • Archive then remove local uploads to reclaim disk.

Appendix — run commands the right way

Inside the container, always:

sudo -E -u discourse RAILS_ENV=production bundle exec <rake|rails ...>

Running rake/rails as root without bundler will hide tasks and cause false errors.
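To avoid retyping that prefix (and accidentally dropping part of it), you could wrap it in a shell function inside the container. A sketch only: `drun` and the `DRUN_DRY` variable are names I made up, not anything Discourse ships.

```shell
# Hypothetical wrapper around the prefix above.
# With DRUN_DRY=1 it only prints the command it would run, which is handy for checking.
drun() {
  if [ "${DRUN_DRY:-0}" = "1" ]; then
    printf '%s\n' "sudo -E -u discourse RAILS_ENV=production bundle exec $*"
    return 0
  fi
  sudo -E -u discourse RAILS_ENV=production bundle exec "$@"
}
```

Usage: `drun rake uploads:migrate_to_s3`, or `DRUN_DRY=1 drun rake -T s3` to preview the full command line first.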


This is everything I actually had to touch. No theory, just the levers that moved.