Discourse Forum Ops Playbook (Front + Backend)
Scope & Baseline
- Site stays login-required (anonymous /latest.json returns 403 by design).
- Goal: eliminate false 429s during hot-topic spikes while keeping abuse controls.
Placeholders you must replace:
- <FORUM_DOMAIN>: e.g., forum.example.com
- <ORIGIN_UPSTREAM>: origin host/IP that serves Discourse over HTTPS
- <LE_CERT_FULLCHAIN> / <LE_CERT_PRIVKEY>: your TLS cert/key paths
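One way to fill these in consistently is to keep the configs as templates and substitute the values at install time. The following is a minimal sketch under that assumption; the function name `render_placeholders` and the sample values are illustrative and not part of the playbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: substitute the playbook's placeholders in text read from stdin.
# Arguments: forum domain, origin upstream, fullchain path, private key path.
# Assumes the values contain no '|', '&', or backslash (all special to sed here).
render_placeholders() {
  local domain=$1 origin=$2 fullchain=$3 privkey=$4
  sed -e "s|<FORUM_DOMAIN>|${domain}|g" \
      -e "s|<ORIGIN_UPSTREAM>|${origin}|g" \
      -e "s|<LE_CERT_FULLCHAIN>|${fullchain}|g" \
      -e "s|<LE_CERT_PRIVKEY>|${privkey}|g"
}

# Example with made-up values:
printf 'server_name <FORUM_DOMAIN>;\nproxy_pass https://<ORIGIN_UPSTREAM>:443;\n' |
  render_placeholders forum.example.com 203.0.113.10 /etc/ssl/fullchain.pem /etc/ssl/privkey.pem
```

Using `|` as the sed delimiter keeps the slashes in certificate paths from needing escapes.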
1) Front (Nginx) — Design
- Exempt read/heartbeat paths from rate limits.
- Apply limit_req only to write endpoints.
- Proxy upstream over HTTPS with SNI; serve long-poll /message-bus/ with buffering off and long timeouts.
- Forward the real client IP: prefer CF-Connecting-IP, else remote_addr.
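The client-IP and rate-limit-key rules in this design can be read as two small functions. The sketch below is a shell emulation for illustration only: the function names are made up, and nginx itself evaluates the equivalent `map` blocks from section 1.1.

```shell
#!/usr/bin/env bash
# Illustrative shell emulation of the two nginx map blocks; nginx does not use this.

# Prefer CF-Connecting-IP when it looks like an IP literal, else fall back to remote_addr.
client_ip() {
  local cf_ip=$1 remote_addr=$2
  if [[ $cf_ip =~ ^[0-9a-f:.]+$ ]]; then
    printf '%s\n' "$cf_ip"
  else
    printf '%s\n' "$remote_addr"
  fi
}

# Read-only methods get an empty key, which limit_req does not account;
# everything else is keyed by client IP plus User-Agent.
rl_key() {
  local method=$1 ip=$2 ua=$3
  case $method in
    GET|HEAD|OPTIONS) printf '\n' ;;
    *) printf '%s%s\n' "$ip" "$ua" ;;
  esac
}

client_ip "198.51.100.7" "172.16.0.1"   # header present: header value wins
client_ip "" "172.16.0.1"               # header absent: falls back to remote_addr
rl_key POST "198.51.100.7" "Mozilla"    # write request: non-empty key, so it is counted
```

The empty key is what exempts reads: nginx does not account requests whose `limit_req_zone` key evaluates to an empty string.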
1.1 Global http{} drop-in
Create /etc/nginx/conf.d/10-front-global.conf:
nginx
log_format main_t '$remote_addr - $remote_user [$time_iso8601] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" rt=$request_time uct=$upstream_connect_time urt=$upstream_response_time ucs=$upstream_status rid=$request_id';
access_log /var/log/nginx/access.log main_t;
proxy_buffer_size 8k;
proxy_buffers 16 8k;
proxy_busy_buffers_size 16k;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_connect_timeout 30s;
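# Real client IP: use CF-Connecting-IP when it looks like an IP literal, else $remote_addr.
map $http_cf_connecting_ip $client_ip {
    default $remote_addr;
    "~^[0-9a-f:.]+$" $http_cf_connecting_ip;
}
# Rate-limit key: empty for read-only methods (empty keys are not accounted).
map $request_method $rl_key {
    GET "";
    HEAD "";
    OPTIONS "";
    default $client_ip$http_user_agent;
}
limit_req_zone $rl_key zone=perkey:20m rate=30r/s;
1.2 Site vhost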
/etc/nginx/sites-available/<FORUM_DOMAIN>.conf:
nginx
server {
    listen 443 ssl http2;
    server_name <FORUM_DOMAIN>;
    ssl_certificate <LE_CERT_FULLCHAIN>;
    ssl_certificate_key <LE_CERT_PRIVKEY>;
    ssl_protocols TLSv1.2 TLSv1.3;
    # Headers forwarded to the origin (inherited by every location below).
    proxy_set_header Host <FORUM_DOMAIN>;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $client_ip;
    proxy_set_header X-Request-Id $request_id;
    # Upstream TLS: send SNI and verify the origin certificate.
    proxy_ssl_server_name on;
    proxy_ssl_name <FORUM_DOMAIN>;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify on;
    proxy_ssl_verify_depth 3;
    client_max_body_size 100m;
    # Local health check; default_type sets the Content-Type of the returned body.
    location = /srv/status { default_type text/plain; return 200 "ok\n"; }
    location = /robots.txt { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    # Long-poll: no buffering, long timeouts.
    location ^~ /message-bus/ {
        proxy_pass https://<ORIGIN_UPSTREAM>:443;
        proxy_buffering off;
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
        proxy_http_version 1.1;
    }
    # Read/heartbeat paths: never rate-limited at the front.
    location = /topics/timings { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    location ^~ /presence/ { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    location ~ ^/drafts(\.json|/.*)$ { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    location = /latest.json { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    location ^~ /notifications { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    # Write endpoints: rate-limited (GET/HEAD/OPTIONS have an empty key and are exempt).
    location ~ ^/(posts|t|u|users|user_actions|session|password|uploads|invites|admin) {
        limit_req zone=perkey burst=120 nodelay;
        # limit_req rejects with 503 by default; use 429 so the 429 diagnostics see front-side rejections.
        limit_req_status 429;
        proxy_pass https://<ORIGIN_UPSTREAM>:443;
    }
    location / { proxy_pass https://<ORIGIN_UPSTREAM>:443; }
    location ^~ /.well-known/acme-challenge/ {
        root /var/www/html;
        default_type text/plain;
        allow all;
    }
    access_log /var/log/nginx/access.log main_t;
    error_log /var/log/nginx/error.log warn;
}
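server {
    listen 80;
    server_name <FORUM_DOMAIN>;
    location ^~ /.well-known/acme-challenge/ {
        root /var/www/html;
        default_type text/plain;
        allow all;
    }
    # A server-level "return" runs before location matching and would swallow
    # ACME challenges, so the redirect lives in its own location.
    location / {
        return 301 https://<FORUM_DOMAIN>$request_uri;
    }
}
Enable & reload: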
bash
sudo ln -sf /etc/nginx/sites-available/<FORUM_DOMAIN>.conf /etc/nginx/sites-enabled/<FORUM_DOMAIN>.conf
sudo nginx -t && sudo systemctl reload nginx
2) Backend (Discourse) — Design
- Run Rails tasks as the discourse user.
- Use a here-doc to avoid shell parsing issues.
- Remove any accidental require "debug/prelude" from production.
- Keep login-required; reduce false 429s by setting:
  - presence_enabled = false
  - active_user_rate_limit_secs = 3
  - rate_limit_search_user = 30
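The here-doc advice matters because the Ruby snippets contain `#{...}` and quote characters that are awkward to escape on a command line. As a small illustration, quoting the delimiter (`<<'RUBY'`) passes the body through untouched, while an unquoted delimiter lets the shell expand `$` inside it. The function names and the sample line are illustrative only.

```shell
#!/usr/bin/env bash
# Quoted delimiter: the body reaches the reader byte-for-byte.
quoted_body() {
  cat <<'RUBY'
SiteSetting.public_send("#{k}=", "$k")
RUBY
}

# Unquoted delimiter: the shell expands $k before Ruby would ever see it.
unquoted_body() {
  local k=shell_value
  cat <<RUBY
SiteSetting.public_send("#{k}=", "$k")
RUBY
}

quoted_body     # the line is printed unchanged
unquoted_body   # "$k" has been replaced by the shell variable's value
```

This is why every `rails r -` invocation in this playbook uses the quoted form.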
2.1 Prep
bash
docker exec -it app bash -lc 'git config --global --add safe.directory /var/www/discourse'
2.2 Disable debug prelude (idempotent)
bash
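# [ \t]* (not \s*) keeps the match on the require line itself, so a preceding blank line cannot absorb the "# " prefix; /g covers repeated occurrences.
docker exec -it app bash -lc 'perl -0777 -pe "s/^[ \t]*require\s*([\"\\047])debug\/prelude\\1/# \$&/mg" -i.bak /var/www/discourse/config/application.rb'
2.3 Apply settings (here-doc; idempotent)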
bash
docker exec -i app bash -lc 'sudo -u discourse -H bash -lc "RUBYOPT= RAILS_ENV=production bundle exec rails r -"' <<'RUBY'
{
  presence_enabled: false,
  active_user_rate_limit_secs: 3,
  rate_limit_search_user: 30
}.each { |k, v| SiteSetting.public_send("#{k}=", v) if SiteSetting.respond_to?("#{k}=") }
RUBY
2.4 Verify
bash
docker exec -i app bash -lc 'sudo -u discourse -H bash -lc "RUBYOPT= RAILS_ENV=production bundle exec rails r -"' <<'RUBY'
pp({
  login_required: SiteSetting.login_required,
  active_user_rate_limit_secs: SiteSetting.active_user_rate_limit_secs,
  rate_limit_search_user: SiteSetting.rate_limit_search_user,
  presence_enabled: SiteSetting.presence_enabled
})
RUBY
3) Diagnostics
3.1 Anonymous vs logged-in
bash
curl -sS -o /dev/null -D - https://<FORUM_DOMAIN>/latest.json | sed -n '1,20p'
(Expect 403 when anonymous on a login-required site. For a logged-in check, copy the request as cURL from browser DevTools and run it; expect 200.)
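If this check is scripted, it may be easier to extract just the status code than to read the header dump. A minimal sketch follows; the helper name `http_status` is hypothetical, and the example uses canned headers so it runs offline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status code from the first line of HTTP response headers.
http_status() {
  awk 'NR == 1 { code = $2; sub(/\r$/, "", code); print code; exit }'
}

# Offline example with canned headers:
printf 'HTTP/2 403 \r\ncontent-type: application/json\r\n' | http_status

# Against a live site (not run here):
#   curl -sS -o /dev/null -D - "https://<FORUM_DOMAIN>/latest.json" | http_status
```

Only the first header line is inspected, so a response that follows redirects would need additional handling.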
3.2 Front 429 (last 5 minutes) — correct URI extraction
bash
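# main_t logs $time_iso8601 (server-local time) as one token, so the status is field 8 and the request sits between the first pair of quotes.
SINCE=$(date -d "5 minutes ago" "+[%Y-%m-%dT%H:%M:%S")
awk -v s="$SINCE" '$4 >= s && $8 == 429 { split($0, q, "\""); if (split(q[2], a, " ") >= 2) print a[2] }' /var/log/nginx/access.log \
| sort | uniq -c | sort -nr | head
3.3 Front 429 by IP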
bash
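SINCE=$(date -d "5 minutes ago" "+[%Y-%m-%dT%H:%M:%S")
# $1 is $remote_addr; if traffic arrives via Cloudflare, that is the proxy address rather than the visitor.
awk -v s="$SINCE" '$4 >= s && $8 == 429 { print $1 }' /var/log/nginx/access.log \
| sort | uniq -c | sort -nr | head
3.4 Origin container 429 (URIs)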
bash
# Assumes the container's access log uses a combined-style layout (time in field 4, status in field 9); adjust the field numbers if its log_format differs.
docker exec -it app bash -lc 'since=$(date -u -d "5 minutes ago" "+[%d/%b/%Y:%H:%M:%S"); awk -v s="$since" '\''$4 >= s && $9 == 429 { split($0, q, "\""); if (split(q[2], a, " ") >= 2) print a[2] }'\'' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head'
3.5 Application log tail for 429
bash
docker exec -it app bash -lc 'sudo -u discourse -H bash -lc "grep -n \"Completed 429\" /var/www/discourse/log/production.log | tail"'
4) Persistence Script (run after rebuilds)
Create /root/discourse_tune_limits.sh:
bash
cat >/root/discourse_tune_limits.sh <<'SH'
#!/usr/bin/env bash
# Re-apply the rate-limit site settings; safe to run repeatedly.
# The shebang is required for systemd to execute this file directly.
set -euo pipefail
docker exec -i app bash -lc 'sudo -u discourse -H bash -lc "RUBYOPT= RAILS_ENV=production bundle exec rails r -"' <<'RUBY'
{ presence_enabled: false, active_user_rate_limit_secs: 3, rate_limit_search_user: 30 }.each { |k, v|
  SiteSetting.public_send("#{k}=", v) if SiteSetting.respond_to?("#{k}=")
}
RUBY
SH
chmod +x /root/discourse_tune_limits.sh
(Optional) systemd timer:
/etc/systemd/system/discourse-rate-ensure.service
ini
[Unit]
Description=Ensure Discourse rate settings after (re)deploy
After=docker.service
[Service]
Type=oneshot
ExecStart=/root/discourse_tune_limits.sh
/etc/systemd/system/discourse-rate-ensure.timer
ini
[Unit]
Description=Run once at boot and daily
[Timer]
OnBootSec=60
OnUnitActiveSec=1d
[Install]
WantedBy=timers.target
bash
systemctl daemon-reload
systemctl enable --now discourse-rate-ensure.timer
5) Rollback
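Front (take this backup before editing; restoring it and reloading is the rollback):
bash
# cp does not create missing parent directories, so make the dated folder first.
sudo mkdir -p /root/nginx-bak/$(date +%F)
sudo cp -a /etc/nginx /root/nginx-bak/$(date +%F)/
# To roll back, copy the saved tree back over /etc/nginx, then validate and reload:
sudo nginx -t && sudo systemctl reload nginx
Backend (revert settings quickly):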
bash
docker exec -i app bash -lc 'sudo -u discourse -H bash -lc "RUBYOPT= RAILS_ENV=production bundle exec rails r -"' <<'RUBY'
{ active_user_rate_limit_secs: 60, rate_limit_search_user: 30, presence_enabled: false }.each{|k,v|
SiteSetting.public_send("#{k}=", v) if SiteSetting.respond_to?("#{k}=")
}
RUBY
6) Why this works (root cause & fixes)
- Root cause 1: App-layer throttles (active user beat, search cooldown) + presence heartbeats amplified 429s during hot-topic bursts.
- Fix: relaxed per-user beats and disabled presence.
- Root cause 2: A truncated front vhost produced HTTP/2 403.
- Fix: rebuild vhost; keep read/heartbeat exempt, limit only write endpoints.
7) Quick Runbook (hot issue)
- Confirm anonymous /latest.json → 403; logged-in → 200.
- Check front 429 URIs (command 3.2).
- Check origin 429 URIs and app log (3.4, 3.5).
- If /topics/timings spikes → adjust active_user_rate_limit_secs to 2–3.
- If /search spikes → lower rate_limit_search_user to 20–30.
- Ensure /message-bus/ long-poll settings and read-path exemptions remain intact.
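When running the front-log checks, it helps to confirm the awk field positions against the `main_t` format first. Because `main_t` writes `$time_iso8601` as a single token, the status lands in field 8 and the URI in field 6. The sketch below verifies this on made-up log lines; the function names and sample data are illustrative only.

```shell
#!/usr/bin/env bash
# Count 429 responses per URI in main_t-formatted lines read from stdin.
# $1 is the cutoff, e.g. "[2024-05-01T11:55:00"; ISO timestamps compare correctly
# as strings as long as every line carries the same UTC offset.
count_429_uris() {
  awk -v s="$1" '$4 >= s && $8 == 429 { print $6 }' | sort | uniq -c | sort -nr
}

# Made-up sample lines in the main_t layout:
sample_log() {
  cat <<'LOG'
203.0.113.5 - - [2024-05-01T11:50:00+00:00] "POST /posts HTTP/2.0" 429 0 "-" "ua" rt=0.001 uct=- urt=- ucs=- rid=a1
203.0.113.5 - - [2024-05-01T11:58:10+00:00] "POST /posts HTTP/2.0" 429 0 "-" "ua" rt=0.001 uct=- urt=- ucs=- rid=a2
203.0.113.6 - - [2024-05-01T11:58:11+00:00] "PUT /t/123 HTTP/2.0" 429 0 "-" "ua" rt=0.001 uct=- urt=- ucs=- rid=a3
203.0.113.6 - - [2024-05-01T11:58:12+00:00] "POST /posts HTTP/2.0" 429 0 "-" "ua" rt=0.001 uct=- urt=- ucs=- rid=a4
203.0.113.7 - - [2024-05-01T11:59:00+00:00] "GET /latest.json HTTP/2.0" 200 512 "-" "ua" rt=0.020 uct=0.001 urt=0.019 ucs=200 rid=a5
LOG
}

# Lists /posts (2 hits) ahead of /t/123 (1 hit); the 11:50 line is before the cutoff.
sample_log | count_429_uris "[2024-05-01T11:55:00"
```

The simple `$6` extraction assumes the logged user name contains no spaces; a request line with an unusual shape would shift the fields.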