Multi-VPS watcher
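Why do I have more than ten machines...
Fine, let's settle this once and for all and finish the deployment script I should have written long ago.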
Sentinel
Run it like this on any Ubuntu 20.04/22.04+ VPS (root):
bash
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/vps.sh)
The script installs a tiny systemd service that watches your host and sends alerts to Telegram. It’s safe for ultra-small VPSs and doesn’t interfere with Discourse (purely read-only metrics).
What it does
- One-time setup: installs minimal deps, creates /etc/sentinel/sentinel.env, installs sentinel.py, and a systemd unit sentinel.service.
- Sends alerts to Telegram for:
  - CPU/load, memory/swap thrash, high iowait
  - Network spikes (bps/pps), route changes, connectivity loss/recovery
  - Disk usage high (/ threshold)
  - Web scan signatures from Nginx access log (optional)
  - SSH brute-force bursts (source IP included)
  - TLS certificate expiry (auto-discovers certs via nginx -T + common paths)
  - Daily snapshot at Beijing 12:00
  - Monthly traffic (last-month summary + current month rollup)
- Low noise & robust:
  - Cooldown for each alert key (persisted across restarts)
  - 64-bit counters for network; overflow-safe deltas
  - State persisted at /var/lib/sentinel/state.json
  - Service guardrails: memory and task caps via systemd.
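As an illustration of the "Daily snapshot at Beijing 12:00" item, here is a minimal sketch of how such a once-per-day check could work. The function name `snapshot_due` and the idea of remembering the last sent date are assumptions for illustration, not taken from sentinel.py.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 all year (no daylight saving), so a fixed offset is enough.
BEIJING = timezone(timedelta(hours=8))

def snapshot_due(now_utc, snapshot_hour, last_sent_date):
    """Return today's Beijing date (YYYY-MM-DD) if a snapshot should go out now, else None.

    last_sent_date is the Beijing date of the previous snapshot, or None (hypothetical state field).
    """
    local = now_utc.astimezone(BEIJING)
    today = local.strftime("%Y-%m-%d")
    if local.hour == snapshot_hour and today != last_sent_date:
        return today
    return None
```

Storing the returned date in the persisted state would keep a restart during the snapshot hour from sending a second copy.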
Telegram setup
During install the script:
- asks for your bot token (from BotFather)
- asks you to send any message to your bot in Telegram and press Enter, then auto-detects your chat ID.
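How the script detects the chat ID is not shown here. One plausible way, sketched below, is to call the Telegram Bot API method getUpdates and read the chat ID from the newest message. The helper names `extract_chat_id` and `fetch_chat_id` are made up for this example.

```python
import json
import urllib.request

def extract_chat_id(updates):
    """Return the chat ID of the most recent message in a parsed getUpdates response, or None."""
    for update in reversed(updates.get("result", [])):
        chat = update.get("message", {}).get("chat", {})
        if "id" in chat:
            return chat["id"]
    return None

def fetch_chat_id(token, timeout=10):
    """Ask the Bot API for pending updates and pick the newest chat ID (needs network access)."""
    url = f"https://api.telegram.org/bot{token}/getUpdates"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_chat_id(json.load(resp))
```

This is also why you have to message the bot first: getUpdates only returns something once the bot has received at least one update.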
Non-interactive option
If you deploy at scale, set env vars before running:
bash
export TELE_TOKEN="123456:ABC..."
export TELE_CHAT_ID="5016203472"
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/vps.sh)
These override the prompts.
Key file & service
- Config: /etc/sentinel/sentinel.env
- State (cooldowns, traffic): /var/lib/sentinel/state.json
- Service: sentinel.service
Useful commands:
bash
# live logs
journalctl -u sentinel.service -f
# check status
systemctl status sentinel.service
# restart service
systemctl restart sentinel.service
Core configuration
(edit in /etc/sentinel/sentinel.env)
Telegram
bash
TELE_TOKEN=... # required
TELE_CHAT_ID=... # required
Active network probing
bash
PING_TARGETS=1.1.1.1,cloudflare.com
PING_ENGINE=tcp # tcp or icmp (tcp is lighter, no raw socket)
PING_TCP_PORT=443
PING_INTERVAL_SEC=60 # lower => faster detection, higher => lighter
PING_TIMEOUT_MS=1500
PING_ROUND_ROBIN=1 # 1=probe one target per tick; 0=all
LOSS_WINDOW=20
LOSS_ALERT_PCT=60
LATENCY_ALERT_MS=400
JITTER_ALERT_MS=150
FLAP_SUPPRESS_SEC=300
System thresholds
bash
MEM_AVAIL_PCT_MIN=10
SWAP_USED_PCT_MAX=50
SWAPIN_PPS_MAX=1000
LOAD1_PER_CORE_MAX=1.5
CPU_IOWAIT_PCT_MAX=50
ROOT_FS_PCT_MAX=90
COOLDOWN_SEC=600 # per-alert backoff (persisted)
Network spikes
bash
NET_RX_BPS_ALERT=5242880 # 5 MB/s
NET_TX_BPS_ALERT=5242880
NET_RX_PPS_ALERT=2000
NET_TX_PPS_ALERT=2000
Process watch (auto-detect by default)
bash
WATCH_PROCS=auto # auto = watch nginx & docker if installed/running
WATCH_PROCS_REQUIRE_ENABLED=1 # if you list names, only alert when enabled/running
Nginx access log scan (optional)
bash
NGINX_ACCESS_LOG=/var/log/nginx/access.log
# Supports host-quoted or combined formats; alerts on common exploit paths.
TLS certificate expiry
bash
CERT_MIN_DAYS=3
CERT_AUTO_DISCOVER=1
CERT_SEARCH_GLOBS=/etc/letsencrypt/live/*/fullchain.pem,/etc/nginx/ssl/*/*.pem,...
CERT_CHECK_DOMAINS= # optional: comma-separated hostnames to test over TLS
Auto-discovery: parses nginx -T for ssl_certificate and checks those files.
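To make the auto-discovery step concrete, here is a rough sketch of pulling certificate paths out of an nginx -T dump and turning a certificate's notAfter date into days remaining. The function names are illustrative and the regex is an assumption about how the directive is written (unquoted paths); sentinel.py may do this differently.

```python
import re
import ssl

# Matches "ssl_certificate <path>;" but not "ssl_certificate_key", which has no expiry to check.
_CERT_RE = re.compile(r"^\s*ssl_certificate\s+([^;\s]+)\s*;", re.MULTILINE)

def cert_paths_from_nginx_dump(dump):
    """Return unique certificate paths found in `nginx -T` output, in first-seen order."""
    seen = []
    for path in _CERT_RE.findall(dump):
        if path not in seen:
            seen.append(path)
    return seen

def days_until_expiry(not_after, now_epoch):
    """Days left, given a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / 86400
```

A warning would then fire when `days_until_expiry(...)` drops below CERT_MIN_DAYS. The notAfter string can come from, for example, `openssl x509 -enddate -noout -in <file>`.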
SSH brute-force detection
bash
AUTH_LOG_PATH=/var/log/auth.log
AUTH_FAIL_COUNT=30
AUTH_FAIL_WINDOW_MIN=10
Daily & monthly reporting
bash
HEARTBEAT_HOURS=24 # daily “System OK”
DAILY_BJ_SNAPSHOT_HOUR=12 # Beijing time; sends one snapshot at this hour
TRAFFIC_REPORT_EVERY_DAYS=10 # intra-month rollups every N days + month end
TRAFFIC_TRACK_IF="" # "" = sum all NICs; or set e.g. ens3
How alerts are formatted
- Title with emoji + host + public IP
- Bullet list with concise metrics
- Cooldown keys prevent spam; persisted in state.json so restarts won’t flood you.
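The persisted cooldown idea can be sketched roughly like this. The class name `Cooldowns` and the JSON layout (alert key mapped to last send time) are assumptions for illustration; the real state.json also holds traffic data and may be structured differently.

```python
import json
import os
import time

class Cooldowns:
    """Per-key alert cooldowns persisted to a JSON file (hypothetical layout)."""

    def __init__(self, path, cooldown_sec):
        self.path = path
        self.cooldown_sec = cooldown_sec
        try:
            with open(path) as fh:
                self.last_sent = json.load(fh)
        except (FileNotFoundError, json.JSONDecodeError):
            self.last_sent = {}

    def should_send(self, key, now=None):
        """Return True and record the send time if `key` is outside its cooldown."""
        now = time.time() if now is None else now
        if now - self.last_sent.get(key, 0) < self.cooldown_sec:
            return False
        self.last_sent[key] = now
        self._save()
        return True

    def _save(self):
        # Write to a temp file and rename so a crash cannot leave a half-written state file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self.last_sent, fh)
        os.replace(tmp, self.path)
```

Because the timestamps are reloaded on start, a restarted service still knows that an alert such as "cpu" went out a few minutes ago and stays quiet until COOLDOWN_SEC has passed.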
Testing
- Connectivity: temporarily set PING_INTERVAL_SEC=5, unplug networking or block egress, and watch for “Network down / recovered”.
- Process watch: leave WATCH_PROCS=auto, then install/uninstall nginx/docker or stop/start them; hosts without these services should raise no alerts.
- SSH brute-force: tail /var/log/auth.log and simulate failed logins from another IP, or briefly reduce AUTH_FAIL_COUNT or the window.
- Nginx scan: request /.env or /wp-login.php on your site (test env) and watch for a scan alert.
- TLS: set CERT_MIN_DAYS=365 to force a warning (and set it back afterwards).
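For the SSH test it helps to see why lowering AUTH_FAIL_COUNT or the window changes when the alert fires. Below is a minimal sliding-window sketch. `BurstDetector` and the regex are illustrative assumptions based on the usual sshd "Failed password" log line, not the detector sentinel.py actually ships.

```python
import re
from collections import defaultdict, deque

# sshd typically logs: "Failed password for invalid user admin from 203.0.113.7 port 4242 ssh2"
_FAIL_RE = re.compile(r"Failed password for .* from (\S+) port \d+")

class BurstDetector:
    def __init__(self, fail_count, window_sec):
        self.fail_count = fail_count
        self.window_sec = window_sec
        self.events = defaultdict(deque)  # source IP -> timestamps of recent failures

    def feed(self, line, ts):
        """Record one log line seen at time `ts`; return the source IP if it hit the threshold."""
        m = _FAIL_RE.search(line)
        if not m:
            return None
        ip = m.group(1)
        q = self.events[ip]
        q.append(ts)
        # Drop failures that have fallen out of the window.
        while q and ts - q[0] > self.window_sec:
            q.popleft()
        return ip if len(q) >= self.fail_count else None
```

With the defaults above this would correspond to `BurstDetector(30, 10 * 60)`; for a quick test, something like `BurstDetector(3, 600)` fires after three bad logins from one address.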
Resource footprint
- Python loop sleeps 10s between checks.
- TCP probe mode avoids raw sockets and keeps overhead tiny.
- Systemd limits: MemoryMax=150M, TasksMax=64.
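To show why TCP probe mode is cheap, here is a sketch of a probe that only times a plain TCP connect (no raw sockets, no root-only ICMP), plus a small loss window in the spirit of LOSS_WINDOW / LOSS_ALERT_PCT. `tcp_probe` and `LossWindow` are names invented for this example.

```python
import socket
import time
from collections import deque

def tcp_probe(host, port=443, timeout_ms=1500):
    """Return the TCP connect time in milliseconds, or None if the connection failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_ms / 1000):
            return (time.monotonic() - start) * 1000
    except OSError:
        return None

class LossWindow:
    """Loss percentage over the most recent `size` probes."""

    def __init__(self, size=20):
        self.results = deque(maxlen=size)  # True = probe succeeded

    def add(self, latency_ms):
        self.results.append(latency_ms is not None)

    def loss_pct(self):
        if not self.results:
            return 0.0
        return 100.0 * self.results.count(False) / len(self.results)
```

Each tick would feed `tcp_probe(...)` into `LossWindow.add(...)` and compare `loss_pct()` with the alert threshold; one short-lived socket per tick is about all it costs.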
Safety & polish in this build
- Non-interactive apt & needrestart auto-accept to avoid hanging.
- tmsg generator uses a quoted heredoc (no premature variable expansion).
- Fixed sed for IP extraction in the install banner.
- Startup beacon has cooldown to avoid restart spam.
- Network stats prefer 64-bit sysfs counters; overflow-safe deltas.
- Auto-detect watch list so hosts without nginx/docker don’t spam “not running”.
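The "overflow-safe deltas" point can be illustrated with a small sketch. The sysfs path is the standard Linux location for per-interface counters; the helper names and the choice to treat a drop as a single wrap are assumptions, not confirmed details of sentinel.py.

```python
COUNTER_MAX_64 = 2 ** 64

def read_counter(iface, name):
    """Read a kernel interface counter such as rx_bytes from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/statistics/{name}") as fh:
        return int(fh.read())

def counter_delta(prev, curr, modulus=COUNTER_MAX_64):
    """Difference between two samples of an increasing counter, tolerating one wrap-around."""
    if curr >= prev:
        return curr - prev
    # The counter went backwards: assume it wrapped past `modulus` exactly once.
    return curr + modulus - prev
```

A bytes-per-second figure is then `counter_delta(prev, curr) / elapsed_seconds`. A drop can also mean the counter was reset (reboot, interface re-created), so a real implementation may prefer to discard that one sample rather than assume a wrap.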
Uninstall (cleanly)
bash
systemctl disable --now sentinel.service
rm -f /etc/systemd/system/sentinel.service
rm -rf /etc/sentinel /var/lib/sentinel /usr/local/bin/sentinel.py /usr/local/bin/tmsg
systemctl daemon-reload
If you want a “baseline template” for multiple machines, copy a tuned /etc/sentinel/sentinel.env to each VPS before enabling the service. The install script will respect pre-existing values and environment overrides.
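The precedence described above (built-in defaults, then an existing sentinel.env, then environment overrides) could look roughly like the sketch below. `parse_env` and `effective_config` are hypothetical helpers, and the inline-comment handling is an assumption based on the `KEY=value # comment` style used in the examples; the install script's real logic is not shown here.

```python
import os

def parse_env(text):
    """Parse KEY=VALUE lines; blank lines and '#' comment lines are ignored."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: an unquoted " #" starts an inline comment, as in the examples above.
        value = value.split(" #", 1)[0].strip().strip('"')
        values[key.strip()] = value
    return values

def effective_config(defaults, file_text, environ=os.environ):
    """Later sources win: defaults < existing env file < process environment."""
    merged = dict(defaults)
    merged.update(parse_env(file_text))
    merged.update({k: environ[k] for k in merged if k in environ})
    return merged
```

With this ordering, a tuned template supplies the thresholds while TELE_TOKEN and TELE_CHAT_ID can still be injected per host through the environment.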