Multi-VPS watcher

2025-10-23 17:25

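Why do I have more than ten machines...
Fine, let's settle this once and for all and finish the deployment script I should have written long ago.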

Sentinel

Run this as root on any Ubuntu 20.04/22.04+ VPS:

bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/vps.sh)

The script installs a tiny systemd service that watches your host and sends alerts to Telegram. It is light enough for very small VPSs and does not interfere with Discourse, because it only reads metrics.

What it does

One-time setup: installs minimal deps, creates /etc/sentinel/sentinel.env, installs sentinel.py, and a systemd unit sentinel.service.
Sends alerts to Telegram for:
- CPU/load, memory/swap thrash, high iowait
- Network spikes (bps/pps), route changes, connectivity loss/recovery
- High disk usage on the root filesystem (/), above a configurable threshold
- Web scan signatures from Nginx access log (optional)
- SSH brute-force bursts (source IP included)
- TLS certificate expiry (auto-discovers certs via nginx -T + common paths)
- Daily snapshot at Beijing 12:00
- Monthly traffic (last-month summary + current month rollup)
Low noise & robust:
- Cooldown for each alert key (persisted across restarts)
- 64-bit counters for network; overflow-safe deltas
- State persisted at /var/lib/sentinel/state.json
- Service guardrails: memory and task caps via systemd.
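
The persisted per-key cooldown can be pictured roughly as follows. This is a minimal sketch, not the code in sentinel.py: the class name `CooldownStore` and the `{"cooldowns": {...}}` JSON layout are assumptions made for illustration, and the real state.json may be organised differently.

```python
import json
import os
import tempfile
import time


class CooldownStore:
    """Per-key alert cooldown persisted to a JSON file (hypothetical layout)."""

    def __init__(self, path, cooldown_sec=600):
        self.path = path
        self.cooldown_sec = cooldown_sec
        self.last_sent = {}
        # Reload earlier send times so a restart does not reset the cooldowns.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.last_sent = json.load(fh).get("cooldowns", {})

    def should_send(self, key, now=None):
        """Return True and record the send if the key is out of cooldown."""
        now = time.time() if now is None else now
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown_sec:
            return False
        self.last_sent[key] = now
        self._save()
        return True

    def _save(self):
        # Write to a temp file and rename it, so a crash cannot leave a
        # half-written state file behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"cooldowns": self.last_sent}, fh)
        os.replace(tmp, self.path)
```

A caller would wrap every alert in `if store.should_send("cpu_load"): ...`, so a service restart inside the cooldown window stays silent.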

Telegram setup

During install you’ll be prompted to:

- Enter your bot token (from BotFather).
- Send any message to your bot in Telegram, then press Enter; the script auto-detects your chat ID.
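
Chat ID detection of this kind is usually done through the Telegram Bot API `getUpdates` method, whose response carries the chat of each incoming message. The sketch below only shows how a chat ID could be pulled out of such a response; the helper name `extract_chat_id` is hypothetical, and the install script may well do this differently (for example in shell).

```python
import json


def extract_chat_id(get_updates_response):
    """Return the chat ID of the most recent message in a getUpdates payload, or None."""
    if not get_updates_response.get("ok"):
        return None
    # Walk the updates newest-first and take the first one that has a chat.
    for update in reversed(get_updates_response.get("result", [])):
        message = update.get("message") or update.get("channel_post")
        if message and "chat" in message:
            return message["chat"]["id"]
    return None


# Hand-written sample in the shape of a getUpdates response.
sample = json.loads(
    '{"ok": true, "result": [{"update_id": 1, "message": '
    '{"message_id": 7, "chat": {"id": 5016203472, "type": "private"}, "text": "hi"}}]}'
)
print(extract_chat_id(sample))
```

If no update is found, the likely cause is that no message has been sent to the bot yet.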

Non-interactive option

If you deploy at scale, set env vars before running:

export TELE_TOKEN="123456:ABC..."
export TELE_CHAT_ID="5016203472"
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/vps.sh)

These override the prompts.

Key files & service

Config: /etc/sentinel/sentinel.env
State (cooldowns, traffic): /var/lib/sentinel/state.json
Service: sentinel.service

Useful commands:

# live logs
journalctl -u sentinel.service -f

# check status
systemctl status sentinel.service

# restart service
systemctl restart sentinel.service

Core configuration

(edit in /etc/sentinel/sentinel.env)

TELE_TOKEN=...           # required
TELE_CHAT_ID=...         # required

Active network probing

PING_TARGETS=1.1.1.1,cloudflare.com
PING_ENGINE=tcp          # tcp or icmp (tcp is lighter, no raw socket)
PING_TCP_PORT=443
PING_INTERVAL_SEC=60     # lower => faster detection, higher => lighter
PING_TIMEOUT_MS=1500
PING_ROUND_ROBIN=1       # 1=probe one target per tick; 0=all
LOSS_WINDOW=20
LOSS_ALERT_PCT=60
LATENCY_ALERT_MS=400
JITTER_ALERT_MS=150
FLAP_SUPPRESS_SEC=300
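
To make the interplay of PING_ENGINE=tcp, LOSS_WINDOW and LOSS_ALERT_PCT concrete, here is a rough sketch of a TCP connect probe feeding a sliding loss window. The names `tcp_probe` and `LossWindow` are illustrative assumptions and are not taken from sentinel.py.

```python
import socket
import time
from collections import deque


def tcp_probe(host, port=443, timeout_ms=1500):
    """Return connect latency in ms, or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        # A plain TCP connect needs no raw socket, unlike ICMP.
        with socket.create_connection((host, port), timeout=timeout_ms / 1000.0):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


class LossWindow:
    """Keeps the last `size` probe results and reports the loss percentage."""

    def __init__(self, size=20):
        self.results = deque(maxlen=size)

    def add(self, latency_ms):
        self.results.append(latency_ms)

    def loss_pct(self):
        if not self.results:
            return 0.0
        lost = sum(1 for r in self.results if r is None)
        return 100.0 * lost / len(self.results)
```

With LOSS_WINDOW=20 and LOSS_ALERT_PCT=60, twelve failed probes out of the last twenty would reach the threshold in this model.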

System thresholds

MEM_AVAIL_PCT_MIN=10
SWAP_USED_PCT_MAX=50
SWAPIN_PPS_MAX=1000
LOAD1_PER_CORE_MAX=1.5
CPU_IOWAIT_PCT_MAX=50
ROOT_FS_PCT_MAX=90
COOLDOWN_SEC=600         # per-alert backoff (persisted)
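
As an illustration of how two of these thresholds might be evaluated, the sketch below computes the available-memory percentage from /proc/meminfo text and the 1-minute load per core. The function names and the alert labels `mem_low` and `load_high` are made up for this example.

```python
def mem_available_pct(meminfo_text):
    """Percentage of MemAvailable relative to MemTotal, from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # memory values are in kB
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def load_per_core(load1, cores):
    """1-minute load average divided by the number of CPU cores."""
    return load1 / max(cores, 1)


def breached(meminfo_text, load1, cores, mem_avail_pct_min=10, load1_per_core_max=1.5):
    """Return the list of (hypothetical) alert labels whose threshold is crossed."""
    alerts = []
    if mem_available_pct(meminfo_text) < mem_avail_pct_min:
        alerts.append("mem_low")
    if load_per_core(load1, cores) > load1_per_core_max:
        alerts.append("load_high")
    return alerts
```

On a live host the inputs could come from reading `/proc/meminfo`, `os.getloadavg()[0]` and `os.cpu_count()`.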

Network spikes

NET_RX_BPS_ALERT=5242880   # 5 MB/s
NET_TX_BPS_ALERT=5242880
NET_RX_PPS_ALERT=2000
NET_TX_PPS_ALERT=2000
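
The 5 MB/s comment suggests these thresholds are in bytes per second (5 × 1024 × 1024 = 5242880). A rate like this is typically derived from two readings of a monotonically increasing interface counter. The following is a sketch under that assumption, with hypothetical helper names, showing a delta that tolerates a single 64-bit wrap:

```python
COUNTER_MODULUS_64 = 2 ** 64


def counter_delta(prev, curr, modulus=COUNTER_MODULUS_64):
    """Difference between two counter readings, tolerating one wrap-around."""
    return (curr - prev) % modulus


def rate_per_sec(prev, curr, elapsed_sec):
    """Average rate between two readings taken elapsed_sec apart."""
    if elapsed_sec <= 0:
        return 0.0
    return counter_delta(prev, curr) / elapsed_sec


NET_RX_BPS_ALERT = 5 * 1024 * 1024  # 5242880, as in the config above

# 60,000,000 bytes received in 10 s is 6,000,000 bytes/s, above the threshold.
rx_rate = rate_per_sec(prev=1_000_000, curr=61_000_000, elapsed_sec=10)
print(rx_rate > NET_RX_BPS_ALERT)
```

Note that modular arithmetic cannot tell a wrap from a counter reset (for example after an interface is recreated), so a real implementation would probably also discard implausibly large deltas.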

Process watch (auto-detect by default)

WATCH_PROCS=auto            # auto = watch nginx & docker if installed/running
WATCH_PROCS_REQUIRE_ENABLED=1  # if you list names, only alert when enabled/running

Nginx access log scan (optional)

NGINX_ACCESS_LOG=/var/log/nginx/access.log
# Supports host-quoted or combined formats; alerts on common exploit paths.
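
A scan check of this sort can be approximated by pulling the request path out of each access-log line and comparing it against a signature list. In the sketch below only `/.env` and `/wp-login.php` come from this post; the other signatures, the regex, and the function name `scan_hit` are assumptions for illustration.

```python
import re

# Matches the quoted request part, e.g. "GET /path HTTP/1.1". Using search()
# means a leading quoted host field does not get in the way.
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+) HTTP/[\d.]+"'
)

# Hypothetical signature list; the real one in sentinel.py may differ.
SCAN_SIGNATURES = ("/.env", "/wp-login.php", "/.git/", "/phpmyadmin")


def scan_hit(log_line):
    """Return the matching signature for a log line, or None."""
    match = REQUEST_RE.search(log_line)
    if not match:
        return None
    path = match.group(1).lower()
    for signature in SCAN_SIGNATURES:
        if signature in path:
            return signature
    return None
```

Substring matching is deliberately crude here; it trades a few false positives for very low CPU cost per line.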

TLS certificate expiry

CERT_MIN_DAYS=3
CERT_AUTO_DISCOVER=1
CERT_SEARCH_GLOBS=/etc/letsencrypt/live/*/fullchain.pem,/etc/nginx/ssl/*/*.pem,...
CERT_CHECK_DOMAINS=         # optional: comma-separated hostnames to test over TLS

Auto-discovery: parses nginx -T for ssl_certificate and checks those files.
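
As a rough picture of that discovery step, the sketch below extracts `ssl_certificate` paths from the text that `nginx -T` prints and computes the days remaining until a given expiry time. The helper names are hypothetical, and obtaining the expiry date itself (for instance with `openssl x509 -enddate -noout -in FILE`) is left out.

```python
import re
from datetime import datetime, timezone

# Requires whitespace right after the directive name, so ssl_certificate_key
# lines are not picked up.
SSL_CERT_RE = re.compile(r"^\s*ssl_certificate\s+([^;\s]+)\s*;", re.MULTILINE)


def discover_cert_paths(nginx_dump):
    """Return unique certificate paths from `nginx -T` output, in first-seen order."""
    seen = []
    for raw in SSL_CERT_RE.findall(nginx_dump):
        path = raw.strip("\"'")  # nginx allows quoted values
        if path not in seen:
            seen.append(path)
    return seen


def days_left(not_after, now):
    """Whole days between now and the certificate's notAfter time."""
    return (not_after - now).days


expiry = datetime(2025, 10, 26, 0, 0, tzinfo=timezone.utc)
now = datetime(2025, 10, 23, 17, 25, tzinfo=timezone.utc)
print(days_left(expiry, now))
```

With CERT_MIN_DAYS=3, a certificate in this state (two whole days left) would fall below the minimum and trigger a warning.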

SSH brute-force detection

AUTH_LOG_PATH=/var/log/auth.log
AUTH_FAIL_COUNT=30
AUTH_FAIL_WINDOW_MIN=10
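
The count-within-window rule can be sketched as a per-IP sliding window over failed-login lines. This is an illustrative model rather than the code in sentinel.py: the regex targets the common OpenSSH "Failed password" message, the class name is made up, and timestamps are passed in as epoch seconds to keep the example short.

```python
import re
from collections import defaultdict, deque

FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


class BruteForceDetector:
    """Flags a source IP once it has fail_count failures within window_min minutes."""

    def __init__(self, fail_count=30, window_min=10):
        self.fail_count = fail_count
        self.window_sec = window_min * 60
        self.events = defaultdict(deque)  # source IP -> timestamps of failures

    def feed(self, ts, line):
        """Record one log line seen at epoch time ts; return the IP if it crossed the threshold."""
        match = FAILED_RE.search(line)
        if not match:
            return None
        ip = match.group(1)
        queue = self.events[ip]
        queue.append(ts)
        # Drop failures that have fallen out of the window.
        while queue and ts - queue[0] > self.window_sec:
            queue.popleft()
        return ip if len(queue) >= self.fail_count else None
```

The returned IP is what would end up in the alert text; a cooldown keyed on that IP would keep a sustained attack from producing one alert per line.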

Daily & monthly reporting

HEARTBEAT_HOURS=24           # daily “System OK”
DAILY_BJ_SNAPSHOT_HOUR=12    # Beijing time; sends one snapshot at this hour

TRAFFIC_REPORT_EVERY_DAYS=10 # intra-month rollups every N days + month end
TRAFFIC_TRACK_IF=""          # "" = sum all NICs; or set e.g. ens3
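
Because the servers may run in any timezone, the Beijing-hour check has to convert explicitly. One possible way to decide whether the daily snapshot is due, sending at most one per Beijing calendar day, is sketched below; the function name and the idea of storing the last-sent date as an ISO string are assumptions.

```python
from datetime import datetime, timedelta, timezone

# China does not observe DST, so a fixed UTC+8 offset is sufficient.
BEIJING = timezone(timedelta(hours=8))


def snapshot_due(now_utc, last_sent_date, snapshot_hour=12):
    """Return the Beijing date (ISO string) to mark as sent if a snapshot is due, else None."""
    now_bj = now_utc.astimezone(BEIJING)
    today = now_bj.date().isoformat()
    if now_bj.hour == snapshot_hour and last_sent_date != today:
        return today
    return None


# 04:05 UTC is 12:05 in Beijing, so a snapshot is due if none was sent today.
print(snapshot_due(datetime(2025, 10, 23, 4, 5, tzinfo=timezone.utc), None))
```

Persisting the returned date alongside the other state would keep a restart during the 12:00 hour from sending a second snapshot.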

How alerts are formatted

Title with emoji + host + public IP
Bullet list with concise metrics
Cooldown keys prevent spam; persisted in state.json so restarts won’t flood you.
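
For a feel of the message shape, here is a small formatting sketch. The exact separators, bullet character, and function name are guesses; only the overall structure (emoji title with host and public IP, followed by a bullet list of metrics) comes from the description above.

```python
def format_alert(emoji, title, host, public_ip, metrics):
    """Build a plain-text alert: one header line, then one bullet per metric."""
    header = f"{emoji} {title} | {host} ({public_ip})"
    bullets = [f"• {name}: {value}" for name, value in metrics]
    return "\n".join([header, *bullets])


print(format_alert("⚠️", "High load", "vps-01", "203.0.113.10",
                   [("load1", "3.2"), ("cores", "2")]))
```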

Testing

Connectivity: temporarily set PING_INTERVAL_SEC=5, then disable networking or block egress and watch for “Network down / recovered”.
Process watch: leave WATCH_PROCS=auto, then install/uninstall nginx or docker, or stop/start them. Confirm that hosts without these services produce no alerts.
SSH brute-force: tail /var/log/auth.log and simulate failed logins from another IP, or temporarily adjust AUTH_FAIL_COUNT / AUTH_FAIL_WINDOW_MIN so the threshold is easier to reach.
Nginx scan: request /.env or /wp-login.php on your site (in a test environment) and watch for a scan alert.
TLS: set CERT_MIN_DAYS=365 to force a warning, then restore the original value.

Resource footprint

Python loop sleeps 10s between checks.
TCP probe mode avoids raw sockets and keeps overhead tiny.
Systemd limits: MemoryMax=150M, TasksMax=64.

Safety & polish in this build

Non-interactive apt & needrestart auto-accept to avoid hanging.
tmsg generator uses a quoted heredoc (no premature variable expansion).
Fixed sed for IP extraction in the install banner.
Startup beacon has cooldown to avoid restart spam.
Network stats prefer 64-bit sysfs counters; overflow-safe deltas.
Auto-detect watch list so hosts without nginx/docker don’t spam “not running”.

Uninstall (cleanly)

systemctl disable --now sentinel.service
rm -f /etc/systemd/system/sentinel.service
rm -rf /etc/sentinel /var/lib/sentinel /usr/local/bin/sentinel.py /usr/local/bin/tmsg
systemctl daemon-reload

If you want a “baseline template” for multiple machines, copy a tuned /etc/sentinel/sentinel.env to each VPS before enabling the service. The install script will respect pre-existing values and environment overrides.