Can AI Grade Homework?

2025-11-02 23:45

AGrader (Seiue Auto‑Grader)

Version: v0.3

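Of course it can, but so far only inside the forum. After all, the main reason I chose Discourse in the first place was its AI integration. The problem is that, since the start of term, some students are still more comfortable with the school's Seiue (希悅).
So the whole weekend went into this. The task is clear: on any VPS, one script that takes any submission (or several submissions) a student makes in Seiue, extracts the text and attachments, hands them to an AI for a structured reply, and then writes the AI's score and comments back to Seiue. The content and results of every AI review are pushed to Telegram in real time.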

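From the attendance project to the notification project to this AI homework project: after this round of packet capture, Seiue's overall structure is burned into my brain whether I wanted to know it or not...
Can an academic administration platform do teaching? Of course not.
At its core this platform is not going anywhere. From the first time I met LCH until now, I have never changed that view.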

1) What this is / current status

AGrader is an API‑first auto‑grading pipeline for Seiue/Chalk (希悦). It consists of:

A drop‑in shell installer/runner agrader.sh that:
- Prepares system deps and a Python venv in /opt/agrader.
- Writes configs and a systemd service on Linux (or LaunchAgent on macOS).
- Orchestrates the Python runtime (main.py) and environment.
A Python core (main.py) that:
- Resolves the item_id for a Seiue task.
- Optionally posts a review per student.
- Posts scores via the official item scoring API.
- Verifies written scores.

Today’s result (verified logs): Full‑score mode completed 38/38 for a live task, with item ID validated upfront and each review+score written then verified. An existing score was detected for one student and respected (no overwrite).

2) Architecture at a glance

```text
+--------------------------+       +-------------------------+
|  agrader.sh (installer)  | ----> |  /opt/agrader structure |
+--------------------------+       +-------------------------+
        |                                |
        v                                v
  System deps, venv,              main.py  +  requirements.txt
  env file, service               ├─ item_id resolver (TTL + refresh signals)
  orchestration                   ├─ review poster
                                  ├─ score poster (array endpoint)
                                  └─ verifier (readback)
```

Modes: watch (daemon) or oneshot (run and exit).
Stop criteria: e.g., score_and_review (stop when both done).
Safety: Verification after write; item_id cache with TTL and auto re‑resolve on error signals.

3) Seiue API framework (what we rely on)

All endpoints below are documented as templates. Do not invent IDs. When needed, reconfirm with live capture.

3.1 Auth surface

Login/authorize (if needed):
- https://passport.seiue.com/login?school_id=<SCHOOL_ID>
- https://passport.seiue.com/authorize (or .../token)
  Returns a bearer token. Requests then carry X-School-Id, X-Role, and sometimes X-Reflection-Id headers aligned to the current operator context.
  Current pipeline accepts direct SEIUE_BEARER & SEIUE_REFLECTION_ID when available.
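
A minimal sketch of how the direct-token path could assemble request headers from the SEIUE_* variables. The function name build_headers, the "Authorization: Bearer" scheme, and the fallback role "teacher" are assumptions for illustration, not confirmed details of main.py:

```python
import os


def build_headers(env=os.environ):
    """Assemble request headers from SEIUE_* variables (illustrative only)."""
    headers = {
        # Assumption: the JWT is sent as a standard Bearer credential.
        "Authorization": f"Bearer {env['SEIUE_BEARER']}",
        "X-School-Id": env["SEIUE_SCHOOL_ID"],
        # Assumption: fall back to "teacher" when SEIUE_ROLE is unset.
        "X-Role": env.get("SEIUE_ROLE", "teacher"),
    }
    # X-Reflection-Id is only sometimes present, so add it only when configured.
    reflection_id = env.get("SEIUE_REFLECTION_ID")
    if reflection_id:
        headers["X-Reflection-Id"] = reflection_id
    return headers
```

Passing a plain dict instead of os.environ keeps the helper easy to check without touching real credentials.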

3.2 Task / assignee context

Reviews (post) — template used by the pipeline:
POST /chalk/task/v2/assignees/{receiver_id}/tasks/{task_id}/reviews
- Note: A GET to some review endpoints can return 405; treat that as “no existing review,” not an error.

3.3 Scoring (write)

Scores (array) — canonical current endpoint:
POST /vnas/klass/items/{item_id}/scores/sync?async=true&from_task=true
- Array payload per student (built by the pipeline).
- Item ⇄ Task coupling: item_id must match the task we’re grading.

3.4 Scoring (verify)

Scores (readback) — verification endpoint:
GET /vnas/common/items/{item_id}/scores?paginated=0&type=item_score
Pipeline verifies presence and values (owner/task coupling where available).
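
The readback check can be sketched as a pure comparison between what was written and what the verify endpoint returns. The record field names ("owner_id", "score") are hypothetical placeholders; the real response shape should be confirmed from a live capture:

```python
def verify_scores(expected, records, owner_key="owner_id", score_key="score"):
    """Compare expected {owner: score} against readback records.

    owner_key and score_key are hypothetical field names, not confirmed
    against the actual API response.
    Returns {owner: "ok" | "missing" | "mismatch"}.
    """
    seen = {str(r[owner_key]): r.get(score_key) for r in records if owner_key in r}
    result = {}
    for owner, want in expected.items():
        got = seen.get(str(owner))
        if got is None:
            result[owner] = "missing"
        elif float(got) == float(want):
            result[owner] = "ok"
        else:
            result[owner] = "mismatch"
    return result
```

A "missing" result corresponds to the verify_miss refresh signal; a "mismatch" would need a policy decision before any rewrite.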

3.5 Error meanings we treat specially

401/403 — token/role/reflection scope issue ⇒ re‑auth or correct headers.
404 (score) — often stale or mismatched item_id ⇒ re‑resolve item_id.
405 (review GET) — “no review route” used as signal for no existing reviews.
422 (score) — format/value conflict or duplicate semantics:
- Pipeline will retry once after item_id re‑resolve when heuristics indicate mismatch.
- Existing valid scores are detected; pipeline logs [OK*existing] and won’t clobber.
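
The status handling above can be read as a small decision table. This is a sketch only: the call labels ("score", "review_get") and the returned action names are hypothetical, and the mismatch heuristics for 422 are not modelled:

```python
def next_action(call, status):
    """Map (call kind, HTTP status) to the reaction described above.

    `call` is a hypothetical label such as "score" or "review_get".
    """
    if status in (401, 403):
        return "reauth"  # token, role, or reflection scope issue
    if call == "review_get" and status == 405:
        return "no_existing_review"  # not an error
    if call == "score" and status in (404, 422):
        return "reresolve_item_id_and_retry_once"
    if 200 <= status < 300:
        return "ok"
    return "fail"
```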

4) `agrader.sh` logic (installer/runner)

Idempotent steps (safe to re‑run):

Install deps: Python 3 + venv, jq, OCR stack (Tesseract + poppler + ghostscript).
Create /opt/agrader and write:
- requirements.txt, main.py, prompt.txt (if missing), .env scaffolding.
Patch .env defaults (add keys only if missing; preserve your values).
Create venv and pip install -r requirements.txt.
Linux: write systemd unit agrader.service; enable+start. macOS: create LaunchAgent plist; load on demand.
Prompt for:
- Task IDs (comma‑separated or pasted URLs).
- Mode: full‑score to all vs. normal workflow (AI flow to be added next).
Start and tail logs.
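
The "add keys only if missing" rule for .env is what makes re-running safe. The installer itself is a shell script; the following Python sketch only illustrates the rule, and patch_env is a hypothetical name:

```python
def patch_env(existing_text, defaults):
    """Append KEY=VALUE lines for keys not already set; never touch existing ones."""
    present = set()
    for line in existing_text.splitlines():
        stripped = line.strip()
        # Ignore blank lines and comments when collecting existing keys.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            present.add(stripped.split("=", 1)[0].strip())
    missing = [f"{key}={value}" for key, value in defaults.items() if key not in present]
    if not missing:
        return existing_text
    # Make sure appended keys start on their own line.
    sep = "" if (not existing_text or existing_text.endswith("\n")) else "\n"
    return existing_text + sep + "\n".join(missing) + "\n"
```

Applying the function twice with the same defaults returns the same text, which is the idempotence property the installer relies on.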

Signals / safeguards included in this version:

Item ID Resolver + TTL cache
- Cache key: <TASK_ID> → <ITEM_ID>, recorded with timestamp.
- Refresh on signals (configurable): score_404, score_422, verify_miss, ttl.
- First error: re‑resolve + retry once.
- Successful resolve is logged as “[ITEM] task -> item_id= (validated)”.
Verification‑after‑write (toggleable).
Respect existing scores (no overwrite unless policy changes later).
Stop criteria: end the daemon once the requested work is complete (e.g., after both reviews and scores are confirmed).
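
One way to picture the resolver cache: a task-to-item map with timestamps, refreshed either when the TTL lapses or when a configured signal fires. The class and method names below are hypothetical, and the resolve callable stands in for whatever lookup main.py actually performs:

```python
import time


class ItemIdCache:
    """TTL cache for task_id -> item_id with signal-driven refresh (sketch)."""

    def __init__(self, resolve, ttl=900,
                 refresh_on=("score_404", "score_422", "verify_miss", "ttl"),
                 clock=time.monotonic):
        self._resolve = resolve          # hypothetical callable: task_id -> item_id
        self._ttl = ttl
        self._refresh_on = set(refresh_on)
        self._clock = clock
        self._entries = {}               # task_id -> (item_id, stored_at)

    def get(self, task_id):
        entry = self._entries.get(task_id)
        if entry is not None:
            item_id, stored_at = entry
            expired = self._clock() - stored_at >= self._ttl
            # Expiry only forces a refresh when "ttl" is a configured signal.
            if not (expired and "ttl" in self._refresh_on):
                return item_id
        return self._store(task_id)

    def signal(self, task_id, name):
        """Re-resolve when `name` is a configured signal; return the current item_id."""
        if name in self._refresh_on:
            return self._store(task_id)
        return self.get(task_id)

    def _store(self, task_id):
        item_id = self._resolve(task_id)
        self._entries[task_id] = (item_id, self._clock())
        return item_id
```

The caller remains responsible for the "retry once" rule: call signal() after the first failure, retry with the returned item_id, and give up if the retry fails too.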

5) Install

5.1 Linux (Debian/Ubuntu, RHEL/Fedora, Alpine)

```bash
# As root (or with sudo):
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/agrader.sh)
```

The script installs packages, creates /opt/agrader, and configures systemd:
- Service name: agrader.service
- Logs: journalctl -u agrader.service -f

5.2 macOS (Homebrew)

```bash
# In Terminal as your user (admin recommended):
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/agrader.sh)
```

It will use Homebrew Python and tools, create the app dir, and set a LaunchAgent.
Logs: ~/Library/Logs/agrader.log (or run foreground to observe).

6) Environment variables (definitive reference)

The installer adds defaults only if not present. Your values are preserved.
Never place secrets in scripts; keep them in .env. Use placeholders like <TASK_ID> in docs.

6.1 Core runtime

RUN_MODE — watch (daemon) | oneshot (single pass).
Default: watch
STOP_CRITERIA — e.g., score_and_review.
Default: score_and_review
POLL_INTERVAL — seconds between cycles in watch mode.
(Prompted on first install; default 10s if omitted.)
MONITOR_TASK_IDS — comma‑separated list of task IDs.
(Prompted; accepts pasted URLs and extracts IDs.)
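
Extracting IDs from a mixed list of bare IDs and pasted URLs might look like the sketch below. It assumes task IDs are numeric and that a pasted URL contains a "/tasks/<id>" segment; both are assumptions, and the example URL is a placeholder, not a real Seiue address:

```python
import re

# Assumption: pasted URLs contain ".../tasks/<digits>...".
_TASK_IN_URL = re.compile(r"/tasks/(\d+)")


def parse_task_ids(raw):
    """Return unique task IDs, in input order, from comma/space separated input."""
    ids = []
    for part in re.split(r"[,\s]+", raw.strip()):
        if not part:
            continue
        if part.isdigit():
            task_id = part
        else:
            match = _TASK_IN_URL.search(part)
            if match is None:
                continue  # not an ID and not a recognisable URL: skip it
            task_id = match.group(1)
        if task_id not in ids:
            ids.append(task_id)
    return ids


print(parse_task_ids("123, https://example.invalid/x/tasks/456/detail,123"))
```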

6.2 Seiue headers / auth

SEIUE_BASE — API base, e.g., https://api.seiue.com.
SEIUE_SCHOOL_ID — required header for school scope.
SEIUE_ROLE — usually teacher.
SEIUE_REFLECTION_ID — operator reflection context (owner) if known.
SEIUE_BEARER — Bearer token (JWT).

Either provide bearer+reflection directly, or integrate a login/authorize flow (future module).
On 401/403, re‑capture to confirm actual current headers.

6.3 Reviews and scoring endpoints (templates)

SEIUE_REVIEW_POST_TEMPLATE (default added)
"/chalk/task/v2/assignees/{receiver_id}/tasks/{task_id}/reviews"
SEIUE_SCORE_ENDPOINTS (default added; array API)
"POST:/vnas/klass/items/{item_id}/scores/sync?async=true&from_task=true:array"
SEIUE_VERIFY_SCORE_GET_TEMPLATE (default added)
"/vnas/common/items/{item_id}/scores?paginated=0&type=item_score"

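The score endpoint value packs a method, a path template, and a payload shape into one colon-separated string. A minimal sketch of unpacking a single entry and filling the template; the function names are hypothetical, and how multiple entries in SEIUE_SCORE_ENDPOINTS are separated is not specified here, so this handles one entry only:

```python
def parse_score_endpoint(spec):
    """Split "METHOD:/path?query:shape" into (method, path_template, shape)."""
    # The path holds no colons in the default value, so split once from each end.
    method, rest = spec.split(":", 1)
    path_template, shape = rest.rsplit(":", 1)
    return method.upper(), path_template, shape


def render(path_template, **ids):
    """Fill {item_id}, {task_id}, {receiver_id} style placeholders."""
    return path_template.format(**ids)


method, template, shape = parse_score_endpoint(
    "POST:/vnas/klass/items/{item_id}/scores/sync?async=true&from_task=true:array"
)
print(method, render(template, item_id="<ITEM_ID>"), shape)
```
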
6.4 Scoring strategy / cache

ITEM_ID_REFRESH_ON — CSV of signals: score_404,score_422,verify_miss,ttl
Default: score_404,score_422,verify_miss,ttl
ITEM_ID_CACHE_TTL — seconds to trust cached item_id before a re‑resolve.
Default: 900
MAX_SCORE_CACHE_TTL — seconds to keep short‑term score memoization.
Default: 600
SCORE_CLAMP_ON_MAX — 1 to clamp over‑max to task max.
Default: 1
VERIFY_AFTER_WRITE — 1 to readback after posting scores.
Default: 1
REVERIFY_BEFORE_WRITE — 1 to skip writing if already correct.
Default: 1
RETRY_ON_422_ONCE — 1 to re‑resolve item_id and retry once on 422.
Default: 1
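
How the clamp and "respect existing scores" rules combine can be sketched as one decision function. The name plan_write and the action labels are hypothetical, and the behaviour with clamping disabled (pass the value through unchanged) is an assumption:

```python
def plan_write(new_score, max_score, existing_score=None, clamp_on_max=True):
    """Return (action, score) for one student.

    action is "skip_existing" when a score is already recorded,
    otherwise "write" with the (possibly clamped) score.
    """
    if existing_score is not None:
        # Existing valid scores are respected, never overwritten.
        return ("skip_existing", existing_score)
    if clamp_on_max and new_score > max_score:
        new_score = max_score
    return ("write", new_score)
```

For example, plan_write(120, 100) yields a write of 100, while plan_write(80, 100, existing_score=95) leaves the recorded 95 untouched.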

6.5 Full‑score mode (today’s run)

FULL_SCORE_MODE — all | off.
Default: off (installer can set all on prompt)
FULL_SCORE_COMMENT — review text used when FULL_SCORE_MODE=all.
Default: 記得看高考真題。 ("Remember to go over past gaokao exam papers.")

6.6 AI (for upcoming “review by model”)

AI_PROVIDER — gemini | deepseek.
AI_KEY_STRATEGY — roundrobin (rotate keys on 429/5xx). Default: roundrobin
GEMINI_API_KEYS — comma‑separated (optional for now).
DEEPSEEK_API_KEYS — comma‑separated (optional for now).
GEMINI_MODEL — e.g. gemini-2.5-pro.
DEEPSEEK_MODEL — e.g. deepseek-reasoner.
AI_PARALLEL — concurrent calls. Default: 1
AI_MAX_RETRIES / AI_BACKOFF_BASE_SECONDS / AI_JITTER_SECONDS — retry policy.
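
Since the AI path is not implemented yet, the following is only a sketch of what round-robin key rotation with backoff could look like. The names are hypothetical, and exponential growth of the delay is an assumption; the variables above only name a base and a jitter:

```python
import itertools
import random


def should_rotate(status):
    """Rotate keys on 429 or any 5xx, per the roundrobin strategy."""
    return status == 429 or 500 <= status < 600


def backoff_delays(max_retries, base_seconds, jitter_seconds, rng=random.random):
    """Yield one delay per retry: base * 2**attempt plus random jitter (assumed policy)."""
    for attempt in range(max_retries):
        yield base_seconds * (2 ** attempt) + rng() * jitter_seconds


class KeyRing:
    """Cycle through API keys in order, wrapping around at the end."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        self.current = next(self._cycle)

    def rotate(self):
        self.current = next(self._cycle)
        return self.current
```

Injecting rng makes the delays deterministic in tests; in production the default random.random spreads retries so parallel workers do not hit the provider in lockstep.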

6.7 Concurrency & attachments

STUDENT_WORKERS — per‑task concurrency over students. Default: 1
ATTACH_WORKERS — number of parallel attachment processors. Default: 3
MAX_ATTACHMENT_BYTES — upper bound accepted per attachment. Default: 25165824 (24 MiB)
OCR_LANG — Tesseract languages. Default: chi_sim+eng
ENABLE_PDF_OCR_FALLBACK / MAX_PDF_OCR_PAGES / MAX_PDF_OCR_SECONDS — optional knobs (if present).
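
A rough sketch of how the size cap and worker count might interact: oversized attachments are skipped up front, and the rest are processed in a small thread pool. process_attachments and the extract_text callable are hypothetical stand-ins for the real extraction/OCR step:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ATTACHMENT_BYTES = 25165824  # documented default (24 MiB)
ATTACH_WORKERS = 3               # documented default


def process_attachments(attachments, extract_text,
                        max_bytes=MAX_ATTACHMENT_BYTES, workers=ATTACH_WORKERS):
    """attachments: list of (name, bytes). Returns ({name: text}, [skipped names])."""
    accepted = [(name, data) for name, data in attachments if len(data) <= max_bytes]
    skipped = [name for name, data in attachments if len(data) > max_bytes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so names and texts stay aligned.
        texts = list(pool.map(lambda item: extract_text(item[1]), accepted))
    return dict(zip((name for name, _ in accepted), texts)), skipped
```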

6.8 Logging

LOG_FORMAT — default provided.
LOG_DATEFMT — default provided.
LOG_FILE — default ${APP_DIR}/agrader.log.
LOG_LEVEL — INFO recommended.

6.9 Optional Telegram

TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID — if you want status summaries via Telegram.
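
For reference, a status message goes out through the Telegram Bot API sendMessage method. The sketch below only builds the request so it can be inspected without network access; build_telegram_request is a hypothetical helper, not necessarily how main.py sends notifications:

```python
import json
import urllib.request


def build_telegram_request(token, chat_id, text):
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the request to urllib.request.urlopen with a timeout, using the values of TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID.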

7) Operating the service

7.1 Set tasks and mode

```bash
# Re-run installer to (re)prompt, or edit /opt/agrader/.env directly:
sudo nano /opt/agrader/.env
# Example (placeholders only):
MONITOR_TASK_IDS=<TASK_ID_1>,<TASK_ID_2>
FULL_SCORE_MODE=all
FULL_SCORE_COMMENT=Remember to review the sample answers.
```

7.2 Start / stop / logs (Linux)

```bash
sudo systemctl restart agrader.service
journalctl -u agrader.service -f
# graceful stop
sudo systemctl stop agrader.service
# resume
sudo systemctl start agrader.service
```

7.3 Expected log lines (today’s run)

[ITEM] task <id> -> item_id=<id> (validated)
[API] Review posted for rid=<RECEIVER_ID> task=<TASK_ID>
[FULL][DONE][TASK <id>] n/N rid=<RID> name=<Name> score=<S> status=ok
[FULL][OK*existing] ... (already exists: <S>)
[SUMMARY][TASK <id>] ✅ ... FULL-SCORED ...
[EXEC] All tasks complete. Shutting down.

8) Troubleshooting & pitfalls

item_id mismatch → 404/422
- Resolver auto re‑runs and retries once. If still failing, re‑capture task/item flows to confirm the mapping changed.
Existing scores
- Pipeline does not overwrite valid existing scores; logs [OK*existing].
405 on review GET
- Not an error; treat as “no existing review.”
401/403
- Check SEIUE_BEARER, SEIUE_SCHOOL_ID, SEIUE_ROLE, SEIUE_REFLECTION_ID.
- Tokens can expire or change scopes; re‑login and re‑capture headers.
Rate limits / retries
- Keep AI_MAX_RETRIES conservative; the AI path is not in use yet for scoring.
- Network hiccups: the daemon re‑tries per policy.
OCR chain cost
- Enable OCR fallback only if necessary; it’s expensive. Prefer exact digital text paths when possible.

9) Roadmap / next steps

AI review mode:
- Generate rubric‑consistent review text via model (Gemini/DeepSeek) with key‑rotation and backoff.
- Decide overwrite policy for existing reviews (probably append while keeping originals).
Granular scoring:
- Move from full‑score mode to computed scores with clamp against task max.
- Introduce per‑task policy in .env (e.g., FULL_SCORE_MODE=off, AI_SCORE_MODE=formula).
Better observability:
- Per‑task JSON report (success/failure/retry) under /opt/agrader/work/.

10) Quick command crib sheet

```bash
# Install / reconfigure
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/agrader.sh)

# Edit env
sudo nano /opt/agrader/.env

# Restart service and watch logs (Linux)
sudo systemctl restart agrader.service
journalctl -u agrader.service -f

# One-shot run (set in .env)
RUN_MODE=oneshot
STOP_CRITERIA=score_and_review
```

Maintainer’s note (2025‑11‑02):
This version successfully executed a full‑score cycle end‑to‑end. Next milestone is AI‑assisted review, built on the same verified API surfaces above.