Can AI grade homework?
AGrader (Seiue Auto‑Grader)
Version: v0.3
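Of course it can, but so far only inside the forum. After all, the main reason I chose Discourse in the first place was its AI integration. The problem is that, this far into the term, some students are still more comfortable with the school's Seiue (希悦).
So the whole weekend went into this. The task was clear: on any VPS, with a single script, take any submission (or repeated submissions) that students make in Seiue, extract the text and attachments, hand them to an AI for a structured response, and then write the AI's score and comments back to Seiue. Everything the AI reviews, and every result, is pushed to Telegram in real time. From the attendance project to the notification project to this AI homework project, after this round of packet capture the overall structure of today's Seiue is burned into my brain whether I wanted it there or not...
Can an academic-administration platform do teaching? Of course not.
At its core this platform has no real promise. From the first time I met LCH until now, I have never changed that view.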
1) What this is / current status
AGrader is an API‑first auto‑grading pipeline for Seiue/Chalk (希悦). It consists of:
- A drop‑in shell installer/runner agrader.sh that:
  - Prepares system deps and a Python venv in /opt/agrader.
  - Writes configs and a systemd service on Linux (or LaunchAgent on macOS).
  - Orchestrates the Python runtime (main.py) and environment.
- A Python core (main.py) that:
  - Resolves the item_id for a Seiue task.
  - Optionally posts a review per student.
  - Posts scores via the official item scoring API.
  - Verifies written scores.
Today’s result (verified logs): Full‑score mode completed 38/38 for a live task, with item ID validated upfront and each review+score written then verified. An existing score was detected for one student and respected (no overwrite).
2) Architecture at a glance
+--------------------------+        +-------------------------+
|  agrader.sh (installer)  | -----> | /opt/agrader structure  |
+--------------------------+        +-------------------------+
             |                                  |
             v                                  v
  System deps, venv,                main.py + requirements.txt
  env file, service                 ├─ item_id resolver (TTL + refresh signals)
  orchestration                     ├─ review poster
                                    ├─ score poster (array endpoint)
                                    └─ verifier (readback)
- Modes: watch (daemon) or oneshot (run and exit).
- Stop criteria: e.g., score_and_review (stop when both done).
- Safety: Verification after write; item_id cache with TTL and auto re‑resolve on error signals.
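The two modes and the stop criterion can be sketched as a small loop. This is a minimal illustration, not the code in main.py: the `cycle` callable, the `run` and `stop_met` names, and the boolean progress flags are all hypothetical.

```python
import time
from typing import Callable


def stop_met(criteria: str, reviewed: bool, scored: bool) -> bool:
    """Return True when the named stop criterion is satisfied.

    Only `score_and_review` is named in this document; any other value
    is rejected here rather than guessed at.
    """
    if criteria == "score_and_review":
        return reviewed and scored
    raise ValueError(f"unknown stop criteria: {criteria!r}")


def run(mode: str, cycle: Callable[[], bool], poll_interval: float = 10.0,
        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run grading cycles and return how many cycles were executed.

    `cycle` (hypothetical) performs one pass over the monitored tasks and
    returns True once the stop criteria are met for all of them.
    """
    cycles = 0
    while True:
        done = cycle()
        cycles += 1
        # oneshot: exactly one pass. watch: keep polling until work is done.
        if mode == "oneshot" or done:
            return cycles
        sleep(poll_interval)
```

Injecting `sleep` keeps the loop testable without real waiting.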
3) Seiue API framework (what we rely on)
All endpoints below are documented as templates. Do not invent IDs. When needed, reconfirm with live capture.
3.1 Auth surface
- Login/authorize (if needed):
  https://passport.seiue.com/login?school_id=<SCHOOL_ID>
  https://passport.seiue.com/authorize (or .../token)
  Returns bearer token with X-School-Id, X-Role, and sometimes X-Reflection-Id aligned to the current operator context.
  Current pipeline accepts direct SEIUE_BEARER & SEIUE_REFLECTION_ID when available.
3.2 Task / assignee context
- Reviews (post), template used by the pipeline:
  POST /chalk/task/v2/assignees/{receiver_id}/tasks/{task_id}/reviews
  - Note: A GET to some review endpoints can return 405; treat that as “no existing review,” not an error.
3.3 Scoring (write)
- Scores (array), canonical current endpoint:
  POST /vnas/klass/items/{item_id}/scores/sync?async=true&from_task=true
  - Array payload per student (built by the pipeline).
  - Item ⇄ Task coupling: item_id must match the task we’re grading.
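The path templates in 3.2 and 3.3 can be expanded with plain `str.format`, and the `METHOD:path:kind` string that section 6.3 stores in SEIUE_SCORE_ENDPOINTS can be split on its first and last colon. The sketch below assumes a single endpoint entry per string; `ScoreEndpoint` and `parse_score_endpoint` are hypothetical names, and the IDs in the usage note are placeholders.

```python
from typing import NamedTuple

REVIEW_TEMPLATE = "/chalk/task/v2/assignees/{receiver_id}/tasks/{task_id}/reviews"
SCORE_SPEC = "POST:/vnas/klass/items/{item_id}/scores/sync?async=true&from_task=true:array"


class ScoreEndpoint(NamedTuple):
    method: str
    path_template: str
    payload_kind: str


def parse_score_endpoint(spec: str) -> ScoreEndpoint:
    """Split 'METHOD:/path?query:kind' into its three parts.

    ASSUMPTION: the path itself contains no colon, so the first colon ends
    the method and the last colon starts the payload kind.
    """
    method, _, rest = spec.partition(":")
    path, _, kind = rest.rpartition(":")
    if not method or not path or not kind:
        raise ValueError(f"malformed endpoint spec: {spec!r}")
    return ScoreEndpoint(method.upper(), path, kind)


def fill(base: str, template: str, **ids: object) -> str:
    """Join the API base and a path template with the given IDs filled in."""
    return base.rstrip("/") + template.format(**ids)
```

For example, `fill("https://api.seiue.com", REVIEW_TEMPLATE, receiver_id=1, task_id=2)` yields the review URL for placeholder IDs 1 and 2. A missing ID raises `KeyError`, which is preferable to silently posting to a half-filled path.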
3.4 Scoring (verify)
- Scores (readback) — verification endpoint:
GET /vnas/common/items/{item_id}/scores?paginated=0&type=item_score
Pipeline verifies presence and values (owner/task coupling where available).
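A readback check of this kind might look like the sketch below. The response shape is an assumption: the `owner_id` and `score` field names are hypothetical and should be confirmed against a live capture before relying on them.

```python
from typing import Dict, Iterable, Mapping


def verify_scores(expected: Mapping[int, float],
                  records: Iterable[Mapping]) -> Dict[int, str]:
    """Compare the scores we posted against readback records.

    ASSUMPTION: each record is a mapping with `owner_id` and `score` keys.
    Returns a dict of owner_id -> problem description; empty means verified.
    """
    seen: Dict[int, float] = {}
    for rec in records:
        if rec.get("score") is None:
            continue  # an empty score slot counts as "not written"
        seen[rec["owner_id"]] = float(rec["score"])

    problems: Dict[int, str] = {}
    for owner_id, score in expected.items():
        if owner_id not in seen:
            problems[owner_id] = "missing"
        elif abs(seen[owner_id] - score) > 1e-9:
            problems[owner_id] = f"mismatch: got {seen[owner_id]}"
    return problems
```

An empty result means every expected score was found with the expected value. A `missing` entry corresponds to the `verify_miss` signal described in section 4.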
3.5 Error meanings we treat specially
- 401/403: token/role/reflection scope issue ⇒ re‑auth or correct headers.
- 404 (score): often stale or mismatched item_id ⇒ re‑resolve item_id.
- 405 (review GET): “no review route” used as signal for no existing reviews.
- 422 (score): format/value conflict or duplicate semantics:
  - Pipeline will retry once after item_id re‑resolve when heuristics indicate mismatch.
  - Existing valid scores are detected; pipeline logs [OK*existing] and won’t clobber.
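The status handling above can be summarized as a small dispatch plus a retry-once wrapper. This is a simplified sketch: the action labels and the `post`/`resolve` callables are hypothetical, and the mismatch heuristics that gate the 422 retry are not modeled.

```python
from typing import Callable

RERESOLVE = "reresolve_item_id_and_retry_once"


def classify(status: int, op: str) -> str:
    """Map an HTTP status for `op` ("score" or "review_get") to an action."""
    if status in (401, 403):
        return "reauth"
    if op == "review_get" and status == 405:
        return "no_existing_review"
    if op == "score" and status in (404, 422):
        return RERESOLVE
    if 200 <= status < 300:
        return "ok"
    return "fail"


def post_score_with_retry(post: Callable[[int], int],
                          resolve: Callable[..., int],
                          task_id: int) -> int:
    """Post once; on 404/422, force a fresh item_id and retry exactly once.

    `post(item_id)` returns an HTTP status and `resolve(task_id, force=...)`
    returns an item_id. Both are stand-ins for the real calls.
    """
    status = post(resolve(task_id, force=False))
    if classify(status, "score") == RERESOLVE:
        status = post(resolve(task_id, force=True))
    return status
```

Capping the retry at one attempt keeps a persistently wrong mapping from turning into a request loop.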
4) agrader.sh logic (installer/runner)
Idempotent steps (safe to re‑run):
- Install deps: Python 3 + venv, jq, OCR stack (Tesseract + poppler + ghostscript).
- Create /opt/agrader and write: requirements.txt, main.py, prompt.txt (if missing), .env scaffolding.
- Patch .env defaults (add keys only if missing; preserve your values).
- Create venv and pip install -r requirements.txt.
- Linux: write systemd unit agrader.service; enable+start. macOS: create LaunchAgent plist; load on demand.
- Prompt for:
  - Task IDs (comma‑separated or pasted URLs).
  - Mode: full‑score to all vs. normal workflow (AI flow to be added next).
- Start and tail logs.
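The "add keys only if missing" rule is what makes the .env step safe to re-run. The installer does this in shell; the Python sketch below only illustrates the idea, and `patch_env_defaults` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List


def patch_env_defaults(path: Path, defaults: Dict[str, str]) -> List[str]:
    """Append KEY=VALUE for keys not already present; return the added keys.

    Existing lines are never rewritten, so user-set values survive re-runs.
    """
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    present = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            present.add(line.split("=", 1)[0].strip())

    added = [key for key in defaults if key not in present]
    if added:
        with path.open("a", encoding="utf-8") as fh:
            if text and not text.endswith("\n"):
                fh.write("\n")  # keep the first new key on its own line
            for key in added:
                fh.write(f"{key}={defaults[key]}\n")
    return added
```

Running it twice with the same defaults adds nothing the second time.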
Signals / safeguards included in this version:
- Item ID Resolver + TTL cache
  - Cache key: <TASK_ID> → <ITEM_ID>, recorded with timestamp.
  - Refresh on signals (configurable): score_404, score_422, verify_miss, ttl.
  - First error: re‑resolve + retry once.
  - Successful resolve is logged as “[ITEM] task <id> -> item_id=<id> (validated)”
- Verification‑after‑write (toggleable).
- Respect existing scores (no overwrite unless policy changes later).
- Stop criteria: end the daemon once the requested work is complete (e.g., after both reviews and scores are confirmed).
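A TTL cache with signal-driven refresh, as described for the item ID resolver, could be structured roughly as follows. The class name, the injected `resolve` callable, and the use of integer IDs are assumptions for illustration; the actual resolver in main.py may differ.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class ItemIdCache:
    """Cache task_id -> (item_id, resolved_at) and refresh on TTL or signals."""

    def __init__(self, resolve: Callable[[int], int], ttl: float = 900.0,
                 refresh_on: Iterable[str] = ("score_404", "score_422",
                                              "verify_miss", "ttl"),
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve          # hypothetical task -> item lookup
        self._ttl = ttl
        self._refresh_on = set(refresh_on)
        self._clock = clock
        self._entries: Dict[int, Tuple[int, float]] = {}

    def get(self, task_id: int, signal: Optional[str] = None) -> int:
        """Return the item_id, re-resolving if stale or if `signal` demands it."""
        entry = self._entries.get(task_id)
        now = self._clock()
        expired = (entry is not None and "ttl" in self._refresh_on
                   and now - entry[1] >= self._ttl)
        forced = signal is not None and signal in self._refresh_on
        if entry is None or expired or forced:
            item_id = self._resolve(task_id)
            self._entries[task_id] = (item_id, now)
            return item_id
        return entry[0]
```

A caller that has just seen a 404 on a score write would call `cache.get(task_id, signal="score_404")` and retry once with the returned value. Signals that are not in the configured set leave the cached entry untouched.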
5) Install
5.1 Linux (Debian/Ubuntu, RHEL/Fedora, Alpine)
# As root (or with sudo):
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/agrader.sh)
- The script installs packages, creates /opt/agrader, and configures systemd:
  - Service name: agrader.service
  - Logs: journalctl -u agrader.service -f
5.2 macOS (Homebrew)
# In Terminal as your user (admin recommended):
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/agrader.sh)
- It will use Homebrew Python and tools, create the app dir, and set a LaunchAgent.
- Logs: ~/Library/Logs/agrader.log (or run foreground to observe).
6) Environment variables (definitive reference)
The installer adds defaults only if not present. Your values are preserved.
Never place secrets in scripts; keep them in .env. Use placeholders like <TASK_ID> in docs.
6.1 Core runtime
- RUN_MODE: watch (daemon) | oneshot (single pass).
  Default: watch
- STOP_CRITERIA: e.g., score_and_review.
  Default: score_and_review
- POLL_INTERVAL: seconds between cycles in watch mode.
  (Prompted on first install; default 10s if omitted.)
- MONITOR_TASK_IDS: comma‑separated list of task IDs.
  (Prompted; accepts pasted URLs and extracts IDs.)
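Accepting either bare IDs or pasted URLs for MONITOR_TASK_IDS can be handled with a small parser. The sketch below rests on two assumptions that are not confirmed by this document: task IDs are numeric, and a pasted URL contains the ID as `/tasks/<digits>`. Adjust the pattern to whatever the real URLs look like.

```python
import re
from typing import List

# ASSUMPTION: pasted task URLs contain ".../tasks/<digits>...".
_TASK_IN_URL = re.compile(r"/tasks/(\d+)")


def parse_task_ids(raw: str) -> List[str]:
    """Extract unique task IDs, in input order, from IDs and/or URLs."""
    ids: List[str] = []
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue
        if token.isdigit():
            task_id = token
        else:
            match = _TASK_IN_URL.search(token)
            if match is None:
                continue  # not an ID and not a recognizable URL: ignore
            task_id = match.group(1)
        if task_id not in ids:
            ids.append(task_id)
    return ids
```

Tokens that match neither form are dropped silently here. A stricter variant could raise instead so that a typo during the install prompt is noticed immediately.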
6.2 Seiue headers / auth
- SEIUE_BASE: API base, e.g., https://api.seiue.com.
- SEIUE_SCHOOL_ID: required header for school scope.
- SEIUE_ROLE: usually teacher.
- SEIUE_REFLECTION_ID: operator reflection context (owner) if known.
- SEIUE_BEARER: Bearer token (JWT).
Either provide bearer+reflection directly, or integrate a login/authorize flow (future module).
On 401/403, re‑capture to confirm actual current headers.
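Turning these variables into request headers might look like the following sketch. The X-School-Id, X-Role and X-Reflection-Id header names come from section 3.1; the `build_headers` helper itself, the `Accept` header, and the fallback to `teacher` when SEIUE_ROLE is unset are assumptions.

```python
import os
from typing import Dict, Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build Seiue request headers from environment-style settings."""
    bearer = env.get("SEIUE_BEARER", "").strip()
    school = env.get("SEIUE_SCHOOL_ID", "").strip()
    if not bearer or not school:
        raise ValueError("SEIUE_BEARER and SEIUE_SCHOOL_ID must be set")

    headers = {
        "Authorization": f"Bearer {bearer}",
        "X-School-Id": school,
        "X-Role": env.get("SEIUE_ROLE", "").strip() or "teacher",
        "Accept": "application/json",
    }
    # The reflection ID is only sent when known (see 3.1).
    reflection = env.get("SEIUE_REFLECTION_ID", "").strip()
    if reflection:
        headers["X-Reflection-Id"] = reflection
    return headers
```

Failing fast on a missing token or school ID gives a clearer error than the 401/403 the API would otherwise return.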
6.3 Reviews and scoring endpoints (templates)
- SEIUE_REVIEW_POST_TEMPLATE (default added)
  "/chalk/task/v2/assignees/{receiver_id}/tasks/{task_id}/reviews"
- SEIUE_SCORE_ENDPOINTS (default added; array API)
  "POST:/vnas/klass/items/{item_id}/scores/sync?async=true&from_task=true:array"
- SEIUE_VERIFY_SCORE_GET_TEMPLATE (default added)
  "/vnas/common/items/{item_id}/scores?paginated=0&type=item_score"
6.4 Scoring strategy / cache
- ITEM_ID_REFRESH_ON: CSV of signals: score_404,score_422,verify_miss,ttl
  Default: score_404,score_422,verify_miss,ttl
- ITEM_ID_CACHE_TTL: seconds to trust cached item_id before a re‑resolve.
  Default: 900
- MAX_SCORE_CACHE_TTL: seconds to keep short‑term score memoization.
  Default: 600
- SCORE_CLAMP_ON_MAX: 1 to clamp over‑max to task max.
  Default: 1
- VERIFY_AFTER_WRITE: 1 to readback after posting scores.
  Default: 1
- REVERIFY_BEFORE_WRITE: 1 to skip writing if already correct.
  Default: 1
- RETRY_ON_422_ONCE: 1 to re‑resolve item_id and retry once on 422.
  Default: 1
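How the clamp and pre-write check might combine into a single decision is sketched below. The function name and the action labels are hypothetical, and what happens with SCORE_CLAMP_ON_MAX=0 is not specified in this document, so the sketch simply passes the value through in that case.

```python
from typing import Optional, Tuple


def plan_write(score: float, max_score: float, existing: Optional[float],
               clamp_on_max: bool = True) -> Tuple[str, float]:
    """Decide whether to write a score and with what value.

    `existing` is the score found by a pre-write readback
    (REVERIFY_BEFORE_WRITE=1), or None if nothing is recorded yet.
    """
    if clamp_on_max and score > max_score:
        score = max_score  # SCORE_CLAMP_ON_MAX=1: never exceed the task max
    if existing is not None:
        # Current policy: existing scores are respected, never overwritten.
        return ("skip_existing", existing)
    return ("write", score)
```

For example, `plan_write(120, 100, None)` asks for a write of 100, while `plan_write(80, 100, 100.0)` leaves the recorded 100 alone, which corresponds to the [OK*existing] log line.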
6.5 Full‑score mode (today’s run)
- FULL_SCORE_MODE: all | off.
  Default: off (installer can set all on prompt)
- FULL_SCORE_COMMENT: review text used when FULL_SCORE_MODE=all.
  Default: 記得看高考真題。 (“Remember to go over the real gaokao exam papers.”)
6.6 AI (for upcoming “review by model”)
- AI_PROVIDER: gemini | deepseek.
- AI_KEY_STRATEGY: roundrobin (rotate keys on 429/5xx). Default: roundrobin
- GEMINI_API_KEYS: comma‑separated (optional for now).
- DEEPSEEK_API_KEYS: comma‑separated (optional for now).
- GEMINI_MODEL: e.g. gemini-2.5-pro.
- DEEPSEEK_MODEL: e.g. deepseek-reasoner.
- AI_PARALLEL: concurrent calls. Default: 1
- AI_MAX_RETRIES / AI_BACKOFF_BASE_SECONDS / AI_JITTER_SECONDS: retry policy.
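Round-robin key rotation with backoff could be wired up along these lines. Since the AI path is not implemented yet, everything here is a sketch: `call(key)` stands in for a provider request and returns `(status, body)`, and the exponential form of the backoff is an assumption, as the document only names a base and a jitter.

```python
import itertools
import random
import time
from typing import Callable, Sequence, Tuple


def backoff_delay(attempt: int, base: float, jitter: float,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff (assumed): base * 2**attempt plus random jitter."""
    return base * (2 ** attempt) + rng() * jitter


def call_with_rotation(call: Callable[[str], Tuple[int, str]],
                       keys: Sequence[str], max_retries: int,
                       base: float = 1.0, jitter: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[int, str]:
    """Call the provider, rotating to the next key on 429 or 5xx."""
    if not keys:
        raise ValueError("at least one API key is required")
    ring = itertools.cycle(keys)
    key = next(ring)
    for attempt in range(max_retries + 1):
        status, body = call(key)
        if status != 429 and status < 500:
            return status, body  # success or a non-retryable error
        if attempt < max_retries:
            key = next(ring)
            sleep(backoff_delay(attempt, base, jitter))
    return status, body
```

With AI_MAX_RETRIES kept small, a bad batch of keys fails quickly instead of stalling the grading loop.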
6.7 Concurrency & attachments
- STUDENT_WORKERS: per‑task concurrency over students. Default: 1
- ATTACH_WORKERS: number of parallel attachment processors. Default: 3
- MAX_ATTACHMENT_BYTES: upper bound accepted per attachment. Default: 25165824 (24 MiB)
- OCR_LANG: Tesseract languages. Default: chi_sim+eng
- ENABLE_PDF_OCR_FALLBACK / MAX_PDF_OCR_PAGES / MAX_PDF_OCR_SECONDS: optional knobs (if present).
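The size cap and the attachment worker pool can be combined as in this sketch. The `(name, size_bytes)` tuple shape and the `handler` callable are hypothetical; the real pipeline's attachment objects are not described in this document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Attachment = Tuple[str, int]  # (name, size_bytes): assumed shape


def process_attachments(attachments: Sequence[Attachment],
                        handler: Callable[[Attachment], str],
                        max_bytes: int = 25165824,
                        workers: int = 3) -> Tuple[List[str], List[str]]:
    """Run `handler` over attachments within the size cap, in parallel.

    Returns (results in input order, names skipped for being too large).
    """
    accepted = [a for a in attachments if a[1] <= max_bytes]
    skipped = [a[0] for a in attachments if a[1] > max_bytes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order regardless of completion order.
        results = list(pool.map(handler, accepted))
    return results, skipped
```

Threads suit this step when the handler mostly waits on downloads or external OCR processes.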
6.8 Logging
- LOG_FORMAT: default provided.
- LOG_DATEFMT: default provided.
- LOG_FILE: default ${APP_DIR}/agrader.log.
- LOG_LEVEL: INFO recommended.
6.9 Optional Telegram
TELEGRAM_BOT_TOKEN/TELEGRAM_CHAT_ID— if you want status summaries via Telegram.
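A status summary can be sent through the Telegram Bot API `sendMessage` method using only the standard library. The helper names below are hypothetical and this is not necessarily how main.py sends its notifications.

```python
import urllib.parse
import urllib.request

TELEGRAM_MAX_CHARS = 4096  # Bot API limit for a single text message


def build_telegram_request(token: str, chat_id: str,
                           text: str) -> urllib.request.Request:
    """Build (but do not send) a sendMessage request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode(
        {"chat_id": chat_id, "text": text[:TELEGRAM_MAX_CHARS]}
    ).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def send_telegram(token: str, chat_id: str, text: str,
                  timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    req = build_telegram_request(token, chat_id, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending makes the notification path testable without network access.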
7) Operating the service
7.1 Set tasks and mode
# Re-run installer to (re)prompt, or edit /opt/agrader/.env directly:
sudo nano /opt/agrader/.env
# Example (placeholders only):
MONITOR_TASK_IDS=<TASK_ID_1>,<TASK_ID_2>
FULL_SCORE_MODE=all
FULL_SCORE_COMMENT=Remember to review the sample answers.
7.2 Start / stop / logs (Linux)
sudo systemctl restart agrader.service
journalctl -u agrader.service -f
# graceful stop
sudo systemctl stop agrader.service
# resume
sudo systemctl start agrader.service
7.3 Expected log lines (today’s run)
[ITEM] task <id> -> item_id=<id> (validated)
[API] Review posted for rid=<RECEIVER_ID> task=<TASK_ID>
[FULL][DONE][TASK <id>] n/N rid=<RID> name=<Name> score=<S> status=ok
[FULL][OK*existing] ... (already exists: <S>)
[SUMMARY][TASK <id>] ✅ ... FULL-SCORED ...
[EXEC] All tasks complete. Shutting down.
8) Troubleshooting & pitfalls
- item_id mismatch → 404/422
  - Resolver auto re‑runs and retries once. If still failing, re‑capture task/item flows to confirm whether the mapping changed.
- Existing scores
  - Pipeline does not overwrite valid existing scores; logs [OK*existing].
- 405 on review GET
  - Not an error; treat as “no existing review.”
- 401/403
  - Check SEIUE_BEARER, SEIUE_SCHOOL_ID, SEIUE_ROLE, SEIUE_REFLECTION_ID.
  - Tokens can expire or change scopes; re‑login and re‑capture headers.
- Rate limits / retries
  - Keep AI_MAX_RETRIES conservative; the AI path is not in use yet for scoring.
  - Network hiccups: the daemon retries per policy.
- OCR chain cost
- Enable OCR fallback only if necessary; it’s expensive. Prefer exact digital text paths when possible.
9) Roadmap / next steps
- AI review mode:
- Generate rubric‑consistent review text via model (Gemini/DeepSeek) with key‑rotation and backoff.
- Decide overwrite policy for existing reviews (probably append while keeping originals).
- Granular scoring:
- Move from full‑score mode to computed scores with clamp against task max.
  - Introduce per‑task policy in .env (e.g., FULL_SCORE_MODE=off, AI_SCORE_MODE=formula).
- Better observability:
  - Per‑task JSON report (success/failure/retry) under /opt/agrader/work/.
10) Quick command crib sheet
# Install / reconfigure
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/agrader.sh)
# Edit env
sudo nano /opt/agrader/.env
# Restart service and watch logs (Linux)
sudo systemctl restart agrader.service
journalctl -u agrader.service -f
# One-shot run (set in .env)
RUN_MODE=oneshot
STOP_CRITERIA=score_and_review
Maintainer’s note (2025‑11‑02):
This version successfully executed a full‑score cycle end‑to‑end. Next milestone is AI‑assisted review, built on the same verified API surfaces above.