It genuinely made me sick

2025-11-16 11:16

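The score reports in the k12media marking system include the original scans of students' answer sheets. They can be downloaded, but only by clicking them one at a time, which is tedious. So I wanted to scrape them automatically. The result made me sick.
The system is basically a "2008–2012 Java Web stack". To be done with it once and for all, I ended up filling in, the hard way, a whole pit between the old stack and the modern scripting world; I'm off to find a plastic bag from twenty-odd years ago...
One reason I rail against so-called education informatization is exactly this kind of exam-serving "informatization". At bottom, these things are pure garbage... precisely because they are not built to help students learn at all.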

I also wanted the images so I could feed them into NotebookLM, but here is what an actual test on a student's exam report looked like:

Screenshot 2025-11-15 at 18.37.54.png (20251115001)

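Recognition of large areas of handwriting is still a disaster, so the images are really only good for looking over the page and commenting on the handwriting.
Handwritten Chinese characters are a backward thing that will die out. I trust handwriting recognition for Chinese will get solved too, since the legacy material has to be processed somehow. As for the day exams are no longer handwritten, all we can do is keep hoping.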

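The more schools adopt this kind of marking system, these so-called score reports and even so-called AI, the fainter the hope for education naturally becomes, and the bigger and glossier the boasting about education informatization gets.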

k12media exam image scraper manual

This document explains how to download all scanned answer sheet images for an exam from the legacy k12media system, using a small Python script that replays the same HTTP requests your browser sends.

The goal: a new admin, with no prior context, can follow this manual and successfully pull all images for a full grade/year.

1. Big picture

The exam image system is an early-2010s Java Web stack split over two domains:

Main / reports: https://test.k12media.cn
Image server: https://yue.k12media.cn

On the student image report page you see:

A class selector (e.g. “致知3班”, “格物3班”)
A student list on the left
A central carousel showing scanned pages (one student at a time)
Hidden <input> fields storing exam metadata: testId, subjectId, schoolId, testState, etc.

The browser does not load all students at once. For each selected student it:

Submits a form to ShowStudentImgsAction.a?findStudentImgs on test.k12media.cn.
That returns an HTML snippet containing several <img> tags.
Each <img> points to yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&....
The browser then fetches those image URLs and renders them in the carousel.

The Python script simply:

Uses a class → student list API (DWR) to enumerate all students.
For each student, calls the same findStudentImgs endpoint.
Extracts the same <img> src URLs as the browser uses.
Downloads the actual image bytes from yue.k12media.cn.

There is no “thumbnail” API in between. The script downloads exactly the same images the teacher sees in the page.

2. Site architecture & data flow

2.1 Domains & roles

- https://test.k12media.cn
  - Login, main portal, and reports.
  - Hosts the student image report page.
  - Handles:
    - “Which students are in this class?”
    - “Which image files belong to this student for this exam?”
- https://yue.k12media.cn
  - Hosts the actual scanned image files under /tqms_image_server/DemoAction.a?showImg&....
  - Shares the same login session (cookies) with the main site.

You log in once in your browser. The Python script then reuses your browser cookies to talk to both domains.

2.2 Key endpoints

Exact query strings vary, but the structure is stable.

Student image report page (what you open in the browser):
text
```
https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?method=showStudentImgReport&testId=<TEST_ID>&subjectId=<SUBJECT_ID>&schoolId=<SCHOOL_ID>&testState=<STATE>&...
```
On this page you see:
- Top: exam summary
- Left: student list
- Middle: image carousel
- Several hidden <input>s with metadata.

Student list by class (DWR):

text
```
POST https://test.k12media.cn/tqms/dwr/call/plaincall/SelectSchoolUtil.findStudentListByClassId.dwr
```

The response is a JavaScript snippet that contains objects like:

js
```
dwr.engine.remote.handleCallback("1","0",[
  { classId: 91268,
    noInClass: "2721101",
    orgUser: { name:"\u5F20\u4F55..." },
    ... },
  ...
]);
```

The script parses classId, noInClass, and name from this.
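As a rough illustration of that parsing step, the sketch below pulls the three fields out with a regex and decodes the \uXXXX escapes. It assumes the response has the object shape shown above, with classId, noInClass and name appearing in that order. The names parse_students and decode_u_escapes, and the two sample students, are made up for the example; the real script's regex may differ.

```python
import re
from typing import List, Tuple

# Matches the object shape shown above; the field order is an assumption.
STUDENT_RE = re.compile(
    r'classId\s*:\s*(\d+).*?noInClass\s*:\s*"([^"]*)".*?name\s*:\s*"([^"]*)"',
    re.S,
)


def decode_u_escapes(text: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they stand for."""
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)


def parse_students(dwr_text: str) -> List[Tuple[int, str, str]]:
    """Return (class_id, no_in_class, name) for every student object found."""
    return [
        (int(class_id), no_in_class, decode_u_escapes(name))
        for class_id, no_in_class, name in STUDENT_RE.findall(dwr_text)
    ]


# Made-up sample in the same shape as the captured response.
sample = r'''dwr.engine.remote.handleCallback("1","0",[
  { classId: 91268, noInClass: "2721101", orgUser: { name:"\u5F20\u4E09" } },
  { classId: 91268, noInClass: "2721102", orgUser: { name:"\u674E\u56DB" } }
]);'''
print(parse_students(sample))
```

Decoding the escapes with a targeted substitution, rather than a blanket unicode_escape decode, leaves any characters that are already non-ASCII untouched.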

Student image list (findStudentImgs):

In this deployment, the image list is identified by student name + class + type of class, not by student number:
text
```
POST https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?findStudentImgs
```
Form fields include:
- schoolId
- testId
- testState
- studentName
- classId
- isTeacherClass (0 = administrative class, 1 = teaching group)
- subjectId
Response: HTML with several <img> tags, e.g.:
html
```
<img src="/tqms_image_server/DemoAction.a?showImg&imgFliePath=...&imgFileName=...">
```
Actual images:
text
```
GET https://yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&imgFliePath=<...>&imgFileName=<...>
```
Headers:
- Content-Type: image/jpeg (or sometimes image/png)
- Content-Length: ...
These are the full-size page scans, exactly what the carousel shows.

2.3 Front-end behaviour

When you click a student in the left list:

A form is submitted to ShowStudentImgsAction.a?findStudentImgs with:
- studentName = the label you clicked
- classId = current class
- isTeacherClass determined by the page (administrative vs teaching class)
- plus testId, subjectId, schoolId, testState.
The server returns HTML that contains <img src="...DemoAction.a?showImg..."> for that student.
The browser then issues GET requests to yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&... for each page and passes them into a jQuery FlexSlider carousel.

The script does the same thing, but loops over:

every class in a configured list, and
every student returned by DWR.

3. Packet capture: rediscovering things if they break

The system is old and unlikely to change radically, but if it does, you can always re-discover the APIs.

3.1 Tools

For this site, Chrome/Brave DevTools is enough; no need for full Wireshark.

Steps:

Open the student image report page for the exam.
Press F12 or ⌥⌘I → open Developer Tools.
Go to the Network tab.
Enable “Preserve log” so navigation doesn’t clear the list.

3.2 Finding findStudentListByClassId

Filter by dwr in the Network tab.
Change class in the dropdown or reload the page with a specific class selected.

Look for:

text
```
/tqms/dwr/call/plaincall/SelectSchoolUtil.findStudentListByClassId.dwr
```

Click it. Under Request Payload you’ll see a body similar to:

text
```
callCount=1
nextReverseAjaxIndex=0
c0-scriptName=SelectSchoolUtil
c0-methodName=findStudentListByClassId
c0-id=0
c0-param0=string:<TEST_ID>
c0-param1=string:<SCHOOL_ID>
c0-param2=string:<CLASS_ID>
c0-param3=string:<0 or 1>  # isTeacherClass
batchId=1
instanceId=0
page=/tqms/report/ShowStudentImgsAction.a
scriptSessionId=<DWRSESSIONID>/<TIMESTAMP>
```

This is exactly the body the Python script constructs.
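A minimal sketch of how such a body might be assembled in Python. The function name build_dwr_body and its parameters are hypothetical, and treating the part after the slash in scriptSessionId as a millisecond timestamp is an assumption; compare the output against a captured request before relying on it.

```python
import time
from typing import Optional


def build_dwr_body(test_id, school_id, class_id, is_teacher_class,
                   dwr_session_id, timestamp: Optional[str] = None) -> str:
    """Assemble the newline-separated body for findStudentListByClassId."""
    if timestamp is None:
        # Assumption: the value after the slash is a millisecond timestamp.
        timestamp = str(int(time.time() * 1000))
    fields = [
        ("callCount", "1"),
        ("nextReverseAjaxIndex", "0"),
        ("c0-scriptName", "SelectSchoolUtil"),
        ("c0-methodName", "findStudentListByClassId"),
        ("c0-id", "0"),
        ("c0-param0", f"string:{test_id}"),
        ("c0-param1", f"string:{school_id}"),
        ("c0-param2", f"string:{class_id}"),
        ("c0-param3", f"string:{1 if is_teacher_class else 0}"),
        ("batchId", "1"),
        ("instanceId", "0"),
        ("page", "/tqms/report/ShowStudentImgsAction.a"),
        ("scriptSessionId", f"{dwr_session_id}/{timestamp}"),
    ]
    return "\n".join(f"{key}={value}" for key, value in fields) + "\n"


# Placeholder session id and timestamp, only to show the resulting layout.
print(build_dwr_body(119274, 3600, 91268, False, "DWRSESSIONID_VALUE", "0"))
```

Keeping the fields in an ordered list makes it easy to diff the generated body line by line against what DevTools shows.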

In Response, you’ll see JavaScript representing the student list. The script parses out:
- noInClass (class-internal student number)
- orgUser.name (student name)
- classId.

3.3 Finding findStudentImgs

With Network tab open, click a student on the left.
Filter by findStudentImgs.

Look for:

text
```
POST /tqms/report/ShowStudentImgsAction.a?findStudentImgs
```

Under Form Data you should see the parameters described above:

text
```
schoolId: ...
testId: ...
testState: ...
studentName: (Chinese name)
classId: ...
isTeacherClass: 0 or 1
subjectId: ...
```

Under Response or Preview, you’ll see the small HTML snippet including:

html
```
<img src="/tqms_image_server/DemoAction.a?showImg&imgFliePath=...&imgFileName=...">
```

The script calls this endpoint once per student and collects all such src values.
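One possible way to collect those src values uses only the standard library's html.parser. The class and function names here are illustrative, and the sample HTML is invented to match the shape of the snippet above; the real script may use a regex instead.

```python
from html.parser import HTMLParser
from typing import List


class DemoActionImgParser(HTMLParser):
    """Collects the src of every <img> whose URL contains DemoAction.a."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if "DemoAction.a" in src:
            self.srcs.append(src)


def extract_img_srcs(html_text: str) -> List[str]:
    parser = DemoActionImgParser()
    parser.feed(html_text)
    return parser.srcs


# Invented snippet: one scan image plus an unrelated image that is skipped.
snippet = (
    '<ul><li><img src="/tqms_image_server/DemoAction.a?showImg'
    '&imgFliePath=a&imgFileName=b.jpg"></li>'
    '<li><img src="/static/logo.png"></li></ul>'
)
print(extract_img_srcs(snippet))
```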

3.4 Finding the image URLs

Filter Network by DemoAction.

After you click a student, you’ll see several GET requests like:

text
```
GET https://yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&imgFliePath=...&imgFileName=...
```

These URLs are exactly what the script downloads; there is no intermediate “thumbnail” version.

Screenshot 2025-11-15 at 19.10.49.png (20251115003)

4. Credentials & constants you must copy

The script does not perform login. It relies on your browser session.

4.1 Session cookies RAW_COOKIE

Make sure you’re already logged in and can see the student image report page.
Open DevTools → Network.
Click any request to https://test.k12media.cn.

Under Request Headers, find the line:

text
```
Cookie: JSESSIONID=...; DWRSESSIONID=...; SERVERID=...; <possibly more>
```

Copy the entire value (everything after Cookie:) and paste it into the script:

python
```
RAW_COOKIE = (
    "JSESSIONID=...; "
    "DWRSESSIONID=...; "
    "SERVERID=...; "
    "<other cookies if present>"
)
```

When cookies expire (you start getting redirected to a login page), just repeat these steps and update RAW_COOKIE.
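For reference, splitting a copied Cookie header into name/value pairs can be sketched as follows. parse_cookie_header is a hypothetical helper name and the cookie values are placeholders; the resulting dict is the kind of mapping that can then be attached to a requests.Session.

```python
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}, ignoring empty parts."""
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        # partition keeps any further '=' characters inside the value.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values, not real session cookies.
example = "JSESSIONID=abc123; DWRSESSIONID=x=y; SERVERID=s1; "
print(parse_cookie_header(example))

# With requests installed, the dict can be attached to a session:
#   session = requests.Session()
#   session.cookies.update(parse_cookie_header(RAW_COOKIE))
```

Cookies added this way carry no domain restriction, which fits the need to send the same session to both test.k12media.cn and yue.k12media.cn.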

4.2 Exam metadata

On the student image report page:

View page source or use DevTools Elements panel.
Search for testId, schoolId, subjectId, testState.

You should see hidden <input>s like:

html
```
<input type="hidden" id="testId" name="testId" value="119274">
<input type="hidden" id="schoolId" name="schoolId" value="3600">
<input type="hidden" id="testState" name="testState" value="1">
<input type="hidden" id="subjectId" name="subjectId" value="1">
```

Copy these values into the script:

python
```
TEST_ID    = 119274
SCHOOL_ID  = 3600
TEST_STATE = 1
SUBJECT_ID = 1
```
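If you would rather not copy the four values by hand, they could in principle be read from a saved copy of the page source. This is only a sketch: HiddenInputParser and read_exam_metadata are hypothetical names, and the script described in this manual expects the constants to be pasted in manually.

```python
from html.parser import HTMLParser
from typing import Dict

WANTED = {"testId", "schoolId", "testState", "subjectId"}


class HiddenInputParser(HTMLParser):
    """Records the value of each hidden <input> whose name is in WANTED."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and attr.get("type") == "hidden" and attr.get("name") in WANTED:
            self.values[attr["name"]] = attr.get("value") or ""


def read_exam_metadata(page_html: str) -> Dict[str, str]:
    parser = HiddenInputParser()
    parser.feed(page_html)
    return parser.values


page = """
<input type="hidden" id="testId" name="testId" value="119274">
<input type="hidden" id="schoolId" name="schoolId" value="3600">
<input type="hidden" id="testState" name="testState" value="1">
<input type="hidden" id="subjectId" name="subjectId" value="1">
"""
print(read_exam_metadata(page))
```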

4.3 Class list

In this version, the script uses a small dataclass:

python
```
from dataclasses import dataclass
from typing import List

@dataclass
class ClassConfig:
    class_id: int
    is_teacher_class: bool  # False = administrative class, True = teaching class
    label: str

CLASSES: List[ClassConfig] = [
    ClassConfig(class_id=91266,   is_teacher_class=False, label="格物1班"),
    ClassConfig(class_id=91267,   is_teacher_class=False, label="格物2班"),
    ClassConfig(class_id=91270,   is_teacher_class=False, label="致知1班"),
    ClassConfig(class_id=91271,   is_teacher_class=False, label="致知2班"),
    ClassConfig(class_id=91268,   is_teacher_class=False, label="格物3班"),
    ClassConfig(class_id=91272,   is_teacher_class=False, label="致知3班"),
    ClassConfig(class_id=1883835, is_teacher_class=True,  label="格物3班班"),
    ClassConfig(class_id=1883842, is_teacher_class=True,  label="致知3班班"),
]
```

How to obtain class_id:

On the report page, look for the <select> for class, or
Watch Network when switching class; the URL or request body often contains the classId.

If a new exam uses different classes, just adjust this list.

5. Python script behaviour

High-level phases:

Session setup
- Creates a requests.Session().
- Sets User-Agent to a realistic browser agent.
- Parses RAW_COOKIE into separate cookies and attaches them.

Fetch all students for all classes (via DWR)

For each ClassConfig in CLASSES:

Build a DWR POST body:

text
```
callCount=1
nextReverseAjaxIndex=0
c0-scriptName=SelectSchoolUtil
c0-methodName=findStudentListByClassId
c0-id=0
c0-param0=string:<TEST_ID>
c0-param1=string:<SCHOOL_ID>
c0-param2=string:<CLASS_ID>
c0-param3=string:<0 or 1>  # isTeacherClass
batchId=1
instanceId=0
page=/tqms/report/ShowStudentImgsAction.a
scriptSessionId=<DWRSESSIONID>/<TIMESTAMP>
```

POST it to:

text
```
https://test.k12media.cn/tqms/dwr/call/plaincall/SelectSchoolUtil.findStudentListByClassId.dwr
```

Parse the response text with a regex to extract:
- classId
- noInClass
- orgUser.name (decoded from \uXXXX)
Construct Student objects with fields:
- class_id
- class_label
- is_teacher_class
- no_in_class
- name
Merge students from all classes into one list and de-duplicate by (class_id, no_in_class, name).
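The merge and de-duplication step might look roughly like this. The Student fields follow the list above, while dedupe_students and the sample records are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Student:
    class_id: int
    class_label: str
    is_teacher_class: bool
    no_in_class: str
    name: str


def dedupe_students(students: Iterable[Student]) -> List[Student]:
    """Keep the first occurrence of each (class_id, no_in_class, name)."""
    seen = set()
    unique: List[Student] = []
    for student in students:
        key = (student.class_id, student.no_in_class, student.name)
        if key not in seen:
            seen.add(key)
            unique.append(student)
    return unique


# Made-up records: the second one repeats the first.
merged = [
    Student(91268, "格物3班", False, "2721101", "name_a"),
    Student(91268, "格物3班", False, "2721101", "name_a"),
    Student(1883835, "格物3班班", True, "2721101", "name_a"),
]
print(len(dedupe_students(merged)))
```

Because class_id is part of the key, a student who appears in both an administrative class and a teaching class is kept once per class.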

For each student: fetch image HTML + image URLs

For each unique Student:
- Build a student directory:
  - Class folder: <class_label>_<class_id>/
  - Student folder: <no_in_class>_<student_name>/
- POST to:
  text
```
https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?findStudentImgs
```
  with form data:
  text
```
schoolId      = SCHOOL_ID
testId        = TEST_ID
testState     = TEST_STATE
studentName   = student.name
classId       = student.class_id
isTeacherClass= 1 if student.is_teacher_class else 0
subjectId     = SUBJECT_ID
```
- Get back the HTML and extract all <img src="..."> values whose src contains DemoAction.a.
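The per-student folder and form data described above can be sketched as two small helpers. build_img_form and student_dir are hypothetical names; the field names and folder pattern are the ones listed in this section.

```python
from pathlib import Path
from typing import Dict


def build_img_form(name: str, class_id: int, is_teacher_class: bool, *,
                   school_id, test_id, test_state, subject_id) -> Dict[str, str]:
    """Form fields for ShowStudentImgsAction.a?findStudentImgs."""
    return {
        "schoolId": str(school_id),
        "testId": str(test_id),
        "testState": str(test_state),
        "studentName": name,
        "classId": str(class_id),
        "isTeacherClass": "1" if is_teacher_class else "0",
        "subjectId": str(subject_id),
    }


def student_dir(root, class_label: str, class_id: int,
                no_in_class: str, name: str) -> Path:
    """<root>/<class_label>_<class_id>/<no_in_class>_<student_name>"""
    return Path(root) / f"{class_label}_{class_id}" / f"{no_in_class}_{name}"


form = build_img_form("name", 91272, False,
                      school_id=3600, test_id=119274, test_state=1, subject_id=1)
print(form)
print(student_dir("yue_imgs", "致知3班", 91272, "2722216", "name"))
```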
Download all images for that student

For each src extracted:
- If it’s relative, join with:
  text
```
IMG_SERVER_BASE = "https://yue.k12media.cn/tqms_image_server/"
```
- Send a GET with:
  - User-Agent header
  - Referer set to ShowStudentImgsAction.a?findStudentImgs
- Read Content-Type to guess .jpg / .png extension.
- Save the bytes as:
  text
```
p01.jpg, p02.jpg, ...
```
- Write a row to index.csv with:
  - Exam IDs
  - Class info
  - Student info
  - Page index
  - Relative local path
  - Original src URL
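The URL joining and file naming steps above can be illustrated with urllib.parse.urljoin. The helper names are made up, and falling back to .jpg for an unrecognised Content-Type is an assumption rather than documented behaviour of the script.

```python
from urllib.parse import urljoin

IMG_SERVER_BASE = "https://yue.k12media.cn/tqms_image_server/"


def absolute_img_url(src: str) -> str:
    """Resolve a relative or root-relative src against the image server."""
    return urljoin(IMG_SERVER_BASE, src)


def page_filename(index: int, content_type: str) -> str:
    """Name pages p01, p02, ... with an extension guessed from Content-Type."""
    main_type = content_type.split(";")[0].strip().lower()
    # Assumption: anything that is not PNG is saved as .jpg.
    extension = {"image/jpeg": ".jpg", "image/png": ".png"}.get(main_type, ".jpg")
    return f"p{index:02d}{extension}"


print(absolute_img_url("/tqms_image_server/DemoAction.a?showImg&imgFliePath=a&imgFileName=b"))
print(page_filename(1, "image/jpeg"))
```

urljoin is used instead of string concatenation because the src already starts with /tqms_image_server/, so naive concatenation with IMG_SERVER_BASE would repeat that path segment.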
Logging “missing” cases
- If the HTML for a student has no DemoAction images, the script logs a warning and writes that student into missing.csv with a reason (no_demoaction_img or error message).
- After finishing all students, the script prints total counts (students processed, images downloaded) and the paths of index.csv and missing.csv.
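A sketch of how the index.csv and missing.csv rows might be written with the csv module. The column names are assumptions, since this manual only lists the kinds of data stored, and write_rows is a hypothetical helper.

```python
import csv
import io

# Illustrative column names; the actual script may use different ones.
INDEX_FIELDS = ["test_id", "subject_id", "class_id", "class_label",
                "no_in_class", "name", "page", "local_path", "src"]
MISSING_FIELDS = ["class_id", "class_label", "no_in_class", "name", "reason"]


def write_rows(handle, fieldnames, rows) -> None:
    """Write a header line followed by one CSV line per dict in rows."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# In-memory demo; for a real file use open(path, "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_rows(buffer, MISSING_FIELDS, [{
    "class_id": 91272,
    "class_label": "致知3班",
    "no_in_class": "2722216",
    "name": "name",
    "reason": "no_demoaction_img",
}])
print(buffer.getvalue())
```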

6. How to run it

This assumes Python 3 and the requests library are already installed.

Log into k12media
- Open the browser, log in as usual.
- Navigate to the student image report page for the target exam.
Collect constants
- From page HTML:
  - TEST_ID, SCHOOL_ID, TEST_STATE, SUBJECT_ID.
- From class dropdown and/or Network:
  - class_id values and labels → fill CLASSES.
- From Network:
  - Cookie header → paste into RAW_COOKIE.
Choose an output directory

Decide where to store images, e.g.:
text
```
/Users/yourname/Desktop/yue_imgs
```
Create the folder if it doesn’t exist.

Run the script

On macOS, for example:

bash
```
/Users/ylsuen/.venv/bin/python3 /path/to/k12media_download_imgs.py /Users/ylsuen/Desktop/yue_imgs
```

(Replace paths as needed on other systems.)

You should see logs like:

text
```
[info] DWR 拉學生列表：class_id=91268 (格物3班, teacher=0)
[info]  班級 格物3班(91268) → 學生數：XXX
[info] 全部班級合計學生數：227
[info] 去重後學生數：227
[info]  拉圖片頁：致知3班 2722216 name
[info]  致知3班 2722216 name 共 2 張
[ok]    [1] -> /Users/.../致知3班_91272/2722216_name/p01.jpg
[ok]    [2] -> /Users/.../致知3班_91272/2722216_name/p02.jpg
...
```

Inspect output

Folder layout:

text
```
yue_imgs/
  格物3班_91268/
    2721227_name/
      p01.jpg
      p02.jpg
    ...
  致知3班_91272/
    2722216_name/
      p01.jpg
      p02.jpg
    ...
  index.csv
  missing.csv
```

You can open any p01.jpg in an image viewer to confirm resolution; these are the same full-size scans the browser uses.

7. When things break: common failure modes

Login expired / cookies invalid
- Symptoms: HTML of a login page instead of images; HTTP 302 to login.
- Fix: Re-copy Cookie from a fresh logged-in browser session into RAW_COOKIE.
DWR errors about batchId or methodName
- Usually means the body format changed.
- Fix: Re-capture SelectSchoolUtil.findStudentListByClassId in DevTools and mirror the exact fields in the script.
findStudentImgs returns “No images”
- If UI also shows no images → student genuinely has no scanned script.
- If UI does show images → parameters in the script (especially studentName, isTeacherClass) don’t match the UI; recapture from Network.
Images download as tiny thumbnails
- In this deployment, this does not happen: showImg serves full pages.
- If a future version adds thumb= or size= parameters, always copy the exact URL the carousel loads, not some separate thumbnail URL.
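To catch the first failure mode early, a download can be checked before it is saved. This is a heuristic sketch, not part of the script as described: looks_like_image is a hypothetical helper that combines the Content-Type header with the standard JPEG and PNG file signatures.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_like_image(content_type: str, body: bytes) -> bool:
    """True only if the header says image/* and the bytes start like JPEG or PNG."""
    if not content_type.lower().startswith("image/"):
        return False
    return body.startswith(JPEG_MAGIC) or body.startswith(PNG_MAGIC)


# A login page served in place of a scan fails the check.
print(looks_like_image("text/html;charset=UTF-8", b"<html><body>login</body></html>"))
print(looks_like_image("image/jpeg", JPEG_MAGIC + b"\xe0rest-of-file"))
```

If the check fails for every student in a row, the session cookie has most likely expired and RAW_COOKIE needs to be refreshed.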

8. Summary

To download an entire exam’s answer sheets from this legacy k12media system:

Use the browser to log in and open the student image report page.
Extract exam constants (testId, schoolId, subjectId, testState) from hidden inputs.
Build the CLASSES list from the class dropdown and/or captured requests.
Copy your browser Cookie header into RAW_COOKIE.
Run k12media_download_imgs.py <output_dir>.
Use the generated index.csv and folder tree to confirm that every student’s pages are present.

The script does not “guess” anything magical: it strictly replays the same DWR + form + image requests that the report page uses, but does it for all classes and all students without you having to click through hundreds of names.

Once it ran, I looked at the class IDs, nudged them up and down, and could download the whole grade while I was at it...

Screenshot 2025-11-15 at 18.54.39.png (20251115002)

An antiquated website has antiquated flaws...

Afternoon update: added a mode that downloads the images for all subjects for a single student:

Screenshot 2025-11-16 at 00.50.06.png (20251116001)