SUEN

This really made me sick

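The score reports in the k12media marking system include the original scans of students' answer sheets. They can be downloaded, but only by clicking them one at a time, which is tedious. So I set out to scrape them automatically, and it made me sick.
The system is basically a "2008–2012 Java Web tech stack". To settle this once and for all, I had to fill in, by brute force, the entire gap between that old world and the modern scripting world. Let me go find a plastic bag from twenty-odd years ago...
One reason I rail against so-called education informatization is exactly this kind of "informatization" in the service of exam-taking. At its core, this stuff is pure garbage... precisely because it was never meant to help students learn.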

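I also wanted the downloaded images as extra material for NotebookLM, but here is what happened when I actually tested it on students' exam reports: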

Screenshot 2025-11-15 at 18.37.54.png (20251115001)

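Recognition of large areas of handwriting is still a disaster, so the images are only good for glancing at how tidy the paper looks and commenting on the handwriting.
Handwritten Chinese characters, a backward thing, will die out. Recognition of handwritten Chinese will presumably be solved too, since the existing legacy has to be processed somehow. As for the day when exams are no longer handwritten, keep hoping.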

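The more schools adopt this kind of marking system, these so-called score reports, and even so-called AI, the dimmer the hope for education becomes, and the bigger and shinier the boasts about education informatization get.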


k12media exam image scraper manual

This document explains how to download all scanned answer sheet images for an exam from the legacy k12media system, using a small Python script that replays the same HTTP requests your browser sends.

The goal: a new admin, with no prior context, can follow this manual and successfully pull all images for a full grade/year.


1. Big picture

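The exam image system is an early-2010s Java Web stack split over two domains: test.k12media.cn serves the report page, the DWR calls and the form endpoints, and yue.k12media.cn serves the scanned images.

On the student image report page you see an exam summary at the top, the student list on the left, and an image carousel in the middle.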

The browser does not load all students at once. For each selected student it:

  1. Submits a form to ShowStudentImgsAction.a?findStudentImgs on test.k12media.cn.
  2. That returns an HTML snippet containing several <img> tags.
  3. Each <img> points to yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&....
  4. The browser then fetches those image URLs and renders them in the carousel.

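The Python script simply replays these same requests for every class and every student, and saves each returned image to disk.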

There is no “thumbnail” API in between. The script downloads exactly the same images the teacher sees in the page.


2. Site architecture & data flow

2.1 Domains & roles

You log in once in your browser. The Python script then reuses your browser cookies to talk to both domains.


2.2 Key endpoints

Exact query strings vary, but the structure is stable.

  1. Student image report page (what you open in the browser):

    text
    https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?method=showStudentImgReport&testId=<TEST_ID>&subjectId=<SUBJECT_ID>&schoolId=<SCHOOL_ID>&testState=<STATE>&...

    On this page you see:

    • Top: exam summary
    • Left: student list
    • Middle: image carousel
    • Several hidden <input>s with metadata.
  2. Student list by class (DWR):

    text
    POST https://test.k12media.cn/tqms/dwr/call/plaincall/SelectSchoolUtil.findStudentListByClassId.dwr

    The response is a JavaScript snippet that contains objects like:

    js
    dwr.engine.remote.handleCallback("1","0",[
      { classId: 91268,
        noInClass: "2721101",
        orgUser: { name:"\u5F20\u4F55..." },
        ... },
      ...
    ]);

    The script parses classId, noInClass, and name from this.

  3. Student image list (findStudentImgs):

    In this deployment, the image list is identified by student name + class + type of class, not by student number:

    text
    POST https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?findStudentImgs

    Form fields include:

    • schoolId
    • testId
    • testState
    • studentName
    • classId
    • isTeacherClass (0 = administrative class, 1 = teaching class)
    • subjectId

    Response: HTML with several <img> tags, e.g.:

    html
    <img src="/tqms_image_server/DemoAction.a?showImg&imgFliePath=...&imgFileName=...">
  4. Actual images:

    text
    GET https://yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&imgFliePath=<...>&imgFileName=<...>

    Headers:

    • Content-Type: image/jpeg (or sometimes image/png)
    • Content-Length: ...

    These are the full-size page scans, exactly what the carousel shows.
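
Items 3 and 4 above can be tied together in a few lines. The following is a minimal sketch, not the author's actual script: it assumes the findStudentImgs response is ordinary HTML that the standard-library parser can read, and the names ImgSrcCollector and extract_image_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Base URL as given in this manual; relative src values are resolved against it.
IMG_SERVER_BASE = "https://yue.k12media.cn/tqms_image_server/"


class ImgSrcCollector(HTMLParser):
    """Collect src attributes of <img> tags that point at DemoAction.a."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Ignore decorative images; only DemoAction.a links are page scans.
        if src and "DemoAction.a" in src:
            self.srcs.append(src)


def extract_image_urls(html_snippet):
    """Return absolute image URLs found in a findStudentImgs response."""
    collector = ImgSrcCollector()
    collector.feed(html_snippet)
    return [urljoin(IMG_SERVER_BASE, src) for src in collector.srcs]
```

Because the src values start with /tqms_image_server/, urljoin keeps only the scheme and host of the base, so the result lands on yue.k12media.cn rather than on the report server.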


2.3 Front-end behaviour

When you click a student in the left list:

  1. A form is submitted to ShowStudentImgsAction.a?findStudentImgs with:

    • studentName = the label you clicked
    • classId = current class
    • isTeacherClass determined by the page (administrative vs teaching class)
    • plus testId, subjectId, schoolId, testState.
  2. The server returns HTML that contains <img src="...DemoAction.a?showImg..."> for that student.

  3. The browser then issues GET requests to yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&... for each page and passes them into a jQuery FlexSlider carousel.

The script does the same thing, but loops over every class in CLASSES and every student returned for that class.


3. Packet capture: rediscovering things if they break

The system is old and unlikely to change radically, but if it does, you can always re-discover the APIs.

3.1 Tools

For this site, Chrome/Brave DevTools is enough; no need for full Wireshark.

Steps:

  1. Open the student image report page for the exam.
  2. Press F12 or ⌥⌘I → open Developer Tools.
  3. Go to the Network tab.
  4. Enable “Preserve log” so navigation doesn’t clear the list.

3.2 Finding findStudentListByClassId

  1. Filter by dwr in the Network tab.

  2. Change class in the dropdown or reload the page with a specific class selected.

  3. Look for:

    text
    /tqms/dwr/call/plaincall/SelectSchoolUtil.findStudentListByClassId.dwr
  4. Click it. Under Request Payload you’ll see a body similar to:

    text
    callCount=1
    nextReverseAjaxIndex=0
    c0-scriptName=SelectSchoolUtil
    c0-methodName=findStudentListByClassId
    c0-id=0
    c0-param0=string:<TEST_ID>
    c0-param1=string:<SCHOOL_ID>
    c0-param2=string:<CLASS_ID>
    c0-param3=string:<0 or 1>  # isTeacherClass
    batchId=1
    instanceId=0
    page=/tqms/report/ShowStudentImgsAction.a
    scriptSessionId=<DWRSESSIONID>/<TIMESTAMP>

    This is exactly the body the Python script constructs.

  5. In Response, you’ll see JavaScript representing the student list. The script parses out:

    • noInClass (class-internal student number)
    • orgUser.name (student name)
    • classId.
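
The request body and the response parsing described in this section could be sketched as follows. This is an illustration under stated assumptions, not the script itself: the function names are hypothetical, the timestamp in scriptSessionId is assumed to be the current time in milliseconds, and the regex assumes the fields appear in the order shown in the sample response (classId, then noInClass, then name).

```python
import re
import time


def build_dwr_body(test_id, school_id, class_id, is_teacher_class, dwr_session_id):
    """Build the newline-separated body for findStudentListByClassId."""
    fields = [
        ("callCount", "1"),
        ("nextReverseAjaxIndex", "0"),
        ("c0-scriptName", "SelectSchoolUtil"),
        ("c0-methodName", "findStudentListByClassId"),
        ("c0-id", "0"),
        ("c0-param0", f"string:{test_id}"),
        ("c0-param1", f"string:{school_id}"),
        ("c0-param2", f"string:{class_id}"),
        ("c0-param3", f"string:{1 if is_teacher_class else 0}"),
        ("batchId", "1"),
        ("instanceId", "0"),
        ("page", "/tqms/report/ShowStudentImgsAction.a"),
        # Assumption: the part after the slash is a millisecond timestamp.
        ("scriptSessionId", f"{dwr_session_id}/{int(time.time() * 1000)}"),
    ]
    return "\n".join(f"{key}={value}" for key, value in fields) + "\n"


# Assumption: each student object lists classId, noInClass, name in this order.
STUDENT_RE = re.compile(
    r'classId\s*:\s*(\d+).*?noInClass\s*:\s*"([^"]*)".*?name\s*:\s*"([^"]*)"',
    re.DOTALL,
)


def decode_js_string(raw):
    """Replace backslash-u escapes from the DWR payload with real characters."""
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), raw)


def parse_dwr_students(response_text):
    """Return one dict per student found in the DWR callback text."""
    return [
        {"class_id": int(cid), "no_in_class": no, "name": decode_js_string(name)}
        for cid, no, name in STUDENT_RE.findall(response_text)
    ]
```

If a real response orders the fields differently, the regex will silently miss students, so it is worth comparing the parsed count against the student list shown in the browser.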

3.3 Finding findStudentImgs

  1. With Network tab open, click a student on the left.

  2. Filter by findStudentImgs.

  3. Look for:

    text
    POST /tqms/report/ShowStudentImgsAction.a?findStudentImgs
  4. Under Form Data you should see the parameters described above:

    text
    schoolId: ...
    testId: ...
    testState: ...
    studentName: (Chinese name)
    classId: ...
    isTeacherClass: 0 or 1
    subjectId: ...
  5. Under Response or Preview, you’ll see the small HTML snippet including:

    html
    <img src="/tqms_image_server/DemoAction.a?showImg&imgFliePath=...&imgFileName=...">

The script calls this endpoint once per student and collects all such src values.
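
The per-student form submission can be sketched as below. The helper name build_find_imgs_form is hypothetical, and the encoding is an assumption: urlencode percent-encodes the Chinese name as UTF-8 by default, which should be checked against the Form Data of a captured request before relying on it.

```python
from urllib.parse import urlencode

FIND_IMGS_URL = (
    "https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?findStudentImgs"
)


def build_find_imgs_form(school_id, test_id, test_state, subject_id,
                         student_name, class_id, is_teacher_class):
    """Return the form fields listed in this manual for one student."""
    return {
        "schoolId": school_id,
        "testId": test_id,
        "testState": test_state,
        "studentName": student_name,
        "classId": class_id,
        # The server expects 0 (administrative class) or 1 (teaching class).
        "isTeacherClass": 1 if is_teacher_class else 0,
        "subjectId": subject_id,
    }


if __name__ == "__main__":
    form = build_find_imgs_form(3600, 119274, 1, 1, "张何", 91268, False)
    # Shows the body that would be POSTed to FIND_IMGS_URL (UTF-8 assumed).
    print(urlencode(form))
```

With requests, the same dict can be passed as the data argument of a POST to FIND_IMGS_URL, which performs the encoding for you.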

3.4 Finding the image URLs

  1. Filter Network by DemoAction.

  2. After you click a student, you’ll see several GET requests like:

    text
    GET https://yue.k12media.cn/tqms_image_server/DemoAction.a?showImg&imgFliePath=...&imgFileName=...
  3. These URLs are exactly what the script downloads; there is no intermediate “thumbnail” version.

Screenshot 2025-11-15 at 19.10.49.png (20251115003)


4. Credentials & constants you must copy

4.1 Cookies

The script does not perform login. It relies on your browser session.

  1. Make sure you’re already logged in and can see the student image report page.

  2. Open DevTools → Network.

  3. Click any request to https://test.k12media.cn.

  4. Under Request Headers, find the line:

    text
    Cookie: JSESSIONID=...; DWRSESSIONID=...; SERVERID=...; <possibly more>
  5. Copy the entire value (everything after Cookie:) and paste it into the script:

    python
    RAW_COOKIE = (
        "JSESSIONID=...; "
        "DWRSESSIONID=...; "
        "SERVERID=...; "
        "<other cookies if present>"
    )

When cookies expire (you start getting redirected to a login page), just repeat these steps and update RAW_COOKIE.
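
Turning the pasted header value into individual cookies takes only a few lines. This is a minimal sketch with a hypothetical helper name, parse_raw_cookie; it assumes the value was copied exactly as the browser shows it, with pairs separated by semicolons.

```python
def parse_raw_cookie(raw_cookie):
    """Split a Cookie header value into a {name: value} dict."""
    cookies = {}
    for part in raw_cookie.split(";"):
        part = part.strip()
        # Skip empty fragments such as the one after a trailing semicolon.
        if not part or "=" not in part:
            continue
        # Split on the first "=" only; cookie values may contain "=" themselves.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    demo = parse_raw_cookie("JSESSIONID=aaa; DWRSESSIONID=bbb; SERVERID=ccc")
    print(sorted(demo))
```

The resulting dict can be attached to a requests.Session via session.cookies.update(...), and its DWRSESSIONID entry is the value that goes into scriptSessionId in the DWR body.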

4.2 Exam metadata

On the student image report page:

  1. View page source or use DevTools Elements panel.

  2. Search for testId, schoolId, subjectId, testState.

  3. You should see hidden <input>s like:

    html
    <input type="hidden" id="testId" name="testId" value="119274">
    <input type="hidden" id="schoolId" name="schoolId" value="3600">
    <input type="hidden" id="testState" name="testState" value="1">
    <input type="hidden" id="subjectId" name="subjectId" value="1">
  4. Copy these values into the script:

    python
    TEST_ID    = 119274
    SCHOOL_ID  = 3600
    TEST_STATE = 1
    SUBJECT_ID = 1
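
Instead of copying the four values by hand, they could also be read from the saved page source. The sketch below is an optional convenience, not part of the script described here; HiddenInputCollector and extract_exam_constants are hypothetical names, and it assumes the hidden inputs look like the ones shown above.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.values[attr["name"]] = attr.get("value") or ""


def extract_exam_constants(page_html,
                           names=("testId", "schoolId", "testState", "subjectId")):
    """Return the requested hidden-input values as integers.

    Raises KeyError if one of the names is missing from the page.
    """
    collector = HiddenInputCollector()
    collector.feed(page_html)
    return {name: int(collector.values[name]) for name in names}
```

A KeyError here is a useful early signal that the page layout differs from what this manual describes, or that the saved HTML is actually a login page.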

4.3 Class list

In this version, the script uses a small dataclass:

python
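from dataclasses import dataclass
from typing import List


@dataclass
class ClassConfig:
    class_id: int           # sent as classId / c0-param2
    is_teacher_class: bool  # False = administrative class, True = teaching class
    label: str              # human-readable name, used in the class folder name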

CLASSES: List[ClassConfig] = [
    ClassConfig(class_id=91266,   is_teacher_class=False, label="格物1班"),
    ClassConfig(class_id=91267,   is_teacher_class=False, label="格物2班"),
    ClassConfig(class_id=91270,   is_teacher_class=False, label="致知1班"),
    ClassConfig(class_id=91271,   is_teacher_class=False, label="致知2班"),
    ClassConfig(class_id=91268,   is_teacher_class=False, label="格物3班"),
    ClassConfig(class_id=91272,   is_teacher_class=False, label="致知3班"),
    ClassConfig(class_id=1883835, is_teacher_class=True,  label="格物3班班"),
    ClassConfig(class_id=1883842, is_teacher_class=True,  label="致知3班班"),
]

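How to obtain class_id: read the values from the class dropdown on the report page, or from the c0-param2 field of a captured findStudentListByClassId request (section 3.2).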

If a new exam uses different classes, just adjust this list.


5. Python script behaviour

High-level phases:

  1. Session setup

    • Creates a requests.Session().
    • Sets User-Agent to a realistic browser agent.
    • Parses RAW_COOKIE into separate cookies and attaches them.
  2. Fetch all students for all classes (via DWR)

    For each ClassConfig in CLASSES:

    • Build a DWR POST body:

      text
      callCount=1
      nextReverseAjaxIndex=0
      c0-scriptName=SelectSchoolUtil
      c0-methodName=findStudentListByClassId
      c0-id=0
      c0-param0=string:<TEST_ID>
      c0-param1=string:<SCHOOL_ID>
      c0-param2=string:<CLASS_ID>
      c0-param3=string:<0 or 1>  # isTeacherClass
      batchId=1
      instanceId=0
      page=/tqms/report/ShowStudentImgsAction.a
      scriptSessionId=<DWRSESSIONID>/<TIMESTAMP>
    • POST it to:

      text
      https://test.k12media.cn/tqms/dwr/call/plaincall/SelectSchoolUtil.findStudentListByClassId.dwr
    • Parse the response text with a regex to extract:

      • classId
      • noInClass
      • orgUser.name (decoded from \uXXXX)
    • Construct Student objects with fields:

      • class_id
      • class_label
      • is_teacher_class
      • no_in_class
      • name
    • Merge students from all classes into one list and de-duplicate by (class_id, no_in_class, name).

  3. For each student: fetch image HTML + image URLs

    For each unique Student:

    • Build a student directory:

      • Class folder: <class_label>_<class_id>/
      • Student folder: <no_in_class>_<student_name>/
    • POST to:

      text
      https://test.k12media.cn/tqms/report/ShowStudentImgsAction.a?findStudentImgs

      with form data:

      text
      schoolId      = SCHOOL_ID
      testId        = TEST_ID
      testState     = TEST_STATE
      studentName   = student.name
      classId       = student.class_id
      isTeacherClass= 1 if student.is_teacher_class else 0
      subjectId     = SUBJECT_ID
    • Get back the HTML and extract all <img src="..."> values whose src contains DemoAction.a.

  4. Download all images for that student

    For each src extracted:

    • If it’s relative, join with:

      text
      IMG_SERVER_BASE = "https://yue.k12media.cn/tqms_image_server/"
    • Send a GET with:

      • User-Agent header
      • Referer set to ShowStudentImgsAction.a?findStudentImgs
    • Read Content-Type to guess .jpg / .png extension.

    • Save the bytes as:

      text
      p01.jpg, p02.jpg, ...
    • Write a row to index.csv with:

      • Exam IDs
      • Class info
      • Student info
      • Page index
      • Relative local path
      • Original src URL
  5. Logging “missing” cases

    • If the HTML for a student has no DemoAction images, the script logs a warning and writes that student into missing.csv with a reason (no_demoaction_img or error message).
    • After finishing all students, the script prints total counts (students processed, images downloaded) and the paths of index.csv and missing.csv.
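
The bookkeeping in phases 2 to 4 (de-duplication, folder naming, extension guessing) can be sketched without any network code. The field names follow the Student fields listed above; the helper names and the fallback to .jpg for unknown content types are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Student:
    class_id: int
    class_label: str
    is_teacher_class: bool
    no_in_class: str
    name: str


def dedupe_students(students):
    """Keep the first occurrence of each (class_id, no_in_class, name)."""
    seen = set()
    unique = []
    for student in students:
        key = (student.class_id, student.no_in_class, student.name)
        if key not in seen:
            seen.add(key)
            unique.append(student)
    return unique


def extension_for(content_type):
    """Map a Content-Type header to a file extension (default: .jpg)."""
    main_type = (content_type or "").split(";")[0].strip().lower()
    return {"image/jpeg": ".jpg", "image/png": ".png"}.get(main_type, ".jpg")


def page_path(out_dir, student, page_index, content_type):
    """Build <out_dir>/<label>_<class_id>/<no>_<name>/pNN.<ext>."""
    class_dir = f"{student.class_label}_{student.class_id}"
    student_dir = f"{student.no_in_class}_{student.name}"
    file_name = f"p{page_index:02d}{extension_for(content_type)}"
    return Path(out_dir) / class_dir / student_dir / file_name
```

Keeping class_id in the de-duplication key means a student who appears in both an administrative class and a teaching class is downloaded once per class, which matches the folder layout of one directory per class.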

6. How to run it

This section assumes you already have Python 3 and the requests library installed.

  1. Log into k12media

    • Open the browser, log in as usual.
    • Navigate to the student image report page for the target exam.
  2. Collect constants

    • From page HTML:
      • TEST_ID, SCHOOL_ID, TEST_STATE, SUBJECT_ID.
    • From class dropdown and/or Network:
      • class_id values and labels → fill CLASSES.
    • From Network:
      • Cookie header → paste into RAW_COOKIE.
  3. Choose an output directory

    Decide where to store images, e.g.:

    text
    /Users/yourname/Desktop/yue_imgs

    Create the folder if it doesn’t exist.

  4. Run the script

    On macOS, for example:

    bash
    /Users/ylsuen/.venv/bin/python3 /path/to/k12media_download_imgs.py /Users/ylsuen/Desktop/yue_imgs

    (Replace paths as needed on other systems.)

    You should see logs like:

    text
    [info] DWR 拉學生列表:class_id=91268 (格物3班, teacher=0)
    [info]  班級 格物3班(91268) → 學生數:XXX
    [info] 全部班級合計學生數:227
    [info] 去重後學生數:227
    [info]  拉圖片頁:致知3班 2722216 name
    [info]  致知3班 2722216 name 共 2 張
    [ok]    [1] -> /Users/.../致知3班_91272/2722216_name/p01.jpg
    [ok]    [2] -> /Users/.../致知3班_91272/2722216_name/p02.jpg
    ...
  5. Inspect output

    Folder layout:

    text
    yue_imgs/
      格物3班_91268/
        2721227_name/
          p01.jpg
          p02.jpg
        ...
      致知3班_91272/
        2722216_name/
          p01.jpg
          p02.jpg
        ...
      index.csv
      missing.csv

    You can open any p01.jpg in an image viewer to confirm resolution; these are the same full-size scans the browser uses.
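
A quick way to check the result is to count the page files in every student folder. The sketch below relies only on the folder layout shown above (class folder, student folder, pNN files); count_pages is a hypothetical helper and does not read index.csv, whose exact column names are not specified here.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def count_pages(out_dir):
    """Return {"<class_dir>/<student_dir>": number_of_page_files}."""
    counts = {}
    root = Path(out_dir)
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for student_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            pages = [
                f for f in student_dir.iterdir()
                if f.is_file()
                and f.name.startswith("p")
                and f.suffix.lower() in IMAGE_SUFFIXES
            ]
            counts[f"{class_dir.name}/{student_dir.name}"] = len(pages)
    return counts


if __name__ == "__main__":
    import sys

    # Print students whose folder exists but contains no page images.
    for key, n_pages in count_pages(sys.argv[1]).items():
        if n_pages == 0:
            print("no pages:", key)
```

Students listed by this check should also appear in missing.csv; a mismatch between the two suggests that a download was interrupted.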


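7. When things break: common failure modes

  • Requests start being redirected to a login page: the cookies have expired. Copy a fresh Cookie header from the browser and update RAW_COOKIE (section 4.1).
  • A student is written to missing.csv with no_demoaction_img: the findStudentImgs response contained no DemoAction images for that student. Click the same student in the browser to see whether the report page shows any scans.
  • An endpoint or parameter no longer matches: repeat the packet capture in section 3 and compare the captured requests with the bodies listed there.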


8. Summary

To download an entire exam’s answer sheets from this legacy k12media system:

  1. Use the browser to log in and open the student image report page.
  2. Extract exam constants (testId, schoolId, subjectId, testState) from hidden inputs.
  3. Build the CLASSES list from the class dropdown and/or captured requests.
  4. Copy your browser Cookie header into RAW_COOKIE.
  5. Run k12media_download_imgs.py <output_dir>.
  6. Use the generated index.csv and folder tree to confirm that every student’s pages are present.

The script does not “guess” anything magical: it strictly replays the same DWR + form + image requests that the report page uses, but does it for all classes and all students without you having to click through hundreds of names.


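Once it ran end to end, I looked at the class IDs, nudged the numbers up and down, and along the way managed to download the whole grade...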

Screenshot 2025-11-15 at 18.54.39.png (20251115002)

Old web pages come with old problems...