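K12 Textbook PDFs
With a machine freed up, I decided to finish the AI K12 textbook project.
The plan starts with searching a keyword and surfacing every related passage across the K12 textbooks, then continues with real-time AI conversation.
Step one was getting the textbooks, which took about three hours. Since students have always needed electronic textbooks anyway, I open-sourced it along the way: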

SmartEdu Textbooks Downloader — Operator’s Manual (v1.0)
Script: jks.sh
Last updated: 2025‑10‑23
Scope: Download K‑12 textbooks from the SmartEdu platform to your local machine or server, with resilient retries, resume, and a local HTML index page (no anti‑leech 403).
1) What this tool does
- Crawls the SmartEdu textbook catalog for the phase(s) and subject(s) you choose.
- Resolves the direct PDF links per book and downloads them to a local directory.
- Resumes incomplete downloads and retries failed files automatically.
- Handles duplicate titles: files whose titles collide get a short book-ID suffix (__<id8>) appended; when you rerun, paths are reused by book-ID to avoid re-downloads.
- Writes structured results:
  - index.json: success list with id/title/subject/pdf_url/path
  - failed.json: failed items to retry later
  - index.html: a local, filterable index page linking to local PDFs (no 403)
Important: The HTML index links to local files, not remote URLs, so the SmartEdu anti‑leech barriers (403 Forbidden) are not triggered.
2) Supported platforms
- Linux: Debian/Ubuntu, RHEL/CentOS/Fedora, Arch, openSUSE, Alpine
- macOS: Intel/Apple Silicon with Homebrew (recommended)
The script self‑checks and installs missing dependencies per distro. If a Python virtual environment (venv) cannot be created (e.g., minimal OS images), the script falls back to system Python transparently.
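The venv-or-system-Python fallback described above can be pictured with the minimal bash sketch below. It is not taken from jks.sh: the function name pick_python and its venv-directory argument are hypothetical, and it only mirrors the documented behavior (USE_SYSTEM_PY=1 skips the venv; a venv that cannot be built falls back to system python3).

```shell
# Hypothetical sketch, not jks.sh's code: pick the interpreter later steps use.
pick_python() {
  local venv_dir="$1" existed=0          # venv_dir is an assumed path, e.g. ./.venv
  if [ -e "$venv_dir" ]; then existed=1; fi
  if [ "${USE_SYSTEM_PY:-0}" = "1" ]; then
    command -v python3                   # documented override: skip venv entirely
  elif [ -x "$venv_dir/bin/python" ]; then
    echo "$venv_dir/bin/python"          # reuse a venv from an earlier run
  elif python3 -m venv "$venv_dir" >/dev/null 2>&1; then
    echo "$venv_dir/bin/python"          # fresh venv created
  else
    # Minimal images often lack the venv/ensurepip module. Remove a half-built
    # venv only if we created the directory, then fall back to system python3.
    if [ "$existed" = 0 ]; then rm -rf "$venv_dir"; fi
    command -v python3
  fi
}
```

Typical use would be PY=$(pick_python ./.venv) followed by "$PY" for every later Python call, so the rest of a script never needs to know which branch was taken.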
3) Quick start
3.1 Ubuntu/Debian (root or sudo)
# Recommended: run as root on a clean VPS (it will auto-install python3/pip/venv if needed)
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/jks.sh)
3.2 macOS (Homebrew)
# Ensure Homebrew is installed (https://brew.sh). Then:
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/jks.sh)
3.3 Non-interactive "just run"
# Example: senior high (高中), two subjects (语文 Chinese, 数学 math); skip interactive prompts
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/jks.sh) -p 高中 -s 语文,数学 -y
3.4 Force system Python (skip venv)
# For minimal containers or constrained images
USE_SYSTEM_PY=1 bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/jks.sh) -p 高中 -y
Tip: Interactive mode asks only two things (phase and subjects) and then starts immediately. Existing files are resumed without asking.
4) Interactive wizard (default)
When -y is not set and STDIN is a TTY, you’ll see:
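- Phase (enter a number 1–6):
  1) 小学 (primary school)
  2) 初中 (junior high)
  3) 高中 (senior high)
  4) 特殊教育 (special education)
  5) 小学54 (primary, 5+4 system)
  6) 初中54 (junior high, 5+4 system)
- Subjects (comma-separated; leave empty for the defaults):
  语文,数学,英语,思想政治,历史,地理,物理,化学,生物
  (Chinese, mathematics, English, ideology & politics, history, geography, physics, chemistry, biology)
After that the downloader starts immediately: no confirmation prompt, no questions about retries or output path.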
Existing files are validated and skipped or resumed automatically.
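To make the wizard's rules concrete (numbered phase menu, empty subject input means defaults, prompts only without -y and only on a TTY), here is a hypothetical bash sketch. The function names and DEFAULT_SUBJECTS are illustrative; jks.sh's real prompt code may look different.

```shell
# Hypothetical sketch of the two-question wizard (names are not from jks.sh).
DEFAULT_SUBJECTS="语文,数学,英语,思想政治,历史,地理,物理,化学,生物"

# Map the menu number (1-6) to the value accepted by -p.
phase_from_number() {
  case "$1" in
    1) echo "小学" ;;
    2) echo "初中" ;;
    3) echo "高中" ;;
    4) echo "特殊教育" ;;
    5) echo "小学54" ;;
    6) echo "初中54" ;;
    *) return 1 ;;
  esac
}

# Empty input means "use the default subject list".
subjects_or_default() {
  if [ -n "$1" ]; then echo "$1"; else echo "$DEFAULT_SUBJECTS"; fi
}

# Prompt only when -y was not given (assume_yes != 1) and stdin is a terminal.
maybe_prompt() {
  local assume_yes="$1" n s
  if [ "$assume_yes" != "1" ] && [ -t 0 ]; then
    read -r -p "Phase (1-6): " n
    PHASE=$(phase_from_number "$n") || { echo "invalid phase" >&2; return 1; }
    read -r -p "Subjects (comma-separated, empty for defaults): " s
    SUBJECTS=$(subjects_or_default "$s")
  fi
}
```

Checking [ -t 0 ] is what lets the same entry point behave interactively in a terminal and silently in cron or CI, where stdin is not a TTY.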
5) Command‑line options
-p PHASE Phase (小学|初中|高中|特殊教育|小学54|初中54)
-s SUBJECTS Comma-separated subjects (default: all standard subjects)
-m "WORDS" Optional title keywords (space-separated) to narrow selection
-i IDS Specific book IDs (comma-separated) to download
-o DIR Output directory (default: ./smartedu_textbooks)
-R Retry only the items recorded in failed.json
-c N Max HTTP concurrency (integer). Leave unset for auto-tuning
-d N Max disk write concurrency (integer). Leave unset for auto-tuning
-n N Limit to first N books (debug/smoke test)
-T N Post-round auto-retry rounds (default: 2; 0 disables)
-y Non-interactive run (skip prompts; use provided options)
-h Help
Environment variables
- USE_SYSTEM_PY=1: skip venv entirely and use system Python.
- DEBUG=1: verbose shell tracing for diagnosis.
- Standard proxies are honored if exported: HTTP_PROXY, HTTPS_PROXY, NO_PROXY.
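For readers adapting the script, this is one way the options above could be parsed with bash's getopts builtin. It is a sketch: the variable names (OUT_DIR, RETRY_ROUNDS, SUBJECT_LIST, and so on) are hypothetical, and only the flag letters, defaults, and the DEBUG=1 tracing come from this manual.

```shell
# Hypothetical getopts parser for the documented flags (variable names invented).
parse_args() {
  OUT_DIR="./smartedu_textbooks"; RETRY_ROUNDS=2; ASSUME_YES=0; RETRY_ONLY=0
  PHASE=""; SUBJECTS=""; KEYWORDS=""; IDS=""; HTTP_CONC=""; DISK_CONC=""; LIMIT=""
  local opt OPTIND=1                    # local OPTIND makes repeated calls safe
  while getopts "p:s:m:i:o:Rc:d:n:T:yh" opt; do
    case "$opt" in
      p) PHASE="$OPTARG" ;;
      s) SUBJECTS="$OPTARG" ;;
      m) KEYWORDS="$OPTARG" ;;
      i) IDS="$OPTARG" ;;
      o) OUT_DIR="$OPTARG" ;;
      R) RETRY_ONLY=1 ;;
      c) HTTP_CONC="$OPTARG" ;;         # empty means auto-tune
      d) DISK_CONC="$OPTARG" ;;         # empty means auto-tune
      n) LIMIT="$OPTARG" ;;
      T) RETRY_ROUNDS="$OPTARG" ;;      # 0 disables post-round retries
      y) ASSUME_YES=1 ;;
      h) echo "usage: jks.sh [-p PHASE] [-s SUBJECTS] [-m WORDS] [-i IDS] [-o DIR] [-R] [-c N] [-d N] [-n N] [-T N] [-y]"
         return 2 ;;
      *) return 1 ;;                    # getopts already printed the error
    esac
  done
  # Split "语文,数学" into an array of subjects.
  IFS=',' read -r -a SUBJECT_LIST <<< "$SUBJECTS"
  # DEBUG=1 turns on shell tracing, as described above.
  if [ "${DEBUG:-0}" = "1" ]; then set -x; fi
  return 0
}
```

A script would call parse_args "$@" once at startup and read the globals afterwards.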
6) Output layout
smartedu_textbooks/
├─ <Subject>/
│ ├─ <Title>.pdf
│ ├─ <Title>__<id8>.pdf # when same-title collision occurs
│ └─ ... # partial/in-progress: <file>.part
├─ index.json # success list
├─ failed.json # failures (for -R), may be empty
├─ index.html # local, filterable index page
└─ smartedu_download.log # detailed run log
- Stable paths: when you rerun the script, book paths are reused by book-ID from index.json, so a change in file naming does not trigger a fresh download.
- Partial files: the downloader writes <file>.part and renames it to .pdf when complete.
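The .part-then-rename pattern can be sketched in bash as follows. Two assumptions are labeled in the comments: that partial files are named <Title>.pdf.part, and that "looks like a valid PDF" means a %PDF- header plus an %%EOF marker near the end. jks.sh's actual checks may be stricter.

```shell
# Hypothetical finalize step (not jks.sh's code).
# Assumed validity test: "%PDF-" header and an "%%EOF" marker in the last 1 KiB.
looks_like_pdf() {
  local f="$1"
  [ -s "$f" ] || return 1
  [ "$(head -c 5 "$f")" = "%PDF-" ] || return 1
  tail -c 1024 "$f" | grep -q '%%EOF'
}

# Assumed naming: <Title>.pdf.part -> <Title>.pdf
finalize_part() {
  local part="$1"
  local final="${part%.part}"
  if looks_like_pdf "$part"; then
    mv -f "$part" "$final"   # same directory, so the rename is atomic on one filesystem
  else
    return 1                 # leave the .part in place so a later run can resume it
  fi
}
```

Renaming only after validation means a reader of the output directory never sees a half-written .pdf, and an interrupted run leaves nothing but .part files behind.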
7) HTML index
- Regenerated on every run from the current index.json and failed.json.
- Uses relative links to local PDFs, so you won't hit SmartEdu's 403 (anti-leech).
- Includes: subject dropdown, title keyword filter, "Completed" vs "Failed" sections.
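Why relative links avoid the 403 is easiest to see in a stripped-down generator. The bash sketch below is a rough stand-in, not the script's generator: it walks the directory for *.pdf files instead of reading index.json, has no filters, and the function names are hypothetical. Every href is a path relative to index.html, so the browser never contacts SmartEdu.

```shell
# Rough stand-in for index.html generation (jks.sh builds it from index.json).
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

write_index() {
  local root="$1"
  {
    echo '<!doctype html><meta charset="utf-8"><title>Textbooks</title><ul>'
    # Only finished *.pdf files are listed; *.pdf.part files are skipped.
    (cd "$root" && find . -type f -name '*.pdf' | sed 's|^\./||' | LC_ALL=C sort) |
      while IFS= read -r rel; do
        # Percent-encode characters that would break a relative URL, then HTML-escape.
        href=$(printf '%s' "$rel" | sed -e 's/%/%25/g' -e 's/#/%23/g' -e 's/?/%3F/g' -e 's/ /%20/g' | html_escape)
        text=$(printf '%s' "$rel" | html_escape)
        printf '<li><a href="%s">%s</a></li>\n' "$href" "$text"
      done
    echo '</ul>'
  } > "$root/index.html"
}
```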
7.1 Local preview
cd smartedu_textbooks
python3 -m http.server 8000
# Open: http://localhost:8000/index.html
7.2 Share on your server (optional, production-friendly)
- Move public files to a web‑servable path and serve via Nginx:
# One-time setup
sudo mkdir -p /srv/smartedu_textbooks
sudo rsync -a --delete smartedu_textbooks/ /srv/smartedu_textbooks/
sudo chmod -R o+rX /srv/smartedu_textbooks
# Nginx site
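sudo tee /etc/nginx/sites-available/smartedu >/dev/null <<'EOF'
server {
    listen 80;
    # Replace _ with your domain, or disable the stock sites-enabled/default,
    # which otherwise answers port 80 as default_server.
    server_name _;
    root /srv/smartedu_textbooks;
    index index.html;
    autoindex on;
    autoindex_exact_size off;
    autoindex_localtime on;
    location / { try_files $uri $uri/ =404; }
    # .pdf -> application/pdf comes from the global mime.types; a types {} block
    # here would replace that whole map and break text/html for index.html.
    add_header X-Content-Type-Options nosniff;
}
EOF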
sudo ln -sf /etc/nginx/sites-available/smartedu /etc/nginx/sites-enabled/smartedu
sudo nginx -t && sudo systemctl reload nginx
# Update after each run
sudo rsync -a --delete smartedu_textbooks/ /srv/smartedu_textbooks/
If you need access control, add auth_basic + htpasswd, or put the site behind your VPN.
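A minimal way to add the basic-auth option, sketched in bash under stated assumptions: the password file lives at /etc/nginx/.htpasswd (a hypothetical location), it is created with the htpasswd tool (apache2-utils on Debian/Ubuntu, httpd-tools on RHEL/Fedora), and the two printed directives go inside the server { } block shown above.

```shell
# 1) Create the password file (prompts for a password):
#      sudo htpasswd -c /etc/nginx/.htpasswd alice
# 2) Print the directives to paste inside the smartedu server { } block.
#    The default path below is an assumption; pass another as $1.
auth_snippet() {
  local user_file="${1:-/etc/nginx/.htpasswd}"
  cat <<EOF
    auth_basic "SmartEdu textbooks";
    auth_basic_user_file $user_file;
EOF
}
# 3) Validate and reload:  sudo nginx -t && sudo systemctl reload nginx
```

Basic auth sends credentials in a reversible encoding, so pair it with HTTPS (or the VPN route) on anything reachable from the internet.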
8) How selection & downloads work
- Catalog fetch → filters by phase and subjects (plus optional keywords or IDs).
- Direct link resolution per book.
- Target path planning:
  - Prefer a previously recorded path from index.json for the same book_id.
  - If the planned <Title>.pdf already exists (or collides), append __<id8>, then add numeric suffixes as needed.
- Validation & planning:
  - If a target already exists and looks like a valid PDF, it is counted as "already OK" (no task created).
  - Incomplete, corrupted, or missing files are scheduled for download.
- Download (async, resumable) → .part → .pdf.
- Post-round retries: the round summary identifies failures and repeats them for up to -T rounds (default 2).
- Writes index.json and failed.json, and regenerates index.html.
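The path-planning rules can be expressed as a small bash function. This is a sketch, not jks.sh's implementation: RECORDED (book_id → path, assumed to be loaded from index.json elsewhere), CLAIMED, and plan_target are hypothetical names, and the exact numeric-suffix format (_2, _3, ...) is an assumption. It needs bash 4+ for associative arrays.

```shell
# Hypothetical path planner (bash 4+). Rules: reuse recorded path by book_id;
# else <Title>.pdf; else <Title>__<id8>.pdf; else add an assumed _2, _3, ... suffix.
declare -A RECORDED=()   # book_id -> path from a previous index.json (assumed loaded elsewhere)
declare -A CLAIMED=()    # paths already assigned during this run

taken() { [ -e "$1" ] || [ -n "${CLAIMED[$1]:-}" ]; }

# Result goes into the global PLANNED: calling via $(...) would run in a
# subshell and lose the CLAIMED bookkeeping.
plan_target() {
  local dir="$1" title="$2" id="$3" cand n=2
  if [ -n "${RECORDED[$id]:-}" ]; then
    PLANNED="${RECORDED[$id]}"
  else
    cand="$dir/$title.pdf"
    if taken "$cand"; then
      cand="$dir/${title}__${id:0:8}.pdf"
      while taken "$cand"; do
        cand="$dir/${title}__${id:0:8}_$n.pdf"
        n=$((n + 1))
      done
    fi
    PLANNED="$cand"
  fi
  CLAIMED["$PLANNED"]=1
}
```

Checking both the filesystem and the in-run CLAIMED set matters: two same-titled books planned in one round would otherwise both be assigned <Title>.pdf before either file exists.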
9) Typical workflows
- First complete download (interactive):
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/jks.sh)
# choose: 3) 高中; subjects: press Enter to keep the defaults
- Add a subject later (this and the following examples assume jks.sh was saved locally, e.g. with curl -LO https://raw.githubusercontent.com/ieduer/bdfz/main/jks.sh):
bash jks.sh -p 高中 -s 物理,化学 -y
# Already-completed files are skipped; new ones are fetched
- Retry only failed items:
bash jks.sh -R -y
- Limit to the first N books for a smoke test:
bash jks.sh -p 高中 -s 数学 -n 5 -y
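If you script these workflows (for example from cron), a small helper can decide whether another retry-only pass is worth running. This is a hypothetical sketch: it assumes failed.json is a JSON array of failed items (the manual does not spell out its schema), that the default ./smartedu_textbooks output directory is used, and that python3 is available, which jks.sh needs anyway.

```shell
# Hypothetical helpers; failed.json is ASSUMED to be a JSON array.
failed_count() {
  local f="${1:-smartedu_textbooks/failed.json}"
  [ -s "$f" ] || { echo 0; return; }    # missing or empty file: nothing failed
  python3 -c 'import json, sys; print(len(json.load(open(sys.argv[1], encoding="utf-8"))))' "$f"
}

# Run "bash jks.sh -R -y" up to $1 times (default 3) while failures remain.
retry_until_clean() {
  local max="${1:-3}" i
  for ((i = 1; i <= max; i++)); do
    [ "$(failed_count)" -eq 0 ] && return 0
    bash jks.sh -R -y || true           # keep going; failed.json is the source of truth
  done
  [ "$(failed_count)" -eq 0 ]
}
```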
10) Notes on performance & reliability
- HTTP and disk write concurrency are tunable (-c, -d), but the auto-tuned defaults are sensible; high values may trigger server-side throttling.
- Post-round retries (-T, default 2) catch transient network and storage errors.
- The resolver knows multiple SmartEdu edge hosts; the downloader resumes from .part files and validates PDFs before counting a download as successful.
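The post-round retry behavior (one full round, then up to -T extra rounds over only the failures, with 0 disabling retries) can be sketched in bash. run_rounds, download_one, and FAILED_ITEMS are hypothetical names; download_one stands in for the real per-book download and must be defined by the caller.

```shell
# Hypothetical retry-rounds loop. Round 0 is the initial pass; rounds 1..N retry
# only the items that failed in the previous round. N=0 disables retries.
# download_one is a caller-supplied placeholder for the real per-book download.
run_rounds() {
  local rounds="$1"; shift
  local pending=("$@") failed r item
  for ((r = 0; r <= rounds && ${#pending[@]} > 0; r++)); do
    failed=()
    for item in "${pending[@]}"; do
      download_one "$item" || failed+=("$item")
    done
    pending=("${failed[@]}")
  done
  FAILED_ITEMS=("${pending[@]}")   # what would be written to failed.json
  [ "${#pending[@]}" -eq 0 ]       # success only if nothing is left
}
```

Retrying failures after the whole round, rather than immediately, gives a throttled or flaky edge host time to recover before the same request is sent again.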
11) Safety & compliance
- Educational resources are subject to the platform’s terms of use. Use this tool only where permitted.
- This project is for personal education/research. You are responsible for complying with local laws and any applicable agreements.
- Avoid public mirroring without authorization.
12) Changelog (excerpt)
- 2025‑10‑23
  - Interactive wizard simplified (phase & subjects only).
  - Stable path reuse by book-ID; collision suffix __<id8>; cleaner counts.
  - index.html: static index generated from local files (no 403).
  - Auto-install/fallback across distros; non-interactive apt; venv hardening.
  - Retry-only mode -R; automatic post-round retries -T (default 2).
  - USE_SYSTEM_PY and DEBUG environment flags.
13) FAQ
Q: Can I run this as a normal user?
Yes. The script attempts a local venv. If creation fails and you don’t have sudo, use USE_SYSTEM_PY=1 and ensure you can write to the output directory.
Q: Can I pause and resume?
Yes. Interrupt at any time and rerun later; the downloader will validate existing files, resume partials, and skip completed items.
Q: How do I filter to a single book?
Use -m "keyword1 keyword2" to match title words, or -i bookId1,bookId2 for exact IDs.
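How multiple -m keywords combine is not specified here. The bash sketch below assumes AND semantics (a title must contain every space-separated keyword) purely for illustration; title_matches is a hypothetical name, and jks.sh may match differently.

```shell
# Hypothetical keyword filter, ASSUMING all keywords must appear in the title.
title_matches() {
  local title="$1" words="$2" w
  local -a kw
  read -r -a kw <<< "$words"          # split on whitespace without glob expansion
  for w in "${kw[@]}"; do
    [[ "$title" == *"$w"* ]] || return 1
  done
  return 0                            # no keywords means everything matches
}
```

Under this assumption, -m "数学 必修" would keep titles containing both 数学 and 必修 and drop titles that contain only one of them.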
Q: Why does the browser download fail if I click a remote link?
Because SmartEdu's anti-leech protection checks the Referer header and answers such direct requests with 403 Forbidden. Use the generated index.html (local links), or serve your local copies via a web server you control.
14) Support
If you hit a cryptic error, rerun with DEBUG=1 (e.g. DEBUG=1 bash jks.sh -p 高中 -y) and share the tail of smartedu_textbooks/smartedu_download.log (e.g. tail -n 100 smartedu_textbooks/smartedu_download.log) plus the console output around the failure.