臺灣華文電子書庫/Taiwan eBooks Downloader
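I occasionally need to download books from this library, so I might as well write down what needs writing.
Back in the day there were Chaoxing (超星) duxiu, Yuandi (園地) and Shudian (數典), and a whole history of making e-books and bulk-dumping libraries. I believe AI will one day evolve far enough to write about those people, those books, and those events.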
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/taiwanebook.sh)
Safe by default, fast when possible. Works with Taiwan eBooks (National Central Library) PDF viewer pages by resolving the actual PDF file(s) and downloading them with the correct headers and resilient fallbacks.
Overview
taiwanebook.sh is a Bash script that downloads books from the Taiwan eBooks website by extracting the true PDF file paths from the PDF.js viewer and fetching them with the right headers.
Design goals:
- Always complete: prioritize robustness and correctness.
- Be polite: validate links first, then download with the proper Referer.
- Be fast when it helps: parallelize multi-file downloads; optionally use aria2 for big files.
- Give clear feedback: progress bar, size, average speed, elapsed time, and final absolute path.
This manual targets macOS users who have Homebrew installed.
Key Features
- Reader-first parsing: reliably extracts all file= entries from /book/<ID>/reader pages (en/zh). Robust Python-based HTML scan with shell fallback.
- Strict validation: only treats HTTP 200/206 as existing; probes size via Range: bytes=0-0 and Content-Range/Length.
- Correct Referer: every probe and download uses the matching viewer.html?file=... URL as the Referer header, complying with the site’s anti-hotlink checks.
- Resume support: curl -C - for partial re-runs.
- Layered fallbacks: try your HTTP settings → force HTTP/1.1 → force HTTP/2 → relax low-speed guards with extended retries.
- Speed options: --jobs N to download multiple files concurrently; optional aria2 engine for large files (multi-connection).
- Friendly UX: English messages, progress bar, and final success line with absolute path.
- Alias: installs the twb alias automatically (zsh/bash).
Requirements
- macOS (tested with recent releases).
- Bash and curl (preinstalled on macOS).
- Python 3 (optional but recommended): used for robust HTML parsing; the script falls back to shell regex if it is unavailable.
  - Install via Homebrew: brew install python
- aria2 (optional): for faster large-file downloads via multi-connection transfers.
  - Install via Homebrew: brew install aria2
Installation
- Place taiwanebook.sh somewhere convenient (e.g., ~/bdfz/taiwanebook.sh) and make it executable:
    chmod +x ~/bdfz/taiwanebook.sh
- Run once to install the twb alias (the script appends to your shell rc file):
    ~/bdfz/taiwanebook.sh --help
    # You will see a message like:
    # Installed alias: twb (added to ~/.zshrc or ~/.bashrc). Reload with: source '<rc>'
- Reload your rc file so the alias is active in the current session:
    source ~/.zshrc   # or: source ~/.bashrc
From now on you can call the script as twb from anywhere.
Quick Start
- Single book by ID:
    twb NCL-9900010967
- Book page URL (with or without /reader, any locale):
    twb "https://taiwanebook.ncl.edu.tw/zh-tw/book/NCL-9900010967"
- Speed up multi-file books (concurrent downloads):
    twb --jobs 3 NCL-9900010967
- Force HTTP version if throughput is poor:
    twb --http1.1 NCL-9900010967
    # or
    twb --http2 NCL-9900010967
- Use aria2 for large files:
    brew install aria2
    twb --engine aria2 NCL-9900010967
Usage
taiwanebook.sh [options] <BOOK_ID|URL> [more...]
cat list.txt | taiwanebook.sh [options] -
Preferred input is the BOOK ID (e.g., NCL-9900010967). You may also pass:
- Book page URL (any locale, with/without /reader)
- PDF.js viewer URL (contains ?file=/ebkFiles/...)
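As an illustration of how the accepted inputs can be reduced to one form, here is a minimal sketch that turns either a bare ID or a book page URL into the book ID. The function name book_id_from_input is hypothetical and is not taken from taiwanebook.sh; PDF.js viewer URLs are deliberately left out of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from taiwanebook.sh): reduce a bare ID or a
# book page URL to the book ID. Viewer URLs are not handled here.
book_id_from_input() {
  local input=$1
  input=${input%%[?#]*}   # drop any query string or fragment
  input=${input%/}        # drop one trailing slash
  case $input in
    */book/*)
      input=${input#*/book/}   # remove everything through "/book/"
      input=${input%%/*}       # remove "/reader" or other trailing segments
      ;;
  esac
  printf '%s\n' "$input"
}
```

For example, book_id_from_input "https://taiwanebook.ncl.edu.tw/zh-tw/book/NCL-9900010967/reader" prints NCL-9900010967, and a bare ID passes through unchanged.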
Options
- --outdir DIR: Save files under DIR (default: .).
- --sleep N: Sleep N seconds between downloads (default: 0).
- --mode auto|single|all: Multi-part strategy (default: auto).
  - auto: if parts F01.. are found, download the contiguous series; else fetch the single file.
  - single: only attempt the single-file PDF.
  - all: download all parts found up to MAX_PARTS.
- --jobs N: Download up to N files in parallel when the book resolves to multiple files (default: 1).
- --http1.1|--http2: Force HTTP version for curl (quicker on some servers; it varies).
- --engine curl|aria2: Choose the download engine.
  - auto (default): uses aria2 when available for files ≥ 20MB; otherwise falls back to curl.
  - aria2: always try aria2 first, then fall back to curl if it fails.
  - curl: use curl only.
- --safe|--no-safe: Enable/disable layered fallbacks (default: --safe).
- --quiet: Minimal output.
- -h, --help: Show help.
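The engine rules above can be read as a small decision function. The sketch below is one possible reading, not the script's actual code: choose_engine and have_aria2 are hypothetical names, "20MB" is assumed to mean 20 MiB, and falling back to curl when aria2 is not installed is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the engine choice described in the options list.

# True when the aria2 binary (aria2c) is on PATH.
have_aria2() { command -v aria2c >/dev/null 2>&1; }

# usage: choose_engine <auto|curl|aria2> <size_in_bytes>
choose_engine() {
  local engine=${1:-auto} size_bytes=${2:-0}
  local threshold=$((20 * 1024 * 1024))   # assumption: "20MB" means 20 MiB
  case $engine in
    curl)
      echo curl ;;
    aria2)
      # Assumption: without aria2 installed, go straight to curl.
      if have_aria2; then echo aria2; else echo curl; fi ;;
    *)
      # auto: aria2 only for large files of known size.
      if have_aria2 && [ "$size_bytes" -ge "$threshold" ]; then
        echo aria2
      else
        echo curl
      fi ;;
  esac
}
```

An unknown size can be passed as 0, which keeps auto mode on curl.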
Environment Variables
LOG_FILE=./taiwanebook_error.log
MAX_PARTS=40
MULTIPART_MODE=auto
SLEEP_BETWEEN=0
JOBS=1
CURL_HTTP_OPTS=""
ENGINE=auto
FORCE_SAFE=1
Examples
# Multiple IDs from a file (4-way parallel at the task level).
# xargs cannot see shell aliases, so call the script by its path instead of twb.
xargs -P4 -n1 ~/bdfz/taiwanebook.sh < ids.txt
# Save to a specific directory
twb --outdir ~/Downloads NCL-002573320
# Be polite to the server during large batches
twb --jobs 3 --sleep 1 NCL-002573320
# Force HTTP/1.1 if HTTP/2 underperforms (or vice versa)
twb --http1.1 NCL-002573320
twb --http2 NCL-002573320
# Force aria2 engine explicitly
twb --engine aria2 NCL-002573320
How It Works
Reader-first Resolution
The script prefers /book/<ID>/reader pages (both en and zh-tw) and extracts every occurrence of viewer.html?file=.... The file parameter is URL-decoded to get real paths like /ebkFiles/<ID>/<ID>.PDF or /ebkFiles/<ID>/<ID>F01.PDF.
Path extraction uses Python when available (robust to attribute order and quoting), with a grep/sed fallback for portability.
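To make the shell fallback concrete, here is a minimal sketch of pulling file= values out of reader-page HTML and URL-decoding them. It is an illustration only: the function names urldecode and extract_pdf_paths and the exact regular expression are assumptions, not the script's own code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a grep/sed style extraction of viewer.html?file= values.

# Decode %XX escapes (assumes the input has no backslashes or stray "%").
urldecode() {
  local s=$1
  printf '%b\n' "${s//%/\\x}"
}

# Read HTML on stdin, print each distinct decoded file path once,
# in order of first appearance.
extract_pdf_paths() {
  local pat="viewer\.html\?file=[^\"'&<># ]+"
  grep -oE "$pat" |
    sed 's/^viewer\.html?file=//' |
    while IFS= read -r enc; do urldecode "$enc"; done |
    awk '!seen[$0]++'
}
```

A saved reader page could be fed in with extract_pdf_paths < reader.html; encoded and unencoded forms of the same path collapse into a single entry because deduplication happens after decoding.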
Anti-hotlink (Referer) Handling
Each validated PDF gets assigned the matching viewer.html?file=... URL as its Referer. Both the probe and the download include this header:
- Probes use Range: bytes=0-0 to check existence and parse total size.
- Downloads pass the same Referer, ensuring consistency with server checks.
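A probe of this kind can be sketched with standard curl options: -r 0-0 requests a single byte, -e sets the Referer, and -D - prints the response headers. The function names probe_headers and status_ok are hypothetical, and the caller is assumed to supply the full viewer URL, since its exact path is not given here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a one-byte probe that sends the viewer URL as Referer.

# usage: probe_headers <pdf_url> <viewer_url>
# Prints the response headers; the one-byte body is discarded.
probe_headers() {
  local pdf_url=$1 referer=$2
  curl -sS -o /dev/null -D - -r 0-0 -e "$referer" "$pdf_url"
}

# Reads header text on stdin; succeeds only if the last status line
# is 200 or 206.
status_ok() {
  local code
  code=$(awk '{ sub(/\r$/, "") }
              toupper($1) ~ /^HTTP\// { c = $2 }
              END { print c }')
  [ "$code" = 200 ] || [ "$code" = 206 ]
}
```

The two pieces combine as probe_headers "$pdf_url" "$viewer_url" | status_ok, which treats any other status (for example a 403 from a hotlink check) as "not found".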
Single vs Multi-part Books
Some books are a single file (<ID>.PDF), others are split into parts (<ID>F01.PDF, <ID>F02.PDF, …). The script:
- Tries to collect all parts directly from the reader page.
- If none are found, it guesses common patterns, including lower-case variants (.pdf, f01).
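The guessing step can be pictured as generating a list of candidate paths to probe. The sketch below assumes the path layout shown earlier (/ebkFiles/<ID>/<ID>.PDF and /ebkFiles/<ID>/<ID>F01.PDF); the function name candidate_paths and the exact set and order of variants are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list candidate PDF paths for a book ID.
# usage: candidate_paths <book_id> [max_parts]
candidate_paths() {
  local id=$1 max_parts=${2:-40} i=1 part
  # Single-file variants first.
  printf '/ebkFiles/%s/%s.PDF\n' "$id" "$id"
  printf '/ebkFiles/%s/%s.pdf\n' "$id" "$id"
  # Then numbered parts, upper-case and lower-case.
  while [ "$i" -le "$max_parts" ]; do
    part=$(printf '%02d' "$i")
    printf '/ebkFiles/%s/%sF%s.PDF\n' "$id" "$id" "$part"
    printf '/ebkFiles/%s/%sf%s.pdf\n' "$id" "$id" "$part"
    i=$((i + 1))
  done
}
```

Each candidate would still need to pass the strict probe before it is treated as a real file.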
Size Probing & Resume
A successful probe must yield HTTP 200 or 206. The script parses Content-Range or Content-Length for file size. Downloads use curl -C - to enable resume, and completion prints:
- Absolute path
- Human-friendly size
- Average speed
- Total time
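As a rough sketch of the size logic, the total can be read from the number after the slash in Content-Range (for a 206 reply), or from Content-Length when no Content-Range is present (a plain 200 reply). The function names total_size_from_headers and human_size are hypothetical, and the binary units in the formatter are an assumption about what "human-friendly" means here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive the total file size from probe headers.

# Reads header text on stdin and prints the total size in bytes, if known.
total_size_from_headers() {
  awk '
    { sub(/\r$/, "") }
    tolower($1) == "content-range:" {
      saw_range = 1
      n = split($0, a, "/")
      if (a[n] ~ /^[0-9]+$/) total = a[n]
    }
    tolower($1) == "content-length:" { len = $2 }
    END {
      if (total != "") print total
      # Content-Length of a ranged reply is the slice size, so use it
      # only when no Content-Range header was seen.
      else if (!saw_range && len != "") print len
    }'
}

# Format a byte count with binary units.
human_size() {
  awk -v b="$1" 'BEGIN {
    split("B KiB MiB GiB TiB", u, " ")
    i = 1
    while (b >= 1024 && i < 5) { b /= 1024; i++ }
    fmt = (i == 1) ? "%d %s\n" : "%.1f %s\n"
    printf fmt, b, u[i]
  }'
}
```

Ignoring Content-Length on ranged replies matters because a bytes=0-0 response reports a length of 1, not the size of the whole PDF.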
Layered Fallbacks (Safety First)
When a download fails, the script tries the following (until success):
- Your HTTP settings (CURL_HTTP_OPTS).
- Force HTTP/1.1.
- Force HTTP/2.
- Relax guards & extend retries (disable low-speed aborts, longer timeouts).
If ENGINE=auto and aria2 is available with a known size ≥ 20MB, the script first attempts an aria2 multi-connection transfer, then falls back to curl if needed.
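The layered approach can be sketched as a loop over curl option sets that stops at the first success. This is an assumption-laden illustration rather than the script's implementation: fetch and download_with_fallbacks are hypothetical names, and the specific guard and retry values (1 KiB/s for 30 s, 8 retries, 60 s connect timeout) are made up for the example. All curl options used are standard.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of layered fallbacks around a resumable curl download.

# usage: fetch <url> <out> <referer> [extra curl options...]
fetch() {
  local url=$1 out=$2 referer=$3
  shift 3
  # -f: fail on HTTP errors, -L: follow redirects, -C -: resume if possible.
  curl -fL -C - -e "$referer" "$@" -o "$out" "$url"
}

# usage: download_with_fallbacks <url> <out> <referer>
download_with_fallbacks() {
  local url=$1 out=$2 referer=$3 opts
  # Illustrative low-speed guard: abort below 1 KiB/s for 30 s.
  local guard="--speed-limit 1024 --speed-time 30"
  local layers=(
    "${CURL_HTTP_OPTS:-} $guard"                      # 1. your HTTP settings
    "--http1.1 $guard"                                # 2. force HTTP/1.1
    "--http2 $guard"                                  # 3. force HTTP/2
    "--retry 8 --retry-delay 5 --connect-timeout 60"  # 4. relaxed, no guard
  )
  for opts in "${layers[@]}"; do
    # $opts is unquoted on purpose so it splits into separate options.
    if fetch "$url" "$out" "$referer" $opts; then
      return 0
    fi
  done
  return 1
}
```

Because every layer passes -C -, a later layer continues from whatever bytes an earlier, failed layer already wrote.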
Performance Knobs
- --jobs N: parallelizes multi-file books to reduce total completion time.
- --engine aria2: multi-connection per file (great for large PDFs if the server permits).
- --http1.1 / --http2: protocol-level tuning; try both to see which performs better with the server.
Logging & Troubleshooting
- Errors are appended to ${LOG_FILE} (default ./taiwanebook_error.log) with timestamps.
- If curl exits with error 22, the server returned an HTTP error status (400 or above). Check the log and ensure the Referer is correct (the script handles this automatically).
- If progress reaches 100% and then stalls, upgrade to the latest script (we removed a redundant post-download probe that caused extra full GETs).
- For persistently slow transfers:
  - Try --http1.1 or --http2.
  - Use --jobs for multi-part books.
  - Install aria2 and run --engine aria2 for large single files.
Security & Etiquette
- Use a reasonable --jobs value (e.g., 2–4) to avoid putting unnecessary load on public resources.
- This tool is for personal, lawful access to content you are permitted to download.
- Respect the site’s terms of service and copyright.
Compatibility Notes
- macOS default Bash and curl are supported.
- Python 3 improves HTML parsing reliability but is optional.
- The script auto-detects zsh/bash and writes the twb alias accordingly.
FAQ
Q: Why is it slow?
A: Likely server-side throttling and cross-ocean latency. Try --http1.1 or --http2, use --jobs for multi-part books, or --engine aria2 for large files.
Q: Can I resume a partial download?
A: Yes. Just rerun the same command; -C - enables resume if the server supports it.
Q: Does it support books with parts in mixed case?
A: Yes. When guessing, the script tries .PDF/.pdf and F01../f01.. variants.
Changelog
- 2025-10-27 — Safe + fast release
  - Reader-first parsing (Python-based, with shell fallback)
  - Strict probe (200/206) and dynamic Referer
  - Layered fallbacks (http1.1/http2/relaxed guards)
  - Optional aria2 engine; --jobs concurrency
  - English success line with absolute path
  - Removed redundant post-download HTTP probe
License
This script and manual are provided “as is”, without warranty. Use responsibly and within the bounds of applicable law and the website’s terms.