臺灣華文電子書庫/Taiwan eBooks Downloader

2025-10-27 21:04

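I occasionally need to download books from this library, so I may as well write down what needs writing.
Back in the day there were Chaoxing (超星) and duxiu, Yuandi (園地) and Shudian (數典), and a whole history of e-book production and bulk library dumping. I trust that one day AI will evolve far enough to write about those people, those books, and those events.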

```bash
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/taiwanebook.sh)
```

Safe-by-default, fast-when-possible. Works with Taiwan eBooks (National Central Library) PDF viewer pages by resolving the actual PDF file(s) and downloading them with the correct headers and resilient fallbacks.

Overview

taiwanebook.sh is a Bash script that downloads books from the Taiwan eBooks website by extracting the true PDF file paths from the PDF.js viewer and fetching them with the right headers.

Design goals:

- Always complete: prioritize robustness and correctness.
- Be polite: validate links first, then download with proper Referer.
- Be fast when it helps: parallelize multi-file downloads; optionally use aria2 for big files.
- Give clear feedback: progress bar, size, average speed, elapsed time, and final absolute path.

This manual targets macOS users who have Homebrew installed.

Key Features

- Reader-first parsing: reliably extracts all file= entries from /book/<ID>/reader pages (en/zh). Robust Python-based HTML scan with shell fallback.
- Strict validation: only treats HTTP 200/206 as existing; probes size via Range: bytes=0-0 and Content-Range/Length.
- Correct Referer: every probe and download uses the matching viewer.html?file=... URL as the Referer header, complying with the site’s anti-hotlink checks.
- Resume support: curl -C - for partial re-runs.
- Layered fallbacks: try your HTTP settings → force HTTP/1.1 → force HTTP/2 → relax low-speed guards with extended retries.
- Speed options: --jobs N to download multiple files concurrently; optional aria2 engine for large files (multi-connection).
- Friendly UX: English messages, progress bar, and final success line with absolute path.
- Alias: installs twb alias automatically (zsh/bash).

Requirements

- macOS (tested with recent releases).
- Bash and curl (preinstalled on macOS).
- Python 3 (optional but recommended): used for robust HTML parsing; the script falls back to shell regex if it is unavailable.
  - Install via Homebrew: brew install python
- aria2 (optional): for faster large-file downloads via multi-connection transfers.
  - Install via Homebrew: brew install aria2

Installation

Place taiwanebook.sh somewhere convenient (e.g., ~/bdfz/taiwanebook.sh) and make it executable:
```bash
chmod +x ~/bdfz/taiwanebook.sh
```

Run once to install the twb alias (the script appends to your shell rc file):

```bash
~/bdfz/taiwanebook.sh --help
# You will see a message like:
# Installed alias: twb (added to ~/.zshrc or ~/.bashrc). Reload with: source '<rc>'
```

Reload your rc file so the alias is active in the current session:
```bash
source ~/.zshrc     # or: source ~/.bashrc
```

From now on you can call the script as twb from anywhere.

Quick Start

Single book by ID:
```bash
twb NCL-9900010967
```

Book page URL (with or without /reader, any locale):

```bash
twb "https://taiwanebook.ncl.edu.tw/zh-tw/book/NCL-9900010967"
```

Speed up multi-file books (concurrent downloads):
```bash
twb --jobs 3 NCL-9900010967
```

Force HTTP version if throughput is poor:

```bash
twb --http1.1 NCL-9900010967
# or
twb --http2 NCL-9900010967
```

Use aria2 for large files:

```bash
brew install aria2
twb --engine aria2 NCL-9900010967
```

Usage

```text
taiwanebook.sh [options] <BOOK_ID|URL> [more...]
cat list.txt | taiwanebook.sh [options] -
```

Preferred input is the BOOK ID (e.g., NCL-9900010967). You may also pass:

- Book page URL (any locale, with/without /reader)
- PDF.js viewer URL (contains ?file=/ebkFiles/...)
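
As a rough illustration of how these input forms could be reduced to one BOOK ID, here is a minimal sketch. The function name normalize_book_id and the ID pattern (letters, a hyphen, then letters or digits) are assumptions inferred from the examples in this manual; they are not taken from the script itself.

```bash
#!/usr/bin/env bash
# Sketch only: reduce a BOOK ID, book page URL, or viewer URL to a bare ID.
# normalize_book_id and the ID pattern are illustrative assumptions.
normalize_book_id() {
  local input=$1
  local id_re='[A-Za-z]+-[A-Za-z0-9]+'
  local bare_re="^${id_re}\$"
  local page_re="/book/(${id_re})"
  # In viewer URLs the slash after ebkFiles may be percent-encoded (%2F).
  local viewer_re="ebkFiles(/|%2[Ff])(${id_re})"

  if [[ $input =~ $bare_re ]]; then
    printf '%s\n' "$input"
  elif [[ $input =~ $page_re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  elif [[ $input =~ $viewer_re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1   # not a recognizable ID or URL
  fi
}
```

The regular expressions are kept in variables so the same code parses cleanly on the older Bash that ships with macOS. Anything that matches none of the three forms makes the function return non-zero, so a caller can skip the entry and log it.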

Options

- --outdir DIR: Save files under DIR (default: .).
- --sleep N: Sleep N seconds between downloads (default: 0).
- --mode auto|single|all: Multi-part strategy (default: auto).
  - auto: if parts F01.. are found, download the contiguous series; else fetch the single file.
  - single: only attempt the single-file PDF.
  - all: download all parts found up to MAX_PARTS.
- --jobs N: Download up to N files in parallel when the book resolves to multiple files (default: 1).
- --http1.1 | --http2: Force the HTTP version used by curl (which one is faster varies by server).
- --engine auto|curl|aria2: Choose the download engine (default: auto).
  - auto: uses aria2 when available for files ≥ 20MB; otherwise falls back to curl.
  - aria2: always try aria2 first, then fall back to curl if it fails.
  - curl: use curl only.
- --safe | --no-safe: Enable/disable layered fallbacks (default: --safe).
- --quiet: Minimal output.
- -h, --help: Show help.
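
The "contiguous series" rule of --mode auto can be pictured with the sketch below. It assumes part files end in F<number>.PDF (in either case) and that the series starts at F01; the function name contiguous_parts is hypothetical and the script may implement the rule differently.

```bash
#!/usr/bin/env bash
# Sketch only: keep F01, F02, ... up to the first gap.
# Reads file names or paths on stdin, one per line; anything that does not
# end in F<digits>.PDF (for example the single-file PDF) is ignored.
contiguous_parts() {
  awk '
    match($0, /[Ff][0-9]+\.[Pp][Dd][Ff]$/) {
      # Digits sit between the leading F and the 4-character ".PDF" suffix.
      num = substr($0, RSTART + 1, RLENGTH - 5) + 0
      parts[num] = $0
    }
    END { for (i = 1; (i in parts); i++) print parts[i] }'
}
```

Given F01, F02 and F04 in any order, this prints only F01 and F02, because F03 is missing.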

Environment Variables

- LOG_FILE=./taiwanebook_error.log
- MAX_PARTS=40
- MULTIPART_MODE=auto
- SLEEP_BETWEEN=0
- JOBS=1
- CURL_HTTP_OPTS=""
- ENGINE=auto
- FORCE_SAFE=1

Examples

```bash
# Multiple IDs from a file (4-way parallel at the task level).
# xargs cannot expand shell aliases, so call the script by its path.
xargs -P4 -n1 ~/bdfz/taiwanebook.sh < ids.txt

# Save to a specific directory
twb --outdir ~/Downloads NCL-002573320

# Be polite to the server during large batches
twb --jobs 3 --sleep 1 NCL-002573320

# Force HTTP/1.1 if HTTP/2 underperforms (or vice versa)
twb --http1.1 NCL-002573320
twb --http2 NCL-002573320

# Force aria2 engine explicitly
twb --engine aria2 NCL-002573320
```

How It Works

Reader-first Resolution

The script prefers /book/<ID>/reader pages (both en and zh-tw) and extracts every occurrence of viewer.html?file=.... The file parameter is URL-decoded to get real paths like /ebkFiles/<ID>/<ID>.PDF or /ebkFiles/<ID>/<ID>F01.PDF.

Path extraction uses Python when available (robust to attribute order and quoting), with a grep/sed fallback for portability.
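
To make the shell fallback concrete, here is a minimal sketch of a grep/sed style extraction. The function names urldecode and extract_pdf_paths are illustrative, and the pattern assumes the file value ends at a quote, ampersand, angle bracket, or space; the real script may be stricter or looser.

```bash
#!/usr/bin/env bash
# Sketch only: pull viewer.html?file=... targets out of reader-page HTML.
# Function names are illustrative, not taken from taiwanebook.sh.

# Decode percent-escapes such as %2F (bash-specific printf '%b' idiom).
urldecode() {
  printf '%b\n' "${1//%/\\x}"
}

# Read HTML on stdin; print each distinct decoded path once, in page order.
extract_pdf_paths() {
  local pat="viewer\.html\?file=[^\"'&<> ]+"
  grep -oE "$pat" |
    sed -E 's/^viewer\.html\?file=//' |
    while IFS= read -r encoded; do
      urldecode "$encoded"
    done |
    awk '!seen[$0]++'
}
```

Typical use would be to pipe a fetched reader page into it, for example curl -Ls "<reader URL>" | extract_pdf_paths.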

Anti-hotlink (Referer) Handling

Each validated PDF gets assigned the matching viewer.html?file=... URL as its Referer. Both the probe and the download include this header:

- Probes use Range: bytes=0-0 to check existence and parse total size.
- Downloads pass the same Referer, ensuring consistency with server checks.
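
A probe along these lines might look like the sketch below. The names probe_pdf and parse_total_size are hypothetical. The curl options used (-D - to dump headers to stdout, -H for the Referer and Range headers) are standard, but the exact set the script passes is not shown in this manual.

```bash
#!/usr/bin/env bash
# Sketch only: one-byte probe with the viewer URL as Referer.
# probe_pdf and parse_total_size are illustrative names.

# Read raw response headers on stdin. Succeeds only for HTTP 200/206 and
# prints the total size: the number after "/" in Content-Range if present,
# otherwise Content-Length of a plain 200 response.
parse_total_size() {
  awk '
    { line = tolower($0); sub(/\r$/, "", line) }
    # A new status line (e.g. after a redirect) resets what was seen so far.
    line ~ /^http\// { split(line, f, " "); status = f[2]; range = ""; len = "" }
    line ~ /^content-range:/ {
      n = line; sub(/^.*\//, "", n)
      if (n ~ /^[0-9]+$/) range = n
    }
    line ~ /^content-length:/ {
      n = line; sub(/^[^:]*:[ \t]*/, "", n)
      if (n ~ /^[0-9]+$/) len = n
    }
    END {
      if (status != 200 && status != 206) exit 1
      if (range != "") print range
      else if (status == 200 && len != "") print len
      else exit 1
    }'
}

# Usage: probe_pdf <pdf_url> <viewer_url>
probe_pdf() {
  curl -sS -L -o /dev/null -D - \
    -H "Referer: $2" -H 'Range: bytes=0-0' "$1" | parse_total_size
}
```

On a 206 reply the Content-Length is only the one requested byte, which is why the total is taken from Content-Range in that case.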

Single vs Multi-part Books

Some books are a single file (<ID>.PDF), others are split into parts (<ID>F01.PDF, <ID>F02.PDF, …). The script:

- Tries to collect all parts directly from the reader page.
- If none are found, it guesses common patterns, including lower-case variants (.pdf, f01).
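
The guessing step could generate candidates roughly as follows. This is a sketch under assumptions: the function name candidate_paths is made up, and the exact case combinations and ordering the script tries are not documented here, so only the all-upper and all-lower variants are shown.

```bash
#!/usr/bin/env bash
# Sketch only: list candidate PDF paths for a book ID.
# Usage: candidate_paths <ID> [max_parts]   (max_parts defaults to 40,
# mirroring the MAX_PARTS default listed under Environment Variables)
candidate_paths() {
  local id=$1 max=${2:-40} i part
  # Single-file variants first.
  printf '/ebkFiles/%s/%s.PDF\n' "$id" "$id"
  printf '/ebkFiles/%s/%s.pdf\n' "$id" "$id"
  # Then numbered parts, zero-padded to two digits.
  i=1
  while [ "$i" -le "$max" ]; do
    part=$(printf '%02d' "$i")
    printf '/ebkFiles/%s/%sF%s.PDF\n' "$id" "$id" "$part"
    printf '/ebkFiles/%s/%sf%s.pdf\n' "$id" "$id" "$part"
    i=$((i + 1))
  done
}
```

Each candidate would still need to pass the 200/206 probe before being downloaded.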

Size Probing & Resume

A successful probe must yield HTTP 200 or 206. The script parses Content-Range or Content-Length for file size. Downloads use curl -C - to enable resume, and completion prints:

- Absolute path
- Human-friendly size
- Average speed
- Total time
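
For illustration, a completion line with these four fields could be assembled like this. The helper names human_size and summary_line and the wording of the output are assumptions; the script's actual success line is not reproduced in this manual.

```bash
#!/usr/bin/env bash
# Sketch only: format a byte count and a completion line.

# Print a byte count using binary units (B, KiB, MiB, GiB, TiB).
human_size() {
  awk -v b="$1" 'BEGIN {
    split("B KiB MiB GiB TiB", u, " ")
    i = 1
    while (b >= 1024 && i < 5) { b /= 1024; i++ }
    if (i == 1) printf "%d %s\n", b, u[i]
    else printf "%.1f %s\n", b, u[i]
  }'
}

# Usage: summary_line <path> <bytes> <seconds>
# The directory of <path> must exist so it can be resolved to an absolute path.
summary_line() {
  local path=$1 bytes=$2 secs=$3 abs
  abs="$(cd "$(dirname "$path")" && pwd -P)/$(basename "$path")"
  awk -v p="$abs" -v s="$(human_size "$bytes")" -v b="$bytes" -v t="$secs" 'BEGIN {
    rate = (t > 0) ? b / t : 0
    printf "Saved: %s (%s, %.0f KiB/s, %ss)\n", p, s, rate / 1024, t
  }'
}
```

The byte count and elapsed time could come from curl's -w write-out variables such as %{size_download} and %{time_total}.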

Layered Fallbacks (Safety First)

When a download fails, the script tries the following in order, stopping at the first success:

- Your HTTP settings (CURL_HTTP_OPTS).
- Force HTTP/1.1.
- Force HTTP/2.
- Relax guards & extend retries (disable low-speed aborts, longer timeouts).

If ENGINE=auto and aria2 is available with a known size ≥ 20MB, the script first attempts an aria2 multi-connection transfer, then falls back to curl if needed.
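
The ladder described above can be sketched as a loop over option sets. Everything numeric here (speed limit, retry counts, timeouts) is a placeholder, and download_with_fallbacks is a hypothetical name. The fetch command is passed in as the first argument so the logic can be exercised without touching the network.

```bash
#!/usr/bin/env bash
# Sketch only: walk a fallback ladder until one attempt succeeds.
# Usage: download_with_fallbacks <fetch_cmd> [args passed to every attempt...]
# The guard and retry values are placeholders, not the script's real settings.
download_with_fallbacks() {
  local fetch=$1; shift
  local guards='--speed-limit 1024 --speed-time 30 --retry 3'
  local rung
  for rung in \
    "${CURL_HTTP_OPTS:-} $guards" \
    "--http1.1 $guards" \
    "--http2 $guards" \
    "--retry 10 --retry-delay 5 --connect-timeout 60"
  do
    # Unquoted on purpose: each rung is a space-separated option list.
    # shellcheck disable=SC2086
    if "$fetch" $rung "$@"; then
      return 0
    fi
  done
  return 1
}
```

With curl as the fetch command, a call might look like download_with_fallbacks curl -fL -C - -H "Referer: $viewer_url" -o "$out" "$pdf_url", where viewer_url, out, and pdf_url are variables the caller has already set. The last rung omits --speed-limit and --speed-time, which is how low-speed aborts are switched off in curl.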

Performance Knobs

- --jobs N: parallelizes multi-file books to reduce total completion time.
- --engine aria2: multi-connection per file (great for large PDFs if the server permits).
- --http1.1 / --http2: protocol-level tuning; try both to see which performs better with the server.
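
How the engine might be picked for a given file is sketched below. The function name choose_engine is hypothetical, and treating "20MB" as 20 × 1024 × 1024 bytes is an assumption. The third argument exists only so the decision can be checked without aria2 installed.

```bash
#!/usr/bin/env bash
# Sketch only: decide between aria2 and curl for one file.
# Usage: choose_engine <auto|curl|aria2> <size_bytes> [have_aria2: 1|0]
choose_engine() {
  local engine=${1:-auto} size=${2:-0} have_aria2=${3:-}
  local threshold=$((20 * 1024 * 1024))   # assumed meaning of "20MB"
  if [ -z "$have_aria2" ]; then
    # Detect the aria2c binary when the caller did not say.
    if command -v aria2c >/dev/null 2>&1; then have_aria2=1; else have_aria2=0; fi
  fi
  case $engine in
    curl) echo curl ;;
    aria2)
      if [ "$have_aria2" = 1 ]; then echo aria2; else echo curl; fi ;;
    *)
      if [ "$have_aria2" = 1 ] && [ "$size" -ge "$threshold" ]; then
        echo aria2
      else
        echo curl
      fi ;;
  esac
}
```

An unknown size is passed as 0, so auto stays with curl unless the probe reported a size at or above the threshold.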

Logging & Troubleshooting

- Errors are appended to ${LOG_FILE} (default ./taiwanebook_error.log) with timestamps.
- If curl exits with code 22, the server returned an HTTP error (a 4xx/5xx response, reported this way when curl runs with --fail). Check the log and confirm the Referer is correct (the script normally handles this automatically).
- If progress reaches 100% and then stalls, upgrade to the latest script (we removed a redundant post-download probe that caused extra full GETs).
- For persistently slow transfers:
  - Try --http1.1 or --http2.
  - Use --jobs for multi-part books.
  - Install aria2 and run --engine aria2 for large single files.

Security & Etiquette

- Use reasonable --jobs (e.g., 2–4) to avoid putting unnecessary load on public resources.
- This tool is for personal, lawful access to content you are permitted to download.
- Respect the site’s terms of service and copyright.

Compatibility Notes

- macOS default Bash and curl are supported.
- Python 3 improves HTML parsing reliability but is optional.
- The script auto-detects zsh/bash and writes the twb alias accordingly.

FAQ

Q: Why is it slow?
A: Likely server-side throttling and cross-ocean latency. Try --http1.1 or --http2, use --jobs for multi-part books, or --engine aria2 for large files.

Q: Can I resume a partial download?
A: Yes. Just rerun the same command; -C - enables resume if the server supports it.

Q: Does it support books with parts in mixed case?
A: Yes. When guessing, the script tries .PDF/.pdf and F01../f01.. variants.

Changelog

2025-10-27 — Safe + fast release
- Reader-first parsing (Python-based, with shell fallback)
- Strict probe (200/206) and dynamic Referer
- Layered fallbacks (http1.1/http2/relaxed guards)
- Optional aria2 engine; --jobs concurrency
- English success line with absolute path
- Removed redundant post-download HTTP probe

License

This script and manual are provided “as is”, without warranty. Use responsibly and within the bounds of applicable law and the website’s terms.