SUEN

臺灣華文電子書庫/Taiwan eBooks Downloader

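I occasionally need to download books from this library, so I may as well write down what ought to be written.
Back in the days of SuperStar (超星) and Duxiu, and of 園地數典, there was a whole history of e-book production and bulk library dumping (拖庫). I believe that one day AI will evolve enough to write about those people, those books, and those events.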

bash
bash <(curl -Ls https://raw.githubusercontent.com/ieduer/bdfz/main/taiwanebook.sh)

Safe-by-default, fast-when-possible. Works with Taiwan eBooks (National Central Library) PDF viewer pages by resolving the actual PDF file(s) and downloading them with the correct headers and resilient fallbacks.


Overview

taiwanebook.sh is a Bash script that downloads books from the Taiwan eBooks website by extracting the true PDF file paths from the PDF.js viewer and fetching them with the right headers.

Design goals:

This manual targets macOS users with Homebrew installed.


Key Features


Requirements


Installation

  1. Place taiwanebook.sh somewhere convenient (e.g., ~/bdfz/taiwanebook.sh) and make it executable:

    bash
    chmod +x ~/bdfz/taiwanebook.sh
  2. Run once to install the twb alias (the script appends to your shell rc file):

    bash
    ~/bdfz/taiwanebook.sh --help
    # You will see a message like:
    # Installed alias: twb (added to ~/.zshrc or ~/.bashrc). Reload with: source '<rc>'
  3. Reload your rc file so the alias is active in the current session:

    bash
    source ~/.zshrc     # or: source ~/.bashrc

From now on you can call the script as twb from anywhere.
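The alias step amounts to appending one line to a shell rc file. A minimal idempotent sketch, assuming the alias simply points at the script's absolute path; install_alias is a hypothetical name and the real script's checks may differ:

```bash
# Hypothetical sketch: add "alias twb=..." to an rc file only if it is not there yet.
install_alias() {  # $1 = rc file (e.g. ~/.zshrc), $2 = absolute path to taiwanebook.sh
  local rc=$1 line
  line="alias twb='$2'"
  # -x: whole-line match, -F: fixed string, so reruns do not append duplicates.
  if ! grep -qxF "$line" "$rc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc"
    echo "Installed alias: twb (added to $rc). Reload with: source '$rc'"
  fi
}
```

Because the alias value is wrapped in single quotes, this sketch breaks if the script path itself contains a single quote.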


Quick Start


Usage

text
taiwanebook.sh [options] <BOOK_ID|URL> [more...]
cat list.txt | taiwanebook.sh [options] -

Preferred input is the BOOK ID (e.g., NCL-9900010967). You may also pass a URL, or - to read inputs from standard input, as in the usage lines above.
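Normalizing a mixed list of IDs and URLs to bare IDs could look like the sketch below. It assumes URLs carry a /book/<ID>/... segment like the reader pages described under How It Works; to_book_id is a hypothetical helper, not the script's actual function.

```bash
# Hypothetical helper: turn one argument (BOOK ID or URL) into a bare BOOK ID.
# Assumption: URLs contain a "/book/<ID>/..." path segment.
to_book_id() {
  local arg=$1
  case $arg in
    */book/*)
      arg=${arg#*/book/}    # drop everything up to and including "/book/"
      arg=${arg%%[/?#]*}    # keep only the segment before the next /, ? or #
      ;;
  esac
  printf '%s\n' "$arg"      # plain IDs pass through unchanged
}
```

For example, to_book_id 'https://example.org/en/book/NCL-9900010967/reader' prints NCL-9900010967.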

Options

Environment Variables

Examples

bash
# Multiple IDs from a file (4-way parallel at the task level).
# xargs cannot run shell aliases, so call the script by its path here.
cat ids.txt | xargs -P4 -n1 ~/bdfz/taiwanebook.sh

# Save to a specific directory
twb --outdir ~/Downloads NCL-002573320

# Be polite to the server during large batches
twb --jobs 3 --sleep 1 NCL-002573320

# Force HTTP/1.1 if HTTP/2 underperforms (or vice versa)
twb --http1.1 NCL-002573320
twb --http2 NCL-002573320

# Force aria2 engine explicitly
twb --engine aria2 NCL-002573320

How It Works

Reader-first Resolution

The script prefers /book/<ID>/reader pages (both en and zh-tw) and extracts every occurrence of viewer.html?file=.... The file parameter is URL-decoded to get real paths like /ebkFiles/<ID>/<ID>.PDF or /ebkFiles/<ID>/<ID>F01.PDF.

Path extraction uses Python when available (robust to attribute order and quoting), with a grep/sed fallback for portability.
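A minimal sketch of the grep/sed fallback, assuming the reader HTML arrives on stdin and each file value ends at a quote, &, whitespace, or >. urldecode and extract_pdf_paths are hypothetical names, and the Python path is not shown.

```bash
# Decode %XX escapes (e.g. %2F -> /) via printf's \xHH support in %b.
urldecode() {
  printf '%b' "${1//%/\\x}"
}

# Read reader-page HTML on stdin; print each decoded PDF path once.
extract_pdf_paths() {
  grep -o "viewer\.html[?]file=[^\"'&[:space:]>]*" |  # every viewer.html?file=... occurrence
    sed 's/^viewer\.html[?]file=//' |                # keep only the encoded file value
    while IFS= read -r encoded; do
      urldecode "$encoded"
      printf '\n'
    done |
    LC_ALL=C sort -u                                 # de-duplicate, stable part order
}
```

For example, curl -Ls "$reader_url" | extract_pdf_paths, where $reader_url is a placeholder for a /book/<ID>/reader page.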

Each validated PDF is assigned the matching viewer.html?file=... URL as its Referer, and both the probe and the download send this header.
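The Referer handling can be sketched as two curl calls that share the same header. This is an illustration under assumptions: the function names and the CURL override hook are hypothetical, and the probe is shown as a one-byte range request whose headers are printed to stdout.

```bash
# CURL is a hypothetical override hook; it defaults to the real curl binary.

# Probe: request only byte 0 and dump the response headers to stdout.
probe_pdf() {     # $1 = PDF URL, $2 = viewer.html?file=... URL used as Referer
  "${CURL:-curl}" --silent --show-error --location \
    --header "Referer: $2" \
    --range 0-0 --output /dev/null --dump-header - \
    "$1"
}

# Download: same Referer; -C - resumes a partial file if one exists.
download_pdf() {  # $1 = PDF URL, $2 = Referer, $3 = output file
  "${CURL:-curl}" --fail --location \
    --header "Referer: $2" \
    -C - --output "$3" \
    "$1"
}
```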

Single vs Multi-part Books

Some books are a single file (<ID>.PDF), others are split into parts (<ID>F01.PDF, <ID>F02.PDF, …). The script:

  1. Tries to collect all parts directly from the reader page.
  2. If none are found, it guesses common patterns, including lower-case variants (.pdf, f01).
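The guessing step can be sketched as generating candidate paths under /ebkFiles/<ID>/. Trying every combination of F/f and PDF/pdf, and the default limit of three parts, are assumptions; guess_candidates is a hypothetical name.

```bash
# Hypothetical helper: candidate paths to probe when the reader page yields none.
guess_candidates() {  # $1 = BOOK ID, $2 = max parts to guess (default 3)
  local id=$1 max_parts=${2:-3} n f ext
  # Single-file book: <ID>.PDF, then the lower-case extension.
  for ext in PDF pdf; do
    printf '/ebkFiles/%s/%s.%s\n' "$id" "$id" "$ext"
  done
  # Multi-part book: <ID>F01.PDF, <ID>F01.pdf, <ID>f01.PDF, <ID>f01.pdf, ...
  for ((n = 1; n <= max_parts; n++)); do
    for f in F f; do
      for ext in PDF pdf; do
        printf '/ebkFiles/%s/%s%s%02d.%s\n' "$id" "$id" "$f" "$n" "$ext"
      done
    done
  done
}
```

A caller would probe these in order with the Referer and, for numbered parts, could stop at the first part number where no variant answers with 200 or 206.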

Size Probing & Resume

A probe succeeds only if it returns HTTP 200 or 206. The script reads the file size from Content-Range or Content-Length. Downloads use curl -C - to enable resume, and a message is printed on completion.
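A sketch of the size check, assuming the probe's headers come from something like curl --range 0-0 --dump-header -, possibly across redirects (hence the reset on each status line). On a 206 the total size is the number after the slash in Content-Range; on a 200 the server ignored the range, so Content-Length is the full size. parse_probe_size is a hypothetical name.

```bash
# Read dumped HTTP response headers on stdin; print the total file size in bytes.
# Returns non-zero unless the final status is 200 or 206 with a usable size.
parse_probe_size() {
  local line status="" range_total="" length=""
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                      # header lines end in CRLF
    case $line in
      HTTP/*)                               # new status line (e.g. after a redirect)
        read -r _ status _ <<<"$line"
        range_total="" length="" ;;
      [Cc]ontent-[Rr]ange:*)                # "bytes 0-0/12345678" -> "12345678"
        range_total=${line##*/} ;;
      [Cc]ontent-[Ll]ength:*)
        length=${line#*:}; length=${length//[[:space:]]/} ;;
    esac
  done
  case $status in 200|206) ;; *) return 1 ;; esac
  if [[ $status == 206 && $range_total =~ ^[0-9]+$ ]]; then
    echo "$range_total"                     # partial response: total from Content-Range
  elif [[ $status == 200 && $length =~ ^[0-9]+$ ]]; then
    echo "$length"                          # range ignored: full body length
  else
    return 1
  fi
}
```

The lower-case header patterns matter because HTTP/2 responses use lower-case header names.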

Layered Fallbacks (Safety First)

When a download fails, the script tries the following (until success):

  1. Your HTTP settings (CURL_HTTP_OPTS).
  2. Force HTTP/1.1.
  3. Force HTTP/2.
  4. Relax guards & extend retries (disable low-speed aborts, longer timeouts).
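The fallback ladder can be sketched as a loop over option sets. CURL_HTTP_OPTS comes from the text; the CURL override hook and the exact layer-4 flags are assumptions chosen to match "disable low-speed aborts, longer timeouts".

```bash
# Sketch: retry one download with progressively different curl options.
download_with_fallbacks() {  # $1 = PDF URL, $2 = Referer, $3 = output file
  local url=$1 ref=$2 out=$3 i
  # Layers: 1) the user's CURL_HTTP_OPTS (may be empty), 2) HTTP/1.1, 3) HTTP/2,
  # 4) relaxed guards: a speed limit of 0 never triggers the low-speed abort,
  #    plus a longer connect timeout and more retries (assumed values).
  local -a layers=(
    "${CURL_HTTP_OPTS:-}"
    "--http1.1"
    "--http2"
    "--speed-limit 0 --connect-timeout 60 --retry 10 --retry-delay 5"
  )
  for i in "${!layers[@]}"; do
    # ${layers[$i]} is unquoted on purpose so one layer can hold several flags.
    if "${CURL:-curl}" --fail --location --header "Referer: $ref" \
         -C - ${layers[$i]} --output "$out" "$url"; then
      echo "downloaded via layer $((i + 1))" >&2
      return 0
    fi
  done
  return 1
}
```

Every layer keeps -C -, so a later attempt resumes whatever an earlier failed attempt already wrote instead of starting over.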

If ENGINE=auto and aria2 is available with a known size ≥ 20MB, the script first attempts an aria2 multi-connection transfer, then falls back to curl if needed.
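The engine choice can be sketched as a small decision function plus an aria2 call that reuses the Referer. Reading 20MB as 20 * 1024 * 1024 bytes, the ENGINE=curl value, the ARIA2C override hook, and the 8-connection setting are all assumptions; only ENGINE=auto and --engine aria2 come from the text.

```bash
# Hypothetical picker: ENGINE=auto + aria2c on PATH + known size >= 20 MB -> aria2.
pick_engine() {  # $1 = known size in bytes, or empty if unknown
  local size=$1 threshold=$((20 * 1024 * 1024))
  case ${ENGINE:-auto} in
    aria2) echo aria2; return ;;   # explicit --engine aria2
    curl)  echo curl;  return ;;
  esac
  if [[ $size =~ ^[0-9]+$ ]] && (( size >= threshold )) &&
     command -v aria2c >/dev/null 2>&1; then
    echo aria2
  else
    echo curl                      # unknown or small size, or aria2c missing
  fi
}

# Multi-connection transfer with the same Referer; -x/-s counts are illustrative.
aria2_download() {  # $1 = PDF URL, $2 = Referer, $3 = output directory, $4 = file name
  "${ARIA2C:-aria2c}" --continue=true -x 8 -s 8 \
    --header="Referer: $2" -d "$3" -o "$4" "$1"
}
```

If aria2_download fails, the caller would fall back to the curl path as the text describes.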

Performance Knobs


Logging & Troubleshooting


Security & Etiquette


Compatibility Notes


FAQ

Q: Why is it slow?
A: Likely server-side throttling and cross-ocean latency. Try --http1.1 or --http2, use --jobs for multi-part books, or --engine aria2 for large files.

Q: Can I resume a partial download?
A: Yes. Just rerun the same command; -C - resumes the partial file if the server supports range requests.

Q: Does it support books with parts in mixed case?
A: Yes. When guessing, the script tries .PDF/.pdf and F01../f01.. variants.


Changelog


License

This script and manual are provided “as is”, without warranty. Use responsibly and within the bounds of applicable law and the website’s terms.