pikuri-pdf plugs PDF-to-text extraction into pikuri-core's +Pikuri::Extractor+ registry. The bundled +Pikuri::Extractors::PDF+ extractor wraps the pure-Ruby pdf-reader gem and extracts lazily: a paged read (one of the +read+ tool's windows) parses only the pages that window needs, so reading the first page of a 500-page PDF never pays for the other 499. The gem ships separately from pikuri-core so that the core's dependency tree stays minimal and auditable: pdf-reader and its transitive dependencies (Ascii85, afm, hashery, ruby-rc4, ttfunk) come along only for hosts that opt into PDF support. Registration is explicit, through +Pikuri::Extractors::PDF.register+, so requiring the gem changes nothing by itself; the host script chooses which extractors it wires in. One registration extends the +read+ tool, +web_scrape+, and the pikuri-vectordb indexer at once.
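The two ideas in the description, explicit registration and lazy page windows, can be illustrated with a small pure-Ruby sketch. Everything under +Sketch+ is a hypothetical stand-in written for this example (the registry, +LazyDocument+, and the string "pages"); it is not pikuri's or pdf-reader's actual API and only shows the general shape of the technique. In a real host, the registration step corresponds to calling +Pikuri::Extractors::PDF.register+.

```ruby
# Illustrative stand-ins only: none of the names below are pikuri's real API.
module Sketch
  # A registry that stays empty until the host explicitly registers something.
  class Registry
    def initialize
      @extractors = {}
    end

    def register(extension, extractor)
      @extractors[extension] = extractor
    end

    # Returns the extractor registered for the path's extension, or nil.
    def lookup(path)
      @extractors[File.extname(path).downcase]
    end
  end

  # Parses a page only when a window asks for it, then caches the result.
  class LazyDocument
    attr_reader :parsed_count

    def initialize(raw_pages)
      @raw_pages = raw_pages
      @cache = {}
      @parsed_count = 0
    end

    def page_count
      @raw_pages.length
    end

    # Returns the text of pages first..last (1-based, inclusive),
    # clamped to the end of the document.
    def window(first, last)
      (first..[last, page_count].min).map { |number| page(number) }
    end

    private

    def page(number)
      @cache.fetch(number) do
        @parsed_count += 1
        @cache[number] = parse(@raw_pages[number - 1])
      end
    end

    # Stand-in for the expensive per-page PDF parse.
    def parse(raw)
      raw.strip
    end
  end
end

registry = Sketch::Registry.new
puts registry.lookup("report.pdf").inspect # nothing registered yet

# The host opts in explicitly; only now does ".pdf" resolve to an extractor.
registry.register(".pdf", ->(raw_pages) { Sketch::LazyDocument.new(raw_pages) })

doc = registry.lookup("report.pdf").call(Array.new(500) { |i| "  page #{i + 1}  " })
puts doc.window(1, 1).inspect
puts doc.parsed_count # pages parsed so far
```

Because the registry is the single lookup point in this sketch, any consumer that goes through it picks up the new extractor from the one +register+ call. This mirrors how the description says one registration extends several tools at once.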

Required Ruby Version

>= 3.3

Authors

Martin Vysny

Versions

  1. 0.0.7 June 11, 2026 (9 KB)
  2. 0.0.6 June 04, 2026 (9 KB)
