ArrayBuffer Detach: Why Reprocess Threw Empty PDF on Every Call
TLDR
Bug: every reprocess call after the first extraction threw "empty PDF." Cause: pdfjsLib.getDocument({ data: bytes }) transfers the underlying ArrayBuffer, detaching it in the calling thread. Fix: _cachedBytes = bytes.slice() before the transfer.
Repo: tools/pdf-processor
Symptom
First extraction worked. Re-extraction with a pipeline override (triggered by the region editor) always produced "empty PDF" or a geometry worker crash. _cachedBytes.byteLength in the worker was 0.
Root Cause
pdfjsLib.getDocument({ data: bytes }) uses a structured clone with transfer to pass the PDF bytes to PDF.js's internal sub-worker. Transfer is not a copy: it moves ownership of the underlying ArrayBuffer to the recipient thread, setting byteLength to 0 in the sender.
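The move semantics can be reproduced without PDF.js. The following is a minimal sketch, assuming a runtime with a global structuredClone (modern browsers, Node.js 17+); the variable names are illustrative and not taken from the repo. It transfers a buffer the way postMessage with a transfer list would, then compares the original view, an alias of it, and a slice taken beforehand.

```javascript
// Standalone demonstration of transfer semantics; PDF.js is not involved.
// Assumes a global structuredClone (modern browsers, Node.js 17+).
const source = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // the bytes "%PDF"
const alias = source;        // same view object, same underlying ArrayBuffer
const copy = source.slice(); // new ArrayBuffer holding the same contents

// Transfer the underlying buffer, as postMessage with a transfer list would.
const received = structuredClone(source, { transfer: [source.buffer] });

const result = {
  sourceLength: source.byteLength,     // view over the detached buffer
  aliasLength: alias.byteLength,       // the alias shares that buffer
  copyLength: copy.byteLength,         // the slice owns its own buffer
  receivedLength: received.byteLength, // the recipient now owns the bytes
};
console.log(result);
```

Only the slice and the received view still report a non-zero length, which matches the pattern in the symptom: the cached reference was an alias, not a copy.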
// before fix
const bytes = new Uint8Array(buffer);
_cachedBytes = bytes; // reference to the original buffer
await pdfjsLib.getDocument({ data: bytes }).promise;
// bytes.buffer is now detached -- byteLength is 0
// _cachedBytes points to the same detached buffer

// later, on reprocess:
const pdf = await pdfjsLib.getDocument({ data: _cachedBytes }).promise;
// "empty PDF" -- _cachedBytes.byteLength === 0
The fix is to copy the bytes before the transfer takes ownership:
// after fix
const bytes = new Uint8Array(buffer);
_cachedBytes = bytes.slice(); // independent copy, survives the transfer
await pdfjsLib.getDocument({ data: bytes }).promise; // bytes is now detached, _cachedBytes is intact
Guard
Any time you pass a TypedArray or ArrayBuffer to a Web Worker via postMessage with a transfer list, or to a library that performs a transfer internally, your reference to that buffer is dead after the call. Either slice before passing, or do not retain the reference.
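PDF.js specifically: getDocument({ data: ... }) with a Uint8Array generally transfers the array to the worker thread and takes ownership of it, as it did in this case. If you need to re-use the bytes, slice before calling, and do so on every call that would otherwise pass the cached array itself.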
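If the reprocess path passes the cached array directly to a transferring call, the cache would be detached in turn on that call. One way to apply the guard on every call is to keep a private master copy and hand out a fresh copy each time. This is a sketch under stated assumptions, not the repo's implementation: createByteCache and fakeTransferringLoad are hypothetical names, and fakeTransferringLoad is a stand-in for any library call that transfers its input, not the PDF.js API.

```javascript
// Hypothetical helper: owns a private master copy and never hands it out.
function createByteCache(bytes) {
  const master = bytes.slice(); // independent ArrayBuffer, never transferred
  return {
    // Each caller gets its own copy, so a transfer cannot detach the master.
    fresh() {
      if (master.byteLength === 0) {
        // Fails loudly here instead of as "empty PDF" further downstream.
        throw new Error("cached bytes are empty or detached");
      }
      return master.slice();
    },
  };
}

// Hypothetical stand-in for a library call that transfers its input.
// Assumption: it behaves like postMessage with a transfer list.
function fakeTransferringLoad(data) {
  const moved = structuredClone(data, { transfer: [data.buffer] });
  return moved.byteLength;
}

const cache = createByteCache(new Uint8Array([1, 2, 3, 4]));
const lengths = [];
for (let i = 0; i < 3; i++) {
  // Simulates repeated reprocess calls against the same cached bytes.
  lengths.push(fakeTransferringLoad(cache.fresh()));
}
console.log(lengths);
```

The trade-off is one extra copy per call. For large PDFs that cost may matter, in which case re-reading the bytes from their original source could be preferable to caching them.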
Lesson
Structured clone with transfer is a move, not a copy. Holding a reference to the original array after a transfer is holding a reference to nothing.