Engineering Journal
Pdf Processor

ArrayBuffer Detach: Why Reprocess Threw Empty PDF on Every Call

2026-06-01

TLDR

Bug: every reprocess call after the first threw "empty PDF." Cause: pdfjsLib.getDocument({ data: bytes }) transfers the underlying ArrayBuffer, detaching it in the calling thread. Fix: _cachedBytes = bytes.slice() before the transfer.

Repo: tools/pdf-processor

Symptom

First extraction worked. Re-extraction with a pipeline override (triggered by the region editor) always produced "empty PDF" or a geometry worker crash. _cachedBytes.byteLength in the worker was 0.


Root Cause

pdfjsLib.getDocument({ data: bytes }) uses a structured clone with transfer to pass the PDF bytes to PDF.js's internal sub-worker. Transfer is not a copy: it moves ownership of the underlying ArrayBuffer to the recipient thread, setting byteLength to 0 in the sender.
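The move semantics can be reproduced without PDF.js. The sketch below is a standalone illustration, not code from the repo: demoTransfer is a hypothetical name, and it assumes a runtime with a global structuredClone (modern browsers, Node 17 or later).

```javascript
// Hypothetical standalone demo: no PDF.js involved, only the transfer mechanics.
function demoTransfer() {
  const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // the ASCII bytes "%PDF"
  const before = bytes.byteLength;

  // Listing bytes.buffer in `transfer` moves the buffer instead of copying it.
  const moved = structuredClone(bytes, { transfer: [bytes.buffer] });

  return {
    before,                        // length before the transfer
    after: bytes.byteLength,       // length seen by the sender afterwards
    movedLength: moved.byteLength, // length seen by the recipient
  };
}
```

Calling demoTransfer() returns { before: 4, after: 0, movedLength: 4 }: the recipient holds all four bytes and the sender's view is empty, which is the state _cachedBytes was in.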

// before fix
const bytes = new Uint8Array(buffer);
_cachedBytes = bytes; // reference to the original buffer

await pdfjsLib.getDocument({ data: bytes }).promise;
// bytes.buffer is now detached -- byteLength is 0
// _cachedBytes points to the same detached buffer

// later, on reprocess:
const pdf = await pdfjsLib.getDocument({ data: _cachedBytes }).promise;
// "empty PDF" -- _cachedBytes.byteLength === 0

The fix is to copy the bytes before the transfer takes ownership:

// after fix
const bytes = new Uint8Array(buffer);
_cachedBytes = bytes.slice(); // independent copy, survives the transfer

await pdfjsLib.getDocument({ data: bytes }).promise;
// bytes is now detached, _cachedBytes is intact


Guard

Any time you pass a TypedArray or ArrayBuffer to a Web Worker via postMessage with a transfer list, or to a library that performs a transfer internally, your reference to that buffer is dead after the call. Either slice before passing, or do not retain the reference.
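A cheap runtime check makes a dead reference fail loudly instead of surfacing later as "empty PDF". The helper below is a hypothetical sketch, not part of the repo. It assumes the ES2024 ArrayBuffer.prototype.detached accessor where the runtime provides it and falls back to a byteLength heuristic elsewhere.

```javascript
// Hypothetical helper: reports whether a typed array or ArrayBuffer was detached.
function isDetached(view) {
  const buffer = view instanceof ArrayBuffer ? view : view.buffer;

  // ES2024 runtimes expose `detached` directly on the buffer.
  if (typeof buffer.detached === "boolean") {
    return buffer.detached;
  }

  // Fallback heuristic for older runtimes: a detached buffer reports byteLength 0.
  // A buffer that was created empty also reports 0, so this can give a false positive.
  return buffer.byteLength === 0;
}
```

Checking isDetached(_cachedBytes) at the top of the reprocess path would have pointed at the transfer instead of at the PDF contents.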

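PDF.js specifically: getDocument({ data: ... }) with a Uint8Array always transfers. If you need to re-use the bytes, slice before every call. That includes calls that read from the cache: passing _cachedBytes itself would transfer the cached copy and leave it detached for the next reprocess.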
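One way to make the rule hard to break is to keep the cache private and only ever hand out copies. The following is a minimal sketch under stated assumptions, not the processor's actual code: createPdfSource and fakeGetDocument are hypothetical names, getDocument is injected so the sketch runs without PDF.js, and the fake only mimics the transfer behaviour described above.

```javascript
// Hypothetical wrapper: the cached copy is never passed to getDocument directly.
function createPdfSource(getDocument) {
  let cachedBytes = null; // private copy, only ever sliced

  return {
    // First load: keep a private copy, then let getDocument take the original.
    async load(bytes) {
      cachedBytes = bytes.slice();
      return getDocument({ data: bytes }).promise;
    },

    // Reprocess: hand over a fresh copy so the cache itself is never transferred.
    async reload() {
      if (cachedBytes === null || cachedBytes.byteLength === 0) {
        throw new Error("no cached PDF bytes; call load() first");
      }
      return getDocument({ data: cachedBytes.slice() }).promise;
    },

    // Exposed for diagnostics only.
    cachedLength() {
      return cachedBytes === null ? 0 : cachedBytes.byteLength;
    },
  };
}

// Stand-in for pdfjsLib.getDocument, for local testing only: it transfers its
// input the way the text describes and resolves with the byte count it received.
function fakeGetDocument({ data }) {
  const moved = structuredClone(data, { transfer: [data.buffer] });
  return { promise: Promise.resolve({ byteLength: moved.byteLength }) };
}
```

In real use the wrapper would be constructed with something like createPdfSource((params) => pdfjsLib.getDocument(params)). The cost is one extra copy of the PDF per reprocess, which is the price of keeping the cache alive across any number of calls.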


Lesson

Structured clone with transfer is a move, not a copy. Holding a reference to the original array after a transfer is holding a reference to nothing.
