Pdf Processor

Path Reconciler Deepdive

2026-07-11

Under the Hood: Building a Correct PDF Path Reconciler

TLDR: PDF path operators are not segments. They are instructions for a stateful drawing machine. Turning them into clean, classified segments requires a three-stage pipeline: capture the CTM per-subpath in the adapter, reconcile subpath geometry analytically in the reconciler, merge fragmented dashes via global partition. Every stage has a correctness trap that most implementations miss.

The Problem

PDF.js gives you an operator list: a flat array of drawing commands. moveTo, lineTo, curveTo, rectangle, fill, stroke, closePath, and their variants. The path state machine is implicit: moveTo opens a new subpath, fill/stroke closes and paints it, save/restore pushes and pops the CTM.

To reconstruct table grids and figure bounding boxes, you need axis-aligned segments in viewport space. Getting there without losing provenance or correctness requires three non-obvious design decisions.

Stage 1: ctmAdapter: SubpathRecords, Not Segments

The adapter's job is to read the operator list and emit SubpathRecord objects. The key constraint: do not convert to viewport space yet.

Every subpath captures the CTM at the time moveTo was called:

const openSubpath = (constructPathId) => {
    if (currentSubpath.segs.length > 0 || currentSubpath.curves.length > 0) {
        subpaths.push(currentSubpath);
    }
    currentSubpath = {
        segs: [],
        curves: [],
        closed: false,
        filled: false,
        strokeWidth,
        strokeColor: strokeColor.slice(),
        fillColor: fillColor.slice(),
        constructPathId,
        ctm: ctm.slice(),
        id: subpathIdCounter++
    };
};

Segments go into segs[] in PDF user-space (not viewport space):

const bufferSeg = (ax, ay, bx, by) => {
    currentSubpath.segs.push({ ax, ay, bx, by });
};

Curves go into curves[] as raw control points, never converted:

currentSubpath.curves.push({
    p0: [rawPendingX, rawPendingY],
    p1: [args[ai], args[ai+1]],
    p2: [args[ai+2], args[ai+3]],
    p3: [args[ai+4], args[ai+5]]
});

Why defer the viewport transform? Because the reconciler needs to classify subpaths (RECT vs FREE_PATH) and detect axis alignment. That classification is more reliable in PDF user-space before rotation is applied. The CTM is applied per-subpath in the reconciler where it is needed.

The constructPathId counter increments for every constructPath call. This gives compound paths (a single PDF op that draws multiple sub-shapes) a shared ID, which the reconciler can use to detect siblings without ever touching LatticeReconstructor.

Stage 2: pathReconciler: Classification and Thin Rect Normalization

The reconciler receives all SubpathRecords and emits canonical segments in viewport space.

Classifying Subpaths

function classifySubpath(subpath, vpTransform) {
    const toViewport = (pdfX, pdfY) => {
        const [cx, cy] = applyMatrix(subpath.ctm, pdfX, pdfY);
        return [
            vpTransform[0]  cx + vpTransform[2]  cy + vpTransform[4],
            vpTransform[1]  cx + vpTransform[3]  cy + vpTransform[5],
        ];
    };
    // ... expand bbox from segs and curves
    // classify as RECT / ROUNDED_RECT / POLYGON / FREE_PATH
}

Each subpath gets its own CTM applied. This is the critical difference from a naive implementation that applies a single global transform to all segments.

Analytical Bezier Bounding Box

For curves[], the reconciler computes the exact bounding box by finding derivative roots:

export function curveBboxContrib(p0, p1, p2, p3) {
    const extrema = (a, b, c, d) => {
        const dA = -3a + 9b - 9c + 3d;
        const dB =  6a - 12b + 6*c;
        const dC = -3a + 3b;
        const roots = [];
        if (Math.abs(dA) > 1e-6) {
            const disc = dBdB - 4dA*dC;
            if (disc >= 0) {
                const sq = Math.sqrt(disc);
                const t1 = (-dB + sq) / (2*dA);
                const t2 = (-dB - sq) / (2*dA);
                if (t1 > 0 && t1 < 1) roots.push(t1);
                if (t2 > 0 && t2 < 1) roots.push(t2);
            }
        } else if (Math.abs(dB) > 1e-6) {
            roots.push(-dC / dB);
        }
        const eval_ = t => a(1-t)3 + 3bt(1-t)2 + 3ct2(1-t) + dt3;
        return [a, d, ...roots.map(eval_)];
    };
    const xs = extrema(p0[0], p1[0], p2[0], p3[0]);
    const ys = extrema(p0[1], p1[1], p2[1], p3[1]);
    return { xMin: Math.min(...xs), xMax: Math.max(...xs),
             yMin: Math.min(...ys), yMax: Math.max(...ys) };
}

This runs separately for X and Y. The derivative of a cubic Bezier B(t) is a quadratic. We solve it with the quadratic formula and evaluate B(t) at the real roots in (0, 1). Always correct, always O(1) per curve.

Thin Rect Normalization

A filled rectangle with h < eps is semantically a horizontal rule, not a box. The normalization:

function normalizeThinRect(classified, eps) {
    const w = classified.bbox.xMax - classified.bbox.xMin;
    const h = classified.bbox.yMax - classified.bbox.yMin;
if (h < eps) {         const cy = (classified.bbox.yMin + classified.bbox.yMax) / 2;         return [{ ax: classified.bbox.xMin, ay: cy,                   bx: classified.bbox.xMax, by: cy }];     } else if (w < eps) {         const cx = (classified.bbox.xMin + classified.bbox.xMax) / 2;         return [{ ax: cx, ay: classified.bbox.yMin,                   bx: cx, by: classified.bbox.yMax }];     }     // return 4 edge segments from bbox corners }

The threshold is eps = 4 (the lattice reconstructor's default merge tolerance), not a hardcoded constant. Using eps means the normalization condition is exactly the condition under which the lattice would merge and average the two thin edges anyway.

Stage 3: Global Dash Partition

Dashed lines in PDF are not one path. They are many short single-segment subpaths, each drawn independently. The dash merge algorithm cannot assume adjacency.

function mergeDashes(classifiedSubpaths, eps) {
    const freePaths = classifiedSubpaths.filter(c => c.type === 'FREE_PATH');
    const partitions = new Map();
for (const c of freePaths) {         if (c.segsViewport.length !== 1) continue;         const s = c.segsViewport[0];
// determine orientation + band center         const colorHex = c.strokeColor.map(v =>             Math.round(v * 255).toString(16).padStart(2, '0')).join('');         const swBucket = Math.round(c.strokeWidth * 2);         const key = ${colorHex}|${swBucket}|${orientation}|${yBucket};
partitions.get(key).push({ c, s, posMin, posMax });     }
// sort each partition by posMin, gap-test, emit merged canonical segments }

The partition key has four components: strokeColor hex, strokeWidth bucketed to 0.5px precision, orientation (H or V), and Y-band rounded to eps. Within each partition, segments are sorted by start position, then gap-tested against max(8, 0.4 × avgDashLength). Runs with gaps smaller than this threshold are merged into one canonical segment.

This handles any number of non-adjacent dashes at arbitrary positions in the operator stream. It also correctly keeps apart lines that are collinear but have different stroke widths, a situation that would have silently merged a thick border with a thin hairline under the original key.

What This Feeds

The output of reconcile() is { segments, filledRects }:

segments: canonical H/V/diagonal segments in viewport coordinates, each with strokeWidth and strokeColor
filledRects: closed filled rectangles (potential BOX regions, background fills, decorative panels)

These go directly into contextClassifier.js for region typing, and into latticeReconstructor.js for table grid inference. The lattice reconstructor is completely unchanged: it never needed to know about subpath provenance. The reconciler does all the isolation work upstream.

Read this post in the full Engineering Journal →