Coordinate Spaces, Page Snap, and Three Surfaces of Truth: The Math Behind a Browser-Based PDF Editor
A working tool can hide a lot of incorrect math.
Our browser-based PDF processor has been in flight for weeks. The extraction engine produces clean HTML. The Monaco editor lets you tweak the output. The visual diff places the rendered PDF next to the extracted HTML. On the surface, everything works.
Then I sat down to add some UI polish: alignment buttons, a zoom control, scroll sync between the visual diff panes. Three things. Half a day's work.
By the end of the day I had retrofitted three foundational invariants into the codebase. Each one was a piece of math that had been silently wrong, surviving every test because nothing crashed.
This post is about those three invariants. They are unrelated on the surface. They are the same problem underneath.
1. Coordinate spaces are not optional
The PDF processor uses pdfjs-dist to render and parse PDFs. PDF.js gives you two things on every page: a list of vector path operators (opList) and a list of text items (textContent.items).
Both come with positional information. Both use PDF user space. Y origin is bottom-left. Y increases upward. Coordinates are in points (1/72 inch).
The viewport is what you actually paint. We render at scale 1.5×, so the viewport's transform looks like:
viewport.transform = [1.5, 0, 0, -1.5, 0, height]
The -1.5 flips Y so the screen origin is top-left. The matrix takes a point in PDF user space and produces a point in viewport (screen) space.
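The mapping is a plain 2D affine transform. A minimal sketch of a toViewport helper (the name matches how it is used later in this post; the matrix values here are illustrative, for a 612x792pt page rendered at 1.5x):

```javascript
// A 6-element matrix [a, b, c, d, e, f] maps a PDF user-space point
// (x, y) to viewport space as (a*x + c*y + e, b*x + d*y + f).
function toViewport(m, x, y) {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

// Scale 1.5x on a 612x792pt page: height = 792 * 1.5 = 1188.
const vpT = [1.5, 0, 0, -1.5, 0, 1188];

// The PDF-space origin (bottom-left) lands at the bottom of the
// viewport; the top of the page lands at viewport (0, 0).
console.log(toViewport(vpT, 0, 0));   // [0, 1188]
console.log(toViewport(vpT, 0, 792)); // [0, 0]
```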
Our extraction pipeline has two coordinate-space discontinuities:
- Vector segments are baked through the viewport transform inside ctmAdapter.js. They emerge in viewport space.
- Text items stay in PDF user space. We transform their positions on demand with viewport.convertToViewportPoint.
PDF.js gives you item.transform[4] and item.transform[5] as the text's PDF-space position. It also gives you item.width and a font size you can derive from item.transform[3]. None of these are transformed by the viewport. The position is in PDF points. The width is in PDF points. The font size is in PDF points. If you transform the position to viewport space and then add the untransformed width to it, you have just compared apples to oranges.
In our case, this looked like:
// inside contextClassifier — the buggy version
const [vx, vy] = toViewport(vpT, item.transform[4], item.transform[5]);
const width = item.width || (fontSize * 0.5 * (item.str.length || 1));
return { idx, vx, vy, fontSize, width };
Then later, in the underline-detection heuristic:
const textXEnd = tm.vx + tm.width; // viewport-X + PDF-points-width
if (yDist >= -1 && yDist <= 5 &&
tm.vx <= hXMax + 2 && textXEnd >= hXMin - 2) {
// ...
}
tm.vx is in viewport pixels. tm.width is in PDF points. At scale 1.5, the addition produces a value that is 33 percent narrower than the actual visual extent of the text. Most underlines escaped detection because the math thought the text didn't reach far enough to overlap them.
The same mismatch broke the column-coverage map (which fills a screen-pixel-indexed array with text widths in PDF points), the Y-band tolerance (bodyFontSize * 0.45 applied to viewport-Y values), and the paragraph-gap detector (bodyFontSize * 1.8 applied to viewport-Y differences).
Four heuristics, one root cause: a unit mismatch hidden inside an addition.
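The numbers make the failure concrete. A worked version of the mismatch, with illustrative values rather than real extraction output:

```javascript
// At scale 1.5, a text item starting at 100pt with width 200pt
// visually spans viewport pixels 150..450.
const scale = 1.5;
const xPt = 100, widthPt = 200;

const vx = xPt * scale;                  // 150 viewport px (correct)
const buggyEnd = vx + widthPt;           // 150 + 200 = 350 — mixes units
const correctEnd = vx + widthPt * scale; // 150 + 300 = 450

// The buggy span past vx is 200px where the true span is 300px:
// the text appears 33 percent narrower than it actually renders.
console.log(buggyEnd, correctEnd); // 350 450
```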
The fix:
// derive the effective scale from the viewport's column vectors
const scaleX = Math.hypot(vpT[0], vpT[1]) || 1;
const scaleY = Math.hypot(vpT[2], vpT[3]) || 1;
const textMeta = textItems.map((item, idx) => {
  const [vx, vy] = toViewport(vpT, item.transform[4], item.transform[5]);
  const fontSizePt = Math.abs(item.transform[3] || 12);
  const widthPt = item.width || (fontSizePt * 0.5 * (item.str.length || 1));
  return {
    idx, vx, vy,
    vWidth: widthPt * scaleX,   // viewport pixels for vx-relative checks
    vFont: fontSizePt * scaleY, // viewport pixels for vy-relative checks
    fontSize: fontSizePt,       // PDF points for ratio comparisons
  };
});
We keep both. vWidth and vFont are viewport pixels and get used wherever a comparison hits a viewport-space coordinate. fontSize stays in PDF points and gets used wherever the comparison is a ratio between two font sizes (heading detection compares lineFontSize / bodyFontSize, where both are in the same unit and the unit cancels out).
The lesson is small but it has teeth: when an SDK gives you positions in one space and dimensions in another, never mix them in a single arithmetic expression. Keep the converted values right next to the originals on every record.
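If you want the discipline enforced rather than remembered, one option is to tag values with their unit and refuse to add across units. This helper is hypothetical, not part of the real codebase; it is too heavyweight for hot paths, but useful in tests:

```javascript
// Tagged numbers: every value carries its unit.
const pt = v => ({ v, unit: 'pt' });
const px = v => ({ v, unit: 'px' });

// Addition only works within a unit; cross-unit math throws.
function add(a, b) {
  if (a.unit !== b.unit) {
    throw new Error(`unit mismatch: ${a.unit} + ${b.unit}`);
  }
  return { v: a.v + b.v, unit: a.unit };
}

// Conversion has to be explicit, which is the point.
const toPx = (a, scale) => px(a.v * scale);

console.log(add(px(150), toPx(pt(200), 1.5)).v); // 450
// add(px(150), pt(200)) would throw instead of silently lying.
```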
2. Semantic and spatial layouts can't share a ruler
Our visual diff places two scrollable panes side by side. The left pane renders the original PDF using pdfjs-dist. Each page becomes a <div class="page-wrapper"> containing a <canvas>. The right pane contains the extracted HTML, where each PDF page maps to a <section data-page="N"> card.
The brief: when you scroll one pane, the other should follow.
My first implementation used the obvious approach. Find which page the source pane is currently anchored to. Compute the proportional offset within that page. Apply the same proportion to the matching page in the other pane.
// the doomed first version
function readActivePage(container, selector) {
const items = container.querySelectorAll(selector);
const top = container.scrollTop;
for (const el of items) {
if (top < el.offsetTop + el.offsetHeight) {
return {
page: el.getAttribute('data-page'),
ratio: (top - el.offsetTop) / el.offsetHeight,
};
}
}
}
function applyToTarget(container, info, selector) {
  const el = container.querySelector(`${selector}[data-page="${info.page}"]`);
  container.scrollTop = el.offsetTop + info.ratio * el.offsetHeight;
}
This passed the smoke test. I shipped it. The user reported it didn't work.
Two things were wrong, layered on top of each other.
The CSS zoom trap
The PDF zoom feature uses the CSS zoom property. We picked zoom over transform: scale() specifically because zoom causes real layout reflow. Scrollbars know about it. Container heights track it. That's the whole point.
But zoom interacts oddly with the JavaScript geometry properties. In Chromium, element.offsetTop and element.offsetHeight report unzoomed values for elements with zoom: 1.5. The element's layout box is still the original size from JavaScript's perspective. Meanwhile, container.scrollTop is in zoomed pixels, because that's what the scrollbar actually moves.
At zoom 1.5, if the user has scrolled 600 zoomed pixels into the container, the math says:
top = 600
el.offsetTop = 400 (page 2's unzoomed start)
el.offsetHeight = 1100 (page 2's unzoomed height)
ratio = (600 - 400) / 1100 = 0.18
The math claims we are 18 percent into page 2. The actual visual position is somewhere on page 1. The bracketing math points at the wrong page entirely.
You can fix this by switching the CSS to transform: scale() and writing your own scrollbar logic. We didn't want to. We chose zoom because it gave us correct reflow for free. The price is that direct DOM measurements become unreliable.
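If you do stay with CSS zoom and still need measurement math, the one workable compensation is to divide the zoomed scrollTop by the zoom factor before comparing it to the unzoomed offsets. A minimal sketch, with the function and data shapes invented for illustration (the real fix, below, abandons within-page math entirely):

```javascript
// pages: [{ page, offsetTop, offsetHeight }] in unzoomed layout units,
// as offsetTop/offsetHeight report them under CSS zoom in Chromium.
function pageAtScroll(scrollTopZoomed, zoomFactor, pages) {
  const top = scrollTopZoomed / zoomFactor; // now same unit as offsetTop
  for (const p of pages) {
    if (top < p.offsetTop + p.offsetHeight) {
      return { page: p.page, ratio: (top - p.offsetTop) / p.offsetHeight };
    }
  }
  return null;
}

const pages = [
  { page: '1', offsetTop: 0,   offsetHeight: 400 },
  { page: '2', offsetTop: 400, offsetHeight: 1100 },
];

// 450 zoomed pixels at zoom 1.5 is 300 unzoomed pixels: 75 percent
// into page 1. The uncompensated math would have claimed page 2.
console.log(pageAtScroll(450, 1.5, pages)); // { page: '1', ratio: 0.75 }
```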
The semantic vs. spatial gap
But the deeper problem was unrelated to zoom.
A PDF page is spatial. It is a physical artifact: 8.5 by 11 inches, every page roughly the same height. If page 1 of a PDF is 1100 pixels tall, every page 1 of every PDF is roughly 1100 pixels tall.
An HTML page produced by extraction is semantic. It is whatever the content of that page happens to be. A page with one heading and a paragraph is 200 pixels. A page with three tables and dense text is 2500 pixels. There is no relationship between the height of the HTML section and the height of the PDF page it came from.
The within-page ratio is fiction. "50 percent down the PDF page" does not correspond to "50 percent down the HTML page." A user scrolling through a paragraph in the PDF is not scrolling through the same paragraph at the same proportional rate in the HTML, because the HTML version of that paragraph is a wildly different size.
Page-snap with IntersectionObserver
The fix throws out within-page math entirely.
function setupScrollSync() {
const left = document.getElementById('visual-diff-pdf');
const right = document.getElementById('visual-diff-html');
  for (const o of _observers) o.observer.disconnect();
  _observers = [];

  const leftPages = left.querySelectorAll('.page-wrapper');
  const rightPages = right.querySelectorAll('.pdf-page-content');
  if (!leftPages.length || !rightPages.length) return;

  let suppress = 0;

  const scrollOtherTo = (page, targetPane, targetSelector) => {
    if (suppress > 0) { suppress--; return; }
    const el = targetPane.querySelector(`${targetSelector}[data-page="${page}"]`);
    if (!el) return;
    suppress = 1;
    el.scrollIntoView({ block: 'start', behavior: 'auto' });
  };

  _observers.push(createPaneObserver(left, leftPages,
    page => scrollOtherTo(page, right, '.pdf-page-content')));
  _observers.push(createPaneObserver(right, rightPages,
    page => scrollOtherTo(page, left, '.page-wrapper')));
}
function createPaneObserver(pane, pages, onActivePageChange) {
  const ratios = new Map();
  let activePage = null;

  const observer = new IntersectionObserver(entries => {
    for (const e of entries) ratios.set(e.target, e.intersectionRatio);

    let topEl = null, topRatio = -1;
    for (const [el, r] of ratios) {
      if (r > topRatio) { topRatio = r; topEl = el; }
    }
    if (!topEl) return;

    const page = topEl.getAttribute('data-page');
    if (page === activePage) return;
    activePage = page;
    onActivePageChange(page);
  }, { root: pane, threshold: [0, 0.25, 0.5, 0.75, 1] });

  pages.forEach(p => observer.observe(p));
  return { observer };
}
Each pane gets an IntersectionObserver with root: pane. The observer's intersection logic is computed natively by the browser, in actual rendered coordinates. It does not care about CSS zoom. It does not care about offsetTop. It cares about whether the element is currently visible inside the scroll viewport, which is exactly the question we want answered.
When the most-visible page changes on one pane, the matching page on the other pane gets scrollIntoView({ block: 'start' }). There is no within-page math. Within-page scrolling is independent on both sides.
The suppress counter is a re-entrancy guard. When we programmatically scroll pane B, that scroll triggers pane B's observer, which would call scrollOtherTo on pane A and create a feedback loop. The counter blocks one cycle of bounce-back.
The result is correct at every zoom level. It works on every PDF. It is thirty lines.
The lesson here generalizes: when two layouts represent the same logical content but use different physical metrics, sync them via the structural anchors they actually share, not the pixel positions they happen to have. In our case the shared anchor was data-page. In another tool it might be a heading ID, a section landmark, or an element role.
3. The single source of truth pattern, retrofitted
The PDF processor has three editable surfaces that all show the extracted HTML:
- #html-preview in the HTML tab. contenteditable="true".
- #visual-diff-html in the Visual Diff right pane. Also contenteditable="true".
- The Monaco editor. Editable through Monaco's own model.
Reading the original code, I found three completely different lifecycles:
- The HTML tab was populated once on extraction. Edits made by the user persisted in that DOM, but never propagated anywhere.
- The Visual Diff HTML pane was re-rendered from state every time the user clicked the Visual Diff tab, which silently overwrote any edits made in the HTML tab.
- The Monaco editor synced one-way. Monaco changes propagated to the HTML preview. The HTML preview did not propagate back.
The fix is the controlled-input pattern, applied to a multi-surface DOM. A single coordinator. A single guard flag. Every change handler routes through the coordinator. Every change handler returns early if the coordinator is mid-write.
// htmlSync.js
import { state } from '../state.js';
import { initTableFeatures } from '../utils/tableLogic.js';
let _syncing = false;
const _debouncers = new WeakMap();

const SURFACE_IDS = ['html-preview', 'visual-diff-html'];
const DEBOUNCE_MS = 200;

export function isSyncing() { return _syncing; }

export function initHTMLSync() {
  SURFACE_IDS.forEach(wirePreview);
}

function wirePreview(id) {
  const el = document.getElementById(id);
  if (!el) return;
  el.addEventListener('input', () => {
    if (_syncing) return;
    const prev = _debouncers.get(el);
    if (prev) clearTimeout(prev);
    const t = setTimeout(() => applyHtmlEverywhere(el.innerHTML, el), DEBOUNCE_MS);
    _debouncers.set(el, t);
  });
}

export function applyHtmlEverywhere(html, skipEl = null) {
  if (_syncing) return;
  _syncing = true;
  try {
    state.pdf1.extractedHTML = html;
    const clean = sanitize(html);

    for (const id of SURFACE_IDS) {
      const el = document.getElementById(id);
      if (!el || el === skipEl) continue;
      if (el.innerHTML !== clean) {
        el.innerHTML = clean;
        initTableFeatures(el);
      }
    }

    const editor = state.monacoEditor;
    if (editor && editor.getValue() !== html) {
      editor.getModel()?.setValue(html);
    }
  } finally {
    _syncing = false;
  }
}

function sanitize(html) {
  return typeof window.DOMPurify !== 'undefined'
    ? window.DOMPurify.sanitize(html, { ADD_TAGS: ['img'], ALLOW_DATA_ATTR: true })
    : html;
}
Three things in this module deserve attention.
The shared flag, exported. Monaco's onDidChangeModelContent lives in another module. It needs to skip the synchronous re-fire that happens when applyHtmlEverywhere itself calls model.setValue(). Rather than thread the flag through function arguments, we export isSyncing() and let any change handler in the codebase ask whether a write is in progress:
// monacoSetup.js
editor.onDidChangeModelContent(() => {
if (isSyncing()) return;
applyHtmlEverywhere(editor.getValue(), null);
});
The skip-source pattern. When the user is typing in #html-preview, we want their edits to flow to Monaco and to #visual-diff-html, but we do not want to overwrite #html-preview's own innerHTML on every keystroke. That would obliterate their caret. So applyHtmlEverywhere takes a skipEl argument. The source surface gets skipped on cross-write. Its own natural typing keeps it correct.
Asymmetric sanitization. DOMPurify sanitizes the HTML before mirroring it across surfaces. But the surface the user is typing into is left raw. Sanitizing on every keystroke would strip half-typed tags and re-create the DOM on every input event, which destroys the cursor. We accept some risk on the typed-in surface because the alternative is unusable.
The debounce on the input listeners (200ms) is there for performance. The user can type freely; the cross-write only fires when they pause briefly. Monaco's setValue is heavyweight enough that doing it on every keystroke would tear the experience.
After this module landed, the visual diff pane stopped clobbering edits on tab switch. The download button now picks up edits made anywhere. The Monaco editor and the rendered preview agree at all times.
4. The pattern under all three
The three bugs above look unrelated. They are the same bug.
In each case, two things were being treated as equivalent that were not. The form of equivalence was different each time:
- Coordinate-space mismatch: two values had the same shape (numbers) but different units (PDF points vs. viewport pixels).
- Layout-metric mismatch: two pixel measurements had the same unit but different meanings (spatial vs. semantic).
- State-surface mismatch: three DOM strings had the same role but no shared canonical record.
Quiet bugs are the expensive ones. They survive code review. They survive integration testing. They survive the deploy. They die only when a user notices that something feels off, and someone goes back to first principles and asks: what am I actually comparing?
The discipline is small. Before any arithmetic between two values, ask whether they share a unit, a meaning, and a source. Before any sync between two surfaces, ask whether they have a shared canonical record. Before any sync between two layouts, ask whether they share a metric or merely share an axis.
If any of those answers is no, the math will lie to you. It will not crash. It will just lie.
The PDF processor now handles all three. Coordinate spaces are explicit on every text record. Visual diff scroll uses page-snap and the IntersectionObserver. The editor surfaces share a single coordinator with a single guard.
The whole rebuild is open source. If you spot a fourth invariant we missed, the issue tracker is open.