Postmortem: The Shape Classifier Broke Every Time We Added a New Domain
TLDR
The original shape classifier matched on CSS class names and element IDs. It worked for one domain. Adding a second domain required branching. Adding a third made the branches inconsistent. The geometry-based replacement works identically across all domains with no domain-specific code.
The Assumption That Seemed Reasonable
The editor controlled every shape it produced. Every wire had class="wire". Every component had data-geo-class="component". Every connector had class="junction". The classifier was a direct lookup:
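function classify(el) {
  // SVG elements expose className as an SVGAnimatedString, not a string,
  // so read the class attribute directly before calling includes().
  const cls = el.getAttribute('data-geo-class') || el.getAttribute('class') || '';
  if (cls.includes('wire')) return 'wire';
  if (cls.includes('junction') || el.tagName === 'circle') return 'connector';
  // Anything unrecognized falls through to 'component'.
  return 'component';
}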
This was correct for the electrical domain where these conventions were defined. It was fast. It required no geometric computation. For a single domain with controlled authoring, it was the right implementation.
When It Failed
The construction domain was added next. Its elements used different class names: walls were <path class="wall">, not <path class="wire">, and structural components were <rect class="struct-element">. The classifier returned 'component' for both, treating walls and structural elements identically even though walls should behave like wires in connectivity analysis.
A branch was added:
if (domainMode === 'construction') {
  if (cls.includes('wall')) return 'wire';
}
The software domain was added. Software architecture diagrams use different shapes: arrows for dependencies, boxes for modules, circles for external systems. The classifier needed more branches.
After three domains, the classifier had a top-level domain check that dispatched to one of three sub-classifiers. Each sub-classifier had different rules and different failure modes. A bug fixed in the electrical sub-classifier might not be fixed in the construction sub-classifier because they were separate code paths.
The deeper problem appeared with import. When a user imported an SVG drawn in another tool (Inkscape, Lucidchart, or a downloaded template), the elements had no data-geo-class attributes and none of the class names the classifier expected. The classifier fell through to 'component' for virtually every element. Wires were not detected. Topology analysis produced no connectivity.
What Was Actually Wrong
The class-based classifier was correct only for diagrams produced by the specific tool version that assigned those classes. It was wrong for any other input. This was not discovered until the import feature exposed diagrams the tool did not produce.
The domain branching was papering over the same fundamental problem: the classifier trusted author-assigned metadata. When the metadata was wrong or absent, the classifier was wrong.
What Got Deleted
The domain-specific branches in the classifier were removed: the domainMode === 'electrical' check, the domainMode === 'construction' check, and the domainMode === 'software' check. Also deleted: the hardcoded class-name lists per domain (WIRE_CLASSES, COMPONENT_CLASSES).
Combined, this was about 120 lines of branching logic.
What Replaced It
A four-phase geometry pipeline: path parsing, coordinate canonicalization (applying transforms to obtain global coordinates), geometric metric computation (circularity and linearity), and classification based on those metrics.
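The canonicalization phase can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names (multiply, applyMatrix, canonicalize) are hypothetical, and it assumes each ancestor transform has already been reduced to the six-number form used by SVG's matrix(a, b, c, d, e, f).

```javascript
// Hypothetical helpers; matrices are [a, b, c, d, e, f] as in SVG's matrix().

// Compose two affine transforms: the result applies `inner` first, then `outer`.
function multiply(outer, inner) {
  const [a1, b1, c1, d1, e1, f1] = outer;
  const [a2, b2, c2, d2, e2, f2] = inner;
  return [
    a1 * a2 + c1 * b2,
    b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2,
    b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1,
    b1 * e2 + d1 * f2 + f1,
  ];
}

// Map one local point [x, y] through a matrix.
function applyMatrix([a, b, c, d, e, f], [x, y]) {
  return [a * x + c * y + e, b * x + d * y + f];
}

// Fold a chain of ancestor transforms (outermost first) into a single matrix,
// then map every local point into global coordinates.
function canonicalize(points, transformChain) {
  const identity = [1, 0, 0, 1, 0, 0];
  const total = transformChain.reduce((acc, m) => multiply(acc, m), identity);
  return points.map((p) => applyMatrix(total, p));
}
```

For example, a point at (1, 1) inside a group scaled by 2, which itself sits inside a group translated by (10, 20), lands at (12, 22) in global coordinates. Without this step, two identical shapes under different group transforms would yield different metrics.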
The classification thresholds (linearity > 0.85 = wire, circularity > 0.65 = connector) are the same for all domains. A wall in a construction diagram and a wire in an electrical diagram both have high linearity. The classifier correctly identifies both as wire-type elements without knowing anything about walls or wires.
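A sketch of how such thresholds might be applied to a list of global-coordinate points is shown below. The two thresholds come from the text; everything else is an assumption made for illustration. In particular, linearity is taken to be endpoint distance divided by path length, circularity is taken to be the isoperimetric quotient 4πA/P², and the closed flag and function names are hypothetical.

```javascript
// Thresholds from the text; the metric definitions below are assumptions.
const LINEARITY_MIN = 0.85;
const CIRCULARITY_MIN = 0.65;

// Sum of segment lengths; a closed path also counts the last-to-first segment.
function pathLength(points, closed) {
  let total = 0;
  const segments = closed ? points.length : points.length - 1;
  for (let i = 0; i < segments; i++) {
    const [x1, y1] = points[i];
    const [x2, y2] = points[(i + 1) % points.length];
    total += Math.hypot(x2 - x1, y2 - y1);
  }
  return total;
}

// Assumed definition: straight-line endpoint distance over total path length.
function linearity(points) {
  if (points.length < 2) return 0;
  const len = pathLength(points, false);
  if (len === 0) return 0;
  const [x1, y1] = points[0];
  const [x2, y2] = points[points.length - 1];
  return Math.hypot(x2 - x1, y2 - y1) / len;
}

// Assumed definition: isoperimetric quotient 4*pi*A / P^2 (1 for a perfect circle).
function circularity(points) {
  if (points.length < 3) return 0;
  let twiceArea = 0; // shoelace formula accumulates twice the signed area
  for (let i = 0; i < points.length; i++) {
    const [x1, y1] = points[i];
    const [x2, y2] = points[(i + 1) % points.length];
    twiceArea += x1 * y2 - x2 * y1;
  }
  const perimeter = pathLength(points, true);
  if (perimeter === 0) return 0;
  return (4 * Math.PI * Math.abs(twiceArea / 2)) / (perimeter * perimeter);
}

// No class names or domain modes are consulted here, only geometry.
function classifyByGeometry(points, closed) {
  if (!closed && linearity(points) > LINEARITY_MIN) return 'wire';
  if (closed && circularity(points) > CIRCULARITY_MIN) return 'connector';
  return 'component';
}
```

Under these assumed definitions, a straight wall and a nearly straight wire both score close to 1.0 on linearity, and a 40 by 10 rectangle scores about 0.50 on circularity, so it stays a component. A perfect square would score π/4 ≈ 0.785 and clear the 0.65 threshold, so this sketch should not be read as the metric the pipeline actually uses.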
Domain-specific metadata (data-geo-class) is still respected as an override when present. The geometry classifier provides the fallback for any element without it. This makes the system work correctly for both controlled diagrams (fast path: trust the metadata) and imported diagrams (fallback path: classify by geometry).
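The override-then-fallback order described above might look like the following sketch. The names classifyElement, geometryFallback, and KNOWN_TYPES are hypothetical, and checking the declared value against a known set is an added assumption rather than something the original system is stated to do.

```javascript
// Hypothetical set of values the override is allowed to take.
const KNOWN_TYPES = new Set(['wire', 'connector', 'component']);

// Fast path: trust data-geo-class when it is present and recognized.
// Fallback path: hand the element to a geometry-based classifier.
function classifyElement(el, geometryFallback) {
  const declared = el.getAttribute('data-geo-class');
  if (declared !== null && KNOWN_TYPES.has(declared)) return declared;
  return geometryFallback(el);
}
```

Passing the geometry classifier in as a function keeps the metadata check independent of how the geometry is computed, which also makes the two paths easy to test separately.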
The Lesson
A classifier that relies on author-assigned metadata is correct only when you control the author. For any system that processes external input, derive classification from the data's intrinsic properties. Geometry does not change based on who drew the shape.