Building a Shape Classifier Around CSS Classes Is Coupling Your Analysis to Your Authoring Conventions
TLDR
Class-name-based shape classification is not wrong; it is fragile. It works exactly until a diagram arrives that was not produced by your tool. Geometric classification using circularity and linearity depends only on path geometry, so it works on SVG from any source. It is not much harder to implement, and it is more robust on input you did not author.
What the Industry Does
SVG shape classification in diagram tools almost universally relies on CSS classes, element IDs, or custom data attributes. Tools like mxGraph (draw.io), JointJS, and most custom diagram editors check class names to determine how to render and analyze shapes. The pattern is natural: you wrote the tool, you assigned the classes, you know what they mean.
This is not wrong. It is simply incomplete. It works for diagrams the tool produced and fails for any other input.
Why It Fails for This Problem Class
The failure mode is not a crash. It is a silently wrong result. The classifier returns 'component' for a shape that is actually a wire. The topology analysis runs, produces results, and reports incorrect connectivity. The user does not know the results are wrong unless they verify them manually.
For a document editor this is acceptable: a misclassified paragraph style is a cosmetic problem. For a circuit editor, incorrect connectivity means the BOM is wrong and the exported netlist is wrong. For a software architecture tool, missed dependencies mean the dependency graph is incorrect. Silently wrong results in an analysis tool are worse than crashes.
The Better Approach
Compute two metrics from the path geometry:
- Linearity ratio (endpoint span / path length): wires are straight or nearly straight, scoring above 0.85.
- Isoperimetric quotient (4πA/P², where A is the enclosed area and P is the perimeter): connector dots are nearly circular, scoring above 0.65. Components fall in between: they meet neither threshold.
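The two metrics can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes each SVG path has already been flattened into a list of (x, y) points, and the function names and the order of the checks (linearity first, then circularity) are choices made for this example. The two thresholds are the ones quoted above.

```python
import math

Point = tuple[float, float]

# Thresholds quoted in the text; they are starting points, not constants.
WIRE_LINEARITY_MIN = 0.85
DOT_CIRCULARITY_MIN = 0.65


def path_length(points: list[Point]) -> float:
    """Sum of segment lengths along the polyline."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def linearity(points: list[Point]) -> float:
    """Endpoint span divided by path length (1.0 for a straight line)."""
    if len(points) < 2:
        return 0.0
    length = path_length(points)
    if length == 0:
        return 0.0
    return math.dist(points[0], points[-1]) / length


def isoperimetric_quotient(points: list[Point]) -> float:
    """4*pi*A / P**2 of the polygon formed by closing the path (1.0 for a circle)."""
    if len(points) < 3:
        return 0.0
    closed = list(points) + [points[0]]
    perimeter = path_length(closed)
    if perimeter == 0:
        return 0.0
    # Shoelace formula for the enclosed area.
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(closed, closed[1:]))
    return 4 * math.pi * (abs(twice_area) / 2) / perimeter ** 2


def classify_by_geometry(points: list[Point]) -> str:
    """Assumed decision order: wire check first, then dot, else component."""
    if linearity(points) > WIRE_LINEARITY_MIN:
        return "wire"
    if isoperimetric_quotient(points) > DOT_CIRCULARITY_MIN:
        return "dot"
    return "component"
```

One caveat worth checking against your own shapes: a square scores π/4 ≈ 0.785 on the isoperimetric quotient, so with these thresholds a compact, near-square outline would land in the dot class. An elongated 4:1 rectangle scores about 0.50 and is classified as a component.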
The two approaches compose well. Use class-name lookup as the fast path when attributes are present and trusted. Fall back to geometric classification when attributes are absent or unrecognized. This handles both the controlled case (your own diagrams, fast) and the uncontrolled case (imports, correct).
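A sketch of that composition, under stated assumptions: the class names in `TRUSTED_CLASSES` are hypothetical stand-ins for whatever your own tool emits, and the geometric classifier is passed in as a callable so the fast path and the fallback stay decoupled.

```python
from typing import Callable, Optional, Sequence

Point = tuple[float, float]

# Hypothetical mapping from this tool's own CSS classes to shape kinds.
TRUSTED_CLASSES = {
    "wire": "wire",
    "junction-dot": "dot",
    "component": "component",
}


def classify(
    css_class: Optional[str],
    points: Sequence[Point],
    geometric: Callable[[Sequence[Point]], str],
) -> str:
    """Use a trusted class name when present; otherwise fall back to geometry."""
    # An SVG class attribute may hold several space-separated names.
    for name in (css_class or "").split():
        if name in TRUSTED_CLASSES:
            return TRUSTED_CLASSES[name]  # fast path: a dictionary lookup
    # Absent or unrecognized classes: classify from the path itself.
    return geometric(points)
```

The important property is that an unrecognized class never produces a default label. An imported path with `class="st0"` goes to the geometric classifier instead of being silently reported as a component.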
What You Give Up
Geometric classification requires tuning thresholds for your domain's typical shapes. A linearity threshold of 0.85 is correct for straight wires. For a tool that supports curved wires, the threshold needs to be lower. Threshold tuning requires test diagrams from representative sources. Class-name classification requires no threshold tuning.
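To see why curved wires push the threshold down, consider a wire drawn as a circular arc. Its linearity ratio is chord length over arc length, which depends only on the sweep angle θ: 2·sin(θ/2)/θ. The helper below is an illustrative calculation with a hypothetical name, not part of any library.

```python
import math


def arc_linearity(sweep_radians: float) -> float:
    """Endpoint span / path length for a circular arc with the given sweep angle."""
    if sweep_radians == 0:
        return 1.0  # degenerate arc: a straight line
    return 2 * math.sin(sweep_radians / 2) / sweep_radians


# Print the ratio for a few sweep angles to compare against the 0.85 cutoff.
for degrees in (30, 90, 120, 180):
    print(degrees, round(arc_linearity(math.radians(degrees)), 3))
```

A quarter-circle bend scores about 0.90 and still passes the 0.85 cutoff, while a semicircular wire scores 2/π ≈ 0.64 and fails it. The crossover lies between 90° and 120° of sweep, so a tool that allows wider arcs needs a lower linearity threshold or a different wire test.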
Geometric classification is also slower for large diagrams. A diagram with 5000 path elements takes tens of milliseconds to classify geometrically, versus microseconds for class-name lookup. For real-time classification during drawing, the class-name path is worth keeping. For batch analysis on load or export, the geometric path is acceptable.
When the Common Pattern Is Right
Use class-name classification when: you control all diagram inputs (no external SVG import), the classification needs to run in real-time (while the user is drawing), and the class naming convention is stable. For closed tools with fixed input formats, class-name classification is correct and faster.
For any tool that accepts SVG input from external sources, processes diagrams across multiple domains, or needs to be correct even when diagrams are malformed or incorrectly classed: classify by geometry first, trust the metadata second.