A Visual DAG Runner for Multi-Source Data Joins in the Browser
TLDR: Reaching for VLOOKUP usually means you need a join pipeline, not a formula. A directed acyclic graph where each node is a data operation (fetch, filter, transform, join) and each wire is a data dependency covers the common multi-source cases. The executor is a topological sort followed by per-node handler dispatch. Here is the full model.
Repo: TAFNE
The Problem
Two tables. A shared key. You want to join them. In SQL, that is one statement. In most browser-based tools, it is a VLOOKUP formula that breaks the moment a column moves.
VLOOKUPs are a symptom. They mean you are doing manually, with a fragile cell formula, what should be a deterministic pipeline that runs the same way every time regardless of column order.
A DAG runner solves this. Each node is a data transformation. Each wire is a dependency. The executor runs the graph in topological order and produces a table.
The Graph Model
Each node has:
- A type (source, transform, or output)
- A set of input slot IDs (the nodes wired into it)
- A configuration object (URL, column name, join key, filter expression)
- A result, populated during execution
Five node types cover the common cases:
Sheet: a loaded data source (CSV, JSON, pasted table). Output: array of row objects.
API: a GET request to a URL with a JSON path extractor. Output: array of row objects from the extracted array.
Filter: keeps rows matching a condition (ten operators: equals, contains, greater than, etc.). Input: one array. Output: filtered array.
Formula: adds a computed column using a column reference syntax ({HourlyRate} * {HoursWorked}). No eval; the expression is parsed into an AST by a structured picker. Input: one array. Output: array with additional column.
Join: merges two inputs. Three modes: Stack (vertical concatenation), Lateral (side-by-side by row index), Inner (matching on key columns).
The Executor
Execution is topological sort followed by per-node handler dispatch:
function execute(nodes, edges) {
const order = topologicalSort(nodes, edges);
const results = new Map(); // nodeId → row[]
for (const nodeId of order) { const node = nodes.get(nodeId); const inputs = getInputIds(nodeId, edges).map(id => results.get(id) ?? []); results.set(nodeId, runNode(node, inputs)); }
return results; }
Each runNode call receives the node's configuration and the already-computed results of its upstream dependencies. It returns a row array that downstream nodes can read.
The result map is keyed by node ID. A Join node reads its two inputs by the IDs of whatever nodes are wired into its left and right input slots. This keeps execution stateless between nodes: each node sees only its declared inputs, not the full graph state.
The Inner Join
function innerJoin(leftRows, rightRows, leftKey, rightKey) {
const result = [];
for (const leftRow of leftRows) {
const match = rightRows.find(r => r[rightKey] === leftRow[leftKey]);
if (match) result.push({ ...leftRow, ...match });
}
return result;
}
Standard relational join semantics. Right-side keys that conflict with left-side keys are overwritten (last writer wins). For the Stack mode, leftRows.concat(rightRows). For Lateral, zip by index.
An Example: API Fetch, Filter, Rename
[API: pokeapi.co/api/v2/pokemon?limit=50]
→ [Filter: name contains 'b']
→ [Formula: Pokemon_Name = {name}]
→ Build Table
- API node: fetch 50 records from PokéAPI, extract the
resultsarray - Filter node: keep only entries where
namecontains"b" - Formula node: copy
nameinto a labeled columnPokemon_Name - Build Table: output lands as a new sheet
When This Is the Right Tool
The DAG runner is not a substitute for SQL when SQL is available. It is for the cases where SQL is not the right tool:
The data does not live in a database. It is a CSV someone sent, plus a live API. Loading both into a database to run a query is more work than the DAG runner requires.
The pipeline needs to be shared with someone who does not write SQL. A visual graph is readable by anyone who understands dependency order.
Everything must run in the browser with no server. That constraint rules out most SQL-based tools. The DAG runner has no server, no account, no data upload.
Where It Fits in a Multi-Mode Table Editor
The node editor is the connection layer between single-dataset operations and multi-dataset output. Tab 1 cleans a single CSV. The node editor joins it with data from an API. Tab 2 displays the joined result for final formatting and export. Any sheet produced by any mode is available as a source in any other.
Source: github.com/carnworkstudios/TAFNE