Engineering Journal
Table Formatter
Table Formatter

A Visual DAG Runner for Multi-Source Data Joins in the Browser

2026-06-17

TLDR: Reaching for VLOOKUP usually means you need a join pipeline, not a formula. A directed acyclic graph where each node is a data operation (fetch, filter, transform, join) and each wire is a data dependency covers the common multi-source cases. The executor is a topological sort followed by per-node handler dispatch. Here is the full model.

Repo: TAFNE

The Problem

Two tables. A shared key. You want to join them. In SQL, that is one statement. In most browser-based tools, it is a VLOOKUP formula that breaks the moment a column moves.

VLOOKUPs are a symptom. They mean you are doing manually, with a fragile cell formula, what should be a deterministic pipeline that runs the same way every time regardless of column order.

A DAG runner solves this. Each node is a data transformation. Each wire is a dependency. The executor runs the graph in topological order and produces a table.


The Graph Model

Each node has:

Wires are directed edges from a source node's output slot to a target node's input slot. The graph is valid when it is acyclic: a cycle means infinite dependency.

Five node types cover the common cases:

Sheet: a loaded data source (CSV, JSON, pasted table). Output: array of row objects.

API: a GET request to a URL with a JSON path extractor. Output: array of row objects from the extracted array.

Filter: keeps rows matching a condition (ten operators: equals, contains, greater than, etc.). Input: one array. Output: filtered array.

Formula: adds a computed column using a column reference syntax ({HourlyRate} * {HoursWorked}). No eval; the expression is parsed into an AST by a structured picker. Input: one array. Output: array with additional column.

Join: merges two inputs. Three modes: Stack (vertical concatenation), Lateral (side-by-side by row index), Inner (matching on key columns).


The Executor

Execution is topological sort followed by per-node handler dispatch:

function execute(nodes, edges) {
  const order = topologicalSort(nodes, edges);
  const results = new Map(); // nodeId → row[]

for (const nodeId of order) { const node = nodes.get(nodeId); const inputs = getInputIds(nodeId, edges).map(id => results.get(id) ?? []); results.set(nodeId, runNode(node, inputs)); }

return results; }

Each runNode call receives the node's configuration and the already-computed results of its upstream dependencies. It returns a row array that downstream nodes can read.

The result map is keyed by node ID. A Join node reads its two inputs by the IDs of whatever nodes are wired into its left and right input slots. This keeps execution stateless between nodes: each node sees only its declared inputs, not the full graph state.


The Inner Join

function innerJoin(leftRows, rightRows, leftKey, rightKey) {
  const result = [];
  for (const leftRow of leftRows) {
    const match = rightRows.find(r => r[rightKey] === leftRow[leftKey]);
    if (match) result.push({ ...leftRow, ...match });
  }
  return result;
}

Standard relational join semantics. Right-side keys that conflict with left-side keys are overwritten (last writer wins). For the Stack mode, leftRows.concat(rightRows). For Lateral, zip by index.


An Example: API Fetch, Filter, Rename

[API: pokeapi.co/api/v2/pokemon?limit=50]
  → [Filter: name contains 'b']
  → [Formula: Pokemon_Name = {name}]
  → Build Table
  1. API node: fetch 50 records from PokéAPI, extract the results array
  2. Filter node: keep only entries where name contains "b"
  3. Formula node: copy name into a labeled column Pokemon_Name
  4. Build Table: output lands as a new sheet
Setup time: about 90 seconds. The pipeline runs the same way every time.

When This Is the Right Tool

The DAG runner is not a substitute for SQL when SQL is available. It is for the cases where SQL is not the right tool:

The data does not live in a database. It is a CSV someone sent, plus a live API. Loading both into a database to run a query is more work than the DAG runner requires.

The pipeline needs to be shared with someone who does not write SQL. A visual graph is readable by anyone who understands dependency order.

Everything must run in the browser with no server. That constraint rules out most SQL-based tools. The DAG runner has no server, no account, no data upload.


Where It Fits in a Multi-Mode Table Editor

The node editor is the connection layer between single-dataset operations and multi-dataset output. Tab 1 cleans a single CSV. The node editor joins it with data from an API. Tab 2 displays the joined result for final formatting and export. Any sheet produced by any mode is available as a source in any other.

Source: github.com/carnworkstudios/TAFNE

Read this post in the full Engineering Journal →