Design Decisions & Trade-offs

Replicated vs. Partitioned Graph Distribution

The system uses replicated distribution (full graph to each worker) rather than partitioned distribution (each worker gets a subset of nodes/edges).

Why Replication?

Advantages:

Correctness — algorithms like BFS, Dijkstra, and SCC require the full graph. Partitioning creates cross-partition edges that require multi-round communication to resolve.
Simplicity — result merging is trivial: prefer the worker result.
Determinism — both processes produce identical results; the merge is predictable.

Disadvantages:

Memory — each process stores a full copy of the graph
Transfer — larger initial serialization payload

For the target graph sizes (up to ~100K edges), memory is not a constraint. For billion-edge graphs, a proper partitioning strategy (METIS, graph streaming) would be required.

WASM in a Web Worker

Loading WASM in the main thread would block the UI during initialization (~20ms for the 128KB module). Running inside a Web Worker means:

WASM loads while the user is still uploading or selecting their file
Parsing happens off-thread — UI remains interactive
The main thread only receives the finished ParsedGraph object

Trade-off: WASM in workers requires Vite's worker build to use ES module format and vite-plugin-wasm applied separately to the worker plugins list — slightly more complex build configuration.

JSON as the WASM API Boundary

WASM functions accept and return JSON strings rather than typed arrays or shared memory. This trades raw performance for:

Simplicity — no manual memory management, no pointer arithmetic in JS
Compatibility — works with any JSON-serializable data structure
Debuggability — output is inspectable in browser DevTools

For the workloads in this system (3,000 nodes, ~10,000 edges), JSON serialization overhead is <2ms — acceptable.

For larger graphs, the API could be migrated to use SharedArrayBuffer or typed array views for zero-copy data transfer.

Iterative DFS Instead of Recursive

All DFS-based algorithms (DFS traversal, Kosaraju's first pass, Kosaraju's second pass) use explicit stacks rather than recursive calls.

Why: Recursive DFS causes stack overflows for graphs with paths longer than ~8,000 nodes on a default Rust stack (~8MB). The iterative (node, post: bool) technique exactly mimics recursive DFS post-order without the stack depth constraint.

This is critical for Kosaraju's SCC — both DFS passes must be iterative for the algorithm to be safe on production graphs.

Compact Index Mapping for Algorithms

Node IDs can be large sparse integers (0, 100, 999,999). Algorithms that need contiguous array indexing (Dijkstra's dist vector, Bellman-Ford, Union-Find for Kruskal) maintain a bidirectional mapping:

compact_to_id: Vec<usize>            — index 0..n → original ID
id_to_compact: HashMap<usize, usize> — original ID → index 0..n

Why: A Vec<f64> indexed 0..n uses n × 8 bytes. A HashMap indexed by sparse IDs would use far more and have worse cache behavior.

Rust Across the Stack

Using Rust for both the backend and WASM layer provides:

Type safety — graph algorithms and serialization are type-checked end-to-end
Performance — zero-cost abstractions, no GC, predictable latency
Code reuse — the same algorithm logic exists in both contexts, verified by the same compiler
Memory safety — no buffer overflows, no use-after-free — critical for a networked service accepting user-uploaded files

Trade-off: Higher complexity than a JS-only or Python stack. Requires familiarity with Rust's ownership system and the wasm-bindgen toolchain.

bincode for MPI Serialization

The system uses bincode (a compact binary serialization format) rather than JSON for MPI messages.

Why bincode over JSON:

~3–5× smaller serialized size for numeric data
~10× faster serialization/deserialization
Type safety from the Serde derive macros

The slightly reduced debuggability (binary format) is acceptable for internal MPI communication.

Tokio + MPI Threading

MPI initialization uses Threading::Multiple to allow the Rocket async runtime (Tokio) and MPI to coexist on the same process. MPI calls are dispatched via tokio::task::spawn_blocking to avoid blocking the async executor.

Alternative considered: Running MPI and Rocket in separate processes communicating via Unix socket. Rejected for complexity — spawn_blocking is simpler and sufficient for this workload.

Replicated vs. Partitioned Graph Distribution​

Why Replication?​

WASM in a Web Worker​

JSON as the WASM API Boundary​

Iterative DFS Instead of Recursive​

Compact Index Mapping for Algorithms​

Rust Across the Stack​

bincode for MPI Serialization​

Tokio + MPI Threading​