Design Decisions & Trade-offs
Replicated vs. Partitioned Graph Distribution
The system uses replicated distribution (full graph to each worker) rather than partitioned distribution (each worker gets a subset of nodes/edges).
Why Replication?
Advantages:
- Correctness — algorithms like BFS, Dijkstra, and SCC require the full graph. Partitioning creates cross-partition edges that require multi-round communication to resolve.
- Simplicity — result merging is trivial: prefer the worker result.
- Determinism — both processes produce identical results; the merge is predictable.
Disadvantages:
- Memory — each process stores a full copy of the graph
- Transfer — larger initial serialization payload
For the target graph sizes (up to ~100K edges), memory is not a constraint. For billion-edge graphs, a proper partitioning strategy (METIS, graph streaming) would be required.
WASM in a Web Worker
Loading WASM in the main thread would block the UI during initialization (~20ms for the 128KB module). Running inside a Web Worker means:
- WASM loads while the user is still uploading or selecting their file
- Parsing happens off-thread — UI remains interactive
- The main thread only receives the finished
ParsedGraphobject
Trade-off: WASM in workers requires Vite's worker build to use ES module format and vite-plugin-wasm applied separately to the worker plugins list — slightly more complex build configuration.
JSON as the WASM API Boundary
WASM functions accept and return JSON strings rather than typed arrays or shared memory. This trades raw performance for:
- Simplicity — no manual memory management, no pointer arithmetic in JS
- Compatibility — works with any JSON-serializable data structure
- Debuggability — output is inspectable in browser DevTools
For the workloads in this system (3,000 nodes, ~10,000 edges), JSON serialization overhead is <2ms — acceptable.
For larger graphs, the API could be migrated to use SharedArrayBuffer or typed array views for zero-copy data transfer.
Iterative DFS Instead of Recursive
All DFS-based algorithms (DFS traversal, Kosaraju's first pass, Kosaraju's second pass) use explicit stacks rather than recursive calls.
Why: Recursive DFS causes stack overflows for graphs with paths longer than ~8,000 nodes on a default Rust stack (~8MB). The iterative (node, post: bool) technique exactly mimics recursive DFS post-order without the stack depth constraint.
This is critical for Kosaraju's SCC — both DFS passes must be iterative for the algorithm to be safe on production graphs.
Compact Index Mapping for Algorithms
Node IDs can be large sparse integers (0, 100, 999,999). Algorithms that need contiguous array indexing (Dijkstra's dist vector, Bellman-Ford, Union-Find for Kruskal) maintain a bidirectional mapping:
compact_to_id: Vec<usize> — index 0..n → original ID
id_to_compact: HashMap<usize, usize> — original ID → index 0..n
Why: A Vec<f64> indexed 0..n uses n × 8 bytes. A HashMap indexed by sparse IDs would use far more and have worse cache behavior.
Rust Across the Stack
Using Rust for both the backend and WASM layer provides:
- Type safety — graph algorithms and serialization are type-checked end-to-end
- Performance — zero-cost abstractions, no GC, predictable latency
- Code reuse — the same algorithm logic exists in both contexts, verified by the same compiler
- Memory safety — no buffer overflows, no use-after-free — critical for a networked service accepting user-uploaded files
Trade-off: Higher complexity than a JS-only or Python stack. Requires familiarity with Rust's ownership system and the wasm-bindgen toolchain.
bincode for MPI Serialization
The system uses bincode (a compact binary serialization format) rather than JSON for MPI messages.
Why bincode over JSON:
- ~3–5× smaller serialized size for numeric data
- ~10× faster serialization/deserialization
- Type safety from the Serde derive macros
The slightly reduced debuggability (binary format) is acceptable for internal MPI communication.
Tokio + MPI Threading
MPI initialization uses Threading::Multiple to allow the Rocket async runtime (Tokio) and MPI to coexist on the same process. MPI calls are dispatched via tokio::task::spawn_blocking to avoid blocking the async executor.
Alternative considered: Running MPI and Rocket in separate processes communicating via Unix socket. Rejected for complexity — spawn_blocking is simpler and sufficient for this workload.