Backend: Distributed Algorithm Engine
Graph Representation
The core graph data structure (src/graph.rs) is a directed weighted graph backed by a HashMap<usize, Vec<(usize, f64)>> adjacency list. Nodes are addressed by arbitrary integer IDs.
Compact Index Mapping
Because node IDs can be large and sparse (e.g., 0, 100, 999_999), algorithms that need contiguous array indexing maintain a bidirectional map:
compact_to_id: Vec<usize> — compact_index → original_id
id_to_compact: HashMap<usize, usize> — original_id → compact_index
This allows distance vectors to be stored as Vec<f64> indexed 0..n without wasting memory on sparse ID spaces.
Optional Node Features
The Graph struct supports:
VectorPerNode— dense feature vector per node (e.g., embeddings)SparseFeatures— sparse feature maps for large feature dimensionsego_features— structural neighborhood features
MPI Distribution Model
The MPI layer (src/mpi_processor.rs) uses the replicated computation model: the full graph is sent to each worker process. Both master and worker execute the same algorithm on identical data. The worker's result is preferred on merge, demonstrating that remote computation actually occurred.
Why Replication?
Many graph algorithms (BFS, Dijkstra, SCC) require the full graph structure to produce correct results. Partitioning would require multi-round communication to resolve cross-partition edges. Replication keeps things simple and correct for graphs up to ~100K edges.
Communication Protocol
1. Master serializes graph + task into a GraphTask struct using bincode
2. Master sends serialized bytes to each worker via MPI_Send
3. Workers receive bytes, deserialize, run the algorithm,
serialize TaskResult, and send back
4. Master collects all results and merges them
Panic Safety
Worker processes protect against crashes with std::panic::catch_unwind: a panicking worker still sends an empty result to unblock the master's receive_vec call. Without this, a worker crash would hang the entire system.
MPI + Async Coexistence
MPI initialization uses Threading::Multiple to allow the Rocket async runtime (Tokio) and MPI to coexist. MPI calls are dispatched via tokio::task::spawn_blocking to avoid blocking the async executor.
HTTP API
The Rocket server (src/bin/server.rs) exposes:
| Method | Path | Description |
|---|---|---|
| GET | /mpi_status | Returns MPI process count, mode, connectivity |
| POST | /process_file | Accepts graph file + algorithm config, runs distributed algorithm |
| POST | /graph_metrics | Computes structural metrics (no MPI, pure analysis) |
| GET | /health | Health check |
| OPTIONS | /* | CORS preflight |
The /process_file endpoint accepts a multipart/form-data body with the graph file and a JSON request field. The server saves the file to /tmp, runs the algorithm (blocking, off-thread), deletes the file, and returns results as JSON.
The /graph_metrics endpoint runs independently of MPI, computing:
- Node count, edge count
- Graph density (
E / N(N−1)) - Connected component count
- DAG status (via topological sort)
- Average out-degree
- Top 5 hub nodes by out-degree
Implemented Algorithms
Nine algorithms are fully implemented in src/graph.rs:
| Algorithm | Category | Implementation |
|---|---|---|
| BFS | Traversal | Queue-based level-order |
| DFS | Traversal | Iterative stack (stack-safe) |
| Dijkstra | Shortest Path | Binary heap (max-heap negated) |
| A* | Shortest Path | Open set heap, uniform heuristic h=1 |
| Bellman-Ford | Shortest Path | V−1 relaxation passes + N+1 cycle detection |
| Kruskal MST | Graph | Union-Find with path compression + rank |
| PageRank | Graph | Iterative, d=0.85, 30 iterations, dangling nodes handled |
| SCC | Graph | Kosaraju's — fully iterative two-pass DFS |
| Topological Sort | Graph | Kahn's algorithm (BFS-based) |
See the Algorithms section for full implementation details.