Five Phases · Active Development
VectorMBE Implementation Roadmap
Five phases covering reasoning-in-the-loop ingestion, hybrid graph + vector retrieval, multi-LLM routing, SysML v2 integration, and a graph evaluation framework. Each phase is a discrete, shippable capability increment.
At a Glance
| Phase | Focus | Priority |
|---|---|---|
| 1 | Reasoning Gate + Entity Validator | High |
| 2 | Dual Retrieval + DSA-BFS | High |
| 3 | Multi-LLM Router + Completeness Checker | Medium |
| 4 | SysML v2 Integration + ITI | Medium |
| 5 | Evaluation Framework + Digital Thread | Ongoing |
Reasoning Gate + Entity Validator
A significant share of LLM-generated ontology entities contain disjoint violations, wrong subclass hierarchies, or missing restrictions — none of which are catchable without an OWL reasoner. The ReasoningGate sits in the ingestion pipeline and partitions every entity into consistent, inconsistent, or unclassified before it reaches the production graph. A three-tier confidence scorer (string + fuzzy + semantic) validates extracted entities against source documents before promotion.
- Implement `ReasoningGate::validate()` using Konclude / Pellet / HermiT: route consistent entities to production, inconsistent to `vectormbe:quarantine/{id}`
- Add `EntityValidator` to ingestion pipeline with weighted confidence: 0.40 · StringMatch + 0.30 · FuzzyMatch + 0.30 · SemanticMatch
- Thresholds: auto-accept > 0.85 · flag for review 0.65–0.85 · reject < 0.65
- Update `demo-data/vectormbe-demo-dataset.json` generation to pre-validate all 408 nodes
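The weighted confidence and thresholds above can be sketched as follows. This is a minimal illustration, not the VectorMBE implementation: the names `MatchScores`, `confidence`, and `decide` are hypothetical, and the three component scores are assumed to arrive already normalized to [0, 1].

```python
from dataclasses import dataclass

# Weights and thresholds come from the roadmap text; everything else is assumed.
WEIGHTS = {"string": 0.40, "fuzzy": 0.30, "semantic": 0.30}
ACCEPT_ABOVE = 0.85   # auto-accept strictly above this value
REJECT_BELOW = 0.65   # reject strictly below this value


@dataclass(frozen=True)
class MatchScores:
    """Per-entity match scores, each assumed to lie in [0, 1]."""
    string: float
    fuzzy: float
    semantic: float


def confidence(scores: MatchScores) -> float:
    """Weighted sum: 0.40 * string + 0.30 * fuzzy + 0.30 * semantic."""
    return (
        WEIGHTS["string"] * scores.string
        + WEIGHTS["fuzzy"] * scores.fuzzy
        + WEIGHTS["semantic"] * scores.semantic
    )


def decide(scores: MatchScores) -> str:
    """Map a confidence value to 'accept', 'review', or 'reject'."""
    value = confidence(scores)
    if value > ACCEPT_ABOVE:
        return "accept"
    if value < REJECT_BELOW:
        return "reject"
    return "review"  # the 0.65 to 0.85 band, boundaries included
```

Treating the boundary values 0.65 and 0.85 as "review" is one reading of the thresholds; the roadmap does not say which side they fall on.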
Dual Retrieval + DSA-BFS
Vector similarity alone misses structured traceability relationships — "which requirements flow down to this component" is a graph traversal, not a nearest-neighbor lookup. The dual-retrieval architecture combines vector ANN search with Cypher graph queries, scoring candidates as β · GraphRelevance + (1−β) · VectorSimilarity where β ∈ [0.5, 0.8] for engineering domains. Naive k-hop neighbor retrieval is replaced by DSA-BFS, which de-duplicates redundant paths and prioritizes diverse, query-relevant nodes.
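As a rough sketch of the scoring rule just described, the snippet below ranks candidates by β · GraphRelevance + (1−β) · VectorSimilarity. The function names and the candidate layout (a dict of `(graph_relevance, vector_similarity)` pairs) are assumptions made for illustration; both inputs are taken to be pre-normalized to [0, 1].

```python
def hybrid_score(graph_relevance: float, vector_similarity: float, beta: float) -> float:
    """Blend the two signals: beta weights the graph side, (1 - beta) the vector side."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * graph_relevance + (1.0 - beta) * vector_similarity


def rank(candidates: dict[str, tuple[float, float]], beta: float) -> list[str]:
    """Return candidate ids ordered from highest to lowest hybrid score."""
    return sorted(
        candidates,
        key=lambda node_id: hybrid_score(*candidates[node_id], beta),
        reverse=True,
    )


# Hypothetical candidates: (graph_relevance, vector_similarity)
candidates = {
    "REQ-12": (0.90, 0.20),  # strongly connected in the graph, textually distant
    "CMP-7": (0.10, 0.95),   # textually close, weakly connected
}
graph_heavy = rank(candidates, beta=0.8)
vector_only = rank(candidates, beta=0.0)
```

With β = 0.8 the graph-connected requirement outranks the textually similar component; with β = 0 the order flips, which is the vector-only baseline.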
- Build graph retriever module — Cypher queries against Neo4j backend for structured traceability
- Build retrieval router: "find similar components" → vector (ANN) · "what requirements trace here?" → graph (Cypher) · "explain failure chain" → hybrid
- Add β slider in UI to tune graph vs. vector weight per query session
- Replace k-hop retrieval with DSA-BFS subgraph extraction (two-step mean pooling of node + edge embeddings)
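The retrieval router in the list above could start as a keyword heuristic like the one below. The cue lists, the `Route` enum, and the choice of vector search as the default are all assumptions for illustration; a production router would more likely use a trained classifier.

```python
from enum import Enum


class Route(Enum):
    VECTOR = "vector"   # ANN similarity search
    GRAPH = "graph"     # Cypher traversal
    HYBRID = "hybrid"   # both, blended by beta


# Hypothetical cue phrases; tune these against real query logs.
HYBRID_CUES = ("explain", "failure chain", "root cause")
GRAPH_CUES = ("trace", "flow down", "flows down", "allocated to", "depends on")


def route_query(query: str) -> Route:
    """Pick a retrieval path from simple substring cues in the query."""
    text = query.lower()
    if any(cue in text for cue in HYBRID_CUES):
        return Route.HYBRID
    if any(cue in text for cue in GRAPH_CUES):
        return Route.GRAPH
    return Route.VECTOR  # assumed default when no structural cue is present
```

Hybrid cues are checked first so that a query such as "explain how this requirement traces" is not sent to the graph path alone.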
Multi-LLM Router + Completeness Checker
No single LLM is best for every ingestion task. An LLMRouter classifies the task type and dispatches to the best-fit model with automatic failover. In parallel, an R-GCN CompletenessChecker predicts missing links in the MBSE graph, surfacing incomplete traces before they become integration problems.
| Task | Preferred Model | Fallback |
|---|---|---|
| Formal OWL definitions | GPT-4o / o3-mini | Claude Sonnet |
| Datasheet / tabular extraction | Gemini 1.5 Pro | GPT-4o |
| Domain-specific extraction (RAG) | Llama 3.3 70B | DeepSeek V3 |
| Reasoning / chain-of-thought | DeepSeek R1 | o3-mini |
| Video transcript → triples | Gemini 1.5 Flash | Whisper + GPT-4o |
| Requirement verification (ITI) | Llama 3-8B + ITI | Claude Haiku |
- Build task classifier (small fine-tuned model or heuristic) to route ingestion tasks
- Build pluggable model registry with automatic failover
- Add cost / quality tracking — log token cost vs. reasoning gate pass rate per model
- Train R-GCN completeness model on existing demo dataset; surface suggestions in UI via `AnchorRegistry`
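A minimal sketch of the registry-with-failover idea follows. The task keys, the `dispatch` function, and the injected `call_model` callable are hypothetical; model names are copied from the table above, and where the table lists alternatives (such as "GPT-4o / o3-mini") only the first is used here for brevity.

```python
from typing import Callable

# task -> (preferred model, fallback model), taken from the routing table.
ROUTES: dict[str, tuple[str, str]] = {
    "owl_definitions": ("GPT-4o", "Claude Sonnet"),
    "tabular_extraction": ("Gemini 1.5 Pro", "GPT-4o"),
    "domain_extraction": ("Llama 3.3 70B", "DeepSeek V3"),
    "reasoning": ("DeepSeek R1", "o3-mini"),
}


class AllModelsFailed(RuntimeError):
    """Raised when the preferred model and its fallback both fail."""


def dispatch(
    task: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model registered for the task in order.

    Returns (model_name, response) from the first model that succeeds.
    `call_model(model_name, prompt)` is supplied by the caller, so this
    sketch makes no assumption about any vendor SDK.
    """
    errors = []
    for model in ROUTES[task]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the model name alongside the response makes it straightforward to log token cost and reasoning gate pass rate per model, as the cost / quality tracking item calls for.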
SysML v2 Integration + ITI
SysML v2's REST API exposes models as navigable graphs, eliminating fragile XMI batch imports. This phase connects VectorMBE directly to SysML v2 repositories with bidirectional sync, and adds an Inference-Time Intervention (ITI) path for requirement verification — steering the LLM over a BFS-extracted subgraph rather than relying on prompt engineering alone.
- Implement `SysMLv2Adapter`: API-first connection to SysML v2 repositories (PTC Integrity / Cameo / open-source)
- Build bidirectional sync protocol: SysML v2 ↔ VectorMBE ontology via MCP `ContextUpdate` events
- Implement NL → SysML BDD pipeline: text → KG → BDD → simulation code
- Run ITI experiment for anchor validation — train intervention directions on anchor violation patterns, combine with expression evaluator for guaranteed correctness
Evaluation Framework + Digital Thread
Standard accuracy / precision / recall don't capture whether a generated graph is structurally correct. This phase adds graph-level metrics — edit distance, spectral distance, anchor violation rate — and builds the cross-phase digital thread: versioned subgraphs with KG-based diffing traced from requirements through design to operations.
- Expand demo dataset to 1000+ nodes with ground truth traceability links
- Implement graph metrics suite: edit distance `d_L`, spectral distance `d_spec`, anchor violation rate, completeness score
- Run A/B testing: vector-only (β=0) vs. hybrid (β=0.5) vs. graph-heavy (β=0.8)
- Add versioned subgraphs with KG-based diffing for digital thread
- Build cross-phase traceability queries: requirements → design → manufacturing → operations
- Integrate with PLM via KG + GNN cross-domain knowledge recommendations
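To make the graph-level metrics more concrete, here is a simplified sketch of two of them. It assumes the generated and ground-truth graphs share node identifiers, so edit distance reduces to counting differing nodes and edges rather than searching for an optimal node matching. The `Graph` type, `edit_distance`, and `link_recall` are hypothetical names, and link recall is only one possible reading of "completeness score", since the roadmap does not define it.

```python
from dataclasses import dataclass

Edge = tuple[str, str, str]  # (source id, relation, target id)


@dataclass(frozen=True)
class Graph:
    nodes: frozenset[str]
    edges: frozenset[Edge]


def edit_distance(a: Graph, b: Graph) -> int:
    """Count node and edge insertions/deletions needed to turn a into b.

    Valid only under the shared-identifier assumption stated above.
    """
    return len(a.nodes ^ b.nodes) + len(a.edges ^ b.edges)


def link_recall(generated: Graph, truth: Graph) -> float:
    """Fraction of ground-truth links that the generated graph contains."""
    if not truth.edges:
        return 1.0  # nothing to recover
    return len(generated.edges & truth.edges) / len(truth.edges)


# Hypothetical ground truth: one requirement satisfied by two components.
truth = Graph(
    nodes=frozenset({"R1", "C1", "C2"}),
    edges=frozenset({("R1", "satisfiedBy", "C1"), ("R1", "satisfiedBy", "C2")}),
)
# Generated graph misses component C2 and its trace link.
generated = Graph(
    nodes=frozenset({"R1", "C1"}),
    edges=frozenset({("R1", "satisfiedBy", "C1")}),
)
```

Here the generated graph is one node and one edge away from the ground truth and recovers half of the trace links, a difference that node-level precision alone would not show.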
Questions about the roadmap or want to shape what gets built next? We're actively developing VectorMBE and welcome input from engineering teams in the field.
