Vector Stream Systems

Five Phases · Active Development

VectorMBE Implementation Roadmap

Five phases covering reasoning-in-the-loop ingestion, hybrid graph + vector retrieval, multi-LLM routing, SysML v2 integration, and a graph evaluation framework. Each phase is a discrete, shippable capability increment.

At a Glance

Phase | Focus | Priority
1 | Reasoning Gate + Entity Validator | High
2 | Dual Retrieval + DSA-BFS | High
3 | Multi-LLM Router + Completeness Checker | Medium
4 | SysML v2 Integration + ITI | Medium
5 | Evaluation Framework + Digital Thread | Ongoing

Phase 1 · High Priority

Reasoning Gate + Entity Validator

A significant share of LLM-generated ontology entities contain disjoint violations, wrong subclass hierarchies, or missing restrictions, and these errors are hard to catch reliably without an OWL reasoner. The ReasoningGate sits in the ingestion pipeline and partitions every entity into consistent, inconsistent, or unclassified before it reaches the production graph. A three-tier confidence scorer (string + fuzzy + semantic) validates extracted entities against source documents before promotion.
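
The partition step can be sketched as a routing function over reasoner verdicts. This is a minimal sketch, assuming a hypothetical `check` adapter that wraps the actual Konclude, Pellet, or HermiT call and returns one verdict per entity; `Verdict`, `GateResult`, and `partition` are illustrative names, and treating a reasoner failure as "unclassified" is an assumed design choice.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"
    UNCLASSIFIED = "unclassified"


@dataclass
class GateResult:
    production: List[str] = field(default_factory=list)
    quarantine: Dict[str, str] = field(default_factory=dict)  # entity id -> quarantine key
    unclassified: List[str] = field(default_factory=list)


def partition(entity_ids: List[str], check: Callable[[str], Verdict]) -> GateResult:
    """Sort entities into production, quarantine, or unclassified buckets.

    `check` is a hypothetical adapter around a reasoner call.
    """
    result = GateResult()
    for eid in entity_ids:
        try:
            verdict = check(eid)
        except Exception:
            # Assumption: a reasoner crash or timeout never promotes an entity.
            verdict = Verdict.UNCLASSIFIED
        if verdict is Verdict.CONSISTENT:
            result.production.append(eid)
        elif verdict is Verdict.INCONSISTENT:
            result.quarantine[eid] = f"vectormbe:quarantine/{eid}"
        else:
            result.unclassified.append(eid)
    return result
```

Keeping the reasoner behind a plain callable means the same gate logic works whichever reasoner backs it.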

  • Implement ReasoningGate::validate() using Konclude / Pellet / HermiT — route consistent entities to production, inconsistent to vectormbe:quarantine/{id}
  • Add EntityValidator to ingestion pipeline with weighted confidence: 0.40 · StringMatch + 0.30 · FuzzyMatch + 0.30 · SemanticMatch
  • Thresholds: auto-accept > 0.85 · flag for review 0.65–0.85 · reject < 0.65
  • Update demo-data/vectormbe-demo-dataset.json generation to pre-validate all 408 nodes
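
The weighted confidence and thresholds above can be sketched as follows. The 0.40/0.30/0.30 weights and the 0.85/0.65 cut-offs come from the roadmap; the three component scorers are hypothetical stand-ins (case-insensitive substring match, the best `difflib` ratio over word windows, and clamped cosine similarity of caller-supplied embeddings), not the actual EntityValidator internals.

```python
import math
from difflib import SequenceMatcher
from typing import Sequence

# Weights and thresholds from the roadmap; the three scorers below are stand-ins.
W_STRING, W_FUZZY, W_SEMANTIC = 0.40, 0.30, 0.30
ACCEPT_ABOVE, REVIEW_FROM = 0.85, 0.65


def string_match(label: str, source: str) -> float:
    """1.0 if the label appears verbatim (case-insensitive) in the source, else 0.0."""
    return 1.0 if label.lower() in source.lower() else 0.0


def fuzzy_match(label: str, source: str) -> float:
    """Best difflib ratio between the label and any source window with the same word count."""
    tokens = source.lower().split()
    width = max(1, len(label.split()))
    target = label.lower()
    best = 0.0
    for start in range(max(1, len(tokens) - width + 1)):
        window = " ".join(tokens[start:start + width])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return best


def semantic_match(label_vec: Sequence[float], source_vec: Sequence[float]) -> float:
    """Cosine similarity of caller-supplied embeddings, clamped to [0, 1]."""
    dot = sum(a * b for a, b in zip(label_vec, source_vec))
    norm = math.sqrt(sum(a * a for a in label_vec)) * math.sqrt(sum(b * b for b in source_vec))
    return 0.0 if norm == 0 else max(0.0, dot / norm)


def confidence(string: float, fuzzy: float, semantic: float) -> float:
    return W_STRING * string + W_FUZZY * fuzzy + W_SEMANTIC * semantic


def decide(score: float) -> str:
    if score > ACCEPT_ABOVE:
        return "auto-accept"
    if score >= REVIEW_FROM:
        return "review"  # 0.65 <= score <= 0.85
    return "reject"
```

With these boundaries a score of exactly 0.85 or 0.65 lands in the review band, matching the ranges listed above.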

Phase 2 · High Priority

Dual Retrieval + DSA-BFS

Vector similarity alone misses structured traceability relationships — "which requirements flow down to this component" is a graph traversal, not a nearest-neighbor lookup. The dual-retrieval architecture combines vector ANN search with Cypher graph queries, scoring candidates as β · GraphRelevance + (1−β) · VectorSimilarity where β ∈ [0.5, 0.8] for engineering domains. Naive k-hop neighbor retrieval is replaced by DSA-BFS, which de-duplicates redundant paths and prioritizes diverse, query-relevant nodes.
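
The fused score can be illustrated with a short ranking function. This sketch assumes both retrievers return scores already normalized to [0, 1] and that a candidate found by only one retriever scores 0.0 on the other; `hybrid_rank` and the example IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def hybrid_rank(graph_rel: Dict[str, float], vec_sim: Dict[str, float],
                beta: float, k: int = 5) -> List[Tuple[str, float]]:
    """Rank candidates by beta * GraphRelevance + (1 - beta) * VectorSimilarity."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    candidates = set(graph_rel) | set(vec_sim)
    # Assumption: a candidate missed by one retriever gets 0.0 from it.
    scored = [(c, beta * graph_rel.get(c, 0.0) + (1 - beta) * vec_sim.get(c, 0.0))
              for c in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest score first, ties by id
    return scored[:k]


graph = {"REQ-12": 1.0, "IF-3": 0.5}   # e.g. from a Cypher traversal
vector = {"IF-3": 0.9, "CMP-7": 0.8}   # e.g. from ANN search
print(hybrid_rank(graph, vector, beta=0.7))
```

At β = 0.7 the traced requirement REQ-12 (0.70) outranks IF-3 (0.62) and CMP-7 (0.24); at β = 0 the order flips to pure vector similarity.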

  • Build graph retriever module — Cypher queries against Neo4j backend for structured traceability
  • Build retrieval router: "find similar components" → vector (ANN) · "what requirements trace here?" → graph (Cypher) · "explain failure chain" → hybrid
  • Add β slider in UI to tune graph vs. vector weight per query session
  • Replace k-hop retrieval with DSA-BFS subgraph extraction (two-step mean pooling of node + edge embeddings)
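
The retrieval router's three example routes can be approximated with keyword cues. This is a heuristic sketch only: the regular expressions, the `Route` enum, and the rule that mixed cues fall back to hybrid are all assumptions, and a trained query classifier could take their place.

```python
import re
from enum import Enum


class Route(Enum):
    VECTOR = "vector"   # ANN similarity search
    GRAPH = "graph"     # Cypher traversal
    HYBRID = "hybrid"   # both, fused with the beta-weighted score


# Hypothetical keyword cues; a trained query classifier could replace them.
HYBRID_CUES = re.compile(r"\b(explain\w*|why|failure chain|root cause)\b", re.IGNORECASE)
GRAPH_CUES = re.compile(r"\b(trac\w*|flows? down|satisf\w*|allocat\w*|deriv\w*)\b", re.IGNORECASE)
VECTOR_CUES = re.compile(r"\b(similar\w*|resembl\w*|comparable|like)\b", re.IGNORECASE)


def route(query: str) -> Route:
    """Pick a retrieval path for a natural-language query."""
    if HYBRID_CUES.search(query):
        return Route.HYBRID
    wants_graph = GRAPH_CUES.search(query) is not None
    wants_vector = VECTOR_CUES.search(query) is not None
    if wants_graph and wants_vector:
        return Route.HYBRID
    if wants_graph:
        return Route.GRAPH
    # Default to nearest-neighbour search when no structural cue is present.
    return Route.VECTOR
```

Defaulting to vector search keeps open-ended queries cheap, while any explicit traceability wording is sent to the graph path.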

Phase 3 · Medium Priority

Multi-LLM Router + Completeness Checker

No single LLM is best for every ingestion task. An LLMRouter classifies the task type and dispatches to the best-fit model with automatic failover. In parallel, an R-GCN CompletenessChecker predicts missing links in the MBSE graph, surfacing incomplete traces before they become integration problems.

Task | Preferred Model | Fallback
Formal OWL definitions | GPT-4o / o3-mini | Claude Sonnet
Datasheet / tabular extraction | Gemini 1.5 Pro | GPT-4o
Domain-specific extraction (RAG) | Llama 3.3 70B | DeepSeek V3
Reasoning / chain-of-thought | DeepSeek R1 | o3-mini
Video transcript → triples | Gemini 1.5 Flash | Whisper + GPT-4o
Requirement verification (ITI) | Llama 3-8B + ITI | Claude Haiku

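
A pluggable registry with ordered failover, following the table above, might look like this. It is a sketch under stated assumptions: the task keys are hypothetical labels produced by the task classifier, the model names are registry labels rather than provider API identifiers, and each registered callable stands in for a real provider client that raises `ModelUnavailable` on failure.

```python
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]  # prompt -> completion; wraps a real provider client


class ModelUnavailable(Exception):
    """Raised by an adapter when its backend cannot serve the request."""


# Preference order per task, following the routing table (preferred first, fallback last).
ROUTES: Dict[str, List[str]] = {
    "owl_definition": ["GPT-4o", "o3-mini", "Claude Sonnet"],
    "datasheet_extraction": ["Gemini 1.5 Pro", "GPT-4o"],
    "domain_extraction": ["Llama 3.3 70B", "DeepSeek V3"],
    "reasoning": ["DeepSeek R1", "o3-mini"],
    "video_to_triples": ["Gemini 1.5 Flash", "Whisper + GPT-4o"],
    "requirement_verification": ["Llama 3-8B + ITI", "Claude Haiku"],
}


class LLMRouter:
    def __init__(self, routes: Dict[str, List[str]]):
        self.routes = routes
        self.registry: Dict[str, ModelFn] = {}

    def register(self, name: str, fn: ModelFn) -> None:
        self.registry[name] = fn

    def dispatch(self, task: str, prompt: str) -> Tuple[str, str]:
        """Try each model for `task` in order; return (model name, completion)."""
        errors: List[str] = []
        for name in self.routes[task]:
            fn = self.registry.get(name)
            if fn is None:
                errors.append(f"{name}: not registered")
                continue
            try:
                return name, fn(prompt)
            except ModelUnavailable as exc:
                errors.append(f"{name}: {exc}")  # fall through to the next model
        raise RuntimeError(f"no model served {task!r}: " + "; ".join(errors))
```

Returning the name of the model that actually answered lets downstream logging attribute cost and gate results to the right model after a failover.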
  • Build task classifier (small fine-tuned model or heuristic) to route ingestion tasks
  • Build pluggable model registry with automatic failover
  • Add cost / quality tracking — log token cost vs. reasoning gate pass rate per model
  • Train R-GCN completeness model on existing demo dataset; surface suggestions in UI via AnchorRegistry
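
The cost / quality tracking item can be sketched as a per-model ledger. This assumes a hypothetical price table (USD per 1,000 tokens) supplied by the caller and a boolean ReasoningGate outcome per call; "cost per gate-passing output" is an assumed summary metric for comparing models.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ModelStats:
    calls: int = 0
    tokens: int = 0
    cost_usd: float = 0.0
    gate_passes: int = 0  # outputs that cleared the ReasoningGate

    @property
    def pass_rate(self) -> float:
        return self.gate_passes / self.calls if self.calls else 0.0

    @property
    def cost_per_pass(self) -> float:
        # Infinite when a model has never produced a passing output.
        return self.cost_usd / self.gate_passes if self.gate_passes else float("inf")


class CostQualityLog:
    def __init__(self, usd_per_1k_tokens: Dict[str, float]):
        self.prices = usd_per_1k_tokens  # hypothetical rates supplied by the caller
        self.stats: Dict[str, ModelStats] = defaultdict(ModelStats)

    def record(self, model: str, tokens: int, passed_gate: bool) -> None:
        cost = tokens / 1000 * self.prices[model]  # look up price before mutating stats
        s = self.stats[model]
        s.calls += 1
        s.tokens += tokens
        s.cost_usd += cost
        s.gate_passes += int(passed_gate)

    def ranked(self) -> List[Tuple[str, ModelStats]]:
        """Models ordered from cheapest to most expensive per gate-passing output."""
        return sorted(self.stats.items(), key=lambda item: item[1].cost_per_pass)
```

Ranking by cost per passing output, rather than raw token cost, rewards a pricier model if it clears the reasoning gate often enough.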

Phase 4 · Medium Priority

SysML v2 Integration + ITI

SysML v2's REST API exposes models as navigable graphs, eliminating fragile XMI batch imports. This phase connects VectorMBE directly to SysML v2 repositories with bidirectional sync, and adds an Inference-Time Intervention (ITI) path for requirement verification — steering the LLM over a BFS-extracted subgraph rather than relying on prompt engineering alone.
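
The ITI path can be illustrated with the mass-mean-shift form of Inference-Time Intervention: learn a direction from the mean difference between two classes of head activations, then shift activations along it by α·σ at inference. This is a sketch under assumptions: plain Python lists stand in for per-head activations that would be captured from the model, and the "valid" versus "violating" anchor examples, the choice of heads, and the strength α are all hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Sequence

Vec = List[float]


def mean_vec(rows: Sequence[Sequence[float]]) -> Vec:
    return [sum(col) / len(rows) for col in zip(*rows)]


def mass_mean_direction(valid: Sequence[Sequence[float]],
                        violating: Sequence[Sequence[float]]) -> Vec:
    """Unit vector from the mean 'violating' activation toward the mean 'valid' one."""
    diff = [a - b for a, b in zip(mean_vec(valid), mean_vec(violating))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("class means coincide; no direction to learn")
    return [x / norm for x in diff]


def projection_std(acts: Sequence[Sequence[float]], theta: Sequence[float]) -> float:
    """Standard deviation of activations projected onto theta (sets the step size)."""
    return pstdev([sum(a * t for a, t in zip(act, theta)) for act in acts])


def intervene(h: Sequence[float], theta: Sequence[float], sigma: float, alpha: float) -> Vec:
    """Shift one head's activation by alpha * sigma along theta at inference time."""
    return [x + alpha * sigma * t for x, t in zip(h, theta)]
```

Scaling the shift by σ keeps the intervention proportional to how much activations naturally vary along θ, so one α can be reused across heads.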

  • Implement SysMLv2Adapter — API-first connection to SysML v2 repositories (PTC Integrity / Cameo / open-source)
  • Build bidirectional sync protocol: SysML v2 ↔ VectorMBE ontology via MCP ContextUpdate events
  • Implement NL → SysML BDD pipeline: text → KG → BDD → simulation code
  • Run an ITI experiment for anchor validation: train intervention directions on anchor-violation patterns and pair them with an expression evaluator so the final anchor checks are deterministic
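
One way to reason about the bidirectional sync is as a diff between two snapshots that emits update events toward whichever side is stale. This is a minimal sketch, assuming a hypothetical `ContextUpdate` event shape, monotonically increasing revision stamps on both sides, last-writer-wins conflict resolution with SysML v2 winning ties, and no deletion handling (propagating deletes would need tombstones).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    payload: str    # serialized element body (hypothetical)
    modified: int   # revision stamp, assumed monotonically increasing


@dataclass(frozen=True)
class ContextUpdate:
    target: str     # "sysml" or "ontology": the side that must apply the change
    element_id: str
    payload: str


def sync_events(sysml: Dict[str, Element], onto: Dict[str, Element]) -> List[ContextUpdate]:
    """Compute the updates needed to bring both sides to the same state."""
    events: List[ContextUpdate] = []
    for eid in sorted(set(sysml) | set(onto)):
        s, o = sysml.get(eid), onto.get(eid)
        if s is None:
            events.append(ContextUpdate("sysml", eid, o.payload))
        elif o is None:
            events.append(ContextUpdate("ontology", eid, s.payload))
        elif s.payload != o.payload:
            # Last writer wins; ties favour SysML v2 as the assumed source of truth.
            if s.modified >= o.modified:
                events.append(ContextUpdate("ontology", eid, s.payload))
            else:
                events.append(ContextUpdate("sysml", eid, o.payload))
    return events
```

Sorting element IDs makes the event stream deterministic, which simplifies replaying or auditing a sync run.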

Phase 5 · Ongoing

Evaluation Framework + Digital Thread

Standard accuracy / precision / recall don't capture whether a generated graph is structurally correct. This phase adds graph-level metrics — edit distance, spectral distance, anchor violation rate — and builds the cross-phase digital thread: versioned subgraphs with KG-based diffing traced from requirements through design to operations.
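
Three of the graph-level metrics can be sketched with set operations. Assumptions: node IDs already align between the generated and reference graphs (general graph edit distance without a known alignment is far more expensive), anchors are modelled as hypothetical predicates over the edge set, and "completeness" is taken here to mean recall of ground-truth links. Spectral distance is omitted because it needs an eigenvalue solver.

```python
from typing import Callable, Iterable, Set, Tuple

Edge = Tuple[str, str, str]           # (source id, relation, target id)
Anchor = Callable[[Set[Edge]], bool]  # returns True when the constraint holds


def aligned_edit_distance(nodes_a: Set[str], edges_a: Set[Edge],
                          nodes_b: Set[str], edges_b: Set[Edge]) -> int:
    """Unit-cost edit distance when node IDs already align across both graphs.

    Every node or edge present in only one graph costs one insertion or deletion;
    a relabelled edge therefore costs two (delete + insert).
    """
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)


def anchor_violation_rate(edges: Set[Edge], anchors: Iterable[Anchor]) -> float:
    checks = list(anchors)
    if not checks:
        return 0.0
    return sum(1 for holds in checks if not holds(edges)) / len(checks)


def completeness(predicted: Set[Edge], ground_truth: Set[Edge]) -> float:
    """Share of ground-truth traceability links recovered by the generated graph."""
    return len(predicted & ground_truth) / len(ground_truth) if ground_truth else 1.0
```

Each metric is cheap enough to run after every ingestion batch, which suits the A/B comparison across β settings.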

  • Expand demo dataset to 1000+ nodes with ground truth traceability links
  • Implement graph metrics suite: edit distance d_L, spectral distance d_spec, anchor violation rate, completeness score
  • Run A/B testing: vector-only (β=0) vs. hybrid (β=0.5) vs. graph-heavy (β=0.8)
  • Add versioned subgraphs with KG-based diffing for digital thread
  • Build cross-phase traceability queries: requirements → design → manufacturing → operations
  • PLM integration via KG+GNN cross-domain knowledge recommendations
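
Versioned subgraphs and a cross-phase trace query can be combined in a small sketch. It assumes subgraphs are stored as immutable sets of (subject, relation, object) triples keyed by a version tag, so a KG-based diff reduces to set difference, and the trace is a breadth-first walk over caller-chosen relations. `VersionedSubgraph`, `trace`, and relation names such as `allocatedTo` are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class VersionedSubgraph:
    """Immutable snapshots of a subgraph, keyed by a version tag."""
    versions: Dict[str, FrozenSet[Triple]] = field(default_factory=dict)

    def commit(self, tag: str, triples: Iterable[Triple]) -> None:
        if tag in self.versions:
            raise ValueError(f"version {tag!r} already exists")
        self.versions[tag] = frozenset(triples)

    def diff(self, old: str, new: str) -> Tuple[Set[Triple], Set[Triple]]:
        """Return (added, removed) triples going from `old` to `new`."""
        before, after = self.versions[old], self.versions[new]
        return set(after - before), set(before - after)


def trace(triples: Iterable[Triple], start: str, relations: Set[str]) -> List[str]:
    """Breadth-first walk downstream from `start`, following only the given relations."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for subj, rel, obj in triples:
        if rel in relations:
            adjacency[subj].append(obj)
    seen, queue, reached = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in sorted(adjacency[node]):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append(nxt)
    return reached
```

For example, tracing from a requirement over `allocatedTo`, `realizedBy`, and `operatedAs` links walks requirements → design → manufacturing → operations, and restricting `relations` limits the trace to a single phase boundary.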

Questions about the roadmap or want to shape what gets built next? We're actively developing VectorMBE and welcome input from engineering teams in the field.
