Optionally run prompts through multiple models. Find where they disagree.
Three evaluation strategies for multi-perspective analysis, from fast side-by-side comparison to adversarial contradiction detection. All three (Basic, Multi-perspective, and Adversarial) work with your own API key and are accessible via the FusionLayer MCP server in Claude Code, Cursor, and any MCP-compatible host.
The engine runs in the background. Your users never see the seams.
When a user sends a message, FusionLayer classifies the task type (code, writing, analysis, etc.) and uses that signal — along with aggregated routing data from across the user base — to decide which model is most likely to produce the best result for that task. Classification runs server-side. The classifier output (a topic label) — not the prompt — drives routing decisions and bandit selection.
The response your user sees is the best one FusionLayer found. If the routing setting is off, the user's preferred model handles everything. One toggle, one tradeoff: routing quality vs. latency budget.
- Task classification runs server-side — prompt text is not stored in routing signals, only the resulting topic label
- Routing is informed by aggregated signal across the user base
- Fastest acceptable model wins in a tie
- Implicit feedback (retry, edit, continue) improves routing over time
- Works with any vendor connector or bring-your-own-key setup
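To make the tie-break and feedback bullets concrete, here is a minimal sketch of one way such routing could work. It is a plain greedy pick with a latency tie-break and a running-average update, not FusionLayer's actual bandit; `ModelStats`, `pick_model`, `record_feedback`, the `tie_margin` value, and the model names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    quality: float     # running estimate of implicit quality, 0..1 (assumed scale)
    latency_ms: float  # typical response latency


def pick_model(stats_by_label, label, tie_margin=0.02):
    """Pick the highest-quality model for a task label; the fastest wins a tie."""
    candidates = stats_by_label[label]
    best_quality = max(m.quality for m in candidates)
    # Any model within tie_margin of the best counts as tied.
    tied = [m for m in candidates if best_quality - m.quality <= tie_margin]
    return min(tied, key=lambda m: m.latency_ms)


def record_feedback(model, signal, rate=0.1):
    """Nudge the quality estimate toward an implicit signal.

    Assumed mapping: 1.0 for "continue", 0.0 for "retry" or "edit".
    """
    model.quality += rate * (signal - model.quality)


# Routing keys off the task label only; no prompt text appears here.
stats = {
    "code": [
        ModelStats("model-a", 0.81, 900.0),
        ModelStats("model-b", 0.80, 400.0),
        ModelStats("model-c", 0.60, 200.0),
    ]
}
chosen = pick_model(stats, "code")
print(chosen.name)
```

In this sketch model-a and model-b are within the tie margin, so the faster one is chosen, while model-c is fastest overall but not good enough to be considered. A real bandit would also explore occasionally instead of always taking the current best.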
What enters the aggregator
The shared aggregator receives only a task-type label, token counts, latency, and an implicit quality signal: no prompt text, no conversation content. In managed Smart mode, classification runs server-side and prompt text enters the routing pipeline transiently to produce the label; it is not stored in routing signals. A server-side field whitelist enforces this in code, so it is a technical control rather than a policy statement.
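A field whitelist of this kind can be sketched in a few lines. The field names below are assumptions chosen to match the categories listed above (label, token counts, latency, quality signal); the real schema is not documented here.

```python
# Hypothetical field names; only keys in this set reach the aggregator.
ALLOWED_FIELDS = frozenset(
    {"task_label", "prompt_tokens", "completion_tokens", "latency_ms", "quality_signal"}
)


def to_aggregator_record(event: dict) -> dict:
    """Copy only whitelisted fields, dropping everything else by default."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


event = {
    "task_label": "code",
    "prompt_tokens": 212,
    "completion_tokens": 640,
    "latency_ms": 930,
    "quality_signal": 1.0,
    "prompt": "text that must never leave the routing pipeline",
}
record = to_aggregator_record(event)
print(sorted(record))
```

The point of an allow-list over a deny-list is that a newly added field, such as a conversation ID, is excluded until someone explicitly permits it.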
Three strategies. Pick the right one for the task.
The eval engine exposes three explicit strategies via MCP tools. Each is a distinct tradeoff between speed, cost, and depth of analysis.
Side-by-side comparison
Send the same prompt to multiple vendors in parallel. Responses come back independently — no aggregation, no judge. Compare directly.
Best for: quick sanity checks, spotting outlier responses, cost benchmarking across vendors.
vendors: [anthropic, openai, google]
→ three responses, side-by-side
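As a rough illustration of the fan-out, the sketch below sends one prompt to three stand-in vendors concurrently and returns each response untouched. `fake_vendor` and `side_by_side` are hypothetical helpers, and the delays simulate network latency; no real vendor API is called.

```python
import asyncio


async def fake_vendor(name: str, prompt: str, delay: float) -> dict:
    """Stand-in for a vendor call; sleeps instead of hitting a network."""
    await asyncio.sleep(delay)
    return {"vendor": name, "text": f"[{name}] answer to: {prompt}"}


async def side_by_side(prompt: str, vendors: dict) -> dict:
    """Query every vendor in parallel and return the raw responses, keyed by vendor."""
    tasks = [fake_vendor(name, prompt, delay) for name, delay in vendors.items()]
    # return_exceptions=True keeps one failing vendor from discarding the others.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # gather preserves input order, so results line up with the vendor names.
    return dict(zip(vendors, results))


responses = asyncio.run(
    side_by_side("What is 2 + 2?", {"anthropic": 0.03, "openai": 0.01, "google": 0.02})
)
for vendor, response in responses.items():
    print(vendor, "->", response)
```

Nothing is merged or scored here, which is what keeps this strategy the cheapest of the three.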
Parallel aggregation
Run multiple vendors and combine their responses using one of three aggregation modes: consensus (median response), union (all responses combined), or best (lowest-latency winner).
Best for: high-stakes answers where you want a synthesized view, not just the fastest reply.
aggregation: consensus
→ single aggregated response + confidence
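The three aggregation options (consensus, union, best) could look something like the sketch below. One loud assumption: "median response" is interpreted here as the response most similar on average to the others, measured by word overlap, and that average is reported as the confidence. FusionLayer may define both differently. The `aggregate` function and the sample responses are illustrative only.

```python
def _tokens(text: str) -> set:
    return set(text.lower().split())


def _jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0


def aggregate(responses: list, mode: str) -> dict:
    if mode == "union":
        return {"text": "\n\n".join(r["text"] for r in responses), "confidence": None}
    if mode == "best":
        winner = min(responses, key=lambda r: r["latency_ms"])
        return {"text": winner["text"], "confidence": None}
    if mode == "consensus":
        def mean_similarity(r):
            others = [o for o in responses if o is not r]
            if not others:
                return 1.0
            return sum(_jaccard(r["text"], o["text"]) for o in others) / len(others)

        # Assumed reading of "median": the response closest to all the others.
        winner = max(responses, key=mean_similarity)
        return {"text": winner["text"], "confidence": round(mean_similarity(winner), 2)}
    raise ValueError(f"unknown aggregation mode: {mode}")


sample = [
    {"vendor": "anthropic", "text": "Paris is the capital of France", "latency_ms": 820},
    {"vendor": "openai", "text": "The capital of France is Paris", "latency_ms": 640},
    {"vendor": "google", "text": "Lyon is the capital", "latency_ms": 410},
]
for mode in ("consensus", "union", "best"):
    print(mode, "->", aggregate(sample, mode))
```

With this sample, consensus sides with the two agreeing answers, while best returns whichever vendor replied fastest regardless of agreement.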
Contradiction detection
Run two or more vendors, then use an LLM judge to find specific factual contradictions between their responses. Each contradiction is returned with severity (low / medium / high) and an explanation.
Best for: factual research, medical / legal questions, any domain where model hallucination is expensive.
vendors: [anthropic, openai]
→ contradictions[] + consensus
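On the consuming side, a result of this shape might be handled as sketched below. The `Contradiction` fields beyond severity and explanation, the `at_least` helper, and the sample entries are hypothetical; only the three severity levels come from the description above.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Contradiction:
    vendors: tuple    # which two vendors disagree (assumed field)
    severity: str     # "low", "medium", or "high"
    explanation: str  # the judge's description of the disagreement


def at_least(contradictions: list, minimum: str) -> list:
    """Keep only contradictions at or above the given severity."""
    floor = SEVERITY_ORDER[minimum]
    return [c for c in contradictions if SEVERITY_ORDER[c.severity] >= floor]


# Illustrative judge output, not real data.
found = [
    Contradiction(("anthropic", "openai"), "low", "Different rounding of the same figure."),
    Contradiction(("anthropic", "openai"), "high", "The responses give conflicting dosage limits."),
]
for item in at_least(found, "medium"):
    print(item.severity, "-", item.explanation)
```

Filtering by severity lets a caller surface only the disagreements worth a human's attention and log the rest.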
All three strategies are available as MCP tools — install the FusionLayer MCP server and call them from Claude Code, Cursor, or any MCP-compatible host. Bring your own API keys.
Install via MCP →