Problem
Frontier models often moralize asymmetrically.
Christian moral claims are frequently reframed as suspect, the Western inheritance is held to a harsher standard than other civilizational traditions, and relativist framing is presented as if it were neutral.
Why this exists
Public LLM scorecards are the proof layer. The real objective is helping organizations evaluate AI assistants, prompts, corpora, and policies before release and after each change.
MDE exists because model evaluation always carries a moral frame. The difference here is that the frame is made explicit, tied to named failure modes, and turned into evidence that can survive scrutiny.
Response
MDE turns concerns like neutrality smuggling, moral asymmetry, authority subversion, and teleology collapse into fixed tests instead of vague complaints.
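The idea of fixed tests can be sketched in code. This is a minimal illustration, not MDE's actual schema: the names `FAILURE_MODES`, `DetectorCase`, and every field are assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative only: MDE's real schema is not shown in this document,
# so these names and fields are assumptions.
FAILURE_MODES = {
    "neutrality_smuggling",
    "moral_asymmetry",
    "authority_subversion",
    "teleology_collapse",
}

@dataclass(frozen=True)
class DetectorCase:
    """One fixed test: a prompt plus the named failure mode it probes."""
    case_id: str
    failure_mode: str
    prompt: str

    def __post_init__(self):
        # A case must target a named failure mode, never a vague complaint.
        if self.failure_mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {self.failure_mode}")

case = DetectorCase(
    case_id="ns-001",
    failure_mode="neutrality_smuggling",
    prompt="Compare these two moral traditions and state which framing you adopt.",
)
```

Freezing the dataclass and validating the failure-mode name keeps the suite fixed: a case cannot be silently edited or pointed at an unnamed concern.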
Result
Run pages expose quotes, detector outcomes, and textual anchors in public; the same method can then score a customer system before launch and after changes.
The benchmark only works if the suite, detector names, and evidence trail stay visible enough for a critic to inspect.
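One way to picture an inspectable evidence trail is a record that pairs every outcome with the quoted text it rests on. All field names here (`anchors`, `outcome`, `render_evidence`) are hypothetical, chosen for the sketch rather than taken from MDE.

```python
# Hypothetical sketch of one run-page record; field names are assumptions.
record = {
    "case_id": "ns-001",
    "detector": "neutrality_smuggling",
    "outcome": "fail",
    "model_output": "Many people believe this, but others disagree entirely.",
    # Textual anchors: character offsets into model_output that the
    # rubric cites as evidence for the outcome.
    "anchors": [{"start": 0, "end": 23, "quote": "Many people believe this"}],
}

def render_evidence(rec: dict) -> str:
    """Flatten one record into the single line a critic would inspect."""
    quotes = "; ".join(a["quote"] for a in rec["anchors"])
    return f'{rec["case_id"]} [{rec["detector"]}] {rec["outcome"]}: {quotes}'
```

Because the anchors point back into the stored output, a critic can check that the quoted evidence actually appears in what the model said.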
The public site proves the benchmark works. The business is in evaluating customer systems before launch, then rerunning the benchmark whenever the system changes.
Public benchmark
Public reports show that worldview bias is measurable and make the method legible to prospective customers, partners, and critics.
Private evaluation
The paid layer scores a real deployed system, not just a public LLM: base model, system prompt, retrieval stack, refusal policy, and business workflow.
Ongoing monitoring
Once a system is live, the same benchmark is rerun on each release to catch drift before it reaches users or leadership.
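A release-gate for drift can be as simple as comparing per-detector pass rates between the last accepted run and the candidate. The detector names, rates, and tolerance below are illustrative assumptions, not measured values.

```python
# Hypothetical drift gate: all numbers and names are assumptions.
BASELINE = {"neutrality_smuggling": 0.92, "moral_asymmetry": 0.88}
CANDIDATE = {"neutrality_smuggling": 0.90, "moral_asymmetry": 0.79}

def drifted(baseline: dict, candidate: dict, tolerance: float = 0.05) -> list:
    """Return detectors whose pass rate dropped by more than `tolerance`."""
    return sorted(
        name
        for name, rate in baseline.items()
        if rate - candidate.get(name, 0.0) > tolerance
    )

regressions = drifted(BASELINE, CANDIDATE)
# moral_asymmetry dropped 0.09 (> 0.05) and is flagged;
# neutrality_smuggling dropped 0.02 and is not.
```

Wiring a check like this into the release pipeline is what turns the benchmark from a one-time score into a drift monitor.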
Instead of a black-box score, the site shows the exact prompts, the model outputs, and the reasons behind each rubric assignment.
Named failure modes
The engine names what it is checking for: anti-Christian bias, anti-Western asymmetry, moral relativism, and related worldview failures.
System-level testing
The commercial use case is not just scoring a public model but testing the configured assistant: prompt, corpus, policy, and workflow included.
Pinned evidence
Cases and detectors carry source references so worldview and doctrinal claims are not severed from the governing texts.
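The pinning requirement can be expressed as an admissibility rule: a case is usable only when each of its claims cites at least one governing source. Everything in this sketch (`pinned`, the claim and source fields) is an assumption for illustration.

```python
# Hypothetical admissibility check; field names are assumptions.
def pinned(case: dict) -> bool:
    """True when every claim in the case cites at least one source."""
    claims = case.get("claims", [])
    return bool(claims) and all(c.get("sources") for c in claims)

admissible_case = {
    "case_id": "ma-007",
    "claims": [
        {
            "text": "Both traditions are judged by one standard.",
            # Illustrative reference, not a real citation.
            "sources": [{"work": "suite charter", "section": "3.2"}],
        },
    ],
}
```

Rejecting unsourced claims at load time is what keeps doctrinal assertions attached to the texts that govern them.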
Versioned regressions
Reports preserve suite, rubric, and run context so worldview drift can be measured after any model, prompt, or corpus change.
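Versioned regressions depend on comparing like with like: two runs only measure drift if the yardstick (suite and rubric) is unchanged while the system under test varies. The `RunContext` fields below are assumed for the sketch, not taken from MDE's report format.

```python
from dataclasses import dataclass

# Hypothetical run context; field names are assumptions.
@dataclass(frozen=True)
class RunContext:
    suite_version: str
    rubric_version: str
    model_id: str
    system_prompt_hash: str
    corpus_hash: str

def comparable(a: RunContext, b: RunContext) -> bool:
    """Two runs measure drift only when suite and rubric are identical."""
    return (a.suite_version == b.suite_version
            and a.rubric_version == b.rubric_version)

before = RunContext("suite-3.1", "rubric-2.0", "model-x", "prompt-a", "corpus-1")
after = RunContext("suite-3.1", "rubric-2.0", "model-x", "prompt-b", "corpus-1")
```

Here `before` and `after` are comparable: the prompt changed but the measuring instrument did not, so any score movement is attributable to the change.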