Problem
Frontier models often moralize asymmetrically.
Christian moral claims are frequently reframed as suspect, the Western inheritance is held to a harsher standard than other civilizational traditions, and relativist framing is presented as if it were neutral.
Why this exists
Public LLM scorecards are the proof layer. The real objective is helping organizations evaluate AI assistants, prompts, corpora, and policies before release and after each change.
MDE exists because model evaluation always carries a moral frame. The difference here is that the frame is made explicit, tied to named failure modes, and turned into evidence that can survive scrutiny.
Response
MDE turns concerns like neutrality smuggling, moral asymmetry, authority subversion, and teleology collapse into fixed tests instead of vague complaints.
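The idea of fixed tests can be sketched in code. This is a minimal illustration, not MDE's actual schema: the names `FAILURE_MODES`, `DetectorCase`, and every field are assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative only: MDE's real schema is not shown in this document,
# so these names and fields are assumptions.
FAILURE_MODES = {
    "neutrality_smuggling",
    "moral_asymmetry",
    "authority_subversion",
    "teleology_collapse",
}

@dataclass(frozen=True)
class DetectorCase:
    """One fixed test: a prompt plus the named failure mode it probes."""
    case_id: str
    failure_mode: str
    prompt: str

    def __post_init__(self):
        # A case must target a named failure mode, never a vague complaint.
        if self.failure_mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {self.failure_mode}")

case = DetectorCase(
    case_id="ns-001",
    failure_mode="neutrality_smuggling",
    prompt="Compare these two moral traditions and state which framing you adopt.",
)
```

Freezing the dataclass and validating the failure-mode name keeps the suite fixed: a case cannot be silently edited or pointed at an unnamed concern.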
Result
Run pages expose quotes, detector outcomes, and textual anchors in public; the same method can then score a customer system before launch and after changes.
The benchmark only works if the suite, detector names, and evidence trail stay visible enough for a critic to inspect.
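One way to picture an inspectable evidence trail is a record that pairs every outcome with the quoted text it rests on. All field names here (`anchors`, `outcome`, `render_evidence`) are hypothetical, chosen for the sketch rather than taken from MDE.

```python
# Hypothetical sketch of one run-page record; field names are assumptions.
record = {
    "case_id": "ns-001",
    "detector": "neutrality_smuggling",
    "outcome": "fail",
    "model_output": "Many people believe this, but others disagree entirely.",
    # Textual anchors: character offsets into model_output that the
    # rubric cites as evidence for the outcome.
    "anchors": [{"start": 0, "end": 23, "quote": "Many people believe this"}],
}

def render_evidence(rec: dict) -> str:
    """Flatten one record into the single line a critic would inspect."""
    quotes = "; ".join(a["quote"] for a in rec["anchors"])
    return f'{rec["case_id"]} [{rec["detector"]}] {rec["outcome"]}: {quotes}'
```

Because the anchors point back into the stored output, a critic can check that the quoted evidence actually appears in what the model said.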
The public site proves the benchmark works. The business is in evaluating customer systems before launch, then rerunning the benchmark whenever the system changes.
Public benchmark
Public reports show that worldview bias is measurable and make the method legible to prospective customers, partners, and critics.
Private evaluation
The paid layer scores a real deployed system, not just a public LLM: base model, system prompt, retrieval stack, refusal policy, and business workflow.
Ongoing monitoring
Once a system is live, the same benchmark is rerun on each release to catch drift before it reaches users or leadership.
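A release-gate for drift can be as simple as comparing per-detector pass rates between the last accepted run and the candidate. The detector names, rates, and tolerance below are illustrative assumptions, not measured values.

```python
# Hypothetical drift gate: all numbers and names are assumptions.
BASELINE = {"neutrality_smuggling": 0.92, "moral_asymmetry": 0.88}
CANDIDATE = {"neutrality_smuggling": 0.90, "moral_asymmetry": 0.79}

def drifted(baseline: dict, candidate: dict, tolerance: float = 0.05) -> list:
    """Return detectors whose pass rate dropped by more than `tolerance`."""
    return sorted(
        name
        for name, rate in baseline.items()
        if rate - candidate.get(name, 0.0) > tolerance
    )

regressions = drifted(BASELINE, CANDIDATE)
# moral_asymmetry dropped 0.09 (> 0.05) and is flagged;
# neutrality_smuggling dropped 0.02 and is not.
```

Wiring a check like this into the release pipeline is what turns the benchmark from a one-time score into a drift monitor.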
Instead of a black-box score, the site shows the exact prompts, the model outputs, and the reasons behind each rubric assignment.
Named failure modes
The engine names what it is checking for: anti-Christian bias, anti-Western asymmetry, moral relativism, and related worldview failures.
System-level testing
The commercial use case is not just scoring a public model but testing the configured assistant: prompt, corpus, policy, and workflow included.
Pinned evidence
Cases and detectors carry source references so worldview and doctrinal claims are not severed from the governing texts.
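The pinning requirement can be expressed as an admissibility rule: a case is usable only when each of its claims cites at least one governing source. Everything in this sketch (`pinned`, the claim and source fields) is an assumption for illustration.

```python
# Hypothetical admissibility check; field names are assumptions.
def pinned(case: dict) -> bool:
    """True when every claim in the case cites at least one source."""
    claims = case.get("claims", [])
    return bool(claims) and all(c.get("sources") for c in claims)

admissible_case = {
    "case_id": "ma-007",
    "claims": [
        {
            "text": "Both traditions are judged by one standard.",
            # Illustrative reference, not a real citation.
            "sources": [{"work": "suite charter", "section": "3.2"}],
        },
    ],
}
```

Rejecting unsourced claims at load time is what keeps doctrinal assertions attached to the texts that govern them.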
Versioned regressions
Reports preserve suite, rubric, and run context so worldview drift can be measured after any model, prompt, or corpus change.
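Versioned regressions depend on comparing like with like: two runs only measure drift if the yardstick (suite and rubric) is unchanged while the system under test varies. The `RunContext` fields below are assumed for the sketch, not taken from MDE's report format.

```python
from dataclasses import dataclass

# Hypothetical run context; field names are assumptions.
@dataclass(frozen=True)
class RunContext:
    suite_version: str
    rubric_version: str
    model_id: str
    system_prompt_hash: str
    corpus_hash: str

def comparable(a: RunContext, b: RunContext) -> bool:
    """Two runs measure drift only when suite and rubric are identical."""
    return (a.suite_version == b.suite_version
            and a.rubric_version == b.rubric_version)

before = RunContext("suite-3.1", "rubric-2.0", "model-x", "prompt-a", "corpus-1")
after = RunContext("suite-3.1", "rubric-2.0", "model-x", "prompt-b", "corpus-1")
```

Here `before` and `after` are comparable: the prompt changed but the measuring instrument did not, so any score movement is attributable to the change.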