Changelog

Updates to the CatholicBench dataset, models, and platform.

November 24, 2025 | New Model | Analysis

Opus 4.5

We've added Opus 4.5 to the benchmark. The model handles complex theological nuance with high precision and shows a marked improvement in pastoral tone, particularly on sensitive bioethical questions, where it balances doctrinal clarity and compassionate delivery better than previous iterations.

November 24, 2025 | Data | UI

Model Updates & UI Enhancements

Updated the benchmark data with the latest model runs. Refined the Dashboard to exclude incomplete model runs and added tooltip indicators for 'stealth' models currently in testing.

November 19, 2025 | Feature | Analysis

Dataset Browser & Historical Bias

Introduced the Dataset Browser component, which lets users search and explore specific benchmark questions. Added a dedicated Historical Bias analysis section to evaluate model performance on controversial topics.

November 19, 2025 | Data

Data Refresh

Refreshed the core analysis dataset with new models and corrected specific result details.

November 18, 2025 | Launch

Initial Dashboard Launch

Launched the comprehensive results dashboard featuring category-based visualization, normalized scoring, and interactive model comparisons.

Current Rankings

Rank  Model                          Score
1     google/gemini-3-pro-preview    4.50
2     anthropic/claude-opus-4.5      4.47
3     openai/gpt-5.1                 4.45
4     anthropic/claude-sonnet-4.5    4.45
5     google/gemini-2.5-pro          4.38

Stay Updated

New models are benchmarked weekly. Check back to see how rankings evolve as models improve.