Changelog
Updates to the CatholicBench dataset, models, and platform.
Opus 4.5
We've added Opus 4.5 to the benchmark. The model handles complex theological nuance with high precision and shows a marked improvement in pastoral tone, particularly on sensitive bioethical questions, where it balances doctrinal clarity and compassionate delivery better than previous iterations.
Model Updates & UI Enhancements
Updated the benchmark data with the latest model runs. Refined the Dashboard to exclude incomplete model runs and added tooltip indicators for 'stealth' models currently in testing.
Dataset Browser & Historical Bias
Introduced the Dataset Browser component, which lets users search and explore specific benchmark questions. Added a dedicated Historical Bias analysis section to evaluate model performance on controversial topics.
Data Refresh
Refreshed the core analysis dataset with new models and updated specific result details for accuracy.
Initial Dashboard Launch
Launched the comprehensive results dashboard featuring category-based visualization, normalized scoring, and interactive model comparisons.
Stay Updated
New models are benchmarked weekly. Check back to see how rankings evolve as models improve.