Abstract: Internal ratings-based models play a central role in bank risk management and regulatory capital determination, yet their validation remains methodologically challenging and operationally resource-intensive. In this paper, we contribute to the quantitative validation of probability of default models through a systematic backtesting exercise using a new proprietary dataset collected by the European Banking Authority between 2017 and 2024. We propose a generalised correction to the canonical binomial test that simultaneously accounts for asset and serial correlation, and we support it with extensive simulations. Acknowledging the iterative nature of model validation, we use order statistics to identify miscalibrations that persist over time. Because backtesting procedures are typically designed for bank-level evaluation, whereas our focus is the performance of the models across EU banks, we also present an approach for aggregating their results. Empirically, we find that the share of miscalibrated exposures in the small and medium-sized enterprise (SME) corporates asset class ranges from around 3.0% under realistic assumptions to a conservative upper bound of 16.7% implied by the canonical binomial test. We also quantify the impact on capital requirements and show that prudent model recalibrations would reduce system-wide Tier 1 capital ratios by 4 to 10 basis points. By offering scalable backtesting tools and enhancing transparency, we support more effective supervisory oversight and contribute to restoring market confidence in internal models.
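
For orientation, the canonical binomial test that serves as the baseline can be sketched as below. This is a minimal illustration, assuming independent defaults within a rating grade and a one-sided alternative (the estimated PD is too low). The function name `binomial_backtest_pvalue` and its arguments are hypothetical and not taken from the paper, and the generalised correction for asset and serial correlation is not reproduced here.

```python
from math import exp, lgamma, log


def binomial_backtest_pvalue(n_obligors: int, n_defaults: int, pd_estimate: float) -> float:
    """One-sided p-value P(D >= n_defaults) for D ~ Binomial(n_obligors, pd_estimate).

    Assumption (illustrative only): defaults are independent, i.e. there is
    no asset correlation and no serial correlation.
    """
    if not 0.0 < pd_estimate < 1.0:
        raise ValueError("pd_estimate must lie strictly between 0 and 1")
    if not 0 <= n_defaults <= n_obligors:
        raise ValueError("n_defaults must lie between 0 and n_obligors")

    log_p = log(pd_estimate)
    log_q = log(1.0 - pd_estimate)
    tail = 0.0
    # Sum the binomial probabilities of the upper tail in log space so that
    # large portfolios do not overflow the binomial coefficient.
    for k in range(n_defaults, n_obligors + 1):
        log_coeff = lgamma(n_obligors + 1) - lgamma(k + 1) - lgamma(n_obligors - k + 1)
        tail += exp(log_coeff + k * log_p + (n_obligors - k) * log_q)
    return min(tail, 1.0)


# Hypothetical grade: 1,000 obligors, 20 observed defaults, estimated PD of 1%.
p_value = binomial_backtest_pvalue(1000, 20, 0.01)
underestimated = p_value < 0.05  # reject calibration at the 5% level
```

Because the independence assumption understates the variance of the default count when defaults are correlated, this baseline tends to reject too often, which is consistent with the abstract treating its result as a conservative upper bound.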