What benchmarks can tell you
Benchmarks can show whether a model is plausibly strong, let you compare models within narrow capability bands, and signal where one model tier may outperform another on difficult, structured tasks.
That is useful, but it is only the beginning of evaluation.
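As a rough illustration of comparing models within capability bands, the sketch below computes per-band accuracy for two tiers and the gap between them. Everything in it is an assumption made for illustration: the band names, the tier names (small_tier, large_tier), the pass/fail records, and the helper functions accuracy_by_band and band_gaps are hypothetical and do not come from any real benchmark.

```python
from collections import defaultdict

# Hypothetical per-item results: (capability band, model tier, passed?).
# These records are invented for illustration, not real benchmark data.
RESULTS = [
    ("arithmetic", "small_tier", True),
    ("arithmetic", "small_tier", True),
    ("arithmetic", "large_tier", True),
    ("arithmetic", "large_tier", True),
    ("structured_extraction", "small_tier", True),
    ("structured_extraction", "small_tier", False),
    ("structured_extraction", "small_tier", False),
    ("structured_extraction", "large_tier", True),
    ("structured_extraction", "large_tier", True),
    ("structured_extraction", "large_tier", True),
]


def accuracy_by_band(results):
    """Return {(band, model): fraction of items passed}."""
    counts = defaultdict(lambda: [0, 0])  # (band, model) -> [passed, attempted]
    for band, model, passed in results:
        cell = counts[(band, model)]
        cell[0] += int(passed)
        cell[1] += 1
    return {key: passed / attempted for key, (passed, attempted) in counts.items()}


def band_gaps(results, model_x, model_y):
    """Return {band: accuracy(model_x) - accuracy(model_y)} for bands both models attempted."""
    acc = accuracy_by_band(results)
    bands = sorted({band for band, _ in acc})
    return {
        band: acc[(band, model_x)] - acc[(band, model_y)]
        for band in bands
        if (band, model_x) in acc and (band, model_y) in acc
    }


if __name__ == "__main__":
    for band, gap in band_gaps(RESULTS, "large_tier", "small_tier").items():
        print(f"{band}: large_tier minus small_tier = {gap:+.2f}")
```

A gap computed this way only says where the tiers differ on these particular items. It does not say how either tier behaves on your own workload, which is why it is a starting point rather than a conclusion.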