doi.org
Evaluating Agentic AI Systems: A Balanced Framework for Performance, Robustness, Safety and Beyond
August 2025 • Manish Shukla
Agentic artificial intelligence (AI)—multi-agent systems that combine large language models with external tools and autonomous planning—are rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions like human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability&efficiency, robustne…