Arjun R. Loomba

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, et al. · 2023

Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capa…