Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning

arXiv (Cornell University), October 2023
Qiming Bao, G. Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neşet Tan, Yang Chen, Michael Witbrock, Jiamou Liu

Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning have not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus" that extend standard logical reasoning datasets to evaluate the robu…

Computer Science, Artificial Intelligence, Machine Learning, Generative Grammar, Biochemistry, Chemistry