arXiv (Cornell University)
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
October 2023 • Qiming Bao, G. Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neşet Tan, Yang Chen, Michael Witbrock, Jiamou Liu
Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning have not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus" that extend standard logical reasoning datasets to evaluate the robu…
Computer Science
Artificial Intelligence
Machine Learning
Generative Grammar
Biochemistry
Chemistry