Keith Tyser
AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews
Automatic reviewing helps handle a large volume of papers, provides early feedback and quality control, reduces bias, and allows the analysis of trends. We evaluate the alignment of automatic paper reviews with human reviews using an arena…
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human le…
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree.…
Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate student…