FutureSearch
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents
Forecasting is a challenging task that offers a clearly measurable way to study AI systems. Forecasting requires a large amount of research on the internet, and evaluations require time for events to happen, making the development of forec…
Deep Research Bench: Evaluating AI Web Research Agents
Amongst the most common use cases of modern AI is LLM chat with web search enabled. However, no existing evaluations directly measure the quality of web research agents while controlling for the continually changing web. We introduce Deep Research Bench,…