Explanipedia

VideoScore2: Think before You Score in Generative Video Evaluation Open

Xuan He, Dongfu Jiang, Ping Nie, M. Liu, Zhengxuan Jiang, et al. · 2025

Recent advances in text-to-video generation have produced increasingly realistic and diverse content, yet evaluating such videos remains a fundamental challenge due to their multi-faceted nature encompassing visual quality, semantic alignm…

Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Open

Ruan Chi, Dongfu Jiang, Yubo Wang, Wenhu Chen · 2025

Reinforcement Learning (RL) has emerged as a popular training paradigm, particularly when paired with reasoning models. While effective, it primarily focuses on generating responses and lacks mechanisms to explicitly foster critique or ref…

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Open

Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, et al. · 2025

As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs' capabi…

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Open

Jiacheng Chen, Teng Liang, Sherman Siu, Zhengqing Wang, Kai Wang, et al. · 2024

We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500 real-world tasks, to address the highly heterogeneous daily use cases of end users. Our objective is to optimize for a set of high-quality data sample…

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Open

Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, et al. · 2024

Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores over generated videos. …

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Open

Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, et al. · 2024

Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that co…

GenAI Arena: An Open Evaluation Platform for Generative Models Open

Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, et al. · 2024

Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has…

MANTIS: Interleaved Multi-Image Instruction Tuning Open

Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, et al. · 2024

Large multimodal models (LMMs) have shown great results in single-image vision language tasks. However, their ability to solve multi-image visual language tasks is yet to be improved. The existing LMMs like OpenFlamingo, Emu2, and Idefic…

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation Open

Max Ku, Dongfu Jiang, Cong Wei, Yue Xiang, Wenhu Chen · 2023

In the rapidly advancing field of conditional image generation research, challenges such as limited explainability lie in effectively evaluating the performance and capabilities of various models. This paper introduces VIEScore, a Visual I…

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Open

Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, et al. · 2023

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions…

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks Open

Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Lin, et al. · 2023

We present TIGERScore, a Trained metric that follows Instruction Guidance to perform Explainable and Reference-free evaluation over a wide spectrum of text generation tasks. Different from othe…

LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion Open

Dongfu Jiang, Xiang Ren, Bill Lin · 2023

We present LLM-Blender, an ensembling framework designed to attain consistently superior performance by leveraging the diverse strengths of multiple open-source large language models (LLMs). Our framework consists of two modules: PairRanke…

PairReranker: Pairwise Reranking for Natural Language Generation Open

Dongfu Jiang, Bill Lin, Xiang Ren · 2022

Pre-trained language models have been successful in natural language generation (NLG) tasks. While various decoding methods have been employed, they often produce suboptimal results. We first present an empirical analysis of three NLG task…

Effect of probiotics on growth performance, immune function, serum indices and intestinal flora of broilers Open

Shuo Zhao, Ning Zhang, Bingxia Lu, Ying He, Jiaxing Liang, et al. · 2022

To explore the effects of probiotics on growth performance, immune function, blood biochemical indicators and cecal flora of broilers, 5 native experimental probiotics (Bacillus coagulans, Lactobacillus fermentum, Bacillus subtilis, Baci…

Complete Genome Sequence of a Novel Porcine Circovirus Type 3 Strain, CH/GX/1776D/2017, Isolated from Guangxi, China Open

Bingxia Lu, Yibin Qin, Ying He, Lei Liu, Qunpeng Duan, et al. · 2018

Porcine circovirus type 3 (PCV3) was first described in 2016 in U.S. swine herds as a pathogenic agent for pigs. To date, PCV3 has been reported to be widely circulating in the United States, China, South Korea, Brazil, Italy, and Poland. …

Dongfu Jiang