Dongfu Jiang
VideoScore2: Think before You Score in Generative Video Evaluation
Recent advances in text-to-video generation have produced increasingly realistic and diverse content, yet evaluating such videos remains a fundamental challenge due to their multi-faceted nature encompassing visual quality, semantic alignm…
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Reinforcement Learning (RL) has emerged as a popular training paradigm, particularly when paired with reasoning models. While effective, it primarily focuses on generating responses and lacks mechanisms to explicitly foster critique or ref…
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs' capabi…
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500 real-world tasks, to address the highly heterogeneous daily use cases of end users. Our objective is to optimize for a set of high-quality data sample…
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind: none of the existing metrics provides reliable scores for generated videos. …
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that co…
GenAI Arena: An Open Evaluation Platform for Generative Models
Generative AI has made remarkable strides in revolutionizing fields such as image and video generation. These advancements are driven by innovative algorithms, architectures, and data. However, the rapid proliferation of generative models has…
MANTIS: Interleaved Multi-Image Instruction Tuning
Large multimodal models (LMMs) have shown strong results on single-image vision-language tasks. However, their ability to solve multi-image vision-language tasks has yet to be improved. Existing LMMs like OpenFlamingo, Emu2, and Idefic…
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
In the rapidly advancing field of conditional image generation research, effectively evaluating the performance and capabilities of various models remains challenging, in part because of limited explainability. This paper introduces VIEScore, a Visual I…
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions…
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
We present TIGERScore, a Trained metric that follows Instruction Guidance to perform Explainable and Reference-free evaluation over a wide spectrum of text generation tasks. Different from othe…
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
We present LLM-Blender, an ensembling framework designed to attain consistently superior performance by leveraging the diverse strengths of multiple open-source large language models (LLMs). Our framework consists of two modules: PairRanke…
PairReranker: Pairwise Reranking for Natural Language Generation
Pre-trained language models have been successful in natural language generation (NLG) tasks. While various decoding methods have been employed, they often produce suboptimal results. We first present an empirical analysis of three NLG task…
Effect of probiotics on growth performance, immune function, serum indices and intestinal flora of broilers
To explore the effects of probiotics on growth performance, immune function, blood biochemical indicators and cecal flora of broilers, 5 native experimental probiotics (Bacillus coagulans, Lactobacillus fermentum, Bacillus subtilis, Baci…
Complete Genome Sequence of a Novel Porcine Circovirus Type 3 Strain, CH/GX/1776D/2017, Isolated from Guangxi, China
Porcine circovirus type 3 (PCV3) was first described in 2016 in U.S. swine herds as a pathogenic agent for pigs. To date, PCV3 has been reported to be widely circulating in the United States, China, South Korea, Brazil, Italy, and Poland. …