Rui Men
Qwen3-VL Technical Report
We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamles…
Qwen3-Omni Technical Report
We present Qwen3-Omni, a single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts. Qwen3-Omni matches the pe…
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
Recent progress in Multi-modal Large Language Models (MLLMs) has enabled step-by-step multi-modal mathematical reasoning by performing visual operations based on the textual instructions. A promising approach uses code as an intermediate r…
MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
Recent advances in Large Language Models (LLMs) have shown promising results in complex reasoning tasks. However, current evaluations predominantly focus on single-turn reasoning scenarios, leaving interactive tasks largely unexplored. We …
Qwen3 Technical Report
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes mod…
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention. Yet, existing literature rarely examines the specific effects of gat…
HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning
Large language models (LLMs) have shown remarkable capabilities in commonsense reasoning; however, some variations in questions can trigger incorrect responses. Do these models truly understand commonsense knowledge, or just memorize expre…
Qwen2.5-1M Technical Report
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series has significantly enhanced long-context capabilities through long-context pre-tra…
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
This paper revisits the implementation of $\textbf{L}$oad-$\textbf{b}$alancing $\textbf{L}$oss (LBL) when training Mixture-of-Experts (MoE) models. Specifically, LBL for MoEs is defined as $N_E \sum_{i=1}^{N_E} f_i p_i$, where $N_E$ is th…
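The loss stated in this abstract can be sketched directly. The excerpt is truncated before it defines $f_i$ and $p_i$, so the sketch below assumes the standard MoE formulation: $f_i$ is the fraction of tokens hard-routed to expert $i$, and $p_i$ is the mean router (gating) probability assigned to expert $i$ over all tokens.

```python
import numpy as np

def load_balancing_loss(router_probs, expert_assignments, num_experts):
    """Load-balancing loss LBL = N_E * sum_i f_i * p_i.

    Assumed definitions (standard MoE formulation; the abstract above
    is truncated before stating them):
      f_i: fraction of tokens hard-assigned to expert i
      p_i: mean router probability for expert i over all tokens
    """
    # f_i from the hard routing decisions
    f = np.bincount(expert_assignments, minlength=num_experts) / len(expert_assignments)
    # p_i from the soft router distribution
    p = router_probs.mean(axis=0)
    return num_experts * float(np.sum(f * p))

# Toy example: 4 tokens, 2 experts, router collapsed onto expert 0
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.6, 0.4],
                  [0.7, 0.3]])
assign = probs.argmax(axis=1)  # every token routed to expert 0
lbl = load_balancing_loss(probs, assign, num_experts=2)  # 1.5
```

A perfectly balanced router (uniform $f$ and $p$) yields LBL = 1, and the loss grows as routing collapses onto few experts, which is why it is added as an auxiliary training objective.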
Qwen2.5 Technical Report
In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has been significantly improved during both the pre-training and post-tr…
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which en…
Qwen2.5-Coder Technical Report
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5B/3B/7B/14B/32B). As a code-specific model, Qwen2.5-Coder is built upon…
Qwen2 Technical Report
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range f…
Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization
The emergence of Large Language Models (LLMs) has necessitated the adoption of distributed training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, the efficiency of large-scale distributed…
Mobility-Aware Parallel Offloading and Resource Allocation Scheme for Vehicular Edge Computing
Fuzzy Logic Based Binary Computation Offloading Scheme in V2X Communication Networks
With the recent development of intelligent transportation systems, vehicles are becoming more and more powerful and run a huge number of real-time applications, including computation-intensive and delay-sensitive applications in vehicle-…
Qwen Technical Report
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installmen…
Multi-Objective Optimization Method for Signalized Intersections in Intelligent Traffic Network
Urban intersections are one of the most common sources of traffic congestion. Especially for multiple intersections, an appropriate control method should be able to regulate the traffic flow within the control area. The intersection signal…
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Being, hopefully, an alternative to approaching general-purpose AI, existing generalist…
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining. In this work, we construct a large-scale dataset of image-text pairs in Chinese, where…
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiven…
UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis
Conditional image synthesis aims to create an image according to some multi-modal guidance in the forms of textual descriptions, reference images, and image blocks to preserve, as well as their combinations. In this paper, instead of inves…
M6-T: Exploring Sparse Expert Models and Beyond
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost, and thus they have become a trend in model scaling. Still, it is a mystery how MoE layers bring quality gai…
Exploring Sparse Expert Models and Beyond
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Despite the achievements of large-scale multimodal pre-training approaches, cross-modal retrieval, e.g., image-text retrieval, remains a challenging task. To bridge the semantic gap between the two modalities, previous studies mainly focus…
M6: A Chinese Multimodal Pretrainer
In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1.9TB images and 292GB texts that cover a wide range of domains. We propose a cross-modal pretraining method called M6, referring …
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural…