Quanfeng Lu
Evidence of scaling advantage on an NP-Complete problem with enhanced quantum solvers Open
Achieving quantum advantage remains a key milestone in the noisy intermediate-scale quantum era. Without rigorous complexity proofs, scaling advantage-where quantum resource requirements grow more slowly than their classical counterparts-s…
Quantum-classical hybrid algorithm for solving the learning-with-errors problem on NISQ devices Open
The Learning-With-Errors (LWE) problem is a fundamental computational challenge with implications for post-quantum cryptography and computational learning theory. Here we propose a quantum-classical hybrid algorithm with Ising model to add…
MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning Open
DeepSeek R1 and o1 have demonstrated powerful reasoning capabilities in the text domain through stable large-scale reinforcement learning. To enable broader applications, some works have attempted to transfer these capabilities to multimo…
An Investigation of Energy Consumption Characteristics of the Pump-Control System for Electric Excavator Arms Open
The conventional hydraulic system of excavators suffers from significant valve throttling losses and inadequate matching between the hydraulic power source and the load, which substantially impact the system’s overall energy consumption an…
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Open
Text-to-video (T2V) models like Sora have made significant strides in visualizing complex prompts, which is increasingly viewed as a promising path towards constructing the universal world simulator. Cognitive psychologists believe that th…
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Open
The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluat…
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models Open
Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulat…
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Open
Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, entertainment, and productivity by streamlining workflows and reducing manual intervention. However, prior GUI agents often trained w…
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI Open
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimod…
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM Open
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of dive…
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning Open
Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses cha…