Zhengxiong Luo
Specific Scenario Generation Method for Trustworthiness Testing of Autonomous Vehicles Based on Interaction Coding
In response to the problems of rough modeling and insufficient coverage of edge interaction scenarios in autonomous driving tests, this paper proposes a scene generation method based on interaction coding. The method constructs a hierarchi…
Seedream 4.0: Toward Next-generation Multimodal Image Generation
We introduce Seedream 4.0, an efficient and high-performance multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition within a single framework. We develop a highly efficient…
Rolling Bearing Life Prediction Based on Improved Transformer Encoding Layer and Multi-Scale Convolution
To accurately and reliably characterize the degradation trend of rolling bearings and predict their life cycle, this paper proposes a bearing life prediction model based on an improved transformer encoder layer and multi-scale convolution.…
Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation
Recent progress in unified models for image understanding and generation has been impressive, yet most approaches remain limited to single-modal generation conditioned on multiple modalities. In this paper, we present Mogao, a unified fram…
Autoregressive Video Generation without Vector Quantization
This paper presents a novel approach that enables autoregressive video generation with high efficiency. We propose to reformulate the video generation problem as a non-quantized autoregressive modeling of temporal frame-by-frame prediction…
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Recent 3D generation models typically rely on limited-scale 3D 'gold-labels' or 2D diffusion priors for 3D content creation. However, their performance is upper-bounded by constrained 3D priors due to the lack of scalable learning paradigm…
Emu3: Next-Token Prediction is All You Need
While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional app…
LawLuo: A Multi-Agent Collaborative Framework for Multi-Round Chinese Legal Consultation
Legal Large Language Models (LLMs) have shown promise in providing legal consultations to non-experts. However, most existing Chinese legal consultation models are based on single-agent systems, which differ from real-world legal consultat…
Generative Multimodal Models are In-Context Learners
The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate. In this work, we demonstrate that the task-a…
Notice of Removal: VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributi…
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributi…
Learning the Degradation Distribution for Blind Image Super-Resolution
Synthetic high-resolution (HR) and low-resolution (LR) pairs are widely used in existing super-resolution (SR) methods. To avoid the domain gap between synthetic and test images, most previous methods try to adaptively learn the synthesizin…
DFAN: Dual Feature Aggregation Network for Lightweight Image Super‐Resolution
With the power of deep learning, super‐resolution (SR) methods enjoy a dramatic boost in performance. However, they usually have a large model size and high computational complexity, which hinders the application in devices with limited me…
Approaching the Limit of Image Rescaling via Flow Guidance
Image downscaling and upscaling are two basic rescaling operations. Once an image is downscaled, it is difficult to reconstruct via upscaling because information has been lost. To make these two processes more compatible and improve the…
Adaptive Dilated Convolution For Human Pose Estimation
Most existing human pose estimation (HPE) methods exploit multi-scale information by fusing feature maps of four different spatial sizes, i.e., 1/4, 1/8, 1/16, and 1/32 of the input image. There are two drawbacks of this strategy: 1)…
End-to-end Alternating Optimization for Blind Super Resolution
Previous methods decompose the blind super-resolution (SR) problem into two sequential steps: i) estimating the blur kernel from a given low-resolution (LR) image and ii) restoring the SR image based on the estimated kernel…
Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation
Heatmap regression has become the most prevalent choice in current human pose estimation methods. The ground-truth heatmaps are usually constructed by covering all skeletal keypoints with 2D Gaussian kernels. The standard deviations of th…
Efficient Human Pose Estimation by Learning Deeply Aggregated Representations
In this paper, we propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful mu…
Unfolding the Alternating Optimization for Blind Super Resolution
Previous methods decompose the blind super resolution (SR) problem into two sequential steps: i) estimating the blur kernel from a given low-resolution (LR) image and ii) restoring the SR image based on the estimated kernel. This two-step …
Learning Delicate Local Representations for Multi-Person Pose Estimation
In this paper, we propose a novel method called Residual Steps Network (RSN). RSN aggregates features with the same spatial size (Intra-level features) efficiently to obtain delicate local representations, which retain rich low-level spati…