Xiaochen Lian
Seedream 4.0: Toward Next-generation Multimodal Image Generation
We introduce Seedream 4.0, an efficient and high-performance multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition within a single framework. We develop a highly efficient…
SeedEdit 3.0: Fast and High-Quality Generative Image Editing
We introduce SeedEdit 3.0, in companion with our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both aspects of edit instruction following and image content (e.g., ID/IP) preservation on real im…
Seedream 3.0 Technical Report
We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts…
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Rapid advancement of diffusion models has catalyzed remarkable progress in the field of image generation. However, prevalent models such as Flux, SD3.5, and Midjourney still grapple with issues like model bias, limited text rendering capab…
NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night
The semantic segmentation of nighttime scenes is a challenging problem that is key to impactful applications like self-driving cars. Yet, it has received little attention compared to its daytime counterpart. In this paper, we propose Night…
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that …
AutoSpace: Neural Architecture Search with Less Human Interference
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction. In this paper, we consider automating the search space design to minimize human interference, …
DeepViT: Towards Deeper Vision Transformer
Vision transformers (ViTs) have been successfully applied in image classification tasks recently. In this paper, we show that, unlike convolution neural networks (CNNs) that can be improved by stacking more convolutional layers, the perform…
Neural Architecture Search for Lightweight Non-Local Networks
Non-Local (NL) blocks have been widely studied in various vision tasks. However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy comput…
AtomNAS: Fine-Grained End-to-End Neural Architecture Search
Search space design is very critical to neural architecture search (NAS) algorithms. We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.…
Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents
Recently there has been a rising interest in training agents, embodied in virtual environments, to perform language-directed tasks by deep reinforcement learning. In this paper, we propose a simple but effective neural language grounding m…
Mining Spatial and Spatio-Temporal ROIs for Action Recognition
In this paper, we propose an approach to classify action sequences. We observe that in action sequences the critical features for discriminating between actions occur only within sub-regions of the image. Hence deep network approaches will…