Massimo Bertozzi
SISMA: Semantic Face Image Synthesis with Mamba Open
Diffusion Models have become very popular for Semantic Image Synthesis (SIS) of human faces. Nevertheless, their training and inference are computationally expensive, and their computational requirements are high due to the quadratic complex…
U-Shape Mamba: State Space Model for Faster Diffusion Open
Diffusion models have become the most popular approach for high-quality image generation, but their high computational cost still remains a significant challenge. To address this problem, we propose U-Shape Mamba (USM), a novel diffusion m…
$^R$FLAV: Rolling Flow matching for infinite Audio Video generation Open
Joint audio-video (AV) generation is still a significant challenge in generative AI, primarily due to three critical requirements: quality of the generated samples, seamless multimodal synchronization and temporal coherence, with audio tra…
MARS: Paying More Attention to Visual Attributes for Text-Based Person Search Open
Text-Based Person Search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature …
Swin2‐MoSE: A new single image supersolution model for remote sensing Open
Due to the limitations of current optical and sensor technologies and the high cost of updating them, the spectral and spatial resolution of satellites may not always meet desired requirements. For these reasons, Remote‐Sensing Single‐Imag…
Semantic Image Synthesis via Class-Adaptive Cross-Attention Open
In semantic image synthesis the state of the art is dominated by methods that use customized variants of the SPatially-Adaptive DE-normalization (SPADE) layers, which allow for good visual generation quality and editing versatility. By des…
CFTS-GAN: Continual Few-Shot Teacher Student for Generative Adversarial Networks Open
Few-shot and continual learning face two well-known challenges in GANs: overfitting and catastrophic forgetting. Learning new tasks results in catastrophic forgetting in deep learning models. In the case of a few-shot setting, the model le…
Mamba-ST: State Space Model for Efficient Style Transfer Open
The goal of style transfer is, given a content image and a style source, generating a new image preserving the content but with the artistic representation of the style source. Most of the state-of-the-art architectures use transformers or…
Masked Style Transfer for Source-Coherent Image-to-Image Translation Open
The goal of image-to-image translation (I2I) is to translate images from one domain to another while maintaining the content representations. A popular method for I2I translation involves the use of a reference image to guide the transform…
MARS: Paying more attention to visual attributes for text-based person search Open
Text-based person search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature …
Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing Open
Due to the limitations of current optical and sensor technologies and the high cost of updating them, the spectral and spatial resolution of satellites may not always meet desired requirements. For these reasons, Remote-Sensing Single-Imag…
Controllable Face Synthesis with Semantic Latent Diffusion Models Open
Semantic Image Synthesis (SIS) is among the most popular and effective techniques in the field of face generation and editing, thanks to its good generation quality and the versatility it brings along. Recent works attempted to go beyond t…
Informative Rays Selection for Few-Shot Neural Radiance Fields Open
Neural Radiance Fields (NeRF) have recently emerged as a powerful method for image-based 3D reconstruction, but the lengthy per-scene optimization limits their practical usage, especially in resource-constrained settings. Existing approach…
FrankenMask: Manipulating semantic masks with transformers for face parts editing Open
In this paper, we propose FrankenMask, a novel framework that allows swapping and rearranging face parts in semantic masks for automatic editing of shape-related facial attributes. This is a novel yet challenging task as substituting face …
Automatic Generation of Semantic Parts for Face Image Synthesis Open
Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Most of the approaches in the literature, other than the quality of …
Real-Time Semantic Segmentation of Spherical Images for Automotive Applications Open
Recent advancements in autonomous driving technology have resulted in a growing need for robust algorithms that can effectively detect, recognize, and segment objects in the surrounding environment. Semantic segmentation systems, which cla…
Memory-augmented Online Video Anomaly Detection Open
The ability to understand the surrounding scene is of paramount importance for Autonomous Vehicles (AVs). This paper presents a system capable of working in an online fashion, giving an immediate response to the rise of anomalies surrounding…
Learning Neural Radiance Fields from Multi-View Geometry Open
We present a framework, called MVG-NeRF, that combines classical Multi-View Geometry algorithms and Neural Radiance Fields (NeRF) for image-based 3D reconstruction. NeRF has revolutionized the field of implicit 3D representations, mainly d…
Arbitrary Point Cloud Upsampling with Spherical Mixture of Gaussians Open
Generating dense point clouds from sparse raw data benefits downstream 3D understanding tasks, but existing models are limited to a fixed upsampling ratio or to a short range of integer values. In this paper, we present APU-SMOG, a Transfo…
Revisiting PatchMatch Multi-View Stereo for Urban 3D Reconstruction Open
In this paper, a complete pipeline for image-based 3D reconstruction of urban scenarios is proposed, based on PatchMatch Multi-View Stereo (MVS). Input images are firstly fed into an off-the-shelf visual SLAM system to extract camera poses…
Leveraging Local Domains for Image-to-Image Translation Open
Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore o…
Energy and Path‐Aware‐Reliable Routing in Underwater Acoustic Wireless Sensor Networks Open
In underwater acoustic sensor networks (UASNs), energy awareness, best path selection, reliability, and scalability are among the key factors that decide information delivery to the sea surface. Existing protocols usually do not combine su…