Mariella Dimiccoli
Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
This paper proposes a method for long-term action anticipation (LTA), the task of predicting action labels and their duration in a video given the observation of an initial untrimmed video interval. We build on an encoder-decoder architect…
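For orientation only, here is a minimal PyTorch sketch of the kind of encoder-decoder anticipation model the abstract alludes to; the GRU backbone, the dimensions, and the separate (action, duration) heads are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch (not the paper's model): a GRU encoder-decoder that maps
# observed frame features to a fixed number of future (action, duration) pairs.
# All dimensions and names below are illustrative assumptions.
import torch
import torch.nn as nn


class AnticipationSketch(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256, num_actions=48, horizon=8):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, num_actions)       # future action logits
        self.duration_head = nn.Linear(hidden, 1)                # relative duration
        self.horizon = horizon

    def forward(self, observed_feats):                           # (B, T_obs, feat_dim)
        _, h = self.encoder(observed_feats)                      # summarize the observation
        # feed the context vector as input at every future step
        ctx = h[-1].unsqueeze(1).expand(-1, self.horizon, -1)
        dec_out, _ = self.decoder(ctx, h)
        actions = self.action_head(dec_out)                      # (B, horizon, num_actions)
        durations = self.duration_head(dec_out).softmax(dim=1)   # durations sum to 1
        return actions, durations


if __name__ == "__main__":
    model = AnticipationSketch()
    feats = torch.randn(2, 100, 1024)                            # 2 clips, 100 observed frames
    actions, durations = model(feats)
    print(actions.shape, durations.shape)                        # (2, 8, 48) (2, 8, 1)
```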
3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction
Predicting turn-taking in multiparty conversations has many practical applications in human-computer/robot interaction. However, the complexity of human communication makes it a challenging task. Recent advances have shown that synchronous…
Leveraging Triplet Loss for Unsupervised Action Segmentation
In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metri…
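As a rough illustration of the metric-learning ingredient named in the title, the following PyTorch sketch applies a triplet margin loss to frame embeddings of a single video, treating temporally close frames as positives and distant ones as negatives; the sampling scheme and the network are assumptions for illustration, not the paper's procedure.

```python
# Minimal sketch, not the paper's pipeline: learn frame embeddings with a
# triplet margin loss, using temporally close frames as positives and
# temporally distant frames as negatives.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 64))
criterion = nn.TripletMarginLoss(margin=1.0)

frames = torch.randn(500, 2048)            # per-frame features of one video
T = frames.shape[0]

anchor_idx = torch.randint(0, T, (128,))
pos_idx = (anchor_idx + torch.randint(1, 5, (128,))).clamp(max=T - 1)   # nearby frames
neg_idx = (anchor_idx + torch.randint(50, 200, (128,))) % T             # distant frames

z = embed(frames)
loss = criterion(z[anchor_idx], z[pos_idx], z[neg_idx])
loss.backward()
print(float(loss))
```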
Leveraging triplet loss for unsupervised action segmentation
Work presented at the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), held in Vancouver (Canada), June 17-24, 2023
Recognizing object surface material from impact sounds for robot manipulation
We investigated the use of impact sounds generated during exploratory behaviors in a robotic manipulation setup as cues for predicting object surface material and for recognizing individual objects. We collected and make available the YCB-…
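As context for the classification task described above, a minimal sketch follows that turns an impact-sound waveform into a log-spectrogram and classifies the surface material with a small CNN; the feature pipeline, class count, and shapes are illustrative assumptions unrelated to the paper's exact models.

```python
# Minimal sketch, not the paper's method: log-spectrogram features plus a
# small CNN classifier over hypothetical surface-material classes.
import torch
import torch.nn as nn

def log_spectrogram(wave, n_fft=512, hop=128):
    # wave: (batch, samples) mono audio
    window = torch.hann_window(n_fft)
    spec = torch.stft(wave, n_fft, hop_length=hop, window=window, return_complex=True)
    return torch.log1p(spec.abs()).unsqueeze(1)           # (batch, 1, freq, time)

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 7),                                      # e.g. 7 hypothetical material classes
)

wave = torch.randn(4, 16000)                               # four 1-second clips at 16 kHz
logits = classifier(log_spectrogram(wave))
print(logits.shape)                                        # torch.Size([4, 7])
```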
Debiased-CAM to mitigate image perturbations with faithful visual explanations of machine learning
Work presented at the CHI Conference on Human Factors in Computing Systems, held in New Orleans, LA (United States), from April 29 to May 5, 2022
Debiased-CAM to mitigate systematic error with faithful visual explanations of machine learning
Model explanations such as saliency maps can improve user trust in AI by highlighting important features for a prediction. However, these become distorted and misleading when explaining predictions of images that are subject to systematic …
Enhancing Egocentric 3D Pose Estimation with Third Person Views
In this paper, we propose a novel approach to enhance the 3D body pose estimation of a person computed from videos captured from a single wearable camera. The key idea is to leverage high-level features linking first- and third-views in a …
Learning grounded word meaning representations on similarity graphs
Graph Constrained Data Representation Learning for Human Motion Segmentation
Interaction-GCN: A Graph Convolutional Network Based Framework for Social Interaction Recognition in Egocentric Videos
In this paper we propose a new framework, named InteractionGCN, to categorize social interactions in egocentric videos. Our method extracts patterns of relational and non-relational cues at the frame level and uses them to build a relati…
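For readers unfamiliar with graph convolutions, the sketch below shows a single generic GCN layer applied to per-person frame features; it is not the InteractionGCN architecture, and the toy fully connected graph and dimensions are assumptions.

```python
# Minimal sketch of one graph-convolution layer over per-person features;
# the graph here is a random fully connected toy example.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).rsqrt().diag()
        return torch.relu(self.lin(d @ a @ d @ x))

people = torch.randn(5, 128)          # features for 5 people in a frame
adj = torch.ones(5, 5)                # everyone interacts with everyone (toy graph)
out = GCNLayer(128, 64)(people, adj)
print(out.shape)                      # torch.Size([5, 64])
```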
Graph Constrained Data Representation Learning for Human Motion Segmentation
Recently, transfer subspace learning based approaches have been shown to be a valid alternative to unsupervised subspace clustering and temporal data clustering for human motion segmentation (HMS). These approaches leverage prior knowledge f…
Graph Constrained Data Representation Learning for Human Motion Segmentation
Recently, transfer subspace learning based approaches have been shown to be a valid alternative to unsupervised subspace clustering and temporal data clustering for human motion segmentation (HMS). These approaches leverage prior knowledge from…
Modeling Long-Term Interactions to Enhance Action Recognition
In this paper, we propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels. At the frame level, we use a region-based approach that takes as inp…
Modeling long-term interactions to enhance action recognition
Learning grounded word meaning representations on similarity graphs
This paper introduces a novel approach to learn visually grounded meaning representations of words as low-dimensional node embeddings on an underlying graph hierarchy. The lower level of the hierarchy models modality-specific word represen…
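To make the idea of node embeddings on a similarity graph concrete, here is a minimal spectral-embedding sketch over a toy word-similarity matrix; the matrix, the Laplacian choice, and the embedding dimension are assumptions, not the paper's construction.

```python
# Minimal sketch, not the paper's method: embed words as low-dimensional nodes
# of a similarity graph via a spectral decomposition of the graph Laplacian.
import numpy as np

words = ["dog", "cat", "car", "truck"]
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])       # toy pairwise word similarities

D = np.diag(S.sum(axis=1))
L = D - S                                   # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
embeddings = eigvecs[:, 1:3]                # skip the trivial constant eigenvector

for w, e in zip(words, embeddings):
    print(w, np.round(e, 3))
```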
Learning Event Representations for Temporal Segmentation of Image Sequences by Dynamic Graph Embedding
Recently, self-supervised learning has proved to be effective to learn representations of events suitable for temporal segmentation in image sequences, where events are understood as sets of temporally adjacent images that are semantically…
Debiased-CAM for bias-agnostic faithful visual explanations of deep convolutional networks.
Class activation maps (CAMs) explain convolutional neural network predictions by identifying salient pixels, but they become misaligned and misleading when explaining predictions on images under bias, such as images blurred accidentally or…
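For reference, the sketch below computes a vanilla class activation map, the explanation family this work builds on; the toy network and its weights are assumptions, and the code is not the Debiased-CAM method.

```python
# Minimal sketch of a vanilla CAM: weighted sum of the final convolutional
# feature maps, using the classifier weights of the predicted class.
import torch
import torch.nn as nn

conv = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
fc = nn.Linear(16, 10)                                  # after global average pooling

image = torch.randn(1, 3, 64, 64)
features = conv(image)                                  # (1, 16, 64, 64)
logits = fc(features.mean(dim=(2, 3)))                  # GAP + linear classifier
predicted = logits.argmax(dim=1).item()

# CAM for the predicted class
weights = fc.weight[predicted]                          # (16,)
cam = torch.einsum("k,bkhw->bhw", weights, features)    # (1, 64, 64)
cam = torch.relu(cam)
cam = cam / (cam.max() + 1e-8)                          # normalize to [0, 1]
print(cam.shape)
```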
Debiased-CAM to mitigate image perturbations with faithful visual explanations of machine learning
Model explanations such as saliency maps can improve user trust in AI by highlighting important features for a prediction. However, these become distorted and misleading when explaining predictions of images that are subject to systematic …
Activities of Daily Living Monitoring via a Wearable Camera: Toward Real-World Applications
Activity recognition from wearable photo-cameras is crucial for lifestyle characterization and health monitoring. However, to enable its widespread use in real-world applications, a high level of generalization needs to be ensured on u…
Learning event representations for temporal segmentation of image sequences by dynamic graph embedding
Recently, self-supervised learning has proved to be effective to learn representations of events suitable for temporal segmentation in image sequences, where events are understood as sets of temporally adjacent images that are semantica…
Learning event representations in image sequences by dynamic graph embedding.
Recently, self-supervised learning has proved to be effective to learn representations of events in image sequences, where events are understood as sets of temporally adjacent images that are semantically perceived as a whole. However, alt…
Seeing and Hearing Egocentric Actions: How Much Can We Learn?
Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have …
Enhancing Temporal Segmentation by Nonlocal Self-Similarity
Temporal segmentation of untrimmed videos and photo-streams is currently an active area of research in computer vision and image processing. This paper proposes a new approach to improve the temporal segmentation of photo-streams. The meth…
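As a concrete illustration of self-similarity for temporal segmentation, the sketch below builds a cosine self-similarity matrix over frame descriptors and flags candidate boundaries where similarity between adjacent temporal neighborhoods drops; the window size and threshold are assumptions, not the paper's method.

```python
# Minimal sketch, not the paper's method: nonlocal self-similarity matrix
# over frame descriptors, with a simple boundary score per time step.
import torch
import torch.nn.functional as F

frames = torch.randn(300, 512)                       # per-frame descriptors
z = F.normalize(frames, dim=1)
S = z @ z.t()                                        # (300, 300) cosine self-similarity

w = 5                                                # half-window around each instant
scores = []
for t in range(w, len(z) - w):
    # mean similarity between the frames just before and just after t
    scores.append(S[t - w:t, t:t + w].mean())
scores = torch.stack(scores)
boundaries = (scores < scores.mean() - scores.std()).nonzero().squeeze(1) + w
print(boundaries[:10])
```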
Social Relation Recognition in Egocentric Photostreams
This paper proposes an approach to automatically categorize the social interactions of a user wearing a photo-camera at 2 fpm, by relying solely on what the camera is seeing. The problem is challenging due to the overwhelming complexity of soc…
How Much Does Audio Matter to Recognize Egocentric Object Interactions?
Sounds are an important source of information on our daily interactions with objects. For instance, a significant number of people can discern the temperature of water being poured just by using the sense of hearing. However, on…