Action recognition
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power …
Is Space-Time Attention All You Need for Video Understanding?
We present a convolution-free approach to video classification built exclusively on self-attention over space and time. Our method, named "TimeSformer," adapts the standard Transformer architecture to video by enabling spatiotemporal featu…
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
Long-Term Temporal Convolutions for Action Recognition
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, ho…
Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks
Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Ter…
Temporal Segment Networks for Action Recognition in Videos
We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation…
An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data
Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In thi…
2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning
Action recognition and human pose estimation are closely related but both problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for jointly 2D and 3D pose estimation from still i…
ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification
In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video. We do so by integrating state-of-the-art two-stream netw…
Spatiotemporal Residual Networks for Video Action Recognition
Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, w…
A Comprehensive Survey of Vision-Based Human Action Recognition Methods
Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action rec…
Generating Videos with Scene Dynamics
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative advers…
Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence. Here we l…
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis
Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recogn…
Convolutional Two-Stream Network Fusion for Video Action Recognition
Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet t…
Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching
Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted lots of attention. However, many existing GCNs provide a pre-defined g…
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal …
PoTion: Pose MoTion Representation for Action Recognition
Rank Pooling for Action Recognition
We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to t…
Action2Activity: Recognizing Complex Activities from Sensor Data
As compared to simple actions, activities are much more complex, but semantically consistent with a human's real life. Techniques for action recognition from sensor generated data are mature. However, there has been relatively little work …
Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
Graph convolutional networks have been widely used for skeleton-based action recognition due to their excellent modeling ability of non-Euclidean data. As the graph convolution is a local operation, it can only utilize the short-range join…
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multimodal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. We train the architecture with three…
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles…
Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition
With the representation effectiveness, skeleton-based human action recognition has received considerable research attention, and has a wide range of real applications. In this area, many existing methods typically rely on fixed physicalcon…
A Real Time System For Dynamic Hand Gesture Recognition With A Depth Sensor
Publication in the conference proceedings of EUSIPCO, Bucharest, Romania, 2012
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and me…
TEINet: Towards an Efficient Architecture for Video Recognition
Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have witnessed remarkable progress in action recognition from videos. However, compared with their 2D counterparts, 3D convolutions often int…