Explanipedia

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition Open

Sijie Yan, Yuanjun Xiong, Dahua Lin · 2018

Computer science Mathematics

Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power …

Is Space-Time Attention All You Need for Video Understanding? Open

Gedas Bertasius, Heng Wang, Lorenzo Torresani · 2021

Computer science Mathematics Physics

We present a convolution-free approach to video classification built exclusively on self-attention over space and time. Our method, named "TimeSformer," adapts the standard Transformer architecture to video by enabling spatiotemporal featu…

AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions Open

Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, et al. · 2018

Computer science Geography Physics

Long-Term Temporal Convolutions for Action Recognition Open

Gül Varol, Ivan Laptev, Cordelia Schmid · 2017

Computer science Physics

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, ho…

Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks Open

Wentao Zhu, Cuiling Lan, Junliang Xing, Wenjun Zeng, Yanghao Li, et al. · 2016

Computer science Mathematics Political science

Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Ter…

Temporal Segment Networks for Action Recognition in Videos Open

Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, et al. · 2018

Computer science Political science

We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation…

An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data Open

Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu · 2017

Computer science Economics

Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In thi…

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition Open

Sijie Yan, Yuanjun Xiong, Dahua Lin · 2018

Computer science Mathematics Physics

Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power …

2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning Open

Diogo Luvizon, David Picard, Hedi Tabia · 2018

Computer science Engineering Physics

Action recognition and human pose estimation are closely related but both problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for jointly 2D and 3D pose estimation from still i…

ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification Open

Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Šivic, Bryan Russell · 2017

Computer science Political science Philosophy

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video. We do so by integrating state-of-the-art two-stream netw…

Spatiotemporal Residual Networks for Video Action Recognition Open

Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes · 2016

Computer science Physics

Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, w…

An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data Open

Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu · 2016

Computer science Economics

Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In thi…

A Comprehensive Survey of Vision-Based Human Action Recognition Methods Open

Hongbo Zhang, Yi-Xiang Zhang, Bineng Zhong, Qing Lei, Lijie Yang, et al. · 2019

Computer science Philosophy Mathematics

Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action rec…

Generating Videos with Scene Dynamics Open

Carl Vondrick, Hamed Pirsiavash, Antonio Torralba · 2016

Computer science Political science Physics

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative advers…

Action-Conditioned 3D Human Motion Synthesis with Transformer VAE Open

Mathis Petrovich, Michael J. Black, Gül Varol · 2021

Computer science Physics Biology

We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence. Here we l…

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis Open

Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang · 2016

Computer science Geography Political science

Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recogn…

Convolutional Two-Stream Network Fusion for Video Action Recognition Open

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman · 2016

Computer science Chemistry Engineering

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet t…

Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching Open

Wei Peng, Xiaopeng Hong, Haoyu Chen, Guoying Zhao · 2020

Computer science

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted lots of attention. However, many existing GCNs provide a pre-defined g…

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Open

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, et al. · 2021

Computer science Mathematics Physics

We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal …

PoTion: Pose MoTion Representation for Action Recognition Open

Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid · 2018

Computer science Mathematics Physics

Rank Pooling for Action Recognition Open

Basura Fernando, Efstratios Gavves, José Oramas, Amir Ghodrati, Tinne Tuytelaars · 2016

Computer science Physics

We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to t…

Action2Activity: Recognizing Complex Activities from Sensor Data Open

Ye Liu, Liqiang Nie, Lei Han, Luming Zhang, David S. Rosenblum · 2016

Computer science Engineering Political science

As compared to simple actions, activities are much more complex, but semantically consistent with a human's real life. Techniques for action recognition from sensor generated data are mature. However, there has been relatively little work …

Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition Open

Zhan Chen, Sicheng Li, Bing Yang, Qinghan Li, Hong Liu · 2021

Computer science

Graph convolutional networks have been widely used for skeleton-based action recognition due to their excellent modeling ability of non-Euclidean data. As the graph convolution is a local operation, it can only utilize the short-range join…

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition Open

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen · 2019

Computer science Sociology

We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multimodal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. We train the architecture with three…

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition Open

Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, et al. · 2016

Computer science Physics

Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles…

Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition Open

Bin Li, Xi Li, Zhongfei Zhang, Fei Wu · 2019

Computer science

With the representation effectiveness, skeleton-based human action recognition has received considerable research attention, and has a wide range of real applications. In this area, many existing methods typically rely on fixed physicalcon…

A Real Time System For Dynamic Hand Gesture Recognition With A Depth Sensor Open

Alexey Kurakin, Zheng Zhang, Zhimin Liu · 2016

Computer science Philosophy Sociology

Publication in the conference proceedings of EUSIPCO, Bucharest, Romania, 2012

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition Open

Shoufa Chen, Chongjian Ge, Tong Zhan, Jiangliu Wang, Yibing Song, et al. · 2022

Computer science Physics

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and me…

TEINet: Towards an Efficient Architecture for Video Recognition Open

Zhaoyang Liu, Donghao Luo, Yabiao Wang, Limin Wang, Ying Tai, et al. · 2020

Computer science Engineering

Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have witnessed remarkable progress in action recognition from videos. However, compared with their 2D counterparts, 3D convolutions often int…