Viorica Pătrăucean
Unique Lives, Shared World: Learning from Single-Life Videos
We introduce the "single-life" learning paradigm, where we train a distinct vision model exclusively on egocentric videos captured by one individual. We leverage the multiple viewpoints naturally captured within a single life to learn a vi…
How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models
Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers whilst at the same time being prone to excessive doubt when challenged. To investigate this apparent …
Scaling 4D Representations
Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. I…
TRecViT: A Recurrent Video Transformer
We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform…
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video…
Perception Test 2023: A Summary of the First Challenge And Outcome
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023, with the goal of benchmarking state-of-the-art video models on the recently proposed Percep…
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image-text models to video via shal…
Learning from One Continuous Video Stream
We introduce a framework for online learning from a single continuous video stream – the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between co…
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computa…
Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task
We introduce a challenging decision-making task that we call active acquisition for multimodal temporal data (A2MT). In many real-world scenarios, input features are not readily available at test time and must instead be acquired at signif…
Broaden Your Views for Self-Supervised Video Learning
Most successful self-supervised learning methods are trained to align the representations of two independent views from the data. State-of-the-art methods in video are inspired by image techniques, where these two views are similarly extra…
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal sig…
Sideways: Depth-Parallel Training of Video Models
We propose Sideways, an approximate backpropagation scheme for training video models. In standard backpropagation, the gradients and activations at every computation step through the model are temporally synchronized. The forward activatio…
Massively Parallel Video Networks
We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and multi-…
gvnn: Neural Network Library for Geometric Computer Vision
We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers …
Scene Structure Inference through Scene Map Estimation
Understanding indoor scene structure from a single RGB image is useful for a wide variety of applications ranging from the editing of scenes to the mining of statistics about space utilization. Most efforts in scene understanding focus on …
SceneNet: Understanding Real World Indoor Scenes With Synthetic Data
Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments. Recent attempts with supervised learning have shown promise in this direction but also highlighted t…
Spatio-temporal video autoencoder with differentiable memory
We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long…
State of Research in Automatic As-Built Modelling
Building Information Models (BIMs) are becoming the official standard in the construction industry for encoding, reusing, and exchanging information about structural assets. Automatically generating such representations for existing assets…
SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes
We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation in the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained excl…