James M. Rehg
Layer-Aware Video Composition via Split-then-Merge Open
We present Split-then-Merge (StM), a novel framework designed to enhance control in generative video composition and address its data scarcity problem. Unlike conventional methods relying on annotated datasets or handcrafted rules, StM spl…
Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications across Lab and Field Settings Open
Photoplethysmography (PPG)-based foundation models are gaining traction due to the widespread use of PPG in biosignal monitoring and their potential to track diverse health indicators. In this paper, we introduce Pulse-PPG, an open-source …
AI for Creative Visual Content Generation, Editing and Understanding Open
LSM-2: Learning from Incomplete Wearable Sensor Data Open
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challe…
Unified Text-Image-to-Video Generation: A Training-Free Approach to Flexible Visual Conditioning Open
Text-image-to-video (TI2V) generation is a critical problem for controllable video generation using both semantic and visual conditions. Most existing methods typically add visual conditions to text-to-video (T2V) foundation models by fine…
MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models Open
This paper introduces MEBench, a novel benchmark for evaluating mutual exclusivity (ME) bias, a cognitive phenomenon observed in children during word learning. Unlike traditional ME tasks, MEBench further incorporates spatial reasoning to …
SocialGesture: Delving into Multi-person Gesture Understanding Open
Previous research in human gesture recognition has largely overlooked multi-person interactions, which are crucial for understanding the social context of naturally occurring gestures. This limitation in existing datasets presents a signif…
Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 Symposium Open
The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British …
Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings Open
Photoplethysmography (PPG)-based foundation models are gaining traction due to the widespread use of PPG in biosignal monitoring and their potential to generalize across diverse health applications. In this paper, we introduce Pulse-PPG, t…
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images Open
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces, but struggle with occl…
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders Open
We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene. Predicting a person's gaze target requires reasoning both about the person's appearance and the contents of the scene. Prior work…
PyPulse: A Python Library for Biosignal Imputation Open
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. Missingness is commonplace in these settings and can arise from multiple causes, such as insecure sensor attachment or data …
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation Open
Text-guided image manipulation has experienced notable advancement in recent years. In order to mitigate linguistic ambiguity, few-shot learning with visual examples has been applied for instructions that are underrepresented in the traini…
Optimization-Free Image Immunization Against Diffusion-Based Editing Open
Current image immunization defense techniques against diffusion-based editing embed imperceptible noise in target images to disrupt editing models. However, these methods face scalability challenges, as they require time-consuming re-optim…
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data Open
We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable distance measure is trained to capture motif similarity and do…
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation Open
Symmetry is a ubiquitous and fundamental property in the visual world, serving as a critical cue for perception and structure interpretation. This paper investigates the detection of 3D reflection symmetry from a single RGB image, and reve…
Medical Video Generation for Disease Progression Simulation Open
Modeling disease progression is crucial for improving the quality and efficacy of clinical diagnosis and prognosis, but it is often hindered by a lack of longitudinal medical image monitoring for individual patients. To address this challe…
Human Action Anticipation: A Survey Open
Predicting future human behavior is an increasingly popular topic in computer vision, driven by the interest in applications such as autonomous vehicles, digital assistants and human-robot interactions. The literature on behavior predictio…
Leveraging Object Priors for Point Tracking Open
Point tracking is a fundamental problem in computer vision with numerous applications in AR and robotics. A common failure mode in long-term point tracking occurs when the predicted point leaves the object it belongs to and lands on the ba…
Towards Social AI: A Survey on Understanding Social Interactions Open
Social interactions form the foundation of human societies. Artificial intelligence has made significant progress in certain areas, but enabling machines to seamlessly understand social interactions remains an open challenge. It is importa…
Ego4D: Around the World in 3,000 Hours of Egocentric Video Open
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camer…
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences Open
3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D …
Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation Open
Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of…
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs Open
Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe robustness pitfall in deep learning models trained on single modality data. Multimoda…
What is the Visual Cognition Gap between Humans and Multimodal LLMs? Open
Recently, Multimodal Large Language Models (MLLMs) and Vision Language Models (VLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addr…
PointInfinity: Resolution-Invariant Point Diffusion Models Open
We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low…
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations Open
Understanding social interactions involving both verbal and non-verbal cues is essential for effectively interpreting social situations. However, most prior works on multimodal social cues focus predominantly on single-person behaviors or …
Web-based Programming Guide for Allen Bradley PLCs Open
NOTE: The first page of text has been automatically extracted and included below in lieu of an abstract. Session 1647. Web-based Programming Guide for Allen Bradley PLCs. James A. Rehg, Penn State Altoona. Abstract: Programmable logic controller…
ZeroShape: Regression-based Zero-shot Shape Reconstruction Open
We study the problem of single-image zero-shot 3D shape reconstruction. Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets, but these models are computationally expensive at train and inference time.…
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective Open
In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior wo…