Explanipedia

Movie Gen: A Cast of Media Foundation Models

Adam Polyak, Amit Zohar, Andrew H. Brown, Andros Tjandra, Animesh A. Sinha, et al. · 2024

Computer science History

We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and ge…

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, et al. · 2023

Computer science Engineering Mathematics

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds. We propose to uniformly use word error rates (WER) a…

Episodic Memory Question Answering

Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, et al. · 2022

Computer science Psychology Economics

Egocentric augmented reality devices such as wearable glasses passively capture visual data as a human wearer tours a home environment. We envision a scenario wherein the human communicates with an AI agent powering such a device by asking…

Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents

Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, et al. · 2020

Computer science Mathematics

Recent work has presented embodied agents that can navigate to point-goal targets in novel indoor environments with near-perfect accuracy. However, these agents are equipped with idealized sensors for localization and take deterministic ac…

Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment

Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, et al. · 2019

Computer science Engineering Chemistry

We address the problem of grounding free-form textual phrases by using weak supervision from image-caption pairs. We propose a novel end-to-end model that uses caption-to-image retrieval as a 'downstream' task to guide the process of phras…

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, et al. · 2019

Computer science Engineering Medicine

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D). W…

Unsupervised Learning of Face Representations

Samyak Datta, Gaurav Sharma, C. V. Jawahar · 2018

Computer science Geography Sociology

We present an approach for unsupervised training of CNNs in order to learn discriminative face representations. We mine supervised training data by noting that multiple faces in the same video frame must belong to different persons and the…

Embodied Question Answering

Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, et al. · 2017

Computer science Psychology Engineering

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligen…

Samyak Datta