SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordóñez

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2308.12910

We propose Subject-Conditional Relation Detection (SCoRD), where, conditioned on an input subject, the goal is to predict all of its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark in which the training and testing splits have a distribution shift in the occurrence statistics of $\langle$subject, relation, object$\rangle$ triplets. To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens. First, we show that previous scene-graph prediction methods fail to produce as exhaustive an enumeration of relation-object pairs when conditioned on a subject on this benchmark. In particular, we obtain a recall@3 of 83.8% for our relation-object predictions, compared to the 49.75% obtained by a recent scene graph detector. Then, we show improved generalization on both relation-object and object-box predictions by leveraging, during training, relation-object pairs that are obtained automatically from textual captions and for which no object-box annotations are available. In particular, for $\langle$subject, relation, object$\rangle$ triplets for which no object locations are available during training, we obtain a recall@3 of 33.80% for relation-object pairs and 26.75% for their box locations.
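
To make the idea of casting relation, object, and box outputs as a sequence of tokens more concrete, here is a minimal sketch of one possible serialization. It is an illustration only and not the paper's implementation: the bin count `NUM_BINS`, the `<bin_i>` and `<eos>` token names, the whitespace word split, and the helper functions `quantize`, `encode_target`, and `decode_box` are all assumptions introduced here.

```python
from typing import List, Tuple

# Assumed number of coordinate bins; this value is not taken from the paper.
NUM_BINS = 1000

# (x_min, y_min, x_max, y_max), each normalized to the range [0, 1].
Box = Tuple[float, float, float, float]


def quantize(value: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bin index."""
    clipped = min(max(value, 0.0), 1.0)
    # A coordinate of exactly 1.0 falls into the last bin.
    return min(int(clipped * num_bins), num_bins - 1)


def encode_target(relation: str, obj: str, box: Box) -> List[str]:
    """Serialize one (relation, object, box) target as a flat token list."""
    coord_tokens = [f"<bin_{quantize(c)}>" for c in box]
    # Hypothetical layout: relation words, object words, four box tokens, end marker.
    return relation.split() + obj.split() + coord_tokens + ["<eos>"]


def decode_box(tokens: List[str], num_bins: int = NUM_BINS) -> Box:
    """Recover approximate normalized box coordinates from the bin tokens."""
    bins = [int(t[len("<bin_"):-1]) for t in tokens if t.startswith("<bin_")]
    if len(bins) != 4:
        raise ValueError("expected exactly four coordinate tokens")
    # Use the center of each bin as the decoded coordinate.
    x0, y0, x1, y1 = ((b + 0.5) / num_bins for b in bins)
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    tokens = encode_target("holds", "guitar", (0.25, 0.5, 0.75, 1.0))
    print(tokens)
    print(decode_box(tokens))
```

Under these assumptions, an auto-regressive decoder conditioned on the image and the subject would emit such a token list one token at a time, and the box is recovered from the coordinate tokens up to a quantization error of at most one bin width.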

Concepts

Relation (database), Object (grammar), Subject (documents), Computer science, Generalization, Scene graph, Benchmark (surveying), Artificial intelligence, Graph, Object detection, Pattern recognition (psychology), Theoretical computer science, Mathematics, Data mining, Geodesy, Mathematical analysis, Rendering (computer graphics), Library science, Geography

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2308.12910
PDF: https://arxiv.org/pdf/2308.12910
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4386185526
